Covecube Inc.
Phidaissi

Balancing hardlinks

Question

Having looked through previous discussions a bit I know that hardlinks are not supported through the drivepool drive itself.

Would it be possible to have the balancing system preserve hardlinks when moving files though?

Currently I download a torrent to a smallish temporary drive, and when it's complete it gets moved into my pool.

In order to continue to seed the files while also having them sorted into their appropriate places, the files get moved into a staging area (ie, E:\Unsorted) and after they've been moved there I have a script that hardlinks them to a seeding location (ie, E:\Seeding) via the underlying PoolParts and locating each file as necessary on the right drive to do so.
Then I can relocate the unsorted files to where I want them to actually be, and all the while it continues to seed them no matter where they end up. (This cannot be done with a symlink from seeding to unsorted, as it would break when I move the unsorted files.)
When I'm done seeding I can just delete them from seeding without concern. (This cannot be done with a symlink from unsorted to seeding either, as removing the seeding copy would remove the file entirely and leave the unsorted reference broken.)
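
A minimal sketch of the linking step described above, not the actual script. It assumes the caller passes a list of PoolPart root folders (the names here are hypothetical) and that duplication is off, so each file lives in exactly one PoolPart. The helper names find_in_poolparts and link_for_seeding are made up for illustration.

```python
import os
from pathlib import Path


def find_in_poolparts(pool_relative, poolpart_roots):
    """Return (root, full_path) for the first PoolPart root holding the file, else None."""
    for root in poolpart_roots:
        candidate = Path(root) / pool_relative
        if candidate.is_file():
            return Path(root), candidate
    return None


def link_for_seeding(src_relative, dst_relative, poolpart_roots):
    """Hard link a pooled file to a second pool path inside the same PoolPart."""
    found = find_in_poolparts(src_relative, poolpart_roots)
    if found is None:
        raise FileNotFoundError(src_relative)
    root, src = found
    # The new name must be created on the same volume as the data,
    # so it is built under the same PoolPart root.
    dst = root / dst_relative
    dst.parent.mkdir(parents=True, exist_ok=True)
    os.link(src, dst)
    return dst
```

Called as link_for_seeding("Unsorted/file.bin", "Seeding/file.bin", roots), this leaves two names for one copy of the data, so deleting the Seeding name later does not touch the sorted file.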

This works fine, and has the expected consequence that if drivepool decides to balance one of the 'copies' of the hardlink, it effectively becomes two files on different drives.

That's an acceptable consequence most of the time, but mildly annoying when it's not too difficult to fix. I'm just not sure if it's viable to do with the balancer as it is currently, and don't want to invest the time to attempt creating a plugin myself to handle this if it turns out to be impossible under the current design!

So could the balancer be made to recognise when it's about to move a hard linked file (discoverable in sysinternals or with fsutil) and after moving the file replicate the hardlinks within the new poolpart and remove them on the old one?
This way hard linked files can still be balanced across the pool.

Would it be possible to tell balancers to ignore a directory?

Being able to say "ignore seeding directory" (since it's pretty much only hardlinks) and have drivepool balance files as needed otherwise would make this work even better by avoiding potential for the balancer to get confused. :lol:

It would be nice if drivepool could allow the creation of hardlinks by passing the hardlink calls to the underlying NTFS on the relevant poolpart.
This is effectively what I do within my script, just with the added annoyance of me having to locate which poolpart the file is in.
Given that previous discussions have basically just said "we don't support it" I'm assuming there must be some other issue with supporting them that I'm missing and I'm curious about what that is, as it's not really explained elsewhere that I saw, but I'd be happy for it to just respect hardlinks manually created when moving files around.

On 1/5/2019 at 11:01 PM, Phidaissi said:

Would it be possible to have the balancing system preserve hardlinks when moving files though?

No.  Hardlinks work by creating multiple directory entries for the same data.  That means that this data HAS to be on the same disk, and the same volume.  

So, if it gets balanced, there is no way to maintain this.  That's part of why we don't support hard links. 
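
As a small illustration of the "multiple directory entries for the same data" point, here is a sketch that creates a hard link in a scratch directory and shows that both names refer to one file. The function name demo_shared_data is made up for this example. Calling os.link with a source and destination on different volumes would raise OSError instead.

```python
import os


def demo_shared_data(directory):
    """Create a file plus a hard link to it and show both names share one body of data."""
    first = os.path.join(directory, "first.txt")
    second = os.path.join(directory, "second.txt")
    with open(first, "w") as fh:
        fh.write("original")
    # Add a second directory entry for the same data.
    os.link(first, second)
    # Write through the second name, then read back through the first.
    with open(second, "a") as fh:
        fh.write(" + appended")
    with open(first) as fh:
        content = fh.read()
    return os.stat(first).st_nlink, os.path.samefile(first, second), content
```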

The other reason is that the pool itself does not store any data, so there is no way to hard link the data on the pool.   So.... it's just not possible to support hard links on the pool. 

 

On 1/5/2019 at 11:01 PM, Phidaissi said:

Would it be possible to tell balancers to ignore a directory?

You could use the File Placement Rules to force the folder to a specific disk.  That should accomplish what you want.

https://stablebit.com/Support/DrivePool/2.X/Manual?Section=File Placement

2 hours ago, Christopher (Drashna) said:

No.  Hardlinks work by creating multiple directory entries for the same data.  That means that this data HAS to be on the same disk, and the same volume.  

So, if it gets balanced, there is no way to maintain this.  That's part of why we don't support hard links. 

The other reason is that the pool itself does not store any data, so there is no way to hard link the data on the pool.   So.... it's just not possible to support hard links on the pool. 

 

You could use the File Placement Rules to force the folder to a specific disk.  That should accomplish what you want.

https://stablebit.com/Support/DrivePool/2.X/Manual?Section=File Placement

Sorry, but did you read only the questions in bold and not my explanation following them that addresses the how? :( 

I know how they work and that's why I explicitly mentioned the solution to both of those things. The balancer moves files between disks/volumes and I was suggesting that when a file with hard links is moved that all links move with it, which is as you said, necessary for them to actually be hardlinked and not just additional copies on multiple drives. 

You can check if a file has more than one hard link (the first entry counts, so a regular file has a count of 1), and if it is hardlinked you can create identical links on the target drive! Looking at the link count has basically no overhead, and you'd only need to check the list of links when you actually encounter one, so the minor penalty would only apply to when actually moving a hard linked file. 

This information is stored in NTFS, drivepool doesn't need to store it, and you can get the information via several methods - I mentioned two of them above, but I can get you the exact api calls necessary to do the checks if that helps. 

That means it's possible to modify file move behaviour to do this:

  1. Check if file has link count > 1; if not do regular move from drive A to B
  2. Get list of linked directory entries in file system on A
  3. Continue regular file move from A to B
  4. Replicate hard links discovered on A in (2) on B
  5. Remove linked directory entries from A that were discovered in (2)

This moves a file and its associated hard links from A to B, balancing the file with all hard links. 
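
The five steps above could be sketched roughly like this. It is an illustration under assumptions, not how DrivePool's file mover works: root_a and root_b stand in for two PoolPart folders, find_links is a slow portable scan standing in for the NTFS link enumeration, and the copy-then-delete sequence is not atomic and ignores open files and links outside root_a.

```python
import os
import shutil
from pathlib import Path


def find_links(path, root):
    """List every path under root that is a hard link to path (slow full scan)."""
    target = os.stat(path)
    links = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            candidate = Path(dirpath) / name
            st = os.lstat(candidate)
            if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
                links.append(candidate)
    return links


def move_with_links(path, root_a, root_b):
    """Move path from root_a to root_b and recreate its hard links under root_b."""
    path, root_a, root_b = Path(path), Path(root_a), Path(root_b)
    dest = root_b / path.relative_to(root_a)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Step 1: a file with a single directory entry gets a regular move.
    if os.stat(path).st_nlink == 1:
        shutil.move(str(path), str(dest))
        return [dest]
    # Step 2: collect the linked directory entries on A.
    old_links = find_links(path, root_a)
    # Step 3: copy the data to B once.
    shutil.copy2(path, dest)
    new_links = [dest]
    # Step 4: replicate the other links on B at the same relative paths.
    for old in old_links:
        new = root_b / old.relative_to(root_a)
        if new == dest:
            continue
        new.parent.mkdir(parents=True, exist_ok=True)
        os.link(dest, new)
        new_links.append(new)
    # Step 5: remove the entries on A that were found in step 2.
    for old in old_links:
        old.unlink()
    return new_links
```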

I actually do some of this already via a script I run regularly - look in seeding for still existing hard links, replicate links for files that have been moved between drives by balancer on the drive they were moved to, remove redundant copy left that wasn't balanced. 

If this isn't a change you'd be happy to make, then my question is if it's possible to do this in a balancing plug in which I write myself. I'm not sure what information balancers actually get and whether they can alter this behaviour without drivepool being updated to make it possible, which is why I asked.  :)

Basically, I am already doing this and know how it's done in an automated way that detects the changes, but doing this with scripts to fix it after the fact isn't very elegant, and I can think of a couple edge cases where if it's done in drivepool balancing it can be greatly improved, like a few new balancing rules that would be great in a plug in. 


For reference, these are the win32 fileapi methods that do this:

To determine if a file has hard links, the fileapi function GetFileInformationByHandle fills in a BY_HANDLE_FILE_INFORMATION structure whose nNumberOfLinks field is the number of directory entries for the file, i.e. 1 for typical files, 2 or more for files that have hard links. As you're moving the file anyway when balancing, this check likely has negligible overhead.

To get the list of all directory entries associated with a hard link, you use the fileapi function FindFirstFileNameW with FindNextFileNameW, which should give the same number of results as the nNumberOfLinks in prior step.

I'm actually using a script in python so I use python wrappers that access those win32 APIs:

  1. Use os.stat(file_name).st_nlink to determine number of links - which actually works on the drivepool volume already because I presume it's just passed from the source volume
  2. If it has hard links (st_nlink > 1), I then determine the underlying location of it - via folder mounts in a known location where I check that subpath in each PoolPart. A balancer would negate this step as it knows which real volume it's on already
  3. Use win32file.FindFileNames(file_name) on the directory entry on the real volume (xx/PoolPart.xx/xx) to get the other directory entries on that volume - this doesn't work on the drivepool volume itself at the moment, so needs to be passed to the specific drive like this to work
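
A rough sketch of steps 1 and 3, assuming pywin32 is installed when running on Windows. The scan_root fallback is an addition for illustration only, so the function can be tried on systems without pywin32. It is not part of the script described above, and the function names are made up. Note that the Windows call reports names relative to the volume root, without a drive letter.

```python
import os


def link_count(path):
    """Number of directory entries for the file (1 means no extra hard links)."""
    return os.stat(path).st_nlink


def linked_names(path, scan_root=None):
    """Return every name that refers to the same file as path.

    With pywin32 available this uses win32file.FindFileNames, which must be
    given a path on the real volume, not on the pool drive. Otherwise it falls
    back to scanning scan_root and comparing device and file IDs.
    """
    try:
        import win32file  # pywin32, Windows only
    except ImportError:
        win32file = None
    if win32file is not None:
        return list(win32file.FindFileNames(os.fspath(path)))
    if scan_root is None:
        raise RuntimeError("pywin32 not available and no scan_root given")
    target = os.stat(path)
    found = []
    for dirpath, _dirs, files in os.walk(scan_root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
                found.append(full)
    return sorted(found)
```

The number of names returned should match link_count for the same file, which gives a cheap sanity check before acting on the list.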

 


Phidaissi,

You are right, technically it is possible to move a file together with its entire set of hard link references from one drive to another. But for DP your request might turn out to be much more complicated than it seems at first glance:

For instance, what if a file has hard link references outside the PoolPart folder? How should DP handle that correctly? I use hard links to quickly send existing files on a drive to the pool that the drive is a member of, instead of moving them. These files are hard linked inside and outside the PoolPart folder, so they show up twice: in the pool and on the original drive.

And considering that your requirement and use case seem to be unique and might not be wanted by the wider DP user community, I doubt it's worth the necessary development effort.


Ah, sorry for misunderstanding the question. 

But Viktor is absolutely correct, too. 

IIRC, the balancing engine doesn't check files outside of the pool folder structure, nor does it support checking that.

But you can find sample code here: http://wiki.covecube.com/StableBit_DrivePool_-_Develop_Balancing_Plugins

 

 

Edit:

Additionally, StableBit DrivePool does not support this, because outside of modifying the files manually, there should be no way for there to be a hard link in the poolpart folders. And accessing the PoolPart folders is not supported.   So it's a completely unsupported scenario. 

And while it may be possible to add support for doing this, I'm not really sure that it's something that we should even do, just due to all the potential issues that this could cause. 

1 hour ago, Viktor said:

Phidaissi,

You are right, technically it is possible to move a file together with its entire set of hard link references from one drive to another. But for DP your request might turn out to be much more complicated than it seems at first glance:

For instance, what if a file has hard link references outside the PoolPart folder? How should DP handle that correctly? I use hard links to quickly send existing files on a drive to the pool that the drive is a member of, instead of moving them. These files are hard linked inside and outside the PoolPart folder, so they show up twice: in the pool and on the original drive.

And considering that your requirement and use case seem to be unique and might not be wanted by the wider DP user community, I doubt it's worth the necessary development effort.

This is definitely one of the edge cases I thought of too and why having it on the balancer would be nice. 

I agree that the use of hardlinks as a general rule is not a requirement of the average user, and gave my use case as an example of a situation where hardlinking was the clear best option - as opposed to say things like plex where hardlinking isn't the best option but they do it anyway *grumble*.

Balancing rules would be great for that sort of thing. For example, "Hardlinks outside drivepool volume will anchor file" (balancer may not move) as an option, where the default is to ignore anything outside it, creating a copy if moved by balancer (current behaviour).

The most annoying case I thought of is having hardlinks in two locations that the balancer wants to move in different ways. That's why ignoring a directory might be nice: only the links outside that directory would be considered for moving, and the links inside it would just get taken along for the ride wherever the balancer takes the outside one. :D
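
To make the two proposed rules concrete, here is a sketch of the decision only, under assumptions: plan_move, its parameters and the example paths are all hypothetical, and nothing here corresponds to an existing DrivePool balancer API. It takes the list of link paths for one file and decides whether outside links anchor it, and which links drive the move versus ride along.

```python
from pathlib import PureWindowsPath


def _is_under(path, directory):
    """True if path lies somewhere below directory (Windows path rules)."""
    return PureWindowsPath(directory) in PureWindowsPath(path).parents


def plan_move(links, poolpart_root, ignored_dirs=(), anchor_on_outside=False):
    """Decide how a balancer might treat one hard-linked file.

    Returns (action, drivers, passengers):
      action      "anchor" (leave the file where it is) or "move"
      drivers     links the balancer would consider when picking a target
      passengers  links in ignored directories that just follow the drivers
    """
    inside = [p for p in links if _is_under(p, poolpart_root)]
    outside = [p for p in links if not _is_under(p, poolpart_root)]
    # Optional rule: a link outside the PoolPart pins the file in place.
    if outside and anchor_on_outside:
        return "anchor", [], []
    ignored = [PureWindowsPath(poolpart_root) / d for d in ignored_dirs]
    passengers = [p for p in inside if any(_is_under(p, d) for d in ignored)]
    drivers = [p for p in inside if p not in passengers]
    # Default rule: outside links are ignored and keep their own copy.
    return "move", drivers, passengers
```

With the default settings, a file linked from Movies, Seeding and a folder outside the pool would be moved based on its Movies entry, with the Seeding entry recreated alongside it and the outside entry left alone.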

But yes it might be something not wanted by others, and why having it optional (such as via plugin) would be nice. Though tbh, unless there are some big problems they create that I'm missing, moving hardlinks together is most of the way to making hardlinks 'just work', and potentially exposing them for direct use in the future.

I have a small python function specifically for "create hardlink between these two drivepool paths" and so balancing is the only other part to it AFAICT. 

1 hour ago, Christopher (Drashna) said:

Ah, sorry for misunderstanding the question. 

But Viktor is absolutely correct, too. 

IIRC, the balancing engine doesn't check files outside of the pool folder structure, nor does it support checking that.

But you can find sample code here: http://wiki.covecube.com/StableBit_DrivePool_-_Develop_Balancing_Plugins

 

 

Edit:

Additionally, StableBit DrivePool does not support this, because outside of modifying the files manually, there should be no way for there to be a hard link in the poolpart folders. And accessing the PoolPart folders is not supported.   So it's a completely unsupported scenario. 

And while it may be possible to add support for doing this, I'm not really sure that it's something that we should even do, just due to all the potential issues that this could cause. 

If it's viable to do these checks in a plug in I'm pretty likely to do it myself, but it would require changing the behaviour of the FileMover, and I'm not sure that's possible. My guess from my initial reading was that it isn't, and that's why I asked. Even something as simple as before-move and after-move hooks that balancers could place on files they believe will need extra handling when moved would probably make it viable to do in the balancer, but I don't think that's possible right now.

I would assume that balancers can call APIs directly to check for hardlinks, and my suggested options would only involve "found some outside, so ignore those" or "don't move because..." so it would never actually touch files outside, but it could be aware they exist via the hard link listing of the file.

But there are a lot of handy use cases for hardlinks, so having them handled decently, even if not officially supported and requiring us to make them external to the drivepool volume would be nice, so it was worth asking. Especially having the FileMover support moving them, even if it's an option we need to turn on via a setting.

In any case, "we don't support it you silly hardlinking maniac" is a totally reasonable answer if that's what it is! :lol:

 

20 hours ago, Phidaissi said:

In any case, "we don't support it you silly hardlinking maniac" is a totally reasonable answer if that's what it is! :lol:

:lol:

Well, I did talk to Alex (the developer) about this, and I've flagged it as a feature request, so we'll look into this eventually (i.e., I can't give you a timeframe for that).  But keep in mind that if we do implement something like this, we'd probably ONLY move hardlinked files around within the pool. We wouldn't touch anything outside of the PoolPart folder structure, for sanity's sake. 
