Balancing hardlinks


Phidaissi

Question

Having looked through previous discussions a bit I know that hardlinks are not supported through the drivepool drive itself.

Would it be possible to have the balancing system preserve hardlinks when moving files though?

Currently I download a torrent to a smallish temporary drive, and when it's complete it gets moved into my pool.

In order to continue to seed the files while also having them sorted into their appropriate places, the files get moved into a staging area (e.g., E:\Unsorted). After they've been moved there, a script hardlinks them to a seeding location (e.g., E:\Seeding) via the underlying PoolParts, locating each file on the right drive as necessary.
Then I can relocate the unsorted files to where I actually want them to be, and all the while they continue to seed regardless. (This cannot be done with a symlink from seeding to unsorted, as that would break when I move the unsorted files.)
When I'm done seeding I can just delete them from seeding without concern. (Nor can it be done with a symlink from unsorted to seeding, as removal of seeding would then remove the file entirely and simply break the unsorted reference to it.)
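
For anyone wanting to picture the linking step, here is a minimal sketch in Python. It is an illustration under assumptions, not the actual script: `poolparts` is a hypothetical list of the PoolPart folder paths on each member drive, the function names are made up for this example, and each pool-relative path is assumed to exist in exactly one PoolPart (no duplication).

```python
import os
from pathlib import Path


def find_poolpart(rel_path, poolparts):
    """Return the PoolPart folder that physically holds rel_path, or None."""
    for part in poolparts:
        if (Path(part) / rel_path).is_file():
            return Path(part)
    return None


def link_into_seeding(rel_src, rel_dst, poolparts):
    """Hard-link pool-relative rel_src to rel_dst on the same underlying drive."""
    part = find_poolpart(rel_src, poolparts)
    if part is None:
        raise FileNotFoundError(rel_src)
    dst = part / rel_dst
    dst.parent.mkdir(parents=True, exist_ok=True)
    # Both names live in the same PoolPart, so they are on the same volume.
    os.link(part / rel_src, dst)
    return dst
```

Because the link is created inside the same PoolPart that already holds the file, both names end up on one volume, which is the only place a hard link can exist.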

This works fine, and has the expected consequence that if drivepool decides to balance one of the 'copies' of the hardlink, it effectively becomes two files on different drives.

That's an acceptable consequence most of the time, but mildly annoying when it's not too difficult to fix. I'm just not sure if it's viable to do with the balancer as it is currently, and don't want to invest the time to attempt creating a plugin myself to handle this if it turns out to be impossible under the current design!

So could the balancer be made to recognise when it's about to move a hard linked file (discoverable in sysinternals or with fsutil) and after moving the file replicate the hardlinks within the new poolpart and remove them on the old one?
This way hard linked files can still be balanced across the pool.

Would it be possible to tell balancers to ignore a directory?

Being able to say "ignore seeding directory" (since it's pretty much only hardlinks) and have drivepool balance files as needed otherwise would make this work even better by avoiding potential for the balancer to get confused. :lol:

It would be nice if drivepool could allow the creation of hardlinks by passing the hardlink calls to the underlying NTFS on the relevant poolpart.
This is effectively what I do within my script, just with the added annoyance of me having to locate which poolpart the file is in.
Given that previous discussions have basically just said "we don't support it" I'm assuming there must be some other issue with supporting them that I'm missing and I'm curious about what that is, as it's not really explained elsewhere that I saw, but I'd be happy for it to just respect hardlinks manually created when moving files around.


20 answers to this question

  • 0
On 1/5/2019 at 11:01 PM, Phidaissi said:

Would it be possible to have the balancing system preserve hardlinks when moving files though?

No.  Hardlinks work by creating multiple directory entries for the same data.  That means that this data HAS to be on the same disk, and the same volume.  

So, if it gets balanced, there is no way to maintain this.  That's part of why we don't support hard links. 

The other reason is that the pool does not store any data itself, so there is no way to hard link the data on the pool. So it's just not possible to support hard links on the pool.

 

On 1/5/2019 at 11:01 PM, Phidaissi said:

Would it be possible to tell balancers to ignore a directory?

You could use the File Placement Rules to force the folder to a specific disk.  That should accomplish what you want.

https://stablebit.com/Support/DrivePool/2.X/Manual?Section=File Placement


  • 0
2 hours ago, Christopher (Drashna) said:

No.  Hardlinks work by creating multiple directory entries for the same data.  That means that this data HAS to be on the same disk, and the same volume.  

So, if it gets balanced, there is no way to maintain this.  That's part of why we don't support hard links. 

The other reason, is that the pool does not store any data on it, so there is no way to hard link the data on the pool.   So.... it's just not possible to support hard links on the pool. 

 

You could use the File Placement Rules to force the folder to a specific disk.  That should accomplish what you want.

https://stablebit.com/Support/DrivePool/2.X/Manual?Section=File Placement

Sorry, but did you read only the questions in bold and not my explanation following them that addresses the how? :( 

I know how they work and that's why I explicitly mentioned the solution to both of those things. The balancer moves files between disks/volumes and I was suggesting that when a file with hard links is moved that all links move with it, which is as you said, necessary for them to actually be hardlinked and not just additional copies on multiple drives. 

You can check if a file has more than one hard link (the first entry counts, so a regular file has a count of 1), and if it is hardlinked you can create identical links on the target drive! Looking at the link count has basically no overhead, and you'd only need to check the list of links when you actually encounter one, so the minor penalty would only apply to when actually moving a hard linked file. 

This information is stored in NTFS, drivepool doesn't need to store it, and you can get the information via several methods - I mentioned two of them above, but I can get you the exact api calls necessary to do the checks if that helps. 

That means it's possible to modify file move behaviour to do this:

  1. Check if file has link count > 1; if not do regular move from drive A to B
  2. Get list of linked directory entries in file system on A
  3. Continue regular file move from A to B
  4. Replicate hard links discovered on A in (2) on B
  5. Remove linked directory entries from A that were discovered in (2)

This moves a file and its associated hard links from A to B, balancing the file with all hard links. 
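
The five steps above can be sketched roughly as follows. This is only an illustration under assumptions: `root_a` and `root_b` stand for the PoolPart folders on drives A and B, the function names are hypothetical, and `find_links` is a slow, portable stand-in (a tree walk comparing file identity) for asking the file system for the link list directly. It is not how DrivePool's file mover works.

```python
import os
import shutil
from pathlib import Path


def find_links(path, root):
    """List every directory entry under root that refers to the same file as path."""
    return [
        Path(d) / name
        for d, _, names in os.walk(root)
        for name in names
        if os.path.samefile(Path(d) / name, path)
    ]


def move_with_links(rel_path, root_a, root_b):
    """Move rel_path from root_a to root_b, recreating its hard links on B."""
    src = Path(root_a) / rel_path
    dst = Path(root_b) / rel_path
    # Steps 1-2: only enumerate links when the link count says there are any.
    others = []
    if os.stat(src).st_nlink > 1:
        others = [p for p in find_links(src, root_a) if p != src]
    # Step 3: the regular move (a copy here, completed by the unlink in step 5).
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    # Step 4: replicate each link at the same relative location on B.
    for old in others:
        new = Path(root_b) / old.relative_to(root_a)
        new.parent.mkdir(parents=True, exist_ok=True)
        os.link(dst, new)
    # Step 5: remove every entry on A, the moved name included.
    for old in [src, *others]:
        old.unlink()
    return dst
```

Note that this sketch only looks for links under `root_a`; links elsewhere on drive A would be left behind as an independent copy.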

I actually do some of this already via a script I run regularly - look in seeding for still existing hard links, replicate links for files that have been moved between drives by balancer on the drive they were moved to, remove redundant copy left that wasn't balanced. 

If this isn't a change you'd be happy to make, then my question is if it's possible to do this in a balancing plug in which I write myself. I'm not sure what information balancers actually get and whether they can alter this behaviour without drivepool being updated to make it possible, which is why I asked.  :)

Basically, I am already doing this and know how it's done in an automated way that detects the changes, but doing this with scripts to fix it after the fact isn't very elegant, and I can think of a couple edge cases where if it's done in drivepool balancing it can be greatly improved, like a few new balancing rules that would be great in a plug in. 


  • 0

For reference, these are the win32 fileapi methods that do this:

To determine if a file has hard links, the fileapi function GetFileInformationByHandle gives a result where BY_HANDLE_FILE_INFORMATION.nNumberOfLinks is the number of directory entries for a given file, i.e., 1 for typical files, 2 or more for files that have hard links. As you're moving the file anyway when balancing, this is likely an action with negligible overhead.

To get the list of all directory entries associated with a hard link, you use the fileapi function FindFirstFileNameW with FindNextFileNameW, which should give the same number of results as the nNumberOfLinks in prior step.
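
As a rough illustration of those two calls from Python, here is a sketch using `ctypes`. Treat it as an untested outline rather than a reference implementation: the helper names are made up, the buffer size of 32768 characters is an assumption chosen to avoid handling ERROR_MORE_DATA, and the names returned are paths relative to the volume root (no drive letter).

```python
import ctypes
import os
import sys

ERROR_HANDLE_EOF = 38
INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value


def link_count(path):
    """Number of directory entries for the file (nNumberOfLinks on Windows)."""
    return os.stat(path).st_nlink


def list_link_names(path):
    """Return the volume-relative name of every hard link to path (Windows only)."""
    if sys.platform != "win32":
        raise NotImplementedError("FindFirstFileNameW is a Windows API")
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.FindFirstFileNameW.restype = ctypes.c_void_p
    k32.FindFirstFileNameW.argtypes = [
        ctypes.c_wchar_p, ctypes.c_uint32,
        ctypes.POINTER(ctypes.c_uint32), ctypes.c_wchar_p]
    k32.FindNextFileNameW.restype = ctypes.c_int
    k32.FindNextFileNameW.argtypes = [
        ctypes.c_void_p, ctypes.POINTER(ctypes.c_uint32), ctypes.c_wchar_p]
    k32.FindClose.argtypes = [ctypes.c_void_p]

    buf = ctypes.create_unicode_buffer(32768)
    size = ctypes.c_uint32(len(buf))  # in: buffer size in characters
    handle = k32.FindFirstFileNameW(os.fspath(path), 0, ctypes.byref(size), buf)
    if handle is None or handle == INVALID_HANDLE_VALUE:
        raise ctypes.WinError(ctypes.get_last_error())
    names = []
    try:
        while True:
            names.append(buf.value)
            size.value = len(buf)  # reset, the call overwrites it
            if not k32.FindNextFileNameW(handle, ctypes.byref(size), buf):
                err = ctypes.get_last_error()
                if err == ERROR_HANDLE_EOF:  # normal end of the list
                    return names
                raise ctypes.WinError(err)
    finally:
        k32.FindClose(handle)
```

Checking `link_count(path) > 1` first keeps the enumeration off the common path, so ordinary files pay only for the stat call.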

I'm actually using a script in python so I use python wrappers that access those win32 APIs:

  1. Use os.stat(file_name).st_nlink to determine number of links - which actually works on the drivepool volume already because I presume it's just passed from the source volume
  2. If it has hard links (st_nlink > 1), I then determine the underlying location of it - via folder mounts in a known location where I check that subpath in each PoolPart. A balancer would negate this step as it knows which real volume it's on already
  3. Use win32file.FindFileNames(file_name) on the directory entry on the real volume (xx/PoolPart.xx/xx) to get the other directory entries on that volume - this doesn't work on the drivepool volume itself at the moment, so needs to be passed to the specific drive like this to work

 


  • 0

Phidaissi,

You are right, technically it is possible to move a file together with its entire set of hard link references from one drive to another. But for DP your request might turn out to be much more complicated than it seems at first glance:

For instance, what if a file has hard link references outside the PoolPart folder? How should DP handle that correctly? I use hard links to quickly send existing files on a drive to the pool that the drive is a member of, instead of moving them. These files are hard linked inside and outside the PoolPart folder and therefore show up twice, in the pool and on the original drive.

And considering that your requirement and use case seem to be unique and might not be desired by the bigger DP user community, I doubt it's worth the necessary development effort.


  • 0

Ah, sorry for misunderstanding the question. 

But Viktor is absolutely correct, too. 

IIRC, the balancing engine doesn't check files outside of the pool folder structure, nor supports checking that.

But you can find sample code here: http://wiki.covecube.com/StableBit_DrivePool_-_Develop_Balancing_Plugins

 

 

Edit:

Additionally, StableBit DrivePool does not support this, because outside of modifying the files manually, there should be no way for there to be a hard link in the poolpart folders. And accessing the PoolPart folders is not supported.   So it's a completely unsupported scenario. 

And while it may be possible to add support for doing this, I'm not really sure that it's something that we should even do, just due to all the potential issues that this could cause. 


  • 0
1 hour ago, Viktor said:

Phidaissi,

You are right, technically it is possible to move a file together with its entire set of hard link references from one drive to another. But for DP your request might turn out to be much more complicated than it seems at first glance:

For instance, what if a file has hard link references outside the PoolPart folder? How should DP handle that correctly? I use hard links to quickly send existing files on a drive to the pool that the drive is a member of, instead of moving them. These files are hard linked inside and outside the PoolPart folder and therefore show up twice, in the pool and on the original drive.

And considering that your requirement and use case seem to be unique and might not be desired by the bigger DP user community, I doubt it's worth the necessary development effort.

This is definitely one of the edge cases I thought of too and why having it on the balancer would be nice. 

I agree that the use of hardlinks as a general rule is not a requirement of the average user, and gave my use case as an example of a situation where hardlinking was the clear best option - as opposed to say things like plex where hardlinking isn't the best option but they do it anyway *grumble*.

Balancing rules would be great for that sort of thing. For example, "Hardlinks outside drivepool volume will anchor file" (balancer may not move) as an option, where the default is to ignore anything outside it, creating a copy if moved by balancer (current behaviour).
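
To make that rule concrete, here is a small sketch of the decision it implies. Everything in it is hypothetical (the function names, the `anchor_on_outside` option, and the idea that a balancer is handed the full list of link paths); it only shows that the check needs the link list and a path comparison, and never touches the outside entries.

```python
from pathlib import PurePath


def is_inside(path, folder):
    """True if path lies within folder (purely lexical comparison)."""
    try:
        PurePath(path).relative_to(folder)
        return True
    except ValueError:
        return False


def may_balance(link_paths, poolpart, anchor_on_outside=False):
    """Decide whether a file may be moved, given all of its hard link paths.

    Returns (movable, links_to_take_along). Links outside the PoolPart folder
    are never returned, so they are never touched.
    """
    inside = [p for p in link_paths if is_inside(p, poolpart)]
    has_outside = len(inside) < len(link_paths)
    if has_outside and anchor_on_outside:
        return False, []  # an outside link anchors the file in place
    return True, inside  # default: ignore anything outside the PoolPart
```

With the default setting the outside link is simply ignored, which matches the current behaviour of ending up with a separate copy once the file is moved.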

The most annoying case I thought was if you have hardlinks in two locations that the balancer wants to move in different ways, thus why ignoring a directory might be nice, only the links outside that directory would be considered for moving, and the links inside the directory would just get taken along for the ride where ever the balancer takes the outside one. :D

But yes it might be something not wanted by others, and why having it optional (such as via plugin) would be nice. Though tbh, unless there are some big problems they create that I'm missing, moving hardlinks together is most of the way to making hardlinks 'just work', and potentially exposing them for direct use in the future.

I have a small python function specifically for "create hardlink between these two drivepool paths" and so balancing is the only other part to it AFAICT. 

1 hour ago, Christopher (Drashna) said:

Ah, sorry for misunderstanding the question. 

But Viktor is absolutely correct, too. 

IIRC, the balancing engine doesn't check files outside of the pool folder structure, nor supports checking that.

But you can find sample code here: http://wiki.covecube.com/StableBit_DrivePool_-_Develop_Balancing_Plugins

 

 

Edit:

Additionally, StableBit DrivePool does not support this, because outside of modifying the files manually, there should be no way for there to be a hard link in the poolpart folders. And accessing the PoolPart folders is not supported.   So it's a completely unsupported scenario. 

And while it may be possible to add support for doing this, I'm not really sure that it's something that we should even do, just due to all the potential issues that this could cause. 

If it's viable to do these checks in a plug in I'm pretty likely to do it myself, but it would require changing the behaviour of the FileMover, which I'm uncertain if possible. My guess was that it's not possible based on my initial reading, and that's why I asked. Even something as simple as hooks for before move and after move that the balancers could place on files they believe will need to be touched when they get moved would probably make it viable to do in the balancer, but I don't think that's possible right now.

I would assume that balancers can call APIs directly to check for hardlinks, and my suggested options would only involve "found some outside, so ignore those" or "don't move because..." so it would never actually touch files outside, but it could be aware they exist via the hard link listing of the file.

But there are a lot of handy use cases for hardlinks, so having them handled decently, even if not officially supported and requiring us to make them external to the drivepool volume would be nice, so it was worth asking. Especially having the FileMover support moving them, even if it's an option we need to turn on via a setting.

In any case, "we don't support it you silly hardlinking maniac" is a totally reasonable answer if that's what it is! :lol:

 


  • 0
20 hours ago, Phidaissi said:

In any case, "we don't support it you silly hardlinking maniac" is a totally reasonable answer if that's what it is! :lol:

:lol:

Well, I did talk to Alex (the developer) about this, and I've flagged it as a feature request, so we'll look into this eventually (i.e., I can't give you a timeframe for that). But keep in mind that if we do implement something like this, we'd probably ONLY move hardlinked files around within the pool. We wouldn't touch anything outside of the PoolPart folder structure, for sanity's sake. 


  • 0

FWIW, I would very much like this. Indeed, this is the one thing keeping me off of DrivePool. I would happily move to DrivePool and even pay extra for such a feature.

My use case is similar to Phidaissi's.  I often download torrents, but then use hardlinks to 'virtually move' them to a new location with a different name.  In my scenario, both the torrent location and the final destination location would be in the same pool.  So this constraint would be satisfied:

> But keep in mind that if we do implement something like this, we'd probably ONLY move hardlinked files in the pool, around.

Right now I'm using Storage Spaces, and the experience is pretty abysmal. Performance is terrible (using a parity setup) and I'd really like to move to some alternate tech. DrivePool+SnapRAID looks great and has had awesome perf when I've tested it. However, the lack of hardlinks makes it a non-starter, as I simply cannot have my folder setup properly like I can with Storage Spaces (which does support hard links). 

Thanks so much!


  • 0

+1 for the feature request. Without hard links I cannot accurately represent the filesystem that I'm backing up. I use DP to create a backup for my Ubuntu Server NAS with ZFS. I use hard links to preserve original folder structures while making a copy for my own personal folder structure. Without the links, I can't actually use DP as a backup destination, as it cannot represent the source that I'm backing up.


  • 0

+1 another vote for hardlink support.  I have a decent spread of audio, image and video files stored in Drivepool.  I would like to produce durable curated collections of my content for slideshow/mixtape purposes, without wasting space or moving the original files.  Symlinks break if I drag-drop share such a collection to removable storage.  Hardlinks don't -- the content is copied for real to the outgoing thumb drive.


  • 0

+1 for this feature request.  2 use cases would be Plex, which isn't usable on the pool without it, and my photos folder.  I have the photos categorized in subfolders, if one falls into two categories I'd like to link the two copies.  DFHL is a nice automated scanner to accomplish this: https://www.jensscheffler.de/dfhl


  • 0

Not an answer to the hardlink issue, but just a suggestion for another (possibly far simpler) way to fix some of these duplicate file issues such as with torrents - enable data deduplication (requires Server 2012 or newer). Then you can just copy the files - no need for hardlinks and DrivePool can keep treating all your files as normal files correctly - and Windows will handle the data duplication for you.

Note that this will result in a highly fragmented hard drive, which may have some performance issues depending on your use-case - and can result in the counterintuitive situation where your physical drive is storing more data than can fit on it (Possibly by double, in the case of something like the torrent scenario described above), so can no longer be duplicated by DrivePool to another same-sized drive.


  • 0
On 1/8/2021 at 2:48 AM, Ned said:

Not an answer to the hardlink issue, but just a suggestion for another (possibly far simpler) way to fix some of these duplicate file issues such as with torrents - enable data deduplication (requires Server 2012 or newer). Then you can just copy the files - no need for hardlinks and DrivePool can keep treating all your files as normal files correctly - and Windows will handle the data duplication for you.

Unfortunately, the issue is that this does not work across physical volumes. It's already possible to make hard links on the same drive; the problem arises when one of those copies gets balanced to a different drive, and deduplication has exactly the same outcome: an additional copy of something that didn't need an additional copy.

That's why the request was regarding balancing hardlinks and not about making hardlinks. ;)


  • 0
On 1/5/2019 at 11:01 PM, Phidaissi said:

Would it be possible to have the balancing system preserve hardlinks when moving files though?

Hard no.  Hard links are NOT supported on the pool.  And hardlinks DO NOT work across drives.  Hard links work by linking the same data to multiple locations on the same volume. 

So, no, it cannot, at all. 


  • 0
On 1/10/2021 at 6:48 AM, Christopher (Drashna) said:

Hard no.  Hard links are NOT supported on the pool.  And hardlinks DO NOT work across drives.  Hard links work by linking the same data to multiple locations on the same volume. 

So, no, it cannot, at all. 

I'm pretty sure you've just repeated exactly what you did two years ago when the thread was started and just read the bold question and not the actual explanation. :lol:

It was all about things that are actually possible, and not at all about hard links across drives. In fact, that was kind of the point...

We did already go over that, and your own replies above indicated that what I explained was as you said:

On 1/11/2019 at 8:12 AM, Christopher (Drashna) said:

:lol:

Well, I did talk to alex (the developer) about this, and I've flagged it as a feature request, so we'll look into this, eventually. (eg, I can't give you a timeframe for that).  But keep in mind that if we do implement something like this, we'd probably ONLY move hardlinked files in the pool, around. We wouldn't touch anything outside of the PoolPart folder structure, for sanity's sake. 

 


  • 0

Sorry, I did skim, and didn't fully read. 

However, I've since learned more about how the file system works (Alex is the expert here, though), and with that expanded knowledge I've learned that hard links on the pool are not possible. Not a "maybe possible in the future", but a hard no. 

 

However, with StableBit CloudDrive, you could create a drive with the local disk provider, use the pool drive, and then use hardlinks to your heart's desire on the CloudDrive disk, since CloudDrive doesn't deal with the file system at all.  It deals with the raw disk data. 

Also, features like data deduplication should work on the CloudDrive disk, too. 


  • 0
On 1/11/2021 at 3:53 PM, Christopher (Drashna) said:

Sorry, I did skim, and didn't fully read. 

However, I've since learned more about how the file system works (Alex is the expert here, though), and with that expanded knowledge I've learned that hard links on the pool are not possible. Not a "maybe possible in the future", but a hard no. 

 

However, with StableBit CloudDrive, you could create a drive with the local disk provider, use the pool drive, and then use hardlinks to your heart's desire on the CloudDrive disk, since CloudDrive doesn't deal with the file system at all.  It deals with the raw disk data. 

Also, features like data deduplication should work on the CloudDrive disk, too. 

Just to clarify based on the second half of your post... Am I correct in interpreting that if I set up any drive (cloud, local, NAS share, etc) in CloudDrive, and I add that to a pool of only CloudDrives in DrivePool (X:), hardlinking could theoretically work on X:? And if that's the case, would there be issues in terms of balancing? 

Thanks!


  • 0
5 hours ago, Gabe said:

Just to clarify based on the second half of your post... Am I correct in interpreting that if I set up any drive (cloud, local, NAS share, etc) in CloudDrive, and I add that to a pool of only CloudDrives in DrivePool (X:), hardlinking could theoretically work on X:? And if that's the case, would there be issues in terms of balancing? 

Thanks!

No.  Hardlinking doesn't work on the pool drive, at all, and never will.  The hard links are an object/feature of the volume, not the disk, and require that all instances of the file be on the same *physical* volume. 

They work on StableBit CloudDrive, because it doesn't emulate the filesystem the way that StableBit DrivePool does.  It handles things on a block level (below the file system, basically), and never directly deals with the file system.  Because of this,  just about anything you can do on a normal disk, you can do on the StableBit CloudDrive disks.  

But if they're pooled, then the pool's limitations still apply (at least to the pool drive).
