
Question

Posted

I understand that drivepool doesn't support hardlinks because files need to be on the same drive to be hardlinked, but why does it not support hardlinks for files that do exist on the same drive?

For example, if I create a pool with 2 drives and duplicate files from drive A to drive B, why can't files be hardlinked within drive A and within drive B?

In other words:

Drive A: File A hardlinked to File B (both of which are on Drive A)

Drive B: File A hardlinked to File B (both of which are on Drive B)

12 answers to this question

Recommended Posts

  • 0
Posted

What if the file is duplicated, and exists on multiple drives?  What if the file needs to be moved to another drive, but not all copies of it need to be?   Etc.

It's not that it's not possible, but the additional overhead of all of these edge cases (and more that I can't think of off the top of my head) makes things a LOT more complex. And since some of this branched checking needs to happen in the kernel ... the longer it takes, the more it will adversely impact the system.

It's for a similar reason that we don't support dynamic disks, either.

  • 0
Posted

In my example, the file is duplicated, but it's only hardlinked when on the same drive. As soon as one of the hardlinked copies (that are on the same drive) is moved to another drive, then the hardlink would break and a copy is made instead.

This shouldn't be difficult to implement.

  • 0
Posted

Why don't you just replicate the same structure?

If File A and File B are hardlinked on Drive A and both are duplicated onto Drive B, they will be hardlinked on Drive B as well.

If File A and File B are hardlinked on Drive A and File A is duplicated onto Drive B and File B is duplicated onto Drive C, then a copy of the file is made onto Drive C.

Or, only allow hardlinks for folders that are duplicated as a whole onto another drive. In other words, folders that are duplicated onto the same drive would allow hardlinks and folders where files/subfolders are duplicated onto different drives would not allow hardlinks.

There must be a way to implement this instead of just plainly not supporting hardlinks for any scenario.

  • 0
Posted
On 6/21/2025 at 5:54 AM, Salim said:

In my example, the file is duplicated, but it's only hardlinked when on the same drive. As soon as one of the hardlinked copies (that are on the same drive) is moved to another drive, then the hardlink would break and a copy is made instead.

This shouldn't be difficult to implement.

Those last six words are written on the gravestones of many a project.

The problem with breaking hardlinks is that from the perspective of someone externally accessing the pool, this would mean that making changes to certain "files" would automagically update other "files" (because they're hardlinks on the same physical disk within the pool) right up until they suddenly didn't (because they're no longer hardlinks on the same physical disk within the pool). Any reliable implementation of hardlinks in DrivePool would have to ensure that hardlinks did not break.

On 6/21/2025 at 8:07 PM, Salim said:

If File A and File B are hardlinked on Drive A and both are duplicated onto Drive B, they will be hardlinked on Drive B as well.

If File A and File B are hardlinked on Drive A and File A is duplicated onto Drive B and File B is duplicated onto Drive C, then a copy of the file is made onto Drive C.

Note that if we're getting technical - and we need to when we're "opening up the hood" to look at how hardlinks actually work - there is no "File A and File B hardlinked together"; there is only one file with multiple links to it in the allocation system of the volume. If you make a change to the content of what you call "File A" then you are making a change to the content of what you call "File B", because it's one content.
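
If you want to see that for yourself, here's a minimal sketch (Python, run on an NTFS volume; nothing DrivePool-specific, just the standard os.link call that creates a second name for an existing file):

```python
import os
import tempfile

# Create a file and then a second hard link to it on the same volume.
d = tempfile.mkdtemp()
file_a = os.path.join(d, "file_a.txt")
file_b = os.path.join(d, "file_b.txt")

with open(file_a, "w") as f:
    f.write("original content")

os.link(file_a, file_b)  # "File B" is now a second name for the same file

# Both names report the same link count: one file, two directory entries.
print(os.stat(file_a).st_nlink)  # -> 2
print(os.stat(file_b).st_nlink)  # -> 2

# A write through one name is visible through the other,
# because there is only one set of content on the volume.
with open(file_b, "w") as f:
    f.write("changed via file_b")

with open(file_a) as f:
    print(f.read())  # -> "changed via file_b"
```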

This is not unlike an inverse of DrivePool's duplication where one file on the pool can be stored as multiple instances in the disks of the pool and making a change to that file involves simultaneously propagating that change to all instances of that file in the disks of the pool.

Now in theory this should "just" mean that (at minimum) whenever DrivePool performs balancing, placement, duplication or evacuation (so basically quite often by default) it would have to include something like the equivalent of "fsutil hardlink list" on the operational file(s) on the source disk(s) to check for hardlinks and then (de)propagate any and all such to the target disk(s) as part of the copy and/or delete process.
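
To make that concrete, here's a rough sketch of the kind of per-file work that implies (Python, purely illustrative; list_hardlinks is my own hypothetical helper wrapping the real "fsutil hardlink list" command, which may need an elevated prompt on some Windows versions):

```python
import subprocess

def list_hardlinks(path: str) -> list[str]:
    """Return every link name for the file at `path`, via Windows'
    `fsutil hardlink list`. Names come back rooted at the volume,
    without the drive letter."""
    out = subprocess.run(
        ["fsutil", "hardlink", "list", path],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in out.stdout.splitlines() if line.strip()]

# A balancer/mover would have to do something like this for EVERY file
# it touches (example path invented for illustration):
links = list_hardlinks(r"D:\PoolPart.1234\Media\file_a.bin")
if len(links) > 1:
    # ...copy the content once to the target disk, recreate each of the
    # other link names there, then remove all of the source names -
    # atomically enough that nothing observes a half-moved state.
    pass
```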

But in practice this means (at minimum) squeezing more code into a part of the kernel that complains about every literal millisecond of performance sacrificed to have such nice things. And extrapolating hardlinks isn't a simple binary decision; it's more along the lines of looping over an array of link names. The word "just" is doing a lot of work here - and we haven't even gotten into handling the edge cases Christopher mentioned.

DrivePool needs to include code to handle "File A" potentially being in a folder with a different duplication level to the folder containing "File B" (and potentially "File C", "File D", etc., as NTFS supports up to 1024 hardlinks per file). Even if we KISS and "just" pick the highest level out of caution, DrivePool also has to check whether "File A" is in a folder with a placement rule that is different to the folder with "File B" (or, again, potentially "File C", "File D", etc.). What is DrivePool supposed to do when "File A" is in a folder that must be kept only on disks #1 and #2 while "File B" is in a folder that must be kept only on disks #3 and #4? That's a user-level call, which means yet more lookups in kernel space (plus additions to the GUI).
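
To illustrate that last conflict (a hypothetical sketch only; the rule table and function are invented for the example, not DrivePool internals):

```python
# Each link name lives in a folder, and folders may carry placement
# rules restricting which disks their files are allowed to occupy.
placement_rules = {
    "\\FolderA": {"disk1", "disk2"},  # rule on "File A"'s folder
    "\\FolderB": {"disk3", "disk4"},  # rule on "File B"'s folder
}

def allowed_disks(link_paths: list[str]) -> set[str]:
    """One physical file must satisfy the rules of ALL of its link
    names at once, i.e. the intersection of the allowed-disk sets."""
    allowed = None
    for path in link_paths:
        folder = path.rsplit("\\", 1)[0] or "\\"
        rule = placement_rules.get(folder)
        if rule is not None:
            allowed = rule if allowed is None else (allowed & rule)
    return allowed if allowed is not None else set()

print(allowed_disks(["\\FolderA\\file_a", "\\FolderB\\file_b"]))
# -> set(): no disk satisfies both rules, so someone (the user) must decide.
```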

On 6/21/2025 at 8:07 PM, Salim said:

There must be a way to implement this instead of just plainly not supporting hardlinks for any scenario.

TLDR? There is, but Christopher is right: "the additional overhead of all of these edge cases" ... "makes things a LOT more complex." That's generally the problem in a lot of these situations - the more things a tool needs to be able to do, the harder it gets to make it do any/all of those things without breaking, and at the end of the day the people making the tools need to put food on their table.

Maybe you could try a local CloudDrive-on-DrivePool setup? I don't know how much that would affect performance or resilience, but you'd get hardlinks (because the combination lets you shift the other features into a separate layer). Other alternatives... mount an NTFS-formatted iSCSI LUN from a ZFS NAS (e.g. QNAP, Synology, TrueNAS, etc.)?

  • 0
Posted

Ok, then how about not allowing selective duplication unless all folders that have hardlinks are duplicated onto the same drive? And if hardlinks are attempted after duplication has been set up, only allow hardlinks on files that are being duplicated onto the same drive? This way, hardlinks don't have to be broken, you're only controlling the scenarios where they're being allowed.

For example, if you have a file on Disk A with Hardlink A in Folder A and Hardlink B in Folder B, only allow Folder A to be duplicated to Disk B if Folder B is also duplicated to Disk B (or force Folder B to be duplicated to the same disk when Folder A is duplicated). This would force all hardlinked files to be on the same drive and would get rid of edge cases.

What do you think?

  • 0
Posted
6 hours ago, Salim said:

Ok, then how about not allowing selective duplication unless all folders that have hardlinks are duplicated onto the same drive? And if hardlinks are attempted after duplication has been set up, only allow hardlinks on files that are being duplicated onto the same drive? This way, hardlinks don't have to be broken, you're only controlling the scenarios where they're being allowed.

For example, if you have a file on Disk A with Hardlink A in Folder A and Hardlink B in Folder B, only allow Folder A to be duplicated to Disk B if Folder B is also duplicated to Disk B (or force Folder B to be duplicated to the same disk when Folder A is duplicated). This would force all hardlinked files to be on the same drive and would get rid of edge cases.

What do you think?

Your examples seem to only take a 2-disk pool into consideration. Adding a 3rd drive (or more) to the pool immediately complicates things when it comes to the balancing and duplication checks that would be needed to make this possible.

Also, even if you try to control the conditions under which hardlinks are allowed to be created, those conditions can change due to adding/removing a drive or any automatic balancing or drive evacuation that takes place. Allowing hardlinks to be created under certain conditions when those conditions could change at any point afterwards probably isn't a good idea. 

Any implementation to fully support hardlinks has to work properly and scale with large pools without any (or at least minimal) performance penalties, and be 100% guaranteed not to break the links regardless of duplication, balancing, etc.

  • 0
Posted

My logic should work on pools with any number of disks if it only allows hardlinks to be created when files are on the same disk.

Afaik, balancing is only performed when using a JBOD setup where there's no duplication; maybe hardlinks would be a bad idea in such a setup, yeah.

But my scenario is a mirrored setup where most (if not all) files are mirrored onto another drive (files in Folder A on Disk A are mirrored onto Disk B, and files in Folder B on Disk A are mirrored onto Disk C), so there can be multiple disks, but it's always a mirror.

So if hardlinks are gonna be a problem in a JBOD setup, then only allow it in a mirrored setup where there's no balancing involved.

  • 0
Posted
8 hours ago, Salim said:

My logic should work on pools with any number of disks if it only allows hardlinks to be created when files are on the same disk.

Afaik, balancing is only performed when using a JBOD setup where there's no duplication; maybe hardlinks would be a bad idea in such a setup, yeah.

But my scenario is a mirrored setup where most (if not all) files are mirrored onto another drive (files in Folder A on Disk A are mirrored onto Disk B, and files in Folder B on Disk A are mirrored onto Disk C), so there can be multiple disks, but it's always a mirror.

So if hardlinks are gonna be a problem in a JBOD setup, then only allow it in a mirrored setup where there's no balancing involved.

Balancing can happen on any pool regardless of duplication level or number of drives (with the obvious minimum of 2 drives in the pool for balancing to occur). Balancing can/will occur based on criteria set by the balancing options, the enabled plugins (the order of the plugins also affects how data will be balanced), any user file placement rules, adding/removing a drive from the pool, etc.

Hardlinks would need to be 100% compatible and guaranteed not to break with any pool regardless of disk layout, balancing settings or duplication settings. I'll say it again, trying to support hardlinks under certain conditions and preventing them under others is just a really bad idea when those conditions can change at any time.

 

  • 0
Posted

Why would there be balancing if a pool is set up as a mirror and everything is mirrored selectively onto another drive?

I'm not sure how drivepool is set up exactly (I haven't used it yet because of the lack of hardlink support), but I would assume you can set up a pool as a mirror or JBOD and maybe other types. So if a pool is set up as a mirror with selective mirroring (I select which folders are mirrored onto which disks), I don't see the need for balancing and I don't see why hardlinks can't be supported in such a setup.

And if drivepool doesn't support the creation of different pool types, why not add this option and allow the creation of a pool type that supports hardlinks, but with limitations on balancing and adding/removing drives (and anything that can break hardlinks)?

  • 0
Posted
7 hours ago, Salim said:

Why would there be balancing if a pool is set up as a mirror and everything is mirrored selectively onto another drive?

I'm not sure how drivepool is set up exactly (I haven't used it yet because of the lack of hardlink support), but I would assume you can set up a pool as a mirror or JBOD and maybe other types. So if a pool is set up as a mirror with selective mirroring (I select which folders are mirrored onto which disks), I don't see the need for balancing and I don't see why hardlinks can't be supported in such a setup.

And if drivepool doesn't support the creation of different pool types, why not add this option and allow the creation of a pool type that supports hardlinks, but with limitations on balancing and adding/removing drives (and anything that can break hardlinks)?

Since you don't use/haven't used drivepool, I would suggest learning more about how it works so you can see why supporting hardlinks isn't as simple as you might think, and also why having drivepool try to support them conditionally isn't a great idea either.

I've been using drivepool for about 12 years now myself. I have full confidence that if there was a simple way for Alex to add support for hardlinks on the pool it would have been done a long time ago.

I also want to be 100% clear that I'd love to see hardlinks supported on the pool myself, but I also understand why it hasn't happened yet.

  • 0
Posted
10 hours ago, Salim said:

I'm not sure how drivepool is set up exactly (I haven't used it yet because of the lack of hardlink support), but I would assume

Salim, what you're asking for isn't simple. To try to use a bad car analogy to give you an idea: "I don't know how this truck works that can tow multiple trailers with the abilities to redistribute the cargo between trailers and even switch out whole trailers, safely, while the truck is still being driven on the highway, but I would assume it can't be difficult to make it so it could also/instead run on train tracks with the press of a button."

However looking at your previous thread:

On 1/28/2025 at 7:00 PM, Salim said:

The main drive is already a 4 disk 2-way mirror storage space, but this only protects against 1 drive failure, if for any reason 2 drives fail, my data is nuked. So I am syncing this storage space onto another single drive, so if the storage space fails for any reason, my files are immediately accessible on the single drive (and vice versa) without having to perform any restore or rebuild or unhide folders first.

So I am wanting to write the data simultaneously onto both drives (the storage space and the single drive) instead of writing to one and then syncing to the other.

Presuming you still want this, on Windows, AND hardlinks, I'm again going to suggest a CloudDrive-on-DrivePool setup, something like:

  • Create a DrivePool pool, let's call it "Alpha", add your "main drive" as the first disk (let's call it "Beta") and your "single drive" as the second disk (let's call it "Gamma"), enable real-time duplication, disable read-striping, set whole-pool duplication x2 (you could use per-folder only if you knew exactly what you were doing).
    • note: if you plan to expand/replace Beta and/or Gamma in the future, and don't mind a bit of extra complexity now, I would suggest adding each of them to a pool of their own and THEN add those pools instead to Alpha to help with future-proofing. YMMV.
  • Connect "Alpha" as a Local Disk provider in CloudDrive, then create your virtual drive on it; let's call that "Zod". Make sure its chosen size is not more than the free space of each of Beta and Gamma (so if Beta had 20TB free and Gamma had 18TB free, you'd pick a size less than 18TB) so that it'll fit.

There might be some fiddly bits to expand upon in that but that's the gist of it. Then you could create hardlinks in Zod to your heart's content and they'll be replicated across all disks. The catch would be that you couldn't "read" Zod's data by looking individually at Alpha, Beta or Gamma because of the block translation - but if you "lost" either Beta or Gamma due to physical disk failure you could still recover by replacing the relevant disk(s), with Zod switching to a read-only state until you did so. You could even attach Gamma to another machine that also had CloudDrive installed, to extract the data from it, but you'd have to be careful to avoid turning that into a one-way trip.

  • 0
Posted

@Shane thanks, I'll have a look at this setup when I get the chance, but it sounds like I would be playing with fire: it's as if I'm creating a software RAID 1 where CloudDrive (Zod) is a virtual drive on a DrivePool mirror (Alpha), so my data cannot be read except by CloudDrive (due to block translation). My data would be at CloudDrive's mercy, and as reliable as CloudDrive is, I believe that would be risky.
