Covecube Inc.

Phidaissi

Members
  • Content Count

    4

About Phidaissi

  • Rank
    Newbie

Profile Information

  • Gender
    Female
  • Location
    Melbourne, Australia
  1. Phidaissi

    Balancing hardlinks

    This is definitely one of the edge cases I thought of too, and it's why having it in the balancer would be nice. I agree that hardlinks are not something the average user needs, and I gave my use case as an example of a situation where hardlinking was the clear best option - as opposed to things like Plex, where hardlinking isn't the best option but they do it anyway *grumble*.

    Balancing rules would be great for that sort of thing. For example, "Hardlinks outside drivepool volume will anchor file" (balancer may not move it) as an option, where the default is to ignore anything outside the volume and create a copy if the file is moved by the balancer (current behaviour). The most annoying case I thought of was hardlinks in two locations that the balancer wants to move in different ways. That's why ignoring a directory might be nice: only the links outside that directory would be considered for moving, and the links inside the directory would just get taken along for the ride wherever the balancer takes the outside one.

    But yes, it might be something others don't want, which is why having it optional (such as via a plugin) would be nice. Though tbh, unless they create some big problems that I'm missing, moving hardlinks together is most of the way to making hardlinks 'just work', and potentially to exposing them for direct use in the future. I have a small Python function specifically for "create hardlink between these two drivepool paths", so balancing is the only other part of it AFAICT.

    If it's viable to do these checks in a plugin, I'm pretty likely to do it myself, but it would require changing the behaviour of the FileMover, and I'm uncertain whether that's possible. My guess from my initial reading was that it's not, and that's why I asked. Even something as simple as before-move and after-move hooks, which balancers could place on files they believe will need to be touched when they get moved, would probably make it viable to do in the balancer, but I don't think that's possible right now.

    I would assume that balancers can call APIs directly to check for hardlinks. My suggested options would only involve "found some outside, so ignore those" or "don't move because...", so the balancer would never actually touch files outside the pool, but it could be aware they exist via the hard link listing of the file.

    There are a lot of handy use cases for hardlinks, so having them handled decently, even if not officially supported and even if we have to create them outside the drivepool volume, would be nice, so it was worth asking. Especially having the FileMover support moving them, even if it's an option we need to turn on via a setting. In any case, "we don't support it you silly hardlinking maniac" is a totally reasonable answer if that's what it is!
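To make the "anchor" option above concrete, here is a minimal Python sketch of the decision a balancer rule could make once it has a file's hard link list. Everything here is hypothetical: `split_links`, `balancer_may_move` and the `anchor_on_outside_links` flag are illustrative names, not part of any DrivePool plugin API, and the sketch assumes the link list has already been obtained as full paths on the underlying volume.

```python
import os

def split_links(links, pool_part_root):
    """Split hard link paths into those inside and outside the PoolPart folder."""
    root = os.path.normcase(os.path.abspath(pool_part_root))
    inside, outside = [], []
    for link in links:
        path = os.path.normcase(os.path.abspath(link))
        if path == root or path.startswith(root + os.sep):
            inside.append(link)
        else:
            outside.append(link)
    return inside, outside

def balancer_may_move(links, pool_part_root, anchor_on_outside_links=False):
    """Hypothetical rule: with anchoring enabled, any link outside the
    PoolPart pins the file in place; otherwise outside links are ignored."""
    _inside, outside = split_links(links, pool_part_root)
    return not (anchor_on_outside_links and outside)
```

With the flag left at its default, the function always allows the move, which matches the current behaviour described above (outside links are ignored and end up as separate copies).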
  2. Phidaissi

    Balancing hardlinks

    For reference, these are the win32 fileapi functions that do this:

      • To determine if a file has hard links, the fileapi function GetFileInformationByHandle returns a BY_HANDLE_FILE_INFORMATION whose nNumberOfLinks field is the number of directory entries for the given file, i.e. 1 for typical files, 2 or more for files that have hard links. As you're moving the file anyway when balancing, this is likely an action with negligible overhead.
      • To get the list of all directory entries associated with a hard link, you use the fileapi function FindFirstFileNameW with FindNextFileNameW, which should give the same number of results as the nNumberOfLinks from the prior step.

    I'm actually using a script in Python, so I use Python wrappers that access those win32 APIs:

      1. Use os.stat(file_name).st_nlink to determine the number of links - this actually works on the drivepool volume already, because I presume it's just passed through from the source volume.
      2. If it has hard links (st_nlink > 1), I then determine its underlying location - via folder mounts in a known location, where I check for that subpath in each PoolPart. A balancer would not need this step, as it already knows which real volume the file is on.
      3. Use win32file.FindFileNames(file_name) on the directory entry on the real volume (xx/PoolPart.xx/xx) to get the other directory entries on that volume - this doesn't work on the drivepool volume itself at the moment, so the path needs to point at the specific drive like this to work.
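The checks described above can be sketched roughly as follows. This is not the script referred to in the post: `link_count` and `find_links` are illustrative names, and `find_links` deliberately swaps in a portable technique (walking a directory tree and comparing file identity with `os.path.samefile`) so that the example runs anywhere. On Windows with pywin32, `win32file.FindFileNames(path)` run against the path on the real volume would replace the walk.

```python
import os

def link_count(path):
    """Number of directory entries (hard links) that point at this file."""
    return os.stat(path).st_nlink

def find_links(path, search_root):
    """Return every path under search_root that is a hard link to `path`.

    Portable stand-in for win32file.FindFileNames: the tree is walked and
    each file is compared by identity, which is far slower than asking
    NTFS directly, so the cheap link count check is done first.
    """
    if link_count(path) < 2:
        return [os.path.abspath(path)]
    found = []
    for dirpath, _dirs, files in os.walk(search_root):
        for name in files:
            candidate = os.path.join(dirpath, name)
            if os.path.samefile(candidate, path):
                found.append(os.path.abspath(candidate))
    return sorted(found)
```

Checking the count first keeps the cost for ordinary files to a single `stat` call, which is the "negligible overhead" point made above.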
  3. Phidaissi

    Balancing hardlinks

    Sorry, but did you read only the questions in bold and not my explanation following them that addresses the how? :( I know how they work, and that's why I explicitly mentioned the solution to both of those things.

    The balancer moves files between disks/volumes, and I was suggesting that when a file with hard links is moved, all of its links move with it. As you said, that is necessary for them to actually be hardlinked and not just additional copies on multiple drives. You can check if a file has more than one hard link (the first entry counts, so a regular file has a count of 1), and if it is hardlinked you can create identical links on the target drive!

    Looking at the link count has basically no overhead, and you'd only need to fetch the list of links when you actually encounter a hard linked file, so the minor penalty would only apply when actually moving one. This information is stored in NTFS, so drivepool doesn't need to store it, and you can get it via several methods - I mentioned two of them above, but I can get you the exact API calls necessary to do the checks if that helps.

    That means it's possible to modify file move behaviour to do this:

      1. Check if the file has link count > 1; if not, do a regular move from drive A to B
      2. Get the list of linked directory entries in the file system on A
      3. Continue the regular file move from A to B
      4. Replicate the hard links discovered on A in (2) on B
      5. Remove the linked directory entries from A that were discovered in (2)

    This moves a file and its associated hard links from A to B, balancing the file with all hard links. I actually do some of this already via a script I run regularly: look in seeding for still existing hard links, replicate links for files that the balancer has moved between drives on the drive they were moved to, and remove the redundant copy left behind that wasn't balanced.

    If this isn't a change you'd be happy to make, then my question is whether it's possible to do this in a balancing plugin which I write myself. I'm not sure what information balancers actually get, or whether they can alter this behaviour without drivepool being updated to make it possible, which is why I asked.

    Basically, I am already doing this and know how it's done in an automated way that detects the changes, but fixing it with scripts after the fact isn't very elegant. I can also think of a couple of edge cases where doing it in drivepool balancing would greatly improve things, like a few new balancing rules that would be great in a plugin.
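The move sequence described above can be sketched in Python as below. This is an illustration under assumptions, not DrivePool's FileMover and not my actual script: `move_with_links` and `links_under` are hypothetical names, `root_a` and `root_b` stand in for the PoolPart folders on two drives, links are found by a portable directory walk rather than by the win32 calls, and links that live outside `root_a` are ignored.

```python
import os
import shutil

def links_under(path, root):
    """Paths under `root` that are hard links to `path` (portable walk)."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            candidate = os.path.join(dirpath, name)
            if os.path.samefile(candidate, path):
                found.append(candidate)
    return found

def move_with_links(rel_path, root_a, root_b):
    """Move root_a/rel_path to root_b, taking its hard links along."""
    src = os.path.join(root_a, rel_path)
    dst = os.path.join(root_b, rel_path)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    # Step 1: a link count of 1 means an ordinary file, so do a regular move.
    if os.stat(src).st_nlink < 2:
        shutil.move(src, dst)
        return [dst]
    # Step 2: record the linked directory entries on A, relative to root_a.
    rel_links = [os.path.relpath(p, root_a) for p in links_under(src, root_a)]
    # Step 3: copy the data to B (entries on A are removed in step 5).
    shutil.copy2(src, dst)
    # Step 4: replicate the links from step 2 on B.
    created = [dst]
    for rel in rel_links:
        if os.path.normpath(rel) == os.path.normpath(rel_path):
            continue  # this entry is the file copied in step 3
        target = os.path.join(root_b, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        os.link(dst, target)
        created.append(target)
    # Step 5: remove the directory entries recorded in step 2 from A.
    for rel in rel_links:
        os.remove(os.path.join(root_a, rel))
    return created
```

The data is copied once and every other entry on B is created with `os.link`, so the result is a single file with the same set of names rather than several independent copies.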
  4. Phidaissi

    Balancing hardlinks

    Having looked through previous discussions a bit, I know that hardlinks are not supported through the drivepool drive itself. Would it be possible to have the balancing system preserve hardlinks when moving files though?

    Currently I download a torrent to a smallish temporary drive, and when it's complete it gets moved into my pool. In order to continue to seed the files while also having them sorted into their appropriate places, the files get moved into a staging area (i.e. E:\Unsorted), and after they've been moved there I have a script that hardlinks them to a seeding location (i.e. E:\Seeding) via the underlying PoolParts, locating each file on the right drive as necessary to do so.

    Then I can relocate the unsorted files to where I want them to actually be, and all the while it continues to seed them no matter what. (Thus it cannot be done with a symlink from seeding to unsorted, as this would break when I move the unsorted files.) When I'm done seeding I can just delete them from seeding without concern. (Thus it cannot be done with a symlink from unsorted to seeding, as removing the seeding copy would then remove the file entirely and simply break the unsorted reference to it.)

    This works fine, and has the expected consequence that if drivepool decides to balance one of the 'copies' of the hardlink, it effectively becomes two files on different drives. That's an acceptable consequence most of the time, but mildly annoying when it's not too difficult to fix. I'm just not sure if it's viable to do with the balancer as it is currently, and I don't want to invest the time to attempt creating a plugin myself to handle this if it turns out to be impossible under the current design!

    So could the balancer be made to recognise when it's about to move a hard linked file (discoverable in sysinternals or with fsutil) and, after moving the file, replicate the hardlinks within the new poolpart and remove them on the old one? This way hard linked files can still be balanced across the pool.

    Would it be possible to tell balancers to ignore a directory? Being able to say "ignore seeding directory" (since it's pretty much only hardlinks) and have drivepool balance files as needed otherwise would make this work even better by avoiding the potential for the balancer to get confused.

    It would be nice if drivepool could allow the creation of hardlinks by passing the hardlink calls to the underlying NTFS on the relevant poolpart. This is effectively what I do within my script, just with the added annoyance of me having to locate which poolpart the file is in.

    Given that previous discussions have basically just said "we don't support it", I'm assuming there must be some other issue with supporting them that I'm missing, and I'm curious about what that is, as it's not really explained elsewhere that I saw. But I'd be happy for it to just respect manually created hardlinks when moving files around.
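For anyone curious what "hardlink via the underlying PoolParts" looks like, here is a minimal Python sketch of the idea. It is not my actual script: `locate_pool_part` and `link_within_pool` are illustrative names, `pool_parts` is assumed to be a list of PoolPart folders you supply yourself, and the sketch simply takes the first PoolPart that holds the file, so it does not handle a file that is present on more than one drive.

```python
import os

def locate_pool_part(rel_path, pool_parts):
    """Return the first PoolPart folder that contains rel_path, or None."""
    for part in pool_parts:
        if os.path.isfile(os.path.join(part, rel_path)):
            return part
    return None

def link_within_pool(rel_src, rel_dst, pool_parts):
    """Create a hard link for rel_src at rel_dst on the drive holding the file.

    Both paths are relative to the pool root (for example Unsorted\\x and
    Seeding\\x); the link is made inside the same PoolPart folder because a
    hard link cannot span volumes.
    """
    part = locate_pool_part(rel_src, pool_parts)
    if part is None:
        raise FileNotFoundError(rel_src)
    dst = os.path.join(part, rel_dst)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    os.link(os.path.join(part, rel_src), dst)
    return dst
```

The lookup in `locate_pool_part` is the annoyance mentioned above: the pool itself knows which drive holds the file, whereas a script has to probe each PoolPart to find out.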