Covecube Inc.
  • 0
Beach

File placement based on folder

Question

Hi, I'm trying to figure out if it's possible to make a rule so that folders and all the files in their subfolders are placed together on the same hard drive, instead of each file in a folder being placed on a random hard drive. For example:

 

 

C:\

     Folder 1

     Folder 2

         Folder A

         Folder B

     Folder 3

         Folder C

 

 

In the example above, all the files in Folder A would be placed on the same HD (it doesn't matter which, as long as they are all together), all of Folder B's files on the same HD, and so on.

 

 

The reason for this is that if I have a HD failure, instead of missing random files from random folders, it's a lot easier for me to say "OK, Folder A is gone and needs to be restored."

 


  • 0

But I am a customer too and I am not sure I'd want (the risks associated with) this.

What risk? If implemented properly it will be optional. You don't need to use it.

 

I don't need duplication and that code part alone makes DrivePool not as "simple and fast as possible". But I don't complain about it because I simply don't use it + I am not forced to use it.

  • 0

Perhaps. Of course, perfect code never fails but perfect code is very hard to come by. If the core code is stable and lean and this is an optional add-in or somesuch then, sure, I would have no issue with it. But as I understand it, it would require a lot of work at the driver level itself.

  • 0


On both the driver and the service, actually.

 

A bulk of the code would be service side, because we try to keep the driver lightweight. 

 

Either way, if/when we were to implement it, it is something that we'd try to test well first.

 

Given that I have a fairly massive pool, it would be a good test bed for it, actually. 

  • 0

I read the entire thread but can't tell whether there has been a decision to do this. I would very much like to see this implemented, even as an add-on.

  • 0


At this time, no. However, it is something that may be added in future versions.

It is something that has been highly demanded, so that definitely does influence our decision. But the fact that this would require a massive rewrite of the balancing code also weighs on our decision.

 

 

We'd love to add every feature that everyone wants, but that's logistically impossible. So, for now, we don't have plans to implement this as a feature. But that may change in the future.

 

And as I've said elsewhere, I do push this topic repeatedly.  Heck, I have a sticky note on my screen just to remind me to bring it up. 

  • 0

I think the main issue is that there is only one developer, but I could be wrong. So the first feature of choice would have to be cloning (of the developer, not data ;) ).

  • 0

Yeah, there is only the one developer.

 

And the problem with voting for features is that just because you want a feature doesn't make it easy, feasible, or effective.   Alex is pretty good about adding features that are simple and easy.  But this isn't a simple thing to add, nor easy.  It would likely require a massive rewrite to the balancing engine.  And ... there are other things we want to add before that.  

 

That said, because this does come up often, I do push this topic frequently.  So "bounty" or no, ... I may wear Alex down sooner or later. :)

  • 0

Hello,

I just downloaded a trial of your app last night after getting SnapRAID set up. I'm trying to manage 46 TB of data in millions of organized files/folders that will be housed across 24 drives. I added 3 of my light-duty drives to a DrivePool pool to demo your app. I really like the existing feature set/GUI/functionality of what's there. However, I'm looking at long-term archival of data for what is potentially many decades to come while managing an ever-growing data set. The idea of intentionally/irreversibly scrambling my hierarchies seems extremely counterintuitive to what I'm going for, which is improving organization with multiple ways of recovering data. The current implementation of your code doesn't give me multiple ways out once you've undone all my organization.

Sorry, I fully understand that it's my burden to find an app that suits my needs and not your responsibility to make changes that potentially go against this application's goals. I just wanted to add my voice to the chorus here, stating that you'd surely have a long-time customer if it does become possible to incorporate this functionality. I've seen your app mentioned many times in the data circles I communicate in, but this massive drawback is always mentioned as a deal-breaking caveat. Best of luck, you guys have done a good job with what's here!

  • 0
1 hour ago, Eatingpattern said:

Hello,

I just downloaded a trial of your app last night after getting SnapRAID set up. I'm trying to manage 46 TB of data in millions of organized files/folders that will be housed across 24 drives. I added 3 of my light-duty drives to a DrivePool pool to demo your app. I really like the existing feature set/GUI/functionality of what's there. However, I'm looking at long-term archival of data for what is potentially many decades to come while managing an ever-growing data set. The idea of intentionally/irreversibly scrambling my hierarchies seems extremely counterintuitive to what I'm going for, which is improving organization with multiple ways of recovering data. The current implementation of your code doesn't give me multiple ways out once you've undone all my organization.

Sorry, I fully understand that it's my burden to find an app that suits my needs and not your responsibility to make changes that potentially go against this application's goals. I just wanted to add my voice to the chorus here, stating that you'd surely have a long-time customer if it does become possible to incorporate this functionality. I've seen your app mentioned many times in the data circles I communicate in, but this massive drawback is always mentioned as a deal-breaking caveat. Best of luck, you guys have done a good job with what's here!

In your case, the Ordered File Placement balancer plugin may actually suit your needs (or be VERY close to it). 

https://stablebit.com/DrivePool/Plugins

Quote

This plug-in changes StableBit DrivePool's default file placement strategy.

By default, StableBit DrivePool always places new files onto the disk with the most free space. This tends to equalize the amount of free space across all the disks in the pool.

With this plug-in, StableBit DrivePool will place new files in a way that will fill one disk at a time. Once a disk is filled to a pre-set threshold, StableBit DrivePool will then move onto filling the next disk.

This has a number of benefits:

  • Files copied at the same time will tend to be on the same disk. Because those files were copied at the same time, it stands to reason that they might be related. It can be beneficial, in terms of file recovery, to have related files be placed on the same disk.
  • You may want StableBit DrivePool to write to one disk at a time in order to implement an efficient power savings policy.
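
To make the difference between the two strategies concrete, here is a minimal Python sketch. It is only an illustration of the behaviour described in the quote, not DrivePool's actual code: the `Disk` class, the function names, and the 90% `fill_threshold` are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    # Hypothetical stand-in for a pooled disk; sizes are in arbitrary units.
    name: str
    capacity: int
    used: int = 0

    @property
    def free(self) -> int:
        return self.capacity - self.used


def pick_most_free(disks, size):
    """Default-style placement: choose the disk with the most free space."""
    candidates = [d for d in disks if d.free >= size]
    return max(candidates, key=lambda d: d.free) if candidates else None


def pick_ordered(disks, size, fill_threshold=0.9):
    """Ordered-style placement: choose the first disk (in list order) that
    stays at or below the assumed fill threshold after the write."""
    for d in disks:
        if d.used + size <= d.capacity * fill_threshold:
            return d
    return None


def place_all(disks, sizes, picker):
    """Place files one by one and return the disk name chosen for each."""
    placement = []
    for size in sizes:
        disk = picker(disks, size)
        if disk is None:
            raise RuntimeError("no disk has room for this file")
        disk.used += size
        placement.append(disk.name)
    return placement
```

With three empty disks of equal size, `pick_most_free` spreads a batch of files round-robin across all of them, while `pick_ordered` keeps the batch on the first disk until the threshold is reached. That is why files copied together tend to end up together with the plug-in.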
     

 

  • 0
24 minutes ago, Christopher (Drashna) said:

In your case, the Ordered File Placement balancer plugin may actually suit your needs (or be VERY close to it). 

https://stablebit.com/DrivePool/Plugins

 

Well, I certainly wasn't expecting a response that quickly. I really appreciate that!

 

Interesting!  I wasn't aware that there were plugins available outside of the defaults.  Yes, I can see how that would at least put me in the ballpark and any splits sound like they should be between two disks if need be instead of striped across all my volumes.  I think this will be as close as I need!  Thank you for your time and have a good weekend!

  • 0

You're very welcome! :) 

Well, the listed plugins drastically change the default behavior. So we don't include them by default, as that could lead to more problems than it solves.

And yeah, it will fill up the drives "sequentially", at least.  And most folders should be on the same disk, but this won't guarantee or enforce that, unfortunately. 

 

  • 0

Hi, I'm currently doing research on converting to a DrivePool + SnapRAID setup for my server. I'm one of those people who are struggling due to the limitations of split-level folder management. I'm trying my best to find a good middle ground between no folder management and micro-managing. The Ordered File Placement plugin wouldn't be a great solution in my case, since it would make the SnapRAID parity huge due to the uneven distribution. Which brings me to the out-of-the-box file placement feature. The file placement rules don't seem to be regex, but they include * and +, which have very similar functionality to their regex counterparts. For my middle-ground folder management, I was thinking along the lines of splitting the content by folder title: folders starting with A-M go to one drive, and N-Z to another. Would the only way to achieve this be to create a rule per first letter of the alphabet?

Ex. Rule for /FolderX/A*

If it was regex, I could do /FolderX/[A-M]* in one rule but it doesn't look like it supports full regex based off of the documentation. Any tips or pointers would be greatly appreciated!

Thanks!

  • 0

Well, regex would probably work for this, and allow for very complex configs.

The issue is that we'd need to implement support for it in the kernel too. And IIRC, that was where the issue really lay. It would be some complex checking, basically every time a file was written to the pool, and could incur a significant performance penalty.

  • 0
13 minutes ago, Christopher (Drashna) said:

Well, regex would probably work for this, and allow for very complex configs.

The issue is that we'd need to implement support for it in the kernel too. And IIRC, that was where the issue really lay. It would be some complex checking, basically every time a file was written to the pool, and could incur a significant performance penalty.

I agree; I didn't expect file placement to be a full regex implementation. So in my case, if I wanted to base it on the first letter of each folder as I described above, would I create individual rules like the ones below?

Rule 1 - /FolderX/A*

Rule 2 - /FolderX/B*

Rule 3 - /FolderX/C*

...

Rule X - /FolderX/N*
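
For what it's worth, a list of per-letter rules like this doesn't have to be typed out by hand. Below is a small Python sketch that generates the 26 patterns for the A-M / N-Z split and checks which target a given path would land on. It is only a planning aid: the target names `Disk1` and `Disk2` are placeholders, and the matching uses Python's `fnmatch` as a stand-in, which is an assumption and may not behave exactly like DrivePool's own rule matching.

```python
import string
from fnmatch import fnmatchcase


def letter_rules(base, first, last, target):
    """Build one (pattern, target) pair per first letter in [first, last]."""
    letters = string.ascii_uppercase
    start, end = letters.index(first), letters.index(last)
    return [(f"{base}/{ch}*", target) for ch in letters[start:end + 1]]


# Placeholder targets: A-M on one drive, N-Z on another.
rules = (letter_rules("/FolderX", "A", "M", "Disk1")
         + letter_rules("/FolderX", "N", "Z", "Disk2"))


def target_for(path, rules):
    """Return the target of the first matching rule, or None if none match.

    Matching is done case-insensitively here (an assumption), by
    upper-casing both the path and the pattern before comparing.
    """
    for pattern, target in rules:
        if fnmatchcase(path.upper(), pattern.upper()):
            return target
    return None
```

For example, `target_for("/FolderX/Alpha/file.txt", rules)` returns `"Disk1"`, and a folder that starts with a digit matches no rule, so it would fall through to whatever the default placement is.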

 

  • 0

Good day.

I thought about this per-folder balancing a while back - I thought maybe we could write a balancing plugin to do it?

But I quickly ran into all sorts of problems; it is definitely not as simple as it looks! Examples: If the folder grows beyond the free space, then you have to split it between volumes (or maybe move it all to a different volume, but maybe there is not a single volume with enough free space). So when it comes time to balance, you have to look at ALL the volumes. Maybe the folder is on 3 different volumes, but now suddenly there is lots of free space on Volume 1, where a portion of the folder resides, so we should move all the files there so they are together. Do we even do this if the folder in question is not present on Volume 1? Do we move all 3 parts of the folder from 3 different volumes (say 2, 3 and 4) to Volume 1 now that there is enough free space there to keep the folder together? And let's not forget that stuff can be duplicated, so you have double/triple the moving-files-around work to do. Also, since DrivePool balances on the fly, you'd have to check and calculate the size of the folder, on ALL volumes, every time a file was written to the pool. It would be a pretty slow balancing plugin (if the folder is on 3 volumes and you write a new file, it should go to the volume where the folder is biggest, so as to keep all files together). And think about it: DrivePool would potentially have to move files around each time a file was deleted from the pool, as it tries to "keep things together."
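
The "can this folder be brought back together?" question can be sketched in a few lines of Python. This is a rough illustration only: it assumes the per-volume folder sizes and free space are already known (which is the expensive part), it ignores duplication and placement rules entirely, and `consolidation_target` is a made-up name, not anything from the DrivePool plugin API.

```python
def consolidation_target(folder_bytes, free_bytes):
    """Pick a volume that could hold the whole folder, or None.

    folder_bytes: {volume: bytes of this folder currently on that volume}
    free_bytes:   {volume: free bytes on that volume}

    Among the volumes that could hold the entire folder, prefer the one
    that needs the least data moved onto it. Returns (volume, bytes_to_move).
    """
    total = sum(folder_bytes.values())
    best = None
    for vol, free in free_bytes.items():
        already_here = folder_bytes.get(vol, 0)
        to_move = total - already_here  # data that would have to come from other volumes
        if to_move <= free and (best is None or to_move < best[1]):
            best = (vol, to_move)
    return best
```

Take a folder spread as 30/50/20 over volumes 2, 3 and 4, with 200 free on Volume 1 and 60 free on Volume 3. Both volumes could take the whole folder, but Volume 3 only needs 50 moved instead of 100, so it wins. Even this toy version has to look at every volume, which is the cost being described above.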

It's definitely trickier than it looks!

Regards,

  • 0

Yup, it's super complicated. :)

 

And there are two issues you may not have thought of or mentioned:
The pool driver uses some of the balancer settings when evaluating where to place files. So a slow balancer could cause problems there too. 

For a pool like mine, the "Videos" folder is larger than any one drive.  How do we keep that together? And what depth should we even start at?

It's DEFINITELY complicated. :)

Also, this is why "that should be simple" is a running joke internally.  There have been a few times that Alex thought that was the case... and ... yeah. :)
(aka, concepts are easy, coding them may not be)

  • 0

Good day.

Yup, the "balance on the fly" would be slow, since you'd have to check ALL drives for the folder. For large folders, you do NOT keep them together; what you do is place any new file on the volume where the folder already has the most data (comparing all the volumes where the folder exists) AND where there is space (and which is not excluded by some rule). So not only do you have to check all drives, you have to calculate the amount of space the folder takes on every drive so you can tell which drive to pick, and that means scanning all the files inside. It would make everything slow. One way around this is to just place the file anywhere there is space and balance later. As far as depth goes, it does not matter: you treat all folders as singles, i.e. subfolders NEVER count; they are just another folder to keep together. You just try to keep the FILES together, not the subfolders.
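
The placement rule described here (most of the folder's data, and enough room) is easy to write down once the per-volume numbers exist. Here is a small Python sketch, with the caveat that `pick_volume` and its inputs are made up for illustration, and that filling in `folder_bytes` is exactly the slow scan mentioned above. Placement rules, duplication and other balancers are not modelled.

```python
def pick_volume(folder_bytes, free_bytes, file_size):
    """Choose the volume for a new file in a folder.

    folder_bytes: {volume: bytes of the folder's own files on that volume}
                  (subfolders are not counted, they are treated separately)
    free_bytes:   {volume: free bytes on that volume}

    Only volumes with room for the file are considered. Among those, prefer
    the one already holding the most of the folder; ties (including a
    brand-new folder) are broken by the most free space.
    """
    candidates = [v for v, free in free_bytes.items() if free >= file_size]
    if not candidates:
        return None
    return max(candidates, key=lambda v: (folder_bytes.get(v, 0), free_bytes[v]))
```

Note that a full volume drops out of the candidates even if it holds most of the folder, so the folder can still end up split; that is the "you do NOT keep it together" case for large folders.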

All of these problems are with nothing else running. What happens when you throw in Placement Rules, duplication and other balancers? Gets crazy pretty fast.

It's probably doable, and you can cut corners (place any new file anywhere and balance later). But there sure is a lot to think about.

Regards,

