
StableBit DrivePool Per-Folder Duplication


Alex


Since this has been a point of discussion on the old forum, I thought that I'd start this forum category by posting about per-folder file duplication in StableBit DrivePool.

 

Unlike the blog posts, I'll try to keep this brief (and somewhat technical).

 

File Duplication

 

What is file duplication?

 

Simply put, file duplication protects your files by storing 2 copies of a file on 2 separate hard drives. If one drive fails, the other still has a copy of all of your duplicated files.
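To make that concrete, here is a minimal sketch of the idea in Python. The pool-part paths and the user-mode copy are purely illustrative assumptions; DrivePool itself does this inside its file system, not by copying files around:

```python
import shutil
from pathlib import Path

# Hypothetical "pool part" folders, one per physical drive.
POOL_PARTS = [Path(r"D:\PoolPart"), Path(r"E:\PoolPart")]

def store_duplicated(source: Path, relative_path: str) -> None:
    """Store a copy of the file on each of two separate drives,
    so either drive alone still holds the complete file."""
    for part in POOL_PARTS:
        target = part / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)

store_duplicated(Path(r"C:\incoming\photo.jpg"), r"Pictures\photo.jpg")
```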

 

Designing File Duplication

 

The #1 priority for file duplication was to make the technology behind it as simple as possible, thus avoiding any unnecessary complications (and bugs). The first approach that DrivePool took was to put the duplication count in the folder name itself (you can't get any simpler than that).

 

For example, "Pictures.2" would duplicate all of your pictures to 2 hard drives.
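A rough sketch of how a name-based scheme like that could be parsed (my own illustration of the convention described above, not DrivePool's actual code):

```python
import re

def parse_duplication(folder_name: str) -> tuple[str, int]:
    """Split a folder name like 'Pictures.2' into its base name and
    duplication count; names without a numeric suffix get 1 copy."""
    match = re.fullmatch(r"(.+)\.(\d+)", folder_name)
    if match:
        return match.group(1), int(match.group(2))
    return folder_name, 1

assert parse_duplication("Pictures.2") == ("Pictures", 2)
assert parse_duplication("Music") == ("Music", 1)
```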

 

This was very straightforward but unfortunately didn't work very well with shared folders. The name of a shared folder (as seen on the network) is typically the name of the folder itself, so it doesn't make sense to include the duplication count in the shared folder name. And more importantly, WHS 2011 didn't work well with this scheme.

 

(DrivePool 1.0 BETA M3 did try to work around the issues with folder links, but that was eventually replaced with a better and simpler system.)

 

Alternate Data Streams

 

DrivePool 1.0 final shipped with the ability to store "tags" on folders. Although the tags are nothing more than alternate data streams on directories, I still like the word "tags" for the approach, because they describe something about a directory.

 

One of these tags eventually became a "DuplicationCount".

 

At first, the idea was to store the actual number of copies in the tag. So if a folder was designated as duplicated, it would contain "2" in its duplication tag. But because we needed to enable folder duplication at any level of a directory tree, it was necessary to implement something a bit more flexible.

 

The current system supports an "Inherit" and a "Multiple" flag in addition to an explicit duplication count, and supports setting a duplication count on any arbitrary folder on the pool.
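To illustrate, here is a minimal sketch of a duplication tag stored in an alternate data stream on a directory, with "Inherit" resolved by walking up the tree. The "DuplicationCount" stream name comes from the post above, but the plain-text format and the resolution logic are my own assumptions (Python, Windows/NTFS only):

```python
import os

STREAM = "DuplicationCount"  # stream name from the post; format below is a guess

def set_duplication(folder: str, count: int) -> None:
    """Tag a directory with a duplication count by writing an
    alternate data stream attached to it (NTFS feature)."""
    with open(f"{folder}:{STREAM}", "w") as ads:
        ads.write(str(count))

def effective_duplication(folder: str) -> int:
    """Resolve the effective count: walk up the tree until a tagged
    ancestor is found (the 'Inherit' behavior), defaulting to 1."""
    path = os.path.abspath(folder)
    while True:
        try:
            with open(f"{path}:{STREAM}") as ads:
                return int(ads.read())
        except (OSError, ValueError):
            parent = os.path.dirname(path)
            if parent == path:  # reached the drive root; nothing tagged
                return 1
            path = parent
```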

 

Complications

 

The new tag-based system is not without complications.

 

We have issues with the read-only attribute on directories (which came up recently). And what happens if you move a duplicated folder to a location that's not duplicated? Well, we're handling all of these cases for you in a (hopefully) intelligent manner.

 

DrivePool 2.0

 

I've considered scrapping per-folder duplication in DrivePool 2.0. The reason is that you can create duplicated pools and non-duplicated pools, and I feel that this is sufficient flexibility for most people. Getting rid of per-folder duplication would make a lot of things much simpler (such as balancing).

 

Feedback

 

What do you think of per-folder duplication?

 

Let me know. I'm listening.


I don't use duplication because of the limited physical space in my server for expansion. But if I understand right, you want to remove folder duplication in favor of "if you want duplication, it will be the entire pool"?


Yep, that's the idea.

 

If you want duplication, why not simply duplicate the entire pool? It would make balancing a whole lot simpler.

 

If you want to store non-duplicated files, then just create a new non-duplicated pool.


But how would that work for storage? Would you have one disk shared between two pools? What if you have subfolders in Documents that you want to duplicate, but not everything; would you be forced to store part of your Documents in one pool and the subfolder in the other? Then on my server, would I have a share called Documents (non-dup) and another called Documents1 (dup), or would everything still be in one Documents folder when browsing shares?

 

For example, in my Pictures folder I have a subfolder where I store print screens that I wouldn't want to duplicate, but I would want all my family pictures (in subfolders based on the event they come from) to be duplicated. How would your plan work in that situation?


I like the flexibility of being able to have per-folder duplication if, when and where I want it.

 

Per-pool-only would require getting rid of the current DP limitation of one pool per volume, otherwise we're simply replacing "disk juggling" with "volume juggling".

 

Per-pool-only also means multiple UNC shares, which raises the specter of move-via-copy shenanigans (unless we mount the pools as virtual folders instead of virtual drives, which is inelegant, etc).


I don't like the idea. I don't fancy having to sort 30 TB of data into new pools; I like the current dupe function. I would, however, like to be able to duplicate subfolders; it would tidy stuff up.

 

On a side note, why not add parity to DrivePool, saving us disk space so we wouldn't need dupe at all? Or keep both, for those that wish to have maximum redundancy. Like me.

 

 

Lee


My two cents on this.

 

To me, per-folder duplication is a must-have feature of DP, along with >2 duplication counts.

As others say, I like the ability to mix important and less important things in subfolders, so that some folders are duplicated and others not.

 

Having to create two pools would simply not work for me:

* I share the pools over the network, so I would need 2 shares per category: 1 on the duplicated pool, 1 on the non-duplicated one. (Too cumbersome to maintain and to use.)

* I would need to subdivide my HDDs into volumes to provision the duplicated and non-duplicated pools; that completely destroys the concept of DP, where there is no need to manage anything in relation to volumes.

 

Regarding parity, we touch here on something else that is a major advantage of DP to me: the ability to access my data by plugging a previously pooled HDD into any computer, without installing anything.

I am ready to pay the cost of full duplication instead of parity for this advantage.

 

There is one other thing I would really like to see added to DP, since you mention folder tagging, Alex.

This is folder-based balancing.

For now you balance on a per-file basis, distributing files across the HDDs.

You already have a plugin to group files depending on when they were created.

What would be really great is to tag a folder so that its entire contents are always kept on one HDD (no matter which one); when rebalancing, the entire folder is considered one unit to balance, and when new files are created in that folder, they are created on that HDD.
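One way to picture that request, as a sketch only (this is my own toy model, not DrivePool's balancer): measure each tagged folder as a single unit, then greedily place units, largest first, on the drive with the most remaining free space.

```python
import os

def folder_size(path: str) -> int:
    """Total size of all files under a folder, treated as one unit."""
    return sum(
        os.path.getsize(os.path.join(root, name))
        for root, _dirs, names in os.walk(path)
        for name in names
    )

def place_folder_units(folders: list[str], free_space: dict[str, int]) -> dict[str, str]:
    """Assign each folder-unit to exactly one drive, largest units
    first, always choosing the drive with the most free space left."""
    sizes = {folder: folder_size(folder) for folder in folders}
    placement = {}
    for folder in sorted(folders, key=sizes.get, reverse=True):
        drive = max(free_space, key=free_space.get)  # most free space wins
        free_space[drive] -= sizes[folder]
        placement[folder] = drive
    return placement
```

New files created inside a placed folder would then simply follow the placement map instead of the normal per-file balancer.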


Per-pool-only would require getting rid of the current DP limitation of one pool per volume, otherwise we're simply replacing "disk juggling" with "volume juggling".

 

Per-pool-only also means multiple UNC shares, which raises the specter of move-via-copy shenanigans (unless we mount the pools as virtual folders instead of virtual drives, which is inelegant, etc).

 

That's very accurate.

 

Those 2 points are right on target and make perfect sense, in terms of the technology.

 

I guess I'm always looking to simplify things, but this simplification might come at the expense of sacrificing too much functionality.


Regarding parity, we touch here on something else that is a major advantage of DP to me: the ability to access my data by plugging a previously pooled HDD into any computer, without installing anything.

I am ready to pay the cost of full duplication instead of parity for this advantage.

 

My thoughts on parity agree with yours. I have some terabytes of personal data, and I don't feel that I'm missing anything by not using parity.

 

In fact, I feel even more secure.

 

I know that all of my files are stored as standard NTFS files, which can be accessed and recovered by countless tools developed over the past few decades.


There is one other thing I would really like to see added to DP, since you mention folder tagging, Alex.

This is folder-based balancing.

For now you balance on a per-file basis, distributing files across the HDDs.

You already have a plugin to group files depending on when they were created.

What would be really great is to tag a folder so that its entire contents are always kept on one HDD (no matter which one); when rebalancing, the entire folder is considered one unit to balance, and when new files are created in that folder, they are created on that HDD.

 

This is a pretty interesting feature request. I've considered it, and it would actually be super simple to implement a per-folder-name placement limit in the kernel.

 

We can control file placement by file name (including wildcard patterns) or file size.

 

For example, you might be able to say something like:

  • I want all files that match this pattern \ServerFolders\ImportantDocuments\* to be placed on Drives 1, 3 and 4 only.

This is all very easy to do, given our existing infrastructure.
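As a sketch of what evaluating such a rule might look like (the rule table, drive labels, and first-match-wins behavior are all illustrative assumptions, not the actual DrivePool implementation):

```python
from fnmatch import fnmatch

# Hypothetical rule table: wildcard pattern -> drives the file may land on.
PLACEMENT_RULES = [
    (r"\ServerFolders\ImportantDocuments\*", ["Drive1", "Drive3", "Drive4"]),
]

ALL_DRIVES = ["Drive1", "Drive2", "Drive3", "Drive4"]

def allowed_drives(pool_path: str) -> list[str]:
    """Return the drives a new file may be placed on: the first
    matching rule wins, otherwise any drive is allowed."""
    for pattern, drives in PLACEMENT_RULES:
        if fnmatch(pool_path, pattern):
            return drives
    return ALL_DRIVES

print(allowed_drives(r"\ServerFolders\ImportantDocuments\taxes.pdf"))
# -> ['Drive1', 'Drive3', 'Drive4']
```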

 

The problem with implementing something like this is that the non-real-time balancer is not designed to deal with it. In other words, the module that reshuffles your existing files into their designated locations doesn't understand folder names, and it would be non-trivial to add that (because of performance considerations).

 

If we were only interested in controlling new file placement, we could have this feature next week.

 

But this is really something that I would like to do eventually.


Hi Alex,

 

Per-folder duplication is a major feature that DP must keep.

It has many uses, including reducing the number of root shares (which would multiply if per-folder duplication weren't available) and keeping it all clean ;)

 

This way, I can have my HandyCam "movies" inside the standard "Videos" share, enforcing duplication on those while not enabling it for the other movies (which I have on DVD/BR).

 

That was my 2 cents.


This is a pretty interesting feature request. I've considered it, and it would actually be super simple to implement a per-folder-name placement limit in the kernel.

 

We can control file placement by file name (including wildcard patterns) or file size.

 

For example, you might be able to say something like:

  • I want all files that match this pattern \ServerFolders\ImportantDocuments\* to be placed on Drives 1, 3 and 4 only.

This is all very easy to do, given our existing infrastructure.

 

The problem with implementing something like this is that the non-real-time balancer is not designed to deal with it. In other words, the module that reshuffles your existing files into their designated locations doesn't understand folder names, and it would be non-trivial to add that (because of performance considerations).

 

If we were only interested in controlling new file placement, we could have this feature next week.

 

But this is really something that I would like to do eventually.

This would be awesome. Please add it soon.



I agree with DrParis - per-folder duplication is a must-have feature of DP, along with >2 duplication counts.

 

For me, the simplicity of managing one pool, with variable duplication counts depending on the importance and volume of my data, is the whole attraction of DP and the thing that makes it stand head and shoulders above the others. I never have to worry about (manually) juggling data between individual disks, backup schemes, complex RAID/parity schemes, or any of that tedium again. For me it's the perfect balance between efficient storage and reliable resiliency to disk failures (and I've had a few). And I don't have to worry about my future needs; I can just adjust a duplication count here and there, add some storage, and grow my pool reliably and smoothly.

 

To explain my rationale...

 

I have lots of disks, large volumes (90%+) of low-priority data (TV recordings etc.), and small volumes of very high-priority data (family pictures etc.) - and I can't imagine I'm alone in this balance. I love the fact that I don't have to duplicate the low-priority data (wasting precious and expensive space), yet can keep lots of copies of my important docs and photos and never worry about another hard drive failure again. I can just throw in another disk when I run out of space and add it to the pool. A marvellous, almost maintenance-free, reliable and efficient system - with one big simple pool.

 

On parity - parity wouldn't be any good to me, as I'd waste a large amount of space adding parity data for files I don't care much about, and it would tie up my biggest (and generally newest) hard drive, since that's the one required for parity. It assumes all your data is equally important. So in my PVR machine, for example, where I have lots of odd disk sizes, it becomes complicated and inefficient. I'd much rather just pool the mismatched disks into one lovely simple space for my unduplicated recordings, and have some other folders duplicated 3 or more times for important files on that computer (so I can use the PVR as a network backup for important stuff). And while I have the space, I can duplicate my low-priority stuff as well - then just remove the duplication as I start to run out of space, or add another disk or two to the pool, change a duplication setting, and voila, it all just gets rebalanced in the background. So perfect and simple! Not to mention wonderfully scalable and future-proof.

 

On using multiple pools for differing redundancy - definitely not. DP doesn't allow me to add multiple pools to the same set of disks, and even if it did, this approach would be a real pain for me. I'd end up having to set up a different pool for each type of data that I might conceivably want to vary the duplication for - photos, TV, docs - which would be a cumbersome mess. Otherwise I'd have to start manually shovelling data between pools whenever I change a duplication count, and that would be so messy.

 

P.S. I acknowledge that a per-folder parity system, with variable parity, would possibly be the (architecturally) perfect solution - but I'm more than happy to waste a bit of space for the simplicity and reliability of the DP per-folder file duplication approach. If I could trust a parity implementation, and all my disks were the same size, and all my data was the same priority, and I knew exactly what my future redundancy requirements were and that they'd never change, I'd consider parity. But this is not the case!

 

In short - please don't remove these two fabulous features!  


Just to update you guys on this, and thanks for the feedback. I received a lot of it on this issue, both here and through the standard support channel @ stablebit.com/contact.

 

I will be adding >2 duplication count support to the UI after the 1.0 final release.

 

Right now it is perfectly fine to use >2 duplication counts via dpcmd, because both the file system and the service were designed to understand any arbitrary duplication count.
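For example, a hypothetical invocation might look like the sketch below; the exact dpcmd verb and argument order can vary by version, so check dpcmd's built-in help before relying on it:

```python
import subprocess

# Assumed dpcmd syntax (set-duplication <pool folder> <count>);
# the pool path is illustrative - verify against your installed
# version's help output before use.
subprocess.run(
    ["dpcmd", "set-duplication", r"P:\ServerFolders\Photos", "3"],
    check=True,
)
```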

 

It's only the UI that doesn't understand duplication counts of >2, but that can be updated. For now, all duplication counts >2 will show up as x2 in the UI, but this is purely a cosmetic issue.



No need for me to reiterate the reasons and details already given by everyone in favor of keeping per-folder duplication. It's a necessary feature that should be kept.

 

Alex, thanks for reaching out to the community for input. This is why you'll always have our support for your products! Keep up the great work!

 



Seems I'm late to ring in on this subject, and while it seems the decision has been made: yes, folder duplication is the second reason why one would use pooling software. MS actually got that one right with WHS v1, and like all good ideas at MS, they dropped it and moved on to something new. MCE was another great idea from our friends at Redmond; at least they haven't shelved that one yet. Anywho.

Just like Mrbiggles said, I use folder duplication for those family pictures, family movies, personal documents, etc.   

 

I do use multiple pools as well. I don't want my primary pool, which holds mostly static data, to become fragmented by my recorded TV archive, for example, which changes daily. I have a second pool for recorded TV.

