
StableBit DrivePool Per-Folder Duplication


23 replies to this topic

#1 Alex

Alex

    Lead Programmer

  • Administrators
  • 243 posts
  • Location: New York, USA

Posted 31 May 2013 - 12:41 AM

Since this has been a point of discussion on the old forum, I thought that I'd start this forum category by posting about per-folder file duplication in StableBit DrivePool.

 

Unlike the blog posts, I'll try to keep this brief (and somewhat technical).

 

File Duplication

 

What is file duplication?

 

Simply put, file duplication protects your files by storing 2 copies of each file on 2 separate hard drives. If one drive fails, the other still holds a copy of all of your duplicated files.
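To make that concrete, here's a minimal sketch of the idea, with an assumed "most free space first" placement policy and made-up drive names; this is illustrative only, not DrivePool's actual placement logic:

```python
# Illustrative only: pick two different disks to hold the two copies of
# a duplicated file. The "most free space first" policy and the drive
# names are assumptions, not DrivePool's real algorithm.

def place_duplicated(file_name, free_space_by_drive):
    """Return the two drives (most free space first) for the two copies."""
    ranked = sorted(free_space_by_drive,
                    key=free_space_by_drive.get, reverse=True)
    if len(ranked) < 2:
        raise ValueError("duplication needs at least 2 drives")
    return {file_name: ranked[:2]}

# One copy lands on E and one on D, so losing either drive is survivable.
print(place_duplicated("photo.jpg", {"D": 500, "E": 800, "F": 200}))
```

The key invariant is simply that the two copies never share a disk.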

 

Designing File Duplication

 

The #1 priority for file duplication was to make the technology behind it as simple as possible, thus avoiding any unnecessary complications (and bugs). The first approach that DrivePool took was to put the duplication count in the folder name itself (you can't get any simpler than that).

 

For example, "Pictures.2" would duplicate all of your pictures to 2 hard drives.
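Parsing that naming scheme is about as simple as it sounds. A rough sketch (mine, not DrivePool's actual code) might look like this:

```python
# Hypothetical sketch of the folder-name scheme: a trailing ".N" on the
# folder name is read as the duplication count; no suffix means one copy.

def parse_duplication(folder_name):
    """Split a folder name into (base_name, duplication_count)."""
    base, sep, suffix = folder_name.rpartition(".")
    if sep and suffix.isdigit() and int(suffix) >= 1:
        return base, int(suffix)
    return folder_name, 1

print(parse_duplication("Pictures.2"))  # ('Pictures', 2)
print(parse_duplication("Documents"))   # ('Documents', 1)
```

A scheme like this also hints at its own fragility: any folder whose real name happens to end in a dot and a number would be misread as carrying a duplication count.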

 

This was very straightforward but unfortunately didn't work very well with shared folders. The name of a shared folder (as seen on the network) is typically the name of the folder itself, so it doesn't make sense to include the duplication count in the shared folder name. And more importantly, WHS 2011 didn't work well with this scheme.

 

(DrivePool 1.0 BETA M3 did try to work around these issues with folder links, but that was eventually replaced with a better and simpler system.)

 

Alternate Data Streams

 

DrivePool 1.0 final shipped with the ability to store "tags" on folders. Although the tags are nothing more than alternate data streams on directories, I still like the word "tags" to describe the approach, because these "tags" describe something about a directory.

 

One of these tags eventually became a "DuplicationCount".

 

At first, the idea was to store the actual number of copies in the tag. So if a folder was designated as duplicated, its tag would contain "2". But because we needed to enable folder duplication at any level in a directory tree, it was necessary to implement something a bit more flexible.

 

The current system supports an "Inherit" and a "Multiple" flag in addition to an explicit duplication count, and supports setting a duplication count on any arbitrary folder on the pool.
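As a sketch of how such a tag system might resolve in practice (the tag table and the walk-up-the-tree logic here are my assumptions for illustration, not DrivePool's actual on-disk format):

```python
# Illustration of resolving an effective duplication count when folders
# can carry an explicit count or inherit from their parent.
# The paths and tag values below are made up.

INHERIT = object()  # sentinel: folder takes its parent's count

def effective_count(path, tags, pool_default=1):
    """Walk from `path` up toward the pool root until an explicit
    duplication count is found; otherwise use the pool default."""
    parts = path.strip("/").split("/")
    while parts:
        tag = tags.get("/".join(parts), INHERIT)
        if tag is not INHERIT:
            return tag
        parts.pop()  # fall back to the parent folder
    return pool_default

tags = {"Pictures": 2, "Pictures/Screenshots": 1}
print(effective_count("Pictures/Family/2013", tags))  # 2 (inherited)
print(effective_count("Pictures/Screenshots", tags))  # 1 (explicit)
print(effective_count("Videos", tags))                # 1 (pool default)
```

The "Inherit" flag is what makes a count settable on any arbitrary folder: untagged folders simply defer to the nearest tagged ancestor.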

 

Complications

 

The new tag-based system is not without complications.

 

We have issues with the read-only attribute on directories (which came up recently). And what happens if you move a duplicated folder to a location that's not duplicated? Well, we're handling all of these cases for you in a (hopefully) intelligent manner.

 

DrivePool 2.0

 

I've considered scrapping per-folder duplication in DrivePool 2.0. The reason is that you can create duplicated pools and non-duplicated pools, and I feel that this is sufficient flexibility for most people. If we got rid of per-folder duplication, it would make a lot of things much simpler (such as balancing).

 

Feedback

 

What do you think of per-folder duplication?

 

Let me know. I'm listening.



#2 saitoh183

saitoh183

    Resident Guru

  • Moderators
  • 68 posts
  • Location: Canada

Posted 31 May 2013 - 03:15 AM

I don't use duplication because of the limited physical space in my server for expansion. But if I understand right, you want to remove folder duplication in favor of "if you want duplication, it will be the entire pool"?


Windows 8.1 pro
Snapraid (snapshot only)
MSI Z97S SLI Plus + 12GB Ram
1x Syba SD-SATA2-2E2I 4 Chnl SATA II Card
Mediasonic HF2-SU2S2 Pro Box 4 Bay (esata)
Pooling: Stablebit Drivepool (20TB)


#3 Alex

Alex

    Lead Programmer

  • Administrators
  • 243 posts
  • Location: New York, USA

Posted 01 June 2013 - 03:38 AM

Yep, that's the idea.

 

If you want duplication, why not simply duplicate the entire pool? It would make balancing a whole lot simpler.

 

If you want to store non-duplicated files, then just create a new non-duplicated pool.



#4 saitoh183

saitoh183

    Resident Guru

  • Moderators
  • 68 posts
  • Location: Canada

Posted 01 June 2013 - 04:02 AM

But how would that work for storage? Would you have one disk shared between two pools? What if you have subfolders in Documents that you want to duplicate, but not everything else; would you be forced to store part of your Documents in one pool and the subfolder in the other? Then on my server, would I have a share called Documents (non-dup) and another called Documents1 (dup), or would everything still be in one Documents folder when browsing shares?

 

For example, in my Pictures folder I have a folder I store printscreens in that I wouldn't want to duplicate, but I would want all my family pictures to be duplicated (in subfolders based on the event they come from). How would your plan work in that situation?




#5 Shane

Shane

    Resident Guru

  • Moderators
  • 100 posts
  • Location: Australia

Posted 01 June 2013 - 06:56 AM

I like the flexibility of being able to have per-folder duplication if, when and where I want it.

 

Per-pool-only would require getting rid of the current DP limitation of one pool per volume, otherwise we're simply replacing "disk juggling" with "volume juggling".

 

Per-pool-only also means multiple UNC shares, which raises the specter of move-via-copy shenanigans (unless we mount the pools as virtual folders instead of virtual drives, which is inelegant, etc).



#6 lee1978

lee1978

    Advanced Member

  • Members
  • 199 posts

Posted 02 June 2013 - 12:38 PM

I don't like the idea. I don't fancy having to sort 30TB of data into new pools. I like the current dupe function; I would, however, like to be able to duplicate subfolders, as it would tidy stuff up.

 

On a side note, why not add parity to DrivePool, saving us disk space? Then we wouldn't need dupe at all. Or keep both, for those that wish to have maximum redundancy (like me).

 

 

Lee



#7 DrParis

DrParis

    Advanced Member

  • Members
  • 44 posts
  • Location: France

Posted 02 June 2013 - 09:25 PM

My two cents on this.

 

To me, per-folder duplication is a must-have feature of DP, along with >2 duplication counts.

As others say, I like the ability to mix important and less important things in subfolders, so that some folders are duplicated and others are not.

 

Having to create two pools would simply not work for me:

* I share the pools over the network, so I would certainly need 2 shares per category: one on the duplicated pool, one on the non-duplicated one (too cumbersome to maintain and to use).

* I would need to subdivide my HDDs into volumes to provision the duplicated and non-duplicated pools; that completely destroys the concept of DP, where there is no need to manage anything in relation to volumes.

 

Regarding parity, we touch here on something else that is a major advantage of DP to me: the ability to access my data by plugging a previously pooled HDD into any computer, without any specific install.

I am ready to pay the cost of full duplication instead of parity for this advantage.

 

There is one other thing I would really like to see added to DP, since you mention folder tagging, Alex.

That is folder-based balancing.

For now, you balance on a per-file basis, distributing files across the HDDs.

You already have a plugin to group files depending on when they are created.

What would be really great is to tag a folder so that its entire contents are always kept on one HDD (no matter which one); when rebalancing, the entire folder is treated as one unit to balance, and when new files are created in that folder, they are created on that HDD.


  • Jeff likes this

#8 Alex

Alex

    Lead Programmer

  • Administrators
  • 243 posts
  • Location: New York, USA

Posted 02 June 2013 - 11:34 PM

Per-pool-only would require getting rid of the current DP limitation of one pool per volume, otherwise we're simply replacing "disk juggling" with "volume juggling".

 

Per-pool-only also means multiple UNC shares, which raises the specter of move-via-copy shenanigans (unless we mount the pools as virtual folders instead of virtual drives, which is inelegant, etc).

 

That's very accurate.

 

Those 2 points are right on target and make perfect sense, in terms of the technology.

 

I guess I'm always looking to simplify things, but this simplification might come at the expense of sacrificing too much functionality.



#9 Alex

Alex

    Lead Programmer

  • Administrators
  • 243 posts
  • Location: New York, USA

Posted 02 June 2013 - 11:40 PM

Regarding parity, we touch here on something else that is a major advantage of DP to me: the ability to access my data by plugging a previously pooled HDD into any computer, without any specific install.

I am ready to pay the cost of full duplication instead of parity for this advantage.

 

My thoughts on parity agree with yours. I have some terabytes of personal data, and I don't feel that I'm missing anything because I don't use parity.

 

In fact, I feel even more secure.

 

I know that all of my files are stored as standard NTFS files, which can be accessed and recovered by the countless tools developed over the past few decades.



#10 Alex

Alex

    Lead Programmer

  • Administrators
  • 243 posts
  • Location: New York, USA

Posted 02 June 2013 - 11:59 PM

There is one other thing I would really like to see added to DP, since you mention folder tagging, Alex.

That is folder-based balancing.

For now, you balance on a per-file basis, distributing files across the HDDs.

You already have a plugin to group files depending on when they are created.

What would be really great is to tag a folder so that its entire contents are always kept on one HDD (no matter which one); when rebalancing, the entire folder is treated as one unit to balance, and when new files are created in that folder, they are created on that HDD.

 

This is a pretty interesting feature request. I've considered this, and it would actually be super simple to implement a per-folder-name placement limit in the kernel.

 

We can control file placement by file name (including wildcard patterns) or file size.

 

For example, you might be able to say something like:

  • I want all files that match this pattern \ServerFolders\ImportantDocuments\* to be placed on Drives 1, 3 and 4 only.

This is all very easy to do, given our existing infrastructure.
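A user-space sketch of such a placement rule might look like the following, with `fnmatch` standing in for whatever matching the kernel driver would actually do; the rule table and drive names are hypothetical:

```python
# Illustration of pattern-based file placement: map a wildcard path
# pattern to the set of drives allowed to hold matching files.
# This is a mock-up, not DrivePool's kernel implementation.

from fnmatch import fnmatch

PLACEMENT_RULES = [
    # (pattern, drives allowed to hold matching files)
    (r"\ServerFolders\ImportantDocuments\*", {"Drive1", "Drive3", "Drive4"}),
]

def allowed_drives(pool_path, all_drives):
    """Return the drives a new file at `pool_path` may be placed on."""
    for pattern, drives in PLACEMENT_RULES:
        if fnmatch(pool_path, pattern):
            return drives & all_drives
    return all_drives  # no rule matched: any pool drive is fine

drives = {"Drive1", "Drive2", "Drive3", "Drive4"}
print(sorted(allowed_drives(r"\ServerFolders\ImportantDocuments\taxes.pdf",
                            drives)))  # ['Drive1', 'Drive3', 'Drive4']
print(sorted(allowed_drives(r"\ServerFolders\Videos\movie.mkv", drives)))
```

Checking a rule for a new file at create time is cheap, which matches the point below: the hard part is not this lookup, but making the background balancer honor it for files that already exist.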

 

The problem with implementing something like this is that the non-real-time balancer is not designed to deal with it. In other words, the module that reshuffles your existing files into their designated locations doesn't understand folder names, and it would be non-trivial to implement this (because of performance considerations).

 

If we were only interested in controlling new file placement, we could have this feature next week.

 

But this is really something that I would like to do eventually.



#11 Codegear

Codegear

    Newbie

  • Members
  • 6 posts
  • Location: Montréal, Canada

Posted 04 June 2013 - 02:26 AM

Hi Alex,

 

Per folder duplication is a major feature that DP must keep.

It has many uses, including reducing the number of root shares (which would multiply if per-folder duplication weren't available) and keeping everything clean ;)

 

This way, I can have my HandyCam "movies" inside the standard "Videos" share, enforcing duplication on those while not enabling it for the other movies (which I have on DVD/BR).

 

That was my 2 cents.



#12 weezywee

weezywee

    Member

  • Members
  • 19 posts

Posted 06 June 2013 - 07:21 PM

This is a pretty interesting feature request. I've considered this, and it would actually be super simple to implement a per-folder-name placement limit in the kernel.

 

We can control file placement by file name (including wildcard patterns) or file size.

 

For example, you might be able to say something like:

  • I want all files that match this pattern \ServerFolders\ImportantDocuments\* to be placed on Drives 1, 3 and 4 only.

This is all very easy to do, given our existing infrastructure.

 

The problem with implementing something like this is that the non-real-time balancer is not designed to deal with it. In other words, the module that reshuffles your existing files into their designated locations doesn't understand folder names, and it would be non-trivial to implement this (because of performance considerations).

 

If we were only interested in controlling new file placement, we could have this feature next week.

 

But this is really something that I would like to do eventually.

This would be awesome. Please add it soon.



#13 Alex

Alex

    Lead Programmer

  • Administrators
  • 243 posts
  • Location: New York, USA

Posted 12 June 2013 - 04:23 AM

This would be awesome. Please add it soon.

 

I'll try  ;)



#14 DFergATL

DFergATL

    Newbie

  • Members
  • 8 posts

Posted 30 June 2013 - 11:43 PM

Per-folder duplication is something I need from DP. Just adding my name to the list of people who need this functionality. Please don't remove it.



#15 sspell

sspell

    Advanced Member

  • Members
  • 62 posts

Posted 13 July 2013 - 07:37 PM

I like the per-folder duplication scheme, so my vote is to keep it; I only want one drive pool to manage.



#16 mrbiggles

mrbiggles

    Newbie

  • Members
  • 1 posts

Posted 30 July 2013 - 09:52 PM

I agree with DrParis - per-folder duplication is a must-have feature of DP, along with >2 duplication counts.

 

For me, the simplicity of managing one pool, with variable duplication counts depending on the importance and volume of my data, is the whole attraction of DP and the thing that makes it stand head and shoulders above the others. I never have to worry about (manually) juggling data between individual disks, or backup schemes, or complex RAID/parity schemes, or any of that tedium again. For me it's the perfect balance between efficient storage and reliable resilience to disk failures (and I've had a few). And I don't have to worry about my future needs; I can just adjust a duplication count here and there, add some storage, and grow my pool reliably and smoothly.

 

To explain my rationale...

 

I have lots of disks, large volumes (90%+) of low-priority data (TV recordings etc.), and small volumes of very high-priority data (family pictures etc.) - and I can't imagine I'm alone in this balance. I love the fact that I don't have to duplicate the low-priority data (wasting precious and expensive space), yet can keep lots of copies of my important docs and photos and never worry about another hard drive failure again. I can just throw in another disk when I run out of space and add it to the pool. A marvellous, almost maintenance-free, reliable and efficient system - with one big simple pool.

 

On parity - parity wouldn't be any good to me, as I'd waste a large amount of space adding parity data for files I don't care much about, and it would tie up my biggest (and generally newest) hard drive, as that's the one required for parity. It assumes all your data is equally important. So in my PVR machine, for example, where I have lots of odd disk sizes, it becomes complicated and inefficient. I'd much rather just pool the mismatched disks into one lovely simple space for my unduplicated recordings, and have some other folders duplicated 3 or more times for important files on that computer (so I can use the PVR as a network backup for important stuff). And while I have the space, I can duplicate my low-priority stuff too - and then just remove the duplication as I start to run out of space, or add another disk or two to the pool, change a duplication setting, and voila, it all gets rebalanced in the background. So perfect and simple! Not to mention wonderfully scalable and future-proof.

 

On using multiple pools for differing redundancy - definitely not. DP doesn't allow me to add multiple pools to the same set of disks, and even if it did, this approach would be a real pain for me. I'd end up having to set up a different pool for each type of data for which I might conceivably want to vary the duplication - photos, TV, docs - so it would end up a cumbersome mess. Otherwise I'd have to start manually shovelling data between pools whenever I change a duplication count, and that would be so messy.

 

P.S. I acknowledge that a per-folder parity system, with variable parity, would possibly be (architecturally) the perfect solution - but I'm more than happy to waste a bit of space for the simplicity and reliability of DP's per-folder duplication approach. If I could trust a parity implementation, and all my disks were the same size, and all my data was the same priority, and I knew exactly what my future redundancy requirements were and that they'd never change, I'd consider parity. But that is not the case!

 

In short - please don't remove these two fabulous features!  


  • Alex likes this

#17 sspell

sspell

    Advanced Member

  • Members
  • 62 posts

Posted 03 August 2013 - 03:15 AM

mrbiggles, you pretty much shared my thoughts exactly. I always say keep it simple, and that's what makes DrivePool so nice: it's simple and easy to work with, and per-folder duplication has that in spades.


  • Christopher (Drashna) likes this

#18 Alex

Alex

    Lead Programmer

  • Administrators
  • 243 posts
  • Location: New York, USA

Posted 05 August 2013 - 10:26 PM

Just to update you guys on this: thanks for the feedback. I've gotten a bunch of feedback on this issue, both here and through the standard support channel at stablebit.com/contact.

 

I will be adding >2 duplication count support to the UI after the 1.0 final release.

 

Right now, it is perfectly possible to use >2 duplication counts via dpcmd, because both the file system and the service were designed to understand any arbitrary duplication count.

 

It's only the UI that doesn't understand duplication counts >2, but that can be updated. For now, all duplication counts >2 will show up as x2 in the UI, but this is purely a cosmetic issue.



#19 kihimcarr

kihimcarr

    Advanced Member

  • Members
  • 51 posts

Posted 02 September 2013 - 12:59 PM

No need for me to reiterate the reasons and details that everyone in favor of keeping per-folder duplication has already given. It's a necessary feature that should be kept.

 

Alex, thanks for reaching out to the community for input. This is why you'll always have our support for your products! Keep up the great work!

 



#20 gringott

gringott

    Member

  • Members
  • 27 posts

Posted 20 September 2013 - 07:19 PM

I fell in love with multiple pools and have no need of per-folder duplication. But keep it if you must!!!





