
StableBit DrivePool Per-Folder Duplication


23 replies to this topic

#1 Alex

Alex

    Lead Programmer

  • Administrators
  • 243 posts
  • Location: New York, USA

Posted 31 May 2013 - 12:41 AM

Since this has been a point of discussion on the old forum, I thought that I'd start this forum category by posting about per-folder file duplication in StableBit DrivePool.

 

Unlike the blog posts, I'll try to keep this brief (and somewhat technical).

 

File Duplication

 

What is file duplication?

 

Simply put, file duplication protects your files by storing 2 copies of each file on 2 separate hard drives. If one drive fails, the other still holds a copy of all of your duplicated files.
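To make that concrete, here's a minimal sketch of the idea, with an assumed "most free space first" placement policy and made-up drive names; this is illustrative only, not DrivePool's actual placement logic:

```python
# Illustrative only: pick two different disks to hold the two copies of
# a duplicated file. The "most free space first" policy and the drive
# names are assumptions, not DrivePool's real algorithm.

def place_duplicated(file_name, free_space_by_drive):
    """Return the two drives (most free space first) for the two copies."""
    ranked = sorted(free_space_by_drive,
                    key=free_space_by_drive.get, reverse=True)
    if len(ranked) < 2:
        raise ValueError("duplication needs at least 2 drives")
    return {file_name: ranked[:2]}

# One copy lands on E and one on D, so losing either drive is survivable.
print(place_duplicated("photo.jpg", {"D": 500, "E": 800, "F": 200}))
```

The key invariant is simply that the two copies never share a disk.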

 

Designing File Duplication

 

The #1 priority for file duplication was to make the technology behind it as simple as possible, thus avoiding any unnecessary complications (and bugs). The first approach that DrivePool took was to put the duplication count in the folder name itself (you can't get any simpler than that).

 

For example, "Pictures.2" would duplicate all of your pictures to 2 hard drives.
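Parsing that naming scheme is about as simple as it sounds. A rough sketch (mine, not DrivePool's actual code) might look like this:

```python
# Hypothetical sketch of the folder-name scheme: a trailing ".N" on the
# folder name is read as the duplication count; no suffix means one copy.

def parse_duplication(folder_name):
    """Split a folder name into (base_name, duplication_count)."""
    base, sep, suffix = folder_name.rpartition(".")
    if sep and suffix.isdigit() and int(suffix) >= 1:
        return base, int(suffix)
    return folder_name, 1

print(parse_duplication("Pictures.2"))  # ('Pictures', 2)
print(parse_duplication("Documents"))   # ('Documents', 1)
```

A scheme like this also hints at its own fragility: any folder whose real name happens to end in a dot and a number would be misread as carrying a duplication count.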

 

This was very straightforward but unfortunately didn't work very well with shared folders. The name of a shared folder (as seen on the network) is typically the name of the folder itself, so it doesn't make sense to include the duplication count in the shared folder name. And more importantly, WHS 2011 didn't work well with this scheme.

 

(DrivePool 1.0 BETA M3 did try to work around these issues with folder links, but that was eventually replaced with a better and simpler system.)

 

Alternate Data Streams

 

DrivePool 1.0 final shipped with the ability to store "tags" on folders. Although the tags are nothing more than alternate data streams on directories, I still like the word "tags" to describe the approach, because these "tags" describe something about a directory.

 

One of these tags eventually became a "DuplicationCount".

 

At first, the idea was to store the actual number of copies in the tag. So if a folder was designated as duplicated, its tag would contain "2". But because we needed to enable folder duplication at any level in a directory tree, it was necessary to implement something a bit more flexible.

 

The current system supports an "Inherit" and a "Multiple" flag in addition to an explicit duplication count, and supports setting a duplication count on any arbitrary folder on the pool.
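As a sketch of how such a tag system might resolve in practice (the tag table and the walk-up-the-tree logic here are my assumptions for illustration, not DrivePool's actual on-disk format):

```python
# Illustration of resolving an effective duplication count when folders
# can carry an explicit count or inherit from their parent.
# The paths and tag values below are made up.

INHERIT = object()  # sentinel: folder takes its parent's count

def effective_count(path, tags, pool_default=1):
    """Walk from `path` up toward the pool root until an explicit
    duplication count is found; otherwise use the pool default."""
    parts = path.strip("/").split("/")
    while parts:
        tag = tags.get("/".join(parts), INHERIT)
        if tag is not INHERIT:
            return tag
        parts.pop()  # fall back to the parent folder
    return pool_default

tags = {"Pictures": 2, "Pictures/Screenshots": 1}
print(effective_count("Pictures/Family/2013", tags))  # 2 (inherited)
print(effective_count("Pictures/Screenshots", tags))  # 1 (explicit)
print(effective_count("Videos", tags))                # 1 (pool default)
```

The "Inherit" flag is what makes a count settable on any arbitrary folder: untagged folders simply defer to the nearest tagged ancestor.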

 

Complications

 

The new tag-based system is not without complications.

 

We have issues with the read-only attribute on directories (which came up recently). And what happens if you move a duplicated folder to a location that's not duplicated? Well, we're handling all of these cases for you in a (hopefully) intelligent manner.

 

DrivePool 2.0

 

I've considered scrapping per-folder duplication in DrivePool 2.0. The reason is that you can create duplicated pools and non-duplicated pools, and I feel that this is sufficient flexibility for most people. If we got rid of per-folder duplication, it would make a lot of things much simpler (such as balancing).

 

Feedback

 

What do you think of per-folder duplication?

 

Let me know. I'm listening.



#2 saitoh183

saitoh183

    Resident Guru

  • Moderators
  • 68 posts
  • Location: Canada

Posted 31 May 2013 - 03:15 AM

I don't use duplication because of the limited physical space in my server for expansion. But if I understand right, you want to remove folder duplication in favor of "if you want duplication, it will be the entire pool"?


Windows 8.1 pro
Snapraid (snapshot only)
MSI Z97S SLI Plus + 12GB Ram
1x Syba SD-SATA2-2E2I 4 Chnl SATA II Card
Mediasonic HF2-SU2S2 Pro Box 4 Bay (esata)
Pooling: Stablebit Drivepool (20TB)


#3 Alex

Alex

    Lead Programmer

  • Administrators
  • 243 posts
  • Location: New York, USA

Posted 01 June 2013 - 03:38 AM

Yep, that's the idea.

 

If you want duplication, why not simply duplicate the entire pool? It would make balancing a whole lot simpler.

 

If you want to store non-duplicated files, then just create a new non-duplicated pool.



#4 saitoh183

saitoh183

    Resident Guru

  • Moderators
  • 68 posts
  • Location: Canada

Posted 01 June 2013 - 04:02 AM

But how would that work for storage? Would you have one disk shared between two pools? What if you have subfolders in Documents that you want to duplicate, but not everything else; would you be forced to store part of your Documents in one pool and the subfolder in the other? Then on my server, would I have a share called Documents (non-dup) and another called Documents1 (dup), or would everything still be in one Documents folder when browsing shares?

 

For example, in my Pictures folder I have a folder I store printscreens in that I wouldn't want to duplicate, but I would want all my family pictures to be duplicated (in subfolders based on the event they come from). How would your plan work in that situation?




#5 Shane

Shane

    Resident Guru

  • Moderators
  • 100 posts
  • Location: Australia

Posted 01 June 2013 - 06:56 AM

I like the flexibility of being able to have per-folder duplication if, when and where I want it.

 

Per-pool-only would require getting rid of the current DP limitation of one pool per volume, otherwise we're simply replacing "disk juggling" with "volume juggling".

 

Per-pool-only also means multiple UNC shares, which raises the specter of move-via-copy shenanigans (unless we mount the pools as virtual folders instead of virtual drives, which is inelegant, etc).



#6 lee1978

lee1978

    Advanced Member

  • Members
  • 199 posts

Posted 02 June 2013 - 12:38 PM

I don't like the idea. I don't fancy having to sort 30TB of data into new pools. I like the current dupe function; I would, however, like to be able to duplicate subfolders, as it would tidy stuff up.

 

On a side note, why not add parity to DrivePool, saving us disk space? Then we wouldn't need dupe at all. Or keep both, for those that wish to have maximum redundancy (like me).

 

 

Lee



#7 DrParis

DrParis

    Advanced Member

  • Members
  • 44 posts
  • Location: France

Posted 02 June 2013 - 09:25 PM

My two cents on this.

 

To me, per-folder duplication is a must-have feature of DP, along with >2 duplication counts.

As others say, I like the ability to mix important and less important things in subfolders, so that some folders are duplicated and others are not.

 

Having to create two pools would simply not work for me:

* I share the pools over the network, so I would certainly need 2 shares per category: one on the duplicated pool, one on the non-duplicated one (too cumbersome to maintain and to use).

* I would need to subdivide my HDDs into volumes to provision the duplicated and non-duplicated pools; that completely destroys the concept of DP, where there is no need to manage anything in relation to volumes.

 

Regarding parity, we touch here on something else that is a major advantage of DP to me: the ability to access my data by plugging a previously pooled HDD into any computer, without any specific install.

I am ready to pay the cost of full duplication instead of parity for this advantage.

 

There is one other thing I would really like to see added to DP, since you mention folder tagging, Alex.

That is folder-based balancing.

For now, you balance on a per-file basis, distributing files across the HDDs.

You already have a plugin to group files depending on when they are created.

What would be really great is to tag a folder so that its entire contents are always kept on one HDD (no matter which one); when rebalancing, the entire folder is treated as one unit to balance, and when new files are created in that folder, they are created on that HDD.


  • Jeff likes this

#8 Alex

Alex

    Lead Programmer

  • Administrators
  • 243 posts
  • Location: New York, USA

Posted 02 June 2013 - 11:34 PM

Per-pool-only would require getting rid of the current DP limitation of one pool per volume, otherwise we're simply replacing "disk juggling" with "volume juggling".

 

Per-pool-only also means multiple UNC shares, which raises the specter of move-via-copy shenanigans (unless we mount the pools as virtual folders instead of virtual drives, which is inelegant, etc).

 

That's very accurate.

 

Those 2 points are right on target and make perfect sense, in terms of the technology.

 

I guess I'm always looking to simplify things, but this simplification might come at the expense of sacrificing too much functionality.



#9 Alex

Alex

    Lead Programmer

  • Administrators
  • 243 posts
  • Location: New York, USA

Posted 02 June 2013 - 11:40 PM

Regarding parity, we touch here on something else that is a major advantage of DP to me: the ability to access my data by plugging a previously pooled HDD into any computer, without any specific install.

I am ready to pay the cost of full duplication instead of parity for this advantage.

 

My thoughts on parity agree with yours. I have some terabytes of personal data, and I don't feel that I'm missing anything because I don't use parity.

 

In fact, I feel even more secure.

 

I know that all of my files are stored as standard NTFS files, which can be accessed and recovered by the countless tools developed over the past few decades.



#10 Alex

Alex

    Lead Programmer

  • Administrators
  • 243 posts
  • Location: New York, USA

Posted 02 June 2013 - 11:59 PM

There is one other thing I would really like to see added to DP, since you mention folder tagging, Alex.

That is folder-based balancing.

For now, you balance on a per-file basis, distributing files across the HDDs.

You already have a plugin to group files depending on when they are created.

What would be really great is to tag a folder so that its entire contents are always kept on one HDD (no matter which one); when rebalancing, the entire folder is treated as one unit to balance, and when new files are created in that folder, they are created on that HDD.

 

This is a pretty interesting feature request. I've considered this, and it would actually be super simple to implement a per-folder-name placement limit in the kernel.

 

We can control file placement by file name (including wildcard patterns) or file size.

 

For example, you might be able to say something like:

  • I want all files that match this pattern \ServerFolders\ImportantDocuments\* to be placed on Drives 1, 3 and 4 only.

This is all very easy to do, given our existing infrastructure.
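A user-space sketch of such a placement rule might look like the following, with `fnmatch` standing in for whatever matching the kernel driver would actually do; the rule table and drive names are hypothetical:

```python
# Illustration of pattern-based file placement: map a wildcard path
# pattern to the set of drives allowed to hold matching files.
# This is a mock-up, not DrivePool's kernel implementation.

from fnmatch import fnmatch

PLACEMENT_RULES = [
    # (pattern, drives allowed to hold matching files)
    (r"\ServerFolders\ImportantDocuments\*", {"Drive1", "Drive3", "Drive4"}),
]

def allowed_drives(pool_path, all_drives):
    """Return the drives a new file at `pool_path` may be placed on."""
    for pattern, drives in PLACEMENT_RULES:
        if fnmatch(pool_path, pattern):
            return drives & all_drives
    return all_drives  # no rule matched: any pool drive is fine

drives = {"Drive1", "Drive2", "Drive3", "Drive4"}
print(sorted(allowed_drives(r"\ServerFolders\ImportantDocuments\taxes.pdf",
                            drives)))  # ['Drive1', 'Drive3', 'Drive4']
print(sorted(allowed_drives(r"\ServerFolders\Videos\movie.mkv", drives)))
```

Checking a rule for a new file at create time is cheap, which matches the point below: the hard part is not this lookup, but making the background balancer honor it for files that already exist.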

 

The problem with implementing something like this is that the non-real-time balancer is not designed to deal with it. In other words, the module that reshuffles your existing files into their designated locations doesn't understand folder names, and it would be non-trivial to implement this (because of performance considerations).

 

If we were only interested in controlling new file placement, we could have this feature next week.

 

But this is really something that I would like to do eventually.



#11 Codegear

Codegear

    Newbie

  • Members
  • 6 posts
  • Location: Montréal, Canada

Posted 04 June 2013 - 02:26 AM

Hi Alex,

 

Per folder duplication is a major feature that DP must keep.

It has many uses, including reducing the number of root shares (which would multiply if per-folder duplication weren't available) and keeping everything clean ;)

 

This way, I can have my HandyCam "movies" inside the standard "Videos" share, enforcing duplication on those while not enabling it for the other movies (which I have on DVD/BR).

 

That was my 2 cents.



#12 weezywee

weezywee

    Member

  • Members
  • 19 posts

Posted 06 June 2013 - 07:21 PM

This is a pretty interesting feature request. I've considered this, and it would actually be super simple to implement a per-folder-name placement limit in the kernel.

 

We can control file placement by file name (including wildcard patterns) or file size.

 

For example, you might be able to say something like:

  • I want all files that match this pattern \ServerFolders\ImportantDocuments\* to be placed on Drives 1, 3 and 4 only.

This is all very easy to do, given our existing infrastructure.

 

The problem with implementing something like this is that the non-real-time balancer is not designed to deal with it. In other words, the module that reshuffles your existing files into their designated locations doesn't understand folder names, and it would be non-trivial to implement this (because of performance considerations).

 

If we were only interested in controlling new file placement, we could have this feature next week.

 

But this is really something that I would like to do eventually.

This would be awesome. Please add it soon.



#13 Alex

Alex

    Lead Programmer

  • Administrators
  • 243 posts
  • Location: New York, USA

Posted 12 June 2013 - 04:23 AM

This would be awesome. Please add it soon.

 

I'll try  ;)



#14 DFergATL

DFergATL

    Newbie

  • Members
  • 8 posts

Posted 30 June 2013 - 11:43 PM

Per-folder duplication is something I need from DP. Just adding my name to the list of people who need this functionality. Please don't remove it.



#15 sspell

sspell

    Advanced Member

  • Members
  • 62 posts

Posted 13 July 2013 - 07:37 PM

I like the per-folder duplication scheme, so my vote is to keep it; I only want one drive pool to manage.



#16 mrbiggles

mrbiggles

    Newbie

  • Members
  • 1 posts

Posted 30 July 2013 - 09:52 PM

I agree with DrParis - per-folder duplication is a must-have feature of DP, along with >2 duplication counts.

 

For me, the simplicity of managing one pool, with variable duplication counts depending on the importance and volume of my data, is the whole attraction of DP and the thing that makes it stand head and shoulders above the others. I never have to worry about (manually) juggling data between individual disks, or backup schemes, or complex RAID/parity schemes, or any of that tedium again. For me it's the perfect balance between efficient storage and reliable resilience to disk failures (and I've had a few). And I don't have to worry about my future needs; I can just adjust a duplication count here and there, add some storage, and grow my pool reliably and smoothly.

 

To explain my rationale...

 

I have lots of disks, large volumes (90%+) of low-priority data (TV recordings etc.), and small volumes of very high-priority data (family pictures etc.) - and I can't imagine I'm alone in this balance. I love the fact that I don't have to duplicate the low-priority data (wasting precious and expensive space), yet can keep lots of copies of my important docs and photos and never worry about another hard drive failure again. I can just throw in another disk when I run out of space and add it to the pool. A marvellous, almost maintenance-free, reliable and efficient system - with one big simple pool.

 

On parity - parity wouldn't be any good to me, as I'd waste a large amount of space adding parity data for files I don't care much about, and it would tie up my biggest (and generally newest) hard drive, as that's the one required for parity. It assumes all your data is equally important. So in my PVR machine, for example, where I have lots of odd disk sizes, it becomes complicated and inefficient. I'd much rather just pool the mismatched disks into one lovely simple space for my unduplicated recordings, and have some other folders duplicated 3 or more times for important files on that computer (so I can use the PVR as a network backup for important stuff). And while I have the space, I can duplicate my low-priority stuff too - and then just remove the duplication as I start to run out of space, or add another disk or two to the pool, change a duplication setting, and voila, it all gets rebalanced in the background. So perfect and simple! Not to mention wonderfully scalable and future-proof.

 

On using multiple pools for differing redundancy - definitely not. DP doesn't allow me to add multiple pools to the same set of disks, and even if it did, this approach would be a real pain for me. I'd end up having to set up a different pool for each type of data for which I might conceivably want to vary the duplication - photos, TV, docs - so it would end up a cumbersome mess. Otherwise I'd have to start manually shovelling data between pools whenever I change a duplication count, and that would be so messy.

 

P.S. I acknowledge that a per-folder parity system, with variable parity, would possibly be (architecturally) the perfect solution - but I'm more than happy to waste a bit of space for the simplicity and reliability of DP's per-folder duplication approach. If I could trust a parity implementation, and all my disks were the same size, and all my data was the same priority, and I knew exactly what my future redundancy requirements were and that they'd never change, I'd consider parity. But that is not the case!

 

In short - please don't remove these two fabulous features!  


  • Alex likes this

#17 sspell

sspell

    Advanced Member

  • Members
  • 62 posts

Posted 03 August 2013 - 03:15 AM

mrbiggles, you pretty much shared my thoughts exactly. I always say keep it simple, and that's what makes DrivePool so nice: it's simple and easy to work with, and per-folder duplication has that in spades.


  • Christopher (Drashna) likes this

#18 Alex

Alex

    Lead Programmer

  • Administrators
  • 243 posts
  • Location: New York, USA

Posted 05 August 2013 - 10:26 PM

Just to update you guys on this: thanks for the feedback. I've gotten a bunch of feedback on this issue, both here and through the standard support channel at stablebit.com/contact.

 

I will be adding >2 duplication count support to the UI after the 1.0 final release.

 

Right now, it is perfectly possible to use >2 duplication counts via dpcmd, because both the file system and the service were designed to understand any arbitrary duplication count.

 

It's only the UI that doesn't understand duplication counts >2, but that can be updated. For now, all duplication counts >2 will show up as x2 in the UI, but this is purely a cosmetic issue.



#19 kihimcarr

kihimcarr

    Advanced Member

  • Members
  • 51 posts

Posted 02 September 2013 - 12:59 PM

No need for me to reiterate the reasons and details that everyone in favor of keeping per-folder duplication has already given. It's a necessary feature that should be kept.

 

Alex, thanks for reaching out to the community for input. This is why you'll always have our support for your products! Keep up the great work!

 



#20 gringott

gringott

    Member

  • Members
  • 27 posts

Posted 20 September 2013 - 07:19 PM

I fell in love with multiple pools and have no need of per-folder duplication. But keep it if you must!!!





