It's only the directory entries that occur in many duplicates and not the files within them.
Example setup:
+Pool of 3 drives.
+Folder X has a duplication count of 2x and contains 5 files. (10 files in total across 3 drives)
If the folder is then balanced across the 3 drives Folder X will exist in three copies because there will be files belonging to that folder stored on all drives
(For example drive A with 3 files, Drive B with 4 files and Drive C with 3 files)
So getting the duplication count on folders is a bit misleading. You should look at individual files instead.
The topic of adding RAID-style parity to DrivePool was raised several times on the old forum. I've posted this FAQ because (1) this is a new forum and (2) a new user asked about adding folder-level parity, which - to mangle a phrase - is the same fish but a different kettle.
Since folks have varying levels of familiarity with parity I'm going to divide this post into three sections: (1) how parity works and the difference between parity and duplication, (2) the difference between drive-level and folder-level parity, (3) the TLDR conclusion for parity in DrivePool. I intend to update the post if anything changes or needs clarification (or if someone points out any mistakes I've made).
Disclaimer: I do not work for Covecube/Stablebit. These are my own comments. You don't know me from Jack. No warranty, express or implied, in this or any other universe.
Part 1: how parity works and the difference between parity and duplication
Duplication is fast. Every file gets simultaneously written to multiple disks, so as long as all of those disks don't die the file is still there, and by splitting reads amongst the copies you can load files faster. But to fully protect against a given number of disks dying, you need that many times number of disks. That doesn't just add up fast, it multiplies fast.
Parity relies on the ability to generate one or more "blocks" of a series of reversible checksums equal to the size of the largest protected "block" of content. If you want to protect three disks, each parity block requires its own disk as big as the biggest of those three disks. For every N parity blocks you have, any N data blocks can be recovered if they are corrupted or destroyed. Have twelve data disks and want to be protected against any two of them dying simultaneously? You'll only need two parity disks.
Sounds great, right? Right. But there are tradeoffs.
Whenever the content of any data block is altered, the corresponding checksums must be recalculated within the parity block, and if the content of any data block is corrupted or lost, the corresponding checksums must be combined with the remaining data blocks to rebuild the data. While duplication requires more disks, parity requires more time.
In a xN duplication system, you xN your disks, for each file it simultaneously writes the same data to N disks, but so long as p<=N disks die, where 'p' depends on which disks died, you replace the bad disk(s) and keep going - all of your data is immediately available. The drawback is the geometric increase in required disks and the risk of the wrong N disks dying simultaneously (e.g. if you have x2 duplication, if two disks die simultaneously and one happens to be a disk that was storing duplicates of the first disk's files, those are gone for good).
In a +N parity system, you add +N disks, for each file it writes the data to one disk and calculates the parity checksums which it then writes to N disks, but if any N disks die, you replace the bad disk(s) and wait while the computer recalculates and rebuilds the lost data - some of your data might still be available, but no data can be changed until it's finished (because parity needs to use the data on the good disks for its calculations).
(sidenote: "snapshot"-style parity systems attempt to reduce the time cost by risking a reduction in the amount of recoverable data; the more dynamic your content, the more you risk not being able to recover)
Part 2: the difference between drive-level and folder-level parity
Drive-level parity, aside from the math and the effort of writing the software, can be straightforward enough for the end user: you dedicate N drives to parity that are as big as the biggest drive in your data array. If this sounds good to you, some folks (e.g. fellow forum moderator Saitoh183) use DrivePool and the FlexRAID parity module together for this sort of thing. It apparently works very well.
(I'll note here that drive-level parity has two major implementation methods: striped and dedicated. In the dedicated method described above, parity and content are on separate disks, with the advantages of simplicity and readability and the disadvantage of increased wear on the parity disks and the risk that entails. In the striped method, each disk in the array contains both data and parity blocks; this spreads the wear evenly across all disks but makes the disks unreadable on other systems that don't have compatible parity software installed. There are ways to hybridise the two, but it's even more time and effort).
Folder-level parity is... more complicated. Your parity block has to be as big as the biggest folder in your data array. Move a file into that folder, and your folder is now bigger than your parity block - oops. This is a solvable problem, but 'solvable' does not mean 'easy', sort of how building a skyscraper is not the same as building a two-storey home. For what it's worth, FlexRAID's parity module is (at the time of my writing this post) $39.95 and that's drive-level parity.
Conclusion: the TLDR for parity in DrivePool
As I see it, DrivePool's "architectural imperative" is "elegant, friendly, reliable". This means not saddling the end-user with technical details or vast arrays of options. You pop in disks, tell the pool, done; a disk dies, swap it for a new one, tell the pool, done; a dead disk doesn't bring everything to a halt and size doesn't matter, done.
My impression (again, I don't speak for Covecube/Stablebit) is that parity appears to be in the category of "it would be nice to have for some users but practically it'd be a whole new subsystem, so unless a rabbit gets pulled out of a hat we're not going to see it any time soon and it might end up as a separate product even then (so that folks who just want pooling don't have to pay twice+ as much)".
I've gone from single to dual to quad core, 2GB to 4GB to 8GB, and there's a difference, but the biggest leap (by an order of magnitude) in general Windows performance I ever had was when I changed from mechanical platter to solid state for my system drive.
My POST-to-desktop time went from a couple of minutes to 20 seconds, and my icon-littered dual-display desktop is immediately responsive. Now if all someone does with their PC is write letters and play solitaire, that's probably irrelevant. But otherwise....
The car analogy I use with my non-tech-savvy friends is, "It's like the difference between a station wagon and a sports car."
(backing onto topic, another +1 for "option")
(wait, this is the off-topic section... in that case, here's a
- my favourite part is what they do to the drives while the machine is running...)
Thanks for the explanation, Alex. It's a shame that shell extensions are so inefficient; if I have to choose between "works well" and "looks slick" I'll pick the former every time.
To be honest, I wasn't 100% comfortable with the arrow not having any text next to it, but I doubt that any designer ever is 100% satisfied with their design. You always want to keep tweaking it to make it perfect, but there are time constraints (plus, we can't exactly afford Johny Ive here).
I decided to ship it and listen for feedback, and based on that feedback I've slightly modified the pool options menu.
I have a APC UPS that I've hacked to use 4x deep cycle marine batteries to provide 3 to 4 hours backup for my entire office (instead of the built in Li-Ion which would only last 15 min.).
Thanks for your feedback, I'm definitely listening.
While strictly speaking this is not a "Metro UI", I agree, it is similar.
I thinks that one of the major problems with Metro is discoverability, it is nearly non-existent. If you're not familiar with it then you don't know what to click and where. Other than that, it's pretty neat.
So I'll try to add more "discoverability" to DrivePool to make it easier to use.