Question
kitonne
I can set file duplication to 3 (default) but I cannot find any tool to check that the files stay consistent across all three copies. Assuming I use NTFS, TeraCopy can create an MD5 checksum for each file and store it in an ADS, which can then be verified by the MD5 verifiers linked below (as well as by TeraCopy itself). TeraCopy can also create an MD5 master file to compare against each file at a later date. However, since DP reads a file from any of the three copies at random, I can still get corrupted data if the three copies diverge and the MD5 happens to be calculated against the same copy each time - I cannot specify that I only want to see copy #1, #2 or #3 of a specific file when I do a pool scrub.
https://github.com/TalAloni/MD5Stream
https://github.com/Y0tsuya/md5hash
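To illustrate the external-check idea, here is a minimal sketch (my own illustration, not TeraCopy's actual implementation; the `:md5` stream name is my own choice): on NTFS a path like `file.bin:md5` addresses an alternate data stream, so a per-file hash can ride along with the file it protects.

```python
import hashlib

def file_md5(path, chunk_size=1 << 20):
    """Stream the file through MD5 so multi-GB files don't need to fit in RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def store_md5_in_ads(path):
    # On NTFS, "path:md5" addresses an alternate data stream attached to
    # the file; on other filesystems it is just a sibling file with a
    # colon in its name, so treat this as NTFS-only in practice.
    with open(path + ":md5", "w") as f:
        f.write(file_md5(path))

def verify_md5_from_ads(path):
    # Recompute the hash and compare it against the stored one.
    with open(path + ":md5") as f:
        expected = f.read().strip()
    return file_md5(path) == expected
```

The catch remains exactly the problem described above: `file_md5` reads the file through the pool, so it verifies whichever of the three copies DP happens to serve, not all three.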
If you move a couple of TB of data, sooner or later you will see bits silently flipped. Some older threads mention a High Integrity Disk Pool, but they are a couple of years old, and I wonder whether there are any current plans to improve the data integrity checks.
DriveBender does have a "pool integrity check" in its main interface (I could not find a clear description of what it does), but it only supports one set of duplicated files, and its support seems to be winding down, even though its latest release is only a couple of months old.
Size and date checks are NOT data integrity checks - a binary compare or CRC64 / MD5 / SHA-512 / etc. is what I am looking for to confirm that the 3 default copies are indeed in sync (and using a 2-out-of-3 rule to replace / fix a corrupted copy is a small step afterwards).
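The 2-out-of-3 repair rule I have in mind is plain majority voting over the per-copy hashes. A minimal sketch (my own illustration, not anything DP currently exposes, since per-copy hashes are exactly what is missing):

```python
from collections import Counter

def majority_copy(hashes):
    """Given the hash of each duplicate copy, return (winning_hash,
    indices of copies that need repair). Raises if no quorum exists."""
    counts = Counter(hashes)
    winner, votes = counts.most_common(1)[0]
    if votes < 2:
        # All three copies disagree: no majority to repair from,
        # only an external/master checksum could break the tie.
        raise ValueError("all copies disagree; no quorum to repair from")
    bad = [i for i, h in enumerate(hashes) if h != winner]
    return winner, bad
```

For example, `majority_copy(["a", "a", "b"])` flags copy #3 for replacement from either of the two good copies.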
StableBit Scanner is going in across a couple of my systems (separate from and independent of DP) - it is reasonably priced and easier to set up for repetitive tasks than HD Sentinel (though you still want HDS for data repair) - but it is no substitute for data integrity checks.
1/ Is there any way to get copy #1, #2 or #3 of a file from DrivePool, instead of a random copy, so I can implement external data integrity checks for the data stored in a DP (3 copies in this example)?
2/ Are there any plans for internal data integrity checks between the multiple copies in a pool (binary compare, MD5, CRC64, whatever)?
3/ I did not find a clean way to replace a bad disk - I was expecting to be able to swap the disk and run a duplication check to make sure the files in the pool regain the specified number of copies. It looks like once a disk is removed, the rest of the pool is read-only and you have to jump through hoops to restore read/write functionality. Is there a way to just remove a bad drive from the pool? Adding a new one is easy...
4/ For 3x file redundancy, using 5 physical disks (same size), how many disks may fail before I lose data? In other words, is there a risk of having 2 of the 3 copies on the same disk?
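To make question 4 concrete, here is a brute-force check over the two placement policies (purely my own model; whether DP actually guarantees distinct-disk placement is exactly what I am asking):

```python
from itertools import combinations

DISKS = range(5)  # five same-size physical disks in the pool

def survives_all_failures(copy_disks, num_failures):
    """True if at least one copy of the file survives every possible
    failure of num_failures disks out of the five."""
    return all(set(copy_disks) - set(failed)
               for failed in combinations(DISKS, num_failures))

# Copies on three distinct disks: any 2 failures are survivable, 3 are not.
print(survives_all_failures({0, 1, 2}, 2))  # True
print(survives_all_failures({0, 1, 2}, 3))  # False

# But if two copies could share a disk (only 2 distinct disks used),
# a 2-disk failure can already lose the file.
print(survives_all_failures({0, 1}, 2))  # False
```

So the answer hinges entirely on the placement guarantee: distinct disks means any 2 of 5 can fail; shared disks means 2 failures can already be fatal.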
Thank you!