BitLocker and Scanner - File Recovery, Checksumming, and "Bit Rot"?


VhyVenom

Question

Hi,

 

The Scanner feature list outlines file recovery:

 

File Recovery
  • Once a damaged file is identified by the file system aware scan, you can attempt recovery of that file. *
  • File recovery supports uncompressed and unencrypted NTFS files.
  • Partial file recovery works by reassembling what's left of the file to a known good location on another disk.
  • An optional, full file recovery step is attempted by reading each unreadable sector multiple times, while sending the drive head through a pre-programmed set of head motion profiles. This has the effect of varying the direction and the velocity of the drive's head right before reading the unreadable sector, increasing the chance of one last successful read.

* File recovery is not guaranteed by any means, but stands a good chance of at least partially recovering a damaged file.
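Conceptually, I picture that multi-read retry step as something like the rough sketch below (hypothetical Python, not Scanner's actual code - the head-motion profiles need low-level drive control that a plain read loop can't express, and on Windows the raw device path would be something like \\.\PhysicalDrive1, opened with administrator rights):

```python
import os

SECTOR_SIZE = 512  # bytes; 4096 on Advanced Format drives

def salvage_sector(fd, sector, max_retries=16):
    """Try to read one sector several times; return its bytes or None."""
    for _ in range(max_retries):
        try:
            os.lseek(fd, sector * SECTOR_SIZE, os.SEEK_SET)
            data = os.read(fd, SECTOR_SIZE)
            if len(data) == SECTOR_SIZE:
                return data
        except OSError:
            pass  # read error on a bad sector; try again
    return None  # give up; the caller records the gap

def salvage_file(device_path, sectors, out_path):
    """Reassemble whatever is readable into a copy at a known good location."""
    fd = os.open(device_path, os.O_RDONLY)
    try:
        with open(out_path, "wb") as out:
            for sector in sectors:
                data = salvage_sector(fd, sector)
                out.write(data if data is not None else b"\x00" * SECTOR_SIZE)
    finally:
        os.close(fd)
```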

 

My DrivePool disks are encrypted with BitLocker and then added to the Pool. Is the implication that File Recovery does not support repairs on BitLocker volumes?

 

2. Is there any form of checksumming done by the Scanner or DrivePool software? Basically a "poor man's" next-gen file system feature?

 

3. (How) Do DrivePool and Scanner protect against so-called "bit rot" or a flipped bit, especially when duplication is enabled (in theory, duplication would allow restoring from a known good copy, based on, say, a checksum)?

 

thank you,

~v

5 answers to this question

  1. I'm not 100% sure (Alex will have to answer), but I believe that is in reference to EFS and not BitLocker.

    (I've flagged Alex to verify this.)

  2. No. We do compare duplicate files when performing the periodic duplication pass to make sure they match, and prompt you if they don't. But we don't maintain a checksum list of all the files.

    Additionally, we offer optional verification on move/copy for DrivePool. It's off by default because it does increase overhead (and that can add up quickly). It's the "DrivePool_VerifyAfterCopy" option in the advanced config file:

    DrivePool v1.X: http://wiki.covecube.com/StableBit_DrivePool_Advanced_Settings

    DrivePool v2.X: http://wiki.covecube.com/StableBit_DrivePool_2.x_Advanced_Settings

  3. I think I answered that in part already.

    But Scanner's surface scan should help the disk identify problem sectors, hopefully before anything happens. At the least, "refreshing" each sector by accessing it helps prevent bit rot.

    Alex talks about StableBit Scanner and how it works in detail here:

     


Thank you for your response, Christopher. I also thought it was Windows EFS based, but that doesn't sound right, as EFS is dead. I would like to know how that applies to BitLocker.

 

I watched several of the HomeServer Show episodes to better understand. From my data gathering, StableBit DrivePool's current response to "Bit Rot" is rather roundabout (a scan plus a notification) and would appear not to fully protect against the issue. To further explain, it seems:

 

In the videocast, Alex loosely defined the term "Bit Rot" as when data is not accessed regularly and the underlying medium "rots"/decays silently - for example, bad sectors start appearing, or part of the disk can't be read. I would refer to this as "Underlying Medium based Rotting". Traditionally, a user would not see the "rotting" occurring until the fateful day they attempt to re-access that data (say, months or years after it was initially written) and discover they cannot read the full file.

 

To alleviate some, but not most, aspects of this, StableBit Scanner seems to run a regularly scheduled (every 30 days by default) surface scan of the disk to check that all sectors are READABLE. If a sector is having trouble being read, Scanner reports it (and possibly offers some remediation choices).
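In essence, a surface scan is just a sequential read of every sector - my own interpretation, sketched below in Python. Scanner's real implementation surely does far more (sector-level retries, throttling around other I/O, tracking when each region was last scanned):

```python
import os

CHUNK = 1 << 20  # read 1 MiB at a time

def surface_scan(device_path, size_bytes):
    """Sequentially read every chunk of the device; return unreadable offsets."""
    bad_offsets = []
    fd = os.open(device_path, os.O_RDONLY)
    try:
        for offset in range(0, size_bytes, CHUNK):
            try:
                os.lseek(fd, offset, os.SEEK_SET)
                os.read(fd, min(CHUNK, size_bytes - offset))
            except OSError:
                bad_offsets.append(offset)  # unreadable region: report it
    finally:
        os.close(fd)
    return bad_offsets
```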

 

How does this help? My interpretation is that it helps ensure that non-regularly accessed data does not live on bad chunks of a storage medium for long periods of time without notification. It doesn't so much care about what the DATA is, but whether the underlying storage platform reports the sectors as readable. "Underlying Medium based Rotting", as I refer to it, seems like a common and quite traditional issue, and is arguably more commonly experienced than what I would refer to as the next-gen "DATA Bit Rot".

 

Next-gen "DATA Bit Rot"/Data Corruption I would loosely define as the scenario where the underlying storage medium is 100% readable - a user accessing the data today or 3 years from now would have a completly "readable" file. However the data that is stored has been tarnished: Whether its a bit flipped causing a jpeg to look off (as described in Arstechnica's next gen file systems article) or downright garbage data. The file is completely readable but the data stored is no longer "pure".

 

How can we have tarnished/non-"pure" data when the underlying storage medium is 100% readable? One example is a file getting mangled during transmission to our DrivePool - there is nothing the endpoint (in this case our DrivePool) can do about this; it receives garbage in, so it saves the garbage it received to disk. To combat "in-transit mangling", a user can use verification after copy (for example, TeraCopy has an option to verify after transfer, ensuring that simple checksums of the two files match up).
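Verify-after-transfer boils down to hashing both ends and comparing - a minimal sketch of the idea (my own illustration; I am assuming a SHA-256 checksum, and this is not TeraCopy's actual mechanism):

```python
import hashlib
import shutil

def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()

def copy_verified(src, dst):
    """Copy a file, then re-read both ends and compare checksums."""
    shutil.copyfile(src, dst)
    if sha256_of(src) != sha256_of(dst):
        raise IOError(f"verification failed: {src} -> {dst}")
```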

 

Christopher, you mention that during duplication passes data is compared and the user is prompted on a mismatch - what about other DrivePool procedures such as balancing (are there any other procedures?)? Also, is that mismatch prompt in the DrivePool GUI or a notification (I would imagine a notification would be handy if you don't regularly "monitor" the DrivePool application)?

 

Christopher, you mentioned the "DrivePool_VerifyAfterCopy" option in the advanced config file. First off - can we get this as a checkbox (I don't care if it's "hidden" in the GUI; I'd just prefer not to somehow screw up a config file)? Secondly, what scenario is it used in (i.e., during balancing? during user-initiated copies between a non-pool disk and DrivePool?)? I don't understand at what level/when that verification would happen were it enabled.

 

It seems that we need a form of checksumming of data, periodic rechecking of those checksums, and a report generated when there is an illogical mismatch. An illogical checksum mismatch I would loosely define as when it is not logical for a file to have a different checksum: for example, when a file, previously checksummed, reports a different checksum at a later scan but the file metadata reports the same last-modified time. Or, vice versa, the checksum is the same but the last-modified time is different. As an example, "familyreunion.mp4" shouldn't ever report a different checksum with the same modification time; if it does, we should check it out/be notified.
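In code terms, what I am asking for is roughly the sketch below (hypothetical - no such feature exists in Scanner or DrivePool today, and the baseline structure here is made up for illustration):

```python
import hashlib
import os

def scan_tree(root):
    """Build a baseline {path: (sha256, mtime)} for every file under root."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                for block in iter(lambda: f.read(1 << 20), b""):
                    digest.update(block)
            state[path] = (digest.hexdigest(), os.path.getmtime(path))
    return state

def illogical_mismatches(baseline, current):
    """Files whose content changed although their last-modified time did not."""
    return [path for path, (digest, mtime) in current.items()
            if path in baseline
            and baseline[path][1] == mtime    # same modification time...
            and baseline[path][0] != digest]  # ...but a different checksum
```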

 

Where does StableBit come into play in regard to next-gen "DATA Bit Rot"/data corruption? Well, the StableBit Scanner application today seems to provide mainly one function: a prettified, scheduled surface scan (off the top of my head, chkdsk /r does something similar?) - please correct me if I am wrong. Initially my impression was that there was also a level of checksumming going on, but that does not appear to be the case. The reason I would like StableBit Scanner to do this, versus a third-party utility, is that Scanner would ideally have an understanding of the duplicated data, and upon detecting a checksum issue it could check the storage pool for the duplicated/triplicated counterpart and offer to restore a copy from the duplicated data, or at least report all this back to the user. If this gets implemented, it will be the stopgap needed between today's file systems and the so-called next-gen file systems.
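And the repair step I am wishing for would conceptually be the sketch below (again hypothetical - the duplicate paths and the stored known-good digest are assumed inputs, not an existing StableBit API):

```python
import hashlib
import shutil

def sha256_of(path, chunk=1 << 20):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()

def repair_from_duplicates(bad_copy, duplicate_paths, known_good_digest):
    """Overwrite a corrupted pool copy with a duplicate that still matches
    the stored checksum; return the copy used, or None if none survives."""
    for candidate in duplicate_paths:
        if sha256_of(candidate) == known_good_digest:
            shutil.copyfile(candidate, bad_copy)
            return candidate
    return None  # every copy is tarnished - all we can do is notify the user
```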

 

BTW, hopefully this all comes across in good terms :) . I spend far too much time looking into the offerings available in the NAS space, and my career offers me access to a wide array of knowledge on the developing technologies in this space. With that being said, StableBit DrivePool has impressed me so far, most of all when combined with technologies like BitLocker. I have spent several sleepless nights redoing my whole test infrastructure to blind-test StableBit's offerings. I have had several pleasant surprises along the way (Read striping? FUCK YEA. Disk performance monitor? Extremely useful feature! Remote control GUI? Extremely useful!!!). There have been a few unexpected quirks, but I intend to open a few threads to highlight them.

 

~v


Oh boy, this is a long post (absolutely no problem with that). I'll try to answer all that I can.
 
 EFS may or may not be dead. It's still supported by NTFS, so we can't say that it's 100% dead - deprecated, at best.

 

I didn't want to get into this before, but bit rot is a very complicated subject. Very. A large part of that is because the term is only loosely defined - or it's defined in 10 different ways by 10 different sources. Modern drives should be much more resilient to this random bit flipping. Improvements like HP's SMART IV technology (errors reported in SMART as "End-to-End errors") help prevent a lot of errors.

Documentation about it here: http://h20000.www2.hp.com/bc/docs/support/SupportManual/c01159621/c01159621.pdf

(Sorry for the PDF link)

 

 

As for DrivePool, if it detects that the files are different, it will notify you in the UI and ask you to resolve it. It will list the paths of the files in question, and it outputs them to the logs as well.

 

As for VerifyAfterCopy, this applies to ANY move or copy operation done by DrivePool, be it a duplication pass or balancing. It will also output the errors to the UI for you to resolve.
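Conceptually, a verify-after-copy pass amounts to re-reading the freshly written copy and comparing it against the source, roughly like this sketch (an illustration of the idea only, not DrivePool's actual implementation):

```python
def copies_match(source, destination, chunk=1 << 20):
    """Compare a source file and its freshly written copy in 1 MiB chunks."""
    with open(source, "rb") as a, open(destination, "rb") as b:
        while True:
            block_a, block_b = a.read(chunk), b.read(chunk)
            if block_a != block_b:
                return False  # mismatch: surface it to the user to resolve
            if not block_a:   # both files ended at the same offset
                return True
```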

And for adding the setting to the UI, it's on the to-do list, and I'll bug Alex (the developer) about this again, because I agree that it needs to be in the UI.

 

Additionally, if data integrity is a critically important feature for you, there is an add-in for WHS/2012 Essentials that you may want to check out:

http://integrity-checker.com/index.html

 

 

And yes, this all comes across in good terms. It's just a complicated subject. Very complicated. 

 

 

 

 

As for "next gen file systems", Server 2012's "ReFS" falls into that category as well. We don't currently support it, but investigating support for it is on our "to-do" list. (if you've noticed, we have a lot that we want done, but limited resources). Adding ReFS support most likely won't be simple either.


Thank you for your reply, Christopher! It is helpful to be able to bounce thoughts and questions off knowledgeable folks.
 
I'm hearing through the grapevine that EFS is dead (but yes, there is still the potential for some users to have it, since in NTFS/Windows it's being deprecated but not yet fully removed). However, when we talk about potential users... would you believe me if I said there were a total of 7 (as in seven! not hundreds, not thousands, 7 total) actively updating users of a certain infrastructure? Sometimes it doesn't make sense to support some stuff.

 

I'd very much like to see that VerifyAfterCopy flag/checkbox integrated into the DrivePool options. I am not sure of the performance impact, but it seems it would be a rather important option for day-to-day operation (one that could/should be disabled for initial seeding, balancing, and duplication of data into a DrivePool, but enabled after that).
 
Christopher, when you say "As for DrivePool, if it detects that the files are different, ..." <-- when/in what scenario does DrivePool do this detection? I'm not sure if you mean after you enable VerifyAfterCopy, or during another scenario such as read striping?
 
ReFS "killer" features are quite dependent on Storage Spaces. Running ReFS on its own has appears to have limited benefit today. If you run StorageSpaces (nobody here, right?) then the eventual usage of ReFS will be welcomed but there is a long way for those to go. Storage Spaces is an interesting platform, there is some SIGNIFICANT advantages of using it for certain I/O needs but to take advantage of those it will likely require a setup that is beyond most peoples reach (think of legitimate $100K+ SAN replacements built on top of StorageSpaces).
 
Regarding regular data integrity checking (or the semblance of it), I absolutely think it's a feature that "I wanna have". It falls into the same category of value-add that Scanner brings to the table. I will open a separate thread specifically for questions regarding data integrity and verification solutions - perhaps some folks may have input.
 
Thank you,
~v


Not a problem. We rely on feedback from customers, so bouncing ideas around is a great way to help us out as well (win-win).

 

EFS... well, with BitLocker support in Windows, why encrypt one or two files? Just encrypt the whole disk. And with how dead simple BitLocker is, why bother with just a couple of files?

That, and I do remember that it did have other issues. But it's been so long since I looked at EFS, I'd have to double check.

But backwards compatibility... that's very important for Microsoft, and for good reason. While other OSes like Apple's Mac OS X and iOS, Google's Android... and Linux can get away with removing features, just look at the uproar when Microsoft removes "bad" features. I suspect that's why EFS still exists.

 

 

And for the VerifyAfterCopy option, I'm not 100% sure why it's not enabled by default. I know (or could swear) that I have discussed this with Alex before, but my memory is horrible.

However, I do know that enabling the option causes a lot more issues for the user to "resolve" manually. But I think it's best to let Alex answer this one. 

 

But as for when this happens? Any time that DrivePool moves or copies a file. So during duplication or balancing passes.

 

 

As for ReFS.... I know I'm not using Storage Spaces. :)

But yeah, I was partially aware that some of the features in ReFS are supposed to be used with Storage Spaces.

Though, in theory, you should be able to use Storage Spaces and DrivePool (add the "spaces" to a pool). I don't think it has been tested, but it's a possibility - similar to how you can add a RAID array to the Pool. But in both cases, you would absolutely lose the ability to get SMART data from the disks.



Ah, yes - specifically, ReFS can detect corruption, but it can ONLY automatically correct it if a proper Storage Space is configured. It has to be a mirrored or parity "Space" to do so.

But it can at least detect it otherwise.
