
Will duplication repair corrupt files?


Rob Platt

Question

I've run DrivePool for years and have had little to no trouble with the software itself.

I have duplication turned on for most of my important media (i.e. photos/videos).

In the past, I've had drive corruption that required running chkdsk. In one case, chkdsk "repaired" the drive and many of my files ended up at 0 KB. These files were duplicated, so I thought: no problem, DrivePool will repair the duplication and the 0 KB files will be replaced with the originals. I don't think that's what happened, though. I ended up losing some files.

Fast forward to now. I had a drive that ended up with corruption. I pulled the drive and ran chkdsk on it. The drive is now repaired, but I found 0 KB files on it. I checked my DrivePool, and the same files do exist there with data.

The question is, how do I proceed? Do I return the drive to the array and let DrivePool correct it? Do I delete the 0 KB files first? If we were talking about 10-20 files, I would just back them up, go over the results and replace the missing/bad files. However, I have many thousands of files, and tracking which ones need to be fixed would be a project in and of itself.

Hoping DrivePool knows how to handle this.

I'm on version 2.2.2.933

Thank you!

gtaus

+1, it's a question that I have asked before but never received a response to. I would also like to know how DrivePool recovers from a drive loss and how it rebuilds itself after a drive failure. Nobody wants 0 KB files with newer date/time stamps duplicated over the good copies. How does DrivePool know which is the good copy, and how does it repair itself?

Shane

DrivePool's duplication is primarily intended to help with losing a drive; if individual files are being corrupted it may or may not be able to help depending on the situation.

In this case where you know the corruption is limited to files on a single drive, if you're using pool-level duplication then the safest thing to do is let DrivePool re-duplicate the pulled drive using the rest of the drives: "Remove" the missing drive and it should proceed to re-duplicate (if not, you can use Cog icon -> Troubleshooting -> Recheck Duplication). Once you've got the pool fully duplicated again, you can format the corrupted drive (or at least delete/rename the hidden poolpart) and then re-connect it (in that order, as current versions of DrivePool will automatically attempt to re-add it to the pool and in this case you don't want that).

However, if you've got insufficient free space on the remaining disks to re-duplicate the pool, or you're certain that the 0 KB files are the only corrupted files (i.e. that the remaining files on that drive are bit-for-bit intact), then you could just delete the 0 KB files, re-connect the drive and tell DrivePool to Recheck Duplication. This will certainly be quicker, since the pool will have fewer files that need re-duplicating.
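Finding those 0 KB files by hand across thousands of files is the tedious part. Below is a minimal Python sketch, assuming the pulled drive is mounted on its own with its hidden PoolPart folder intact, and the pool (now without that drive) is still reachable under its drive letter. It only lists each 0-byte file on the pulled drive whose counterpart in the pool has data; it deletes nothing. The function name and the example paths are hypothetical, so review the list before acting on it.

```python
import os
from pathlib import Path


def find_zero_byte_files(poolpart_root: Path, pool_root: Path) -> list[tuple[Path, int]]:
    """List 0-byte files under poolpart_root whose copy under pool_root has data.

    Returns sorted (path relative to poolpart_root, size of the pool copy) pairs.
    Read-only: nothing is modified or deleted.
    """
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(poolpart_root):
        for name in filenames:
            local = Path(dirpath) / name
            if local.stat().st_size != 0:
                continue  # only 0-byte files are of interest
            rel = local.relative_to(poolpart_root)
            other = pool_root / rel
            # Skip files that are missing from the pool or legitimately empty there too
            if other.is_file() and other.stat().st_size > 0:
                flagged.append((rel, other.stat().st_size))
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical paths: the pulled drive's PoolPart folder and the pool's drive letter
    for rel, size in find_zero_byte_files(Path("E:/PoolPart.xxxxxxxx"), Path("P:/")):
        print(f"{rel}  (pool copy: {size} bytes)")
```

Files that are empty in the pool as well are skipped on purpose, since a 0-byte file is not necessarily a damaged one.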

1 hour ago, gtaus said:

How does DrivePool know which is the good copy and how does it repair itself?

Short answer is it can't know. It checks each file's attributes and if they don't match then it runs a check of the content. If that also doesn't match then it alerts the user and offers them the choice of fixing it themselves or (if applicable) letting DrivePool replace the "older" instances.

Rob Platt

Shane, thank you for your detailed response.

It makes sense that this is the expected behavior. However, it would be hugely beneficial if there were an improved way of handling this. While DrivePool is "not backup", I love the concept of a self-healing and adaptable "RAID/JBOD" configuration.

Follow-up question since I forgot to include it.

What if I had 3x duplication (in this case my Photos are 3x)? What would happen to the one drive that was "repaired" by chkdsk? Would DrivePool sense that two of the three copies are good and fix the third? I clearly hear you when you suggest ejecting that disk, and I can always manually restore certain files. I'm just curious how it would be handled on its own.

gtaus
2 hours ago, Shane said:

In this case where you know the corruption is limited to files on a single drive, if you're using pool-level duplication then the safest thing to do is let DrivePool re-duplicate the pulled drive using the rest of the drives: "Remove" the missing drive and it should proceed to re-duplicate (if not, you can use Cog icon -> Troubleshooting -> Recheck Duplication). Once you've got the pool fully duplicated again, you can format the corrupted drive (or at least delete/rename the hidden poolpart) and then re-connect it (in that order, as current versions of DrivePool will automatically attempt to re-add it to the pool and in this case you don't want that).

That is what I expected, but glad you spelled it out. 

2 hours ago, Shane said:

Short answer [to: How does DrivePool know which is the good copy and how does it repair itself?] is it can't know. It checks each file's attributes and if they don't match then it runs a check of the content. If that also doesn't match then it alerts the user and offers them the choice of fixing it themselves or (if applicable) letting DrivePool replace the "older" instances.

That is also what I have come to believe. DrivePool makes copies of the file(s), but doesn't really know which copy, if either, is good. For this reason, I have started creating .par2 files for folders before I archive them. That way, the .par2 files will tell me whether the files are still intact and complete. I create about 10% .par2 recovery data, which also allows the program to "heal" most files that may have become corrupt. I don't pretend to know all the magic that goes into the .par2 format, but .par2 files are an easy way to make sure that the saved files have not been damaged.

@Shane, Thanks for the responses and the explanations.
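The core idea behind recovery files can be shown in a few lines. This is a deliberately simplified Python sketch, not PAR2 itself: PAR2 uses Reed-Solomon codes and can rebuild as many damaged blocks as it has recovery blocks, whereas this sketch uses a single XOR parity block and can therefore rebuild at most one damaged block. Per-block hashes detect which block is bad, and the parity block rebuilds it. The block size and all function names are illustrative assumptions.

```python
import hashlib

BLOCK_SIZE = 4  # tiny for illustration; real tools use much larger blocks


def split_blocks(data: bytes) -> list[bytes]:
    # Zero-pad the tail so every block has the same length
    padded = data + b"\0" * (-len(data) % BLOCK_SIZE)
    return [padded[i:i + BLOCK_SIZE] for i in range(0, len(padded), BLOCK_SIZE)]


def xor_blocks(blocks: list[bytes]) -> bytes:
    out = bytearray(BLOCK_SIZE)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def make_recovery(data: bytes) -> dict:
    blocks = split_blocks(data)
    return {
        "length": len(data),
        "hashes": [hashlib.sha256(b).hexdigest() for b in blocks],  # for detection
        "parity": xor_blocks(blocks),  # XOR of all data blocks, for repair
    }


def verify_and_repair(data: bytes, rec: dict) -> bytes:
    blocks = split_blocks(data)
    if len(blocks) != len(rec["hashes"]):
        raise ValueError("length changed (e.g. truncated to 0 bytes); cannot repair")
    bad = [i for i, b in enumerate(blocks) if hashlib.sha256(b).hexdigest() != rec["hashes"][i]]
    if len(bad) > 1:
        raise ValueError(f"{len(bad)} damaged blocks; one parity block repairs only one")
    if bad:
        # XOR of the parity with all intact blocks reproduces the damaged block
        others = [b for i, b in enumerate(blocks) if i != bad[0]]
        blocks[bad[0]] = xor_blocks(others + [rec["parity"]])
    return b"".join(blocks)[:rec["length"]]
```

Flipping one byte is repaired; a file truncated to 0 bytes is detected but not repaired here, which is where larger amounts of real recovery data come in.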

Shane
3 hours ago, Rob Platt said:

However, it would be hugely beneficial if there were an improved way of handling this. While DrivePool is "not backup", I love the concept of a self-healing and adaptable "RAID/JBOD" configuration.

SnapRAID is popular with some DrivePool users for this reason, since the two can be used together. As gtaus mentions, PAR2 tools such as MultiPar are another option.

3 hours ago, Rob Platt said:

What if I had 3x duplication (in this case my Photos are 3x)? What would happen to the one drive that was "repaired" by chkdsk? Would DrivePool sense that two of the three copies are good and fix the third?

Results of testing a 3x duplication recheck (on 2.3.0.1244 BETA):

  • date modified is different
    • file content is identical -> DrivePool does nothing
    • file content or size is different -> DrivePool alerts the user and offers to delete the older version
  • date modified is identical
    • file size is different -> DrivePool will alert the user and offer to delete the older version
    • file size is identical
      • file content is identical -> DrivePool does nothing
      • file content is different -> DrivePool does nothing

So if the damage includes a change to the date or size, it will alert you that there's a discrepancy and offer to delete the older version (which may not be what you want in this case); if the damage is only to the content (e.g. a "k" is replaced with a "q", leaving the size and date unchanged), then it won't detect it.
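To make the results above concrete, here is a Python sketch that models the observed behavior for one pair of copies. It is inferred from these test results only, not DrivePool's actual code, and treating "attributes" as just size plus last-modified time is an assumption. What it shows: content is only compared once the attributes already differ, so same-size, same-date corruption is never examined.

```python
import filecmp
import os


def recheck_pair(path_a: str, path_b: str) -> str:
    """Model of the recheck results reported above (not DrivePool's code).

    Returns "ok" when the copies are treated as matching, or "alert" when the
    user would be warned and offered deletion of the older version.
    """
    sa, sb = os.stat(path_a), os.stat(path_b)
    # Assumption: "attributes" = size + last-modified time
    if sa.st_size == sb.st_size and sa.st_mtime_ns == sb.st_mtime_ns:
        return "ok"  # content is never read, so silent corruption goes unnoticed
    # Attributes differ: fall back to a byte-for-byte comparison
    if filecmp.cmp(path_a, path_b, shallow=False):
        return "ok"
    return "alert"
```

Each of the five tested cases maps to one branch: identical attributes return early, and differing attributes trigger the content check.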

Rob Platt
21 hours ago, Shane said:

date modified is identical

  • file size is different -> DrivePool will alert the user and offer to delete the older version

 

Shane,

Once again, thank you for taking the time to explain all of this. 

In the above, is there a typo? How can DrivePool offer to delete an older version if the timestamps are the same?

If a file resides on three disks, and two of those copies have a timestamp of 1/1/2020 while the third has a timestamp of 1/1/2021, DrivePool will offer to delete BOTH older versions? There's no concept of two of them being more correct than the third?

1 hour ago, Rob Platt said:

In the above, is there a typo? How can DrivePool offer to delete an older version if the timestamps are the same?

No typo; it makes the offer anyway. I assume it's just a default "there's a duplication problem, here's a thing I can try" response, and if you tell it to go ahead it doesn't succeed since in this case there is no older file.

1 hour ago, Rob Platt said:

If a file resides on three disks, and two of those copies have a timestamp of 1/1/2020 while the third has a timestamp of 1/1/2021, DrivePool will offer to delete BOTH older versions? There's no concept of two of them being more correct than the third?

If the content is different, then yes. And correct, DrivePool doesn't use file content to decide which copy is the right one. You could request that the developer add that feature (deleting the copy that differs from the others) as an option.
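For illustration only, here is roughly what such an option could do: a hypothetical Python sketch (not an existing DrivePool feature; all names are made up) that hashes every copy of a file and, if a strict majority agree, reports the copies that differ. It also shows why this needs at least 3x duplication: two copies that disagree have no majority, so nothing can be decided automatically.

```python
import hashlib
from collections import Counter
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()


def majority_vote(copies: list[Path]) -> tuple[Optional[Path], list[Path]]:
    """Return (a copy holding the majority content, copies that differ from it).

    Returns (None, []) when no strict majority exists, e.g. two copies that
    disagree. Expects at least one copy.
    """
    digests = {p: sha256_of(p) for p in copies}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        return None, []
    good = next(p for p in copies if digests[p] == winner)
    return good, [p for p in copies if digests[p] != winner]
```

A majority only says which content most copies share; if two copies had been damaged in exactly the same way, it would pick the wrong one.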
