Corruption: CloudDrive just started mass deleting chunks


modplan · Question · Google Drive · build .486 · Windows Server 2008 R2

I've paused the upload threads and it has stopped, but about 2,500 chunks were just "permanently deleted".

 

I copied about 775 GB onto a brand new 10 TB drive. Over 440 GB of that had successfully uploaded and it was chugging along like normal. Nothing changed. Watching Technical Details, I noticed the upload threads suddenly popping up at 0% and never progressing; then new ones would pop up at 0%, and this continued over and over. No errors were thrown by the GUI, and nothing abnormal appeared in the logs.

 

I logged in to the Google Drive web GUI and I see tons of this:

[Screenshot: Google Drive activity history showing chunks being permanently deleted by StableBit CloudDrive]

 

Via trial and error I found some video files that had chunks in this range (~28,800 to ~31,300). When I try to play these files, tons of prefetch threads spawn in this range and stay at 0%, and the "Prefetched" count jumps to 500 MB instantly. The file never plays in VLC. If I try to open an image file stored in chunks in this range, Windows image viewer tells me it is corrupt.

 

Any ideas how this could possibly happen? Pretty disappointed right now; this is some serious corruption.


EDIT: I've uploaded what logs I have, but I do not see anything abnormal in them. I did NOT have drive tracing enabled and am obviously scared to enable it and resume uploads for fear of more chunks being deleted from the cloud!


Recommended Posts


Yeah, it would have wrapped if it's been doing stuff afterwards.

 

And assuming 20 MB chunk sizes, that's 14 GB of data, basically.

 

And since I'm assuming that you're not deleting contents from the CloudDrive disk... :(

 

Not deleting anything, and the FS still thinks the files are there; their backing chunks in the cloud are just gone :(



You're sure that the application didn't just make new files on the drive to replace old ones?

 

Not doing any kind of application work. Just copying files over from one drive to CloudDrive, and randomly during the copy process (usually several hundred GBs in) CloudDrive just starts deleting chunks from the cloud on its own. Any files that had data stored in any of the deleted chunks become instantly corrupt, but still show up in Explorer as normal.



Do you have any sync services set up on your cloud drive, such as cloudhq.net, that could be deleting these chunks? I have 40 TB+ uploaded to Google Drive using StableBit and haven't had any chunks randomly deleted like this.

 

Nope, nothing. But the scary part is: unless you log in to your Google Drive account in your browser and scan the entire history of the "StableBit CloudDrive" folder looking for deletes that should not have taken place (like the ones in my screenshot above), I have no idea how you would know this is going on. It is completely silent. All the files appear to still be there on your drive, but some random chunks are missing, so some of them are silently corrupt.

 

Edit: And you can see in my screenshot that the chunks were deleted by "StableBit C...", not some other application.
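
For anyone who wants to watch for this without manually combing the web UI: something like the sketch below could poll Google's Drive Activity API for delete events under the CloudDrive folder. This is only a rough illustration, not anything CloudDrive provides; FOLDER_ID and the credentials setup are placeholders, and it assumes the google-api-python-client package.

```python
# Rough sketch (not a CloudDrive feature): query the Google Drive
# Activity API for delete events under the StableBit CloudDrive
# folder. `creds` and `folder_id` are placeholders you'd supply.
from googleapiclient.discovery import build

def find_deletes(creds, folder_id):
    service = build("driveactivity", "v2", credentials=creds)
    body = {
        "ancestorName": f"items/{folder_id}",        # the CloudDrive folder
        "filter": "detail.action_detail_case:DELETE",
    }
    response = service.activity().query(body=body).execute()
    for activity in response.get("activities", []):
        titles = [t.get("driveItem", {}).get("title", "?")
                  for t in activity.get("targets", [])]
        print(activity.get("timestamp"), "deleted:", titles)
```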



I may have gone over some of this already, but I want to be very verbose here: 

 

Let me cover a bit of how NTFS works first, though.

The entries for the files (the info about them, etc.) are stored in the Master File Table, near the beginning of the disk. The actual data is stored later on. Meaning that even if the data gets corrupted (e.g., the backing blocks actually deleted), the files will still show up. 

This is true for physical disks (you can use tools to do this), and for StableBit CloudDrive. 

 

This explains why the files still show up. The NTFS entries have not been affected. 


That said, checksums of the actual files are really important here, to (again) verify that the data has actually been affected, and in part how. I don't mean the upload verification here (though that's never a bad idea to have enabled), but a checksum of the affected data on the drive itself. 
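
As a rough illustration of what I mean (a minimal sketch with a placeholder path, not a supported tool):

```python
# Minimal sketch: independently hash a file on the CloudDrive volume
# so it can be compared against a known-good hash of the original copy.
import hashlib

def md5_of(path, block_size=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()

# Hypothetical example: hash the cloud copy, compare to the source's hash.
print(md5_of(r"X:\archive\video01.mkv"))
```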

 

 

Deletion of the chunks may be normal, especially with Google Drive, due to the MIME type errors: when we see these, we delete the file and re-upload it. This is the only way we've been able to resolve that specific issue. 

Additionally, chunks can be deleted if the data in them is zeroed out, as there is no reason to store a chunk with absolutely no data. It's more efficient to send a delete request for the chunk than it is to upload the chunk as all zeroes. 
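
Conceptually, the decision looks something like this simplified model (an illustration only, not the actual StableBit CloudDrive code; `provider` is a stand-in object):

```python
# Simplified model of the zero-chunk optimization described above.
# Illustration only: `provider` is a stand-in, not a real API.
def flush_chunk(provider, chunk_id, data: bytes):
    if data == bytes(len(data)):
        # Entirely zeroes: a delete request is cheaper than uploading
        # a chunk of nothing. A later read of a missing chunk is then
        # treated as all-zero data.
        provider.delete(chunk_id)
    else:
        provider.upload(chunk_id, data)
```

The failure mode being discussed in this thread would be that all-zeroes test somehow passing for chunks that were not actually zeroed.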

 

The only other reason that this *should* happen is if you're destroying a drive.

 

 

Aside from these reasons, StableBit CloudDrive should NOT be deleting chunks randomly. And even in cases where it is, it may not be affecting the integrity of the files.

 

 

That said, for testing this, it may be a good idea to set the upload threads to "0" while you're not at the computer, so it effectively pauses the upload. This way, you can keep an eye on it, and make sure that you're able to get the logging for the issue when it actually happens. 

 

Also, Alex mentioned adding/enabling a couple of things that should help with this (or at least help identify when it is happening). 

 


And Alex and I really, really want to resolve this as soon as possible, not only because this is an integrity issue, but because we don't feel comfortable releasing a "Stable" version until we've either fixed this issue or confirmed that what you're seeing is expected behavior at this point. 



That said, checksums of the actual files are really important here, to (again) verify that the data has actually been affected, and in part how. I don't mean the upload verification here (though that's never a bad idea to have enabled), but a checksum of the affected data on the drive itself. 

 

 

Additionally, chunks can be deleted if the data in them is zeroed out, as there is no reason to store a chunk with absolutely no data. It's more efficient to send a delete request for the chunk than it is to upload the chunk as all zeroes. 

 

 

That said, for testing this, it may be a good idea to set the upload threads to "0" while you're not at the computer, so it effectively pauses the upload. This way, you can keep an eye on it, and make sure that you're able to get the logging for the issue when it actually happens. 

 

And Alex and I really, really want to resolve this as soon as possible, not only because this is an integrity issue, but because we don't feel comfortable releasing a "Stable" version until we've either fixed this issue or confirmed that what you're seeing is expected behavior at this point. 

 

To address each of those in order:

 

1) Checksum verification is indeed on, and the chunks pass. In "Technical Details", when I try to open one of the corrupt files, I see that it is indeed trying to download the deleted chunks, but the progress sits at 0% and then is discarded (rather quickly) from the details window. I think we discussed this a page or two back, and you indicated Alex said this is normal for a deleted chunk. It appears to me that CloudDrive EXPECTS these chunks not to exist; it knows it deleted them.

 

2) I really think, based on what we have discussed and the only ways chunks can get deleted, that the service or driver thinks these chunks are being zeroed when they really aren't. I'm not sure what could cause this. EDIT: I very much doubt it, but wanted to make sure: if a file is in use/inaccessible, could that cause some type of collision and trigger this?

 

3) Turning off upload threads would certainly help me keep a better eye on this to collect logs; however, this is a home server that I only connect to via RDP. I could monitor it better while I'm home in the evenings, but it would likely take weeks of turning upload threads on and off to get to the point where this is repro'd and I catch it. Also, I believe I caught it in the act (by luck) in my 2nd log upload; was there not much useful gained from that? Has additional logging been put in place since then to improve supportability?

 

4) Me too, and I'm at your disposal to help! I really want to get this archive data safely into the cloud ASAP :)



  1. To clarify, I don't mean the checksum verification in the app itself. I mean independently of our software, using something like HashTab, QuickPar, or the like, which can generate and compare MD5/CRC/SHA hashes of the files and verify whether or not they're intact. 

     

    As for the files, I'll have to double check with Alex on this, but I think this is normal behavior. Specifically, we do track the file IDs of the files for quick access. If the file isn't there, then we assume that it's all zero-byte data, as that's the only reason it would have been deleted. That would account for the activity you've described (see the sketch after this list). 

     

  2. It very well could be that. Alex is adding some additional logging here, and there was another feature he'd mentioned that he was going to toggle for Google Drive, as it may help here. 

     

  3. I'll reflag the specific logs for Alex, just in case. 

    As for the additional logging, it's mostly text-related, IIRC, noting when empty chunks are deleted. 

    And yeah, it will significantly slow down the speed. But I'm not sure what else to do, as we really do need to catch it in action... and ideally with the newer versions.

     

  4. Agreed! :)
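
To illustrate the read path described in point 1 (again a simplified model with stand-in objects, not the actual code): a chunk whose file ID can't be found at the provider is synthesized as zero-filled data rather than treated as an error, which matches the 0%-and-discard behavior described above.

```python
# Simplified model of the chunk read path from point 1. Stand-in
# objects only; this is not the actual CloudDrive implementation.
def read_chunk(provider, file_id_cache, chunk_id, chunk_size):
    file_id = file_id_cache.get(chunk_id)
    if file_id is None or not provider.exists(file_id):
        # Missing chunk: assume it was deleted because it was all
        # zeroes, and synthesize a zero-filled buffer.
        return bytes(chunk_size)
    return provider.download(file_id)
```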


I am planning on using ExactFile to do the comparison. It will create an MD5 hash for every file in a directory, which can then be run against and compared to the files in a different directory. 
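
For anyone without ExactFile, a minimal stand-in for that workflow might look like this (both root paths are placeholders):

```python
# Minimal stand-in for the ExactFile workflow: MD5 every file under
# the source tree and the CloudDrive copy, then report mismatches.
import hashlib, os

def manifest(root):
    out = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            h = hashlib.md5()
            with open(full, "rb") as f:
                for block in iter(lambda: f.read(1 << 20), b""):
                    h.update(block)
            out[os.path.relpath(full, root)] = h.hexdigest()
    return out

# Placeholder paths: local source vs. the mounted CloudDrive volume.
src, dst = manifest(r"D:\archive"), manifest(r"X:\archive")
for rel in sorted(src):
    if rel not in dst:
        print("MISSING:", rel)
    elif src[rel] != dst[rel]:
        print("MISMATCH:", rel)
```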

 

I see this in the latest changelog:

.536
* Fixed crash on service start.
* Never allow NULL chunks to be uploaded to providers that are encrypting their data and that do not support partial writes.

 

I'll upgrade to .536, create a new drive, and get to testing again!



Well, I'm in the last two hours of uploading my dataset, and all has been good so far. 

 

And bam, Alex's change saved over 900 chunks from being deleted!

 

[Screenshot: over 900 chunk delete operations blocked instead of completing]

 

I saw this occurring when it was at the ~500 mark and flipped on drive tracing; I'm uploading those logs now. It's still trying to delete chunks, up to 1,300 in the time it took me to write this post, but my chunks are safe; nothing has been deleted from the cloud!

 

Edit: It looks like it is continuously trying to delete these chunks, retrying and failing over and over. Will it ever break out of this (possibly) infinite loop and move on to uploading the final 20 GB?

 

Edit 2: I paused uploading and then resumed it, and, same as in the past, this caused CloudDrive to stop trying to delete chunks; I'm back to uploading now.

 

Edit 3: I'm in the process of verifying every single file via MD5 hash with ExactFile. It'll take a while, but I'll post results when I have them.

