
Files still disappearing


darkly

Question

Yup, it's me again. On the latest version. I completely trashed the previous cloud drives I was working with and made a new one a couple of months ago. It was working fine until now: I just scanned through all my files and found that random files in random directories all over my drive had gone missing again.

CloudDrive is version 1.1.0.1051. DrivePool is 2.2.2.934. Running on Windows Server 2012. My cloud drive is a 256TB drive split into 8 NTFS partitions (I ditched ReFS in case that was causing the issues before; apparently not), each mounted to an NTFS directory. Multiple levels of DrivePool then pool all 8 partitions into one drive, and that drive is pooled (on its own for now, but designed this way for future expansion) into another drive. All content is then dealt with from that top-level drive.

It seemed to be working fine for months until I looked through the files today and found that a good 5-10% of them have gone missing entirely. I really need to get this issue figured out as soon as possible; I've wasted so much time on it since I first reported it almost a year ago. What information do you need?
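Since the symptom is files quietly vanishing between manual scans, one way to catch it early (instead of stumbling on it months later) would be to snapshot the pool's file inventory periodically and diff it against the previous run. A minimal sketch of that idea in Python; the paths are placeholders, and the demo at the bottom just exercises the logic on a throwaway directory rather than a real pool:

```python
import os
import shutil
import tempfile

def snapshot(root):
    """Walk a directory tree and record every file's relative path and size."""
    files = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                files[os.path.relpath(path, root)] = os.path.getsize(path)
            except OSError:
                # Unreadable file: possible file-system corruption.
                files[os.path.relpath(path, root)] = None
    return files

def missing_files(old, new):
    """Files present in the previous snapshot but gone from the current one."""
    return sorted(set(old) - set(new))

# Demo on a scratch directory standing in for the pooled drive.
demo = tempfile.mkdtemp()
for name in ("a.txt", "b.txt", "c.txt"):
    with open(os.path.join(demo, name), "w") as f:
        f.write("data\n")
before = snapshot(demo)
os.remove(os.path.join(demo, "b.txt"))  # simulate a vanished file
gone = missing_files(before, snapshot(demo))
print(gone)  # ['b.txt']
shutil.rmtree(demo)
```

In practice you'd persist the snapshot between runs (e.g. to JSON) and point the walk at the pooled drive letter, so a scheduled task can report missing files the day they disappear.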

Edit:

No errors in either piece of software. I even went and bought two SSDs for a RAID 1 to use as my cache drive, so I've eliminated bandwidth issues.

Edit again:

Just tried remounting the drive, and DrivePool shows "Error: (0xC0000102) - {Corrupt File}", which resulted in it failing to measure a particular path within one of the 8 partitions. How does this happen? Is there a way to restore it?

Another note:

The thing is, I don't think it's just a DrivePool issue, because the first time I reported this, I was using one single large CloudDrive without DrivePool.

More:

Tried going into the file system within that partition directly and found that one of the folders inside one of DrivePool's directories was corrupt and couldn't be accessed. I'm running a chkdsk on that entire partition now. Will update once done.

Sigh...:

If it ever completes, that is. It's hanging on the second "recovering orphaned file" task it reached. I'll leave it running for as long as seems reasonable...

A question:

As long as I'm waiting for this... is there a better way to set up a large (on the order of hundreds of terabytes) cloud drive without breaking things such as the ability to grow it in the future, or the ability to run chkdsk (which fails on large volumes)? I'm at my wit's end here and I'm not sure how to move forward. Is anyone here storing large amounts of data on Google Drive without running into this file deletion/corruption issue? Is it my server? I've had almost 10TB of local storage sit here just fine without ever losing any data. From searching this forum, just about no one else seems to be having the issues I've been plagued with. Where do I go from here? I'm almost ready to just ship my server to you guys for monitoring and debugging (insert forced laughter as my mind slowly devolves into an insane mess here).


3 answers to this question


Got some updates on this issue.

I'm still not sure what caused the original corruption, but it seems to have affected only the file system, not the data itself. I don't know whether it happened for the same reasons as in the past, but I'm happy to report that I was able to fully recover from it.

The chkdsk eventually went through after a reboot and remounting the drive a few times. As I suspected above, only one partition of the CloudDrive was corrupted, and in this case the multi-partition CloudDrive + DrivePool setup actually saved me from quite the headache: since DrivePool deals with entire files at a time, only the files on that partition had gone missing. I took the partition offline and ran chkdsk on it a total of 5 times (the first two being repeats of the stalls I mentioned above; the next 3 performed after the reboot and remount) before it finally reported no more errors. I remounted the partition and, upon checking my data, found that everything was accessible again. Just to be sure, I'm leaving StableBit Scanner running in the background until it passes the entire cloud drive, though that's going to take a while as it's a 256TB drive.

One thing that maybe someone could look into: The issue seemed to have happened during a move operation on the CloudDrive. I had moved a bunch of directories to a different directory on the same drive (in one operation), and found that some files had not been moved once my data was available again. Maybe this is somehow related to what caused the corruption to begin with, or maybe it's just coincidence.



Sorry for not getting back to you on this! But glad to hear that it sounds like it's mostly resolved (by a bunch of disk checks?)

 

On 8/22/2018 at 9:12 AM, darkly said:

One thing that maybe someone could look into: The issue seemed to have happened during a move operation on the CloudDrive. I had moved a bunch of directories to a different directory on the same drive (in one operation), and found that some files had not been moved once my data was available again. Maybe this is somehow related to what caused the corruption to begin with, or maybe it's just coincidence.

We've .... spent a LOT of time on that section of code for StableBit DrivePool.  Not just moves, but deletes as well.  NTFS does some "weird" stuff, and it's caused issues in the past.

That said, if you're able to reproduce this sort of issue, that would be awesome.  Especially, "at will".
http://wiki.covecube.com/StableBit_DrivePool_2.x_Log_Collection
http://wiki.covecube.com/StableBit_CloudDrive_Drive_Tracing

Grabbing the drive tracing when it happens would be super helpful (as well as detailed instructions on how you triggered it). That way, we can take a look at the logs and see what exactly happened.

Link to comment
Share on other sites


Oddly, I did another chkdsk the other day and got errors on the same partition, but a chkdsk /scan followed by a chkdsk /spotfix got rid of them, and the next chkdsk passed with no errors. I didn't notice any issues with my data prior to this, though. Going forward, I'm running routine chkdsks on each partition to be sure.

If the issue was indeed caused by problems with move/delete operations in DrivePool, it would appear to be separate from the first issue I encountered (where I was only using one large CloudDrive without DrivePool), but maybe the same as the second time I experienced it (the first time I used the 8-partition setup). I was using ReFS at the time, though, so I'm not sure.

I'll try to find some time to reproduce the error. In case anyone else with some spare time wants to try, this is what I'll be doing:

1. Create a CloudDrive on Google Drive. All settings on default, except: size 50TB, full drive encryption on, and leave the drive unformatted.
2. Create 5 NTFS partitions of 10TB each on the CloudDrive.
3. Add all 5 partitions to a DrivePool, with all settings left on default.
4. Add the resulting pool, by itself, to one more pool.
5. Create the following directory structure on this final pool: two folders at the root (let's call them A and B); one folder within each of these (call them A2 and B2); and finally one folder inside A2 (A3).
6. Within A3, create 10 text files.
7. Move A3 back and forth between B2 and A2, checking the contents each time, until files are missing.

This approximates my setup at a much smaller scale and SHOULD reproduce the issue, if I'm understanding Chris correctly about what could be causing the problem I experienced, and if that is indeed the cause.
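For what it's worth, the move-and-check loop in step 7 is easy to script so it can run unattended. A rough Python sketch of that loop; the scratch directory here is just a stand-in, and a real test would point `root` at the final pool drive so DrivePool and CloudDrive are actually in the I/O path:

```python
import os
import shutil
import tempfile

# Build the A/A2/A3 and B/B2 structure from the steps above in a scratch
# directory (stand-in for the pooled drive letter).
root = tempfile.mkdtemp()
a3 = os.path.join(root, "A", "A2", "A3")
b2 = os.path.join(root, "B", "B2")
os.makedirs(a3)
os.makedirs(b2)

# Step 6: create 10 text files inside A3.
expected = {f"file{i}.txt" for i in range(10)}
for name in expected:
    with open(os.path.join(a3, name), "w") as f:
        f.write("test data\n")

# Step 7: move A3 back and forth, verifying the contents after each move.
src, dst = a3, os.path.join(b2, "A3")
missing_after = None  # round number where files first went missing, if any
for round_no in range(1, 101):
    shutil.move(src, dst)
    src, dst = dst, src  # swap direction for the next round
    found = set(os.listdir(src))
    if found != expected:
        missing_after = round_no
        print(f"round {round_no}: missing {sorted(expected - found)}")
        break
else:
    print("all rounds passed, no files lost")

shutil.rmtree(root)
```

On a healthy local filesystem this should always report no files lost; the interesting question is whether it ever trips the `missing_after` branch when run against the pooled CloudDrive.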

I plan on getting another set of licences for all 3 products in the coming weeks as I transition from hosting my Plex server on my PowerEdge 2950 to my EliteBook 8770w, which has a much better CPU for decoding and encoding video streams (only one CPU vs. two, obviously, but the server's CPUs had 1 thread per core, and the 8770w's i7-3920XM has much better single-threaded performance for VC-1 decoding). I probably won't have too much time to attempt to reproduce the issue until this happens, but I'll let you know once I do.

Finally, some questions: Is there any sort of best practice for avoiding those DrivePool issues, or any workaround for the time being? Do you know the scope of the potential damage? Will it always be just some messy file system corruption that chkdsk can clean up, or is there potential for more serious data corruption with the current move/delete issues?

