
How robust is CloudDrive?


jeraziah

Question

I'm wondering whether there is any way to check the consistency of a cloud drive besides the upload verification.

 

Last night I accidentally simulated a power outage... On the next boot, CloudDrive performed some recovery actions on my 2 drives, and it is now working again.

But I noticed that it started to upload the whole cache again.

 

One drive had a 50 GB cache. Its status changed from "to upload 180 GB, cached 50 GB" to "to upload 230 GB, cached 0 GB".

The other drive had a 10 GB cache. Its status changed from "to upload 180 GB, cached 10 GB" to "to upload 190 GB, cached 0 GB".

 

How does the recovery work? How do I know my cloud data is not corrupt?


12 answers to this question


CHKDSK?

 

Or do you mean something more along the lines of checksums?

If you mean checksums, then this is a good program: 

https://corz.org/windows/software/checksum/

 

As for the recovery, it verifies the cache, and that is done internally.

In fact, this sort of scenario is probably the most well tested aspect of the software. 


My main problem seems to be that the CloudDrive service is not able to shut down properly. After every reboot of the system, CloudDrive performs recovery and then uploads the whole cache again.

For example, it says "to upload 150 GB" and "cached 50 GB". After a reboot, it performs recovery, and afterwards it says "to upload 200 GB" and "cached 0 GB". Why is it uploading the cache again, and what happened to the 50 GB that were already uploaded?

 

Any tips for troubleshooting the shutdown issue?


Good luck. Rebooting my server is a game of "will the CloudDrive service stop properly?" There seems to be a lot of conflict between CloudDrive or DrivePool and the built-in Windows Disk Management/virtual disk manager. If CD/DP isn't running 100% stable, Disk Management and diskpart will never even load. 80% of the time, if I stop the CloudDrive service, it hangs on "Stopping". At that point... what do you do? If you reboot, there's a high chance it'll enter drive recovery and have to reupload the cache (which gets annoying if you have bandwidth caps on a dedicated server).


Christopher explains it best, but basically CloudDrive has no way of knowing whether anything was corrupted in the "crash", so it reuploads your cache to ensure data integrity. Hashing wouldn't really work, because the data in the cloud can't be "scanned" without first downloading it. So if you were thinking of just hashing the files and comparing the hashes, you'd have to download every block from the cloud anyway to compare it to the local cache block (which would take the same bandwidth as just reuploading it all). It kind of sucks, but that's a limitation of using the cloud at the moment. Even if there were an option to run a repair or purge the local cache (preventing the reupload and just relying on redownloading content as it's needed), you wouldn't know whether a partial transfer was cut off during the crash, leaving a corrupted block in the cloud. Time goes on, and eventually someone goes to retrieve a file and finds it's corrupted because its block in the cloud was only partially uploaded. Overall, there doesn't seem to be a quick and easy way, short of reuploading everything, to ensure there are no corrupted blocks sitting in the cloud.
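To make the bandwidth argument concrete, here is a minimal Python sketch. It assumes, as the post does, that the provider cannot report a hash of its stored copy, so every block must be downloaded before it can be compared with the local cache (some provider APIs do report a checksum of the stored object, which would change this trade-off). `FakeCloud`, `verify_by_download`, and the 1 MiB block size are hypothetical stand-ins, not CloudDrive's internals.

```python
import hashlib

BLOCK_SIZE = 1024 * 1024  # hypothetical 1 MiB block size


class FakeCloud:
    """Hypothetical in-memory provider that counts bytes transferred."""

    def __init__(self):
        self.blocks = {}
        self.bytes_uploaded = 0
        self.bytes_downloaded = 0

    def upload(self, block_id, data):
        self.blocks[block_id] = bytes(data)
        self.bytes_uploaded += len(data)

    def download(self, block_id):
        data = self.blocks[block_id]
        self.bytes_downloaded += len(data)
        return data


def verify_by_download(cloud, local_cache):
    """Return the ids of blocks whose cloud copy differs from the local cache.

    The provider only hands back raw bytes, so each block is fetched in full
    before its hash can be compared with the hash of the local copy.
    """
    mismatched = []
    for block_id, local in local_cache.items():
        remote = cloud.download(block_id)
        if hashlib.sha256(remote).digest() != hashlib.sha256(local).digest():
            mismatched.append(block_id)
    return mismatched


if __name__ == "__main__":
    cache = {i: bytes([i]) * BLOCK_SIZE for i in range(4)}
    cloud = FakeCloud()
    for block_id, data in cache.items():
        cloud.upload(block_id, data)
    # Simulate a crash that cut the last upload off halfway.
    cloud.blocks[3] = cache[3][: BLOCK_SIZE // 2]
    print(verify_by_download(cloud, cache))  # [3]
    print(cloud.bytes_uploaded, cloud.bytes_downloaded)  # 4194304 3670016
```

Even with one block only half-uploaded, the check downloads about 3.5 MiB to validate 4 MiB of uploads, which is roughly what reuploading would cost.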

 

It's one of the reasons I've switched to fairly small local caches for my CloudDrives (I have 1 Gbps up/down on a dedicated server): frequent drive crashes and reuploads are worse than having a small cache and pulling files from the cloud more often. That's unfortunate, because as I get closer to a zero cache size, I could just be using ACDLI and not have any of these issues to begin with (just no cache). But at this point I have too much time and space invested in CloudDrive/DrivePool to start over (including switching from Windows Server to Linux). In reality, this is still a beta product, so it's expected that things act weird and that improvements and fixes arrive fairly regularly. And since the cache can be resized on the fly, I can just bump it up down the road if things become more stable.


OK, I understand now, but I have to ask: wouldn't it be enough if CloudDrive downloaded the last uploaded block to check whether that one is corrupt?

Also, does this mean that if you set your cache to zero, the system crashes, and a block in the cloud is corrupt, the file that block belongs to is now corrupt because there is no cached copy of it?
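The "check only the last uploaded block" idea can be sketched as follows. This assumes a hypothetical upload journal that records each block's SHA-256 in upload order and survives the crash; `check_recent_uploads` and its `recent` parameter are illustrative names, not part of CloudDrive. If several uploads run in parallel, `recent` would have to cover every block that might have been in flight, and the check says nothing about whether the local cache itself is intact.

```python
import hashlib
from collections import OrderedDict


def check_recent_uploads(upload_journal, fetch_remote, recent=1):
    """Download only the last `recent` journaled blocks and return the ids
    whose stored bytes no longer match the recorded SHA-256 digest.

    upload_journal: OrderedDict of block_id -> digest, in upload order (hypothetical).
    fetch_remote:   callable returning the provider's bytes for a block id.
    """
    if recent <= 0:
        return []  # guard: list[-0:] would select every entry
    tail = list(upload_journal.items())[-recent:]
    return [
        block_id
        for block_id, digest in tail
        if hashlib.sha256(fetch_remote(block_id)).digest() != digest
    ]


if __name__ == "__main__":
    local = {0: b"a" * 10, 1: b"b" * 10, 2: b"c" * 10}
    journal = OrderedDict(
        (block_id, hashlib.sha256(data).digest()) for block_id, data in local.items()
    )
    # Block 2 was cut off mid-upload by the crash.
    remote = {0: local[0], 1: local[1], 2: local[2][:4]}
    print(check_recent_uploads(journal, remote.__getitem__))  # [2]
```

This costs one block of download instead of the whole cache, but it only helps if the journal itself is trustworthy after the crash.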

 

I've now opened a support ticket; hopefully someone will be able to figure out why the service is unable to stop.


The reason for the check of the cache is data integrity.

 

Specifically, there are routines in the service that need to be run before shutting down.

 

That mostly means ensuring that all of the data has been properly flushed to disk. IIRC, this is an issue inherent in dealing with NTFS.

 

NTFS caches data in memory and periodically flushes it to the drive. This happens both on the CloudDrive disk itself and on the cache drive, which means we absolutely must make sure the data is properly flushed in both cases (and you can see how this can create a loop). This is why it can take a while for the service to shut down.

 

 

 

Now, when the service is terminated or the system experiences an unplanned/unsafe shutdown, that means that we cannot trust the data in the cache.  We have to verify the integrity of the data that we have. Hence the behavior you're seeing. 

Data integrity is our top priority (always).
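One common way for a service to tell an orderly stop from a crash is a "clean shutdown" marker: it is written only after every flush has finished, and its absence on the next start triggers verification. The sketch below illustrates that general technique under that assumption; the marker file name and the `flush_all`/`verify_cache` callbacks are hypothetical, and this is not necessarily how CloudDrive detects an unsafe shutdown.

```python
import os
import tempfile

MARKER_NAME = "clean_shutdown.marker"  # hypothetical file name


def start_service(state_dir, verify_cache):
    """Return "clean" if the previous run shut down cleanly, else verify the cache."""
    marker = os.path.join(state_dir, MARKER_NAME)
    if os.path.exists(marker):
        # Consume the marker: if we crash from here on, the next start sees none.
        os.remove(marker)
        return "clean"
    verify_cache()  # the cache cannot be trusted, so check it before going online
    return "recovered"


def stop_service(state_dir, flush_all):
    """Flush everything first; only then record that the shutdown was clean."""
    flush_all()
    marker = os.path.join(state_dir, MARKER_NAME)
    with open(marker, "w") as f:
        f.write("ok")
        f.flush()
        os.fsync(f.fileno())  # make sure the marker itself reaches the disk


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(start_service(d, lambda: None))  # recovered (no marker yet)
        stop_service(d, lambda: None)
        print(start_service(d, lambda: None))  # clean
        # Simulate a power loss: the service is never stopped.
        print(start_service(d, lambda: None))  # recovered
```

A real implementation would also create the marker at first install, so a brand-new cache is not treated as a crash.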



 

Specifically, NTFS keeps new files and modifications in memory, and periodically flushes these changes to the drive. 
 
This memory caching occurs both for the data on the CloudDrive disk, and for the cache drive itself.  
This means that when data is flushed to the CloudDrive disk, it modifies the cache data, which then needs to be flushed to the cache drive.
 
As you can see, this can create a loop.  And since this happens a lot, it can cause the service to take a while to properly shut down, as the service needs to make sure that ALL of this data is flushed to disk before stopping. Otherwise data corruption can occur. 
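Here is a toy model of that loop, assuming (hypothetically) that applications keep writing while the flush is in progress. Each round flushes the CloudDrive disk's dirty data into the cache files and then flushes those to the cache drive; the service can only stop once a round ends with nothing new to flush. `rounds_to_quiesce` and the byte counts are illustrative, not measurements of CloudDrive.

```python
def rounds_to_quiesce(initial_dirty, arrivals, max_rounds=1000):
    """Simulate flushing until nothing is dirty.

    initial_dirty: bytes cached in memory for the CloudDrive disk at shutdown.
    arrivals:      bytes newly written during each flush round (hypothetical workload).
    Returns (rounds, total bytes written to the cache drive).
    """
    disk_dirty = initial_dirty
    written_to_cache_drive = 0
    for rnd in range(1, max_rounds + 1):
        # Flushing the CloudDrive disk modifies the cache files, so the same data
        # is now dirty in the cache drive's in-memory cache...
        cache_dirty, disk_dirty = disk_dirty, 0
        # ...and must in turn be flushed to the physical cache drive.
        written_to_cache_drive += cache_dirty
        # Writes that arrived during this round dirty the CloudDrive disk again.
        disk_dirty = arrivals[rnd - 1] if rnd <= len(arrivals) else 0
        if disk_dirty == 0:
            return rnd, written_to_cache_drive
    raise TimeoutError(f"still dirty after {max_rounds} flush rounds")


if __name__ == "__main__":
    print(rounds_to_quiesce(500, [200, 50, 0]))  # (3, 750)
    try:
        rounds_to_quiesce(500, [10] * 5000)
    except TimeoutError as exc:
        print(exc)  # still dirty after 1000 flush rounds
```

As long as new writes keep arriving, no round ends clean, which is why the model needs a `max_rounds` cutoff.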
 
Regards

 


 

I think you misunderstood me... The service isn't taking a while to shut down properly; it simply doesn't shut down properly. I waited 5 hours for it to shut down, and in the meantime the whole system was unusable. Explorer crashed, with no way to stop or restart the process.

Network shares were gone... The same is true for the DrivePool service: I'm also unable to stop it when this issue happens, and when I decide it's time for a restart, Windows 10 gets stuck at "Restarting".

 

This has happened 4 times already in 3 days.


I just checked the event log and found that when the problem happens, there are a lot of these entries:

 

The IO operation at logical block address 0x6412b8 for Disk 6 (PDO name: \Device\00000044) was retried.
The IO operation at logical block address 0x5eb520 for Disk 6 (PDO name: \Device\00000044) was retried.
The IO operation at logical block address 0x640838 for Disk 10 (PDO name: \Device\00000048) was retried.
The IO operation at logical block address 0x591210 for Disk 18 (PDO name: \Device\0000007a) was retried.

 

After a while these stop and these show up:

An error was detected on device \Device\Harddisk10\DR10 during a paging operation.
An error was detected on device \Device\Harddisk9\DR9 during a paging operation.
An error was detected on device \Device\Harddisk5\DR5 during a paging operation.

 

At this point the system is unusable until I hard-reboot it. I don't know whether my problem is related to CloudDrive, DrivePool, or a driver/hardware issue.


Do you know which disks these are? E.g., which drives they correspond to?
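As a first step toward answering this, a short Python sketch can tally which disk numbers appear in the retry and paging-error messages quoted above (`\Device\HarddiskN` corresponds to disk N). The regular expressions are written against those exact message texts and are assumptions about the format; mapping the disk numbers to drives would still be done in Disk Management or with PowerShell's `Get-Disk`.

```python
import re
from collections import Counter

# Patterns written against the two message texts quoted in the thread.
RETRY = re.compile(r"logical block address 0x[0-9a-fA-F]+ for Disk (\d+) ")
PAGING = re.compile(r"device \\Device\\Harddisk(\d+)\\DR\d+ during a paging operation")


def tally_disks(lines):
    """Count retried-IO and paging-error messages per disk number."""
    retries, paging_errors = Counter(), Counter()
    for line in lines:
        if m := RETRY.search(line):
            retries[int(m.group(1))] += 1
        elif m := PAGING.search(line):
            paging_errors[int(m.group(1))] += 1
    return retries, paging_errors


if __name__ == "__main__":
    log = [
        r"The IO operation at logical block address 0x6412b8 for Disk 6 (PDO name: \Device\00000044) was retried.",
        r"The IO operation at logical block address 0x5eb520 for Disk 6 (PDO name: \Device\00000044) was retried.",
        r"The IO operation at logical block address 0x640838 for Disk 10 (PDO name: \Device\00000048) was retried.",
        r"The IO operation at logical block address 0x591210 for Disk 18 (PDO name: \Device\0000007a) was retried.",
        r"An error was detected on device \Device\Harddisk10\DR10 during a paging operation.",
        r"An error was detected on device \Device\Harddisk9\DR9 during a paging operation.",
        r"An error was detected on device \Device\Harddisk5\DR5 during a paging operation.",
    ]
    retries, paging = tally_disks(log)
    print(sorted(retries.items()))  # [(6, 2), (10, 1), (18, 1)]
    print(sorted(paging.items()))   # [(5, 1), (9, 1), (10, 1)]
```

On the seven lines quoted above, it reports retries on disks 6, 10, and 18 and paging errors on disks 5, 9, and 10, so Disk 10 is the only one that appears in both lists.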

 

StableBit DrivePool experiences the LBA retry issues frequently, and they're harmless there.

 

However, the paging operation stuff may be part of the problem here.

 

Could you get logs when this happens?

http://wiki.covecube.com/StableBit_CloudDrive_Drive_Tracing

 

And ... I hate to ask, but initiate a crash dump:

http://wiki.covecube.com/StableBit_DrivePool_System_Freeze


I don't use CD and know nothing about this, but reading it, I wonder: would it make sense to disable write caching (at least on the drive that holds the cache)?

 

It may help, but if there is a lot of activity on the drive, it will adversely affect performance.

 

Additionally, while write caching may be disabled, Windows will still cache NTFS info. 

 

 

That said, IIRC, we're using specific API calls for the cache data that should bypass a lot of the delay (e.g., "paging" I/O calls).
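As a rough analogy for the performance trade-off (not a model of CloudDrive's paging I/O, and not the same as the device-level write-caching policy in Windows), the sketch below writes the same blocks twice: once letting the OS cache absorb them, and once forcing each block to the device with `os.fsync` before the next write. The file name and block sizes are arbitrary choices for illustration; timings will vary by hardware.

```python
import os
import tempfile
import time


def write_blocks(path, blocks, flush_each):
    """Write blocks to path; if flush_each, push every block to the device
    with os.fsync before writing the next one."""
    with open(path, "wb") as f:
        for block in blocks:
            f.write(block)
            if flush_each:
                f.flush()             # Python's buffer -> OS cache
                os.fsync(f.fileno())  # OS cache -> storage device
        f.flush()
        os.fsync(f.fileno())  # both variants end with the data on disk


def timed(path, blocks, flush_each):
    start = time.perf_counter()
    write_blocks(path, blocks, flush_each)
    return time.perf_counter() - start


if __name__ == "__main__":
    blocks = [os.urandom(64 * 1024) for _ in range(200)]  # 200 x 64 KiB
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "test.bin")
        cached = timed(path, blocks, flush_each=False)
        through = timed(path, blocks, flush_each=True)
        print(f"OS-cached writes: {cached:.3f}s, flush per block: {through:.3f}s")
```

On most storage the flush-per-block run is typically noticeably slower, which mirrors the point that disabling write caching can hurt performance when the drive is busy.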
