Covecube Inc.
  • 0
dragon2611

I/O deadlock?

Question

Is it possible to get into a situation where the I/O seems to deadlock? I have a 2012 R2 machine from which I was copying 300 GB or so of data to OneDrive for Business, including lots of small files.

It looks like the copying has stopped, but I also cannot bring up Task Manager, and a shutdown -r -f -t 30 command is hanging. I'm hoping that it's CloudDrive and not one of the disks in the box failing.


  • 0

 

I'll install this build on the machine I was trying to use last time and see if it solves the issue.

 

Edit: The installer sat for a very long time and eventually stated that the service failed to start (x64 version on Server 2012 R2).

 

From the event log:

Service cannot be started. System.TypeInitializationException: The type initializer for 'CloudDriveService.Main' threw an exception. ---> System.IO.FileNotFoundException: Could not load file or assembly 'Cove.Util, Version=1.0.5689.32472, Culture=neutral, PublicKeyToken=4c28787c0620f424' or one of its dependencies. The system cannot find the file specified.
   at CloudDriveService.Main..cctor()
   --- End of inner exception stack trace ---
   at CloudDriveService.Main.#vt()
   at #eae.#Efc.#Bfc(String[] #qRc)
   at System.ServiceProcess.ServiceBase.ServiceQueuedMainCallback(Object state)

Was this a clean installation or an upgrade?

And if it was an upgrade, to what version and from which version?

 

Just had the 'deadlock' lockup again. Happened while drivepool was duplicating files to the clouddrive. Will upload a dump shortly.

 

*edit* Uploaded here. Drive is doing recovery right now.

Okay. Could you uninstall the current version, and make sure that "C:\Program Files\StableBit\CloudDrive" is empty. 

If it's not, manually stop its services in "services.msc", use Task Manager to "end" the notification app, delete the directory, reboot, and then reinstall.

  • 0

Presumably I hit the API limits for OneDrive, as after the reboot I wasn't getting any upload/download.
 

It actually echoes the experience I had with StorageMadeEasy, where the OneDrive/Office 365 API was too unstable for handling large files.

  • 0

Might be the same crash I was having, though I was using Amazon Cloud Drive. During heavy I/O (I was duplicating via DrivePool at the time) the system stops responding (ish). The mouse still moves, it still responds to pings, I can move windows around (though they're all 'not responding'), and media continues to play, but I can't Ctrl+Alt+Del or remote desktop into it - it needs a hard restart.

 

I've been unable to reliably reproduce it, so I haven't been able to get logs or memory dumps when it happens.

 

As for handling large files - it looks like CloudDrive stores data at block level, so individual file size might not matter.

  • 0

Same thing I was seeing. I couldn't interact with Explorer or open anything, but services that were running on the machine seemed to still be accessible (e.g. the SoftEther VPN server was still working, as I could VPN to the box).

 

I think the problem is caused by large files or lots of small files, essentially anything that causes a large number of API calls. My best guess is that the API gets rate limited, or something else disrupts the upload process, and when Explorer tries to access the drive it gets stuck waiting for an I/O operation.

 

Of course I could be way off the mark.
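
To make the guess above concrete, here is a minimal sketch of how a client could handle a rate-limited API without leaving the caller waiting forever: retry with exponential backoff and jitter, but give up after a fixed number of attempts. This is only an illustration of the general technique, not how CloudDrive actually works. `RateLimited`, `call_with_backoff` and all the default values are hypothetical names and numbers chosen for the example.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised when the provider throttles a request."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call request(), retrying on RateLimited with capped exponential backoff.

    The last failure is re-raised, so the caller gets an error
    instead of blocking indefinitely.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Double the ceiling on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random amount up to the ceiling.
            sleep(random.uniform(0, delay))
```

The point of the bounded retry count is that whatever sits on top (Explorer, in this case) eventually sees a failed I/O rather than one that never completes.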

 

I've uninstalled it for now, as I was only really interested in using it with either OneDrive or Hubic (not yet supported). The main reason I was interested was the client-side encryption (i.e. I want to use these providers, but that doesn't mean I want to have to trust them 100%).

  • 0

I think I'm experiencing this as well. I'm trying to sync ~160 GB of surveillance data to ACD. After hours my RAM is completely consumed, and then at some point the server just reboots. Chiming in to follow the thread.

(Attached image: post-1884-0-12596000-1435603448_thumb.png)

  • 0

@Dragon2611:

OneDrive for Business/Office 365 uses SharePoint as the backend, IIRC. And that can become problematic if you have a lot of data. We've implemented workarounds for that but ....

And I do believe that they throttle the connection.

 

 

But yes, files are stored in blocks. Specifically, you can set the "chunk" size, which is the block size that we store the raw data in.  Changing these sizes can absolutely change the performance profile for the drive. 
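
As a rough illustration of why the chunk size matters more than the size of any individual file, the sketch below counts how many fixed-size chunks a byte range on the drive overlaps. The function name and the two chunk sizes are assumptions picked for the example, not CloudDrive's actual values or code.

```python
def chunks_touched(offset: int, length: int, chunk_size: int) -> range:
    """Return the indices of the fixed-size chunks that a byte range overlaps."""
    if length <= 0:
        return range(0)
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return range(first, last + 1)


MiB = 1024 * 1024

# The same 300 MiB of data, written from offset 0, under two hypothetical chunk sizes.
for chunk_size in (1 * MiB, 10 * MiB):
    count = len(chunks_touched(0, 300 * MiB, chunk_size))
    print(f"{chunk_size // MiB} MiB chunks -> {count} chunks to upload")
```

With these numbers the same data maps to 300 chunks at 1 MiB and 30 chunks at 10 MiB, so a larger chunk size means fewer, larger requests to the provider, regardless of whether the data is one big file or many small ones.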

 

 

Anyone experiencing the lockup, could you get a memory dump?

http://wiki.covecube.com/StableBit_CloudDrive_System_Freeze

 

I know that StableBit CloudDrive is very memory intensive, but if it's causing issues, that's something that needs to be addressed.

  • 0

It had another lockup this morning. I've now run that reg tweak and rebooted, so next time it happens I'll hopefully be able to get a dump.

  • 0

Caught and dumped. Compressing and uploading atm.

 

This crash occurred during duplication, though it does occur at other times too when the disk isn't part of a drive pool - maybe during I/O of some kind?

  • 0

Is there any way to see when these issues are looked at? Or should I just keep an eye on the changelog? I'd like to know when it's good to test again or if you need more info/dumps.

  • 0

Sorry for the delayed response here.

 

 

It was in Windows 8.1 rather than Server 2012 (not sure if this is important). Could you create another issue for the resize-while-uploading bug too? http://community.covecube.com/index.php?/topic/1297-drivepool-integration-questions/&do=findComment&comment=8697 (posts #4 and #5).

The OS can definitely matter, so it's always good to include that info.

 

 

As for the resize issue, given its timing and the "deadlock" type issue, I'm not bugging it at this time.

 

Don't worry, we're not ignoring it. However, it looks to be related specifically to the issue with the cache that causes the system to lock up (deadlock). The drive is ... effectively getting too much I/O and is having issues handling it. And that's causing issues with reads, including pinning NTFS metadata (which would include volume/disk size info).

 

Once that issue is resolved, we will make sure that the resizing functionality is working properly, and fix the issue if it's still happening.

  • 0

Alex is actively working on the issue, but it is a very, very complicated issue.  

 

I'll bug him about it, though. 

 

Also, the above link (this one: https://stablebit.com/Admin/IssueAnalysis/17719) is where Alex will post any status updates, including when it's fixed (as well as the version number of the fixed version).

And I'll definitely let you know as soon as we have a fix for it. 

  • 0

Do you know if data loss/corruption can occur when this happens?

 

I copied several hundred files of 500 MB to 1 GB each onto a cloud disk, and during upload this deadlock thing happened twice. Once uploading had finished, several of the files turned out to be corrupted (they had different file hashes than the originals). The corrupted files seemed to be in two distinct clusters when sorted by file modified time (i.e. they were copied sequentially). I'm wondering if chunks of these corrupted files were being uploaded at the time of the deadlock, and weren't caught by recovery after a restart. Is it possible that chunks being uploaded when this crash occurs can become corrupted?
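
For anyone wanting to run the same kind of check, a minimal sketch of the comparison described above follows: hash every file under the original folder and compare it with the copy at the same relative path on the cloud disk. The function names are made up for the example, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, buf_size: int = 1024 * 1024) -> str:
    """Hash a file in 1 MiB blocks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(buf_size):
            digest.update(block)
    return digest.hexdigest()


def find_mismatches(source_dir: Path, copy_dir: Path) -> list[str]:
    """Return relative paths whose copy is missing or has a different hash."""
    bad = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        dst = copy_dir / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            bad.append(rel.as_posix())
    return bad
```

A call would look like `find_mismatches(Path("D:/originals"), Path("X:/copies"))`, where both paths are placeholders for the local source folder and the folder on the cloud disk.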

  • 0

It shouldn't corrupt it.

 

However if the system crashed/BSODed, then it's definitely possible (as there is always a possibility of corruption in this case).

 

However, a chkdsk pass of the CloudDrive should detect and correct any issues.

  • 0

Anyone seeing this issue, please try the latest beta build:

 
 
 
Alex (the developer) has significantly overhauled the caching system, and it should work much better now. Additionally, when the cache drive is running out of space, it will throttle reads and writes (and block writes if it gets too slow).
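
The general idea of throttling on cache pressure can be sketched as follows. This is not the actual CloudDrive implementation, only an illustration under assumed numbers: the function name, `soft_limit`, `hard_limit` and `max_delay` are all hypothetical.

```python
def write_delay_seconds(free_bytes: int, total_bytes: int,
                        soft_limit: float = 0.20, hard_limit: float = 0.05,
                        max_delay: float = 2.0):
    """Return a delay to apply before a write, or None to block the write.

    soft_limit, hard_limit and max_delay are made-up illustration values.
    """
    free_fraction = free_bytes / total_bytes
    if free_fraction >= soft_limit:
        return 0.0   # plenty of room on the cache drive: no throttling
    if free_fraction <= hard_limit:
        return None  # nearly full: hold the write until space frees up
    # Scale the delay linearly from 0 at soft_limit up to max_delay at hard_limit.
    pressure = (soft_limit - free_fraction) / (soft_limit - hard_limit)
    return pressure * max_delay
```

The free and total byte counts could come from `shutil.disk_usage()` on the cache drive. Slowing writers down gradually, rather than accepting everything until the drive is full, gives the uploader time to drain the cache.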
 
There are some other fixes as well, including to how throttling codes from Amazon Cloud Drive are handled, though that provider may still have problems due to authorization issues.
 
 
However, we do recommend uninstalling the current version of CloudDrive, deleting the contents of "C:\Program Files\StableBit\CloudDrive" and then installing the new version (due to an installer issue we've identified recently). 

  • 0

Great to see progress on this. Unfortunately I'm still unable to reattach drives in this build (bug in other thread), so can't really give it a proper testing yet.

  • 0

Great to see progress on this. Unfortunately I'm still unable to reattach drives in this build (bug in other thread), so can't really give it a proper testing yet.

Definitely. Alex has been devoting a lot of time to trying to fix this issue, because it's a huge one, unfortunately. 

 

As for the reattach bug, I believe I've flagged that for Alex already, and we'll look into it soon.

  • 0

I'll install this build on the machine I was trying to use last time and see if it solves the issue.

 

Edit: The installer sat for a very long time and eventually stated that the service failed to start (x64 version on Server 2012 R2).

 

From the event log:

Service cannot be started. System.TypeInitializationException: The type initializer for 'CloudDriveService.Main' threw an exception. ---> System.IO.FileNotFoundException: Could not load file or assembly 'Cove.Util, Version=1.0.5689.32472, Culture=neutral, PublicKeyToken=4c28787c0620f424' or one of its dependencies. The system cannot find the file specified.
   at CloudDriveService.Main..cctor()
   --- End of inner exception stack trace ---
   at CloudDriveService.Main.#vt()
   at #eae.#Efc.#Bfc(String[] #qRc)
   at System.ServiceProcess.ServiceBase.ServiceQueuedMainCallback(Object state)

  • 0

Just had the 'deadlock' lockup again. Happened while drivepool was duplicating files to the clouddrive. Will upload a dump shortly.

 

*edit* Uploaded here. Drive is doing recovery right now.
