Posts posted by modplan

  1. FYI I'm impatient  :D so I am retrying with a new disk on .533 with the same dataset. Even if I do not see the problem again, I wouldn't be 100% confident the problem is solved, but I'm willing to spend a few days uploading to find out.

     

    My question is, should I turn on drive tracing for the entire process? My C: drive is a small SSD; will the logs get massive and use up a lot of space?

  2. Thanks for the response, Chris. Not sure I understand; my question likely wasn't clear, haha. Do we think that in the latest builds this issue is: Not Resolved? Possibly Resolved? Definitely Not Resolved?

     

    If Alex hasn't been able to reproduce....maybe we have no clue and none of the above?

     

    EDIT: Whoops, never mind. I think you're saying that Alex was on the road to reproducing this issue via testing, hit some bugs that could have caused it, fixed some of those bugs, but had to restart the test and hasn't been able to repro it since? So we still aren't quite sure where we stand on it? Anything I can do to test/help?

  3. He's still working on it, sorry. 

     

    However, he has found a few low-level bugs because of this and some of the integrity testing we've (he's) been doing (and some of this may be related to what you're seeing). 

     

    Thanks, Chris. So I assume then that, while my logs uncovered some other bugs for Alex, the fixes in and up to .518 probably haven't solved this particular issue yet?

  4. Has Alex had any luck root-causing this yet? Anything I can do or any tests I can run? CloudDrive is currently sitting idle for me until this is resolved, since I don't want to dedicate days of uploading again only to have the drive become worthless. 

     

    Willing to dedicate some time doing whatever I can to sort this out if Alex could use anything from me. 

  5. Hi guys,

     

    I just caught up with the changelist, and I noticed that "unencrypted" drives are now actually encrypted with a public static key. Are there plans to publish this key? Has it been published anywhere? I think it's important to have access to it for data recovery purposes.

     

    Thanks!

     

    Expanding on that, do we have a data recovery procedure at all yet? Even if we know the key, that wouldn't help me stitch all those chunks back together and recover my files. 

     

    I really like what Arq Backup does. They have an open-source program on GitHub that does nothing but download and decrypt chunks, allowing you to recover your files. Handy if the developer suddenly disappears, the app stops working years down the road once it's no longer supported, or any number of other scenarios. 

     

    The data format is even fully documented. https://sreitshamer.github.io/arq_restore/

     

    Hopefully Alex has something similar on the roadmap (I think I may have seen it mentioned that he does). 
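
    Just to illustrate the kind of tool I mean, here is a rough sketch in Python. Every detail below (the chunk size, the naming, how decryption is done) is an assumption on my part, not CloudDrive's actual on-provider format; only Alex could document that:

        # Purely illustrative: assumes chunks are fixed-size pieces of the raw volume,
        # named with a trailing sequential ID, and that a decrypt() for the published
        # static key is available. The real chunk layout and crypto may differ.
        import os

        def rebuild_raw_image(chunk_dir, out_path, decrypt=lambda b: b):
            """Stitch downloaded (and decrypted) chunks back into one raw disk image."""
            names = sorted(os.listdir(chunk_dir), key=lambda n: int(n.split("-")[-1]))
            with open(out_path, "wb") as image:
                for name in names:
                    with open(os.path.join(chunk_dir, name), "rb") as f:
                        image.write(decrypt(f.read()))
            # The resulting image could then be attached as a VHD/loopback volume and
            # the NTFS filesystem read with normal tools.

    Something along those lines, plus a documented format, is essentially what arq_restore provides, and it would go a long way for peace of mind.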

  6. Chris,

     

    Saw Alex's update on the issue analysis. I can't add an update there, so please let him know there might have been a power outage several days to a week before the deletes started happening. I know we had a power outage during a storm last week. What I can't remember is whether or not I had created this drive and started uploading yet when the outage happened.

     

    If the power outage did happen after the drive was created, it would have been after all data was copied to the drive and the drive was marked RO. I remember monitoring the copy operation, and the outage was either before that (and before the drive was created) or afterwards during the upload cycle. Sorry that I can't pinpoint it exactly. 

     

    Not sure if that matters, but again, just trying to provide more info. 

  7. Just for further info. The drive is now completely uploaded. "To Upload" = 0B

    I was hoping that at the end of the upload cycle CD might re-upload the chunks it previously deleted, but that does not appear to be the case.

     

    To Recap:

    - ~775GB was copied to the cloud drive and we started uploading

    - This specific cloud drive was then marked as read only with diskpart and has been that way the entire time since

    - Some previously-uploaded chunks were deleted by clouddrive on 2 separate occasions during the upload cycle

    - Each time I noticed deletions happening, upload was paused for a while; on resume, normal uploads started and there were no further deletions

    - We are now fully uploaded (several days later) and "To Upload" = 0B

     

    All files still show up in Windows Explorer (pinned metadata, I assume), but any file that had a chunk in the deleted range is now, of course, corrupt. 

    No significant errors were thrown by the CD GUI during this time.

     

    I'm hopeful Alex can repro this; please let me know if I can provide more info.

  8. Modplan:

     

    As for the deletion, I just wanted to make sure you knew the cases in which we do delete files from the provider, and the *only* cases in which it should. 

     

    As for the memory test, this is to make sure that the "WriteMap" and other in-memory objects aren't getting corrupted by bad memory. And since NTFS uses memory extensively for caching, it's a good idea in general. 

    As for the diskpart stuff, this was for the CloudDrive disk, correct? 

    If so, well, I don't think this would cause that issue, but I've let Alex know, just in case. 

     

     

    And I'm sorry to hear that it's happened again, but I'm glad to hear you were able to enable logging when this happened. That should definitely help identify why this was happening. And I've flagged the logs for Alex already. 

    And hopefully, this is an easy-to-find issue.  

     

    Hey Chris, yes, I just wanted to be as thorough as possible in my response to aid in getting to the bottom of this. 

     

    Yes, the read-only flag was set on the CloudDrive drive itself, via diskpart. I agree, from my limited understanding, that this shouldn't cause an issue, but I just wanted to provide all the info I could think of. 

     

    Hopefully the logs tell the full story. 

  9. Possibly relevant

     

    AFTER all data had been copied to this drive and was in the To Upload "cache", the drive was marked as read-only with diskpart:

    att vol set readonly
    

    I do this semi-often with physical drives that are meant for archive purposes ONLY, in order to prevent any kind of corruption, accidental deletion, etc. They are only marked as RW when data needs to be written, then back to RO they go.

     

    This is the first time I have done this on a CloudDrive drive. I do not think setting this NTFS flag should have any interaction with CloudDrive or cause any issues, but I wanted to point this out, since it is the only anomaly I can currently find that is different than my previous testing and uploads.

  10. Thanks, Chris. Just to point some things out one by one:

     

    To let you know, there are a few circumstances in which chunks can be deleted from a provider:  

    • Destroying a drive
    • The chunk contains only 0's (provider-specific; Google Drive is one of them).
    • (Google Drive only) When the MIME type error is generated, the chunk is deleted and re-uploaded

    These are the only times that it should ever delete a chunk on the pool. Period. 

     

    • The drive was not destroyed; we were just uploading along. 
    • As for only zeroes, I guess that could be a possibility, but only if CloudDrive was actively overwriting these previously uploaded chunks with all 0's, and I don't see why it would do so.
    • No MIME errors were generated in the GUI (I'm very familiar with this error; I believe I was the first to report it), and the chunks were never re-uploaded. They're gone.

     

     

    However, as for what you're describing about the "0%" parts, I've talked with Alex already, and this is normal, at least for chunks with no data. 

     

    The chunks had data; they had been uploaded over 24 hours prior. Then I saw tons of writes at 0%, and then Google Drive reported the chunks as "permanently deleted". I verified that the exact chunk IDs I saw at 0% were the exact ones shown as deleted in the Google Drive GUI. Unless you are commenting on the prefetch threads stuck at 0%? If so, those chunks not only have no data now, they are completely gone from Google Drive.

     

     

     

    That said, this is very very odd, and you're the only one to report something like this. And Alex has been doing extensive testing for at least the last few days.

     

    I agree; I must have uploaded 10-15TB to various drives on almost every beta version while testing this product over the past 3-4 months. This is the first time I've seen this. And to see it happen on its own, in the middle of an upload cycle, with no action or intervention on my part, is scary.

     

     

    Also, could you check the Event Viewer on the system? Check the "Windows Logs", and the "System" section. Check for any disk, NTFS or controller related errors. Or just export the entire log (unfiltered) and upload it using the same link you used for the logs. That way, we can take a look at it.

     

     

    Also, I'd highly recommend running a memory test (extended), if you haven't already done so recently.  

     

    I don't see anything out of the ordinary in 'System' or 'Application' around the time this started. A memtest is likely worth doing, but I don't think a bad DIMM would cause CloudDrive to suddenly go on a deletion spree. As we can see in the Google Drive screenshot above, Google lists the application that initiated the action, in this case a permanent delete, and we can see "Stablebit Clo..."

     

     

     

    And it may be worth turning on Upload Verification, in this case. That downloads and checks the chunks after upload. That may help prevent issues from occurring. 

     

    Upload verification is on and has been on since this drive's creation. In fact, after the 0% writes that deleted the chunks finished, read-verify threads were spawned, which were also stuck at 0%, effectively verifying the delete. It really seems to me that CloudDrive thought it should delete these chunks, and even verified doing so, but it caused corruption.

     

    -------------------

     

    I'm not terribly concerned about this specific data; it was a backup. All I will lose if I have to blow this drive away and start over is a few days of upload time. However, I want to provide as much info as possible to hopefully root cause this and prevent it from ever happening again, to me or anyone else. Right now I have files that show up in Windows Explorer and look normal (due to pinned metadata, I assume) but that are completely corrupt. If I had not been watching "Technical Details" when this occurred and started investigating, I would assume that all my data is perfectly safe in the cloud right now. It isn't.

     

    -------------------

     

    I resumed the upload cycle late last night, several hours after my post. Since I've pretty much written off this drive and all of its data, I figured I would let it resume uploading and see if it somehow magically re-uploaded all of the chunks it deleted. If it does, we still have a problem, since for about 24 hours those chunks will have existed neither in the cloud nor in the cache, so the files are corrupt, but I would be glad if all of my data did eventually become whole.

     

    So far, no more deletions have occurred; all night we were chugging along uploading. Currently we are uploading in the chunk ID range of 57,XXX, a far cry from the ~28,800 to ~31,300 range that was deleted. But maybe we will circle back after all new data has been uploaded and re-upload these chunks? We'll see, in about 20 hours.

  11. Google Drive

    .486

    win2k8r2

     

     

    I've paused the upload threads and it has stopped, but about 2,500 chunks were just "permanently deleted".

     

    I copied about 775GB onto a brand new 10TB drive. Over 440GB of that was successfully uploaded, and it was chugging along like normal. Nothing changed. Watching Technical Details, I noticed that suddenly the upload threads were popping up at 0% and never progressing; then new ones would pop up at 0%, and this continued over and over. No errors were thrown by the GUI, and there was nothing abnormal in the logs.

     

    I logged in to the Google Drive web GUI and I see tons of this:

    [Screenshot: the Google Drive activity log showing the chunks being permanently deleted, with "StableBit Clo..." listed as the initiating application]

     

    Via trial and error, I found some video files that had chunks in this range (~28,800 to ~31,300). When I try to play these files, tons of prefetch threads spawn in this range and stay at 0%, and the "Prefetched" count jumps up to 500MB instantly. The file never plays in VLC. If I try to open an image file that has chunks in this range, Windows image viewer tells me it is corrupt. 

     

    Any ideas how this could possibly happen? Pretty disappointed right now, this is some serious corruption.

     

     

    EDIT: I've uploaded what logs I have, but I do not see anything abnormal in them. I did NOT have drive tracing enabled and am obviously scared to enable it and resume uploads for fear of more chunks being deleted from the cloud!

  12. Well, Alex posted a reply. There isn't a "fix" yet, but it's on his mind. 

     

    That said, it may be that the specific chunk in question was being repeatedly updated, causing it to be uploaded again and again. 

     

    Hey Chris,

     

    I've seen this issue as well, normally with lower chunks (often chunk #55 or #64 in my case). It appears to me that CloudDrive logs the updates to that chunk and uploads each single update individually, instead of coalescing all of the updates together and uploading a single chunk. Note that I see this after setting "upload threads" to 0 and then setting them back to the normal value after my copy/change/write is done, so the chunk is definitely not being continuously updated during the upload. I would assume all updates to the chunk would then be coalesced into a single upload, instead of the chunk being uploaded over and over, once for each previous update. Just wanted to throw in my $.02: I see this when updating the archive attribute on files with backup software. That attribute exists for each file in the MFT, which is stored on these blocks that get continuously uploaded (64, 55, whatever).
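
    To make the behavior I'd expect concrete, here is a minimal sketch of coalescing in Python. This is purely pseudocode for the idea; I have no insight into how the actual driver tracks writes, so the names here are mine:

        # Writes only mark a chunk ID dirty; the uploader drains the dirty set,
        # so a chunk touched 100 times while upload threads are at 0 still gets
        # uploaded exactly once when they are turned back on.
        import threading

        class ChunkUploader:
            def __init__(self, read_chunk, upload_chunk):
                self._dirty = set()                # chunk IDs with un-uploaded changes
                self._lock = threading.Lock()
                self._read_chunk = read_chunk      # chunk_id -> current bytes
                self._upload_chunk = upload_chunk  # (chunk_id, bytes) -> None

            def on_write(self, chunk_id):
                with self._lock:
                    self._dirty.add(chunk_id)      # repeated writes collapse into one entry

            def drain(self):
                with self._lock:
                    pending, self._dirty = self._dirty, set()
                for chunk_id in sorted(pending):
                    self._upload_chunk(chunk_id, self._read_chunk(chunk_id))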

     

    Maybe that helps shed some light? My upload is fast, so it is easy to ignore for me, but it does waste time.

     

     

    EDIT: Take a CloudDrive drive with many files and change some NTFS attributes on some/all of the files with upload threads set to 0. Turn the upload threads back on, and I bet you will see a single chunk, or several chunks, upload over and over again. A good way to repro this.

     

    EDIT2: I work in SAN dev for an enterprise SAN vendor. I have no insight into how Alex's Windows driver works, but feel free to PM me for more info; I have a faint hunch as to what is going on here from a SCSI perspective, and I'm more than willing to help root cause this and get to the bottom of it (along with any other issues I can help with). But I doubt you guys need my help if you can repro it  :D

  13. Thanks so much for the info.

     

    For curiosity's sake,

    Since I can only have the drive attached to 1 CloudDrive installation at a time,

    can I share this drive via Windows sharing or SMB so other computers have access to it, without any issues? I assume yes, but I want to make sure :).

    Yes

  14. I just got a checksum mismatch on one of my disks. chkdsk shows no errors. What is my course of action here? A few files I checked seem fine; is there some silent corruption going on somewhere? I got this error when changing the label of the drive... however, I think I may have seen it before on this drive a while back. Upload verification has been turned on and off multiple times on this drive as we have gone through all the betas, while I have been tweaking my upload/download threads to maximize performance and to see if I could tolerate upload verification being turned on (it is currently ON).

     

    Google Drive

    .470

    win2k8r2

  15.  

    Thanks thnz, looks like you are right. 

     

    Which leads to a broader conversation: is this the right way to handle drives with large caches? Could a "chunk tracking database" that marks locally cached chunks as fully uploaded be used to prevent the wholesale re-upload of the cache, so that only the chunks not marked as previously successfully uploaded get re-uploaded? If someone with somewhat limited bandwidth sets a 500GB cache on a large drive and suffers a power outage, but 498GB of that cache was previously perfectly uploaded, this wholesale re-upload would take weeks.

     

    Edit: If a "chunk tracking database" is not viable, maybe download the chunks and compare the checksums to the local chunks? Most people have MUCH faster download, than upload. So only the needed chunks would be re-uploaded.

  16. What OS are you using?

    What version of StableBit CloudDrive are you using?

     

    What is the size of the drive?

    What is the size of the cache (you've indicated that it's 100GB, but I'm not entirely sure about this)? 

     

     

    To clarify, did you restore any files or the system from the backup, and then continue backing up the system? 

    If so, that *may* have been the cause of the issue here, specifically.

     

    Otherwise, could you let me know exactly what happened?

     

    Sorry, info I know I should have provided. 

     

    2k8 R2

    .470 (but I have seen this several times while using CloudDrive, basically any time there is an unexpected reboot; the version does not matter)

    Drive is 10TB 

    Cache is currently 100GB, I've been playing with different sizes.

     

    No files have been restored. In fact, this seems to have nothing to do with running a backup; in my experience, any local cache data is moved to "To Upload" after "Drive Recovery" following an unexpected reboot. This is just the only drive I currently have a cache on. I have seen this on other drives with standard files, back when I had a cache on them.

     

     

    Basically:

    - Create Drive with a cache

    - Add data to the drive, ensure that the cache has some data in it

    - Ensure "To Upload" = 0B

    - Pull the power plug/force BSOD

    - On Boot, drive will go through lengthy "Drive Recovery" procedure

    - After "Drive Recovery", ALL local data from the cache is added to "To Upload" and re-upload begins, ALL this data already exists in the cloud

    - If you have a large cache, this takes forever

     

    Hope that makes sense?

  17. Hi,

     

    Are there any plans to support EMC Atmos storage?

     

    I could certainly be wrong, but I think this is more of a consumer-focused product. Does Atmos have an S3-compatible API like NetApp StorageGRID does? I could see CloudDrive maybe implementing a generic S3 API driver, like some other cloud-facing apps have done.
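
    For what it's worth, the appeal of an S3-compatible target is that one driver covers many backends. A quick illustration with a standard S3 client; the endpoint, bucket, and credentials below are placeholders, and whether a given Atmos or StorageGRID deployment exposes such a gateway depends on how it's set up:

        # Point a standard S3 client at a non-AWS, S3-compatible endpoint.
        import boto3

        s3 = boto3.client(
            "s3",
            endpoint_url="https://s3.example-private-cloud.local",  # hypothetical gateway
            aws_access_key_id="ACCESS_KEY",
            aws_secret_access_key="SECRET_KEY",
        )

        # The same calls work against AWS S3 or any compatible store.
        s3.put_object(Bucket="clouddrive-chunks", Key="chunk-00001", Body=b"...")
        data = s3.get_object(Bucket="clouddrive-chunks", Key="chunk-00001")["Body"].read()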

  18. Sorry if this has been covered; a quick search did not find what I was looking for. 

     

    I have a CloudDrive that I send windows server backups to nightly. The full backup size is about 75 GB, but the nightly incremental change is only 6-7GB, easily uploadable in my backup window. 

     

    I have set the cache on this drive to 100GB to ensure the majority of the "full backup" data is stored in the cache, so that when Windows Server Backup is comparing blocks to determine the incremental changes, CloudDrive does not have to (slowly) download ~75GB of data for comparison every single night. 

     

    This works very well.

     

    The problem comes when there is a power outage, crash, BSOD, etc. Even though the CloudDrive is fully current and "To Upload" is 0, when I bring the server back up, after we go through drive recovery (which takes 5-8 hours for this 100GB), CloudDrive then moves ALL 100GB of the cache into "To Upload" and starts re-uploading all of that data.

     

    Why? I can't think of how this is necessary. In case a little data was written to the cache at the last minute before the unexpected reboot? If so, there is certainly a better way of handling this than a 100GB re-upload, some sort of new-unuploaded-blocks tag/database? What if a drive has a massive cache? A re-upload could take days or weeks!

     

    Thanks for any insight. I've gone through this process a couple of times using CloudDrive, and it has been painful every time. I'd be happy even if we downloaded every single block that is cached, compared it to the local cached block, and then only uploaded the ones that have changed.
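
    To sketch what that "download, compare, re-upload only what changed" pass could look like (Python pseudocode; download_chunk, upload_chunk, and the chunk IDs are placeholders, not real CloudDrive APIs):

        import hashlib

        def resync_after_crash(cached_chunks, download_chunk, upload_chunk):
            """cached_chunks maps chunk ID -> bytes held in the local cache."""
            for chunk_id, local_data in cached_chunks.items():
                remote = download_chunk(chunk_id)  # downloads are usually far faster than uploads
                if remote is None or hashlib.sha256(remote).digest() != hashlib.sha256(local_data).digest():
                    upload_chunk(chunk_id, local_data)  # only missing/changed chunks go back up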
