Everything posted by srcrist

  1. I mean, this assumption is faulty, and important data should never be entrusted exclusively to a single backup solution--cloud or otherwise. There is no such thing as a perfect backup solution. There is a reason the 3-2-1 backup rule exists: 3 copies, 2 different types of media, at least 1 offsite location. If you're relying exclusively on CloudDrive or any other storage solution as your only backup option, you're going to have a bad time. Google rolled back data and corrupted many CloudDrive volumes (we think), and they may do the same in the future. Google Drive is a consumer solution, after all; it is not intended to meet the needs of sensitive business data. CloudDrive is a remarkable product and takes a great number of steps to oversee the integrity of your data, but it can't work miracles. Cloud storage has problems of its own and should not be relied upon exclusively. At a minimum, sensitive and important business data should go on CloudDrive with an enterprise provider like Google Cloud Storage, Amazon S3, or Microsoft Azure--not a consumer provider like Google Drive, which is where we saw corruption last month. Linux ISOs and media files are one thing, but I wouldn't even use CloudDrive with Google Drive to store copies of family photographs that I didn't have another backup for.
  2. Again, other providers *can* still use larger chunks. Please see the changelog: this was because of issue 24914, documented here. Again, this isn't really correct. The problem, as documented above, is that larger chunks result in more retrieval calls to particular chunks, thus triggering Google's download quota limitations. That is the problem I could not remember. It was not because of concerns about speed, and it was not a general problem with all providers. EDIT: It looks like the issue with Google Drive might be resolved with an increase in the partial read size, as you discussed in this post, but the code change request for that is still incomplete. So this prerequisite still isn't met. Maybe something to follow up with Christopher and Alex about.
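     To put rough numbers on why bigger chunks trip the download quota, here's a quick back-of-the-envelope sketch in Python. The partial read size is just an assumed placeholder, not CloudDrive's actual internal value; only the shape of the math matters.

```python
# Rough arithmetic only: estimate how many ranged download requests hit a
# single chunk object when reads are satisfied with fixed-size partial reads.
# PARTIAL_READ_MB is an assumed placeholder, not a CloudDrive internal value.
PARTIAL_READ_MB = 1

for chunk_mb in (10, 20, 100):
    requests_per_chunk = chunk_mb / PARTIAL_READ_MB
    print(f"{chunk_mb} MB chunk -> up to {requests_per_chunk:.0f} "
          f"download calls against the same file on the provider")
```

     Bigger chunks mean the same file object on the provider gets hit over and over, which is exactly the pattern Google's per-file download quota punishes.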
  3. Christopher, I'm sure you've spoken with Alex about this issue. I'm just wondering if there's been any discussion of infrastructural changes that might be able to improve the reliability of the file system data? I was wondering if, for example, CloudDrive could store a periodic local mirror of the file system data which could be restored in case of corruption? I don't know enough about NTFS and how the journal and such are stored on the drive to know if this is feasible or not. It just seems to me that almost everyone (who had an issue) saw file system corruption, but not corruption of the actual data on the drive. Which makes sense, because that data is frequently modified and is, as such, more vulnerable to inconsistencies on Google's part. So if that data could be given some sort of added redundancy...it might help to prevent future issues of this sort. Do you have any thoughts on that? Or maybe Alex could chime in? My basic thought is that I'd rather have corruption of file data for individual files, which can be replaced if necessary, than lose an entire multi-terabyte volume because the file system itself (which comprises a very small minority of the actual data on the drive) gets borked. I'd love some features to take extra care with that data.
  4. The maximum chunk size is actually a per-provider limitation. Some providers *can* use chunks larger than 20MB. During beta, Google could use chunks as large as 100MB, if I remember right, but that caused some sort of issue with Google's service and API limitations (the specifics escape me). So this isn't really a matter of CloudDrive's features, but of those supported by the provider you're using.
  5. You'd have to figure out which drive is contained in which folder on your Google Drive. If you open the technical details in CloudDrive and open the drive details window, the "Uid" under the "Overview" heading will correspond to the folder name on your Google Drive. You'll have to restore EVERYTHING under that folder.
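     If you'd rather locate that folder programmatically than eyeball it in the Google Drive web UI, something like the rough Python sketch below would do it. It assumes you already have OAuth credentials for the Drive API saved as token.json and the google-api-python-client package installed; the UID value is a placeholder you'd copy from CloudDrive's Overview.

```python
# Minimal sketch: find the Google Drive folder whose name matches the
# CloudDrive Uid. Assumes google-api-python-client / google-auth are installed
# and token.json already holds valid OAuth credentials for the Drive API.
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

UID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # placeholder: copy from CloudDrive's Overview

creds = Credentials.from_authorized_user_file(
    "token.json", ["https://www.googleapis.com/auth/drive.readonly"]
)
service = build("drive", "v3", credentials=creds)

response = service.files().list(
    q=f"name contains '{UID}' and mimeType = 'application/vnd.google-apps.folder'",
    fields="files(id, name)",
).execute()

for folder in response.get("files", []):
    print(folder["name"], folder["id"])
```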
  6. You should consider that if it DOESN'T work, though, it may render the entire drive unrecoverable even using photorec or recuva. Once that data is missing from Google's servers, there's no getting it back.
  7. Now that's an interesting possibility. Maybe? Sure. You'd probably want to detach the CloudDrive first. It might be worth a shot. The Google outage was March 13th, so a date before that would be your best bet. If this works, it would definitely help to confirm that some sort of partial rollback is the cause of this issue.
  8. Well, that's odd. There are other options, like photorec. Try that instead.
  9. Yes...with the caveat that it didn't prevent the Google corruption that happened last month, even for people who used multiple accounts. The problem appears to be that Google rolled back data to an older version of some of the files. This is obviously fine for the actual file data itself, since that doesn't really change. But the chunks containing the filesystem data DO change. Often. So everybody's file systems were corrupted. If you mirror the pool to another pool on another account and Google has a similar issue, both pools will be modified essentially simultaneously, and both would be corrupted if Google did another rollback. It would actually be better to mirror it to an entirely different provider, or to mirror it locally.
  10. You can just go ahead and use recuva. It's going to scan the drive sector by sector, so it doesn't matter if the file system is screwed.
  11. I'm afraid I don't have good news for you... I did all of the research I could, and, as far as I could tell, that just means the drive is borked. That error would usually indicate a failing hard drive, but that's obviously not the case here. It's just unrecoverable corruption. The data on the drive is probably recoverable with recuva; I was able to recover mine that way, at least. Ultimately, though, I didn't have anything irreplaceable on my drive, so I just opted to wipe it and start over rather than go through and rename everything--any files recovered will have arbitrary names. The data itself should be fine, though, even though the file system is trashed--if you do have anything important on there.
  12. Nobody in the community is 100% positive why this is happening (including Alex and Christopher). Christopher has said that the only way this *should* be able to happen is if the chunks were modified in the cloud. Google had a pretty significant service outage on March 13th, and we started seeing reports of these corruption issues on the forum immediately after that. My best personal theory is that whatever Google's issue was, they did a rollback and restored older versions of some of the chunks in the cloud. This would obviously corrupt a CloudDrive drive. The above post covers the only known process with a chance of recovery, but outcomes have not, unfortunately, been great. I, too, did not notice any corruption at first but, after noticing that files began disappearing over time, ultimately just wiped a 250TB CloudDrive and started over. The above process did not work for me and, in fact, caused volumes to show up RAW--but, to be clear, they were losing files before that and were obviously corrupt. This process did not *cause* the corruption.
  13. To be clear: none of those show how much of the drive is actually used by data in the same way that the system sees it. CloudDrive simply cannot tell you that. For some providers, like the local disk provider, all of the chunks are created when the drive is created--so "Cloud Unused" isn't a relevant piece of data. It creates the chunks to represent the entire drive all at once, so the amount of space you specify is also always the amount of space used on your storage device--in this case, a local drive. For other providers, like Google Drive, CloudDrive does not create all of the chunks at drive creation. It only creates the chunks and uploads them to the cloud provider when data is actually written to them by the system. But the "Cloud Used" number is not the amount of space in use by the file system; it's the portion of the drive's total size that has been created as chunks on the provider--and it will not drop if data is deleted from the drive. Once the chunks are uploaded, they will be modified in place if data is removed from the drive. It may be helpful to use an example. Let's say you create a 100GB drive on Google Drive, and you set a 5GB fixed cache. At first it will tell you that a few hundred MB are used on the cloud, and that the cloud unused is 99.9GB or something similar. Then let's say we put 100GB of data on the drive. The local amount will basically be our cache, or 5GB, the cloud used will now be 100GB, and the cloud unused will be some number very close to 0. Then let's say we delete 50GB of data from the drive. The local will still probably be 5GB, the cloud used will still be 100GB, and the cloud unused will still be something close to 0. Why? Because all of those chunks are already created, and they still exist on the cloud provider. CloudDrive doesn't know whether those chunks contain available space or not. CloudDrive just knows that there are X number of chunks representing Y amount of space stored on the cloud provider. Your system is what knows whether or not that space is available for use--because it's the OS that manages the file system that tracks that data. Windows Explorer will report 50GB free on the 100GB drive, but CloudDrive will still be reporting the space that it has used and available on the provider. Note: chunks are not removed from the provider once they have been created unless you resize the drive itself, because there isn't any reason to remove them otherwise.
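     Here's the same example reduced to a tiny Python sketch, purely to illustrate why "cloud used" never drops. The numbers are the hypothetical 100GB drive with a 5GB fixed cache from above, not anything CloudDrive reports directly.

```python
# Toy model of the example above: "cloud used" tracks chunks that have been
# created on the provider, so it grows as data is written but does not shrink
# when the OS deletes files. The numbers are the hypothetical example values.
DRIVE_SIZE_GB = 100
CACHE_GB = 5

stages = [
    ("freshly created drive", 0.3),  # only a few hundred MB of chunks exist yet
    ("after writing 100GB",   100),  # every chunk has now been uploaded
    ("after deleting 50GB",   100),  # chunks still exist; NTFS just marks space free
]

for label, cloud_used in stages:
    cloud_unused = DRIVE_SIZE_GB - cloud_used
    print(f"{label:25s} local ~{CACHE_GB}GB  "
          f"cloud used {cloud_used:g}GB  cloud unused {cloud_unused:g}GB")
```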
  14. Again, CloudDrive itself has no way to know that. You specify a size because that is the amount of space that CloudDrive creates in order to store things, but whether or not space is available for USE is simply not something the drive infrastructure is aware of. The file system, at the OS level, handles that information. You can always, of course, simply open up Windows Explorer to see how much space is available on your drive. But at the level at which CloudDrive operates, that information simply is not available. Furthermore, the drive can contain multiple volumes--so it can't really just look at some particular volume and compare it to the amount of data on the disk, even if the amount of data on the disk WERE representative of the amount of space available for new information. Which, again, it is not--because of how NTFS works. It would have to look at ALL volumes on your drive and compare them to the maximum size, and even knowing which volumes are on which drives requires access to the file system, which it, again, does not have. You're talking about adding entirely new levels of infrastructure to the application to accomplish something that can be done by looking at ANY other disk tool in Windows. Simply looking at Windows Explorer, Disk Management, or Resource Monitor will give you volume usage information. The charts provided in CloudDrive are for monitoring usage on the provider, not the drive. Other tools exist for the latter, but no other tool exists to provide information about the provider usage.
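     If you want that number programmatically rather than from Explorer, the OS already exposes it. A minimal Python sketch, with the drive letter as a placeholder:

```python
# Minimal sketch: the OS, not the block device underneath, knows how much
# space the file system considers free. The drive letter is a placeholder.
import shutil

usage = shutil.disk_usage("V:\\")  # hypothetical CloudDrive volume letter
print(f"total: {usage.total / 1e9:.1f} GB")
print(f"used:  {usage.used / 1e9:.1f} GB")
print(f"free:  {usage.free / 1e9:.1f} GB")
```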
  15. I think there might be a fundamental misunderstanding of how CloudDrive operates here. Christopher can correct me if I'm wrong, but my understanding is that CloudDrive, as an application, is simply not aware of the filesystem usage on the drive. Think of the CloudDrive software as analogous to the firmware that operates a hard drive. It might be able to tell you if a particular portion of the drive has been written to at least once, but it can't tell you how much space is available on the drive, because it simply doesn't operate at that level. In order for CloudDrive, or your HDD's firmware, to be able to tell you how much space is available for a particular purpose, it would have to somehow communicate with the file system--and neither of them does that. DrivePool, on the other hand, does. It operates at the file system level and, as such, is aware of how much of the space on the disk is actually in use. Another way to consider it is this: NTFS does not generally modify data on delete. So if you delete a file, NTFS simply marks that file as deleted and remembers that the space used by that file can now be used for new data. As far as the drive is concerned, nothing has changed in that area of the drive, but NTFS still considers it available. If that makes sense. So from the drive's perspective, that space is still used, even though the system doesn't actually look at it that way. This is one of the distinctions between a tool like CloudDrive, which operates at a block-based level like a local disk drive, and a tool like Google File Stream or rClone, which operate at a file-based level and are aware of the file system itself.
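     Here's a toy Python model of that distinction. Nothing in it reflects CloudDrive's or NTFS's actual on-disk structures; it only models the idea that deleting a file flips an allocation bit without touching the underlying blocks.

```python
# Toy model: a "file system" tracks which blocks are free in a bitmap, while
# the "block device" underneath only ever sees reads and writes. Deleting a
# file clears allocation bits but leaves the block contents untouched, so the
# device (or CloudDrive, or an HDD's firmware) cannot tell the space is free.
blocks = [b"\x00" * 4096 for _ in range(8)]   # the raw "device"
allocated = [False] * len(blocks)             # the file system's bitmap

def write_file(data: bytes, at: int) -> None:
    blocks[at] = data.ljust(4096, b"\x00")    # the device sees a write
    allocated[at] = True                      # the file system marks it used

def delete_file(at: int) -> None:
    allocated[at] = False                     # no write to blocks[at] at all

write_file(b"hello world", at=3)
delete_file(3)

print("file system says block 3 is free:", not allocated[3])
print("device still holds the old data:", blocks[3].startswith(b"hello"))
```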
  16. To my knowledge, Google does not throttle bandwidth at all, no. But they do have the upload limit of 750GB/day, which means that a large number of upload threads is relatively pointless if you're constantly uploading large amounts of data. It's pretty easy to hit 75mbps or so with only 2 or 3 upload threads, and anything more than that will exceed Google's upload limit anyway. If you *know* that you're uploading less than 750GB that day anyway, though, you could theoretically get several hundred mbps performance out of 10 threads. So it's sort of situational. Many of us do use servers with 1gbps synchronous pipes, in any case, so there is a performance benefit to more threads...at least in the short term. But, ultimately, I'm mostly just interested in understanding the technical details from Christopher so that I can experiment and tweak. I just feel like I have a fundamental misunderstanding of how the API limits work.
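     For what it's worth, the arithmetic behind that ~75mbps ballpark:

```python
# Back-of-the-envelope: the sustained upload rate that saturates Google's
# 750GB/day limit (decimal GB). Anything above this only helps for bursts
# shorter than a day.
DAILY_LIMIT_GB = 750
seconds_per_day = 24 * 60 * 60

sustained_mbps = DAILY_LIMIT_GB * 8 * 1000 / seconds_per_day
print(f"~{sustained_mbps:.0f} mbps sustained fills the 750GB/day quota")
# -> roughly 69 mbps, which is why 2-3 threads at ~75 mbps is already plenty
```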
  17. Out of curiosity, does Google set different limits for the upload and download threads in the API? I've always assumed that since I see throttling around 12-15 threads in one direction, that the total number of threads in both directions needed to be less than that. Are you saying it should be fine with 10 in each direction even though 20 in one direction would get throttled?
  18. Glad to see an official response on this. Christopher, are you able to provide a quick explanation of *why* that process would help? What exactly is going on with these RAW errors, and can they be prevented in case of a Google outage in the future? Would turning on file upload verification help?
  19. What result did chkdsk give you? Does it report that the volume is fine? Or is it giving you some other error? Also open an actual support ticket here: https://stablebit.com/Contact Then run the troubleshooter and attach your support ticket number after you submit that request. The troubleshooter is located here: http://wiki.covecube.com/StableBit_Troubleshooter This is probably a result of Google's issues a few weeks ago, but different people are experiencing different levels of corruption from that. So we'll need to figure out your specific situation to find a solution--if one exists.
  20. It won't really limit your ability to upload larger amounts of data; it just throttles writes to the drive when the cache drive fills up. So if you have 150GB of local disk space on the cache drive, but you copy 200GB of data to it, the first roughly 145GB of data will copy at essentially full speed, as if you're just copying from one local drive to another, and then it will throttle the drive writes so that the last 55GB of data will slowly copy to the CloudDrive drive as chunks are uploaded from your local cache to the cloud provider. Long story short: it isn't a problem unless high speeds are a concern. As long as you're fine copying data at roughly the speed of your upload, it will work fine no matter how much data you're writing to the CloudDrive drive.
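     A rough Python sketch of that scenario, with made-up speeds just to show where the throttling kicks in:

```python
# Rough estimate of copying more data than the cache can hold. The speeds
# below are made-up examples; only the shape of the calculation matters.
cache_gb = 150
copy_gb = 200
local_write_mbps = 1000   # assumed local disk-to-disk speed
upload_mbps = 70          # assumed sustained upload to the provider

full_speed_gb = min(copy_gb, cache_gb - 5)  # cache keeps a little headroom (~145GB, per the post)
throttled_gb = copy_gb - full_speed_gb      # the remainder trickles out at upload speed

fast_seconds = full_speed_gb * 8000 / local_write_mbps  # GB -> megabits -> seconds
slow_seconds = throttled_gb * 8000 / upload_mbps
print(f"~{fast_seconds / 60:.0f} min at full speed, "
      f"then ~{slow_seconds / 3600:.1f} h limited by upload speed")
```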
  21. SSD. Disk usage for the cache, particularly with a larger drive, can be heavy. I always suggest an SSD cache drive. You'll definitely notice a significant impact. Aside from upload space, most drives don't need or generally benefit from a cache larger than 50-100GB or so. You'll definitely get diminishing returns with anything larger than that. So speed is far more important than size.
  22. I'm not sure why it would need to reindex all of the files on the drive like that, but if it does, indeed, need to search everything once, you could probably use an application like WinDirStat to do it all in one go. It'll touch every file on the drive in a few minutes.
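     If you'd rather script it than install WinDirStat, here's a minimal Python sketch that just stats every file on the volume (the drive letter is a placeholder):

```python
# Minimal sketch: walk the whole volume and stat every file once, which is
# effectively all a tool like WinDirStat does for this purpose.
import os

root = "V:\\"  # hypothetical CloudDrive volume letter
count = 0
for dirpath, dirnames, filenames in os.walk(root):
    for name in filenames:
        try:
            os.stat(os.path.join(dirpath, name))
            count += 1
        except OSError:
            pass  # skip files that disappear or can't be read

print(f"touched {count} files")
```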
  23. No, you can only throttle on a per-drive basis.
  24. That's just a warning. Your thread count is a bit too high, and you're probably getting throttled. Google only allows around 15 simultaneous threads. Try dropping your upload threads to 5 and keeping your download threads where they are. That warning will probably go away. Ultimately, though, even temporary network hiccups can occasionally cause those warnings. So it might also be nothing. It's only something to worry about if it happens regularly and frequently.
  25. (Replying in "Data corrupted..?") Right. That is how chkdsk works. It repairs the corrupted volume information and will discard entries if it needs to. Now you have a healthy volume, but you need to recover the files if you can. That is a separate process. It's important to understand how your file system works if you're going to be managing terabytes of data in the cloud. The alternative would have been operating with an unhealthy volume and continuing to corrupt data every time you wrote to the drive. Here is some additional information that may help you: https://www.minitool.com/data-recovery/recover-data-after-chkdsk.html