Covecube Inc.

srcrist

Members
  • Content Count: 247
  • Joined
  • Last visited
  • Days Won: 16

Everything posted by srcrist

  1. EDIT: Never mind. I understand what you're looking at now and where you're getting those numbers.
  2. Sorta. But I think you might have it backwards. You wouldn't back up the data that sits on the provider to other providers, because the drive structure changes somewhat by provider. You would, instead, mount multiple providers with CloudDrive and use a tool like DrivePool to mirror the volumes to one another.
  3. A 2500-2800ms response time to Google is almost *certainly* a network issue on your part, and not one that needs to be considered as a general concern. It should not take that long to send an internet packet around the entire planet, and, if you're seeing that sort of latency, you probably have bad hardware between you and your destination. A round trip to a satellite, which is the lengthiest hop in existence, is only 500ms plus the terrestrial hops. So unless you are, for some reason, beaming data to space and back several times, 2500-2800ms is simply a broken connection. I think it's important for you to consider just how unrepresentative that is of the typical internet user when considering whether or not someone is speaking to your issues specifically, or when requesting that changes be made to accommodate your situation. You're talking about latency that exceeds 5x even the typical satellite internet connection. The number of people dealing with such a situation is a fraction of a fraction of a percent. Regardless of your situation, though, Google themselves have a hard limit of 750GB/day. A single thread, for most users, can hit that limit. In any case, my above response is perfectly compatible with slower response times. It's very easy to hit the maximum 70-75 mbps. No matter how many threads are taken up by reads, you can always add additional threads to make up for the read threads. If there are two read threads outstanding, you add two more for uploads. If there are four read threads outstanding, you add four. If your responses are that slow, you won't get throttled no matter how many threads you add--which means that adding threads can always compensate for your connection delays. If you're still experiencing throughput issues, the problem is something else.
  4. No, you're right. It will use the upload thread count to download the data, so that will take up some number of slots for a time before those threads will be available for a new upload chunk. The mistake, though, is translating that to a loss of throughput, which isn't (or at least does not necessarily need to be) true. That is: you can achieve the same upload speeds with more threads, if you need to. 5 upload threads should be more than enough for most users to still max out the 70-75mbps of average data that Google allows you to upload each day, while also reading back the chunks in real-time. What *is* true is that if you theoretically only had one upload thread, it would have to use the same thread to read the data back each time, and that would slow things down somewhat. But, even for residential connections, that wouldn't generally translate to a 50% reduction in throughput, because your downstream capacity is generally a lot faster than your upstream on residential connections. So it would take a lot less time to *read* each chunk than to upload it to the provider. If your connection is symmetrical, then the question simply becomes about your overall throughput. Anything over 100 mbps or so should not generally notice a difference aside from downstream congestion as a result of the verification process. Bottom line: If you *do* suffer a performance drop with upload verification enabled, and you're working with a high-bandwidth connection (100 mbps symmetrical or more), you just need to add a few upload threads to accommodate the increased workload. As long as you can hit 70ish mbps upload, though, you're capable of maxing out Google's limitations regardless. So all of this only sort of matters in a theoretical sense.
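For what it's worth, the 70-75 mbps figure in the posts above falls directly out of Google's 750GB/day cap. A quick back-of-the-envelope check (the only assumption here is whether Google counts decimal gigabytes or binary gibibytes):

```python
# Average upload rate needed to exhaust Google's 750GB/day cap.
SECONDS_PER_DAY = 24 * 60 * 60

def sustained_mbps(daily_bytes: float) -> float:
    """Average megabits per second required to upload daily_bytes in one day."""
    return daily_bytes * 8 / SECONDS_PER_DAY / 1e6

decimal = sustained_mbps(750e9)           # 750 GB, decimal interpretation
binary = sustained_mbps(750 * 1024**3)    # 750 GiB, binary interpretation

print(f"750 GB/day  = {decimal:.1f} mbps sustained")   # ~69.4
print(f"750 GiB/day = {binary:.1f} mbps sustained")    # ~74.6
```

Either way you slice it, a connection that can sustain roughly 70-75 mbps upstream is already uploading as much as Google will accept in a day, so extra threads beyond that point buy nothing over a 24-hour window.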
  5. So chkdsk is not affected by the size of the *drive*, only the size of the *volume* (aka, generally, partition). You can have multiple volumes on the SAME CloudDrive drive. So you can still expand the size of the drive and create a second VOLUME smaller than 55TB or so, and chkdsk will work with it just fine. I would, in fact, recommend this over a second drive entirely, so that you don't have multiple caches. Aside from this note, DrivePool will operate identically regardless of whether you're using a volume on a local disk, a CloudDrive disk, or multiple CloudDrive volumes on the same disk. DrivePool does not operate at the block level, so it doesn't concern itself with the chunks. That will all still be handled at the CloudDrive level. You can customize the DrivePool settings to configure the file placement however you want. There is no impact on performance, as DrivePool simply forwards requests to the underlying drives--CloudDrive or otherwise. Chkdsk is a file system tool, so it operates on volumes, not drives. But, to fix issues, you'd use chkdsk on the volumes that underlie your pool.
  6. To be clear: this is not necessarily true. It depends on the speed of your connection. All the upload verification means is that it will download every byte that it uploads before clearing the data from the local cache. If your downstream bandwidth is at least as large as your upstream bandwidth, it should not theoretically slow down your upload speeds at all. There will just be a delay before the data is removed from local storage while it reads the data back from the provider. CloudDrive can (and will) still move on to the next chunk upload while this process is completed.
  7. I mean, this assumption is faulty, and important data should never be trusted exclusively to a single backup solution--cloud or otherwise. There is no such thing as the perfect backup solution. There is a reason that the 3-2-1 backup rule exists. 3 copies, 2 different types of media, at least 1 offsite location. If you're relying exclusively on CloudDrive or any other storage solution as your only backup option, you're going to have a bad time. Google rolled back data and corrupted many CloudDrive volumes (we think), and they may do the same in the future. Google Drive is a consumer solution, after all. It is not intended to meet the needs of sensitive business data. CloudDrive is a remarkable product and takes a great number of steps to oversee the integrity of your data, but it can't work miracles. Cloud storage problems exist, and should not be relied upon exclusively. Using CloudDrive with an enterprise provider like Google Cloud Storage, Amazon S3, or Microsoft Azure should be the first step, at a minimum, to store sensitive and important business data. Not a consumer provider like Google Drive, which is where we saw corruption last month. Linux ISOs and media files are one thing, but I wouldn't even use CloudDrive with Google Drive to store copies of family photographs that I did not have another backup for.
  8. Again, other providers *can* still use larger chunks. Please see the changelog: This was because of issue 24914, documented here. Again, this isn't really correct. The problem, as documented above, is that larger chunks result in more retrieval calls to particular chunks, thus triggering Google's download quota limitations. That is the problem that I could not remember. It was not because of concerns about the speed, and it was not a general problem with all providers. EDIT: It looks like the issue with Google Drive might be resolved with an increase in the partial read size, as you discussed in this post, but the code change request for that is still incomplete. So this prerequisite still isn't met. Maybe something to follow up with Christopher and Alex about.
  9. Christopher, I'm sure you've spoken with Alex about this issue. I'm just wondering if there's been any discussion of infrastructural changes that might be able to improve the reliability of the file system data? I was wondering if, for example, CloudDrive could store a periodic local mirror of the file system data which could be restored in case of corruption? I don't know enough about NTFS and how the journal and such are stored on the drive to know if this is feasible or not. It just seems to me that almost everyone (who had an issue) saw file system corruption, but not corruption of the actual data on the drive. Which makes sense, because that data is frequently modified and is, as such, more vulnerable to inconsistencies on Google's part. So if that data could be given some sort of added redundancy...it might help to prevent future issues of this sort. Do you have any thoughts on that? Or maybe Alex could chime in? My basic thought is that I'd rather have corruption of file data for individual files, which can be replaced if necessary, than lose an entire multi-terabyte volume because the file system itself (which comprises a very small minority of the actual data on the drive) gets borked. I'd love some features to take extra care with that data.
  10. The maximum chunk size is actually a per-provider limitation. Some providers *can* use chunks larger than 20MB. During Beta, Google could use chunks as large as 100MB, if I remember right, but that caused some sort of issue, which escapes me, with Google's service and API limitations. So this isn't really a matter of CloudDrive's features, but those supported by the provider you're using.
  11. You'd have to figure out which drive is contained in which folder on your drive. If you open the technical details in CloudDrive, and open the drive details window, the "Uid" under the "Overview" heading will correspond to the folder name on your Google Drive. You'll have to restore EVERYTHING under that folder.
  12. You should consider that if it DOESN'T work, though, it may render the entire drive unrecoverable even using PhotoRec or Recuva. Once that data is missing from Google's servers, there's no getting it back.
  13. Now that's an interesting possibility. Maybe? Sure. Maybe. You'd want to detach the CloudDrive first, probably. It might be worth a shot. The Google outage was March 13th, so a date before that would be your best shot. If this works, it would definitely help to confirm that some sort of partial rollback is the cause of this issue.
  14. Well, that's odd. There are other options, like PhotoRec. Try that instead.
  15. Yes...with the caveat that it didn't prevent the Google corruption that happened last month, even for people who used multiple accounts. The problem appears to be that Google rolled back data to an older version of some of the files. This is obviously fine for the actual file data itself, since that doesn't really change. But the chunks containing the filesystem data DO change. Often. So everybody's file systems were corrupted. If you mirror the pool to another pool that is on another account, and Google has a similar issue, both pools would be modified essentially simultaneously, and both would be corrupted if Google did another rollback. It would actually be better to mirror it to an entirely different provider, or to mirror it locally.
  16. You can just go ahead and use Recuva. It's going to scan the drive sector by sector, so it doesn't matter if the file system is screwed.
  17. I'm afraid I don't have good news for you... I did all of the research I could, and, as far as I could tell, that just means the drive is borked. That error would usually indicate a failing hard drive, but that's obviously not the case here. It's just unrecoverable corruption. The data on the drive is probably recoverable with Recuva; I could recover mine that way, at least. Ultimately, though, I didn't have anything irreplaceable on my drive, so I just opted to wipe it and start over rather than go through and rename everything. Any files recovered will have arbitrary names. The data on the drive should be fine, though, even though the file system is trashed--if you have anything important.
  18. Nobody in the community is 100% positive why this is happening (including Alex and Christopher). Christopher has said that the only way this *should* be able to happen is if the chunks were modified in the cloud. Google had a pretty significant service outage on March 13th, and we started seeing reports of these corruption issues on the forum immediately after that. My best personal theory is that whatever Google's issue was, they did a rollback, and restored older versions of some of the chunks in the cloud. This would obviously corrupt a CloudDrive drive. The above post covers the only known process with a chance of recovery, but outcomes have not, unfortunately, been great. I too, did not notice any corruption at first, but, after noticing that files began disappearing over time, also ultimately simply wiped a 250TB CloudDrive and started over. The above process did not work for me, and, in fact, caused volumes to show up RAW--but, to be clear, they were losing files before that, and were obviously corrupt. This process did not *cause* the corruption.
  19. To be clear: None of those show how much of the drive is actually used by data in the same way that the system sees it. CloudDrive simply cannot tell you that. For some providers, like the local disk provider, all of the chunks are created when the drive is created--so "Cloud Unused" isn't a relevant piece of data. It creates the chunks to represent the entire drive all at once, so the amount of space you specify is also always the amount of space used on your storage device--in this case, a local drive. For some providers, like Google Drive, CloudDrive does not create all of the chunks at drive creation. It only creates the chunks and uploads them to the cloud provider when data is actually written to them by the system. But the "Cloud Used" number is not the amount of space used by the system, it's the amount of space out of the total limit of the drive that you created--and it will not drop if data is deleted from the drive. Once the chunks are uploaded, they will be modified in place if data is removed from the drive. It may be helpful to use an example. Let's say you create a 100GB drive on Google Drive, and you set a 5GB fixed cache. At first it will tell you that a few hundred MB are used on the cloud, and that the cloud unused is 99.9GB or something similar. Then let's say we put 100GB of data on the drive. The local amount will basically be our cache, or 5GB, the cloud used will now be 100GB, and the cloud unused will be some number very close to 0. Then let's say we delete 50GB of data from the drive. The local will still probably be 5GB, the cloud used will still be 100GB, and the cloud unused will still be something close to 0. Why? Because all of those chunks are already created, and they still exist on the cloud provider. CloudDrive doesn't know if those chunks contain available space or not. CloudDrive just knows that there are X number of chunks representing Y amount of space stored on the cloud provider. 
Your system is what knows whether or not that data is available for use--because it's the OS that manages the file system that tracks that data. Windows Explorer will report 50GB free on the 100GB drive, but CloudDrive will still be reporting the space that it has used and available on the provider. Note: Chunks are not removed from the provider once they have been created unless you resize the drive itself, because there isn't any reason to remove them unless you actually change the size of the drive.
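The accounting described above can be sketched with a toy model. To be clear, this is purely illustrative: the `ToyCloudDrive` class is hypothetical and is not CloudDrive's actual implementation, and the 20MB chunk size is just an example value.

```python
# Toy model of provider-side chunk accounting: chunks are created on first
# write and are NOT removed when the file system later deletes data.
CHUNK_SIZE = 20  # MB per chunk (example value)

class ToyCloudDrive:
    def __init__(self, size_mb: int):
        self.size_mb = size_mb   # total drive size specified at creation
        self.chunks = set()      # chunk indices that exist on the provider

    def write(self, offset_mb: int, length_mb: int):
        # Writing allocates any chunks in the range that don't exist yet.
        first = offset_mb // CHUNK_SIZE
        last = (offset_mb + length_mb - 1) // CHUNK_SIZE
        for chunk in range(first, last + 1):
            self.chunks.add(chunk)

    def delete(self, offset_mb: int, length_mb: int):
        # A file-system delete never reaches the provider: the chunks stay.
        pass

    @property
    def cloud_used_mb(self) -> int:
        return len(self.chunks) * CHUNK_SIZE

drive = ToyCloudDrive(100_000)   # the "100GB" drive from the example
drive.write(0, 100_000)          # fill it with 100GB of data
drive.delete(0, 50_000)          # delete 50GB at the file-system level
print(drive.cloud_used_mb)       # still 100000 -- "Cloud Used" never drops
```

Only the OS's file system knows that half of that space is free again; from the provider's point of view, all of the chunks still exist and are still the same size.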
  20. Again, CloudDrive itself has no way to know that. You specify a size because that is the amount of space that CloudDrive creates in order to store things, but whether or not space is available for USE is simply not something that the drive infrastructure is aware of. The file system, at the OS level, handles that information. You can always, of course, simply open up Windows Explorer to see how much space is available on your drive. But at the level at which CloudDrive operates, that information simply is not available. Furthermore, the drive can contain multiple volumes--so it can't really just look at some particular volume and compare it to the amount of data on the disk, even if the amount of data on the disk WERE representative of the amount of space available for new information. Which, again, it is not--because of how NTFS works. It would have to look at ALL volumes on your drive and compare them to the maximum size, and even knowing what volumes are on what drives requires access to the file system which it, again, does not have. You're talking about adding entirely new levels of infrastructure to the application to accomplish something that can be accomplished by looking at ANY other disk tool in Windows. Simply looking at Windows Explorer, Disk Management, or Resource Monitor can provide you with volume usage information. The charts provided in CloudDrive are for the purpose of monitoring the usage on the provider, not the drive. Other tools exist for that, but no other tool exists to provide information about the provider usage.
  21. I think there might be a fundamental misunderstanding of how CloudDrive operates here. Christopher can correct me if I'm wrong, but my understanding is that CloudDrive, as an application, is simply not aware of the filesystem usage on the drive. Think of the CloudDrive software as analogous to the firmware that operates a hard drive. It might be able to tell you if a particular portion of the drive has been written to at least once, but it can't tell you how much space is available on the drive because it simply doesn't operate at that level. In order for CloudDrive, or your HDD's firmware, to be able to tell you how much space is available for a particular purpose, it would have to somehow communicate with the file system--and neither does that. DrivePool, on the other hand, does. It operates at a file system level, and, as such, is aware of how much of the space on the disk is actually currently in use. Another way to consider it is this: NTFS does not generally modify data on delete. So if you delete a file, NTFS simply marks that file as deleted and remembers that the space used by that file can now be used for new data. As far as the drive is concerned, nothing has changed in that area of the drive, but NTFS still considers it available. If that makes sense. So from the drive's perspective, that space is still used, even though the system doesn't actually look at it that way. This is one of the distinctions between a tool like CloudDrive, which operates at a block-based level like a local disk drive, and a tool like Google File Stream or rClone, which operate at a file-based level, and are aware of the file system itself.
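The delete behavior described above can be illustrated with a simplified sketch. This is an analogy only: a bare Python list stands in for the file system's allocation bitmap, and is not how NTFS actually stores its metadata.

```python
# Why a block device can't see file-system free space: deleting a file only
# flips the file system's bookkeeping; the underlying blocks are untouched.
blocks = bytearray(8)    # the "drive": 8 blocks of raw data
bitmap = [False] * 8     # file-system view: which blocks are in use

def fs_write(index: int, value: int):
    """Write data to a block and mark it allocated in the file system."""
    blocks[index] = value
    bitmap[index] = True

def fs_delete(index: int):
    """Delete only updates the file system's bitmap. The block itself is
    unchanged, so the device still 'contains' the old data."""
    bitmap[index] = False

fs_write(0, 42)
fs_delete(0)

print(blocks[0])   # 42 -- from the device's perspective, nothing changed
print(bitmap[0])   # False -- only the file system knows the space is free
```

A block-level tool like CloudDrive sees only `blocks`; a file-system-aware tool like DrivePool (or Explorer) sees `bitmap`, which is why they report different things.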
  22. To my knowledge, Google does not throttle bandwidth at all, no. But they do have the upload limit of 750GB/day, which means that a large number of upload threads is relatively pointless if you're constantly uploading large amounts of data. It's pretty easy to hit 75mbps or so with only 2 or 3 upload threads, and anything more than that will exceed Google's upload limit anyway. If you *know* that you're uploading less than 750GB that day anyway, though, you could theoretically get several hundred mbps performance out of 10 threads. So it's sort of situational. Many of us do use servers with 1gbps symmetrical pipes, in any case, so there is a performance benefit to more threads...at least in the short term. But, ultimately, I'm mostly just interested in understanding the technical details from Christopher so that I can experiment and tweak. I just feel like I have a fundamental misunderstanding of how the API limits work.
  23. Out of curiosity, does Google set different limits for the upload and download threads in the API? I've always assumed that since I see throttling around 12-15 threads in one direction, that the total number of threads in both directions needed to be less than that. Are you saying it should be fine with 10 in each direction even though 20 in one direction would get throttled?
  24. Glad to see an official response on this. Christopher, are you able to provide a quick explanation of *why* that process would help? What exactly is going on with these RAW errors, and can they be prevented in case of a Google outage in the future? Would turning on file upload verification help?
  25. What result did chkdsk give you? Does it report that the volume is fine? Or is it giving you some other error? Also, open an actual support ticket here: https://stablebit.com/Contact And run the troubleshooter, attaching your support ticket number after you submit that request. The troubleshooter is located here: http://wiki.covecube.com/StableBit_Troubleshooter This is probably a result of Google's issues a few weeks ago, but different people are experiencing different levels of corruption from that. So we'll need to figure out your specific situation to get a solution--if one exists.