Covecube Inc.

srcrist

Members
  • Content Count

    466
  • Joined

  • Last visited

  • Days Won

    34

Posts posted by srcrist

  1. 25 minutes ago, darkly said:

    As I stated in my first comment, there is NO documentation about how existing data is handled when features like this which affect things at a large scope are enabled, and that's really concerning when you're familiar with the type of problems that you can run into with CloudDrive if you're not careful with things like the upload cap, available space on the cache, and R/W bottlenecks on the cache drive. As someone who has lost many terabytes of data due to this, I am understandably reluctant to touch a feature like this which could actually help me on the long run, because I don't know what it does NOW.

    To be clear: there is documentation on this feature in the change log. 

  2. The change log seems to suggest that enabling it on a drive with existing data will only impact new data written to the drive:

    .1121
    * Added an option to enable or disable data duplication for existing cloud drives (Manage Drive -> Data duplication...).
        - Any new data written to the cloud drive after duplication was enabled will be stored twice at the storage provider.
        - Existing data on the drive that is not overwritten will continue to be stored once.
    .1118
    * Added an option to enable data duplication when creating a new cloud drive.
        - Data duplication stores your data twice at the storage provider.
        - It consumes twice the upload bandwidth and twice the storage space at the provider.
        - In case of data corruption or loss of the primary data blocks, the secondary blocks will be used to provide redundancy 
          for read operations.

    So it should not impact the cache drive in any immediate sense. CloudDrive is generally smarter than that about the cache though. I would expect it to throttle writes to the cache as it was processing the data, as it does with large volume copies from other sources like DrivePool. Large writes from other sources do not corrupt or dismount the drive. It simply throttles the writes until space is available. A moot point, in any case, as it will not duplicate your existing chunks unless you manually download and reupload the data. 

  3. CloudDrive duplication is block-level duplication. It makes several copies of the chunks that contain your file system data (basically everything that gets "pinned"). If any of those chunks are then detected as corrupt or inaccessible, it will use one of the redundant chunks to access your file system data, and then repair the redundancy with the valid copy. 
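
    The read-then-repair behavior described above can be sketched roughly as follows. This is only an illustration, not CloudDrive's actual code: the dict-backed `store`, the `-copy-N` key naming, and the use of a SHA-256 digest as the validity check are all assumptions made for the example.

```python
import hashlib


def read_with_repair(store, chunk_id, expected_sha256, copies=2):
    """Return valid bytes for a chunk, rewriting any bad copies from a good one.

    `store` is a plain dict standing in for the cloud provider (hypothetical).
    """
    good = None
    bad_keys = []
    for n in range(copies):
        key = f"{chunk_id}-copy-{n}"  # hypothetical naming scheme
        data = store.get(key)
        if data is not None and hashlib.sha256(data).hexdigest() == expected_sha256:
            if good is None:
                good = data
        else:
            bad_keys.append(key)  # missing or corrupt copy

    if good is None:
        raise IOError(f"all {copies} copies of {chunk_id} are corrupt or missing")

    # Repair the redundancy: overwrite every bad copy with the valid one.
    for key in bad_keys:
        store[key] = good
    return good
```

    The point of the sketch is only that a single bad copy is survivable: the read succeeds from the other copy, and the bad copy is replaced afterward.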

    DrivePool duplication is file-level duplication. It will make however many copies of whatever data you specify, and arrange those copies throughout your pool as you specify. DrivePool duplication is very customizable. You have full control over where and when it duplicates your data. If you want it to duplicate your entire pool, that is a setting you control. As is whether or not it does so at regular intervals, or immediately. That's all up to your balancer settings in DrivePool. 

    Despite the name similarity, their functionality really has nothing to do with one another. CloudDrive's duplication makes copies of very specific chunks on your cloud provider. It doesn't have anything to do with duplicating your actual data. It's intended to prevent corruption from arbitrary rollbacks and data loss on the provider's part, like we saw back in March and June of last year.

    EDIT: It slipped my mind that full duplication can also be enabled in CloudDrive. This is still block-level duplication on your cloud provider. Rather than storing each chunk once, it stores each chunk twice, for the same purpose mentioned above: if one copy is corrupt or unavailable, it will use the other and repair the redundancy. The net effect, of course, is that 100GB of data on your cloud storage would take up 200GB worth of chunks and twice the upload time per byte. You would still only see ONE copy of the data in your file system, though.

  4. CloudDrive itself does not support Team Drives because their API access is different. But DrivePool can certainly pool multiple CloudDrive volumes together. It can pool any volume your system can access.

    But CloudDrive will not work to use Team Drives to evade the Google upload limitations, if that was your intention. There is some news that they are banning accounts for doing so, as well. Just FYI. See here: https://old.reddit.com/r/DataHoarder/comments/emuu9l/google_gsuit_whats_it_like_and_is_it_still_worth/fdw1cri/

    I also want to add that the other solutions are not immune to the data loss issue any more than CloudDrive, it's just that data loss can manifest differently. If a file is corrupted on Google's end with rClone or Netdrive, you lose that file. If a chunk containing CloudDrive's file system is corrupted, you can lose access to the file system itself, and have to rebuild it. Ultimately, though, nobody should ever use cloud storage for any data that they consider to be irreplaceable without backups. 

  5. OK. So, there is a lot here, so let's unpack this one step at a time. I'm reading some fundamental confusion here, so I want to make sure to clear it up before you take any additional steps forward.

    3 hours ago, SirAce135 said:

    1. Can I complete this migration without having to reupload everything?

    Starting here, which is very important: It's critical that you understand the distinction in methodology between something like Netdrive and CloudDrive, as a solution. Netdrive and rClone and their cousins are file-based solutions that effectively operate as frontends for Google's Drive API. They upload local files to Drive as files on Drive, and those files are then accessible from your Drive--whether online via the web interface, or via the tools themselves. That means that if you use Netdrive to upload a 100MB File1.bin, you'll have a 100MB file called File1.bin on your Google drive that is identical to the one you uploaded. Some solutions, like rClone, may upload the file with an obfuscated file name like Xf7f3g.bin, and even apply encryption to the file as it's being uploaded, and decryption when it is retrieved. But they are still uploading the entire file, as a file, using Google's API.

    If you understand all of that, then understand that CloudDrive does not operate the same way.

    CloudDrive is not a frontend for Google's API. CloudDrive creates a drive image, breaks that up into hundreds, thousands, or even millions of chunks, and then uses Google's infrastructure and API to upload those chunks to your cloud storage. This means that if you use CloudDrive to store a 100MB file called File1.bin, you'll actually have some number of chunks (depending on your configured chunk size) called something like XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-chunk-1, XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-chunk-2, XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-chunk-3, etc, as well as a bunch of metadata that CloudDrive uses to access and modify the data on your drive. Note that these "files" do not correspond to the file size, type, or name that you uploaded, and cannot be accessed outside of CloudDrive in any meaningful way. CloudDrive actually stores your content as blocks, just like a physical hard drive, and then stores chunks of those blocks on your cloud storage. Though it can accomplish similar ends to Netdrive, rClone, or any related piece of software, its actual method of doing so is very different in important ways for you to understand.
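
    To make the block-versus-file distinction concrete, here is a minimal sketch of how a byte range on a drive image maps onto fixed-size chunks. It assumes the file occupies one contiguous range on the image (a real file system can fragment it), uses zero-based chunk numbering, and borrows the placeholder naming pattern from above; none of this is CloudDrive's real on-provider layout.

```python
MIB = 1024 * 1024


def chunks_for_range(offset, length, chunk_size):
    """Return the indices of the chunks holding bytes [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return list(range(first, last + 1))


def chunk_names(drive_id, indices):
    """Build illustrative chunk names; the pattern is an assumption."""
    return [f"{drive_id}-chunk-{i}" for i in indices]


# A 100 MiB file that happens to start 30 MiB into the image, with 20 MiB chunks,
# is spread across six chunks, none of which is named after the file.
indices = chunks_for_range(30 * MIB, 100 * MIB, 20 * MIB)
names = chunk_names("XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX", indices)
```

    Nothing in the resulting chunk names or sizes reflects File1.bin, which is why the chunks are meaningless to any tool other than CloudDrive.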

    So what, exactly, does this mean for you? It means, for starters, that you cannot simply use CloudDrive to access information that is already located in your cloud storage. CloudDrive only accesses information that has been converted to the format that it uses to store data, and CloudDrive's format cannot be accessed by other applications (or Google themselves). Any data that you'd like to migrate from your existing cloud storage to a CloudDrive volume must be downloaded and moved to the CloudDrive volume just as it would need to be if you were to migrate the data to a new physical drive on your machine--for the same reasons.

    It may be helpful to think of CloudDrive as a virtual machine drive image. It's the same general concept. Just as you would have to copy data within the virtual machine in order to move data to the VM image, you'll have to copy data within CloudDrive to move it to your CloudDrive volume.

    There are both benefits and drawbacks to using this approach:

    Benefits  

    • CloudDrive is, in my experience, faster than rClone and its cousins. Particularly around the area of jumping to granular data locations, as you would for, say, jumping to a specific location in a media file.
    • CloudDrive stores an actual file system in the cloud, and that file system can be repaired and maintained just like one located on a physical drive. Tools like chkdsk and Windows' own built-in indexing systems function on a CloudDrive volume just as they do on your local drive volumes. In your case this means that Plex's library scans will take seconds, and will not get you locked out by Google's API limits.
    • CloudDrive's block-based storage means that it can modify portions of files in-place, without downloading the entire file and reuploading it.
    • CloudDrive's cache is vastly more intelligent than those implemented by file-based solutions, and is capable of, for example, storing the most frequently accessed chunks of data, such as those containing the metadata information in media files, rather than whole media files. This, like the above, also translates to faster access times and searches.
    • CloudDrive's block-based solution allows for a level of encryption and data security that other solutions simply cannot match. Data is completely AES encrypted before it is ever even written to the cache, and not even Covecube themselves can access the data without your key. Neither your cloud provider, nor unauthorized users and administrators for your organization, can access your data without consent. 
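
    As a rough illustration of the in-place modification point in the list above, the following sketch compares how much data has to move to change a few bytes. It is a simplified model under stated assumptions: the file-based tool downloads and reuploads the whole file, and the block-based tool downloads and reuploads every chunk the edit touches in full. Real transfer sizes depend on the tool, its cache, and its configuration.

```python
MIB = 1024 * 1024


def file_based_transfer(file_size):
    """Bytes moved when the whole file is downloaded and then reuploaded."""
    return 2 * file_size


def block_based_transfer(edit_offset, edit_length, chunk_size):
    """Bytes moved when only the touched chunks are downloaded and reuploaded."""
    if edit_length <= 0:
        return 0
    first = edit_offset // chunk_size
    last = (edit_offset + edit_length - 1) // chunk_size
    touched = last - first + 1
    return 2 * touched * chunk_size


# Changing a single byte 5 MiB into a 100 MiB file, with 10 MiB chunks:
whole_file = file_based_transfer(100 * MIB)           # 200 MiB moved
touched_only = block_based_transfer(5 * MIB, 1, 10 * MIB)  # 20 MiB moved
```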

    Drawbacks (read carefully) 

    • CloudDrive's use of an actual file system also introduces vulnerabilities that file-based solutions do not have. If the file system data itself becomes corrupted on your storage, it can affect your ability to access the entire drive--in the same way that a corrupted file system can cause data loss on a physical drive as well. The most common sorts of corruption can be repaired with tools like chkdsk, but there have been incidents caused by Google's infrastructure that have caused massive data loss for CloudDrive users in the past--and there may be more in the future, though CloudDrive has implemented redundancies and checks to prevent them going forward. Note that tools like testdisk and recuva can be used on a CloudDrive volume just as they can on a physical volume in order to recover corrupt data, but this process is very tedious and generally only worth using for genuinely critical and irreplaceable data. I don't personally consider media files to be critical or irreplaceable, but each user must consider their own risk tolerance.
    • A CloudDrive volume is not accessible without CloudDrive. Your data will be locked into this ecosystem if you convert to CloudDrive as a solution. Your data will also only be accessible from one machine at a time. CloudDrive's caching system means that corruption can occur if multiple machines could access your data at once, and, as such, it will not permit the volume to be mounted by multiple instances simultaneously.
    • And, as mentioned, all data must be uploaded within the CloudDrive infrastructure to be used with CloudDrive. Your existing data will not work.

     

    So, having said all of that, before I move on to helping you with your other questions, let me know that you're still interested in moving forward with this process. I can help you with the other questions, but I'm not sure that you were on the right page with the project you were signing up for here. rClone and NetDrive both also make fine solutions for media storage, but they're actually very different beasts than CloudDrive, and it's really important to understand the distinction. Many people are not interested in the additional limitations. 

  6. I can actually confirm this bug as well. The circumstances were very straightforward: I detached the drive from the machine because I had to take it down for a hardware test. When the test was completed, the drive said that it was already attached (to the same machine I detached it from), and I had to force the mount and reindex the drive. This was about a week ago, on 1261. I do not, sadly, have any logs or records from the incident, and the drive functions as normal after the reindex.

    EDIT: I should add that attempting to force the mount once actually gave me an error about the cache directory already existing, but forcing it a second time allowed it to mount and start the indexing process.

    In any case, something does seem borked with the detach and reattach process. 

  7. If your drive is properly detached and reattached it should not have to reindex the drive. You should be able to attach it and pick up where you left off after a few minutes of pinning and synchronization. 

  8. If CloudDrive is indicating that your downstream performance is better than you're seeing for the file transfer, my first guess is that it might be drive I/O congestion. Are you, by chance, copying the data to the same drive that you're using as your CloudDrive cache? 

  9. 22 hours ago, Edward said:

    By the way where do users here get such massive cloud storage? For example Google charges over $200pm for 10tb which is way above what I would pay. 

    Some people are lucky enough to get unlimited drives through their work or school, and some people use G Suite accounts, which have unlimited storage with 5 or more users on the domain, or 1TB with fewer than 5...but Google doesn't actually enforce that limit, as far as I know. 

  10. 12 hours ago, kird said:

    Raidrive/Netdrive are 2 programs that have a similar behavior to SCD, to my knowledge. I don't know if in their deeper process they have differences.

    They are not comparable products. Both applications are more similar to the popular rClone solution for Linux. They are file-based solutions that effectively act as frontends for Google's API. They do not support in-place modification of data. You must download and reupload an entire file just to change a single byte. They also do not have access to genuine file system data because they do not use a genuine drive image; they simply emulate one at some level. All of the above is why you do not need to create a drive beyond mounting your cloud storage with those applications. CloudDrive's implementation is more similar to a virtual machine, wherein it stores an image of the disk on your storage space.

    12 hours ago, kird said:

    as data storage of different sorts and SCD is a program that claims total support for working in the cloud in a secure manner regardless of whether you plan to store 1Mb or I don't know how many Tb

    None of this really has anything to do with this thread, but since it needs to be said (again):

    CloudDrive functions exactly as advertised, and it's certainly plenty secure. But security and reliability are two different things. CloudDrive, like all cloud solutions, is vulnerable to modifications of data at the provider and, in some cases, it is more vulnerable, because some of that data on your provider is the file system data for the drive. Google's service disruptions back in March caused it to return stale revisions of the chunks containing the file system data (read: the chunks had been updated since the revision that was returned). This probably happened because Google had to roll back some of their storage for one reason or another. We don't really know. This is completely undocumented behavior on Google's part. These pieces were cryptographically signed as authentic CloudDrive chunks, which means they passed CloudDrive's verification, but they were old revisions of the chunks, and that corrupted the file system.

    This is not a problem that would be unique to CloudDrive, but it is a problem that CloudDrive is uniquely sensitive to. Those other applications you mentioned do not store file system data on your provider at all. It is entirely possible that Google reverted files from those applications during their outage, but it would not have resulted in a corrupt drive, it would simply have erased any changes made to those particular files since the stale revisions were uploaded. Since those applications are also not constantly accessing said data like CloudDrive is, it's entirely possible that some portion of the storage of their users is, in fact, corrupted, but nobody would even notice until they tried to access it. And, with 100TB or more, that could be a very long time--if ever. 

    Note that while some people, including myself, had volumes corrupted by Google's outage, none of the actual file data was lost any more than it would have been with another application. All of the data was accessible (and recoverable) with volume repair applications like testdisk and recuva. It simply wasn't worth the effort to recover the data rather than just discard the volume and rebuild, because it was expendable data. But genuinely irreplaceable data could be recovered, so it isn't even really accurate to call it data loss. 

    This is not a problem with a solution that can be implemented on the software side. At least not without throwing out CloudDrive's intended functionality wholesale and making it operate exactly like the dozen or so other Google API frontends that are already on the market, or storing an exact local mirror of all of your data on an array of physical drives. In which case, what's the point? It is, frankly, a problem that we will hopefully never have to deal with again, presuming Google has learned their own lessons from their service failure. But it's still a teachable lesson in the sense that any data stored on the provider is still at the mercy of the provider's functionality and there isn't anything to be done about that. So, your options are to either a) only store data that you can afford to lose or b) take steps to back up your data to account for losses at the provider. There isn't anything CloudDrive can do to account for that for you. The developers have taken some steps to add additional redundancy to the file system data and to track checksum values in a local database in order to detect a provider that returns authentic but stale data, but there is no guarantee that either of those things will actually prevent corruption from a similar outage in the future, and nobody should operate based on the assumption that they will. 
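
    To show why a signature alone cannot catch a rolled-back chunk, and what a locally tracked checksum adds, here is a small sketch. It is a guess at the general idea only: the HMAC signing, the `local_checksums` dict standing in for the local database, and the function names are assumptions for illustration, not CloudDrive's actual design.

```python
import hashlib
import hmac


def sign(key, data):
    """Sign chunk bytes with a drive key (illustrative HMAC-SHA256)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def classify_chunk(key, data, signature, chunk_id, local_checksums):
    """Classify a chunk returned by the provider as 'corrupt', 'stale', or 'ok'."""
    # Step 1: authenticity. An old revision still passes this check,
    # because it was genuinely signed when it was first uploaded.
    if not hmac.compare_digest(sign(key, data), signature):
        return "corrupt"
    # Step 2: freshness. Compare against the checksum recorded locally
    # for the most recent upload of this chunk.
    expected = local_checksums.get(chunk_id)
    if expected is not None and hashlib.sha256(data).hexdigest() != expected:
        return "stale"
    return "ok"
```

    Note that detecting a stale chunk is not the same as recovering from it. If the provider no longer has the current revision, the check can only stop the old data from being silently used.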

    The size of the drive is certainly irrelevant to CloudDrive and its operation, but it seems to be relevant to the users who are devastated about their losses. If you choose to store 100+ TB of data that you consider to be irreplaceable on cloud storage, that is a poor decision. Not because of CloudDrive, but because that's a lot of ostensibly important data to trust to something that is fundamentally and unavoidably unreliable. Contrarily, if you can accept some level of risk in order to store hundreds of terabytes of expendable data at an extremely low cost, then this seems like a great way to do it. But it's up to each individual user to determine what functionality/risk tradeoff they're willing to accept for some arbitrary amount of data. If you want to mitigate volume corruption then you can do so with something like rClone, at a functionality cost. If you want the additional functionality, CloudDrive is here as well, at the cost of some degree of risk. But either way, your data will still be at the mercy of your provider--and neither you nor your application of choice have any control over that.

    If Google decided to pull all developer APIs tomorrow or shut down Drive completely, like Amazon did a year or two ago, your data would be gone and you couldn't do anything about it. And that is a risk you will have to accept if you want cheap cloud storage. 

  11. Double-check that it's correctly added to the subpool. I can't see that it is from your screenshots, and that folder should be created as soon as it's added to the pool. If it does look like it's correctly added, and the folder still does not exist, I would just remove it and re-add it to the subpool and see if that causes it to be created. Beyond that, you'd have to open an actual support ticket, because I'm not sure why it wouldn't be created when the drive is added. 

  12. 3 hours ago, kird said:

    Thanks srcrist for your knowledge , I respect everything you've said about data protection, but I don't agree at all, will not argue anything since it is obvious that data loss incidents are happening only with users or almost exclusively who are scd customers

     

    That is just a passive-aggressive way of arguing that I am wrong, and that the data loss issues are a solvable problem for Covecube. Neither of which is correct. I'm sorry.

    The reasons that data loss is experienced on CloudDrive and not other solutions are related to how CloudDrive operates by design. It is a consequence of CloudDrive storing blocks of an actual disk image with a fully functional file system and, as such, being more sensitive to revisions than something like rClone, which simply uploads whole files. This has been explained multiple times by Christopher and Alex, and it makes perfect sense if you understand both how a file system operates and how CloudDrive is intended to operate as a product. If anyone is not able to accept the additional sensitivities of a block-based cloud storage solution then, again, simply do not use it. rClone or something similar may very well better fit your needs. I don't think Covecube ever intended this product to serve users who want to use it to store abusive amounts of media on consumer grade cloud storage. It works for that purpose, but it is not the intended function. And removing the functionality that is responsible for these sensitivities also eliminates the intended functionality of a block-based solution. Namely, in-place read and write modifiability of cloud data. And CloudDrive is, to my knowledge, still the only product on the market with such capability.

    But I would never use any other cloud solution for hundreds of TB of irreplaceable data either. There is simply no way that is an intelligent solution, and anyone who is doing it is, frankly, begging for inevitable catastrophe.

    3 hours ago, kird said:

    On the subject of the Api, I don't quite understand the mechanism that you say is by application and not by account even though these are two totally different accounts as they are different domains despite being gdrive. So simply generating an API for one of the two accounts and editing the .json with the data would be enough?

    As was explained in the other thread, an API key is not for accessing the data on a given account. It is a key that lets an application request data from Google's services. A single API key can request data from any account that authorizes said access, as evidenced by the fact that Covecube's default API key, which was obviously created from the developer's Google account, can access the data on your Google Drive. For CloudDrive, you can in fact use an API key that was requested by an account completely unrelated to any account that the data is actually stored on.

    It should be noted that Alex again removed Google Drive from the experimental providers in .1425, though, as it appears that Google approved their quota limit expansion after some delay. So all of this is moot, if you don't want to change the key. 

  13. You do not need two separate API keys to access multiple drives. And it does not negatively impact security in the least, unless you are using CloudDrive to access someone else's data. API keys are for applications, not accounts.

    5 hours ago, kird said:

    Regarding the backup, it doesn't make any sense, people who have more than 100 Tb in their accounts, how do you want them to have more copies spread over other accounts?

    Perhaps do not store 100TB of irreplaceable data on a consumer grade cloud storage account? But, otherwise, yes. Other accounts with redundancy would be a good first step. 

    5 hours ago, kird said:

    I assure you that if you don't use SCD and you save your data in gdrive accounts you do have the integrity of your data safeguarded.

    I assure you that it is not. Google does not have a data integrity SLA for Drive at all. It simply does not exist. Google does an admirable job of maintaining data integrity, but we've already seen two issues where they lost users' data. It will happen again, and Drive users cannot do anything about it. If you don't have the space to back up your data, and you care about that data, then you're storing too much data. Period. The real question isn't, "how am I supposed to back up 100TB," it's, "why are you storing 100TB of data, that you do not consider to be expendable, in the cloud, that you cannot back up?" That's on you, as the user.

    5 hours ago, kird said:

    Please, I hope the developers will get their act together and start implementing a program where the security of their clients' data is certified.

    There is absolutely nothing--and I mean nothing whatsoever--that the developers of CloudDrive can do to "certify" the integrity and security of your data that they are not already doing. CloudDrive uses end-to-end, enterprise grade encryption for the data, and has integrity verification built-in at multiple points. And yet cloud storage is still cloud storage...and your data is (and will always be) vulnerable to loss by any cloud storage provider that you choose. And there is nothing they can do about that. 

    If that is not a level of risk that you are comfortable taking on...do not use cloud storage for your data, with CloudDrive or any other similar solution.

  14. You want all of your applications now pointing at the hybrid pool. Once the data is correctly moved, the data will appear identically to how it appeared before you nested the pool. The structure of the underlying pool is, as always, transparent at the file system level to applications. Your sub pool(s) do not even need drive letters/mount points, FYI. You can simply give the hybrid pool the localpool mount point. Which, in your case, appears to be P:

    To move your data, here is the process:

    So let's say you have a hybrid pool (O:), consisting of a localpool (P:) which contains drives D:, E:, F:, and G: and a cloudpool (M:) containing a single cloud drive (we'll just say H:). Right now, if you look at your actual drive file systems you'll have a poolpart folder containing another poolpart folder. That is, D:, E:, F:, G:, and H: all have a hidden poolpart folder in root containing a second poolpart folder. All of the data on each drive that you want to be accessible to the pool needs to be moved into the second poolpart folder on that drive.

    So, right now, for example, you probably have G:\Poolpart-XXXX\Poolpart-YYYY\ and G:\Poolpart-XXXX\<a bunch of other stuff> within your poolpart folder on that drive. All of the <a bunch of other stuff> simply needs to be cut and pasted to move it to the Poolpart-YYYY folder instead of Poolpart-XXXX. It will then be accessible at O:, with an identical structure to how it is presently accessible via P:. Note that Poolpart-XXXX represents localpool (P:) and Poolpart-YYYY represents hybridpool (O:), in this example. Each level of nesting actually represents one pool level above the previous. Thus, the master pool of any given hierarchy will be contained in the deepest nested poolpart folder.

    You will repeat this movement process for each individual logical volume you are including in the pool. That is, E:\Poolpart-XXXX\Poolpart-YYYY, D:\Poolpart-XXXX\Poolpart-YYYY, etc, etc. Just move everything on the drive to the corresponding poolpart-YYYY folder on the same drive. Then restart the service and remeasure the hybrid pool and it will all be within the pool. 
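
    If you would rather script the move than cut and paste by hand, the process above could be sketched like this. It is only an illustration under assumptions: the folder names are the placeholders from the example, the function name is made up, and it assumes nothing is writing to the drive while it runs. It defaults to a dry run so you can review the plan before anything is moved.

```python
import shutil
from pathlib import Path


def move_into_nested_poolpart(outer, inner_name, dry_run=True):
    """Move everything in the outer poolpart folder into the nested one.

    Returns the list of (source, destination) pairs. Nothing is moved
    unless dry_run is False.
    """
    outer = Path(outer)
    inner = outer / inner_name
    if not inner.is_dir():
        raise FileNotFoundError(f"nested poolpart folder not found: {inner}")

    # Plan first, so a name clash aborts before anything has been moved.
    plan = [(entry, inner / entry.name)
            for entry in sorted(outer.iterdir()) if entry != inner]
    clashes = [dst for _, dst in plan if dst.exists()]
    if clashes:
        raise FileExistsError(f"already present in nested folder: {clashes}")

    if not dry_run:
        for src, dst in plan:
            shutil.move(str(src), str(dst))  # same-volume move, so no data is copied
    return plan


# Example (paths are the placeholders used above, not real folder names):
# move_into_nested_poolpart(r"G:\Poolpart-XXXX", "Poolpart-YYYY", dry_run=True)
```

    You would run it once per drive, then restart the service and remeasure the hybrid pool as described above.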

  15. Once you've created the nested pool, you'll need to move all of the existing data into the poolpart hidden folder within the outer poolpart hidden folder before it will be accessible from the pool. It's the same process that you need to complete if you simply added a drive to a non-nested pool that already had data on it. If you want the data to be accessible within the pool, you'll have to move the data into the pool structure. Right now you should have drives with a hidden poolpart folder and all of the other data on the drive within your subpool. You need to take all of that other data and simply move it within the hidden folder. See this older thread for a similar situation: https://community.covecube.com/index.php?/topic/4040-data-now-showing-in-hierarchical-pool/&sortby=date

     

  16. It looks like Google was uh...less than communicative about the verification requirement changes. I've seen that error from a few other large apps this week, like Cozi, as well. They're really cracking down, and the verification process can apparently cost upwards of $15,000. IFTTT apparently reduced their Gmail support because of it when Google rolled out similar requirements for Gmail auth back in March. 

    The good news for us is that you should be able to get around it by using your own API key with CloudDrive. See the other thread for discussion about doing so. 

  17. You can specify which folders and files you want to duplicate with the duplication and balancing options in DrivePool. But a nested DrivePool setup is still what you'd want to use. 

    You want a pool that contains your existing pool and the cloud storage, and then configure the duplication at that level. 

  18. Yeah, the only reason I know to do that is that once upon a time Amazon Cloud Drive was marked experimental and that was the only way to access that as well. Providers are moved to experimental in order to purposely hide them from general users. Either because they are problematic, or because there are more technical steps to use them than others. 

  19. It sounds like you'd want a nested pool setup. That is, you'll want a pool that contains your existing pool and the CloudDrive (or a pool of cloud drives, if necessary), and then you can enable duplication between the pool and the drive, or the pool and pool of drives (depending on your needs). That will duplicate your entire existing pool to the cloud. 

  20. You could use a more traditional file encryption tool to encrypt the files that are on your drive, if you wanted. Though the net effect is that all of the data will need to be downloaded, encrypted, and then uploaded again in its new encrypted format. That's really true no matter what method you use. Even if you could, hypothetically, encrypt a CloudDrive drive after creation, it would still need to download each chunk, encrypt it, and then upload it again.

    There is no way to encrypt the chunks stored on your provider after drive creation, though. 

    You do not need to use the same key. It will prompt you for a key for each drive when you attach the drive. 

  21. Since Google Drive is now marked as experimental, you'll have to enable experimental providers under the troubleshooting options at the top of the CloudDrive UI in order to see it. 

    Google Cloud Storage is Google's enterprise storage provider, so that won't work for Google Drive. 
