How the StableBit CloudDrive Cache works


Christopher (Drashna)


This information comes from Alex, in a ticket dealing with the cache. Since he posted a nice long bit of information about it, I've posted it here for anyone curious about the details of *how* it works. 
 
 
What happens when the disk used for the local cache is being used? What happens when uploading? 
 
  • The upload cache, the download cache, and pinned data are separate things, and are treated as such in the code.
    • The download cache is self-learning, and tries to keep frequently accessed blocks of data on the local system. This speeds up access to the drive for, well... frequently accessed data. 
    • Pinned data is generally drive and file system objects for the drive. This is kept on (pinned to) the local system, because it will be accessed frequently (it's written every time you create/modify/delete a file, and is read every time you read file properties). 
    • The upload cache is everything written to the disk that may not have been uploaded to the provider yet.  This is explained in a bit more detail below. 
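Roughly speaking, you can picture the three categories like this. (An illustrative Python sketch only; the names, the classification function, and its inputs are assumptions made up for the example, not StableBit's actual code.)

    from enum import Enum, auto

    # Illustrative only -- these names are assumptions, not StableBit's code.
    class CacheCategory(Enum):
        PINNED = auto()      # drive/file system objects (metadata), kept locally
        TO_UPLOAD = auto()   # written locally, not yet confirmed at the provider
        DOWNLOAD = auto()    # frequently read blocks kept locally for speed

    def classify_block(is_filesystem_metadata, awaiting_upload):
        """Rough priority order: pinned and to-upload data take precedence
        over the self-learning download cache."""
        if is_filesystem_metadata:
            return CacheCategory.PINNED
        if awaiting_upload:
            return CacheCategory.TO_UPLOAD
        return CacheCategory.DOWNLOAD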
 
The upload cache can (is allowed to) exceed the specified cache limit.  It does this because, otherwise, a full cache would prevent files from being written. We could limit it to the specified size, but that would wipe out the self-learning feature of the download cache. So... not ideal.
 
We do plan on implementing a maximum limit in the future, but for now, the max limit is the drive size.
However, we do throttle based on the free space remaining (we will always attempt to leave 5 GB free on the drive). 
 

* [D] Changed the criteria for when accelerated cache trimming kicks in. This allows for local write speeds to approach about 5 GB/s and still not overrun the cache.
    - >= 50 GB free space: Cache is trimmed every 10 seconds.
                           (theoretical maximum write speed 5 GB/s)
    -  < 50 GB free space: Cache will be trimmed every 1 second.
   (-  <  6 GB free space: Write throttling begins.)
    -  <  5 GB free space: All data will be unpinned, and cache will be trimmed every 1 second to maintain 5 GB free, if possible.
                           (theoretical maximum write speed 6 GB/s)
 
As you can see, we get more aggressive the closer we get to running out of disk space.  
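To make those thresholds easier to follow, here is the same policy written out as a small Python sketch (illustrative only; the function and field names are assumptions, and the real logic lives in the driver):

    def cache_trim_policy(free_space_gb):
        """Illustrative sketch of the free-space thresholds quoted above.
        Not StableBit's actual code; it just mirrors the changelog entry."""
        policy = {
            "trim_interval_seconds": 10,  # plenty of room: relaxed trimming
            "throttle_writes": False,
            "unpin_all_data": False,
        }
        if free_space_gb < 50:
            policy["trim_interval_seconds"] = 1  # accelerated trimming kicks in
        if free_space_gb < 6:
            policy["throttle_writes"] = True     # start slowing writers down
        if free_space_gb < 5:
            policy["unpin_all_data"] = True      # unpin everything; try to keep 5 GB free
        return policy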
 
 
Now, let's talk about data integrity, because that is always a concern for us. Despite any issues that may be brought up here, we do everything we can to ensure data integrity. That's our number 1 priority. 
 
 
This is what happens when you write something to the cloud drive:
  • All writes to the cloud drive go directly to the local cache file. This happens directly in the kernel and is very fast.
    • If the drive is encrypted, it's encrypted during this step. Anything stored on disk will be encrypted.
  • At this point, your cache has some updated data that your cloud provider doesn't.
  • The StableBit CloudDrive service is notified that some data in the cache is in the "To Upload" state, and needs to be uploaded to the cloud provider.
  • The service reads that data from the cache, uploads it to your provider, and only once it's absolutely sure that the data has been uploaded correctly will it tell the cloud drive kernel driver that it's safe to remove that data from the "To Upload" state.
  • Now, in reality it can get much more complicated. For example, what happens if new data gets written to the parts that are actively being uploaded? This can get really complicated, really fast. But that's all handled by our driver, so let's just keep this simplistic view for this example.
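Here is a heavily simplified sketch of that write/upload cycle in Python. It's a toy model with made-up names (SimplifiedCache, InMemoryProvider), not the actual kernel driver or service code, and it ignores the concurrency, retries, and encryption that the real product handles:

    import hashlib

    class InMemoryProvider:
        """Stand-in for a real cloud provider, just for this illustration."""
        def __init__(self):
            self.chunks = {}
        def put(self, block_id, data):
            self.chunks[block_id] = data
        def checksum(self, block_id):
            return hashlib.sha256(self.chunks[block_id]).hexdigest()

    class SimplifiedCache:
        """Toy model of the local cache: block data plus a 'To Upload' set."""
        def __init__(self):
            self.blocks = {}        # block_id -> bytes (the local cache file)
            self.to_upload = set()  # block ids not yet confirmed at the provider

        def write(self, block_id, data):
            # 1. Writes land in the local cache first (in-kernel and fast in
            #    the real product; a plain dict here).
            self.blocks[block_id] = data
            # 2. The block is flagged as needing upload.
            self.to_upload.add(block_id)

    def upload_pending(cache, provider):
        """Toy model of the service's upload pass."""
        for block_id in list(cache.to_upload):
            data = cache.blocks[block_id]
            provider.put(block_id, data)
            # Only after we're sure the provider has the correct data do we
            # clear the 'To Upload' flag -- a checksum comparison stands in
            # for whatever verification the real service performs.
            if provider.checksum(block_id) == hashlib.sha256(data).hexdigest():
                cache.to_upload.discard(block_id)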
So what happens when you pull the plug in the middle of this process?
  • StableBit CloudDrive loses the "To Upload" state. Well, it doesn't really lose it, but it's in an indeterminate state, and we can't trust it any longer.
  • In order to recover from this, StableBit CloudDrive assumes that all locally cached data was not uploaded to the provider.
  • It is safe to make this assumption because uploading something that has already been uploaded before doesn't corrupt the drive as a whole in the cloud, whereas not uploading something that needed to be uploaded would be catastrophic, because that would mean that your data in the cloud would get "out of sync" with your local cloud drive cache and would get corrupted.
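In terms of the same toy model above, that recovery rule amounts to the following (again, an illustration, not the actual driver logic):

    def recover_after_unclean_shutdown(cache):
        """The 'To Upload' bookkeeping can't be trusted after a crash, so
        conservatively assume everything still in the local cache needs to be
        uploaded again. Re-uploading identical data is harmless; skipping an
        upload that was actually needed would corrupt the drive."""
        cache.to_upload = set(cache.blocks.keys())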
Now, up to this point I've described how StableBit CloudDrive maintains the data integrity of the bits on its drive, but there's another very important factor to consider here, and that's the file system.
 
The file system runs on top of the drive and, among other things, makes sure that the file metadata doesn't get corrupted if there's a power outage. The metadata is the part of the data on the disk that describes where the files are stored, and how directories are organized, so it's critically important.
 
This metadata is under the control of your file system (e.g. NTFS, ReFS, etc...). NTFS is designed to be resilient in the case of sudden power loss. It guarantees that the metadata always remains consistent (by performing repairs on it after a power loss). At least that's the theory. When this fails, that's when you need to run chkdsk.
 
But what it doesn't guarantee, is that the file data itself remains consistent after a power loss.
 
So there's that to consider. Also, Windows will cache data in memory. Even after you finish copying a file, Windows will not write the entire contents of that file to disk immediately. It will keep it in the cache and write out that data over time. If you look in the Resource Monitor under the Memory tab, you may see some orange blocks in the chart; that memory is called "Modified". This is essentially the amount of data that is waiting to be written to the disk from the cache (and memory mapped files).
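As an aside, if an application wants to be sure that a particular file's contents have actually reached the disk, rather than sitting in that "Modified" memory, it has to ask for it explicitly. A minimal Python illustration of the general Windows behavior (nothing specific to StableBit CloudDrive):

    import os

    # Write a file and force its contents out of the OS cache onto the disk.
    with open("important.dat", "wb") as f:
        f.write(b"data that must survive a power loss")
        f.flush()             # push Python's own buffer down to the OS
        os.fsync(f.fileno())  # ask the OS to flush its cache to the physical disk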

I really don't understand why the "to upload" state becomes indeterminate for the entire write cache.  Shouldn't it only have to re-upload chunks that it didn't record as being completed?  Why is a chunk not treated akin to a block in a journaling filesystem?  Of course I understand that if chunks are 100MB in size, it could still take some time to write them, but no way should the entire cache be invalidated upon a crash.
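For illustration, the kind of per-chunk journal being described might look something like this sketch (made-up names; this is not how StableBit CloudDrive is actually implemented):

    import json
    import os

    # Hypothetical per-chunk journal: append a record as each chunk upload
    # completes, so only unrecorded chunks need re-uploading after a crash.
    JOURNAL = "upload_journal.jsonl"

    def record_chunk_uploaded(chunk_id):
        with open(JOURNAL, "a") as j:
            j.write(json.dumps({"chunk": chunk_id}) + "\n")
            j.flush()
            os.fsync(j.fileno())  # the record only counts once it's on disk

    def chunks_needing_reupload(all_dirty_chunks):
        uploaded = set()
        if os.path.exists(JOURNAL):
            with open(JOURNAL) as j:
                for line in j:
                    uploaded.add(json.loads(line)["chunk"])
        return set(all_dirty_chunks) - uploaded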

 

This is especially important for me right now because I've got my system locking up on a not infrequent basis that requires me to hard reset (plus the occasional BSOD).  A 200G cache on a 10TB drive (100/5 Mbps d/u) always takes 45+ minutes to recover.
