
Local cache disk filling up despite "Fixed" cache size


Tell

Question

Hi all,

 

I've run into a situation where the local cache drive is filling up despite the local cache being set to a fixed size. My configuration:

  • StableBit CloudDrive v 1.0.0.634 BETA
  • 4 TB drive created on Amazon Cloud Drive with 30 MB chunk size, custom security profile
  • Local cache set to 6 GB FIXED on a 120 GB SSD (the SSD is exclusive to CloudDrive - there's absolutely nothing else on this drive)
  • Lots of data to upload

When the local cache is filled up (6 GB), CloudDrive starts throttling write requests, as it should (hooray for this feature, by the way). However, when the total amount of data uploaded approaches the size of the cache drive itself, CloudDrive slows down until it completely stops accepting writes and throws a red warning saying that the local cache drive is full.

 

This is the CloudPart folder after a session in which I uploaded approx. 30 GB of data.

 

post-2445-0-50523000-1469530493_thumb.png

 

This is the local cache disk at the same time as the screenshot above. Remember, there is absolutely nothing on this drive other than the CloudPart-folder.

 

post-2445-0-99852600-1469530502_thumb.png

 

Selecting "Performance --> Clear local cache" does nothing. Detaching and re-attaching the drive empties the local cache drive, reducing "Used space" to almost nothing, and I can then resume filling the cloud drive with data until the cache drive runs full again.

 

As is obvious, a discrepancy exists between the amount of data reported as "Used space" on the SSD and the "Size on disk" of the CloudPart folder. My guess is that this is some sort of bug related to the handling of NTFS sparse files. Any ideas?


To clarify, "Size" and "Size on disk" are two entirely different things.

 

Size on disk is the actual amount of space being taken up on the disk. However, a file can report a larger size than the space it actually occupies. Such files are generally called "sparse files".

 

That is what we use for the cache data, as it is a VERY efficient way to handle things. You can read about it here:

https://en.wikipedia.org/wiki/Sparse_file
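If you want to see the Size vs. Size on disk behaviour for yourself, here's a quick sketch you can run from an elevated command prompt (the file path below is just an example, and has nothing to do with CloudDrive's own cache files):

rem Create a 1 GB file (fully allocated), mark it sparse, and tell NTFS
rem that the whole range is zero so it can deallocate the clusters.
fsutil file createnew C:\Temp\sparse-demo.dat 1073741824
fsutil sparse setflag C:\Temp\sparse-demo.dat
fsutil sparse setrange C:\Temp\sparse-demo.dat 0 1073741824
fsutil sparse queryflag C:\Temp\sparse-demo.dat

In the file's properties, "Size" will still report 1 GB, but "Size on disk" will now be near zero.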

 

As for the usage, I can't say for certain, but an educated guess would be that the other space is being used by "VSS Snapshots" (aka "Previous Versions").

 

You can check and disable it. To do so, run "control sysdm.cpl" and open the "System Protection" tab.

This should list most of your disks. Find the one in question. If protection is enabled, then that is most likely what is going on here. 

With the disk selected (click on it to select it), click the "Configure" button.

This will tell you how much disk space it's currently using. Chances are it's close to the 29 GB that you're seeing, and if that is the case, you can disable protection and delete the snapshots here.
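If you'd rather check from an elevated command prompt, vssadmin reports roughly the same information (the D: drive letter below is just an example; substitute your cache drive's letter):

vssadmin list shadowstorage /for=D:
vssadmin list shadows /for=D:

If shadow copies are in play, the "Used Shadow Copy Storage space" figure should be close to the missing ~29 GB.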

 

Once you've done that, everything should "go back to normal", essentially. 


Drashna,

 

Thank you for your feedback. Yes, I suspect this is somehow related to the handling of sparse files, as I wrote.

 

VSS is totally disabled on the server in question (the service is not running). On Windows Server, Shadow Copies are managed through the "Shared Folders" MMC snap-in (under All Tasks). I've attached screenshots confirming that the service is disabled in general and that shadow copies are off for this drive in particular.

 

post-2445-0-77949100-1469621097_thumb.png post-2445-0-66112800-1469620849_thumb.png post-2445-0-97149000-1469620867_thumb.png

 

I also intuitively suspected VSS of doing this (since "Used space" on the disk ≠ "Size on disk" of the folder). However, as far as I can reason, if VSS were the culprit, detaching the cloud drive should not immediately free all the space again. So VSS seems to be ruled out.

 

Any other ideas?


detaching the Cloud Drive should not immediately free all the space again. 

 

To clarify, all of the space is freed up when unmounting the CloudDrive disk? 

 

If so, I'm not entirely sure, and I'll have to flag this for Alex (the Developer). 

 

It may also be worth running something like WinDirStat to see what it reports. Not that I suspect it will be anything different. 


To clarify, all of the space is freed up when unmounting the CloudDrive disk?

"Used space" on the physical drive returns to near zero when I "Detach" the drive from CloudDrive. This also clears the red warning CloudDrive throws. I can then "Attach" the drive again and resume uploading, until the physical cache drive is full once more.


I ran WinDirStat (as local admin) and the output is attached, complete with drive properties. (I swapped the 120 GB SSD for a 480 GB SSD just to give it more wiggle room.) I also upped the local cache to 10 GB to see what would happen, but no change.

 

post-2445-0-58432200-1469776682_thumb.png

 

Best,

Tell


Just wanted to mention that I am experiencing exactly the same thing as the original poster:

 

The disk holding the local cache fills up despite a fixed cache size being set.

 

I am running StableBit CloudDrive 1.0.0.667 BETA on Win7 x64 Professional.

 

The same happens on all 3 drives (all hosted on Amazon Cloud Drive).

 

When I detach a drive, the total amount of data written to that drive is "released" on the cache drive. I just detached one of the drives, releasing 519 GB of data (the fixed cache size for that drive is 10 GB).

 

 

 

 


Okay, just to clarify here, I talked with Alex directly about this issue. 

 

 

This isn't really a bug with StableBit CloudDrive, but specifically with NTFS (and likely how it deals with "sparse files"). 

 

Rebooting may fix the issue, actually. 

 

 

And if you want to check, run "fsutil fsinfo ntfsinfo x:", where "x:" is the cache drive in question, and look for the "Total Reserved" entry (the value is a cluster count, so multiply by "Bytes Per Cluster" to get bytes). Normally this should be very low (4 digits, at most), but in this case it will be MUCH higher.

 

If this is the case, then you can safely ignore this for now.  

 

 

Unfortunately, "fixing" this may be non-trivial, or not possible, as it may purely be an NTFS issue.


I have this same issue, with the local cache filling up several of my 1 TB SSD drives and also overflowing onto other 4 TB drives in my server. How do I restrict the cache size to the value I have set under Performance --> Set cache size? (I have selected "Fixed", by the way.)



Check the "Size on disk" for the CloudDrive cache folder. If this is reported correctly (e.g., at or below the cache size specified), then everything is fine.

 

The "disk usage" discrepancy, as shown here:

http://community.covecube.com/index.php?/topic/2081-local-cache-disk-filling-up-despite-fixed-cache-size/&do=findComment&comment=14366

 

This is an NTFS issue specifically, not a StableBit CloudDrive issue. It's something weird that NTFS is doing that misreports or misallocates sectors on the disk.

 

 

Alex isn't exactly sure what is going on. However, if this issue does continue to occur, he will definitely look into it. But it may not be a problem that we can solve (it may amount to opening a ticket with Microsoft, and then waiting on them to issue a hotfix... which is a long, drawn-out process).

 

 

 

However, if the "Size on disk" is exceeding the limit that you've specified, then this is a bug with our code and something that we definitely do need to look into and fix.



 

I have attached a similar screenshot to the one that @Tell provided.

 

Local cache is set to a fixed size of 50 GB; size on disk is 697 GB.

 

CEziA7j.png

 


I have attached a similar screenshot to the one that @Tell provided

 

If you notice, the "Size on disk" is reported properly in his case (it's not exceeding the specified cache size); it is only the disk's used space that shows higher usage.

 

This is an issue with the "Total Reserved" entry being VERY high. 

 

Your case looks like the cache itself is far exceeding that 50 GB limit.

 

Local cache is set to a fixed size of 50 GB; size on disk is 697 GB.

 

Could you verify that it is a Fixed size cache?  (It does look like it's set to "Expandable" here). 

 

If it is fixed, try copying data to the drive. The copy should be incredibly slow, as writes should be throttled to hold the cache at that size.

 

 

Additionally, could you install the 1.0.0.680 build, or the 1.0.0.682 build, as it may address this issue.  

(1.0.0.672 is an "intermediary" build, meaning the code was in flux, and the issues you are seeing may be due to that rather than an actual bug.)


Drashna and friends,

 

I'm still experiencing the issue, which prevents me from uploading more than the size of my cache drive (480 GB) before I have to reboot the computer. On the cache drive, which hosts nothing but the CloudDrive cache, fsutil gives:

C:\Windows\system32>fsutil fsinfo ntfsinfo s:
NTFS Volume Serial Number :       0x________________
NTFS Version   :                  3.1
LFS Version    :                  2.0
Number Sectors :                  0x0000000037e027ff
Total Clusters :                  0x0000000006fc04ff
Free Clusters  :                  0x0000000006fb70c3
Total Reserved :                  0x0000000006e6d810
Bytes Per Sector  :               512
Bytes Per Physical Sector :       512
Bytes Per Cluster :               4096
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x0000000000240000
Mft Start Lcn  :                  0x00000000000c0000
Mft2 Start Lcn :                  0x0000000000000002
Mft Zone Start :                  0x00000000000c00c0
Mft Zone End   :                  0x00000000000cc8c0
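
Doing the math on that output (clusters are 4,096 bytes here):

Total Reserved = 0x06e6d810 = 115,791,888 clusters × 4,096 bytes ≈ 474 GB
Total Clusters = 0x06fc04ff = 117,179,647 clusters × 4,096 bytes ≈ 480 GB

So essentially the entire volume is tied up as reserved space, even though nearly all clusters are reported free.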

Since a reboot solves the issue, could it be that CloudDrive needs to release the write handles on the (sparse) files so that the drive manager will let go of the reservation? Did https://stablebit.com/Admin/IssueAnalysis/27122 uncover anything?

 

Best,

Tell


Since a reboot solves the issue, could it be that CloudDrive needs to release the write handles on the (sparse) files so that the drive manager will let go of the reservation? Did https://stablebit.com/Admin/IssueAnalysis/27122 uncover anything?

 

I doubt it, because once the service starts up, it would open the handle back up.  And the reserved space will decrease over time, as well. 

 

IIRC, Alex said it's a weird interaction between NTFS and sparse files, meaning it's an NTFS bug. 

The ticket was to investigate whether there was something we could do, such as sending file system commands to get the system to update this information properly.

 

However, that won't be a simple process, and it will require a lot of research and testing.


I doubt it, because once the service starts up, it would open the handle back up.  And the reserved space will decrease over time, as well. 

 

 

Turns out running…

net stop CloudDriveService
net start CloudDriveService

… does actually resolve the issue and release the reserved space on the drive (which is in effect what a reboot accomplished), allowing me to continue adding data. It is clear to me that the issue is resolved when CloudDrive releases/flushes these sparse files. The issue does not reappear until after I have added data again. While the problem might be NTFS-related, I would argue it would be relatively simple to mitigate by having CloudDrive release these file handles from time to time, so that the file system can catch up on how much space is actually occupied by the sparse files. It makes sense to me that Windows, to improve performance, might not recompute free space for sparse files until after the file handles are released. After all, free disk space is a metric that mostly needs to be accurate in the direction of preventing overfill, not the other way around.

 

TL;DR: The problem is solved by restarting CloudDriveService, which flushes something to disk. CloudDrive should do this on its own.


The problem is that we're using Sparse files here.  

 

A sparse file is a single, large file that is mostly empty. It only uses the disk space it actually needs.

https://en.wikipedia.org/wiki/Sparse_file

 

The issue is that NTFS is likely over-allocating space for the sparse file, in the form of reserved space.

 

Reducing the cache size may help here.   

 

 

But "releasing" the file isn't a possibility without stopping the service or detaching the drive from the system. Otherwise, it's going to be in constant use.

 

 

Specifically, the problem isn't StableBit CloudDrive, but how NTFS is handling the sparse file. And to make things worse, we're not doing anything weird or complex with the file. We're creating a "normal" sparse file, and using/accessing it in very normal ways.


Turns out running net stop CloudDriveService / net start CloudDriveService does actually resolve the issue and release the reserved space on the drive ... CloudDrive should do this on its own.

 

Does CloudDrive treat this like a system crash if an upload/download is in progress or if anything is in "To Upload", or does it handle it gracefully?

 

If graceful, I could set up a scheduled task to do this nightly, or even every few hours.
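Something along these lines would do it (a rough sketch only; the task name and schedule are arbitrary, and it assumes a service restart mid-upload is actually safe, which is exactly what I'm asking about):

schtasks /Create /TN "Restart CloudDriveService" /RU SYSTEM /SC DAILY /ST 03:00 ^
  /TR "cmd /c net stop CloudDriveService && net start CloudDriveService"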

 

I'm having this same problem; a separate thread was made about it OVER two months ago. I've basically abandoned CloudDrive because of this. If it is never fixed, I'll probably ask for a refund (though I'm not sure I'll get one, since I purchased all the way back in January).

 

My cache drive fills all the way up and then forcibly dismounts my CloudDrive disks every day or two, sometimes in less than a day. Despite what has been said, the reserved clusters NEVER, EVER go down over time unless I reboot or detach the disks.


I'm also seeing the exact same issue. Pausing the local file transfer to the CloudDrive mount point lets the upload resume, but once I resume the local copy, uploading pauses in CloudDrive again.

 

I hope Alex gets around to it soon!

EDIT: Sorry for the noise. I'm using the latest beta now (756) with a fixed cache size. It seems to do the trick here (you guys should update the frontpage links :-))


EDIT: Sorry for the noise. I'm using the latest beta now (756) with a fixed cache size. It seems to do the trick here (you guys should update the frontpage links :-))

 

 

Glad to hear it! 

 

And there is a good amount of testing that has to be done before that happens: things like ensuring that updating works properly (not just for the program, but for the drives too).

 

But yeah, we want to do so, sooner rather than later. 


I'm going to be very blunt here:

 

For the "Total Reserved" discrepancy: this is an issue with how NTFS handles sparse files, not a StableBit CloudDrive-specific issue.

 

 

We use sparse files for the cache, as it's the most efficient way to handle it.

And we're not doing anything complex with them (in fact, they should be working very much like a dynamically expanding VHDX file). 

 

Alex has observed this happening on his own system, but wasn't able to pin down a reason for it (this is even with source code for NTFS, from projects like ReactOS, and from old leaks). There isn't a good reason for it.

 

 

 

Any investigation into this is really an investigation into NTFS itself. That means reverse engineering and a shitload of testing. It won't be a small commitment of time.

 

Additionally, because it is a pretty rare issue, an issue with NTFS and Windows itself, and one that generally resolves itself... it's not a high priority. 

Especially as this is an issue that could take months or years to find a solution for, or to conclude that there is nothing we can do.

 

The other option is to open a ticket with Microsoft directly and hope for a resolution. Doing so is generally incredibly expensive, and may not be useful ("oh, well, that's just how it works" or "well, we don't really know").

 

 

Yes, this is a shitty answer (and I hate giving it). But I'd rather give a shitty answer than continually dance around it.


Thanks, Christopher. While most of that was understood, I only take issue with categorizing something that fills up a 120 GB SSD and knocks all of my drives offline in 6-12 hours as an issue that "generally resolves itself", unless by resolving itself you mean forcing the dismount of my drives so that the clusters are finally freed...

 

For me, and it looks like for others as well, it makes CloudDrive utterly useless.
