Automatic self-healing - how it works?

zhup · November 22, 2019

Hello,

How does "automatic self-healing" work?

Are only cloud files or also the files on hdds repaired? Does de-duplication have to be turned on?

Thank you in advance.

Bowsa · December 14, 2019

It doesn't work. I've still suffered data loss multiple times with this enabled...

srcrist · December 15, 2019

On 11/22/2019 at 7:26 AM, zhup said:

Hello,

How does "automatic self-healing" work?

Are only cloud files or also the files on hdds repaired? Does de-duplication have to be turned on?

Thank you in advance.

Self-healing refers to the structure of the drive on your storage, not your local files or even the files stored on the drive. It simply means that it will use the duplicated data to repair corrupt chunks when it finds them.

22 hours ago, Bowsa said:

It doesn't work. I've still suffered data loss multiple times with this enabled...

Have you submitted a ticket to support? I haven't heard of data loss issues even among those who aren't using data duplication in about 6 months. The self-healing works, in any case. I've seen it show up in the logs once or twice.

Bowsa · December 17, 2019

On 12/15/2019 at 11:47 AM, srcrist said:

Self-healing refers to the structure of the drive on your storage, not your local files or even the files stored on the drive. It simply means that it will use the duplicated data to repair corrupt chunks when it finds them.

Have you submitted a ticket to support? I haven't heard of data loss issues even among those who aren't using data duplication in about 6 months. The self-healing works, in any case. I've seen it show up in the logs once or twice.

Haven't submitted a ticket, I'm used to suffering data loss at least 2-5 time a year with StableBit cloud. Nothing I can do about it

srcrist · December 17, 2019

7 hours ago, Bowsa said:

Haven't submitted a ticket, I'm used to suffering data loss at least 2-5 time a year with StableBit cloud. Nothing I can do about it

I mean...you could stop using it, if it loses your data...or at least contact support to see if you can get your issue fixed...

Bowsa · December 24, 2019

On 12/17/2019 at 2:42 AM, srcrist said:

I mean...you could stop using it, if it loses your data...or at least contact support to see if you can get your issue fixed...

Contacted them... is anyone else getting this error? Has occurred more than 20 times now, where the chunk is unavailable both in my local and duplicate drive.

All that data lost...

Jellepepe · December 26, 2019

On 12/24/2019 at 1:45 AM, Bowsa said:

Contacted them... is anyone else getting this error? Has occurred more than 20 times now, where the chunk is unavailable both in my local and duplicate drive.

All that data lost...

Sounds to me like you may just have a failing drive in your system or a really shitty network connection causing corruption. A drive definitely should not normally introduce corruption without something affecting the data.

zhup · February 11, 2020

I need an explanation from the Stablebit team.
What exactly is automatic self-healing and de-duplication? How does it work?

Which data is repaired and with using what data?

Thank you in advance.

Christopher (Drashna) · February 15, 2020

If you mean the self healing and duplication for StableBit CloudDrive, it is just that, duplication of the data chunks. It can/will upload the data twice, basically. So that if one block gets corrupted/damaged/deleted, it can attempt to access the other block, and use that.

darkly · February 20, 2020

On 2/14/2020 at 7:10 PM, Christopher (Drashna) said:

If you mean the self healing and duplication for StableBit CloudDrive, it is just that, duplication of the data chunks. It can/will upload the data twice, basically. So that if one block gets corrupted/damaged/deleted, it can attempt to access the other block, and use that.

Can you shine some more light on what exactly happens when these type of features are turned on or off on existing drives with large amounts of data already saved? I'm particularly interested in the details behind a CloudDrive+DrivePool nested setup (multiple partitions on CloudDrives pooled).

Some examples:

40+TB of data on a single pool consisting of a single clouddrive split into multiple partitions, mounted with a 50GB cache on a 1TB drive. What EXACTLY happens when duplication is turned on in CD, or if file duplication is turned on in DP?

40+TB on a pool, which itself is consists of a single pool. That pool consists of multiple partitions from a single clouddrive. A second clouddrive is partitioned and those partitions are pooled together (in yet another pool). The resulting pool is added to the first pool (the one which directly contains the data) so that it now consists of two pools. Balancers are enabled in drivepool, so it begins to try to balance the 40TB of data between the two pools now. Same question here. What EXACTLY would happen?

And, more simply, what happens on a clouddrive is duplication is disabled?

I wish there was a bit more clarity on how these types of scenarios would be handled. My main concern with these type of scenarios is that suddenly my 40TB of data will attempt to be duplicated all at once (or in the case of the second scenario, DrivePool rapidly begins to migrate 20TB of data), instantly filling my 1TB cache drive, destabilizing my clouddrive, and resulting in drive corruption/data loss. As far as I can tell from the documentation, there is nothing in the software to mitigate this. Am I wrong?

srcrist · February 20, 2020

CloudDrive duplication is block-level duplication. It makes several copies of the chunks that contain your file system data (basically everything that gets "pinned"). If any of those chunks are then detected as corrupt or inaccessible, it will use one of the redundant chunks to access your file system data, and then repair the redundancy with the valid copy.

DrivePool duplication is file-level duplication. It will make however many copies of whatever data you specify, and arrange those copies throughout your pool as you specify. DrivePool duplication is very customizable. You have full control over where and when it duplicates your data. If you want it to duplicate your entire pool, that is a setting you control. As is whether or not it does so at regular intervals, or immediately. That's all up to your balancer settings in DrivePool.

Despite the name similarity, their functionality really has nothing to do with one another. CloudDrive's duplication makes copies of very specific chunks on your cloud provider. It doesn't have anything to do with duplicating your actual data. It's intended to prevent corruption from arbitrary rollbacks and data loss on the provider's part, like we saw back in March and June of last year.

EDIT: It slipped my mind that full duplication can also be enabled in CloudDrive. This is still block-level duplication on your cloud provider. Rather than using one chunk for each chunk, it would use two. For the same purpose mentioned above. If one chunk is corrupt or unavailable, it will use the other and repair the redundancy. Net effect being that 100GB of data on your cloud storage would take up 200GB worth of chunks, of course, and also twice the upload time per byte. You would still only see ONE copy of the data in your file system, though.

darkly · February 20, 2020

29 minutes ago, srcrist said:

CloudDrive duplication is block-level duplication. It makes several copies of the chunks that contain your file system data (basically everything that gets "pinned"). If any of those chunks are then detected as corrupt or inaccessible, it will use one of the redundant chunks to access your file system data, and then repair the redundancy with the valid copy.

DrivePool duplication is file-level duplication. It will make however many copies of whatever data you specify, and arrange those copies throughout your pool as you specify. DrivePool duplication is very customizable. You have full control over where and when it duplicates your data. If you want it to duplicate your entire pool, that is a setting you control. As is whether or not it does so at regular intervals, or immediately. That's all up to your balancer settings in DrivePool.

Despite the name similarity, their functionality really has nothing to do with one another. CloudDrive's duplication makes copies of very specific chunks on your cloud provider. It doesn't have anything to do with duplicating your actual data. It's intended to prevent corruption from arbitrary rollbacks and data loss on the provider's part, like we saw back in March and June of last year.

EDIT: It slipped my mind that full duplication can also be enabled in CloudDrive. This is still block-level duplication on your cloud provider. Rather than using one chunk for each chunk, it would use two. For the same purpose mentioned above. If one chunk is corrupt or unavailable, it will use the other and repair the redundancy. Net effect being that 100GB of data on your cloud storage would take up 200GB worth of chunks, of course, and also twice the upload time per byte. You would still only see ONE copy of the data in your file system, though.

I fully understand that the two functions are technically different, but my point is that functionally, both can affect my data at a large scale. My question is about what the actual step-by-step result of enabling these features on an existing data set would be, not what the end result is intended to be (as it might never get there). An existing large dataset with a cache drive nowhere near the size of it is the key point to my scenarios. What (down to EVERY key step) happens from the MOMENT I enable duplication in CloudDrive? All of my prior experience with CloudDrive suggests to me that my 40TB of data would start being downloaded, read, duplicated, and reuploaded, and this would max out my cache drive, causing the clouddrive to fail and either become corrupted or dismount, if hitting the 750GB upload cap doesn't do so first.

As I stated in my first comment, there is NO documentation about how existing data is handled when features like this which affect things at a large scope are enabled, and that's really concerning when you're familiar with the type of problems that you can run into with CloudDrive if you're not careful with things like the upload cap, available space on the cache, and R/W bottlenecks on the cache drive. As someone who has lost many terabytes of data due to this, I am understandably reluctant to touch a feature like this which could actually help me on the long run, because I don't know what it does NOW.

srcrist · February 20, 2020

The change log seems to suggest that enabling it on a drive with existing data will only impact new data written to the drive:

.1121
* Added an option to enable or disable data duplication for existing cloud drives (Manage Drive -> Data duplication...).
    - Any new data written to the cloud drive after duplication was enabled will be stored twice at the storage provider.
    - Existing data on the drive that is not overwritten will continue to be stored once.

.1118
* Added an option to enable data duplication when creating a new cloud drive.
    - Data duplication stores your data twice at the storage provider.
    - It consumes twice the upload bandwidth and twice the storage space at the provider.
    - In case of data corruption or loss of the primary data blocks, the secondary blocks will be used to provide redundancy 
      for read operations.

So it should not impact the cache drive in any immediate sense. CloudDrive is generally smarter than that about the cache though. I would expect it to throttle writes to the cache as it was processing the data, as it does with large volume copies from other sources like DrivePool. Large writes from other sources do not corrupt or dismount the drive. It simply throttles the writes until space is available. A moot point, in any case, as it will not duplicate your existing chunks unless you manually download and reupload the data.

srcrist · February 20, 2020

25 minutes ago, darkly said:

As I stated in my first comment, there is NO documentation about how existing data is handled when features like this which affect things at a large scope are enabled, and that's really concerning when you're familiar with the type of problems that you can run into with CloudDrive if you're not careful with things like the upload cap, available space on the cache, and R/W bottlenecks on the cache drive. As someone who has lost many terabytes of data due to this, I am understandably reluctant to touch a feature like this which could actually help me on the long run, because I don't know what it does NOW.

To be clear: there is documentation on this feature in the change log.

darkly · February 21, 2020

2 hours ago, srcrist said:

To be clear: there is documentation on this feature in the change log.

oh THAT'S where it is. I rarely think of the changelog as a place for technical documentation of features . . . Thanks for pointing me to that anyway.

3 hours ago, srcrist said:

CloudDrive is generally smarter than that about the cache though. I would expect it to throttle writes to the cache as it was processing the data, as it does with large volume copies from other sources like DrivePool. Large writes from other sources do not corrupt or dismount the drive. It simply throttles the writes until space is available. A moot point, in any case, as it will not duplicate your existing chunks unless you manually download and reupload the data.

This has definitely not been my experience, except in the short term. While writes are throttled for a short while, I've found that extended operation under this throttling can cause SERIOUS issues as CloudDrive attempts to read chunks and is unable to due to the cache being full. I've lost entire drives due to this.

srcrist · February 21, 2020

1 hour ago, darkly said:

oh THAT'S where it is. I rarely think of the changelog as a place for technical documentation of features . . . Thanks for pointing me to that anyway.

Isn't the the entire purpose of a changelog? To explain the functionality of new features? The changes that have been made to the operation of the application?

1 hour ago, darkly said:

This has definitely not been my experience, except in the short term. While writes are throttled for a short while, I've found that extended operation under this throttling can cause SERIOUS issues as CloudDrive attempts to read chunks and is unable to due to the cache being full. I've lost entire drives due to this.

I copied roughly 100 TB over the course of several months from one drive to another via CloudDrive, and with the cache drive being full that entire time it simply throttled the writes to the cache and accepted data as room became available. Which is intended functionality as far as I am aware. You may have encountered a bug, or there may have been a technical issue with your cache drive itself--but it should do little more than copy slowly. I'm not even sure what you mean by CloudDrive being unable to read chunks because the cache is full. How would a full cache prevent reads? The full cache is storing your writes, and the reads are from the cloud unless you configure your cache to maintain some read availability...but even without it, CloudDrive can still read chunks.

It's possible that you're simply misattributing the cause of your data loss. The only way in which I can think that you might lose data via the cache is if the drive you're using for cache space isn't storing valid data.

The cache, in any case, will not be meaningfully affected by any sort of data duplication.

darkly · February 21, 2020

1 hour ago, srcrist said:

Isn't the the entire purpose of a changelog? To explain the functionality of new features? The changes that have been made to the operation of the application?

Most changelogs don't go in depth into the inner workings of added functionality. They only detail the fact that the functionality now exists, or reference some bug that has been fixed. Expecting to find detailed information on how duplication functions in the CloudDrive changelog feels about the same as expecting Plex server changelogs to detail how temporary files created during transcoding are handled.

1 hour ago, srcrist said:

I copied roughly 100 TB over the course of several months from one drive to another via CloudDrive, and with the cache drive being full that entire time it simply throttled the writes to the cache and accepted data as room became available. Which is intended functionality as far as I am aware. You may have encountered a bug, or there may have been a technical issue with your cache drive itself--but it should do little more than copy slowly. I'm not even sure what you mean by CloudDrive being unable to read chunks because the cache is full. How would a full cache prevent reads? The full cache is storing your writes, and the reads are from the cloud unless you configure your cache to maintain some read availability...but even without it, CloudDrive can still read chunks.

It's possible that you're simply misattributing the cause of your data loss. The only way in which I can think that you might lose data via the cache is if the drive you're using for cache space isn't storing valid data.

The cache, in any case, will not be meaningfully affected by any sort of data duplication.

When your cache is low on space, you first get a warning that writes are being throttled. At some point you get warnings that CD is having difficulty reading data from the provider (I don't remember the specific order in which things happen or the exact wording, and I really don't care to trigger the issue to find out). That warning mentions that continued issues could cause some issues (data loss, drive disconnecting, or something like that). I'm 99.99% sure CD does not read data directly from the cloud. If the cache is full, it cannot download the data to read it. I've had this happen multiple times on drives that were otherwise stable, and unless there's some cosmic level coincidences happening, there's no way I'm misattributing the issue when it occurs every time the cache drive fills up and I accidentally leave it in that state for longer than I intended (such as leaving a large file transfer unattended). Considering clouddrive itself gives you a warning about having trouble reading data when the cache becomes saturated for an extended period, I don't think I'm wrong. Anyway, I've personally watched as my writes were throttled, and yet my cache drive continued to fill up more and more (albeit much slower) until it had effectively no free space at all except for whenever CD managed to offload another chunk to the cloud (before the drive ultimately failed due to not being able to fetch data from the cloud fast enough anyway).

I've had this happen multiple times on multiple devices and the drives never have any other problems until I carelessly left a large transfer going that saturated my cache. I've had multiple clouddrives mounted on the same system, using the same cache drive, with one drive sitting passively with no reads and writes, and the other drive having a large transfer I accidentally left running, and sure enough, that drive dismounts and/or loses data and the passive drive remains in peak condition with no issues whatsoever when I check it after the fact for any problems.

EDIT
90% sure the error in this post is the one I was seeing (with different wording for the provider and drives ofc):

srcrist · February 21, 2020

OK. Best of luck.

Sign In

Automatic self-healing - how it works?

Question

zhup

17 answers to this question

Recommended Posts

Bowsa

srcrist

Bowsa

srcrist

Bowsa

Jellepepe

zhup

Christopher (Drashna)

darkly

srcrist

darkly

srcrist

srcrist

darkly

srcrist

darkly

srcrist

Join the conversation

Browse

Activity