Message added by Shane,

As this continues to be an important topic, I've pinned it and am summarising here:

  • I've confirmed that enabling Read Striping can result in corrupted reads of duplicated files.
  • On (newer?) versions of DrivePool the feature is apparently enabled by default.

If you cannot confirm your installation is unaffected, you should ensure Read Striping is disabled:

  • For each pool you have, open the DrivePool GUI, go to Manage Pool -> Performance -> make sure Read Striping is not ticked.

Shane, volunteer mod.

Question

Posted

I am using StableBit DrivePool and noticed that I am getting file corruption that is somewhat reproducible. I particularly noticed this with FLAC files, as I was attempting to verify my music library using flac -t, which checks the MD5 signature of the decoded audio.

My setup is DrivePool with Folder Duplication enabled on specific folders, "Bypass filesystem filters" checked, "Read striping" checked and "Real-time duplication" checked.

Note that it appears to be "Read striping" that is the culprit for this, but I am not 100% sure. Particularly concerning to me is that this happens even with "Verify after copy" checked.

Steps for me to reproduce:

  1. Download a FLAC file to a DrivePool location that matches the above parameters.
  2. Verify it with flac -t (ensure that it verifies OK).
  3. Copy the file to a different location (it doesn't even have to be a DrivePool location).
  4. Verify the copied file with flac -t and see that it no longer verifies.
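For reference, steps 2-4 can be scripted roughly like this - a minimal sketch, assuming flac.exe is on the PATH and using placeholder file/folder names:

# Verify the original, copy it, then verify the copy; flac -t exits non-zero on failure
Set-Location F:\test
& flac -t .\test.flac
if ($LASTEXITCODE -ne 0) { Write-Host 'Original failed verification' }
Copy-Item .\test.flac .\test2.flac
& flac -t .\test2.flac
if ($LASTEXITCODE -ne 0) { Write-Host 'Copy failed verification' }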

Checking with a hex editor, I can see it's not even just a 1-byte difference: usually the first 32 KB or so is fine, then I get random jumbled-up data for 128 KB or so, and then the remainder of the file is correct (and the file size is correct).
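The same comparison can be made without a hex editor using a byte-level compare - a quick sketch with the same placeholder names (note that in Windows PowerShell fc is an alias for Format-Custom, so call fc.exe explicitly):

# List the offsets and values of the differing bytes between the original and the copy
fc.exe /b .\test.flac .\test2.flac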

Here's something I would never expect to see in a working filesystem:

PS F:\test\> flac -t .\test.flac

flac 1.3.2
Copyright (C) 2000-2009  Josh Coalson, 2011-2016  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

test.flac: ok
PS F:\test\> copy .\test.flac test2.flac
PS F:\test\> flac -t .\test2.flac

flac 1.3.2
Copyright (C) 2000-2009  Josh Coalson, 2011-2016  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

test2.flac: *** Got error code 2:FLAC__STREAM_DECODER_ERROR_STATUS_FRAME_CRC_MISMATCH


test2.flac: ERROR while decoding data
            state = FLAC__STREAM_DECODER_ABORTED

Edit:

It seems I am not the only one running into this problem; see this recent Reddit thread:

 

21 answers to this question

Recommended Posts

Posted

FLAC is a file format for lossless audio compression; you can download the flac utility from https://xiph.org/flac/download.html

The reason I use it as an example is that the FLAC format has built-in corruption checks, so it's easy to verify when the file contents are not what the user expects.

I suspect that the file format is largely irrelevant; however, this does seem to be more likely to happen with larger files (i.e. you probably can't reproduce this by copying small text files around).
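To check a whole library this way, something like the following works - a rough sketch, assuming flac.exe is on the PATH and F:\Music is the library root:

# Recursively test every FLAC file and report the ones that fail; -s keeps flac quiet on success
Get-ChildItem -Path 'F:\Music' -Filter *.flac -Recurse | ForEach-Object {
    & flac -t -s $_.FullName
    if ($LASTEXITCODE -ne 0) { Write-Warning "FAILED: $($_.FullName)" }
}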

Posted

OK - I have downloaded flac.exe and tested on two different machines (Win10 and 2012R2) with DrivePool installed (2.3.0.1124 beta).

With the same settings as you listed, I am not seeing any corruption of FLAC files - or any other files.

flac.exe (1.3.2) always comes back "OK".

Tried:

  • copying to the same folder as the original
  • copying from the pool to a non-pool drive

Tried several files - all copied fine and came back as "OK".

Posted

Well, that's a two-year-old version - and there have been numerous changes/bug fixes since then.

If you want to stay on the stable release, 2.2.3.1019 from October 2019 is the latest, or move to the beta - and see if your problem goes away or not.

 

Posted
On 6/29/2020 at 5:43 AM, Catch-22 said:

Note that it appears to be "Read striping" that is the culprit for this but I am not 100% sure. Particularly concerning to me is that this happens even with "Verify after copy" checked.

We've had reports of this in the past (and recently).   But it's one of those things that seem hard to reproduce, and hard to track down. 

However, based on how read striping works in StableBit DrivePool, it may be an interaction with how the program is reading the file data.  

https://stablebit.com/Support/DrivePool/2.X/Manual?Section=Performance Options#Read Striping

Specifically, I suspect the issue is with this part: 

Quote

For large sequential I/O, such as large file copying, read striping will utilize a block based algorithm, maximizing the use of each disk and minimizing disk context switches.

If you could, in the advanced settings for DrivePool, set the override value for "CoveFs_ReadStripingBlockMode" to "false", reboot, and see if that fixes the issue?

https://wiki.covecube.com/StableBit_DrivePool_2.x_Advanced_Settings#Settings.json

Alternatively, try setting the override value for "CoveFs_ReadStripingBlockModeNoCache" to "true".
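For reference, these overrides live in the service's Settings.json (typically under C:\ProgramData\StableBit DrivePool\Service\). The exact layout varies by version, so treat the snippet below as an illustrative sketch only and check the wiki page above for the schema your build uses:

"CoveFs_ReadStripingBlockMode": {
  "Default": true,
  "Override": false
}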

Posted

I have manually updated to version  2.2.3.1019 (not sure why the update check wasn't working).

Quote

If you could, in the advanced settings for DrivePool, set the override value for "CoveFs_ReadStripingBlockMode" to "false", reboot, and see if that fixes the issue?

I have tried this setting after a reboot and still run into the same issues (easiest to reproduce for me when downloading then verifying FLAC files).

I haven't tried the other setting you mentioned. I've reset CoveFs_ReadStripingBlockMode to defaults and will try CoveFs_ReadStripingBlockModeNoCache on my next reboot.

Posted

I was able to reproduce it now simply by running flac -t over and over again on the same file.

F:\Music\Flitz&Suppe\Yokai (2020)>flac -t "1 - Tanuki.flac"

flac 1.3.2
Copyright (C) 2000-2009  Josh Coalson, 2011-2016  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

1 - Tanuki.flac: ok

F:\Music\Flitz&Suppe\Yokai (2020)>flac -t "1 - Tanuki.flac"

flac 1.3.2
Copyright (C) 2000-2009  Josh Coalson, 2011-2016  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

1 - Tanuki.flac: ok

F:\Music\Flitz&Suppe\Yokai (2020)>flac -t "1 - Tanuki.flac"

flac 1.3.2
Copyright (C) 2000-2009  Josh Coalson, 2011-2016  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

1 - Tanuki.flac: *** Got error code 0:FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC
*** Got error code 0:FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC


1 - Tanuki.flac: ERROR while decoding data
                 state = FLAC__STREAM_DECODER_ABORTED
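For anyone who wants to automate that, a simple loop catches it eventually - a rough sketch, assuming flac.exe is on the PATH and the shell is in the album folder:

# Test the same file repeatedly and stop at the first failure
for ($i = 1; $i -le 100; $i++) {
    & flac -t -s '1 - Tanuki.flac'
    if ($LASTEXITCODE -ne 0) { Write-Host "Verification failed on pass $i"; break }
}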

 

Posted
On 7/14/2020 at 10:30 PM, Catch-22 said:

I was able to reproduce it now simply by running flac -t over and over again on the same file.

@Catch-22 were you able to resolve this issue? I'm running into something similar with MKV files (usually multiple GB).

 

Posted

This is just terrible behavior for drive-pooling software and should be the number one bug to fix. Did the devs ever get it addressed, or is it still an issue?

Posted (edited)

Looks like it might still be an issue - depending on the utility you're trying to use.

I'm running some tests on my home server using DrivePool 2.3.2.1493 (the latest version currently) with Read Striping temporarily enabled (it defaults to off).

  • flac: I haven't reproduced the issue, though I am using the more recent 1.4.2 release.
  • HashCheck v2.4.0 (gurnec): sometimes returns MISMATCH or UNREADABLE (occasionally on the 1st and 3rd file of my first sample set of 79 files totalling 1.2 GB) or just UNREADABLE (much more frequently on my second sample set of 82 files totalling 185 GB).
  • HashTools 4.6 (Binary Fortress): no problems at all on any pass so far of either sample set.
  • Update: I also tested the Windows file comparison tool FC.exe and found it was susceptible too, sometimes returning mismatching bytes or failing to find a file on the pool despite it being present.

From what I can tell this issue does not affect normal file copy operations: both loop copy (a->b->a) and cascade copy (a->b->c) tests of the smaller sample set showed no corruption at all after 512 iterations, although I am still running these tests with the larger sample set and will edit this post with the result after they finish in a day or two. Edit: the larger sample set also showed no corruption after 24 iterations (cascade, could not run more due to available space) and 77 iterations (loop).

Given the above: my guess would be that different file-checking utilities might be using different functions/calls to read the files and some of those functions/calls may be too low-level / not designed to allow for virtual drives, although I'm not sure how that results in different data being delivered from the virtual drive and only sometimes at that. Some sort of race condition or timeout? Thankfully, as mentioned above, the Read Striping option is not enabled by default (which might also be why this issue hasn't seen much light).

@Christopher (Drashna) It might be a good idea to add a Y/N dialog warning (or at least update the tooltip) to the Read Striping option, perhaps along the lines of "some utilities that depend on reading files in chunks, e.g. for comparison or integrity checking, may experience issues with this enabled"?
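For anyone who wants to run a similar check, repeatedly hashing the same files on the pool and comparing the results between passes is a quick way to spot non-deterministic reads - a rough sketch, with the folder path as a placeholder:

# Hash every file in a pool folder twice and report any file whose hash changes between passes
$files = Get-ChildItem -Path 'F:\PoolFolder' -File -Recurse
$first = @{}
foreach ($f in $files) { $first[$f.FullName] = (Get-FileHash $f.FullName -Algorithm SHA256).Hash }
foreach ($f in $files) {
    $second = (Get-FileHash $f.FullName -Algorithm SHA256).Hash
    if ($second -ne $first[$f.FullName]) { Write-Warning "Hash changed between passes: $($f.FullName)" }
}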

Edited by Shane
Update on testing.
Posted
On 3/23/2023 at 11:21 AM, Shane said:

Looks like it might still be an issue - depending on the utility you're trying to use.


Hello. Has this issue been resolved? I ran into some CRC errors the other day and wondered if this was the cause or not. 

Posted

I don't know if it has been resolved (and, if the cause is the use of "virtual-unfriendly" function calls by the third party tools, it would not be possible to resolve except by the makers of those tools).

If read striping is enabled and the CRC errors are reported by hashing/verification utilities you rely on, I would turn off read striping.

If read striping is not enabled, then there may be issues with your hardware.

Posted

This issue is definitely NOT resolved and is really extremely serious. I can't believe it hasn't gotten more attention given the high potential for data corruption. I'm on v2.3.11.1663 (latest at this time) and was highly perplexed at seeing random corruption throughout thousands of files I copied to a Linux server via an rsync command. This sent me on a wild goose chase looking into bad RAM modules or bugs in rsync, but it is now clear that the issue was DrivePool all along (it didn't help that I actually did have some bad RAM on the Linux server, but that was a red herring, as it has since been replaced with ECC RAM that has been tested).

After noticing that the source data on a DrivePool volume "seemed" valid but thousands of the files copied to the Linux server were corrupt, I spent weeks trying to figure out what was going on. Trying to narrow down the issue, I started working with individual files. In particular I looked at some MP3 files that were corrupt on the remote side. When I would re-copy a file via rsync with the --checksum parameter, it would always report the mismatch and act like it was re-copying the file, but then sometimes the file would STILL be corrupt on the remote side. WTF? Apparently this bug was causing the rsync re-copy to send yet another corrupted version of the file to the remote side, though it would occasionally copy a good version. Super weird and very inconsistent.

So then I wrote a Node.js script to iterate through a folder and generate/compare MD5 hashes of source files (on the DrivePool volume) and target files (on the remote Linux server). I started with a small dataset of around 4000 files (22 of which were corrupt). Things got even weirder, with multiple runs of this script showing different files with mismatched hashes, and I realized it was frequently generating an incorrect hash for the SOURCE file. There could be different results each time the script was run. Sometimes hundreds of files would show a hash mismatch.
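A rough PowerShell equivalent of that kind of source-versus-target comparison - run locally against a copy rather than against the remote server, with both folder paths as placeholders - would look something like this:

# Compare MD5 hashes of matching files between a pool folder and a copy of it elsewhere
$src = 'F:\PoolFolder'   # folder on the pool (placeholder)
$dst = 'D:\CopyFolder'   # copy of the same files off the pool (placeholder)
Get-ChildItem -Path $src -File -Recurse | ForEach-Object {
    $rel   = $_.FullName.Substring($src.Length).TrimStart('\')
    $other = Join-Path $dst $rel
    if (Test-Path $other) {
        $a = (Get-FileHash $_.FullName -Algorithm MD5).Hash
        $b = (Get-FileHash $other -Algorithm MD5).Hash
        if ($a -ne $b) { Write-Warning "MD5 mismatch: $rel" }
    }
}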

It's only been a short time since I disabled read striping, so I can't verify that it has fixed everything, but with read striping disabled I haven't yet experienced a single corrupt transfer. An rsync command to compare based on checksum completed and fixed all 22 remaining corrupt files. And another couple of runs of my hash-compare script for the small 4000-file dataset show no hash mismatches.

The only thing preventing this from becoming an utter disaster is that I hadn't yet deleted the source material after copying to the remote server, so I still have the original files to compare and try to repair the whole mess.  However, some of the files were already reorganized on the remote server, so it is still going to take a lot of manual work to get everything fixed.

Sorry for the rant, but if the devs are not going to actually fix DrivePool I'm about done with this software.  There are too many "weird" things going on (not just this particularly bad bug).

Posted

This explains why thousands of my photos in Adobe Lightroom Classic suddenly started becoming corrupted over the last couple of days, since I enabled read striping and then moved catalogs around between devices en masse.

Posted

So, I turned off the drive pool for a few days to run diagnostics on each individual drive, without any errors or drops. As soon as I turned the DrivePool service back on, several drives dropped during the rebalancing of the pool. I am starting to get a little desperate. There is clearly something wrong with the DrivePool service. I have disabled read striping.

Posted

Rebalancing tends to put additional load on any device (e.g. a USB enclosure) that handles multiple drives within a pool. If the device is (becoming) flaky then that load can cause one or more drives or even the entire device to drop. I'd suggest taking a look at the hardware you're using?
