
Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11


Message added by Shane,

As this continues to be an important topic, I've pinned it and am summarising here:

The OP describes faults in change notification handling and FileID handling. The former can cause at least performance issues/crashes (e.g. in Visual Studio), the latter is more severe and causes file corruption/loss for affected users. Specifically for the latter, I've confirmed:

  • Generally a FileID is presumed by apps that use it to be unique and persistent on a given volume that reports itself as NTFS (collisions are possible albeit astronomically unlikely), however DrivePool's implementation is such that collisions after a reboot are effectively inevitable on a given pool.
  • Affected software is that which decides that historical file A (pre-reboot) is current file B (post-reboot) because they have the same FileID and proceeds to read/write the wrong file.

Software affected by the FileID issue that I am aware of:

  • OneDrive, Dropbox (data loss). Do not point at a pool.
  • FreeFileSync (slow sync, maybe data loss, proceed with caution). Be careful pointing at a pool.

If you have been able to confirm an application is affected by these particular issues, please post in this thread. When a fix is released I will update this ASAP.

Shane, volunteer mod.

Question

Posted

To start: while I am new to DrivePool, I love its potential, and I own multiple licenses and the full StableBit suite. If you only use DrivePool for basic archiving of large files, with simple applications accessing them for periodic reads, you are probably unlikely to hit these bugs. This assumes you don't use any file synchronization / backup solutions. Further, I don't know how many thousands (tens or hundreds?) of DrivePool users there are, but clearly many are not hitting these bugs, or not recognizing that they are, so this is NOT some new destructive "my files are 100% going to die" issue. Some of the reports I have seen on the forums, though, may actually be due to these bugs without being recognized as such. As far as I know, CoveCube was not previously aware of these issues, so earlier tickets may not have even considered this possibility.

I started reporting these bugs to StableBit ~9 months ago, and informed them ~1 month ago that I would be putting this post together. Please see the disclaimer below as well, as some of this is based on observations rather than known facts.

You are most likely to run into these bugs with applications that:

  • Synchronize or back up files, including cloud-mounted drives like OneDrive or Dropbox
  • Must handle large quantities of files or monitor them for changes, like coding applications (Visual Studio / VSCode)


Still, these bugs can cause silent file corruption, file misplacement, deleted files, performance degradation, data leakage (a file shared with someone externally could have its contents overwritten by any sensitive file on your computer), missed file changes, and potentially other issues for a small portion of users (I have had nearly all of these occur). They may also trigger some BSOD crashes; I had one such crash that is likely related. Due to the subtle way some of these bugs can present, it may be hard to notice they are happening even if they are. In addition, these issues can occur even without file mirroring and with files pinned to a specific drive. I do have some potential workarounds/suggestions at the bottom.

More details are at the bottom but the important bug facts upfront:

  • Windows has a native file changed notification API using overlapped IO calls. This allows an application to listen for changes on a folder, or a folder and its sub-folders, without having to constantly check every file to see if it changed. StableBit triggers "file changed" notifications even when files are just accessed (read) in certain ways. StableBit does NOT generate notification events on the parent folder when a file under it changes (Windows does). StableBit does NOT generate a notification event when only a FileID changes (the next bug talks about FileIDs).

 

  • Windows, like Linux, has a unique ID number for each file written on the hard drive. If there are hardlinks to the same file, they share the same unique ID (so one FileID may have multiple paths associated with it). In Linux this is called the inode number; Windows calls it the FileID. Rather than accessing a file by its path, you can open a file by its FileID. In addition, two files on the same volume are not supposed to share the same FileID, and it persists across reboots. On NTFS it is a 64-bit number; the newer 128-bit form is used by ReFS (128 bits means the number of unique values is 39 digits long, or has the uniqueness of something like an MD5 hash). A FileID does not change when a file is moved within the volume or is modified. StableBit, by default, supports FileIDs; however, they seem to be ephemeral: they do not seem to survive across reboots or file moves. Keep in mind FileIDs are used for directories as well, not just files. Further, on DrivePool, if a directory is moved/renamed, not only does its FileID change but the FileID of every file under it changes too. I am not sure if there are other situations in which they may change. In addition, if a descendant file/directory FileID changes due to something like a directory rename, StableBit does NOT generate a notification event that it has changed (the application gets the directory event notification but nothing on the children).
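To see what a FileID looks like from an application's point of view, here is a minimal Python sketch. It assumes Python's `os.stat`, which reports the inode number on Linux and the file index on Windows as `st_ino`; the file names and the `file_identity` helper are made up for illustration. On an ordinary local volume the identity is expected to survive an edit and a rename, which is exactly the expectation described above.

```python
import os
import tempfile


def file_identity(path):
    """Return (volume id, file id) for a path as reported by the OS."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


with tempfile.TemporaryDirectory() as root:
    original = os.path.join(root, "ImportantDoc1.docx")
    with open(original, "w") as f:
        f.write("hello")
    before = file_identity(original)

    # Modify the file in place, then rename it within the same volume.
    with open(original, "a") as f:
        f.write(" world")
    renamed = os.path.join(root, "Renamed.docx")
    os.rename(original, renamed)
    after = file_identity(renamed)

    same_after_rename = before == after
    print("identity preserved across edit + rename:", same_after_rename)
```

Running the same check against a folder on a pool drive, before and after a rename or a reboot, is one way to observe the behaviour described in this post for yourself.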


There are some other things to consider as well, DrivePool does not implement the standard windows USN Journal (a system of tracking file changes on a drive).  It specifically identifies itself as not supporting this so applications shouldn't be trying to use it with a drivepool drive. That does mean that applications that traditionally don't use the file change notification API or the FileIDs may fall back to a combination of those to accomplish what they would otherwise use the USN Journal for (and this can exacerbate the problem).  The same is true of Volume Shadow Copy (VSS) where applications that might traditionally use this cannot (and drivepool identifies it cannot do VSS) so may resort to methods below that they do not traditionally use.


Now the effects of the above bugs may not be completely apparent:

  • For the overlapped IO / File change notification 

This means an application monitoring for changes on a DrivePool folder or sub-folder will get erroneous notifications that files changed when anything merely accesses them. Just opening something like File Explorer on a folder, or even switching between applications, can cause file accesses that trigger the notification. If an application acts on a notification and then checks the file at the end of handling it, that check may itself cause another notification. Applications that rely on getting a folder-changed notification when a child changes will not get these at all with DrivePool. If an application monitors only the folder and not its children, no notification is generated at all (rather than one for just the child), so it could miss changes.

  • For FileIDs

It depends on what the application uses the FileID for, but it may assume the FileID stays the same when a file moves. As it doesn't with DrivePool, this might mean it reads, backs up, or syncs the entire file again if it is moved (perf issue). An application that uses the Windows API to open a file by its ID may not get the file it is expecting, or a file that was simply moved will throw an error when opened by its old FileID because DrivePool has changed the ID. For example, let's say an application caches that the FileID for ImportantDoc1.docx is 12345, but then 12345 refers to ImportantDoc2.docx after a restart. If this application is a file sync application and ImportantDoc1.docx is changed remotely, then when it goes to write those remote changes to the local file using the OpenFileById method, it will actually overwrite ImportantDoc2.docx with those changes.

I didn't spend the time to read the Windows file system requirements to know when Windows expects a FileID to potentially change (or not change). It is important to note that even if changes/reuse are theoretically allowed, if they are not commonplace (because Windows uses essentially a number like an MD5 hash in terms of repeats) applications may just assume they never happen. A backup or file sync program might assume that a file with a specific FileID is always the same file. If FileID 12345 is c:\MyDocuments\ImportantDoc1.docx one day and then c:\MyDocuments\ImportantDoc2.docx another, it may mistake document 2 for document 1, overwriting important data or restoring data to the wrong place. If it is trying to create a whole-drive backup, it may assume it has already backed up c:\MyDocuments\ImportantDoc2.docx if that file now has the same FileID that ImportantDoc1.docx had (by the time it reaches it, DrivePool would have a different FileID for document 1).
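As a rough illustration of how an application could avoid trusting a cached FileID blindly, here is a small Python sketch. It is not how OneDrive or any other real tool works; `CachedEntry`, `remember` and `still_same_file` are hypothetical names, and the idea is simply to treat a cached ID as valid only while the cached path still reports that same ID, and to fall back to a rescan otherwise.

```python
import os
import tempfile
from typing import NamedTuple


class CachedEntry(NamedTuple):
    """What a hypothetical sync tool remembers about a file."""
    path: str
    dev: int
    ino: int


def remember(path):
    st = os.stat(path)
    return CachedEntry(path, st.st_dev, st.st_ino)


def still_same_file(entry):
    """True only if the cached path exists and still reports the cached ID.

    Anything else (missing path, different ID) is treated as "unknown",
    so the caller should rescan instead of writing by ID.
    """
    try:
        st = os.stat(entry.path)
    except FileNotFoundError:
        return False
    return (st.st_dev, st.st_ino) == (entry.dev, entry.ino)


with tempfile.TemporaryDirectory() as root:
    doc1 = os.path.join(root, "ImportantDoc1.docx")
    with open(doc1, "w") as f:
        f.write("document 1")
    entry = remember(doc1)
    ok_before = still_same_file(entry)

    # Document 1 goes away and a different document appears.
    os.remove(doc1)
    with open(os.path.join(root, "ImportantDoc2.docx"), "w") as f:
        f.write("document 2")
    ok_after = still_same_file(entry)

    print(ok_before, ok_after)
```

On a pool where IDs reset, a check like this would fail after every reboot, so the tool pays the cost of a rescan (the perf issue) but does not write into the wrong file.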


Why might applications use FileIDs or file change notifiers? It may not seem intuitive why applications would use these, but a few major reasons are:

  • Performance. File change notifiers are an event/push based system, so the application is told when something changes. The common alternative is a poll-based system where an application must scan all the files looking for changes (and may try to rely on file timestamps or even hashing the entire file to determine this), which causes a good bit more overhead / slowdown.
  • FileIDs are nice because they already handle hardlink file de-duplication (Windows may have multiple copies of a file on a drive for various reasons, but if you back up based on FileID you back up that file once rather than multiple times). FileIDs are also great for handling renames. Let's say you are an application that syncs files and the user backs up c:\temp\mydir with 1000 files under it. If they rename c:\temp\mydir to c:\temp\mydir2, an application using FileIDs can say: wait, that folder is the same, it was just renamed. OK, rename that folder in our remote version too. This is a very minimal operation on both ends. With DrivePool, however, the FileID changes for the directory and all sub-files. If the sync application uses this to determine changes, it now uploads all these files again, using a good bit more resources locally and remotely. If the application also uses versioning, this may be far more likely to cause a conflict with two or more clients syncing, as mass amounts of files are seemingly being changed.
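The rename case can be sketched in a few lines of Python. This is only an illustration of the general technique, not the logic of any specific sync product; `detect_renames` and the snapshot format (a dict of path to FileID) are assumptions made for the example.

```python
def detect_renames(old_snapshot, new_snapshot):
    """Compare two {path: file_id} snapshots.

    Returns (renames, added, removed), where renames is a list of
    (old_path, new_path) pairs matched purely by file ID.
    """
    old_by_id = {fid: path for path, fid in old_snapshot.items()}
    renames, added = [], []
    accounted_for = set()

    for path, fid in new_snapshot.items():
        if old_snapshot.get(path) == fid:
            accounted_for.add(path)          # unchanged path and ID
            continue
        old_path = old_by_id.get(fid)
        if old_path is not None and old_path not in new_snapshot:
            renames.append((old_path, path))  # same ID, new path
            accounted_for.add(old_path)
        else:
            added.append(path)                # unknown ID: looks like a new file

    removed = [p for p in old_snapshot
               if p not in accounted_for and p not in new_snapshot]
    return renames, added, removed


before = {"mydir/a.jpg": 101, "mydir/b.jpg": 102}

# Stable IDs: the folder rename is recognised as two cheap renames.
stable = detect_renames(before, {"mydir2/a.jpg": 101, "mydir2/b.jpg": 102})

# IDs changed along with the rename: everything looks deleted and re-created.
ephemeral = detect_renames(before, {"mydir2/a.jpg": 9001, "mydir2/b.jpg": 9002})

print(stable)
print(ephemeral)
```

With stable IDs the tool issues two renames; when the IDs change with the folder, the same comparison reports two removals and two new files, which is the full re-upload described above.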

Finally, even if an application is trying to monitor for FileIDs changing using the file change API, due to the notification bugs above it may not get any notifications when child FileIDs change, so it might assume they have not.


Real Examples
OneDrive
This started with massive OneDrive failures. I would find OneDrive was re-uploading hundreds of gigabytes of images and videos multiple times a week. These were not changing or moving. I don't know if the issue is that OneDrive uses FileIDs to determine whether a file is already uploaded, or that scanning a directory triggered a notification that all the files in that directory had changed, and based on that notification it re-uploaded them.

After this I noticed files were being deleted both locally and in the cloud. I don't know what caused this. It might have been that OneDrive thought the old file was deleted because its FileID was gone, and while there was a new file (actually the same file) in its place, there may have been some odd race condition. It is also possible that it queued the file for upload, the FileID changed, and when it went to open the file to upload it, it found the file was 'deleted' (as the FileID no longer pointed to a file) and queued the delete operation.

I also found that files that were uploaded into the cloud in one folder were sometimes downloading to a different folder locally. I am guessing this is because the folder FileID changed: it thought the 2023 folder had ID XYZ, but that ID now pointed to a different folder, so it put the file in the wrong place.

The final form of corruption was finding the data from one photo or video inside a file with a completely different name. This is almost guaranteed to be due to the FileID bugs. It is highly destructive, and far harder to correct from backups: with one file's contents replaced by another's, you need to know when the good content existed and which files were affected. Depending on retention policies, the replacement contents may overwrite the good backups before you notice.

I also had a BSOD with OneDrive where it was trying to set attributes on a file and the CoveFS driver corrupted some memory. It is possible this was a race condition, as OneDrive may have been processing hundreds of files very rapidly due to the bugs. I have not captured a second BSOD due to it, but I also stopped using OneDrive on DrivePool due to the corruption.

Another example of this is data leakage. Let's say you share your favorite article on kittens with a group of people. OneDrive, believing that file has changed, goes to open it using the FileID; however, that FileID could now correspond to essentially any file on your computer. Now the contents of some sensitive file are put in the place of that kitten file, and everyone you shared it with can access it.

Visual Studio Failures
Visual Studio is a code editor/compiler. There are several distinct problems that happen. First, when compiling, if you touched one file in a folder it seemed to recompile the entire folder; this is likely due to the notification bug. This is just a slowdown, but an annoying one. Second, Visual Studio has compiler-generated code support. This means the compiler will generate actual source code that lives next to your own source code. Normally, once compiled, it doesn't regenerate and compile this source unless it must change, but due to the notification bugs it regenerates this code constantly, and if there is an error in other code it causes an error there, causing several other invalid errors. When debugging, Visual Studio by default will only use symbols (debug location data) that exactly match the source. As the notifications from DrivePool happen on certain file accesses, Visual Studio constantly thinks the source has changed since it was compiled, and you will only be able to set breakpoints inside source if you disable the exact symbol match default. If you have multiple projects in a solution with one dependent on another, it will often rebuild other project dependencies even when they haven't changed; for large solutions that can be crippling (perf issue). Finally, I often had IntelliSense errors showing up even though there were no errors during compiling, and worse, IntelliSense would completely break at points. All due to DrivePool.


Technical details / full background & disclaimer

I have sample code and logs to document these issues in greater detail if anyone wants to replicate it themselves.

It is important for me to state drivepool is closed source and I don't have the technical details of how it works.  I also don't have the technical details on how applications like onedrive or visual studio work.  So some of these things may be guesses as to why the applications fail/etc.

The facts stated are true (to the best of my knowledge) 


Shortly before my trial expired in October of last year I discovered some odd behavior. I had a technical ticket filed within a week, and within a month had traced down at least one of the bugs. The issue can be seen at https://stablebit.com/Admin/IssueAnalysis/28720 and it does show priority 2/important, which I would assume is the second highest (probably critical or similar above). It is great it has priority, but as it has been over 6 months since it was filed without updates, I figured warning others about the potential corruption was important.


The FileSystemWatcher API is implemented on Windows using async overlapped IO; the exact code can be seen here: https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.Win32.cs#L32-L66

That corresponds to this kernel api:
https://learn.microsoft.com/en-us/windows/win32/fileio/synchronous-and-asynchronous-i-o

Newer API calls use GetFileInformationByHandleEx to get the FileID, while older stat-style calls expose it as nFileIndexHigh/nFileIndexLow.


In terms of the FileID bug, I wouldn't normally have even thought about it, but the advanced config (https://wiki.covecube.com/StableBit_DrivePool_2.x_Advanced_Settings) mentions this under CoveFs_OpenByFileId: "When enabled, the pool will keep track of every file ID that it gives out in pageable memory (memory that is saved to disk and loaded as necessary).". Keeping track of files in memory is certainly very different from Windows, so I thought this may be the source of the issue. I also don't know if there are caps on the maximum number of files it will track; if it resets FileIDs in situations other than reboots, that could be much worse. Turning this off will at least break NFS servers, as the docs mention it is "required by the NFS server".

Finally, the FileID numbers given out by DrivePool are incremental and very low.  This means when they do reset you almost certainly will get collisions with former numbers.   What is not clear is if there is the chance of potential FileID corruption issues.  If when it is assigning these ids in a multi-threaded scenario with many different files at the same time could this system fail? I have seen no proof this happens, but when incremental ids are assigned like this for mass quantities of potential files it has a higher chance of occurring.
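To make the collision concrete, here is a toy Python model. It is NOT DrivePool's implementation (which is closed source); `EphemeralIdAllocator` is a hypothetical stand-in that only mimics the two observed properties: IDs are small incrementing numbers, and the table is forgotten on reboot.

```python
import itertools


class EphemeralIdAllocator:
    """Toy model: hands out 1, 2, 3, ... on first sight of a path and
    keeps the mapping only in memory (a new instance == a reboot)."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._id_by_path = {}

    def id_for(self, path):
        if path not in self._id_by_path:
            self._id_by_path[path] = next(self._counter)
        return self._id_by_path[path]

    def open_by_id(self, file_id):
        for path, fid in self._id_by_path.items():
            if fid == file_id:
                return path
        return None


# Before the reboot, an application caches the ID of document 1.
pool = EphemeralIdAllocator()
cached_id = pool.id_for("ImportantDoc1.docx")
pool.id_for("ImportantDoc2.docx")

# "Reboot": the table is gone and files happen to be seen in another order.
pool = EphemeralIdAllocator()
pool.id_for("ImportantDoc2.docx")
pool.id_for("ImportantDoc1.docx")

resolved = pool.open_by_id(cached_id)
print(cached_id, "->", resolved)
```

Because every boot starts counting from the same low numbers, an ID cached before the reboot resolves to whichever file happened to be touched in that position afterwards, here document 2 instead of document 1.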

Microsoft mentions this about deleting the USN Journal: "Deleting the change journal impacts the File Replication Service (FRS) and the Indexing Service, because it requires these services to perform a complete (and time-consuming) scan of the volume. This in turn negatively impacts FRS SYSVOL replication and replication between DFS link alternates while the volume is being rescanned.". Now, DrivePool never supports the USN Journal, so it isn't exactly the same thing, but it is clear that several core Windows services do use it for normal operations. I do not know what fallbacks they use when it is unavailable.


Potential Fixes
There are advanced settings for DrivePool (https://wiki.covecube.com/StableBit_DrivePool_2.x_Advanced_Settings); beware, these changes may break other things.
CoveFs_OpenByFileId - Set to false (by default it is true). This will disable the OpenByFileID API. It is clear several applications use this API. In addition, while DrivePool may disable that function with this setting, it doesn't disable FileIDs themselves. Any application using FileIDs as static identifiers for files may still run into problems.

I would avoid using any file backup/synchronization tools on DrivePool drives (if possible). These likely have the highest chance of lost files, misplaced files, file content being mixed up, and excess resource usage. If you can't avoid them, consider taking file hashes for the entire DrivePool directory tree. Do this again at a later point and make sure files that shouldn't have changed still have the same hash.

If you have files that rarely change after being created then hashing each file at some point after creation and alerting if that file disappears or hash changes would easily act as an early warning to a bug here being hit.
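A minimal Python sketch of that early-warning idea follows. The function names (`hash_file`, `build_manifest`, `compare_manifests`) are made up for this example, SHA-256 is just one reasonable choice of hash, and the demo runs on a temporary folder rather than a real pool.

```python
import hashlib
import os
import tempfile


def hash_file(path, chunk_size=1024 * 1024):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its hash."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = hash_file(full)
    return manifest


def compare_manifests(old, new):
    """Return (missing, changed) paths relative to the old manifest."""
    missing = sorted(p for p in old if p not in new)
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    return missing, changed


with tempfile.TemporaryDirectory() as root:
    for name, text in (("a.txt", "photo one"), ("b.txt", "photo two")):
        with open(os.path.join(root, name), "w") as f:
            f.write(text)
    baseline = build_manifest(root)

    # Simulate the two failure modes: swapped contents and a vanished file.
    with open(os.path.join(root, "a.txt"), "w") as f:
        f.write("contents of some other file")
    os.remove(os.path.join(root, "b.txt"))

    missing, changed = compare_manifests(baseline, build_manifest(root))
    print("missing:", missing, "changed:", changed)
```

In practice you would save the baseline manifest somewhere off the pool (for example as JSON) and rerun the comparison on a schedule, alerting on anything in `missing` or `changed` that you did not expect.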


  • 0
Posted
9 hours ago, MitchC said:

I don't know what would happen if FileID returned 0 or claimed not available on the system even though it is an NTFS volume.

I should think good practice would be to respect a zero value regardless (one should always default to failing safely), but the other option would be to return maxint which means "this particular file cannot be given a unique File ID" and just do so for all files (basically a way of saying "in theory yes, in practice no").

DrivePool does have an advanced setting CoveFs_OpenByFileId however currently #1 it defaults to true and #2 when set to false any querying file name by file id fails but querying file id by file name still returns the (broken) file id instead of zero. I've just made a support request to ask StableBit to fix that.

Note that if any application is using File ID to assume "this is the same file I saw last whenever" (rather than "this is probably the same file I saw last whenever") for any volume that has users or other applications independently able to delete and create files, consider whether you need to start looking for a replacement application. While the odds of collision may be extremely low it's still not what File ID is for and in a mission-critical environment it's taunting Murphy.

8 hours ago, Thronic said:

Maybe they could passthrough the underlying FileID, change/unchanged attributes from the drives where the files actually are - they are on a real NTFS volume after all. Trickier with duplicated files though...

A direct passthrough has the problem that any given FileID value is only guaranteed to be unique within a single volume while a pool is almost certainly multiple volumes. As the Microsoft 64-bit File ID (for NTFS, two concatenated incrementing DWORDs, basically) isn't that much different from DrivePool's ersatz 64-bit File ID (one incrementing QWORD, I think) in the ways that matter for this it'd still be highly vulnerable to file collisions and that's still bad.

... Hmm. On the other hand, if you used the most significant 8 of the 64 bits to instead identify the source poolpart within a pool, theoretically you could still uniquely represent all or at least the majority of files in a pool of up to 255 drives so long as you returned maxint for any other files ("other files" being any query where the File ID in a poolpart set any of those 8 bits to 1, the file only resided on the 256th or higher poolpart, or every File ID returned by the first 255 poolparts was maxint) and still technically meet the specification for Microsoft's 64-bit File ID? I think it should at least "fail safely", which would be an improvement over the current situation?

Does that look right?
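As a way to check the arithmetic of that idea, here is a small Python sketch of the packing. It is purely illustrative: `pool_file_id` and `split_pool_file_id` are hypothetical names, nothing here reflects how DrivePool is actually built, and it assumes the underlying per-volume ID arrives as a plain 64-bit integer.

```python
MAXINT = 0xFFFFFFFFFFFFFFFF      # "no usable File ID" marker
POOLPART_BITS = 8                # most significant bits name the poolpart
LOCAL_BITS = 64 - POOLPART_BITS  # remaining 56 bits carry the volume's own ID
LOCAL_MASK = (1 << LOCAL_BITS) - 1
MAX_POOLPARTS = 255              # indices 0..254, so a packed ID never equals MAXINT


def pool_file_id(poolpart_index, local_file_id):
    """Pack (poolpart, local ID) into one 64-bit value, or fail safe to MAXINT."""
    if not 0 <= poolpart_index < MAX_POOLPARTS:
        return MAXINT            # 256th or higher poolpart
    if not 0 <= local_file_id <= LOCAL_MASK:
        return MAXINT            # local ID uses the top 8 bits (or is MAXINT itself)
    return (poolpart_index << LOCAL_BITS) | local_file_id


def split_pool_file_id(pool_id):
    """Inverse of pool_file_id; None means the ID carries no information."""
    if pool_id == MAXINT:
        return None
    return pool_id >> LOCAL_BITS, pool_id & LOCAL_MASK


packed = pool_file_id(3, 12345)
print(hex(packed), split_pool_file_id(packed))
print(hex(pool_file_id(255, 1)), hex(pool_file_id(0, 1 << 56)))
```

The same local ID on two different poolparts packs to two different pool IDs, and anything that cannot be represented collapses to maxint rather than to a value that could be mistaken for another file.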

8 hours ago, Thronic said:

Does using CloudDrive on top of DrivePool have any of these issues? Or does that indeed act as a true NTFS volume?

@Christopher (Drashna) does CloudDrive use File ID to track the underlying files on Storage Providers that are NTFS or ReFS volumes in a way that requires a File ID to remain unique to a file across reboots? I'd guess not, and that CloudDrive on top of DrivePool is fine, but...

  • 0
Posted

Unfortunately it has been over a year and no change.
I feel it is a case of the developer thinks they are right and the rest of the world is wrong.

To any users out there of Drivepool, the warning is clear, be VERY careful how you interact with the data on a Drivepool volume. Any software you use that interacts with fileid on a Drivepool volume can give unintended consequences, from minor performance loss through to serious data loss.

I just don’t understand why they can’t map the last 64bit of object-id to file-id. This seems like a very simple fix to me? If the 128bit object-id is unique for every file then the last 64bit are as well. Just means a limit of unique 18446744073709551615 files for the volume as per ntfs.
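One caveat with taking a fixed 64-bit slice of a 128-bit ID is that uniqueness of the whole value does not by itself guarantee uniqueness of the slice. A tiny Python check, using two made-up 128-bit values (not real Object-IDs), shows the general point; whether it bites in practice depends on how the IDs are generated.

```python
import uuid


def low_64_bits(object_id):
    """Keep only the least significant 64 bits of a 128-bit ID."""
    return object_id.int & 0xFFFFFFFFFFFFFFFF


# Two different 128-bit IDs that differ only in their upper half.
first = uuid.UUID(int=(1 << 64) | 42)
second = uuid.UUID(int=(2 << 64) | 42)

ids_differ = first != second
slices_collide = low_64_bits(first) == low_64_bits(second)
print(ids_differ, slices_collide)
```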

  • 0
Posted

Fair enough, not the last 64bits.

If there was no Object-ID then I agree with the overhead point. However the Drivepool developer has ALREADY gone to all the trouble and overhead of generating unique 128bit Object-ID's for every file on the pool.

This is why I feel it should be trivial to now also populate the File-ID with a unique 64bit value derived from this Object-ID. All the hard work has already been done.

No argument with the beta/alpha test. I would happily test this way as well. At present though we have a broken File-ID system.

 

  • 0
Posted
5 hours ago, JC_RFC said:

If there was no Object-ID then I agree with the overhead point. However the Drivepool developer has ALREADY gone to all the trouble and overhead of generating unique 128bit Object-ID's for every file on the pool.

This is why I feel it should be trivial to now also populate the File-ID with a unique 64bit value derived from this Object-ID. All the hard work has already been done.

My testing so far hasn't seen DrivePool automatically generating Object-ID records for files on the pool; if all the files on your pool have Object-ID records you may want to look for whatever on your system is doing that.

I suspect that trivial in theory ends up being a lot of work in practice. Not only do you need to populate the File-ID records in your virtual MFT from the existing Object-ID records in the underlying physical $ObjID files across all of your poolpart volumes, you also need to be able to generate new File-ID records whenever new files are created on the pool and immediately update those $ObjID files accordingly, you need to ensure these records are correctly propagating during balancing/placement/duplication changes, you need to add the appropriate detect/repair routines to your consistency checking, and you need to do all of this as efficiently and safely as possible.

  • 0
Posted

You are right, I don't know where I got this idea that Drivepool had object-id's for all files from.

I just checked and my files do not have an object_id. So yes, lots of work from here to have unique file_id's, agreed.

  • 0
Posted
Quote

Our File IDs persist until the next reboot. We avoid using fully persistent File IDs to enhance performance.

Guess that's that then, pretty much. A compromise made for having every drive intact with its own NTFS volume, emulating NTFS on top of NTFS.

I keep thinking I'd prefer my pool to be on block level between hardware layer and file system. We'd lose the ability to access drives individually via direct NTFS mounting outside the pool (which I guess is important to the developer and at least initial users), but it would have been a true NTFS on top of drives, formatted normally and with full integrity (whatever optional behavior NTFS is actually using). Any developer could then lean on experience they make on any real NTFS, and get the same functionality here.

If not using striping across drives, a virtual drive could easily place entire file byte arrays on individual drives without splitting. Drives would then still not have to be reliant on each other to recover data from specific drives like typical RAIDs; one could read via the virtual drive whatever is on them by checking whatever proprietary byte array journal data one designs to attach to each file on block level. I'd personally prefer something like that, at least from a glance when just thinking about it.

I'm pretty much in the @JC_RFC camp on this.

Thanks all for making updates on this.

  • 0
Posted

Would I be safe from data corruption if I'm not using all the fancy stuff like read striping & duplication?

I never enabled those. The only plugins I enabled are the scanner and data limiter, and I have file placement rules set so that things don't spread out and are contained within a specific drive, for example, choosing one SSD for specific app(s) or game(s), and an HDD for all other data.

I'm a new user so I haven't yet encountered any weird stuff.

  • 0
Posted (edited)

Hi haoma.

The corruption isn't being caused by DrivePool's duplication feature (and while read-striping EDIT: can have some issues with some older or... I'll say "edge-case" utilities, so I usually just leave it off anyway, that's also not the cause here).

The corruption risk comes in if any app relies on a file's ID number to remain unique and stable unless the file is modified or deleted, as that number is being changed by DrivePool even when the file is simply renamed, moved or if the drivepool machine is restarted - and in the latter case being re-used for completely different files.

TLDR: far as I know currently the most we can do is to change the Override value for CoveFs_OpenByFileId from null to false (see Advanced Settings). At least as of this post date it doesn't fix the File ID problem, but it does mean affected apps should either safely fall back to alternative methods or at least return some kind of error so you can avoid using them with a pool drive.

Edited by Shane
EDIT to update re read-striping bug
  • 0
Posted (edited)

To be fair to Stablebit I used Drivepool for the past few years and have NEVER lost a single file because of Drivepool. The elaborateness OR simpleness of how you use Drivepool within itself is not really of concern.

What is being warned of here though is if you use any special applications that might expect FileID to behave as per NTFS there will be risks with that.

My example is that I use Freefilesync quite a bit to maintain files between my pool, my htpc and another backup location. When I move files on one drive, freefilesync using fileid recognises the file was moved so syncs a "move" on the remote filesystem as well. This saves potentially hours of copying and then deleting. It does not work on Drivepool because the fileid changes on each reboot. In this case Freefilesync fails "SAFE" in that it does the copy and delete instead, so I only suffer performance issues.

What could happen, though, is that you use another app that, say, cleans old files or moves files around and does not fail safe if a fileid is repeated for a different file, and in doing so you do suffer file loss. This will only happen if you use some third-party application that makes changes to files. It's not the type of thing a word processor or a PC game is going to be doing (typically, in case someone jumps in with an "it could be possible" argument).

So generally Drivepool is safe, and for you most likely nothing to worry about, but if you do use an application now or in the future that is for cleaning up or syncing etc., then be very careful in case it uses fileid and causes data loss because of this issue.

For day to day use, in my experience you can continue to use it as is. If you want to add to the group of us that would like this improved, feel free to add your voice to the request as otherwise I don't see any update for this in the foreseeable future.

Edited by JC_RFC
typo
  • 0
Posted

Cross-quoting my workaround that definitely works:

1 hour ago, roirraWedorehT said:

Thanks for the details, and the work necessary to gather them and post about it.

I'm working around the issue currently by having a Hyper-V virtual PC set only for using Google Drive Backup and Sync, or whatever they happen to be calling it at this moment.  I gave the virtual PC a secondary virtual hard drive - I made it a 10 GB dynamically expanding drive, so it'll only take up as much space as it needs to, but is open to expansion for a good long while, and have Drive sync to a folder on that VHDX.  Then I have it shared over the network, and I assigned it the letter G: on my host desktop.

I know there are plenty of people out there who don't have experience with virtual PCs, but Windows' built-in Hyper-V makes it so easy and fast.

I would obviously prefer to use it locally, but this will work for now.  The sync app hasn't crashed at all, and everything is synced.

Edit:  Now that I think about it, creating a .VHDX locally on the host desktop through Disk Management, mounting it (as drive G: or whatever you prefer), and pointing Google Drive and Sync to that virtual drive sitting on a drivepool would probably work, as well - but I'm tired of playing around and chancing that something else will still cause issues, so I'll keep Google Drive and Sync in my virtual PC for the foreseeable future. 

Edit 2:  It's possible I actually tried that proposed second simpler solution first, but I still had issues.  The issues might've been because, when rebooting, or signing out and signing back in to Windows (if you ever do that), it's possible that the VHDX wasn't mounted before Google Drive Backup and Sync automatically launched.  While it's possible that there's a way to force Google Drive Backup and Sync to wait to launch, I either didn't look at that possibility, or didn't want to have to go down that road.

Or there were circumstances where the VHDX could be temporarily unmounted, causing Google's program to crash.

Anyone, please let me know if you test the local .VHDX solution, and it has no issues for weeks.

Cheers, all.

 

  • 0
Posted

I am curious about one thing... It was hinted at before with SnapRAID "not helping" but isn't that only assumed if you're using it to parity the pool itself?

I've been using SnapRAID and DrivePool for a few years now. I don't use tools like Google Drive or OneDrive on my pool but I do have programs directly access the pool such as Autodesk Inventor and Blender.

I haven't noticed this kind of issue myself but I wanted to make a note about SnapRAID. You should not be syncing from the pool created by DrivePool. And as such SnapRAID should give you some protection against this if you're using some of the tools available in SnapRAID. Assuming I am not misunderstanding the fundamental issue here.

With how my system is set up, I have all of my drives (both the data and the parity) mounted to folders on my C drive with no drive letter. SnapRAID is configured to access only these mount points, directly. DrivePool creates its pool from these mount points and in turn cannot even see the files SnapRAID creates on them. This means SnapRAID can repair the information DrivePool uses (the PoolPart folder), which also means a rebuilt volume would reappear in DrivePool like nothing happened. It also means that SnapRAID is running diff and scrub directly on the drive and not through DrivePool, so it won't see FileIDs changing that way.

Now, I don't think it is completely safe, just that SnapRAID will see that something has happened to the file and should alert you to it. It's no different than if another program corrupted the file directly: it will still sync that corruption blindly if you tell it to.

  • 0
Posted

@CharredChar I would agree that if SnapRAID is used on the underlying drives it should avoid triggering corruption due to the FileID bug. Keep in mind that SnapRAID does not like data that is changed or moved around, so I would make sure to minimize any DrivePool balancing. It will also throw 'errors' if balancing causes writes during its scrub.

Keep in mind as well that it is mainly meant to detect corruption due to a bad disk or hardware issue; it does not guard well against intentionally overwritten files. I believe (and as I'm not a SnapRAID user, maybe my understanding is wrong) you normally run a scrub shortly after running a sync, as any modifications between the two would show warnings during a scrub.

If a FileID issue happens, it really would just look like a file was overwritten. So while it is recoverable if you notice, as soon as you run a sync it would just update the parity like anything else, and there wouldn't be any warning about an issue.

There is also some possibility of SnapRAID not being able to properly detect modified files affected by the FileID bug. It normally uses the inode (which wouldn't change) and the last modified timestamp for detecting changes. A synchronization/backup/cloud share program may intentionally set the last modified timestamp on a file, which normally wouldn't be an issue even if you were adding old files to the system: as those files wouldn't have existed before, SnapRAID would pick them up as 'new' because it has not seen them before.

With the FileID bug, though, it gets more complicated. Say you have an image X.jpg with a timestamp of 2020-01-01, and ImportantWork.docx with a timestamp of 2024-05-05. With the FileID bug a program may overwrite ImportantWork.docx with the content of X.jpg and set the timestamp to 2020-01-01. If SnapRAID detects changes whenever the timestamps simply don't match, then no problem: it will pick up this overwrite (for better or for worse). If, however, it looks for things modified since the last sync time, it could potentially miss that change, as the FileID didn't change and the timestamp is older than the last sync. I don't know enough about SnapRAID to say which it is. Here is a Q/A that talks about changes being missed with non-changing timestamps:

Why are VeraCrypt containers never saved?

VeraCrypt (a fork of TrueCrypt) by default has enabled the option Preserve modification time-stamp of file containers that makes impossible at SnapRAID, and at other backup programs, to detect that a file container is changed. Ensure to disable this option in VeraCrypt.
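To make the two detection strategies discussed above concrete, here is a minimal Python sketch. It is not SnapRAID's actual logic (I don't know which approach it takes); the `Meta` type, the function names, the file sizes and the last-sync date are all made up for illustration. Only the two timestamps come from the example above.

```python
from datetime import datetime
from typing import NamedTuple


class Meta(NamedTuple):
    """Hypothetical per-file record kept from the last sync."""
    size: int
    mtime: datetime


def changed_by_mismatch(recorded: Meta, current: Meta) -> bool:
    # Strategy A: anything that differs from the recorded metadata is a change.
    return recorded != current


def changed_since(last_sync: datetime, current: Meta) -> bool:
    # Strategy B: only files modified after the last sync count as changed.
    return current.mtime > last_sync


last_sync = datetime(2024, 6, 1)  # assumed date, for illustration only

# ImportantWork.docx as recorded at the last sync (size is invented).
recorded = Meta(size=48_000, mtime=datetime(2024, 5, 5))
# The same path after being overwritten with X.jpg's content and timestamp.
overwritten = Meta(size=2_500_000, mtime=datetime(2020, 1, 1))

mismatch_detects = changed_by_mismatch(recorded, overwritten)
since_detects = changed_since(last_sync, overwritten)

# VeraCrypt-style case: content changes but size and mtime are preserved.
preserved = Meta(size=48_000, mtime=datetime(2024, 5, 5))
preserved_detected = changed_by_mismatch(recorded, preserved)
```

In this toy setup strategy A flags the overwritten file and strategy B does not, and neither can flag the VeraCrypt-style case where size and timestamp are both preserved.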

SnapRAID doesn't really have any point-in-time snapshotting. It does very well at detecting drive errors or recovering from a lost drive, but any data lost to the FileID bug is lost forever as soon as you run a sync after it happens. This also means that if the issue happens slowly over time and you don't detect it right away, you will still lose every affected file except those hit between the last sync and the moment you detect it.

My concern was if you point SnapRAID at any DrivePool pools directly. I am not a SnapRAID user/developer nor one for DrivePool, so I didn't spend any time actually trying to repro corruption with it, but anything that uses the FileID as more than an in-memory (or live comparison) constant could be at risk.

 

Re: @Shane's question about what is affected otherwise

Again, even if you are quite technical and have very obvious failures, it may be a while before you are able to confirm the problem is DrivePool. Even as the OP here, having found the technical bugs in DrivePool, some app failures took me months to attribute and confirm.

With these bugs outstanding I have temporarily suspended my use of DrivePool so I no longer investigate or track app failures related to drivepool.

I would guess most tools that can have a few dozen or more files open at once are affected. The biggest ones not already on the list would be:

Visual Studio Code
Visual Studio
*) Severe performance impact on these IDEs.  They use file changed notifications to determine if source should be recompiled.  This can result in much longer build times.
*) Debug symbol match failures: as they get notifications that the symbols/files have changed, they never believe the symbols are an exact match (can be worked around by allowing non-matching symbol use).
*) Complicated build failures if compiler-generated code is used.
*) For Visual Studio, IntelliSense will show errors when there are none, and will completely break at times.

  • 0
Posted

That is correct, SnapRAID does not function properly if data is altered during a sync. Anything I have that might do so gets disabled (services, mostly) before a sync starts and enabled after it completes successfully. I actually came to these forums today to see if I could add an extra safeguard by commanding DrivePool to lock the pool against any writes (without blocking reads or altering the permissions of all files) while SnapRAID is running.

Scrubs also don't check over the entire drive by default; there are arguments for scrub to change this. For example, I have mine check both the newest data along with older data every sync to where everything on the array gets checked in 30 days. This has actually caught damaged files for me multiple times in the past (I am not sure what caused them) so it has worked well for me. And you are correct, it won't "protect" against intentionally altering a file. But what I was mentioning before is that since SnapRAID runs on the underlying drive and not through DrivePool, the FileID should not change just due to modifications of that file. That is the important part I wanted to point out: not running SnapRAID directly on the pool managed by DrivePool.

If something happens where the file is deleted then replaced with a file of the same name (using the OneDrive example) would this not actually change the FileID on the underlying file system since it is no longer actually the same file? This would come up in SnapRAID as "File Deleted" along with "File Added" even though it has the same name in the same location. This is what I was trying to confirm with my post. Since SnapRAID doesn't interact with DrivePool at all where the concern lies here would be how the file is handled on the actual drive when the FileID issue occurs in DrivePool.

And in my specific case SnapRAID won't run for me if it sees too many "deleted" files as that is the first flag for a major issue. Though this would only trigger with enough files being deleted so if it happens to a handful over a long period of time it would not catch it. It would catch something like the OneDrive example though, assuming the above about the FileID is true. I also have it run the Touch command as well which should add an extra layer of defense, though not against applications that intentionally match the modify date of a file they are copying.

This has been something that caught my eye as I am starting to get into VS (one reason I was looking at locking the pool during sync), so I want to avoid such issues. It has been my practice to usually use some other drive outside of the pool for active work (multi-TB SSDs are cheap nowadays), then periodically copy to the pool and mostly use the pool more like an archive. This kind of issue really just cements that practice for me, since it is so difficult to know how programs will react to such a bug. I sure hope it gets resolved in the future.

  • 0
Posted

  

1 hour ago, CharredChar said:

If something happens where the file is deleted then replaced with a file of the same name (using the OneDrive example) would this not actually change the FileID on the underlying file system since it is no longer actually the same file? This would come up in SnapRAID as "File Deleted" along with "File Added" even though it has the same name in the same location.

Except I don't think you would see a delete and replace; it would be a modify. An application that triggers the DrivePool bug on the array would open a file handle by its ID/inode number. The only problem is that the file it thinks it is opening is not the one it actually opens. It just writes the contents it thinks go with that file into that handle, though.

1 hour ago, CharredChar said:

And in my specific case SnapRAID won't run for me if it sees too many "deleted" files as that is the first flag for a major issue.

I would assume this could potentially be possible with modified files too, although depending on what you use drivepool for one might normally have a good number of modified files.

 

1 hour ago, CharredChar said:

It would catch something like the OneDrive example though, assuming the above about the FileID is true

First, I don't think there is any contest about whether the FileID bug is real; even the developer has acknowledged it, but mostly said won't fix. As for catching OneDrive, I'm not sure. I ran DrivePool for months, and it wasn't like all files changed to other files overnight (and most of the files being corrupted had not been modified for quite some time). Maybe it is something like this: OneDrive notices a file change and adds it to a sync index by FileID; if the machine reboots before it actually syncs the file, it then reads it by ID and gets the wrong contents.
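The guessed sequence above can be sketched as a toy model in Python. This is not DrivePool's or OneDrive's actual code: `ToyPool` is a hypothetical class, and it assumes FileIDs are handed out in first-open order from an in-memory counter that restarts after a reboot.

```python
class ToyPool:
    """Toy volume that hands out FileIDs from a counter held only in memory."""

    def __init__(self, files):
        self.files = dict(files)  # path -> content
        self._path_by_id = {}
        self._id_by_path = {}
        self._next_id = 1

    def reboot(self):
        # All ID assignments are forgotten; the counter starts over.
        self._path_by_id.clear()
        self._id_by_path.clear()
        self._next_id = 1

    def file_id(self, path):
        # Assign the next free ID the first time a path is touched.
        if path not in self._id_by_path:
            fid = self._next_id
            self._next_id += 1
            self._id_by_path[path] = fid
            self._path_by_id[fid] = path
        return self._id_by_path[path]

    def read_by_id(self, fid):
        return self.files[self._path_by_id[fid]]


pool = ToyPool({"X.jpg": "jpeg bytes", "ImportantWork.docx": "report text"})

# Before the reboot, the sync app queues a changed file, keyed by its FileID.
pending = {pool.file_id("ImportantWork.docx"): "ImportantWork.docx"}

pool.reboot()

# After the reboot, a different file happens to be touched first and gets the same ID.
pool.file_id("X.jpg")

# The sync app now reads "ImportantWork.docx" by its remembered ID.
uploads = {name: pool.read_by_id(fid) for fid, name in pending.items()}
```

In this model the app ends up pairing the name ImportantWork.docx with the content of X.jpg, without any error being raised along the way.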

 

1 hour ago, CharredChar said:

It has been my practice to usually use some other drive outside of the pool for active work

Yeah, for most projects I kept the source on main drives, but I sadly work on some massive hogs from time to time, so those went on DrivePool. It is nice having all the Chrome source and debug binaries around, but it's 100G per build.

 

1 hour ago, CharredChar said:

. For example, I have mine check both the newest data along with older data every sync to where everything on the array gets checked in 30 days. This has actually caught damaged files for me multiple times in the past

I assume you mean check every scrub, right? Sync updates the parity info for new/changed data, so you would not want it to be part of a check process: if you did detect this happening, a sync destroys the ability to recover. Part of my concern is whether SnapRAID could miss the change if a FileID-triggering program on the pool starts replacing existing file content with other file content while the modified timestamp doesn't change. Any data SnapRAID hasn't yet calculated parity for is data you can't recover even though you think you can. If you ran a sync on Sunday, and by Friday you have modified/deleted 5GB of files and this FileID bug has replaced the content of another 10GB of files, then you have 15GB less than your parity size that you can recover. Now, if you are recovering that 10GB from this bug, no biggie, but if you have... If your scrub checks new data and 10% of old data, it would catch this if one of these files was part of that 10%. If you scrub once a week, though, it would be over 2 months before you are guaranteed to catch these silent modifications.

Also, Office and OneDrive have a very specific integration, so if you are signed into a personal or work Microsoft account in an Office program and have OneDrive / SharePoint (OneDrive for Business) enabled (which MS makes decently hard to avoid), you have cooperative syncing that may also be vulnerable to this issue.
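The arithmetic behind those numbers, as a quick Python sketch. It assumes each weekly scrub checks a different 10% slice of the old data, so that ten scrubs cover everything once; the function name is made up.

```python
import math


def scrubs_until_full_coverage(percent_old_data_per_scrub):
    """Number of scrubs needed before every old block has been checked once."""
    return math.ceil(100 / percent_old_data_per_scrub)


scrubs = scrubs_until_full_coverage(10)  # 10% of old data per scrub
days = scrubs * 7                        # one scrub per week

# Data changed since the last sync that parity does not yet reflect:
# 5GB modified/deleted on purpose plus 10GB replaced by the FileID bug.
unprotected_gb = 5 + 10
```

Ten weekly scrubs is 70 days, which is where the "over 2 months" comes from.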

The best way I would see to detect this would be to hash all files and store the date of the last time they were hashed.   At a later date you would hash all files again, if the hash changes but the modified date was before the last time the files were hashed something is probably wrong.  This isn't foolproof (for example a sync program that syncs an older file down after a hashing and sets the modified time to before the hashing).
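A minimal Python sketch of that hashing idea, using only the standard library. The function names and the shape of the snapshot dictionary are my own invention; a real tool would also persist the snapshot between runs (for example as JSON) and handle new and deleted files.

```python
import hashlib
import os
import time


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def take_snapshot(root):
    """Hash every file under root and note when the pass finished."""
    hashes = {}
    for folder, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(folder, name)
            hashes[os.path.relpath(full, root)] = hash_file(full)
    return {"hashed_at": time.time(), "hashes": hashes}


def find_suspects(root, previous):
    """List files whose content changed although their mtime predates the last pass."""
    suspects = []
    for rel, old_hash in previous["hashes"].items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full):
            continue  # deleted files are out of scope for this check
        content_changed = hash_file(full) != old_hash
        looks_untouched = os.path.getmtime(full) < previous["hashed_at"]
        if content_changed and looks_untouched:
            suspects.append(rel)
    return sorted(suspects)
```

Usage would be along the lines of `snapshot = take_snapshot(root)` on one day and `find_suspects(root, snapshot)` at a later date. Files are addressed by path here, never by FileID, so the check itself should not depend on the IDs being stable.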

None of these things would help detect "corrupted" backups either. If you back up to the cloud, maybe the program wouldn't self-sabotage by overwriting files on your PC with files it thinks changed on that same PC, but it could end up backing up the wrong contents to your remote backup space. Here you would see the same file names on your PC and in the backup location, but the backup location would have different content. One day your PC breaks, is lost, etc.; you go to restore from backup and only then discover the issue.
