
MitchC

Members
  • Posts

    14
  • Days Won

    5

Reputation Activity

  1. Like
    MitchC got a reaction from Shane in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
    If this is bi-directional syncing as well, this is a bit of the nightmare scenario for this bug. 80TB is a massive amount: if just 0.1% of your data (about 80GB) changed, would you notice?
    SHA hashing of every file is good at detecting ordinary corruption but would likely not catch the data loss this set of bugs can cause. The issue here would look more like you intentionally overwrote a file with new content (or, depending on the application, copied one file over another).
    I would guess that if your data is all pristine, Syncthing probably doesn't use FileIDs right now.
     
    It is well established that DrivePool's file change notification system has multiple large flaws. Many applications can trigger a 'file content changed' notification from Windows even when they are only reading a file. Maybe Syncthing checks the file timestamp and size at that point and, if they are the same, does nothing. If it listens to the file system, though, at best the file is completely read and re-hashed only for Syncthing to decide nothing changed; at worst it is simply queued for syncing and you sync extra data. Either way you could be wearing the drives out faster than needed, losing performance, or wasting bandwidth and backup time. We also know that DrivePool does not properly bubble up file change notifications when writes actually happen, which, depending on how Syncthing is watching, could mean it misses some files that change. That alone is not a huge deal, but if it does a full scan monthly and relies on change notifications in between, you might think everything was in sync right up until a crash when in reality it could be up to a month out of date.
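    To make the timestamp/size idea above concrete, here is a minimal Python sketch of how a sync tool might react to a change notification. It is not Syncthing's actual logic; `FileState` and `on_change_notification` are hypothetical names, and it assumes the tool caches size, modification time, and a SHA-256 digest for each path it tracks.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class FileState:
    size: int
    mtime_ns: int
    sha256: str


def sha256_of(path: str) -> str:
    """Read the whole file and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def on_change_notification(path: str, cache: dict) -> str:
    """Hypothetical handler a sync tool might run when told `path` changed.

    Returns "skip" (size and mtime unchanged, file not read),
    "rehashed" (metadata changed but content identical, full read wasted)
    or "sync" (new or really changed content, file must be transferred).
    """
    st = os.stat(path)
    old = cache.get(path)
    # Cheap check first: a spurious notification caused by a read leaves
    # size and mtime untouched, so the file never has to be read.
    if old is not None and (old.size, old.mtime_ns) == (st.st_size, st.st_mtime_ns):
        return "skip"
    # Otherwise pay for a full read of the file to hash it.
    digest = sha256_of(path)
    cache[path] = FileState(st.st_size, st.st_mtime_ns, digest)
    if old is not None and old.sha256 == digest:
        return "rehashed"
    return "sync"
```

    With spurious notifications from plain reads, a tool built like this mostly hits the cheap "skip" path; a tool that skips the metadata check pays a full read (the "rehashed" case) or an unnecessary transfer for every one of them.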
    If the file is likely to have actually changed (say, a log file), I would say the warning is unrelated. Even for one-time writes, the application could still have been writing the file when Syncthing started hashing it, which again would be unrelated. It is also possible, though, that Syncthing goes to read a changed file, its own read triggers the notification bug, it concludes the file has changed, and then it shows that warning. This could be a race condition: the notification would likely fire right at the start of the read, so depending on when Syncthing starts treating notifications as a change, it may only happen some of the time. Another possibility is that another application also watches the folder; if it reads the file after Syncthing starts reading it, that read causes a 'write notification' due to this bug, even though it is only reading.
     
    First, there is zero chance these bugs are not critically problematic for DrivePool. They can lead directly to data loss or corruption and to sensitive data being leaked, which is a horrid statement to make about a file system. The question is whether these bugs affect your specific use case.
    The problem is that it may not present uniformly. Maybe Syncthing does diff-based syncing of only the changed parts for bigger files (say, over 10MB), but blindly syncs any file under 10MB it thinks has changed, since such files are small and that keeps CPU usage down. Maybe it uses a simpler approach: if a file has 'changed', it tries to optimize for append-based changes. It hashes the file up to the old file size, and if that equals the old hash it knows it only needs to sync the newer bytes; otherwise it syncs the whole file.
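    The append rule above might look roughly like the following Python sketch. `plan_upload` is a hypothetical helper, not any real tool's API, and the choice of SHA-256 is an assumption.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def plan_upload(old_size: int, old_digest: str, new_data: bytes):
    """Hypothetical append optimization described above.

    Hash the first `old_size` bytes of the new contents. If that prefix
    still matches the stored digest, the file only grew, so just the tail
    needs syncing; otherwise the whole file is sent.
    """
    if len(new_data) >= old_size and sha256_hex(new_data[:old_size]) == old_digest:
        return "append", new_data[old_size:]
    return "full", new_data
```

    Note that a spurious notification on an unchanged file still costs a read of the whole prefix but ends in an empty append, while a file whose contents were swapped out by a FileID mix-up fails the prefix check and is uploaded in full.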
    Even if the worst that happens right now is excess drive reads or bandwidth spend, that says nothing about tomorrow. Maybe Syncthing decides it should not hash a file on every change notification, since that causes a full additional read of the entire file (hurting performance and drive life), and starts to simply trust Windows file change notifications. Or maybe you never upgrade Syncthing, and right now you don't use an application that triggers the 'file content changed' notification when it merely opens a file for reading (e.g. VSCode might not, but something like Notepad does). Then you start using a new file editor or video player that does trigger the bug, and suddenly Syncthing gets far more of these change notifications. When you upgrade Syncthing, do you read about every change between versions? Some internal changes might not even make the changelog. If Syncthing starts relying on FileIDs more in the next version, your data may slowly corrupt.
    If most of your data doesn't change, hashing it all now, hashing it again later, and comparing would show you whether a file changed that shouldn't have. This is not the same as the hashing Syncthing does: this check looks for unexpected changes (corruption from the system/disk/transfer), not for file contents being updated on purpose. Still, since these bugs are likely to affect frequently changing files first, it may not catch problems quickly (you are mainly waiting for a write aimed at the FileID of one of the new files to end up overwriting an old file instead).
    I briefly looked at Syncthing's documentation, and it said it compares file modification times to detect whether a file changed. I don't know whether the documentation just didn't mention that it also uses size, or whether it really only looks at the modification time. If the latter, that could be riskier as well.
    Personally, I moved to TrueNAS, which, while not as flexible in terms of drive pooling, got me what I wanted, and its snapshotting makes backups fantastic. For others, if Unraid or similar is an option, you could keep that flexibility without the liability of DrivePool. This is not a fun game of chance, hoping you never end up using the wrong combination of apps that leads to data loss.
    DrivePool clearly works correctly for most people (at least mostly, as many may not know that performance or other issues are caused by DrivePool). Because it is exceptionally difficult to know how these bugs could affect you today or in the future, I still see it as quite reckless that DrivePool does not at least warn you of these possibilities. This is not dissimilar to the apparent bug, reported by multiple users, in reading striped data from DrivePool, for which there is likewise no warning that striping can cause such a bad issue:
     
  2. Like
    MitchC got a reaction from roirraWedorehT in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
    Sorry, I should also mention this is confirmed by StableBit and can be easily reproduced. The attached PowerShell script is a basic example of the file monitoring API. Run it as "monitor.ps1 my_folder", where my_folder is the folder you want to monitor, and put a file, say hello.txt, inside it. Open that file in Notepad. It should instantly generate a file changed event. Then tab away from Notepad and back to it, and you will again get a changed event for that file. Run the same test on a true NTFS volume and it will not do this.
    You can also reproduce the missing notifications for other events by changing the IncludeSubdirectories variable in the script and doing some of the tests I mention above.
    watcher.ps1
  3. Thanks
    MitchC got a reaction from roirraWedorehT in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
    That is correct, as the documentation you linked to states. One point I mentioned, though, is that even if an ID can be reused, if in practice it isn't, software may wrongly assume it never will be. That is a fault of that software, but it may be a practical expectation one might try to meet. Further, that documentation also states:
    "In the NTFS file system, a file keeps the same file ID until it is deleted. "
    As DrivePool identifies itself as NTFS, it is breaking that expectation.
    I am not sure how well things work if you just disable FileIDs; maybe software will fall back to safer (if less performant) behavior. In addition, I think the biggest issue is silent file corruption, and I think that can only happen due to FileID collisions (rather than just the FileID changing). A FileID is a 128-bit number, the same size as a GUID. Just randomize it the first time you assign a FileID (rather than using the current incremental behavior). Besides being more thread safe, since you don't have a single locked increment counter, it is highly unlikely you would hit a collision. Could you run into a duplicate? Sure. Likely? Probably not. Maybe over many reboots (or whatever else resets the IDs in DrivePool), but as long as any app that uses the FileID has detected that the old file is gone before its ID is reused, an eventual collision would likely not have much effect. Not perfect, but probably an easier solution. Granted, apps like OneDrive may still think all the files were deleted and re-upload them if the FileIDs change (although that may be more likely due to the notification bug).
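    For a sense of scale on the randomization idea, here is a small Python sketch using the standard birthday-bound approximation. `random_file_id` is a hypothetical allocator (nothing DrivePool exposes), and the estimate assumes IDs are drawn uniformly at random from all 128-bit values.

```python
import math
import secrets


def random_file_id() -> int:
    """Hypothetical allocator: a fresh random 128-bit ID, no shared counter."""
    return secrets.randbits(128)


def collision_probability(n_ids: int, bits: float = 128) -> float:
    """Birthday-bound probability that n random IDs contain a duplicate.

    p = 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps precision when p is tiny.
    """
    exponent = n_ids * (n_ids - 1) / (2 * 2**bits)
    return -math.expm1(-exponent)
```

    For a billion IDs, `collision_probability(10**9)` comes out around 1.5e-21, whereas an incremental counter that restarts from a low value reissues old IDs with the very first allocation.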
    Sure, except one doesn't always know how tools work. I am only making a highly educated guess that this is what OneDrive uses, and I only arrived at it after significant file corruption and research. One would hope you don't need to suffer corruption before figuring out that your tool uses the FileID. In addition, the FileID may not be the primary thing a backup/sync tool uses; something like the USN Journal may be a much more common first choice, and the tool may only fall back to other options when that is not available.
    Is it possible the 5-6 apps I have found that run into issues are the only ones out there that use these features? Sure. I just doubt I am that lucky, so there are likely many more.
     
    I did see either you or someone else post about the file hashing issue with read striping. It is a big shame: reporting data corruption (invalid hash values, or rather returning the wrong data on read, which is what would lead to that) is another fairly massive problem. Marking good data as bad because of an inconsistent read can lead someone to think they lost data and trash it, or to restore an older version in an attempt to fix it and lose newer data. I would look into a more consistent read striping repro test, but at the end of the day these other issues stop me from using DrivePool for most things I would like to.
  4. Thanks
    MitchC got a reaction from roirraWedorehT in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
    To start, while I am new to DrivePool, I love its potential, and I own multiple licenses and the full suite. If you only use DrivePool for basic archiving of large files, with simple applications accessing them for periodic reads, you are probably unlikely to hit these bugs. This assumes you don't use any file synchronization/backup solutions. Further, I don't know how many thousands (tens or hundreds of thousands?) of DrivePool users there are, but clearly many are not hitting these bugs, or not recognizing that they are, so this is NOT some new 'my files are 100% going to die' issue. Some reports I have seen on the forums, though, may actually be caused by these issues without being recognized as such. As far as I know, Covecube was not previously aware of these issues, so support tickets may not even have considered this possibility.
    I started reporting these bugs to StableBit ~9 months ago and informed them ~1 month ago that I would be putting this post together. Please also see the disclaimer below, as some of this is based on observations rather than known facts.
    You are most likely to run into these bugs with applications that: *) synchronize or back up files, including cloud-mounted drives like OneDrive or Dropbox *) must handle large quantities of files or monitor them for changes, like coding applications (Visual Studio/VSCode)

    Still, these bugs can cause silent file corruption, file misplacement, deleted files, performance degradation, data leakage (a file shared with someone externally could have its contents overwritten by any sensitive file on your computer), missed file changes, and potentially other issues for a small portion of users (I have had nearly all of these occur). They may also trigger some BSOD crashes; I had one such crash that is likely related. Because some of these bugs present so subtly, it may be hard to notice they are happening even when they are. In addition, these issues can occur even without file mirroring and even with files pinned to a specific drive. I have some potential workarounds/suggestions at the bottom.
    More details are at the bottom, but here are the important bug facts up front:
    Windows has a native file change notification API using overlapped I/O calls. It allows an application to listen for changes on a folder, or a folder and its subfolders, without having to constantly check every file to see if it changed. StableBit triggers "file changed" notifications even when files are just accessed (read) in certain ways. StableBit does NOT generate notification events on the parent folder when a file under it changes (Windows does). StableBit does NOT generate a notification event when only a FileID changes (the next bug covers FileIDs).
    Windows, like Linux, has a unique ID number for each file written to the drive. If there are hardlinks to the same file, they share the same unique ID (so one FileID may have multiple paths associated with it). In Linux this is called the inode number; Windows calls it the FileID. Rather than accessing a file by its path, you can open a file by its FileID. In addition, it is impossible for two files to share the same FileID; it is a 128-bit number that persists across reboots (128 bits means the number of unique values is 39 digits long, roughly the uniqueness of something like an MD5 hash). A FileID does not change when a file is moved or modified. StableBit supports FileIDs by default, but they seem to be ephemeral: they do not seem to survive reboots or file moves. Keep in mind FileIDs are used for directories as well, not just files. Further, if a directory is moved/renamed, not only does its FileID change but so does the FileID of every file under it. I am not sure whether there are other situations in which they may change. In addition, if a descendant file/directory FileID changes due to something like a directory rename, StableBit does NOT generate a notification event that it has changed (the application gets the directory event notification but nothing for the children).
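    One way to check the "IDs don't survive reboots or renames" observation yourself is to record every path's file ID, then compare after a reboot or a folder rename. This is a minimal Python sketch under stated assumptions: the script name `fileid_snapshot.py` is hypothetical, it relies on Python reporting the Windows file index as `os.stat().st_ino`, and I have not verified exactly what DrivePool returns through that path. On plain NTFS a reboot should leave every section empty and a folder rename should show up only under "renamed"; the behavior described above would instead show up under "changed", "new", or "reused".

```python
import json
import os
import sys


def snapshot_ids(root: str) -> dict:
    """Map each file and directory path under root to its file ID.

    Python reports the inode number on Linux and the file index on Windows
    as st_ino (recent Python versions can return IDs up to 128 bits there).
    """
    ids = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            ids[os.path.relpath(path, root)] = os.stat(path).st_ino
    return ids


def compare_snapshots(before: dict, after: dict) -> dict:
    """Classify ID differences between two {path: file_id} snapshots."""
    old_owner = {fid: p for p, fid in before.items()}
    report = {"changed": {}, "renamed": {}, "new": [], "reused": {}}
    for path, fid in sorted(after.items()):
        prev = old_owner.get(fid)
        if path in before and before[path] != fid:
            # Same path, different ID (e.g. across a reboot).
            report["changed"][path] = [before[path], fid]
        if prev is None:
            if path not in before:
                report["new"].append(path)  # path and ID both unseen before
        elif prev != path:
            if prev in after and after[prev] != fid:
                # The ID's old owner still exists under another ID: anything
                # opening files by the cached ID now gets the wrong file.
                report["reused"][fid] = [prev, path]
            elif prev not in after:
                report["renamed"][path] = prev  # the ID followed the file
    return report


def main(argv) -> int:
    # Usage: python fileid_snapshot.py ROOT SNAPSHOT.json
    root, snap_path = argv[1], argv[2]
    current = snapshot_ids(root)
    if not os.path.exists(snap_path):
        with open(snap_path, "w") as f:
            json.dump(current, f)
        print(f"saved IDs for {len(current)} paths; reboot or rename a folder, then rerun")
        return 0
    with open(snap_path) as f:
        report = compare_snapshots(json.load(f), current)
    print(json.dumps(report, indent=2))
    return 1 if report["changed"] or report["reused"] else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```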
    There are some other things to consider as well. DrivePool does not implement the standard Windows USN Journal (a system for tracking file changes on a drive). It explicitly identifies itself as not supporting it, so applications shouldn't try to use it with a DrivePool drive. That does mean applications that traditionally don't use the file change notification API or FileIDs may fall back to a combination of those to accomplish what they would otherwise use the USN Journal for (and this can exacerbate the problem). The same is true of Volume Shadow Copy (VSS): applications that might traditionally use it cannot (DrivePool identifies that it cannot do VSS), so they may resort to the methods above that they do not traditionally use.

    Now, the effects of the above bugs may not be completely apparent:
    For overlapped I/O / file change notifications: an application monitoring for changes on a DrivePool folder or subfolder will get erroneous notifications that files changed when anything merely accesses them. Just opening something like File Explorer on a folder, or even switching between applications, can cause file accesses that trigger the notification. If an application takes action on a notification and then checks the file at the end, that check may itself cause another notification. Applications that rely on getting a folder changed notification when a child changes will not get these at all with DrivePool. If an application monitors only the folder itself and not its children, this means no notifications may be generated at all (rather than just none for the child), so it could miss changes.
    For FileIDs: it depends what the application uses the FileID for, but it may assume the FileID stays the same when a file moves. As it doesn't with DrivePool, the application might re-read, re-back up, or re-sync the entire file if it is moved (a performance issue). An application that uses the Windows API to open a file by its ID may not get the file it expects, or opening a file that was simply moved by its old FileID will throw an error, since DrivePool has changed the ID. For example, say an application caches that the FileID for ImportantDoc1.docx is 12345, but after a restart 12345 refers to ImportantDoc2.docx. If this application is a file sync application and ImportantDoc1.docx is changed remotely, then when it writes those remote changes to the local file using the OpenFileById method, it will actually overwrite ImportantDoc2.docx with those changes.
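    The ImportantDoc example can be simulated in a few lines of Python. `ToyPool` is a deliberately simplified, hypothetical model (IDs handed out incrementally on first use and forgotten on reboot, as described above), not DrivePool's code, and `write_by_id` only stands in for opening a file with OpenFileById and writing to it.

```python
class ToyPool:
    """Hypothetical volume handing out small incremental IDs on first use and
    forgetting them on reboot; a simplification, not DrivePool's code."""

    def __init__(self, files: dict):
        self.files = dict(files)  # path -> contents
        self.reboot()

    def reboot(self):
        self._next_id = 1
        self._id_of = {}
        self._path_of = {}

    def file_id(self, path: str) -> int:
        # IDs are assigned in whatever order files happen to be touched.
        if path not in self._id_of:
            self._id_of[path] = self._next_id
            self._path_of[self._next_id] = path
            self._next_id += 1
        return self._id_of[path]

    def write_by_id(self, fid: int, data: str):
        # Analogous to OpenFileById + write: the caller never sees a path.
        path = self._path_of.get(fid)
        if path is None:
            raise FileNotFoundError(f"no file has ID {fid}")
        self.files[path] = data


def demo(touch_doc2_after_reboot: bool = True) -> dict:
    pool = ToyPool({"ImportantDoc1.docx": "doc1 v1", "ImportantDoc2.docx": "doc2 v1"})
    # The sync client caches the ID of the file it is tracking.
    cached_id = pool.file_id("ImportantDoc1.docx")
    pool.file_id("ImportantDoc2.docx")
    pool.reboot()
    if touch_doc2_after_reboot:
        # Something (Explorer, an indexer) touches doc2 first and gets ID 1.
        pool.file_id("ImportantDoc2.docx")
    # A remote edit to doc1 arrives and is written through the stale ID.
    pool.write_by_id(cached_id, "doc1 v2 (remote edit)")
    return pool.files
```

    Calling `demo(touch_doc2_after_reboot=False)` instead raises `FileNotFoundError`, the other outcome mentioned above: the stale ID points at nothing rather than at the wrong file.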
    I didn't spend the time to read the Windows file system requirements to know when Windows expects a FileID to potentially change (or not). It is important to note that even if changes/reuse are theoretically allowed, if they are not commonplace (because Windows essentially uses a number like an MD5 hash in terms of repeats), applications may just assume they don't happen. A backup or file sync program might assume that a file with a specific FileID is always the same file: if FileID 12345 is c:\MyDocuments\ImportantDoc1.docx one day and c:\MyDocuments\ImportantDoc2.docx another, it may mistake document 2 for document 1, overwriting important data or restoring data to the wrong place. If it is trying to create a whole-drive backup, it may assume it has already backed up c:\MyDocuments\ImportantDoc2.docx if, by the time it reaches it, that file has the same FileID ImportantDoc1.docx had (at which point DrivePool would have given Document1 a different FileID).

    Why might applications use FileIDs or file change notifications? It may not be intuitive, but a few major reasons are: *) Performance: file change notifications are an event/push-based system, so the application is told when something changes. The common alternative is a poll-based system, where an application must scan all the files looking for changes (and may rely on file timestamps or even hash entire files to determine this), which causes a good bit more overhead/slowdown. *) FileIDs are nice because they already handle hardlink file de-duplication (Windows may have multiple hardlinks to the same file on a drive for various reasons, but if you back up based on FileID you back up that file once rather than multiple times). FileIDs are also great for handling renames. Let's say you are an application that syncs files, and the user backs up c:\temp\mydir with 1000 files under it. If they rename c:\temp\mydir to c:\temp\mydir2, an application using FileIDs can say: wait, that folder is the same, it was just renamed, so rename that folder in our remote version too. This is a very minimal operation on both ends. With DrivePool, however, the FileID changes for the directory and all files under it. If the sync application uses FileIDs to determine changes, it now uploads all these files again, using a good bit more resources locally and remotely. If the application also uses versioning, this may be far more likely to cause a conflict with two or more clients syncing, as huge numbers of files seemingly change.
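    The rename case can be sketched as a diff of {FileID: path} snapshots in Python. `make_tree` and `plan_sync` are hypothetical helpers, and a real sync tool would likely collapse the 1,000 child renames into one folder rename; the point is only how the plan changes when the IDs do.

```python
def make_tree(root: str, first_id: int, n_files: int = 1000) -> dict:
    """Hypothetical {file_id: path} snapshot of a folder and its files."""
    tree = {first_id: root}
    for i in range(n_files):
        tree[first_id + 1 + i] = f"{root}\\photo{i:04d}.jpg"
    return tree


def plan_sync(before: dict, after: dict) -> list:
    """Diff two {file_id: path} snapshots the way an ID-based sync tool might."""
    ops = []
    for fid, path in after.items():
        if fid not in before:
            ops.append(("upload", path))
        elif before[fid] != path:
            # Same ID, new path: a cheap rename on the remote side.
            ops.append(("rename", before[fid], path))
    for fid, path in before.items():
        if fid not in after:
            ops.append(("delete", path))
    return ops
```

    With stable IDs the plan is 1,001 renames and no data transfer; when every ID changes, the same rename becomes 1,001 uploads plus 1,001 deletes.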
    Finally, even if an application monitors for FileID changes using the file change API, due to the notification bugs above it may not get any notification when child FileIDs change, so it might assume they have not.

    Real Examples
    OneDrive
    This started with massive OneDrive failures. I found OneDrive re-uploading hundreds of gigabytes of images and videos multiple times a week. These files were not changing or moving. I don't know whether the issue is that OneDrive uses FileIDs to determine whether a file is already uploaded, or that scanning a directory triggered a notification that all the files in it changed and it re-uploaded based on that notification. After this, I noticed files being deleted both locally and in the cloud. I don't know what caused this; it might have been that OneDrive thought the old file was deleted because its FileID was gone, and while there was a new file (actually the same file) in its place, there may have been some odd race condition. It is also possible that it queued the file for upload, the FileID changed, and when it went to open the file for upload it found it 'deleted', because the FileID no longer pointed to a file, and queued a delete operation. I also found that files uploaded to one folder in the cloud were sometimes downloaded to a different folder locally. I am guessing this is because the folder FileID changed: it thought the 2023 folder had ID XYZ, but that ID now pointed to a different folder, so it put the file in the wrong place. The final form of corruption was finding the data from one photo or video inside a file with a completely different name. This is almost certainly due to the FileID bugs. It is highly destructive, as backups make it far harder to correct: with one file's contents replaced by another's, you need to know when the good content existed and which files were affected. Depending on retention policies, the replacement contents may overwrite the good backups before you notice. I also had a BSOD with OneDrive where it was trying to set attributes on a file and the CoveFS driver corrupted some memory.
    It is possible this was a race condition, as OneDrive may have been processing hundreds of files very rapidly due to the bugs. I have not captured a second BSOD from it, but I also stopped using OneDrive on DrivePool due to the corruption. Another example is data leakage. Let's say you share your favorite article on kittens with a group of people. OneDrive, believing that file has changed, goes to open it using its FileID; however, that FileID could now correspond to essentially any file on your computer. Now the contents of some sensitive file are put in place of that kitten file, and everyone you shared it with can access them.
    Visual Studio Failures
    Visual Studio is a code editor/compiler. There are several distinct problems. First, when compiling, touching one file in a folder seemed to recompile the entire folder, likely due to the notification bug. This is just a slowdown, but an annoying one. Second, Visual Studio supports compiler-generated code, meaning the compiler generates actual source code that lives next to your own. Normally, once compiled, it doesn't regenerate and recompile this source unless it must change, but due to the notification bugs it regenerates this code constantly, and an error in other code then causes an error there too, producing several other invalid errors. When debugging, Visual Studio by default only uses symbols (debug location data) that exactly match the source; because DrivePool generates notifications on certain file accesses, Visual Studio constantly thinks the source has changed since it was compiled, and you can only set breakpoints in the source if you disable the default exact symbol match. If you have multiple projects in a solution, with one dependent on another, it will often rebuild dependent projects even when they haven't changed; for large solutions that can be crippling (a performance issue). Finally, I often had IntelliSense errors showing up even though there were no errors when compiling, and worse, IntelliSense would break completely at points. All due to DrivePool.

    Technical details / full background & disclaimer
    I have sample code and logs documenting these issues in greater detail if anyone wants to replicate them.
    It is important to state that DrivePool is closed source and I don't have the technical details of how it works. I also don't have the technical details of how applications like OneDrive or Visual Studio work. So some of these things may be guesses as to why the applications fail.
    The facts stated are true (to the best of my knowledge).

    Shortly before my trial expired in October of last year, I discovered some odd behavior. I had a technical ticket filed within a week, and within a month had traced down at least one of the bugs. The issue can be seen at https://stablebit.com/Admin/IssueAnalysis/28720 ; it shows priority 2/important, which I would assume is the second highest (probably with critical or similar above it). It is great that it has priority, but as it has been over 6 months since it was filed without updates, I felt warning others about the potential corruption was important.

    The FileSystemWatcher API is implemented on Windows using asynchronous overlapped I/O; the exact code can be seen here: https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.Win32.cs#L32-L66
    That calls ReadDirectoryChangesW using the asynchronous (overlapped) I/O model described here:
    https://learn.microsoft.com/en-us/windows/win32/fileio/synchronous-and-asynchronous-i-o
    Newer API calls use GetFileInformationByHandleEx to get the 128-bit FileID, while the older GetFileInformationByHandle call represents it as nFileIndexHigh/nFileIndexLow.
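    For reference, here is how the two representations fit together, as a small Python sketch. The structure and field names (BY_HANDLE_FILE_INFORMATION with nFileIndexHigh/nFileIndexLow, FILE_ID_INFO with its 16-byte FileId) come from the Win32 API; the helper names are hypothetical, and reading FileId as a little-endian integer is an assumption (the ID is opaque, so only equality comparisons are meaningful).

```python
def file_index_from_parts(n_file_index_high: int, n_file_index_low: int) -> int:
    """Combine BY_HANDLE_FILE_INFORMATION's two 32-bit halves into one 64-bit index."""
    for part in (n_file_index_high, n_file_index_low):
        if not 0 <= part <= 0xFFFFFFFF:
            raise ValueError("each half is a 32-bit DWORD")
    return (n_file_index_high << 32) | n_file_index_low


def file_id_from_file_id_128(identifier: bytes) -> int:
    """Turn FILE_ID_INFO.FileId (a 16-byte FILE_ID_128) into an integer.

    Byte order is an assumption here; it only matters for display, since
    the ID is meant to be compared for equality, not interpreted.
    """
    if len(identifier) != 16:
        raise ValueError("FILE_ID_128 is 16 bytes")
    return int.from_bytes(identifier, "little")
```

    FileIDs are only unique within a volume, which is why FILE_ID_INFO also carries a VolumeSerialNumber; a tool tracking files would normally key on the pair.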

    As for the FileID bug, I wouldn't normally have even thought about it, but the advanced settings (https://wiki.covecube.com/StableBit_DrivePool_2.x_Advanced_Settings) mention this under CoveFs_OpenByFileId: "When enabled, the pool will keep track of every file ID that it gives out in pageable memory (memory that is saved to disk and loaded as necessary).". Keeping track of every file ID in memory is certainly very different from how Windows works, so I thought this may be the source of the issue. I also don't know whether there is a cap on the maximum number of files it will track; if it resets FileIDs in situations other than reboots, that could be much worse. Turning this off will at least break NFS servers, as the docs state right there that it is "required by the NFS server".
    Finally, the FileID numbers given out by DrivePool are incremental and very low. This means that when they do reset, you will almost certainly get collisions with former numbers. What is not clear is whether there is a chance of FileID corruption issues: if it assigns these IDs in a multi-threaded scenario, to many different files at the same time, could this system fail? I have seen no proof that this happens, but when incremental IDs are assigned like this for huge numbers of files, the chance of it occurring is higher.
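    The multi-threading concern comes down to whether read-and-increment is atomic. This Python sketch shows a lock-protected incremental allocator; it is a generic illustration with hypothetical names, not a claim about how DrivePool allocates IDs.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class IncrementalIdAllocator:
    """Hypothetical incremental ID allocator; DrivePool's internals are not public."""

    def __init__(self, start: int = 1):
        self._next = start
        self._lock = threading.Lock()

    def allocate(self) -> int:
        # Read-and-increment must be one atomic step; without the lock two
        # threads could read the same value and hand out a duplicate ID.
        with self._lock:
            fid = self._next
            self._next += 1
            return fid


def allocate_concurrently(allocator, workers: int = 8, per_worker: int = 10_000) -> list:
    """Allocate IDs from several threads at once and return them all."""
    def batch(_):
        return [allocator.allocate() for _ in range(per_worker)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [fid for chunk in pool.map(batch, range(workers)) for fid in chunk]
```

    Creating a fresh `IncrementalIdAllocator()` after a simulated reset shows the collision problem described above: the first IDs it hands out are exactly ones that were already issued.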
    Microsoft mentions this about deleting the USN Journal: "Deleting the change journal impacts the File Replication Service (FRS) and the Indexing Service, because it requires these services to perform a complete (and time-consuming) scan of the volume. This in turn negatively impacts FRS SYSVOL replication and replication between DFS link alternates while the volume is being rescanned.". DrivePool has never supported the USN Journal, so it isn't exactly the same thing, but it is clear that several core Windows services use it for normal operations, and I do not know what fallbacks they use when it is unavailable.

    Potential Fixes
    There are advanced settings for DrivePool (https://wiki.covecube.com/StableBit_DrivePool_2.x_Advanced_Settings); beware that these changes may break other things.
    CoveFs_OpenByFileId - Set it to false (it is true by default). This disables the OpenByFileID API. It is clear several applications use this API. However, while DrivePool disables that function with this setting, it doesn't disable FileIDs themselves. Any application using FileIDs as static identifiers for files may still run into problems.
    I would avoid using file backup/synchronization tools with DrivePool drives (if possible). These likely have the highest chance of lost files, misplaced files, file contents being mixed up, and excess resource usage. If you don't avoid them, consider taking file hashes for the entire DrivePool directory tree. Do this again at a later point and make sure files that shouldn't have changed still have the same hash.
    If you have files that rarely change after being created, hashing each file at some point after creation and alerting if that file disappears or its hash changes would act as an easy early warning that one of these bugs is being hit.
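    A minimal Python sketch of that hash baseline follows. The script name `check_hashes.py` is hypothetical; it records a SHA-256 per file on the first run and, on later runs, reports files that went missing or whose contents changed. Files that are expected to change (logs, databases) would need to be excluded or they will show up as noise.

```python
import hashlib
import json
import os
import sys


def hash_tree(root: str) -> dict:
    """Return {relative path: SHA-256 hex digest} for every file under root."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def compare(baseline: dict, current: dict) -> dict:
    """Files that vanished, or whose content changed, since the baseline."""
    return {
        "missing": sorted(baseline.keys() - current.keys()),
        "changed": sorted(p for p in baseline.keys() & current.keys()
                          if baseline[p] != current[p]),
    }


def main(argv) -> int:
    # Usage: python check_hashes.py ROOT BASELINE.json
    # Keep BASELINE.json outside ROOT so it is not hashed itself.
    root, baseline_path = argv[1], argv[2]
    current = hash_tree(root)
    if not os.path.exists(baseline_path):
        with open(baseline_path, "w") as f:
            json.dump(current, f)
        print(f"baseline saved for {len(current)} files")
        return 0
    with open(baseline_path) as f:
        report = compare(json.load(f), current)
    print(json.dumps(report, indent=2))
    # Non-zero exit on any mismatch so a scheduler can raise an alert.
    return 1 if report["missing"] or report["changed"] else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

    Run it as, for example, `python check_hashes.py P:\Photos baseline.json` (the drive letter and paths are placeholders) on a schedule; the non-zero exit code on a mismatch makes it easy to hook into an alert.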
  5. Thanks
    MitchC got a reaction from roirraWedorehT in Google Drive Backup and Sync   
    There is a good chance the Google Drive problems could be related to this:
     
  6. Thanks
    MitchC got a reaction from Shane in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
    Mostly. As I think you mentioned earlier in this thread, that doesn't disable FileIDs, and applications could still get the FileID of a file. Depending on how that ID is used, it could still cause issues. One example is SnapRAID, which doesn't use OpenByFileID but does trust that the same FileID means the same file.
    For the biggest problems (data loss, corruption, leakage), this is correct. Of course, one generally can't know whether an application uses FileIDs (especially if it is not open source); it is likely not mentioned in the documentation. It also doesn't mean your favorite app won't start doing so tomorrow, and then all of a sudden an application that worked perfectly for 4 years starts silently corrupting random data. By far the most likely apps to do this are backup apps, data sync apps, cloud storage apps, and file sharing apps: things that have some reason to track which files are created/moved/deleted/etc.
    The other issue (and if I could go back in time I would split this thread in two), the change notification bugs in DrivePool, won't directly lead to data loss (although they can greatly speed up the process above). They do, however, have the potential to cause odd errors and performance issues in a wide range of applications. The file change API is used by many applications: not just the app types listed above (which often use it if they run 24/7), but any app that interfaces with many files at once (e.g. coding IDEs/compilers, file explorers, music or video catalogs). This API is common and easy for developers to use, and it can greatly increase app performance: instead of manually checking every file, an app can install one event listener on a parent directory and, even if it only cares about some of the files in the directories under it, simply ignore the change events it doesn't care about. It may be very hard to trace these performance issues or errors back to DrivePool due to how they present themselves. You are far more likely to think the application is buggy or at fault.
    Short Example of Disaster
    As it is a complex issue to understand, I will give a short example of how reused FileIDs can be devastating.
    Let's say you use Google Drive or some other cloud backup/sharing application, and it relies on the fact that as long as FileID 123 exists, it always points to the same file. This is all but guaranteed with NTFS.
    You only use Google Drive to back up your photos from your phone, your work camera, or what have you. You have the following layout on your computer:
    c:\camera\work\2021\OfficialWiringDiagram.png with file ID 1005
    c:\camera\personal\nudes\2024Collection\VeryTasteful.png with file ID 3909
    c:\work\govt\ClassifiedSatPhotoNotToPostOnTwitter.png with file ID 6050
    You have OfficialWiringDiagram.png shared with the office, as it's an important reference any time someone tries to figure out where the network cables go.
    Enter DrivePool. You don't change any of these files, but DrivePool generates a file changed notification for OfficialWiringDiagram.png. Google Drive says: OK, I know that file, I already have it backed up, and it has FileID 1005. It then opens FileID 1005 locally, reads the "new" contents, and uploads them to the cloud, overwriting the old OfficialWiringDiagram.png. The only problem is you rebooted: 1005 was OfficialWiringDiagram.png before, but now FileID 1005 is actually your nude, VeryTasteful.png. So it has just backed up your nude into the cloud as "OfficialWiringDiagram.png", and remember, that file is shared. The next time someone looks at the office wiring diagram they are in for a surprise. Depending on the application, it can be worse: if ClassifiedSatPhotoNotToPostOnTwitter.png had become FileID 1005 instead, then even though the change notification was for "c:\camera\work\2021\OfficialWiringDiagram.png" (under the folder it monitors, "c:\camera"), opening FileID 1005 gets a file completely outside your camera folder. It reads the highly sensitive file from c:\work\govt, and now a file that should never have been uploaded is shared with the entire office.
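The failure above can be reduced to a toy model. The sketch below is a hypothetical Python simulation, not Google Drive's or DrivePool's code: dicts stand in for the volume, the client caches path to FileID, and after a simulated reboot reassigns the IDs, reopening by the cached ID (the way OpenFileById would be used on Windows) returns another file's bytes, which are then uploaded under the old name.

```python
# Toy model only: dicts stand in for a volume; no real file system or API is used.
PATH_DIAGRAM = r"c:\camera\work\2021\OfficialWiringDiagram.png"
PATH_PRIVATE = r"c:\camera\personal\nudes\2024Collection\VeryTasteful.png"


def open_by_id(volume, file_id):
    """Stand-in for opening by FileID: returns whatever file holds the ID now."""
    return volume[file_id]  # (path, contents)


def sync_changed_file(volume, id_cache, changed_path):
    """Naive client: trusts that the cached FileID still names changed_path."""
    file_id = id_cache[changed_path]
    _current_path, contents = open_by_id(volume, file_id)
    return changed_path, contents  # uploaded under the *old* name


# Before the reboot, IDs match the example above.
before = {1005: (PATH_DIAGRAM, b"wiring"), 3909: (PATH_PRIVATE, b"private")}
id_cache = {path: fid for fid, (path, _) in before.items()}

# After the reboot the pool hands out the IDs differently (hypothetical).
after = {1005: (PATH_PRIVATE, b"private"), 3909: (PATH_DIAGRAM, b"wiring")}

# A spurious "changed" notification arrives for the wiring diagram.
name, data = sync_changed_file(after, id_cache, PATH_DIAGRAM)
print(name, data)
```

The upload goes out under the wiring diagram's name but carries the private file's bytes; nothing in the naive client ever compares the path it expected with the file it actually opened.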
     
    Now suppose you follow many best practices. You restrict Google Drive to only the c:\camera folder, so it doesn't back up or access files anywhere else. You have a RAID 6 SSD setup in case of drive failure, and image files from prior years are never changed, so once written they are unlikely to move unless the drive is defragmented, meaning a pretty low chance of conflicts or of an abrupt power failure corrupting them. You even have a photo scanner that checks for corrupt photos just to be safe. None of these things will save you from the above example. Even if you keep 6 months of backup archives offsite in cold storage (made perfectly and not affected by the bug) and all deleted files are kept for 5 years, if you only look at OfficialWiringDiagram.png once a year you might not notice it was changed and the original data overwritten until all your backups contain the nude, and the original file may be lost forever.
    FileIDs are generally better than relying on file paths. If an application used only file paths, then renaming or moving file 123, even to a new name in the same folder, would break sharing for anyone you had previously shared the file with. If instead, when you rename "BobsChristmasPhoto.png" to "BobsHolidayPhoto.png", the application knows it is the same file being renamed because it still has FileID 123, it can silently update the sharing data on the backend so that when people click the existing link it still loads the photo. Even an application that uses moderate de-duplication techniques, like hashing the file to tell whether it has just moved, would, without FileIDs, treat a file you move and slightly change (say you strip out the location metadata your phone put in the photo) as an all-new file.
    FileID collisions are not just possible but basically guaranteed with DrivePool. With the change notification bug, a sync application might think your files are changing constantly, as even reading a file or browsing its directory might trigger a notification that it has changed. This means it backs up all those files again, which might be tens of thousands of photos. Because the FileIDs change any time you reboot, a file synced after a reboot may be uploaded with the wrong contents (since it was opened by FileID), and if a second computer downloads that file, you could put yourself in a never-ending loop of backups and downloads that overwrites one file with another at random. Since the FileID a file was last known by might not exist when the application goes to back it up (which I assume would make many applications fall back to path validation), only part of your catalog would get corrupted each iteration. The application might also validate that a renamed file stayed within the root directory it works with. This means that if your Christmas photo's FileID now pointed to something under "c:\windows", it would fall back to file paths, as it knows that location is not under the "c:\camera" directory it works with.
    This is not some hypothetical situation; these are actual occurrences and behaviors I have seen happen to files I have hosted on DrivePool. These are not two-bit applications written by a one-person dev team; they are massively used first-party applications and commercial enterprise applications.
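Rename tracking by ID can be sketched with Python's os.stat, whose st_ino field holds the file index on Windows and the inode number on POSIX systems. This is an illustrative sketch, not any particular sync client's logic: it snapshots ID to path before and after a rename and reports files whose ID stayed the same while their path changed. On a file system with stable IDs it reports the rename; if the ID had changed, the same code would instead see one file disappear and an unrelated new one appear.

```python
import os
import tempfile


def snapshot(root):
    """Map (st_dev, st_ino) -> relative path for every regular file under root."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            result[(st.st_dev, st.st_ino)] = os.path.relpath(full, root)
    return result


def detect_renames(old, new):
    """Return {old_path: new_path} for IDs present in both snapshots under different paths."""
    return {old[key]: new[key] for key in old.keys() & new.keys() if old[key] != new[key]}


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "BobsChristmasPhoto.png"), "wb") as f:
        f.write(b"photo bytes")
    before = snapshot(root)
    os.rename(os.path.join(root, "BobsChristmasPhoto.png"),
              os.path.join(root, "BobsHolidayPhoto.png"))
    after = snapshot(root)
    renames = detect_renames(before, after)

print(renames)
```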
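The root-directory fallback described above only catches the case where the reused ID lands outside the synced folder. A minimal sketch, assuming the client can turn a FileID back into a path (on Windows, for instance, GetFinalPathNameByHandle on the handle returned by OpenFileById); the function name safe_to_use and the policy itself are hypothetical: check the root, and also compare against the path cached for that ID.

```python
from pathlib import PureWindowsPath


def safe_to_use(resolved_path, cached_path, sync_root):
    """Hypothetical guard: trust an ID -> path resolution only if it stays
    inside the synced root AND still matches the path cached for that ID."""
    resolved = PureWindowsPath(resolved_path)
    root = PureWindowsPath(sync_root)
    inside_root = resolved == root or root in resolved.parents
    return inside_root and resolved == PureWindowsPath(cached_path)


cached = r"c:\camera\work\2021\OfficialWiringDiagram.png"
root = r"c:\camera"

# ID now resolves outside the root: the root check alone catches this.
print(safe_to_use(r"c:\work\govt\ClassifiedSatPhotoNotToPostOnTwitter.png", cached, root))
# ID now resolves to another file inside the root: only the path comparison catches this.
print(safe_to_use(r"c:\camera\personal\nudes\2024Collection\VeryTasteful.png", cached, root))
# ID still resolves to the cached path.
print(safe_to_use(cached, cached, root))
```

The trade-off is that a strict path comparison also rejects genuine renames, which are exactly what FileIDs are meant to track, so a real client would need some further check (for example a content hash) before accepting a move.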
     
    If you can, and you care about your data, I would. The convenience of DrivePool is great, and there are countless users it works fine for (at least as far as they know), but even with a deep technical understanding it can be quite difficult to detect which applications are affected by this.
    If you thought you were safe because you use something like SnapRAID, it won't stop this sort of corruption. As far as SnapRAID is concerned, you just deleted a file and renamed another on top of it. SnapRAID may even contribute further to the problem, as it (like many programs) uses the Windows FileID as the Windows equivalent of an inode number: https://github.com/amadvance/snapraid/blob/e6b8c4c8a066b184b4fa7e4fdf631c2dee5f5542/cmdline/mingw.c#L512-L518 . Applications assume that an inode or FileID that is the same as before refers to the same file. That is, unless you use DrivePool, oops.
    Apps might use timestamps in addition to FileIDs, although timestamps can collide, say if you downloaded a zip archive and extracted it with Windows' built-in extraction (which, by design, ignores the timestamps even if the zip contains them).
    SnapRAID does use some more advanced checks when syncing, but in the worst case, where a file's content has actually changed but the file now holding that FileID has the same size/timestamp, SnapRAID assumes it is unmodified and leaves the parity data alone. This means that if you had two files with the same size/timestamp anywhere on the drive and one of them got the FileID of the other, it would end up with incorrect parity data associated with it. Running a snapraid fix could then actually cause corruption, as SnapRAID would believe the parity data is correct when the content on disk it thinks goes with it is not. Note: I don't use SnapRAID, but I was asked this question, and from reading the manual and the source above I believe this is technically correct. It is great that SnapRAID is open source and has such technical documentation; plenty of backup / sync programs don't, and you don't know what checking they do.
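The weakness described here is that a size/timestamp comparison cannot tell apart two different files that happen to share both values. The toy model below is hypothetical and is not SnapRAID's code: it records metadata per FileID at sync time and shows the check reporting "unchanged" after the ID has started naming a different file with the same size and timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meta:
    size: int
    mtime: int  # seconds since the epoch; the granularity is an assumption


def looks_unchanged(recorded, current):
    """Hypothetical metadata-only check (not SnapRAID's actual logic)."""
    return recorded == current


# Parity was computed for FileID 42 while it held file A.
recorded = {42: Meta(size=4096, mtime=1_700_000_000)}
contents_at_sync = {42: b"A" * 4096}

# Later FileID 42 names file B: different bytes, same size and timestamp
# (e.g. both extracted from the same archive).
current_meta = {42: Meta(size=4096, mtime=1_700_000_000)}
current_contents = {42: b"B" * 4096}

skipped = looks_unchanged(recorded[42], current_meta[42])
content_differs = contents_at_sync[42] != current_contents[42]
print(skipped, content_differs)  # True True
```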
  7. Thanks
    MitchC got a reaction from Jonibhoni in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
    Mostly. As I think you mentioned earlier in this thread, that doesn't disable FileIDs, and applications can still get the FileID of a file. Depending on how that ID is used, it could still cause issues. An example of this is SnapRAID, which doesn't use OpenFileById but does trust that the same FileID means the same file.
  8. Like
    MitchC got a reaction from Shane in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
    Sorry, but even in my non-mission-critical environment I am not a fan of data loss or leakage. Also, "extremely low" is an understatement. NTFS supports 2^32 possible files on a drive. The MFT file index is actually a 48-bit value, which means you could max out your new MFT records 65K times before it needs to loop around. The sequence number (how many times that specific MFT record has been reused) is an additional 16 bits on its own, so even if you could delete and reallocate a file to the exact same MFT record, you would still need to do so with that specific record 65K times. If an application is monitoring for file changes, hopefully it catches one of those :)
    It is nearly impossible to know how an application may use FileIDs, especially as they may only be used as a fallback for other features DrivePool does not implement, and the application may combine the FileID with something else. Say an application knows file 1234, and on startup it checks file 1234. If that file exists, it can be nearly certain it is the same file; if it is gone, it simply removes file 1234 from its known files, and by the time 1234 is reused it hasn't known about it in forever.
    The problem here is not necessarily FileIDs changing. I'd wager most applications could handle FileIDs changing even though the file has not just fine (you might get extra transfers, back up extra data, or lose performance temporarily). It is FileID reuse that leads to the worst effects: data loss, data leakage, and corruption. The FileID is 64 bits, while the maximum number of files is 32 bits (and realistically most people have a good bit fewer than 4 billion files). DrivePool could randomly assign FileIDs willy-nilly every boot and probably cause far fewer disasters. DrivePool could also likely use the underlying FileIDs through some black magic hackery. The MFT counter is 48 bits, but I doubt the top 9 bits are touched on most normal systems. If DrivePool assigned an incremental number to each drive and overwrote those 9 bits of the underlying FileID with that drive number, it would support 512 hard drives in one pool, keep nearly the same near-zero chance of FileID collision, and have a stable FileID. It would only change the FileID if a file moved in the background from one drive to another (and not just mirrored). It could even keep it the same with a zero-byte ID file left behind in a ghost folder if so desired, but the FileID changing is probably far less of a problem. A backup restore program that deletes the old file and creates it again would also change the FileID, and I doubt that causes issues.
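The arithmetic above can be checked with a few lines of Python. The sketch assumes the usual 64-bit NTFS file reference layout, with the low 48 bits holding the MFT record number and the high 16 bits holding the sequence number; the helper names are illustrative, not a Windows API.

```python
RECORD_BITS = 48    # MFT record (segment) number
SEQUENCE_BITS = 16  # bumped when an MFT record is reused


def make_file_reference(mft_record, sequence):
    """Pack an MFT record number and sequence number into a 64-bit reference."""
    return (sequence << RECORD_BITS) | mft_record


def split_file_reference(ref):
    """Split a 64-bit reference back into (mft_record, sequence)."""
    return ref & ((1 << RECORD_BITS) - 1), ref >> RECORD_BITS


max_files = 2 ** 32
print(2 ** RECORD_BITS // max_files)  # 65536 full volumes' worth of record numbers
print(2 ** SEQUENCE_BITS)             # 65536 reuses of a single record
```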
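The proposal above can be sketched as bit manipulation. This is a hypothetical illustration of the post's idea, not anything DrivePool implements; it assumes the 48-bit record / 16-bit sequence layout of an NTFS file reference and that the top 9 record bits are unused on the underlying drives (the function raises if they are not).

```python
RECORD_BITS = 48                       # NTFS MFT record number width
DRIVE_BITS = 9                         # 2**9 = 512 pooled drives
LOCAL_BITS = RECORD_BITS - DRIVE_BITS  # 39 bits left for the underlying record
RECORD_MASK = (1 << RECORD_BITS) - 1


def pooled_file_id(drive_index, underlying_ref):
    """Hypothetical scheme: put the pool's drive index in the top 9 bits of
    the 48-bit record number; keep the sequence bits and the underlying
    record number unchanged."""
    if not 0 <= drive_index < (1 << DRIVE_BITS):
        raise ValueError("drive index out of range")
    record = underlying_ref & RECORD_MASK
    if record >> LOCAL_BITS:
        raise ValueError("underlying record number uses the reserved top bits")
    sequence_part = underlying_ref & ~RECORD_MASK
    return sequence_part | (drive_index << LOCAL_BITS) | record


# The same underlying reference on two different drives stays distinct,
# and the same file on the same drive always maps to the same pooled ID.
ref = (7 << RECORD_BITS) | 1005
print(hex(pooled_file_id(0, ref)), hex(pooled_file_id(3, ref)))
```

Because the result depends only on the drive index and the underlying reference, it would survive reboots and change only when a file is moved to another drive, as the post notes.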
    That said, it is not really my job to figure out how to solve this problem in a commercial product.
    As you mentioned back in December, it is unquestionable that DrivePool is doing the wrong thing:
    It uses MUST in caps.
    My problem isn't that this bug exists (although that sucks). My problem is that this has been, and continues to be, handled exceptionally poorly by StableBit, even though it can pose significant risk to users without them even knowing it. I have likely spent more time investigating their bug than they have. We are now looking at nearly two years since my initial notification, and users can make the same mistakes now as back then, despite the fact that they could be warned or prevented from doing so.
  9. Thanks
    MitchC got a reaction from Thronic in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
    To start, while I am new to DrivePool, I love its potential; I own multiple licenses and their full suite. If you only use DrivePool for basic archiving of large files, with simple applications accessing them for periodic reads, you are probably unlikely to hit these bugs. This assumes you don't use any file synchronization / backup solutions. Further, I don't know how many thousands (tens or hundreds?) of DrivePool users there are, but clearly many are not hitting these bugs, or not recognizing that they are, so this IS NOT some new "my files are 100% going to die" issue. Some of the reports I have seen on the forums, though, may actually be due to these issues without being recognized as such. As far as I know, Covecube was previously not aware of these issues, so support tickets may not have even considered this possibility.
    I started reporting these bugs to StableBit ~9 months ago, and informed them ~1 month ago that I would be putting this post together. Please see the disclaimer below as well, as some of this is based on observations rather than known facts.
    You are most likely to run into these bugs with applications that:
    *) Synchronize or back up files, including cloud-mounted drives like OneDrive or Dropbox
    *) Must handle large quantities of files or monitor them for changes, like coding applications (Visual Studio / VS Code)

    Still, these bugs can cause silent file corruption, file misplacement, deleted files, performance degradation, data leakage (a file shared with someone externally could have its contents overwritten by any sensitive file on your computer), missed file changes, and potentially other issues for a small portion of users (I have had nearly all of these occur). They may also trigger some BSOD crashes; I had one such crash that is likely related. Because of the subtle way some of these bugs present, it may be hard to notice they are happening even when they are. In addition, these issues can occur even without file mirroring and with files pinned to a specific drive. I do have some potential workarounds/suggestions at the bottom.
    More details are at the bottom, but here are the important bug facts up front:
    Windows has a native file change notification API using overlapped I/O calls. This allows an application to listen for changes on a folder, or a folder and its subfolders, without having to constantly check every file to see if it changed. StableBit triggers "file changed" notifications even when files are just accessed (read) in certain ways. StableBit does NOT generate notification events on the parent folder when a file under it changes (Windows does). StableBit does NOT generate a notification event when only a FileID changes (the next bug covers FileIDs).
    Windows, like Linux, has a unique ID number for each file written on the drive. Hardlinks to the same file share the same unique ID (so one FileID may have multiple paths associated with it). In Linux this is called the inode number; Windows calls it the FileID. Rather than accessing a file by its path, you can open a file by its FileID. In addition, two different files cannot share the same FileID; it is a 128-bit number that persists across reboots (128 bits means the number of unique values is 39 digits long, comparable in size to an MD5 hash). A FileID does not change when a file is moved or modified. StableBit, by default, supports FileIDs, but they seem to be ephemeral: they do not seem to survive reboots or file moves. Keep in mind FileIDs are used for directories as well, not just files. Further, if a directory is moved/renamed, not only does its FileID change but so does the FileID of every file under it. I am not sure if there are other situations in which they may change. In addition, if a descendant file/directory FileID changes due to something like a directory rename, StableBit does NOT generate a notification event that it has changed (the application gets the directory event notification but nothing for the children).
    There are some other things to consider as well. DrivePool does not implement the standard Windows USN Journal (a system for tracking file changes on a drive). It specifically identifies itself as not supporting it, so applications shouldn't try to use it with a DrivePool drive. That does mean that applications that traditionally don't use the file change notification API or FileIDs may fall back to a combination of those to accomplish what they would otherwise use the USN Journal for (and this can exacerbate the problem). The same is true of Volume Shadow Copy (VSS): applications that might traditionally use it cannot (DrivePool identifies that it cannot do VSS), so they may resort to methods they do not traditionally use.
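One defensive pattern an application could use against the spurious "changed" events is to re-check metadata before acting. The sketch below is hypothetical (the ChangeFilter class is made up for illustration, and the event source is left out): it compares size and modification time with the values seen last time and drops events where neither moved. It only mitigates the spurious-event side; it cannot recover events that are never sent (such as the missing parent-folder notifications), and a same-size rewrite within the timestamp resolution would slip through.

```python
import os
import tempfile


class ChangeFilter:
    """Hypothetical guard for a notification-driven app: before acting on a
    'file changed' event, compare size and modification time with the values
    seen last time, and drop events where neither moved."""

    def __init__(self):
        self._seen = {}  # path -> (size, mtime_ns)

    def is_real_change(self, path):
        st = os.stat(path)
        current = (st.st_size, st.st_mtime_ns)
        previous = self._seen.get(path)
        self._seen[path] = current
        return previous != current


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "notes.txt")
    with open(path, "w") as f:
        f.write("a")
    flt = ChangeFilter()
    first = flt.is_real_change(path)        # nothing recorded yet
    with open(path) as f:
        f.read()                            # a plain read, as in the spurious events
    after_read = flt.is_real_change(path)   # size and mtime unchanged
    with open(path, "w") as f:
        f.write("abc")
    after_write = flt.is_real_change(path)  # size changed
```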
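The behaviour expected from an ordinary file system can be checked from Python, where os.stat(...).st_ino carries the file index on Windows and the inode number on Linux. This sketch (the file names are made up) creates a hardlink and then modifies the file, checking that both paths share one ID and that the ID survives the modification; the post's claim is that on a DrivePool drive IDs do not stay stable like this across reboots and moves.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "report.docx")
    link = os.path.join(d, "report-hardlink.docx")
    with open(original, "w") as f:
        f.write("draft")
    os.link(original, link)  # a second path to the same file
    shared_id = os.stat(original).st_ino == os.stat(link).st_ino

    id_before = os.stat(original).st_ino
    with open(original, "a") as f:
        f.write(" v2")       # modify the file in place
    kept_id = os.stat(original).st_ino == id_before

print(shared_id, kept_id)
```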

    Now the effects of the above bugs may not be completely apparent:
    For the overlapped I/O / file change notifications: an application monitoring a DrivePool folder or subfolder for changes will get erroneous notifications that files changed whenever anything even accesses them. Just opening something like File Explorer on a folder, or even switching between applications, can cause file accesses that trigger the notification. If an application takes action on a notification and then checks the file at the end, this in itself may cause another notification. Applications that rely on getting a folder changed notification when a child changes will not get these at all with DrivePool. If an application monitors only the folder and not its children, no notifications at all may be generated, so it could miss changes.
    For FileIDs, it depends on what the application uses the FileID for, but it may assume the FileID stays the same when a file moves. As it doesn't with DrivePool, the application might read, back up, or sync the entire file again when it is moved (a performance issue). An application that uses the Windows API to open a file by its ID may not get the file it is expecting, or, for a file that was simply moved, may get an error when opening it by its old FileID, as DrivePool has changed the ID. For example, say an application caches that the FileID for ImportantDoc1.docx is 12345, but after a restart 12345 refers to ImportantDoc2.docx. If this application is a file sync application and ImportantDoc1.docx is changed remotely, then when it writes those remote changes to the local file using the OpenFileById method, it will actually overwrite ImportantDoc2.docx with those changes.
    I didn't spend the time to read the Windows file system requirements to know when Windows expects a FileID to potentially change (or not change). It is important to note that even if changes/reuse are theoretically allowed, if they are not commonplace (and with a number space the size of an MD5 hash, repeats are practically unheard of), applications may simply assume they don't happen. A backup or file sync program might assume that a file with a specific FileID is always the same file: if FileID 12345 is c:\MyDocuments\ImportantDoc1.docx one day and c:\MyDocuments\ImportantDoc2.docx another, it may mistake document 2 for document 1, overwriting important data or restoring data to the wrong place. If it is creating a whole-drive backup, it may assume it has already backed up c:\MyDocuments\ImportantDoc2.docx if, by the time it reaches it, that file has the FileID ImportantDoc1.docx had (at which point DrivePool would have a different FileID for ImportantDoc1.docx).

    Why might applications use FileIDs or file change notifications? It may not seem intuitive, but a few major reasons are:
    *) Performance: file change notifications are an event/push-based system, so the application is told when something changes. The common alternative is a poll-based system where an application must scan all the files looking for changes (and may rely on file timestamps or even hash entire files to determine this), which causes a good bit more overhead / slowdown.
    *) FileIDs are nice because they already handle hardlink de-duplication (Windows may have multiple hardlinks to the same file on a drive for various reasons, but if you back up based on FileID you back up that file once rather than multiple times). FileIDs are also great for handling renames. Say you are an application that syncs files and the user backs up c:\temp\mydir with 1000 files under it. If they rename c:\temp\mydir to c:\temp\mydir2, an application using FileIDs can say: wait, that folder is the same, it was just renamed, so rename that folder in our remote version too. This is a very minimal operation on both ends. With DrivePool, however, the FileID changes for the directory and all files under it. If the sync application uses this to determine changes, it now uploads all these files again, using a good bit more resources locally and remotely. If the application also uses versioning, this may be far more likely to cause a conflict with two or more clients syncing, as massive numbers of files seem to be changing.
    Finally, even if an application tries to monitor for FileID changes using the file change API, due to the notification bugs above it may not get any notifications when child FileIDs change, so it might assume they have not.
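For contrast, the poll-based alternative mentioned above can be sketched as a full scan that records size and modification time per path, diffed against the previous scan. This is an illustrative sketch, not any specific application's code. It also shows the cost of tracking by path alone: renaming mydir to mydir2 (mirroring the c:\temp example, with 3 files instead of 1000) looks like every file was removed and re-added, the same outcome a FileID-based app sees when every ID under a renamed directory changes.

```python
import os
import tempfile


def scan(root):
    """Poll-based snapshot: relative path -> (size, mtime_ns) for every file."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return state


def diff(old, new):
    """Return (added, removed, modified) relative paths between two scans."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, modified


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "mydir"))
    for i in range(3):
        with open(os.path.join(root, "mydir", f"{i}.txt"), "w") as f:
            f.write(str(i))
    before = scan(root)
    os.rename(os.path.join(root, "mydir"), os.path.join(root, "mydir2"))
    added, removed, modified = diff(before, scan(root))

print(len(added), len(removed), len(modified))  # 3 3 0
```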

    Real Examples
    OneDrive
    This started with massive OneDrive failures. I found OneDrive re-uploading hundreds of gigabytes of images and videos multiple times a week. These were not changing or moving. I don't know whether the issue is that OneDrive uses FileIDs to determine if a file is already uploaded, or that when it scanned a directory it triggered notifications that all the files in that directory had changed, and it re-uploaded them based on those notifications. After this I noticed files being deleted both locally and in the cloud. I don't know what caused this; it might have been because OneDrive thought the old file was deleted, as its FileID was gone, and while there was a new file (actually the same file) in its place, there may have been some odd race condition. It is also possible that it queued the file for upload, the FileID changed, and when it went to open the file to upload it, it found it 'deleted' because the FileID no longer pointed to a file, and queued the delete operation. I also found that files uploaded to the cloud in one folder were sometimes downloading to a different folder locally. I am guessing this is because the folder's FileID changed: it thought the 2023 folder had ID XYZ, but that ID now pointed to a different folder, so it put the file in the wrong place. The final form of corruption was finding the data from one photo or video actually in a file with a completely different name. This is almost certainly due to the FileID bugs. It is highly destructive, as backups make this far harder to correct: with one file's contents replaced by another's, you need to know when the good content existed and which files were affected. Depending on retention policies, the file contents that replaced it may overwrite the good backups before you notice. I also had a BSOD with OneDrive where it was trying to set attributes on a file and the CoveFS driver corrupted some memory.
    It is possible this was a race condition, as OneDrive may have been processing hundreds of files very rapidly due to the bugs. I have not captured a second BSOD from it, but I also stopped using OneDrive on DrivePool due to the corruption. Another example of this is data leakage. Say you share your favorite article on kittens with a group of people. OneDrive, believing that file has changed, goes to open it using the FileID; however, that FileID could now correspond to essentially any file on your computer, so the contents of some sensitive file are put in place of that kitten file, and everyone you shared it with can access it.
    Visual Studio Failures
    Visual Studio is a code editor/compiler. Several distinct problems occur. First, when compiling, touching one file in a folder seemed to recompile the entire folder, likely because of the notification bug. This is just a slowdown, but an annoying one. Second, Visual Studio supports compiler-generated code: the compiler generates actual source code that lives next to your own. Normally, once this code is compiled, it is not regenerated and recompiled unless it must change. Because of the notification bugs, however, it is regenerated constantly, and an error in other code then causes errors there, producing several more invalid errors. Third, when debugging, Visual Studio by default only uses symbols (debug location data) that exactly match the source. Because DrivePool fires notifications on certain file accesses, Visual Studio constantly thinks the source has changed since it was compiled. You can only set breakpoints inside the source if you disable the exact-match default ("Require source files to exactly match the original version"). Fourth, if a solution contains multiple projects with one dependent on another, Visual Studio will often rebuild the projects it depends on even when they haven't changed. For large solutions that can be crippling (perf issue). Finally, I often had IntelliSense errors showing up even though there were no errors during compiling, and worse, IntelliSense would completely break at points. All due to DrivePool.

    Technical details / full background & disclaimer
    I have sample code and logs to document these issues in greater detail if anyone wants to replicate it themselves.
    It is important for me to state that DrivePool is closed source and I don't have the technical details of how it works. I also don't have the technical details of how applications like OneDrive or Visual Studio work, so some of these points may be guesses as to why the applications fail.
    The facts stated are true (to the best of my knowledge).

    Shortly before my trial expired in October of last year, I discovered some odd behavior. I filed a technical ticket within a week, and within a month I had traced down at least one of the bugs. The issue can be seen at https://stablebit.com/Admin/IssueAnalysis/28720. It shows priority 2/important, which I assume is the second highest (probably with critical or similar above it). It is great that it has priority, but as it has been over 6 months since it was filed without updates, I felt it was important to warn others about the potential corruption.

    The FileSystemWatcher API is implemented on Windows using asynchronous overlapped I/O. The exact code can be seen here: https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.Win32.cs#L32-L66
    That corresponds to the ReadDirectoryChangesW kernel API, called with overlapped I/O as described here:
    https://learn.microsoft.com/en-us/windows/win32/fileio/synchronous-and-asynchronous-i-o
    Newer API calls use GetFileInformationByHandleEx to get the FileID. Older calls such as GetFileInformationByHandle return it split into two halves, nFileIndexHigh and nFileIndexLow.
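For reference, the older BY_HANDLE_FILE_INFORMATION structure (filled in by GetFileInformationByHandle) splits a 64-bit file index into two 32-bit DWORDs. The FILE_ID_INFO structure returned by GetFileInformationByHandleEx instead carries a 128-bit FileId. The small Python sketch below shows how the two halves combine. The helper names are hypothetical, and the sketch does not call the Windows API.

```python
def file_index_from_parts(high: int, low: int) -> int:
    """Combine nFileIndexHigh/nFileIndexLow (two unsigned 32-bit values) into one 64-bit index."""
    if not (0 <= high < 2**32 and 0 <= low < 2**32):
        raise ValueError("each part must be an unsigned 32-bit value")
    return (high << 32) | low


def file_index_to_parts(index: int) -> tuple:
    """Split a 64-bit file index back into its (high, low) halves."""
    if not 0 <= index < 2**64:
        raise ValueError("index must be an unsigned 64-bit value")
    return index >> 32, index & 0xFFFFFFFF
```

In practice, Python's `os.stat()` already reports the file index as `st_ino` on Windows, so scripts rarely need to do this by hand.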

    I wouldn't normally have even thought about the FileID bug, but the advanced config (https://wiki.covecube.com/StableBit_DrivePool_2.x_Advanced_Settings) says this under CoveFs_OpenByFileId: "When enabled, the pool will keep track of every file ID that it gives out in pageable memory (memory that is saved to disk and loaded as necessary).". Keeping track of file IDs in memory is certainly very different from how Windows does it, so I thought this might be the source of the issue. I also don't know whether there is a cap on the number of files it will track. If it resets FileIDs in situations other than reboots, that could be much worse. Turning this off will at least break NFS servers, as the docs state it is "required by the NFS server".
    Finally, the FileID numbers given out by DrivePool are incremental and very low. This means that when they do reset, you will almost certainly get collisions with former numbers. What is not clear is whether FileID corruption is possible: could this system fail when assigning IDs to many different files at the same time in a multi-threaded scenario? I have seen no proof this happens, but when incremental IDs are assigned like this across massive numbers of files, the chance is higher.
    Microsoft says this about deleting the USN journal: "Deleting the change journal impacts the File Replication Service (FRS) and the Indexing Service, because it requires these services to perform a complete (and time-consuming) scan of the volume. This in turn negatively impacts FRS SYSVOL replication and replication between DFS link alternates while the volume is being rescanned.". DrivePool has never supported the USN journal, so it isn't exactly the same thing. Still, it is clear that several core Windows services use the journal for normal operations, and I do not know what fallbacks they use when it is unavailable.
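To illustrate why low incremental IDs are dangerous after a reset, here is a toy Python model. IncrementalIdAllocator is hypothetical. DrivePool is closed source, so the model only assumes what is described above: IDs are handed out from a counter in the order files are touched, and the table is forgotten on reboot. The file names come from the ImportantDoc example in the post.

```python
import itertools
from typing import Dict


class IncrementalIdAllocator:
    """Hypothetical in-memory, incremental FileID table that is lost on reset (NOT DrivePool's code)."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._by_path: Dict[str, int] = {}

    def file_id(self, path: str) -> int:
        # Hand out the next small integer the first time a path is seen.
        if path not in self._by_path:
            self._by_path[path] = next(self._counter)
        return self._by_path[path]

    def path_for(self, file_id: int) -> str:
        # Reverse lookup, as an open-by-FileID call would need.
        for path, fid in self._by_path.items():
            if fid == file_id:
                return path
        raise FileNotFoundError(file_id)


# Before the reset: an application caches the ID of ImportantDoc1.docx.
pool = IncrementalIdAllocator()
pool.file_id("ImportantDoc1.docx")
pool.file_id("ImportantDoc2.docx")
cached_id = pool.file_id("ImportantDoc1.docx")  # 1

# After a reset (e.g. a reboot) the table is rebuilt in whatever order files are touched.
pool = IncrementalIdAllocator()
pool.file_id("ImportantDoc2.docx")
pool.file_id("ImportantDoc1.docx")
print(pool.path_for(cached_id))  # prints ImportantDoc2.docx: the stale ID now names a different file
```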

    Potential Fixes
    There are advanced settings for DrivePool (https://wiki.covecube.com/StableBit_DrivePool_2.x_Advanced_Settings). Beware: these changes may break other things.
    CoveFs_OpenByFileId: set it to false (the default is true). This disables the open-by-FileID API, which several applications clearly use. However, while this setting disables that function, it doesn't disable FileIDs themselves, so any application using FileIDs as static identifiers for files may still run into problems.
    I would avoid using any file backup/synchronization tools with DrivePool drives (if possible). These likely have the highest chance of lost files, misplaced files, mixed-up file content, and excess resource usage. If you can't avoid them, consider taking file hashes for the entire DrivePool directory tree. Take them again at a later point, and make sure files that shouldn't have changed still have the same hash.
    If you have files that rarely change after being created, hash each file at some point after creation and alert if the file disappears or its hash changes. This is an easy early warning that one of these bugs has been hit.
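The hashing check described above can be sketched in Python with only the standard library. `build_manifest` and `compare_manifests` are hypothetical names, and SHA-256 is an assumed choice of hash. The script records a digest per file and later reports files that went missing, changed, or appeared.

```python
import hashlib
import os
from typing import Dict, List


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large media files are not loaded into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def compare_manifests(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files that vanished, changed contents, or appeared since the baseline."""
    return {
        "missing": sorted(baseline.keys() - current.keys()),
        "changed": sorted(p for p in baseline.keys() & current.keys() if baseline[p] != current[p]),
        "new": sorted(current.keys() - baseline.keys()),
    }
```

One option is to save the baseline (for example with `json.dump`) and compare it on a schedule. For files you never touch, anything listed under "missing" or "changed" deserves a look.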
  10. Like
    MitchC got a reaction from Shane in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
    Shane, as always, has done a great job summarizing everything and I certainly agree with most of it.  I do want to provide some clarification, and also differ on a few things:
    *) This is not about DrivePool being required to precisely emulate NTFS and all its features; that is probably never going to happen. At best, DrivePool might be able to provide a driver-level drive implementation that could be formatted the way Shane describes CloudDrive does. One thing that makes this critical bug worse is that DrivePool specifically doesn't implement VSS or similar.
    *) The two issues here are not the same, and one does not cause the other. They are distinct, but the incorrect file-changed bug makes the FileID problem potentially much worse (or, in unlucky situations, may be what lets it happen at all). In certain situations, merely browsing a folder can cause file change notifications to fire for the files in it. An application listening for notifications would then believe unmodified files have been modified. Without this bug, it is possible that only files actually being written would be at risk of corruption, rather than all files.
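One way an application could defend itself is to treat a notification only as a hint and confirm the file's size and last-write time actually changed before acting. This is an assumption of mine, not something the post or Microsoft prescribes. Below is a minimal Python sketch. ChangeVerifier is a hypothetical name, and the sketch assumes the spurious notifications come from reads that leave the size and modification time untouched.

```python
import os
from typing import Dict, Tuple

Signature = Tuple[int, int]  # (size in bytes, last-write time in nanoseconds)


class ChangeVerifier:
    """Filter 'file changed' notifications down to ones where the file really changed.

    Assumption: a real write updates the size or the last-write time, while the reads
    that (incorrectly) fire notifications leave both untouched.
    """

    def __init__(self) -> None:
        self._last: Dict[str, Signature] = {}

    @staticmethod
    def _signature(path: str) -> Signature:
        st = os.stat(path)
        return st.st_size, st.st_mtime_ns

    def is_real_change(self, path: str) -> bool:
        try:
            sig = self._signature(path)
        except FileNotFoundError:
            # A file we were tracking that vanished is a real change; an unknown one is not.
            return self._last.pop(path, None) is not None
        # The first sighting of a path counts as a change, since we have nothing to compare with.
        changed = self._last.get(path) != sig
        self._last[path] = sig
        return changed
```

The trade-off is that a write which keeps the same size and restores the old timestamp would slip through, so content hashing remains the stricter check.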

    These next two points are not facts but IMO:
    *) DrivePool claims to be NTFS. If it cannot support certain NTFS features, it should break them as cleanly as possible (not as compatibly as possible, as it may currently). DrivePool should disable FileID support as far as it can, and open-by-FileID should clearly be blocked. I don't know what would happen if the FileID returned 0 or was reported as unavailable even though the volume claims to be NTFS. There are things DrivePool could do to minimize the fatal damage this FileID bug can cause (i.e. not resetting to zero), but honestly, even then, FileID support should be turned off as far as possible. If a user wants to enable these features, DrivePool should show a prominent disclaimer about the damage this might cause.
    *) DrivePool has an ethical responsibility to its users that it is currently violating. It has a feature that can cause massive performance problems, data loss, and data corruption, and other bugs that accelerate these issues. StableBit is aware of this and should warn anyone using these features that unexpected behavior and possibly irreversible damage could occur. It annoys me how much effort I had to put into researching this bug. As a developer, if I had a file system product that users were paying for and that could cause silent corruption, I would find this highly disturbing and do what I could to protect other users. It is critical to remember this can result in corruption of the worst kind. Normal health monitoring tools would not detect it, because files can still be read and written. Yet it can corrupt files that are not being 'changed', in the background, at random rates. It wouldn't matter if you kept daily backups for 6 months: if you didn't detect this for 9 months, you would have archived the corruption into those backups and have no way of recovering that data. It can happen slowly, and literally only validating file contents against a known-good copy would reveal it. StableBit may feel this partly absolves them, since they don't cause the corruption directly. The data loss comes from some other application relying on the DrivePool drive to behave the way NTFS says it will, and the way DrivePool pretends to. The problem is that DrivePool's incorrect implementation is the direct reason this corruption occurs, and the applications that trigger it are not doing anything wrong.
  11. Like
    MitchC got a reaction from MrPapaya in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
    So this is correct, as the documentation you linked to states. One point I mentioned, though, is that even if an ID can be reused, software may wrongly assume it won't be if in practice it isn't. That is bad on the software's part, but it may be a practical expectation one should try to meet. Further, that documentation also states:
    "In the NTFS file system, a file keeps the same file ID until it is deleted. "
    As DrivePool identifies itself as NTFS, it is breaking that expectation.
    I am not sure how well things work if you just disable FileIDs; maybe software will fall back to safer behavior (even if less performant). In addition, I think the biggest issue is silent file corruption, and I think that can only happen due to FileID collisions (rather than just the FileID changing). It is a 128-bit number, the same size as a GUID. Just randomize the sucker the first time you assign a FileID (rather than using the current incremental behavior). Aside from being more thread safe, since you don't have a single locked increment counter, it makes a collision highly unlikely. Could you run into a duplicate? Sure. Likely? Probably not. Maybe over many reboots (or whatever else resets the IDs in DrivePool), but as long as the app using the FileID has detected the old file is gone before the ID is reused, an eventual collision would likely not have much effect. Not perfect, but probably an easier solution. Granted, apps like OneDrive may still think all the files are deleted and re-upload them if the FileIDs change (although that may be more likely due to the notification bug).
    Sure, except one doesn't always know how tools work. I am only making a highly educated guess that this is what OneDrive uses, and I only made it after significant file corruption and research. One would hope you don't need to suffer corruption before figuring out that the tool you use relies on FileIDs. In addition, FileIDs may not be the primary thing a backup/sync tool uses. Something like the USN journal may be a much more common first choice, with other options used only as a fallback when it is not available.
    Is it possible the 5-6 apps I have found that run into issues are the only ones out there that use these things? Sure. I would just guess I am not that lucky, so there are likely many more that use these features.
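The suggestion above can be sketched in Python. Each ID is drawn from a cryptographically secure random generator instead of a counter. The standard birthday-bound approximation P ≈ 1 - exp(-n(n-1) / 2^(b+1)) then estimates how unlikely a collision is among n IDs of b bits. `random_file_id` and `collision_probability` are hypothetical helpers, and the sketch ignores the separate question of persisting IDs across reboots.

```python
import math
import secrets

ID_BITS = 128


def random_file_id() -> int:
    """Draw a uniformly random 128-bit identifier from the OS CSPRNG (no shared counter to lock)."""
    return secrets.randbits(ID_BITS)


def collision_probability(n_ids: int, bits: int = ID_BITS) -> float:
    """Birthday-bound approximation of P(at least one collision) among n_ids random IDs."""
    exponent = n_ids * (n_ids - 1) / 2.0 ** (bits + 1)
    return -math.expm1(-exponent)  # 1 - e^-x, accurate even when x is tiny
```

For a billion files, `collision_probability(10**9)` comes out at about 1.5e-21 for 128-bit IDs, and about 2.7% with `bits=64`. By contrast, a small counter that restarts at 1 after a reset reuses former numbers with certainty.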
     
    I did see that you (or someone else) posted about the file hashing issue with read striping. It is a big shame. Reporting data corruption (invalid hash values, or rather returning the wrong read data, which is what would lead to them) is another fairly massive problem. Marking good data bad because of an inconsistent read can lead someone to think they lost data and trash it. Or they may restore an older version and lose newer data in an attempt to fix it. I would look into a more consistent read-striping repro test, but at the end of the day these other issues stop me from using DrivePool for most things I would like to.
  12. Like
    MitchC got a reaction from MrPapaya in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
    Sorry, I should also mention this is confirmed by StableBit and can be easily reproduced. The attached PowerShell script is a basic example of the file monitoring API. Run it as "watcher.ps1 my_folder", where my_folder is the folder you want to monitor. Put a file, say hello.txt, inside it and open that file in Notepad. It should instantly generate a file changed event. Tab away from Notepad and back again, and you will get another changed event for that file. Run the same test on a true NTFS volume and it will not do this.
    You can also reproduce the missing notifications for other events by changing the IncludeSubdirectories variable in the script and doing some of the tests I mention above.
    watcher.ps1
  13. Thanks
    MitchC got a reaction from MrPapaya in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
    To start: while I am new to DrivePool, I love its potential, and I own multiple licenses and their full suite. If you only use DrivePool for basic archiving of large files, with simple applications accessing them for periodic reads, you will probably rarely hit these bugs. This assumes you don't use any file synchronization/backup solutions. Further, I don't know how many thousands (tens or hundreds of thousands?) of DrivePool users there are, but clearly many are not hitting these bugs, or not recognizing that they are. So this is NOT some new destructive "my files are 100% going to die" issue. Some of the reports I have seen on the forums, though, may actually be caused by these issues without being recognized as such. As far as I know, CoveCube was not previously aware of these issues, so support tickets may not have even considered this possibility.
    I started reporting these bugs to StableBit ~9 months ago and told them ~1 month ago that I would be putting this post together. Please also see the disclaimer below, as some of this is based on observations rather than known facts.
    You are most likely to run into these bugs with:
    *) Applications that synchronize or back up files, including cloud-mounted drives like OneDrive or Dropbox
    *) Applications that must handle large quantities of files or monitor them for changes, like coding applications (Visual Studio / VSCode)

    Still, these bugs can cause silent file corruption, file misplacement, deleted files, performance degradation, data leakage (a file shared with someone externally could have its contents overwritten by any sensitive file on your computer), missed file changes, and potentially other issues for a small portion of users (I have had nearly all of these occur). They may also trigger some BSOD crashes; I had one such crash that is likely related. Because some of these bugs present so subtly, it may be hard to notice they are happening even if they are. In addition, these issues can occur even without file mirroring and with files pinned to a specific drive. I have some potential workarounds/suggestions at the bottom.
    More details are at the bottom, but here are the important bug facts up front:
    Windows has a native file change notification API using overlapped I/O calls. It allows an application to listen for changes on a folder, or on a folder and its subfolders, without constantly checking every file to see if it changed. StableBit triggers "file changed" notifications even when files are merely accessed (read) in certain ways. StableBit does NOT generate notification events on the parent folder when a file under it changes (Windows does). StableBit does NOT generate a notification event when only a FileID changes (the next bug covers FileIDs).
    Windows, like Linux, has a unique ID number for each file written on the drive. Hardlinks to the same file share the same unique ID (so one FileID may have multiple paths associated with it). In Linux this is called the inode number; Windows calls it the FileID. Rather than accessing a file by its path, you can open a file by its FileID. No two files on a volume share the same FileID at the same time, and the FileID persists across reboots. The API exposes it as a 128-bit number (the number of unique values is 39 digits long, on the order of an MD5 hash), although NTFS itself only uses 64 bits of it. A FileID does not change when a file is moved within the volume or modified. StableBit supports FileIDs by default, but they seem to be ephemeral: they do not seem to survive reboots or file moves. Keep in mind FileIDs are used for directories as well, not just files. Further, if a directory is moved/renamed, its own FileID changes, and so does the FileID of every file under it. I am not sure if there are other situations in which they change. In addition, if a descendant file/directory's FileID changes due to something like a directory rename, StableBit does NOT generate a notification event for that change (the application gets the directory event notification but nothing for the children).
    There are some other things to consider as well. DrivePool does not implement the standard Windows USN journal (a system for tracking file changes on a drive). It specifically reports that it doesn't support it, so applications shouldn't try to use it with a DrivePool drive. However, applications that traditionally don't use the file change notification API or FileIDs may fall back to a combination of the two to do what they would otherwise use the USN journal for (and this can exacerbate the problem). The same is true of Volume Shadow Copy (VSS). DrivePool reports that it cannot do VSS, so applications that would traditionally use it may resort to these methods instead.
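To see the FileID properties described above on an ordinary volume, the following Python sketch creates a hard link and renames a file. It then compares `st_dev`/`st_ino` (Python's view of the volume serial number and FileID on Windows, or device and inode on Linux). TEST_ROOT is a hypothetical setting. Pointing it at a pool folder is one way to compare behaviour, assuming the pool allows hard links at all, which the post does not cover.

```python
import os
import tempfile

# Hypothetical knob: a folder on the volume to test (None = the system temp directory).
TEST_ROOT = None

with tempfile.TemporaryDirectory(dir=TEST_ROOT) as d:
    original = os.path.join(d, "original.txt")
    link = os.path.join(d, "hardlink.txt")
    with open(original, "w") as f:
        f.write("same data, two names")
    os.link(original, link)  # a second directory entry for the same file

    a, b = os.stat(original), os.stat(link)
    same_file = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)  # one ID, two paths
    link_count = a.st_nlink

    renamed = os.path.join(d, "renamed.txt")
    os.rename(original, renamed)
    id_kept_after_rename = os.stat(renamed).st_ino == a.st_ino  # a rename should keep the ID

print(same_file, link_count, id_kept_after_rename)  # True 2 True on NTFS or ext4
```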

    Now the effects of the above bugs may not be completely apparent:
    For the overlapped I/O file change notifications: an application monitoring a DrivePool folder or subfolder will get erroneous notifications that files changed whenever anything merely accesses them. Just opening something like File Explorer on a folder, or even switching between applications, can cause file accesses that trigger the notification. If an application acts on a notification and then checks the file afterwards, that check may itself cause another notification. Applications that rely on getting a folder-changed notification when a child changes will not get these at all with DrivePool. An application that monitors only the folder, not its children, will therefore get no notification at all and could miss changes.
    For FileIDs, the effect depends on what the application uses them for. It may assume the FileID stays the same when a file moves. As that isn't true with DrivePool, the application may read, back up, or sync the entire file again when it is moved (perf issue). An application that uses the Windows API to open a file by its ID may get a different file than it expects. Or a file that was simply moved will throw an error when opened by its old FileID, because DrivePool has changed the ID. For example, let's say an application caches that the FileID for ImportantDoc1.docx is 12345, but after a restart 12345 refers to ImportantDoc2.docx. Suppose it is a file sync application and ImportantDoc1.docx is changed remotely. When it writes those remote changes to the local file using the OpenFileById method, it will actually overwrite ImportantDoc2.docx with those changes.
    I didn't spend the time to read the Windows file system requirements to learn when Windows expects a FileID to potentially change (or not change). It is important to note that even if changes/reuse are theoretically allowed, applications may just assume they don't happen if they are not commonplace (and on Windows FileIDs essentially never repeat). A backup or file sync program might assume a file with a specific FileID is always the same file. If FileID 12345 is c:\MyDocuments\ImportantDoc1.docx one day and c:\MyDocuments\ImportantDoc2.docx another, it may mistake document 2 for document 1, overwriting important data or restoring data to the wrong place. If it is creating a whole-drive backup, it may reach c:\MyDocuments\ImportantDoc2.docx and find it now has the FileID ImportantDoc1.docx had. It may then assume it has already backed up that file (by that point DrivePool would have given ImportantDoc1.docx a different FileID).
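A sync client that wants to avoid the ImportantDoc mix-up could refuse to act on a cached identity unless everything it remembered still agrees. The following is a hypothetical Python sketch: CachedEntry, remember and still_same_file are illustrative names, not OneDrive's or any real client's logic. It resolves the file by path rather than opening it by FileID. It only trusts the entry if the `st_dev`/`st_ino` pair and the content hash at that path both still match.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedEntry:
    """What a hypothetical sync client remembers about a file it has already synced."""
    path: str
    dev: int
    ino: int
    sha256: str


def _sha256(path: str) -> str:
    # Reads the whole file for brevity; a real tool would hash in chunks.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def remember(path: str) -> CachedEntry:
    st = os.stat(path)
    return CachedEntry(path, st.st_dev, st.st_ino, _sha256(path))


def still_same_file(entry: CachedEntry) -> bool:
    """True only if the path still resolves to the same file ID *and* the same content."""
    try:
        st = os.stat(entry.path)
    except FileNotFoundError:
        return False
    return (st.st_dev, st.st_ino) == (entry.dev, entry.ino) and _sha256(entry.path) == entry.sha256
```

On a mismatch, the cautious response is to fall back to a full rescan or raise a conflict rather than write. That costs performance, but it avoids writing into a file chosen by a stale ID.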

    Why might applications use FileIDs or file change notifications? It may not seem intuitive, but a few major reasons are:
    *) Performance. File change notifications are an event/push based system, so the application is told when something changes. The common alternative is a poll based system, where the application must scan all the files looking for changes (relying on file timestamps, or even hashing entire files, to determine this). That causes a good deal more overhead/slowdown.
    *) FileIDs are nice because they already handle hardlink de-duplication. Windows may have multiple hardlinks to one file on a drive for various reasons, but if you back up based on FileID, you back up that file once rather than multiple times. FileIDs are also great for handling renames. Let's say you are an application that syncs files, and the user backs up c:\temp\mydir with 1000 files under it. If they rename c:\temp\mydir to c:\temp\mydir2, an application using FileIDs can say: wait, that folder is the same, it was just renamed, so rename that folder in our remote version too. This is a very minimal operation on both ends. With DrivePool, however, the FileID changes for the directory and all files under it. If the sync application uses FileIDs to determine changes, it now uploads all these files again, using a good deal more resources locally and remotely. If the application also uses versioning, this may be far more likely to cause conflicts between two or more syncing clients, as massive numbers of files seemingly change.
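The rename case above can be sketched as a diff of two {path: FileID} snapshots keyed by ID. classify_changes is a hypothetical helper, not any particular sync product's algorithm. For brevity it assumes one path per ID (no hard links).

```python
from typing import Dict, List, Tuple


def classify_changes(old: Dict[str, int], new: Dict[str, int]) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Return (renames, deleted, added) between two {path: file_id} snapshots.

    An ID that reappears under a new path is a cheap rename; anything else must be
    handled as a delete plus a fresh upload.
    """
    old_by_id = {fid: path for path, fid in old.items()}
    new_by_id = {fid: path for path, fid in new.items()}
    renames = sorted(
        (old_by_id[fid], new_by_id[fid])
        for fid in old_by_id.keys() & new_by_id.keys()
        if old_by_id[fid] != new_by_id[fid]
    )
    deleted = sorted(old_by_id[fid] for fid in old_by_id.keys() - new_by_id.keys())
    added = sorted(new_by_id[fid] for fid in new_by_id.keys() - old_by_id.keys())
    return renames, deleted, added


before = {"mydir": 1, "mydir/a.jpg": 2, "mydir/b.jpg": 3}
stable = {"mydir2": 1, "mydir2/a.jpg": 2, "mydir2/b.jpg": 3}    # IDs kept, as on NTFS
reissued = {"mydir2": 7, "mydir2/a.jpg": 8, "mydir2/b.jpg": 9}  # IDs changed, as the post reports for a pool

print(classify_changes(before, stable)[0])    # three (old, new) rename pairs, nothing to upload
print(classify_changes(before, reissued)[2])  # all three paths look new and get uploaded again
```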
    Finally, even if an application is trying to monitor for FileIDs changing using the file change API, due to notification bugs above it may not get any notifications when child FileIDs change so it might assume it has not.

    Real Examples
    OneDrive
    This started with massive onedrive failures.  I would find onedrive was re-uploading hundreds of gigabytes of images an videos multiple times a week.  These were not changing or moving.  I don't know if the issue is onedrive uses FileIDs to determine if a file is already uploaded, or if it is because when it scanned a directory it may have triggered a notification that all the files in that directory changed and based on that notification it reuploads.  After this I noticed files were becoming deleted both locally and in the cloud.  I don't know what caused this, it might have been because the old file it thought was deleted as the FileID was gone and while there was a new file (actually the same file) in its place there may have been some odd race condition.   It is also possible that it queued the file for upload, the FileID changed and when it went to open it to upload it found it was 'deleted' as the FileID no longer pointed to a file and queued the delete operation.   I also found that files that were uploaded into the cloud in one folder were sometimes downloading to an alternate folder locally.  I am guessing this is because the folder FileID changed.  It thought the 2023 folder was with ID XYZ but that now pointed to a different folder and so it put the file in the wrong place.  The final form of corruption was finding the data from one photo or video actually in a file with a completely different name.  This is almost guaranteed to be due to the FileID bugs.  This is highly destructive as backups make this far harder to correct.  With one files contents replaced with another you need to know when the good content existed and in what files were effected.  Depending on retention policies the file contents that replaced it may override the good backups before you notice.  I also had a BSOD with onedrive where it was trying to set attributes on a file and the CoveFS driver corrupted some memory.  
It is possible this was a race condition as onedrive may have been doing hundreds of files very rapidly due to the bugs.  I have not captured a second BSOD due to it, but also stopped using onedrive on DrivePool due to the corruption.   Another example of this is data leakage.  Lets say you share your favorite article on kittens with a group of people.   Onedrive, believing that file has changed, goes to open it using the FileID however that file ID could essentially now correspond to any file on your computer now the contents of some sensitive file are put in the place of that kitten file, and everyone you shared it with can access it.
    Visual Studio Failures
    Visual studio is a code editor/compiler.  There are three distinct bugs that happen.  First, when compiling if you touched one file in a folder it seemed to recompile the entire folder, this due likely to the notification bug.  This is just a slow down, but an annoying one.  Second, Visual Studio has compiler generated code support.  This means the compiler will generate actual source code that lives next to your own source code.   Normally once compiled it doesn't regenerate and compile this source unless it must change but due to the notification bugs it regenerates this code constantly and if there is an error in other code it causes an error there causing several other invalid errors.  When debugging visual studio by default will only use symbols (debug location data) as the notifications from DrivePool happen on certain file accesses visual studio constantly thinks the source has changed since it was compiled and you will only be able to breakpoint inside source if you disable the exact symbol match default.  If you have multiple projects in a solution with one dependent on another it will often rebuild other project deps even when they haven't changed, for large solutions that can be crippling (perf issue).  Finally I often had intellisense errors showing up even though no errors during compiling, and worse intellisense would completely break at points.  All due to DrivePool.

    Technical details / full background & disclaimer
    I have sample code and logs to document these issues in greater detail if anyone wants to replicate it themselves.
    It is important for me to state drivepool is closed source and I don't have the technical details of how it works.  I also don't have the technical details on how applications like onedrive or visual studio work.  So some of these things may be guesses as to why the applications fail/etc.
    The facts stated are true (to the best of my knowledge) 

    Shortly before my trial expired in October of last year I discovered some odd behavior.  I had a technical ticket filed within a week and within a month had traced down at least one of the bugs.  The issue can be seen https://stablebit.com/Admin/IssueAnalysis/28720 , it does show priority 2/important which I would assume is the second highest (probably critical or similar above).  It is great it has priority but as we are over 6 months since filed without updates I figured warning others about the potential corruption was important.  

    The FileSystemWatcher API is implemented in windows using async overlapped IO the exact code can be seen: https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.Win32.cs#L32-L66
    That corresponds to this kernel api:
    https://learn.microsoft.com/en-us/windows/win32/fileio/synchronous-and-asynchronous-i-o
    Newer api calls use GetFileInformationByHandleEx to get the FileID but with older stats calls represented by nFileIndexHigh/nFileIndexLow.  

    In terms of the FileID bug I wouldn't normally have even thought about it but the advanced config (https://wiki.covecube.com/StableBit_DrivePool_2.x_Advanced_Settings) mentions this under CoveFs_OpenByFileId  "When enabled, the pool will keep track of every file ID that it gives out in pageable memory (memory that is saved to disk and loaded as necessary).".   Keeping track of files in memory is certainly very different from Windows so I thought this may be the source of issue.  I also don't know if there are caps on the maximum number of files it will track as if it resets FileIDs in situations other than reboots that could be much worse. Turning this off will atleast break nfs servers as it mentions it right in the docs "required by the NFS server".
    Finally, the FileID numbers given out by DrivePool are incremental and very low.  This means when they do reset you almost certainly will get collisions with former numbers.   What is not clear is if there is the chance of potential FileID corruption issues.  If when it is assigning these ids in a multi-threaded scenario with many different files at the same time could this system fail? I have seen no proof this happens, but when incremental ids are assigned like this for mass quantities of potential files it has a higher chance of occurring.
    Microsoft mentions this about deleting the USN Journal: "Deleting the change journal impacts the File Replication Service (FRS) and the Indexing Service, because it requires these services to perform a complete (and time-consuming) scan of the volume. This in turn negatively impacts FRS SYSVOL replication and replication between DFS link alternates while the volume is being rescanned.".  Now DrivePool never has the USN journal supported so it isn't exactly the same thing, but it is clear that several core Windows services do use it for normal operations I do not know what backups they use when it is unavailable. 

    Potential Fixes
    There are advanced settings for DrivePool (https://wiki.covecube.com/StableBit_DrivePool_2.x_Advanced_Settings); beware that changing them may break other things.
    CoveFs_OpenByFileId: set it to false (the default is true). This disables the OpenByFileID API, which several applications clearly use. However, while this setting disables that function, it does not disable FileIDs themselves, so any application that uses FileIDs as static identifiers for files may still run into problems.
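    For developers, one defensive pattern (my suggestion, not something Windows or DrivePool prescribes) is to treat a cached FileID as a hint only and, before acting on it, confirm that the file at the recorded path still has the same ID, size and modification time. A minimal Python sketch, with a hypothetical `CachedEntry` record standing in for whatever a sync tool stores:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedEntry:
    """What a hypothetical sync tool remembers about a file."""
    path: str
    file_id: int
    size: int
    mtime_ns: int


def snapshot(path):
    """Record the current file ID and metadata for path."""
    st = os.stat(path)
    return CachedEntry(path, st.st_ino, st.st_size, st.st_mtime_ns)


def still_same_file(entry):
    """True only if the file at entry.path still has the recorded ID, size and mtime.

    A mismatch means the cached ID must not be trusted: fall back to a
    path-based lookup or a rescan instead of opening or writing by the old ID.
    """
    try:
        current = snapshot(entry.path)
    except FileNotFoundError:
        return False
    return current == entry
```

    Comparing size and mtime as well as the ID also flags legitimate edits; that is deliberate, since the cost of an unnecessary rescan is far lower than writing data into the wrong file.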
    If possible, I would avoid using file backup/synchronization tools on DrivePool drives. These likely carry the highest risk of lost files, misplaced files, mixed-up file contents, and excess resource usage. If you can't avoid them, consider hashing every file in the DrivePool directory tree, then repeating this later and checking that files that shouldn't have changed still have the same hash.
    If you have files that rarely change after being created, hashing each file some time after creation and alerting if it disappears or its hash changes would act as an easy early warning that one of these bugs is being hit.
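    A minimal sketch of that early-warning check in Python: the first run hashes every file under a folder and saves the result as a JSON baseline, and later runs report files that have disappeared or whose SHA-256 changed. The function names and the baseline format are my own choices, not part of any existing tool. Keep the baseline file outside the folder being checked.

```python
import hashlib
import json
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large photos/videos don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def compare(baseline, current):
    """Return (missing, changed) relative paths, each sorted."""
    missing = sorted(p for p in baseline if p not in current)
    changed = sorted(p for p in baseline if p in current and baseline[p] != current[p])
    return missing, changed


def check_tree(root, manifest_file):
    """Compare root against a saved baseline, creating the baseline on the first run.

    Returns (missing, changed); both are empty on the first run.
    """
    current = build_manifest(root)
    if not os.path.exists(manifest_file):
        with open(manifest_file, "w", encoding="utf-8") as f:
            json.dump(current, f, indent=1)
        return [], []
    with open(manifest_file, encoding="utf-8") as f:
        baseline = json.load(f)
    return compare(baseline, current)
```

    For example, `missing, changed = check_tree(r"P:\Photos", r"C:\hashcheck\photos.json")` (hypothetical paths), run from a scheduled task. New files are ignored on purpose, since the goal is catching files that vanished or whose contents were silently replaced.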
  14. Like
    MitchC got a reaction from fjih in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
  15. Thanks
    MitchC got a reaction from Jonibhoni in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
    To start, while new to DrivePool I love its potential I own multiple licenses and their full suite.  If you only use drivepool for basic file archiving of large files with simple applications accessing them for periodic reads it is probably uncommon you would hit these bugs.  This assumes you don't use any file synchronization / backup solutions.  Further, I don't know how many thousands (tens or hundreds?) of DrivePool users there are, but clearly many are not hitting these bugs or recognizing they are hitting these bugs, so this IT NOT some new destructive my files are 100% going to die issue.  Some of the reports I have seen on the forums though may be actually issues due to these things without it being recognized as such. As far as I know previously CoveCube was not aware of these issues, so tickets may not have even considered this possibility.
    I started reporting these bugs to StableBit ~9 months ago, and informed I would be putting this post together ~1 month ago.  Please see the disclaimer below as well, as some of this is based on observations over known facts.
    You are most likely to run into these bugs with applications that: *) Synchronize or backup files, including cloud mounted drives like onedrive or dropbox *) Applications that must handle large quantities of files or monitor them for changes like coding applications (Visual Studio/ VSCode)

    Still, these bugs can cause silent file corruption, file misplacement, deleted files, performance degradation, data leakage ( a file shared with someone externally could have its contents overwritten by any sensitive file on your computer), missed file changes, and potential other issues for a small portion of users (I have had nearly all these things occur).  It may also trigger some BSOD crashes, I had one such crash that is likely related.  Due to the subtle nature some of these bugs can present with, it may be hard to notice they are happening even if they are.  In addition, these issues can occur even without file mirroring and files pinned to a specific drive.  I do have some potential workarounds/suggestions at the bottom.
    More details are at the bottom but the important bug facts upfront:
    Windows has a native file changed notification API using overlapped IO calls.  This allows an application to listen for changes on a folder, or a folder and sub folders, without having to constantly check every file to see if it changed.  Stablebit triggers "file changed" notifications even when files are just accessed (read) in certain ways.  Stablebit does NOT generate notification events on the parent folder when a file under it changes (Windows does).  Stablebit does NOT generate a notification event only when a FileID changes (next bug talks about FileIDs).  
    Windows, like linux, has a unique ID number for each file written on the hard drive.  If there are hardlinks to the same file, it has the same unique ID (so one File ID may have multiple paths associated with it). In linux this is called the inode number, Windows calls it the FileID.  Rather than accessing a file by its path, you can open a file by its FileID.  In addition it is impossible for two files to share the same FileID, it is a 128 bit number persistent across reboots (128 bits means the number of unique numbers represented is 39 digits long, or has the uniqueness of something like the MD5 hash).  A FileID does not change when a file moves or is modified.  Stablebit, by default, supports FileIDs however they seem to be ephemeral, they do not seem to survive across reboots or file moves.  Keep in mind FileIDs are used for directories as well, it is not just files. Further, if a directory is moved/renamed not only does its FileID change but every file under it changes. I am not sure if there are other situations in which they may change.  In addition, if a descendant file/directory FileID changes due to something like a directory rename Stablebit does NOT generate a notification event that it has changed (the application gets the directory event notification but nothing on the children).
    There are some other things to consider as well, DrivePool does not implement the standard windows USN Journal (a system of tracking file changes on a drive).  It specifically identifies itself as not supporting this so applications shouldn't be trying to use it with a drivepool drive. That does mean that applications that traditionally don't use the file change notification API or the FileIDs may fall back to a combination of those to accomplish what they would otherwise use the USN Journal for (and this can exacerbate the problem).  The same is true of Volume Shadow Copy (VSS) where applications that might traditionally use this cannot (and drivepool identifies it cannot do VSS) so may resort to methods below that they do not traditionally use.

    Now the effects of the above bugs may not be completely apparent:
    For the overlapped IO / File change notification  This means an application monitoring for changes on a DrivePool folder or sub-folder will get erroneous notifications files changed when anything even accesses them. Just opening something like file explorer on a folder, or even switching between applications can cause file accesses that trigger the notification. If an application takes actions on a notification and then checks the file at the end of the notification this in itself may cause another notification.  Applications that rely on getting a folder changed notification when a child changes will not get these at all with DrivePool.  If it isn't monitoring children at all just the folder, this means no notifications could be generated (vs just the child) so it could miss changes.
    For FileIDs It depends what the application uses the FileID for but it may assume the FileID should stay the same when a file moves, as it doesn't with DrivePool this might mean it reads or backs up, or syncs the entire file again if it is moved (perf issue).  An application that uses the Windows API to open a File by its ID may not get the file it is expecting or the file that was simply moved will throw an error when opened by its old FileID as drivepool has changed the ID.   For an example lets say an application caches that the FileID for ImportantDoc1.docx is 12345 but then 12345 refers to ImportantDoc2.docx due to a restart.  If this application is a file sync application and ImportantDoc1.docx is changed remotely when it goes to write those remote changes to the local file if it uses the OpenFileById method to do so it will actually override ImportantDoc2.docx with those changes.
    I didn't spend the time to read Windows file system requirements to know when Windows expects a FileID to potentially change (or not change).  It is important to note that even if theoretical changes/reuse are allowed if they are not common place (because windows uses essentially a number like an md5 hash in terms of repeats) applications may just assume it doesn't happen even if it is technically allowed to do so.  A backup of file sync program might assume that a file with specific FileID is always the same file, if FileID 12345 is c:\MyDocuments\ImportantDoc1.docx one day and then c:\MyDocuments\ImportantDoc2.docx another it may mistake document 2 for document 1, overriding important data or restore data to the wrong place.  If it is trying to create a whole drive backup it may assume it has already backed up c:\MyDocuments\ImportantDoc2.docx if it now has the same File ID as ImportantDoc1.docx by the time it reaches it (at which point DrivePool would have a different FileID for Document1).

    Why might applications use FileIDs or file change notifiers? It may not seem intuitive why applications would use these but a few major reasons are: *) Performance, file change notifiers are a event/push based system so the application is told when something changes, the common alternative is a poll based system where an application must scan all the files looking for changes (and may try to rely on file timestamps or even hashing the entire file to determine this) this causes a good bit more overhead / slowdown.  *)  FileID's are nice because they already handle hardlink file de-duplication (Windows may have multiple copies of a file on a drive for various reasons, but if you backup based on FileID you backup that file once rather than multiple times.  FileIDs are also great for handling renames.  Lets say you are an application that syncs files and the user backs up c:\temp\mydir with 1000 files under it.  If they rename c:\temp\mydir to c:\temp\mydir2 an application use FileIDS can say, wait that folder is the same it was just renamed. OK rename that folder in our remote version too.  This is a very minimal operation on both ends.  With DrivePool however the FileID changes for the directory and all sub-files.  If the sync application uses this to determine changes it now uploads all these files to the system using a good bit more resources locally and remotely.  If the application also uses versioning this may be far more likely to cause a conflict with two or more clients syncing, as mass amounts of files are seemingly being changed.
    Finally, even if an application is trying to monitor for FileIDs changing using the file change API, due to notification bugs above it may not get any notifications when child FileIDs change so it might assume it has not.

    Real Examples
    OneDrive
    This started with massive onedrive failures.  I would find onedrive was re-uploading hundreds of gigabytes of images an videos multiple times a week.  These were not changing or moving.  I don't know if the issue is onedrive uses FileIDs to determine if a file is already uploaded, or if it is because when it scanned a directory it may have triggered a notification that all the files in that directory changed and based on that notification it reuploads.  After this I noticed files were becoming deleted both locally and in the cloud.  I don't know what caused this, it might have been because the old file it thought was deleted as the FileID was gone and while there was a new file (actually the same file) in its place there may have been some odd race condition.   It is also possible that it queued the file for upload, the FileID changed and when it went to open it to upload it found it was 'deleted' as the FileID no longer pointed to a file and queued the delete operation.   I also found that files that were uploaded into the cloud in one folder were sometimes downloading to an alternate folder locally.  I am guessing this is because the folder FileID changed.  It thought the 2023 folder was with ID XYZ but that now pointed to a different folder and so it put the file in the wrong place.  The final form of corruption was finding the data from one photo or video actually in a file with a completely different name.  This is almost guaranteed to be due to the FileID bugs.  This is highly destructive as backups make this far harder to correct.  With one files contents replaced with another you need to know when the good content existed and in what files were effected.  Depending on retention policies the file contents that replaced it may override the good backups before you notice.  I also had a BSOD with onedrive where it was trying to set attributes on a file and the CoveFS driver corrupted some memory.  
It is possible this was a race condition as onedrive may have been doing hundreds of files very rapidly due to the bugs.  I have not captured a second BSOD due to it, but also stopped using onedrive on DrivePool due to the corruption.   Another example of this is data leakage.  Lets say you share your favorite article on kittens with a group of people.   Onedrive, believing that file has changed, goes to open it using the FileID however that file ID could essentially now correspond to any file on your computer now the contents of some sensitive file are put in the place of that kitten file, and everyone you shared it with can access it.
    Visual Studio Failures
    Visual studio is a code editor/compiler.  There are three distinct bugs that happen.  First, when compiling if you touched one file in a folder it seemed to recompile the entire folder, this due likely to the notification bug.  This is just a slow down, but an annoying one.  Second, Visual Studio has compiler generated code support.  This means the compiler will generate actual source code that lives next to your own source code.   Normally once compiled it doesn't regenerate and compile this source unless it must change but due to the notification bugs it regenerates this code constantly and if there is an error in other code it causes an error there causing several other invalid errors.  When debugging visual studio by default will only use symbols (debug location data) as the notifications from DrivePool happen on certain file accesses visual studio constantly thinks the source has changed since it was compiled and you will only be able to breakpoint inside source if you disable the exact symbol match default.  If you have multiple projects in a solution with one dependent on another it will often rebuild other project deps even when they haven't changed, for large solutions that can be crippling (perf issue).  Finally I often had intellisense errors showing up even though no errors during compiling, and worse intellisense would completely break at points.  All due to DrivePool.

    Technical details / full background & disclaimer
    I have sample code and logs documenting these issues in greater detail if anyone wants to replicate them.
    It is important to state that DrivePool is closed source and I don't have the technical details of how it works. I also don't have the technical details of how applications like OneDrive or Visual Studio work, so some of my explanations for why those applications fail are guesses.
    The facts stated are true to the best of my knowledge.

    Shortly before my trial expired in October of last year, I discovered some odd behavior. I had a technical ticket filed within a week, and within a month had traced down at least one of the bugs. The issue can be seen at https://stablebit.com/Admin/IssueAnalysis/28720; it shows priority 2/important, which I assume is the second highest (probably with critical or similar above it). It is great that it has priority, but as it has been over 6 months since it was filed without updates, I figured warning others about the potential corruption was important.

    The .NET FileSystemWatcher API is implemented on Windows using asynchronous overlapped I/O; the exact code can be seen here: https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.Win32.cs#L32-L66
    That code uses the Win32 ReadDirectoryChangesW function with overlapped (asynchronous) I/O, which is described here:
    https://learn.microsoft.com/en-us/windows/win32/fileio/synchronous-and-asynchronous-i-o
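    To make the notification side concrete: ReadDirectoryChangesW fills a buffer with FILE_NOTIFY_INFORMATION records (next-entry offset, action code, file name length in bytes, then the UTF-16 file name). The Python sketch below decodes such a buffer. Because the real call is Windows-only, it is exercised on a synthetic buffer produced by a hypothetical test helper, `build_notify_buffer`. In the Notepad repro described in this thread, the spurious events would arrive as action 3 (MODIFIED) records.

```python
import struct

# Action codes documented for the Win32 FILE_NOTIFY_INFORMATION structure.
FILE_ACTIONS = {
    1: "ADDED",
    2: "REMOVED",
    3: "MODIFIED",
    4: "RENAMED_OLD_NAME",
    5: "RENAMED_NEW_NAME",
}

# Record header: NextEntryOffset, Action, FileNameLength (bytes); UTF-16LE name follows.
_HEADER = struct.Struct("<III")


def parse_notify_buffer(buf):
    """Decode the records ReadDirectoryChangesW writes into its output buffer."""
    events = []
    if not buf:  # zero bytes returned means the buffer overflowed: rescan instead
        return events
    offset = 0
    while True:
        next_offset, action, name_len = _HEADER.unpack_from(buf, offset)
        start = offset + _HEADER.size
        name = bytes(buf[start:start + name_len]).decode("utf-16-le")
        events.append((FILE_ACTIONS.get(action, action), name))
        if next_offset == 0:  # last record in this batch
            return events
        offset += next_offset


def build_notify_buffer(entries):
    """Hypothetical test helper: build a buffer with 4-byte-aligned records."""
    out = bytearray()
    for i, (action, name) in enumerate(entries):
        raw = name.encode("utf-16-le")
        size = _HEADER.size + len(raw)
        padded = (size + 3) & ~3
        next_offset = padded if i < len(entries) - 1 else 0
        out += _HEADER.pack(next_offset, action, len(raw)) + raw
        out += b"\0" * (padded - size)
    return bytes(out)
```

    An application that reacts to every MODIFIED record it decodes this way, without re-checking the file, will act on each spurious event.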
    To get a file's FileID, newer code calls GetFileInformationByHandleEx (which can return a 128-bit ID), while the older GetFileInformationByHandle call returns it as the nFileIndexHigh/nFileIndexLow pair.
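    As a small illustration: the older call splits a 64-bit index into two 32-bit halves, and Python's `os.stat(...).st_ino` exposes the platform's file ID (typically the inode on Linux, the file index on Windows). The sketch below, with hypothetical helper names, checks whether that ID survives a rename. On native NTFS or a typical Linux file system it should; pointing it at a folder on a pool is one way to probe the move behavior described in this post. It only looks at what `os.stat` reports, not at the 128-bit value from GetFileInformationByHandleEx.

```python
import os


def file_index_from_parts(high, low):
    """Combine nFileIndexHigh/nFileIndexLow (32 bits each) into one 64-bit index."""
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)


def file_id(path):
    """Platform file ID as exposed by os.stat (inode on POSIX, file index on Windows)."""
    return os.stat(path).st_ino


def id_survives_rename(path, new_path):
    """Rename path -> new_path and report whether the file ID stayed the same."""
    before = file_id(path)
    os.rename(path, new_path)
    return file_id(new_path) == before
```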

    I wouldn't normally have thought about the FileID bug, but the advanced config page (https://wiki.covecube.com/StableBit_DrivePool_2.x_Advanced_Settings) says this under CoveFs_OpenByFileId: "When enabled, the pool will keep track of every file ID that it gives out in pageable memory (memory that is saved to disk and loaded as necessary)." Keeping track of file IDs in memory is very different from how Windows handles them, so I thought this might be the source of the issue. I also don't know whether there are caps on the maximum number of files it will track; if it resets FileIDs in situations other than reboots, that could be much worse. Turning this setting off will at least break NFS servers, as the docs state it is "required by the NFS server".
    Finally, the FileID numbers DrivePool gives out are incremental and very low. This means that when they do reset, you will almost certainly get collisions with former numbers. What is not clear is whether FileIDs can also be corrupted during assignment: if IDs are assigned in a multi-threaded scenario to many different files at the same time, could this system fail? I have seen no proof this happens, but assigning incremental IDs this way to huge numbers of files gives it a higher chance of occurring.
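    A toy model of why a reset counter collides. This is not DrivePool's code (which is closed source); the class is hypothetical and only assumes what is observed here: small incremental IDs held in a table that is lost on reboot. After the reset, the ID an application cached for one document resolves to the other one.

```python
class IncrementalIdTable:
    """Hypothetical toy model: IDs come from a counter; the table is lost on reset."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Models a reboot (or any other event) that discards the ID table.
        self._next_id = 1
        self._id_by_path = {}
        self._path_by_id = {}

    def id_for(self, path):
        if path not in self._id_by_path:
            self._id_by_path[path] = self._next_id
            self._path_by_id[self._next_id] = path
            self._next_id += 1
        return self._id_by_path[path]

    def open_by_id(self, file_id):
        return self._path_by_id.get(file_id)


table = IncrementalIdTable()
cached = table.id_for("ImportantDoc1.docx")  # an application caches this ID (1)
table.id_for("ImportantDoc2.docx")

table.reset()
table.id_for("ImportantDoc2.docx")  # enumerated first after the reset, so it gets 1
table.id_for("ImportantDoc1.docx")
print(table.open_by_id(cached))  # prints ImportantDoc2.docx
```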
    Microsoft says this about deleting the USN Journal: "Deleting the change journal impacts the File Replication Service (FRS) and the Indexing Service, because it requires these services to perform a complete (and time-consuming) scan of the volume. This in turn negatively impacts FRS SYSVOL replication and replication between DFS link alternates while the volume is being rescanned." DrivePool has never supported the USN journal, so it isn't exactly the same situation, but it is clear that several core Windows services use the journal for normal operation, and I do not know what fallback they use when it is unavailable.

    Potential Fixes
    There are advanced settings for DrivePool (https://wiki.covecube.com/StableBit_DrivePool_2.x_Advanced_Settings); beware, these changes may break other things.
    CoveFs_OpenByFileId: set it to false (it is true by default). This disables the OpenByFileID API, which several applications clearly use. However, while this setting disables that function, it does not disable FileIDs themselves, so any application using FileIDs as static identifiers for files may still run into problems.
    I would avoid using file backup/synchronization tools on DrivePool drives if possible. These likely have the highest chance of lost files, misplaced files, mixed-up file contents, and excess resource usage. If you can't avoid them, consider taking file hashes for the entire DrivePool directory tree, doing this again at a later point, and making sure files that shouldn't have changed still have the same hash.
    If you have files that rarely change after being created, hashing each file some time after creation and alerting if that file disappears or its hash changes is an easy early warning that one of these bugs has been hit.
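    A minimal sketch of that hashing check in Python, using only the standard library. The function names are illustrative, and it assumes the manifest is saved somewhere outside the pool. Any file that shows up as missing or changed but should be static is worth investigating; files you legitimately edited will of course show up too.

```python
import hashlib
import json
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large media files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, forward slashes) to its hash."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hash_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(old, new):
    """Files that disappeared, changed content, or appeared since the old run."""
    missing = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    added = sorted(new.keys() - old.keys())
    return missing, changed, added


def save_manifest(manifest, path):
    Path(path).write_text(json.dumps(manifest, indent=2, sort_keys=True))


def load_manifest(path):
    return json.loads(Path(path).read_text())
```

    For example: run `build_manifest` over the pool root once files have settled, store it with `save_manifest`, and on a later run pass the loaded manifest and a fresh one to `compare_manifests`.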
  16. Like
    MitchC got a reaction from Shane in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
    Sorry, I should also mention this is confirmed by StableBit and can be easily reproduced. The attached PowerShell script is a basic example of the file monitoring API. Run it as "monitor.ps1 my_folder", where my_folder is the folder you want to monitor. Put a file, say hello.txt, inside it and open that file in Notepad. It should instantly generate a file change event. Then tab away from Notepad and back to it; you will get another changed event for that file. Run the same test on a true NTFS volume and it will not do this.
    You can also reproduce the missing notifications for other events by changing the script's IncludeSubdirectories variable and doing some of the tests I mention above.
    watcher.ps1
  17. Like
    MitchC got a reaction from Shane in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
    So this is correct, as the documentation you linked to states. One point I mentioned, though, is that even if an ID can be reused, if in practice it isn't, software may wrongly assume it never will be. That is a flaw in that software, but it may be a practical expectation worth meeting. Further, that documentation also states:
    "In the NTFS file system, a file keeps the same file ID until it is deleted. "
    As DrivePool identifies itself as NTFS it is breaking that expectation.
    I am not sure how well things work if you just disable FileIDs; maybe software will fall back to safer (if less performant) behavior. In addition, I think the biggest issue is silent file corruption, and I think that can only happen due to FileID collisions (rather than just the FileID changing). The FileID is a 128-bit number, the same size as a GUID. Just randomize the sucker the first time you assign a FileID, rather than using the current incremental behavior. Aside from being more thread-safe, since you don't need a single locked increment counter, it makes a collision highly unlikely. Could you run into a duplicate? Sure. Likely? Probably not. Maybe over many reboots (or whatever else resets the IDs in DrivePool), but as long as any app that uses the FileID has detected that the old file is gone before its ID is reused, an eventual collision would likely not have much effect. Not perfect, but probably an easier solution. Granted, apps like OneDrive may still think all the files were deleted and re-upload them if the FileIDs change (although that may be more likely due to the notification bug).
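    To put numbers on the randomization idea: the sketch below draws a random 128-bit ID per file and uses the standard birthday approximation to estimate the chance that any two of n IDs collide. The allocator is hypothetical, and the estimate assumes uniformly random IDs; it says nothing about an ID being reused after the original file is gone. For contrast, a counter that restarts at 1 reuses ID 1 on the very first allocation after a reset.

```python
import math
import secrets

ID_BITS = 128


def new_random_file_id():
    """Hypothetical allocator: a fresh random 128-bit ID, no shared counter or lock."""
    return secrets.randbits(ID_BITS)


def collision_probability(n, bits=ID_BITS):
    """Birthday approximation: chance that any two of n random IDs are equal."""
    # p ~= 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps precision for tiny values
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


print(len(str(2 ** ID_BITS)))          # 39 digits
print(collision_probability(10 ** 9))  # a billion files: roughly 1.5e-21
```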
    Sure, except one doesn't always know how tools work. I am only making a highly educated guess that this is what OneDrive uses, and only made it after significant file corruption and research. One would hope you don't need corruption before figuring out that the tool you are using relies on the FileID. In addition, the FileID may not be the primary mechanism a backup/sync tool uses; something like the USN Journal may be a much more common first choice, with other options used only as a fallback when it is not available.
    Is it possible the 5-6 apps I have found that run into issues are the only ones out there that use these things? Sure. I would just guess I am not that lucky, so there are likely many more that use these features.
     
    I did see either you (or someone else) post about the file hashing issue with read striping. It is a big shame; reporting data corruption (invalid hash values, or rather returning the wrong read data, which is what would lead to that) is another fairly massive problem. Marking good data bad because of an inconsistent read can lead someone to think they lost data and trash it, or to restore an older version and lose newer data in an attempt to fix it. I would look into a more consistent read-striping repro test, but at the end of the day these other issues stop me from using DrivePool for most of the things I would like to.
  18. Thanks
    MitchC got a reaction from Pafegori in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
    To start: while I am new to DrivePool, I love its potential, and I own multiple licenses and their full suite. If you only use DrivePool for basic archiving of large files, with simple applications accessing them for periodic reads, it is probably uncommon to hit these bugs. This assumes you don't use any file synchronization/backup solutions. Further, I don't know how many thousands (tens or hundreds of thousands?) of DrivePool users there are, but clearly many are not hitting these bugs, or not recognizing that they are, so this IS NOT some new "my files are 100% going to die" issue. Some of the reports I have seen on the forums, though, may actually be due to these issues without being recognized as such. As far as I know, Covecube was not previously aware of these issues, so support tickets may not have even considered this possibility.
    I started reporting these bugs to StableBit ~9 months ago, and informed them ~1 month ago that I would be putting this post together. Please see the disclaimer below as well, as some of this is based on observations rather than known facts.
    You are most likely to run into these bugs with: *) applications that synchronize or back up files, including cloud-mounted drives like OneDrive or Dropbox; *) applications that must handle large quantities of files or monitor them for changes, like coding applications (Visual Studio/VSCode).

    Still, these bugs can cause silent file corruption, file misplacement, deleted files, performance degradation, data leakage (a file shared with someone externally could have its contents overwritten by any sensitive file on your computer), missed file changes, and potentially other issues for a small portion of users (I have had nearly all of these occur). They may also trigger some BSOD crashes; I had one such crash that is likely related. Due to the subtle way some of these bugs present, it may be hard to notice they are happening even when they are. In addition, these issues can occur even without file mirroring, and even with files pinned to a specific drive. I have some potential workarounds/suggestions at the bottom.
    More details are at the bottom but the important bug facts upfront:
    Windows has a native file change notification API using overlapped I/O calls. It lets an application listen for changes on a folder, or on a folder and its subfolders, without constantly checking every file to see whether it changed. StableBit triggers "file changed" notifications even when files are merely accessed (read) in certain ways. StableBit does NOT generate notification events on the parent folder when a file under it changes (Windows does). StableBit does NOT generate a notification event when only a FileID changes (the next bug covers FileIDs).
    Windows, like Linux, has a unique ID number for each file written to the drive. If there are hard links to the same file, they share the same unique ID (so one FileID may have multiple paths associated with it). In Linux this is called the inode number; Windows calls it the FileID. Rather than accessing a file by its path, you can open a file by its FileID. In addition, two files on the same volume cannot share the same FileID; it is a 128-bit number persistent across reboots (128 bits means the number of unique values is 39 digits long, giving it the uniqueness of something like an MD5 hash). A FileID does not change when a file is moved or modified. StableBit supports FileIDs by default, but they seem to be ephemeral: they do not seem to survive reboots or file moves. Keep in mind that directories have FileIDs too, not just files. Further, if a directory is moved/renamed, not only does its FileID change but so does that of every file under it. I am not sure whether there are other situations in which they change. In addition, if a descendant file/directory FileID changes due to something like a directory rename, StableBit does NOT generate a notification event that it has changed (the application gets the directory event notification but nothing for the children).
    There are some other things to consider as well. DrivePool does not implement the standard Windows USN Journal (a system for tracking file changes on a drive). It specifically identifies itself as not supporting it, so applications shouldn't try to use it with a DrivePool drive. That does mean that applications that traditionally don't use the file change notification API or FileIDs may fall back to a combination of those to accomplish what they would otherwise use the USN Journal for (and this can exacerbate the problem). The same is true of Volume Shadow Copy (VSS): applications that would traditionally use it cannot (DrivePool identifies that it cannot do VSS), so they may resort to the methods described here that they do not traditionally use.

    Now the effects of the above bugs may not be completely apparent:
    For the overlapped I/O file change notifications: an application monitoring for changes on a DrivePool folder or subfolder will get erroneous "file changed" notifications when anything merely accesses the files. Just opening something like File Explorer on a folder, or even switching between applications, can cause file accesses that trigger the notification. If an application takes action on a notification and then checks the file at the end, that check may itself cause another notification. Applications that rely on getting a folder-changed notification when a child changes will not get these at all with DrivePool. If an application monitors only the folder and not its children, no notification is generated at all when a child changes, so it can miss changes.
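    One defensive pattern an application could apply to spurious "changed" events, sketched in Python: remember each file's size and modification time and ignore events where neither changed. This is a hypothetical helper, not how any particular application works. It will miss an edit that preserves both size and mtime, and the first event for an untracked file always counts as a change unless the table is seeded by an initial scan. If checking the file triggers yet another notification, as described above, the unchanged signature lets the filter discard that one too.

```python
import os


class ChangeFilter:
    """Hypothetical filter: drop 'modified' events whose size and mtime did not change."""

    def __init__(self):
        self._last_seen = {}  # path -> (size, mtime in nanoseconds)

    def is_real_change(self, path):
        try:
            st = os.stat(path)
        except FileNotFoundError:
            # Report a deletion once, and only for files we were tracking.
            return self._last_seen.pop(path, None) is not None
        signature = (st.st_size, st.st_mtime_ns)
        if self._last_seen.get(path) == signature:
            return False  # spurious event, e.g. one triggered by a plain read
        self._last_seen[path] = signature
        return True
```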
    For FileIDs, it depends on what the application uses the FileID for, but it may assume the FileID stays the same when a file moves. As it doesn't with DrivePool, the application might re-read, back up, or sync the entire file again when it is moved (perf issue). An application that uses the Windows API to open a file by its ID may not get the file it expects, or a file that was simply moved will throw an error when opened by its old FileID, because DrivePool has changed the ID. For example, say an application caches that the FileID for ImportantDoc1.docx is 12345, but after a restart 12345 refers to ImportantDoc2.docx. If this application is a file sync application and ImportantDoc1.docx is changed remotely, when it writes those remote changes to the local file using the OpenFileById method, it will actually overwrite ImportantDoc2.docx with those changes.
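    The ImportantDoc example goes wrong because the write is routed through a cached ID alone. A sketch of a fail-safe alternative, assuming an application that also stores the path: treat the path as authoritative and the cached (volume, ID) pair only as a cross-check, so a mismatch means "rescan" rather than "write here". The helper name is hypothetical. On a pool whose IDs change across reboots this check would fail often and force slower rescans, but the failure mode becomes a rescan instead of overwriting the wrong file.

```python
import os


def resolve_cached(path, cached_dev, cached_ino):
    """Return path only if it still exists and still reports the cached file ID.

    Hypothetical pattern: the path is authoritative, the cached ID is a cross-check.
    None means "unknown, rescan" rather than "write to whatever the ID points at".
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    if (st.st_dev, st.st_ino) != (cached_dev, cached_ino):
        return None
    return path
```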
    I didn't spend the time to read the Windows file system requirements to know when Windows expects a FileID to potentially change (or not change). It is important to note that even if changes/reuse are theoretically allowed, if they are not commonplace (because Windows uses essentially a number like an MD5 hash in terms of repeats), applications may just assume it doesn't happen. A backup or file sync program might assume that a file with a specific FileID is always the same file: if FileID 12345 is c:\MyDocuments\ImportantDoc1.docx one day and c:\MyDocuments\ImportantDoc2.docx another, it may mistake document 2 for document 1, overwriting important data or restoring data to the wrong place. If it is creating a whole-drive backup, it may assume it has already backed up c:\MyDocuments\ImportantDoc2.docx if, by the time it reaches it, that file has the same FileID ImportantDoc1.docx had (at which point DrivePool would have a different FileID for document 1).

    Why might applications use FileIDs or file change notifications? It may not seem intuitive, but a few major reasons are: *) Performance: file change notifications are an event/push-based system, so the application is told when something changes. The common alternative is a poll-based system where the application must scan all the files looking for changes (possibly relying on file timestamps or even hashing entire files to determine this), which causes a good bit more overhead/slowdown. *) FileIDs are nice because they already handle hardlink de-duplication (Windows may have multiple copies of a file on a drive for various reasons, but if you back up based on FileID, you back up that file once rather than multiple times). FileIDs are also great for handling renames. Let's say you are an application that syncs files, and the user backs up c:\temp\mydir with 1000 files under it. If they rename c:\temp\mydir to c:\temp\mydir2, an application using FileIDs can say: wait, that folder is the same, it was just renamed, so rename that folder in our remote version too. This is a very minimal operation on both ends. With DrivePool, however, the FileID changes for the directory and all files under it. If the sync application uses this to determine changes, it now uploads all these files again, using a good bit more resources locally and remotely. If the application also uses versioning, this may be far more likely to cause a conflict with two or more clients syncing, as massive numbers of files are seemingly being changed.
    Finally, even if an application tries to monitor for FileIDs changing using the file change API, due to the notification bugs above it may not get any notifications when child FileIDs change, so it might assume they have not.
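    The rename scenario can be modeled with two snapshots of {file ID: path}, compared the way an ID-based sync tool might. The diff below is a hypothetical simplification (real tools differ) and the IDs are made up. With stable IDs the folder rename is 1000 renames; if every ID changes, the same rename looks like 1000 deletions plus 1000 new files to upload.

```python
def diff_snapshots(old, new):
    """Compare two {file_id: path} snapshots the way an ID-based sync tool might."""
    renamed = {fid: (old[fid], new[fid]) for fid in old.keys() & new.keys() if old[fid] != new[fid]}
    deleted = {fid: old[fid] for fid in old.keys() - new.keys()}
    added = {fid: new[fid] for fid in new.keys() - old.keys()}
    return renamed, deleted, added


# 1000 files under mydir, keyed by made-up file IDs 1..1000.
before = {i: f"C:/temp/mydir/file{i}.jpg" for i in range(1, 1001)}

# Stable IDs: renaming mydir -> mydir2 keeps every ID.
stable = {i: p.replace("/mydir/", "/mydir2/") for i, p in before.items()}

# Unstable IDs: same rename, but every ID changes (modeled as an offset of 1000).
unstable = {i + 1000: p.replace("/mydir/", "/mydir2/") for i, p in before.items()}

for label, after in (("stable", stable), ("unstable", unstable)):
    renamed, deleted, added = diff_snapshots(before, after)
    print(label, len(renamed), len(deleted), len(added))
# stable 1000 0 0
# unstable 0 1000 1000
```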

    Real Examples
    OneDrive
    This started with massive OneDrive failures. I would find OneDrive re-uploading hundreds of gigabytes of images and videos multiple times a week. These were not changing or moving. I don't know whether the issue is that OneDrive uses FileIDs to determine whether a file is already uploaded, or that scanning a directory triggered a notification that all the files in that directory changed, and OneDrive re-uploaded based on that notification. After this, I noticed files being deleted both locally and in the cloud. I don't know what caused this; it might be that OneDrive thought the old file was deleted because its FileID was gone, and while there was a new file (actually the same file) in its place, there may have been some odd race condition. It is also possible that it queued the file for upload, the FileID changed, and when it went to open the file for upload it found it 'deleted', because the FileID no longer pointed to a file, and queued the delete operation. I also found that files uploaded into the cloud in one folder were sometimes downloaded to a different folder locally. I am guessing this is because the folder FileID changed: OneDrive thought the 2023 folder had ID XYZ, but that ID now pointed to a different folder, so it put the file in the wrong place. The final form of corruption was finding the data from one photo or video in a file with a completely different name. This is almost certainly due to the FileID bugs. It is highly destructive, because backups make it far harder to correct: with one file's contents replaced by another's, you need to know when the good content existed and which files were affected. Depending on retention policies, the replacement contents may overwrite the good backups before you notice. I also had a BSOD with OneDrive where it was trying to set attributes on a file and the CoveFS driver corrupted some memory.
  19. Thanks
    MitchC got a reaction from Shane in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
    To start, while new to DrivePool I love its potential I own multiple licenses and their full suite.  If you only use drivepool for basic file archiving of large files with simple applications accessing them for periodic reads it is probably uncommon you would hit these bugs.  This assumes you don't use any file synchronization / backup solutions.  Further, I don't know how many thousands (tens or hundreds?) of DrivePool users there are, but clearly many are not hitting these bugs or recognizing they are hitting these bugs, so this IT NOT some new destructive my files are 100% going to die issue.  Some of the reports I have seen on the forums though may be actually issues due to these things without it being recognized as such. As far as I know previously CoveCube was not aware of these issues, so tickets may not have even considered this possibility.
    I started reporting these bugs to StableBit ~9 months ago, and informed I would be putting this post together ~1 month ago.  Please see the disclaimer below as well, as some of this is based on observations over known facts.
    You are most likely to run into these bugs with applications that: *) Synchronize or backup files, including cloud mounted drives like onedrive or dropbox *) Applications that must handle large quantities of files or monitor them for changes like coding applications (Visual Studio/ VSCode)

    Still, these bugs can cause silent file corruption, file misplacement, deleted files, performance degradation, data leakage ( a file shared with someone externally could have its contents overwritten by any sensitive file on your computer), missed file changes, and potential other issues for a small portion of users (I have had nearly all these things occur).  It may also trigger some BSOD crashes, I had one such crash that is likely related.  Due to the subtle nature some of these bugs can present with, it may be hard to notice they are happening even if they are.  In addition, these issues can occur even without file mirroring and files pinned to a specific drive.  I do have some potential workarounds/suggestions at the bottom.
    More details are at the bottom but the important bug facts upfront:
    Windows has a native file changed notification API using overlapped IO calls.  This allows an application to listen for changes on a folder, or a folder and sub folders, without having to constantly check every file to see if it changed.  Stablebit triggers "file changed" notifications even when files are just accessed (read) in certain ways.  Stablebit does NOT generate notification events on the parent folder when a file under it changes (Windows does).  Stablebit does NOT generate a notification event only when a FileID changes (next bug talks about FileIDs).  
    Windows, like linux, has a unique ID number for each file written on the hard drive.  If there are hardlinks to the same file, it has the same unique ID (so one File ID may have multiple paths associated with it). In linux this is called the inode number, Windows calls it the FileID.  Rather than accessing a file by its path, you can open a file by its FileID.  In addition it is impossible for two files to share the same FileID, it is a 128 bit number persistent across reboots (128 bits means the number of unique numbers represented is 39 digits long, or has the uniqueness of something like the MD5 hash).  A FileID does not change when a file moves or is modified.  Stablebit, by default, supports FileIDs however they seem to be ephemeral, they do not seem to survive across reboots or file moves.  Keep in mind FileIDs are used for directories as well, it is not just files. Further, if a directory is moved/renamed not only does its FileID change but every file under it changes. I am not sure if there are other situations in which they may change.  In addition, if a descendant file/directory FileID changes due to something like a directory rename Stablebit does NOT generate a notification event that it has changed (the application gets the directory event notification but nothing on the children).
    There are some other things to consider as well, DrivePool does not implement the standard windows USN Journal (a system of tracking file changes on a drive).  It specifically identifies itself as not supporting this so applications shouldn't be trying to use it with a drivepool drive. That does mean that applications that traditionally don't use the file change notification API or the FileIDs may fall back to a combination of those to accomplish what they would otherwise use the USN Journal for (and this can exacerbate the problem).  The same is true of Volume Shadow Copy (VSS) where applications that might traditionally use this cannot (and drivepool identifies it cannot do VSS) so may resort to methods below that they do not traditionally use.

    Now the effects of the above bugs may not be completely apparent:
    For the overlapped IO / file change notification bug: an application monitoring for changes on a DrivePool folder or sub-folder will get erroneous notifications that files changed when anything merely accesses them. Just opening something like File Explorer on a folder, or even switching between applications, can cause file accesses that trigger the notification. If an application takes action on a notification and then checks the file at the end, that check may itself cause another notification.  Applications that rely on getting a folder-changed notification when a child changes will not get these at all with DrivePool.  If an application monitors only the folder and not its children, no notifications are generated at all, so it could miss changes.
    For FileIDs, it depends what the application uses the FileID for, but it may assume the FileID stays the same when a file moves. As it doesn't with DrivePool, the application might re-read, back up, or sync the entire file again when it is moved (perf issue).  An application that uses the Windows API to open a file by its ID may not get the file it is expecting, or a file that was simply moved will throw an error when opened by its old FileID because DrivePool has changed the ID.   For example, let's say an application caches that the FileID for ImportantDoc1.docx is 12345, but after a restart 12345 refers to ImportantDoc2.docx.  If this application is a file sync application and ImportantDoc1.docx is changed remotely, when it writes those remote changes to the local file using OpenFileById, it will actually overwrite ImportantDoc2.docx with those changes.
    I didn't dig into the Windows file system requirements in detail. Microsoft's documentation for BY_HANDLE_FILE_INFORMATION does say file IDs are not guaranteed to be unique over time, because file systems are free to reuse them, although on NTFS a file keeps the same ID until it is deleted.  It is important to note that even if changes/reuse are theoretically allowed, if they are not commonplace, applications may just assume they don't happen.  A backup or file sync program might assume that a file with a specific FileID is always the same file: if FileID 12345 is c:\MyDocuments\ImportantDoc1.docx one day and c:\MyDocuments\ImportantDoc2.docx another, it may mistake document 2 for document 1, overwriting important data or restoring data to the wrong place.  If it is creating a whole-drive backup, it may assume it has already backed up c:\MyDocuments\ImportantDoc2.docx if, by the time it reaches it, that file has the FileID ImportantDoc1.docx had earlier (by which point DrivePool would have given Document1 a different FileID).
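    One defensive pattern, sketched in Python under the assumption that a read-triggered event leaves the file's size and modification time untouched: treat a change notification only as a hint, and act on it only if the metadata actually differs from the last known values. The function names are hypothetical, and the watcher itself is left out since it is platform-specific.

```python
import os
import tempfile

def snapshot(path):
    """Cheap metadata fingerprint: size and modification time in nanoseconds."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)

def is_real_change(path, last_known):
    """Return True only if metadata differs from the last known fingerprint.

    A notification raised by a plain read should leave size and mtime alone,
    so it is filtered out. Content that changes without changing size or
    mtime would be missed; hashing is the stricter (slower) option.
    """
    return snapshot(path) != last_known

with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "photo.jpg")
    with open(path, "wb") as f:
        f.write(b"original bytes")
    known = snapshot(path)

    # Simulate a spurious "changed" event caused by merely reading the file.
    with open(path, "rb") as f:
        f.read()
    after_read = is_real_change(path, known)

    # A real write that changes the size is detected.
    with open(path, "ab") as f:
        f.write(b" edited")
    after_write = is_real_change(path, known)
```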
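    A sync tool could guard against stale IDs by confirming, before writing, that the cached path still resolves to the cached ID. This is a hypothetical portable sketch (still_same_file is an illustrative name, not an existing API); on Windows, an application that opens by ID could do the reverse check by comparing the path returned by GetFinalPathNameByHandleW with the path it cached. It only catches an ID/path mismatch that is visible at write time.

```python
import os
import tempfile

def identity(path):
    st = os.stat(path)
    return (st.st_dev, st.st_ino)

def still_same_file(path, cached_identity):
    """Hypothetical guard: does path still resolve to the cached ID?

    Returns False if the file is gone or a different file now sits at that
    path; a sync tool should then rescan instead of writing.
    """
    try:
        return identity(path) == cached_identity
    except FileNotFoundError:
        return False

with tempfile.TemporaryDirectory() as root:
    doc1 = os.path.join(root, "ImportantDoc1.docx")
    doc2 = os.path.join(root, "ImportantDoc2.docx")
    for p in (doc1, doc2):
        with open(p, "w") as f:
            f.write(os.path.basename(p))

    cached = identity(doc1)
    ok_before = still_same_file(doc1, cached)

    # Put a different file at doc1's path; the cached ID no longer matches.
    os.replace(doc2, doc1)
    ok_after = still_same_file(doc1, cached)

    os.remove(doc1)
    gone = still_same_file(doc1, cached)
```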

    Why might applications use FileIDs or file change notifiers? It may not seem intuitive, but a few major reasons are:
    *) Performance: file change notifiers are an event/push-based system, so the application is told when something changes. The common alternative is a poll-based system where an application must scan all the files looking for changes (relying on file timestamps or even hashing entire files to detect them), which causes a good bit more overhead / slowdown.
    *) FileIDs already handle hardlink de-duplication: Windows may have the same file on a drive under multiple paths for various reasons, but if you back up based on FileID you back up that file once rather than multiple times.  FileIDs are also great for handling renames.  Let's say you are an application that syncs files and the user syncs c:\temp\mydir with 1000 files under it.  If they rename c:\temp\mydir to c:\temp\mydir2, an application using FileIDs can say: wait, that folder is the same, it was just renamed, so rename that folder in our remote version too.  This is a very minimal operation on both ends.  With DrivePool, however, the FileID changes for the directory and all files under it.  If the sync application uses FileIDs to determine changes, it now uploads all these files again, using a good bit more resources locally and remotely.  If the application also uses versioning, this is far more likely to cause conflicts between two or more syncing clients, as mass amounts of files seemingly change.
    Finally, even if an application tries to detect FileID changes using the file change API, due to the notification bugs above it may not get any notification when child FileIDs change, so it might assume they have not.
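    The rename case can be sketched in Python by scanning a tree into a map keyed by (st_dev, st_ino) and diffing two scans by ID instead of by path. This is a simplified illustration, not how OneDrive or any specific sync tool works: it tracks files only, and since hardlinks share an ID each hardlinked file is counted once. On a volume with stable IDs, renaming the folder shows up as three renames; if IDs were regenerated on rename, the same files would instead show up as removed and re-added, which is the mass re-upload pattern described above.

```python
import os
import tempfile

def scan(root):
    """Map (st_dev, st_ino) -> relative path for every file under root.

    Hardlinks share an ID, so each file appears once (last path seen wins).
    """
    ids = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            ids[(st.st_dev, st.st_ino)] = os.path.relpath(full, root)
    return ids

def diff(old, new):
    """Classify changes between two scans by ID rather than by path."""
    renamed = {old[i]: new[i] for i in old.keys() & new.keys() if old[i] != new[i]}
    added = sorted(new[i] for i in new.keys() - old.keys())
    removed = sorted(old[i] for i in old.keys() - new.keys())
    return renamed, added, removed

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "mydir"))
    for n in range(3):
        with open(os.path.join(root, "mydir", f"file{n}.txt"), "w") as f:
            f.write(str(n))

    before = scan(root)
    os.rename(os.path.join(root, "mydir"), os.path.join(root, "mydir2"))
    renamed, added, removed = diff(before, scan(root))
```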

    Real Examples
    OneDrive
    This started with massive OneDrive failures.  I would find OneDrive re-uploading hundreds of gigabytes of images and videos multiple times a week.  These were not changing or moving.  I don't know whether the issue is that OneDrive uses FileIDs to determine if a file is already uploaded, or that scanning a directory triggered a notification that all the files in it changed and OneDrive re-uploaded based on that notification.  After this I noticed files were being deleted both locally and in the cloud.  I don't know what caused this. It might be that OneDrive thought the old file was deleted because its FileID was gone, and while there was a new file (actually the same file) in its place, there may have been some odd race condition.   It is also possible that it queued the file for upload, the FileID changed, and when it went to open the file to upload it, it found it 'deleted' because the FileID no longer pointed to a file, and queued the delete operation.   I also found that files uploaded into one folder in the cloud were sometimes downloaded to a different folder locally.  I am guessing this is because the folder FileID changed: it thought the 2023 folder had ID XYZ, but that ID now pointed to a different folder, so it put the file in the wrong place.  The final form of corruption was finding the data from one photo or video inside a file with a completely different name.  This is almost certainly due to the FileID bugs.  It is highly destructive, and even with backups it is hard to correct: with one file's contents replaced by another's, you need to know which files were affected and when the good content existed.  Depending on retention policies, the replaced contents may overwrite the good backups before you notice.  I also had a BSOD with OneDrive where it was trying to set attributes on a file and the CoveFS driver corrupted some memory.  It is possible this was a race condition, as OneDrive may have been processing hundreds of files very rapidly due to the bugs.  I have not captured a second BSOD from it, but I also stopped using OneDrive on DrivePool due to the corruption.
    Another example of this is data leakage.  Let's say you share your favorite article on kittens with a group of people.   OneDrive, believing that file has changed, goes to open it using its FileID; however, that FileID could now correspond to essentially any file on your computer. Now the contents of some sensitive file are put in place of the kitten file, and everyone you shared it with can access them.
    Visual Studio Failures
    Visual Studio is a code editor/compiler.  There are several distinct problems.  First, when compiling, touching one file in a folder seemed to recompile the entire folder, likely due to the notification bug.  This is just a slowdown, but an annoying one.  Second, Visual Studio supports compiler-generated code: the compiler generates actual source code that lives next to your own source code.   Normally, once compiled, it doesn't regenerate and recompile this source unless it must change, but due to the notification bugs it regenerates this code constantly, and if there is an error elsewhere in the code, the generated code fails too, producing several other spurious errors.  When debugging, Visual Studio by default only uses symbols (debug location data) that exactly match the source; because DrivePool raises change notifications on certain file accesses, Visual Studio constantly thinks the source has changed since it was compiled, and you will only be able to set breakpoints in the source if you disable the exact symbol match default.  If you have multiple projects in a solution, with one dependent on another, it will often rebuild dependent projects even when they haven't changed; for large solutions that can be crippling (perf issue).  Finally, I often had IntelliSense errors showing up even though there were no errors during compiling, and worse, IntelliSense would completely break at points.  All due to DrivePool.

    Technical details / full background & disclaimer
    I have sample code and logs documenting these issues in greater detail if anyone wants to replicate them.
    It is important for me to state that DrivePool is closed source and I don't have the technical details of how it works.  I also don't have the technical details of how applications like OneDrive or Visual Studio work, so some of these things may be guesses as to why the applications fail.
    The facts stated are true (to the best of my knowledge) 

    Shortly before my trial expired in October of last year, I discovered some odd behavior.  I had a technical ticket filed within a week, and within a month had traced down at least one of the bugs.  The issue can be seen at https://stablebit.com/Admin/IssueAnalysis/28720 ; it shows priority 2/important, which I assume is the second highest (probably with critical or similar above it).  It is great that it has priority, but as it has been over 6 months since it was filed without updates, I figured warning others about the potential corruption was important.

    The .NET FileSystemWatcher API is implemented on Windows on top of ReadDirectoryChangesW with asynchronous overlapped IO; the exact code can be seen here: https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.Win32.cs#L32-L66
    That relies on the Win32 asynchronous (overlapped) I/O model described here:
    https://learn.microsoft.com/en-us/windows/win32/fileio/synchronous-and-asynchronous-i-o
    Newer API calls use GetFileInformationByHandleEx to get the FileID, while the older GetFileInformationByHandle call returns it split across nFileIndexHigh/nFileIndexLow.
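    For reference, the older API's two halves combine into one 64-bit value, and on NTFS that value is, as far as I know, the file reference number: the low 48 bits are the MFT record number and the high 16 bits are a sequence number incremented when the record is reused (the MFT_SEGMENT_REFERENCE layout). A small Python sketch of the arithmetic, with made-up input values:

```python
def file_index(high, low):
    """Combine BY_HANDLE_FILE_INFORMATION's nFileIndexHigh/nFileIndexLow
    (two 32-bit halves) into one 64-bit file ID."""
    return (high << 32) | low

def split_ntfs_reference(file_id):
    """Split an NTFS 64-bit file reference into (MFT record number, sequence).

    Low 48 bits: MFT record number (reused after deletion).
    High 16 bits: sequence number, bumped when the record is reused,
    which is why a full NTFS ID is not quickly handed to another file.
    """
    return file_id & ((1 << 48) - 1), file_id >> 48

fid = file_index(0x00050000, 0x00001234)
record, sequence = split_ntfs_reference(fid)
print(hex(fid), hex(record), sequence)
```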

    I wouldn't normally have even thought about the FileID bug, but the advanced config (https://wiki.covecube.com/StableBit_DrivePool_2.x_Advanced_Settings) mentions this under CoveFs_OpenByFileId: "When enabled, the pool will keep track of every file ID that it gives out in pageable memory (memory that is saved to disk and loaded as necessary).".   Keeping track of file IDs in memory is certainly very different from NTFS, where the ID comes from the file's on-disk record, so I thought this may be the source of the issue.  I also don't know if there is a cap on the maximum number of files it will track; if it resets FileIDs in situations other than reboots, that could be much worse. Turning this off will at least break NFS servers, as the docs state right there that it is "required by the NFS server".
    Finally, the FileID numbers given out by DrivePool are incremental and very low.  This means that when they reset, you will almost certainly get collisions with former numbers.   What is not clear is whether there is a chance of FileID corruption. If it assigns these IDs in a multi-threaded scenario with many different files at the same time, could this system fail? I have seen no proof this happens, but when incremental IDs are assigned like this for massive numbers of files, the chance of such a bug is higher.
    Microsoft says this about deleting the USN Journal: "Deleting the change journal impacts the File Replication Service (FRS) and the Indexing Service, because it requires these services to perform a complete (and time-consuming) scan of the volume. This in turn negatively impacts FRS SYSVOL replication and replication between DFS link alternates while the volume is being rescanned.".  DrivePool has never supported the USN Journal, so it isn't exactly the same thing, but it is clear that several core Windows services use it for normal operations, and I do not know what fallbacks they use when it is unavailable.
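    To illustrate why small incremental IDs that reset are dangerous, here is a toy Python model. ToyPool is entirely hypothetical and is not DrivePool's implementation; it only assumes what is described above (IDs handed out incrementally on first access and forgotten on restart). After the restart, ImportantDoc2.docx receives ID 1, which the client's cache still maps to ImportantDoc1.docx.

```python
import itertools

class ToyPool:
    """Hypothetical ID allocator: small incremental IDs assigned on first
    access and forgotten on restart. Not DrivePool's actual code."""

    def __init__(self):
        self.restart()

    def restart(self):
        self._next = itertools.count(1)
        self._ids = {}

    def file_id(self, path):
        if path not in self._ids:
            self._ids[path] = next(self._next)
        return self._ids[path]

pool = ToyPool()

# A sync client caches ID -> path during the first session.
cache = {pool.file_id(p): p for p in ("ImportantDoc1.docx", "ImportantDoc2.docx")}

pool.restart()

# After the restart the files happen to be accessed in a different order.
new_id_doc2 = pool.file_id("ImportantDoc2.docx")
stale_target = cache[new_id_doc2]  # the client believes this ID is Doc1
print(new_id_doc2, stale_target)
```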

    Potential Fixes
    There are advanced settings for DrivePool at https://wiki.covecube.com/StableBit_DrivePool_2.x_Advanced_Settings ; beware, these changes may break other things.
    CoveFs_OpenByFileId - Set to false (by default it is true).  This disables the OpenByFileID API, which several applications clearly use.  In addition, while this setting disables that function, it doesn't disable FileIDs themselves, so any application using FileIDs as static identifiers for files may still run into problems.
    I would avoid using file backup/synchronization tools on DrivePool drives (if possible).  These likely have the highest chance of lost files, misplaced files, mixed-up file contents, and excess resource usage.   If you can't avoid them, consider taking file hashes for the entire DrivePool directory tree, then do this again at a later point and make sure files that shouldn't have changed still have the same hash.
    If you have files that rarely change after being created, hashing each file at some point after creation and alerting if that file disappears or its hash changes would act as an early warning that one of these bugs is being hit.
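    The hashing approach above can be sketched with Python's standard library: build a manifest of SHA-256 hashes, store it outside the pool, and later report files that are missing or whose contents changed. The file names and manifest location are illustrative; a real check would point root at the DrivePool folder and keep the manifest on a different drive. New files are ignored here, since only files that shouldn't change are being watched.

```python
import hashlib
import json
import os
import tempfile

def hash_file(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large videos don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map relative path -> SHA-256 for every file under root."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = hash_file(full)
    return manifest

def verify(root, manifest):
    """Return (missing, changed) paths compared with an earlier manifest."""
    current = build_manifest(root)
    missing = sorted(p for p in manifest if p not in current)
    changed = sorted(p for p in manifest if p in current and current[p] != manifest[p])
    return missing, changed

with tempfile.TemporaryDirectory() as root, tempfile.TemporaryDirectory() as state:
    for name, data in (("a.jpg", b"kitten"), ("b.jpg", b"holiday")):
        with open(os.path.join(root, name), "wb") as f:
            f.write(data)
    manifest_path = os.path.join(state, "manifest.json")
    with open(manifest_path, "w") as f:
        json.dump(build_manifest(root), f)

    # Later: one file's contents are swapped for another's, one disappears.
    with open(os.path.join(root, "a.jpg"), "wb") as f:
        f.write(b"holiday")
    os.remove(os.path.join(root, "b.jpg"))

    with open(manifest_path) as f:
        missing, changed = verify(root, json.load(f))
```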
  20. Like
    MitchC got a reaction from TMnet in Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11   
    To start: while new to DrivePool, I love its potential; I own multiple licenses and their full suite.  If you only use DrivePool for basic archiving of large files, with simple applications accessing them for periodic reads, it is probably uncommon to hit these bugs.  This assumes you don't use any file synchronization / backup solutions.  Further, I don't know how many thousands (tens or hundreds of thousands?) of DrivePool users there are, but clearly many are not hitting these bugs, or not recognizing that they are, so this is NOT some new "my files are 100% going to die" issue.  Some of the reports I have seen on the forums, though, may actually be due to these bugs without being recognized as such. As far as I know, CoveCube was previously not aware of these issues, so support tickets may not have even considered this possibility.
    I started reporting these bugs to StableBit ~9 months ago, and informed them ~1 month ago that I would be putting this post together.  Please see the disclaimer below as well, as some of this is based on observations rather than known facts.
    You are most likely to run into these bugs with:
    *) Applications that synchronize or back up files, including cloud-mounted drives like OneDrive or Dropbox
    *) Applications that must handle large quantities of files or monitor them for changes, like coding tools (Visual Studio / VSCode)

    Still, these bugs can cause silent file corruption, file misplacement, deleted files, performance degradation, data leakage (a file shared with someone externally could have its contents overwritten by any sensitive file on your computer), missed file changes, and potentially other issues for a small portion of users (I have had nearly all of these occur).  They may also trigger BSOD crashes; I had one such crash that is likely related.  Due to the subtle way some of these bugs present, it may be hard to notice they are happening even when they are.  In addition, these issues can occur even without file mirroring and with files pinned to a specific drive.  I do have some potential workarounds/suggestions at the bottom.
    More details are at the bottom, but here are the important bug facts up front:
    Windows has a native file change notification API using overlapped IO calls.  This allows an application to listen for changes on a folder, or a folder and its subfolders, without having to constantly check every file to see if it changed.  StableBit triggers "file changed" notifications even when files are just accessed (read) in certain ways.  StableBit does NOT generate notification events on the parent folder when a file under it changes (Windows does).  StableBit does NOT generate a notification event when only a FileID changes (the next bug is about FileIDs).