
MitchC
MitchC last won the day on October 22 2024
MitchC had the most liked content!
Except I don't think you would see a delete and replace, it would be a modify. An application that triggers the drivepool bug on the array would open a file handle by its ID/Inode number, only problem is the file it thinks it is opening is not the one it actually opens. It just writes the contents it thinks goes with that file into that handle though. I would assume this could potentially be possible with modified files too, although depending on what you use drivepool for one might normally have a good number of modified files. First, I don't think there is any contest about if the FileID bug is true, even the developer has acknowledged it but mostly said wont fix. As for catching onedrive, im not sure. I ran drivepool for months and it wasn't like overnight just all files changed to other files (and most of the files being corrupted were not modified for quite some time). Maybe it is something like onedrive notices a file change and adds it to a sync index by file ID, if the machine reboots before it actually syncs it it then reads it by ID and gets the wrong contents. Yeah for most projects I kept the source on main drives but I sadly work on some massive hogs from time to time so those went on drivepool. It is nice having all the chrome source and debug binaries around but its 100G per build so;0. I assume you mean check every scrub right? Sync updates the parity info on new/changed data so would not want to be part of some check process as if you did detect this happening sync destroys that ability to recover. Part of my concern would be if a triggering FileID program on the drivepool starts replacing existing file content with alternate file content but the timestamp for modified time doesn't change can snapraid potentially miss this change. As any data that snapraid hasn't calculated the parity yet on is that much data you can't recover that you think you can. 
If you ran a Sync on Sunday and then by Friday you have modified/deleted 5GB of files and this FileID bug replaced the content on another 10GB of files then you have 15 GB less than your parity size that you can recover. Now if you are recovering that 10GB from this bug no biggy, but if you have If your scrub checks new data and 10% of old data it would catch this if one of these files was part of that 10%. If you scrub once a week thought it would be over 2 months before you are guaranteed to catch these silent modifications. Also, Office and OneDrive have a very specific integration so if you are signed into a personal or work microsoft account in an office program and have onedrive / sharepoint(onedrive for business) enabled (which MS makes it decently hard to avoid) you have cooperative syncing that may also be vulnerable to this issue. The best way I would see to detect this would be to hash all files and store the date of the last time they were hashed. At a later date you would hash all files again, if the hash changes but the modified date was before the last time the files were hashed something is probably wrong. This isn't foolproof (for example a sync program that syncs an older file down after a hashing and sets the modified time to before the hashing). None of these things would also help to detect "corrupted" backups. If you backup to the cloud maybe it wouldn't self sabotage by overriding files on your PC with files it think changed on that same PC but it could end up backing up the wrong contents to your remote backup space. Here you would see the same file names on your PC in the backup location but the backup location would have alternate content. One day your PC breaks/is lost/etc you go to restore from backup and then discover the issue.
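The hash-and-compare check described above could be sketched roughly like this. It is a minimal illustration and not a finished tool: the function names are made up for this example, the manifest only lives in memory, and it has the same blind spot already mentioned (a sync program that legitimately writes an older file and back-dates its modified time would also be flagged).

```python
import hashlib
import os
import time


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map every file path under root to (content hash, time it was hashed)."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            manifest[path] = (hash_file(path), time.time())
    return manifest


def suspicious_changes(manifest):
    """Paths whose content changed although their modified time is older
    than the moment they were last hashed."""
    flagged = []
    for path, (old_hash, hashed_at) in manifest.items():
        if not os.path.isfile(path):
            continue  # deleted files need a separate check
        if hash_file(path) != old_hash and os.path.getmtime(path) < hashed_at:
            flagged.append(path)
    return flagged
```

Take a `snapshot()` at one point, keep it somewhere outside the pool, and run `suspicious_changes()` against it later. A file that was really edited gets a newer modified time and is ignored. A file whose content was swapped while the timestamp stayed old (or was set back) is reported.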
-
@CharredChar I would agree if Snapraid is used on the underlying drives it should avoid triggering corruption due to the FileID bug. Keep in mind that Snapraid does not like data that is changed/moved around so I would make sure to minimize any drivepool balancing. It also will throw 'errors' if you have writes going on during its scrub due to balancing. Keep in mind as well that it is mainly meant to detect corruption due to a bad disk/hardware issue it does not guard well against intentionally overwritten files. I believe (and as not a snapraid user maybe my understanding is wrong) you normally run a scrub shortly after running a sync as any modifications between the two would show warnings during a scrub. If a FileID issue happens it really would just look like a file was overwritten so while it is recoverable if you notice, as soon as you run a sync it would just update the parity like anything else and there wouldn't be any warning about an issue. There is also some possibility of snapraid not being able to properly detect modified files effected by the FileID bug. It normally uses the inode (which wouldn't change) and last modified timestamp for detecting changes. A synchronization/backup/cloud share program may intentionally set a last modified timestamp on a file which normally wouldn't be an issue, even if you were adding old files to the system. With old files as they wouldn't have existed prior snapraid would pickup that they were 'new' as not seen before. With the FileID bug thought it gets more complicated. Say you have Image X.jpg and it has a timestamp of 2020-01-01 and then you have ImportantWork.docx and it has a timestamp of 2024-05-05. With the File ID bug a program may override ImportantWork.docx with the content of X.jpg and set the timestamp to 2020-01-01. Now if snapraid detects changes if the timestamps simply don't match then no problem it will pickup this override (for better or for worse) without an issue. 
If, however, it looks for things modified since the last sync time, it could potentially miss that change, as the FileID didn't change and the timestamp is older than the last sync. I don't know enough to say about that part of snapraid. Here is a Q/A that talks about changes being missed with non-changing timestamps:
Why are VeraCrypt containers never saved?
VeraCrypt (a fork of TrueCrypt) by default has enabled the option Preserve modification time-stamp of file containers that makes impossible at SnapRAID, and at other backup programs, to detect that a file container is changed. Ensure to disable this option in VeraCrypt.
Snapraid doesn't really have any point-in-time snapshotting. It does very well at detecting drive errors or recovering from a lost drive, but any data lost to the FileID bug will be lost forever as soon as you run sync after it happens. This also means that if you don't detect this issue right away, it happening slowly over time will still lead to you losing everything except what changed between that last sync and when you detect it. My concern was if you point snapraid at any drivepools directly. I am not a snapraid user/developer, nor one for drivepool, so I didn't spend any time to actually try and repro corruption with it, but anything that touches the file ID for anything other than an in-memory (or live comparison) constant could be at risk.
Re: @Shane's question about what is affected otherwise
Again, even if you are quite technical and have very obvious failures, it may be a while before you are able to confirm the problem is drivepool. Even as the OP here, and having found the technical bugs with drivepool, some app failures took me months to attribute and confirm. With these bugs outstanding I have temporarily suspended my use of DrivePool, so I no longer investigate or track app failures related to drivepool.
I would guess most tools that may have a few dozen or more files open at once are affected. The biggest ones not already on the list would be Visual Studio Code and Visual Studio:
*) Severe performance impact on these IDEs. They use file changed notifications to determine if source should be recompiled, which can result in much longer build times.
*) Debug symbol match failures: as they get notifications that the symbols/files have changed, they never believe the symbols are an exact match (this can be worked around by allowing non-matching symbol use).
*) Complicated build failures if compiler generated code is used.
*) For Visual Studio, IntelliSense will show errors when there are none, and will completely break at times
-
roirraWedorehT reacted to an answer to a question: Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11
-
roirraWedorehT reacted to an answer to a question: Google Drive Backup and Sync
-
Shane reacted to an answer to a question: Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11
-
Jonibhoni reacted to an answer to a question: Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11
-
There is a good chance the google drive problems could be related to this:
-
Mostly. As I think you mentioned earlier in this thread that doesn't disable FileIds and applications could still get the FileID of a file. Depending how that ID is used it could still cause issues. An example below is snapraid which doesn't use OpenByFileID but does trust that the same FileID is the same file. For the biggest problems (data loss, corruption, leakage) this is correct. Of course, one generally can't know if an application is using FileIDs (especially if not open source) it is likely not mentioned in the documentation. It also doesn't mean your favorite app may not start to do so tomorrow, and then all the sudden the application that worked perfectly for 4 years starts to silently corrupt random data. By far the most likely apps to do this are backup apps, data sync apps, cloud storage apps, file sharing apps, things that have some reason to potentially try to track what files are created/moved/deleted/etc. The other issue (and sure if I could go back in time I would split this thread in two) of the change notification bugs in DrivePool won't directly lead to data loss (although can greatly speed up the process above) . It will, however, have the potential for odd errors and performance issues in a wide range of applications. The file change API is used by many applications, not just the app types listed above (which often will use it if they run 24/7) but any app that interfaces with many files at once (IE coding IDE's/compilers, file explorers, music or video catalogs, etc). This API is common, easy to use for developers, and generally can greatly increase performance of apps as they no longer need to manually check if every file they can just install one event listener on a parent directory and even if they only care about the notifications for some of the files in the directories under it they can just ignore the change events they don't care about. 
It may be very hard to trace these performance issues or errors to DrivePool due to how they may present themselves. You are far more likely to think the application is buggy or at fault.
Short Example of Disaster
As it is a complex issue to understand, I will give a short example of how FileIDs being reused can be devastating. Let's say you use Google Drive or some other cloud backup / sharing application, and it relies on the fact that as long as FileID 123 is around it is always pointing to the same file. This is all but guaranteed with NTFS. You only use Google Drive to back up your photos from your phone, from your work camera, or what have you. You have the following layout on your computer:
c:\camera\work\2021\OfficialWiringDiagram.png with file ID 1005
c:\camera\personal\nudes\2024Collection\VeryTasteful.png with file ID 3909
c:\work\govt\ClassifiedSatPhotoNotToPostOnTwitter.png with file ID 6050
You have OfficialWiringDiagram.png shared with the office as it's an important reference anytime someone tries to figure out where the network cables are going.
Enter DrivePool. You don't change any of these files, but DrivePool generates a file changed notification for OfficialWiringDiagram.png. Google Drive says: OK, I know that file, I already have it backed up and it has file ID 1005. It then opens file ID 1005 locally, reads the new contents, and uploads them to the cloud, overriding the old OfficialWiringDiagram.png. Only problem is you rebooted, so 1005 was OfficialWiringDiagram.png before, but now file 1005 is actually your nude file VeryTasteful.png. So it has just backed up your nude file into the cloud but as "OfficialWiringDiagram.png", and remember that file is shared with the office. Next time someone goes to look at the office wiring diagram they are in for a surprise.
Depending on the application if 'ClassifiedSatPhotoNotToPostOnTwitter.png' became FileID 1005 even though it got a change notification for the path "c:\camera\work\2021\OfficialWiringDiagram.png" which is under the main folder it monitors ("c:\camera") when it opens File 1005 it instead now gets a file completely outside your camera folder and reads the highly sensitive file from c:\work\govt and now a file that should never be uploaded is shared to the entire office. Now you follow many best practices. Google drive you restrict to only the c:\camera folder, it doesn't backup or access files anywhere else. You have a Raid 6 SSD setup incase of drive failure, and image files from prior years are never changed, so once written to the drive they are not likely to move unless the drive was de-fragmented meaning pretty low chance of conflicts or some abrupt power failure causing it to be corrupted. You even have some photo scanner that checks for corrupt photos just to be safe. Except none of these things will save you from the above example. Even if you kept 6 months of backup archives offsite in cold storage (made perfectly and not effected by the bug) and all deleted files are kept for 5 years, if you don't reference OfficialWiringDiagram.png but once a year you might not notice it was changed and the original data overwritten until after all your backups are corrupted with the nude and the original file might be lost forever. FileIDs are generally better than relying on file paths, if they used file paths when you renamed or moved file 123 to a new name in the same folder it would break anyone you previously had shared the file with if only file names are used. If instead when you rename "BobsChristmasPhoto.png" to "BobsHolidayPhoto.png" the application knows it is the file being renamed as it still has File ID 123 then it can silently update on the backend the sharing data so when people click the existing link it still loads the photo. 
Even if an application uses moderate de-duplication techniques like hashing the file to tell if it has just moved, if you move a file and slightly change it (say you clear the photo location metadata out that your phone put there) it would think it is an all new file without File IDs. FileID collisions are not just possible but basically guaranteed with drive pool. With the change notification bug a sync application might think all your files are changing often as even reading the file or browsing the directory might trigger a notification it has changed. This means it is backing up all those files again, which might be tens of thousands of photos. As any time you reboot the File ID changes that means if it syncs that file after the reboot uploading the wrong contents (as it used File ID) and then you had a second computer it downloaded that file to you could put yourself in a never ending loop for backups and downloads that overrides one file with another file at random. As the FileID it was known last time for might not exist when it goes to back it up (which I assume would trigger many applications to fall back to path validation) only part of your catalog would get corrupted each iteration. The application might also validate that if the file is renamed it stayed within the root directory it interacts with. This means if your christmas photo's file ID now pointed to something under "c:\windows" it would fall back to file paths as it knows that is not under the "c:\camera" directory it works with. This is not some hypothetical situation these are actual occurrences and behaviors I have seen happen to files I have hosted on drivepool. These are not two-bit applications written by some one person dev team these are massively used first party applications, and commercial enterprise applications. If you can and you care about your data I would. 
The convenience of drivepool is great, there are countless users it works fine for (at least as far as they know), but even with high technical understanding it can be quite difficult to detect what applications are effected by this. If you thought you were safe because you use something like snapraid it won't stop this sort of corruption. As far as snapraid is concerned you just deleted a file and renamed another on top of it. Snapraid may even contribute further to the problem as it (like many) uses the windows FileID as the Windows equivalent of an inode number https://github.com/amadvance/snapraid/blob/e6b8c4c8a066b184b4fa7e4fdf631c2dee5f5542/cmdline/mingw.c#L512-L518 . Applications assume inodes and FileIDs that are the same as before are the same file. That is unless you use DrivePool, oops. Apps might use timestamps in addition to FileIDs although timestamps can overlap say if you downloaded a zip archive and extracted it with Windows native (by design choice it ignores timestamps even if the zip contained them). SnapRAID can even use some advanced checks with syncing but in a worst case where a files content has actually changed but the FileID in question has the same size/timestamp SnapRAID assumes it is actually unmodified and leaves the parity data alone. This means if you had two files with the same size/timestamp anywhere on the drive and one of them got the FileID of the other it would end up with incorrect parity data associated with that file. Running a snapraid fix could actually result in corruption as snapraid would believe the parity data is correct but the content on disk it thinks go with it does not. Note: I don't use snapraid but was asked this question and reading the manual here and the source above I believe this is technically correct. It is great SnapRAID is open source and has such technical documentation plenty of backup / sync programs don't and you don't know what checking they do.
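To make the size/timestamp worst case concrete, here is a toy sketch of a change detector that keys files by FileID and treats "same ID, same size, same timestamp" as unmodified. This is a deliberately simplified assumption for illustration and NOT SnapRAID's actual code. The record type, the file names, and the ID values are all made up.

```python
from collections import namedtuple

# Hypothetical record of what a change detector remembers per file.
FileRecord = namedtuple("FileRecord", "file_id size mtime")


def looks_unchanged(previous, current):
    """Metadata-only check: same ID, size and timestamp means 'not modified'."""
    return previous == current


# State at the last sync: two different photos with equal size and timestamp.
before = {
    "a.png": FileRecord(file_id=1005, size=4096, mtime=1577836800),
    "b.png": FileRecord(file_id=3909, size=4096, mtime=1577836800),
}

# After a reboot the pool hands out IDs again and the two files swap IDs.
after = {
    "a.png": FileRecord(file_id=3909, size=4096, mtime=1577836800),
    "b.png": FileRecord(file_id=1005, size=4096, mtime=1577836800),
}

# A detector keyed by file ID finds "the same file" behind both IDs,
# so neither file is re-read even though each ID now has other content.
by_id_before = {record.file_id: record for record in before.values()}
missed = [path for path, record in after.items()
          if looks_unchanged(by_id_before[record.file_id], record)]
```

Both paths end up in `missed`: from the metadata alone there is nothing to tell the detector that the content behind ID 1005 is now a different file. Only reading and comparing the actual content would catch it.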
-
Shane reacted to an answer to a question: Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11
-
Sorry, but even in my mission-not-important environment I am not a fan of data loss or leakage. Also, extremely low is an understatement. NTFS supports 2^32 possible files on a drive. The MFT file index is actually a 48 bit entry, which means you could max out your new MFT records 65K times prior to it needing to loop around. The sequence number (how many times that specific MFT record is reused) is an additional 16 bits on its own, so even if you could delete and realloc a file to the exact same MFT record you would still need to do so with that specific record 65K times. If an application is monitoring for file changes, hopefully it catches one of those :)
It is nearly impossible to know how an application may use FileID, especially as it may only be used as a fallback due to other features DrivePool does not implement, and maybe they combine FileID with something else. Say an application knows file 1234 and on startup it checks file 1234. If that file exists, it can be near positive it's the same file. If it is gone, it simply removes file 1234 from its known files, and by the time 1234 is reused it hasn't known about it in forever.
The problem here is not necessarily when FileIDs change. I'd wager most applications could probably handle file IDs changing even though the file has not, and be fine (you might get extra transfer, or extra data backed up, or temporary performance loss). It is the FileID reuse that leads to the worst effects: data loss, data leakage, and corruption. The file ID is 64 bits, the max file space is 32 bits (and realistically most people probably have a good bit fewer than 4 billion files). DrivePool could randomly assign file IDs willy-nilly every boot and probably cause far fewer disasters. DrivePool could likely use the underlying FileIDs through some black magic hackery. The MFT counter is 48 bit, but I doubt those last 9 bits are touched on most normal systems.
If DrivePool assigned an incremental number to each drive and then overwrote those 9 bits of the FileID from the underlying system with the drive ID, you would support 512 hard drives in one drive 'pool' and still have nearly the same near-zero chance of FileID collisions, while also having a stable file ID. It would only change the FileID if a file moved in the background from one drive to another (and not just mirrored). It could even keep it the same with a zero-byte ID file left behind in a ghost folder if so desired, but the file ID changing is probably far less of a problem. A backup restore program that deleted the old file and created it again would also change the FileID, and I doubt that causes issues.
That said, it is not really my job to figure out how to solve this problem in a commercial product. As you mentioned back in December, it is unquestionable that DrivePool is doing the wrong thing: it uses MUST in caps. My problem isn't that this bug exists (although that sucks). My problem is that this has been and continues to be handled exceptionally poorly by StableBit, even though it can pose significant risk to users without them even knowing it. I have likely spent more of my time investigating their bug than they have. We are literally looking at nearly two years now since my initial notification, and users can make the same mistakes now as back then despite the fact they could be warned or prevented from doing so.
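A rough sketch of the bit packing idea, to show the arithmetic works out. It assumes the usual NTFS layout of a 64 bit file reference (low 48 bits MFT index, high 16 bits sequence number) and assumes the top 9 bits of the index are unused on the underlying disk. The function names are made up, and this is not how DrivePool works today.

```python
SEQUENCE_BITS = 16                           # high bits: MFT record sequence number
INDEX_BITS = 48                              # low bits: MFT record index
DRIVE_BITS = 9                               # 2**9 = 512 drives per pool
INDEX_KEEP_BITS = INDEX_BITS - DRIVE_BITS    # 39 bits left for the real index

DRIVE_MASK = ((1 << DRIVE_BITS) - 1) << INDEX_KEEP_BITS


def pool_file_id(drive_number, underlying_id):
    """Stamp the drive number into the top 9 bits of the 48 bit MFT index."""
    if not 0 <= drive_number < (1 << DRIVE_BITS):
        raise ValueError("drive number does not fit in 9 bits")
    if not 0 <= underlying_id < (1 << (INDEX_BITS + SEQUENCE_BITS)):
        raise ValueError("underlying ID is not a 64 bit value")
    if underlying_id & DRIVE_MASK:
        # The assumption "those 9 bits are never touched" does not hold here.
        raise ValueError("MFT index already uses the bits reserved for the drive")
    return underlying_id | (drive_number << INDEX_KEEP_BITS)


def split_pool_file_id(pool_id):
    """Recover (drive number, underlying file ID) from a pooled ID."""
    drive_number = (pool_id & DRIVE_MASK) >> INDEX_KEEP_BITS
    return drive_number, pool_id & ~DRIVE_MASK
```

With this, the same underlying ID on two different drives maps to two different pool IDs, and the ID stays the same across reboots for as long as the file stays on its drive. That still leaves 2^39 index values per drive, far more than the 2^32 files NTFS supports.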
-
Shane reacted to an answer to a question: Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11
-
Shane, as always, has done a great job summarizing everything and I certainly agree with most of it. I do want to provide some clarification, and also differ on a few things:
*) This is not about DrivePool being required to precisely emulate NTFS and all its features; that is probably never going to happen. At best DrivePool may be able to provide a driver level drive implementation that could allow it to be formatted in the way Shane describes CloudDrive does. One of the things that makes this critical bug worse is the fact that DrivePool specifically doesn't implement VSS or similar.
*) The two issues here are not the same, nor is one causing the other. They are distinct, but the incorrect file changed bug makes the FileID problem potentially so much worse (or maybe in unlucky situations allows it to happen at all). Merely browsing a folder can cause file change notifications to fire on the files in it in certain situations. This means an application listening to the notifications would believe unmodified files have been modified. It is possible that if this bug did not exist, only written files would have the potential for corruption rather than all files.
These next two points are not facts but IMO:
*) DrivePool claims to be NTFS; if it cannot support certain NTFS features it should break them as cleanly as possible (not as compatibly as possible, as it might currently). FileID support should be as disabled as possible by DrivePool. Open by file ID clearly banned. I don't know what would happen if FileID returned 0 or claimed not available on the system even though it is an NTFS volume. There are things DrivePool could potentially do to minimize the fatal damage this FileID bug can cause (i.e. not resetting to zero), but honestly even then all FileID support should be as turned off as possible. If a user wants to enable these features, DrivePool should provide a massive disclaimer about the possible damage this might cause.
*) DrivePool has an ethical responsibility to its users that it is currently violating. It has a feature that can cause massive performance problems, data loss, and data corruption. It has other bugs that accelerate these issues. DrivePool is aware of this; they should warn users of these features that unexpected behaviors and possibly irreversible damage could occur. It annoys me the effort I had to exert to research this bug. As a developer, if I had a file system product users were paying for and it could cause silent corruption, I would find this highly disturbing and do what I could to protect other users.
It is critical to remember this can result in corruption of the worst kind: corruption that normal health monitoring tools would not detect (files can still be read and written), yet it can corrupt files that are not being 'changed', in the background, at random rates. It wouldn't matter if you kept daily backups for 6 months; if you didn't detect this for 9 months you would have archived the corruption into those backups and have no way of recovering that data. It can happen slowly, and literally only validating the file contents against some known good copy would show it.
Now StableBit may feel they skirt some of the responsibility as they don't do the corruption directly: it takes some other application, relying on DrivePool's drive acting the way NTFS says it will (and the way DrivePool tries to pretend it does), to get the data loss. The problem is that DrivePool's incorrect implementation is the direct reason this corruption occurs, and the applications that can cause it are not doing anything wrong.
-
MrPapaya reacted to an answer to a question: Beware of DrivePool corruption / data leakage / file deletion / performance degradation scenarios Windows 10/11
-
Issues when I set google drive sync folder location onto the pool.
MitchC replied to calmasacow's question in General
There is a better than not chance this is your problem. The bugs are over 1.5 years old, and while they are acknowledged by the developer and potentially have devastating effects, there is no feedback on what is being done to fix them (or when). I would highly recommend NOT using any sync service that shows any funniness with DrivePool, as it can be hard to know how these bugs will affect the programs themselves.
-
Great finds by @Shane one more fantastic one: https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/d4bc551b-7aaf-4b4f-ba0e-3a75e7c528f0#Appendix_A_10 Table of the 7 file systems normally considered all support File IDs however per: https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/ns-ntifs-_file_objectid_information "File object IDs are supported only on NTFS volumes". In terms of the beta from the vague description it seems there was another problem were opening a File even with the current correct File ID could lead to an invalid memory access error, which may be what caused my BSODs related to this. It does not sound like they fixed any of the items mentioned here with File IDs or notifications. Thanks Christopher for posting here, I would have certainly missed it. This really has all sorts of implications including data leakage where a file someone is given access to is overwritten by the contents of another file they should not have access to when the application used FileIDs. In terms of Christopher's first post I think the quoting you did got a bit screwed up including quotes and responses: I believe your first point is apps are wrong to rely on File IDs Microsoft says they may change per the documentation at https://learn.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-by_handle_file_information. As I mention above, this is correct it states that, it then goes on to state "In the NTFS file system, a file keeps the same file ID until it is deleted.". DrivePool emulates NTFS, therefore this being true is not only likely an assumed fact by developers but a reasonable assumption given microsoft states exactly that. If DrivePool was not identifying as NTFS it might be a different story ( but still NTFS is the de-facto in windows so developers may just assume it true across all, even if incorrect). 
There are 3 file identifier methods commonly used, I believe:
1) GetFileInformationByHandle: nFileIndexLow/nFileIndexHigh
2) The newer GetFileInformationByHandleEx with the FileIdInfo flag, returning the FILE_ID_INFO struct
3) The object ID methods, like what you mentioned, via ZwQueryDirectoryFile/DeviceIoControl
We have discussed the first (per above). The second: https://learn.microsoft.com/en-us/windows/win32/api/winbase/ns-winbase-file_id_info which states "The file identifier and the volume serial number uniquely identify a file on a single computer", although in reality it actually uses the same info from #1, just composited together into a single FILE_ID_128 field. This can be computed from nFileIndexLow/nFileIndexHigh for normal NTFS systems.
Finally, object ID. This is a different identifier. The problem is neither File ID nor Object ID is guaranteed to be supported by a file system. In NTFS both are possible, as Windows guesses that if the number is bigger than 64 bits it's an object ID and not a file ID. In ReFS, object IDs do not exist and are not supported, but it does support File IDs. FAT32 also doesn't support Object IDs. The other issue with Object IDs? They are not really designed to be primary application APIs. They are queryable via two primary methods, I believe: ZwQueryDirectoryFile, a literal kernel call, and the DeviceIoControl you linked to. https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/ns-ntifs-_file_objectid_information DeviceIoControl overall "Sends a control code directly to a specified device driver, causing the corresponding device to perform the corresponding operation." A user level application generally should not need to be interacting directly with a device driver. File IDs can be operated on with file handles; Object IDs require you to have the device handle and go that route.
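As a small illustration of how easy it is for an ordinary application to pick up a file ID without ever touching a device driver, here is a sketch in Python. It relies on `os.stat`, which per the Python documentation reports the file index in `st_ino` and a device identifier in `st_dev` on Windows (the inode number and device on Unix). The helper name is made up for this example.

```python
import os


def file_identity(path):
    """Return (volume identifier, file identifier) as reported by os.stat."""
    info = os.stat(path)
    return (info.st_dev, info.st_ino)
```

An application can store `file_identity(path)` next to its own records and later compare it to decide whether a renamed or moved file is "the same file". On a real NTFS volume that identity survives a rename. The whole problem in this thread is software making that same comparison against a volume where the ID does not keep that promise.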
In short:
*) Per Shane's great finds, all file systems should make it unique and stable
*) Specifications for file systems identifying as NTFS are that the File ID is stable, unique and does not change
*) Object IDs are not supported on multiple file systems (out of 7 major tested systems only NTFS fully supports them)
*) File IDs have been the de facto standard for quite some time on Windows
It isn't a good solution to say "use device ids". Let's assume StableBit was correct and File IDs for NTFS are not stable/unique; even then, most people are not the application developer, and changing the application is not likely. Microsoft has teams that work on SharePoint and OneDrive; I don't think messaging them to say "StableBit requires you to change the application to use Object IDs" is going to gain much ground. Still, given the above, we can see that for NTFS it is incorrect anyway to say it is not stable.
This bug causes excessive data transfer, corruption, data loss, and data leakage with an unknown number of applications. This problem is heavily compounded for continuous monitoring applications due to the fact DrivePool incorrectly reports file changes to listeners when a file is only read-accessed. A listener that then checks the file it was notified about and sees the ID has changed can then open that by ID and do god knows what with the wrong information/data. The final compounding problem is the fact that DrivePool doesn't use remotely unique File IDs, basically guaranteeing collisions with previous File IDs. They essentially start counting from 1 and issue from there. It's a 64 bit field; give a random 64 bit number. It won't solve excessive data transfer, but it would have reduced the horrific impacts of this bug greatly.
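For a feel of how unlikely collisions would be with random IDs, here is a back-of-the-envelope sketch using the standard birthday approximation. It is a simplification: it assumes IDs are drawn uniformly at random and only estimates the chance that any two of the files in one batch share an ID. The function name is made up for this example.

```python
import math


def collision_probability(file_count, id_bits):
    """Birthday-bound estimate that any two of file_count random IDs collide."""
    pairs = file_count * (file_count - 1) / 2
    # 1 - exp(-pairs / 2**id_bits), written with expm1 to keep precision
    # when the probability is tiny.
    return -math.expm1(-pairs / 2.0 ** id_bits)
```

Calling `collision_probability(10**6, 64)` for a million files gives a probability well below one in a million. A counter that restarts from 1 after every reboot, by contrast, is certain to hand out IDs that some application has already seen attached to a different file.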
-
Sorry, I should also mention this is confirmed by StableBit and can be easily reproduced. The attached PowerShell script is a basic example of the file monitoring API.

Run it with "monitor.ps1 my_folder", where my_folder is the folder you want to monitor. Have a file, say hello.txt, inside. Open that file in Notepad. It should instantly generate a file change event. Further, tab away from Notepad and tab back to it, and you will again get a changed event for that file. Run the same thing on a true NTFS system and it will not do the same.

You can also reproduce the lack of notifications for other events by changing the IncludeSubdirectories variable in it and doing some of the tests I mention above.

watcher.ps1
-
So this is correct, as the documentation you linked to states. One item I mentioned, though, is that even if an ID can be re-used, if in practice it isn't, software may make the wrong assumption that it won't be. That is on the software, but it may be a practical expectation that one might try to meet. Further, that documentation also states: "In the NTFS file system, a file keeps the same file ID until it is deleted." As DrivePool identifies itself as NTFS, it is breaking that expectation. I am not sure how well things work if you just disable File IDs; maybe software will fall back to a safer behavior (even if less performant).

In addition, I think the biggest issue is silent file corruption. I think that can only happen due to File ID collisions (rather than just the File ID changing). It is a 128 bit number, and GUIDs are 128 bits. Just randomize the sucker the first time you assign a file ID (rather than using the current incremental behavior). Aside from it being more thread safe, as you don't have a single locked increment counter, it is highly unlikely you would hit a collision. Could you run into a duplicate? Sure. Likely? Probably not. Maybe over many reboots (or whatever else resets the IDs in DrivePool), but as long as whatever app uses the FileID has detected it is gone before it is reused, an eventual collision would likely not have much effect. Not perfect, but probably an easier solution. Granted, apps like OneDrive may still think all the files are deleted and re-upload them if the FileIDs change (although that may be more likely due to the notification bug).

Sure. Except one doesn't always know how tools work. I am only making a highly educated guess that this is what OneDrive is using, and I only made it after significant file corruption and research. One would hope you don't need to have corruption before figuring out that the tool you are using uses the FileID.
In addition, FileID may not be the primary item a backup/sync tool uses; something like the USN Journal may be a much more common first choice. The tool may only fall back to other options when that is not available. Is it possible the 5-6 apps I have found that run into issues are the only ones out there that use these things? Sure. I would just guess I am not that lucky, so there are likely many more that use these features.

I did see either you (or someone else) post about the file hashing issue with read striping. It is a big shame; reporting data corruption (invalid hash values, or rather returning the wrong read data, which is what would lead to that) is another fairly massive problem. Marking good data bad because of an inconsistent read can lead to someone thinking they lost data and trashing it, or restoring an older version in an attempt to fix it, which may cause newer data to be lost. I would look into a more consistent read striping repro test, but at the end of the day these other things stop me from being able to use DrivePool for most things I would like to.
-
To start, while I am new to DrivePool, I love its potential; I own multiple licenses and their full suite. If you only use DrivePool for basic archiving of large files, with simple applications accessing them for periodic reads, it is probably uncommon that you would hit these bugs. This assumes you don't use any file synchronization / backup solutions. Further, I don't know how many thousands (tens or hundreds?) of DrivePool users there are, but clearly many are not hitting these bugs or not recognizing they are hitting them, so this is NOT some new destructive "my files are 100% going to die" issue. Some of the reports I have seen on the forums, though, may actually be issues due to these things without being recognized as such. As far as I know CoveCube was previously not aware of these issues, so tickets may not have even considered this possibility.

I started reporting these bugs to StableBit ~9 months ago, and informed them I would be putting this post together ~1 month ago. Please see the disclaimer below as well, as some of this is based on observations rather than known facts.

You are most likely to run into these bugs with applications that:
*) Synchronize or back up files, including cloud mounted drives like OneDrive or Dropbox
*) Must handle large quantities of files or monitor them for changes, like coding applications (Visual Studio / VSCode)

Still, these bugs can cause silent file corruption, file misplacement, deleted files, performance degradation, data leakage (a file shared with someone externally could have its contents overwritten by any sensitive file on your computer), missed file changes, and potentially other issues for a small portion of users (I have had nearly all of these things occur). They may also trigger some BSOD crashes; I had one such crash that is likely related. Due to the subtle way some of these bugs can present, it may be hard to notice they are happening even if they are.
In addition, these issues can occur even without file mirroring and with files pinned to a specific drive. I do have some potential workarounds/suggestions at the bottom.

More details are at the bottom, but the important bug facts upfront:

*) Windows has a native file changed notification API using overlapped IO calls. This allows an application to listen for changes on a folder, or a folder and its sub folders, without having to constantly check every file to see if it changed.
   *) StableBit triggers "file changed" notifications even when files are just accessed (read) in certain ways.
   *) StableBit does NOT generate notification events on the parent folder when a file under it changes (Windows does).
   *) StableBit does NOT generate a notification event when only a FileID changes (the next bug talks about FileIDs).

*) Windows, like Linux, has a unique ID number for each file written on the hard drive. If there are hardlinks to the same file, it has the same unique ID (so one FileID may have multiple paths associated with it). In Linux this is called the inode number; Windows calls it the FileID. Rather than accessing a file by its path, you can open a file by its FileID. In addition, it is impossible for two files to share the same FileID; it is a 128 bit number that is persistent across reboots (128 bits means the count of unique numbers it can represent is 39 digits long, or it has the uniqueness of something like an MD5 hash). A FileID does not change when a file moves or is modified.
   *) StableBit, by default, supports FileIDs; however, they seem to be ephemeral. They do not seem to survive across reboots or file moves.
   *) Keep in mind FileIDs are used for directories as well, not just files. Further, if a directory is moved/renamed, not only does its FileID change but so does the FileID of every file under it. I am not sure if there are other situations in which they may change.
In addition, if a descendant file/directory FileID changes due to something like a directory rename, StableBit does NOT generate a notification event that it has changed (the application gets the directory event notification but nothing on the children).

There are some other things to consider as well. DrivePool does not implement the standard Windows USN Journal (a system for tracking file changes on a drive). It specifically identifies itself as not supporting this, so applications shouldn't be trying to use it with a DrivePool drive. That does mean that applications that traditionally don't use the file change notification API or FileIDs may fall back to a combination of those to accomplish what they would otherwise use the USN Journal for (and this can exacerbate the problem). The same is true of Volume Shadow Copy (VSS): applications that would traditionally use it cannot (DrivePool identifies that it cannot do VSS), so they may resort to the methods below, which they do not traditionally use.

Now the effects of the above bugs may not be completely apparent:

For the overlapped IO / file change notification:
*) An application monitoring for changes on a DrivePool folder or sub-folder will get erroneous notifications that files changed when anything even accesses them. Just opening something like File Explorer on a folder, or even switching between applications, can cause file accesses that trigger the notification. If an application takes actions on a notification and then checks the file at the end, this in itself may cause another notification.
*) Applications that rely on getting a folder changed notification when a child changes will not get these at all with DrivePool. If an application is monitoring just the folder and not the children, no notifications may be generated at all (rather than just the child's being missing), so it could miss changes.
For FileIDs:
*) It depends what the application uses the FileID for, but it may assume the FileID stays the same when a file moves. As it doesn't with DrivePool, this might mean it reads, backs up, or syncs the entire file again if it is moved (perf issue).
*) An application that uses the Windows API to open a file by its ID may not get the file it is expecting, or a file that was simply moved will throw an error when opened by its old FileID, as DrivePool has changed the ID. For example, let's say an application caches that the FileID for ImportantDoc1.docx is 12345, but then 12345 refers to ImportantDoc2.docx due to a restart. If this application is a file sync application and ImportantDoc1.docx is changed remotely, then when it goes to write those remote changes to the local file using the OpenFileById method, it will actually overwrite ImportantDoc2.docx with those changes.

I didn't spend the time to read the Windows file system requirements to know when Windows expects a FileID to potentially change (or not change). It is important to note that even if changes/reuse are theoretically allowed, if they are not commonplace (because Windows essentially uses a number like an MD5 hash in terms of repeats), applications may just assume they don't happen.

*) A backup or file sync program might assume that a file with a specific FileID is always the same file. If FileID 12345 is c:\MyDocuments\ImportantDoc1.docx one day and c:\MyDocuments\ImportantDoc2.docx another, it may mistake document 2 for document 1, overwriting important data or restoring data to the wrong place.
*) If it is trying to create a whole drive backup, it may assume it has already backed up c:\MyDocuments\ImportantDoc2.docx if that file now has the same FileID ImportantDoc1.docx had by the time it reaches it (at which point DrivePool would have a different FileID for document 1).

Why might applications use FileIDs or file change notifiers?
It may not seem intuitive why applications would use these, but a few major reasons are:
*) Performance. File change notifiers are an event/push based system, so the application is told when something changes. The common alternative is a poll based system where an application must scan all the files looking for changes (and may try to rely on file timestamps or even hashing the entire file to determine this), which causes a good bit more overhead / slowdown.
*) FileIDs are nice because they already handle hardlink file de-duplication (Windows may have multiple copies of a file on a drive for various reasons, but if you back up based on FileID you back up that file once rather than multiple times). FileIDs are also great for handling renames. Let's say you are an application that syncs files and the user backs up c:\temp\mydir with 1000 files under it. If they rename c:\temp\mydir to c:\temp\mydir2, an application using FileIDs can say: wait, that folder is the same, it was just renamed. OK, rename that folder in our remote version too. This is a very minimal operation on both ends. With DrivePool, however, the FileID changes for the directory and all sub-files. If the sync application uses this to determine changes, it now uploads all these files again, using a good bit more resources locally and remotely. If the application also uses versioning, this may be far more likely to cause a conflict with two or more clients syncing, as masses of files are seemingly being changed.

Finally, even if an application is trying to monitor for FileIDs changing using the file change API, due to the notification bugs above it may not get any notifications when child FileIDs change, so it might assume they have not.

Real Examples

OneDrive

This started with massive OneDrive failures. I would find OneDrive was re-uploading hundreds of gigabytes of images and videos multiple times a week. These were not changing or moving.
I don't know if the issue is that OneDrive uses FileIDs to determine if a file is already uploaded, or if scanning a directory triggered a notification that all the files in that directory changed, and based on that notification it re-uploaded them.

After this I noticed files were being deleted both locally and in the cloud. I don't know what caused this. It might have been because OneDrive thought the old file was deleted, as the FileID was gone, and while there was a new file (actually the same file) in its place, there may have been some odd race condition. It is also possible that it queued the file for upload, the FileID changed, and when it went to open the file to upload it found it was 'deleted', as the FileID no longer pointed to a file, and so it queued the delete operation.

I also found that files that were uploaded into the cloud in one folder were sometimes downloading to a different folder locally. I am guessing this is because the folder FileID changed. It thought the 2023 folder had ID XYZ, but that ID now pointed to a different folder, and so it put the file in the wrong place.

The final form of corruption was finding the data from one photo or video actually in a file with a completely different name. This is almost guaranteed to be due to the FileID bugs. This is highly destructive, as it is far harder to correct from backups. With one file's contents replaced by another's, you need to know when the good content existed and which files were affected. Depending on retention policies, the file contents that replaced it may overwrite the good backups before you notice.

I also had a BSOD with OneDrive where it was trying to set attributes on a file and the CoveFS driver corrupted some memory. It is possible this was a race condition, as OneDrive may have been processing hundreds of files very rapidly due to the bugs. I have not captured a second BSOD due to it, but I also stopped using OneDrive on DrivePool due to the corruption.
Another example of this is data leakage. Let's say you share your favorite article on kittens with a group of people. OneDrive, believing that file has changed, goes to open it using the FileID; however, that FileID could now correspond to essentially any file on your computer. Now the contents of some sensitive file are put in the place of that kitten file, and everyone you shared it with can access it.

Visual Studio Failures

Visual Studio is a code editor/compiler. There are three distinct bugs that happen.

First, when compiling, if you touched one file in a folder it seemed to recompile the entire folder, likely due to the notification bug. This is just a slowdown, but an annoying one.

Second, Visual Studio has compiler generated code support. This means the compiler will generate actual source code that lives next to your own source code. Normally, once compiled, it doesn't regenerate and recompile this source unless it must change, but due to the notification bugs it regenerates this code constantly, and if there is an error in other code it causes an error there, causing several other invalid errors. When debugging, Visual Studio by default will only use symbols (debug location data) that exactly match the source. As the notifications from DrivePool happen on certain file accesses, Visual Studio constantly thinks the source has changed since it was compiled, and you will only be able to breakpoint inside source if you disable the exact symbol match default. If you have multiple projects in a solution with one dependent on another, it will often rebuild the other projects' dependencies even when they haven't changed; for large solutions that can be crippling (perf issue).

Finally, I often had IntelliSense errors showing up even though there were no errors during compiling, and worse, IntelliSense would completely break at points. All due to DrivePool.

Technical details / full background & disclaimer

I have sample code and logs to document these issues in greater detail if anyone wants to replicate them.
It is important for me to state that DrivePool is closed source and I don't have the technical details of how it works. I also don't have the technical details of how applications like OneDrive or Visual Studio work. So some of these things may be guesses as to why the applications fail, etc. The facts stated are true (to the best of my knowledge).

Shortly before my trial expired in October of last year I discovered some odd behavior. I had a technical ticket filed within a week, and within a month had traced down at least one of the bugs. The issue can be seen at https://stablebit.com/Admin/IssueAnalysis/28720 ; it does show priority 2/important, which I would assume is the second highest (probably critical or similar above it). It is great that it has priority, but as we are over 6 months since filing without updates, I figured warning others about the potential corruption was important.

The FileSystemWatcher API is implemented on Windows using async overlapped IO. The exact code can be seen here: https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.Win32.cs#L32-L66

That corresponds to this kernel API: https://learn.microsoft.com/en-us/windows/win32/fileio/synchronous-and-asynchronous-i-o

Newer API calls use GetFileInformationByHandleEx to get the FileID, while older stat-style calls represent it as nFileIndexHigh/nFileIndexLow.

In terms of the FileID bug, I wouldn't normally have even thought about it, but the advanced config (https://wiki.covecube.com/StableBit_DrivePool_2.x_Advanced_Settings) mentions this under CoveFs_OpenByFileId: "When enabled, the pool will keep track of every file ID that it gives out in pageable memory (memory that is saved to disk and loaded as necessary).". Keeping track of files in memory is certainly very different from what Windows does, so I thought this may be the source of the issue.
I also don't know if there are caps on the maximum number of files it will track; if it resets FileIDs in situations other than reboots, that could be much worse. Turning this off will at least break NFS servers, as the docs mention it is "required by the NFS server".

Finally, the FileID numbers given out by DrivePool are incremental and very low. This means that when they do reset you will almost certainly get collisions with former numbers. What is not clear is whether there is a chance of FileID corruption issues. If it is assigning these IDs in a multi-threaded scenario with many different files at the same time, could this system fail? I have seen no proof this happens, but when incremental IDs are assigned like this for mass quantities of files, it has a higher chance of occurring.

Microsoft mentions this about deleting the USN Journal: "Deleting the change journal impacts the File Replication Service (FRS) and the Indexing Service, because it requires these services to perform a complete (and time-consuming) scan of the volume. This in turn negatively impacts FRS SYSVOL replication and replication between DFS link alternates while the volume is being rescanned.". Now, DrivePool never supports the USN Journal, so it isn't exactly the same thing, but it is clear that several core Windows services do use it for normal operations. I do not know what fallbacks they use when it is unavailable.

Potential Fixes

There are advanced settings for DrivePool at https://wiki.covecube.com/StableBit_DrivePool_2.x_Advanced_Settings ; beware that these changes may break other things.

CoveFs_OpenByFileId - Set to false; by default it is true. This will disable the OpenByFileID API. It is clear several applications use this API. In addition, while DrivePool may disable that function with this setting, it doesn't disable FileIDs themselves. Any application using FileIDs as static identifiers for files may still run into problems.
I would avoid using any file backup/synchronization tools with DrivePool drives (if possible). These likely have the highest chance of lost files, misplaced files, file content being mixed up, and excess resource usage. If you can't avoid them, consider taking file hashes for the entire DrivePool directory tree. Do this again at a later point and make sure files that shouldn't have changed still have the same hash. If you have files that rarely change after being created, then hashing each file at some point after creation, and alerting if that file disappears or its hash changes, would easily act as an early warning that one of these bugs is being hit.
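The hashing check described above could look something like the following Python sketch. It is only a starting point under some assumptions: SHA-256 is my choice of hash, the manifest is a plain dict keyed by relative path, and the names hash_file, build_manifest and compare_manifests are made up for illustration.

```python
import hashlib
import os


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Walk a directory tree and map each relative file path to its hash."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            manifest[os.path.relpath(full_path, root)] = hash_file(full_path)
    return manifest


def compare_manifests(old, new):
    """Report files that disappeared, changed content, or appeared between two runs."""
    return {
        "missing": sorted(p for p in old if p not in new),
        "changed": sorted(p for p in old if p in new and old[p] != new[p]),
        "added": sorted(p for p in new if p not in old),
    }
```

One way to use it is to save the result of build_manifest as JSON, run it again a week later, and look at the "missing" and "changed" entries from compare_manifests for files you know you did not touch. Note that reading every file does generate file accesses on the pool, so it is probably best run when sync tools are paused.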
-
.NET, Fatshark game launcher issues on Drivepool
MitchC replied to kachunkachunk's question in Nuts & Bolts
The XAML parse error can be a bit misleading, as anything causing a constructor to crash can throw it. The best option is to run Process Monitor and look at the file accesses made by the process right before the error.