Posts posted by eujanro

  1. 16 hours ago, MitchC said:

    If this is bi-directional syncing as well, this is a bit of a nightmare scenario for this bug. 80TB is a massive amount; if just 0.1% of your data (roughly 80GB) changed, would you notice?

    SHA hashing of every file is good at detecting ordinary corruption, but it would likely not catch the data loss this set of bugs can cause. The issue here would appear more as if you intentionally overwrote a file with new content (or, depending on the application, as if you copied one file over another).

    If your data is all still pristine, I can only assume Syncthing probably doesn't rely on file IDs right now.

     

    It is well established that DrivePool's file change notification system has multiple large flaws. Many applications can trigger a 'file content changed' notification from Windows even when they are only reading a file. Maybe Syncthing checks the file timestamp and size at that point and, if they are the same, does nothing. If it listens to the file system, though, at best the file gets completely read and re-hashed only for Syncthing to decide nothing changed; at worst the file just gets queued for syncing and you sync extra data. Either way you could be wearing the drives out faster than needed, losing performance, or wasting bandwidth and backup time. We also know that DrivePool does not properly bubble up file change notifications when writes actually happen, which, depending on how Syncthing is watching, could mean it misses some files that changed. Not a huge deal if it does a full scan monthly to make sure all file changes are detected, but if in between you rely on file change notifications to catch files to sync, you might think everything was in sync right up to a crash when in reality it might be up to a month out of date.

    If the file is likely to actually have changed (say, a log file) I would say it's unrelated. Even for one-time writes, it could be that the application was still writing the file when Syncthing started hashing, which again is not related. It is also possible, though, that Syncthing goes to read a changed file, its read triggers the notification bug, it gets a notification that the file has changed, and it then issues that warning. This could be a race condition: the read would likely trigger the notification right at the start, so depending on when Syncthing starts treating notifications received during the read as a change, it may only happen sometimes. Another option: if something else also subscribes to file change notifications and that other app reads the file after Syncthing starts reading it, the other app causes a 'write notification' even though it is only reading, due to this bug.

     

    First, there is 0% chance these bugs are not critically problematic with DrivePool. They can lead to direct data loss or corruption and can let sensitive data leak, which is a horrid statement to make about a file system. The question is whether these bugs affect your specific use case.

    The problem is it may not present uniformly. Maybe Syncthing does diff-based syncing of only the parts that changed for bigger files (say over 10MB), but any files it thinks have changed that are under 10MB it just syncs blindly, since they are so small and it keeps CPU usage down. Maybe it uses a simpler approach: if a file is 'changed', it tries to optimize for append-only changes. It hashes the file up to the old file size and, if that equals the old hash, it knows it only needs to sync the newer bytes; otherwise it syncs the whole file.
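
    (To make that hypothetical concrete, here is a minimal PowerShell sketch of such an append-only check. The function, its parameters, the 64KB buffer and the whole approach are inventions for illustration, not anything Syncthing is known to do: hash the first OldLength bytes of the current file and compare against the stored hash; if they match, only the tail would need syncing.)

      # Hypothetical append-detection sketch - NOT Syncthing's real logic.
      function Test-AppendOnlyChange {
          param(
              [string]$Path,       # file reported as changed
              [long]  $OldLength,  # file size recorded at the last sync
              [string]$OldHash     # SHA-256 (hex) of the file at the last sync
          )
          $sha = [System.Security.Cryptography.SHA256]::Create()
          $fs  = [System.IO.File]::OpenRead($Path)
          try {
              # Hash only the first $OldLength bytes of the current file.
              $buf = New-Object byte[] 65536
              $remaining = $OldLength
              while ($remaining -gt 0) {
                  $toRead = [int][Math]::Min([long]$buf.Length, $remaining)
                  $read = $fs.Read($buf, 0, $toRead)
                  if ($read -le 0) { break }
                  [void]$sha.TransformBlock($buf, 0, $read, $null, 0)
                  $remaining -= $read
              }
              [void]$sha.TransformFinalBlock($buf, 0, 0)
              $prefixHash = -join ($sha.Hash | ForEach-Object { $_.ToString('x2') })
              # Old prefix unchanged -> only the appended tail would need syncing.
              return ($prefixHash -eq $OldHash.ToLower())
          }
          finally { $fs.Dispose(); $sha.Dispose() }
      }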

    Even if the worst that happens right now is excess drive reads or bandwidth spent, that says nothing about tomorrow. Maybe Syncthing decides it does not need to hash a file when it gets a change notification, since that causes a full additional read of the entire file (hurting performance and drive life), so it starts to just trust Windows file change notifications. Maybe you never even upgrade Syncthing, but right now you don't use an application that triggers the 'file content changed' notification when it merely opens a file for reading (e.g. VS Code might not, but something like Notepad does). You start using a new file editor or video player, it does trigger that bug, and now Syncthing gets a whole lot more of these change notifications. When you upgrade Syncthing, do you read about all the changes between versions? Who knows if such internal changes would even make the changelog. If Syncthing starts relying on file IDs more in the next version, then your data may slowly corrupt.

    If most of your data doesn't change, then hashing it all now, hashing it again down the line, and comparing would show you if a file changed that shouldn't have. This is not the same hashing Syncthing does: Syncthing's hashing looks for corruption from the system/disk/transfer, not for file contents being updated on purpose. Still, even then, since these bugs are likely to affect files that change more often first, that may not catch things quickly (mainly you are waiting for DrivePool to go to write the file ID of one of the new files and end up overwriting an old file instead).
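
    (The "hash it all now, hash it again later" idea can be done with built-in cmdlets; a rough sketch, with the pool letter P:\ and the CSV path as placeholders:)

      # Build a hash baseline of the pool now...
      $pool = 'P:\'                                        # hypothetical pool drive letter
      Get-ChildItem $pool -Recurse -File |
          Get-FileHash -Algorithm SHA256 |
          Select-Object Path, Hash |
          Export-Csv 'C:\hash-baseline.csv' -NoTypeInformation

      # ...and compare down the line: report files whose content hash changed.
      $old = @{}
      Import-Csv 'C:\hash-baseline.csv' | ForEach-Object { $old[$_.Path] = $_.Hash }
      Get-ChildItem $pool -Recurse -File |
          Get-FileHash -Algorithm SHA256 |
          Where-Object { $old[$_.Path] -and $old[$_.Path] -ne $_.Hash } |
          ForEach-Object { Write-Warning "Changed since baseline: $($_.Path)" }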

    I briefly looked at Syncthing's documentation, and it says Syncthing compares file modification times to detect whether a file changed. I don't know if the documentation just doesn't mention that it uses size as well, or if it really only looks at the modification time to detect changes. If so, this could be riskier still.

    Personally, I moved to TrueNAS, which, while not as flexible in terms of drive pooling, got me what I wanted, and the snapshotting makes backups fantastic. For others, if Unraid or something similar is an option, you could still have that flexibility without the liability of DrivePool. This is not a fun game of chance to play, hoping you don't end up using the wrong combination of apps that leads to data loss.

    DrivePool clearly works correctly for most people (at least mostly; many may not realize that performance or other issues they see are caused by DrivePool). Because it is exceptionally difficult to know how these bugs could affect you today or in the future, I still find it quite reckless that DrivePool does not at least warn you of these possibilities. This is not dissimilar to the fact that there seems to be a decent bug in reading striped data from DrivePool for multiple users, yet there is no warning that read striping can cause such a bad issue:

     

    Some are uni-directional and some are bi-directional. But as I understand it, checksumming is deterministic with respect to its input. So I suppose a SHA-256 in Syncthing is expected to be the same globally, and therefore identical on both sides. And as per the Syncthing documentation, "There are two methods how Syncthing detects changes: By regular full scans and by notifications received from the filesystem (“watcher”)"; in my case a full scan of ALL (80TB+) data takes place roughly every 24 hours.

    Moreover, I'm using the versioning option for each of the synced critical directories. This means that if a change has been detected by reading the file, the updated file will be moved to the .stversions folder inside the root of the synced directory in question. But aside from data I modified myself, in the last 5 years I have never found massive amounts of data that was "changed/corrupted" silently on the source, propagated to the target, and therefore moved into the .stversions folder, presumably replacing the "good" file.

    Meanwhile, I have executed your watcher.ps1 script, and I confirm that a "change" operation is indeed reported on files when they are merely opened. But in my case there is a "catch": I'm doing 95% of my file-level operations with Total Commander, and surprisingly, opening/managing the same file through TC on the watcher.ps1-monitored path does NOT report any change; only Windows Explorer reports it as changed. So it could be that by using TC exclusively I have avoided amplifying the issue/bug. Now the question is whether Windows Explorer has the issue, or DP, or both...
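
    (For anyone else wanting to reproduce that test: a watcher of this kind can be built on .NET's FileSystemWatcher. This is only a sketch; the actual watcher.ps1 from MitchC may look different, and the path is a placeholder.)

      # Minimal change-notification watcher sketch (not MitchC's actual script).
      $path = 'P:\WatchTest'                              # folder on the pool to observe
      $fsw  = New-Object System.IO.FileSystemWatcher $path
      $fsw.IncludeSubdirectories = $true
      $fsw.NotifyFilter = [System.IO.NotifyFilters]'FileName, LastWrite, Size'
      $fsw.EnableRaisingEvents = $true

      Register-ObjectEvent $fsw Changed -SourceIdentifier PoolChanged -Action {
          Write-Host ('{0}  CHANGED  {1}' -f (Get-Date -Format s), $Event.SourceEventArgs.FullPath)
      } | Out-Null

      # Now open the same file with Explorer, Total Commander, an editor, etc.
      # and see which of them makes Windows report a "change".
      while ($true) { Start-Sleep -Seconds 1 }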

    In the end it is still concerning and disturbing to have trust issues and doubts in this case, and everything should be investigated and addressed by @Alex as soon as possible.

     

  2. Just my 2 cents here: I've been using DrivePool for over 10 years and have never had any of the issues reported here.

    In my case, there are two geographically separate systems that are kept in sync (NTFS-formatted HDDs/SSDs, 80TB+) using the Syncthing app. Syncthing has its own internal mechanism for syncing files, keeping its own "index-*.db" of all hashed files/folders using the SHA-256 algorithm. Some folders in my Syncthing configuration also have the "fsWatcherEnabled" option, which "detects changes to files in the folder and scans them" automatically; this also works flawlessly and I have never had any issues with it.

    Still, I have very rarely seen some hashing warnings (e.g. "File changed during hashing..."), especially with new files, but the warnings disappeared after a second re-hash/scan of the files/folders.

    Should I assume that, if such issues were critically problematic between the DrivePools, I would have seen reports of Syncthing struggling to keep the files in a cleanly synced state?

  3. Hi Doug,

    Thank you for your answer. I was using the Adaptec 72405 with Server 2016 Essentials and it worked flawlessly.

    Now I have tried the "Adaptec RAID Driver v7.5.0.52013 for Windows, Microsoft Certified" dated Aug 2017 from the Microsemi site, but that is also the one provided by the operating system, and it didn't work either. I think it has to do with Server 2022 or something. My Scanner settings are nothing but the defaults, aside from the Unsafe and NoWMI options, which I checked manually to see if they would help with my issue. But nothing works.

    In the end, I think it's a Server 2022 issue, because under Server 2016 the same controller and disks worked without issues.

    I will also try the same StableBit DrivePool/Scanner versions that were installed on Server 2016 (with the same Adaptec RAID Driver v7.5.0.52013), but I don't think it will change anything.

     

  4.  [Update 2]

    After adding a new 16TB HDD, the balancing rules worked wonderfully and the placement rules were respected.

    I did have to further add exclusion file placement rules for the other root directories besides \MUSIC and \PICTURES. It's really wonderful how this works. A real piece of state-of-the-art software.

    In summary, it was the over-90% usage of the other pooled hard drives that forced an overflow of data onto the SSDs.

  5. On 5/17/2022 at 5:03 AM, gtaus said:

    I have found that DrivePool balancing and duplication often have problems when I hit that 90% threshold. I have seen my SSD cache get kicked offline as well. Fortunately, adding more HDD storage and/or removing unused files corrects the problems on my system. I have to manually recheck my SSD to tell DrivePool that it should be used as an SSD cache and not an archive disk. But it seems to work again without any problems.

    If you are happy with your current custom SSD cache settings, I would write them down, because it seems to me that I had to reenter my settings. After many months of not looking at DrivePool (it just works), I had forgotten my custom SSD cache settings and had to play around with them again until I got them back to where they work best for my system. If you use the default settings, then it might not be an issue.

    Actually, I'm not using the SSDs as a "cache" in my configuration. They are just (faster) storage devices in my pool, and I simply want them to hold only data that must be delivered as quickly as technically possible.

    In my case, the Plex photo library's responsiveness and browsing speed skyrocketed, because the content is delivered from where it is stored, the SSD drives, whether I'm inside or outside my home network. Couch-time family photo browsing is a delight now.

    SSD caching works great and indeed makes a difference when direct file-level access/delivery over the network is required, but that's not my usage model for now.

  6. Total Commander lets you search on the pool itself or on the individual drives (letters or mount points). You can also save searches as templates and reuse them quickly later.

    It also supports attributes, text, conditions, time-based attributes, regex, ...

    It's a shareware application, and if you're bothered by the startup message, you can register it.

  7. [Update]

    I think I have identified the reason behind the file placement rule violation (overflowing).

    It seems that all of the other HDDs are filled to 90% or more of their total space, and the default pool overflow rule permits this:
    (screenshot of the balancing/file placement settings showing the rule that permits overflow)

    I will add another 8TB drive to the pool, rebalance, and check afterwards whether, once the total used space per disk drops under 90%, the SSD data also gets properly rebalanced.
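
    (To keep an eye on that threshold, a quick per-drive fill check can be done with a built-in cmdlet; this only covers volumes with drive letters, so pool disks mounted as folders would need the unfiltered volume list instead.)

      # Percent used per lettered volume (rough check against the 90% overflow threshold).
      Get-Volume | Where-Object DriveLetter | ForEach-Object {
          $usedPct = [math]::Round(100 * ($_.Size - $_.SizeRemaining) / $_.Size, 1)
          '{0}:  {1,5} % used' -f $_.DriveLetter, $usedPct
      }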

  8. Hi covecube,

    Using Windows 10 Pro, DP 2.3.1.1410 BETA, Scanner 2.6.2.3869 BETA

    One pool under drive letter B:\, over 40TB in size.

    Long-time user of Covecube solutions; I just wanted to optimize speed for certain files by using two SSDs for the \PICTURES and \MUSIC folders.

    For that, I just added the two SSD drives to the pool and configured a "File Placement" rule (via the "File Placement" options) to use ONLY those two drives for the mentioned folders.

    Looking at the SSDs' root GUID PoolPart directories, lo and behold, other files/folders (e.g. BACKUP, MOVIES, ...) landed there, and no matter what I do, they love to stay on the expensive NAND. The \PICTURES folder is 2x duplicated (around 900GB including duplication usage) and \MUSIC is 770GB, not duplicated. The SSDs report 1.31TB and 932GB free.
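
    (As a side note, what actually landed on an SSD can be measured per top-level folder from its hidden PoolPart.* directory at the drive root; a rough sketch, with the drive letter S: being a placeholder:)

      # Sum up, per top-level folder, what is stored in the SSD's hidden PoolPart directory.
      $poolPart = Get-ChildItem 'S:\' -Directory -Force |
                  Where-Object { $_.Name -like 'PoolPart.*' } | Select-Object -First 1
      Get-ChildItem $poolPart.FullName -Directory -Force | ForEach-Object {
          $bytes = (Get-ChildItem $_.FullName -Recurse -File -Force -ErrorAction SilentlyContinue |
                    Measure-Object Length -Sum).Sum
          '{0,-15} {1,8:N1} GB' -f $_.Name, ($bytes / 1GB)
      }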

    Everything under "Balancers" Plugins, is set to Default and active are the: Stable Bit Scanner, Volume Equalization, Drive Usage Limiter, Prevent Drive Overfill, Duplication Space Optimizer

    The following are disabled: Disk Space Equalizer, SSD Optimizer, Ordered File Placement Rules.

    Why are the "File Placement Rules" not working ?!?

     

  9. Hi everyone,

    First, I would like to share that I am very satisfied with DP & Scanner. This IS "state of the art" software.

    Second, I have personally experienced 4 HDDs failing, burned by the PSU (99% of the data was professionally, and expensively, recovered), and having a content listing on hand would have been comforting, just to compare quickly and get a status overview.

    I also asked myself how to catalog the pooled drives' content, with logging/versioning, so that if a pooled drive dies I know whether professional recovery makes sense (again), and also to check that the duplication algorithm is working as advertised.

    Being a fan of "as simple as it gets", I found a simple, free file lister that can be driven from the command line.

    https://www.jam-software.com/filelist/

    I built a .cmd file to export a listing per pooled drive (e.g.: %Drive_letter_%Label%_YYYYMMDDSS.txt). Then I scheduled a job to run every 3 hours; before running, it packs all previous .txt files into an archive for versioning purposes.
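
    (For anyone who prefers PowerShell over a .cmd file, the same idea in a rough sketch using only built-in cmdlets instead of FileList.exe; the output folder, drive letters and naming are placeholders. The script below is what the scheduled task would run every 3 hours.)

      # Export one listing per pooled drive, archiving the previous listings first.
      $outDir = 'C:\DriveLists'
      $drives = 'D','E','F'                                   # the pooled physical drives
      $stamp  = Get-Date -Format 'yyyyMMddHHmm'
      New-Item -ItemType Directory -Force -Path $outDir | Out-Null

      # Versioning: pack the previous .txt exports into a zip before writing new ones.
      $old = Get-ChildItem $outDir -Filter '*.txt' -ErrorAction SilentlyContinue
      if ($old) {
          Compress-Archive -Path $old.FullName -DestinationPath "$outDir\listings-$stamp.zip"
          $old | Remove-Item
      }

      foreach ($d in $drives) {
          $label = (Get-Volume -DriveLetter $d).FileSystemLabel
          Get-ChildItem "${d}:\" -Recurse -File -Force -ErrorAction SilentlyContinue |
              Select-Object FullName, Length, LastWriteTime |
              Out-File "$outDir\${d}_${label}_$stamp.txt"
      }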

    For each of the ten 2TB pooled HDDs, around 60% full, I get a 15-20MB .txt file (with the option to exclude some content via filter) in about 20 minutes. A zipped archive with all the files inside comes to about 20MB per archive. For checking, I just use Notepad++'s "Find in Files" function, point it at the desired .txt folder path, and I get what I'm looking for in each per-drive file.

    I would love to see such an option for finding which drive a file is on, built into the DP interface.

    Hopefully this is useful info, and not too long a post.

    Good luck!

     
