
Pool Remeasures on Every Boot, Very Slow While Measuring


toxic

Question

I've noticed that every single time I reboot, the whole pool remeasures every disk, which takes ~3 hours to complete. During this time, the pool is borderline unusable. A Steam game installed to the pool may take 5 minutes or more to launch, and then stutter for a few seconds every 5 seconds or so (completely I/O starved). Even video playback can stutter. It seems the pool just has no I/O left over to serve other requests while measuring.

I understand that measuring has a high impact, but it doesn't seem like it should be this bad. It seems like outside I/O should have priority over the measuring. 

Secondly, I understand the pool shouldn't need to remeasure every reboot. Any way to troubleshoot why this is happening? It makes the machine pretty much unusable for hours after rebooting, which I need to do periodically for updates etc. 

The one thing I have that is a little unusual is that the pool is made up entirely of ReFS disks. Is this possibly the culprit? Obviously moving back to NTFS would be quite an undertaking, though with Microsoft trying to drop ReFS support from the desktop versions of Win 10, it's certainly something I'm open to.

Disks in the pool are: 

  • 1x8TB WD Red
  • 2x4TB WD Red
  • 1x4TB WD Black

EDIT: forgot to add that when the measuring is finished, the performance settles down and becomes pretty reasonable, comparable to a normal disk. So I've been trying to avoid reboots as much as possible, so as not to render the machine unusable for hours at a time.

I've also completely disabled the Windows Search Service (indexing), as I have no need for it and know it can cause performance impacts. 


9 answers to this question


  • 0

Measuring every boot is not typical, and indicates an issue.

Would you try installing this version?
http://dl.covecube.com/DrivePoolWindows/beta/download/StableBit.DrivePool_2.2.3.953_x64_BETA.exe

If that doesn't help, enable boot logging, reproduce, and upload the logs:
http://wiki.covecube.com/StableBit_DrivePool_2.x_Boot_Time_Log_Collection

Also, could you open a ticket at https://stablebit.com/contact and then run the StableBit Troubleshooter on the system in question?
http://wiki.covecube.com/StableBit_Troubleshooter


  • 0

I still have a support job open about this, and I haven't been able to fix it yet. I've just been keeping reboots to a minimum and doing them as I go to bed, when the measuring will not impact me. It's still a massive pain and I'd love to get it sorted. What filesystem are you using, out of curiosity?

I'm in the process of re-evaluating some things and may end up changing my setup a bit. Another issue is that Microsoft has blocked formatting a drive as ReFS without an Enterprise or 'Pro for Workstations' license, and I have neither. As such, it's actually pretty difficult for me to format a new drive to add to the pool at the moment (run a VM trial of Enterprise to format the drive?). But if Microsoft is trying this hard to stop me using ReFS, I'm a little hesitant to fight them on it and invest further. I'm wondering what I should do next: maybe redo the drives as NTFS one by one and swap them back in (I have some juggle room to do this), or maybe recreate the pool from scratch as NTFS and migrate things over.

I've also started using Snapraid for my largest and mostly static data, which has given me a lot of extra space to play with, and Snapraid doesn't recommend using ReFS either (https://www.snapraid.it/faq#fs). I've also heard that ReFS's failure mode in single-drive mode is to silently delete any damaged file (not likely an issue with a two-way mirror in DrivePool, though). The file would then be gone forever after the next Snapraid sync, so I'd much rather have a corrupt file that I can restore.


  • 0

Sorry to resurrect this, but did you resolve it? I'm having the exact same issue. I can barely even open File Explorer while the pool is being measured (and it takes a very, very long time). Is a drive perhaps failing? I have no SMART errors.


  • 0

Not really, no. Despite a lot of back and forth working with Drashna & Co., we never did get to the bottom of the issue. I even deleted the pool entirely and recreated a fresh one (and moved all the data back into the new pool folders). One of the drives (the WD Black) did turn out to be failing, and removing it from the system improved the measure times a tiny bit since that disk was performing slowly, but the overall problem remained.

What did help was converting the pool back to NTFS (one disk at a time, format and copy). I haven't looked lately, but I believe it still measures on each boot. However, under NTFS the measuring is very quick and has no real impact on performance. My guess is that it now only needs to read the NTFS journal during a measure, whereas on ReFS it had to walk the entire directory tree (and I have millions of files in some directories). Additionally, I bought a 2TB SSD for my games and other data I don't care about losing, and moved most of the performance-sensitive stuff to that. Pool performance is now not so much of a concern.
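To illustrate why a full directory walk gets expensive with millions of files, here is a minimal Python sketch that counts files and sums their sizes by visiting every directory under a root. This is only a guess at the general shape of a "measure" pass, not how DrivePool actually does it; the function name `measure_tree` is made up for this example.

```python
import os
import time


def measure_tree(root):
    """Walk every directory under root; return (file_count, total_bytes, seconds).

    Hypothetical helper for illustration only, not DrivePool's method.
    The cost grows with the number of directory entries, not just with
    the amount of data stored.
    """
    start = time.perf_counter()
    file_count = 0
    total_bytes = 0
    stack = [root]  # directories still to visit
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    try:
                        if entry.is_dir(follow_symlinks=False):
                            stack.append(entry.path)
                        elif entry.is_file(follow_symlinks=False):
                            file_count += 1
                            total_bytes += entry.stat(follow_symlinks=False).st_size
                    except OSError:
                        # Skip entries that cannot be read (permissions, bad sectors).
                        continue
        except OSError:
            # Skip directories that cannot be opened.
            continue
    return file_count, total_bytes, time.perf_counter() - start
```

Running `measure_tree` against the root of each pooled disk and comparing the elapsed times would give a rough idea of how long a plain tree walk takes per disk, and whether one disk is much slower than the others.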

Long term I'm thinking of moving to a dedicated Linux NAS system with mergerfs and snapraid, but for now my current setup is fine - under NTFS it's not really problematic anymore.

 


  • 0
On 12/22/2020 at 4:19 PM, sfg said:

I can barely even open file explorer while the pool is being measured (and it takes a very, very long time). Is perhaps a drive failing?

FWIW, when I had a drive that was failing, Windows would take forever to read/reread the directory. If that HDD was part of DrivePool, then I would think DrivePool would probably just sit there attempting to measure/remeasure the pool. It might have nothing to do with DrivePool itself, but rather Windows trying to read the drive.

Normally, my DrivePool does not remeasure my pool (NTFS HDDs) upon reboot. I have had a few instances when one, or both, of my ProBoxes did not auto-reconnect upon reboot and the HDDs were listed as missing in DrivePool. After reconnecting the ProBoxes, DrivePool then would find the HDDs and start a remeasure of the pool. My DrivePool is currently ~65TB, and I estimate it takes ~45 minutes to remeasure. Is that a long time? I guess it depends. However, I am still able to use my system while DrivePool is remeasuring and I don't notice much performance hit. My DrivePool is mainly used as a media storage system so my demands on it are not too high.

I don't know what HDD monitoring program(s) you use, but StableBit Scanner is a good choice and may detect early warning signs of drive failure. I had a problem with a pool HDD which was detected by Hard Disk Sentinel, and that gave me time to move almost all files off the drive before it completely died. Hard Disk Sentinel has a free version that will monitor your drives, but the paid version has extra features such as advanced testing and repair.

SMART is a great feature on drives, but sometimes you get a failure that may not be reported in SMART. In those cases, I find the programs that actively test the drives are better at detecting problems.

If you are able to find the problem with your DrivePool slow startups/remeasuring, I hope you come back and let us know. If you already use StableBit Scanner, I would still encourage you to download the free version of Hard Disk Sentinel and let it run the basic diagnostics overnight on your pool drives to see if it can detect anything abnormal. It won't cost you anything but a little time, and it may detect something that you could fix. Good luck.
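As a very rough first check before running a full diagnostic, one could time a sequential read of a large file on each pooled disk and compare the numbers. The Python sketch below is only an illustration of that idea and is no substitute for a real surface scan; `read_throughput` is a made-up helper name, and the OS file cache can inflate the result if the file was read recently.

```python
import time


def read_throughput(path, chunk_size=1024 * 1024):
    """Read a file sequentially; return (bytes_read, seconds, mb_per_second).

    Hypothetical helper for illustration. A disk that is much slower
    than its peers on a comparable file may deserve a closer look.
    """
    start = time.perf_counter()
    bytes_read = 0
    # buffering=0 avoids Python-level buffering; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            bytes_read += len(chunk)
    elapsed = time.perf_counter() - start
    mb_per_second = (bytes_read / 1e6) / elapsed if elapsed > 0 else float("inf")
    return bytes_read, elapsed, mb_per_second
```

Pick a large file that has not been opened since the last reboot, one per disk, so the comparison reflects the disks rather than cached data.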


  • 0

What gtaus has suggested is good advice. In my case I did remove one questionable drive from the system (it hadn't shown much under S.M.A.R.T. or StableBit Scanner, but the Windows event logs showed occasional failed writes on the drive). Removing it brought some small improvements, but the remeasuring persisted, and it was switching from ReFS to NTFS that fixed the performance issues.

