
Measuring the entire pool at every boot?



  • 0
On 11/21/2018 at 8:23 PM, Christopher (Drashna) said:

It shouldn't be.  It should remember the settings.  

Unless one or more of the disks isn't reconnecting properly.

That said, it may be a good idea to run a CHKDSK pass on all of the pooled disks, and see if that helps. 

I ran chkdsk /f /r /x on all drives in the pool. It didn't find anything bad. However, Pool Organization now keeps halting with warnings of "mismatching file parts", and DrivePool asks whether I want it to automatically delete the older file parts or let me manually delete the incorrect ones. What surprises me is that DrivePool can't tell which of the file versions is broken, since so far every file listed in the halted error's details clearly has only one correct version, namely the one with a correct CRC32. All were old RAR files; apparently some copies on some disk were not OK, because I could not even open them. I just found that some older archives may have had attempts at virus-planting in them from a really old event I don't even remember. DrivePool doesn't recognize it as such, though. I'll try a system-wide archive scan using ClamWin and see if that fixes the issue.
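For anyone who wants to do the same comparison by hand, here is a minimal Python sketch of checking each copy of a file against a known-good CRC32. It assumes you already know the expected checksum (for example from the archive listing or an .sfv file) and where the copies sit on the individual pool disks; the function names and the example paths are hypothetical and not part of DrivePool.

```python
import zlib


def crc32_of(path, chunk_size=1 << 20):
    """Return the CRC32 of a file as an 8-digit lowercase hex string."""
    crc = 0
    with open(path, "rb") as handle:
        # Read in chunks so large archives do not have to fit in memory.
        while chunk := handle.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08x}"


def compare_copies(copies, expected_crc):
    """Checksum every copy and list the ones that do not match expected_crc."""
    checksums = {path: crc32_of(path) for path in copies}
    mismatched = [p for p, crc in checksums.items() if crc != expected_crc.lower()]
    return checksums, mismatched
```

Usage would look something like `compare_copies([r"D:\copy1\old.rar", r"E:\copy2\old.rar"], "3610a686")`, where both paths and the checksum are placeholders. A CRC32 match only shows that a copy agrees with the reference value; it says nothing about whether the archive content itself is clean.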

  • 0

Basically you have file corruption. If you have not had any system crashes that would have caused a write error, and your HDDs are fine, the cause is one of the following:

Bit rot: the sector on one of the drives had not been accessed in a long time, resulting in file corruption (very unlikely).

Memory error (if you're not running ECC RAM, this is very likely).

I would recommend doing some tests on your memory subsystem to ensure it is not causing file corruption.

  • 0
On 11/30/2018 at 4:57 PM, zeroibis said:

Basically you have file corruption. If you have not had any system crashes that would have caused a write error, and your HDDs are fine, the cause is one of the following:

Bit rot: the sector on one of the drives had not been accessed in a long time, resulting in file corruption (very unlikely).

Memory error (if you're not running ECC RAM, this is very likely).

I would recommend doing some tests on your memory subsystem to ensure it is not causing file corruption.

If it were file corruption, scandisk would have solved it. It didn't find any corruption. Memory/RAM is all fine.

On closer inspection, the files that DrivePool had trouble with could be moved within the pool, but not deleted (not manually either, and not from any pool member outside of StableBit control). Really strange. They were old files (pre-2000), from really old HDD sources that I copied to the pool long ago. Apparently, according to Microsoft, they had a permission order problem. None of the solutions MS offered solved it, but using LockHunter on the folder they were in did fix it! Finally. I had many files with this exact issue, and LockHunter cured the pool for me. Cute little tool, and free!

  • 0
2 hours ago, Julius said:

If it were file corruption, scandisk would have solved it. It didn't find any corruption. Memory/RAM is all fine.

On closer inspection, the files that DrivePool had trouble with could be moved within the pool, but not deleted (not manually either, and not from any pool member outside of StableBit control). Really strange. They were old files (pre-2000), from really old HDD sources that I copied to the pool long ago. Apparently, according to Microsoft, they had a permission order problem. None of the solutions MS offered solved it, but using LockHunter on the folder they were in did fix it! Finally. I had many files with this exact issue, and LockHunter cured the pool for me. Cute little tool, and free!

Wow, very interesting. I have had a problem like this long ago but not in DrivePool. Same thing, some ancient files from the 90s with a permissions bug. Will definitely take note of LockHunter if I ever run into that problem again.

Thanks for sharing what fixed it!

  • 0
On 11/14/2018 at 11:47 AM, Julius said:

It seems like awful overkill that StableBit DrivePool does 'measuring' whenever I reboot the OS. Can't it just continue where it left off and recall its last state? Is there a config switch I've overlooked where I can tell it not to rescan all files on all drives at boot time?

I still feel it's doing the "Measuring..." way too soon and too often. Just now I accidentally removed a drive, put it back in immediately, and now it has decided to measure the entire pool (32 terabytes of data) again. This is not good for the drives: a lot of extra wear and tear, a lot of extra reads (and probably writes as well), and a waste of time and resources. Perhaps it would be a good idea to make the trigger for "Measuring..." user configurable? To me this comes across as serious overkill; one pool member being briefly disconnected does not warrant yet another scan of the entire pool. There were not even any reads or writes going on in the pool when the disconnection occurred.

  • 0
On 12/27/2018 at 10:30 PM, Christopher (Drashna) said:

If you pulled the disk, and it showed up as missing, and then you replaced the disk, then this is normal, expected and desirable.   There is no way to know for certain that nothing was changed when the disk was removed without rechecking it.

 

I understand that it needs to at least roughly check *that* disk, but it checks the entire pool again (reads all disks). That really seems overkill. It's 2019; aren't we at a level (of AI) where software can grasp that if one disk was removed and was disconnected for less than, say, 5 minutes, it should wonder whether that was intended, and if not, just decide it wasn't? And then only (prompt to) check what changed on *that* disk alone?

  • 0
On 1/31/2019 at 10:39 PM, Christopher (Drashna) said:

Maybe, but you're assuming that the data on the disk that was removed is still identical to the data on the pool.  BOTH copies need to be checked, and specifically, checked against each other.

 

That still doesn't explain why the entire pool needs to be checked. I disconnected only a small part of the pool, and I'm assuming the pooling software knows what is (or was) on that part, the disk that got temporarily removed for a few seconds; otherwise using a drive pool would not make any sense. So the question remains: why does it have to actually *read* the entire pool of data from disks that don't even hold any of the data that was on the removed disk?
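To make the question concrete, here is a rough Python sketch of the narrower check being asked for. It assumes an index mapping each pooled file to the disks that hold a copy of it; nothing in this thread confirms that DrivePool keeps such an index, and `file_locations` and `disks_to_recheck` are hypothetical names used only for illustration.

```python
def disks_to_recheck(file_locations, returned_disk):
    """Return the disks that share at least one file with returned_disk.

    file_locations maps a file path to the set of disks holding a copy.
    The returned disk itself is always included in the result.
    """
    affected = {returned_disk}
    for disks in file_locations.values():
        if returned_disk in disks:
            # Duplicates of this file live on these disks, so the copies
            # would need to be compared against each other.
            affected.update(disks)
    return affected
```

With files spread evenly and duplication enabled, the result can easily grow to cover most of the pool, which may be part of why a full pass ends up being the default.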

  • 0

Hi, I am experiencing the same issue - measuring and checking after every boot. 

Tried this to fix possible permission problems

On 12/11/2018 at 1:21 AM, Christopher (Drashna) said:

Wow, nice digging! 

And sorry for not getting back here sooner! 

Also, for the permissions, this should work too:
http://wiki.covecube.com/StableBit_DrivePool_Q5510455

but was not successful so far. The system shuts down properly, and after every reboot, and also when waking up from hibernation, DrivePool re-measures and checks. During this time, the system is quite slow and of limited use.

Is there an option to disable this in drivepool directly?

Thanks a lot!

  • 0
14 hours ago, Mathemagier said:

Hi, I am experiencing the same issue - measuring and checking after every boot. 

Tried this to fix possible permission problems

but was not successful so far. The system shuts down properly, and after every reboot, and also when waking up from hibernation, DrivePool re-measures and checks. During this time, the system is quite slow and of limited use.

Is there an option to disable this in drivepool directly?

Thanks a lot!

Did you open a ticket already? 

 

And are these drives USB drives?

  • 0
4 hours ago, laurooon said:

Hi,

I have the same issue. My drives are USB drives, but this should not matter. Each time, the pool is first "measured" and then "checked". It takes almost 35 minutes in my case. I have 3x16TB disks; can you please help me?

Could you open a ticket at https://stablebit.com/Contact?

  • 0

Is there any way to disable this measuring process, or at least make it optional?

 

I have one drive that is a bit slow on the uptake, and half the time it's "missing" after a reboot.  Simply pulling the tray and re-inserting it fixes the issue 100% of the time - I don't know if it's the drive, the tray, or the backplane.  Scanner shows no issues with the drive, and I'm happy to simply pull and reinsert the tray once or twice a month when it happens.

 

What I'm less content with is the 4+ hours worth of measuring and checking that drivepool goes through every time this happens.  I have 34 disks adding up to over 140TB in one big pool, and remeasuring all of that every other reboot just plain isn't necessary.  It's one thing if the server crashed or restarted for unknown reasons, but if the reboot was just for regularly scheduled updates, then the measuring process is just unnecessarily slowing things down substantially for hours on end - not to mention putting unnecessary wear on every drive in the system.

 

Could you add an option to disable automatic remeasuring, or at least throw a dialog window when a missing disk comes back online to ask whether a remeasure should be performed?  

  • 0

That probably wouldn't help.  I'm often not physically at the server when I do a reboot, and it could be anywhere from 5 minutes to a couple hours before I get around to checking the pool and reseating the drive if necessary.

 

But just for reference, I rebooted the server 4.5 hours ago for a Windows security update, and it's only at 73% on the check process.  It's quite time consuming when you have a large collection of disks, and performance suffers significantly until it's complete.

  • 0

The only current way I can think of to disable the automatic remeasuring would be to disable the DrivePool service itself (or set its start type to manual), which would also disable background balancing, background duplication, GUI access, etc., but existing pools should still be readable and writable? Then just enable/start it when you get around to checking it.
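As a rough sketch of that workaround, the Python below builds the standard Windows `sc` commands for switching a service to manual start and stopping it. The service name `DrivePoolService` is an assumption, so check the real name in services.msc or with `sc query` before running anything; the helper names are hypothetical, and the commands need an elevated prompt.

```python
import subprocess

# Assumption: confirm the actual service name on your system first.
SERVICE_NAME = "DrivePoolService"


def manual_start_commands(service=SERVICE_NAME):
    """Commands that set the service to manual start and stop it now."""
    return [
        # "start=" and its value are separate arguments for sc.exe.
        ["sc", "config", service, "start=", "demand"],
        ["sc", "stop", service],
    ]


def resume_commands(service=SERVICE_NAME):
    """Command that starts the service again when you are ready."""
    return [["sc", "start", service]]


def run_commands(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
```

Calling `run_commands(manual_start_commands())` with the default `dry_run=True` only prints what would be executed, which makes it easy to review before committing to the change.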

Otherwise perhaps submit a feature request, whether for a GUI option or at least a "please add a DrivePool_AutoMeasure (default:true) option in the config file to control whether DrivePool automatically remeasures a pool when one of its drives is temporarily offline or fails to connect on boot"?

  • 0

I usually have my DrivePool server running 24/7. However, if I reboot my system on a normal shutdown and restart, DrivePool does not have to perform the long, complete, re-measure of the pool. If I have a missing disk, or if there was some problem affecting the drives before the reboot, then DrivePool does re-measure everything. On my 76 TB pool, with 18 USB 3.0 drives, that can take about 2 hours to complete. However, I can still use my DrivePool home media server and the re-measure task does not prevent me from normal use of most of my files.

I am all for anything that improves the performance of DrivePool, of course, but at the moment I am just happy that DrivePool will re-check the complete pool if it suspects any changes to file integrity.

If there was some way that DrivePool could quickly check a drive for "unauthorized" changes, that would help speed up the pool re-measure task by skipping over those drives that have not changed. For example, if I have one of my 18 USB HDDs go offline, for whatever reason, DrivePool will re-measure the entire pool even though the other 17 drives have not been changed. I don't know if a running hash code could be saved for each drive, and then just compare drive hash codes after reboots/reattaching missing drives for changes, skipping over drives where hash codes match and only re-measure drives where hash codes do not match.
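A minimal Python sketch of that per-drive hash idea follows, assuming a fingerprint built only from file metadata (relative path, size, modification time) is acceptable. This is not how DrivePool works as far as this thread shows; `drive_fingerprint` and `drives_needing_remeasure` are hypothetical names.

```python
import hashlib
import json
import os


def drive_fingerprint(root):
    """Hash the relative path, size and mtime of every file under root."""
    digest = hashlib.sha256()
    for folder, subdirs, files in os.walk(root):
        subdirs.sort()  # keep the traversal order deterministic
        for name in sorted(files):
            full = os.path.join(folder, name)
            info = os.stat(full)
            record = (os.path.relpath(full, root), info.st_size, info.st_mtime_ns)
            digest.update(json.dumps(record).encode("utf-8"))
    return digest.hexdigest()


def drives_needing_remeasure(roots, saved):
    """Return the roots whose current fingerprint differs from the saved one."""
    return [root for root in roots if saved.get(root) != drive_fingerprint(root)]
```

This only reads directory metadata rather than file contents, so it would be much cheaper than a full pass, but it cannot notice a change that leaves size and timestamp untouched, and it does not compare duplicated copies against each other.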

I don't want to complain too much, however, because I came from a Windows Storage Spaces environment where a missing disk could permanently corrupt the entire pool, despite having 1- or 2-disk failure tolerance enabled. I have had at least 3 HDDs fail in my DrivePool in the past year, and never once did I lose my entire pool. At this point, I am more than happy to let DrivePool re-check my pool for a couple of hours, as compared to trying to manually recover my Storage Spaces for weeks and weeks before giving up everything as a total loss.

  • 0
On 3/14/2021 at 1:48 AM, gtaus said:

I usually have my DrivePool server running 24/7. However, if I reboot my system on a normal shutdown and restart, DrivePool does not have to perform the long, complete, re-measure of the pool. If I have a missing disk, or if there was some problem affecting the drives before the reboot, then DrivePool does re-measure everything. On my 76 TB pool, with 18 USB 3.0 drives, that can take about 2 hours to complete. However, I can still use my DrivePool home media server and the re-measure task does not prevent me from normal use of most of my files.

That would be normal, actually. 

As for "unauthorized" changes, the issue is how that would be tracked.  It basically leads into the same issue.  And it's not just that it's checking the one disk, but it needs to compare the duplicated contents to make sure they match, too.
