Covecube Inc.
  • 1

Bad drive causing BSOD


denywinarto

Question

About 2 days ago I installed a 1TB HDD which I thought was still in good condition, because HDD Sentinel and StableBit Scanner didn't report anything strange.

Last night I think my server rebooted a couple of times and some drivers got corrupted.

Thankfully I make daily backups, so I restored it.

But then just a few hours ago it rebooted again, and that 1TB HDD became "RAW" and offline.

I also remember that when removing a bad drive from the pool last year I got a BSOD once or twice.

So is this a known issue? I'm using the latest build.


21 answers to this question


  • 0

I'd say it's a "known" issue in the sense that drives can go bad in various ways, and some of those ways aren't necessarily handled well by the various hardware and software layers involved (e.g. drive controller, controller driver, kernel file system driver, etc). If any one of those throws a tantrum in response, there's only so much DrivePool can do about it.


  • 0
10 minutes ago, Shane said:

I'd say it's a "known" issue in the sense that drives can go bad in various ways, and some of those ways aren't necessarily handled well by the various hardware and software layers involved (e.g. drive controller, controller driver, kernel file system driver, etc). Any one of those throws a tantrum in response, there's only so much DrivePool can do about it.

So you're saying it's more of an OS issue?

Hmm, I'm using Server 2019 and an LSI 9300 8e.

Is there any recommended workaround?

I mean, eventually all my drives are going to go bad someday; I'm just wondering if there's a way to prevent that from screwing up my OS installation.

Edited by denywinarto

  • 0

Buying decent hardware (not always easy/obvious I admit), updating firmware and software (OS/drivers/apps) to the latest "stable" or "production" version (e.g. avoiding anything marked as "unstable" or "beta" or "alpha" where possible) and keeping regular backups.


  • 0

I am having the same problem removing a drive.  I am running it on Server 2016 OS. It blue screens about 45% through the removal process. I have a new drive that I want to replace it with. I wonder if I could copy the files from the existing drive to the new drive and swap them.  Will that work?


  • 0
2 hours ago, muaddib said:

I am having the same problem removing a drive.  I am running it on Server 2016 OS. It blue screens about 45% through the removal process. I have a new drive that I want to replace it with. I wonder if I could copy the files from the existing drive to the new drive and swap them.  Will that work?

1. Manually move all files out of the PoolPart.xxx folder on the bad disk so that DrivePool can't see them any more.
2. Remove the old disk from DrivePool (this completes instantly since the PoolPart folder is empty).
3. Insert the new disk and add it to the pool (a new, different PoolPart.xxx folder is created).
4. Manually copy the files from the old disk to the new PoolPart folder on the new disk.

I found that by doing this I didn't get a BSOD, unlike when removing the disk using DP.
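Step 1 of the list above could be scripted roughly as follows. This is only a minimal sketch under assumptions: the function name `evacuate_poolpart` and the staging folder name `evacuated` are made up for illustration, and it assumes the disk has exactly one hidden `PoolPart.*` folder at its root. It is not an official DrivePool tool, and on a failing disk every extra operation is a risk.

```python
import shutil
from pathlib import Path


def evacuate_poolpart(disk_root: Path, staging_name: str = "evacuated") -> Path:
    """Move everything inside the disk's PoolPart.* folder into a sibling folder.

    Both names are illustrative assumptions; adjust them to your own setup.
    """
    # Find the pool folder at the root of the disk (name starts with "PoolPart.").
    poolparts = [
        p for p in disk_root.iterdir()
        if p.is_dir() and p.name.lower().startswith("poolpart.")
    ]
    if len(poolparts) != 1:
        raise RuntimeError(
            f"expected exactly one PoolPart.* folder, found {len(poolparts)}"
        )

    # Staging folder lives on the same disk, outside the pool folder.
    staging = disk_root / staging_name
    staging.mkdir(exist_ok=True)

    # Moving within one volume is a rename, so file data is not rewritten.
    for entry in poolparts[0].iterdir():
        shutil.move(str(entry), str(staging / entry.name))
    return staging
```

You would call it with the root of the bad disk, e.g. `evacuate_poolpart(Path("E:/"))`, where `E:` is a placeholder drive letter. Afterwards the PoolPart folder is empty, which is what makes the removal in step 2 finish immediately.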

On 10/9/2020 at 10:27 PM, Shane said:

Buying decent hardware (not always easy/obvious I admit), updating firmware and software (OS/drivers/apps) to the latest "stable" or "production" version (e.g. avoiding anything marked as "unstable" or "beta" or "alpha" where possible) and keeping regular backups.

The above is why I think the BSOD is related to DrivePool. What I'm worried about is that if one of my drives goes bad, it will just throw random BSODs and screw up my OS.


  • 0
1 hour ago, denywinarto said:

1. Manually move all files to the outside of the poolpart.xxx folder on the bad disk so that drivepool can't see them any more.
2. Remove the old disk from DrivePool (it will be instantly complete since the poolpart folder is empty).

3. Insert new disk (new different PoolPart.xxx is created).
4. Manually copy the files from the old disk to the new poolpart folder on the new disk.

I found out by doing this i didn't get BSOD compared to moving the disk using DP

Above is why i think the BSOD is related to drivepool, what i'm worried about if one my drives go bad then it would just throw random BSOD and screw up my OS

Which service do I shut down?


  • 0

Actually, if you have duplication then this procedure is, IMHO, incomplete. When you move files out of the poolpart.* folder, DP may remeasure and suddenly find duplicates are missing and re-duplicate.

If you have duplication, you can just power off the machine, physically remove the drive, reboot and then remove the missing/faulty drive from the Pool through the GUI. Remeasure/re-duplicate and it is done.

If you do not have duplication, then I would:

1. Attach the new drive.
2. Stop the DrivePool service (so DP will not interfere).
3. Copy the data from the faulty HDD's poolpart.* folder to the new drive (to ensure you do not perform any unnecessary writes on the faulty drive).
4. Power off, physically remove the faulty drive, and reboot.
5. Start the DrivePool service if necessary.
6. Remove the faulty drive from the Pool through the GUI.
7. Add the new drive to the Pool.
8. Stop the service.
9. Move the contents on the new drive to the poolpart.* folder on that drive.
10. Restart the DrivePool service and remeasure.

And it is done. I think.
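For the copy off the faulty drive in the no-duplication case above, something along these lines might help. It is a rough sketch, not anything DrivePool provides: `rescue_copy` is a hypothetical helper that only reads from the source, skips files it cannot read, and reports them so one bad sector does not abort the whole copy.

```python
import shutil
from pathlib import Path


def rescue_copy(src_root: Path, dst_root: Path) -> list[tuple[Path, str]]:
    """Copy src_root into dst_root, skipping unreadable files.

    Returns a list of (source path, error message) for everything that failed.
    The source is only read, never written to.
    """
    failures: list[tuple[Path, str]] = []
    for src in src_root.rglob("*"):
        dst = dst_root / src.relative_to(src_root)
        try:
            if src.is_dir():
                dst.mkdir(parents=True, exist_ok=True)
            else:
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)  # copies data plus timestamps
        except OSError as exc:
            # Record the failure and carry on with the remaining files.
            failures.append((src, str(exc)))
    return failures
```

A call would look like `rescue_copy(Path("E:/PoolPart.xxx"), Path("F:/rescued"))`, with both paths being placeholders for your own drives. Anything in the returned list would need to come from backup instead.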


  • 0

Well, I've got another case now with StableBit Scanner. This time it's worse: it's 3x4TB WD Blacks in Storage Spaces. They became RAW after a BSOD.

I'm pretty sure it's Scanner, because the BSOD occurs as soon as it starts scanning after installation, and the Windows reliability page backs this up.

(I installed Scanner on 6.36 and input the serial about half an hour later.)

It's not DrivePool, because the drives aren't in the pool. This is on a newly installed Server 2019; I just reinstalled it yesterday.

I immediately uninstalled Scanner.

What should I do now?

I need Scanner to warn me of bad drives so I can replace them before they fail.

But I have two dozen drives; if a single bad drive + Scanner = BSOD, then that means two dozen reliability holes.

 

Edit: I should have added, the reason I reinstalled my machine was because I was having multiple BSODs every day.

Now I suspect my recent midnight BSODs were caused by Scanner scanning at midnight.

1. A new installation BSOD'ed right after I installed Scanner and it was scanning.

2. The BSOD always happens at midnight. I checked Task Scheduler and nothing is run at midnight. Seeing Scanner's description of "always scans at midnight" reminded me of this.

While I know the bad drives are at fault, I think Scanner could do a better job of preventing the Windows BSOD trigger.

BSOD.PNG

BSOD2.PNG


  • 0

Keep in mind that Stablebit is not some huge company, it's a small business. Current circumstances (COVID et al) might not be helping either.

Can you avoid using Storage Spaces? I've never heard good things about that "feature" outside the marketing hype, only horror stories about people discovering that it seems great until if (when) something goes wrong, then it tends to go BADLY wrong.

Could be a bad drive (WD Blacks are usually great drives but I've had one go bad so don't rule them out), could be a bad controller. Try using Disk Settings in Scanner to turn off surface scanning for all drives, then turn it back only for a single drive each night to discover if it's a specific drive (or drive cable, or drive caddy, or controller port) at fault, then after that turn on an additional drive each night (1 drive, 2 drives, 3, 4, etc) to discover if it's a controller/driver/system load-related issue. Isolating a hardware fault can be a pain in the backside, I know from experience.
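To keep track of which drives to enable on which night, the two-phase test described above can be written out as a simple schedule. This is just an illustrative sketch: `isolation_schedule` and the drive labels are made up, and Scanner itself still has to be configured by hand in Disk Settings each night.

```python
def isolation_schedule(drives):
    """Return one list of drives to surface-scan per night.

    Phase 1: each drive alone, to spot a specific bad drive, cable, or port.
    Phase 2: growing sets of drives, to spot controller or load-related issues.
    """
    drives = list(drives)
    nights = [[d] for d in drives]                               # phase 1
    nights += [drives[:n] for n in range(2, len(drives) + 1)]    # phase 2
    return nights


if __name__ == "__main__":
    # Hypothetical labels; use whatever names identify your own drives.
    for night, enabled in enumerate(isolation_schedule(["WD-1", "WD-2", "WD-3"]), 1):
        print(f"Night {night}: scan only {', '.join(enabled)}")
```

If a BSOD shows up on a single-drive night, that drive (or its cable, caddy, or port) is the suspect; if it only shows up once several drives are scanned together, the controller or driver looks more likely.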

Edit: regarding "While i know the bad drives are the fault, i think scanner could do a better job preventing windows BSOD trigger", the short story is that's not possible - a BSOD is the OS kernel equivalent of an Emergency Stop button on factory machinery. If something is wrong with the machinery and someone (Windows) decides to press the button, by design nothing is supposed to stop the machinery from being halted.

 


  • 0
10 hours ago, Shane said:

Keep in mind that Stablebit is not some huge company, it's a small business. Current circumstances (COVID et al) might not be helping either.

Can you avoid using Storage Spaces? I've never heard good things about that "feature" outside the marketing hype, only horror stories about people discovering that it seems great until if (when) something goes wrong, then it tends to go BADLY wrong.

Could be a bad drive (WD Blacks are usually great drives but I've had one go bad so don't rule them out), could be a bad controller. Try using Disk Settings in Scanner to turn off surface scanning for all drives, then turn it back only for a single drive each night to discover if it's a specific drive (or drive cable, or drive caddy, or controller port) at fault, then after that turn on an additional drive each night (1 drive, 2 drives, 3, 4, etc) to discover if it's a controller/driver/system load-related issue. Isolating a hardware fault can be a pain in the backside, I know from experience.

Edit: regarding "While i know the bad drives are the fault, i think scanner could do a better job preventing windows BSOD trigger", the short story is that's not possible - a BSOD is the OS kernel equivalent of an Emergency Stop button on factory machinery. If something is wrong with the machinery and someone (Windows) decides to press the button, by design nothing is supposed to stop the machinery from being halted.

 

 

Did you read my post about the chronological order of the BSODs?

Windows Server 2019 + corrupted Storage Spaces MBR = no BSOD; Windows simply gives the warning "this drive needs to be scanned blablabla".

Windows Server 2019 + corrupted Storage Spaces MBR + StableBit Scanner running = BSOD, right after inserting the license code, when Scanner starts running. As you can see in the reliability pic in my last post.

The fix for Scanner is quite simple: the trigger is a bad MBR sector on a Storage Spaces hard disk, so simply prevent it from scanning once it detects a corrupted SS drive.

I know this is the StableBit forum, but without customer criticism a product won't get any better.

The 3 drives are now fine in SS after chkdsk, without StableBit Scanner of course.


  • 0

Yes, I read the post.

If your drives are now "fine", they should be able to cope with being scanned. If they still can't cope, then you have bigger problems than a damaged MBR (and based on my own experiences, a damaged MBR is a symptom of something else having gone wrong).

I'd also be changing away from Storage Spaces anyway, because collapsing into RAW because its drives got poked by a diagnostic utility seems a terrible failure mode for a storage system.

"3 days and not a single response? Not even my ticket gets responded"

3 days isn't a long time for a small programming business with just a few employees (and two-ish volunteer forum mods). For comparison, Dell has hundreds (thousands?) of service reps and earlier this year Dell took four months to finally find and send me a backpack I'd paid for. However, since Alex and Christopher are (in my experience) usually quite prompt in responding to fault tickets, it may be that something has happened (e.g. COVID, though I hope not). If you haven't had any response to your ticket after a full week, let me know via a direct Message and I'll see what I can find out.

P.S. If this was happening to me, I'd be wanting to know if I had a bad drive and I'd be trying to find out exactly which drive(s) trigger the BSOD when scanned - that's why I suggested turning Scanner off for all but a single drive at a time.


  • 0
3 hours ago, Shane said:

Yes, I read the post.

If your drives are now "fine", they should be able to cope with being scanned. If they still can't cope, then you have bigger problems than a damaged MBR (and based on my own experiences, a damaged MBR is a symptom of something else having gone wrong).

I'd also be changing away from Storage Spaces anyway, because collapsing into RAW because its drives got poked by a diagnostic utility seems a terrible failure mode for a storage system.

"3 days and not a single response? Not even my ticket gets responded"

3 days isn't a long time for a small programming business with just a few employees (and two-ish volunteer forum mods). For comparison, Dell has hundreds (thousands?) of service reps and earlier this year Dell took four months to finally find and send me a backpack I'd paid for. However, since Alex and Christoper are (in my experience) usually quite prompt in responding to fault tickets, it may be that something has happened (e.g. COVID, though I hope not). If you haven't had any response to your ticket after a full week, let me know via a direct Message and I'll see what I can find out.

P.S. If this was happening to me, I'd be wanting to know if I had a bad drive and I'd be trying to find out exactly which drive(s) trigger the BSOD when scanned - that's why I suggested turning Scanner off for all but a single drive at a time.

Okay, I'm not expecting a fix in 3 days or anything like that.

Just a word acknowledging this issue and a confirmation that a future version will include a fix would be nice.

In my opinion this should be regarded as a critical issue, because it's affecting OS stability.

About Storage Spaces:

Well, I need 12TB with SSD-like read speed. With the current price of SSDs I can't find any other alternative which offers a similar performance-to-size ratio,

and I'd rather stick with what I have now. Maybe once SSD prices go down I'll reconsider it.

I plan to make an OS backup, run Scanner, and turn scanning off for the Storage Spaces drives. If anything goes wrong I can just revert to the backup, but hopefully I won't need to do that.

 

 

 


  • 0

Bad news. This is what I did:

Made a backup.

Installed the 3x4TB WDs in the rack. (In my other spare machine with the same motherboard and a cloned OS they had no issue.)

The SS disks were offline, so I thought I'd exclude them from Scanner's surface test list first.

Installed StableBit Scanner and excluded them from scanning.

They do appear in Scanner as "Microsoft storage space drive", IIRC.

All of a sudden, BSOD; I hadn't even re-initialized the disks yet.

So I unplugged the 3x4TB.

Still got a BSOD.

So I restored the backup from before the Scanner installation; thankfully it appears to have been stable since.

So the conclusion is that excluding them from Scanner's list doesn't work.

I'm going to check the SS drives on my spare machine again, but last time they were repaired and there were no issues.


  • 0

If I'm understanding correctly, it was BSOD'ing without it even running a scan?

Perhaps try ticking the "Do not use Direct I/O when querying S.M.A.R.T." in Disk Settings for all drives (maybe even "Do not query S.M.A.R.T.") and see if that's stable?

Hmm. Are you running any particular antivirus on the machine? E.g. Avast or Kaspersky? Sometimes those get overzealous.

Otherwise I'm out of ideas.


  • 0

Just Windows Defender.

I have ticked all the options, and I'm pretty sure that one as well.

You're right about SS; it can be a real PITA.

But I don't see any other option at this point that could stripe 3x4TB drives. DrivePool doesn't seem to be capable of it.

Even an 8TB SSD still costs $800; 3x4TB WD Blacks are not even half the price, not to mention I'd have to sell these at a lower price.

I suspect I will have to rebuild the Storage Spaces pool from scratch,

which makes sense somehow, because these drives were migrated from a different motherboard (but they were working fine for 3 months after the migration).

I haven't actually tested Storage Spaces running without StableBit Scanner on my main server,

just on my spare server with the exact same motherboard and OS (it was fine for 3 days with the Scanner service stopped).

It's too risky, since I'm quite sure it messes up the OS.

I just bought another HDD for the backup; I'll try recreating Storage Spaces from scratch once it arrives.


  • 0
On 10/21/2020 at 9:42 PM, denywinarto said:

Anyone know a decent alternative? (To Stablebit Scanner)

I had some problems with a HDD that Stablebit Scanner did not detect. I found a very useful program called Hard Disk Sentinel which is like Scanner, but also different. Hard Disk Sentinel comes in a free version, which is what I am using, and a paid version with extra features. The free version of Hard Disk Sentinel will check/test your HDDs and give you an estimate of their Health Status, which is a %. For example, most of my HDDs have a Health Status of 95-100%, but the failing drive tested out and reported an estimated 3% Health Status with a warning to immediately move all data to another drive. It correctly reported that the drive was failing, whereas Stablebit Scanner, for whatever reason, did not flag it. The paid version of Hard Disk Sentinel has extra features which allow you to "fix" some HDD problems and restore the health of the drive towards 100%. Essentially, from what I understand, it scans the HDD for bad sectors and blocks them off from use, so they are no longer used to store files and cannot corrupt your data. In my case, the bad HDD was totally failing and it could not have been fixed even with the paid version.


  • 0
17 hours ago, gtaus said:

I had some problems with a HDD that Stablebit Scanner did not detect. I found a very useful program called Hard Disk Sentinel which is like Scanner, but also different. Hard Disk Sentinel comes in both a free version, which is what I am using, but also a paid version with extra features. The free version of Hard Disk Sentinel will check/test your HDDs and give you an estimate of their Health Status, which is a %. For example, most of my HDDs have a Health Status of 95-100%, but the failing drive tested out and reported an estimated 3% Heath Status with a warning to immediately move all data to another drive. It correctly reported that the drive was failing, whereas, Stablebit Scanner, for whatever reason, did not flag it. The paid version of Hard Disk Sentinel has extra features which allow your to "fix" some HDD problems and restore the health of the drive towards 100%. Essentially, from what I understand, it scans the HDD for bad sectors, blocks them off from use, so they no longer are used to store files and corrupt your data. In my case, the bad HDD was totally failing and it could not have been fixed even with the paid version.

Thanks for sharing your experience; that's also what I've been using together with StableBit Scanner.

My only concern is that it lacks the daily scanning that Scanner has, or at least I haven't figured out how to do it, since I'm used to relying on Scanner.

Sentinel also has an estimated lifetime, but I found that it isn't always accurate either. One of my 6TB WD Reds died before the estimated time.

But then again, my experience with Scanner wasn't much better; IIRC it only warned me of a SMART error less than 24h before the drive died.

For me now, whenever a drive shows a filesystem error multiple times, I replace it right away.

Edit: I forgot to add, some of my drives were failing even at 100%, so it's not a guarantee.


  • 0
On 10/31/2020 at 4:21 AM, denywinarto said:

Thanks for sharing your experience, thats also what i've been using together with stablebit scanner.

My only concern is that it lacks daily scanning scanner has, or at least i havent figured out how to do it since i'm used to rely to scanner.

Sentinel also has this estimated lifetime, but i found out it wasn't always accurate as well. One of my 6tb red wd died before the estimated time. 

But then again my experience with scanner wasn't much better, iirc it only warned me a smart error less than 24h before the drive died. 

For me now whenever a drive shows filesystem error multiple times im replacing it right away. 

 

Edit : i forgot to add, some of my drives were failing even at 100%, so its not a guarantee.

Definitely, no guarantees with any of these software monitoring programs, but they each have some features that may help. Too bad that HDDs don't gradually "wear out" and show signs of problems long before they completely crap out. In my experience, most of my serious problems have been with drives that just suddenly die, for whatever reason, and I really had no prior notification. However, the monitoring programs have saved me a small number of times and automatically moved data off a failing HDD before it completely died. Replacement still seems to be the best option whenever you notice any problems.
