Covecube Inc.
  • 1
denywinarto

Bad drive causing BSOD

Question

About 2 days ago I installed a 1 TB HDD which I thought was still in good condition, because HDD Sentinel and StableBit Scanner didn't report anything strange.

Last night I think my server rebooted a couple of times and some drivers got corrupted.

Thankfully I make daily backups, so I restored from one.

But then just a few hours ago it rebooted again, and that 1 TB HDD became "RAW" and offline.

I also remember that when moving a bad drive out of the pool last year I got a BSOD once or twice.

So is this a known issue? I'm using the latest build.


15 answers to this question


  • 0

I'd say it's a "known" issue in the sense that drives can go bad in various ways, and some of those ways aren't necessarily handled well by the various hardware and software layers involved (e.g. drive controller, controller driver, kernel file system driver, etc). If any one of those throws a tantrum in response, there's only so much DrivePool can do about it.

  • 0
Posted (edited)
10 minutes ago, Shane said:

I'd say it's a "known" issue in the sense that drives can go bad in various ways, and some of those ways aren't necessarily handled well by the various hardware and software layers involved (e.g. drive controller, controller driver, kernel file system driver, etc). Any one of those throws a tantrum in response, there's only so much DrivePool can do about it.

So you're saying it's more of an OS issue?

Hmm, I'm using Server 2019 and an LSI 9300-8e.

Is there any recommended workaround?

I mean, eventually all my drives are going to go bad someday; I'm just wondering if there's a way to prevent that from screwing up my OS installation.

Edited by denywinarto

  • 0

Buying decent hardware (not always easy/obvious I admit), updating firmware and software (OS/drivers/apps) to the latest "stable" or "production" version (e.g. avoiding anything marked as "unstable" or "beta" or "alpha" where possible) and keeping regular backups.

  • 0

I am having the same problem removing a drive.  I am running it on Server 2016 OS. It blue screens about 45% through the removal process. I have a new drive that I want to replace it with. I wonder if I could copy the files from the existing drive to the new drive and swap them.  Will that work?

  • 0
2 hours ago, muaddib said:

I am having the same problem removing a drive.  I am running it on Server 2016 OS. It blue screens about 45% through the removal process. I have a new drive that I want to replace it with. I wonder if I could copy the files from the existing drive to the new drive and swap them.  Will that work?

1. Manually move all files out of the PoolPart.xxx folder on the bad disk so that DrivePool can't see them any more.
2. Remove the old disk from DrivePool (removal will complete instantly since the PoolPart folder is empty).
3. Insert the new disk (a new, different PoolPart.xxx folder is created).
4. Manually copy the files from the old disk into the new PoolPart folder on the new disk.

I found that by doing this I didn't get a BSOD, unlike when removing the disk using DP.
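Step 1 above can be scripted if there are many folders to move. This is only a minimal Python sketch, not a StableBit tool: the function name `evacuate_poolpart` and the example paths are hypothetical, and it assumes the holding folder is on the same volume as the PoolPart folder, so that each move is a rename rather than a copy.

```python
import shutil
from pathlib import Path


def evacuate_poolpart(poolpart_dir, holding_dir):
    """Move every top-level entry out of a PoolPart folder into holding_dir.

    Both paths are hypothetical examples; substitute the real hidden
    PoolPart.xxx folder on the disk being removed. Returns the new paths.
    """
    src = Path(poolpart_dir)
    dst = Path(holding_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materialises the listing before anything is moved.
    for entry in sorted(src.iterdir()):
        target = dst / entry.name
        if target.exists():
            # Never overwrite: leave the entry in place for a human to check.
            continue
        # On the same volume this is a rename, so no file data is rewritten.
        shutil.move(str(entry), str(target))
        moved.append(target)
    return moved
```

For example, `evacuate_poolpart(r"E:\PoolPart.xxx", r"E:\evacuated")` would empty the pool folder on drive E: (the drive letter is an assumption), after which the disk can be removed from the pool as in step 2.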

On 10/9/2020 at 10:27 PM, Shane said:

Buying decent hardware (not always easy/obvious I admit), updating firmware and software (OS/drivers/apps) to the latest "stable" or "production" version (e.g. avoiding anything marked as "unstable" or "beta" or "alpha" where possible) and keeping regular backups.

Above is why I think the BSOD is related to DrivePool. What I'm worried about is that if one of my drives goes bad, it will just throw random BSODs and screw up my OS.

  • 0
1 hour ago, denywinarto said:

1. Manually move all files to the outside of the poolpart.xxx folder on the bad disk so that drivepool can't see them any more.
2. Remove the old disk from DrivePool (it will be instantly complete since the poolpart folder is empty).

3. Insert new disk (new different PoolPart.xxx is created).
4. Manually copy the files from the old disk to the new poolpart folder on the new disk.

I found out by doing this i didn't get BSOD compared to moving the disk using DP

Above is why i think the BSOD is related to drivepool, what i'm worried about if one my drives go bad then it would just throw random BSOD and screw up my OS

Which service do I shut down?

  • 0

Actually, if you have duplication then this procedure is, IMHO, incomplete. When you move files out of the poolpart.* folder, DP may remeasure and suddenly find duplicates are missing and re-duplicate.

If you have duplication, you can just power off the machine, physically remove the drive, reboot and then remove the missing/faulty drive from the Pool through the GUI. Remeasure/re-duplicate and it is done.

If you do not have duplication, then I would:

1. Attach the new drive.
2. Stop the DrivePool service (so DP will not interfere).
3. Copy the data from the faulty HDD's poolpart.* folder to the new drive (to ensure you do not perform any unnecessary writes on the faulty drive).
4. Power off and physically remove the faulty drive.
5. Reboot, and start the DrivePool service if necessary.
6. Remove the faulty drive from the Pool through the GUI.
7. Add the new drive to the Pool.
8. Stop the service.
9. Move the copied contents on the new drive into the poolpart.* folder on that drive.
10. Restart the DrivePool service and remeasure.

And it is done. I think.
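The copy from the faulty drive in the procedure above is the step most likely to hit unreadable files. As a rough Python sketch (the function name `copy_tree_tolerant` and the paths are hypothetical, and this is not part of DrivePool), the copy can be written so that it only ever reads from the faulty drive and records files that fail instead of aborting:

```python
import shutil
from pathlib import Path


def copy_tree_tolerant(src_root, dst_root):
    """Copy src_root into dst_root, skipping files that cannot be read.

    Nothing is written to the source. Returns (copied, failed), where
    copied holds relative paths and failed holds (relative path, error).
    """
    src_root = Path(src_root)
    dst_root = Path(dst_root)
    copied, failed = [], []
    for src in sorted(src_root.rglob("*")):
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
            continue
        try:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copies data plus timestamps
            copied.append(rel)
        except OSError as exc:
            # Record the failure and carry on with the remaining files.
            failed.append((rel, str(exc)))
    return copied, failed
```

A call such as `copy_tree_tolerant(r"E:\PoolPart.xxx", r"F:\rescued")` (both paths are assumptions) returns the list of files that could not be read, which can then be restored from backup. A script like this cannot prevent a BSOD raised lower down in the driver stack; it only avoids stopping at the first error that does get reported.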

  • 0

Well, I've got another case now, with StableBit Scanner. This time it's worse: it's 3x 4 TB WD Blacks in Storage Spaces. It became RAW after a BSOD.

I'm pretty sure it's Scanner, because as soon as it starts scanning after installation the BSOD occurs. And the Windows reliability page backs this up.

(I installed Scanner at 6.36 and entered the serial about half an hour later.)

It's not DrivePool, because the drives aren't in the pool. This is on a newly installed Server 2019; I just reinstalled it yesterday.

I immediately uninstalled Scanner.

What should I do now?

I need Scanner to warn me of bad drives so I can replace them before they fail.

But I have two dozen drives; if a single bad drive + Scanner = BSOD, then that means two dozen reliability holes.

 

Edit: I should have added that the reason I reinstalled my machine was because I was having multiple BSODs every day.

Now I suspect my recent midnight BSODs were caused by Scanner scanning at midnight.

1. A new installation BSOD'ed right after I installed Scanner and it started scanning.

2. The BSOD always happens at midnight. I checked Task Scheduler and nothing runs at midnight. Seeing Scanner's description of "always scans at midnight" reminded me of this.

While i know the bad drives are the fault, i think scanner could do a better job preventing windows BSOD trigger

BSOD.PNG

BSOD2.PNG

  • 0

Update: thankfully CHKDSK saved the day. It said the MBR was corrupted, but it was able to repair it.

So now I can probably use my Storage Spaces again.

But should I use Scanner or not now? Any word from Chris / Alex or the devs?

I can't risk another BSOD.

  • 0

Keep in mind that Stablebit is not some huge company, it's a small business. Current circumstances (COVID et al) might not be helping either.

Can you avoid using Storage Spaces? I've never heard good things about that "feature" outside the marketing hype, only horror stories about people discovering that it seems great until if (when) something goes wrong, then it tends to go BADLY wrong.

Could be a bad drive (WD Blacks are usually great drives but I've had one go bad so don't rule them out), could be a bad controller. Try using Disk Settings in Scanner to turn off surface scanning for all drives, then turn it back only for a single drive each night to discover if it's a specific drive (or drive cable, or drive caddy, or controller port) at fault, then after that turn on an additional drive each night (1 drive, 2 drives, 3, 4, etc) to discover if it's a controller/driver/system load-related issue. Isolating a hardware fault can be a pain in the backside, I know from experience.
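The isolation procedure described above can be written down as a nightly plan before starting. This is only a planning sketch in Python: the function name `isolation_schedule` and the drive labels are hypothetical, and the surface-scan setting itself still has to be toggled by hand in Scanner's Disk Settings each night.

```python
def isolation_schedule(drives):
    """Return a list of nights; each night lists the drives to surface-scan."""
    drives = list(drives)
    # Phase 1: one drive per night, to catch a specific bad drive,
    # cable, caddy or controller port.
    singles = [[d] for d in drives]
    # Phase 2: one additional drive per night, to expose faults that only
    # appear under controller, driver or system load.
    cumulative = [drives[:n] for n in range(2, len(drives) + 1)]
    return singles + cumulative


if __name__ == "__main__":
    plan = isolation_schedule(["Disk1", "Disk2", "Disk3"])
    for night, enabled in enumerate(plan, start=1):
        print(f"Night {night}: scan {', '.join(enabled)}")
```

If a BSOD occurs on a phase 1 night, the single drive enabled that night (or its cable, caddy or port) is the suspect; if it only occurs in phase 2, load on the controller or driver is more likely.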

Edit: regarding "While i know the bad drives are the fault, i think scanner could do a better job preventing windows BSOD trigger", the short story is that's not possible - a BSOD is the OS kernel equivalent of an Emergency Stop button on factory machinery. If something is wrong with the machinery and someone (Windows) decides to press the button, by design nothing is supposed to stop the machinery from being halted.

 

  • 0
10 hours ago, Shane said:

Keep in mind that Stablebit is not some huge company, it's a small business. Current circumstances (COVID et al) might not be helping either.

Can you avoid using Storage Spaces? I've never heard good things about that "feature" outside the marketing hype, only horror stories about people discovering that it seems great until if (when) something goes wrong, then it tends to go BADLY wrong.

Could be a bad drive (WD Blacks are usually great drives but I've had one go bad so don't rule them out), could be a bad controller. Try using Disk Settings in Scanner to turn off surface scanning for all drives, then turn it back only for a single drive each night to discover if it's a specific drive (or drive cable, or drive caddy, or controller port) at fault, then after that turn on an additional drive each night (1 drive, 2 drives, 3, 4, etc) to discover if it's a controller/driver/system load-related issue. Isolating a hardware fault can be a pain in the backside, I know from experience.

Edit: regarding "While i know the bad drives are the fault, i think scanner could do a better job preventing windows BSOD trigger", the short story is that's not possible - a BSOD is the OS kernel equivalent of an Emergency Stop button on factory machinery. If something is wrong with the machinery and someone (Windows) decides to press the button, by design nothing is supposed to stop the machinery from being halted.

 

 

Did you read my post about the chronological order of the BSODs?

Windows Server 2019 + corrupted Storage Spaces MBR = no BSOD; Windows simply gives the warning "this drive needs to be scanned blablabla".

Windows Server 2019 + corrupted Storage Spaces MBR + StableBit Scanner running = BSOD, right after entering the license code and Scanner starts running, as you can see in the reliability pic in my last post.

The fix for Scanner is quite simple: the trigger is a bad MBR sector on a Storage Spaces hard disk, so simply prevent it from scanning once it detects a corrupted SS drive.

 

I know this is the StableBit forum, but without customer criticism a product won't get any better.

The 3 drives are now fine in SS after chkdsk, without StableBit Scanner of course.

  • 0

Yes, I read the post.

If your drives are now "fine", they should be able to cope with being scanned. If they still can't cope, then you have bigger problems than a damaged MBR (and based on my own experiences, a damaged MBR is a symptom of something else having gone wrong).

I'd also be changing away from Storage Spaces anyway, because collapsing into RAW because its drives got poked by a diagnostic utility seems a terrible failure mode for a storage system.

"3 days and not a single response? Not even my ticket gets responded"

3 days isn't a long time for a small programming business with just a few employees (and two-ish volunteer forum mods). For comparison, Dell has hundreds (thousands?) of service reps and earlier this year Dell took four months to finally find and send me a backpack I'd paid for. However, since Alex and Christopher are (in my experience) usually quite prompt in responding to fault tickets, it may be that something has happened (e.g. COVID, though I hope not). If you haven't had any response to your ticket after a full week, let me know via a direct message and I'll see what I can find out.

P.S. If this was happening to me, I'd be wanting to know if I had a bad drive and I'd be trying to find out exactly which drive(s) trigger the BSOD when scanned - that's why I suggested turning Scanner off for all but a single drive at a time.

  • 0
3 hours ago, Shane said:

Yes, I read the post.

If your drives are now "fine", they should be able to cope with being scanned. If they still can't cope, then you have bigger problems than a damaged MBR (and based on my own experiences, a damaged MBR is a symptom of something else having gone wrong).

I'd also be changing away from Storage Spaces anyway, because collapsing into RAW because its drives got poked by a diagnostic utility seems a terrible failure mode for a storage system.

"3 days and not a single response? Not even my ticket gets responded"

3 days isn't a long time for a small programming business with just a few employees (and two-ish volunteer forum mods). For comparison, Dell has hundreds (thousands?) of service reps and earlier this year Dell took four months to finally find and send me a backpack I'd paid for. However, since Alex and Christoper are (in my experience) usually quite prompt in responding to fault tickets, it may be that something has happened (e.g. COVID, though I hope not). If you haven't had any response to your ticket after a full week, let me know via a direct Message and I'll see what I can find out.

P.S. If this was happening to me, I'd be wanting to know if I had a bad drive and I'd be trying to find out exactly which drive(s) trigger the BSOD when scanned - that's why I suggested turning Scanner off for all but a single drive at a time.

Okay, I'm not expecting a fix in 3 days or anything like that.

Just a word acknowledging this issue and confirmation that a future version will include a fix would be nice.

In my opinion this should be regarded as a critical issue because it's affecting OS stability.

About Storage Spaces:

Well, I need 12 TB with SSD-like read speed. With the current price of SSDs I can't find any alternative which offers a similar performance-to-size ratio, and I'd rather stick with what I have now. Maybe once SSD prices go down I'll reconsider.

I plan to make an OS backup, then run Scanner with scanning turned off for the Storage Spaces drives. If anything goes wrong I can just revert to the backup, but hopefully I won't need to do that.

 

 

 
