Covecube Inc.
  • 0
Lurifax

Newly created DP seemingly causes BSOD when exposed to intense read/write

Question

Hi all, 

 

I just started evaluating StableBit DP after having read in many places that it was superior to Storage Spaces.

 

I'm equally new to this forum and not very familiar with how to properly write these posts or what kind of info is relevant to include up front. So I'll give it a try, and hopefully I can supply any other relevant info on request.

 

Situation and problem:

 

I created a DP on my recently built Win8.1 machine.

The pool consists of two 4TB WD Red. Brand new.

OS runs on an Intel 120GB SSD. Brand new.

I have started loading data on to the pool from external drives. This has worked fine.

I can open documents and view images. That works fine.

 

But when I try to open movies it halts for a sec right after the software launches and then I get the BSOD.

Same thing happens when I try to create backups of drives/computers. After only a couple of seconds of writing it's the BSOD.

So there seems to be a correlation between intense read/write activity and the BSOD.

The message in the BSOD is along the lines of "kernel mode exception not handled".
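For anyone who wants to check whether sustained I/O on its own triggers the crash, here is a minimal sketch of a load test. It is only an illustration, not a StableBit tool: the function name `stress_io` and the drive letter in the usage note are made up for this example. It writes random data to a temporary file, reads it back, and compares SHA-256 hashes. Note that the read-back may be served from the OS cache rather than the disks, so a clean run does not prove much. Only run it when a sudden reboot would be acceptable.

```python
import hashlib
import os
import tempfile


def stress_io(directory, total_bytes=64 * 1024 * 1024, chunk_size=1024 * 1024):
    """Write random data to a temp file in `directory`, read it back,
    and return True if the SHA-256 of the data read matches what was written."""
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory, suffix=".stress")
    try:
        with os.fdopen(fd, "wb") as f:
            remaining = total_bytes
            while remaining > 0:
                block = os.urandom(min(chunk_size, remaining))
                f.write(block)
                written.update(block)
                remaining -= len(block)
            # Push the data out of Python's and the OS's write buffers.
            f.flush()
            os.fsync(f.fileno())

        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk_size), b""):
                read_back.update(block)
        return written.hexdigest() == read_back.hexdigest()
    finally:
        # Clean up the test file (it stays behind if the machine crashes first).
        os.remove(path)
```

Usage would be something like `stress_io("P:\\", total_bytes=2 * 1024**3)`, where `P:` is a hypothetical drive letter for the pool. Running the same call against one of the underlying disks gives a comparison point.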

 

I've obviously googled it a little, but only found that it seems to be a very generic type of message.

I have also installed StableBit Scanner after this occurred the first time. It shows no indications of any hardware-related issues.

 

So now I'm turning to you for help. I couldn't find any other topic on this, please forgive me if I missed something already covering this. In either case I'd be very happy to be pointed in the right direction for how to solve it.

 

 

Thanks a bunch in advance!

 


21 answers to this question


  • 0

First, BSODs are never a good thing. Ever.

So, could you upload the dump from the system to us, so we can take a look at it?

http://wiki.covecube.com/StableBit_DrivePool_System_Crashes

 

Also, during the BSOD, it will sometimes indicate the file that caused the BSOD. If it's CoveFS.sys, then that's definitely us. And we will definitely want to get that fixed.

 

 

Also, what version of DrivePool are you using?

  • 0

Hi, and thanks for the swift reply.

 

I have now uploaded the MEMORY.DMP to the indicated destination.

To make it a bit easier for you to connect the file with this thread I put this in the description: Regarding forum Topic: "Newly created DP seemingly causes BSOD when exposed to intense read/write" /Lurifax

 

To your questions.

- No, there's nothing written about, or pointing to, any particular file in the BSOD. Aside from Msft's comforting phrases there isn't any other text that helps in understanding the problem. The error is merely described with the words "kernel mode exception not handled".

- Version of DrivePool is 2.0.0.420 x86

 

Another thing probably worth mentioning. When I unhide the drive pool root folders, on the actual hard drives, and launch a movie from the folder structure there - then it plays just fine.

 

 

I will happily provide any other info relevant for you to investigate this.

 

Thank you!

  • 0

Lurifax,

 

Thank you for uploading that.

As for the issue.... I really can't tell, as I don't have access to any of the diagnostic tools right now.... so Alex will have to look into this. (I usually do a brief check to make sure it's not something else, to help prevent Alex from getting overloaded, but I'm not at home, so....)

 

For reference:

https://stablebit.com/Admin/IssueAnalysis/2157

  • 0

Lurifax,

 

Thanks for sending in that memory dump. I've analyzed it and unfortunately it indeed was a bug in build 420. It only affects x86 hosts, which is why you are seeing it.

 

To resolve the issue you can download the latest public BETA right here:

http://stablebit.com/DrivePool/Download

 

Or you can get the very latest internal BETA here:

http://wiki.covecube.com/Downloads (these contain the very latest fixes as people report them)

 

I'm going to try to get 2.1 out as a release final sooner rather than later, so the fix will be in that as well. However, we have a few other important issues to address before that happens.

 

IAR: https://stablebit.com/Admin/IssueAnalysis/2157

  • 0

Hello again,

 

Thanks for the instructions and pointers. I have now tried both alternatives, installing first the latest public BETA (StableBit.DrivePool_2.1.0.432_x86_BETA.exe)

and then the internal one (StableBit.DrivePool_2.1.0.486_x86_BETA.exe).

 

Unfortunately this hasn't led to any change in behavior. I still get the BSOD in the same cases as described earlier.

 

 

To proceed, would a new memory dump from this latest installed version help?

Or do you perhaps have any other suggestions or workarounds based on previous findings?

 

Please let me know how I can be of any assistance.

 

 

Best regards

/L

  • 0

You're still experiencing this in 2.1.0.486?

If so, then yes, please upload the additional dumps. 

 

The issue that caused the first dump was fixed in theory, but either it wasn't, or there is another issue here. So more dumps mean we can see better what's going on.

 

Also, if you haven't already, could you run a memory test, just in case it's not specifically an issue with DrivePool?

  • 0

I don't mean to bump a slightly older thread, but I am having this exact same issue with my drive pool as well! It appears that whenever the disks are under heavy reads/writes, it will BSOD and reboot my server. I've checked the dumps to see what's causing this, and they say it's either covefs.sys or ntoskrnl. I thought this was memory related at first, but I've gone through many different sets of RAM now with the same results.

 

I'm running.... 

 

Windows Server 2012 R2 Essentials

Dual Xeon E5530's

8GB DDR3 ram

Mostly onboard SATA for the pooled drives

SSD for OS connected to a StarTech SATA card

5 pooled drives total.

 

And I know my PSU is up to the task. 

 

I'm on the latest StableBit DP which is the 2.1 final release. 

 

If you need any more info from me, let me know!

  • 0

@bb12489,

 

Well, CoveFS.sys would definitely be us.....

 

 

Could you upload the memory dumps?

http://wiki.covecube.com/StableBit_DrivePool_System_Crashes

 

Make sure you post a link to the forum URL.

 

Also, please run a memory test on the system.

http://wiki.covecube.com/StableBit_DrivePool_Q537229

 

 

Also, do you have any antivirus installed (such as Avast)? Or disk imaging tools?

  • 0

I don't run any antivirus on my server, and I've run memtest on all the sticks. They all appear to be fine. I'll definitely upload the dumps when I get a chance. I'm currently up in the Adirondacks camping lol.

 

Most of the time the BSOD is caused by ntoskrnl, but accompanied by covefs.sys, or so says WhoCrashed.

  • 0

WhoCrashed is a good utility for identifying which module was involved in a crash, but not for figuring out specifically what is going wrong.

 

So, I do believe that CoveFS (the DrivePool driver) is definitely involved here, but to what extent is hard to tell. 

 

 

And no worries. Enjoy your trip. When you get back, or get a chance, please upload the dumps (and the minidumps too, if you don't mind) and we'll try to see what exactly is going on.

  • 0

Will do. I should be able to get them uploaded tonight. I've been checking on my server and it's been up for over 2 days now with no BSOD. But it's been pretty much idle in terms of disk activity.

  • 0

Awesome. Thanks!

 

Hopefully, we can get this sorted out and fixed quickly for you. Once you get those dumps uploaded, that is.

Ok, memory dump is uploading! It's about 500 MB worth. I made a note of my forum username in the description. I also have the minidumps from my server if you need those too. Let me know what else I can do to help.

  • 0

Got it. 

 

A quick look at the dump indicates memory corruption, actually:

 

BugCheck 3B, {c0000005, fffff8005b2ecac6, ffffd00021d46170, 0}
 
Probably caused by : memory_corruption
 
Followup: memory_corruption
 
If this is the case, then two things:
  1. upload the minidumps as well. Just in case. (select them all, right click, send to -> compressed folder).
  2. run a memory test (memory diagnostics) on the system to ensure that the memory modules you're using are 100%.
    http://wiki.covecube.com/StableBit_DrivePool_Q537229

     

I've also flagged the dump for Alex, so that he'll take an in depth look at it.
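For readers unfamiliar with the debugger output quoted above, here is a rough sketch of how that BugCheck line breaks down. The lookup tables are deliberately tiny and only cover codes that come up in this thread: 0x3B is SYSTEM_SERVICE_EXCEPTION, 0x1E is KMODE_EXCEPTION_NOT_HANDLED (the message from the first post), and for 0x3B the first parameter is the exception code, here 0xC0000005, an access violation. The function names are made up for this example.

```python
import re

# Only the codes relevant to this thread; not a complete list.
BUGCHECK_NAMES = {
    0x1E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x3B: "SYSTEM_SERVICE_EXCEPTION",
}
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def parse_bugcheck(line):
    """Split a 'BugCheck XX, {a, b, c, d}' line into (code, [params])."""
    match = re.match(r"\s*BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line)
    if match is None:
        raise ValueError("not a BugCheck line: %r" % line)
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


def describe_bugcheck(line):
    """Return a short human-readable summary of a BugCheck line."""
    code, params = parse_bugcheck(line)
    text = "0x%X (%s)" % (code, BUGCHECK_NAMES.get(code, "unknown"))
    # For 0x3B the first parameter is the exception code.
    if code == 0x3B and params:
        status = NTSTATUS_NAMES.get(params[0], "unknown")
        text += ", exception 0x%08X (%s)" % (params[0], status)
    return text
```

Feeding it the line from the dump, `describe_bugcheck("BugCheck 3B, {c0000005, fffff8005b2ecac6, ffffd00021d46170, 0}")`, gives a summary naming SYSTEM_SERVICE_EXCEPTION and STATUS_ACCESS_VIOLATION. That only says what kind of fault occurred, not which component caused it.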

  • 0

 

 

Ok, I've uploaded the minidumps as well. One of them is dated the 3rd of July, when I know the covefs.sys BSOD happened. I hope that sheds more light on this. I'm starting my memory check now and will report back when it's finished. Thanks again for looking into this.
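In case it helps anyone else doing this, the "send to compressed folder" step can also be scripted. This is just a sketch with a made-up function name, assuming the minidumps are in the usual `C:\Windows\Minidump` folder and that the script has permission to read them (it may need to run elevated).

```python
import zipfile
from pathlib import Path


def zip_minidumps(dump_dir, archive_path):
    """Pack every *.dmp file found directly in dump_dir into a ZIP archive.

    Returns the list of file names that were added, sorted by name.
    """
    dumps = sorted(Path(dump_dir).glob("*.dmp"))
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for dump in dumps:
            # Store only the file name, not the full path, inside the archive.
            archive.write(dump, arcname=dump.name)
    return [dump.name for dump in dumps]
```

For example, `zip_minidumps(r"C:\Windows\Minidump", r"C:\Temp\minidumps.zip")` would produce a single archive to upload; the output path here is only an example.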

  • 0

Alex has looked at the dumps. It definitely looks like a memory related issue here:

https://stablebit.com/Admin/IssueAnalysis/6255

Just to report back: I believe I have solved my BSODs so far. Ever since I disabled the pool performance settings such as Network IO Boost, Read Striping, and Real Time Duplication, I have not had a single BSOD related to covefs.sys or ntoskrnl.exe. I don't currently have duplication enabled since I need more drives for it, so I thought having Read Striping enabled was kinda silly. Also, I was seeing some errors in the event log related to the network controller, so I thought maybe I should disable Network IO Boost as well. Obviously I've changed too many variables to know for sure which change helped, if any, but so far so good! The only BSOD I've had is ntfs.sys, so I'm running a check on some of my older drives at the moment.

  • 0

Well, I'd recommend re-enabling realtime duplication. Especially if you're writing a lot to the pool. Or keeping files open. This will prevent them from getting out of sync.

 

However, Network IO Boost does use more resources, and based on some of the stuff you've said, I suspect that this may be part of the issue here. 

 

Read Striping is definitely optional, but it's good for boosting performance. But again, it uses a bit more resources.

 

 

And if you haven't already, I would really recommend running an extended memory test of the system. (the extended option, and like 5 passes).

http://wiki.covecube.com/StableBit_DrivePool_Q537229

 

Also, I'd check to make sure the drivers are up to date, or see if anyone else is having issues with them. Especially network and disk drivers.

 

And you mention the ntfs.sys and older disks? If you have StableBit Scanner installed, you can run a burst test on the disks, which is a good way to troubleshoot issues with them.

  • 0

 

So I've run a memtest for 3+ hours now. No errors detected. I'm beginning to think it may be a driver issue with this particular board. Server 2012 R2 has all the drivers for this board built in, so there are no newer versions of drivers for it. I did receive an ntoskrnl.exe BSOD shortly after I replied that I'd had none..... Go figure lol. Also, all the hard disks have tested healthy. I think my only other option at this point is to re-install the OS. Maybe go with Server 2012 R2 Standard with the Essentials role instead of straight up 2012 R2 Essentials itself.

 

Aside from that, the only hardware issue I can think of is one of the two CPUs. I'll re-seat them and re-paste them just to see. Stranger things have happened, right? lol.

 

I do have the memory dumps from the last two BSODs, but I'd hate to take up your guys' time looking into them if it's not a problem with your software/drivers.

  • 0

I don't know if you looked at Alex's report yet or not. But basically, "not good" is about the best way to describe it:

 

 

MSDN suggests that this could be related to memory exhaustion or bad video drivers. I've checked the memory usage at the time of the crash and I don't see anything excessive, certainly nothing that should cause a BSOD. See below for full memory stats.

 

It sounds to me like there's something wrong with the RAM or hardware that interacts with the RAM.

 

 

That's why I recommended the extensive memory test. If that doesn't help, then I'd recommend updating the drivers to the latest version, if possible. Even if you have the generic drivers installed.

 

I don't think a reinstall will help though. Nor a reseating. But you never know. It may be a hardware issue (such as with the memory controller), and may require replacing the hardware.

