Server 2012R2 reboots under heavy drivepool traffic


charlieny100

Question

I have noticed that if I try to transfer a large volume of video files to or from the drive pool, my server will blue screen and reboot. Are there logs that might tell me what is causing this? I run my virtual machines outside the pool and occasionally move them around without the reboot problem.

23 answers to this question

  • 0

A quick look at the crash dump seems to indicate that the issue is with the sil3132.sys driver. That's the driver for the Silicon Image SiI3132 chipset.

 

I'm guessing that this happens when the heavy load is specifically occurring on/from the drives connected to this controller? 

You can verify that using the Performance counters, or even by using Task Manager's Performance tab.
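As a rough sketch of the performance-counter check: assuming you captured per-disk throughput with something like `typeperf "\PhysicalDisk(*)\Disk Bytes/sec" -sc 30 -o disks.csv`, a few lines of Python can rank the disks by average load. The function name, the sample host name, and the sample numbers below are made up for illustration; only the CSV layout (timestamp column first, one column per counter) follows typeperf's default output.

```python
import csv
import io
from statistics import mean

# Made-up sample in typeperf's CSV layout: timestamp first, one column per counter.
SAMPLE = r'''"(PDH-CSV 4.0)","\\SERVER\PhysicalDisk(0 C:)\Disk Bytes/sec","\\SERVER\PhysicalDisk(1 D:)\Disk Bytes/sec"
"03/12/2015 01:00:01.000","1000.0","90000000.0"
"03/12/2015 01:00:02.000","3000.0","110000000.0"
'''


def busiest_counters(typeperf_csv):
    """Return (counter name, average value) pairs, busiest first."""
    rows = [row for row in csv.reader(io.StringIO(typeperf_csv)) if row]
    if not rows:
        return []
    header, samples = rows[0], rows[1:]
    averages = []
    for col in range(1, len(header)):  # column 0 holds the timestamp
        values = []
        for row in samples:
            if len(row) != len(header):
                continue  # skip trailing status lines
            try:
                values.append(float(row[col]))
            except ValueError:
                continue  # blank or non-numeric sample
        if values:
            averages.append((header[col], mean(values)))
    return sorted(averages, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, avg in busiest_counters(SAMPLE):
        print(f"{name}: {avg / 1e6:.1f} MB/s")
```

If the disks attached to the suspect controller sit at the top of that list during a crash-prone transfer, that supports the theory that the load on that controller is what triggers the crash.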

 

Unfortunately, this is a common issue with that chipset (BSODs, "losing" the connection to the drives while under load, etc.), and you may want to just replace the card.

 

Regards

  • 0

I have a USB3 -> eSATA adapter that many people have highly recommended for external drive enclosures like mine. I'll give it a try. Do you know if I need to uninstall that driver, or can I just remove the eSATA card that uses it?

 

Thanks for looking at my issue.

  • 0

Well, that works. :)

If you do want recommendations on a good card, anything that uses the ASMedia ASM1061 chipset should be rock solid.

 

As for the Sil3132 card, as long as it's not being used, you probably could leave it in the system.

However, physically removing the card would "disable" the driver, since the hardware would no longer be present, and that would be the best option.

  • 0

Thanks for the suggestion. I connected my drive enclosure via my USB3 -> eSATA adapter and the server booted in seconds (it runs off of an SSD) instead of minutes. I guess the eSATA card's driver was affecting more than just the BSODs. I ordered a card with the above chipset and will test the difference between the adapter and the card. With the USB3 -> eSATA adapter I can transfer files on and off the pool at about 130 MB/s, which is OK for my needs. I'd imagine not having to go through the additional conversion will speed things up. I'll update this post so others can learn from my experience.

  • 0

Yeah, a lot of controller cards have their own management BIOS that slows down the boot process. Some don't though, which is nice.

If you were using the RAID firmware instead of the BASE, it would definitely do that.

 

And I'm glad that you're seeing much better speeds, though I'm not sure you'll see higher speeds from the controller card. However, I do believe that some people have had long-term stability issues with those adapters.

 

As for speed, I'm not sure, as that's pretty close to the top speeds you'll see out of a HDD.

 

Regards

  • 0

I am having the same problem.

 

I analyzed the crashdump file, and it tells me that covefs.sys is causing the crash.

 

This happens quite frequently, and it is not very consistent.

 

I just uploaded my MEMORY.zip file from here http://wiki.covecube.com/StableBit_DrivePool_System_Crashes

What version of Windows are you using and what version of DrivePool?

 

Also, do you have any antivirus installed on the system?

  • 0

Okay, thank you. 

 

I've flagged the memory dump for Alex, and he'll take a look at it.

 

In the meanwhile, could you please run a memory test just to be sure?

 

Do you have Network IO Boost or Read Striping enabled?

 

Also, are any of the disks in the system having issues?

And was there something specific that may have been happening before the BSOD?

 

 

IAR:

https://stablebit.com/Admin/IssueAnalysis/13504

  • 0

No antivirus, DrivePool version 2.1.1.561. OS is Windows Server 2012 R2.

If you want, you can use the above link to take a look at Alex's analysis of the issue.

It's rather technical though. But it boils down to "really weird issue".

 

 

Either way, could you download the latest beta build and see if that helps?

http://dl.covecube.com/DrivePoolWindows/beta/download/StableBit.DrivePool_2.2.0.598_x64_BETA.exe

This build should fix the BSOD problem that you're seeing. 

If it doesn't, that may indicate other hardware issues (running a memory test and a CPU stress test may be a good idea, just in case). In that case, please upload the new dumps.

  • 0

Can you provide release notes for this beta version?

 

I do have Network IO Boost and Read Striping enabled. But the BSOD only happens during heavy writing (correction: heavy reading and writing), and not during reading alone. It generally happens when all of my servers are doing backups, and doing file systems to the system.

  • 0

The changelog is always available in the folder along with the installers.

I just usually link the actual file for simplicity.

 

Here is the changelog:

http://dl.covecube.com/DrivePoolWindows/beta/download/changes.txt

And no, the changelog doesn't list the changes between individual builds, only between major builds (release, RC, public beta).

 

But you can get a full list of all the files and the changelog here:

http://dl.covecube.com/DrivePoolWindows/beta/download/

 

 

And after installing the updated beta build (598), are you still seeing the issues?

If so, please upload the new crash dumps.

  • 0

Tried the suggested beta, and CoveFS is still causing the system to blue screen and restart.

Here is my latest memory dump: https://dl.dropboxusercontent.com/u/48061/StableBit/DrivePool/2015-03-12_CloveFS_MEMORY.zip

 

 

Please let me know what Alex finds out, as this is very frustrating.

 

G

  • 0

I absolutely understand the frustration, and I'm sorry that it's still occurring.

 

However, have you run a memory test recently?

Sometimes, frequent BSODs are caused by memory issues.

 

Also, could you disable the Read Striping option (Pool Options -> Performance)?

 

 

Issue:

https://stablebit.com/Admin/IssueAnalysis/14522

  • 0

No RAM issues in any of the battery of tests I have run against it.

The server's RAM is registered ECC RAM, and it was stress tested when the server was built.

I have since run many other tests today, and still no errors have been found in the RAM.

 

G

  • 0

I'll add my two cents to this discussion. I thought that DrivePool was causing BSOD problems on my server. I found out I had two problems: a cheap Sil3132 card with bad drivers and a 3TB drive that was failing. The folks here helped me find my issue. I was a convert from using Windows drive spaces (or whatever it's called). When I experienced the same issue with that, I lost the entire volume and all the data on it. DrivePool is a much safer way to go. Since replacing the eSATA card and the drive, my setup has been rock solid, even through some power outages.

  • 0

Okay, I just wanted to make sure about the RAM.

 

 

A bad controller card could potentially be a cause here.

Using the "burst test" on StableBit Scanner would be a good way to test.

If you can stop using the pool in the meantime and run a burst test on one or more of the disks, and that causes a BSOD or system instability, then the controller you're using may be the culprit.

 

However, given the issue here, I suspect that this isn't the case, though it could be.

Have you run "sfc /scannow" on the system, to ensure the system integrity?

Also, have you run a virus scan on the system, just in case?

  • 0

I had another restart, and this time the memdump had something different in it.

Here is the Zipped Memdump.

https://dl.dropboxusercontent.com/u/48061/StableBit/DrivePool/2015-03-16%2001-31-00%20MEMORY.zip

  • 0

Greg,

 

(I'm assuming that's your name, I apologize if that's wrong).

 

Alex has taken a look at the dumps, and this issue doesn't look to be DrivePool related, directly.

 

The last dump you uploaded... it was pretty mangled. Something messed with the memory badly, which is definitely what caused the BSOD.

 

From that dump, though, it looks like you have PrimoCache loaded. 

I mention this because the PrimoCache app seems to be what was active at the time, and it may be the cause of the issue.

 

I would recommend uninstalling or disabling it for now, and seeing if that helps fix the issue.

  • 0

PrimoCache is installed, but it wasn't running at the time of the dump.

 

I will try some of your other suggestions for the burst test on the Scanner.

 

I will report back with what I find.

 

G
