Covecube Inc.
kihimcarr

DrivePool UI freezes Dashboard when swapping drives between bays

Question

The DrivePool UI froze the WS2012e dashboard when I swapped 2 drives to different bays in my case. I know DP is still running in the background because I received email notifications when the drives went missing and when they were reconnected and successfully polled by DrivePool. I can also read and write to certain folders in the pool. After trying to start the dashboard several times and using task manager to end the task, the dashboard and the DP UI finally came up but not before I had to answer this question:

 

post-500-0-37190700-1378580493_thumb.jpg

 

Once in the DP UI the drives still displayed as missing and Scanner displayed them as N/A. In Scanner I tried to view the disk properties of one of the drives and the Dashboard UI crashed. I relaunched it and got the following dialog:

 

post-500-0-62617700-1378580100_thumb.jpg

 

I then restarted the DP service and the drives now display correctly, but then I could not connect to the Scanner UI:

 

post-500-0-31194500-1378580656_thumb.jpg

 

The Scanner UI came back up after restarting the Scanner service.

 

I'd like to get your thoughts on this.


21 answers to this question


  • 0

I reset the settings and the exact same thing happened when I moved drives between bays. This time I inserted one drive at a time to make sure DP would pick up the reinserted drive. I was able to get DP to pick up the one drive by clicking "Identify" (pinging all drives for 1 minute). The other drives were not picked up at all. I tried restarting the DP service and got this warning:

 

drivepool_services_issue.jpg

 

After rebooting my server and opening the Dashboard I received this warning again:

 

drivepool_suspect_add-ins.jpg

 

Clicking continue launched the Dashboard successfully.

  • 0

Kihimcarr,

 

Okay, then something weird is definitely happening here.

I suspect that it may be the "Virtual disk server" not updating properly, but I'm not sure.

 

Would you mind grabbing the Error Reports and opening a ticket?

http://wiki.covecube.com/StableBit_DrivePool_2.x_Error_Reports

  • 0


There are no error reports to zip up. That particular folder is empty. This is weird.

  • 0

I can't say that I'm completely surprised by that.

 

The next time that happens, would you mind getting a memory dump of the system?

http://wiki.covecube.com/StableBit_DrivePool_System_Freeze

 

This will cause a BSOD (it's intentional), but just make sure that you run the "bugcheck off" entry after doing this.

  • 0

Any updates on this? I've submitted several dumps for your review. I figured out a workaround until this is fixed though.

 

My Server:

VMWare ESXi 5.1 U1, Dual Socket Opteron 6 Cores each, 32GB RAM, RAID 0 using 3 240GB SSD for guests, 64GB SSD for Host

Windows Server 2012 Essentials VM with 12 vCPU/8GB RAM

DrivePool 2.0.0.400/Scanner 2.5.0.2941 BETA

2 LSI SAS cards using PCI passthrough (SAS9201-16i and SAS9211-8i) with 20 drives attached

 

If you move a drive from one bay to another, the DrivePool and Scanner UI will freeze and the Dashboard will become unresponsive. To recover:

  1. After you have moved the HDD(s), open Disk Management and verify the drives are back online
  2. Open the Task Manager and kill the "DrivePoolService" and the "ScannerService (32bit)"
  3. Open the Services snap-in and start the "StableBit DrivePool Service" and "StableBit Scanner Service"
  4. All UI will be operational and drives will be available in both DrivePool and Scanner
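
For anyone who wants to script steps 2 and 3, here is a minimal Python sketch. It is not an official tool. It assumes the Scanner process image is Scanner.Service.exe (the name used elsewhere in this thread) and, by analogy, that the DrivePool image is DrivePool.Service.exe; that second name is a guess, so check the Details tab in Task Manager first. It also assumes `net start` will accept the service names shown in the Services snap-in.

```python
import subprocess

# Image names are assumptions: "Scanner.Service.exe" appears in this thread,
# "DrivePool.Service.exe" is guessed by analogy. Verify both in Task Manager.
PROCESS_IMAGES = ["DrivePool.Service.exe", "Scanner.Service.exe"]

# Service names as they are listed in the Services snap-in (step 3).
SERVICE_NAMES = ["StableBit DrivePool Service", "StableBit Scanner Service"]


def build_restart_commands(images=PROCESS_IMAGES, services=SERVICE_NAMES):
    """Return the argv lists for step 2 (kill) followed by step 3 (start)."""
    kills = [["taskkill", "/F", "/IM", image] for image in images]
    starts = [["net", "start", name] for name in services]
    return kills + starts


def run_commands(commands, runner=subprocess.run):
    """Run each command in order and return (argv, return code) pairs."""
    results = []
    for argv in commands:
        completed = runner(argv, capture_output=True, text=True)
        results.append((argv, completed.returncode))
    return results
```

Calling `run_commands(build_restart_commands())` from an elevated prompt would run all four commands. A non-zero return code from `taskkill` usually just means the process was not running.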

I hope this helps if anyone else is experiencing this issue.

  • 0

Kihim,

 

I've looked at the DrivePool dumps and it's pretty clear that the DrivePool service is waiting for a WMI query to return. The query is intended to synchronize disk metadata between the StableBit Scanner and StableBit DrivePool.

 

It looks like this:

 

Management scope: root\StableBit\Scanner

SELECT * FROM Disks WHERE DeviceIndex = "..."
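
To check from outside the service whether a query against that namespace hangs, something like the following Python sketch could be used. It shells out to PowerShell's `Get-WmiObject` and gives up after a timeout. The helper names are made up for illustration, and the WHERE clause is left out because the DeviceIndex value is not given here, so this asks for every row in Disks.

```python
import subprocess

# Namespace taken from the query above.
SCANNER_NAMESPACE = r"root\StableBit\Scanner"


def build_wmi_query_command(query="SELECT * FROM Disks"):
    """Build a PowerShell command line that runs a WQL query in the Scanner namespace.

    The query is placed inside single quotes, so it must not contain any itself.
    """
    script = f"Get-WmiObject -Namespace '{SCANNER_NAMESPACE}' -Query '{query}'"
    return ["powershell", "-NoProfile", "-Command", script]


def run_with_timeout(argv, timeout_seconds):
    """Return the command's stdout, or None if it did not finish in time."""
    try:
        done = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None
    return done.stdout
```

If `run_with_timeout(build_wmi_query_command(), 30)` returns None, the query did not come back within 30 seconds, which would match the symptom seen in the dump.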

 

I would need to see a memory dump of the Scanner.Service.exe when it is in this locked up state, in order to understand why it has locked up. Perhaps it's for the same reason, a stuck WMI query. The Scanner does query WMI whenever it detects a new disk, which would make sense in your case, since you're swapping disks.

  • 0

Alex, thanks for the response. Can it be set up to do a Direct I/O request instead of using WMI? I notice that there's a setting in the Scanner.Service.config file. In the meantime I will do a drive swap to freeze the UI and send you a dump of the Scanner service.

  • 0

I have confirmed that Scanner is the issue. I had to move some drives around this weekend to troubleshoot a controller issue and as long as I stopped the Scanner Service I experienced no Dashboard lock ups. DrivePool was able to re-identify the drive(s) fine.

  • 0


Yes, for some data.

 

You can set Smart_NoWmi to True (service .config setting) to never use WMI for SMART data. But the Scanner always uses WMI to enumerate the list of available disks.

 

I see your Scanner service dump and I'll take a look at it now to see why it's locked up.

  • 0

Oh no... It seems that you've taken a 64-bit dump of a 32-bit process (the Scanner is a 32-bit process). Unfortunately I can't open such a dump, it's simply not possible.

 

You will need to launch a 32-bit task manager in order to take a dump of the Scanner service:

C:\Windows\SysWOW64\taskmgr.exe

 

I'm sorry about that (sigh). I wish we could make this process simpler.

  • 0

I've looked at the dump, and it's clear that the Virtual Disk Service has stopped responding. VDS is a core component of Windows that enumerates information about all of the disks in the system.

 

In particular the following call is stuck: http://msdn.microsoft.com/en-us/library/windows/desktop/aa383021(v=vs.85).aspx

 

As a result, everything else eventually locks up as well. I'm not sure what I can do about this in my code. I can probably at some complexity put the whole update code in a new thread and then kill the entire thread if the VDS calls don't come back within some reasonable amount of time, but this is really counterproductive because nothing will be able to enumerate new disks in the system, which would make the Scanner quite useless.
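
The "run it on another thread and give up after a timeout" idea can be sketched generically as below. This is only an illustration in Python, not the Scanner's code: `slow_enumeration` is a hypothetical stand-in for a blocking enumeration call, and since Python cannot kill a thread, the sketch abandons the stuck worker instead of terminating it.

```python
import queue
import threading
import time


def call_with_timeout(func, timeout_seconds):
    """Run func() on a worker thread and return (finished, result).

    If func() has not returned within timeout_seconds, the worker is
    abandoned (not killed) and (False, None) is returned. The same happens
    if func() raises, because nothing is ever put on the queue.
    """
    results = queue.Queue(maxsize=1)

    def worker():
        results.put(func())

    # daemon=True so a worker that never returns cannot keep the process alive.
    threading.Thread(target=worker, daemon=True).start()
    try:
        return True, results.get(timeout=timeout_seconds)
    except queue.Empty:
        return False, None


def slow_enumeration():
    """Hypothetical stand-in for a disk enumeration call that blocks."""
    time.sleep(2)
    return ["disk0"]
```

With these definitions, `call_with_timeout(slow_enumeration, 0.2)` gives up and reports that the call did not finish. The caller stays responsive, but as noted above it still has no disk list to work with.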

 

Here's a little test utility that I wrote some time ago to enumerate disks using VDS. It will simply enumerate all the disks and print out the enumeration data to the screen. It will also listen for disk update events and print them to the console as they come in. You may want to try running it when the service locks up to confirm that it is indeed VDS. The utility should lock up as well.

 

Download: http://dl.covecube.com/VdsTest/VdsTest.exe

  • 0

Alex, I seem to have a new issue along with the Dashboard locking up. Every so often when I open the Dashboard and click the StableBit Scanner tab, the UI displays "Initializing" and never displays the drives. The only way to fix this is to kill the service in task manager and restart the service in the services snap-in. VDS must be still working because I used the tool above and the screen enumerated the disks to the command window. I am about to upload two dump files for the Scanner Service. I will use this thread as the description.

  • 0


Any word on the supplied dumps? I had the "Initializing" error happen again.

  • 0

Kihim,

 

Thanks for uploading the dumps. I've received dumps from 2 people with the same symptoms of stuck on "Initializing", but the issues do not appear to be related to each other. I've made some changes to the code based on both dumps to try and prevent both issues.

 

You can get the latest internal BETAs from here (currently at 2957):

http://dl.covecube.com/ScannerWhs2/beta/download/

http://dl.covecube.com/ScannerWindows/beta/download/

 

I haven't published these yet to stablebit.com because I'm collecting feedback as to whether the fix is effective.

 

Thanks,

 

Edit: Actually the .wssx of 2957 is still building, it will be up in a few minutes.

  • 0

Alex, thanks for doing this. I'll keep an eye out for the "initializing" anomaly. The Dashboard did freeze up on me again, however. Something strange happened though:

  1. I ran the VdsTest.exe and all of my drives were found
  2. I captured a dump of the Scanner Service
  3. I captured a dump of the Dashboard task and, lo and behold, the Dashboard unfroze! Imagine that?

 

Anyway, I am about to upload the dumps to see if maybe anything jumps out at you. Thanks again for your continued support!

