
Pool suddenly unstable


johnnj

Question

Hi,

For the past couple of months since migrating from Drive Bender to Drive Pool, my storage pool has been EXTREMELY stable.  Until now.

On the morning of the 4th of July, I discovered that my server had crashed. The event logs contained nothing except the entry saying that the machine had shut down unexpectedly.

After bringing it back up, I noticed that all of the custom names I had given each drive in Scanner had reverted to the drive hardware names.

I noticed that one drive was reporting unmovable sector/bad read SMART warnings, so I forced its removal from the pool and pulled it from the JBOD. Things seemed OK after that, but then the JBOD that disk had been in (a non-RAID 8-bay Mediasonic, connected at the time via USB3) kept dropping off. I took the drives out of it and split them up: some went into an old Sans Digital 5-bay RAID array (running in JBOD mode) attached to a Mediasonic ESATA card, and the other 3 went into a new 4-bay Mediasonic on the same card.

Since doing that, it was OK until, all of a sudden, it wasn't. It'll run happily for hours and then the pool will lock up. If I try to access the mount point, Explorer hangs and the DrivePool UI won't open. I can't restart the DrivePool service. After a reboot, it's fine again.

The only thing I'm really seeing in the event log is:

The IO operation at logical block address 0x21 for Disk 12 (PDO name: \Device\0000004c) was retried.

Disk 12 is the pool's disk number. In Scanner, the only errors I see are SMART warnings for an exceeded Load Cycle Count. However, when the drive names changed in Scanner, I also lost the surface scan and file system scan results for every drive.

I've added a new drive to the pool to replace the one with the sector errors, but duplication hasn't finished yet because the pool keeps hanging and I have to reboot to get it back.

When the pool is hung, the system otherwise seems mostly normal, except that Disk Management hangs during drive discovery. I can still access any non-pool drive.

In terms of server specs:

MB:  MSI Krayt Z370

OS: Server 2016 Essentials

JBOD1:  Mediasonic 8 bay via MB USB3

JBOD2:  Mediasonic 4 bay via MB USB3

JBOD3:  Sans Digital 5 bay TR5UT via ESATA to Mediasonic card

JBOD4:  Mediasonic 4 bay via ESATA to same card as above

Any ideas?  Please let me know what other info I can provide to help get to the bottom of this.

Thanks,

John

1 hour ago, johnnj said:

They did offer an RMA, but UPS Ground shipping was going to cost me almost $90, so it's going to the town dump instead. I have other arrays I'm using now and would rather put the money toward the SAS setup I'm planning to move to.

Ouch, sorry to hear that! :(

34 minutes ago, johnnj said:

The IO operation at logical block address 0x21 for Disk 12 (PDO name: \Device\0000004c) was retried.

This is normal.  LBA blocks don't exist on the pool, so querying for them will error out.  This won't affect the stability of the pool. 
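To tell whether a retried-I/O warning like the one quoted above is this harmless pool-disk noise or a problem on a physical member disk, you can compare the disk number in the message against the pool's disk number. The following is a minimal Python sketch. It assumes the message wording matches the text quoted in this thread exactly (other Windows versions or locales may differ) and that you look up the pool's disk number yourself, for example in Disk Management. The helper names `parse_retry_warning` and `is_pool_disk_noise` are hypothetical and are not part of any StableBit tool.

```python
import re
from typing import NamedTuple, Optional

# Assumption: the warning text matches the example quoted in this thread.
RETRY_PATTERN = re.compile(
    r"logical block address (0x[0-9A-Fa-f]+) for Disk (\d+) "
    r"\(PDO name: ([^)]+)\) was retried"
)


class RetryEvent(NamedTuple):
    lba: int        # logical block address, decoded from hex
    disk: int       # Windows disk number reported in the message
    pdo_name: str   # physical device object name, e.g. \Device\0000004c


def parse_retry_warning(message: str) -> Optional[RetryEvent]:
    """Extract the LBA, disk number, and PDO name, or return None."""
    match = RETRY_PATTERN.search(message)
    if match is None:
        return None
    lba_hex, disk, pdo = match.groups()
    return RetryEvent(lba=int(lba_hex, 16), disk=int(disk), pdo_name=pdo)


def is_pool_disk_noise(message: str, pool_disk_number: int) -> bool:
    """True if the warning refers to the pool's virtual disk.

    Per the reply above, retries reported against the pool disk are
    expected; retries against any other disk number may deserve a closer look.
    """
    event = parse_retry_warning(message)
    return event is not None and event.disk == pool_disk_number


if __name__ == "__main__":
    sample = (
        "The IO operation at logical block address 0x21 for Disk 12 "
        r"(PDO name: \Device\0000004c) was retried."
    )
    print(parse_retry_warning(sample))
    print(is_pool_disk_noise(sample, pool_disk_number=12))
```

Paste in the message text from Event Viewer. If `is_pool_disk_noise` returns `False`, the retry was reported against a different disk number, and that disk may be worth checking in Scanner.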

 

As for the rest, it sounds like the system disk experienced some corruption, which may have damaged some core files.

I would highly recommend running "CHKDSK /r" on all of the disks in the system, and then "sfc /scannow". These should identify and correct any issues.
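With many disks, you can script the checks suggested above instead of typing each command. The following is a minimal Python sketch. It assumes each disk has a drive letter and that the script runs from an elevated prompt on Windows. Only the `chkdsk /r` and `sfc /scannow` commands come from the advice above; `build_repair_commands` and `run_all` are hypothetical helpers. By default the script only prints the commands. Pass `--run` to execute them.

```python
import subprocess
import sys
from typing import List, Sequence


def build_repair_commands(volumes: Sequence[str]) -> List[List[str]]:
    """Return one CHKDSK /r command per volume, followed by one sfc /scannow.

    Assumption: every disk to check has a drive letter such as "D:".
    Volumes mounted only as folders would need their mount path instead.
    """
    commands = []
    for vol in volumes:
        letter = vol.rstrip("\\").upper()
        if not letter.endswith(":"):
            letter += ":"
        # /r locates bad sectors and recovers readable information (implies /f).
        commands.append(["chkdsk", letter, "/r"])
    # sfc checks protected Windows system files; run it once, after CHKDSK.
    commands.append(["sfc", "/scannow"])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run each command in turn (Windows only, elevated prompt required)."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        # stdin is inherited, so CHKDSK prompts (e.g. scheduling a check of
        # the system drive for the next reboot) can still be answered.
        subprocess.run(cmd, check=False)


if __name__ == "__main__":
    cmds = build_repair_commands(["C:", "d", "E:\\"])
    for c in cmds:
        print(" ".join(c))
    if sys.platform == "win32" and "--run" in sys.argv:
        run_all(cmds)
```

Running the commands one at a time keeps CHKDSK's prompts interactive. Note that `/r` reads every sector, so it can take many hours on large drives.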

If the stability issues persist, then enable file system logging in StableBit DrivePool:
http://wiki.covecube.com/StableBit_DrivePool_2.x_Log_Collection

Then run the StableBit Troubleshooter, and use 3723 as the Contact ID:

http://wiki.covecube.com/StableBit_Troubleshooter


Thanks, Christopher. I took your advice and started CHKDSK on all the drives that were originally in the array that kept disconnecting, and one of them locked up twice during the check. I was able to remove it from the pool, and it's now pulled from the array. I re-initialized the other 8 TB drive that I thought was the culprit and added it back to the pool. It's happily running duplication now.

Hopefully that drive was the problem, and now the pool will be back to its rock-solid old ways (knock wood).

Regarding Scanner, what causes it to keep losing the custom names on all the drives?

Thanks for your help...

John


Well, I'm sorry to hear about the bad drive, but physically removing it should hopefully fix the issues you've been having.

As for the custom names, make sure that you're on 2.5.3.3191. We've improved how the settings are stored, so that shouldn't happen anymore on that version and future versions.

Thanks again. It seems like the 8-bay Mediasonic also has something wrong with it. When I put the same drives (minus the one bad one) back in it, the instability returns: the pool locks up, and the server sometimes reboots during pool activity. I think the array itself may be what caused the problem with the disk in the first place. I re-initialized the disk with diskpart and am running CHKDSK on it on another machine. It's about halfway done, with no errors or lockups so far.

I really need to stop messing around with these cheesy jbods and do what you do with SAS and a multibay server case.  This box was only about 2 months old and it's out of service already.

1 hour ago, johnnj said:

I really need to stop messing around with these cheesy jbods and do what you do with SAS and a multibay server case.  This box was only about 2 months old and it's out of service already.

I think Mediasonic offers a 1 year warranty on all enclosures they sell.  That should hold for those sold through 3rd party sellers as well.  Unless of course, it was refurbished or sold used.

On 7/14/2018 at 11:24 PM, Jaga said:

I think Mediasonic offers a 1 year warranty on all enclosures they sell.  That should hold for those sold through 3rd party sellers as well.  Unless of course, it was refurbished or sold used.

They did offer an RMA, but UPS Ground shipping was going to cost me almost $90, so it's going to the town dump instead. I have other arrays I'm using now and would rather put the money toward the SAS setup I'm planning to move to.


Well, I'm 10+ days past my rebuild, which moved everything to internal SAS-connected drives, and it's been ROCK SOLID.

I used a Norco 4224, LSI 9211 HBA, and an Intel expander. It was totally plug and play and a significant improvement in performance over the external enclosures. There are 20 pool drives totaling 105 TB in usable space.

Additionally, I enjoy watching the blinking lights. I wish I had made this change years ago, and I have no good reason why I didn't. It would have saved me a lot of money replacing constantly flaking external enclosures.

John

3 minutes ago, Christopher (Drashna) said:

And yeah, the blinking lights are fun to watch! 

On 8/8/2018 at 5:39 PM, johnnj said:

Additionally, I enjoy watching the blinking lights.

Damn it - don't make me go buy a half-height rack and enclosure now guys...  :P 