Covecube Inc.
KipKasper

why not use the other copy on read error

Question

I've noticed DrivePool sometimes gets stuck if the disk it's trying to read from has a problem.  I'm currently looking at Windows Events 7 & 153: the first read error was at 1:52pm, it is now 2:30pm, and the Event log keeps writing new 153s and 7s.  The DrivePool GUI is hung, the Disk Management snap-in won't load, and other machines on the network can't read from or write to the share of this pool.

 

My question is, why can't DrivePool give up and read from the other copy of the file?  I'm sure the Scanner would eventually tell me I have a bad drive and I could deal with it, but neither DrivePool nor Scanner seems to know anything is wrong at the moment.

 

 

My instance of DrivePool 2.1.1.561 is running on Windows Server 2012 R2.

There is only one pool and the whole thing is set to duplicate data.

 

Windows Event 153: The IO operation at logical block address [some address] for Disk [some disk] (PDO name: \Device\[some device]) was retried.

Windows Event 7: The device, \Device\Harddisk[some disk]\DR##, has a bad block.

 

 


4 answers to this question


Because it's a catch-22.

We don't know that the file/block is bad until we read from it.  And at that point, it's already too late, as the call to the file is already on the Windows I/O stack, and causing problems. 
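To make the catch-22 concrete, here is a minimal Python sketch of the "give up and read the other copy" idea from the question. It is not how DrivePool works; `read_with_fallback` and the reader callables are hypothetical names made up for illustration. The point it shows is that a timeout only lets the caller move on: the stuck read itself is not cancelled and keeps its thread blocked until the I/O finally returns.

```python
import concurrent.futures


def read_with_fallback(readers, timeout_s):
    """Try each copy in order; skip a copy that errors or exceeds timeout_s.

    `readers` is a list of zero-argument callables, one per copy of the file
    (hypothetical stand-ins for "read the file from disk A / disk B").
    Returns (index_of_copy_used, data).
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(readers)))
    try:
        for index, reader in enumerate(readers):
            future = pool.submit(reader)
            try:
                return index, future.result(timeout=timeout_s)
            except concurrent.futures.TimeoutError:
                # The caller stops waiting, but the read is NOT cancelled:
                # its worker thread stays blocked until the read returns.
                continue
            except OSError:
                # The copy reported an error (for example a bad block).
                continue
        raise OSError("all copies failed or timed out")
    finally:
        # Do not wait for stuck workers; they finish whenever the I/O does.
        pool.shutdown(wait=False)
```

Even in this toy version, the abandoned read is still outstanding in the background, which is the part that hurts on a real system.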

 

 

As for StableBit Scanner, it doesn't read through the event logs. Doing so... isn't great.  But you can manually trigger a scan by clearing the scan status: double-click the disk in the UI, click the 4th button down (the one with the green circle), and select "mark disk as unchecked". This will cause it to scan the drive.

 

And it may be worth doing a manual disk check on the drive in question, or just pulling the disk and RMAing it immediately. 
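If it isn't obvious which disk is "the drive in question", the Event 7 and Event 153 messages quoted in the question can be tallied per device. This is a minimal sketch, assuming the messages have already been exported from the Event Viewer as plain text lines; `tally_disk_events` is a hypothetical helper, and the device names and addresses in the usage example are made-up sample values.

```python
import re
from collections import Counter

# Patterns follow the message texts quoted in the question.
BAD_BLOCK = re.compile(r"The device, (\\Device\\Harddisk\d+\\DR\d+), has a bad block")
RETRIED = re.compile(r"for Disk (\d+) \(PDO name: [^)]*\) was retried")


def tally_disk_events(lines):
    """Count Event 7 (bad block) and Event 153 (retried I/O) messages per disk."""
    bad_blocks = Counter()
    retries = Counter()
    for line in lines:
        match = BAD_BLOCK.search(line)
        if match:
            bad_blocks[match.group(1)] += 1
            continue
        match = RETRIED.search(line)
        if match:
            retries["Disk " + match.group(1)] += 1
    return bad_blocks, retries


# Made-up sample messages, for illustration only.
sample = [
    r"The device, \Device\Harddisk3\DR3, has a bad block.",
    r"The IO operation at logical block address 0x1a2b for Disk 3 (PDO name: \Device\00000035) was retried.",
    r"The device, \Device\Harddisk3\DR3, has a bad block.",
]
bad_blocks, retries = tally_disk_events(sample)
```

The disk with the highest counts is the first candidate for a manual check or an RMA.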


Thanks Drashna,

 

So after some further reading, I think I'm suffering from the suboptimal SATA error recovery problem, as talked about here -> https://superuser.com/questions/954262/why-do-damaged-hard-drives-freeze-the-entire-system

 

Without moving off SATA to SAS, am I correct that I could limit this issue if I used only NAS-grade SATA drives that have ERC?


Honestly, there is a list of reasons why we REALLY, REALLY should be using SAS drives over SATA. Especially us data hoarders...  (for instance, SATA drives will lie about data being flushed to disk... just let *that* sink in).

 

Using NAS drives won't really help, honestly.   NAS drives tend to be a bit higher quality, but the high-performance desktop drives (such as the WD Black drives) are going to be just as good/bad in this case.   Enterprise SATA may be best, but... at that price point, just go SAS.

 

Also, the controller and drivers that you're using make a huge difference too. For instance, I personally HATE Silicon Image. Not just the cards, but the company, because of the absolutely shit quality of these controllers.

 

As for ERC/TLER, these are good primarily in RAID configurations, where the array can use the alternate drive while "waiting" on the "bad" drive to respond.  Or... kick the drive from the array entirely.  For a standalone drive, it doesn't really help that much.

 

 

 

But the other problem is that Windows is very sensitive to I/O issues.  In fact, if you've used CloudDrive and seen the drive disconnect stuff.... that was implemented for THIS EXACT REASON.   We had too many cases where the system would lock up, because it was waiting on CloudDrive, which was waiting on the provider....  So, too many errors in a short window, and the drive gets disconnected now.   
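The "too many errors in a short window" rule can be sketched as a sliding-window counter. This is only an illustration in Python, not CloudDrive's actual implementation; the `ErrorWindow` class and the threshold and window values are assumptions chosen for the example.

```python
from collections import deque


class ErrorWindow:
    """Signal a disconnect when more than max_errors occur within window_s seconds."""

    def __init__(self, max_errors, window_s):
        self.max_errors = max_errors
        self.window_s = window_s
        self._stamps = deque()  # timestamps of recent errors, oldest first

    def record_error(self, now):
        """Record an error at time `now` (in seconds).

        Returns True if the drive should be disconnected.
        """
        self._stamps.append(now)
        # Forget errors that have aged out of the window.
        while self._stamps and now - self._stamps[0] > self.window_s:
            self._stamps.popleft()
        return len(self._stamps) > self.max_errors


# Hypothetical policy: more than 3 errors within 60 seconds.
window = ErrorWindow(max_errors=3, window_s=60)
results = [window.record_error(t) for t in (0, 10, 20, 30)]
```

Errors that are spread far apart never trip the limit, while a burst does, so a single stuck provider gets cut off instead of dragging the whole system down with it.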

 

 

 

Needless to say, this is a rather complicated subject... and not a cheap topic. 
