Why not use the other copy on read error?

KipKasper · September 1, 2017

I've noticed DrivePool sometimes gets stuck if the disk it's trying to read from has a problem. I'm currently looking at Windows Events 7 and 153: the first read error was at 1:52 PM, it is now 2:30 PM, and the Event Log keeps writing new 153s and 7s. The DrivePool GUI is hung, the Disk Management snap-in won't load, and other machines on the network can't read from or write to this pool's share.

My question is: why can't DrivePool give up and read from the other copy of the file? I'm sure Scanner would eventually tell me I have a bad drive and I could deal with it, but at the moment neither DrivePool nor Scanner seems to know anything is wrong.

I'm running DrivePool 2.1.1.561 on Windows Server 2012 R2.

There is only one pool, and the whole pool is set to duplicate data.

Windows Event 153: The IO operation at logical block address [some address] for Disk [some disk] (PDO name: \Device\[some device]) was retried.

Windows Event 7: The device, \Device\Harddisk[some disk]\DR##, has a bad block.

Christopher (Drashna) · September 1, 2017

Because it's a catch-22.

We don't know that the file/block is bad until we read from it. And at that point it's already too late: the call to the file is already on the Windows I/O stack, and it's already causing problems.
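To make the catch-22 concrete, here is a minimal Python sketch of the obvious "read the other copy on error" logic. It is not DrivePool's actual code; `ReadError`, `read_with_fallback`, and the two simulated disks are hypothetical names for illustration. The point is that the fallback can only run once the primary read has actually raised. While the drive and Windows are still retrying the bad block, the caller is stuck inside the first call and never reaches the `except` branch.

```python
from typing import Callable, Tuple


class ReadError(Exception):
    """Hypothetical error raised when a block cannot be read."""


def read_with_fallback(
    read_primary: Callable[[int], bytes],
    read_duplicate: Callable[[int], bytes],
    block: int,
) -> Tuple[bytes, str]:
    """Read a block from the primary copy, falling back to the duplicate on error.

    The fallback only runs after read_primary() raises. If the drive is still
    retrying internally, this function is blocked inside read_primary() and
    the except clause is never reached.
    """
    try:
        return read_primary(block), "primary"
    except ReadError:
        return read_duplicate(block), "duplicate"


def failing_disk(block: int) -> bytes:
    # Simulates a drive that eventually reports the bad block.
    raise ReadError(f"bad block {block}")


def healthy_disk(block: int) -> bytes:
    # Simulates the drive holding the duplicate copy.
    return b"data"


print(read_with_fallback(failing_disk, healthy_disk, 42))  # (b'data', 'duplicate')
```

In the stuck case from the question, the equivalent of `read_primary` simply hasn't returned yet, so there is nothing to catch.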

As for StableBit Scanner, it doesn't read through the event logs. Doing so... isn't great. But you can manually trigger a scan by clearing the scan status: double-click the disk in the UI, click the 4th button down (the one with the green circle), and select "mark disk as unchecked". This will cause it to scan the drive.

And it may be worth doing a manual disk check on the drive in question, or just pulling the disk and RMAing it immediately.

KipKasper · September 5, 2017

Thanks Drashna,

So after some further reading, I think I'm suffering from the suboptimal SATA error recovery problem discussed here: https://superuser.com/questions/954262/why-do-damaged-hard-drives-freeze-the-entire-system

Without moving from SATA to SAS, am I correct that I could limit this issue by using only NAS-grade SATA drives that have ERC?

Christopher (Drashna) · September 5, 2017

Honestly, there is a list of reasons why we REALLY, REALLY should be using SAS drives over SATA, especially us data hoarders... For instance, SATA drives will lie about data being flushed to disk... just let *that* sink in.

Using NAS drives won't really help, honestly. NAS drives tend to be a bit higher quality, but in this case the high-performance desktop drives (such as WD Black) are going to be just as good/bad. Enterprise SATA may be best, but... at that price point, just go SAS.

Also, the controller and drivers you're using make a huge difference. For instance, I personally HATE Silicon Image. Not just the cards, but the company, because of the absolutely shit quality of their controllers.

As for ERC/TLER, these are primarily useful in RAID configurations, where the controller can read from the alternate drive while "waiting" on the "bad" drive to respond, or... kick the drive from the array entirely. For a standalone drive, it doesn't really help that much.
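A rough back-of-the-envelope model of what ERC/TLER changes: it caps how long the drive spends on its own internal recovery, so the error surfaces sooner and a RAID controller can move on to the other drive. The function name and all numbers below are hypothetical placeholders, not measured drive behavior; the 120-second recovery time, the 3 OS retries, and the 7-second ERC cap are assumed values chosen only to show the arithmetic.

```python
from typing import Optional


def worst_case_stall_s(
    internal_recovery_s: float,
    os_retries: int,
    erc_limit_s: Optional[float] = None,
) -> float:
    """Rough upper bound on how long one unreadable block can stall a read.

    internal_recovery_s: time the drive spends retrying a bad sector on its own.
    os_retries: extra times the OS re-issues the command after it fails
        (compare Event 153, "was retried").
    erc_limit_s: ERC/TLER cap on the drive's internal recovery; None = no ERC.
    """
    per_attempt = internal_recovery_s
    if erc_limit_s is not None:
        # With ERC the drive gives up after the cap and reports an error.
        per_attempt = min(internal_recovery_s, erc_limit_s)
    return per_attempt * (os_retries + 1)


# Placeholder numbers for illustration only.
print(worst_case_stall_s(120.0, 3))                   # 480.0
print(worst_case_stall_s(120.0, 3, erc_limit_s=7.0))  # 28.0
```

In a RAID array, the shorter stall is what lets the controller fall back to the mirror quickly. On a standalone drive, the capped attempt still ends in a read error that has to propagate back up the stack, which is one reading of why it helps less there.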

But the other problem is that Windows is very sensitive to I/O issues. In fact, if you've used CloudDrive and seen the drive disconnect behavior... that was implemented for THIS EXACT REASON. We had too many cases where the system would lock up because it was waiting on CloudDrive, which was waiting on the provider... So now, if there are too many errors in a short window, the drive gets disconnected.
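The "too many errors in a short window" rule can be sketched as a sliding-window counter. This is a hypothetical Python illustration, not CloudDrive's implementation; the class name `ErrorWindowBreaker` and the thresholds (3 errors in 60 seconds) are placeholders.

```python
from collections import deque


class ErrorWindowBreaker:
    """Signal a disconnect once max_errors errors land within window_s seconds.

    Hypothetical illustration only; not CloudDrive's actual logic or thresholds.
    """

    def __init__(self, max_errors: int, window_s: float) -> None:
        self.max_errors = max_errors
        self.window_s = window_s
        self._error_times = deque()  # timestamps of recent errors, oldest first
        self.disconnected = False

    def record_error(self, now: float) -> bool:
        """Record an I/O error at time `now`; return True once disconnected."""
        self._error_times.append(now)
        # Forget errors that have fallen out of the sliding window.
        while self._error_times and now - self._error_times[0] > self.window_s:
            self._error_times.popleft()
        if len(self._error_times) >= self.max_errors:
            self.disconnected = True
        return self.disconnected


breaker = ErrorWindowBreaker(max_errors=3, window_s=60.0)
print(breaker.record_error(0.0))    # False
print(breaker.record_error(100.0))  # False: the error at t=0 has left the window
print(breaker.record_error(110.0))  # False
print(breaker.record_error(120.0))  # True: 3 errors within 60 s
```

Isolated errors are tolerated, but a burst trips the breaker, so the rest of the system stops waiting on a device that keeps failing instead of locking up behind it.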

Needless to say, this is a rather complicated subject... and not a cheap topic.

hermanarmstrong · September 13, 2017

Thank you for your useful reply.
