  • 0

Scanner didn't find my 2TB RED was going bad :-(


tomsliwowski

Question

So I've had Scanner + DrivePool running in my WHS 2011 box for 3 or so years. I recently purchased an 8TB RED to replace one of my 2TB RED drives.

 

I properly shut the box down, replaced the 2TB with the 8TB and placed the 2TB in an external USB enclosure. I added the 8TB disk to the pool and then tried to have drivepool remove the 2TB drive. I kept getting errors when trying to remove the drive so I decided to see if I could save the data by going into the drive and manually copying the contents. This took almost a day to get about 1.6GB of data and when I looked at what was copied I noticed that it was all corrupted (video which was unplayable or with crazy artifacts, pictures that were corrupted, music that sounded distorted, doc files that were unable to be opened). Luckily I had crashplan backing up the stuff I couldn't replace so the loss wasn't monumental.

 

My question is: why did Scanner not flag this drive as bad? I've had drives in the past report as "about to fail" and I was able to replace them in time, but this drive went from "almost unreadable" to "will no longer recognize" without warning. Scanner said it had scanned the drive a few weeks prior.


11 answers to this question


  • 0

Hi. Unfortunately, drives can fail instantly; not all of them give warning signs. It happened to me with an OCZ SSD: the drive just failed, and none of the computers I tried it in even showed it as connected. It went from working to paperweight in the blink of an eye. That's why backups are so important.


  • 0

As Lee said.

 

Also, if this happened after moving the drive into an external USB enclosure, it may have actually been caused by the enclosure (bad hardware, a loose connection, bad driver etc).  In this case, StableBit Scanner would not have detected an issue until it ran a new surface scan (every 30 days). 

 

Given what you describe, I think this is a connectivity or enclosure hardware issue. 

And in this case, I'd highly recommend running the 'Burst test' on the drive in question. If this errors out, then it's definitely a communication issue with the drive, and likely with the USB hardware. 


  • 0

Just to follow up. I tried it in two USB enclosures and also tried plugging it directly to a SATA2 port. Same deal.

 

It looks like the drive had been failing for a while: before I replaced it, I noticed that reads and writes to the storage pool were rather slow (over the network a transfer would start at 50MB/sec, slow down to <1MB/sec, then burst back up, over and over again). This would indicate that the drive was already going bad.


  • 0

Well, there are several things that can cause slow performance like this: 

  • bad hardware (cable, or controller)
  • loose connection
  • media issues (namely, a high number of reallocated sectors will adversely affect performance)
  • a short or defect on the drive's PCB, or corrosion on the PCB contacts

 

StableBit Scanner should pick up all but the last of these (either via SMART data or unreadable sectors).  However, the last one, depending on the exact nature of the short/defect, would not show up in StableBit Scanner (or *any other utility*), as we rely on the data that the drive is sending to the system. 

 

I refer to this as "electrical failure/death", rather than "mechanical failure", as it's not a mechanical issue. 

And if this is what failed, this is the worst kind of failure, as it is silent and usually sudden. 


  • 0

I guess that is what happened. Sorry if I came off as hostile, it was just a huge pain in the ass to restore music/pictures/docs from crashplan and then to randomly find corrupted videos from the files I didn't have backed up in crashplan :-/

 

No, you didn't come off as hostile, at all.  Frustrated, maybe. But that's completely understandable.  Drive failure is never pleasant, and we absolutely understand. 

 

It's just that we like to be rather verbose in our explanations, so that you have a good idea of what exactly is going on. Since storage systems can get complicated, quickly... we just want to help make things clear.

 

And it sounds like the recovery was a bit of a clusterfuck too, adding to that frustration. And I'm sorry to hear that. :(

 

 

Something that may be a good idea for the future is to periodically check the Event Viewer. Specifically, look at the Windows Logs > System log, and check for disk, NTFS or controller related errors. These may point to "non-critical" issues that don't show up in StableBit Scanner but still indicate a problem.  
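If you'd rather not click through Event Viewer each time, the same check can be scripted. Below is a minimal sketch, not anything StableBit Scanner does itself: it builds an XPath filter and hands it to Windows' `wevtutil` command-line tool. The provider names ("disk", "Ntfs", "atapi") are assumptions based on the sources mentioned in this thread and may differ on your system, and the helper names are made up for illustration.

```python
import subprocess

# Assumed provider names, taken from the sources mentioned in this thread;
# check Event Viewer for the exact names registered on your machine.
PROVIDERS = ("disk", "Ntfs", "atapi")
LEVELS = (1, 2, 3)  # 1 = Critical, 2 = Error, 3 = Warning


def build_query(providers=PROVIDERS, levels=LEVELS):
    """Build an Event Log XPath filter for the given providers and levels."""
    names = " or ".join(f"@Name='{p}'" for p in providers)
    lvls = " or ".join(f"Level={n}" for n in levels)
    return f"*[System[Provider[{names}] and ({lvls})]]"


def build_command(max_events=50):
    """Return the wevtutil argument list for the newest matching events."""
    return [
        "wevtutil", "qe", "System",   # query-events on the System log
        f"/q:{build_query()}",        # XPath filter built above
        f"/c:{max_events}",           # cap the number of events returned
        "/rd:true",                   # reverse direction: newest first
        "/f:text",                    # plain-text output
    ]


def read_events(max_events=50):
    """Run wevtutil (Windows only) and return its text output."""
    result = subprocess.run(
        build_command(max_events), capture_output=True, text=True, check=True
    )
    return result.stdout
```

On the server itself, `print(read_events(20))` would dump the 20 most recent matching entries. Expect some noise in there; not every warning means the drive is in trouble.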


  • 0

Ah, that actually brings me to a new feature request for Scanner: To scan the system log for "atapi", "disk" and "ntfs".

 

This is something that has come up frequently, and I've pushed it a bit.

 

The problem is though, that there are a lot of errors that may happen normally, and don't actually affect the health or performance of the drive. A lot of these are "retry notifications" or just verbose logging.  

 

And generally, when you do start to see errors, usually they'll show up as SMART errors or issues with the scans. 

And some events, like controllers resetting the drive, are perfectly normal and within spec on USB devices. 

 

tl;dr: it's a complicated mess. 
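To give an idea of what "filtering out the noise" could look like, here is a rough sketch of one possible approach: ignore isolated events and only flag a source once it has logged several entries within a recent window. This is purely illustrative and not how StableBit Scanner works; the function name, the 7-day window and the threshold of 5 are all arbitrary assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta


def noisy_sources(events, now, window=timedelta(days=7), threshold=5):
    """Return {source: count} for sources that logged at least `threshold`
    events within `window` before `now`.

    `events` is an iterable of (timestamp, source) tuples. The window and
    threshold defaults are arbitrary placeholders, not tuned values.
    """
    cutoff = now - window
    # Count only recent events, treating source names case-insensitively.
    counts = Counter(src.lower() for ts, src in events if ts >= cutoff)
    return {src: n for src, n in counts.items() if n >= threshold}
```

A single retry notification would stay below the threshold and be ignored, while a drive that logs errors every day would get flagged. Picking thresholds that work across different controllers and drivers is the hard part.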


  • 0

I see. Still, if the advice is to periodically check the event log for issues like these, it would perhaps be a nice service to have Scanner do it for you and present any messages. A notification that messages have appeared that may warrant inspection (possibly with priority flags for certain strings, such as "controller X had an error on device Y", tracked over time) might help users?

 

Ah yes, so much to do, so little time.
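For what it's worth, the "priority flags for certain strings" idea could be sketched along these lines. This is only an illustration of the suggestion, not a planned or existing Scanner feature: the patterns, the priority numbers and the function names are all hypothetical, and real log messages vary by driver and Windows version.

```python
import re

# Hypothetical rules: (priority, pattern). Higher priority = more urgent.
# The patterns are illustrative guesses, not an authoritative list.
RULES = [
    (3, re.compile(r"bad block", re.IGNORECASE)),
    (2, re.compile(r"controller .*error", re.IGNORECASE)),
    (1, re.compile(r"reset", re.IGNORECASE)),
]


def priority(message):
    """Return the highest priority of any rule matching the message, or 0."""
    return max((p for p, rx in RULES if rx.search(message)), default=0)


def triage(messages):
    """Return (priority, message) pairs for flagged messages, most urgent first."""
    flagged = [(priority(m), m) for m in messages]
    return sorted((x for x in flagged if x[0] > 0), key=lambda x: -x[0])
```

Messages that match no rule are dropped, so routine entries never reach the user, while a resets-only drive stays at the lowest priority.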

