  • 0

Failing/failed drive (maybe) - need advice.


fleggett1

Question

Yesterday, I got the dreaded notice from DP that a disk had dropped out of the pool.  After a reboot, the drive reappeared, so I started the removal process.  About 10% in, the drive dropped out again.  Rebooting seems to revive the drive, but now it only gets to 0.3% before it drops out.  Only the percentage figure shows progress (up to the aforementioned 0.3%) - the actual byte report remains static, which makes me doubt that any data is genuinely being transferred.

What really sucks is that this drive (a 4 TB WD Red) was almost full.  I didn't RAID anything, so it's just a simple JBOD pool.

I'm old and probably going senile, so I have no clue on how to proceed.  I'd really like to get the data off the drive (if possible), but I'd rather not spend $1,000+ to do so.  I suppose I could contact an outfit like Rossmann Repair Group since they're now doing PC drive recovery, but I don't know if the resultant data structure would even be usable and could be reintegrated back into the pool.

Oh, also of note is that this drive didn't throw out any SMART errors.  I know SMART isn't 100% reliable at indicating bad or soon-to-be-bad drives, but it's still weird and irritating.  I doubt it's the controller, as there are other drives attached doing just fine.  The case's backplane could also be a point-of-failure, but that strikes me as unlikely.

Anyway, I'd really appreciate some pointers.  Should I throw the drive into a freezer covered in uncooked rice?  If I pull the drive now in its crippled state and try reading it on another Windows machine, what will come up when doing a simple dir or ls?  Can I try putting it in another drive bay (I have several free) or will that make DP go bonkers?

And lastly, if worse comes to worst and I have to trash the drive, how can I remove its presence from DP without upsetting the rest of the pool?

I'm using the latest version of DP (2.2.4.1162) on a Windows 10 machine.  Thanks in advance.


20 answers to this question


  • 1

Pool structure is (deliberately) very simple re data recovery: a pool is simply the union set of folders in each of its poolpart drives, with duplication handled by having the same file in the same folder on multiple drives.

E.g. if your pool drive is P, and your poolpart drives are D, E and F, then the following collection of files:

d:\poolpart.guidstring\documents\file1.txt
e:\poolpart.guidstring\documents\file1.txt
e:\poolpart.guidstring\pictures\file2.txt
f:\poolpart.guidstring\pictures\file3.txt
f:\poolpart.guidstring\music\file4.txt

would be automatically collated and displayed by DrivePool as:

p:\documents\file1.txt
p:\pictures\file2.txt
p:\pictures\file3.txt
p:\music\file4.txt

The poolpart.guidstring folders themselves are hidden folders on the poolpart drives, and the guidstring helps DrivePool identify the pool. There's also a .covefs folder in each poolpart which you can safely ignore if you leave it untouched.

EDIT: Note that it doesn't matter how or where a poolpart drive is attached to the computer; so long as it shows up as a readable drive to Windows then DrivePool will find it unless you rename the poolpart folder on that drive (e.g. by putting an 'x' in front of 'poolpart').
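To make the "union of folders" idea concrete, here is a minimal Python sketch that rebuilds the pool view from a list of poolpart folders. It is only an illustration of the layout described above, not how DrivePool itself works internally; the function name pool_view and the choice to skip the .covefs folder are my own assumptions.

```python
import os
from collections import defaultdict


def pool_view(poolpart_roots):
    """Map each pool-relative file path to the poolpart folders holding a copy."""
    view = defaultdict(list)
    for root in poolpart_roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Assumption: skip the .covefs metadata folder, it is not user data.
            dirnames[:] = [d for d in dirnames if d != ".covefs"]
            for name in filenames:
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                view[rel].append(root)
    return dict(view)
```

With the example above, calling pool_view on the three poolpart folders on D, E and F should list documents\file1.txt under both the D and E folders (a duplicated file) and every other file under exactly one folder.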


  • 1

Okay, to put it as simply as possible:

  • DrivePool presents a unified view of all the drives (poolpart folders) in a pool as if they were one big drive.
  • DrivePool does not keep its own "database" of where files are. It relies on the Windows file system.
  • Whenever you write a file to a pool, it picks one drive in the pool to store that file (multiple drives if duplication is on).
  • Balancing is when DrivePool moves files around in the background to ensure that none of its drives gets too full.
  • A folder might be split across multiple drives, but a file is NEVER split across multiple drives (any duplication makes complete copies).

"Here's a thought - could I simply just assign a drive letter to the troublesome drive and, while it's still being recognized, do a simple dir /s, capture the output, and go from there?"

First go into the Balancing GUI for DrivePool, select "Do not balance automatically" and un-tick "Allow balancing plug-ins to force immediate balancing" then click Save. This will prevent DrivePool from moving any files around in the background while you work on the drive.

I would recommend using dir /s /b instead of just dir /s.
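The /b switch gives one full path per line, which is much easier to search and compare later. If you would rather script the capture, here is a rough Python equivalent. It is only a sketch: the function name bare_listing is made up, and the idea of flushing after every line (so a partial listing survives if the drive drops out mid-scan) is my own precaution.

```python
import os


def bare_listing(root, output_path):
    """Write the full path of every folder and file under root, one per line.

    Roughly mirrors `dir /s /b`. Returns (entries_written, errors_seen).
    """
    errors = []

    def remember(err):
        # os.walk calls this instead of raising when a folder cannot be read.
        errors.append(err)

    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(root, onerror=remember):
            for name in dirnames + filenames:
                out.write(os.path.join(dirpath, name) + "\n")
                out.flush()  # keep what we have if the drive disappears
                count += 1
    return count, len(errors)
```

Point output_path at a healthy drive, not at the failing one or at the pool it belongs to.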


  • 0

Thanks for the quick reply.  I kinda/sorta get what you're saying on an abstract level, but not on a nitty-gritty one.

I suppose it comes down to the age-old question of where exactly are the files stored on a pooled system, especially and critically if they're not duplicated on other drives.  From what I recall when I installed DP some years ago, I chose to NOT have a software RAID-style setup, instead opting for a simple JBOD configuration since I didn't have many drives at the time and, therefore, almost no storage space to spare.  I figured if I needed to do any drive replacements in case of potential/probable future failures, DP would automagically sense that based on SMART alerts, move what it could off the affected drive(s), and notify me accordingly.  Since this drive didn't throw out any SMART errors, though, DP was (I assume) caught unaware.

So, assuming DP can't read and move the data on its own, what do I do now, apart from pulling the drive?  If the drive does prove to be unrecoverable (which is looking more and more likely), how do I know what I have and what I don't?  A huge concern is if DP spans data across drives, in which case everything on the pool could be in jeopardy, with bytes missing of various sizes from God only knows how many files.  And even if DP doesn't do that, I'm still saddled with the might-be-impossible task of determining which files are (or were) on the failed drive.

And if/when I do pull the drive, what next?  Once I remove the drive entry in DP, is there anything else I need to do?  Or can do?

BTW, I did notice where someone had posted a Powershell script to collate and report where each and every file is physically located on each and every drive in a pool.  I guess I should start using it on a pretty regular basis, perhaps daily.  I wish my eyes were as razor-sharp as hindsight.
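For anyone wondering what such an inventory buys you, here is a small Python sketch (not the PowerShell script mentioned above, which I haven't reproduced). Given a saved list of the files that were on a failed drive, relative to its poolpart folder, it reports which of them have no copy on the surviving poolpart folders. The function names are hypothetical.

```python
import os


def relative_files(poolpart_root):
    """Return the set of file paths under a poolpart folder, relative to it."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(poolpart_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, poolpart_root))
    return found


def files_without_surviving_copy(failed_listing, surviving_roots):
    """failed_listing: relative paths of files that were on the failed drive."""
    surviving = set()
    for root in surviving_roots:
        # normcase makes the comparison case-insensitive on Windows.
        surviving |= {os.path.normcase(p) for p in relative_files(root)}
    return sorted(p for p in set(failed_listing)
                  if os.path.normcase(p) not in surviving)
```

Anything it returns existed only on the failed drive, so those are the files to re-source or recover.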


  • 0

Okay, I deselected those options within DP.  When I next rebooted, in the nick of time, I managed to assign a drive letter, do a dir /s /b on the drive, and capture the output.  Lotsa stuff that can be replaced, but predictably, there were more than a few files that will be exceedingly difficult to re-source.

It's a good thing I didn't keep my banking records on that drive.

One thing I thought I noticed were directory entries with no associated files.  Can DP store directory markers on one drive, but the files in that directory on another drive? - nvm, I confirmed that when I put the drive into an external dock and tried to copy the files (which didn't work).

That's a heckuva relief that DP doesn't span individual files across drives.  You can't imagine the nightmares I was beginning to have over that.

I'm strongly leaning towards going ahead and contacting RRG for a recovery quote.  It's the only place I know that (I presume) won't charge a kidney.  I'm thinking that, assuming I can get an affordable quote, I should probably just send the drive in now, while it's still being recognized for a few minutes.  I figure that, if I wait until the drive stops being recognized altogether, it'll just put the data in that much more jeopardy.

Thanks for the explanations and help!

Edited by fleggett1
Additional findings.

  • 0

"It's a good thing I didn't keep my banking records on that drive."

Once you've got your pool going again, note that DrivePool can also perform duplication on a per-folder basis. For example, I have junk/temp folders with duplication off, most folders with duplication set to x2, and a few critical folders with duplication set to x3. (I also always turn on DrivePool's "verify after copy").

"One thing I thought I noticed were directory entries with no associated files.  Can DP store directory markers on one drive, but the files in that directory on another drive? - nvm, I confirmed that when I put the drive into an external dock and tried to copy the files (which didn't work)."

Correct; DrivePool won't delete a folder it has created on any disk in the pool unless either the folder is being deleted from the pool as a whole or you've ordered DrivePool to Remove that disk.
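So an empty folder skeleton on a drive is expected and harmless. If you want to check whether a poolpart folder holds any actual files or just leftover folders, something like this Python sketch would do it. The function name remaining_files is made up, and ignoring .covefs is my own assumption about what counts as "your" data.

```python
import os


def remaining_files(poolpart_root, ignore_dirs=(".covefs",)):
    """List files still present under a poolpart folder, skipping metadata folders."""
    leftovers = []
    for dirpath, dirnames, filenames in os.walk(poolpart_root):
        # Prune ignored folders in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in ignore_dirs]
        leftovers.extend(os.path.join(dirpath, name) for name in filenames)
    return leftovers
```

An empty result means the drive only carries the folder structure; any paths returned are files that still live on that drive.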


  • 0

Huh, I didn't know DP could perform per-folder duplication.  I'll definitely check that out once I can get the system settled.  I should probably go ahead and purchase another 10+ TB drive while I'm at it.

Just as a follow-up, I put the drive in my fridge overnight and just tried reading it.  It worked for about two more minutes than before, then died again.  The motor still spins, but there's a spanner in the works somewhere.

Through some teeth gnashing, I also went ahead and contacted RRG to explore my options.  I'm now just waiting to hear back.

If RRG can recover, say, 90% of the data, is there an easy way to reintegrate such back into the pool?  I'm sure DP has its own descriptor files & folders that're required when doing a removal through DP itself which, if not present, makes that process impossible.  Which f&f's absolutely have to be present and what are they called?

Speaking of buying new drives, I've resorted to shucking since the OEM ones are still commanding absurd prices.  I also now know to be wary of shingled drives.  Any recommendations?  I've been a WD fan for most of my life, but am open to other makes & models.

Thanks again.


  • 0

"If RRP can recover, say, 90% of the data, is there an easy way to reintegrate such back into the pool?  I'm sure DP has its own descriptor files & folders that're required when doing a removal through DP itself which, if not present, makes that process impossible."

Short answer is that if you know where the files are meant to go (e.g. RRG recovered not just the files but also their folder locations) then yes, it's relatively easy. Just copy the files back to their original locations.

DP does have (since some versions ago) a hidden metadata folder named ".covefs" which is used for specific pool configuration info (e.g. duplication levels) and to handle any symbolic links (file / directory), junction points and mount points in the pool. But even if you're not using duplication for your own data, DP will still automatically duplicate that particular folder to up to three drives (and even if it was somehow lost, it's not required for your pool to remain basically functional and DP can build a new one).

So the slightly longer answer is that if you've been messing around with creating symbolic/junction/mount points inside your pool (and if you don't know what those are, you almost certainly haven't), you may have to recreate those points first if they've been affected.
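As a rough illustration of "copy the files back to their original locations", here is a Python sketch that copies a recovered folder tree into the pool without overwriting anything already there. It assumes the recovered tree mirrors the pool's folder layout and that pool_root is the pool drive itself (e.g. P:\), so DrivePool decides which disk each file lands on; the function name and the dry_run default are my own choices.

```python
import os
import shutil


def restore_into_pool(recovered_root, pool_root, dry_run=True):
    """Copy recovered files to the same relative paths under pool_root.

    Files that already exist in the pool are never overwritten.
    Returns (copied, skipped) as lists of relative paths.
    """
    copied, skipped = [], []
    for dirpath, _dirnames, filenames in os.walk(recovered_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, recovered_root)
            dst = os.path.join(pool_root, rel)
            if os.path.exists(dst):
                skipped.append(rel)
                continue
            if not dry_run:
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copy2 also preserves timestamps
            copied.append(rel)
    return copied, skipped
```

Run it with dry_run=True first to see what would be copied, then again with dry_run=False once the list looks right.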

"Speaking of buying new drives, I've resorted to shucking since the OEM ones are still commanding absurd prices.  I also now know to be wary of shingled drives.  Any recommendations?  I've been a WD fan for most of my life, but am open to other makes & models."

I'm a WD fan too, but I was unimpressed with how they snuck SMR drives into their NAS range. I'd still go with WD/HGST*/Toshiba in preference to Seagate unless one has the money for pro/enterprise drives (at which point it's more weighing up MTBF, NRER, AFR, etc), but Seagate has lifted their game recently.

Hmm. One thing to note, especially if you're not using duplication as you've found, is that the bigger the drive the longer it's going to take to get everything off it if something starts going wrong (and thus the more chances of something else going wrong during recovery). For example a full 12TB drive will take roughly 22 hours at 150 MB/sec, non-stop, to copy. So if you've got the physical room then two 6TB drives might be a better choice than that one 12TB drive.
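The arithmetic behind that estimate, as a tiny Python sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a sustained transfer rate, which a failing drive is unlikely to manage, so treat the result as a best case.

```python
def copy_hours(capacity_tb, rate_mb_per_s):
    """Hours needed to copy capacity_tb terabytes at a sustained rate in MB/s."""
    total_bytes = capacity_tb * 1e12
    bytes_per_second = rate_mb_per_s * 1e6
    return total_bytes / bytes_per_second / 3600


print(round(copy_hours(12, 150), 1))  # about 22.2 hours for a full 12 TB drive
print(round(copy_hours(6, 150), 1))   # about 11.1 hours for a full 6 TB drive
```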

*WD now owns HGST.


  • 0
On 12/2/2020 at 4:55 PM, fleggett1 said:

Speaking of buying new drives, I've resorted to shucking since the OEM ones are still commanding absurd prices.  I also now know to be wary of shingled drives.  Any recommendations?  I've been a WD fan for most of my life, but am open to other makes & models.

In the past, I used to buy drives of all different brands. My experience is that all drives, regardless of brand, will eventually fail. This typically happened to me about 1 month after the warranty period ended. Now I am buying generic white label drives from GoHardDrive.com and they seem to work as well as any brand name drives I used. Currently, I find the new 3TB HDDs with 2 year warranty at $40.00 are the best price/gig point.

As already mentioned, it can take hours and hours, if not days, to get files off a large failing HDD. Unfortunately, most of my HDDs that have failed gave little to no warning. However, I recently had a disk that started to fail, and the free disk monitoring program Hard Disk Sentinel correctly caught it in time for me to remove about 90% of the data off the drive before it completely died. That saved me a lot of work in restoring from an offline backup drive. Thankfully, DrivePool saves complete files on a drive and does not just stripe data across multiple drives. So you have a much better chance of a full or at least partial recovery with DrivePool. When I was running Windows Storage Spaces, 1 drive failure wiped out my entire 26 drive pool, despite being set up for drive failure. That was a massive loss of data and is now why I am using DrivePool. At worst, you only lose data on the one failed drive in DrivePool.

As stated, with DrivePool you can enable folder duplication for important files. I have my financial data folders set to duplicate 3X, so any 2 drive failure in my DrivePool should not affect my important financial data. Those data files are relatively small. I also use a free cloud backup service for important files in case of house fire, etc... When I had that drive failure in DrivePool, it correctly used the duplication files in the rebuild.

I am not a big fan of shingled drives, either. But, in my case, my DrivePool is mainly my media storage center and most files are write once, read once, and maybe just sit there for years.... I think even shingled drives will hold up to that limited use. So I will use the cheaper shingled drives in DrivePool if it saves me money, but only because I have offline backup drives with my original media files. 


  • 0

Sorry for the late reply.  Some real-life stuff got in the way.

Thanks for all the feedback.  I wound up sending the drive to RRG for possible recovery.  I mailed it last Friday, so I don't expect to hear anything for at least a few more days.  I put an unofficial limit of $300, but because the drive can be read for about a minute after the OS loads, I'm hoping that it'll be less than that and that it's something relatively simple, like maybe an overheating or otherwise marginal, but not 100% failed, part, like you see with a lot of RRG's MacBook repairs.

Regarding which drive to get, as I currently understand it, SMR is employed on WD drives up to 8 TB.  At 8 TB and beyond, it's CMR.  However, at the 8 - 12 TB range, you might be getting a regular air drive.  For a helium drive, you have to go above 12 TB models.

The creme-de-la-creme is the 18 TB monster, which I've been told are rebranded Red Pros.  I think I'm going to save up for one, which have been as low as $279 as recently as last week.  I know that's a risk, especially if you don't employ some sort of fancy striping, but I think I'm willing to rely on the drive technology versus getting into something I know absolutely nothing about (i.e., RAID).

On the really irritating front, I have another drive that hasn't failed, but is reporting a couple of SMART errors.  I simply cannot offload all the data onto anything else ATM, so I'm really stuck, especially if I do wind-up paying for the recovery.  What's interesting is that DP hasn't flagged it for removal, so maybe it isn't exhibiting particularly dire errors.  If anyone is curious, this is what Scanner says:

[Image: screenshot of the Scanner SMART report for the drive]

I only noticed it a few days ago, so I don't know if those values are increasing at an alarming rate or if they've been static for months (maybe years).  The drive is an HGST.

I am going to investigate the folder-level duplication feature of DP, though given the massive size of my media folder versus the space I have on-hand, that may not prove to be (currently) viable.  Still, it's something to pursue and work towards.

So, that's currently all the news at my end that's fit to print.  Again, thanks for the feedback and pointers.


  • 0
4 hours ago, fleggett1 said:

Regarding which drive to get, as I currently understand it, SMR is employed on WD drives up to 8 TB.  At 8 TB and beyond, it's CMR.  However, at the 8 - 12 TB range, you might be getting a regular air drive.  For a helium drive, you have to go above 12 TB models.

Not quite. You can get WD drives under 8 TB that use CMR, for example their Red Plus and Red Pro lines (not to be confused with their Red (no Plus or Pro) line).

4 hours ago, fleggett1 said:

The creme-de-la-creme is the 18 TB monster, which I've been told are rebranded Red Pros.  I think I'm going to save up for one, which have been as low as $279 as recently as last week.  I know that's a risk, especially if you don't employ some sort of fancy striping, but I think I'm willing to rely on the drive technology versus getting into something I know absolutely nothing about (i.e., RAID).

For whatever it's worth, my own buying strategy has long been "what's the cheapest drive on a $ per TB basis that has a good reputation and at least 3 years warranty?" and making sure I buy enough of them that I can keep all my irreplaceable data mirrored and backed up (though now I have to add "and is not SMR"). Seems to have worked so far.

4 hours ago, fleggett1 said:

On the really irritating front, I have another drive that hasn't failed, but is reporting a couple of SMART errors.  I simply cannot offload all the data onto anything else ATM, so I'm really stuck, especially if I do wind-up paying for the recovery.  What's interesting is that DP hasn't flagged it for removal, so maybe it isn't exhibiting particularly dire errors.  If anyone is curious, this is what Scanner says:

Those numbers indicate four bad sectors that couldn't be corrected and sixteen more suspicious sectors that the drive will attempt to correct (where "correction" usually means attempting to copy the data to the drive's reserve of spare sectors and then marking the bad sectors as not to be used again). Any uncorrectable sector is often an indication of a failing drive; if the count stabilises at a low number then it may just be a bad 'spot' on an otherwise good drive (a bit like a dead pixel on a screen) but I'd be wary and I certainly would make sure I had backups/duplication. Drives can go from a few bad sectors to dead in anywhere from years to minutes.

Stablebit DrivePool by itself does not flag drives for removal if they trip SMART; for that ability it relies on the Stablebit Scanner program. If the latter is installed, a Scanner plugin can be enabled and configured in DrivePool to automatically evacuate drives (edit: note: this isn't the same as actually removing the drives). It remains up to you to decide whether to remove an evacuated drive.

EDIT: with the Scanner balancer plugin for DrivePool, it can be set to evacuate drives based on SMART warnings and/or on damage that Scanner itself detects, and whether to still allow files on evacuated drives if the other drives are full. NOTE: if you have no free space on other drives in your pool to evacuate a drive, DrivePool won't be able to evacuate that drive.
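A quick way to sanity-check that last point before relying on evacuation is to compare what's on the drive against the free space elsewhere. This Python sketch is only a necessary-condition check under my own assumptions (the function names are made up, and it knows nothing about DrivePool's balancing rules): because files are never split, the total has to fit in the combined free space and the largest file has to fit on at least one drive.

```python
import shutil


def free_bytes(path):
    """Free space, in bytes, on the volume that contains path."""
    return shutil.disk_usage(path).free


def can_possibly_evacuate(file_sizes, free_on_other_drives):
    """Return False if evacuation is certainly impossible, True if it might fit."""
    if not file_sizes:
        return True
    if not free_on_other_drives:
        return False
    fits_in_total = sum(file_sizes) <= sum(free_on_other_drives)
    largest_fits = max(file_sizes) <= max(free_on_other_drives)
    return fits_in_total and largest_fits
```

A True result doesn't guarantee success, since individual files still have to be packed onto individual drives, but a False result means there is definitely not enough room.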


  • 0

Well, there have been some pretty unfortunate incidents since I last posted.

I sent the drive off to RRG.  They said it was either a firmware or platter issue.  $400 for the former, around $900 for the latter.  And they couldn't tell me which one it is without delving into the drive, which means a repair commitment.  I told them that I'd have to think it over.

That's just the tip of the iceberg, though.  I think my system is trying to suicide itself.  That 6 TB drive that had the bad sectors finally gave up the ghost.  I was able to evacuate it via the Scanner plugin, but it might've been for naught.  A couple of days ago, DrivePool was complaining about a missing drive (a 4 TB Red).  I pulled it and tried to copy its contents on another system, but only managed to read a few megs before it quit.  No SMART warnings, it just died.  EXACTLY like the unit that started this mess.

And just tonight, yet another drive of the same make/model dropped out.  I'll be pulling it tomorrow, but expect similar disappointing results.  For those keeping count, that's one 6 TB HGST and three 4 TB Reds.  And God only knows what shenanigans the following days will provide.

I'm at a crossroads with my box.  One or two drive failures I could handle.  Three is pushing it.  Four in the span of just a few weeks has just-about driven me over the edge.  I'm simply not in a financial position to cover these losses ATM, nor will I be in the foreseeable future.  I'm strongly considering cobbling together the money for a NUC or something similar that's tiny, but technologically current that I can just download stuff onto, watch once, then delete.  My current setup would be considered archaic at this point, so trying to salvage it would probably be the definition of throwing good money after bad.

That said, I can't really complain all that much.  This system has been quite the trooper for several years up until this point.  And everything else works just fine.  When drives aren't keeling over, it's remarkably stable and I can't remember the last time I encountered a BSOD.

So, I guess that's about all that's fit to print (for now).  Wish me luck.


  • 0

Sorry for the threadnecro, but due to some Best Buy sales, I was able to replace the four low-capacity drives with four 16TB monsters.  However, because of the way my enclosure is put together (insufficient airflow), it looks like I'm going to have to separate them due to heat issues (as reported by Scanner).

As such, I'll be repositioning these drives in my bays so that they're not immediately next to any other drives (they're low-capacity, so I'll probably just pull and retire them).  This isn't gonna screw anything up, will it?  Is Drivepool's drive identification bay-dependent or independent?  I would think the latter, but I know some RAID configurations can get scrambled if drives aren't in their initially installed bays.  I'm not doing RAID, but just want to make 100% sure.

EDIT - I should've noted that I've already added these drives to the pool.  If I need to evacuate them before the reposition, I can do that.

Thanks in advance.

Edited by fleggett1
Additional info.

  • 0

That's excellent to hear, as I was thinking of replacing my two stone-age Supermicro HBAs:

https://www.supermicro.com/en/products/accessories/addon/AOC-SASLP-MV8.php

With this Broadcom/LSI controller since it supports 16 ports on just the one card:

https://www.broadcom.com/products/storage/host-bus-adapters/sas-nvme-9500-16i

Any problems I might run into?


  • 0

Sorry for doing yet another thread necro, but I'm having "one of those days".

Got up today with Scanner reporting a SMART error on one of my drives.  Fortunately, it turned out to be an older 4 TB Red, which I can do without.  I had already activated the plugin to automatically evacuate files in the event of a SMART error.

Which it SEEMS to've done, but now I'm not sure.  I managed to pull a directory listing of the failing drive and all I saw were directories, but no actual files.  Is that normal?  Should I be worried that some of the actual files might not've made it to the other drives in my pool?

The other odd issue which makes me wonder if the drive was truly and fully evacuated is that qBittorrent is now reporting weird I/O errors for some files.  I've looked in my torrent directory and everything looks like it's there, so why would qBt be reporting such errors?  My torrent directory is part of the pool.

BTW, I've tried manually removing the drive with all checkboxes unchecked and it gets to the 3.2% mark and fails.

As always, thanks in advance.

Edited by fleggett1
Grammar.