Pre-purchase disk failure question.

PocketDemon · January 3, 2015

Hi - i'm looking at purchasing DrivePool (& the Scanner) in order to pool drives to reduce the no of drive letters & whatnot; whilst being able to set the placement of directories & being able to read the drives individually & whatnot; which is all clear.

However, since the auto duplication process obviously isn't a backup, i'd be looking at running pairs of pools separately (with identical drives & settings) - plus check-summing important data & whatnot... ...& having 3 copies of the data, ie a duplication + a backup, isn't financially viable.

Now, within this usage scenario there's a couple of things that aren't clear from the manual...

Well, imagining a drive in a pool suddenly catastrophically failed, obviously i'd have the separate pool to recover any files from, but is it simply the case that swapping in a new drive & assigning it to the pool would recreate the settings so that I could simply copy the files across... ...or would all of the file placement settings & whatnot for that drive have to be redone or would it rebuild them?

&, similarly, if a drive in a pool were detected as failing by the Scanner, i can see that DrivePool starts automatically moving files to other drives in the pool (which is great for my purposes obviously as it helps to maintain as much of the backup as possible); however what then happens when the drive is replaced? is everything then moved back to the new drive or does this have to be done manually?

Thanks

Tim.

Christopher (Drashna) · January 3, 2015

I'm going to apologize now. I'm probably going to miss/ignore some of the points you've brought up here. This is intentional, as I think you're asking mostly about drive failure and recovery. I'm going to cover THAT in detail, here.

First, if realtime duplication is enabled, any data written to the pool that is duplicated is written in parallel to both disks. That means that it will immediately be duplicated.

If realtime duplication isn't enabled, or if you've just seeded the pool, it will measure the contents and immediately start duplicating the contents of the pool (as needed). Depending on how much data you have, this can take hours or even days. Additionally, duplication runs at background I/O priority, so it may take more time than a straight file copy (to prevent a performance hit while this happens).

Now, when a disk fails, if everything was duplicated, then you're fine. Just remove the missing disk from the pool. It will then immediately recheck the duplication and duplicate any files as needed. You can add a replacement disk at this point, if needed (or even beforehand).

If you have unduplicated data, then that means that you will lose any files that were on the disk.
Additionally, we (StableBit DrivePool) do not maintain an outside index of the data on the disks (as this would be very resource intensive).

In this case, you'd need to use your other "data set" to recover your data.

Additionally, if you manually remove a disk before it fails, it will move all the contents off of the disk and then remove it. Alternatively, you can "duplicate later", which will move just the unduplicated data off of the disk, and then run a duplication pass and recheck the status of all of the files.

And if you have StableBit Scanner installed, it will move data off of a disk if it is marked as damaged (eg, has bad/unreadable sectors). Optionally, you can enable this behavior for SMART errors (which include overheating errors). And there is an option to avoid placing new files on drives that have overheated.

This should have answered most of your questions (if not all).

If you need any clarification, if I missed any questions, or if you have any additional questions, don't hesitate to ask.

PocketDemon · January 4, 2015

Hi - thanks hugely for the prompt reply.

I've obviously not been as clear as I hoped I'd been with my questions/examples, as I was attempting to find out very specific things; whereas you've quite reasonably, given that I'm clearly someone who's never used the s/w, started by explaining some far more general points about it.

So, for example, I fully understood already that, if a drive fails, "If you have unduplicated data that means that you will lose any files that were on the disk" - as this is obviously the same as having 2 non-pooled (or non-R1 or non-storage space'd mirrored or...) drives with the same contents on & one of them dying.

Similarly, I fully understand (with the exception of what's being asked & would be covered by Scenario 2 below) how things would operate with a catastrophic drive failure 'if' I were to be using duplication - however, as mentioned, there are just major pros to having a backup vs any kind of automatic duplication which matter far more to me than having to manually copy data twice onto 2 pools; given that, again, I cannot afford the 50% extra no of drives (or the extra 4U case & h/w to attach them to the raid controller for that matter) that would be required for both a backup & duplication.

Duplication, as with raid, is not a backup naturally.

Instead, I need a solution to use a larger no of drives than there are letters, & the advantages for me of the s/w, vs the alts, being able to both send some specific data to specific drives & have the drives being independently readable (on an identical raid card, as I have 2)...

...having already ruled out assigning drives to folders, which would have accomplished pretty much everything that I actually need for free (accepting that auto-balancing data that could be anywhere within a pool & automatically moving data if a drive were starting to fail would be additional advantages), as there's some other s/w that I use that has issues with accessing drives using this approach.

(I know that that s/w limitation isn't going to be an issue with DrivePool from something incidental gleaned from another thread btw - I did search for stuff first)

So, to attempt to rephrase what I was trying to find out, it's primarily trying to establish how placement rules are defined within two non-duplicated pool drive failure scenarios - ie are the rules drive based or pool based (& drive independent)?

Scenario 1. So, drive B in an A, B, C non-duplicated pool (there's then a separate D, E, F pool that has the self same contents on as a backup) suddenly catastrophically fails without any warning & there were rules attached to this disk; ie 'send all data in the folder XXXX to drive B'.

[NB Thinking about it, this would also be relevant if I wanted to upgrade drive B's capacity - given that it would be far quicker to pull the drive, stick the new larger one in & copy the data back across manually, than telling the s/w that I was removing the drive & it all needing to be copied to A & C & then back again (either automatically or by setting the rules up again - see Scenario 2).

Well, I'd still have 2 copies in the interim (both a working drive B that I'd pulled & could connect via the raid card in another machine or whatever, & the data somewhere on the D, E & F pool) so that maintains the backup whilst it were done without the double copying.]

Obviously (whether through failure or active choice), drive B no longer exists & so is missing from the pool - but would putting a replacement in & adding it to the pool set up any pre-established placement rules automatically again on its replacement, or would this need to be done manually?

Scenario 2. &, slightly differently, drive B in an A, B, C non-duplicated pool is picked up as starting to fail by the Scanner - drive B having some placement rules on.

(again there's a D, E, F backup pool)

As you wrote, & I understood already from the manual, the s/w will then attempt to move all non-damaged data onto drives A & C - along with warning that there's problems brewing & whatnot - which is great as it then helps to maintain there being a backup of as much data as possible.

Now, once it's done its thing, I tell the s/w that I'm removing drive B, stick a replacement drive in & add it to the pool, but what does the s/w then do with the data that's been moved onto A & C - given that, again, there were pre-existing placement rules &, obviously, the pool will now be completely unbalanced?

Anyway, I hope this better explains what I am trying to find out here.

Thanks again

Tim.

[Edit] Oh, & I do realise that I should really bin drive B before I start, as it's clearly b useless & keeps on failing in different ways, but I like to live dangerously.

Christopher (Drashna) · January 4, 2015

This is going to be a joy to respond to.
But I'll try my best.

Hi - thanks hugely for the prompt reply.

I've obviously not been as clear as I hoped I'd been with my questions/examples, as I was attempting to find out very specific things; whereas you've quite reasonably, given that I'm clearly someone who's never used the s/w, started by explaining some far more general points about it.

I apologize for that. It can be hard to express what you mean and get the other party to understand. It is the nature of communication...

So, for example, I fully understood already that, if a drive fails, "If you have unduplicated data that means that you will lose any files that were on the disk" - as this is obviously the same as having 2 non-pooled (or non-R1 or non-storage space'd mirrored or...) drives with the same contents on & one of them dying.

Similarly, I fully understand (with the exception of what's being asked & would be covered by Scenario 2 below) how things would operate with a catastrophic drive failure 'if' I were to be using duplication - however, as mentioned, there are just major pros to having a backup vs any kind of automatic duplication which matter far more to me than having to manually copy data twice onto 2 pools; given that, again, I cannot afford the 50% extra no of drives (or the extra 4U case & h/w to attach them to the raid controller for that matter) that would be required for both a backup & duplication.

Duplication, as with raid, is not a backup naturally.

Okay, those are points that I did want to make explicitly clear.

And yeah, duplication is redundancy. Not a backup. But redundancy does make it easier to restore when something does go wrong.

And I understand the space issues. Though, just a heads up, the Seagate 8TB Archival disks may be a great use for this backup pool. They're meant for cold storage, so it may be perfect. Also, they're significantly cheaper than similar drives.

Instead, I need a solution to use a larger no of drives than there are letters, & the advantages for me of the s/w, vs the alts, being able to both send some specific data to specific drives & have the drives being independently readable (on an identical raid card, as I have 2)...

...having already ruled out assigning drives to folders, which would have accomplished pretty much everything that I actually need for free (accepting that auto-balancing data that could be anywhere within a pool & automatically moving data if a drive were starting to fail would be additional advantages), as there's some other s/w that I use that has issues with accessing drives using this approach.

(I know that that s/w limitation isn't going to be an issue with DrivePool from something incidental gleaned from another thread btw - I did search for stuff first)

So, to attempt to rephrase what I was trying to find out, it's primarily trying to establish how placement rules are defined within two non-duplicated pool drive failure scenarios - ie are the rules drive based or pool based (& drive independent)?

The file placement rules are per pool (meaning each pool has its own set of rules). The same applies to the balancers. That way, you have significant control and flexibility over each pool.
For instance, you could use the "Ordered File Placement" balancer plugin on the backup pool to fill up the disks one at a time.

From there, the file placement rules specify which disk (in that pool) files end up on (by specifying the folder or file, by name or wildcards).
For instance, we had a ticket recently where a user wanted all the metadata on a specific disk (an SSD) and we helped set that up correctly (as the file placement rules can be confusing).

This means if you want the same configuration, you'd need to set up the rules for both pools separately.

Scenario 1. So, drive B in an A, B, C non-duplicated pool (there's then a separate D, E, F pool that has the self same contents on as a backup) suddenly catastrophically fails without any warning & there were rules attached to this disk; ie 'send all data in the folder XXXX to drive B'.

[NB Thinking about it, this would also be relevant if I wanted to upgrade drive B's capacity - given that it would be far quicker to pull the drive, stick the new larger one in & copy the data back across manually, than telling the s/w that I was removing the drive & it all needing to be copied to A & C & then back again (either automatically or by setting the rules up again - see Scenario 2).

Well, I'd still have 2 copies in the interim (both a working drive B that I'd pulled & could connect via the raid card in another machine or whatever, & the data somewhere on the D, E & F pool) so that maintains the backup whilst it were done without the double copying.]

Obviously (whether through failure or active choice), drive B no longer exists & so is missing from the pool - but would putting a replacement in & adding it to the pool set up any pre-established placement rules automatically again on its replacement, or would this need to be done manually?

The File Placement rules have an option for each rule to "never allow files on other disks", or to allow files on other disks if the rule's disks are more than 90% (adjustable) full. There is also an "add new disks to this rule" option that, well, does what it sounds like it does.

In the case of the disk failing, the rule will be disabled effectively, and allow placement on other drives.

Depending on the specific rule and all the options, you would need to edit the rule to include the new/replacement disk, and then copy the contents back onto the pool.
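To make the behaviour described above a bit more concrete, here is a toy model in Python. It is purely illustrative: `PlacementRule`, `allowed_disks` and `on_disk_added` are hypothetical names invented for this sketch, not DrivePool's actual implementation or API, and the 90% fullness overflow option is not modelled at all.

```python
from dataclasses import dataclass


@dataclass
class PlacementRule:
    """Hypothetical stand-in for one file placement rule in one pool."""
    pattern: str                 # folder/file pattern the rule applies to
    disks: set                   # disks the rule sends matching files to
    add_new_disks: bool = False  # models the "add new disks to this rule" option


def allowed_disks(rule, present_disks):
    """Return the disks a matching file may land on right now."""
    preferred = rule.disks & set(present_disks)
    if preferred:
        return preferred
    # Every disk named by the rule is missing (e.g. it failed), so the
    # rule is effectively disabled and any present disk may be used.
    return set(present_disks)


def on_disk_added(rule, new_disk):
    """A disk joins the pool; the rule only picks it up if opted in."""
    if rule.add_new_disks:
        rule.disks.add(new_disk)
```

In this model, a rule that names only drive B falls back to A and C once B is gone, and a replacement drive is ignored by the rule unless `add_new_disks` was set or the rule is edited by hand, which matches the "you would need to edit the rule" point above.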

Scenario 2. &, slightly differently, drive B in an A, B, C non-duplicated pool is picked up as starting to fail by the Scanner - drive B having some placement rules on.

(again there's a D, E, F backup pool)

As you wrote, & I understood already from the manual, the s/w will then attempt to move all non-damaged data onto drives A & C - along with warning that there's problems brewing & whatnot - which is great as it then helps to maintain there being a backup of as much data as possible.

Now, once it's done its thing, I tell the s/w that I'm removing drive B, stick a replacement drive in & add it to the pool, but what does the s/w then do with the data that's been moved onto A & C - given that, again, there were pre-existing placement rules &, obviously, the pool will now be completely unbalanced?

If the data on the drive is duplicated... there are a couple of options here. And it depends on what is happening exactly.

If the disk is marked as damaged by Scanner, then in most cases, it will attempt to move the data off of the disk onto other disks. You can "fine tune" or disable this behavior by using the "StableBit Scanner" balancer.

As for what happens after the data has been moved: if the rule has been updated to use the new disk, and only that disk, then it should attempt to rebalance the pool at the next pass (or immediately) and move that data back to a drive where it's not violating the File Placement rules.
Also, it should notify you that the file placement is not optimal and offer to rebalance the pool.

Anyway, I hope this better explains what I am trying to find out here.

Thanks again

Tim.

[Edit] Oh, & I do realise that I should really bin drive B before I start, as it's clearly b useless & keeps on failing in different ways, but I like to live dangerously.

Yes, you definitely should. Don't want to test this out in a live environment any sooner than you absolutely have to.

Also, it's worth noting that duplication is also done "per pool".

And I think I've answered everything. If I haven't let me know and I'll try to clarify further.

PocketDemon · January 4, 2015

That's answered exactly what I needed to know - ie that post any kind of drive failure, catastrophic or otherwise, or pulling, any placement rules will need setting up again on the replacement drive.

Yeah, it's simply about knowing the limitations/quirks of the s/w beforehand to make sure that there's not a better option - or rather that, as it seems pretty certain that I'll be buying it, in the event of any issue then I don't do anything stupid by wrongly assuming that the s/w will be doing something that it can't.

Otherwise, if you're offering to buy me some 8TB drives then that's really very kind of you & I'll look forward to using them.

More sensibly, the aim here is to try to rationalise lots of different bits of storage (a couple of 8 bay DASes & a NAS & a load of offline backup drives & some drives in a couple of different machines that I've bought as needed) into something a bit more cohesive & useful - so whilst there will be either 2 or 4 new 4TB drives bought alongside the 4U case (hopefully this week assuming some other h/w turns up & all works as it should), it's unfortunately mostly got to work with what I've got...

...though I'm sure it'd be vastly cheaper to buy a 2nd 24 bay 4U case plus something like a Chenbro CK13801 + CK23601 combo (+ the 8088 & 8087 cables needed to connect it all) & populate the entire thing with 4TBs, than it would be to replace enough of the drives I own with 8TB ones.

Anyway, thanks again for your assistance here - it's both really appreciated & is really quite reassuring to see, first hand, that there's such a prompt reply when there's questions asked.

Christopher (Drashna) · January 5, 2015

Well, I'm glad I could get that answered for you.

And yeah, I definitely understand wanting to know the limitations and abilities of the software before using it (though, I tend to test first and then ask questions).

And as for the 8TB drives, no we don't have any to "give away" (though that would be nice, maybe we should talk to Seagate about that!), but it was an option that may have been useful, if you're willing to invest in it.

Though, eventually the SAS Expanders may be a good idea. I'm getting close to being there myself (my Norco RPC-4220 is 3/5 filled already). And from an organizational and connection standpoint, it may be a better idea in the long run.

And I swear, I'm not trying to push you to spend money! Though, I may be trying to push myself....

And you're absolutely welcome! In this day and age, it's not enough to have a good product. You have to have good customer service as well, and we try to be prompt, friendly and helpful. And we are very happy when that's appreciated!

And if you have any other questions, don't hesitate to ask!

PocketDemon · January 5, 2015

Damn, you got my hopes up there with the promise of those 8TB drives... Well, whilst I really don't rate their consumer drives, their enterprise ones are pretty great in my experience (I've still got some 15.7Ks that I use for video editing as, with enough of them & separate source & destination arrays, they're still quite nippy & have been hugely reliable) & I'd gotten all excited thinking about what to fill them all with.

Yeah, normally I would also test first, but, as & when I have the bits, I need to turn the thing round asap - esp as part of the funding for the server needs to come from flogging some of the old kit, & so need to be able to take it all offline & advertise it as quickly as I can once I've got the server up & running.

Similarly, creating failures (though I'm not entirely sure how I'd make the Scanner think that a disk was failing(?)) with enough data on to work around chance (re)placement & waiting for it to do its thing back & forth would obviously be somewhat time consuming even if there were no other pressures... ...& it's going to be much better to use what time there is available to look at the day to day quirks of the s/w that'd be instantly meaningful than hours/days(?) on emulating failures that, touch wood, would never happen & only need a simple answer as to what the s/w actually does.

It's also not like the kind of issues that could occur with a failed array, controller failure, non-Windows compatible file system, etc, where the data can't be natively read from any of the drives & money's got to be thrown at data recovery to get anything back - R-Studio's come up trumps in the past in those kind of instances btw - & with both a backup pool & a second identical raid card (there will also be at least a third further backup of more critical stuff naturally) then, within the realms of what's affordable, fingers crossed that 'should' be robust enough.

SAS expanders are really the way forward imho... Well, a 24 port SAS card is simply all the money, whereas, for this initial server, it'll be an 8 port LSI card (nothing special, just a 2960, as it matched the one I've owned for a few years) paired with one of those HP ones - which tend to be about £100-120 2nd hand over here & work very well with a large array of cards providing if they've got a late enough f/w on them (or you've got access to a HP system & controller to flash them on).

Yeah, those Chenbro things are simply a reasonably cost effective option that I've come across in looking at options for if/when I need to add more drives by connecting up a 2nd 4U case - &, rather than that Norco, in the UK XCase do their own slightly updated version (plus a singing & dancing version with SGPIO & stuff), which I'll be going for.

Anyway, enough waffling on.

Thanks again.

Tim.

Christopher (Drashna) · January 5, 2015

If i win the lotto, I'll send you a couple. Deal?

And that's a very good reason to not just test first.

And reusing parts definitely helps save on cost.

As for creating failures... we have a "test" mode for StableBit Scanner, which will actually intentionally create false positives (aka, create an error where none exists). It's in the Scanner settings section.

But if you're using all of your actual data, yes, it would be very time consuming.

However, you could use a small data set just to see how it works. Even better, if you have a system capable of running VMs... (Windows 8 Pro includes HyperV, just FYI) that would be a good way to experiment and experience them.

R-Studio is pretty good, IIRC. In fact, I think they're the only one that has a "recovery" solution for Storage Spaces already.

But yeah, block based solutions are tricky when they fail. You're left with a lot of raw bits and not much to do with them. It's why we went with the file based solution. While it may still suck if you lose a drive... you haven't lost everything at least.

As for the SAS stuff, I have a couple of IBM ServeRAID M1015 cards. They use the LSI SAS2008 chipset (equivalent to the LSI 9211-8i, IIRC). They're great cards, even in IT mode. And ... after doing more research, I only needed the one card, I just need the expander (in theory, PCIe 2.0 x8 can handle maxing out ~35 hard drives at the same time).
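For what it's worth, the ~35 figure can be sanity-checked with some back-of-the-envelope arithmetic. PCIe 2.0 runs at 5 GT/s per lane with 8b/10b encoding; the 115 MB/s sustained rate per hard drive is an assumption picked for illustration, and protocol overhead is ignored, so treat the result as a rough upper bound rather than a measurement.

```python
PCIE2_TRANSFERS_PER_LANE = 5_000_000_000  # PCIe 2.0: 5 GT/s per lane
ASSUMED_DRIVE_MB_S = 115                  # assumption: sustained rate of one hard drive


def pcie2_bandwidth_mb_s(lanes):
    """Raw one-direction PCIe 2.0 bandwidth in MB/s, ignoring protocol overhead."""
    # 8b/10b encoding: only 8 of every 10 transferred bits carry data.
    data_bits_per_s = PCIE2_TRANSFERS_PER_LANE * lanes * 8 // 10
    return data_bits_per_s / 8 / 1_000_000


def drives_to_saturate(lanes, drive_mb_s=ASSUMED_DRIVE_MB_S):
    """How many drives at drive_mb_s it takes to fill the link."""
    return pcie2_bandwidth_mb_s(lanes) / drive_mb_s


if __name__ == "__main__":
    print(pcie2_bandwidth_mb_s(8))  # x8 link bandwidth in MB/s
    print(drives_to_saturate(8))    # drives needed to fill it
```

With 8 lanes this gives 4000 MB/s, which at the assumed 115 MB/s per drive is a little under 35 drives. Faster drives (or real-world overhead) would lower that number.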

Unfortunately, I've only really been getting into the higher end storage stuff recently, as I hadn't had the need for it until now. But it's really fun.

Also, Intel has a SAS Expander card that works as well. Lots of options! But man, do they get expensive fast.

But yeah, the Chenbro are pretty good price-wise.

crowdx42 · January 7, 2015

Quick question on this thread, is there anything in place to stop Drivepool from placing files onto a new drive before it is scanned? I have had cases in the past where a drive dies after a day of use. I currently run WD Lifeguard Tools to read a new drive and then write zeros to it to make sure it works for a day or so of constant use, but not all drives are recognized with Lifeguard Tools (not sure why).

Coming from unRAID and the preclear script that folk run to clear a drive has made me quite paranoid of new drives failing out of the box

Christopher (Drashna) · January 7, 2015

It's been requested already and is on our "To-do" list:

https://stablebit.com/Admin/IssueAnalysis/5251

As for the WD LifeGuard Tool, it may be picky about the controller. Most of these OEM tools tend to be very picky about that. And even some of the good third party tools as well.

However, I am going to bug Alex about this again, as it should be simple to implement (altering the built in Scanner balancer that we already have).

In the meanwhile, you can do this manually, by using the Disk usage limiter balancer. Just uncheck both "duplicated" and "unduplicated" options, and save. Once it's scanned, check both again, and it will start using it.

crowdx42 · January 8, 2015

Well I am currently running a preclear on two new drives before placing them into service, that way they will be scanned and zeros written and read to the drive before using them in drivepool.

Christopher (Drashna) · January 8, 2015

Well, that works too.

crowdx42 · January 8, 2015

One final question on this, can a machine be rebooted while Drivepool is rebalancing? Or is it preferable to allow it to finish before a reboot? I want to remove some older drives from the pool but I may need to restart the machine during the removal process and so I wonder if that has an impact when removing a drive?

Christopher (Drashna) · January 8, 2015

It absolutely can be rebooted during balancing. Or at any time.

Specifically, when balancing files, we use a "copytemp" file to move or copy it. Once the file has been copied, we rename it and then delete the original (if needed). So if the balancing doesn't complete (such as a reboot, or stopping the service), your data is unaffected and its integrity is never put in a position where it might be compromised.
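The general copy, rename, then delete pattern can be sketched roughly like this in Python. This is only an illustration of the idea, not our actual code: the `safe_move` name and the `.copytemp` suffix used here are assumptions made up for the example.

```python
import os
import shutil


def safe_move(src, dst_dir, temp_suffix=".copytemp"):
    """Move src into dst_dir so that an interruption never loses the data."""
    if os.path.abspath(os.path.dirname(src)) == os.path.abspath(dst_dir):
        raise ValueError("source is already in the destination directory")

    final_path = os.path.join(dst_dir, os.path.basename(src))
    temp_path = final_path + temp_suffix

    # Step 1: copy under a temporary name. If we are interrupted here,
    # the original is untouched and only a partial temp file is left behind.
    shutil.copy2(src, temp_path)

    # Step 2: rename the finished copy to its real name.
    os.replace(temp_path, final_path)

    # Step 3: only now delete the original.
    os.remove(src)
    return final_path
```

At every point in the sequence at least one complete copy of the file exists under its real name, which is why a reboot in the middle is harmless: the worst case is a leftover temporary file that can be discarded.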

Also, stopping the service (as happens when you shut down) gracefully cancels the balancing.
