Covecube Inc.
Alex

StableBit DrivePool - Controlling Folder Placement


I like writing these posts because they give me feedback as to what the community is really interested in. I can see that my last post about the Scanner was not very interesting; it was probably too technical, and there's probably not much to add to what I've already said.

 

Well, this time let's talk about StableBit DrivePool. In particular, I'd like to talk about DrivePool beyond 2.0.

 

Controlling Folder Placement

 

I think that I have a few great ideas for DrivePool 2.1+ but some of them depend on the ability to control folder (or file) placement, per pool part. I've kind of hinted at this capability in the thread that talked about taking out per-folder duplication, but I think that I've figured out how we can make this work.

 

What I would like to be able to do in future versions is to give you guys the ability to associate folders with one or more disks that are part of the pool. So that any files in those folders would be stored on those pool parts only (unless they're full).

 

This should be trivial to implement on the file system level, but the balancing framework would need to be enhanced to support this, and I think that I've figured out how to make that work.

 

Theoretically, you should even be able to use wildcard patterns such as /Virtual Machines/Windows* to associate all of those files with a group of pooled disks.
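As a rough illustration of the wildcard idea, here is a minimal Python sketch that maps a path pattern to a group of disks. It is not how DrivePool works internally; the `RULES` table, the disk names, and the `disks_for` helper are all hypothetical, and the matching is done with Python's standard `fnmatch` module, whose `*` also matches path separators.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table: path pattern -> names of pooled disks.
RULES = [
    ("/Virtual Machines/Windows*", ["Disk 0", "Disk 1"]),
]

def disks_for(path, rules=RULES):
    """Return the disk group of the first matching pattern, or None."""
    for pattern, disks in rules:
        # fnmatchcase is case-sensitive; a real Windows implementation
        # would most likely compare paths case-insensitively.
        if fnmatchcase(path, pattern):
            return disks
    return None
```

A path with no matching pattern returns `None`, which a caller could treat as "place the file wherever the pool normally would".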

 

What do you guys think, is this worthwhile doing?


Because it's kind of wordy.... you mean to be able to control what files go to which disk in the pool, basically, correct?

If so, I know that I would definitely be interested, I'm sure there are others that definitely would be as well.

 

I can be a bit wordy by my very nature. But yes, that's exactly what I'm talking about, controlling which files go onto which pool part.

 

And, in the future, a pool part may not necessarily represent a single local physical disk, which would make this even more interesting :)


This is something that I have hoped for, for a very long time. I really have never been keen on having things scattered around the pool. Especially with music... 10 album tracks scattered over 5 or 6 drives.. But then I'm probably a bit O.C.D.

 

I'm sure for people who have various devices streaming to different rooms this must be a good thing. Knowing that all the Disney films are on one hard drive for the rug rats, and the teenage daughter can watch her Twilight knowing it's on a separate drive, so no risk of intensive I/O and so on. And yes, I know that this could be achieved by organising multiple pools. However, when you get to 13 drives and around 22TB of data, creating new pools seems like a hassle.

 

First thought is that this would eliminate the need.

Second thought is that once implemented the folder placement would to my mind then simplify the operation of creating separate pools and may actually lead me to do it, instead of just thinking about it  :)

 

I'm all for it!!!

 

.


At first I wasn't sure I would have any use for such a feature, but the more I think about it, the more I like the idea of having more control over what goes where. It could be very useful as my children get old and skilled enough to use the server. Bring on the progress!

 

Oh, yeah, and thanks so much for all your hard work Alex!  Drivepool and Scanner are awesome and the WS 2012 E integration in the last update felt like an early Christmas present. :)


Oh, yeah, and thanks so much for all your hard work Alex!  Drivepool and Scanner are awesome and the WS 2012 E integration in the last update felt like an early Christmas present. :)

 

Thank you that is very much appreciated :)


This is something that I have hoped for, for a very long time. I really have never been keen on having things scattered around the pool. Especially with music... 10 album tracks scattered over 5 or 6 drives.. But then I'm probably a bit O.C.D.

.

I'm right there with you. My storage is mainly movie rips and I don't do file duplication due to the storage space required, and I can just re-rip if needed. However, if a drive were to fail, I would have to re-rip most of my 1900 disks since the files are scattered everywhere.

 

I would love to be able to rip to the pool and have all the files for the rip go to one physical drive. This way recovery from a drive failure would be much easier.


I'd like the ability to say "keep together all files in any|each folder of folderpath" where "any" and "each" are modifiers affecting which files to keep together.

 

So given an example where folderpath was p:\media\sketches\* and I had files in p:\media\sketches\2012 and p:\media\sketches\2013

 

if the "any" modifier was selected 

then the files in sketches\2012 and sketches\2013 and any subfolders thereof would be kept all together

 

but if the "each" modifier was selected

then the files in sketches\2012 and any subfolders thereof would be kept together

and the files in sketches\2013 and any subfolders thereof would be kept together

yet sketches\2012 and sketches\2013 would be stored independently of each other.

 

Also, something along the lines of if folderpath ends in "folder\" it also applies to any files within that folder itself, while if it ends in "folder\*" it only applies to files within that folder's subfolders.
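One way to read the "any" and "each" modifiers is as a function that maps every file to a "keep together" group. The Python sketch below is only an interpretation of the proposal above: the `group_key` helper is hypothetical, the base folder is passed without a trailing `\` or `\*`, and that trailing-character distinction is deliberately left out.

```python
from pathlib import PureWindowsPath

def group_key(file_path, base, modifier):
    """Return the folder whose files must stay together, or None if the rule does not apply."""
    file_path = PureWindowsPath(file_path)
    base = PureWindowsPath(base)
    try:
        rel = file_path.relative_to(base)
    except ValueError:
        return None  # the file is outside the base folder
    if modifier == "any":
        # Everything under the base folder forms one group.
        return str(base)
    if modifier == "each":
        if len(rel.parts) < 2:
            return None  # the file sits directly in base, not in a subfolder
        # Each immediate subfolder (with its own subfolders) is a separate group.
        return str(base / rel.parts[0])
    raise ValueError(f"unknown modifier: {modifier!r}")
```

Files that share a key would be kept on the same disk; files with different keys could be stored independently of each other.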

 

I hope this makes sense.


Shane,

 

Very interesting.

 

I was thinking of doing it like this:

  • Internally the system would associate a standard path pattern with a set of pool parts.
  • For example:
    • \Sketches\2012\* ->Disk 0,1,2
    • \Sketches\2013\* -> Disk 2,3,4

And you would set up an unlimited amount of these patterns to configure any kind of folder placement strategy that you want. In addition, every rule would have a maximum fill limit (e.g. 90%), so that if you had to copy more files into a given folder than the set of disks (that store files for that folder) can contain, the rule would be violated and your "overflowing" files would be placed onto other disks. This is very similar to how our existing balancing system works.
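The fill limit and overflow behaviour described above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not DrivePool's actual balancing code: the `Disk` class and `choose_disk` function are hypothetical, the least-full disk is preferred within each group, and overflow disks only need to have room for the file.

```python
from dataclasses import dataclass

@dataclass
class Disk:
    name: str
    capacity: int  # total bytes
    used: int      # bytes already stored

    def fill_after(self, size: int) -> float:
        """Fill ratio this disk would have after storing `size` more bytes."""
        return (self.used + size) / self.capacity

def choose_disk(rule_disks, other_disks, size, fill_limit=0.90):
    """Pick a disk for a new file, overflowing to other disks if the rule's disks are too full."""
    # First pass honours the rule; second pass is the overflow path.
    for group, limit in ((rule_disks, fill_limit), (other_disks, 1.0)):
        candidates = [d for d in group if d.fill_after(size) <= limit]
        if candidates:
            # Assumption: prefer the disk that ends up least full.
            return min(candidates, key=lambda d: d.fill_after(size))
    return None  # no disk in the pool can hold the file
```

With a 90% limit, a file goes to one of the rule's disks as long as at least one of them stays at or under 90% full afterwards; otherwise the rule is violated and the file lands on another disk.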

 

Hmm... I'll think about if your suggestion can be implemented with my patterns scheme.


Thanks Alex. For another example, I know some people (including myself) like to store a backup of their DVD collections on their server for easy access; if they were keeping their library under p:\dvdlibrary\ with each DVD having its own folder, and wanted to ensure that all the VOB files for a given title were kept on the same disk (e.g. to avoid having a movie interrupted if the server had to wake the next drive out of sleep/standby):

 

with my suggestion they could do so with a single rule, "keep together all files in each folder of p:\dvdlibrary\*" and the pool would internally handle which disk each set of files were kept on

 

whereas with only basic path-to-disks matching, as soon as their library exceeded the size of a single disk in the pool they would have to micro-manage which title(s) were associated with which disk(s).

 

Of course, the underlying logic of my method might end up requiring DrivePool to dynamically generate some sort of path-to-disk ruleset to handle the internal housekeeping, but as a user I would prefer that not to be my problem (*grin*).
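The "dynamically generated ruleset" mentioned above might look something like this Python sketch, where the pool remembers which disk each title folder was assigned to. Everything here is hypothetical: the `place_title` helper, the plain dictionaries, the "most free space" choice, and the assumption that the total size of a title is known before it is placed.

```python
def place_title(title_folder, size, free_space, assignments):
    """Assign a title folder to a single disk and remember the choice."""
    if title_folder in assignments:
        # Later files for the same title follow the remembered disk.
        return assignments[title_folder]
    # Only consider disks that can hold the whole title.
    fits = {disk: free for disk, free in free_space.items() if free >= size}
    if not fits:
        raise RuntimeError("no single disk can hold this title")
    disk = max(fits, key=fits.get)  # assumption: pick the disk with the most free space
    free_space[disk] -= size
    assignments[title_folder] = disk
    return disk
```

The `assignments` dictionary plays the role of the internally generated path-to-disk rules, so the user only ever writes the single "each folder" rule.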


Just saw this thread, should have read it sooner!    This sounds great and I posted another thread about creating a performance index using a new performance test that might be added to Scanner in order to determine real world drive performance.   That could be used to determine which drives to place which data.

 

My suggestion was more of an automatic thing (ie flag a folder as needing high/medium/low performance disc) using the same list we currently use for duplication... that said I can also see, after reading the thread, the benefit of manually associating folders with drives.   I like the Disney example.

 

One thing I think might need to be addressed is making sure that users are able to identify discs more specifically, i.e. not just showing the drive letter/volume name but also the drive model, if possible.

 

I can't figure out why no applications really let you see the physical drives detailed information associated with the Windows drive letter/controller channel, etc.   Drives me batty sometimes trying to identify a disk :)

 

Anyway this is a great thread and I'm looking forward to some of the proposed enhancements.   I think Drive Pool is fantastic and hope to see it grow and really get noticed by the market.  These kinds of things will do that I think.


Was just wondering if this is being worked on now, or has Alex still got work going on with reparse points? (I have googled reparse points and am still unsure what they do.) But kudos to Alex all the same for his tireless work.

I just bought 3 X 4TB drives and am on a mass data migration mission  :) to get rid of a couple of iffy (according to my scanner trial) 2TB drives. So my 9 pooled drives will soon be 8... But I really don't like having tracks off a single album scattered over 8 or 9 drives...It just doesn't make sense. 

So if it's still some way off then I will probably create a second pool with 2 of the drives and seeing as my music is one of the few folders I duplicate, it should in theory create 2 complete copies all neat and tidy. :lol:

That said, I'm still looking forward to being able to control folder placement from within drivepool itself.  :P


http://blog.covecube.com/2013/11/stablebit-drivepool-2-1-0-432-beta-reparse-points/

 

Basically, we've added it, but we want to make sure that it's good and stable before rolling it out.

Also, we've been working a lot and improving the backend of the site. Namely, the support system and related stuff, to make it easier for us to get to all of the tickets and deal with any issues that do arise.


 

Basically, we've added it, but we want to make sure that it's good and stable before rolling it out.

 

Just to clarify.. Are you talking about reparse points here, or the ability to control folder locations within the pool? 


Reparse points. 

 

But the folder location thing is still on Alex's mind, and on his to-do list (for various reasons).

 

And I think I said it above, but the reparse points thing ended up being a LOT more tricky than Alex anticipated. This and a couple of other things are at the top of Alex's to-do list (such as adding TrueCrypt support). So, he definitely hasn't forgotten.


The only concern I would have is how DP would deal with removal of drives. Say you have a pool of 4 drives, 2x duplication and videos go to drive 3 and 4. Removing drive 3 would then result in, uhm, what?

 

I think this is why my experience with DP 2.x on removing/adding is somewhat disappointing. I have a 2x2TB pool, 2x duplication. Had to remove 1. I'm sure I did not use Best Practice, but it is unclear to me what the best way is to deal with this. So I am sort of an idiot with weird self-induced problems, but I can see something similar becoming an issue with folder placement restrictions / directives.

 

It does appear, to me, somewhat contradictory to the fundamental notion of a Pool (or _my_ notion of a Pool) and I would not use it and would fear for user-induced data loss. Setting up separate Pools seems a way more consistent way to go.

 

Anyway, do what you want. I got DP&S and I'm very happy with what I got!


I would assume that the Balancer would then use ANY available drive, in line with the other balancers. As per normal.

 

As for... well, self induced problems, you're definitely not alone. :P

But as for the pool, if all the files were duplicated, then you could just remove a disk and then "remove" the missing disk from the pool. It would be unable to duplicate, but it would be "okay". Once you've attached another disk, it would want to duplicate to that "new" disk.

 

As for best practice, the "Remove" link is the best method. That, or if you have the disks and space, and uptime is critical, then use the "File Placement Limiter" to clear off the disk.

 

(it may be that I'm tired, but) I'm not sure what you mean by this:

"It does appear, to me, somewhat contradictory to the fundamental notion of a Pool (or _my_ notion of a Pool) and I would not use it and would fear for user-induced data loss. Setting up separate Pools seems a way more consistent way to go."

 

 

And as for "do what you want", we are very driven by user feedback. It's important to us. Even if we never use the feature, there are those that may.  Your opinions do matter to us!


Well, the way I see it, a Pool is an abstraction layer over physical HDs (or actually, the poolpart folders on the related HDs) that represents or behaves like another HD. That DP does all kinds of stuff within (like duplication on different physical HDs, read-striping, allocation algorithms etc.) is great, but it is indifferent to the "kind of file" or "name of folder" etc. Folder Placement would, I think, break some sort of clean distinction between the physical HDs and the abstraction layer, and I would fear that administration might become complicated (both for a user and for DP itself) and potentially contradictory.

 

I would really wonder what the net advantage would be over, for instance, defining various Pools. So, IF you forced a folder "Videos" to only store on HD 2, and HD 2 becomes full, what should DP do? Spill over any excess to another HD? Then you've failed to meet the objective of having all "Videos" on HD2. Move other data from HD2 to other HDs to make room for "Videos"? Meanwhile meeting read-striping/performance and duplication optimisation/constraints? And what if HD2 is really full with only "Videos"? Perhaps you'd _want_ a "Disk Full" message, which a separate Pool would actually get you....

 

It's just fishy to me. In part perhaps because my Pool is very small, I use a 2-disk, 2x duplication so I essentially have a very easy, smart and recoverable RAID-1 minus all the hassle of RAID. Folder Placement would be silly in my case. Anyway, do what you want ;) .

 

Edit: maybe someday I'll get me a real testing box. "Removing" a drive just seemed to take ages in my set-up. Perhaps I messed up because I essentially wanted to remove + de-duplicate as I was going to a 1-drive setup temporarily. Adding a drive appeared to take ages as well, perhaps as balancing and re-duplication is not a nice combination. Might be specific to a 2-disk 2x-duplication setup. Found that actually clearing the Pool (moving folders off-Pool), killing the Pool, changing 1 HD, formatting + re-creating the Pool, and moving back to the Pool worked way, way faster for me.


It's just fishy to me. In part perhaps because my Pool is very small, I use a 2-disk, 2x duplication so I essentially have a very easy, smart and recoverable RAID-1 minus all the hassle of RAID. Folder Placement would be silly in my case. Anyway, do what you want ;) .

 

 

 

 

And that's just it. With your set up it would be pointless...but

For those of us with more drives (12 in my case) and a huge collection of music, it isn't ideal to have tracks off a single album scattered (theoretically) over all those drives. The only option is to create a second pool to give any sort of control.

Taking on board Drashna's comments, I'm guessing the feature is likely to be in the shape of a balancer, and as with all the other balancers, it could be toggled on or off. Nothing is forced on you. There will be nothing restrictive if you simply choose not to use it.

Sorry but it just comes across that you feel it's a lame feature purely on the basis that it will be of no use to you personally. I say again....don't use it. 

In the meantime... I am being FORCED to use a separate pool to achieve what I need Drivepool to do. It's just nice to have options. 


Indeed, I do not know your situation, I don't know why it would be suboptimal to have tracks off a single album scattered and it's certainly not for me to say whether it'd be a good feature for you or in general or not. I just hope it won't harm stability and won't cause issues due to administrative errors.

 

Sorry for being critical, I would only hope that if my comments are not complete nonsense, they might at least contribute to a better product.
