Covecube Inc.
B00ze

Folder Placement Rules - Who would've thought!

Question

Good day.

So last night I disbanded the RAID and created my shiny new pool. I thought about using Ordered File Placement, since "By Folder Placement" is not implemented (Drive Bender wanted to implement this some 3 years ago, but it never materialized.) But filling up disk by disk is not quite what I want; it's close, but not quite. Why? Because I will be using one or two of the disks "outside" of the poolpart, so I do not necessarily want to fill them up - having DrivePool always use the least-full disk ensures each disk stays as free as possible, which makes sense if you're going to use the disks for stuff outside of the pool folder. Enter file placement rules. I always thought I would never need them, but it turns out I will be using them after all - I will tell DrivePool to use 2 of the 3 disks for all my normal-sized, duplicated data files (e.g. documents and the like) and it will place the rest (e.g. movies) where it wants. I think this will be really good - all the important data will be together, mirrored on two disks, which keeps recovery simple (just like with Ordered File Placement), but I will also have the advantage of always having the maximum amount of free space on each disk. Amazing how flexible DrivePool is.

I still wish there was a per-folder balancer, but when I think about writing one, it quickly becomes complicated - you need a "chunk size", because if a folder is 6TB in size you can't just treat it as a single entity; you need to be able to split big folders into chunks so you have some flexibility in placement and duplication. Anyway, maybe one day Alex can think about it.
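Just to illustrate the idea, here's a rough sketch of how a chunked per-folder balancer might decide placement (Python, all names and the chunk size hypothetical - this is NOT how DrivePool actually works internally):

```python
# Hypothetical sketch: split a large folder into fixed-size chunks and
# greedily place each chunk on the disk with the most free space, so no
# single disk has to hold the whole folder. Purely illustrative.

CHUNK_SIZE = 100 * 2**30  # 100 GiB per chunk (arbitrary choice)

def place_folder(folder_bytes, disks):
    """disks: dict of disk name -> free bytes. Returns list of (disk, bytes)."""
    placements = []
    remaining = folder_bytes
    free = dict(disks)  # work on a copy so the caller's dict is untouched
    while remaining > 0:
        chunk = min(CHUNK_SIZE, remaining)
        # Greedy: pick the disk with the most free space for this chunk.
        target = max(free, key=free.get)
        if free[target] < chunk:
            raise RuntimeError("pool is full")
        free[target] -= chunk
        placements.append((target, chunk))
        remaining -= chunk
    return placements
```

With a 6TB folder and two disks that each have 4TB free, the chunks get spread across both disks instead of the placement failing because neither disk can hold the whole folder.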

Regards,


9 answers to this question


I generally use file placement rules to keep folders on the lowest number of drives possible. Why? Because they're sleeping most of the time and waiting for 5-6 drives to wake is far slower than waiting for 1 or 2.


Finally migrated my data to the pool.

Folder placement rules worked flawlessly, and all in real time - no balancing/duplicating pass needed afterwards. It also writes at full speed (~150MB/s) no matter how many duplicates it creates, and re-reads duplicated files at ~225MB/s, not quite double the writing speed (copying from the pool to an SSD). I was also able to change drive letters around; the pool had no issues following along. In fact, it has no issues at all moving from Win 7, where the drives are I/J/K, to Win 10, where the drives are some other letters. This tool is lots of fun, and the UI is A-1, if a bit funky (e.g. it uses its own controls for minimizing/closing the window, which are a bit weird).

Best Regards,


Glad to hear it!

And yeah, StableBit DrivePool uses kernel/UNC paths for the drives, so it doesn't care about the mount points. It's nice, isn't it? :)
That said, we do recommend mounting to folder paths, so that the drives are easy to recognize and can be accessed easily (or more specifically, so you can easily run CHKDSK, as it does allow for folder mount paths).

 

12 hours ago, B00ze said:

re-reads duplicated files at ~225MB/s, not quite double the writing speed (copying from the pool to an SSD)

Fantastic!
And yeah, it's not going to get RAID-like speeds, because the IO isn't completely split between drives. But glad to hear that it is a nice boost. :)


Hey Christopher.

Yeah, the "sturdiness" of DrivePool is really, really nice; it has agreed to everything I have thrown at it so far, including Multi-Boot!

As for performance of the read striping, I have less good news: today I tested copying from one of those disks, but outside of the pool, to the SSD, and got ~200MB/s, so read striping has only increased performance by 12%. This is a far cry from the Intel RAID, which simply doubled performance (CrystalDiskMark reported ~400MB/s sequential reads, but the RAID required Intel's own write-back cache or writes would drop to 80MB/s). Of course you cannot match RAID, but 12% is a bit low; I was hoping for something like 50% faster. What do YOU get in terms of performance improvement for 2x duplicated files on your server?

PS: I see now how you manage reparse points. There is 1 file per point in the covefs folder, which contains the type, source, and target, plus an ID tagged onto the reparse point as an alternate data stream. The reparse point itself is just an empty folder (I haven't yet tried SymLinks on files, but presumably they will be empty files). It might slow down if someone has thousands of reparse points, but I won't reach that many. What's nice is that Robocopy is able to copy SymLinks as SymLinks TO AND FROM the pool! Woohoo! AND if I FSUTIL the reparse point to delete it, DrivePool does its job and the reparse point becomes a normal folder, exactly the way it works on a normal disk. This is awesome!
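For anyone trying the same thing, these are the commands I mean (Windows, elevated prompt; the paths are examples only):

```shell
:: Copy symbolic links as links instead of following them (/SL).
:: Without /SL, Robocopy follows the link and makes a second copy
:: of the data. D:\Source and P:\Pool are example paths.
robocopy D:\Source P:\Pool /E /SL

:: Inspect a reparse point, then strip it with fsutil; the folder
:: then behaves like a normal folder, same as on a plain NTFS disk.
fsutil reparsepoint query "P:\Pool\MyLink"
fsutil reparsepoint delete "P:\Pool\MyLink"
```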

Best Regards,

On 2/17/2018 at 10:04 AM, DragonQ said:

I generally use file placement rules to keep folders on the lowest number of drives possible. Why? Because they're sleeping most of the time and waiting for 5-6 drives to wake is far slower than waiting for 1 or 2.

Hey DragonQ: That won't work for me. I cannot get DrivePool to accept Junction file placement rules, i.e. a folder name with no pattern and no files; it accepts the rule but doesn't move the Junction to the disk I specify. My main entry point into the pool has several folders and several Junctions, and the latter are all created on the emptiest disk, which is the 3rd disk (the other 2 disks I do NOT expect to ever sleep, because they have duplicated data AND most of my data, so they will always wake up). So whenever I open the pool, it has to access the 3rd disk to resolve those Junctions. It's OK, I don't mind too much. DrivePool does everything I want with only 4 placement rules; it took me 30 minutes to think about it and find the solution, and 3 minutes to set it up.

14 hours ago, B00ze said:

Yeah, the "sturdiness" of DrivePool is really, really nice; it has agreed to everything I have thrown at it so far, including Multi-Boot!

Glad to hear it! 

14 hours ago, B00ze said:

As for performance of the read striping, I have less good news: today I tested copying from one of those disks, but outside of the pool, to the SSD, and got ~200MB/s, so read striping has only increased performance by 12%. This is a far cry from the Intel RAID, which simply doubled performance (CrystalDiskMark reported ~400MB/s sequential reads, but the RAID required Intel's own write-back cache or writes would drop to 80MB/s). Of course you cannot match RAID, but 12% is a bit low; I was hoping for something like 50% faster. What do YOU get in terms of performance improvement for 2x duplicated files on your server?

 

Honestly, I couldn't give you a good answer here, for two reasons:

  1. Because of how Read Striping works. It doesn't always read from both disks. Sometimes it will read from a single disk, if one of the disks is "busier", so it may not be able to fully stripe here.
    And even when it is, there are a number of factors that can affect this, so it can be very system and load dependent.
  2. I only access stuff over the network, which is 1Gb networking for me, so the max is 125MB/s, not counting overhead. Most of my drives can hit 180-200MB/s reads, especially since I'm using 64KB clusters (and I'm on ReFS).

I can do some testing, but again, this is system and load dependent. So "results will always vary".

And you can read a bit more about the read striping here: 
http://stablebit.com/Support/DrivePool/2.X/Manual?Section=Performance Options

 

14 hours ago, B00ze said:

PS: I see now how you manage reparse points. There is 1 file per point in the covefs folder, which contains the type, source, and target, plus an ID tagged onto the reparse point as an alternate data stream. The reparse point itself is just an empty folder (I haven't yet tried SymLinks on files, but presumably they will be empty files). It might slow down if someone has thousands of reparse points, but I won't reach that many. What's nice is that Robocopy is able to copy SymLinks as SymLinks TO AND FROM the pool! Woohoo! AND if I FSUTIL the reparse point to delete it, DrivePool does its job and the reparse point becomes a normal folder, exactly the way it works on a normal disk. This is awesome!

Haha, yeah. That's how/where they're stored. But getting them working is a heck of a lot more difficult.

Also, you shouldn't use SYMLINKs.  These are meant to be resolved on the client side.  So if a SYMLINK points to "C:\Windows\System32", the client resolves that, NOT the server.   
You may see how this could cause IMMEDIATE issues.   Junctions are what you want to use 99% of the time. 

As for the slowdown, I think the limit was much higher, because of how they're handled, driver side.  But I'd have to ask Alex (the Dev) about that. 

Sadly, no hard links though. So no Plex DB on the pool.


Good day.

On 2/24/2018 at 3:04 PM, Christopher (Drashna) said:

Honestly, I couldn't give you a good answer here, for two reasons:

  1. Because of how Read Striping works. It doesn't always read from both disks. Sometimes it will read from a single disk, if one of the disks is "busier", so it may not be able to fully stripe here.
    And even when it is, there are a number of factors that can affect this, so it can be very system and load dependent.
  2. I only access stuff over the network, which is 1Gb networking for me, so the max is 125MB/s, not counting overhead. Most of my drives can hit 180-200MB/s reads, especially since I'm using 64KB clusters (and I'm on ReFS).

I can do some testing, but again, this is system and load dependent. So "results will always vary".

I'll start a new thread and see if some of the forum users can post performance results - you obviously can't, because your media sits on the network. FYI, I tested by copying two 4GB files using Explorer to a fast SSD. I have a little tool to invalidate caches in Windows, so I can repeat the test without rebooting. When I tested, that copy was the only thing running.

On 2/24/2018 at 3:04 PM, Christopher (Drashna) said:

Haha, yeah. That's how/where they're stored. But getting them working is a heck of a lot more difficult.

Also, you shouldn't use SYMLINKs.  These are meant to be resolved on the client side.  So if a SYMLINK points to "C:\Windows\System32", the client resolves that, NOT the server.   
You may see how this could cause IMMEDIATE issues.   Junctions are what you want to use 99% of the time. 

As for the slowdown, I think the limit was much higher, because of how they're handled, driver side.  But I'd have to ask Alex (the Dev) about that. 

Sadly, no hard links though. So no plex db on the pool. 

You can't always use Junctions. For example, if you intend to Robocopy the stuff, it will have to be a SymLink: Robocopy doesn't copy Junctions as Junctions, it just makes a second copy of the data. Also, you cannot Junction files; a HardLink would be good there, but the pool doesn't support them. For the problem of remote SymLink evaluation, you have to enable R2L and R2R -> fsutil behavior set SymlinkEvaluation R2L:1 R2R:1

According to what I found on the web, it works this way:

            L stands for "Local" and R for "Remote" (who would've thunk?)
            The FIRST L or R refers to the location of the link itself (as opposed to its target) relative to the machine ACCESSING the link.
            The SECOND L or R refers to the location of the link's target relative to the machine where the LINK itself is located.

So R2L should be what needs to be enabled to follow SymLinks on remote volumes, but it does not work for me; I need to enable R2R, and THEN I can follow SymLinks on remote systems that point to local locations on that system. Once I enable R2R, SymLinks work the way they should, even over the network.
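To sum up, on the machine ACCESSING the links (elevated prompt), you can check and change the policy like this:

```shell
:: Show which SymLink evaluation modes are currently enabled.
fsutil behavior query SymlinkEvaluation

:: Enable remote-to-local and remote-to-remote evaluation - the
:: combination that made SymLinks on the pool resolve over the
:: network for me. Your mileage may vary by Windows version.
fsutil behavior set SymlinkEvaluation R2L:1 R2R:1
```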

Best Regards,

 


Thought I'd post this here rather than a new topic, since it has to do with placement rules.

I have now learned how DrivePool handles some file moves, and it's pretty nice. I have 3 drives. Most of my data goes onto drives 1 and 2, duplicated; there are file placement rules to make sure this is what happens. My "Downloads" folder, however, has no file placement rule and no duplication. Since free space is higher on drive 3, downloads go onto drive 3. Once a download finishes, I usually move it to one of the folders controlled by file placement rules - which means the file should go onto drives 1 and 2. But DrivePool doesn't turn the file move into a copy. Instead, it creates the directory structure necessary so the file can be moved to the correct folder ON DRIVE 3! This ensures fast move operations. Only LATER, during balancing, will it remove the file from drive 3 and duplicate it onto drives 1 and 2. I think this is clever.

Best Regards,

