SSD Optimizer Balancing Plugin


Alex

I've just finished coding a new balancing plugin for StableBit DrivePool called the SSD Optimizer. This was actually a feature request, so here you go.

 

I know that a lot of people use the Archive Optimizer plugin with SSD drives and I would like this plugin to replace the Archive Optimizer for that kind of balancing scenario.

 

The idea behind the SSD Optimizer (as it was with the Archive Optimizer) is that if you have one or more SSDs in your pool, they can serve as "landing zones" for new files, and those files will later be automatically migrated to your slower spinning drives. Thus your SSDs would serve as a kind of super fast write buffer for the pool.

 

The new functionality of the SSD Optimizer is that now it's able to fill your Archive disks one at a time, like the Ordered File Placement plugin, but with support for SSD "Feeder" disks.

 

Check out the attached screenshot to see what it looks like.

 

Notes: http://dl.covecube.com/DrivePoolBalancingPlugins/SsdOptimizer/Notes.txt

Download: http://stablebit.com/DrivePool/Plugins

 

Edit: Now up on stablebit.com, link updated.

ssd_optimizer.png

  • 1 month later...
  • 4 weeks later...

@4Frame,

 

Yes, that is absolutely correct. If you check the notes link, the 3rd bullet point states this. Specifically:

> If you are using duplicated files, then you should specify multiple SSD disks or else the system will fall back to an Archive disk for one or more of the file parts.

  • 7 months later...
  • 6 months later...

I have started testing the SSD plugin and there is something I don't understand: originally I thought the SSD plugin would write data to the SSD first and then copy it to the HD in the background. But for now I see that the data stays on the SSD and is not moved to the HD, even with "Fill SSD drives up to: 1%".

The SSD Optimizer is at the top of the balancers list.

> I have started testing the SSD plugin and there is something I don't understand: originally I thought the SSD plugin would write data to the SSD first and then copy it to the HD in the background. But for now I see that the data stays on the SSD and is not moved to the HD, even with "Fill SSD drives up to: 1%".
>
> The SSD Optimizer is at the top of the balancers list.

This depends on the balancing settings. 

 

Try setting the balancing to occur immediately, and disable the "but not more often than" option. Also, set the ratio slider on the main tab to "100%" and set the "or needs to move this much" option to "1GB".

 

This should help it be more aggressive about moving data out. 

And this should definitely work, as that's exactly what I'm doing on my system, and it's constantly moving data off of the 2x128GB SSDs I have.

While testing the SSD plugin by copying big movies (30GB), I thought of something, though I don't know whether it's possible:

Right now the file is first copied to the SSD, and only once the entire file is on the SSD is it copied to the HD in the background.

To speed things up, I was wondering whether the copy task for big files could be divided into chunks, file = A+B+C...?

This way, the copy of part A from the SSD to the HD would start as soon as part A is fully written to the SSD, and not only once A+B+C are all on the SSD.

I don't know if I am clear enough :D ... to sum up: something that would parallelize writing to the SSD cache with reading and copying from the SSD cache.

EDIT: I forgot to say that the other idea is to use the SSD's write/read capacity to its max. On a fast SSD I think simultaneous W/R can be around 500MB/s (not sure), but that is definitely much faster than HD read or write.

Are you doing this over the network? 

If so, then on a gigabit network the max throughput is 125MB/s (not accounting for overhead). So you're going to see 110-120MB/s tops, most likely. Adding more tasks will split the bandwidth.

If you're using 10gig networking, then ... this isn't accurate.
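
The 125MB/s figure is simply the gigabit line rate divided by eight bits per byte. A quick sketch of that arithmetic follows; the function name is made up for illustration, and the even split between simultaneous copies is a simplifying assumption, not a measurement:

```python
def link_throughput_mb_s(link_gbit_s: float, parallel_copies: int = 1) -> float:
    """Theoretical per-copy throughput in MB/s, ignoring protocol overhead."""
    total_mb_s = link_gbit_s * 1000 / 8   # 1 Gbit/s = 1000 Mbit/s = 125 MB/s
    return total_mb_s / parallel_copies   # assumption: bandwidth is split evenly


print(link_throughput_mb_s(1))      # 125.0  (gigabit, one copy)
print(link_throughput_mb_s(1, 2))   # 62.5   (gigabit, two simultaneous copies)
print(link_throughput_mb_s(10))     # 1250.0 (10 gigabit)
```

Real transfers land below these numbers because of network and file-sharing overhead, which is where the 110-120MB/s estimate comes from.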

 

 

As for multithreaded copies, it depends. If real time duplication is enabled, all copies of the file are written to their drives in parallel. More threads split the throughput, so it may not actually help. However, if you're copying from multiple spinning drives to the pool at the same time, then it will definitely benefit.

> Is there real-time read as well with this plugin? I do a lot of work editing raw video, and sometimes I have to move video to my 60GB SSD and I have a problem with running out of space.

I'm not sure what you mean here.

 

There is the read striping feature which may boost read speeds for you.

 

Aside from that, there are the file placement rules, which you could use to lock certain files or folders to the SSDs to get better read speeds.

  • 2 months later...

Christopher or Alex, did you have time to check how a single write (x1) is managed by the SSD plugin when multiple SSDs are used as cache?

I can confirm that the same SSD is always used when you copy to a pool with several SSDs.

I think you understand that this leads to wear on that one SSD, rather than spreading the writes: one time on SSD 1, another time on SSD 2, and so on.

I guess a simple random choice would do the trick, but maybe it is more complicated to code, I don't know.
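
To make the suggestion concrete, here is a minimal sketch of two ways writes could be spread across several feeder SSDs. It only illustrates the idea being proposed and is not how the SSD Optimizer picks a disk; the disk names and function names are invented for the example.

```python
import itertools
import random

SSDS = ["SSD1", "SSD2", "SSD3"]  # hypothetical feeder disks


def round_robin_picker(disks):
    """Return a function that hands out disks in rotation, one per new file."""
    cycle = itertools.cycle(disks)
    return lambda: next(cycle)


def random_picker(disks, seed=None):
    """Return a function that picks a disk uniformly at random for each new file."""
    rng = random.Random(seed)
    return lambda: rng.choice(disks)


pick = round_robin_picker(SSDS)
print([pick() for _ in range(5)])  # ['SSD1', 'SSD2', 'SSD3', 'SSD1', 'SSD2']
```

Round-robin spreads writes evenly by file count; a random choice evens out only on average, but needs no state between writes.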

  • 1 month later...

Good day.

 

I am reading up on DriveBender and DrivePool. Whichever solution I eventually pick, I will want the Landing Zone feature, so here I am. I have a question or two regarding the SSD Optimizer.

 

First, this:

 

> If you are using duplicated files, then you should specify multiple SSD disks or else the system will fall back to an Archive disk for one or more of the file parts.

 

 

I don't fully understand this. The Landing Zone, to me, should not duplicate files "on the fly" - it's only a Landing Zone. Whatever files are copied to the zone should be duplicated later when they are moved off the zone and into the archive disks. Ok, so the statement above kinda says that files going to the Landing Zone are duplicated on the fly. Question: If a duplicated file (x2) is dumped into the zone, and I have only a single SSD, then one part will go to a disk and one part to the zone. So using a Landing Zone is useless then? The I/O will be slow anyway since one of the parts is going to a disk. Do I understand this correctly? If I am understanding correctly then this severely limits the usefulness of this plugin :-(

 

Then there is this for this plugin: Ordered File Placement ->

Is this just for filling up volumes in a specific order? Does it replace the other plugin of that name?

 

Finally, do you have per-folder balancing yet? Where files inside a folder are kept together on a single disk (sub-folders can be placed on another disk, what counts is that files inside a folder are grouped together). If you do have it, can it be used together with the SSD Optimizer?

 

Thank you.

Best Regards,

 

Specifically, the reason for the multiple disk requirement is the Real Time Duplication feature. This feature specifically writes out all copies of the files, to the disks in the pool in parallel.  

 

The reason this is a big feature is that when a program opens a file, it locks it. That means it can't be moved, copied or otherwise altered while the file is open. This includes data on the pool, and definitely affects the ability of our software to duplicate the data. 

And why does this matter? Well, for instance, if you have a database program, and you're storing the database on the pool, the files will ALWAYS be locked. So, StableBit DrivePool would not be able to duplicate the files. 

 

 

However, you absolutely can turn off Real Time Duplication (we don't recommend it), which means you'd only need the one disk. However, this means that the data is only duplicated at a specific time, and once per day. If the files are locked during that time, then the data doesn't get duplicated.  And that's why we don't recommend disabling Real Time Duplication. 

 

Also, the "SSD" drives don't have to be actual SSDs. They can be any sufficiently fast drives. 

 

 

> Question: If a duplicated file (x2) is dumped into the zone, and I have only a single SSD, then one part will go to a disk and one part to the zone. So using a Landing Zone is useless then? The I/O will be slow anyway since one of the parts is going to a disk. Do I understand this correctly? If I am understanding correctly then this severely limits the usefulness of this plugin :-(

 

If you have duplication (x2) enabled, are using the SSD Optimizer, have Real Time Duplication enabled, and only have one "SSD", then yes, it will fall back onto one of the "Archive" drives, and yes, you may see slower throughput because of it.
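
The fallback described here can be sketched in a few lines. This is only an illustration of the rule as stated (each copy needs its own disk, SSDs are preferred, Archive disks make up any shortfall); the function and disk names are hypothetical and this is not DrivePool's actual code.

```python
def pick_landing_disks(ssds: list[str], archives: list[str], copies: int) -> list[str]:
    """Pick one target disk per copy, preferring SSDs and falling back to Archive disks."""
    targets = ssds[:copies]              # each copy must land on a different disk
    shortfall = copies - len(targets)
    targets += archives[:shortfall]      # not enough SSDs: use Archive disks for the rest
    if len(targets) < copies:
        raise ValueError("not enough disks for the requested duplication level")
    return targets


print(pick_landing_disks(["SSD1"], ["HDD1", "HDD2"], copies=2))          # ['SSD1', 'HDD1']
print(pick_landing_disks(["SSD1", "SSD2"], ["HDD1", "HDD2"], copies=2))  # ['SSD1', 'SSD2']
```

With one SSD and x2 duplication, one copy lands on a spinning disk, so the write finishes only as fast as that disk allows.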

 

 

 

 

As for the Ordered File Placement balancer plugin (it doesn't replace any other balancers, but it *is* included in the SSD Optimizer for technical reasons), it changes the pool's default "new file placement" strategy. The default is to place new files on the disk with the most available free space, which equalizes disk usage over time. This balancer instead fills one disk (or two, with duplication) at a time.

This balancer will also tend to keep the contents of a specific folder on the same disk (but it's not guaranteed).
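
The difference between the two placement strategies can be shown with a toy simulation. This is a simplified sketch under my own assumptions (no duplication, sizes in GB, hypothetical function names), not the balancer's real logic:

```python
def most_free_space(free_gb: dict[str, float], size_gb: float) -> str:
    """Default-style strategy: the disk with the most free space that fits the file."""
    fits = {disk: free for disk, free in free_gb.items() if free >= size_gb}
    return max(fits, key=fits.get)


def ordered_placement(order: list[str], free_gb: dict[str, float], size_gb: float) -> str:
    """Ordered strategy: the first disk in the configured order that still has room."""
    return next(disk for disk in order if free_gb[disk] >= size_gb)


def simulate(pick, free_gb: dict[str, float], file_sizes_gb: list[float]) -> list[str]:
    """Place files one by one, tracking free space, and return the disk chosen for each."""
    free = dict(free_gb)
    placed = []
    for size in file_sizes_gb:
        disk = pick(free, size)
        free[disk] -= size
        placed.append(disk)
    return placed


free = {"A": 25, "B": 130, "C": 115}
files = [10, 10, 10, 10]
print(simulate(most_free_space, free, files))                                      # ['B', 'B', 'C', 'B']
print(simulate(lambda f, s: ordered_placement(["A", "B", "C"], f, s), free, files))  # ['A', 'A', 'B', 'B']
```

The ordered run also shows why folder contents tend to stay together: consecutive files go to the same disk until it runs out of room.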

 

 

Additionally, there are the "File Placement Rules", which can be used to "lock" specific files or folders to specific disks, though this may require more micromanaging of the data than you'd prefer.

You can see details about this feature here:

http://blog.covecube.com/2014/04/stablebit-drivepool-2-1-0-503-beta-per-folder-balancing/

 

And yes, you can use this in conjunction with the SSD Optimizer. However, if you do, you need to disable the "Unless the disk is being emptied" option in the main Balancing settings to get it to keep the files on the SSD drive. 

Thanks for the quick response.

 

> Specifically, the reason for the multiple disk requirement is the Real Time Duplication feature.

> This feature specifically writes out all copies of the files, to the disks in the pool in parallel.

 

Ya, I figured as much while writing my message and thinking about it. But needing 2 SSDs in order to be able to use the Landing Zone is asking a lot; they're not cheap, and motherboard SATA connectors are at a premium too. So far, without having tried either, I prefer DrivePool over DriveBender simply because support is much better here, but DB does not need 2 SSDs for its Landing Zone...

 

> The reason this is a big feature is that when a program opens a file, it locks it.

> That means it can't be moved, copied or otherwise altered while the file is open.

> This includes data on the pool, and definitely affects the ability of our software

> to duplicate the data.

 

Yes, I was going to get there, and ask about locks and duplication at some point, in another thread. So you are saying that Real Time Duplication hooks low enough on the storage stack that you can duplicate even while the file is locked?

 

> And why does this matter? Well, for instance, if you have a database program,

> and you're storing the database on the pool, the files will ALWAYS be locked. So,

> StableBit DrivePool would not be able to duplicate the files. However, you absolutely

> can turn off Real Time Duplication (we don't recommend it), which means you'd

> only need the one disk. However, this means that the data is only duplicated at a

> specific time, and once per day. If the files are locked during that time, then the

> data doesn't get duplicated.

 

Hmmm, that might be ok with me; thanks for mentioning it!

We can, I assume, specify how often the duplicating job runs and when?

 

> As for Ordered File Placement Balancer Plugin (it doesn't replace any other

> balancers, but it *is* included in the SSD Optimizer due to technical reasons),

> it changes the pool's default "new file placement" strategy, which is to place

> new files on the disk with the most available free space... This balancer will

> also tend to keep the contents of a specific folder on the same disk (but it's

> not guaranteed).

 

 

Yes of course, if you always fill one disk before the others, content will tend to be grouped by folder, but I was hoping that "Folder Balancing" (for lack of a better term) had been added to DP (it was mentioned in another old thread in this forum). Ideally I'd want to fill disks using the default settings (most empty disk first) but grouped by folder. To repeat the example the other guy used, I'd want all 10 tracks of a music CD on a single physical disk (which 10 files are all in one folder) just to make things simpler when accessing the disks when the pool is offline, while still filling up disks most-empty-first...

 

Thank you.

Best Regards,

You're very welcome. 

 

> Ya, I figured as much while writing my message and thinking about it. But needing 2 SSDs in order to be able to use the Landing Zone is asking a lot; they're not cheap, and motherboard SATA connectors are at a premium too. So far, without having tried either, I prefer DrivePool over DriveBender simply because support is much better here, but DB does not need 2 SSDs for its Landing Zone...

 

Well, they have dropped in price recently, but yeah, SATA connectors are still at a premium (even controller cards can get very expensive, very quickly).

 

And as I said, if you disable realtime duplication, that should "fix" the issue, although with the mentioned caveats. Not ideal, but it works.

 

> Yes, I was going to get there, and ask about locks and duplication at some point, in another thread. So you are saying that Real Time Duplication hooks low enough on the storage stack that you can duplicate even while the file is locked?

 

Yup. Specifically, with Real Time Duplication, any time you write or modify data on the pool, the writes are done to all copies of the file in parallel.

Essentially, the driver works as a proxy for the file system (this is a drastic oversimplification), and with real time duplication enabled, any file system commands (creating/modifying/etc) are sent to all destination disks at the same time. So locking will not affect the software's ability to write the data.
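
As a rough user-mode analogy for "send every write to all destination disks at the same time", the sketch below fans one write out to several directories in parallel. The real feature lives in a file system driver, so this only mirrors the idea; the function name and the directories standing in for pooled disks are hypothetical.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_all_copies(disk_roots: list[Path], relative_name: str, data: bytes) -> list[Path]:
    """Write the same bytes under every disk root in parallel and return the paths written."""
    def write_one(root: Path) -> Path:
        target = root / relative_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        return target

    with ThreadPoolExecutor(max_workers=len(disk_roots)) as pool:
        return list(pool.map(write_one, disk_roots))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        roots = [Path(tmp) / "disk1", Path(tmp) / "disk2"]  # stand-ins for pooled disks
        copies = write_all_copies(roots, "db/data.bin", b"hello")
        print([p.read_bytes() for p in copies])  # [b'hello', b'hello']
```

Because every copy is written as part of the original write, there is never a later step that has to open an already-locked file to duplicate it.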

 

Alex (the developer) has a good post that goes over this feature (from the alpha version years ago) that may be worth a read:

http://blog.covecube.com/2012/02/file-duplication-in-stablebit-drivepool-beta-m4/

 

 

And if you disable real time duplication, then the duplication pass normally runs at 2AM. However, this can be changed by creating the advanced config file (a "default" version will always be present after installing or updating).

http://wiki.covecube.com/StableBit_DrivePool_2.x_Advanced_Settings

Set the "FileDuplication_DuplicateTime" to a valid time, and this is when it will run a duplication pass. 

Also, changing the duplication status should trigger a duplication pass, and the pool condition indicator (at the bottom) should allow you to re-duplicate manually. 

 

 

 

> Yes of course, if you always fill one disk before the others, content will tend to be grouped by folder, but I was hoping that "Folder Balancing" (for lack of a better term) had been added to DP (it was mentioned in another old thread in this forum). Ideally I'd want to fill disks using the default settings (most empty disk first) but grouped by folder. To repeat the example the other guy used, I'd want all 10 tracks of a music CD on a single physical disk (which 10 files are all in one folder) just to make things simpler when accessing the disks when the pool is offline, while still filling up disks most-empty-first...

 

Well, the Ordered File Placement will attempt to do so, but no, it's not guaranteed to do so. 

 

I believe there is a feature request pending for this already (a balancer that will specifically keep the contents of folders together), and once StableBit CloudDrive is released, we (Alex) should at least get to it, whether or not it's implemented. 

 

Though, as I mentioned, the "File Placement Rules" will allow you to do this, albeit completely manually. (You can also do neat things like adding a "\Music\*\*.mp3" rule to force the mp3 files onto a specific disk while leaving other files to go anywhere, or add another rule for different extensions.) But it sounds like that's not quite what you're looking for.

Good day.

 

> And as I said, if you disable realtime duplication, that should "fix" the issue,

> although with the mentioned caveats. Not ideal, but it works.

 

I noted it, thanks! I will probably not duplicate the entire pool anyway, only some folders, so it might be good enough for me to use a single SSD and allow real time duplication, especially if I add a parity disk with SnapRAID or something like that. How often does the SSD Optimizer empty the "Landing Zone" SSDs? Only when the "Fill Threshold" is reached?

 

> Alex (the developer) has a good post that goes over this feature, that

> may be worth taking a read: http://blog.covecube...vepool-beta-m4/

 

Good read! I learned that unlike DB, DP doesn't duplicate the entire folder structure on all disks :-) And that it will hash files when there is a mismatched time. And that it doesn't consider 1 part the "original" and 1 part the "copy" but that all parts are equal. And that +++ it handles other programs modifying files that it is currently duplicating with no problems. This is all very good.

 

> I believe there is a feature request pending for this already (a balancer that will specifically

> keep the contents of folders together), and once StableBit CloudDrive is released, we (Alex)

> should at least get to it, whether or not it's implemented. 

 

I went to the DEV Wiki and started clicking on tickets, but they are all "private" or something, there is no description on most...

 

> Though, as I mentioned, the "File Placement Rules" will allow you to do this,

> albeit completely manually. (though, you can also do neat things like adding

> a "\Music\*\*.mp3" to force the mp3 files onto a specific disk, while leaving other

> files to go anywhere... or add another rule for different extensions).  But it

> sounds like it's not quite what you're looking for, though.

 

Not quite. The file placement rules are too much for me for now (but the fact that they exist places this product way up in the "Wow so cool" category). I'm not really interested in placing some files on a specific disk (but I might be one day who knows), just in simplifying management when the pool is down. There is no rush; placing files all over the place has some performance advantages anyway.

 

Thank you for the excellent feedback you provide.

Best Regards,

> I noted it, thanks! I will probably not duplicate the entire pool anyway, only some folders, so it might be good enough for me to use a single SSD and allow real time duplication, especially if I add a parity disk with SnapRAID or something like that. How often does the SSD Optimizer empty the "Landing Zone" SSDs? Only when the "Fill Threshold" is reached?

 

As for when balancing happens, that depends entirely on the balancing settings. 

 

Specifically, there are three options for frequency: 

  • Do not balance automatically
    • (which requires manual balancing, in most cases)
  • Balance Immediately
    • As soon as the pool condition ratio gets below the specified thresholds, it will start balancing.
    • This can be limited to a specific frequency, in hours, so it's not always moving data around. 
  • Balance every day at ....
    • Will only run once a day, at 1AM by default. 

 

As for the automatic balancing, there is a slider for the ratio. If you want to aggressively move data out of the drive, move this to "100%"; otherwise, it won't move data out aggressively until the drive gets fuller.

There is also an "or if at least this much data needs to be moved" option, which may be preferable for the SSD Optimizer.
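
Put together, the two triggers described above amount to a simple either/or check. The sketch below is one way to read them, assuming the ratio is a percentage and that balancing starts when it drops below the slider value; the function and parameter names are hypothetical and do not come from DrivePool.

```python
from typing import Optional


def should_balance(ratio_pct: float, ratio_threshold_pct: float,
                   bytes_to_move: int, min_bytes_to_move: Optional[int] = None) -> bool:
    """Return True if an automatic balancing pass should start."""
    if ratio_pct < ratio_threshold_pct:
        return True  # pool condition ratio fell below the slider setting
    # Optional second trigger: enough data is waiting to be moved.
    return min_bytes_to_move is not None and bytes_to_move >= min_bytes_to_move


GB = 1024 ** 3
print(should_balance(99.0, 100.0, 0))            # True: a slider at 100% reacts to any imbalance
print(should_balance(95.0, 90.0, 2 * GB, GB))    # True: more than 1GB is waiting to move
print(should_balance(95.0, 90.0, 10, GB))        # False: neither trigger is met
```

With the slider at 100% and the size trigger at 1GB, almost any data sitting on the SSD is enough to start a pass, which is why those settings empty the landing zone quickly.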

 

 

This and more is covered in the manual:

http://stablebit.com/Support/DrivePool/2.X/Manual?Section=Balancing%20Settings

With images, so it may be worth taking a look. 

 

 

 

 

> Good read! I learned that unlike DB, DP doesn't duplicate the entire folder structure on all disks :-) And that it will hash files when there is a mismatched time. And that it doesn't consider 1 part the "original" and 1 part the "copy" but that all parts are equal. And that +++ it handles other programs modifying files that it is currently duplicating with no problems. This is all very good.

 

No, it doesn't. But it may end up with a large majority of it on the drives, over time. But this really depends on how things get balanced. 

 

And yes, specifically, when a file is accessed or a duplication pass occurs (can be triggered by remeasuring the pool, taking a disk offline and reconnecting, or changing the duplication settings), it checks to see if the timestamp matches. If it doesn't, it checks the hash of both files and compares them. If these do match, it updates the timestamp, otherwise it flags the file in the UI as a "mismatched" file and prompts you for resolution. (this is a section that we definitely need to improve, as it's pretty basic right now). 
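
The check described above (compare timestamps, hash only on a mismatch, then either fix the timestamp or flag the file) could look roughly like this. It is a sketch of the described flow, not DrivePool's implementation: SHA-256, the use of the modified time, and the choice of which copy's timestamp wins are all assumptions of mine.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def file_hash(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks (the hash choice is an assumption)."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(first: Path, second: Path) -> str:
    """Compare two copies of a file: returns 'ok', 'timestamp_fixed' or 'mismatched'."""
    stat_1, stat_2 = first.stat(), second.stat()
    if stat_1.st_mtime_ns == stat_2.st_mtime_ns:
        return "ok"  # timestamps agree, so no hashing is needed
    if file_hash(first) == file_hash(second):
        # Same contents: align the second copy's modified time with the first (arbitrary choice).
        os.utime(second, ns=(stat_2.st_atime_ns, stat_1.st_mtime_ns))
        return "timestamp_fixed"
    return "mismatched"  # would be flagged in the UI for the user to resolve


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first, second = Path(tmp) / "copy_1.bin", Path(tmp) / "copy_2.bin"
        first.write_bytes(b"same data")
        second.write_bytes(b"same data")
        t = 1_600_000_000 * 10**9  # arbitrary whole-second timestamps, in nanoseconds
        os.utime(first, ns=(t, t))
        os.utime(second, ns=(t + 10**9, t + 10**9))
        print(check_copies(first, second))  # timestamp_fixed
        print(check_copies(first, second))  # ok
```

Hashing only after a timestamp mismatch keeps the common case cheap, since most files are never read in full.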

 

And yeah, there is no distinction between original and copy. Both files are "equal", as there really isn't a reason to differentiate here, and it helps with the Read Striping feature. 

 

 

 

> I went to the DEV Wiki and started clicking on tickets, but they are all "private" or something, there is no description on most...

 

This depends on the specific issues. Some don't have public posts on them. We do try to include public information on most of them, even if it's brief.  And some of the issues have a lot of internal info that we don't wish public at this moment (meaning they may be related to potential features and/or products that are only in the planning stage).

 

However, if there are a lot with no description, I'll see about going through and updating some of these. 

 

 

 

> Not quite. The file placement rules are too much for me for now (but the fact that they exist places this product way up in the "Wow so cool" category). I'm not really interested in placing some files on a specific disk (but I might be one day who knows), just in simplifying management when the pool is down. There is no rush; placing files all over the place has some performance advantages anyway.

 

Well, we hope that the pool is never down!

 

And as I said, the Ordered File Placement balancer does attempt to do this, but that's not its main goal.

I've mentioned this to Alex (as I don't remember and can't find a specific request for this), as specifically a balancer that keeps a specific folder on the same disk. 

 

If you have programming experience, we do support 3rd party balancing plugins, and provide source to help you create your own:

http://wiki.covecube.com/StableBit_DrivePool_-_Develop_Balancing_Plugins

I post this mostly for reference, in case you want to take a look. There is no obligation or anything at all here.

Good day.

 

> Well, we hope that the pool is never down!

 

Hahaha, well I multi-boot and also boot in WinPE regularly, the pool *will* be down sometimes ;-)

 

> And as I said, the Ordered File Placement balancer does attempt to do this,

> but that's not it's main goal. I've mentioned this to Alex (as I don't remember

> and can't find a specific request for this), as specifically a balancer that keeps

> a specific folder on the same disk.

 

Careful, not "a specific folder" but *any* folder, e.g. all files (but not sub-folders) inside any folder are kept on the same disk. The best example of the usefulness of this came from that other guy who talked about this in another thread: 1 Folder per Music Album, perhaps with a sub-folder for the lyrics, and all the tracks inside the folder on the same disk, don't care which, and all the lyrics on the same disk, don't care which.

 

Thank you.

Best Regards,

  • 4 months later...

To build upon @B00ze's question, how would one use 2 SSD's and 1 HDD to maintain the best possible write speeds with 3x file duplication enabled?

 

Specifically, I have dual 500GB Crucial MX200 SSD's and a single 500GB Hitachi Deskstar P7K500 HDD in my system, configured as a single pool.

I've added the 500GB HDD (hooked up over eSata) to keep a 3rd external copy of my data that I can grab and go in case of fire or other emergency.

 

Before adding the HDD to the pool, I consistently got around 500MB/s write and 1100MB/s read speeds (with read striping enabled) on the pool.

Even with configuring the SSD optimizer, I cannot obtain fast write speeds without disabling 3x duplication on the pool, or disabling real-time duplication.

I understand this is by design, or I've possibly configured the pool settings incorrectly, however...

 

Is there a way to tell DrivePool to write all the initial data ONLY to the SSD's, and then write the 3rd copy of the data to the slower HDD later?

Or should I just remove the HDD from the pool, and use some other backup software to create weekly backups from the pool to the HDD separately?
