
Plugin Source


methejuggler

Question

14 answers to this question

Recommended Posts

On 11/17/2020 at 6:29 PM, methejuggler said:

I'm interested in extending the behavior of the current balancing plugins, but don't want to re-write them from scratch. Is there any chance the current balancing plugins could be made open source to allow the community to contribute and extend them?

I have not had much luck customizing the balancing plugins. I thought I understood how to use Rules and such, but things never worked the way I expected. I think improving the balancing plugins would be great. If you do get the source code and are able to extend the balancing plugins, please write instructions, examples, etc., so that normal people like us can get the expected output. I don't mean to denigrate the current balancers, which work fine on my DrivePool, but when I want to customize the balancing process I am completely lost.

Just curious, what features would you want to add/improve to the existing balancers?



I actually wrote a balancing plugin yesterday which is working pretty well now. It took a bit to figure out how to make it do what I want. There's almost no documentation for it, and it doesn't seem very intuitive in many places.

So far, I've been "combining" several of the official plugins to make them actually work together properly - I found the official plugins like to fight each other sometimes. The combined version lets me have SSD drop drives working with equalization and disk usage limits, with no thrashing. This is working now, although I ended up rewriting most of the original plugins from scratch anyway, simply because they wouldn't combine very well as originally coded. Plus, the Disk Space Equalizer plugin had some bugs that made it easier to rewrite than to fix.
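
For anyone curious what "not fighting" means in practice, here's a minimal sketch of the priority-ordered evaluation I'm describing. It's illustrative Python only - the real StableBit plugin API is a .NET interface, and every name below is invented:

```python
# Illustrative only: evaluate balancing rules in strict priority order so a
# lower-priority rule can never undo what a higher-priority rule planned.
# All names here are hypothetical; this is not the StableBit plugin API.

def plan_moves(drives, rules):
    """drives: {name: [files]}; rules: priority-ordered callables that
    yield (file, source_drive, target_drive) proposals."""
    planned = {}      # file -> target drive
    draining = set()  # drives a higher-priority rule is emptying
    for rule in rules:
        for file, src, dst in rule(drives):
            if file in planned or dst in draining:
                continue              # respect higher-priority decisions
            planned[file] = dst
            draining.add(src)
    return planned

# Example: an "evacuate D" rule outranks a naive equalizer that would
# otherwise refill D - the conflict is resolved instead of thrashing.
evacuate = lambda d: [(f, "D", "A") for f in d.get("D", [])]
equalize = lambda d: [(f, "A", "D") for f in d.get("A", [])]
print(plan_moves({"A": ["a1"], "D": ["d1"]}, [evacuate, equalize]))
# -> {'d1': 'A'}; the equalizer's move into the draining drive is skipped
```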

I wasn't able to combine the scanner plugin - it seems to be obfuscated and baked into the main source, which made it difficult to see what it was doing.

Unfortunately, the main thing I wanted to do doesn't seem possible as far as I can tell. I wanted it to be able to move files based on their creation/modified dates, so that I could keep new and frequently edited files on faster drives and move files that aren't edited often to slower drives. I'm hoping maybe they can make this possible in the future.
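
The placement logic itself would be trivial if the API exposed a hook for it - something like this hypothetical sketch based on file modification times (the 30-day threshold is just an assumption):

```python
# Hypothetical sketch of age-based tiering; no such hook exists in the
# current balancer API. The 30-day "hot" window is an assumed tunable.

import os
import time

HOT_DAYS = 30

def target_tier(path: str, hot_days: int = HOT_DAYS) -> str:
    """Files modified recently belong on fast drives, stale ones on slow."""
    age_days = (time.time() - os.path.getmtime(path)) / 86400
    return "fast" if age_days < hot_days else "slow"
```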

Another idea I had hoped to implement was to create drive "groups" and have it avoid putting duplicate content within the same group. The idea behind that is that drives purchased at the same time are more likely to fail around the same time, so if I avoid putting all duplicates of a file on drives from the same group, there's less likelihood of losing files in the case of multiple drive failures within one group. This also doesn't seem possible right now.


On 11/26/2020 at 10:55 AM, methejuggler said:

I actually wrote a balancing plugin yesterday which is working pretty well now. It took a bit to figure out how to make it do what I want. There's almost no documentation for it, and it doesn't seem very intuitive in many places.

Years ago, I used to code. It seems to me that writing a good program is one thing, and writing good documentation that is helpful and understandable is another skill altogether. If you do write a new balancer plug-in, I hope your documentation is written for us "normal" people who may want to use the program but may not have the insight into how it all works.

On 11/26/2020 at 10:55 AM, methejuggler said:

I found the official plugins like to fight each other sometimes.

Yeah, I think that's what I referred to as unexpected results. I thought I understood how to use the "rules" of the balancer, but my results came out differently.

On 11/26/2020 at 10:55 AM, methejuggler said:

Unfortunately, the main thing I wanted to do doesn't seem possible as far as I can tell. I wanted it to be able to move files based on their creation/modified dates, so that I could keep new and frequently edited files on faster drives and move files that aren't edited often to slower drives. I'm hoping maybe they can make this possible in the future.

That is an interesting idea. I am thinking of SSD cache programs that are able to track which files are accessed most often and move them to the SSD cache over time. In your case, you would want them moved to your fastest drives.

I know in DrivePool you can manually move files into the hidden PoolPart directory of any drive, and DrivePool will automatically detect them and update the pool. I can see that in some scenarios moving certain files to faster drives would be a great benefit. In my case, I am using DrivePool as my media center storage, and even my slowest drives are fast enough for streaming my files.

For years I was using Windows Storage Spaces and found it can be very fast, depending on how you set up the Storage Space (simple, parity, mirrored, etc.). However, after my third catastrophic failure of Storage Spaces and the loss of terabytes of data, I decided to try DrivePool. I found DrivePool was not as fast as Storage Spaces, but it was much more reliable, and if a problem did occur with a pool drive in DrivePool, data recovery was possible. I was fortunate enough to have a pool drive fail while I was still in trial mode of DrivePool, and I was still able to salvage almost all the data off that drive; of course, the other drives were all still just fine. In Windows Storage Spaces, I had one older small HDD fail in a pool of 26 HDDs, and it took down the entire Storage Space despite being set up to tolerate one disk failure. Storage Spaces failed to rebuild itself, and with data striped all over the drives, there was no way I could recover my data.

Getting more to your point, I added an SSD to my DrivePool and now my DrivePool is even faster (and still more reliable) than my Windows Storage Spaces ever was. There may be some way to manually designate your frequently accessed files to an SSD, or other faster drive, using the rules in the existing balancer(s). That would be great for programs that require TEMP directories for editing. In my case, I told the SSD Optimizer not to re-balance any data on my 248 GB SSD until there was at least 100 GB to move. Essentially, I made a 100 GB cache drive on the SSD for myself. All my new files go directly to my SSD and stay there until the 100 GB threshold is reached.
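
In other words, the setup reduces to simple threshold logic, roughly like this sketch (the numbers come from the description above; the function name is just illustrative, not DrivePool's API):

```python
# Rough model of the SSD Optimizer setup described above: new files land
# on the SSD, and nothing is moved to archive drives until at least
# 100 GB has accumulated. Names are illustrative, not DrivePool's API.

SSD_CAPACITY_GB = 248
FLUSH_THRESHOLD_GB = 100

def should_flush(ssd_used_gb: float) -> bool:
    """Only rebalance the SSD's contents once the threshold is reached."""
    return ssd_used_gb >= FLUSH_THRESHOLD_GB

print(should_flush(60))   # False - files stay on the SSD "cache"
print(should_flush(120))  # True  - the balancer empties the SSD to the pool
```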

On 11/26/2020 at 10:55 AM, methejuggler said:

Another idea I had hoped to implement was to create drive "groups" and have it avoid putting duplicate content within the same group. The idea behind that is that drives purchased at the same time are more likely to fail around the same time, so if I avoid putting all duplicates of a file on drives from the same group, there's less likelihood of losing files in the case of multiple drive failures within one group. This also doesn't seem possible right now.

I understand your thinking behind that, but I personally don't know if that is needed in DrivePool. DrivePool allows you to designate duplication as 2x, 3x, 4x, etc., for important files, so you can have as many copies on your pool, across as many drives, as you want. The chance of a single drive failure is always there, but the chance of 2 or more drives failing at the same time must be really low. In my limited experience with DrivePool, it handled duplication just fine after a HDD failure, so I was very happy. I really don't think that drives purchased at the same time, from the same lot, would be likely to all fail within hours of each other. I expect DrivePool would be able to rebuild itself with duplication before the next drive would fail.

Having said that, duplication on any pool of drives is not a backup program. You could always have a catastrophic fire, lightning strike, etc., and lose your entire pool. So, you still need a proper backup program with important files stored elsewhere. In my case, my original media files are stored on HDDs in the closet. If I have a catastrophic fire in the house, then I have more things to worry about than lost movie files. My financial files, which are very small compared to the movie files, are backed up to a 5 GB cloud service.

Good luck on your project(s), and maybe you will end up writing the next great balancer for DrivePool.

 


On 11/27/2020 at 12:13 PM, gtaus said:

There may be some way to manually designate your frequently accessed files to an SSD, or other faster drive, using the rules in the existing balancer(s). That would be great for programs that require TEMP directories for editing. In my case, I told the SSD Optimizer not to re-balance any data on my 248 GB SSD until there was at least 100 GB to move. Essentially, I made a 100 GB cache drive on the SSD for myself. All my new files go directly to my SSD and stay there until the 100 GB threshold is reached.

To my knowledge, there isn't a way to automatically designate frequently accessed files to an SSD. You can manually pin individual folders to specific drives if you know which folders will be frequently accessed, but that isn't always practical depending on your file structure, and it's quite possible you'll have some large files in those folders that are never edited and would just end up taking up space on the SSD.

I could never get the SSD Optimizer plugin to work for some reason, but the "Archive" plugin worked for me and appears to be essentially the same thing, just without the ordered placement options, so I used that instead. This works great for initially copying files to the pool, but once files have been moved off the SSDs onto the archive disks, if you go back and edit them, they are edited in place on the archive drives.

On 11/27/2020 at 12:13 PM, gtaus said:

I understand your thinking behind that, but I personally don't know if that is needed in DrivePool. DrivePool allows you to designate duplication as 2x, 3x, 4x, etc., for important files, so you can have as many copies on your pool, across as many drives, as you want. The chance of a single drive failure is always there, but the chance of 2 or more drives failing at the same time must be really low. In my limited experience with DrivePool, it handled duplication just fine after a HDD failure, so I was very happy. I really don't think that drives purchased at the same time, from the same lot, would be likely to all fail within hours of each other. I expect DrivePool would be able to rebuild itself with duplication before the next drive would fail.

I don't think you're quite getting what I was saying about that section, so here's an example.

I have 3 new 14 TB drives, all purchased at the same time, and several other drives purchased at different times (some in sets, some not; it's irrelevant to the example).

If I set 2x or 3x duplication on a directory and copy a bunch of new files onto the pool with those new hard drives in it, there's a VERY high chance that the original PLUS all of the duplicates will be placed on those 3 new 14 TB drives (since they're empty, and unless you're using ordered placement, all the other balancers will fill up the least-filled drives first).

Since all 3 were purchased at the same time, the chances of more than one of them failing at the same time are higher than the chances of a different set of drives failing at the same time as these. This means there's a *much* greater chance of permanently losing some of the data that was copied to the pool in the case of a multiple drive failure.

My proposal was to set those 3 drives as a "group" and tell it to try not to place multiple duplicates of a file on disks within the same group (i.e., spread the duplicates between the different drive groups) to reduce the chance of this.
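
As a sketch of what I mean (hypothetical Python, not anything DrivePool exposes today - the group labels and the selection policy are invented for illustration):

```python
# Hypothetical group-aware placement: pick the drives for a duplicated
# file so that no two copies share a purchase group. Not a real DrivePool
# API; group labels and "first drive per group" policy are illustrative.

def place_duplicates(copies, groups):
    """groups: {group_name: [candidate drives, best first]}.
    Returns one target drive per copy, each from a different group."""
    targets = []
    for drives in groups.values():
        if len(targets) == copies:
            break
        if drives:
            targets.append(drives[0])  # e.g. least-filled drive in the group
    if len(targets) < copies:
        raise ValueError("fewer independent groups than requested copies")
    return targets

groups = {"batch-2020-14TB": ["d1", "d2", "d3"],  # bought together
          "older-mixed":     ["d4", "d5"]}
print(place_duplicates(2, groups))  # ['d1', 'd4'] - the copies span groups
```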


On 11/27/2020 at 12:13 PM, gtaus said:

Good luck on your project(s), and maybe you will end up writing the next great balancer for DrivePool.

Currently I think my balancer works better than the official plugins. It offers all the features of all of them, but without any fighting. I also fixed all the bugs I'm aware of in the Disk Space Equalizer plugin (there were some corner cases where it balanced incorrectly).

I'm working on some issues with the ordered placement portion and then I'll release it.



Settings from my plugin. Notice the 1.5 TB drive is set to contain neither duplicated nor unduplicated content:

[screenshot: plugin settings panel]

 

A balance pass with those settings. Again, note that the 1.5 TB drive is scheduled to move everything off the disk due to the settings above.

[screenshot: balancing result with the 1.5 TB drive being emptied]

 

It properly handles mixed content (duplicated, unduplicated, and "unpooled other"), and equalizes while accounting for all three.
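
The "accounting for all three" part matters because non-pool data eats capacity the equalizer can't use. A rough sketch of the target calculation, with invented names (the real plugin's internals aren't public):

```python
# Illustrative equalization math: per-drive targets must treat non-pool
# ("unpooled other") data as unavailable capacity, or the equalizer will
# chase targets it can never reach. Field names are invented.

def equalize_targets(drives):
    """drives: {name: {'capacity': GB, 'pool': GB, 'other': GB}}.
    Returns target pool GB per drive, proportional to pool-usable space."""
    pool_total = sum(d['pool'] for d in drives.values())
    usable = {n: d['capacity'] - d['other'] for n, d in drives.items()}
    usable_total = sum(usable.values())
    return {n: pool_total * usable[n] / usable_total for n in drives}

drives = {'A': {'capacity': 1000, 'pool': 700, 'other': 0},
          'B': {'capacity': 1000, 'pool': 100, 'other': 400}}
print(equalize_targets(drives))  # {'A': 500.0, 'B': 300.0}
```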


20 hours ago, methejuggler said:

I don't think you're quite getting what I was saying about that section, so here's an example.

I have 3 new 14 TB drives, all purchased at the same time, and several other drives purchased at different times (some in sets, some not; it's irrelevant to the example).

If I set 2x or 3x duplication on a directory and copy a bunch of new files onto the pool with those new hard drives in it, there's a VERY high chance that the original PLUS all of the duplicates will be placed on those 3 new 14 TB drives (since they're empty, and unless you're using ordered placement, all the other balancers will fill up the least-filled drives first).

Since all 3 were purchased at the same time, the chances of more than one of them failing at the same time are higher than the chances of a different set of drives failing at the same time as these. This means there's a *much* greater chance of permanently losing some of the data that was copied to the pool in the case of a multiple drive failure.

My proposal was to set those 3 drives as a "group" and tell it to try not to place multiple duplicates of a file on disks within the same group (i.e., spread the duplicates between the different drive groups) to reduce the chance of this.

No, I understood what you wanted to do. If you can figure out how to group drives so duplicates are written to other groups, that's great. The point I was trying to make is that I would expect the chance of more than one drive in a group failing within hours of each other to be very small. If a drive in a group failed, DrivePool would re-duplicate the designated files onto other drives. I usually replace dead drives as soon as possible, so the duplicates would essentially end up on the newest drive when it becomes available. But even if duplicates were written to drives in the same group, I still can't see much probability of multiple drives in that group all dying before the duplicate files were safely written elsewhere.

At one time I had about 30 drives in my pool. I would expect to lose one drive to end-of-life failure maybe every six months. I cannot remember ever losing two or more drives in the same week, let alone within hours of each other. But then, I was buying different brands of drives on sale and adding them to the pool as needed. I currently have 16 drives in my DrivePool, some as old as 8 years and some purchased within the last 4 months. I have never purchased a group of identical drives from the same lot, so I really cannot say they might not all die within hours of each other, but I have never seen that happen, even when I was running RAID systems with matched drives.

Again, I just want to say that duplication on a pool of drives does not replace a legitimate backup program where important files are safely stored offline, and preferably at another location in case of fire.

11 hours ago, methejuggler said:

Currently I think my balancer works better than the official plugins. It offers all the features of all of them, but without any fighting. I also fixed all the bugs I'm aware of in the Disk Space Equalizer plugin (there were some corner cases where it balanced incorrectly).

I'm working on some issues with the ordered placement portion and then I'll release it.

That sounds great. I have already become a big fan of DrivePool and improvements to the system will always be welcomed.


20 hours ago, methejuggler said:

To my knowledge, there isn't a way to automatically designate frequently accessed files to an SSD.

I was thinking of those hybrid drives that have a small SSD on top of a conventional platter drive. Some of them have software that monitors frequently accessed files and learns how you use your system. Over time, the software moves those files to the SSD for faster access. I have seen this in YouTube videos where a gamer loads a game once, twice, and then a third time; you can see the load times decrease with each load until the program is essentially all on the SSD and reaches its maximum speed potential. Likewise, if a file is no longer accessed, the software will queue it to move back to the platter when more space is needed for new files on the SSD portion. That's what I have read, anyway, but I don't have any further insight.


1 hour ago, gtaus said:

Again, I just want to say that duplication on a pool of drives does not replace a legitimate backup program where important files are safely stored offline, and preferably at another location in case of fire.

Of course, I have Backblaze for cloud backup too, but re-downloading 10+ TB of data that could have been protected better locally isn't ideal.

I'm glad to hear you've had good luck so far, but don't fool yourself - multiple drive failures happen. Keep in mind that drives fail more when they're used more. The most common multiple-failure scenario is that one drive fails, and while you're restoring those files from your redundancies, another drive fails due to the increased use.

The most simultaneous failures I've heard of is 4 (not to me)... but that was in a larger RAID. There's a reason the recommended parity drive count increases roughly every ~5 drives in parity-based RAIDs.
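
To put a rough number on the rebuild-window risk, here's a crude constant-failure-rate model; the failure rate and rebuild window are assumptions for illustration, not measurements:

```python
# Back-of-the-envelope model of "second failure during restore" risk,
# assuming a constant failure rate. The 3% annualized failure rate and
# 3-day rebuild window are illustrative assumptions, and the heavy
# rebuild load pushes the real rate above this baseline.

import math

def p_any_failure(n_drives, annual_failure_rate, window_days):
    """P(at least one of n drives fails within the window)."""
    daily_hazard = -math.log(1 - annual_failure_rate) / 365
    p_one = 1 - math.exp(-daily_hazard * window_days)
    return 1 - (1 - p_one) ** n_drives

# 15 surviving drives, 3% annualized failure rate, 3-day rebuild:
print(f"{p_any_failure(15, 0.03, 3):.2%}")  # ~0.37% per rebuild event
```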

So far, I've been quite lucky. I've never permanently lost any files due to a drive failure - but I don't want that to start due to lack of diligence on my part either, so if I can find ways to make my storage solution more reliable I will.

In fact, one of the main reasons I went with DrivePool is that it seems more fault tolerant. Duplicates are spread between multiple drives (rather than mirroring, which relies entirely on one single other drive), so if you do lose two drives, you may lose some files, but not a full drive's worth. (Plus, the lack of striping similarly makes sure you don't lose the whole array if you can't restore.) I realize I don't need to explain any of this to someone who uses it; I'm just highlighting the reasons I found DrivePool attractive in the first place - separating the duplicates among multiple drives to reduce the chance of losses on failures. If that can be improved to further reduce those chances...


1 hour ago, gtaus said:

I was thinking of those hybrid drives that have a small SSD on top of a conventional platter drive. Some of them have software that monitors frequently accessed files and learns how you use your system. Over time, the software moves those files to the SSD for faster access. I have seen this in YouTube videos where a gamer loads a game once, twice, and then a third time; you can see the load times decrease with each load until the program is essentially all on the SSD and reaches its maximum speed potential. Likewise, if a file is no longer accessed, the software will queue it to move back to the platter when more space is needed for new files on the SSD portion. That's what I have read, anyway, but I don't have any further insight.

Hybrid drives are nice for normal use, but in mixed-mode operating environments (e.g. a NAS with several users) they get overwhelmed pretty quickly and start thrashing. There's also the problem of things like DrivePool mixing all your content up across the different drives, so the drive replaces your cached documents with a movie you decide to watch, and then the documents are slow again, even though you weren't going to watch the movie more than once. If there were a way to specify caching only often-written/edited files for increased speed, then maybe? But I think that would still run into issues with the balancer moving files between drives - the hybrid drive wouldn't know the difference between that and legitimately newly written files.
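
The write-frequency idea would be simple enough at the policy level - something like this hypothetical sketch (the threshold and every name are invented; no such hook exists in DrivePool or in hybrid-drive firmware):

```python
# Hypothetical cache-admission policy: only files written repeatedly get
# promoted to the fast tier, so a one-off movie stream can't evict the
# frequently edited documents. Threshold and names are assumptions.

from collections import Counter

WRITE_THRESHOLD = 3  # assumed: promote after 3 writes in the window

writes = Counter()

def record_write(path):
    """Return True once the file becomes eligible for the fast tier."""
    writes[path] += 1
    return writes[path] >= WRITE_THRESHOLD

for _ in range(3):
    hot = record_write("docs/report.xlsx")
print(hot)                              # True  - edited repeatedly
print(record_write("movies/film.mkv"))  # False - a single write stays cold
```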


3 hours ago, gtaus said:

If you can figure out how to group drives so duplicates are written to other groups, that's great.

I actually just thought of a solution for this which doesn't require a plugin! I could make separate pools for each of the drives I bought at the same time and NOT set duplication on these, and then make one big pool that only consists of those smaller pools and only set duplication on the one big pool. Then it would duplicate between pools, and ensure that the duplicates are on different groups of drives.
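
A toy model of why the nesting works, with example pool names (duplication lives only on the top pool, so each copy necessarily lands in a different purchase batch):

```python
# Toy model of nested pools: the top pool's "drives" are the sub-pools,
# and duplication is set only at the top, so each copy of a file goes to
# a different sub-pool (a different purchase batch). Names are examples.

subpools = {
    "batch-2020-14TB": ["d1", "d2", "d3"],  # bought together; no duplication here
    "batch-older":     ["d4", "d5", "d6"],  # mixed older drives; none here either
}

def place_copies(copies):
    """Duplicate across sub-pools: one copy per sub-pool, never two in one."""
    if copies > len(subpools):
        raise ValueError("need at least one sub-pool per copy")
    return [(pool, drives[0]) for pool, drives in list(subpools.items())[:copies]]

print(place_copies(2))
# [('batch-2020-14TB', 'd1'), ('batch-older', 'd4')] - never the same batch
```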

I'll probably do the same with any disks nearing their end of life. Place all near EoL disks in one pool to make sure it doesn't duplicate files on multiple near EoL disks.


On 11/29/2020 at 2:31 PM, methejuggler said:

I actually just thought of a solution for this which doesn't require a plugin! I could make separate pools for each of the drives I bought at the same time and NOT set duplication on these, and then make one big pool that only consists of those smaller pools and only set duplication on the one big pool. Then it would duplicate between pools, and ensure that the duplicates are on different groups of drives.

I'll probably do the same with any disks nearing their end of life. Place all near EoL disks in one pool to make sure it doesn't duplicate files on multiple near EoL disks.

Yes, I think that would work. I am also trying to figure out how to best use my near end-of-life HDDs. I have a number of older <500 GB USB HDDs sitting in the closet, and I'm wondering if I should put them in a DrivePool, use them as archive backups, or just junk them. Given that you can get a 3 TB HDD from Goharddrive.com for under $40, I have doubts whether those old USB HDDs are even worth keeping. I think I might pose that question in my own thread to see what other people think.

