mr_yellow

General questions about handling large files...

Question

Hi Guys,

 

I'm a soon-to-be user trying to migrate from my aged WHS v1 box, and I was thinking of using DrivePool. The simplicity appeals very much to me, but I have a few questions I was hoping someone could shed light on...

 

1) My NAS is 95% filled with large video files ranging from 1 GB up to 15-20 GB, and I have LOTS of these. How does DrivePool handle duplication and balancing of many large files, especially when many of them are continually opened and read (i.e. torrent seeds)?

 

2) If I store a VM disk image on a duplicated pool, I have the option of splitting the file into 2 GB chunks. Does DrivePool handle a single large 50 GB file better, or will it handle 25 2 GB files more easily? Will the entire file have to be re-duplicated if a small part of it changes, or is DrivePool smart enough to write only the parts of the file that changed?

 

3) What's your opinion on requiring ECC memory in a home NAS solution? Does DrivePool use a lot of memory and benefit from ECC? What is the expected memory load on a 7 TB pool with duplication?

 

4) How does DrivePool handle corrupted files? If a drive goes south and files become unreadable or corrupted, how does DrivePool prevent the corrupted file from propagating and overwriting the good version?

 

Thanks everyone!


7 answers to this question


  1. The first time that you turn on duplication on a folder StableBit DrivePool will go through each file in that folder and make a duplicated copy, in the background. It cannot make background duplicated copies of files that are in use, so I suggest that you don't run your torrent client until this is complete.

     

    Once the initial background duplication pass is complete, any further changes to those files and any new files created in that folder will be duplicated in real-time. From this point on you don't have to worry about in-use files, as that has no effect on real-time duplication.

  2. No, to StableBit DrivePool the size of the file doesn't matter. It duplicates exactly what is changed, byte by byte. In this case, StableBit DrivePool would take on the characteristics of NTFS, so you might ask, does NTFS handle small files better? And my answer would be no, at least I can't think of any reason why smaller files would be better.
  3. Typically the pooling kernel driver uses just kilobytes of memory, so it really doesn't require lots of RAM. The service will use, maybe up to 50 MB. As far as getting ECC, I personally don't for my home server, but I probably would if I was setting up a business server with mission critical data.
  4. Once a file becomes corrupt on a disk, a portion of that file will become unreadable; that is, you will get an error reading that file. StableBit DrivePool doesn't do any kind of propagation, so in the worst case, where you don't do anything about that error, the file will simply remain unreadable.

     

    When used together with StableBit Scanner, you not only get the benefit of the Scanner periodically "refreshing" the drive; it will also detect unreadable sectors and notify you about them. At the same time it notifies StableBit DrivePool, which begins an immediate file evacuation: any pooled files still readable on that disk are evacuated to a known good drive.

     

    Once you remove a damaged drive from the system, a background duplication pass is run on the pool, and any files that should be duplicated but are not will be reduplicated. This process runs over the live pool, so there is no downtime for "rebuilding" (as you see with some RAID setups).

     

    In addition, StableBit DrivePool stores all of your files as plain NTFS files, so if anything at all goes wrong with the process above, you can simply plug your pooled disks into any Windows machine, whether it has StableBit DrivePool installed or not, and gain access to your files.
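A rough sketch of that background re-duplication pass, in Python (hypothetical, simplified logic for illustration only, not DrivePool's actual implementation):

```python
import shutil
from pathlib import Path

def duplication_pass(file_copies: dict, healthy_disks: list, target_copies: int = 2):
    """For each pooled file, ensure the desired number of copies exists,
    writing new copies only to disks that don't already hold one."""
    for name, copies in file_copies.items():
        holders = {c.parent for c in copies}          # disks already holding a copy
        candidates = [d for d in healthy_disks if d not in holders]
        while len(copies) < target_copies and candidates:
            dest = candidates.pop(0) / name
            shutil.copy2(copies[0], dest)             # copy from a surviving replica
            copies.append(dest)
```

Because a pass like this works over ordinary live files, the pool can stay online while missing duplicates are recreated.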


Thanks for the reply.  One more additional question...

 

Are there any issues running DrivePool on an Atom processor? I know Windows Server 2012 R2 is supposed to run on an Atom, but would DrivePool add a lot of load? DEMigrator and SearchIndexer absolutely thrash the Atom processor on my Acer H340 running WHS v1...


No, there shouldn't be any issues. 

 

The only feature that may cause issues on an Atom CPU is "Network IO Boost", but that feature is disabled by default for just that reason (it adds overhead in order to differentiate network traffic from local disk traffic).


Curiosity question, continuing the topic of large files:

Say I have a pool of six 1GB drives (not at all cost-effective, I realize, but theoretically...)

 

It's my understanding as stated above that files are stored completely normally on the underlying NTFS, and can be recovered from individual drives if the pool fails, etc.

 

This pool presents to the system as one <6 GB virtual drive.  I try to move a 4 GB home video from a camcorder onto the virtual drive.  That should work, right?  After all, the OS thinks it's all one storage space.

 

What happens behind the scenes?  I mean, I guess the file has to be split up somehow between the real drives, right?  Would manual reconstruction somehow be possible, in an emergency?  etc.

 

If drive-leveling balancing is in effect and the drives are almost full, and a file is copied to the virtual drive (which supposedly still has plenty of room) but no single drive in the pool has enough space for the file, what happens?

 

Thanks!


No, that wouldn't work. Since we store the actual files on each drive, there isn't a drive large enough in this situation to hold the file.

 

The OS may see all that space as available, but the pool driver is intelligent enough to know that no disk in the pool has enough space to store the file, and it gives you an out-of-space error.

 

 

Specifically, when you go to create/copy the file, Windows normally queries for free space (to see if the file will fit). The DrivePool driver receives that query, then queries each individual disk (subject to any "real time placement limits") and looks at those values. If no disk has enough space for the 4 GB file (and in this example, none does), the driver returns a "not enough space" error. This is passed back to the querying program and causes the operation to fail with the appropriate error.
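That per-disk check can be sketched roughly like this (a toy model of the behavior described above, not actual driver code):

```python
def can_place_file(file_size: int, disk_free_bytes: list) -> bool:
    """A pooled file must fit entirely on one underlying disk, so the pool
    only accepts it if at least one member disk has enough free space;
    the pool's total free space alone is not sufficient."""
    return any(free >= file_size for free in disk_free_bytes)

GB = 1024 ** 3

# Six 1 GB drives, each fully free: a 4 GB file is rejected even though
# the pool reports ~6 GB free in total.
print(can_place_file(4 * GB, [1 * GB] * 6))              # False
print(can_place_file(4 * GB, [1 * GB] * 5 + [5 * GB]))   # True
```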

 

 

And if the entire pool were full? Then it would cause problems.

However, one of the balancers, "Prevent Drive Overfill", tries to keep each disk in the pool at least 10% (or 100 GB) free.

This can also cause issues when using the SSD Optimizer balancer if you have very small cache/SSD drives.

 

Also, you could use the Ordered File Placement balancer to minimize this possibility, as it fills one drive at a time (or two, with duplication).

 

 

If we used a block-based solution (such as how StableBit CloudDrive, RAID, or Storage Spaces works), then yes, that would work fine. But then you'd be storing blocks of raw data, so the contents aren't easily accessible without ALL of the blocks, and those would be divvied up between the different disks.
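For contrast, here is a toy sketch of block-based striping (purely illustrative; it is not how any of these products actually lay data out on disk). Each fixed-size block of the file lands on a different disk, so no single disk holds a readable file:

```python
def stripe_blocks(file_size: int, block_size: int, num_disks: int) -> dict:
    """Assign each fixed-size block of a file to a disk, round-robin.
    Every disk ends up holding only fragments; the file cannot be
    reconstructed without all of the disks."""
    placement = {d: [] for d in range(num_disks)}
    num_blocks = -(-file_size // block_size)  # ceiling division
    for block in range(num_blocks):
        placement[block % num_disks].append(block)
    return placement

# A 4 GB file in 1 MB blocks across 6 disks: 4096 blocks, ~683 per disk.
layout = stripe_blocks(4 * 1024**3, 1024**2, 6)
```

This is why a block-based pool could accept the 4 GB file from the earlier example, at the cost of losing per-disk recoverability.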


Sorry to revisit such an old topic.

Our actual scenario: we use DrivePool on a backup-receiving server (essentially a relatively "dumb", single-purpose Windows file server) to combine four "3 TB" drives into an 11 TB pool. Over the past few years, the data being backed up has grown faster (in terms of individual file sizes) than we'd estimated.

I now have to "empty" one of the physical drives as much as possible by moving its files into the free space on the other three drives, so that the one drive has enough room to receive the incoming files. I don't want auto-balancing to undo that work, but I do want to make sure that any incoming backup files still go to the drive with the most free space. How do I best configure or ensure this in the Balancing settings?

Thank you.

