Hi,
we're testing DrivePool to determine whether it might be a suitable solution for keeping a second online copy plus a backup.
We're dealing with huge numbers of small files (500 KB - 25 MB) spread across several "main" folders, each with 4096 subfolders. We already have quite a few of those main folders holding between 10 and 50 million files each, and we're adding 25-100k files per day. Additionally, we have to import data from other systems, which can add 10-30 million files at once. We're planning for 24- or 36-drive systems with DrivePool and expect single systems to break the 1-billion-file barrier.
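For reference, the bucketing scheme looks roughly like this (a simplified Python sketch; the folder names, drive letter, and hash choice here are illustrative, not our production code):

```python
import hashlib
from pathlib import Path

def bucket_path(root: Path, payload: bytes, filename: str) -> Path:
    """Map a file into one of 4096 (16^3) subfolders under a main folder,
    keyed by the first 3 hex digits of a content hash."""
    digest = hashlib.sha1(payload).hexdigest()
    return root / digest[:3] / filename  # e.g. P:\main01\3fa\<filename>

# Hypothetical pool path for illustration only.
pool = Path(r"P:\main01")
print(bucket_path(pool, b"example payload", "doc-000001.bin"))
```

So each main folder stays a flat 4096-way fan-out; no subfolder should ever hold more than a few tens of thousands of files even at the 50-million mark.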
I've set up a test machine (i7, 32 GB RAM, and 6 x 4 TB 7.2k drives used exclusively for storage) and gave it a taste of around 50-60 million files with 2x duplication. It didn't do too badly until I restarted the system. It is still working, but the dashboard has been recalculating for a couple of hours now, claiming xxx GB are not duplicated (they are; the number keeps increasing, so I assume it has to check _every_ file on disk?). I'll give it a couple more hours to get organized.
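In the meantime I've been sanity-checking the on-disk counts myself with a rough script like the one below. It assumes DrivePool's usual layout of a hidden "PoolPart.*" folder at the root of each member drive; the drive letters are placeholders for my six test disks:

```python
import os
from pathlib import Path

# Placeholder drive letters for the 6x4TB pool members; adjust to taste.
DRIVES = ["D:\\", "E:\\", "F:\\", "G:\\", "H:\\", "I:\\"]

def count_files(path: Path) -> int:
    """Count all files under path, recursively."""
    total = 0
    for _root, _dirs, files in os.walk(path):
        total += len(files)
    return total

grand_total = 0
for drive in DRIVES:
    # Each pooled drive keeps its share of the pool in a hidden PoolPart.* folder.
    for part in Path(drive).glob("PoolPart.*"):
        n = count_files(part)
        grand_total += n
        print(f"{part}: {n} files")

print(f"Total file instances across PoolPart folders: {grand_total}")
# With 2x duplication this should come out to roughly twice the logical file count.
```

The totals do come out to about 2x the logical count, which is why I believe the "not duplicated" figure in the dashboard is a recalculation artifact rather than real missing copies.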
The question is whether DrivePool is suitable for such large numbers of files. I assume there is some kind of database keeping track of where each file is placed. Is there a maximum database size or an overall file limit per pool? Do file counts like these have a known impact on performance? Does this recalculation occur after every restart?
Does anyone have experience with similar file counts?
Thanks!