
A few Qs regarding SSDs, caching, read speeds and duplication...


bd2003

Question

So I'm finally about to move up to Windows 8, and I was initially considering Storage Spaces...but it seems to have far too many caveats to trust my data to it. So after a little googling, I stumbled upon DrivePool. I used to use WHS back in the day, and loved Drive Extender...but that's no longer a thing, and my needs have changed considerably since then. My current setup:

 

Main PC

120GB SSD - Main OS/Apps drive

240GB SSD - Gaming performance drive

2TB WD Green - Gaming/misc archive drive

64GB SSD - Caches the 2TB drive using Intel Smart Response, so reads and writes are super quick despite the mechanical HDD backing it.

 

Synology NAS

2TB WD Green - Primary storage drive

1TB WD Green - Backup drive (manual sync from NAS 2TB drive and the main PC over network)

 

 

Ideally, what I'd like to do is move all the drives into the PC and drop the NAS. The 120GB drive would remain the OS drive, all the HDDs would merge into a single storage pool, and the 240GB SSD would serve as a cache. Maybe there's still a place for the 64GB, I dunno? Ideally I'd be able to pick and choose which folders would be mirrored onto the SSD for full performance but backed by an HDD (a few select games), which folders would be mirrored between HDDs for data protection, and which folders wouldn't be duplicated at all to save space.

 

So how realistic is this? I see that DrivePool has an option to use a fast drive as a "landing pad" to increase write speeds...but I'm more concerned with having a "launch pad" for specific games to run at full SSD speed. I know you can choose on a folder-by-folder basis which folders are mirrored, but can you choose which drives they'll be mirrored TO? And will read speeds off the SSD be hindered at all by DrivePool and the backing HDD? Furthermore, any chance an SSD can be made to function as a cache, similar to Fusion Drive, where only the most frequently or recently read files are automatically mirrored or moved onto the SSD?

 

Beyond that, if I give the DrivePool trial a try, how difficult will it be to revert to regular ol' single drives?


2 answers to this question

Downloaded the trial and ran a couple of benchmarks while I was at it. First, I tested a game load. The game is Path of Exile, and its load takes an epic amount of time, seemingly due to a ton of random reads during startup. The system was rebooted between each run to clear the Windows memory cache. The OS SSD is a 120GB Samsung 840 EVO, the pool SSD is a 240GB Samsung 840, and the HDD is a 2TB WD Green. Just so it's crystal clear what I'm doing here: "unpooled" means using the same physical drives that are in the pool, but reading from the non-pooled space; "pooled" means reading from the pooled volume. The OS SSD is completely outside of the pool.

 

OS SSD - 20s

unpooled HDD - 68s

unpooled SSD - 20s

pooled HDD only - 81s (slowdown could be due to fragmentation or a different location on disk)

pooled SSD only - 20s

pooled HDD/SSD duplicated - 23s

 

Not bad...but not great either. I could see the game simultaneously loading from both the SSD and HDD in Performance Monitor, but the HDD held the SSD back by a few seconds.
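If anyone wants to poke at the random-read side of this without timing a full game load, here's a rough Python sketch of the kind of micro-benchmark I mean. The file path is just a placeholder for a large file on whichever drive you're testing, and note Windows can still serve reads from its standby cache (hence my reboots between runs):

```python
# Minimal random-read micro-benchmark: read 4 KiB blocks at random offsets
# from an existing large file and report average latency. The path is a
# placeholder -- point it at a big file on the drive under test.
import os
import random
import time

PATH = r"D:\Pool\Games\PathOfExile\Content.ggpk"  # placeholder path
BLOCK = 4096
SAMPLES = 2000

size = os.path.getsize(PATH)
with open(PATH, "rb", buffering=0) as f:  # skip Python-level buffering
    start = time.perf_counter()
    for _ in range(SAMPLES):
        f.seek(random.randrange(0, size - BLOCK))
        f.read(BLOCK)
    elapsed = time.perf_counter() - start

print(f"{SAMPLES} random 4 KiB reads in {elapsed:.2f}s "
      f"({elapsed / SAMPLES * 1000:.2f} ms avg)")
```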

 

 

Now for a few raw copy-speed tests. The total folder size is almost exactly 5GB across 310 files, the vast majority of which is a single 4.8GB file.

 

Pooled HDD only to OS SSD - 67s, 74 MB/s

Pooled SSD only to OS SSD - 18s, 277 MB/s

Pooled duplicated SSD/HDD to OS SSD - 16s, 312 MB/s

 

With sequential reads, even though the SSD is vastly faster than the HDD, the HDD doesn't hold the SSD back...it actually provides a boost.
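For what it's worth, my mental model of what read striping is doing in those sequential numbers, as a toy Python sketch: two threads stream alternating chunks from the two duplicate copies, so each drive only has to deliver half the data. This is just my guess at the idea, not DrivePool's actual implementation, and the paths are placeholders:

```python
# Toy model of read striping: two threads each read every other 1 MiB chunk
# of the same file, one thread per duplicate copy. File reads release the
# GIL, so the two streams genuinely overlap. Placeholder paths.
import threading
import time

COPIES = [r"D:\hdd_copy\big.bin", r"S:\ssd_copy\big.bin"]  # placeholders
CHUNK = 1 << 20  # 1 MiB
done = [0, 0]    # bytes read per thread

def stream(idx: int, path: str) -> None:
    # Thread idx reads chunks idx, idx+2, idx+4, ... so the two copies
    # together cover the whole file, each serving half of it.
    with open(path, "rb") as f:
        n = idx
        while True:
            f.seek(n * CHUNK)
            data = f.read(CHUNK)
            if not data:
                break
            done[idx] += len(data)
            n += 2

start = time.perf_counter()
threads = [threading.Thread(target=stream, args=(i, p))
           for i, p in enumerate(COPIES)]
for t in threads:
    t.start()
for t in threads:
    t.join()
secs = time.perf_counter() - start
mib = sum(done) / 2**20
print(f"{mib:.0f} MiB in {secs:.2f}s ({mib / secs:.0f} MiB/s)")
```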

 

So in practice, the benefit of DrivePool's read striping for sequential reads is turned on its head into a performance drag when the I/O load is more random. While the game-load degradation vs. pure SSD isn't severe enough to completely dissuade me from using it, there are some other issues.

The major one is that once I start adding more HDDs to the pool, there's no guarantee that duplicates will land on the SSD; they can just as well land on another HDD. Even if I used the "file placement limiter" or "file placement priority" balancers to put duplicated files only on the SSD, I couldn't then use duplication across only HDDs for the more mundane stuff like MP3s...they'd just clog up and waste the valuable SSD space.

Using the SSD as a "feeder" helps write speeds considerably, but anything written is almost immediately dumped out to an HDD archive. And on top of that, I can no longer use junctions to shift specific games from the HDD to the SSD...something about the pooling breaks that functionality.

 

I understand the primary market for DrivePool is mass storage of files, where data integrity is a much larger concern than performance, but a lot of the potential of modern SSDs is going to waste. This is a huge opportunity for DrivePool to set itself apart, and I can see a few ways to do it. DrivePool would need to recognize SSDs as a special part of the pool, not just dumb storage equal in stature to HDDs.

 

1) The first part is basically already covered - write caching. This is what the archive optimizer plugin already does when you designate an SSD as a "feeder": writes to the pool are super quick, since they all go straight to the SSD before being filtered down to the HDDs.
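Conceptually it's something like this - my own Python sketch of the feeder/landing-pad idea, not the plugin's actual code, with placeholder paths:

```python
# My mental model of the "feeder" behaviour: writes land on the SSD
# immediately, and a background pass later drains them to an HDD.
# Conceptual sketch only, not how the archive optimizer actually works.
import shutil
from pathlib import Path

SSD_LANDING = Path(r"S:\pool-part\landing")  # placeholder pool parts
HDD_ARCHIVE = Path(r"D:\pool-part\archive")

def write_to_pool(name: str, data: bytes) -> None:
    # New files always hit the fast drive first, so the caller sees SSD
    # write latency regardless of where the file will eventually live.
    SSD_LANDING.mkdir(parents=True, exist_ok=True)
    (SSD_LANDING / name).write_bytes(data)

def drain_landing_pad() -> None:
    # Later (idle time / on a schedule), migrate everything down to the
    # slow archive drive, freeing the SSD for the next burst of writes.
    HDD_ARCHIVE.mkdir(parents=True, exist_ok=True)
    for f in SSD_LANDING.iterdir():
        if f.is_file():
            shutil.move(str(f), HDD_ARCHIVE / f.name)
```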

 

2) The second part is a little trickier. Essentially what I'm proposing is a balancer that lets you designate certain folders as "accelerated" - any such folder always keeps one duplicate on an SSD. Ideally, you'd be able to selectively disable read striping for HDD-backed accelerated folders, to maintain the SSD's random-read advantage. But in a pool with multiple SSDs, duplicates could be spread across them, and read striping from 2-3 SSDs would probably send both sequential and random read speeds through the roof.
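If I were sketching that placement rule in Python, it'd look something like this - entirely hypothetical policy code, not a real DrivePool balancer API:

```python
# Sketch of the placement rule I'm proposing: accelerated folders keep one
# duplicate on an SSD (with read striping off, so random reads stay at SSD
# speed), while everything else is duplicated across HDDs only.
from dataclasses import dataclass

@dataclass
class Drive:
    name: str
    is_ssd: bool
    free_bytes: int

def pick_targets(folder_accelerated: bool, drives: list[Drive]) -> list[Drive]:
    ssds = sorted((d for d in drives if d.is_ssd), key=lambda d: -d.free_bytes)
    hdds = sorted((d for d in drives if not d.is_ssd), key=lambda d: -d.free_bytes)
    if folder_accelerated and ssds:
        # one copy on the emptiest SSD for speed, one on an HDD for safety
        return [ssds[0], hdds[0]]
    # mundane duplicated data never touches the SSDs
    return hdds[:2]

pool = [Drive("240GB SSD", True, 200_000_000_000),
        Drive("2TB HDD A", False, 1_500_000_000_000),
        Drive("2TB HDD B", False, 1_800_000_000_000)]
print([d.name for d in pick_targets(True, pool)])   # ['240GB SSD', '2TB HDD B']
print([d.name for d in pick_targets(False, pool)])  # ['2TB HDD B', '2TB HDD A']
```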

 

3) Finally, the remainder of the SSD space in the pool should be used for read caching of non-accelerated folders, based on access recency and frequency. A few considerations here: enabling this caching should be left up to the user, as the additional writes will wear out the SSD much faster. Likewise, users should be able to designate folders that should never be accelerated, because it's pointless to burn out your SSD caching media files.
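The scoring could be as simple as a decaying hit counter. A rough Python sketch of what I mean - entirely hypothetical, including the per-folder opt-out:

```python
# Recency + frequency scoring for read caching: every read bumps a file's
# score, and scores decay over time so stale files fall out of the SSD
# cache. Folders the user marks "never accelerate" are skipped entirely.
import time

HALF_LIFE = 7 * 24 * 3600  # score halves after a week without a read
scores: dict[str, tuple[float, float]] = {}  # path -> (score, last_read)
never_accelerate = {r"D:\Pool\Media"}        # user-marked folders

def record_read(path: str) -> None:
    if any(path.startswith(p) for p in never_accelerate):
        return                               # media never burns SSD cycles
    now = time.time()
    score, last = scores.get(path, (0.0, now))
    decayed = score * 0.5 ** ((now - last) / HALF_LIFE)
    scores[path] = (decayed + 1.0, now)      # +1 per read, decayed history

def cache_candidates(budget_bytes: int, sizes: dict[str, int]) -> list[str]:
    # Fill the SSD read-cache budget with the hottest files first.
    chosen, used = [], 0
    for path, (score, _) in sorted(scores.items(), key=lambda kv: -kv[1][0]):
        if used + sizes.get(path, 0) <= budget_bytes:
            chosen.append(path)
            used += sizes.get(path, 0)
    return chosen
```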

 

So in the perfect scenario, with my pool of a 64GB SSD, a 240GB SSD, a 1TB HDD and 2x 2TB HDDs, I'd be able to cordon off 8GB for write caching and 32GB for read caching. I'd mark my media folders as "do not accelerate" and specific games/apps as "accelerate". Accelerated folders would be balanced to spread across SSDs only, with spare space used to cache non-accelerated folders. Duplicates would start falling back to a single SSD as space gets tight. Once SSD space dries up to the point where the full set of accelerated duplicates can't be maintained, the UI should notify the user so they can decide whether to de-accelerate some folders or add more SSD storage.
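The capacity fallback could be a simple watchdog along these lines - again just a hypothetical Python sketch, using the 8GB/32GB reservations from above:

```python
# Sketch of the fallback logic: the accelerated set fits on SSD as long as
# it's smaller than total SSD space minus the cache reservations; once it
# outgrows that, surface a warning instead of silently demoting folders.
WRITE_CACHE = 8 * 2**30   # reserved for the write landing pad
READ_CACHE = 32 * 2**30   # reserved for the recency/frequency cache

def check_capacity(ssd_bytes: int, accelerated_bytes: int) -> str:
    available = ssd_bytes - WRITE_CACHE - READ_CACHE
    if accelerated_bytes <= available:
        return "ok: full accelerated set fits on SSD"
    shortfall = (accelerated_bytes - available) / 2**30
    return (f"warn: accelerated folders need {shortfall:.1f} GiB more SSD "
            "space - de-accelerate some folders or add an SSD")

# 64GB + 240GB SSDs pooled, ~110GB of accelerated games/apps
print(check_capacity((64 + 240) * 10**9, 110 * 2**30))
```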

 

It might sound like a radical proposal, but this kind of SSD caching/acceleration is fairly commonplace in non-pooled scenarios - bringing it to pooled disks would be a major step forward!



Very interesting read. With the cost of SSDs dropping, I wonder if this is the best way to get what you want... (I too believe that Storage Spaces needs another generation before I'll trust it, and everything I hear from peers says the same.)

 

SSD caching is becoming a building block in the typical SAN pool, but in those cases we use banks of drives, not a single drive at the head of it. I would be really concerned if a single SSD were reading and writing everything going into a large pool and was the sole copy of any data. If the SSD were treated strictly as a cache, with everything written out to disk, then I think it would be fine...that way the worst that can happen is you burn out the drive. SSDs are still a bit on the new side to me and the industry; next year when SATA Express comes out I think there will be a resurgence of interest, as the pipe will double.

 

I do agree that having a way to mix SSDs into the StableBit platform would be good. It would be golden if you could tag things programmatically... so movies that get watched regularly (or a game you want to play) are moved to the cache and live there, with a real copy on the HDDs... then the HDDs can spin down. You get the best of both worlds that way.

 

Just as an aside, another thing to consider would be to optimize the space you're using on your SSD via MS dedup. I really wish StableBit supported this (though I'm not 100% sure I would trust it against my 12TB pool yet).

 

You could see your potential savings, if you have your gaming stuff on a separate drive, using DDPEval.exe:

http://blogs.technet.com/b/filecab/archive/2012/05/21/introduction-to-data-deduplication-in-windows-server-2012.aspx

 

I don't game as much as I do virtualization work, so on my Win 8.1 box I have 10 virtual machines running in 87GB of real space (~200GB of deduped space); there are a lot of identical files in things like virtual machines. But I'm not sure how much savings you would get with games... I put my 30GB Steam directory in it, but I need to find a way to calculate the savings (the tools only do whole drives).
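For a rough per-folder estimate, you could hash fixed-size chunks yourself and compare unique bytes to total bytes - something like this Python sketch. MS dedup uses smarter variable-size chunking, so treat this as a crude approximation, not what DDPEval.exe would report:

```python
# Quick-and-dirty dedup estimate for one folder: hash fixed-size chunks
# and compare unique bytes to total bytes.
import hashlib
import os
import sys

CHUNK = 64 * 1024  # 64 KiB chunks

def estimate(folder: str) -> None:
    seen: set[bytes] = set()
    total = unique = 0
    for root, _, files in os.walk(folder):
        for name in files:
            try:
                with open(os.path.join(root, name), "rb") as f:
                    while chunk := f.read(CHUNK):
                        total += len(chunk)
                        digest = hashlib.sha256(chunk).digest()
                        if digest not in seen:
                            seen.add(digest)
                            unique += len(chunk)
            except OSError:
                pass  # skip locked/unreadable files
    print(f"total {total / 2**30:.1f} GiB, unique {unique / 2**30:.1f} GiB "
          f"(~{100 * (1 - unique / max(total, 1)):.0f}% duplicate)")

estimate(sys.argv[1])  # e.g. python dedup_est.py "C:\Steam"
```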

 

rj

 

 

