Correct way to apply PrimoCache write cache

zeroibis · October 10, 2018

I am intending to use PrimoCache L2 write cache in combination with DrivePool. So far in testing it appears that for it to work I need to apply the cache to the underline drives that make up a given DrivePool. I wanted to verify that this is the correct implementation.

My tests so far:

Apply cache to the DrivePool volume = cache does not work
Apply cache to each of the underline drives/volumes that make up the drive pool = cache works

Christopher (Drashna) · October 10, 2018

Yeah, that's the correct way.

Specifically, it's caching the blocks of data, IIRC. But since there is no actual data for the DrivePool... you'd need to use the underlying disks.

Jaga · October 10, 2018

While Christopher answered for you already, I'll add this: You can specify in Primocache a different "block size" for it to use than what the format's cluster size is. The trade-offs in that regard are faster random access speed (smaller block size - all the way down to the cluster size of the volume) using more memory, vs lower memory overhead with larger block sizes (and potentially better throughput for large files).

When setting up a L2 against a set of Pool drives, which normally tend to be larger than a boot drive, it's beneficial to keep Primocache's block size larger to reduce overhead. And if you find you aren't getting a high enough hitrate because your L2 SSD isn't quite large enough (10%+ data coverage from a read/write cache is ideal), you can actually stripe multiple SSDs and use the resulting stripe as the L2 target, to save costs.

zeroibis · October 10, 2018

My only use of the SSD cache is to improve write performance. I am going to be comparing the performance of PrimoCache to that of using the SSD cache from DrivePool as well.

My main concern is that using the drivepool ssd write cache I can not enable real time file duplication unless I have two SSDs.

Regardless, what I really want is basically a write buffer that fills up when needed and constantly empties itself. Hard part is figuring out the best way to achieve this. (using a 250GB 970 EVO)

zeroibis · October 10, 2018

Tested with the SSD Cache plugin and yea it is a lot faster. Problem is that once the cache fills up your file transfer literally hits a brick wall and everything stops. What should happen is that once the cache fills up my transfer should bypass it and go to the underline dives directly.

Hopefully when that issue is addressed in the future I can use the cache option but until then it is not worth it.

After looking at primocache it is also not worth it as there is little to no speed benefit in the transfers. Now if the SSD was 2-4x faster it would be but we are not there yet.

Jaga · October 11, 2018

3 hours ago, zeroibis said:

Tested with the SSD Cache plugin and yea it is a lot faster. Problem is that once the cache fills up your file transfer literally hits a brick wall and everything stops. What should happen is that once the cache fills up my transfer should bypass it and go to the underline dives directly.

Hopefully when that issue is addressed in the future I can use the cache option but until then it is not worth it.

After looking at primocache it is also not worth it as there is little to no speed benefit in the transfers. Now if the SSD was 2-4x faster it would be but we are not there yet.

Yeah - Drivepool's SSD cache plugin isn't a true block-level drive cache like Primocache. Instead it's a temporary front-end fast mini-pool. The two are apples and oranges trying to achieve similar results (at least from a write cache perspective).

You can emulate two SSDs for the cache plugin by partitioning your single SSD into two equal volumes, adding each to separate child pools, then assigning both child pools to the SSD Cache plugin as targets. You won't have any redundancy (which goes against the whole duplication to protect files strategy on the SSD cache pool), but it will work.

Primocache *should* be fast enough, certainly faster than your pool drives if it's a 970 EVO, and at least as fast as DP's SSD Cache plugin. If you want to post here (or PM me) your config for it, we can see if there are changes that would help it's performance for you. Or even post over on the Primocache support forum. I've worked with it for around 5-6 years now, and am very familiar with it. I have it running on all my machines using L1's, L2's and combinations of both.

zeroibis · October 11, 2018

The actual transfer speeds for file transfers when using the cache was rarely over 200MB/s. I was getting around the same speeds with the cache off as with on. Obviously it was a bit faster for the first few seconds. (This is for primocache). Now when benchmarking it would look like there was big differences but I virtually never saw those in real life.

Here is an example config that I used:

Block size: 16-512MB
L1 Cache: 0-2GB
Latency: 10-300
Write Mode: Native/Average
(Moved the bar all the way to write and also tested with it off)
Verified strategy was write only.

I had an extremely high number of transfers going to multiple different volumes and background crc checks going on so it could be that I just had the system way overloaded as well. The CPU was thrashing into the 90%+ utilization. Once I finish moving everything over and get all my data settled I will do a much more in depth test of moving a sample directory around and time exactly how long it takes to transfer with different settings.

Jaga · October 11, 2018

7 hours ago, zeroibis said:

I had an extremely high number of transfers going to multiple different volumes and background crc checks going on so it could be that I just had the system way overloaded as well. The CPU was thrashing into the 90%+ utilization. Once I finish moving everything over and get all my data settled I will do a much more in depth test of moving a sample directory around and time exactly how long it takes to transfer with different settings.

Yep, sounds like you might have exceeded the system's abilities at that time. I've never had a performance issue using a L2 with a decent SSD, but then I've never benchmarked during heavy activity either.

Usually with a SSD and write cache, setting the Primocache block size equal to the formatted cluster size is best for performance. It comes with some overhead, but for high IOPS and small file operations it's helpful - I'd recommend matching the two for your use.

Since you're just using the cache as a L2 write cache on data volumes only (no boot drive, right?), then I'd recommend leaving the L1 cache off.

If you're using the entire SSD as a write cache and may be filling it up quickly... do you over-provision your SSD? Samsung recommends 10% over-provision, but I find even 5% can be enough. In scenarios where the SSD gets full quickly, that over-provision can really help.

It looks like you tried Native and Average defer-write modes - normally I stick with Intelligent or Buffer (Romex claims buffer is best for high activity/throughput scenarios). Those are really only good with longer timeouts however, like 30s+. I like allowing my drive time to build up writable content using a high defer-write time (600s), and it even helps with trimming unnecessary writes (i.e. a file gets copied to the volume, hits the L2 cache, is used for whatever reason, and moments later is deleted - the write to the destination drive never happens and the blocks get trimmed while still in the cache).

When using a write-only L2 cache, just keep the slider all the way to the right to tell it 100% goes to the write cache. You might also want to check "Volatile cache contents" to ensure Primocache completely flushes the L2 between reboots/shutdowns. You aren't using it as a read cache, so there's no reason for it to hold any content at shut down.

zeroibis · October 12, 2018

Yea thanks for the info most files on the system are going to be 512MB-2GB. I will test with various block sizes. I am limited in how large the cache be with small blocks as I only have 8GB of total system ram to work with.

Jaga · October 12, 2018

Great, then you know your target file sizes, and can adjust the block size higher. Probably no less than 16k then, or 32k. Whicher you pick, it can't be smaller than the cluster of the target drives. Will be interested to hear how it works out for you.

zeroibis · October 12, 2018

I think I found where my issue was occurring, I am being bottle necked by the windows OS cache because I am running the OS off a SATA SSD. I need to move that over to part of the 970 EVO. I am going to attempt that OS reinstall move later and test again.

Now the problem makes a lot more sense and is why the speeds looked great in benchmarks but did not manifest in real world file transfers.

zeroibis · October 13, 2018

Actually it looks like that may not be the case; however, I have started testing and using it on a single raid 0 it is great it takes a 17min transfer down to 9min.

I also found that normal was the fastest and that the size had no real impact on transfer times.

Unfortunately, the double W/R penalty of using a single cache on a mirrored raid 0 was a disaster. You had everything working against it. First doubling the W/R rate killed most of the performance benefit of the SSD and then the double space requirement finished it off in the end. I will be posting images later of the benchmarks. Using the write cache allowed for a speed improvement of 1min for a transfer time of 16min vs 17min without the cache.

I do believe there is a workaround although the best case workaround can not be achieved on Ryzen Your going to need threadripper. However, I will still attempt this solution in the future as I am sure it is better than nothing. What you need is a minimum number of SSDs equal to the number of duplication sets in drivepool. So if your using 2x real time duplication you need a minimum of 2 SSDs. Then you set each one to independently handle a different set of underline drives so they never get hit with the double penalty.

I am still running some other tests and will post up all the results when they are finished.

If the SSD cache for drive pool could resolve 1 single issue then it would be the best method. That issue is the size limit that you can not transfer more data than the size of the ssd cache. If that could be fixed then it would blow primocache out of the water. Also you might wonder about the non real time duplication issue when using the drivepool duplication cache but I realized that you can get around that by adding an existing duplication pool with real time duplication instead of the underline drives and do it that way. Basically: Pool with SSD cach -> Pool with Duplication -> underline volumes.

zeroibis · October 13, 2018

I think it could be that my real limiting factor is that I could be using a faster SSD, from what I can see the larger sizes do have faster to 2x the read/write speeds compared to the 250gb 970 evo. I have a 960 EVO 1TB that I could toss in there and run all the benchmarks again. I will try that later and see how it goes.

In the mean time I have gotten it down to 14min transfer times (3min improvement over no cache) using 512k blocks.

zeroibis · October 13, 2018

Here are the test results.

ssd to raid 0:

ssd to raid 0 cache 8kb buffer 60:
83729700_ssdtoraid0cache8kbbuffer60.PNG.d76b7d0763e6d62d4cf7437bd471d35e.PNG

ssd to raid 0 cache 8kb normal 60:
641634392_ssdtoraid0cache8kbnormal60.PNG.e95869e4404610c49607a09ad83b6b5f.PNG

ssd to raid 0 cache 32kb normal 60:
281618911_ssdtoraid0cache32kbnormal60.PNG.ca179b2e8ed76db70bd8c1815a53bd75.PNG

ssd to raid 0 cache 128kb normal 60:
1562377279_ssdtoraid0cache128kbnormal60.PNG.779284a4466a8bc06db5cb23ef8a5c20.PNG

ssd to raid 0 cache 512kb normal 60:
691108137_ssdtoraid0cache512kbnormal60.PNG.8a70706caea41ebafbd79e40de6bfba8.PNG

ssd to raid 0 mirror cache 8kb normal 60:
500927699_ssdtoraid0mirrorcache8kbnormal60.PNG.c798d2161cc73c8b1e2cdda429ac5c3c.PNG

ssd to raid 0 mirror cache 512kb normal 60:
1524288201_ssdtoraid0mirrorcache512kbnormal60.PNG.2320621e08c7a6888583af5e0161f8f5.PNG

ssd to raid 0 mirror cache 512kb normal 300:
1082026220_ssdtoraid0mirrorcache512kbnormal300.PNG.01bce96c380a0e2ddc5b73949cae974c.PNG

Jaga · October 13, 2018

3 hours ago, zeroibis said:

Here are the test results.

-snip-

Just some feedback on the tests: I don't think you ever mentioned if you had over-provisioned your drives with Samsung Magician. If not, the large number of writes were probably negatively affecting the results with v-NAND rewrites, seeing as they had 94GB copied to them over, and over, and over again. There was probably very little time for any garbage collection to occur on the drives to keep performance high.

Looks like larger block sizes are definitely an improvement for you though.

6 hours ago, zeroibis said:

However, I will still attempt this solution in the future as I am sure it is better than nothing. What you need is a minimum number of SSDs equal to the number of duplication sets in drivepool. So if your using 2x real time duplication you need a minimum of 2 SSDs. Then you set each one to independently handle a different set of underline drives so they never get hit with the double penalty.

I am still running some other tests and will post up all the results when they are finished.

If the SSD cache for drive pool could resolve 1 single issue then it would be the best method. That issue is the size limit that you can not transfer more data than the size of the ssd cache. If that could be fixed then it would blow primocache out of the water. Also you might wonder about the non real time duplication issue when using the drivepool duplication cache but I realized that you can get around that by adding an existing duplication pool with real time duplication instead of the underline drives and do it that way. Basically: Pool with SSD cach -> Pool with Duplication -> underline volumes.

That's actually the recommended way to use the SSD Cache plugin - a physical number of drives equal to the level of pool duplication.

And while you can use pools, within pools, within pools... I'm dubious about any impacts to overall speed in getting the writes to where they finally need to go. As in the case with a single SSD caching the top level pool, which holds the pool volume itself, which consists of duplicated pools, each which can consist of multiple member volumes. I guess it depends on how efficient the DP balancing algorithms are where child pools are concerned, which I've never benchmarked.

In a single SSD Cache plugin volume scenario, you lose the safety of duplication (obviously) and are limited by the max space restraint of the plugin (as you found), and can only try to address it by enabling real-time balancing on that top-level cached pool. Real time balancing has historically had performance considerations to weigh carefully - if the cache drive is moving items off at the same time items are being added - performance gains are shot to hell. Or it stops moving items off when new ones are added and you run into the space restraints again.

For additional tests I'd highly recommend a smaller data set to use (perhaps 20GB) and ample drive over-provisioning, so that you don't run into GC issues on the caching drive(s).

I know that Primocache pre-fetches (read caching) volumes in the same cache task sequentially (not simultaneously), so it may force reads to multiple target disks the same way. It would be worth benchmarking to see - setup a RAID 0 volume with multiple SSDs, make that the L2 and cache all pool drives with it, then send some data at it and see how it performs. Do the writes flush at the speed of a single target disk's write speed, or jump up to multiples? Then recreate that L2 with just one SSD and compare with new testing. I don't have spare SSDs to test with here, but you could try it and see. If it does flush to multiples at the same time, you'd probably be able to saturate many HDDs in the pool with full write performance.

zeroibis · October 13, 2018

I have ordered a second 970 EVO and a x1 gpu so I will have those in later this week. I will re-test with that installed and I will split up the drives between the two caches as so to avoid the double read/write penalty.

I am hoping that by using the ssd that will be using the PCH PCI 2.0 x4 connection as the cache for the HDDs that are also on the same PCH it will avoid any bandwidth issues. My understanding is that the PCH to CPU connection is at 32Gb/s. The x4 slot is at 20Gb/s and I also assume my integrated 10Gb/s nick is on there as well. So even with them all saturated I should not run into a PCH bottleneck on that side if I am lucky.

Not running over-provisioned but I will also look into the benefits of that as well. Note though that I was using only a 111GB partition for the tests.

zeroibis · October 17, 2018

I realized that I never posted the SSD to SSD result here it is:

Jaga · October 17, 2018

What was your architecture for that last test zeroibis? You've changed hardware and testing schemes a bunch. I'm assuming that was with 2 SSDs working independently as DP SSD Cache drives for your 2x duplication pool.

zeroibis · October 19, 2018

On 10/17/2018 at 3:47 PM, Jaga said:

What was your architecture for that last test zeroibis? You've changed hardware and testing schemes a bunch. I'm assuming that was with 2 SSDs working independently as DP SSD Cache drives for your 2x duplication pool.

No that was just the transfer time from the 960 EVO (1tb) to a 970 evo (250gb). The transfer directly to an nvme SSD represents the fastest possible transfer time.

Actually recreating that test on my current system it takes over 10min now and I am not sure why that is. Only setting changed since then was enabling sata hot plug in bios and swapping a PCIe x16 ATI card (running pcie2.0 x4) for a x1 (3.0) nvidia card.

Oh it could be that the NVME has now settled to its true performance because I wrote over 11TB to it earlier this week. (that shit is warn in now)

zeroibis · October 19, 2018

Oh BAM I found what was off. I knew based on the CPU usage and SSD usage that they were not being loaded heavy and figured it was something with the network. Changed the nic to enable jumbo frames and bam dropped the transfer down to ~9min.

I do not remember turning them on before but maybe they were enabled when the ati card was connected. I do know that for some reason the ati gpu drives would screw around with the network connections in general (for example with the ati gpu it would take an extra 45 seconds to establish a network connection and this issue never occurs when an nvidia card is connected instead).

Very strange behavior but at least now I am back to 9min to transfer 94GB which is what I am looking for. Especially given previous best case test SSD to SSD was getting 8.5min.

Sign In

Correct way to apply PrimoCache write cache

Question

zeroibis

19 answers to this question

Recommended Posts

Christopher (Drashna)

Jaga

zeroibis

zeroibis

Jaga

zeroibis

Jaga

zeroibis

Jaga

zeroibis

zeroibis

zeroibis

zeroibis

Jaga

zeroibis

zeroibis

Jaga

zeroibis

zeroibis

Join the conversation

Browse

Activity