
ReFS in pool


thepregnantgod

Question


A lot.  I keep pushing for full, proper ReFS support in DrivePool and Scanner.

 

The biggest thing is that ReFS is a "copy on write" (or "allocate on write") file system. This means that when you modify data, the file system writes the modified data to a new location rather than overwriting it in place. Data corruption is significantly less likely to occur, and if it does happen, rolling back is easier (internally, at least).

Second, ReFS defaults to a 64 KB cluster size (allocation unit size).

Cluster size matters.  First, it determines the smallest allocation unit used on the disk.  Too high, and it leads to wasted space (slack space); too low, and it can cause fragmentation and performance problems.

Specifically, when you write a file, it gets allocated a number of clusters.  The last cluster may or may not be fully used (that's normal). But if you have a lot of small files (let's say 2 KB files), then a larger cluster size means a lot more waste.

 

However, since you can "fit" more data into a larger cluster, more of your data ends up sequential. This not only helps prevent fragmentation.... it also reduces read/write head movement, as more data will be contiguous.   In fact, when messing with cluster sizes a while ago, I saw a 10-20 MB/s difference in read and write performance for larger files (sequential data).

 

This is why NTFS defaults to 4 KB: it's a good balance, as "most" drives are going to hold both small and large files.

However, ReFS is designed for large-scale storage, meaning it is much more likely to see large files stored on the drive.  So it gets the larger allocation unit size.
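
To make the slack-space tradeoff concrete, here's a quick back-of-the-envelope illustration (my numbers, with a hypothetical 2 KB file):

#PowerShell
# Slack space for a hypothetical 2 KB file at the NTFS-default (4 KB) and ReFS-default (64 KB) cluster sizes
$FileSize = 2KB
ForEach ( $ClusterSize in 4KB, 64KB )
{
	# A file always occupies a whole number of clusters
	$Allocated = [math]::Ceiling($FileSize / $ClusterSize) * $ClusterSize
	"{0} KB clusters: {1} KB allocated, {2} KB wasted" -f ($ClusterSize/1KB), ($Allocated/1KB), (($Allocated - $FileSize)/1KB)
}

With 4 KB clusters, that 2 KB file wastes 2 KB; with 64 KB clusters, it wastes 62 KB. Multiply that across thousands of small files and the difference adds up fast.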

 

 

The biggest part is that ReFS supports "integrity streams", which are basically checksums for each block of data.  This means that the data stored on the drive is "safeguarded" against silent modification (random bit flips).

Furthermore, with Storage Spaces, if any data comes back as "damaged", Storage Spaces will automatically repair the damage (if possible). This is the "self-healing" feature of ReFS that you may have seen mentioned.

 

The caveat here is that ReFS only enables integrity checking for metadata by default; you must manually enable it for everything else. To enable it for the entire drive, run the following from an elevated PowerShell prompt:

Set-FileIntegrity H:\ -Enable $True

This will enable integrity streams for the entire disk (in theory).
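
To confirm it took effect, the companion Get-FileIntegrity cmdlet will show the current setting (using the same H:\ example as above):

#PowerShell
# Should report Enabled : True for the volume root
Get-FileIntegrity H:\

One caveat worth noting: as far as I'm aware, the setting is inherited when a file is created, so enabling it on the root only covers files created afterwards; existing files keep whatever setting they were created with.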

There are some downsides here, though.

 

First, the Windows 10 version of ReFS is not backwards compatible, meaning that it cannot be read on previous versions of Windows. Period.   Nor can it currently be read on any Windows Server version.   So if you go ReFS, you're going to be stuck with Windows 10.

 

Second, there is a performance hit for using ReFS. The integrity streams and copy-on-write features add a penalty here, and may slow down writes to the file system.  This isn't massive, but it can be noticeable.

It is much worse when using ReFS on a parity Storage Spaces pool, though.  

 

Third, there is no way to force a repair of the file system; there are no built-in tools to do so. (There are commands to force a check, but if something goes wrong .... you're basically SOL.)

However, the chances of needing one are significantly reduced compared to NTFS.

 

Fourth, data recovery. There are some solutions that do support ReFS, but only a couple right now.  With Windows 10's push for ReFS support, this will likely change.   But for now, if you need to run data recovery, you're going to be in a world of hurt. 

As for StableBit DrivePool on top of ReFS? There isn't anything extra that we do. The drives are treated normally, and will work normally.  We don't do any integrity checking, though, nor any special recovery based on the integrity streams.

 

Additionally, the "file recovery" in StableBit Scanner doesn't work, nor does the "file system scan".  At least, not currently.


Drashna, thanks for the quick response.

 

1. How do I access the configuration settings to enable mixing ReFS with NTFS, as I slowly remove one drive, format it, then add it back?

2. You mention adding the command line file integrity blah, blah, blah - but in your example that was for just one drive.  I have 32 drives without a drive letter; the only drive letter is the one that DrivePool creates.  Do I just take that DrivePool letter and use the command line, or do I have to assign a letter to each drive and then enable it that way?


  1. http://wiki.covecube.com/StableBit_DrivePool_2.x_Advanced_Settings

    Set "CoveFs_AllowMixedFilesystems" to "True" and restart the service or reboot the system.  This will allow you to create "merged" pools. 

     

  2. You'd need to do this for each underlying disk.  And you'd need to mount the drives to a folder path or drive letter temporarily.

    It may be possible to script this, but ... that won't be simple, at the very least (see the sketch below for what the manual, one-disk version might look like).
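
For reference, the manual version for a single disk might look something like this (a sketch only; the disk/partition numbers and the mount folder are placeholders to adjust for your system):

#PowerShell
# Temporarily mount one pooled disk to an empty folder, enable integrity streams, then unmount
New-Item -ItemType Directory -Path "C:\Mounts\Disk01" -Force | Out-Null
Add-PartitionAccessPath -DiskNumber 1 -PartitionNumber 2 -AccessPath "C:\Mounts\Disk01"
Set-FileIntegrity "C:\Mounts\Disk01" -Enable $True
Remove-PartitionAccessPath -DiskNumber 1 -PartitionNumber 2 -AccessPath "C:\Mounts\Disk01"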


Hi, I have 2 pools, one DrivePool and one Storage Spaces, both identical, and I have to say that at the moment Storage Spaces seems to be doing better: read and write are faster, and it has the ability to self-heal (if it works; I don't know how to prove it). DrivePool really needs some updates. Other than that, I don't really see any difference.


Thanks, Drashna.

 

I'll do that as I remove them from the pool, reformat, then add them back.

 

Can you remind me of the downside to Storage Spaces again?  I looked into it before I switched to your product a long time ago - but am just curious since I'm seeing more articles on it.

 

Well, it depends on the OS you're using, and the configuration. 

 

If you're using Windows 10 or Server 2016, either a mirrored or a parity array in Storage Spaces gets you the self-healing functionality (otherwise, it MUST be mirrored, IIRC).

 

(see below for more)

 

Hi, I have 2 pools, one DrivePool and one Storage Spaces, both identical, and I have to say that at the moment Storage Spaces seems to be doing better: read and write are faster, and it has the ability to self-heal (if it works; I don't know how to prove it). DrivePool really needs some updates. Other than that, I don't really see any difference.

 

 

Yes, the performance may be better in Storage Spaces, for the same reason that a RAID array is going to be better: 

In both cases, the IO is spread out between disks.  This means that instead of pulling the full file from a single disk, it's pulling the contents from multiple disks.   This is (almost) always going to net you better results.   It's an architectural advantage. 

 

However, parity arrays in Storage Spaces are known to have some pretty nasty performance problems, ESPECIALLY when coupled with ReFS.  I mean, abysmal write speeds.   I'm sure there have been improvements, but it's still going to be a significant hit, because parity is CPU intensive, regardless (most good RAID cards have a beefy processing unit onboard specifically to deal with this .... that's why LSI cards, and some HighPoint cards, have a big heatsink).

 

 

The other difference here is that Storage Spaces is still a block-based solution.  That means if too many drives fail, you can lose the ENTIRE array.  Basically, the same thing that gets you better performance (your data being spread out over multiple drives) is what can cause catastrophic data loss.

 

StableBit DrivePool stores the actual files on normal NTFS/ReFS volumes.... so each individual disk is recoverable. 

 

 

The other issue is recovery tools.  There are a couple for Storage Spaces, but they're expensive.  Generally, if a Storage Spaces array fails catastrophically, your best option is to wipe it, recreate it, and restore from backup.  Or spend $500+ on software or recovery services to get your data back.

As for the self-healing, this is something that I personally want in DrivePool: e.g., where it checks the integrity streams of the data and then replaces "bad" copies of files, and ignores bad copies when duplication is enabled, so we only hand "good" copies to the pool. In effect, creating our own "self-healing" functionality.

This isn't a simple thing, so we can't promise anything. But I am actively pushing for it. 


Also, because it was bugging me .... I created a script to automate this: 

#PowerShell
# Find every ReFS-formatted volume on the system
$ReFSVolumes = Get-Volume | Where-Object { $_.FileSystem -match "ReFS" }

ForEach ( $Volume in $ReFSVolumes )
{
	# Temporarily assign the drive letter "X" (assumes X: is free)
	$Partition = Get-Partition -Volume $Volume
	Set-Partition -InputObject $Partition -NewDriveLetter "X"

	# Enable integrity streams for the whole volume
	Set-FileIntegrity "X:\" -Enable $True

	# Remove the letter again, and make sure one is never auto-assigned later
	Remove-PartitionAccessPath -InputObject $Partition -AccessPath "X:\"
	Set-Partition -InputObject $Partition -NoDefaultDriveLetter $True
}

This will find all the ReFS volumes in the system, assign each the "X" drive letter, enable the file integrity, and then remove the drive letter (and make sure it is never auto-assigned one). 


Drashna, thanks again.

 

Having used RAID 5 in the past and suffered a 2-disk failure with a rebuild estimated at over 3 weeks, I moved to DrivePool - it must be nearly 3 years ago now.

 

I'm a fan, but I always look for the newest, bestest tech, and self-healing sounds neat.

 

But, I'm not going to risk my media collection and the years of work it took to build on a system that could have a catastrophic failure.

 

Thanks for the script.  I plan on converting my remaining 30 drives to ReFS over the next few weeks.


Yeah, the rebuild times on RAID 5 are pretty horrible, especially with 8-10 TB drives. :(

 

And I totally understand the desire to look for the best/greatest tech.  And yeah, self-healing is fantastic.  ReFS + DrivePool doesn't do it (yet), but hopefully in the near future.

 

And totally agree.  And totally understand.  I have ~60 TB of media, duplicated.  I would hate to lose all of that myself.... and I have before (ST3000DM001's).  It's .... well, it really sucks, and I'm still finding stuff that I know I used to have. :(

 

 

And you're very welcome for the script.  PowerShell is pretty fantastic (and I keep on bugging Alex to add PowerShell support for DrivePool and CloudDrive) :)


Any issues with mixing the ReFS versions (e.g. WS2012R2 vs Win10) in a pool on Win10?  The disks that I'm using in my Win10 pool were originally formatted on a WS2012R2 box or with the nasty Windows 10 Reg hack.  I know that any disk I format on Win10 is a "newer" version of ReFS and will not be recognised (for now) on WS2012R2, but apart from that, any issues?

Thanks

Nathan


Heads up:

 

Windows 10 Creators Update uses a version of ReFS that is NOT backwards compatible.  It will only work on Windows 10.   Any drive formatted as ReFS there will not be usable on previous versions of Windows, including Windows Server 2012R2.

 

 

Otherwise, there is currently no issue with mixing file systems, but it's not advisable.  In the future, we may implement features that make it problematic in the long term, though.


OK - I've pushed ahead and formatted one drive as ReFS using the Windows 10 Creators Update, and it is working fine in the pool so far.  I know about the backwards compatibility, but I'm not worried about that, as I would reformat drives if moving between systems anyway (and I'm sure that at some point WS2016 will support the newer version).  I'm still unclear on what the difference is between the ReFS versions, however, and whether there is a way of updating the older ReFS file system to the newer one without a reformat (as it will take a long time on 8TB drives to move the data off and back on if you have to reformat).


I'm also unclear on how to tell which version of ReFS a drive has been formatted with.


I have no idea how to tell, other than plugging it into a system and seeing if it works......  Which is pretty horrible.   

 

fsutil doesn't work with ReFS, so ... no meaningful info. And no PowerShell commands that reveal this, either (from what I could find).

 

Which is a pretty bad way to have to find out.


Found this article - https://blog.workinghardinit.work/tag/refs-versions/ - and I can confirm that fsutil does work!

 

I have drives that are ReFS V1.2 and, now with Win10, ReFS V3.2.  It also tells you if you have the checksum set :)


You can also run it on mount points without a drive letter, as follows (the first drive listed is the new one I added to the pool - I still need to set the checksum!):

 

C:\Users\natha>fsutil fsinfo refsinfo C:\Users\natha\Desktop\MountPoints\045f6
REFS Volume Serial Number :       0x52e2b127e2b10fe9
REFS Version   :                  3.2
Number Sectors :                  0x00000003a37c0000
Total Clusters :                  0x000000000746f800
Free Clusters  :                  0x0000000003b57bc0
Total Reserved :                  0x0000000000077c04
Bytes Per Sector  :               512
Bytes Per Physical Sector :       4096
Bytes Per Cluster :               65536
Checksum Type:                    CHECKSUM_TYPE_NONE




C:\Users\natha>fsutil fsinfo refsinfo C:\Users\natha\Desktop\MountPoints\416c0
REFS Volume Serial Number :       0x28d095dfd095b392
REFS Version   :                  1.2
Number Sectors :                  0x00000001d1c00000
Total Clusters :                  0x0000000003a38000
Free Clusters  :                  0x0000000001495578
Total Reserved :                  0x0000000000000000
Bytes Per Sector  :               512
Bytes Per Physical Sector :       4096
Bytes Per Cluster :               65536
Checksum Type:                    CHECKSUM_TYPE_CRC64
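
If you have a pile of mount points to check, a loop like this (a sketch; the MountPoints path matches the example above) pulls just the version line out of each:

#PowerShell
# Report the ReFS version that fsutil shows for every mount-point folder
ForEach ( $MountPoint in Get-ChildItem "C:\Users\natha\Desktop\MountPoints" -Directory )
{
	$Info = fsutil fsinfo refsinfo $MountPoint.FullName
	$Version = ($Info | Select-String "REFS Version").ToString().Split(":")[1].Trim()
	"$($MountPoint.Name): ReFS $Version"
}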

Ah, this is a Windows 10-only feature..... fsutil doesn't have the "refsinfo" option on older versions of Windows.

As for setting the checksum,  see my post above for enabling it on all drives:

http://community.covecube.com/index.php?/topic/2944-refs-in-pool/&do=findComment&comment=20197

 

Otherwise, it's a simple PowerShell command:

 

Set-FileIntegrity "X:\" -Enable $True

As someone who has just put together a DrivePool from scratch, I tried ReFS and found the performance hit was too much for my liking. Which is unfortunate, but on a drive that can hit 150 MB/sec writes with ease, I was getting around 40 MB/sec... that, to me, isn't acceptable.


I have no issues getting full speed on my ReFS formatted drives.

 

Do you have integrity turned on? I believe that is why I noticed such a drastic performance drop... and not having it enabled kinda defeats the purpose, from what I've read.


From what I could find on ReFS features, there seems to be a new fast/block cloning feature added post-V1 that may be of interest for DrivePool:

https://blog.workinghardinit.work/2016/08/25/veeam-leads-the-way-by-leveraging-refs-v3-capabilities/

 

I've got a couple of 8TB Hitachis coming, so I'll probably do a rolling ReFS upgrade of the Seagate 8TBs.

I'll have to look into this more, then. 

 

 

As someone who has just put together a DrivePool from scratch, I tried ReFS and found the performance hit was too much for my liking. Which is unfortunate, but on a drive that can hit 150 MB/sec writes with ease, I was getting around 40 MB/sec... that, to me, isn't acceptable.

 

 

That may actually be one of the differences between the versions: speed optimizations.

 

That said... the performance hit should be minimal here, as the checksumming is only enabled for metadata by default, unless you EXPLICITLY run a command to enable it on everything else (PowerShell's Set-FileIntegrity).

 

 

That said, this sounds rather abnormal, as I can attest to "good speeds" for both reads and writes on my system.  As in 150+MB/s speeds on NAS drives. 

 

But there are a lot of things that can interfere/cause this issue.  

 

 

And worst case, just use NTFS. It will be the most compatible.

That said, the possible performance issues are why I'm using the SSD Optimizer balancer (that, and I have a bunch of Seagate Archive (SMR) drives in my pool).


Wow - I've really hit the write speed limit on these SMR drives as I empty them out --> reformat as ReFS V3 --> add back to the pool --> copy the next drive to it --> repeat, etc.  Once I've burnt through their 20GB cache, I'm seeing maybe 30 MB/s average for writes; all up, this is going to take a couple of weeks to do the 6 x 8TB drives.  Don't get me wrong, I do like the Seagate Archive drives - they are fast enough to eat multiple BD rips and plenty fast to read off - but if you are rebuilding a pool or doing mass file moves, then they are sloooooow.
