Dedupe and folders in drivepool or underlying drives


asdaHP

Question

Hi,

I was planning on trying dedupe (Windows Server 2016 Standard with the Essentials Experience). DDPEval currently shows a 27% saving. Should I run dedup on the underlying drives or on the DrivePool virtual drive? Any recommendations?

 

Also, for that matter, is it recommended to point the server shared folders (set up via the Server Essentials dashboard) at the DrivePool drive, or should I instead point them at one of the underlying physical drives? I currently have them pointing at the DrivePool pool drive and it's working fine, except for the one time I got a message that the user-defined folders were missing. That resolved itself (and may be coincidental).

 

thank you


6 answers to this question

  • 0

Thank you. So if I run dedup on one of the disks, DrivePool should duplicate those reduced parts onto the mirrored disk? Since I currently only have two drives (duplicated by DrivePool), I guess I just need to run dedup on one of them, not both, correct?

 

No.  It will copy the entire file to the other disk.  However, that file may get dedup-ed in the same way.

 

The issue here is how deduplication and our software work. This is part of why the beta version is "required", or you need to enable an advanced option.

 

Normally, our software bypasses all file system filters when accessing the underlying data. This is to boost performance (filters can cause serious slowdowns) and for compatibility (some filters freak out when the same file is accessed repeatedly, and when one request isn't finished yet... I'm looking at you, Avast).

 

This is usually fine. However, the deduplication feature splits up the file contents. It turns the original file into a special reparse point, leaves the non-redundant data attached to this file/reparse point hybrid object, and puts all of the duplicate data into the "System Volume Information" folder (the same one used by VSS).

 

Then when accessing files, it uses a file system filter to splice this data back together, in the right order. 
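To make the split-and-splice idea concrete, here is a toy Python model. It is only a sketch: the chunk size, the `ChunkStore` class, and the `read_with_filter` / `read_raw` names are invented for illustration, and it simplifies things by moving every chunk into the store. It does not reflect how Windows actually lays out its data on disk.

```python
import hashlib

CHUNK_SIZE = 4  # tiny and fixed, for illustration only


class ChunkStore:
    """Toy stand-in for the chunk data kept under System Volume Information."""

    def __init__(self):
        self.chunks = {}  # chunk hash -> chunk bytes; each unique chunk is stored once

    def optimize(self, data: bytes) -> list:
        """Split data into chunks, store unique chunks, and return a stub."""
        stub = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            key = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(key, chunk)
            stub.append(key)
        return stub  # plays the role of the reparse point left in place of the file

    def read_with_filter(self, stub: list) -> bytes:
        """What a dedup filter would do: splice the chunks back together in order."""
        return b"".join(self.chunks[key] for key in stub)


def read_raw(stub: list) -> bytes:
    """Reading with filters bypassed: only the stub is visible, not the contents."""
    return ",".join(stub).encode()


store = ChunkStore()
file_a = store.optimize(b"AAAABBBBCCCC")
file_b = store.optimize(b"AAAABBBBDDDD")

print(store.read_with_filter(file_a))            # the original bytes, reassembled
print(len(store.chunks))                         # shared chunks are stored only once
print(read_raw(file_a) == b"AAAABBBBCCCC")       # False: raw access does not see the file
```

The last line is the important one for this thread: without the filter in the read path, the caller gets the stub rather than the file.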

 

 

Now, as to why this is an issue:

  1. Deduplication can't access the blocks of data on the Pool drive. So it just doesn't work. 
  2. StableBit DrivePool bypasses the file system filters on pooled disks, so you would only get partial (or no) data when accessing deduplicated data.   This is why you MUST disable the "bypass" option.  This way, the dedup filter can splice the data back together, properly. 

 

The beta version looks for the "dedup" filter, and automatically disables this "bypass file system filter" option on the pool, to prevent this from being an issue.

Additionally, the latest internal betas include some special handling when measuring the drive. 

 

 

Also, when balancing or duplicating the data, it will grab the spliced together data, as well.


  • 0

Deduplication is a block-based technology. You CANNOT run it on the pool itself. You must run it on the underlying disks.

 

The downside is that you won't save as much space this way, but you should still see a good chunk of savings.
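A rough way to see why per-disk dedup saves less: each disk deduplicates only against its own data, so a chunk that appears on two disks is still stored twice. The sketch below counts unique chunks per disk versus across a hypothetical pool-wide store. The file contents, the fixed chunk size, and the `unique_chunks` helper are all made up for illustration.

```python
def unique_chunks(files, chunk_size=4):
    """Return the set of distinct fixed-size chunks across the given byte strings."""
    seen = set()
    for data in files:
        for i in range(0, len(data), chunk_size):
            seen.add(data[i:i + chunk_size])
    return seen


# Hypothetical files on two pooled disks.
disk1 = [b"AAAABBBB", b"AAAACCCC"]
disk2 = [b"AAAADDDD", b"BBBBDDDD"]

# Dedup run separately on each disk: shared chunks are counted once per disk.
per_disk = len(unique_chunks(disk1)) + len(unique_chunks(disk2))

# Imaginary dedup across the whole pool: shared chunks are counted once overall.
pool_wide = len(unique_chunks(disk1 + disk2))

print(per_disk, pool_wide)
```

Per-disk dedup still beats storing all eight chunks, but it cannot collapse the chunks that the two disks have in common.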

 

 

That said, we don't officially support deduplication here, but we do accommodate it. The software should automatically disable the "bypass file system filters" option on systems with the Deduplication feature installed. But it's worth checking to make sure it's disabled (Pool options -> Performance).

 

 

 

As for Shared Folders themselves, you'd want to set them up on the pool itself.

 

As for the user defined folders being missing, I'm not sure about that.  If I had to guess, it was a timing issue.


  • 0

Deduplication is a block-based technology. You CANNOT run it on the pool itself. You must run it on the underlying disks.

 

The downside is that you won't save as much space this way, but you should still see a good chunk of savings.

Thank you. So if I run dedup on one of the disks, DrivePool should duplicate those reduced parts onto the mirrored disk? Since I currently only have two drives (duplicated by DrivePool), I guess I just need to run dedup on one of them, not both, correct?

 

 

That said, we don't officially support deduplication here, but we do accommodate it. The software should automatically disable the "bypass file system filters" option on systems with the Deduplication feature installed. But it's worth checking to make sure it's disabled (Pool options -> Performance).

Will do. Should I use a beta version to get the best results?

 

 

 

As for Shared Folders themselves, you'd want to set them up on the pool itself.

 

As for the user defined folders being missing, I'm not sure about that.  If I had to guess, it was a timing issue.

Thank you.


  • 0

Thank you!! I was away and for some reason did not realize you had responded. Thank you for explaining the basis for the issue.

 

So to summarize, I can run dedupe on the disks managed by DrivePool provided I use the beta version of the software. Interestingly, I am on version 2.2.0.651 beta as you suggested, and the 'bypass file system filters' option is already checked. Should I uncheck it before I try running dedupe, even though this is the beta version?

 

Lastly, just to be sure I understood you correctly: since I currently only have two drives in the pool (which are copies of each other), I should run dedupe on one of these two drives and DrivePool will take care of the other drive?

 

Could an option be added to the software at some point to allow only this dedupe filter while still bypassing the other filters, so as not to slow down processing (as you mentioned, bypassing filters typically speeds things up)? EDIT: Rereading your response, it sounds like that is exactly what the beta version already does, sorry.

 

thank you again.

 


  • 0

You're very welcome.  And we do try to explain stuff as best we can.  You never know when it will be helpful. ;)

 

 

That said, the beta should detect when the Data Deduplication feature is enabled, and should disable the "bypass file system filters" option automatically.

 

But just in case, I would recommend disabling it manually anyway. This is to ensure that the change is picked up properly.

 

 

As for the dedup itself, you'd want to run it on both drives.  When StableBit DrivePool duplicates the files, it will grab the WHOLE file, not just part.   This means that if you enable deduplication on only one disk, then only one disk will have the optimized space. The other will use the full files. 
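As a back-of-the-envelope sketch of what that means for a two-disk, fully duplicated pool: the 27% figure is the DDPEval estimate from the original question, the 1000 GB of data is a made-up number, and the `pool_usage_gb` helper is hypothetical. It also assumes each disk would achieve the same saving, which is not guaranteed.

```python
def pool_usage_gb(data_gb, saving_pct, dedup_disks, copies=2):
    """Estimate total space used when each of `copies` disks holds a full copy.

    Disks with dedup enabled store the data reduced by `saving_pct` percent;
    the remaining disks store the full files.
    """
    used = 0.0
    for disk in range(copies):
        if disk < dedup_disks:
            used += data_gb * (100 - saving_pct) / 100
        else:
            used += data_gb
    return used


DATA_GB = 1000     # hypothetical amount of pooled data
SAVING_PCT = 27    # DDPEval estimate quoted in the question

for n in (0, 1, 2):
    print(n, "disk(s) with dedup:", pool_usage_gb(DATA_GB, SAVING_PCT, n), "GB used")
```

With dedup on only one disk, the second copy still takes the full 1000 GB, so only half of the potential saving is realized.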

 

 

As for the bypassing?  No, we can't selectively disable the filters.

 

These filters are actually drivers, and they sit on top of the file system. We have no control... well, almost no control. The options are "bypass all filters" or "normal access", and this is handled by a kernel-level IOCTL command. So that's why we don't have much control.
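The all-or-nothing nature of that switch can be pictured with a small Python model. This is purely illustrative: the two filter functions, the `FILTERS` list, and the `bypass_filters` flag are invented names, not anything from Windows or StableBit DrivePool.

```python
def dedup_filter(data: bytes) -> bytes:
    """Hypothetical dedup filter: expands a stub into the full file contents."""
    return data.replace(b"<stub>", b"full file contents")


def antivirus_filter(data: bytes) -> bytes:
    """Hypothetical antivirus filter: passes data through after 'scanning' it."""
    return data


FILTERS = [dedup_filter, antivirus_filter]


def read_file(on_disk: bytes, bypass_filters: bool) -> bytes:
    """One switch for the whole stack: either every filter runs, or none does."""
    if bypass_filters:
        return on_disk
    for apply_filter in FILTERS:
        on_disk = apply_filter(on_disk)
    return on_disk


print(read_file(b"<stub>", bypass_filters=False))  # filters run, stub is expanded
print(read_file(b"<stub>", bypass_filters=True))   # filters skipped, stub comes back as-is
```

There is no argument here for "skip everything except dedup", which mirrors the limitation described above.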

 

And this is essentially "by design" in Windows, because antivirus software uses file system filters for real-time protection.
