Everything posted by Kayos

  1. I was reading the changelog and came across this: * [issue #13517] When the "dedup" file system filter is installed, "Bypass file system filters" is overridden and disabled. This allows pooled drives that are utilizing data deduplication to work correctly with StableBit DrivePool. Is this for the Windows Server 2012 R2 Data Deduplication service? It's been a while; I've been using mhddfs, and I was looking into swapping my server back to Windows for ADDC and some other things that are easier to do in Windows. Right now I hardlink my Finished folder to another folder called Sort, then move all the Finished data to Deletable. I can then sort the files in Sort into other directories without the problems renaming causes. This lets me seed to 2.0 in peace without wasting space, and I was wondering how DP would handle it. So how does DP handle hardlinks right now? Before it gets asked: I'm using SnapRAID instead of DP's duplication system. Two-disk parity saves a lot of space, and if I can hardlink and/or use deduplication, that would save even more.
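
     In case anyone wants to script that hardlink step, here's a rough Python sketch of how I'd do it. The folder paths are just placeholders for my real Finished and Sort folders, and hardlinks only work within a single NTFS volume:

        import os

        FINISHED = r"D:\Torrents\Finished"   # placeholder paths for my folders
        SORT = r"D:\Torrents\Sort"           # must be on the same volume as FINISHED

        def mirror_as_hardlinks(src, dst):
            """Recreate src's directory tree under dst, hardlinking every file.

            The links share data with the originals, so I can rename and sort the
            copies in Sort without touching the names the torrent client seeds from.
            """
            for root, _, files in os.walk(src):
                rel = os.path.relpath(root, src)
                target_dir = os.path.join(dst, rel)
                os.makedirs(target_dir, exist_ok=True)
                for name in files:
                    link_path = os.path.join(target_dir, name)
                    if not os.path.exists(link_path):
                        os.link(os.path.join(root, name), link_path)

        if __name__ == "__main__":
            mirror_as_hardlinks(FINISHED, SORT)
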
  2. Regarding Linux arguments, I couldn't agree more. To me an OS is a means to an end. Then there's Abercombi... err, Apple.

     The thing about deduplication is that, unless DP became very intelligent about its file handling, it's unlikely DP could place files well enough for dedup to work. Dedup is a per-volume attribute. DP is an overlay, very akin to UnionFS or AuFS in concept, built on the .NET libraries. It would be like using AuFS with ZFS (dedup enabled). The difference is that DP isn't linking directly to the files and is instead working with the file allocation table. If what gregcaulder said is true, then the files are being linked outside of the pool on the same volume, into a folder that isn't available to the user.

     The best solution would be a plug-in that keeps a per-cluster hash of all the files (on my system it's about 620MB in size), matches clusters by SHA-256, and when a match is found compares the two chunks byte for byte. If matches are found, the files are relocated to the same drive so that deduplication can catch the matching clusters during its next optimization cycle. This assumes the service is fixed so that it understands deduped volumes. I see the biggest problem being the way the service is written. UnionFS got around the issue by hijacking the file system so that the kernel handled all the data in its respective formats. It was possible to mix and match drive formats because all the software did was push the requests forward to the volumes. The reason DP can't do this is its duplication methods; such a plug-in would, by its very nature, spit in the face of the duplication data preservation techniques. I use DP only for its pooling capabilities, so such a plug-in works fine for me. (There's a rough sketch of the idea below.)

     I've never used a Windows backup utility. The reason I like SnapRAID so much is that it works on the files, not the drives. That can be a bit of a snag as well, since I could end up needing a 3TB drive for parity for a series of 2TB drives because one or more drives contain 2.8TB of data due to really good dedupping. Once a month (I do it on the 1st), after a sync of the files, I run 'snapraid check', and if anything comes back in the log, I fix it. A sync uses last-modified dates, so bit flipping and such gets flagged during the check because the hash won't match the damaged file. Barring some crazy astronomical phenomenon in which the same sector is bad on three disks, it can be easily fixed with a simple command. I can also undelete files with a blanket command that touches all the disks in the same location for the file (due to DP I have no idea which disk it's on) and restore them back to the last sync.

     I've been writing a front-end for SnapRAID for a while, since the one they had was ultra user friendly and I needed something that could work with DP. I've only lost one drive, and SnapRAID brought it back onto a new volume. I was impressed. Granted, it took eight hours, but it restored 1.7TB of data that I would have lost otherwise. I added a q-parity to make sure that I won't lose anything.

     I really need to keep my posts shorter.
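
     For the curious, the first pass of that plug-in idea would look something like this. It's only a sketch: the 64KB chunk size and the PoolPart paths are made up, and a real plug-in would also do the byte-for-byte compare and the actual file relocation:

        import hashlib
        import os
        from collections import defaultdict

        CHUNK = 64 * 1024                            # illustrative chunk size
        DRIVES = [r"E:\PoolPart", r"F:\PoolPart"]    # stand-ins for the pooled volumes

        def chunk_hashes(path):
            """Yield a SHA-256 digest for each fixed-size chunk of a file."""
            with open(path, "rb") as f:
                while True:
                    block = f.read(CHUNK)
                    if not block:
                        break
                    yield hashlib.sha256(block).hexdigest()

        def build_index(drives):
            """Map chunk hash -> set of drives holding a file that contains it."""
            index = defaultdict(set)
            for drive in drives:
                for root, _, files in os.walk(drive):
                    for name in files:
                        for digest in chunk_hashes(os.path.join(root, name)):
                            index[digest].add(drive)
            return index

        if __name__ == "__main__":
            index = build_index(DRIVES)
            # Chunks seen on more than one drive mark files worth moving onto the
            # same volume, where dedup could fold them on its next optimization run.
            shared = sum(1 for drives in index.values() if len(drives) > 1)
            print(f"{shared} chunk hashes appear on more than one drive")
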
  3. I won't use Dynamic Disks because I can't save them in Linux. The same goes for Windows fakeRAID and Storage Spaces. I do most of my partition work in Mint, as Windows seems to have a conniption fit over even the most basic partition work on non-dynamic drives. Their excuse? Oh, there's third-party software for that. Yeah, it's called gparted... d*cks. I had a disk fail and Windows refused to mount it in any way, shape, or form. Linux Mint mounted it just fine, though it did suggest that I reformat it, and later that I should toss it. I got everything off of it and it all SHA-512'd fine. If I were stuck in Windows only, I would have lost it all.

     When someone on a forum mentions Dynamic Disks and Windows fakeRAID, I cringe. On Linux forums it's LVM (Logical Volume Management) and mdadm, which Microsoft stole and now calls Storage Spaces. RAID is NOT a backup, people! SnapRAID isn't a backup either, but it's much closer when it's up to date. RAID was designed to keep run time up while the drives repopulate. SnapRAID restores missing drives, during which the missing data is inaccessible, which means SnapRAID isn't RAID at all. SnapRAID is for archives; RAID is for production data such as databases and massive web servers that need uptime. Downtime is money. That said, DrivePool with SnapRAID (and a once-a-month parity check) is a win; my routine is sketched below.

     Deduplication is too new for my tastes. I've been playing with it for the last two weeks. I can't get a file size query in Properties, and the only way to know what I've saved in space is to use PowerShell. It's a huge space saver (even compared to disk compression at its best) and definitely worth most of the trouble, but I think I'll wait till Microsoft figures it out. Besides, I can run deduplication in Windows 8.1 without a problem. May as well use a desktop OS on this machine.
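
     My first-of-the-month routine boils down to something like this little wrapper. The snapraid path and log file are placeholders, and the sync/check/fix subcommands are just the standard SnapRAID ones:

        import subprocess

        SNAPRAID = r"C:\Tools\snapraid\snapraid.exe"   # placeholder install path
        LOG = r"C:\Tools\snapraid\check.log"

        def run(*args):
            return subprocess.run([SNAPRAID, *args], capture_output=True, text=True)

        def monthly_maintenance():
            sync = run("sync")       # update parity from last-modified dates
            if sync.returncode != 0:
                print("sync failed:\n" + sync.stderr)
                return
            check = run("check")     # re-hash everything and compare against parity
            with open(LOG, "w") as log:
                log.write(check.stdout)
            if check.returncode != 0:
                # Bit flips and other silent damage show up here; a 'snapraid fix'
                # pass repairs them from parity.
                print("check flagged problems, review the log and run: snapraid fix")

        if __name__ == "__main__":
            monthly_maintenance()
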
  4. I couldn't get spell check to give me the right spelling of "niche." I decided a long time ago that if I ever lost all my data, I'd never make another archive again. Last time, when using a Windows RAID, I thought I had lost it all. I didn't even feel too bad. I've been learning that age-old expression: "The things you possess soon come to possess you." So I don't see it as being the worst possible thing. Though I'd totally eBay all the drives and hardware; cash is always useful. eBay does take 10% of all sales, along with PayPal's 3% + $0.30, and eBay horribly underwrites shipping fees, which means I take a hit on shipping costs but dodge part of that 10%. I've considered Craigslist, though I've heard bad things about that.

     I was reading that people honestly believe Drive Extender was replaced by Storage Spaces. That's complete bunk. If I lose a drive with Drive Extender, I still have the data on the remaining drives. Storage Spaces nukes it all. How is that a replacement? It's just another name for fakeRAID. It's also proprietary to Windows 8+. Missing a drive? It's now in cycle hell. DrivePool is the successor to Drive Extender.

     Anyway, this topic is now derailed, so I guess that's enough. I hope deduplication gets added in the next two years, but if not, oh well. It was nice chatting with you.
  5. I use SnapRAID with q-parity, so there's no need for duplication. The chances of losing two drives at once are astronomical. I'm mainly trying to get my current storage to stretch farther. I'll definitely replace drives and whatnot as time goes by. As for those helium drives, the price tag is a real put-off. Let's face it, $350 for 4TB is way too much. I know it'll come down. I have a feeling that the next-gen storage wars will be between solid state and this helium tech. For now they're the playthings of the wealthy and hobbyists. The fact that this discussion is on a forum for pooling a bunch of small disks into one big one speaks volumes about our current approach. It's cheaper to use a port duplicator and some dup boards and just add a bunch of drives than to simply buy a bigger disk. What I want to know is why software like DrivePool is so niche. Why do so many people risk all their data in software RAID 5s and 6s, when losing two or three drives (or having Windows/Linux think one failed, which is endless hell) kills all the data? Without an XOR chip, those solutions aren't giving anything that this solution won't. It boggles the mind.
  6. Sorry for the delay. I actually made a piece of software that does the above. The problem was that Windows shares had a rough time with hardlinked files. Also, renaming files was impossible in the case of files linked from torrent downloads. I just went ahead and removed my duplicates using SnapRAID's duplicate check and a program that I made to make the process really fast. It's unpolished, but it works.

     The beautiful thing about deduplication is that two files that aren't duplicates but have duplicate parts still get a file size reduction. This is apparent in mp3s where a tag is changed but the music data is the same. A file-based dup checker sees two different files, but deduplication sees maybe one 4096-byte chunk of different data, and the rest is the same (there's a small illustration below). I hope this gets added to DrivePool. My laptop is running a Storage Space with deduplication, since I can afford to lose that data thanks to having a backup. It's saving about 30% in space, which is amazing considering that it's mostly cartoons (anime) and pictures (manga).

     One thing studying SnapRAID has done for me is help me better understand the underlying architecture of the file system. I now know that dedup is done by hashing each chunk of a file by inode, meaning that a file's location and name can change without renewing the hash or resetting the dedup. It's genius, and it has opened my mind to a host of possibilities regarding software I can write.
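
     Here's a toy illustration of the mp3 case, just to show the idea; the "tag" and "audio" bytes are obviously fake, and the flat 4096-byte chunking is a simplification of what the dedup engine really does:

        import hashlib
        import os

        CHUNK = 4096

        def chunk_set(data):
            """SHA-256 of every fixed-size chunk in a byte string."""
            return {hashlib.sha256(data[i:i + CHUNK]).hexdigest()
                    for i in range(0, len(data), CHUNK)}

        audio = os.urandom(CHUNK * 64)   # stand-in for identical music data
        file_a = b"TAG:Original Rip".ljust(CHUNK, b"\x00") + audio
        file_b = b"TAG:Retagged Copy".ljust(CHUNK, b"\x00") + audio

        a, b = chunk_set(file_a), chunk_set(file_b)
        print(f"chunks in A: {len(a)}, in B: {len(b)}, shared: {len(a & b)}")
        # A whole-file dup checker calls these two different files; a chunk-level
        # dedup engine sees one differing chunk and folds the other 64 together.
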
  7. I won't touch Storage Spaces or any Windows RAID-like solution after almost losing a 10TB archive when Windows "forgot" that my drives were in the array, threw up an error stating that my RAID was unrecoverable, and "fixed" the main record to report as much. I had to bring it back from a backup that I no longer have, and that wasn't an isolated case either. I like DrivePool and SnapRAID because if the unthinkable happens and I lose a drive or two, I can recover it or them with SnapRAID (it happened once with a single drive, but that drive was on its way out anyway), and DrivePool puts them into a single drive that I can easily organize and share over a network.

     I hope DrivePool figures this one out. Deduplication has the potential to save a ton of space, though if it works too well it may pose a problem with SnapRAID. Deduplication plus a once-full drive could spell disaster for restoration, since I could potentially end up with 3TB of data on a 2TB drive. For now I'll simply compress my drives. I'm nowhere near running out of space, but it's nice to maintain as much free space as possible. Plus, compressed data with a fast processor actually speeds up the read rate. This I have confirmed as well.
  8. I went ahead and did a test. It's still a problem. I can't access deduped files from the pool, but I can access them from their respective drives. Also, they report some odd file sizes in File Properties, but that seems universal; I'm guessing it's a side effect of the deduping process. It still seems untidy on MS's part, from my perspective. (The quick test I ran is sketched below.)
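
     For reference, the test itself was nothing fancy, basically this; the drive letters and file name are placeholders for my pool and one of its member disks:

        # Read the same deduped file through the pool and straight off the member disk.
        POOL_PATH = r"P:\Media\sample.mkv"                  # as seen through DrivePool
        DISK_PATH = r"E:\PoolPart.xxxx\Media\sample.mkv"    # same file on the disk itself

        for label, path in (("pool", POOL_PATH), ("member disk", DISK_PATH)):
            try:
                with open(path, "rb") as f:
                    f.read(1024 * 1024)                      # read the first megabyte
                print(f"{label}: read OK")
            except OSError as exc:
                print(f"{label}: failed ({exc})")
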
  9. Deduplication works on a block level. Basically, if block 55 in file x is the same as block 22 in file y, the system will point block 22 of file y at block 55 of file x and sparse it out in file y. This is of course a gross simplification, since the block in question no longer belongs to either file and both files get a relink and a sparse-out. The TOC or FAT or MFT, or whatever you want to call it, lists each block location per file. It only works on the same partition. That's why deduplication doesn't work with ReFS: ReFS is incapable of per-block listing, because it makes sure a file isn't fragmented and lists a file as a block location and its extent. Only NTFS supports deduplication in Windows 2012/R2.

     As for using DrivePool, it's not pointless, because duplication in DrivePool is performed by making sure a file is on multiple partitions, while deduplication can only work per drive. Targeting the drive pool will definitely throw up an error, as deduplication requires direct partition access. What I want to know is: will it kill file access to deduplicated files, such as files x and y from above? Has anyone tested it? If not, I'll give it a try on a subdirectory filled with obvious dupes and see what happens.
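
     A toy model of that block bookkeeping looks roughly like this. It's nothing like the real NTFS chunk store, just the shape of the idea: one stored copy per unique block, with every file holding references into it:

        import hashlib

        BLOCK = 4096

        class ToyDedupVolume:
            """Toy block-level dedup: one stored copy per unique block, with each
            file kept as an ordered list of references into the shared store."""

            def __init__(self):
                self.store = {}    # block hash -> the single stored copy of the block
                self.files = {}    # file name  -> ordered list of block hashes

            def write_file(self, name, data):
                refs = []
                for i in range(0, len(data), BLOCK):
                    block = data[i:i + BLOCK]
                    digest = hashlib.sha256(block).hexdigest()
                    self.store.setdefault(digest, block)   # reuse if already stored
                    refs.append(digest)
                self.files[name] = refs

            def read_file(self, name):
                return b"".join(self.store[d] for d in self.files[name])

            def physical_size(self):
                return sum(len(b) for b in self.store.values())

        vol = ToyDedupVolume()
        shared = b"A" * BLOCK
        vol.write_file("x", b"X" * BLOCK + shared)   # "block 55" of file x
        vol.write_file("y", shared + b"Y" * BLOCK)   # "block 22" of file y reuses it
        print(vol.physical_size(), "bytes stored for", 4 * BLOCK, "bytes of logical data")
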
  10. This is Kayot; due to some weirdness I can't seem to log into my real account, and the recovery email won't show up either (I checked spam). That aside: does Server 2012 R2 Deduplication break DrivePool? I know it has to be per drive, which is fine since I use SnapRAID. I was reading in posts from May on the old forum that DrivePool couldn't access deduped files. Since I'm about to dedup my archive, I wanted to make sure I wasn't about to nuke DrivePool.