Seeding files that are already duplicated across two drives such that no additional file movement or copying takes place


Question

Posted

I have files that are already manually duplicated across two external drives that are separate from DrivePool; that is, one copy of each file on each drive, with the exact same folder structure (same folder dates, and I've verified using checksums/WinMerge that the copies are bit-identical). I want to seed these files into a pool that uses these two drives and that uses duplication. In other words, I want no additional file movement or copying to take place after manually seeding (moving) these files into the hidden pool folders. Is this possible? I'm getting the impression from another thread that it's not possible:

https://community.covecube.com/index.php?/topic/12535-new-to-stablebit-drivepool-quesiton-regarding-copy-or-move/#comment-44820

Quote

It's fine to move pre-existing data on a drive directly into that drive's hidden PoolPart folder to get it quickly into the pool; you just have to be careful to avoid accidentally overlapping anything already in the pool (e.g. on other drives) that has the same folder/file naming. To avoid that accidentally happening, I suggest creating a unique folder (e.g. maybe use that drive's serial number as the folder name) inside each PoolPart and moving your data into that if you're worried; once you've done that for all the pre-used drives you're adding to the pool you can then move them where you want in the pool normally.

5 answers to this question

  • 0
Posted

The quoted part is only relevant if you're wanting to seed - for example - a folder on E: named "\Bob" into a pool P: that already has a "\Bob". And nothing stops you from creating a new folder in the pool (e.g. "\fromAlice") and then seeding your folder under that (e.g. resulting in "P:\fromAlice\Bob" which would be separate from the existing "P:\Bob"). 

If you're wanting to prevent your files from getting spread across the rest of the drives in the pool, you will first need to ensure that balancing is turned off and that priority is given to the File Placement rules above all else (unless you want an exception for something). Then after seeding set the File Placement rules to the effect that those files/folders must be kept only on those two drives (and if desired, that no other files/folders may be kept on those two drives) and ensure those folders are set to 2x duplication. Then you can turn balancing back on (if you're using it).

  • 0
Posted
13 hours ago, Shane said:

The quoted part is only relevant if you're wanting to seed - for example - a folder on E: named "\Bob" into a pool P: that already has a "\Bob".

That's what I'm asking. I'm wanting to use the DrivePool pool as, essentially, a software RAID for any file in the pool. I want duplication across the two drives in the pool for all files, and, going in, I happen to already have these files duplicated *outside* the pool with the exact same folder structure. Is there a way for me to turn on file duplication on the pool, disable the DrivePool service, and then seed *each* drive's identical set of files, one set to each drive's hidden DrivePool folder, with the same exact folder structure? I don't see a way to "trick" DrivePool into doing this.

Conceptually, if you let DrivePool duplicate the files itself, it essentially ends up with the same result, but it somehow knows that the files are duplicates, so it must be keeping a record somewhere.

My experience thus far is that if I try to manually seed the duplicates, DrivePool doesn't understand that these are duplicates, and ends up deleting them one at a time and recopying them from the other drive. To me, this is dangerous behavior. Conceptually, DrivePool could quickly programmatically check any identically named file it hadn't seen before for date and size, and if these are identical, calculate a binary checksum to verify it. If it's identical, leave the file in place, rather than delete and re-copy.
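The check proposed above (size and date first, then a checksum) can be sketched in Python as follows. This is only an illustration of the idea, not how DrivePool works internally. The function names are made up for this example and SHA-256 is an arbitrary choice of checksum. It can also be run against the two external copies before seeding, to confirm they really are identical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_file(a: Path, b: Path, verify_content: bool = True) -> bool:
    """Cheap checks first (size, last-modified), then an optional checksum."""
    sa, sb = a.stat(), b.stat()
    if sa.st_size != sb.st_size:
        return False
    # Compare timestamps at whole-second resolution (an assumption made
    # here to tolerate sub-second differences between file systems).
    if int(sa.st_mtime) != int(sb.st_mtime):
        return False
    if not verify_content:
        return True
    return file_digest(a) == file_digest(b)


def compare_trees(root_a: Path, root_b: Path) -> list[str]:
    """List relative paths that exist on only one side or that differ."""
    files_a = {p.relative_to(root_a) for p in root_a.rglob("*") if p.is_file()}
    files_b = {p.relative_to(root_b) for p in root_b.rglob("*") if p.is_file()}
    problems = [f"only on one side: {rel}" for rel in sorted(files_a ^ files_b)]
    for rel in sorted(files_a & files_b):
        if not same_file(root_a / rel, root_b / rel):
            problems.append(f"differs: {rel}")
    return problems
```

An empty list from `compare_trees` means every file exists on both sides with the same size, timestamp, and content.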

  • 0
Posted

Going with a bit of an infodump here; there's a TLDR at the end. This should all be accurate as of v2.3.8.1600.

DrivePool uses ADS (alternate data streams) to tag folders with their duplication requirements. A folder is tagged where its level, or a subfolder's level, differs from the pool's level, and the poolpart folders themselves are tagged if the pool's base level is x2 or higher. Untagged folders inherit their level from the first tagged folder found heading rootwards, up to and including the poolpart folder. This is, as far as I know, the only permanent record of duplication kept by DrivePool (there may be a backup in the hidden metadata or covefs folders in the pool); everything that follows on from that is done in RAM where possible for speed.
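As a rough model of that inheritance rule (not DrivePool's actual code, and it does not read any ADS data), the lookup can be sketched in Python. The `tags` dictionary, the function name, and the use of forward-slash relative paths are all assumptions made for illustration.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def effective_level(folder: str, tags: dict[str, int], pool_level: int = 1) -> int:
    """Return the duplication level a folder declares or inherits.

    `folder` is a path relative to the poolpart root, using forward slashes.
    `tags` maps tagged folders to their explicit levels (hypothetical
    stand-in for the per-folder tags described above).
    """
    current = PurePosixPath(folder)
    while True:
        key = str(current)
        if key in tags:
            return tags[key]          # first tagged folder heading rootwards
        if current == current.parent:
            return pool_level         # reached the root: fall back to the pool's level
        current = current.parent


# Example mirroring a pool at x1 with one folder explicitly set to x2.
tags = {"bob": 2}
print(effective_level("bob/calico", tags))    # inherits from bob
print(effective_level("alice/calico", tags))  # falls back to the pool level
```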

Duplication consistency checking involves looking at each drive's NTFS records (which Windows tries to cache in RAM) to see whether each folder and file has the appropriate number of instances across the poolparts (e.g. if alicefolder declares or inherits a level of x2, then alicefolder should be on at least 2 drives and alicefolder\bobfile should be on exactly 2 drives), is in the correct poolparts (per any file placement rules that apply), and has matching metadata (i.e. at least size and last-modified timestamp).

If everything lines up, it leaves the files alone. If it does not, it will either fix the problem itself (when there is the wrong number of instances, or instances are on the wrong poolparts) or warn the user (when there is a metadata mismatch).
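To make the logic concrete, here is a simplified Python sketch of that kind of check. It is an approximation based only on the description above: it assumes one duplication level for the whole tree, ignores file placement rules and folder-level tags, and only reports problems rather than fixing them. The function name and report keys are invented for the example.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import Path


def check_consistency(poolparts: list[Path], level: int) -> dict[str, list[str]]:
    """Report files with the wrong instance count or mismatched metadata."""
    # Map each pool-relative file path to the (size, mtime) of every instance.
    instances: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for part in poolparts:
        for p in part.rglob("*"):
            if p.is_file():
                st = p.stat()
                rel = p.relative_to(part).as_posix()
                instances[rel].append((st.st_size, int(st.st_mtime)))

    report: dict[str, list[str]] = {"wrong_count": [], "metadata_mismatch": []}
    for rel, found in sorted(instances.items()):
        if len(found) != level:
            report["wrong_count"].append(rel)        # would be duplicated or trimmed
        elif len(set(found)) > 1:
            report["metadata_mismatch"].append(rel)  # would trigger a warning
    return report
```

With two identically seeded poolparts, a level of 2 yields an empty report, while a level of 1 flags every file as having the wrong count.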

(It doesn't, as far as I'm aware, do any content comparison - I wish it had that as an option - leaving content integrity handling up to the user e.g. via SnapRAID, RAID1+, Multipar, etc).

Duplication consistency checking can be manually initiated or performed automatically on a daily schedule.

This means that DrivePool should not be deleting either of your two sets of seeded files, unless you don't have duplication turned on for the pool [1] or for the folder [2] into which you're seeding your content, because your content will inherit the duplication level of whatever folder it is being moved into.

[1] e.g. if you are moving content directly into poolpart.string\ rather than into poolpart.string\somefolder\ then your content will inherit the pool's duplication level.
[2] e.g. if you are moving content into poolpart.string\somefolder\ rather than directly into poolpart.string\ then your content will inherit somefolder's duplication level.

Note: if you move a folder within a pool, it will keep its custom duplication level only if it has such - folders with inherited duplication will inherit their new parent's duplication. If instead you copy a folder within a pool, the copy will always inherit its new parent's duplication level.

Testing #0: created new pool across two drives. created identical external content on both drives, in a folder named calico.

Testing #1: pool duplication x1. opened both poolpart folders directly, seeded both with calico, started duplication consistency check. drivepool deleted one instance of calico's files, leaving the other instance untouched (as expected).

Testing #2: pool duplication x2. opened both poolpart folders directly, seeded both with calico, started duplication consistency check. drivepool left both sets of calico's files untouched (as expected).

Testing #3: created folder alice at x1 and bob at x2. opened all poolpart folders, manually created second alice folder, seeded both alice and bob on both drives with calico, started duplication consistency check. drivepool deleted one instance of calico's files in alice (as expected), leaving the other instance untouched (as expected) and did not touch calico's files in bob (as expected).

It might be possible to confuse DrivePool by manually creating ex nihilo (rather than copying) additional instances of a folder that is tagged with a non-default duplication count and seeding into those? Would have to test further. But you can (and should) avoid that by simply manually copying that folder (from the poolpart in which it exists to any poolparts in which it doesn't that you plan to seed into).

TLDR: for your scenario create a unique folder in the pool. ensure its duplication level is showing x2. open the poolparts you plan to seed with your content. if the folder isn't there, copy it so it is (i.e. don't create a "New Folder" and rename it to match, make a direct copy instead). set a file placement rule to keep that folder's content only on those two drives and tell drivepool to respect file placement (if you want that). seed that folder's instances in your poolparts with your content. remeasure. it should leave them untouched.
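The seeding step itself (moving, not copying) could be scripted along these lines. This is a hedged sketch, not an official tool. It assumes the hidden folder on each drive is named `PoolPart.<something>`, that the target folder already exists inside it (copied there as described above, so the script refuses to create it), and that the source is on the same volume. `os.rename` is used deliberately because it fails across volumes instead of silently falling back to a copy. The function names are hypothetical.

```python
from __future__ import annotations

import os
from pathlib import Path


def find_poolpart(drive_root: Path) -> Path:
    """Locate the single PoolPart.* folder at the root of a drive."""
    matches = [p for p in drive_root.glob("PoolPart.*") if p.is_dir()]
    if len(matches) != 1:
        raise RuntimeError(
            f"expected exactly one PoolPart folder under {drive_root}, "
            f"found {len(matches)}"
        )
    return matches[0]


def seed(source: Path, drive_root: Path, pool_folder: str) -> Path:
    """Move `source` into <PoolPart>/<pool_folder>/ on the same drive."""
    target_parent = find_poolpart(drive_root) / pool_folder
    if not target_parent.is_dir():
        # Do not create it here: the folder should be a copy of the
        # pool's existing folder so that it carries the same tagging.
        raise RuntimeError(f"{target_parent} does not exist; copy it in first")
    dest = target_parent / source.name
    if dest.exists():
        raise FileExistsError(f"{dest} already exists; refusing to overwrite")
    os.rename(source, dest)  # same-volume rename: no data is copied
    return dest
```

Run it once per drive with the service idle, then remeasure the pool as described above.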

  • 0
Posted
15 hours ago, Shane said:

TLDR: for your scenario create a unique folder in the pool. ensure its duplication level is showing x2. open the poolparts you plan to seed with your content. if the folder isn't there, copy it so it is (i.e. don't create a "New Folder" and rename it to match, make a direct copy instead). set a file placement rule to keep that folder's content only on those two drives and tell drivepool to respect file placement (if you want that). seed that folder's instances in your poolparts with your content. remeasure. it should leave them untouched.

Thanks for the detailed information above. Regarding your TLDR, and also your testing examples, you're talking about using a unique folder with folder duplication, but I've been trying all of this with file duplication: I want duplication across the entire pool. Hopefully there wouldn't be any difference here, but maybe that's where the issue lies? 

Also, you were saying to copy, but to get the speed improvement (and to maintain the best data integrity by not re-copying anything) this would involve *moving* all the files to seed the poolparts folders.

While I can definitely do some testing of it with copies of smaller files, the ultimate goal is to move the files directly into the poolparts folders, not to copy them. Hopefully this is just semantics -- unless you believe there could be a difference involved with moving versus copying to the poolparts folders when trying to seed duplicates?

Btw, during testing, I came across a tangential situation that also seems to be pretty dangerous: If you have a fresh install of DrivePool on a different computer that hasn't been configured yet (all defaults), and, without thinking to shut down the DrivePool service, you happen to plug into this computer some external drives that are part of a pool from a different computer, then DrivePool will detect these drives as a pool but start applying the *default settings* to these drives! Meaning, if you had file duplication set up on the other computer, and, obviously, that's not the default, then DrivePool will start destroying the duplication by removing the duplicates. 

To me, if you plug drives from a pool on one computer into another computer with DrivePool installed, DrivePool shouldn't automatically treat them as a pool -- it should detect that this pool has never existed on the computer before, leave all the files alone, and maybe give you the option to import the pool's exact settings into that computer before bringing in the pool. Are DrivePool's settings not stored within a pool's hidden folders? This would make so much more sense. So, say your computer dies, then you'd be able to move the pool's drives over to a new computer and DrivePool would let you import the exact settings so that nothing would get screwed up.

  • 0
Posted
7 hours ago, mmortal03 said:

Thanks for the detailed information above. Regarding your TLDR, and also your testing examples, you're talking about using a unique folder with folder duplication, but I've been trying all of this with file duplication: I want duplication across the entire pool. Hopefully there wouldn't be any difference here, but maybe that's where the issue lies?

Also, you were saying to copy, but to get the speed improvement (and to maintain the best data integrity by not re-copying anything) this would involve *moving* all the files to seed the poolparts folders.

DrivePool's duplication can be set at the pool level (all content placed in the set pool inherits this) and the folder level (all content placed in the set folder inherits this). Some folks use different levels of duplication for different content (e.g. they might have their pool set to x2 but particularly important folders set to x3); if your whole pool is at x2 and that's the way you want it then you don't have to worry about that.

The copying was referring to the suggestion of using a unique folder inside the poolparts IF you've already got other stuff in the pool that you want to avoid bumping into; either way your external content would still be moved - seeded - into the pool (whether under that folder or directly under the poolpart folders).

7 hours ago, mmortal03 said:

Btw, during testing, I came across a tangential situation that also seems to be pretty dangerous: If you have a fresh install of DrivePool on a different computer that hasn't been configured yet (all defaults), and, without thinking to shut down the DrivePool service, you happen to plug into this computer some external drives that are part of a pool from a different computer, then DrivePool will detect these drives as a pool but start applying the *default settings* to these drives! Meaning, if you had file duplication set up on the other computer, and, obviously, that's not the default, then DrivePool will start destroying the duplication by removing the duplicates. 

Thanks. That shouldn't happen, yes. Can you give an exact step-by-step of how to reproduce this?
