  • 0

Problem with file duplication.



Hello,

Sorry if this has already been covered (I couldn't find exactly the same problem on the forum).

I bought DrivePool, which fits my needs.

I created a pool from several disks with the following options:

- no balancing

- no file duplication

Screenshots are attached (cf. 1.jpg, 2.jpg).

 

When I add a new disk to the pool following the documented procedure (https://wiki.covecube.com/StableBit_DrivePool_Q4142489), DrivePool starts to remeasure the pool and then checks it, which seems understandable.

But during the check I get a message about inconsistent file duplication (cf. 3.jpg), even though I did not enable file duplication.

Then, once everything has finished, I still have a pie chart showing 7.07GB of duplicated files (cf. 4.jpg), while I don't see anything duplicated in the per-folder duplication screen (except of course the metadata). I also see 30.4GB of "other" files, even though every file on each disk is supposed to have been transferred into the pool.

Is this normal?

Thank you for your help.

1.jpg

2.jpg

3.jpg

4.jpg


  • 1

If you're manually moving new files into the pool via the hidden poolpart folders as per Q4142489, it is up to you to ensure they do not overlap existing folders/files in the pool.

This is because DrivePool's duplication works via having the same file exist in the same path on multiple drives in the pool.

For example, say you have a pool P consisting of drives D and E, whose contents are as follows:

d:\poolpart.1\folder1\file1 --> p:\folder1\file1 <-- this is a duplicated file
d:\poolpart.1\folder1\file2 --> p:\folder1\file2
d:\poolpart.1\folder1\file3 --> p:\folder1\file3
e:\poolpart.2\folder1\file1 --> p:\folder1\file1 <-- this is a duplicated file
e:\poolpart.2\folder1\file4 --> p:\folder1\file4
e:\poolpart.2\folder2\file1 --> p:\folder2\file1
e:\poolpart.2\folder2\file2 --> p:\folder2\file2

If you then had a new drive F you wanted to manually seed into the pool as per Q4142489, with new (i.e. different to the above) content as follows:

f:\folder1\file2 - - -> f:\poolpart.3\folder1\file2
f:\folder2\file3 - - -> f:\poolpart.3\folder2\file3

You would first have to rename F's folder1, folder2, file2 and/or file3 before moving them into the hidden poolpart folder, as otherwise they would overlap with the existing \folder1\ and \folder2\ as follows:

d:\poolpart.1\folder1\file1 --> p:\folder1\file1 <-- this is a duplicated file
d:\poolpart.1\folder1\file2 --> p:\folder1\file2 <-- this existing file is in conflict with a new file
d:\poolpart.1\folder1\file3 --> p:\folder1\file3
e:\poolpart.2\folder1\file1 --> p:\folder1\file1 <-- this is a duplicated file
e:\poolpart.2\folder1\file4 --> p:\folder1\file4
e:\poolpart.2\folder2\file1 --> p:\folder2\file1
e:\poolpart.2\folder2\file2 --> p:\folder2\file2
f:\poolpart.3\folder1\file2 --> p:\folder1\file2 <-- this new file is now in conflict with an existing file
f:\poolpart.3\folder2\file3 --> p:\folder2\file3 <-- this new file is now in the same folder as two existing files
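The overlap check described above can be sketched in a few lines of Python. This is only an illustration that walks the folders directly; it does not use any DrivePool API, and the function names (`relative_files`, `find_overlaps`) and any paths you pass in are hypothetical. It assumes you supply the hidden poolpart folders yourself and that paths should be compared case-insensitively, as is usual on NTFS.

```python
from pathlib import Path


def relative_files(root: Path) -> set[str]:
    """Return every file under root as a lower-cased path relative to root."""
    # Assumption: compare in lower case because NTFS paths are normally
    # case-insensitive.
    return {
        p.relative_to(root).as_posix().lower()
        for p in root.rglob("*")
        if p.is_file()
    }


def find_overlaps(new_content: Path, poolparts: list[Path]) -> dict[str, list[Path]]:
    """Map each relative path in new_content to the poolparts that already hold it."""
    incoming = relative_files(new_content)
    overlaps: dict[str, list[Path]] = {}
    for part in poolparts:
        # Any relative path present on both sides would collide after seeding.
        for rel in incoming & relative_files(part):
            overlaps.setdefault(rel, []).append(part)
    return overlaps
```

Calling `find_overlaps` with the folder you intend to seed and the list of existing poolpart folders returns an empty dict when nothing collides. With the example above it would report folder1/file2 as already present in D's poolpart.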

@Christopher (Drashna) I recommend that the Q4142489 wiki entry should mention this explicitly; e.g. by instructing the user in step 4 to "First, check that the folder structure you intend to move into the pool does not already exist in the pool, unless your goal is to merge the content of those folder structures together."


  • 1

Q: I understand now that these apparently surprising duplicated files in my pie chart were in fact mine from the beginning. Is it then a problem to leave them there?

If they're actually duplicates, i.e. the exact same file with the same path in different poolparts, then no problem.

Q: I then don't quite understand the duplication warning that I get during the check: what are the "duplicated files mismatching parts"? I also noticed that when these duplicate files just have the same name but are not really the same binary file (for example 2 different videos with the same name), DrivePool shows only one of the two files in the pool. Which one does DrivePool choose? Is this the case that DrivePool reports as "duplicated files mismatching parts" during the check?

Yes, this indeed occurs when different files with the same path and name have been moved into different poolparts.

For example, let's say you have a photo of a cat saved as d:\photos\cute24.jpg and a photo of a dog saved as e:\photos\cute24.jpg and you manually move them into the hidden poolpart folder on the respective drives like so:

d:\photos\cute24.jpg <- cat -> d:\poolpart1\photos\cute24.jpg --> shows up in the pool as p:\photos\cute24.jpg
e:\photos\cute24.jpg <- dog -> e:\poolpart2\photos\cute24.jpg --> also shows up in the pool as p:\photos\cute24.jpg

If you had moved the cat photo into p:\photos the normal way (d -> p) and then moved the dog photo into p:\photos the normal way (e -> p), Windows would pop up a warning that there's already a file there with that name and ask if you wanted to replace it. But by accessing the hidden poolparts directly (d -> d:\poolpart and e -> e:\poolpart) you bypass the normal safety procedures.

As to which one DrivePool chooses to show, I believe it would be whichever drive that DrivePool accesses first (which would depend on various factors).
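If you want to find such clashes yourself before the next pool check, one possible approach is to hash every file that appears at the same relative path in more than one poolpart and flag the ones whose contents differ. The sketch below is an assumption-laden illustration, not how DrivePool itself performs its check: `sha256_of` and `find_mismatches` are hypothetical helper names, and you pass in the poolpart folders yourself.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large videos need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(poolparts: list[Path]) -> dict[str, dict[Path, str]]:
    """Return relative paths held by 2+ poolparts whose file contents differ."""
    copies: dict[str, list[Path]] = defaultdict(list)
    for part in poolparts:
        for p in part.rglob("*"):
            if p.is_file():
                # Assumption: case-insensitive comparison, as on NTFS.
                copies[p.relative_to(part).as_posix().lower()].append(p)

    mismatches: dict[str, dict[Path, str]] = {}
    for rel, paths in copies.items():
        if len(paths) < 2:
            continue  # only one copy exists, so nothing to compare
        hashes = {p: sha256_of(p) for p in paths}
        if len(set(hashes.values())) > 1:
            mismatches[rel] = hashes
    return mismatches
```

In the cat/dog example, both poolparts hold photos/cute24.jpg with different contents, so that path would be returned along with the hash of each copy. Genuine duplicates (same path, same content) are not reported.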

Q: And finally one last question: how can I find the physical path of a file seen in the pool? (i.e. when browsing the pool, how can I tell which physical disk a file is located on?)

There are various ways, for example:

  • Manually check the equivalent path in each hidden poolpart folder.
  • Open a command prompt as an administrator and enter dpcmd get-duplication filepath, where filepath is the full path of the file (e.g. dpcmd get-duplication "p:\photos\cute pets\oscar the turtle.jpg"). Note that this shows volume numbers, not drive letters, so you'd have to look them up in Windows Disk Management or similar to find the corresponding drive letters (dpcmd does this because DrivePool can pool volumes without requiring them to have a drive letter).
  • Use a tool which can quickly scan lettered NTFS volumes and show all files on all drives that match the search string, e.g. Everything by Voidtools can do this.
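
The first option (checking the equivalent path in each hidden poolpart folder) can also be scripted. The following is a minimal sketch under stated assumptions: `locate_in_poolparts` is a hypothetical helper, it does not call dpcmd or any DrivePool API, and you have to supply the list of poolpart folders yourself.

```python
from pathlib import Path, PureWindowsPath


def locate_in_poolparts(pool_path: str, poolparts: list[Path]) -> list[Path]:
    """Return every poolpart copy of a file, given its path as seen in the pool."""
    # Parse as a Windows path and drop the drive/root part (e.g. "p:\") so
    # that only the path relative to the pool root remains.
    win = PureWindowsPath(pool_path)
    relative = Path(*win.parts[1:]) if win.anchor else Path(*win.parts)
    return [part / relative for part in poolparts if (part / relative).is_file()]
```

For a file that is not duplicated this returns a single path; for a duplicated file (or two clashing files with the same name) it returns one entry per poolpart that holds a copy.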

  • 1

It would complain every time it does a health check of the pool, and offer to delete all the older file(s) under the assumption that there was an error in duplicating the newer instance of the file across the pool, but it won't delete any automatically unless you tick the box that tells it to do so for future checks. However note too that if you yourself update the "single" file that shows up in the pool (or move it out of the pool) then only one of the actual files in the poolparts will actually be moved/updated and the rest will be lost.

E.g. if you manually put test.txt (containing the word "apple") into d:\poolpart1 and manually put test.txt (containing the word "orange") into e:\poolpart2, then when you open p:\test.txt you might get the one that contains "apple" or the one that contains "orange"; and if you moved p:\test.txt to c:\test.txt or edited it to say "banana", only one would be moved/edited and the other would be lost or overwritten.

(at least, when read-striping is off; I'm not sure whether something more odd might happen if read-striping is on and the files involved are large).

And thank you!


  • 0

Thank you very much for this detailed answer.

I knew that DrivePool tolerates having 2 duplicates with the same path on different drives in the pool.

When adding a disk to the pool, I of course started by eliminating the duplicate files with duplicate-finder software, but this was sometimes tedious, so I then left the duplicates where they were.

I understand now that these apparently surprising duplicated files in my pie chart were in fact mine from the beginning. Is it then a problem to leave them there?

I then don't quite understand the duplication warning that I get during the check: what are the "duplicated files mismatching parts"?

I also noticed that when these duplicate files just have the same name but are not really the same binary file (for example 2 different videos with the same name), DrivePool shows only one of the two files in the pool. Which one does DrivePool choose? Is this the case that DrivePool reports as "duplicated files mismatching parts" during the check?

And finally one last question: how can I find the physical path of a file seen in the pool? (i.e. when browsing the pool, how can I tell which physical disk a file is located on?)

Thank you very much for the time you have spent answering me.


  • 0

So I understand that DrivePool will tolerate 2 different files with the same name & path on different disks at first, but will eventually delete one of them during the next check.
As an ordinary user, and as you mentioned it to "Christopher (Drashna)", please allow me to also recommend that this DrivePool behavior be explained in the Q4142489 wiki entry, which would be helpful for the overall understanding of DrivePool.

Once again, thank you very, very much for your efforts and all these accurate and complete answers. 
By the way, maybe I didn't quite understand your status, but I'm impressed by your dedication to answering people in a forum without getting paid for it.
