  • 0

DrivePool keeps checking and duplicating


btb66

Question

I swapped out a drive for one with a higher capacity. I used the 'remove drive' option and, when that completed, added the larger drive to the pool. It took a while to repopulate the new drive initially, but the pool still hasn't settled: it keeps checking, then duplicating. After duplication finishes it shows the pool is good for only a short time, about ten minutes I'd say, then the cycle repeats: checking again and more duplicating, over and over.

 

I've attached a screenshot of some notifications (only these few, because I cleared the rest after reading them earlier).

 

The pool seems to be fine except for this issue. I'm fairly sure this didn't happen before swapping the drive. Is there anything I can do?

 

I'm using beta v. 2.2.0.765

 

Edited to add another screenshot; the cycle continues.

[Attached: two screenshots of the notification history showing the repeating check/duplicate cycle]


  • 0

Alex got to it. A bunch of files are getting "access denied" errors, so it's either an ACL issue or a file system issue.

 

So modifying stuff may have helped, at least temporarily.

 

Running a disk check on all of the disks would be a good idea. As would resetting the permissions on the pool.


  • 0

This is all on v769.

I very much doubt it's a permission issue; if that were the case, it would not clear itself. More likely DP is getting itself confused.

I will run the disk check, but it will take a while for 18 disks.

Can Alex point to particular disk(s)?


  • 0

Well, I'm passing on what Alex has mentioned. 

 

Both duplication and balancing are failing, and for the same reason: "Access denied" errors.

 

"It just seems like a random collection of files / folders. Something is probably wrong with either one of the pool parts, or the overall pool (e.g. ACLs broken)."

 

As for pointing out specific disks, I have a feeling that it's "in general"... so hitting all of the disks, or a bunch of them. 

 

If needed, this is how to reset the permissions/ACL on the entire pool (or any disk, really):

http://wiki.covecube.com/StableBit_DrivePool_Q5510455

 

It may be worth doing this, and then reconfiguring them, to make sure that it's not a weird issue, or some holdover from old file system damage. 
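
For reference, the generic Windows way to do a reset like this is takeown plus icacls from an elevated prompt. This is only a rough sketch of that approach (the linked article is the authoritative procedure), and it assumes P: is the pool's drive letter:

# Run from an elevated PowerShell prompt; P:\ is an assumed pool drive letter.
# Take ownership of everything on the pool, recursively...
takeown /F P:\ /R /D Y

# ...then reset every file and folder ACL back to the inherited defaults.
icacls P:\ /reset /T /C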


  • 0

I can almost reproduce this at will now.

 

I ran a simple script to update the "title" field in a few thousand MKV files.
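
(For context, the batch edit was along these lines; this is a hypothetical sketch that assumes mkvpropedit from MKVToolNix is on the PATH and P:\Movies is a path on the pool, not the actual script:)

# Set each MKV's container title to its file name; mkvpropedit rewrites the header in place,
# so every one of these files gets touched on the pool.
Get-ChildItem -Path 'P:\Movies' -Filter *.mkv -Recurse | ForEach-Object {
    & mkvpropedit $_.FullName --edit info --set "title=$($_.BaseName)"
}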

 

Now I have over 400GB of "unduplicated files". Checking the logs, there are no issues listed for these files (unlike some others that I spotted earlier, which I have cleaned up).

Doing a re-measure at the moment to confirm the unduplicated files (it's looking like it will).

Question: why are the error logs for DP encrypted/compressed? Other logs are not.


  • 0

Ah, I just did a test copy of a large file to the pool,

and the file was "saved" to an HDD, not the SSDs, hence why I am not getting duplication. Real-time duplication is enabled in the GUI.

So I checked the config page of the SSD Optimizer and the two SSDs are defined as SSDs. Hmm.

A bit more testing is needed; I will report back.

Edited by Spider99

  • 0

OK, I have tried the following:

Uninstalled the SSD plugin, upgraded to 773, reinstalled the SSD plugin, with the associated reboots.

This resulted in no change, i.e. the SSD cache does not work and files are saved to the HDDs directly.

I appear to have fixed it by removing the SSDs from the pool, reformatting them, and adding them back; this appears to have made the SSD plugin work again and the pool use the SSDs.

Observations:

1) DP needs better and more visible error reporting to the user, as a lot of this time and effort could have been saved by some informative error reports.

2) Resetting to defaults does not reset everything.

3) DP error reports should not be compressed/encrypted, so the user can read them.


  • 0

  1. Yup, we do plan on improving that.  The beta builds *are* better about it, but not perfect.

     

  2. If you use the troubleshooting menu, that *will* reset everything (except duplication settings, as these are stored on the pool itself, not in the setting store).

     

  3. No. Not going to happen.

    But before you get upset: the ErrorReports are handled crashes. These point to specific lines in the code, so that we can easily troubleshoot issues. So even if they weren't encrypted, unless you have access to the source code (which, since we don't publish it, you shouldn't), they will still mean absolutely nothing to you.

    That said, the Event Viewer will provide *some* information about what is going on when these are created.

    Otherwise, the service logs should include a lot of information about what is going on. Many warnings (such as duplication issues, balancing issues, etc.) will be logged here. I don't say "errors", because errors are "hard stops" that will prevent the service from running. These may be logged too, but may be a lot more obvious when there is an issue.

 

 

 

 

That said, if the SSD Optimizer balancer isn't working properly, then it may be a configuration issue.

In this case, disable ALL other balancers, and remove any file placement rules.   Also, make sure there are 2 drives marked as SSD if you're using duplication at all.   And then test things. 

 

If things work properly there, then try re-enabling things.  For balancers.... generally, you should only need the SSD Optimizer and StableBit Scanner balancers.  

 

And if you're using file placement rules, make sure that both the "File placement rules respect real-time file placement limits set by the balancing plug-ins" option and the "Unless the drive is being emptied" options are DISABLED (unchecked) in the main balancing tab. 

 

The first option is so that files covered by placement rules get placed properly on drives other than the "SSD" drives; otherwise, this could cause files to not follow the rules properly.

The second is for when you have rules locking files to the SSD drives: unchecking the setting makes sure that they remain on the SSDs.


  • 0

Hi Chris

 

1. Good, but it needs a lot of work, as there was nothing in the logs about "access denied"; it did not even flag that it had an issue. Alex found it in the troubleshooting data it uploaded. It needs to be visible to the user.

2. I used the option at the bottom of the Balancing page; if that does not reset everything, then it's a chocolate teapot :P

3. Fine, if that's what they contain; it's just odd to encrypt them.

 

It is the SSD Optimizer that is/was the issue, as a reset (I've had to do this a few times over the last few months) clears the lockup where it stops working and a re-balance leaves data on the SSDs no matter how many times you run it.

I only have the Scanner plugin enabled; all others are disabled, and I have no file placement rules on the pool.

 

I have a suspicion that when I run a PowerShell script across multiple directories that deletes/copies and renames files/directories, DP can't keep up, as the "edits" are across multiple physical disks.

Also, the script runs multiple processes in parallel, so there are lots of amendments going on at the "same" time.

I don't do this all the time, but when I get a chance I will test it to see if I can make it reproducible.


  • 0

  1. Well, we're adding much more verbose logging for duplication and balancing (in the latest betas, build 777 and up).  That should help find issues and see what's going on.  

     

  2. That should reset the balancing settings.  But only those.  

    If that's not restoring them to default, then please do let me know.

     

  3. IIRC, it's the error reporting module that we use.  And yeah, I agree. But for the most part, it doesn't contain info that is helpful ... unless you're a developer with the code. :)

 

As for the SSD Optimizer, that is possible. It may be worth limiting how often balancing runs: either set it to once per day, or to "not more often than every X hours".


  • 0


Confirmed: running a script across the pool will create unduplicated files. I think this is because real-time duplication cannot keep up, and/or real-time duplication does not happen on the pool drives but only on the SSDs.

I have just run a script that creates N subdirectories, moves the matching files into each subdirectory, creates a sub-subdirectory, and creates (via ffmpeg) screenshots (JPG) of the movie at 1-minute intervals. I am reorganising my movie libraries for Emby, so each movie is now in its own subdirectory and has a subdirectory (extrafanart) for screenshots etc.
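
(Roughly, per movie, that script does something like the following. This is a sketch with assumed paths, assuming ffmpeg is on the PATH; fps=1/60 grabs one frame per minute. It is not the exact script:)

$library = 'P:\Movies'

Get-ChildItem -Path $library -Filter *.mkv | ForEach-Object {
    # Each movie gets its own subdirectory plus an extrafanart sub-subdirectory.
    $movieDir  = Join-Path $library $_.BaseName
    $fanartDir = Join-Path $movieDir 'extrafanart'
    New-Item -ItemType Directory -Path $fanartDir -Force | Out-Null

    # Move the movie into its folder, then dump screenshots at 1-minute intervals.
    Move-Item -Path $_.FullName -Destination $movieDir
    $moved = Join-Path $movieDir $_.Name
    & ffmpeg -i $moved -vf 'fps=1/60' (Join-Path $fanartDir 'shot_%04d.jpg')
}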

 

Nothing complicated, yet I now have 5.79GB of unduplicated files.

If you want a copy of the script for testing, let me know.


  • 0

Real time duplication happens in the kernel, and writes the files to both copies in parallel.  Period.

 

However, my guess is that the script is triggering a "smart move", where the data is not actually touched, but the file location is updated.  This ... would definitely cause the data to not be duplicated properly. 

 

That said, if you are scripting this, run a read pass of the data afterwards. Accessing it on the pool *should* trigger a duplication pass, IIRC.
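
If you want to script that read pass as well, something like this minimal sketch would do it, assuming P:\ is the pool drive letter; it just streams every file once and discards the data:

$buffer = New-Object byte[] (1MB)

Get-ChildItem -Path 'P:\' -Recurse -File | ForEach-Object {
    # Open and read each file end to end so the pool sees an access on every file.
    $stream = [System.IO.File]::OpenRead($_.FullName)
    try {
        while ($stream.Read($buffer, 0, $buffer.Length) -gt 0) { }
    }
    finally {
        $stream.Dispose()
    }
}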

 

 

 

 

But if you could, grab the logs of this happening, "in action":

http://wiki.covecube.com/StableBit_DrivePool_2.x_Log_Collection

 

Also, if you could, send me a copy of the scripts (or throw them in the log folder when you upload it).

 

This way, we can see about reproducing the issue.


  • 0

"However, my guess is that the script is triggering a 'smart move', where the data is not actually touched, but the file location is updated. This would definitely cause the data to not be duplicated properly." - Yes, this is very likely what is happening, but is this any different to a normal move of data?

"That said, if you are scripting this, run a read pass of the data afterwards. Accessing it on the pool *should* trigger a duplication pass, IIRC." - Nope, that does not happen.

"Real time duplication happens in the kernel, and writes the files to both copies in parallel." - Hmm, only for a new file/copy but not a move, it would seem; it's catching some but not all.

The other thing I have noticed is that the files it reports as not duplicated have nothing to do with the files being moved or copied; they are completely unrelated, in most cases on another physical disk. I wonder if the index is getting out of sync, so it knows something is wrong but reports the wrong files?


  • 0

Chris

 

I have reproduced the problem on a separate machine (Win10), rather than the 2012R2 box we have been talking about until now in this thread.

I tried to reproduce the problem with the 742 beta on Win10 and it did not appear to be affected, so I upgraded DP to 773 (the same as the 2012R2 machine).

To see the problem quickly, I ran Script 2 (a PS script) multiple times in the same directory, so it deletes the extrafanart directory and the files within it each time. If you do this three or four times in a row, the pool will show unduplicated files.

I thought initially that the Win10 pool, being only two SATA SSDs, was going to be too quick to show the problem, but that did not turn out to be the case.

Checking the files that were unduplicated: as I noticed before, the listed files have nothing to do with the files modified by the script; they are in a completely different directory tree off a different root folder on the pool.

So I reset the duplication cache via dpcmd, but it still lists the same "unrelated" files, plus a handful of extras which are related to work done by Script 1 (which was running at the time): 4 "new" unduplicated files in this case.

I then used dpcmd to re-measure the pool and re-ran check-pool-fileparts; it is still listing the same files. I'm not sure if this is a concern, in that it could be losing files / overwriting them / misreading them?
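
(For anyone following along, the dpcmd calls referred to here were roughly the following; the command names are from memory, so check the output of dpcmd with no arguments for the exact syntax on your build:)

# P:\ is an assumed pool drive letter.
# Re-measure the pool...
dpcmd remeasure-pool P:\

# ...then list the file parts again so unduplicated files show up; keep the output for comparison.
dpcmd check-pool-fileparts P:\ > check-pool-fileparts.txt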

 

This is what the GUI looks like at the moment

 

[Attached: screenshot of the DrivePool GUI]

 

I will get the pool balanced again and upgrade to the latest beta and test again

 

Interesting times :)

 

 


  • 0

Chris

 

Upgraded to the 798 beta on the Win10 machine.

Ran dpcmd check-pool-fileparts: no errors.

Ran Script 2 twice and got unduplicated files.

Ran dpcmd again: 10 files are listed as unduplicated. This time, though, they are the files that Script 2 amended; not sure if that's luck or whether 798 reports better? :)

Anyway, over to you and Alex to test.

 

Tim


  • 0

Hi Chris

 

Well, there are three things it does (roughly sketched after this list):

 

1. Remove the extrafanart folder, if it exists (the first time it may not).

2. Copy 10 random images from Photos into the extrafanart folder, creating the extrafanart directory if it does not exist.

3. Create the .ignore file in the Photos directory. I doubt it's this, as it's a simple file copy and rename.
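
(Here is that rough sketch; the paths are hypothetical, and it is a reconstruction of the three steps rather than the actual Script 2:)

$photoDir  = 'P:\Test\Photos'
$fanartDir = 'P:\Test\extrafanart'

# 1. Remove the extrafanart folder if it already exists (on the first run it may not).
if (Test-Path $fanartDir) {
    Remove-Item -Path $fanartDir -Recurse -Force
}

# 2. Recreate extrafanart and copy 10 random images into it from Photos.
New-Item -ItemType Directory -Path $fanartDir -Force | Out-Null
Get-ChildItem -Path $photoDir -Filter *.jpg |
    Get-Random -Count 10 |
    Copy-Item -Destination $fanartDir

# 3. Create the .ignore file in the Photos directory (a copy-and-rename in the original).
New-Item -ItemType File -Path (Join-Path $photoDir '.ignore') -Force | Out-Null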

 

My guesses as to what might be causing it:

1. It might be the folder deletion and creation happening in quick succession that is "too" fast for DP/the system to pick up.

2. However, the last time I did it (post #41) the unduplicated files listed were the 10 random files copied, so it could be that as well.

3. It could also be PowerShell doing something different from what is expected; MS being MS and all :)

 

I would set up a folder on a pool with just a Photos directory containing some random photos (mine vary in size, but say approximately 1MB each or bigger), then run the PS script.

Wait a second after it finishes, then run it again; wait a second and run it again, etc. (a repro loop is sketched below).
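
Something like this, where the script path is a stand-in for wherever "Script 2" lives:

# Run the script several times back-to-back with a one-second pause between runs.
1..4 | ForEach-Object {
    & 'C:\scripts\script2.ps1'   # hypothetical path to Script 2
    Start-Sleep -Seconds 1
}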

 

If you see what I see on two different machines, DP will "complain" after a couple of runs; I had the GUI open on screen to see the change.

 

T


  • 0

As for what the scripts do, yeah, I picked up on that. :)

 

And it may be how PowerShell is handling the file operations.  

 

As for checking into this, Alex does mention that it looks like there are some access issues on the pool.  ACL (permissions) may be broken.

 

If you haven't, try resetting the permissions on the pool

http://wiki.covecube.com/StableBit_DrivePool_Q5510455

 

Also, running a "chkdsk /r" pass on all of the disks (with or without the "/scan" option) may be a good idea.
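
A quick way to run that across all of the pool's disks; a sketch only, with hypothetical drive letters:

foreach ($letter in 'D', 'E', 'F') {
    # Online NTFS scan; for the full pass run "chkdsk X: /r" instead, which will usually
    # ask to be scheduled for the next reboot if the volume is in use.
    chkdsk "$($letter):" /scan
}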

 

 

If the issue persists after that, let me know and I'll see about setting up a test machine to reproduce this.


  • 0

No, there are no ACL problems on the pool, and all disks have been checked by Scanner in the last two weeks (it takes that long to check 20 disks). I did that on my 2012R2 machine when I submitted the troubleshooter logs and you mentioned it before.

If there were ACL problems, I would see those errors when accessing the pool, and there is nothing in the event log either.

Remember, this happens on two different machines, so ACL problems are very unlikely.

I have randomly checked whichever files get listed as unduplicated and can read them fine, so again, not an ACL problem.


  • 0

Sorry, no, it doesn't.

 

However, I've tried reproducing the issue, with the scripts you've created, and have not been able to.

 

And I'm pretty sure that it's not the number of calls at once (e.g. it's not overwhelming the driver).

 

 

That said, it may be worth creating a "foreach" loop here, rather than calling all of the scripts at once.

$Directory = Get-ChildItem -Directory -Path X:\Path\Of\Videos

foreach ($folder in $Directory)
{
    # relevant code goes here - process one movie folder ($folder.FullName) per iteration
}

This would prevent it from hammering the pool, and would require only ONE script to be run.

 

The result of this change would be very telling, though.

 

 

As for the ticket, I've updated it and bumped it:

https://stablebit.com/Admin/IssueAnalysis/27560

