[HOWTO] File Location Catalog

Question

I've seen quite a few requests about knowing which files are on which drives, in case a recovery of unduplicated files is ever needed.  I know dpcmd.exe has some functionality for listing all files and their locations, but I wanted something that I could "tweak" a little better to my needs, so I created a PowerShell script that gets me exactly what I need.  I decided on PowerShell, as it allows me to do just about ANYTHING I can imagine, given enough logic.  Feel free to use this, or let me know if it would be more helpful "tweaked" a different way...

 

Prerequisites:

 

  1. You gotta know PowerShell (or be interested in learning a little bit of it, anyway) :)
  2. All of your DrivePool drives need to be mounted as a path (I chose to mount all drives as C:\DrivePool\{disk name})
  3. Your computer must be able to run PowerShell scripts (I set my execution policy to 'RemoteSigned'; see the one-liner just after this list)
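A minimal example for item 3 (run once from an elevated PowerShell prompt; adjust the scope to taste):

# RemoteSigned lets locally-created scripts run, while requiring downloaded scripts to be signed.
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope LocalMachine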

I have this PowerShell script set to run each day at 3am, and it generates a .csv file that I can use to sort/filter all of the results.  Need to know what files were on drive A? Done.  Need to know which drives are holding all of the files in your Movies folder? Done.  Your imagination is the limit.
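(For reference, one way to register such a daily 3am task on Windows 8 / Server 2012 or later - the script path here is only an example:)

# Example only - point -File at wherever you saved the script.
$action  = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument '-NoProfile -File C:\Scripts\DPFileList.ps1'
$trigger = New-ScheduledTaskTrigger -Daily -At 3am
Register-ScheduledTask -TaskName 'DrivePool File Catalog' -Action $action -Trigger $trigger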

 

Here is a screenshot of the .CSV file it generates, showing the location of all of the files in a particular directory (as an example):

 

[screenshot of the generated .csv file]

 

Here is the code I used (it's also attached in the .zip file):

# This saves the full listing of files in DrivePool (folders are excluded)
$files = Get-ChildItem -Path C:\DrivePool -Recurse -Force | Where-Object { -not $_.PSIsContainer }

# This creates an empty array to store details of the files
$filelist = @()

# This goes through each file and populates the array with the drive name, file name and directory name.
# Note: the Substring offsets are specific to my mount layout. "C:\DrivePool\" is 13 characters, so
# Substring(13,5) grabs the 5-character disk name that follows it, and Substring(64) skips past
# "C:\DrivePool\{disk name}\PoolPart.{GUID}" (13 + 5 + 1 + 9 + 36 = 64 characters, with a
# 36-character PoolPart GUID), leaving the pool-relative directory path. Adjust both offsets
# to match your own mount points and disk names.
foreach ($file in $files)
    {
    $filelist += New-Object psobject -Property @{Drive=$(($file.DirectoryName).Substring(13,5));FileName=$($file.Name);DirectoryName=$(($file.DirectoryName).Substring(64))}
    }

# This saves the table to a .csv file so it can be opened later on, sorted, filtered, etc.
$filelist | Export-Csv F:\DPFileList.csv -NoTypeInformation
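(If the disk names aren't all exactly 5 characters, a regex can replace the fixed offsets. A minimal sketch of the same idea, untested, assuming the same C:\DrivePool\{disk name}\PoolPart.* mount layout as above:)

# Sketch only: parses the path with a regex instead of fixed Substring offsets,
# so disk names of any length work. Assumes drives are mounted under
# C:\DrivePool\{disk name} and pooled files live inside a hidden "PoolPart.*" folder.
$filelist = Get-ChildItem -Path C:\DrivePool -Recurse -Force |
    Where-Object { -not $_.PSIsContainer } |
    ForEach-Object {
        if ($_.DirectoryName -match '^C:\\DrivePool\\(?<disk>[^\\]+)\\PoolPart\.[^\\]+(?<dir>.*)$')
            {
            New-Object psobject -Property @{Drive=$Matches['disk'];FileName=$_.Name;DirectoryName=$Matches['dir']}
            }
    }

$filelist | Export-Csv F:\DPFileList.csv -NoTypeInformation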

Let me know if there is interest in this, if you have any questions on how to get this going on your system, or if you'd like any clarification of the above.

 

Hope it helps!

 

-Quinn

 

 

gj80 has written a further improvement to this script:

http://community.covecube.com/index.php?/topic/1865-howto-file-location-catalog/&do=findComment&comment=16553

Attachment: DPFileList.zip


41 answers to this question

Spider99:

Hi

 

One thing I spotted today is that the log file is Unicode, which means it's twice the size it needs to be - if you convert to ANSI (single byte per character), the files are about half the size - saving 350MB per file is no small saving, and it will save on the zip files as well :)

 

Could you add this to the script? You can use the "type" command in Windows, but I guess there would be something in PS to do this as well - not looked.
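(Something like this would do it in PowerShell - an untested sketch, with hypothetical paths:)

# Hypothetical paths: re-read the Unicode log and write it back out as ASCII.
# Note: this would mangle any non-ASCII characters in filenames (see the reply below).
Get-Content 'D:\DPAudit\dpcmd.log' | Set-Content 'D:\DPAudit\dpcmd-ansi.log' -Encoding ASCII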

 

Running 1.4 now.

 

Have a basic GUI working, but it needs more work before it's fully usable.

gj80:

The log file is produced by the "dpcmd" utility, which is part of DrivePool itself, and not something I wrote - so I can't control whether it writes out as unicode or not. It makes sense for it to be in unicode, though, since otherwise some filenames with non-ascii text wouldn't be logged properly. The .log isn't zipped, though - only the CSV. So, there will only ever be the "current" .log file after each run, and none in the zip files.

 

As for the CSV which the script is producing from the log file, I'm opening the file for writing with ASCII text encoding. Actually, I guess I probably should have gone with unicode for that as well... hmmm.
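(For illustration, the encoding is just a constructor argument if the CSV is written with a .NET StreamWriter - a sketch only; the actual script's writer and paths may differ:)

# Illustrative sketch - the third argument picks the text encoding;
# Unicode keeps non-ASCII filenames intact, ASCII does not.
$writer = New-Object System.IO.StreamWriter('D:\DPAudit\DPFileList.csv', $false, [System.Text.Encoding]::Unicode)
$writer.WriteLine('Drive,FileName,DirectoryName')
$writer.Close()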

Spider99:

Hi

 

Yes, I know you did not write the dpcmd utility - what I meant was, could you add a line in the PS script to convert it to single byte - maybe an optional switch?

 

I ran 1.4 last night

 

CSV looks fine now, appears to have all files - ~1.4m lines :)

 

But it failed to zip again - I have the zip - size 1KB.

 

Is there anything to check on my end on the zip front? I posted my .NET and PS version a couple of days ago.

gj80:

Spider99: what I meant was, could you add a line in the PS script to convert it to single byte - maybe an optional switch?

 

The dpcmd log file is only used to produce the csv. I could just set the script to auto-delete the log file, I guess. I just figured that it wouldn't use any significant amount of space as a single file being left sitting around (since, again, there's only ever the one log file - those aren't retained in the zips). Converting to ascii would cause filenames/paths to be incorrect if any extended characters were used (kanji, etc presumably).

 

I'll set the script up to auto-delete the .log and .csv after the zip process in a later revision. For now, we're still testing (and the auto-zip isn't working for you), so doing it now would make it harder to test.

 

 

 

Spider99: CSV looks fine now, appears to have all files - ~1.4m lines :)

 

Wow! Can you open it and filter by the filepath column in Excel? How much memory does Excel use when doing so? At 1.4m files, that should be a pretty good test of whether a spreadsheet will work in all cases.

 

 

 

Spider99: Is there anything to check on my end on the zip front? I posted my .NET and PS version a couple of days ago.

 

Your .NET is newer than mine, and we're on the same OS and PS version...it's odd that it's not working for you. Hmm. It should work regardless, but can you maybe try making a root-level folder on your C: drive, without spaces, like c:\dplogtest or something, setting the $auditLocation to that, and then granting "Everyone" full access to that folder in the NTFS security settings?

 

Also, is the language locale in Windows on that computer English, or perhaps something else?

 

Oh, and one other thing to check... the zip files it's producing that are 1kb... can you manually double click in Explorer to enter those archives as a folder (and paste a file into them), or does it give you a message about the .zip file being damaged/invalid?

gj80:

I uploaded a new "V1.51"

 

In 1.5 I switched the CSV to Unicode (so that it matches DrivePool's Unicode log format and doesn't fail for anyone with non-English filenames). Incidentally, the (zipped) size of the CSV only increased ~35% from being Unicode.

 

Also, in testing, I noticed that I ended up with one of the 1KB zip files myself. Running it a second time, it worked... so I think there's some timing issue going on. I added a bunch of long sleep timers in the zip function to rule that out - that's what's in V1.51. See if that works for you?
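(For reference, the kind of zip step involved - a sketch only, since the actual script may differ; paths are hypothetical:)

# Sketch with hypothetical paths: sleep before zipping to rule out file-lock/timing issues,
# then zip the staging folder using .NET 4.5's ZipFile class.
Add-Type -AssemblyName System.IO.Compression.FileSystem
Start-Sleep -Seconds 30
$zipPath = "D:\DPAudit\DPFileList-$(Get-Date -Format 'yyyy-MM-dd').zip"
[System.IO.Compression.ZipFile]::CreateFromDirectory('D:\DPAudit\staging', $zipPath)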

Spider99:

Hi

 

Zip - works if I drag and drop a file onto it - the machine has WinRAR on it, if that might make a difference.

 

CSV is not going to work with more than 500k of files, as Excel 2007 drops all records above row 1,048,576.

 

CSV has 1,438,463 lines.

Log has 2,250,006 lines.

 

Can the Unicode stuff have a switch in the script? I don't need it and would like to keep the files smaller - they are big enough already.

 

The directory I use is in the root of D:, and the user is my netadmin account, which also created the scheduled task, so permissions should not be an issue - permissions look OK.

Will give 1.5 a bash.

Christopher (Drashna):

@gj80, if you're familiar with other programming languages, it may be possible to bypass the dpcmd utility completely. 

 

Specifically, the dpcmd utility is querying the DrivePool driver directly for this information.  You could do the same, and get the information in whatever format you wanted, for a more limited subset of info, if needed. 

 

We'd just need to document *how* to do this. 

gj80:

Spider99: CSV is not going to work with more than 500k of files, as Excel 2007 drops all records above row 1,048,576.

 

Just a million? That's surprising. And unfortunate... Hrmmm.

 

Maybe it would be better if a list of files was just dumped out per disk. I guess I could have it make a folder structure corresponding to all the drives, and have a text file inside with the disk information and a csv with the full list of files. I could split out to a second csv if it exceeds 1 million files on a single disk (not likely, but with an 8TB drive and some file types, maybe....). Having the separate text file with the disk info would reduce the size of the CSV data, as a nice side benefit.
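(To picture the per-disk idea: grouping the existing master list by drive and writing one CSV per disk is short - a sketch, assuming objects with a Drive column like Quinn's script produces:)

# Sketch: one CSV per disk, named after the drive. Assumes $filelist rows have a Drive property.
$filelist | Group-Object -Property Drive | ForEach-Object {
    $_.Group | Export-Csv ("F:\DPFileList_{0}.csv" -f $_.Name) -NoTypeInformation
}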

 

...this is starting to reduce the appeal, though, since it's starting to get more fiddly, and less easy to play "what if" with a master list of all your files. I guess it would still accomplish the overall goal of "a simple list of all the files that were on a particular drive" though.

 

Writing it out to an SQLite database would make the most sense. PowerShell can export to SQLite... the only issue would be writing a front-end. A front-end isn't something I want to tackle for this, though.
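(Roughly what that export could look like, assuming the System.Data.SQLite ADO.NET provider is on hand - the DLL path and table layout here are hypothetical:)

# Hypothetical sketch - requires the System.Data.SQLite ADO.NET provider.
Add-Type -Path 'C:\libs\System.Data.SQLite.dll'   # hypothetical location of the DLL
$conn = New-Object System.Data.SQLite.SQLiteConnection('Data Source=D:\DPAudit\dpfiles.db')
$conn.Open()
$cmd = $conn.CreateCommand()
$cmd.CommandText = 'CREATE TABLE IF NOT EXISTS files (drive TEXT, name TEXT, dir TEXT)'
[void]$cmd.ExecuteNonQuery()
foreach ($f in $filelist) {
    $cmd.CommandText = 'INSERT INTO files (drive, name, dir) VALUES (@d, @n, @p)'
    [void]$cmd.Parameters.AddWithValue('@d', $f.Drive)
    [void]$cmd.Parameters.AddWithValue('@n', $f.FileName)
    [void]$cmd.Parameters.AddWithValue('@p', $f.DirectoryName)
    [void]$cmd.ExecuteNonQuery()
    $cmd.Parameters.Clear()
}
$conn.Close()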

 

I guess the per-disk thing would probably make the most sense. Realistically, we're not going to be using this for anything other than figuring out what was on a drive in an emergency... thoughts?

 

 

Spider99: Can the Unicode stuff have a switch in the script? I don't need it and would like to keep the files smaller - they are big enough already.

 

How big is the uncompressed .csv and how big is the .log?

 

When you zip the CSV (try doing it manually if 1.51 still doesn't work), what is the file size of the compressed CSV?

 

 

Christopher: @gj80, if you're familiar with other programming languages, it may be possible to bypass the dpcmd utility completely

 

Thanks Christopher. dpcmd's output works fine, though, and since I already wrote the regex stuff to parse it, and I'm lazy, I'm fine just sticking with that :)

Spider99:

aaaand it works :)

 

 

 

CSV is approx twice the size it was pre-Unicode, c. 250MB - took almost three hours to produce, though??

 

500k files per disk is quite likely - I have over 500k photos and they take up 350GB or so...

 

Multiple per-drive CSVs would work for most situations.

 

@Christopher

 

This option to programmatically talk to DrivePool would be a good one to explore. I am working on a GUI to take the log file/CSV and put it in a database; if I could do it "directly" without the dpcmd log file, or create a slightly easier file format to read in, that would be good, as it's a tad fiddly at the moment, and that slows things down when you have 2.5m lines of data to read. I bet I do not have the biggest file collection out of DP users.

 

What I have in mind/am working on is a database that's not dependent on anything else to run - i.e. no drivers or software install, just one exe file and maybe an empty database file - so it's simple for people to use. Adding export functions to, say, CSV/Excel would be no problem, and as it's SQL-compliant, adding query functionality is simple as well.

 

So how long would it take to get some documentation on talking directly to DP to pull the data???

 


Christopher: @gj80, if you're familiar with other programming languages, it may be possible to bypass the dpcmd utility completely. Specifically, the dpcmd utility is querying the DrivePool driver directly for this information. You could do the same, and get the information in whatever format you wanted, for a more limited subset of info, if needed. We'd just need to document *how* to do this.

Is this something I should wait for, or would it be a long time coming? :)

gj80:

@Spider -

If you're interested in writing a DB frontend, and if DrivePool already has a database to keep track of the files, then the frontend could just access that database, with all the metadata info, and there would be no need to do an "import" at all. If there isn't database-like functionality to access, however, then direct access to the API probably wouldn't provide much benefit. The two 500MB files the script generates don't really matter, since the only things retained from run to run are the 12MB zip files, which isn't anything of note in terms of space requirements. Once the script is out of testing, I can always just uncomment the lines I've got at the bottom to delete the CSV + log after the zip happens, so the 1GB is only needed temporarily while it's running, if it matters.

 

If you get a frontend set up, I can always change the PS script to write out to an sqlite file.

 

@Christopher - How does DrivePool store the metadata so it knows what file is on what disk, etc.? Is it a SQLite file somewhere? I haven't poked around too much into the way DrivePool actually operates.

Spider99:

@gj80

 

If there is a DB (I don't think there is, from a bit of poking around - master file record table, perhaps???), then yes, that could be used as the back end - assuming it's quick and responsive and supports queries, etc.

I'm currently not using SQLite but Absolute DB from ComponentAce - it compiles into the exe file and has all the features you would expect of a SQL DB. The exe is only 1.5MB at the moment, so minimal footprint.

Just refining the import routines and the data structure - got the import down to less than a minute for 500k log file records - a bit more work to do to speed that up.

Christopher (Drashna):

DrivePool doesn't have a DB for this. I mean, aside from NTFS which is *technically* a database. :) 

 

The dpcmd utility queries the DrivePool driver (coveFS.sys) directly, and generates the info dynamically. 

 

Right now, you guys are parsing and processing the output from dpcmd...  but if you're able to cut out the middle-man, it would probably help. 

 

 

As for a timeline for the info/API,  I have no idea. Alex has discussed documenting it in the past, but it's a matter of free time. 

 

If you guys are very serious about this, I'll see about pushing Alex for the info/documentation on how to do this. 

Spider99:

Ha, thought so - no point having two DBs of the same thing.

 

Yes, I am interested - I can see a few useful utilities/routines that would be of use to all - I hope :)

 

1. Which drive has file x - also, which drives it has been on might be useful history in some circumstances - cough cough - like a pool that does not balance properly :)

2. Where is directory x, and which drives are its files and subdirectories on?

3. What errors do I have? I was surprised I had one, as DP had not notified me of it - unless I missed it.

4. What files are inconsistent, and where are they, etc.?

 

Just a few off the top of my head - I have a few others in mind as well.

 

If Alex can give me a few pointers/guidance on what parameters the sys file will take and what it will respond with, it would be good to get a basic trial done so I know where to invest time. The parsing of the file is fiddly but trivial, and I'll have it fully refined in a day or so - just sorting a memory leak in the DB code (not mine!!); awaiting a callback from the devs. I think I know what it is, but I've just been diverted to sort this.

 

As for the script, it's chugging through the nightly workload fine - I have two zips now and it's working on the third log at the moment - combined, it takes about 4 hours on Boris with his i5.

Christopher (Drashna):

  1. Well, that's easy, as the DPCMD utility is already doing that.

    As for historical data... the DPCMD utility is pulling this from the driver, which pulls it from the system. So the information is dynamic, and changes. And no, we don't keep a record.

  2. Yeah, that info can be pulled too, and the DPCMD utility is doing so (but I believe it's a simple matter of being 'recursive' and enumerating everything).
  3. That depends on what you mean. However, if you mean during access, then yeah, that could be handled. Otherwise, again, no historical data.
  4. See above? :)

I'll bug Alex about this. And likely, he'll post code samples, I think. So it should be fairly simple, relatively speaking.

Spider99:

OK

Will be interesting to see what he posts.

At the moment it's only giving live info for everything - more interesting is the history.

The error is that pesky System Volume Information folder!!!!! Which also happens to be where most of the "other" data lives for me anyway.
