[HOWTO] File Location Catalog

Question

I've seen quite a few requests about knowing which files are on which drives, in case a recovery of unduplicated files is ever needed.  I know dpcmd.exe has some functionality for listing all files and their locations, but I wanted something that I could "tweak" a little better to my needs, so I created a PowerShell script that gets me exactly what I need.  I decided on PowerShell, as it allows me to do just about ANYTHING I can imagine, given enough logic.  Feel free to use this, or let me know if it would be more helpful "tweaked" a different way...

 

Prerequisites:

 

  1. You gotta know PowerShell (or be interested in learning a little bit of it, anyway) :)
  2. All of your DrivePool drives need to be mounted as a path (I chose to mount all drives as C:\DrivePool\{disk name})
  3. Your computer must be able to run PowerShell scripts (I set my execution policy to 'RemoteSigned'; see the one-liner just after this list)
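A minimal example for item 3 (run once from an elevated PowerShell prompt; adjust the scope to taste):

# RemoteSigned lets locally-created scripts run, while requiring downloaded scripts to be signed.
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope LocalMachine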

I have this PowerShell script set to run each day at 3am, and it generates a .csv file that I can use to sort/filter all of the results.  Need to know what files were on drive A? Done.  Need to know which drives are holding all of the files in your Movies folder? Done.  Your imagination is the limit.
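(For reference, one way to register such a daily 3am task on Windows 8 / Server 2012 or later - the script path here is only an example:)

# Example only - point -File at wherever you saved the script.
$action  = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument '-NoProfile -File C:\Scripts\DPFileList.ps1'
$trigger = New-ScheduledTaskTrigger -Daily -At 3am
Register-ScheduledTask -TaskName 'DrivePool File Catalog' -Action $action -Trigger $trigger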

 

Here is a screenshot of the .CSV file it generates, showing the location of all of the files in a particular directory (as an example):

 

[screenshot of the generated .csv file]

 

Here is the code I used (it's also attached in the .zip file):

# This saves the full listing of files in DrivePool (folders are excluded)
$files = Get-ChildItem -Path C:\DrivePool -Recurse -Force | Where-Object { -not $_.PSIsContainer }

# This creates an empty array to store details of the files
$filelist = @()

# This goes through each file and populates the array with the drive name, file name and directory name.
# Note: the Substring offsets are specific to my mount layout. "C:\DrivePool\" is 13 characters, so
# Substring(13,5) grabs the 5-character disk name that follows it, and Substring(64) skips past
# "C:\DrivePool\{disk name}\PoolPart.{GUID}" (13 + 5 + 1 + 9 + 36 = 64 characters, with a
# 36-character PoolPart GUID), leaving the pool-relative directory path. Adjust both offsets
# to match your own mount points and disk names.
foreach ($file in $files)
    {
    $filelist += New-Object psobject -Property @{Drive=$(($file.DirectoryName).Substring(13,5));FileName=$($file.Name);DirectoryName=$(($file.DirectoryName).Substring(64))}
    }

# This saves the table to a .csv file so it can be opened later on, sorted, filtered, etc.
$filelist | Export-Csv F:\DPFileList.csv -NoTypeInformation
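(If the disk names aren't all exactly 5 characters, a regex can replace the fixed offsets. A minimal sketch of the same idea, untested, assuming the same C:\DrivePool\{disk name}\PoolPart.* mount layout as above:)

# Sketch only: parses the path with a regex instead of fixed Substring offsets,
# so disk names of any length work. Assumes drives are mounted under
# C:\DrivePool\{disk name} and pooled files live inside a hidden "PoolPart.*" folder.
$filelist = Get-ChildItem -Path C:\DrivePool -Recurse -Force |
    Where-Object { -not $_.PSIsContainer } |
    ForEach-Object {
        if ($_.DirectoryName -match '^C:\\DrivePool\\(?<disk>[^\\]+)\\PoolPart\.[^\\]+(?<dir>.*)$')
            {
            New-Object psobject -Property @{Drive=$Matches['disk'];FileName=$_.Name;DirectoryName=$Matches['dir']}
            }
    }

$filelist | Export-Csv F:\DPFileList.csv -NoTypeInformation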

Let me know if there is interest in this, if you have any questions on how to get this going on your system, or if you'd like any clarification of the above.

 

Hope it helps!

 

-Quinn

 

 

gj80 has written a further improvement to this script:

http://community.covecube.com/index.php?/topic/1865-howto-file-location-catalog/&do=findComment&comment=16553

Attachment: DPFileList.zip


41 answers to this question

Spider99:

Hi

 

One thing I spotted today is that the log file is Unicode, which means it's twice the size it needs to be - if you convert to ANSI (single byte per character), the files are about half the size - saving 350MB per file is no small saving, and it will save on the zip files as well :)

 

Could you add this to the script? You can use the "type" command in Windows, but I guess there would be something in PS to do this as well - not looked.
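(Something like this would do it in PowerShell - an untested sketch, with hypothetical paths:)

# Hypothetical paths: re-read the Unicode log and write it back out as ASCII.
# Note: this would mangle any non-ASCII characters in filenames (see the reply below).
Get-Content 'D:\DPAudit\dpcmd.log' | Set-Content 'D:\DPAudit\dpcmd-ansi.log' -Encoding ASCII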

 

Running 1.4 now.

 

Have a basic GUI working, but it needs more work before it's fully usable.

gj80:

The log file is produced by the "dpcmd" utility, which is part of DrivePool itself, and not something I wrote - so I can't control whether it writes out as unicode or not. It makes sense for it to be in unicode, though, since otherwise some filenames with non-ascii text wouldn't be logged properly. The .log isn't zipped, though - only the CSV. So, there will only ever be the "current" .log file after each run, and none in the zip files.

 

As for the CSV which the script is producing from the log file, I'm opening the file for writing with ASCII text encoding. Actually, I guess I probably should have gone with unicode for that as well... hmmm.
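(For illustration, the encoding is just a constructor argument if the CSV is written with a .NET StreamWriter - a sketch only; the actual script's writer and paths may differ:)

# Illustrative sketch - the third argument picks the text encoding;
# Unicode keeps non-ASCII filenames intact, ASCII does not.
$writer = New-Object System.IO.StreamWriter('D:\DPAudit\DPFileList.csv', $false, [System.Text.Encoding]::Unicode)
$writer.WriteLine('Drive,FileName,DirectoryName')
$writer.Close()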

Spider99:

Hi

 

Yes, I know you did not write the dpcmd utility - what I meant was, could you add a line in the PS script to convert it to single byte - maybe an optional switch?

 

I ran 1.4 last night

 

CSV looks fine now, appears to have all files - ~1.4m lines :)

 

But it failed to zip again - I have the zip - size 1KB.

 

Is there anything to check on my end on the zip front? I posted my .NET and PS version a couple of days ago.

gj80:

Spider99: what I meant was, could you add a line in the PS script to convert it to single byte - maybe an optional switch?

 

The dpcmd log file is only used to produce the csv. I could just set the script to auto-delete the log file, I guess. I just figured that it wouldn't use any significant amount of space as a single file being left sitting around (since, again, there's only ever the one log file - those aren't retained in the zips). Converting to ascii would cause filenames/paths to be incorrect if any extended characters were used (kanji, etc presumably).

 

I'll set the script up to auto-delete the .log and .csv after the zip process in a later revision. For now, we're still testing (and the auto-zip isn't working for you), so doing it now would make it harder to test.

 

 

 

Spider99: CSV looks fine now, appears to have all files - ~1.4m lines :)

 

Wow! Can you open it and filter by the filepath column in Excel? How much memory does Excel use when doing so? At 1.4m files, that should be a pretty good test of whether a spreadsheet will work in all cases.

 

 

 

Spider99: Is there anything to check on my end on the zip front? I posted my .NET and PS version a couple of days ago.

 

Your .NET is newer than mine, and we're on the same OS and PS version...it's odd that it's not working for you. Hmm. It should work regardless, but can you maybe try making a root-level folder on your C: drive, without spaces, like c:\dplogtest or something, setting the $auditLocation to that, and then granting "Everyone" full access to that folder in the NTFS security settings?

 

Also, is the language locale in Windows on that computer English, or perhaps something else?

 

Oh, and one other thing to check... the zip files it's producing that are 1kb... can you manually double click in Explorer to enter those archives as a folder (and paste a file into them), or does it give you a message about the .zip file being damaged/invalid?

gj80:

I uploaded a new "V1.51"

 

In 1.5 I switched the CSV to Unicode (so that it matches DrivePool's Unicode log format and doesn't fail for anyone with non-English filenames). Incidentally, the (zipped) size of the CSV only increased ~35% from being Unicode.

 

Also, in testing, I noticed that I ended up with one of the 1KB zip files myself. Running it a second time, it worked... so I think there's some timing issue going on. I added a bunch of long sleep timers in the zip function to rule that out - that's what's in V1.51. See if that works for you?
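(For reference, the kind of zip step involved - a sketch only, since the actual script may differ; paths are hypothetical:)

# Sketch with hypothetical paths: sleep before zipping to rule out file-lock/timing issues,
# then zip the staging folder using .NET 4.5's ZipFile class.
Add-Type -AssemblyName System.IO.Compression.FileSystem
Start-Sleep -Seconds 30
$zipPath = "D:\DPAudit\DPFileList-$(Get-Date -Format 'yyyy-MM-dd').zip"
[System.IO.Compression.ZipFile]::CreateFromDirectory('D:\DPAudit\staging', $zipPath)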

Spider99:

Hi

 

Zip - works if I drag and drop a file onto it - the machine has WinRAR on it, if that might make a difference.

 

CSV is not going to work with more than 500k of files, as Excel 2007 drops all records above row 1,048,576.

 

CSV has 1,438,463 lines.

Log has 2,250,006 lines.

 

Can the Unicode stuff have a switch in the script? I don't need it and would like to keep the files smaller - they are big enough already.

 

The directory I use is in the root of D:, and the user is my netadmin account, which also created the scheduled task, so permissions should not be an issue - permissions look OK.

Will give 1.5 a bash.

Christopher (Drashna):

@gj80, if you're familiar with other programming languages, it may be possible to bypass the dpcmd utility completely. 

 

Specifically, the dpcmd utility is querying the DrivePool driver directly for this information.  You could do the same, and get the information in whatever format you wanted, for a more limited subset of info, if needed. 

 

We'd just need to document *how* to do this. 

gj80:

Spider99: CSV is not going to work with more than 500k of files, as Excel 2007 drops all records above row 1,048,576.

 

Just a million? That's surprising. And unfortunate... Hrmmm.

 

Maybe it would be better if a list of files was just dumped out per disk. I guess I could have it make a folder structure corresponding to all the drives, and have a text file inside with the disk information and a csv with the full list of files. I could split out to a second csv if it exceeds 1 million files on a single disk (not likely, but with an 8TB drive and some file types, maybe....). Having the separate text file with the disk info would reduce the size of the CSV data, as a nice side benefit.
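(To picture the per-disk idea: grouping the existing master list by drive and writing one CSV per disk is short - a sketch, assuming objects with a Drive column like Quinn's script produces:)

# Sketch: one CSV per disk, named after the drive. Assumes $filelist rows have a Drive property.
$filelist | Group-Object -Property Drive | ForEach-Object {
    $_.Group | Export-Csv ("F:\DPFileList_{0}.csv" -f $_.Name) -NoTypeInformation
}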

 

...this is starting to reduce the appeal, though, since it's starting to get more fiddly, and less easy to play "what if" with a master list of all your files. I guess it would still accomplish the overall goal of "a simple list of all the files that were on a particular drive" though.

 

Writing it out to an SQLite database would make the most sense. PowerShell can export to SQLite... the only issue would be writing a front-end. A front-end isn't something I want to tackle for this, though.
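(Roughly what that export could look like, assuming the System.Data.SQLite ADO.NET provider is on hand - the DLL path and table layout here are hypothetical:)

# Hypothetical sketch - requires the System.Data.SQLite ADO.NET provider.
Add-Type -Path 'C:\libs\System.Data.SQLite.dll'   # hypothetical location of the DLL
$conn = New-Object System.Data.SQLite.SQLiteConnection('Data Source=D:\DPAudit\dpfiles.db')
$conn.Open()
$cmd = $conn.CreateCommand()
$cmd.CommandText = 'CREATE TABLE IF NOT EXISTS files (drive TEXT, name TEXT, dir TEXT)'
[void]$cmd.ExecuteNonQuery()
foreach ($f in $filelist) {
    $cmd.CommandText = 'INSERT INTO files (drive, name, dir) VALUES (@d, @n, @p)'
    [void]$cmd.Parameters.AddWithValue('@d', $f.Drive)
    [void]$cmd.Parameters.AddWithValue('@n', $f.FileName)
    [void]$cmd.Parameters.AddWithValue('@p', $f.DirectoryName)
    [void]$cmd.ExecuteNonQuery()
    $cmd.Parameters.Clear()
}
$conn.Close()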

 

I guess the per-disk thing would probably make the most sense. Realistically, we're not going to be using this for anything other than figuring out what was on a drive in an emergency... thoughts?

 

 

Spider99: Can the Unicode stuff have a switch in the script? I don't need it and would like to keep the files smaller - they are big enough already.

 

How big is the uncompressed .csv and how big is the .log?

 

When you zip the CSV (try doing it manually if 1.51 still doesn't work), what is the file size of the compressed CSV?

 

 

Christopher: @gj80, if you're familiar with other programming languages, it may be possible to bypass the dpcmd utility completely

 

Thanks Christopher. dpcmd's output works fine, though, and since I already wrote the regex stuff to parse it, and I'm lazy, I'm fine just sticking with that :)

Spider99:

aaaand it works :)

 

 

 

CSV is approx twice the size it was pre-Unicode, c. 250MB - took almost three hours to produce, though??

 

500k files per disk is quite likely - I have over 500k photos and they take up 350GB or so...

 

Multiple per-drive CSVs would work for most situations.

 

@Christopher

 

This option to programmatically talk to DrivePool would be a good one to explore. I am working on a GUI to take the log file/CSV and put it in a database; if I could do it "directly" without the dpcmd log file, or create a slightly easier file format to read in, that would be good, as it's a tad fiddly at the moment, and that slows things down when you have 2.5m lines of data to read. I bet I do not have the biggest file collection out of DP users.

 

What I have in mind/am working on is a database that's not dependent on anything else to run - i.e. no drivers or software install, just one exe file and maybe an empty database file - so it's simple for people to use. Adding export functions to, say, CSV/Excel would be no problem, and as it's SQL-compliant, adding query functionality is simple as well.

 

So how long would it take to get some documentation on talking directly to DP to pull the data???

 


Christopher: @gj80, if you're familiar with other programming languages, it may be possible to bypass the dpcmd utility completely. Specifically, the dpcmd utility is querying the DrivePool driver directly for this information. You could do the same, and get the information in whatever format you wanted, for a more limited subset of info, if needed. We'd just need to document *how* to do this.

Is this something I should wait for, or would it be a long time coming? :)

gj80:

@Spider -

If you're interested in writing a DB frontend, and if DrivePool already has a database to keep track of the files, then the frontend could just access that database, with all the metadata info, and there would be no need to do an "import" at all. If there isn't database-like functionality to access, however, then direct access to the API probably wouldn't provide much benefit. The two 500MB files the script generates don't really matter, since the only things retained from run to run are the 12MB zip files, which isn't anything of note in terms of space requirements. Once the script is out of testing, I can always just uncomment the lines I've got at the bottom to delete the CSV + log after the zip happens, so the 1GB is only needed temporarily while it's running, if it matters.

 

If you get a frontend set up, I can always change the PS script to write out to an sqlite file.

 

@Christopher - How does DrivePool store the metadata so it knows what file is on what disk, etc.? Is it a SQLite file somewhere? I haven't poked around too much into the way DrivePool actually operates.

Spider99:

@gj80

 

If there is a DB (I don't think there is, from a bit of poking around - master file record table, perhaps???), then yes, that could be used as the back end - assuming it's quick and responsive and supports queries, etc.

I'm currently not using SQLite but Absolute DB from ComponentAce - it compiles into the exe file and has all the features you would expect of a SQL DB. The exe is only 1.5MB at the moment, so minimal footprint.

Just refining the import routines and the data structure - got the import down to less than a minute for 500k log file records - a bit more work to do to speed that up.

Christopher (Drashna):

DrivePool doesn't have a DB for this. I mean, aside from NTFS which is *technically* a database. :) 

 

The dpcmd utility queries the DrivePool driver (coveFS.sys) directly, and generates the info dynamically. 

 

Right now, you guys are parsing and processing the output from dpcmd...  but if you're able to cut out the middle-man, it would probably help. 

 

 

As for a timeline for the info/API,  I have no idea. Alex has discussed documenting it in the past, but it's a matter of free time. 

 

If you guys are very serious about this, I'll see about pushing Alex for the info/documentation on how to do this. 

Spider99:

Ha, thought so - no point having two DBs of the same thing.

 

Yes, I am interested - I can see a few useful utilities/routines that would be of use to all - I hope :)

 

1. Which drive has file x - also, which drives it has been on might be useful history in some circumstances - cough cough - like a pool that does not balance properly :)

2. Where is directory x, and which drives are its files and subdirectories on?

3. What errors do I have? I was surprised I had one, as DP had not notified me of it - unless I missed it.

4. What files are inconsistent, and where are they, etc.?

 

Just a few off the top of my head - I have a few others in mind as well.

 

If Alex can give me a few pointers/guidance on what parameters the sys file will take and what it will respond with, it would be good to get a basic trial done so I know where to invest time. The parsing of the file is fiddly but trivial, and I'll have it fully refined in a day or so - just sorting a memory leak in the DB code (not mine!!); awaiting a callback from the devs. I think I know what it is, but I've just been diverted to sort this.

 

As for the script, it's chugging through the nightly workload fine - I have two zips now and it's working on the third log at the moment - combined, it takes about 4 hours on Boris with his i5.

Christopher (Drashna):

  1. Well, that's easy, as the DPCMD utility is already doing that.

    As for historical data... the DPCMD utility is pulling this from the driver, which pulls it from the system. So the information is dynamic, and changes. And no, we don't keep a record.

  2. Yeah, that info can be pulled too, and the DPCMD utility is doing so (but I believe it's a simple matter of being 'recursive' and enumerating everything).
  3. That depends on what you mean. However, if you mean during access, then yeah, that could be handled. Otherwise, again, no historical data.
  4. See above? :)

I'll bug Alex about this. And likely, he'll post code samples, I think. So it should be fairly simple, relatively speaking.

Spider99:

OK

Will be interesting to see what he posts.

At the moment it's only giving live info for everything - more interesting is the history.

The error is that pesky System Volume Information folder!!!!! Which also happens to be where most of the "other" data lives for me anyway.
