
[HOWTO] File Location Catalog


Quinn

Question

I've been seeing quite a few requests about knowing which files are on which drives, in case a recovery is needed for unduplicated files. I know dpcmd.exe has some functionality for listing all files and their locations, but I wanted something that I could "tweak" a little better to my needs, so I created a PowerShell script to get me exactly what I need. I decided on PowerShell, as it allows me to do just about ANYTHING I can imagine, given enough logic. Feel free to use this, or let me know if it would be more helpful "tweaked" a different way...

 

Prerequisites:

 

  1. You gotta know PowerShell (or be interested in learning a little bit of it, anyway) :)
  2. All of your DrivePool drives need to be mounted as a path (I chose to mount all drives as C:\DrivePool\{disk name})
  3. Your computer must be able to run PowerShell scripts (I set my execution policy to 'RemoteSigned')

I have this PowerShell script set to run each day at 3am, and it generates a .csv file that I can use to sort/filter all of the results.  Need to know what files were on drive A? Done.  Need to know which drives are holding all of the files in your Movies folder? Done.  Your imagination is the limit.
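
For reference, the schedule itself is just a standard Windows scheduled task. On Windows 8 / Server 2012 or later, something like this sets it up from PowerShell (a sketch - adjust the script path to wherever you saved it):

# Register a daily 3am task that runs the catalog script (the script path is an example)
$action  = New-ScheduledTaskAction -Execute 'powershell.exe' `
           -Argument '-ExecutionPolicy RemoteSigned -File C:\Scripts\DPFileList.ps1'
$trigger = New-ScheduledTaskTrigger -Daily -At 3am
Register-ScheduledTask -TaskName 'DrivePool File Catalog' -Action $action -Trigger $trigger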

 

Here is a screenshot of the .CSV file it generates, showing the location of all of the files in a particular directory (as an example):

 

post-1373-0-10480200-1458673106_thumb.png

 

Here is the code I used (it's also attached in the .zip file):

# This saves the full listing of files in DrivePool
$files = Get-ChildItem -Path C:\DrivePool -Recurse -Force | where {!$_.PsIsContainer}

# This creates an empty table to store details of the files
$filelist = @()

# This goes through each file, and populates the table with the drive name, file name and directory name
foreach ($file in $files)
    {
    $filelist += New-Object psobject -Property @{Drive=$(($file.DirectoryName).Substring(13,5));FileName=$($file.Name);DirectoryName=$(($file.DirectoryName).Substring(64))}
    }

# This saves the table to a .csv file so it can be opened later on, sorted, filtered, etc.
$filelist | Export-CSV F:\DPFileList.csv -NoTypeInformation
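
One note: the two Substring() offsets above are specific to my mount path and my 5-character disk folder names. If yours are different, something like this (an untested sketch of the same loop) avoids the hard-coded numbers:

# Derive the drive (mount folder) name and the pool-relative directory from the
# mount root, instead of using hard-coded Substring offsets
$poolRoot = 'C:\DrivePool'
foreach ($file in $files)
    {
    $relative = $file.DirectoryName.Substring($poolRoot.Length).TrimStart('\')
    $filelist += New-Object psobject -Property @{Drive=$(($relative -split '\\')[0]);FileName=$($file.Name);DirectoryName=$relative}
    }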

Let me know if there is interest in this, if you have any questions on how to get this going on your system, or if you'd like any clarification of the above.

 

Hope it helps!

 

-Quinn

 

 

gj80 has written a further improvement to this script:

 

DPFileList.zip

And B00ze has further improved the script (Win7 fixes):

DrivePool-Generate-CSV-Log-V1.60.zip

 

Recommended Posts

Hi

 

One thing I spotted today is that the log file is Unicode, which means it's twice the size it needs to be - if you convert it to ANSI (single byte per character) the files are about half the size. Saving 350MB per file is no small saving, and it will help with the zip files as well :)

 

Could you add this to the script? You can use the "type" command in Windows, but I guess there would be something in PS to do this as well - I haven't looked properly.
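
Maybe something along these lines would do it in PS (just a guess/sketch - the paths are placeholders):

# Re-write the dpcmd log as ASCII (single byte per character) - note any
# non-ASCII characters in filenames would get mangled by this
Get-Content 'D:\DPLogs\dpcmd-output.log' -Encoding Unicode |
    Set-Content 'D:\DPLogs\dpcmd-output-ansi.log' -Encoding ASCII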

 

Running 1.4 now

 

I have a basic GUI working, but it needs more work before it's fully usable.


The log file is produced by the "dpcmd" utility, which is part of DrivePool itself, and not something I wrote - so I can't control whether it writes out as unicode or not. It makes sense for it to be in unicode, though, since otherwise some filenames with non-ascii text wouldn't be logged properly. The .log isn't zipped, though - only the CSV. So, there will only ever be the "current" .log file after each run, and none in the zip files.

 

As for the CSV which the script is producing from the log file, I'm opening the file for writing with ASCII text encoding. Actually, I guess I probably should have gone with unicode for that as well... hmmm.


Hi

 

Yes, I know you did not write the dpcmd utility - what I meant was, could you add a line in the PS script to convert it to single byte - maybe an optional switch?

 

I ran 1.4 last night

 

CSV looks fine now and appears to have all files - ~1.4m lines :)

 

But it failed to zip again - I have the zip, but it's only 1KB.

 

Is there anything to check on my end on the zip front? I posted my .NET and PS versions a couple of days ago.


 

 

What I meant was, could you add a line in the PS script to convert it to single byte - maybe an optional switch

 

The dpcmd log file is only used to produce the csv. I could just set the script to auto-delete the log file, I guess. I just figured that it wouldn't use any significant amount of space as a single file being left sitting around (since, again, there's only ever the one log file - those aren't retained in the zips). Converting to ascii would cause filenames/paths to be incorrect if any extended characters were used (kanji, etc presumably).

 

I'll set the script up to auto-delete the .log and .csv after the zip process in a later revision. For now, we're still testing (and the auto-zip isn't working for you), so doing it now would make it harder to test.
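
When I do, it'll only be a line or two after the zip step, something like this (variable names are just illustrative):

# Once the zip has been created, remove the working .log and .csv
Remove-Item -Path $logPath, $csvPath -Force -ErrorAction SilentlyContinue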

 

 

 

CSV looks fine now and appears to have all files - ~1.4m lines

 

Wow! Can you open it and filter by the filepath column in Excel? How much memory does Excel use when doing so? At 1.4m files, that should be a pretty good test of whether a spreadsheet will work in all cases.

 

 

 

Is there anything to check on my end on the zip front? I posted my .NET and PS versions a couple of days ago.

 

Your .NET is newer than mine, and we're on the same OS and PS version...it's odd that it's not working for you. Hmm. It should work regardless, but can you maybe try making a root-level folder on your C: drive, without spaces, like c:\dplogtest or something, setting the $auditLocation to that, and then granting "Everyone" full access to that folder in the NTFS security settings?
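
Something like this would set that test folder up (the path is just an example):

# Create a root-level test folder and give Everyone full control (inherited)
New-Item -ItemType Directory -Path 'C:\dplogtest' -Force | Out-Null
icacls 'C:\dplogtest' /grant 'Everyone:(OI)(CI)F'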

 

Also, is the language locale in Windows on that computer English, or perhaps something else?

 

Oh, and one other thing to check... the zip files it's producing that are 1kb... can you manually double click in Explorer to enter those archives as a folder (and paste a file into them), or does it give you a message about the .zip file being damaged/invalid?


I uploaded a new "V1.51"

 

In 1.5 I switched the CSV to be Unicode (so that it matches DrivePool's Unicode log format and doesn't fail for anyone with non-English filenames). Incidentally, the (zipped) size of the CSV only increased ~35% from being Unicode.

 

Also, in testing, I noticed that I ended up with one of the 1kb zip files myself. Running it a second time, it worked...so I think there's some timing issue going on. I added a bunch of long sleep timers in the zip function to rule that out, and uploaded a new "V1.51". See if that works for you?
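
The retry logic is roughly along these lines (a simplified sketch, not the exact script code - it assumes the CSV sits alone in a staging folder and uses the .NET ZipFile class):

# Retry the zip a few times, sleeping in between, in case something still has the CSV locked
Add-Type -AssemblyName System.IO.Compression.FileSystem
$attempt = 0
do {
    try {
        if (Test-Path $zipPath) { Remove-Item $zipPath -Force }
        [System.IO.Compression.ZipFile]::CreateFromDirectory($stagingFolder, $zipPath)
        $done = $true
    } catch {
        $done = $false
        $attempt++
        Start-Sleep -Seconds 30
    }
} while (-not $done -and $attempt -lt 3)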


Hi

 

Zip - it works if I drag and drop a file onto it. The machine has WinRAR on it, if that might make a difference.

 

CSV is not going to work with more than 500k of files, as Excel 2007 drops all records above row 1,048,576.

 

The CSV has 1,438,463 lines

 

The log has 2,250,006 lines

 

Can the Unicode stuff have a switch in the script? I don't need it and would like to keep the files smaller - they are big enough already.

 

The directory I use is in the root of D:, and the user is my netadmin account, which also created the scheduled task, so permissions should not be an issue - permissions look OK.

 

Will give 1.5 a bash.


@gj80, if you're familiar with other programming languages, it may be possible to bypass the dpcmd utility completely. 

 

Specifically, the dpcmd utility is querying the DrivePool driver directly for this information.  You could do the same, and get the information in whatever format you wanted, for a more limited subset of info, if needed. 

 

We'd just need to document *how* to do this. 


 

 

Spider99: CSV is not going to work with more than 500k of files, as Excel 2007 drops all records above row 1,048,576.

 

Just a million? That's surprising. And unfortunate... Hrmmm.

 

Maybe it would be better if a list of files was just dumped out per disk. I guess I could have it make a folder structure corresponding to all the drives, and have a text file inside with the disk information and a csv with the full list of files. I could split out to a second csv if it exceeds 1 million files on a single disk (not likely, but with an 8TB drive and some file types, maybe....). Having the separate text file with the disk info would reduce the size of the CSV data, as a nice side benefit.

 

...this is starting to reduce the appeal, though, since it's starting to get more fiddly, and less easy to play "what if" with a master list of all your files. I guess it would still accomplish the overall goal of "a simple list of all the files that were on a particular drive" though.
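
The split itself would be simple enough - something like this sketch, assuming the parsed data is already in a list of objects with a Drive column and the output folder already exists:

# Write one CSV per pool disk instead of a single master list
$filelist | Group-Object -Property Drive | ForEach-Object {
    $outFile = Join-Path 'F:\DPFileLists' ("{0}.csv" -f $_.Name)
    $_.Group | Export-Csv -Path $outFile -NoTypeInformation -Encoding Unicode
}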

 

Writing it out to an SQLite database would make the most sense. PowerShell can export to SQLite... the only issue would be writing a front-end. A front-end isn't something I want to tackle for this, though.
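
If it came to that, the export side is only a few lines with the third-party PSSQLite module (Install-Module PSSQLite) - the table and column names here are just illustrative:

# Dump the parsed file list into a SQLite database via PSSQLite
Import-Module PSSQLite
$db = 'F:\DPFileList.sqlite'
Invoke-SqliteQuery -DataSource $db -Query 'CREATE TABLE IF NOT EXISTS files (Drive TEXT, DirectoryName TEXT, FileName TEXT)'
Invoke-SQLiteBulkCopy -DataTable ($filelist | Out-DataTable) -DataSource $db -Table files -Force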

 

I guess the per-disk thing would probably make the most sense. Realistically, we're not going to be using this for anything other than figuring out what was on a drive in an emergency... thoughts?

 

 

Spider99: 

Can the Unicode stuff have a switch in the script? I don't need it and would like to keep the files smaller - they are big enough already.

 

How big is the uncompressed .csv and how big is the .log?

 

When you zip the csv (try doing it manually if 1.51 doesn't work still), what is the file size of the compressed csv?

 

 

Christopher: @gj80, if you're familiar with other programming languages, it may be possible to bypass the dpcmd utility completely

 

Thanks Christopher. The dpcmd's output works fine, though, and since I already wrote the regex stuff to parse it, and I'm lazy, I'm fine just sticking with that :)


aaaand it works :)

 

 

 

CSV is approx. twice the size it was pre-Unicode, c. 250MB - it took almost three hours to produce, though??

 

500k files per disk is quite likely - I have over 500k photos and they take up 350GB or so...

 

Multiple per-drive CSVs would work for most situations.

 

@Christopher

 

This option to programmatically talk to DrivePool would be a good one to explore - I am working on a GUI to take the log file/CSV and put it in a database. If I could do it "directly" without the dpcmd log file, or get a slightly easier file format to read in, that would be good, as it's a tad fiddly at the moment, and that slows things down when you have 2.5m lines of data to read. I bet I do not have the biggest file collection out of DP users.

 

What I have in mind/am working on is a database that's not dependent on anything else to run - i.e. no drivers or software install, just one exe file and maybe an empty database file - so it's simple for people to use. Adding export functions to, say, CSV/Excel would be no problem, and as it's SQL-compliant, adding query functionality is simple as well.

 

So how long would it take to get some documentation on talking directly to DP to pull the data?

 


@gj80, if you're familiar with other programming languages, it may be possible to bypass the dpcmd utility completely. 

 

Specifically, the dpcmd utility is querying the DrivePool driver directly for this information.  You could do the same, and get the information in whatever format you wanted, for a more limited subset of info, if needed. 

 

We'd just need to document *how* to do this. 

Is this something I should wait for, or would it be a long time coming? :)


@Spider -

If you're interested in writing a DB frontend, and if drivepool already has a database to keep track of the files, then the frontend could just access that database with all the metadata info and then there would be no need to do an "import" at all. If there isn't a database-like functionality to access, however, then direct access to the API probably wouldn't provide much benefit. The two 500mb files the script generates don't really matter, since the only thing that's retained from run to run are the 12mb zip files, which isn't anything of note in terms of space requirements. Once the script is out of testing, I can always just uncomment the lines I've got at the bottom to delete the csv + log after the zip happens, so 1GB is only needed while it's running temporarily, if it matters.

 

If you get a frontend set up, I can always change the PS script to write out to an sqlite file.

 

@Christopher - How does drivepool store the metadata so it knows what file is on what disk, etc? Is it a sqlite file somewhere? I haven't poked around too much into the way drivepool actually operates. 


@gj80

 

If there is a DB (I don't think there is, from a bit of poking around - the master file record table, perhaps?) then yes, that could be used as the back end - assuming it's quick and responsive and supports queries etc.

 

I'm currently not using SQLite but Absolute DB from ComponentAce - it compiles into the exe file and has all the features you would expect of a SQL DB. The exe is only 1.5MB at the moment, so it has a minimal footprint.

 

Just refining the import routines and the data structure - I've got the import down to less than a minute for 500k log file records - a bit more work to do to speed that up.


DrivePool doesn't have a DB for this. I mean, aside from NTFS which is *technically* a database. :) 

 

The dpcmd utility queries the DrivePool driver (coveFS.sys) directly, and generates the info dynamically. 

 

Right now, you guys are parsing and processing the output from dpcmd...  but if you're able to cut out the middle-man, it would probably help. 

 

 

As for a timeline for the info/API,  I have no idea. Alex has discussed documenting it in the past, but it's a matter of free time. 

 

If you guys are very serious about this, I'll see about pushing Alex for the info/documentation on how to do this. 


Ha, thought so - no point having two DBs of the same thing.

 

Yes, I am interested - as I can see a few useful utilities/routines that would be of use to all - I hope :)

 

1. Which drive has file x - also, which drives it has been on might be useful history in some circumstances - cough cough - like a pool that does not balance properly :)

2. Where is directory x, and which drives are its files and subdirectories on?

3. What errors do I have - I was surprised I had one, as DP had not notified me of it - unless I missed it.

4. What files are inconsistent, and where are they, etc.

 

Just a few off the top of my head - I have a few others in mind as well.

 

If Alex can give me a few pointers/guidance on what parameters the .sys file will take and what it will respond with, it would be good to get a basic trial done so I know where to invest time. The parsing of the file is fiddly but trivial, and I'll have it fully refined in a day or so - just sorting a memory leak in the DB code (not mine!!); awaiting a callback from the devs - I think I know what it is, but I've just been diverted to sort this.

 

As for the script, it's chugging through the nightly workload fine - I have two zips now and it's working on the third log at the moment - combined it takes about 4 hours on Boris with his i5.


  1. Well, that's easy, as the DPCMD utility is already doing that. 

    As for historical data.... the DPCMD utility is pulling this from the driver, which pulls it from the system.  So the information is dynamic, and changes. And no, we don't keep a record. 

  2. Yeah, that info can be pulled too, and the DPCMD utility is doing so (but I believe it's a simple matter of being 'recursive' and enumerating everything).
  3. That depends on what you mean.  However, if you mean during access, then yeah, that could be handled.  Otherwise, again, no historical data. 
  4. See above? :)

I'll bug Alex about this. And likely he'll post code samples, I think. So it should be fairly simple, relatively speaking.


Good day everyone.

I had two issues with DrivePool-Generate-CSV-Log-V1.51 which I have corrected (I will try to attach to this reply):

  • Get-PhysicalDisk is not supported on Windows 7, so I changed it to Get-CimInstance -ClassName CIM_DiskDrive. Under Win7 the script wouldn't fail, but the disk model and serial number columns were just blank.
  • Unicode CSV files do not load correctly when double-clicked, at least not with Excel 2016 - they load as a single column. It turns out, however, that Excel 2016 handles TAB-delimited Unicode CSV just fine, so I changed the format from comma-delimited to tab-delimited (both changes are sketched below). Works fine.
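
Roughly, the two changes look like this (simplified from the actual script; variable and path names are just illustrative):

# Disk model/serial via CIM, which works on Windows 7 (Get-PhysicalDisk does not)
$disks = Get-CimInstance -ClassName CIM_DiskDrive | Select-Object DeviceID, Model, SerialNumber

# Tab-delimited Unicode CSV so Excel 2016 opens it correctly on double-click
$filelist | Export-Csv -Path 'F:\DPFileList.csv' -Delimiter "`t" -Encoding Unicode -NoTypeInformation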

The updated script should be attached...

DrivePool-Generate-CSV-Log-V1.60.zip

17 hours ago, B00ze said:

Good day.

Of course, this is kinda the whole point. Do you have Excel? You can load a CSV from before the loss of a drive, and a CSV from after the loss, and compare them. There is a function in Excel called VLOOKUP. You load both files into the same workbook as sheets, add a column to one of them, and VLOOKUP the file paths in this one against the paths in the other; whatever's missing is what you've lost. You could set up conditional highlighting to do the same thing (I think). Once you've got a list in Excel, you can sort it, then copy/paste it into a text file. You can then automate the recovery by writing a small batch script that reads the text file and copies the missing files from backup back onto the pool.

If you do not use duplication at all, then it's even easier, just sort by disk and whatever's on the lost drive is what you need to recover.
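
For anyone without Excel, the same comparison can be scripted - a rough PowerShell sketch, assuming the DirectoryName/FileName columns the script's CSVs already have:

# List everything present in the "before" CSV but missing from the "after" CSV
$before = Import-Csv 'F:\DPFileList-before.csv'
$after  = Import-Csv 'F:\DPFileList-after.csv'
$toPath = { Join-Path $_.DirectoryName $_.FileName }

# '<=' means the path existed before but not after - i.e. it was lost
Compare-Object -ReferenceObject ($before | ForEach-Object $toPath) `
               -DifferenceObject ($after | ForEach-Object $toPath) |
    Where-Object { $_.SideIndicator -eq '<=' } |
    Select-Object -ExpandProperty InputObject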

Regards,

That sounds mostly like what I'm looking for, actually - well, sort of, but also totally - even though I don't have Excel. I assume other spreadsheet programs such as Google Sheets or LibreOffice would work.

I say sort of because I've realized I asked the wrong question: sync/backup software would cover just missing files, and hopefully be smart enough for name changes, but instead I was looking for how to figure out what files weren't backed up. Luckily I think your answer covers that too, as I imagine I could just compare the files after recovering from the backup and find out which ones still aren't there.

Thanks

On 4/21/2018 at 2:50 PM, APoolTodayKeepsTheNasAway said:

That sounds mostly like what I'm looking for, actually - well, sort of, but also totally - even though I don't have Excel. I assume other spreadsheet programs such as Google Sheets or LibreOffice would work.

LibreOffice would work. As for Excel, if you upload to Google Drive or OneDrive, then you have something that can read it. So you should be fine, regardless of what you have or don't have.


Hi everyone,

First, I would like to share that I am very satisfied with DrivePool & Scanner. This IS "state of the art" software.

Second, I have personally experienced 4 HDDs failing, burned by the PSU (99% of the data was professionally - $$$$ - recovered), and having the content information would have been a comfort, just for a rapid comparison and a status overview.

I also asked myself how to catalog the pooled drives' content, with logging/versioning, just to know, if a pooled drive dies, whether professional recovery makes sense (again), but also to check that the duplication algorithm is working as advertised.

Being a fan of "as simple as it gets", I found a simple, free file lister that is command-line capable.

https://www.jam-software.com/filelist/

I have built a .cmd file to export a listing per drive letter (eg: %Drive_letter_%Label%_YYYYMMDDSS.txt) for each pooled drive. Then I scheduled a job to run every 3 hours; before running, it just packs all previous .txt's into an archive, for versioning purposes.

For each of my 10 x 2TB pooled HDDs, 60% filled, I get a 15-20MB .txt file (with the exclude-content filter option) in about 20 minutes. A zipped archive with all the files inside comes to about 20MB per archive. For checking, I just use Notepad++'s "Find in Files" function, point it at the desired .txt folder path, and I get what I'm looking for, for each file per drive.
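
For anyone who would rather stay in PowerShell, the rotation part of that job might look something like this sketch (paths are placeholders, and the FileList.exe call itself is left out - check its documentation for the exact switches):

# Archive the previous listings before regenerating them
$listDir = 'D:\PoolListings'
$stamp   = Get-Date -Format 'yyyyMMddHHmm'
$old     = Get-ChildItem -Path $listDir -Filter *.txt
if ($old) {
    Compress-Archive -Path $old.FullName -DestinationPath (Join-Path $listDir "listings_$stamp.zip")
    $old | Remove-Item
}
# ...then run FileList.exe once per pooled drive to produce the new .txt listings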

I would love to see such an option for finding which drive a file is on built into the DP interface.

Hopefully good info, and not a long post.

Good luck!

 

On 5/13/2019 at 11:07 PM, Dizzy said:

Would the StableBit FileVault mentioned in the Dev Status page take care of this?

If so, we need that FileVault ASAP! :D

It's possible. Its feature set isn't settled at this point. So...

However, we're currently working on StableBit Cloud. Once that's "shipped", then we'll see. (Just FYI, I am fully behind FileVault and want it as much as, or more than, you - so "eventually".)
