Restoring Images and More from "Broken" Storage Media
======================================================

What do I mean by "broken"?
---------------------------
- still "raw readable"
  in particular: not gone through a shredder, nor explicitly zeroed out.
- "broken" in the sense of the filesystem:
  + flash media suddenly turned "read-only", cached directory entries not written back.
  + Files removed (even bypassing the Trashcan).
  + Card has undergone a "quick" format.

Requisites:
-----------
- A computer/OS that can read out storage media in "raw" mode.
- "hd" (hexdump) and "grep"
- Tcl, of course! :-)
- "seqf" - helper tool (another Tcl script)
- "scanforALL" - the main Tcl script of this presentation
- "params.tcl" - a file with settings for a recovery project...

What's on a disk?
-----------------
- partition table, boot sector, "FAT" -- boring (and not reliable)
- main directory (on older media, no longer on exFAT)
- all the clusters
  + Chunks of constant size of all contents (files and subdirs)

What does "scanforALL" do?
--------------------------
- scanforALL grep [outFile] -- scan the disk image for certain "known" file-starts:
    subdirectories, JPEGs, some video formats.
    This gives me a head start on finding the "parameters" needed for the rest
    - still some "math" needed.
- scanforALL analyze [outfile] -- based on params, try to classify each cluster.
    (e.g.: looks like JPEG, ...)
- scanforALL known [infile] [outfile] -- extract the simple cases, and store information
    about not yet classified clusters that remain outside of identified files.
- scanforALL collect [.ext] ranges... -- allow me to extract a file based on ranges of clusters.
- scanforALL link [dir-entries] -- connect subdirectory entries to extracted files.

Hands-on test:
--------------
- given "disk.img", a raw dump from a USB stick.
- creating "params.tcl"
      set env(IMG) disk.img ;# name of raw dump
      set env(OFS) 0x0      ;# offset to first cluster, 0 for start.
      set env(BS)  0x1      ;# block/cluster size, 1 for start.
      set env(FMT) %06d     ;# format for cluster numbers
  For the first step, only IMG is relevant (but all need to exist).
- scanforALL grep
  creates a file "all-starts" to be opened in a viewer:
    00bed000  ff d8 ff e1 47 84 45 78  69 66 00 00 49 49 2a 00  |....G.Exif..II*.|
    0114b000  ff d8 ff e1 42 68 45 78  69 66 00 00 49 49 2a 00  |....BhExif..II*.|
    0148e000  2e 20 20 20 20 20 20 20  20 20 20 10 00 8a 63 66  |.          ...cf|
    02055000  ff d8 ff e1 46 dc 45 78  69 66 00 00 49 49 2a 00  |....F.Exif..II*.|
    023dc000  ff d8 ff e1 47 44 45 78  69 66 00 00 49 49 2a 00  |....GDExif..II*.|
    03414000  ff d8 ff e1 56 50 45 78  69 66 00 00 49 49 2a 00  |....VPExif..II*.|
    0469e000  ff d8 ff e1 52 64 45 78  69 66 00 00 49 49 2a 00  |....RdExif..II*.|
  We see: all starts are multiples of 0x1000
  => we can "set env(BS) 0x1000" in params.tcl
- open the disk image file in a viewer and "guess" the offset of the first cluster.
  If we mis-guess, we won't be able to match directory entries to extracted files.
  Here, we believe we have found the start:
       0001b0a0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 |                |
       *
    -> 00021000: 54 45 53 54 44 49 53 4B - 20 20 20 08 00 00 46 76 |TESTDISK      Fv|
       00021010: E9 58 E9 58 00 00 46 76 - E9 58 00 00 00 00 00 00 | X X  Fv X      |
       00021020: 41 4B 00 65 00 79 00 6E - 00 6F 00 0F 00 CE 74 00 |AK e y n o    t |
       00021030: 65 00 2E 00 6D 00 70 00 - 34 00 00 00 00 00 FF FF |e . m p 4       |
  so, we "set env(OFS) 0x00021000" in params.tcl and are ready to go.
- scanforALL analyze list-1 ;# knowing cluster boundaries, it now makes a better go at identifying files.
    000001 NULL 000001-000004 (4)
    000005 MP4
    003020 JPEG EXIF
    004394 JPEG EXIF
    005229 SUBDIR
    008244 JPEG EXIF
    009147 JPEG EXIF
    013299 JPEG EXIF
    018045 JPEG EXIF
    032735 ENDE
  It didn't recognize the main dir, but we can add it manually:
  We change the "000001 NULL ..." line to start with 000000 and give it the label "SUBDIR":
    000000 SUBDIR
    000005 MP4
    ...
    ...
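  The "math" behind these numbers can be sketched in a few lines of Tcl. This is
  only an illustration, not code taken from scanforALL: the procs "clusterOf" and
  "gcd" are hypothetical helpers, and the offsets are copied from the "all-starts"
  listing. The gcd of the start offsets is merely an upper bound for the cluster
  size (the real one could be a smaller power of two); the cluster number is
  assumed to be (offset - OFS) / BS, which matches the list-1 output.

```tcl
# Parameters as found in the walkthrough (values from params.tcl).
set OFS 0x00021000   ;# guessed byte offset of cluster 0
set BS  0x1000       ;# cluster size in bytes
set FMT %06d         ;# format for cluster numbers

# Hypothetical helper: map a byte offset in the image to a cluster number.
proc clusterOf {offset ofs bs} {
    expr {($offset - $ofs) / $bs}
}

# Hypothetical helper: greatest common divisor (Euclid).
proc gcd {a b} {
    while {$b != 0} {
        set t [expr {$a % $b}]
        set a $b
        set b $t
    }
    return $a
}

# File-start offsets copied from the "all-starts" listing.
set starts {
    0x00bed000 0x0114b000 0x0148e000 0x02055000
    0x023dc000 0x03414000 0x0469e000
}

# All starts are multiples of the cluster size, so their gcd bounds it from above.
set guess 0
foreach s $starts { set guess [gcd $s $guess] }
puts [format "suggested BS: 0x%x" $guess]

# Translate each start offset into the cluster number used in list-1.
foreach s $starts {
    puts [format "%s -> $FMT" $s [clusterOf $s $OFS $BS]]
}
```

  With the values above this suggests a BS of 0x1000 and maps 0x00bed000 to
  cluster 003020, the first "JPEG EXIF" entry in list-1.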
- scanforALL known list-1 list-2 ;# this will already extract the JPEGs to the hardcoded "out/j" directory
  and write out a more detailed list-2:
     000000 SUBDIR Ok 000000-000001
    Warning: unexpected "c1"
    Warning: unexpected "c1"
    *000002 EXTRA 000002-000004 (3)
     000005 MP4 Ok 000005-000324 size 1309576 ftyp[(28)] moov[(1309540)] \
         free[(8)] truncated after moov at "mdat[(54614054)]".
     003020 JPEG EXIF Ok 003020-004393 size 5627904 1st APP1:['Exif':boII:42:o8] Len=5626791
     004394 JPEG EXIF Ok 004394-005228 size 3420160 1st APP1:['Exif':boII:42:o8] Len=3416868
     005229 SUBDIR Ok 005229-005230
    *005231 EXTRA 005231-008243 (3013)
     008244 JPEG EXIF Ok 008244-009146 size 3698688 1st APP1:['Exif':boII:42:o8] Len=3697419
     009147 JPEG EXIF Ok 009147-010284 size 4661248 1st APP1:['Exif':boII:42:o8] Len=4657423
    *010285 EXTRA 010285-013298 (3014)
     013299 JPEG EXIF Ok 013299-015030 size 7094272 1st APP1:['Exif':boII:42:o8] Len=7092089
    *015031 EXTRA 015031-018044 (3014)
     018045 JPEG EXIF Ok 018045-018963 size 3764224 1st APP1:['Exif':boII:42:o8] Len=3761186
    *018964 EXTRA 018964-032734 (13771)
     032735 ENDE
  There is a chance that the MP4 extends from 000005 to 003019
  and maybe continues in the further "EXTRA" clusters...
- scanforALL collect .mp4 000005-003019 005231-008243 010285-013298 015031-018044 018964-032734
  and try playing out/file-000005.mp4 ... nope, that's not it yet...
  let's check the clusters after the SUBDIR...
- scanforALL collect 005230-005234
  It turns out that SUBDIR was "too greedy": cluster 005230 didn't look like part of a directory.
- scanforALL collect .mp4 000005-003019 005230-008243 010285-013298 015031-018044 018964-032734
  looks good. The video actually contains lots of garbage at the end.
  For some video files, the script can determine the correct file size,
  though not yet when it is "interrupted".

What about directories?
-----------------------
- the "scanforALL known ..." invocation also dumped lines to stdout.
  Let's now pretend we captured them with a redirection to "list-dir", which contains:
    CLINFO 000000 FILE {2024.07.09 14:50:12.00} TESTDISK.___
    CLINFO 000003 FILE {2024.07.09 12:52:12.00} KEYNOTE_.MP4
    CLINFO 003018 FILE {2024.07.09 12:50:46.00} IMG_3186.JPG
    CLINFO 004392 FILE {2024.07.09 12:50:46.00} IMG_3194.JPG
    CLINFO 005227 DIR/ {2024.07.09 12:52:08.00} MORE____.___
    CLINFO 005227 DIR/ {2024.07.09 12:51:06.00} ._______.___
    CLINFO 000000 DIR/ {2024.07.09 12:51:06.00} ..______.___
    CLINFO 008242 FILE {2024.07.09 12:51:36.00} IMG_3206.JPG
    CLINFO 009145 FILE {2024.07.09 12:51:36.00} IMG_3222.JPG
    CLINFO 013297 FILE {2024.07.09 12:51:56.00} IMG_3183.JPG
    CLINFO 018043 FILE {2024.07.09 12:52:08.00} IMG_3184.JPG
- ./scanforALL link list-dir
    000003 missing - should be "KEYNOTE_.MP4"
    003018 missing - should be "IMG_3186.JPG"
    004392 missing - should be "IMG_3194.JPG"
    008242 missing - should be "IMG_3206.JPG"
    009145 missing - should be "IMG_3222.JPG"
    013297 missing - should be "IMG_3183.JPG"
    018043 missing - should be "IMG_3184.JPG"
  Oh, no, we picked the wrong "OFS", the cluster numbers are "off-by-2".
  Either we redo the steps above with correct OFS,
  or I just edit the file and incr each by two, so it fits :-)
- then we move the video into the subdir of movie-clips:
    mv out/file-000005.mp4 out/m/
  and run "./scanforALL link list-dir" again:
    000002 missing - should be "TESTDISK.___"   (volume label; ignore)
  and look in out/lnk:
    lrwx [...] file-000005-KEYNOTE_.MP4 -> ../m/file-000005.mp4
    lrwx [...] file-003020-IMG_3186.JPG -> ../j/file-003020.jpg
    lrwx [...] file-004394-IMG_3194.JPG -> ../j/file-004394.jpg
    lrwx [...] file-008244-IMG_3206.JPG -> ../j/file-008244.jpg
    lrwx [...] file-009147-IMG_3222.JPG -> ../j/file-009147.jpg
    lrwx [...] file-013299-IMG_3183.JPG -> ../j/file-013299.jpg
    lrwx [...] file-018045-IMG_3184.JPG -> ../j/file-018045.jpg

End
---