Yesterday, while I was having fun debugging ad_procs, I found cr_check_mime_type and some interesting comments within it.
# TODO: we use only the extension to get the mimetype. Something
# better should be done, like inspecting the actual content of the
# file and never trust the user on this regard, but as this
# involves changes also in the data model, we leave this for the
# future. Usages of this proc in the systems are already set to
# give us the path to the file here.
api-doc/proc-view?proc=cr_check_mime_type&source_p=1
I have always had the same feeling, but I had always postponed further analysis of the subject until now. I tried to write a Tcl chunk to inspect what's inside the file.
But it needed too much customization, and sometimes even files with the same extension (let's say PDF files) had a different structure if they came from different sources. Each of them needed its own customization.
For example, .docx files exported to PDF using PDFCreator versus files exported by MS Office's own converter.
In another case, the Tcl chunk scanned two PNG images, one exported from Adobe Photoshop and the other created with the screenshot feature on a Mac. They had different structures too. Only one had PNG in its first line.
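# "file type" reports the kind of filesystem entry (file, directory, link), not the content type
set file_type [file type ${file.tmpfile}]
ns_log Notice "TYPE $file_type"
set file_extension [file extension ${file.tmpfile}]
ns_log Notice "FILE EXT $file_extension"
# Read the raw bytes, without line-ending or encoding translation
set fl [open ${file.tmpfile} r]
fconfigure $fl -translation binary
# gets returns the number of characters read; the line itself is stored in f_line
set count [gets $fl f_line]
ns_log Notice "LINE $f_line"
# read returns everything after the first line
set data [read $fl]
close $fl
ns_log Notice "FILE \n $data"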
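A possible alternative to reading the first line as text is to compare the leading bytes of the file against known signatures. The following is only a minimal sketch, not existing OpenACS code: the proc name sniff_mime_type is hypothetical, and it covers just two formats. It relies on two facts: every PNG file starts with the same fixed 8-byte signature, and a PDF file starts with a "%PDF-" header. Searching the first 1024 bytes for the PDF header, rather than only offset 0, is a leniency assumption for files with leading junk.

```tcl
# Hypothetical helper (not part of OpenACS): guess a MIME type from the
# leading bytes of a file instead of trusting its extension.
proc sniff_mime_type {path} {
    set fd [open $path r]
    # Binary translation: bytes are read as-is, no newline/encoding conversion.
    fconfigure $fd -translation binary
    set head [read $fd 1024]
    close $fd

    # PNG: fixed 8-byte signature at the very start of the file.
    if {[string equal -length 8 $head "\x89PNG\r\n\x1a\n"]} {
        return "image/png"
    }
    # PDF: "%PDF-" header, normally at offset 0. Accepting it anywhere in
    # the first 1024 bytes is an assumption made here for leniency.
    if {[string first "%PDF-" $head] >= 0} {
        return "application/pdf"
    }
    # Unknown: the caller can fall back to the extension-based lookup.
    return ""
}
```

In the upload code above this would be called as [sniff_mime_type ${file.tmpfile}]. An empty result means "not recognized", so the existing extension-based guess could still be used as a fallback.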
There's some interesting code written for images. Thanks Dave!
Is there anything similar to inspect/scan PDF files?
api-doc/proc-view?proc=image::identify_binary&source_p=1
and
api-doc/proc-view?proc=image::imagemagick_identify&source_p=1