Forum OpenACS Development: Re: Comments within cr_check_mime_type

Posted by Antonio Pisano on
Hi Iuri,

Some approaches available now that you might consider for file type detection are:
- the available Tcl API (e.g. [1])
- wrapping the "file" command-line utility (on Unix-like systems).

Keep in mind that, although stricter, checking a file's type by its content is much more expensive than the lazy, extension-based approach (it requires file IO, sometimes an exec...). Also, the more "type specific" you want recognition to be (e.g. number of pages in a PDF, width of a PNG...), the more likely it is that you will need a special tool or library for it.
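To make the trade-off concrete, here is a minimal sketch in Python (not OpenACS code) that compares the two approaches. The signature table and the function names are assumptions made up for this illustration, and the table is nowhere near complete; a real system would rely on an existing library or on the "file" utility.

```python
import mimetypes

# Illustrative signature table (an assumption for this sketch, far from complete).
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def guess_by_extension(filename):
    """Cheap guess: looks only at the file name, no file IO."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


def guess_by_content(path, read_size=16):
    """Stricter guess: reads the first bytes and compares them to known signatures."""
    with open(path, "rb") as f:
        head = f.read(read_size)
    for signature, mime in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return mime
    return None  # unknown to this small table


def extension_matches_content(path):
    """True only if the content is recognized and agrees with the extension."""
    by_content = guess_by_content(path)
    return by_content is not None and by_content == guess_by_extension(path)
```

The extension check never touches the disk, while the content check needs at least one read per file, which is where the extra cost comes from. A file named report.png that actually starts with "%PDF-" would pass the first check and fail the second.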

About PDFs, in [1] you can see they are also recognized (but you should check with different variants). If you need some further content inspection, the pdfinfo command from poppler-utils works quite well, and we also have a wrapper for it in [2].
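As a rough idea of what wrapping pdfinfo can look like, here is a minimal Python sketch. It is not how the wrapper in [2] is implemented; it only assumes that pdfinfo is installed and prints one "Key: value" pair per line. The function names are made up for this example.

```python
import subprocess


def parse_pdfinfo_output(text):
    """Turn 'Key:   value' lines into a dict; lines without a colon are skipped."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as dates keep theirs.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def pdfinfo(path):
    """Run the pdfinfo command on a file and return its fields as a dict.

    Raises subprocess.CalledProcessError if pdfinfo exits with an error,
    for instance when the file is not a valid PDF.
    """
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    return parse_pdfinfo_output(result.stdout)
```

With this in place, something like int(pdfinfo("/tmp/upload.pdf")["Pages"]) would give the page count, at the price of one exec per file (the path is only a placeholder).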

[1] - https://core.tcl.tk/tcllib/doc/tcllib-1-18/embedded/www/tcllib/files/modules/fileutil/fileutil.html#11
[2] - https://openacs.org/api-doc/proc-view?proc=util::pdfinfo&source_p=1

Posted by Iuri Sampaio on
Thanks Antonio.
That's precisely the information I need. I'm aware that, depending on the feature, a third-party application would be required, and that the extra I/O comes with a performance cost.

A good example is what I've seen from Dave, in the scenarios where ImageMagick has been applied.

The main motivation behind these daydreams is that we sometimes deal with files uploaded by users, which the system then serves to other users. This means my system could potentially harm or infect another computer if malicious, or simply unaware, members upload files containing viruses, malicious code, macros, etc.

Currently, NGINX is blocking most of the dangerous ones, and the rest I have left in the hands of the OACS gods!

I know there is still a lot to be implemented. The post was a happy coincidence between what I have experienced while writing code and the comments I found within ad_proc cr_check_mime_type.

Once in a while, I get stuck troubleshooting the fundamentals, and I decide to review the basics for a bit instead of rushing things.