I ran into a problem indexing a PDF from search. It appears search believes the CR file is of storage type "text" instead of "file".
This leads to search::content_get passing the file data, rather than the filename, to pdftotext, which then fails. I assume the other converters would fail too.
This seems to be the problem:
http://fisheye.openacs.org/browse/OpenACS/openacs-4/packages/acs-content-repository/tcl/search-procs.tcl?r2=1.9&r1=1.8
storage_type for all CR files is being set to text, and then content is being set to the revision data.
What was the reason for this change? Is it safe to revert it so that the "file" storage_type is respected and content is the filename again?
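
To make the failure mode concrete, here is a rough model of what I think is happening. It is written in Python purely for illustration and is not the actual Tcl from search-procs.tcl; `fake_converter` and `convert` are made-up names, and the dispatch on storage_type is my assumption about how the converter input ought to be chosen.

```python
import os
import tempfile


def fake_converter(path):
    """Stand-in for a filename-based tool such as pdftotext (hypothetical)."""
    with open(path, "rb") as fh:
        return len(fh.read())


def convert(storage_type, content):
    """Hypothetical dispatch: only 'file' content can be used as a path."""
    if storage_type == "file":
        return fake_converter(content)
    # With any other storage type, content holds the raw data itself,
    # so handing it to a filename-based converter cannot work.
    raise ValueError("content is raw data, not a filename")


# Write a small file to play the role of the CR file on disk.
fd, cr_path = tempfile.mkstemp(suffix=".pdf")
with os.fdopen(fd, "wb") as fh:
    fh.write(b"%PDF-1.4 dummy")

# storage_type respected: the converter receives a path and succeeds.
size_ok = convert("file", cr_path)

# storage_type forced to "text": the converter would receive the data.
try:
    convert("text", b"%PDF-1.4 dummy")
    mislabeled_failed = False
except ValueError:
    mislabeled_failed = True

os.remove(cr_path)
```

In this model the "file" case works because the converter gets something it can open, while the "text" case has nothing usable as a filename, which matches the pdftotext failure I am seeing.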