Forum OpenACS Development: Problem indexing PDF files with search package

I ran into a problem indexing a PDF with the search package. It appears search treats the CR file as storage type "text" instead of "file".

As a result, search::content_get passes the file data to pdftotext instead of the filename, and the conversion fails. I assume the other converters would fail too.

This seems to be the problem:

http://fisheye.openacs.org/browse/OpenACS/openacs-4/packages/acs-content-repository/tcl/search-procs.tcl?r2=1.9&r1=1.8

storage_type for all CR files is being set to text, and then content is being set to the revision data.

What was the reason for this change? Is it safe to revert it so that the "file" storage_type is respected, with content set to the filename?
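
To make the distinction concrete, here is a rough sketch of the dispatch described above. It is written in Python rather than the package's Tcl, purely as an illustration: the function name and return convention are hypothetical and are not taken from search-procs.tcl. The only external assumption is the standard pdftotext command line, which takes a PDF filename followed by an output file, with "-" meaning stdout.

```python
def converter_argv(storage_type, content):
    """Return the pdftotext argv for a stored PDF, or None if no conversion applies.

    storage_type and content mirror the fields discussed in this thread;
    the function itself is a hypothetical illustration, not OpenACS code.
    """
    if storage_type == "file":
        # content is a path on disk, which is what pdftotext expects.
        # "-" asks pdftotext to write the extracted text to stdout.
        return ["pdftotext", content, "-"]
    if storage_type == "text":
        # content is the data itself, so there is no filename to hand
        # to an external converter.
        return None
    raise ValueError(f"unexpected storage_type: {storage_type!r}")
```

If every CR file is reported as "text", a PDF never reaches the first branch, and whatever is in content (the raw revision data) ends up where a converter expects a filename.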

Posted by Emmanuelle Raffenne on
Hi Ryan,

It looks like there's an unintentional change in that commit (not all of it, though). I've corrected it and committed the fix to HEAD and the oacs-5-7 branch. That should fix it.

Sorry for the inconvenience.

Posted by Ryan Gallimore on
Thanks, Emma!