Forum OpenACS Development: Problem indexing PDF files with search package

1: Problem indexing PDF files with search package

Posted by Ryan Gallimore on 06/28/11 05:15 PM

I ran into a problem indexing a PDF with the search package. It appears search believes the CR file has storage type "text" instead of "file".

This leads to search::content_get passing the file data, rather than the filename, to pdftotext, which then fails. I assume the other converters would fail too.

This seems to be the problem:

http://fisheye.openacs.org/browse/OpenACS/openacs-4/packages/acs-content-repository/tcl/search-procs.tcl?r2=1.9&r1=1.8

storage_type for all CR files is being set to text, and then content is being set to the revision data.
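
To make the distinction concrete, here is a minimal Python sketch of the dispatch I would expect. It is not the actual OpenACS Tcl code: the function name, its arguments, and the example path are all hypothetical. The only real-world detail assumed is that pdftotext takes a PDF filename as its input argument and writes to stdout when the output file is given as "-".

```python
# Illustrative sketch only; the real logic lives in Tcl in the OpenACS packages.
# extract_text_command and its parameters are hypothetical names.

def extract_text_command(storage_type, content, mime_type):
    """Return the argv to run for text extraction, or None when the
    content is already the text itself and needs no converter."""
    if storage_type == "file":
        # content is a filesystem path, so a converter can open it directly
        if mime_type == "application/pdf":
            # "-" asks pdftotext to write the extracted text to stdout
            return ["pdftotext", content, "-"]
        raise ValueError(f"no converter sketched for {mime_type}")
    if storage_type == "text":
        # content is the data itself; there is no filename to hand to a converter
        return None
    raise ValueError(f"unknown storage_type: {storage_type}")


# Hypothetical example path, for illustration only
print(extract_text_command("file", "/tmp/example.pdf", "application/pdf"))
```

If a PDF stored on disk is reported with storage_type "text", the caller ends up holding raw data where a converter expects a path, which matches the failure I am seeing.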

What was the reason for this change? Is it safe to revert it, so that the file storage_type is respected and content holds the filename?

2: Re: Problem indexing PDF files with search package (response to 1)

Posted by Emmanuelle Raffenne on 06/28/11 06:32 PM

Hi Ryan,

It looks like there's an unintentional change in that commit (not all of it, though). I've corrected it and committed the fix to HEAD and the oacs-5-7 branch. That should fix it.

Sorry for the inconvenience.

3: Re: Problem indexing PDF files with search package (response to 2)

Posted by Ryan Gallimore on 06/28/11 08:03 PM

Thanks, Emma!