Forum OpenACS Development: Response to document to text conversion in search indexer

Posted by Tilmann Singer on
After adding document conversion for HTML, MS Word and PDF files to search_content_filter, I realized that this proc is called not only from the indexer but also when displaying the results, to produce an excerpt of the matching document with the matches highlighted (which looks very good, btw).

So this results in exec'ing the external conversion programs for every matching document every time the results page is displayed, which is of course unacceptable.

Any suggestions on how to deal with this? Saving a text version of each document that has to be converted, alongside the actual content, is unavoidable, at least if we don't want to lose the nice abstracts in the search results.

Where / how should that be done?
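
To make the idea of a stored text version concrete, here is a rough sketch. It is in Python rather than Tcl, purely for illustration, and nothing in it is existing OpenACS code: `ConvertedTextStore`, `text_for` and `fake_convert` are hypothetical names, the dict stands in for whatever table or file would hold the text, and the checksum test is just one assumed way to notice that a document has changed.

```python
import hashlib
from typing import Callable, Dict, Tuple


class ConvertedTextStore:
    """Keeps a plain-text copy of each document, keyed by object id (hypothetical)."""

    def __init__(self, convert: Callable[[bytes, str], str]) -> None:
        self._convert = convert                       # the expensive conversion step
        self._rows: Dict[int, Tuple[str, str]] = {}   # object_id -> (checksum, text)
        self.conversions = 0                          # counts real conversions, for illustration

    def text_for(self, object_id: int, content: bytes, mime_type: str) -> str:
        """Return the stored text, converting only if the content is new or changed."""
        digest = hashlib.sha1(content).hexdigest()
        row = self._rows.get(object_id)
        if row is not None and row[0] == digest:
            return row[1]                             # stored copy is current, no conversion
        text = self._convert(content, mime_type)
        self.conversions += 1
        self._rows[object_id] = (digest, text)
        return text


def fake_convert(content: bytes, mime_type: str) -> str:
    """Stand-in for exec'ing an external converter; it only decodes the bytes."""
    return content.decode("utf-8", errors="replace")


store = ConvertedTextStore(fake_convert)
store.text_for(1, b"hello", "application/pdf")   # indexer: converts and stores
store.text_for(1, b"hello", "application/pdf")   # results page: served from the store
```

With something like this, both the indexer and the results page would ask the store for the text, so the external program runs once per document version instead of once per page view.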