Forum OpenACS Development: OpenFTS indexing non-text formats revisited
Right now I am looking at a solution keyed to a content item's MIME type: it would [exec] an external program to extract the item's text and return that text from the datasource service contract Tcl procedure.
The drawback is that the text would be extracted again on every call to that service contract.
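To make the idea concrete, here is a rough sketch of MIME-type-keyed extraction. It is written in Python purely for illustration, not as OpenACS Tcl code. The EXTRACTORS table, the extract_text name, and the "{path}" placeholder are all made up for this example, and pdftotext and antiword are only examples of converters that would have to be installed separately.

```python
import subprocess

# Assumed mapping: MIME type -> command template. "{path}" is replaced
# with the file to convert; each command must print plain text to stdout.
# The tools named here are examples only and must be installed separately.
EXTRACTORS = {
    "application/pdf": ["pdftotext", "{path}", "-"],
    "application/msword": ["antiword", "{path}"],
}


def extract_text(path, mime_type, extractors=EXTRACTORS, timeout=30):
    """Return the extracted text, or "" when no extractor is registered."""
    template = extractors.get(mime_type)
    if template is None:
        return ""
    # Build the argument list without going through a shell, so odd
    # characters in file names cannot be interpreted as shell syntax.
    argv = [arg.replace("{path}", path) for arg in template]
    result = subprocess.run(argv, capture_output=True, timeout=timeout, check=True)
    return result.stdout.decode("utf-8", errors="replace")
```

Called this way, the external program runs on every request, which is exactly the cost described above.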
The alternatives include:
1. Storing a text version parallel to the binary item in the filesystem.
2. Storing a text version as a related content item in the content repository.
3. Storing the text of items to be indexed in a separate table in the database.
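The separate-table option could look roughly like the sketch below. It is Python with SQLite for illustration only; the indexed_text table, its columns, and the open_cache and get_text names are hypothetical and not part of OpenACS. It assumes each item has a revision id that changes whenever the content changes, so the stored text is re-extracted only when it is stale.

```python
import sqlite3


def open_cache(db_path=":memory:"):
    """Open the database and create the (hypothetical) text table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indexed_text ("
        " item_id INTEGER PRIMARY KEY,"
        " revision_id INTEGER NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return conn


def get_text(conn, item_id, revision_id, extract):
    """Return stored text for the item, calling extract() only when stale."""
    row = conn.execute(
        "SELECT revision_id, body FROM indexed_text WHERE item_id = ?",
        (item_id,),
    ).fetchone()
    if row is not None and row[0] == revision_id:
        return row[1]  # stored text matches the current revision
    body = extract()  # the expensive external conversion runs here
    conn.execute(
        "INSERT OR REPLACE INTO indexed_text (item_id, revision_id, body)"
        " VALUES (?, ?, ?)",
        (item_id, revision_id, body),
    )
    conn.commit()
    return body
```

With something like this, the datasource procedure would read from the table, and the converter would run once per revision instead of once per call.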
Eventually I'd like to get whatever solution I find back into OpenACS.
As for the storage, I would keep it in a separate table, so that we could put this table on a different partition or even a different server if need be. I'm just thinking of the very many files that AIESEC, for example, has in its file storage.
I am going to be working (hopefully with some help) on integrating the latest tsearch2 for PostgreSQL into search. More details when some exist.