Forum OpenACS Development: Response to document to text conversion in search indexer

Posted by Tilmann Singer on
txt is PostgreSQL-specific, but this is something that would be needed for the Oracle version as well, no? Assuming there will ever be an FtsContentProvider for Oracle, of course ...

Well, maybe an FtsContentProvider for Oracle could somehow make use of interMedia's own INSO filter stuff instead of replicating functionality that's already there, but that would still not solve the problem of where to get the text for generating the abstract. So we need a solution that works for both databases, I think.

It should also be considered that the text version might become huge: imagine someone uploading a book in PDF format to file storage. Some people might not want to store the text of this in an additional PostgreSQL table. Ideally the text version should use the same storage method as the original content (don't know if that's possible, just thinking out loud).

Another option might be to just not show the abstract for documents that require expensive transformation, which would be a pity. At the very least there should be an alternative in the form of a description, e.g. the first paragraph of the document or something like that. That in turn would require, among other changes, an addition to the content provider service contract.
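
To make the "first paragraph as description" idea a bit more concrete, here is a rough sketch in Python. It is not OpenACS code: the function name first_paragraph and the max_chars default are made up for illustration, and it assumes the plain text is already available and that paragraphs are separated by blank lines.

```python
import re


def first_paragraph(text: str, max_chars: int = 300) -> str:
    """Return the first non-empty paragraph of text, cut to max_chars.

    Hypothetical helper: assumes paragraphs are separated by blank lines.
    """
    for para in re.split(r"\n\s*\n", text):
        # Collapse line breaks and runs of whitespace inside the paragraph.
        para = " ".join(para.split())
        if para:
            return para[:max_chars]
    # No non-empty paragraph found.
    return ""
```

A content provider could store the result next to the object at upload time, so the search results page never has to run the expensive conversion just to show a short description.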

Or a parametrizable limit on the size of the text version?
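
Such a limit could look roughly like the following Python sketch. Again this is only an illustration under my own assumptions: limit_text and max_bytes are hypothetical names, and the limit is taken to be in bytes of UTF-8, since that is what matters for storage.

```python
def limit_text(text: str, max_bytes: int) -> str:
    """Truncate text so that its UTF-8 encoding fits in max_bytes.

    Hypothetical helper: max_bytes would come from a package parameter.
    """
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    # errors="ignore" drops a multi-byte character that the cut split in half.
    return data[:max_bytes].decode("utf-8", errors="ignore")
```

With a limit like this, a book-sized PDF would still be searchable on its first part and have an abstract, without the whole text ending up in the database.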