Forum OpenACS Development: ile based search

Posted by Christian Brechbuehler on
Oracle Text (AKA Intermedia) can index documents in tables, in the file system, and on the web. Our current package "search" only does tables. I want to use files directly, specifically PDF documents (Oracle understands over 100 formats).

The package static-pages goes part of the way. It makes static HTML pages searchable by duplicating the content in the database. It also allows adding comments via general-comments (gc), which seems very specific to HTML. (I noticed search is taking the approach of conversion scripts, and there are some provided for the essential formats, including PDF.) It is not quite what I want because I'd like to make more use of Oracle's capabilities.

So I'm considering creating a new package, for use with intermedia-driver and search. Any ideas, caveats, suggestions?

2: File Based Search (response to 1)
Posted by Christian Brechbuehler on
Can't seem to edit the typo in the subject.
3: Re: ile based search (response to 1)
Posted by Dave Bauer on
What I did for the original intermedia search was to store a copy of the indexed content in the site_wide_index table. This makes it simple to run queries and pull out headlines for objects, since everything is in one place. This should be working; the code is in the intermedia-driver package.

In the search package there is a search-convert-procs.tcl (http://cvs.openacs.org/cvs/openacs-4/packages/search/tcl/search-convert-procs.tcl?rev=1.2&view=log) which could be modified to allow different services to extract the text. We based our work on Oracle 8i, and using intermedia was not easy on that version.
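
To illustrate the idea of letting different services extract the text, here is a minimal sketch in Python rather than the package's Tcl. It is not the code in search-convert-procs.tcl; the names `register_extractor` and `extract_text` are hypothetical, and the only assumption is that each document arrives with a MIME type and its raw bytes.

```python
from typing import Callable, Dict

# Hypothetical registry: MIME type -> function turning raw bytes into plain text.
Extractor = Callable[[bytes], str]
_extractors: Dict[str, Extractor] = {}


def register_extractor(mime_type: str, extractor: Extractor) -> None:
    """Associate a text-extraction service with a MIME type."""
    _extractors[mime_type] = extractor


def extract_text(mime_type: str, data: bytes) -> str:
    """Dispatch to the registered extractor; unknown types yield no text."""
    extractor = _extractors.get(mime_type)
    if extractor is None:
        return ""
    return extractor(data)


# Plain text needs no conversion beyond decoding.
register_extractor(
    "text/plain",
    lambda data: data.decode("utf-8", errors="replace"),
)
```

A PDF converter, or a call out to some other service, would be registered the same way, so the indexing code would not need to know which tool handles which format.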