Forum OpenACS Development: Response to OpenACS wish-list

Posted by Kapil Thangavelu on
jerry,

when i was talking about direct access to the index, i was referring
to the 'openindex' command from the api sketch in your previous
message, from which i assumed you were trying to interface directly
to the index files (via some sort of wrapping of the various search
libraries). it's clear from your response that you are more
interested in an external search daemon with rich exposed interfaces
(possibly another aolserver).

re incremental indexing - i ran into this same problem when i was
interfacing zope to swish++. the solution i came up with was to
store ids/urls of 'live' documents in a persistent btree, filter
search results against that btree, and have a cron job periodically
rebuild the index from the documents in the btree. i'm not sure how
applicable this solution would be to an acs/aolserver integration.
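
a rough sketch of the live-document idea, in python since that was the zope side of things. this is not the code i actually ran: the class and method names are made up for illustration, and a plain in-memory set stands in for the persistent btree.

```python
class LiveDocumentFilter:
    """Tracks ids of 'live' documents and drops stale hits from search results.

    Hypothetical illustration: a plain set stands in for the persistent btree.
    """

    def __init__(self):
        self.live_ids = set()

    def add(self, doc_id):
        # Called when a document is created or published.
        self.live_ids.add(doc_id)

    def remove(self, doc_id):
        # Called when a document is deleted; the search index is not touched.
        self.live_ids.discard(doc_id)

    def filter_results(self, hits):
        # Keep only hits that are still live, preserving the engine's rank order.
        return [hit for hit in hits if hit in self.live_ids]

    def documents_to_reindex(self):
        # The list a periodic (cron-style) job would hand to the indexer
        # when it rebuilds the index from scratch.
        return sorted(self.live_ids)


live = LiveDocumentFilter()
for doc_id in ("/docs/a", "/docs/b", "/docs/c"):
    live.add(doc_id)

live.remove("/docs/b")  # deleted after the last index rebuild

# A stale index may still return the deleted document.
stale_hits = ["/docs/b", "/docs/a"]
print(live.filter_results(stale_hits))
```

the filter only hides deleted documents between rebuilds. a document added after the last rebuild is in the live set but not in the index, so it will not show up in results until the next rebuild runs.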

regarding choosing among the various search engines, i think there
should be some consideration of the various formats that the engines
can index. i think this becomes more relevant when cms integration
is considered and there exists the possibility of pdfs, ps, word docs,
etc. that should be integrated with the search. from a casual look
at the other referenced engines, only swish++ seems to support
indexing these types of docs without the aid of external (outside of
the distribution) programs and hacking. i might well be wrong about
this; if anyone knows otherwise i'd like to know.

i'm curious about what kind of integration you would envision between
an external search engine and the database? seems to me a lot of the
proper indexing behavior is very application specific.

kryszstof,

you're right about adding persistent sockets to the swish++ search
daemon; the overhead in the system is very low for a new connection,
esp. when the number of clients is <= the number of preallocated
threads. my informal stress testing indicated that the bottleneck
was not swish++ but the webserver.