Forum OpenACS Development: Response to OpenACS wish-list

Posted by Kapil Thangavelu on
More thoughts on text indexing.

- It's probably important to differentiate between AOLserver-level and
ACS-level solutions.

- An ACS solution should probably have some integration with the CMS,
as this seems to be the common store for application content data and
already holds a lot of 'free' information (e.g. MIME types). CMS
integration might also lower the burden on application programmers of
adding and maintaining search capabilities.

- Conversion of non-text to text: while there are a lot of third-party
tools for converting any particular document format, swish++ (as
already mentioned in this thread) comes with a tool to extract text
from binary data. I think this offers a great deal of convenience,
especially since third-party tools like wvWare can be a pain to
compile on a server because they have lots of nested dependencies.

- Aaron Swartz suggested Lucene (lucene.sourceforge.net) as a possible
indexing mechanism, and I'm pretty impressed by its capabilities: a
1 MB indexing heap, fast indexing, updates to indexes while they are
being searched, merged searches across multiple indexes, flexibility
in document definition, and path-limiting queries (in CVS). It would
need a socket server interface, or perhaps XML-RPC, to be useful from
AOLserver.

- It's also probably worthwhile to check out the ACS 5 take on
searching and search metadata:
http://developer.arsdigita.com/acs-java/doc/services/search/doc/index.html
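To make the CMS and text-conversion points concrete, here is a minimal Python sketch of an indexer that uses the MIME type the CMS already stores to choose a text converter. The function names and the `CONVERTERS` mapping are hypothetical. The fallback, which keeps runs of printable ASCII roughly like the Unix `strings` utility, is a crude stand-in for a real extractor such as the one bundled with swish++. It does not describe how that tool actually works.

```python
import html
import re
from typing import Callable, Dict

def plain_text(data: bytes) -> str:
    """text/plain: decode, replacing undecodable bytes."""
    return data.decode("utf-8", errors="replace")

def html_to_text(data: bytes) -> str:
    """text/html: drop script/style bodies and tags, unescape entities."""
    text = data.decode("utf-8", errors="replace")
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", text)
    text = re.sub(r"<[^>]+>", " ", text)
    return " ".join(html.unescape(text).split())

def printable_runs(data: bytes, min_len: int = 4) -> str:
    """Fallback for binary formats (assumption, not swish++'s method):
    keep runs of printable ASCII at least min_len bytes long."""
    runs = re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
    return " ".join(run.decode("ascii") for run in runs)

# Hypothetical mapping from the MIME type stored in the CMS to a converter.
CONVERTERS: Dict[str, Callable[[bytes], str]] = {
    "text/plain": plain_text,
    "text/html": html_to_text,
}

def extract_text(mime_type: str, data: bytes) -> str:
    """Pick a converter by MIME type, ignoring parameters such as charset."""
    base = mime_type.split(";", 1)[0].strip().lower()
    return CONVERTERS.get(base, printable_runs)(data)
```

Because the converter is chosen from data the CMS already holds, an application would only need to hand over content and its MIME type. Supporting a new format then means adding one entry to the mapping rather than touching each application.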
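The Lucene point notes that it would need a socket server or XML-RPC interface to be usable from AOLserver. The following Python sketch shows the general shape of such a wrapper, built on the standard library's `xmlrpc.server`. Lucene itself is not used. `ToyIndex` is a hypothetical in-memory inverted index standing in for it, and the method names (`index_document`, `delete_document`, `search`) and port 8088 are illustrative assumptions rather than Lucene's API.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set
from xmlrpc.server import SimpleXMLRPCServer

TOKEN = re.compile(r"\w+")

class ToyIndex:
    """Hypothetical stand-in for Lucene: an in-memory inverted index.
    Public methods are exposed over XML-RPC; underscore names are not."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)  # term -> doc ids
        self._docs: Dict[str, Set[str]] = {}                    # doc id -> terms

    def index_document(self, doc_id: str, text: str) -> bool:
        self.delete_document(doc_id)  # re-indexing replaces the old terms
        terms = {t.lower() for t in TOKEN.findall(text)}
        self._docs[doc_id] = terms
        for term in terms:
            self._postings[term].add(doc_id)
        return True  # XML-RPC cannot marshal None unless allow_none is set

    def delete_document(self, doc_id: str) -> bool:
        for term in self._docs.pop(doc_id, set()):
            self._postings[term].discard(doc_id)
        return True

    def search(self, query: str) -> List[str]:
        """Return ids of documents containing every query term (AND)."""
        terms = [t.lower() for t in TOKEN.findall(query)]
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)

def serve(host: str = "127.0.0.1", port: int = 8088) -> None:
    """Block forever, answering XML-RPC calls against one shared index."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_instance(ToyIndex())
    server.serve_forever()
```

After `serve()` is started in its own process, any XML-RPC client could call it. From Python, for example, that would be `xmlrpc.client.ServerProxy("http://127.0.0.1:8088/").search("lucene")`. On the AOLserver side this would need a Tcl XML-RPC client, which is not shown here. A real deployment would replace `ToyIndex` with a small Java process that forwards the same calls to Lucene.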