Forum OpenACS Q&A: Response to Full text search

Posted by Don Baccus on
The full text search I implemented exactly mimics the original simple search implemented for photo.net before Context (now InterMedia) was available (and resurrected the first time or two Context was tried, since it had a nasty habit of looping infinitely and stuff like that).

It does a simple ranking based on a list of keywords - it's not phrase-based.  The more keywords that are matched, the higher the score you get.  It doesn't weight for multiple occurrences of keywords or anything like that.  It scales the return value so it lies between 0 and 100, 0 being "no keywords matched", 100 being "all keywords matched".
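
To make the scoring concrete, here is a rough sketch of that kind of ranking. It is not the actual Tcl function; it's written in Python, the name keyword_score is made up for illustration, and the case-insensitive substring matching is an assumption on my part.

```python
def keyword_score(keywords, text):
    """Score text from 0 to 100 by the share of distinct keywords it contains.

    Hypothetical helper: matching is a case-insensitive substring test
    (an assumption), and repeated occurrences of a keyword add nothing.
    """
    wanted = {k.lower() for k in keywords if k.strip()}
    if not wanted:
        return 0
    haystack = text.lower()
    # Count each keyword at most once, no matter how often it appears.
    matched = sum(1 for k in wanted if k in haystack)
    # Scale to 0..100: 0 = no keywords matched, 100 = all keywords matched.
    return round(100 * matched / len(wanted))
```

So a post matching one of two keywords scores 50, and it still scores 50 if that one keyword shows up a dozen times.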

I suspect the simple Tcl ranking function could easily be twiddled to provide more finely tuned search results - Tcl's a lot more fun for writing this kind of code than Oracle PL/SQL, that's for sure!  The current ranking function is about 10 lines of code...

But there's no way to avoid the basic problem that this hack requires a sequential scan of the bboard table (or any table you decide to search), so it is inherently slow.  This is the major reason it is a stopgap, as it won't scale.

But as Greg mentions, it indeed is better than nothing.  photo.net survived surprisingly well with this little hack for quite some time.

Right now Ben and I are leaning towards an out-of-database solution, since a good indexing solution in the database is likely to lead to slow inserts of posts, news items, and other searchable things.  Experience with InterMedia tends to back up that point of view (if not outright poison our point of view!).

An out-of-database solution is fine, because you don't really need your search index to be ACID - if it hoses, you just rebuild it.
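
Here's a toy illustration of why rebuilding is cheap to reason about: the index is purely derived from the stored text, so it can be thrown away and regenerated at any time. This is a sketch in Python, not how swish or any real indexer works; build_index and search are hypothetical names, and whole-word matching is an assumption.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Build a word -> set-of-ids index from a dict of id -> text.

    The index holds nothing that is not already in the documents,
    so a damaged index is fixed by simply calling this again.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, keywords):
    """Return (id, score) pairs, best first; score is 0-100 as a
    percentage of the distinct keywords found in the document."""
    words = {k.lower() for k in keywords}
    if not words:
        return []
    hits = defaultdict(int)
    for word in words:
        # Only documents containing the word are touched: no full scan.
        for doc_id in index.get(word, ()):
            hits[doc_id] += 1
    scored = [(d, round(100 * n / len(words))) for d, n in hits.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

A lookup only visits the documents listed under each keyword, which is what avoids the sequential scan; the price is that the index has to be kept up to date (or periodically rebuilt) outside the database.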

I've been playing with swish and swish++...