Forum OpenACS Q&A: Full text search

Posted by mark rohn on
I see that full text searching is now working. Is full text search included in version 3.2.4? Also, how is full text searching done?

Thanks in advance

Posted by Gregory McMullan on

See the thread of two days ago from Don Baccus entitled "Simple search tool available for bboard module" at http://new.openacs.org/bboard/q-and-a-fetch-msg.tcl?msg_id=0000Tz&topic_id=11&topic=OpenACS - Don says that this is only a stopgap measure, but it is clearly immensely better than nothing.

Posted by Don Baccus on
The full text search I implemented exactly mimics the original simple search implemented for photo.net before Context (now InterMedia) was available (and resurrected the first time or two Context was tried, since it had a nasty habit of looping infinitely and stuff like that).

It does a simple ranking based on a list of keywords - it's not phrase-based.  The more keywords that are matched, the higher the score you get.  It doesn't weight for multiple occurrences of keywords or anything like that.  It scales the return value so it lies between 0 and 100, 0 being "no keywords matched", 100 being "all keywords matched".
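
A minimal sketch of the ranking described above, written in Python rather than the original Tcl. The function name `keyword_score`, the case-insensitive substring matching, and the rounding are assumptions made for illustration; the actual OpenACS function may differ in all of these.

```python
def keyword_score(keywords, text):
    """Return a score from 0 to 100: the share of distinct keywords found in text.

    Hypothetical helper, not the OpenACS code. Repeated occurrences of a
    keyword do not raise the score; only presence or absence counts.
    """
    # Normalise keywords and drop empty ones (assumption: matching ignores case).
    wanted = {k.strip().lower() for k in keywords if k.strip()}
    if not wanted:
        return 0
    haystack = text.lower()
    # Count how many distinct keywords appear anywhere in the text.
    matched = sum(1 for k in wanted if k in haystack)
    # Scale so that 0 means no keywords matched and 100 means all matched.
    return round(100 * matched / len(wanted))
```

For example, with the keywords "full", "text" and "search", a post containing only "full" and "text" matches two of three keywords and scores 67.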

I suspect the simple Tcl ranking function could easily be twiddled to provide more finely-tuned search results - Tcl's a lot more fun for writing this kind of code than Oracle PL/SQL, that's for sure!  The current ranking function is about 10 lines of code...

But there's no way to avoid the basic problem that this hack requires a sequential scan of the bboard table (or any table you decide to search), so it is inherently slow.  This is the major reason it is a stopgap: it won't scale.

But as Greg mentions, it indeed is better than nothing.  photo.net survived surprisingly well with this little hack for quite some time.

Right now Ben and I are leaning towards an out-of-database solution, since a good indexing solution in the database is likely to lead to slow inserts of posts, news items, and other searchable things.  Experience with InterMedia tends to back up that point of view (if not outright poison our point of view!)

An out-of-database solution is fine, because you don't really need your search index to be ACID - if it hoses, you just rebuild it.
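
To illustrate why a disposable index is acceptable, here is a rough sketch in Python of a keyword index that lives outside the database and can be rebuilt from the source rows at any time. The names `build_index` and `lookup`, the whitespace tokenising, and the dict-of-posts input are all assumptions for illustration; this is not how swish or swish++ work internally.

```python
from collections import defaultdict


def build_index(posts):
    """Build a word -> set of post ids mapping from a dict of post id -> text.

    Hypothetical helper. The index is derived entirely from the posts, so if
    it is lost or corrupted it can simply be rebuilt by calling this again.
    """
    index = defaultdict(set)
    for post_id, text in posts.items():
        # Assumption: words are split on whitespace and compared in lower case.
        for word in text.lower().split():
            index[word].add(post_id)
    return index


def lookup(index, keywords):
    """Return the ids of posts containing at least one of the keywords."""
    hits = set()
    for k in keywords:
        # .get avoids adding empty entries to the defaultdict on a miss.
        hits |= index.get(k.lower(), set())
    return hits
```

A lookup touches only the entries for the requested keywords instead of scanning every post, and the database remains the single source of truth for the posts themselves.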

I've been playing with swish and swish++...

Posted by mark rohn on
What about PLS?  Now that it’s gone open source, is it not now an option?

Posted by Krzysztof Kowalczyk on
I took a look at the PLS web site and didn't get the impression that it's open source. Quite the contrary. You can download binaries (after registration), but the license states that "no disassembly or reverse engineering allowed". The source code download states:

"The PLWeb Turbo 3.0 source code distribution allows you to rebuild the application-level binaries of PLWeb Turbo 3.0. The libraries that are part of the search engine (CPL) are provided as binaries in the CPL 6.3 binary distribution and cannot be rebuilt (i.e., no source code is provided)."

In summary: it's not open-source software, and nothing suggests that it ever will be.

Posted by mark rohn on
Sorry, you are correct: PLS is not open source, but it is free. The license does, however, read as follows.

2. License Grant. AOL hereby grants You a world-wide, royalty-free, non-exclusive license (a) to use, reproduce, sublicense and distribute the Licensed Software, including as part of one or more Integrated Works; (b) to provide support and maintenance to a third-party in the use of the Licensed Software and in the development and use of one or more Integrated Works; and (c) to use, reproduce, sublicense and distribute the Documentation in connection with all of the foregoing. Perl scripts incorporated with the Licensed Software may be used and modified to facilitate use of the Licensed Software or Integrated Works.

It may not be open source, but free is still free.

Thanks
Mark Rohn

Posted by Krzysztof Kowalczyk on
Well, not really. The biggest problem is this: suppose OpenACS' implementation of search is based on pls, and pls then gets ditched by AOL. (The fact that they're only providing RedHat 5.2 rpms, which leaves out Debian and probably other distributions, doesn't give me a lot of confidence in their continuing support.) In a few months the whole work would be obsolete, because the old binaries would no longer work and you couldn't fix them yourself.

However, as the (real) Free Software philosophy goes: you've got the source (to OpenACS), and nothing can stop you from providing an OpenACS search interface for pls.