Forum OpenACS Q&A: Searching a PostgeSQL database

Hello: What are my options as of now for doing a text search of a
PostgreSQL database? I'm interested in free as well as commercial
software packages. Comments anyone? Thanks.
Collapse
Posted by Don Baccus on
Using "like" on your varchar (or lztext) contents is the only way, at the moment.  Be patient, solving this problem is near the top of the list for the OpenACS project.

The problem for web work is that solutions that depend entirely on the  database for building and containing an index for searching tend to be slow, complex, and troublesome.  Oracle's InterMedia, for instance.  Photo.net's not automatically indexed at this point, but rather is done by hand every couple of weeks, because of InterMedia problems.

There is something out there called PLS, which is also owned by AOL.  It builds indices on its own for searching.  Since these can easily be  reconstructed from the database, there's really no need for search indices to be maintained in the database.  PLS has the reputation of being very fast and very good.

It's not open source and not supported under Linux, at the moment.  There is some hope that AOL might open source it.  If not, we need to search for another solution.

Collapse
Posted by Albert Langer on
Not sure if this helps, but can't do much harm.

Postgresql has a contributed function for doing substrings searches more efficiently than LIKE. I know someone who wrote an alternative to it in PGPLSQL or whatever the procedural language for "triggers" is called, using a faster algorithm and accepting substring queries in the same format as LDAP. Works by indexing each of the 10 rotated versions of a 10 letter word separately so that a search for '*SUB' or '*SUB*' will be just as fast as for 'SUB*'. Doesn't need to search the whole table, just looks up the (much larger) index. So searches are dramatically faster (but inserts slower and much more disk space, though the latter is pretty irrelevant these days).

Last I saw it was working but hadn't been packaged up for release or integrated as an internal Postgresql function. I'm sure he'll be happy to pass it on "as is". If anyone is interested, let me know.

Doesn't deal directly with full text searching, only relatively small text fields within database. But could be relevant if people are looking at different ways to do search related things.

Collapse
Posted by Don Baccus on
That's a cute approach because it lets you use an index on the search rather than a sequential scan.  However, the size of the table will get huge for really busy sites that want to index full content, and you'll run into the 2 GB table size limit fairly quickly, I fear.

So the technique may have limited applicability for a general solution.  For the right scenario, though, it could work really well.