Forum OpenACS Q&A: Full Text Indexing - External prog a solution?

1: Full Text Indexing - External prog a solution?

Posted by Patrick Giagnocavo on 05/31/00 08:49 PM

Hello,

I am working on setting up an OpenACS site, and I notice that full text
indexing is not yet available.

I am curious if the following might work:

1. Using wget or another page-grabbing program, pull down all dynamic
and static content into a directory and its sub-directories (not
necessarily under the pageroot). A cron job could grab the latest
content every few hours.

2. Run an indexing program on those pages.

3. When the indexing program is called via a Web form, modify the
results returned so that they point to the dynamic content, not the
static copies.
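
Step 3 could be sketched roughly as follows. This is only an illustration in Python, not part of any existing OpenACS or Isearch code. It assumes the mirror keeps the same relative paths as the live site (for example, a recursive wget run saved without a host directory), and the names mirror_path_to_url, mirror_root, and site_base are hypothetical.

```python
from pathlib import PurePosixPath


def mirror_path_to_url(hit_path, mirror_root, site_base):
    """Map a file path returned by the indexer to the live URL.

    Assumption: the mirror directory reproduces the site's URL paths
    one-to-one, so the path relative to mirror_root is also the path
    on the live server.
    """
    # relative_to() raises ValueError if hit_path is not under mirror_root,
    # which guards against hits from outside the mirror.
    rel = PurePosixPath(hit_path).relative_to(PurePosixPath(mirror_root))
    return site_base.rstrip("/") + "/" + rel.as_posix()


if __name__ == "__main__":
    # Hypothetical paths and host name, for illustration only.
    print(mirror_path_to_url(
        "/var/mirror/bboard/index.tcl",
        "/var/mirror",
        "http://example.com/",
    ))
```

The search form's result page would run each hit through a function like this before printing the link, so that users land on the dynamic page rather than the stale copy.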

Has anyone done this already? It might be a cheap hack, but if it
works it will be fine for now.

I have been looking at the Isearch program
(http://www.cnidr.org/ir/isearch.html) but have not yet compiled it.
It appears to be completely free.

./p

2: Response to Full Text Indexing - External prog a solution? (response to 1)

Posted by Aaron Swartz on 05/31/00 10:33 PM

I'm using ht://dig (www.htdig.org) to do something like this. Basically, the ht://dig configuration file lets you specify a "start" page and the depth of links to follow from it. So I created a Tcl page that links to all of the useful content on the site and set the link depth to one. So far it works pretty well.
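
For anyone who wants to try the same idea, here is a minimal sketch of such a "start" page generator. The page described above was written in Tcl; this is a Python stand-in, not that page. The function name build_seed_page and the example links are hypothetical, and in practice the list of links would come from a database query over the site's content.

```python
from html import escape


def build_seed_page(links, title="Site index for the crawler"):
    """Return an HTML page with one link per content item.

    links is a list of (url, text) pairs. With the crawler's link
    depth set to one, every page listed here gets indexed and nothing
    beyond it is followed.
    """
    items = "\n".join(
        # Escape both the URL and the link text so that characters
        # such as & in query strings produce valid HTML.
        '<li><a href="{}">{}</a></li>'.format(escape(url, quote=True), escape(text))
        for url, text in links
    )
    return (
        "<html><head><title>" + escape(title) + "</title></head>\n"
        "<body><ul>\n" + items + "\n</ul></body></html>\n"
    )


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    print(build_seed_page([
        ("/bboard/q-and-a.tcl?topic=web&sort=date", "Web Q&A"),
        ("/doc/install.html", "Installation guide"),
    ]))
```

The crawler is then pointed at this one page, which keeps it from wandering through admin pages or following endless dynamic links.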