Forum OpenACS Development: Response to OpenACS wish-list

Posted by Jerry Asher on
I thought date ranges would help with the "incremental index rebuild" situation.

Most of these indexing programs don't let you delete content from an index, but they do let you merge indexes together.  Also, most of the search programs can't use an index while it is being rebuilt, and need to be restarted once the new index is ready.

The reason I suggested date ranges is that I was thinking one way to implement the "big reindex" of a site is as lots of "little reindexes" of the site that then get merged into the current index.  I don't think that would necessarily cut down the overall work, but it would let web administrators schedule the work better.
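As a rough illustration of the "little reindexes" idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `DOCS` list, the `build_index` and `merge_indexes` helpers, and the toy word-to-document-id index stand in for whatever a real indexing program provides.

```python
from datetime import date

# Hypothetical content store: (doc_id, last_modified, text).
DOCS = [
    ("a", date(2001, 3, 1), "openacs search wish list"),
    ("b", date(2001, 3, 2), "incremental index rebuild"),
    ("c", date(2000, 7, 9), "search index merge"),
]

def build_index(docs, start, end):
    """A "little reindex": index only documents modified in [start, end)."""
    index = {}
    for doc_id, modified, text in docs:
        if start <= modified < end:
            for word in text.split():
                index.setdefault(word, set()).add(doc_id)
    return index

def merge_indexes(indexes):
    """Merge several partial indexes into one searchable index."""
    merged = {}
    for index in indexes:
        for word, doc_ids in index.items():
            merged.setdefault(word, set()).update(doc_ids)
    return merged

# Two little reindexes, one per year, merged into the current index.
idx_2000 = build_index(DOCS, date(2000, 1, 1), date(2001, 1, 1))
idx_2001 = build_index(DOCS, date(2001, 1, 1), date(2002, 1, 1))
current = merge_indexes([idx_2000, idx_2001])

# One big reindex over the same period, for comparison.
full = build_index(DOCS, date(2000, 1, 1), date(2002, 1, 1))
```

In this toy model the merged index matches the full reindex, so the total work is the same, but each date-range build can be scheduled separately.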

For instance, with a site that had five years' worth of content, where stuff that was added in the past might change or get deleted, you could implement several different strategies depending on your site's needs:

1.  Every night, reindex the whole site.

2.  Keep yearly indexes, this year's index, and today's index.  Every hour, rebuild today's index from the new stuff, and every night, reindex one year's worth of old stuff, creating a new index by merging the freshly rebuilt index with the other indexes (minus the stale copy of the index that just got rebuilt).

3.  Keep indexes on a monthly basis.  Every night, index the last month's worth of stuff, and every Sunday night, reindex the whole site.
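The nightly part of strategy 2 could be sketched roughly as follows, in Python. This is only an illustration under assumed names (`build_slice`, `merge_slices`, and `nightly_job` are hypothetical, and the index is a toy word-to-document-id map). The point is that rebuilding one year's slice and re-merging is how a deleted document eventually drops out, even though nothing is ever deleted from an index directly.

```python
from datetime import date

def build_slice(docs, year):
    """Rebuild the index for one calendar year from the live content."""
    index = {}
    for doc_id, modified, text in docs:
        if modified.year == year:
            for word in text.split():
                index.setdefault(word, set()).add(doc_id)
    return index

def merge_slices(slices):
    """Merge all per-year slices into the index the search program uses."""
    merged = {}
    for index in slices.values():
        for word, doc_ids in index.items():
            merged.setdefault(word, set()).update(doc_ids)
    return merged

def nightly_job(docs, slices, years, night):
    """Rebuild one year's slice per night, round-robin, then re-merge."""
    year = years[night % len(years)]
    slices[year] = build_slice(docs, year)  # replaces the stale slice
    return merge_slices(slices)

# Hypothetical content: (doc_id, last_modified, text).
docs = [
    ("a", date(1999, 5, 1), "old news"),
    ("b", date(2000, 6, 1), "old forum post"),
    ("c", date(2001, 2, 1), "new forum post"),
]
years = [1999, 2000, 2001]
slices = {year: build_slice(docs, year) for year in years}

# Document "b" is deleted from the site.
docs = [doc for doc in docs if doc[0] != "b"]

# Night 0 rebuilds 1999, so the stale "b" entry is still in the merged index.
after_night_0 = nightly_job(docs, slices, years, night=0)
# Night 1 rebuilds 2000, and "b" finally drops out.
after_night_1 = nightly_job(docs, slices, years, night=1)
```

The trade-off shows up directly: a deletion can stay visible for up to one full rotation through the years, in exchange for each night's job touching only one year's worth of content.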

Anyway, I think this could be done with a date range, but I don't see how a more opaque token string would help the incremental merge situation.