Forum OpenACS Q&A: Search indexer slows down entire OACS?

On a vanilla OACS 4.5 / PostgreSQL 7.2.1 install, I added about 2 GB of static content (a mix of HTML and image files) to a subsite and let the "Static pages" module loose on that subsite. The initial indexing run took a couple of days, I think (fortunately, I was able to escape on vacation). After starting it, the site was usable only once I re-niced the running nsd and postmaster processes to a lower priority.

Now, after a database and server restart, it looks like search_indexer is taking up all of the machine's time (and, over time, memory too). Even with re-nicing, many pages on the site (especially the Site Map) are unusable.

The log shows many instances of this...

NOTICE:  identifier "acs_object_util__get_object_type" will be truncated to "acs_object_util__get_object_typ"

... and, more seldom, this:

[11/Dec/2002:16:06:13][16005.2051][-sched-] Error: Ns_PgExec: result status: 7 message: ERROR:  Cannot insert a duplicate key into unique index index12_key
[11/Dec/2002:16:06:13][16005.2051][-sched-] Error: Aborting transaction due to error:
Database operation "dml" failed

I am only guessing here, but it seems as if these messages come out of the search_indexer procedure...

My guess at this time is that the initial indexer run did not finish at all, and search_indexer is trying to look at the whole 2 gigs of HTML and images again. Is this really supposed to be this slow, and is patience my only hope? Or can I do anything to make this go faster?

Alternatively, how can I pull the 2 gigs of stuff out of OACS without breaking data integrity or anything else? Is it enough to simply move the directory to a different name?

Thanks for any help!

Helge

Posted by Jeff Davis on
The truncation message is just an informational message (of dubious value) from PostgreSQL. We see the second one on openacs.org with some frequency, but I have never tried to track down why it happens; indexing does seem to continue after it, though. The indexer is not terribly fast, however, so what you might want to do is copy the search_observer_queue to another table, truncate it, and feed things back into it during off hours for indexing.
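
A minimal sketch of that copy, empty, and re-feed approach follows. It assumes the queue has an object_id column; search_observer_queue_parked and queue_batch are hypothetical names, and the batch size of 500 is arbitrary. It uses DELETE rather than TRUNCATE so that emptying the queue can sit in the same transaction as the copy, which older PostgreSQL releases may not allow for TRUNCATE.

```sql
-- Step 1: park the pending entries (run once).
BEGIN;
-- Block concurrent inserts so nothing arrives between the copy and the delete.
LOCK TABLE search_observer_queue IN SHARE ROW EXCLUSIVE MODE;
-- Copy everything aside; the column order of the copy matches the original.
CREATE TABLE search_observer_queue_parked AS
    SELECT * FROM search_observer_queue;
-- Empty the live queue so the scheduled indexer finds nothing to do.
DELETE FROM search_observer_queue;
COMMIT;

-- Step 2: during off hours, feed one batch back (repeat as needed).
BEGIN;
-- Pick a batch of objects (assumes an object_id column exists).
CREATE TEMP TABLE queue_batch AS
    SELECT DISTINCT object_id FROM search_observer_queue_parked LIMIT 500;
-- Move every parked entry for those objects back into the live queue.
INSERT INTO search_observer_queue
    SELECT * FROM search_observer_queue_parked
    WHERE object_id IN (SELECT object_id FROM queue_batch);
DELETE FROM search_observer_queue_parked
    WHERE object_id IN (SELECT object_id FROM queue_batch);
DROP TABLE queue_batch;
COMMIT;
```

Batching by object rather than by row keeps all queued events for one object together, so none are dropped if an object appears in the queue more than once.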

I think indexing 5000 bboard posts took about 3 hours on openacs.org (although I was not watching it that carefully). Not sure how that scales with size or with number of items though.

Posted by Dave Bauer on
Helge,

You can stop the indexer by doing

<pre>
-- Removes every pending entry, so the indexer has nothing left to process.
DELETE FROM search_observer_queue;
</pre>

which will remove all the items pending in the queue.
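
Before clearing the queue, it may be worth seeing how large the backlog actually is. A minimal check, assuming nothing beyond the table name used above:

```sql
-- Number of entries still waiting to be indexed.
SELECT count(*) FROM search_observer_queue;
```

Running it a few minutes apart should show whether the indexer is making progress or the queue is simply not shrinking.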

It also looks like you have a copy of OpenFTS 0.2 with a bug in it.

Check this thread: https://openacs.org/forums/message-view?message_id=52363

Your best bet if you have that bug is to fix it and restart AOLserver, which should allow the indexer to finish.

Posted by Dave Bauer on
Jeff's idea is better, but check for the bug.

Posted by Helge Wilker on
Thank you for your hints. My OpenFTS did indeed have that bug. However, the indexing did not finish and the machine continued to be heavily loaded.

I tried moving the entries from search_observer_queue, but that did not help either. As a last resort, I truncated the static_pages table and disabled the "Static pages" package.

The machine load is back to zero now -- page loads, especially for admin pages, are still very slow, though. I guess I'll have to hunt for some tuning tricks now. And I do hope that having the database on a RAID-5 file system was not such a bad idea...

Cheers,
Helge

Posted by Jeff Davis on
Did you do a vacuum after doing the big insert?
I guess it's too late now but that might have helped.
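
For reference, a sketch of what that would look like after a bulk load. The table names are the ones mentioned in this thread; which tables actually need it on a given install is an assumption.

```sql
-- VACUUM cannot run inside a transaction block, so issue these on their own.

-- Reclaim dead rows and refresh planner statistics for the whole database.
VACUUM ANALYZE;

-- Or restrict it to the tables touched by the bulk load.
VACUUM ANALYZE static_pages;
VACUUM ANALYZE search_observer_queue;
```

The ANALYZE part is what refreshes the planner's statistics, which matters most right after a table has grown from nearly empty to very large.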
Posted by Helge Wilker on
Yes, VACUUM... I must have been dreaming about "VACUUM not necessary anymore in PostgreSQL". Of course it was too late when I thought about vacuuming the database. Now I'm in the middle of uninstalling "Static pages" completely (the automatic uninstall broke).

My plan is to try all this (installing "Static pages", creating static content, indexing it, ...) again with a somewhat smaller set of data.

Again, thanks for your help!

Cheers,
Helge

Posted by Bart Teeuwisse on

Hmmm. This might be an old bug resurfacing. I'm running OpenACS 4.6 with OpenFTS 0.3.2 in combination with PG 7.3. And I'm experiencing the exact same problem:

Cannot insert a duplicate key into unique index index10_key

I've verified that this is not caused by the OpenFTS 0.2 bug. Moreover, I'm experiencing this problem with both OpenFTS 0.3.1 and 0.3.2.

It turns out that I'm only experiencing this problem in PG 7.3. Dan, have you tried OpenFTS with PG 7.3?

/Bart

Posted by Dan Wickstrom on
Bart,

I haven't yet tried openfts with pg 7.3, so it's possible there are some problems there.  pg 7.3 also doesn't work with openacs, so I was going to wait until the pg 7.3 fixes were merged in with openacs before fixing openfts.