Forum OpenACS Q&A: postmaster eating 99.9% of CPU

Hello everybody,

We have been developing a site using OpenACS and, strangely, now that it is being used for real in a production environment (heavy but not overwhelming resource usage), annoying things have started to happen. The postmaster suddenly eats 99.9% of the CPU, and AOLserver, while not *logically* down, is unable to answer requests and is thus *actually* down. This lasts until we issue a killall nsd followed by a /etc/rc.d/init.d/postgres restart. We are using OACS 4.6.3 with Postgres 7.2.2 and Red Hat on a dual-processor server with 2 GB of RAM.

The PG log says:

2003-12-09 16:38:17 DEBUG: pq_recvbuf: unexpected EOF on client connection
2003-12-09 16:38:17 DEBUG: pq_recvbuf: unexpected EOF on client connection
2003-12-09 16:38:17 DEBUG: pq_recvbuf: unexpected EOF on client connection
2003-12-09 16:38:17 DEBUG: pq_recvbuf: unexpected EOF on client connection
2003-12-09 16:38:52 DEBUG: fast shutdown request
2003-12-09 16:38:52 DEBUG: aborting any active transactions
2003-12-09 16:38:52 FATAL 1: This connection has been terminated by the administrator.
2003-12-09 16:38:52 FATAL 1: This connection has been terminated by the administrator.
2003-12-09 16:38:52 DEBUG: pq_flush: send() failed: Broken pipe
2003-12-09 16:38:52 DEBUG: pq_flush: send() failed: Broken pipe
2003-12-09 16:38:52 FATAL 1: This connection has been terminated by the administrator.
2003-12-09 16:38:52 DEBUG: pq_flush: send() failed: Broken pipe
2003-12-09 16:38:52 FATAL 1: This connection has been terminated by the administrator.
2003-12-09 16:38:52 DEBUG: pq_flush: send() failed: Broken pipe
2003-12-09 16:38:52 FATAL 1: This connection has been terminated by the administrator.
2003-12-09 16:38:52 DEBUG: pq_flush: send() failed: Broken pipe
2003-12-09 16:38:53 DEBUG: shutting down
2003-12-09 16:38:55 DEBUG: database system is shut down

These odd things happen, say, every 3-4 hours and are apparently related to the server's activity.
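
To check how these bursts line up with the server's activity, the log can be tallied per minute. Here is a minimal Python sketch, assuming the log format shown above; the name tally_log and the sample lines fed to it are only for illustration, and the pattern may need adjusting for other log settings.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpt above:
#   2003-12-09 16:38:17 DEBUG: pq_recvbuf: unexpected EOF on client connection
#   2003-12-09 16:38:52 FATAL 1: This connection has been terminated by the administrator.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2} "  # timestamp, captured up to the minute
    r"([A-Z]+)(?: \d+)?: "                      # severity, with an optional numeric level
    r"(.*)$"                                    # message text
)


def tally_log(lines):
    """Count (minute, severity, message) triples; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.groups()] += 1
    return counts


# Illustrative input: a few lines copied from the log excerpt.
sample = [
    "2003-12-09 16:38:17 DEBUG: pq_recvbuf: unexpected EOF on client connection",
    "2003-12-09 16:38:17 DEBUG: pq_recvbuf: unexpected EOF on client connection",
    "2003-12-09 16:38:52 DEBUG: fast shutdown request",
    "2003-12-09 16:38:52 FATAL 1: This connection has been terminated by the administrator.",
]

for (minute, severity, message), count in sorted(tally_log(sample).items()):
    print(minute, severity, count, message)
```

Run against the full log file (for example by passing an open file object to tally_log), this would show whether the "unexpected EOF" messages pile up just before each hang or are spread evenly over the day.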

Recently, we changed vacuumdb to run every hour instead of once a day, as a little searching with Google indicated that the problem might be one of resource availability. We also increased the max_connections value from 32 to 256. But these steps made no difference; maybe other parameters should be changed, such as stack size, vacuum memory, etc. I have read a few posts here and there, including https://openacs.org/forums/message-view?message_id=27321 on this forum, and it seems that the "unexpected EOF on client connection" message is quite common. I have also learned that this could be triggered by a particular request, and I have heard of problems with (certain versions of) the AOLserver PostgreSQL driver as well.

Knowing this issue may already have been dealt with in a post, update, or bugfix, we would really appreciate it if someone very good at pgsql/aolserver, or who has already had this kind of problem, could give us a few pointers to understand and solve it.

Thanks in advance.
Best regards,

Thomas Senn
Startforyou.com