Forum OpenACS Q&A: postmaster eating 99.9% of CPU

Posted by Thomas Senn on
Hello everybody,

We have been developing a site with OpenACS, and now that it is being used for real in a production environment (heavy but not overwhelming resource usage), annoying things have started to happen. The postmaster suddenly eats 99.9% of CPU, and AOLserver, while not *logically* down, is unable to answer requests and is thus *actually* down. This lasts until we issue a killall nsd followed by /etc/rc.d/init.d/postgres restart. We are using OACS 4.6.3 with Postgres 7.2.2 and Red Hat on a dual-processor server with 2 GB of RAM.

The PG log says:

2003-12-09 16:38:17 DEBUG:  pq_recvbuf: unexpected EOF on client connection
2003-12-09 16:38:17 DEBUG:  pq_recvbuf: unexpected EOF on client connection
2003-12-09 16:38:17 DEBUG:  pq_recvbuf: unexpected EOF on client connection
2003-12-09 16:38:17 DEBUG:  pq_recvbuf: unexpected EOF on client connection
2003-12-09 16:38:52 DEBUG:  fast shutdown request
2003-12-09 16:38:52 DEBUG:  aborting any active transactions
2003-12-09 16:38:52 FATAL 1:  This connection has been terminated by the administrator.
2003-12-09 16:38:52 FATAL 1:  This connection has been terminated by the administrator.
2003-12-09 16:38:52 DEBUG:  pq_flush: send() failed: Broken pipe
2003-12-09 16:38:52 DEBUG:  pq_flush: send() failed: Broken pipe
2003-12-09 16:38:52 FATAL 1:  This connection has been terminated by the administrator.
2003-12-09 16:38:52 DEBUG:  pq_flush: send() failed: Broken pipe
2003-12-09 16:38:52 FATAL 1:  This connection has been terminated by the administrator.
2003-12-09 16:38:52 DEBUG:  pq_flush: send() failed: Broken pipe
2003-12-09 16:38:52 FATAL 1:  This connection has been terminated by the administrator.
2003-12-09 16:38:52 DEBUG:  pq_flush: send() failed: Broken pipe
2003-12-09 16:38:53 DEBUG:  shutting down
2003-12-09 16:38:55 DEBUG:  database system is shut down

These odd things happen, say, every 3-4 hours and are apparently related to the server's activity.
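
To check whether the log messages really line up with the periods of heavy activity, something like the following could be used to count messages per minute. This is only a minimal sketch: it assumes every log line has the same "date time LEVEL: message" layout as the excerpt above, and the names LOG_LINE, summarize and SAMPLE are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken from the log excerpt above:
#   2003-12-09 16:38:17 DEBUG:  pq_recvbuf: unexpected EOF on client connection
#   2003-12-09 16:38:52 FATAL 1:  This connection has been terminated ...
# Group 1: timestamp truncated to the minute, group 2: level, group 3: message.
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\s+([A-Z]+(?: \d+)?):\s+(.*)$"
)


def summarize(lines):
    """Count log entries per (minute, level, message); skip unparsable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[match.groups()] += 1
    return counts


# A few lines copied from the excerpt above, plus one line that does not parse.
SAMPLE = [
    "2003-12-09 16:38:17 DEBUG:  pq_recvbuf: unexpected EOF on client connection",
    "2003-12-09 16:38:17 DEBUG:  pq_recvbuf: unexpected EOF on client connection",
    "2003-12-09 16:38:52 FATAL 1:  This connection has been terminated by the administrator.",
    "this line is not in the expected format",
]

counts = summarize(SAMPLE)
for (minute, level, message), n in sorted(counts.items()):
    print(minute, level, n, message)
```

For a real log, an open file object can be passed to summarize() instead of SAMPLE, and the per-minute counts compared with the AOLserver access log for the same period.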

Recently, we changed vacuumdb to run every hour instead of once a day, as a quick Google search suggested that the problem might be one of resource availability. We also increased the max_connections value from 32 to 256. These steps made no difference; maybe other parameters should be changed, such as stack size, vacuum memory, etc. I read a few posts here and there, including https://openacs.org/forums/message-view?message_id=27321, and it seems that the "unexpected EOF on client connection" message is quite common. I've also learned that this could be triggered by a particular request, and I've heard of problems with certain versions of the AOLserver PostgreSQL driver as well.

Since this issue may already have been covered in a post, update, or bugfix, we would really appreciate it if someone very good at pgsql/aolserver, or who has already had this kind of problem, could give us a few pointers to understand and solve it.

Thanks in advance.
Best regards,

Thomas Senn
Startforyou.com

Posted by Thomas Senn on
A clarification: we are actually using version 7.2.4 of PostgreSQL.
Posted by Jeff Davis on
I think you should check what requests are being served
on the server next time it is pegged (you can use
https://openacs.org/packages/t2.adp.txt to do that).

I fixed a couple of infinite loops in stored procedures
on the 5.0 branch but I doubt that is what is causing your
problem.