Forum OpenACS Q&A: Immune aolserver-error process

Posted by Hans Gaasenbeek on
Hello!

I have 6 instances of OpenACS running (RH 7.1, the rpm version of
OpenACS), and after a few days' use I can see a process called
"aolserver-error" in top, taking up all CPU resources when the system
is idle. It cannot be killed by # killall -9 aolserver-error (or by
pid), nor by restarting aolserver or postgres.

Does anybody have an idea how to get rid of it, apart from rebooting
the system, and how to prevent it from coming back? I suspect it has
to do with PostgreSQL refusing connections from AOLserver when too
many clients are already connected, in which case I should raise the
maximum number of clients.

2: (bump!) (response to 1)
Posted by Cathy Sarisky on
I discovered aolserver-error eating CPU resources today.  Fortunately it proved killable by kill pid (while root).  A restart did not kill it.

Hans, did you ever get this problem figured out?

I'm running OpenACS 3.2.5 (from RH rpms, on a Mandrake 8.0 system).  I'm doing virtual hosting with Jerry's vhosting module.

Anyone have any ideas on this one?  I like the emailed errors, since I can't sit next to my server and watch it all day, but having aolserver-errors eat all my CPU is not good!

Posted by Don Baccus on
The script aolserver-errors.pl in bin loops infinitely in some cases (on some log files).  We got bit by that on the openacs.org server not long ago. It's run by "watchdog".

I've not had time to look into this; it would sure be nice if someone who loves Perl would find the problem and fix it.

Posted by Steve Woodcock on

Don, here's the patch we made to aolserver-errors.pl to fix the infinite loop problem. I'll post it elsewhere if this gets mangled...

Index: aolserver-errors.pl
===================================================================
RCS file: /usr/local/cvsroot/intranet/packages/scholastic-core/bin/aolserver-errors.pl,v
retrieving revision 1.1
retrieving revision 1.1.2.1
diff -u -r1.1 -r1.1.2.1
--- aolserver-errors.pl 2002/01/31 16:37:39     1.1
+++ aolserver-errors.pl 2002/02/01 22:50:49     1.1.2.1
@@ -90,9 +90,21 @@
     $start_time = sprintf "%02d%02d%02d%02d", (localtime(time - (60*$num_minutes)))[4,3,2,1];

     seek LOG, -$bite_size, 2;
+    $search_from = tell(LOG);

     while (1) {
         while (<LOG>) {
+
+           # Don't search past where we got to last time, rewind to
+           # where we started and leave (next iter will go back
+           # another chunk)
+           if (defined $search_to) {
+               if (tell(LOG) >= $search_to) {
+                   seek LOG, -$bite_size, 1;
+                   last;
+               }
+           }
+
             if (/^\[([0-9]+)\/([A-Za-z]+)\/([0-9]+):([0-9]+):([0-9]+)/) {
                 my($day, $month_name, $year, $hour, $minute) = ($1, $2, $3, $4, $5);

@@ -122,7 +134,15 @@
                     # the end of the file. If it's the second case, we
                     # need to set the starting point to the end of the file.
                     $starting_point = $last_position unless $starting_point;
-                }
+                } else {
+                   # We found a dated entry but we need to go further
+                   # back. Go back from where we started from last
+                   # time, to avoid infinite loop where the whole
+                   # $bite_size doesn't have any dates
+
+                   $rewind = (tell(LOG) - $search_from); # how far to rewind to get back to search_from
+                   seek LOG, -$rewind, 1;
+               }
                 # We only need to get one time stamp
                 last;
             }
@@ -130,11 +150,11 @@

         last if defined $starting_point;

+       $search_to = tell(LOG); # don't bother searching after where we are now
         seek LOG, -$bite_size, 1;
-
-        $position = tell LOG;
+       $search_from = tell(LOG);

-        if ($position < $bite_size) {
+        if ($search_from < $bite_size) {
             # then we need to read the entire file
             $starting_point = 0;
             last;

Posted by Steve Woodcock on

Oops, the backslashes didn't make it so I put a copy of the whole thing in file storage https://openacs.org/new-file-storage/one-file?file_id=306. Sorry for the noise.

Posted by Cathy Sarisky on
Thank you Steve!  I'll give this a try.
Posted by Don Baccus on
Hey, Steve, thanks!  I'll have to pick that up and apply it to CVS ...