Forum OpenACS Q&A: Problem with AOLServer respawning

My AOLServer process dies about once a week. It's not a big deal,
since I can use inittab to respawn the process.  However, the
respawning doesn't always work correctly.

Sometimes the new nsd process is spawned before the old one releases
the port, causing a "port already in use" error.  Since nsd doesn't
exit after reporting this error, inittab doesn't try to respawn, and
the site stops responding until I restart nsd manually.

If it helps, the line in my inittab is

<pre>
nss:2345:respawn:~nsadmin/bin/nsd-postgres -i -t ~nsadmin/rym.tcl -u nsadmin</pre>

And the thread that's hanging on to the port isn't defunct;  I'm
assuming that it's just trying to finish serving a page before
exiting.

Has anyone else had this problem with AOLServer?  Any recommendations?

Thanks,

Hossein

By the way, I apologize if this has been answered before; I searched
the archives with no success.

Posted by defunct defunct on
Yes, we've had this problem quite a lot...

I don't know if your problem is the same as ours, but ours appears to be related to detached threads.

We use the SMS broker code, which means quite a few detached threads hanging around and a number of scheduled procs waiting to run.

When AOLServer tries to shut down, for some reason it appears to wait for these detached threads and scheduled procs, even if they aren't due to fire for some time yet.

My only (slightly naff) suggestion is to use a shell script to kick off AOLServer (from inittab) and place a reasonable pause in it to give the system a chance to come down. If processes are still hanging around after that, you could be a bit drastic and kill -9 them (yikes).

I'm not sure why this happens, although we don't seem to have seen it as much with newer Linux kernels. Perhaps some work has been done on the Linux threading model?

Anyway, now we're on Mandrake 8.2, we've not seen this again.
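
For illustration only, here is a minimal sketch of the wrapper-script idea described above. It is not the script we actually ran. The paths, the grace period, the `run` argument and the use of `pgrep -f` on the config file name to find leftover nsd processes are all assumptions, and it assumes the wrapper is started as root from inittab.

```shell
#!/bin/sh
# Hypothetical inittab wrapper: give lingering nsd processes a grace period,
# SIGKILL whatever is left, then start a fresh nsd.
# Paths, user and the 10-second grace period are assumptions; adjust them.

NSD_BIN="${NSD_BIN:-/home/nsadmin/bin/nsd-postgres}"
NSD_CONF="${NSD_CONF:-/home/nsadmin/rym.tcl}"
NSD_USER="${NSD_USER:-nsadmin}"
GRACE="${GRACE:-10}"

# Succeeds while a process with PID $1 still exists (and we may signal it).
is_alive() {
    kill -0 "$1" 2>/dev/null
}

# Wait up to $2 seconds for PID $1 to exit on its own, then kill -9 it.
reap() {
    target=$1
    limit=$2
    waited=0
    while is_alive "$target" && [ "$waited" -lt "$limit" ]; do
        sleep 1
        waited=$((waited + 1))
    done
    if is_alive "$target"; then
        kill -9 "$target" 2>/dev/null || true
    fi
    return 0
}

# Only act when called with "run", so the functions can be tried on their own.
if [ "${1:-}" = "run" ]; then
    # pgrep -f matches against the full command line; matching on the config
    # path is an assumption about how to pick out this server's processes.
    for old in $(pgrep -f "$NSD_CONF"); do
        reap "$old" "$GRACE"
    done
    # Replace the wrapper with nsd so init still watches the right process.
    exec "$NSD_BIN" -i -t "$NSD_CONF" -u "$NSD_USER"
fi
```

The inittab entry would then point at the wrapper instead of nsd, for example `nss:2345:respawn:/home/nsadmin/bin/nsd-wrapper run` (a hypothetical path). Because the script ends with `exec`, init ends up supervising nsd itself and respawns the wrapper when nsd exits.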
Posted by Andrew Piskorski on
FYI, while habitually kill -9'ing anything, AOLserver included,
usually doesn't sound like a very good idea, I've seen it done many,
many times, on Solaris, and I can't remember ever seeing it have any
bad effects.  (There was suspicion way back when - at least two years
ago now I think - that it might be related to leaving lots of orphaned
Oracle connection processes on the box, but that problem has long
since vanished.)

So I'd say go ahead and kill -9 AOLserver if you think you need to.  It
doesn't seem to hurt anything.  Perhaps ask on the AOLserver list if
you want an expert opinion.

Posted by Ken Mayer on
I think I remember tweaking the closewait parameter in nssock (from the default 2 seconds to 0). That was long ago, and I may be remembering it from a different problem (it was related to nsvhr-ness).
Posted by russ m on
The main reason I'm aware of for avoiding "kill -9" unless
absolutely necessary is that it doesn't let a process release kernel
resources like shared memory segments... Whether nsd actually
uses any kernel resources that aren't garbage collected at
process completion is another question though...

Another way to pause nsd startup to work around Hossein's
problem is to stick an "after 5000" in the site configuration .tcl file
to pause nsd for 5 seconds while starting (hopefully long
enough for any lingering connections to disappear)... Deciding if
this is any less ugly than "kill -9"-ing nsd is, I suppose, a matter of
taste...

Posted by Dan Wickstrom on
If you start aolserver using the -k option, it will terminate the running process before it restarts.  This should also work from inittab.