Forum OpenACS Q&A: Daemontools respawning AOLserver

Posted by Titi Ala'ilima on
I'm running nsd from daemontools, my /service/demo/run file contains:

exec /usr/local/aolserver33/bin/nsd-oracle -it /usr/local/aolserver33/demo.tcl -u nsadmin -g nsadmin
but the process keeps going up and coming back down immediately. Nothing shows up under readproctitle and all the error log tells me is:
[22/Jan/2003:18:16:50][23266.1024][-main-] Notice: serv: waiting for warmup
[22/Jan/2003:18:16:52][23266.1024][-main-] Notice: serv: warmed up
[22/Jan/2003:18:16:52][23266.1024][-main-] Notice: socks: idle
[22/Jan/2003:18:16:52][23266.1024][-main-] Notice: sched: idle
[22/Jan/2003:18:16:52][23266.1024][-main-] Notice: binder: listen(,80) = 12
[22/Jan/2003:18:16:52][23266.1024][-main-] Notice: nssock: listening on
[22/Jan/2003:18:16:52][23266.13326][-nssock-] Notice: nssock: starting
[22/Jan/2003:18:16:52][23266.13326][-nssock-] Notice: nssock: accepting connections
[22/Jan/2003:18:16:52][23266.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 stopping
[22/Jan/2003:18:16:52][23266.1024][-main-] Notice: nssock: triggering shutdown
[22/Jan/2003:18:16:52][23266.1024][-main-] Notice: serv: stopping connection threads
[22/Jan/2003:18:16:52][23266.13326][-nssock-] Notice: exiting
[22/Jan/2003:18:16:53][23266.1024][-main-] Notice: serv: connection threads stopped
[22/Jan/2003:18:16:53][23266.1024][-main-] Notice: sched: shutdown pending
[22/Jan/2003:18:16:53][23266.2051][-sched-] Notice: sched: shutdown started
[22/Jan/2003:18:16:53][23266.2051][-sched-] Notice: sched: shutdown complete
[22/Jan/2003:18:16:53][23266.14339][-shutdown-] Notice: nslog: closing '/usr/local/aolserver33/log/demo.log'
[22/Jan/2003:18:16:53][23266.14339][-shutdown-] Notice: nssock: shutdown complete
[22/Jan/2003:18:16:53][23266.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 exiting
before it starts the whole process all over again.

It runs fine manually in the foreground. I've tried it under svscanboot and Rob Mayoff's rc.svscan, and my qmail processes seem to behave fine both ways, but both ways this nsd won't stay up. I have a virtually identical setup on another server that doesn't seem to have any such troubles.

Any ideas?

Tangentially, anyone using rc.svscan, how are you running it? I'm running it from the inittab, but I'm not sure if it should be set to respawn or not. Right now I have it set to "once".

Posted by Rocael Hernández Rizzardini on
I had the same problem. In my case, first the DB had not been started, and then the IP addresses for the different services were not configured on the server. Once both were fixed, everything worked well.

Posted by Tom Jackson on

When you say it starts manually, what do you mean? Change the 'i' to an 'f', and do a ./run. If that works, try leaving the 'f' there and do a:

svc -o /service/yourserver (or whatever works)

If that stays up, bring it down using svc -d, and then try the -o option again with the 'f' changed back to an 'i' in your run file. Tell us what happens.
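
For anyone following along, one round of that test can be sketched as a small script. This is only a sketch: the /service/demo path is taken from the first post, the 5-second wait is an arbitrary choice, and the run/try_once helpers and the DRY_RUN switch are hypothetical names made up for illustration. By default it only prints the commands; set DRY_RUN=0 to actually run them. Edit the run file by hand between rounds (first with -ft, then with -it).

```shell
#!/bin/sh
# Assumed service directory from the original post; override as needed.
SERVICE=${SERVICE:-/service/demo}
# DRY_RUN=1 (the default) only prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# One test round: start the service once, look at its state, take it down.
try_once() {
    run svc -o "$SERVICE"    # start once; supervise will not restart it
    run sleep 5              # arbitrary pause so a crash has time to show
    run svstat "$SERVICE"    # report whether it is up and for how long
    run svc -d "$SERVICE"    # bring it down and keep it down
}

try_once
```

If svstat keeps reporting an uptime of a second or two, the process is cycling just as in the log above.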

Posted by Titi Ala'ilima on
Not totally sure what to make of it all, but I tried Tom's suggestion and it went up fine, but I had trouble bringing it down. I had to diddle with restart-aolserver to actually kill the nsd. Anyway, I took a peek inside the supervise directory, and thought maybe that got corrupted somehow. So I unlinked it from /service, killed all the nsds, and removed the supervise dir, re-linked it, and everything seems peachy. Here's the before and after of the supervise dir:
Before:
total 8
-rw-r-----    1 nsadmin  nsadmin         3 Jan 23 10:48 control
-rw-r-----    1 nsadmin  nsadmin         0 Jan 21 16:36 lock
prw-------    1 root     root            0 Jan 21 16:39 ok
-rw-r--r--    1 root     root           18 Jan 23 10:46 status
After:
total 4
prw-------    1 root     root            0 Jan 23 10:50 control
-rw-------    1 root     root            0 Jan 23 10:49 lock
prw-------    1 root     root            0 Jan 23 10:49 ok
-rw-r--r--    1 root     root           18 Jan 23 10:50 status
Notice the missing p at the start of the mode string on the "before" control file (it is a regular file, not a named pipe), as well as the fact that it's owned by nsadmin rather than root. I don't think the chown I did should have affected anything, but somehow the named pipe got turned into a regular file. Anyone more familiar with the workings of daemontools able to offer any insight?
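
A quick way to spot this condition without reading ls output is to test the two files directly. The sketch below assumes, based only on the "after" listing above, that control and ok are the files that should be named pipes; check_supervise is a hypothetical helper name, not part of daemontools.

```shell
#!/bin/sh
# Report whether supervise/control and supervise/ok under the given
# service directory are named pipes. Returns 0 if both are, 1 otherwise.
check_supervise() {
    dir=$1/supervise
    status=0
    for name in control ok; do
        if [ -p "$dir/$name" ]; then
            echo "$name: fifo"
        else
            echo "$name: NOT a fifo"
            status=1
        fi
    done
    return $status
}
```

Usage would be something like `check_supervise /service/demo`. It only reports the problem; it does not try to repair the supervise directory.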