We have a dotLRN server running on Oracle on RedHat 7.2 that
daemontools is not able to start properly. When the server is put
under supervise, the nsd processes are created, but no logs are ever
written and the server does not respond. When we execute the run
script from the command line, the server comes up nicely. However, we
noticed that when hitting C-c to kill the server, one or two nsd
processes are sometimes left around and refuse to die. At other times,
hitting C-c successfully kills all nsd processes of the server.
Our suspicion is that this problem with stopping AOLserver is what is
confusing daemontools, but there might be other problems involved as
well. To give you some more information, here are the commands I am
using and the output that I get:
command line:
/usr/local/aolserver/bin/nsd-oracle -fzt /web/dotlrn-demo/dotlrn-demo.tcl -u nsadmin -g web
nsd-oracle script:
#!/bin/sh
#source /etc/shell-mods.sh
export ORACLE_BASE=/ora8/m01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/8.1.7
export PATH=$PATH:$ORACLE_HOME/bin:$ORACLE_HOME/ctx/lib:.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib:$ORACLE_HOME/jdbc/lib:$ORACLE_HOME/ctx/lib
export ORACLE_SID=ora8
export ORACLE_TERM=vt100
export NLS_DATE_FORMAT=YYYY-MM-DD
export ORA_NLS33=$ORACLE_HOME/ocommon/nls/admin/data
export NLS_LANG=.UTF8
exec /usr/local/aolserver/bin/nsd $*
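Since the wrapper script appends to PATH and LD_LIBRARY_PATH, one thing that might be worth ruling out is a difference in environment between an interactive shell and supervise. The following is only a rough sketch: the helper name run_clean and the minimal PATH value are assumptions, not part of our setup, and it only approximates what a service started from init would see.

```shell
#!/bin/sh
# Hypothetical helper: run a command with an almost empty environment,
# roughly imitating a service that was not started from a login shell.
run_clean() {
    # env -i drops every inherited variable. Only PATH is set again so that
    # basic tools can still be found; the value used here is an assumption.
    env -i PATH=/bin:/usr/bin "$@"
}
```

Calling run_clean with the nsd-oracle wrapper and its usual arguments would show whether the server still comes up when nothing is inherited from the login shell.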
After running the command line above and hitting C-c I see:
[22/Jul/2002:13:00:00][20483.3076][-thread3076-] Warning: nsunix:
accept(12) failed: Interrupted system call(4)
[22/Jul/2002:13:00:00][20483.3076][-thread3076-] Error: nsunix:
drvaccept: accept returned error: Interrupted system call(4)
[22/Jul/2002:13:00:00][20483.3076][-thread3076-] Error: nsunix:
DriverContext is:
location: http://www.collaboraid.biz
host: www.collaboraid.biz
port:80
udsFilename:modules/nsunix/dotlrn-demo.nsunix
listenSocket: 12
: lock:138326456
[22/Jul/2002:13:00:00][20483.3076][-thread3076-] Error: drv: driver
'nsunix' failed: error -1
[22/Jul/2002:13:00:00][20483.1024][-main-] Notice: nsmain:
AOLserver/3.3.1+ad13 stopping
[22/Jul/2002:13:00:00][20483.1024][-main-] Notice: nssock: triggering
shutdown
[22/Jul/2002:13:00:00][20483.1024][-main-] Notice: serv: stopping
connection threads
[22/Jul/2002:13:00:00][20483.4101][-nssock-] Notice: exiting
[22/Jul/2002:13:00:00][20483.1024][-main-] Notice: serv: connection
threads stopped
[22/Jul/2002:13:00:00][20483.1024][-main-] Notice: sched: shutdown pending
[22/Jul/2002:13:00:00][20483.2051][-sched-] Notice: Running scheduled
proc search_indexer...
[22/Jul/2002:13:00:00][20483.2051][-sched-] Notice: Done running
scheduled proc search_indexer.
[22/Jul/2002:13:00:00][20483.2051][-sched-] Notice: sched: shutdown
started
[22/Jul/2002:13:00:00][20483.2051][-sched-] Notice: sched: shutdown
complete
[22/Jul/2002:13:00:00][20483.8196][-shutdown-] Notice: nslog: closing
'/web/dotlrn-demo/log/access.log'
[22/Jul/2002:13:00:00][20483.8196][-shutdown-] Notice: nssock:
shutdown complete
[22/Jul/2002:13:00:00][20483.1024][-main-] Notice: nsmain:
AOLserver/3.3.1+ad13 exiting
and there are no processes left. However, sometimes when hitting C-c I
get only:
[22/Jul/2002:13:04:24][20625.2051][-sched-] Notice: Done running
scheduled proc search_indexer.
[22/Jul/2002:13:04:35][20625.1024][-main-] Notice: nsmain:
AOLserver/3.3.1+ad13 stopping
[22/Jul/2002:13:04:35][20625.1024][-main-] Notice: nssock: triggering
shutdown
and there will be two processes left:
nsadmin 20631 0.0 2.4 30488 25444 pts/5 S 13:03 0:00
/usr/local/aolserver/bin/nsd -fzt /web/dotlrn-demo/dotlrn-demo.tcl -u
nsadmi...
nsadmin 20632 0.4 2.4 30488 25444 pts/5 S 13:03 0:00
/usr/local/aolserver/bin/nsd -fzt /web/dotlrn-demo/dotlrn-demo.tcl -u
nsadmi...
Those processes need to be killed with kill -9. At other times I get:
[22/Jul/2002:13:07:11][20654.2051][-sched-] Notice: Done running
scheduled proc search_indexer
[22/Jul/2002:13:07:21][20654.1024][-main-] Notice: nsmain:
AOLserver/3.3.1+ad13 stopping
[22/Jul/2002:13:07:21][20654.1024][-main-] Notice: nssock: triggering
shutdown
[22/Jul/2002:13:07:21][20654.1024][-main-] Notice: serv: stopping
connection threads
[22/Jul/2002:13:07:21][20654.1024][-main-] Notice: serv: connection
threads stopped
[22/Jul/2002:13:07:21][20654.1024][-main-] Notice: sched: shutdown pending
[22/Jul/2002:13:07:21][20654.2051][-sched-] Notice: sched: shutdown
started
[22/Jul/2002:13:07:21][20654.2051][-sched-] Notice: sched: shutdown
complete
and there is one process left.
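For cleaning up leftover processes like the ones above, the manual procedure (ask politely first, use kill -9 only if that fails) can be sketched as a small shell function. The name stop_pid and the default timeout of 10 seconds are made up for illustration; this is a workaround sketch, not a fix for the underlying shutdown problem.

```shell
#!/bin/sh
# Hypothetical helper: stop a process politely, escalating to SIGKILL.
# Usage: stop_pid PID [TIMEOUT_SECONDS]
# Returns 0 if the process exited after SIGTERM (or was already gone),
# 1 if SIGKILL had to be sent.
stop_pid() {
    pid=$1
    timeout=${2:-10}    # assumed default, adjust as needed

    # Ask the process to shut down cleanly first.
    kill -TERM "$pid" 2>/dev/null || return 0

    # Poll once per second until it exits or the timeout runs out.
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done

    # Still there: fall back to SIGKILL.
    kill -KILL "$pid" 2>/dev/null
    return 1
}
```

With the process listing shown earlier, this would be called as stop_pid 20631 and stop_pid 20632.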
Has anybody else had similar problems? I noticed Dave Bowers' and Simon
Millward's posts (see
https://openacs.org/bboard/q-and-a-fetch-msg.tcl?msg_id=0003RT&topic_id=11&topic=OpenACS
) mentioning that they sometimes had one nsd process left that refused
to die.