Forum OpenACS Q&A: Rebooting a system using daemontools and Oracle

There has always been a problem with rebooting a system with a running Oracle-based ACS site;  the database doesn't shut down because the web server user is still logged in.  I have traditionally solved this problem by starting nsd from inittab and having it active only at run levels 3 and 4;  Oracle shuts down at level 2, so the web server user is already gone at that point and things can proceed smoothly.

This doesn't seem to be working properly on a system that uses daemontools instead of inittab.  I don't know exactly what the problem is, as I haven't seen it for myself yet, but I'm guessing that nsd is still running when Oracle tries to shut down.

Has anyone else had this problem, and if so, did you figure out a solution?  I'm assuming I can't just mess with the levels that svscanboot runs at, since DJB's software is not usually intended to be user-tweaked. :)  I've looked at the daemontools docs but didn't see anything about specifying run levels there.

Posted by Guan Yang on
How about a script that runs as a level 3 kill script, loops through /service/*, and runs:
for I in /service/*
do
  svc -d "$I"
  svc -k "$I"
  svc -x "$I"
done
Posted by Tom Jackson on

Janine, you need to kill stuff running in /service. One failing of daemontools is shutting down stuff in order, and starting up in order. I have an init.d script for this situation:


#! /bin/sh

# chkconfig: 345 99 01
# description: svc start and kill all

# This is an example of a start/stop script for SysV-style init, such
# as is used on Linux systems.  You should edit some of the variables
# and maybe the 'echo' commands.
#
# add to startup via:
# chkconfig --add svc
#


touch /tmp/running_svc

# The path that is to be used for the script
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# Path to the svc binary used to control the services
DAEMON="/usr/local/bin/svc"


# Only start if we can find svc.
test -f $DAEMON || exit 0

# Parse command line parameters.
case $1 in
   start)
         echo "Starting All SVC Daemons:"
         $DAEMON -u /service/*
         touch /var/lock/subsys/svc
         echo "ok"
         ;;
   stop)
         echo -n "Stopping All SVC Daemons: "
         svc -d /service/*
         rm -f /var/lock/subsys/svc
         echo "ok"
         ;;
   restart)
         echo -n "Restarting All SVC Daemons: "
         $0 stop
         $0 start
         echo "ok"
         ;;
   status)
         svstat /service/*
         ;;
   *)
         # Print help
         echo "Usage: $0 {start|stop|restart|status}" 1>&2
         exit 1
         ;;
esac

exit 0

It's heavy handed I know, you could adjust as needed, and maybe move the touched lock file to a better location.

Posted by Tom Jackson on

Oops, also note that for stop you might want to do a -k and then a -d, since AOLserver sometimes doesn't stop on -d alone (or better, do -d and then -k).
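
A minimal sketch of the -d then -k sequence described above, written as a shell function. The names stop_all_services, SERVICE_DIR, and STOP_WAIT are hypothetical, and the five-second grace period is an arbitrary assumption. It relies on svstat reporting a running service with the word "up" after the directory name.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original script.
# SERVICE_DIR and STOP_WAIT are assumed names with assumed defaults.
SERVICE_DIR=${SERVICE_DIR:-/service}
STOP_WAIT=${STOP_WAIT:-5}

stop_all_services() {
    # First pass: ask every service to go down (TERM, no restart).
    for dir in "$SERVICE_DIR"/*; do
        [ -d "$dir" ] || continue
        svc -d "$dir"
    done

    # Give well-behaved services a moment to exit.
    sleep "$STOP_WAIT"

    # Second pass: send KILL only to services svstat still reports as up.
    for dir in "$SERVICE_DIR"/*; do
        [ -d "$dir" ] || continue
        case $(svstat "$dir") in
            *": up "*) svc -k "$dir" ;;
        esac
    done
}
```

Calling stop_all_services from the stop) branch, in place of the single svc -d line, would give AOLserver a chance to exit cleanly before it is killed.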

Posted by Tom Jackson on

Shoot, I should read my own script. Just remove the line:

touch /tmp/running_svc

That is a debug command I stuck in there and forgot to remove. The lock file is where it is supposed to be.

Posted by Dirk Gomez on
What is the problem if you "crash" Oracle? It'll do instance recovery upon startup, and provided you set up your system properly, that should work smoothly.

You could also put shutdown immediate or shutdown abort into your dbshut script. However, the init process will probably not wait for a shutdown immediate to complete on a busy database; you may give that a shot, though.

There are a bunch of discussions about the pros and cons of shutdown abort/immediate in the Oracle newsgroup.

Posted by Janine Ohmer on
Thanks, Tom and Guan.  I guess I figured things would be more elegant under daemontools, but wth.  This looks like a good solution.

Dirk, if I had to I would go ahead and crash Oracle, but since I don't have to, I'd rather not take the risk.  Or, to be precise, the Oracle DBA I'm working with on this would not like to take the risk;  he's fairly conservative, as such folks tend to be.  It's his job. :)

Posted by Dirk Gomez on
Well, databases take a lot of overhead to secure data against crashes, so what I said is much less outrageous than it may have seemed.

Anyway - run this command: less `which dbshut`

And then append immediate to every shutdown command and test the script by shutting down Oracle with a running AOLserver/SQLPlus session. If it works, everything will be fine and dandy; this is the script called by the Oracle rc script.

(You need to fiddle a bit with the spfiles when using Oracle 9)
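
One way to try this change without editing the Oracle-supplied file in place is to generate a patched copy first. This is a rough sketch, assuming the shutdown commands in dbshut sit on lines of their own as a bare "shutdown"; that may not hold for every Oracle release, so inspect the output before using it. The function name add_immediate is made up for illustration.

```shell
#!/bin/sh
# add_immediate FILE: print FILE to stdout with every bare "shutdown"
# line rewritten as "shutdown immediate". FILE itself is not modified.
# Assumption: each shutdown command is alone on its line.
add_immediate() {
    sed 's/^\([[:space:]]*\)shutdown[[:space:]]*$/\1shutdown immediate/' "$1"
}
```

For example, add_immediate "$(which dbshut)" > dbshut.immediate writes the patched copy to a file (the output name is arbitrary), which can then be compared against the original with diff before swapping it in.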

Posted by Dirk Gomez on
As to what shutdown immediate|abort|transactional do, look here: http://download-west.oracle.com/docs/cd/A87860_01/doc/server.817/a76956/start.htm#6370

And immediate may take a lot of time - depending on how long the longest rollback will take. Someone on the Oracle newsgroup posted that he once waited 6 hours.

Posted by Andrew Piskorski on
Making this stuff work correctly is not that difficult. See the April 2003 "Oracle /etc/init.d scripts should use shutdown immediate" thread for more info, bugfixes to the Oracle 8.1.7 dbshut script, etc.

Dirk has a good point about the 6 hour rollback, but you should probably never have huge update transactions like that going during normal operations for any OLTP-style application like OpenACS anyway. If you do have huge data warehouse style updates going on somewhere, you'll probably want that stuff in a different Oracle instance. In other words, the possibility of a 6 hour Oracle rollback is indeed very real and worth thinking about, but I don't think it would ever happen during normal operations of any stock OpenACS system.

Posted by Janine Ohmer on
FYI for anyone else working on this:

Tom's script didn't stop qmail.  I don't understand quite why, it could be a problem with our installation, but I'm not going to sweat it.  Adding /var/qmail/bin/qmailctl stop|start to the appropriate sections solved the problem.

Posted by Tom Jackson on

Janine, that is interesting; I don't know why 'svc -d /service/*' wouldn't stop every service. But qmail should be stopped by using its own init.d script, since it might have its own order of startup/shutdown. My script is very crude, and will start up even services which have a 'down' file in their service directory, meaning they shouldn't be started on boot!
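
A possible refinement for the start case, sketched under the assumption that a service is marked as not-for-boot by a file named down in its service directory (the daemontools convention mentioned above). start_enabled_services and SERVICE_DIR are hypothetical names, not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: bring up only the services that are meant to
# run at boot. SERVICE_DIR is an assumed name with an assumed default.
SERVICE_DIR=${SERVICE_DIR:-/service}

start_enabled_services() {
    for dir in "$SERVICE_DIR"/*; do
        [ -d "$dir" ] || continue
        # A "down" file means the admin does not want this started.
        [ -e "$dir/down" ] && continue
        svc -u "$dir"
    done
}
```

Using this in the start) branch instead of svc -u /service/* would leave deliberately disabled services alone.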