Forum OpenACS Q&A: Re: organize openacs.org!
Andrew: yes, I've had the opportunity to work on some of the old ArsDigita machines and know what you mean. And I know all too well the type of sysadmin you're talking about--we call that sort of sysadmin the BOFH (do a Google search on BOFH to read all about it).
For my own sysadmin style I like to follow the Principle of Least Astonishment. That is, I like to place things and run them in the manner least likely to confuse someone else who may need to sysadmin the box. This means that qmail and daemontools are installed in their default locations, even though I find DJB's directory structure highly bizarre. Normally I install as much as I can with rpm and leave things in their default installation directories. Services should all be set up so they're in chkconfig and controlled by the service scripts.
Generally I don't document much because you can query rpm to find out what was installed, when, and where it is. And with the Red Hat Network and up2date you get functionality like Debian's apt-get (i.e. "up2date postgres" will fetch the latest postgres and install it along with any dependencies it needs).
Once the machine is set up and stable I don't expect to need much in the way of general sysadmin work beyond occasional updates to various packages. I get daily reports on bandwidth, uptime, disk usage, and log file activity, so I keep up on any problems that might develop.
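For anyone curious, the disk-usage part of a daily report like that could be sketched roughly as below. This is only an illustration in Python, not the script that actually produces those reports; the function name disk_report and the 90% warning threshold are made up for the example.

```python
import shutil


def disk_report(paths, warn_at=0.90):
    """Return one report line per path, flagging filesystems at or above warn_at.

    Hypothetical helper for illustration; warn_at is a fraction of capacity.
    """
    lines = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        fraction = usage.used / usage.total if usage.total else 0.0
        flag = "WARN" if fraction >= warn_at else "ok"
        free_mib = usage.free // 2**20
        lines.append(f"{path}: {fraction:.0%} used, {free_mib} MiB free [{flag}]")
    return lines


if __name__ == "__main__":
    # Report on the root filesystem; add other mount points as needed.
    print("\n".join(disk_report(["/"])))
```

Run from cron once a day and mailed to the admin, something like this would cover the disk-usage line of the report; bandwidth, uptime, and log activity would need their own checks.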
Whatever Red Hat version you folks want is fine with me. Most of our servers are on either 7.2 or 7.3. But I have several on 8.0 and one machine on 9. It's probably best to stick with 7.3 or 8.0 for the OpenACS box to minimize risk.
I normally give warnings if extended downtime is expected, but I had hoped the upgrade for openacs.org would be a quick reboot and that'd be it. Besides, it's either Saturday night or Sunday depending on your location and you folks are all supposed to have lives and be away from the computer. ;) It's us sysadmin folks that get stuck working weekends.
I normally keep one spare server-class machine that can assume the identity of any of our production servers. My spare server at the moment is a Dell 2450 with a 733 MHz CPU, 1 GB of RAM, 50 GB of RAID 5 disk, and Red Hat 8.
We could move openacs.org over to this box. However, Ben donated the current server to the community and I'd like to keep openacs.org on it, if for no other reason than that the box isn't furfly property and could be moved elsewhere if the community wanted. We could move openacs.org to the spare server and then move it back to the upgraded box, but that doubles the work.
Backup recovery for the machine depends on what happens. Full filesystem backups are done nightly. If a machine has a massive hardware failure, I just move the site to the spare server using the latest backups (from an online 1 TB disk array -- tapes suck). If a disk fails, I have spares in the cage at Exodus that can be quickly swapped in, and the RAID array is rebuilt on the fly. The Sun world spends a lot of time talking about bare-metal recovery--for Linux I find it quicker to just reload the whole OS from CD and look at it as an opportunity for an OS upgrade.