Joel, you're talking about making sure AOLserver access and error log
names don't collide (which is quite trivial to do), but normally any
real "hot spare" is going to be an entirely separate Linux box anyway,
where the possibility of access log collision normally doesn't even
arise at all. So that seems kind of confusing.
Load balancing (high performance) and HA (high availability) interact
but really aren't the same thing at all. By "load-balancing" people
generally mean using multiple front-end web server boxes all talking
to one big honking RDBMS box. If you're concerned about HA,
then your number one concern is, "How do I make sure that the one
RDBMS doesn't go down, and what happens when it does?" That means
you've got to decide whether you can live with losing all data back to
your last nightly backup or dump, or whether you're going to sign up
for making sure you never lose a committed transaction, and how
certain you need to be that you really, really never lose a
committed transaction.
Depending on your uptime requirements, never losing a committed
transaction means looking closely into things like where (and in how
many redundant places) to put Oracle's archived transaction logs,
storage area networks or other ways for multiple Oracle instances to
read the same physical database files, Master-Slave databases with
failover, stuff like that. And remember that if you plan to restore
from backup, how long that restore takes could be a real
problem too.
In all cases, whether you're concerned with high availability (HA),
high performance (HP), or both, the RDBMS is typically the most
complicated and thus most difficult part. AFAIK there aren't any
out-of-the-box solutions to any of that.
Note that PostgreSQL currently has fewer features than Oracle for this
sort of stuff (e.g., no archived transaction logs and thus no "point
in time recovery"), but, unsurprisingly for open source, it offers more
flexibility, a wider variety of tools and approaches that might be
useful in the future, and more opportunity to roll your own. The Oracle
stuff isn't necessarily friendly even when it does work, though
(e.g., archive log mode is instance-wide, with no way to turn it on or
off at any finer granularity than that).
Regarding scheduled maintenance, any real site should have a simple
"down for maintenance, come back at time XYZ" tool no matter what.
There will always be some upgrade that needs it, no matter what other
fancy uptime features you have.
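To make the "down for maintenance" tool concrete, here is a minimal sketch in Python, not anything AOLserver-specific. It assumes you can point your front end at a tiny stand-in server while the real one is down. The names `maintenance_response`, `MaintenanceHandler`, `BACK_AT`, and the port number are all made up for illustration. It answers every request with a 503 and a Retry-After header so that clients and crawlers know the outage is temporary.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical values, chosen only for illustration.
BACK_AT = "06:00 UTC"
RETRY_AFTER_SECONDS = 3600


def maintenance_response(back_at, retry_after):
    """Build the status code, headers, and body for the maintenance page."""
    body = f"Down for maintenance, come back at {back_at}.\n".encode("utf-8")
    headers = {
        "Content-Type": "text/plain; charset=utf-8",
        "Content-Length": str(len(body)),
        # Tells well-behaved clients how long to wait before retrying.
        "Retry-After": str(retry_after),
    }
    return 503, headers, body


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answers every GET, POST, and HEAD request with the same 503 page."""

    def _reply(self):
        status, headers, body = maintenance_response(BACK_AT, RETRY_AFTER_SECONDS)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        if self.command != "HEAD":
            self.wfile.write(body)

    do_GET = do_POST = do_HEAD = _reply


def serve(port=8080):
    """Run the stand-in server until interrupted (port is an assumption)."""
    HTTPServer(("", port), MaintenanceHandler).serve_forever()
```

Calling `serve()` starts it. In practice you would more likely do the same thing in whatever web server you already run, but the behavior to aim for is the same.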
Making the site work properly in a read-only limited functionality
mode during upgrades or whatever is a nice feature, but that's real
development work and is probably quite site-specific in many cases.
Probably nobody's going to do that unless it's a real business
requirement for their site, not just a "Oh, that would be nice to
have" feature. I'd be curious to know if anyone's done it in
practice. The business case for some sites allows the luxury of
scheduled downtime during certain non-business hours. If you can get
that, grab it!
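For what it's worth, the easy part of a read-only mode can be sketched as a request filter. This is only a sketch under a big assumption: that every write to the database arrives via a non-GET request, which is often not true on a real site (and is exactly why the real thing is site-specific work). It is written as Python WSGI middleware purely for illustration; `read_only_middleware` and `is_read_only` are hypothetical names.

```python
# HTTP methods assumed never to modify data (an assumption about the site).
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def read_only_middleware(app, is_read_only):
    """Wrap a WSGI app so that unsafe requests are refused in read-only mode.

    is_read_only is a zero-argument callable, e.g. one that checks for a
    flag file that the upgrade script creates and later removes.
    """

    def wrapped(environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET")
        if is_read_only() and method not in SAFE_METHODS:
            body = b"Site is read-only during maintenance.\n"
            start_response(
                "503 Service Unavailable",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Safe request, or not in read-only mode: pass through untouched.
        return app(environ, start_response)

    return wrapped
```

Pages that write on GET (hit counters, session updates, and so on) would slip straight through this filter, so each of those still has to be found and handled by hand.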
On front-end load balancers, something functionally like the Big IP
router (as opposed to round-robin DNS or whatever) is the way to go,
but I've been told that underneath, the Big IP is basically just
standard PC hardware plus proprietary custom software. A Linux box
with the right software should be able to do the same thing, and
generally would be better. (E.g., back at aD, I remember
people complaining that the stupid ad-hoc configuration language to
tell the Big IP what requests to forward where didn't let them do what
they wanted. An open source solution wouldn't have that problem.)
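To illustrate what "forward what requests where" boils down to, here is a rough sketch of the decision logic only, in Python. It is not how the Big IP or any particular product works. It assumes path-prefix rules, round-robin within a pool, and a set of backends already marked down by some health check not shown here. `Pool`, `route`, and the host names are invented for the example.

```python
class Pool:
    """A group of interchangeable backends, picked round-robin."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()   # backends a health check has marked as dead
        self._next = 0      # index of the next backend to try

    def pick(self):
        """Return the next live backend, or None if none are available."""
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend not in self.down:
                return backend
        return None


def route(path, rules, default_pool):
    """Pick a backend for a request path.

    rules is an ordered list of (path_prefix, Pool); the first matching
    prefix wins, and anything unmatched goes to default_pool.
    """
    for prefix, pool in rules:
        if path.startswith(prefix):
            return pool.pick()
    return default_pool.pick()


# Hypothetical setup: two web servers, plus one box for static images.
web = Pool(["web1:8000", "web2:8000"])
static = Pool(["static1:8000"])
rules = [("/images/", static)]
```

Unlike round-robin DNS, a dead box here stops receiving traffic as soon as it is added to `down`, and since the rules are ordinary code you can express whatever forwarding policy you want.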
I'm not familiar with software to turn a Linux box into a Big IP-like
front-end load balancing router, though. Presumably it is out there
in some fashion. I too would like to hear what others have done
there.