Forum OpenACS Development: Re: High Availability Configurations

Posted by Andrew Piskorski on
Joel, you're talking about making sure AOLserver access and error log names don't collide (which is quite trivial to do), but normally any real "hot spare" is going to be an entirely separate Linux box anyway, where the possibility of access log collision normally doesn't even arise at all. So that seems kind of confusing.

Load balancing (high performance) and HA (high availability) interact but really aren't the same thing at all. By "load-balancing" people generally mean using multiple front-end web server boxes all talking to one big honking RDBMS box. If you're concerned about HA, then your number one concern is, "How do I make sure that the one RDBMS doesn't go down, and what happens when it does?" Which means you've got to decide whether you're willing to lose all data back to your last nightly backup or dump, or whether you're going to sign up for never losing a committed transaction, and how certain you need to be that you really, really never lose one.

Depending on your uptime requirements, never losing a committed transaction means looking closely into things like where (and in how many redundant places) to put Oracle's archived transaction logs, storage area networks or other ways for multiple Oracle instances to read the same physical database files, Master-Slave databases with failover, stuff like that. And remember that if you plan to restore from backup, how long that restore takes could be a real problem too.

In all cases, whether you're concerned with high availability (HA), high performance (HP), or both, the RDBMS is typically the most complicated and thus most difficult part. AFAIK there aren't any out-of-the-box solutions to any of that.

Note that PostgreSQL currently has fewer features than Oracle for this sort of stuff (e.g., no archived transaction logs and thus no "point in time recovery"), but (unsurprisingly, being open source), has more flexibility and variety of possible tools and solutions that might be useful in the future, more opportunity to roll your own. The Oracle stuff isn't necessarily too friendly even if it does work though (e.g., archive log mode is instance wide, no way to turn it on/off with any finer granularity than that).

Regarding scheduled maintenance, any real site should have a simple "down for maintenance, come back at time XYZ" tool no matter what. There will always be some upgrade that needs it, no matter what other fancy uptime features you have.
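
As a rough sketch of how small such a tool can be, here's one way to do it. This is Python/WSGI rather than AOLserver Tcl, purely for illustration, and the flag-file path and the `maintenance_gate` name are made up. The idea is just that while a flag file exists, every request gets a 503 page showing whatever "come back at" time you wrote into the file.

```python
import os

# Hypothetical location; any path the web server user can read would do.
MAINTENANCE_FLAG = "/tmp/site-maintenance"


def maintenance_gate(app, flag_path=MAINTENANCE_FLAG):
    """Wrap a WSGI app; while flag_path exists, answer every request with 503."""

    def wrapped(environ, start_response):
        if not os.path.exists(flag_path):
            # Normal operation: pass the request through untouched.
            return app(environ, start_response)

        # The flag file's contents are shown as the expected return time.
        with open(flag_path, encoding="utf-8") as f:
            back_at = f.read().strip() or "soon"

        body = f"Down for maintenance, come back at {back_at}.\n".encode("utf-8")
        start_response(
            "503 Service Unavailable",
            [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
                # Hint to well-behaved clients and crawlers (value is an assumption).
                ("Retry-After", "3600"),
            ],
        )
        return [body]

    return wrapped
```

Taking the site down is then `echo "06:00 UTC" > /tmp/site-maintenance`, and bringing it back is deleting the file. No restart is needed, since the check happens on every request.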

Making the site work properly in a read-only limited functionality mode during upgrades or whatever is a nice feature, but that's real development work and is probably quite site-specific in many cases. Probably nobody's going to do that unless it's a real business requirement for their site, not just an "Oh, that would be nice to have" feature. I'd be curious to know if anyone's done it in practice. The business case for some sites allows the luxury of scheduled downtime during certain non-business hours - if you can get that, grab it!

On front-end load balancers, something functionally like the Big IP router (as opposed to round-robin DNS or whatever) is the way to go, but I've been told that underneath, the Big IP is basically just standard PC hardware plus proprietary custom software. A Linux box with the right software should be able to do the same thing, and generally would be better. (E.g., back at aD, I remember people complaining that the stupid ad-hoc configuration language to tell the Big IP what requests to forward where didn't let them do what they wanted. An open source solution wouldn't have that problem.)
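
To make the difference from round-robin DNS concrete, here is a toy Python sketch of the one thing a real balancer does that DNS can't: it skips backends that are currently dead. It is only the "which box gets the next request" decision, not an actual router or proxy. The `Balancer` class, the `tcp_alive` check and the host names in the usage note are all made up for illustration; this is not how the Big IP or any particular product works.

```python
import itertools
import socket


def tcp_alive(host, port, timeout=0.5):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class Balancer:
    """Round-robin over backends, skipping any that fail the health check."""

    def __init__(self, backends, is_alive=tcp_alive):
        self.backends = list(backends)
        self.is_alive = is_alive  # injectable, so it can be stubbed out
        self._cycle = itertools.cycle(self.backends)

    def pick(self):
        """Return the next healthy (host, port), or None if all are down."""
        # Try each backend at most once per call, starting where we left off.
        for _ in range(len(self.backends)):
            host, port = next(self._cycle)
            if self.is_alive(host, port):
                return host, port
        return None
```

With something like `Balancer([("web1", 8000), ("web2", 8000)])`, a dead `web2` simply stops receiving requests. Plain round-robin DNS would keep handing its address to roughly half the clients. A real balancer would check health in the background rather than on every pick, but the principle is the same.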

I'm not familiar with software to turn a Linux box into a Big IP-like front-end load balancing router, though. Presumably it is out there in some fashion. I too would like to hear what others have done there.