Forum OpenACS Development: Re: High Availability Configurations

Posted by Barry Books on
Almost forgot the most important thing: monitoring software.
Posted by Joel Aufrecht on
I see several overlapping issues.
  • There's no reason that smaller sites, without $20,000 hardware farms (which are of course themselves only medium, not large), shouldn't have nearly the same uptime. The biggest uptime killer for small sites I've worked on has been software error - specifically, openacs.org upgrade issues and platform upgrade issues. I think that going to an A/B rollover approach for most routine upgrades (A is live; copy A's database; upgrade B; test B; replace A with B) would help a lot, and I'd like to make it part of the standard, documented setup for production sites.

    At this level, the problems I outlined above apply - non-conflicting logs, smooth load transferring across several servers, etc. It's almost all trivial, but if it isn't documented as a standard, then each new admin has to figure it out for themselves and each OpenACS install is different. Barry, how did you designate a single machine to run scheduled tasks? Through the kernel parameters via the admin UI?

    Another benefit of using basic HA tools even on a single box is that it makes it very easy to start scaling up.

  • The first step up from everything on one box is to get a web server on one box and database on another. This is already documented for PostgreSQL - anybody have similar docs for Oracle?
  • The next step up is multiple web servers, as has been described above. Arrowpoint and BigIP have both been used to load balance OpenACS and I'm experimenting with balance, which appears to offer at least 50% of what BigIP does (no heartbeat or smart load balancing, no SSL-specific stuff, but has failover and session-maintaining load-balancing). (BigIP is now mostly on custom, proprietary hardware, I understand. One reason is to get good performance for encrypted connections.) Is this something Apache, pound, or even AOLserver 4 could also do?
  • The fourth level would be multiple databases; is anybody running OpenACS with multiple database servers? What's the largest PostgreSQL database with acceptable performance?
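To make the A/B rollover from the first bullet concrete, here is a minimal Python sketch of the ordering only. The four callables (`copy_db`, `upgrade_b`, `test_b`, `switch_to_b`) are hypothetical placeholders for whatever site-specific scripts do the real work; none of them is an OpenACS API. The point is that A keeps serving traffic unless B passes its tests.

```python
from typing import Callable


def ab_rollover(
    copy_db: Callable[[], None],
    upgrade_b: Callable[[], None],
    test_b: Callable[[], bool],
    switch_to_b: Callable[[], None],
) -> bool:
    """Run one A/B upgrade cycle; return True if B replaced A.

    All four arguments are hypothetical hooks supplied by the admin.
    """
    copy_db()        # copy A's database for B to use (A is still live)
    upgrade_b()      # apply the upgrade to B only
    if not test_b():
        # B failed its checks: leave A in place, nothing has changed for users
        return False
    switch_to_b()    # replace A with B
    return True
```

If any step raises an exception before `switch_to_b` runs, A is likewise left untouched, which is the property that makes this safer than upgrading the live site in place.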
Monitoring applies across all levels; we have uptime as one tool. Of course inittab/daemontools is a given. Individual server keepalive tests each server for actual responsiveness and kills the OpenACS process if it becomes too unresponsive - crude but effective. Other monitoring tools?
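As a rough illustration of the keepalive idea (probe each server, kill the process after repeated failures, and let inittab/daemontools restart it), here is a small Python sketch. It is not the actual keepalive tool; the URL, the pid handling, and the failure threshold are all assumptions, and the probe and kill actions are passed in so they can be swapped for whatever a given site uses.

```python
import http.client
import os
import signal
import urllib.request
from typing import Callable


def is_responsive(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # Covers connection errors, timeouts, HTTP error statuses, bad responses
        return False


def kill_pid(pid: int) -> None:
    """Send SIGTERM; assumes a supervisor (inittab/daemontools) restarts the server."""
    os.kill(pid, signal.SIGTERM)


class Keepalive:
    """Count consecutive failed probes and fire the kill action at a threshold."""

    def __init__(self, probe: Callable[[], bool], kill: Callable[[], None],
                 max_failures: int = 3) -> None:
        self.probe = probe
        self.kill = kill
        self.max_failures = max_failures  # assumed threshold, tune per site
        self.failures = 0

    def check(self) -> bool:
        """Run one probe; return True if the kill action fired."""
        if self.probe():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.kill()
            self.failures = 0  # start counting again after the restart
            return True
        return False
```

In use, something like `Keepalive(lambda: is_responsive(url), lambda: kill_pid(pid))` would be called from cron or a sleep loop, with `url` and `pid` supplied by the admin. Requiring several consecutive failures avoids killing a server over a single slow request.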