Forum OpenACS Q&A: Re: High availability of .LRN under Linux

Posted by Andrew Piskorski on
There is no such thing as a 100% available system, and trying to achieve one is not a real goal. Serious High Availability people talk about 99.9999% uptime, etc. But without knowing more, my best guess is that you don't need anything like that at all, because very few people do.
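
To put those percentages in perspective, here is a rough sketch of the arithmetic in Python. The function name and the 365-day year are my own simplifications for illustration, nothing standard. It converts an uptime percentage into the downtime it allows per year:

```python
# Sketch: how much downtime per year a given uptime percentage allows.
# Assumes a 365-day year (leap years ignored) for simplicity.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the seconds per year a system may be down at this availability."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.9999):
        minutes = downtime_seconds_per_year(pct) / 60
        print(f"{pct}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

At 99.9% you get nearly nine hours of downtime a year; at 99.9999% you get about half a minute. That gap is why the higher numbers cost so much more to reach.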

Just how many users are you planning to serve with that dotLRN instance? How many of them will be using the site each day? How much does downtime hurt your users? And how important, critical, or irreplaceable is the content each user contributes each day? You have to start with those questions first.

Last I heard, PostgreSQL has several asynchronous replication tools that meet some people's needs, and a better asynchronous replication tool (Slony) on the way. It currently does not have anything equivalent to Oracle's Archivelog Mode at all, but the Point in Time Recovery work currently underway should allow building something like it, and there seems to be interest in doing so.

Note that with Oracle properly set up in Archivelog mode, you can, at least in theory, guarantee that you will never lose a committed transaction. With PostgreSQL that is currently not possible. But how many PostgreSQL or Oracle installations does that really matter to in practice? Probably not very many. (But the Archivelog feature certainly is nice to have.)

Ask yourself how much data you can afford to lose, and how much time you can afford to lose.

On the data, ask yourself: "Worst case, can I afford to lose all data in my database since my last PostgreSQL backup (typically 1 day's worth)?" If yes, great, just back up nightly like almost all OpenACS users do.

On the other hand, if your answer is no, but your answer to the question, "Worst case, can I afford to lose some transactions, maybe a few minutes or a few hours worth?" is yes, then investigate the various current PostgreSQL asynchronous replication tools.

But in the unlikely event that your answer really is, "No, I can't afford to ever lose a transaction, no matter what.", then you should not be using PostgreSQL. You need to use Oracle (or something equivalent), and you need to put a whole lot more time, money, and thought into the problem.
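
The three questions above boil down to a small decision procedure. Here it is as a Python sketch; the function name, parameter names, and return strings are hypothetical labels of mine, not part of any tool:

```python
# Sketch of the decision procedure described above.
# All names here are illustrative; nothing in this block is a real tool's API.
def recommend_strategy(can_lose_a_day: bool, can_lose_minutes_or_hours: bool) -> str:
    """Map tolerance for data loss to the approach suggested in this post."""
    if can_lose_a_day:
        # Losing everything since the last backup is acceptable.
        return "nightly PostgreSQL backups"
    if can_lose_minutes_or_hours:
        # Some recent transactions may be lost, but not a whole day's worth.
        return "PostgreSQL with asynchronous replication"
    # No committed transaction may ever be lost.
    return "Oracle in Archivelog mode (or something equivalent)"


if __name__ == "__main__":
    print(recommend_strategy(can_lose_a_day=False, can_lose_minutes_or_hours=True))
```

Answer the first question honestly before moving on to the second; most sites will stop at the first branch.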

In all cases, make sure you have good backups, and a good procedure for bringing up a new website using them, preferably on a spare machine. If you can't afford to be down for very long, there are various Linux High Availability tools that can help you fail over to a backup server, etc. Googling should find them. So far I haven't heard of anyone at all doing that sort of stuff with OpenACS, but it should be feasible.