Forum OpenACS Q&A: Postgres, distributed databases, two phase commits, or other replication strategies....

I'm still tinkering with this five nines project and boy oh boy is
Oracle expensive.  It turns out that ignoring the five nines issues,
our database needs are relatively modest and could almost certainly
be handled by Postgres.  I do have a requirement to create two (or
more) database farms, separated by an ocean, in which the
databases remain synchronized.

There appear to be two strategies that projects take to keep two
remote databases synchronized: a) master/slave and disk replication,
b) active/active architectures using two-phase commits.  Am I right
about this?

In general it seems that in the master/slave relationship, one
database is live and the other sits in a frozen state, ready to pull
the db in off of disk.  It's ready to go at a moment's notice, but the
failover process may take many minutes (10-30) depending upon the
size of the database.

The active/active architecture is wonderful: you get to utilize both
databases and both sets of hardware, but it puts an enormous
requirement on your network: it has to be rock solid, or else
active/active turns into deadlock/deadlock.  And active/active
requires that the database support two-phase commits.  (Or something
else?  If so, what?)
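
For concreteness, here's my mental model of what two-phase commit
involves, as a toy sketch (plain Python; the Participant objects are
hypothetical stand-ins for the remote databases, and a real
coordinator would also keep a durable decision log so it can recover
if it crashes between the two phases):

    # Toy two-phase commit coordinator, illustrative only. The Participant
    # objects below are hypothetical stand-ins for remote databases; a real
    # coordinator must also keep a durable decision log so it can recover
    # if it crashes between phase 1 and phase 2.

    class Participant:
        def __init__(self, name, will_succeed=True):
            self.name = name
            self.will_succeed = will_succeed

        def prepare(self):
            # Phase 1: do all the transaction's work and vote. After voting
            # "yes" a participant must be able to commit even across a crash.
            print(f"{self.name}: prepare -> {'yes' if self.will_succeed else 'no'}")
            return self.will_succeed

        def commit(self):
            print(f"{self.name}: commit")

        def rollback(self):
            print(f"{self.name}: rollback")

    def two_phase_commit(participants):
        # Phase 1: collect votes (short-circuits on the first "no").
        if all(p.prepare() for p in participants):
            # Phase 2a: unanimous yes, so everyone commits.
            for p in participants:
                p.commit()
            return True
        # Phase 2b: any "no" (or a timeout) aborts everywhere; rolling back
        # a participant that never prepared is a harmless no-op.
        for p in participants:
            p.rollback()
        return False

    if __name__ == "__main__":
        two_phase_commit([Participant("us-east"),
                          Participant("eu-west", will_succeed=False)])

The scary part, as I understand it, is exactly what the network makes
hard: a participant that voted "yes" and then loses contact with the
coordinator is stuck holding locks until it learns the outcome.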

So looking at Google and searching the docs at postgresql.org, I find
just about nothing to indicate that Postgres supports two-phase
commits.  If it doesn't, it cannot be used as a participant in a
distributed database.  Am I wrong about that?

I would like to be wrong about that; boy oh boy is Oracle expensive.

As an alternative, some engineers at Penguin Computing were
brainstorming about using Linux's network block device as a way to
stitch together a master/slave disk replication system.  I have no
experience with the network block device, but again, googling left me
feeling that it is still not ready for prime time, or rather, that it
does not yet have any robust strategy for dealing with network
problems.  Is that true?  Would you bet your data on the network
block device?

I've been doing some work using DRBD

http://www.complang.tuwien.ac.at/reisner/drbd/

which basically gives you a network RAID 1 (one server primary r/w and one secondary r/o).  Using Heartbeat you should be able to create an HA-type setup for the database.
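
To picture what that buys you, the write path is essentially "write
locally, ship the same block to the peer, and don't acknowledge until
both have it."  A toy sketch (plain Python, with in-memory dicts
standing in for the two disks; real DRBD of course does this at the
block-device layer in the kernel, over TCP):

    # Toy sketch of synchronous network RAID 1, DRBD-style. In-memory dicts
    # stand in for the two disks; the real thing lives at the block layer in
    # the kernel and ships blocks to the peer over TCP.

    class Disk:
        def __init__(self):
            self.blocks = {}

        def write(self, block_no, data):
            self.blocks[block_no] = data

        def read(self, block_no):
            return self.blocks[block_no]

    class MirroredVolume:
        def __init__(self, primary, secondary):
            self.primary = primary
            self.secondary = secondary

        def write(self, block_no, data):
            # Fully synchronous mirroring: the write is not acknowledged to
            # the caller until BOTH sides have it, so a failover to the
            # secondary never loses acknowledged data.
            self.primary.write(block_no, data)
            self.secondary.write(block_no, data)  # in reality: over the network

        def read(self, block_no):
            # Reads come from the primary; the secondary stays read-only
            # until Heartbeat promotes it after a failure.
            return self.primary.read(block_no)

    if __name__ == "__main__":
        vol = MirroredVolume(Disk(), Disk())
        vol.write(0, b"some table page")
        assert vol.read(0) == vol.secondary.read(0)  # both copies agree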

I'm currently using it on an NFS server pair for my mail systems, and it handles its daily beatings fairly well.

Of course, since you need to fail over, it can take about 30-60 seconds for the failover to complete, so you could be down during those times.

Also, these servers are on the same network.

Jerry,

I think your requirement is typically met by the use of an app server.  Two-phase commit is one of the ways they justify charging $10-40K per CPU.

Have you looked at clustering or synchronizing a distributed caching tier instead of the database?  I'd think that's much simpler over a WAN.

Also, you might have more luck poking around theserverside.com.  Again, the app server folks seem to be solving these problems all day long.

Jerry,

I don't know much about Postgres, but we are using Oracle effectively with an EMC disk array to create what they call a BCV (Business Continuance Volume) of PROD, i.e. multiple copies of PROD for different purposes like reporting, backup, etc. at different locations.  This is what you'd call disk replication, but how does this affect the licensing?
We are using a combination of BCVs and materialized views to have an updated PROD available on an almost real-time basis... but I think you may also want to look seriously at OPS (Oracle Parallel Server) if you are beyond 8.1.6.

PG doesn't implement two-phase commits, though there's been talk about doing this in order to support true replication through the write-ahead log.

In the interim, there's a set of master-slave replication scripts that supposedly work fairly well.  They were written by the guy who added write-ahead logging and row-level locking to PG, in other words by someone who knows what he's doing.  However, I have no idea if this approach is robust in the face of the slave going down (it may be, I just haven't looked at it).  The basic synchronizing work is open source, but the admin tools which sit atop it are sold as a product by PostgreSQL, Inc.  They've promised to open source these as well in a year or so.
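
I can't speak to the internals of those scripts, but to give a flavor
of what pull-based master-to-slave sync amounts to, here's a naive
sketch (Python with psycopg2; the "items" table, its last_modified
column, and the hostnames are hypothetical, and the real scripts are
more sophisticated about capturing changes, deletes, and recovery):

    # Naive pull-based master->slave sync loop, purely illustrative. It
    # assumes a hypothetical "items" table with a last_modified column and
    # ignores deletes, conflicts, and crash recovery, all of which real
    # replication scripts have to handle.
    import time
    import psycopg2

    master = psycopg2.connect("dbname=app host=master.example.com")
    slave = psycopg2.connect("dbname=app host=slave.example.com")

    last_sync = "epoch"  # Postgres reads this as the oldest possible timestamp

    while True:
        with master.cursor() as m, slave.cursor() as s:
            m.execute("SELECT id, body, last_modified FROM items"
                      " WHERE last_modified > %s", (last_sync,))
            rows = m.fetchall()
            for id_, body, modified in rows:
                # Upsert each changed row on the slave.
                s.execute("UPDATE items SET body=%s, last_modified=%s"
                          " WHERE id=%s", (body, modified, id_))
                if s.rowcount == 0:
                    s.execute("INSERT INTO items (id, body, last_modified)"
                              " VALUES (%s, %s, %s)", (id_, body, modified))
            if rows:
                last_sync = max(r[2] for r in rows)
        master.rollback()  # end the read-only transaction on the master
        slave.commit()
        time.sleep(10)  # poll interval; a trigger-based scheme avoids polling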

It's worth investigating, at least...

This is good information, so thanks for helping out.

DRBD may suit our needs, but I wasn't clear before: we believe the nature of the app is such that we need both databases up and running and always synced.  DRBD appears to implement the master/standby-slave architecture.  We can be down no more than five minutes per year, so yes, being down for any significant chunk of time is best avoided.

I hadn't thought of using a distributed caching tier across a large network.  I wasn't aware that app servers could run across such large networks either; I had the impression they worked within local machines.  Can an app server guarantee ACID semantics if the underlying databases don't support two-phase commits?  Also, I hate to say it, but Java provokes a strong allergic reaction in the client paying our bills.  Can you suggest a C/C++-based app server that might be applicable to our needs?

Thanks again,

Several app servers work with C++ in addition to Java.  Persistence is one of them.

I'm not sure whether they implement two-phase commit on top of databases which don't support it.  I'd have to defer to an app-server wonk.

There are several companies that are rolling out Akamai-like distributed caching specifically for dynamic content.  I thought Persistence's Dynamai was one of them...  These seemed like a much cheaper way to maintain uptime than managing a global DB cluster (yuck), but it's possible it's not the fault tolerance you're looking for.

It makes a huge difference in cost if some of the data needs to be continuously available, and it's mostly read-only.  What I'm suggesting wouldn't address your read-write needs.
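
To make the read-mostly case concrete: each geography answers reads
out of a local cache, and only the misses (and all of the writes) go
back across the WAN to the database.  A toy sketch (plain Python; the
fetch function and the TTL are hypothetical stand-ins, and a real
caching tier also needs invalidation when the underlying data
changes):

    # Toy read-through cache with a TTL, roughly how a caching tier in each
    # geography might serve read-mostly data. The fetch function and the TTL
    # are hypothetical stand-ins, and a real tier also needs invalidation
    # when the underlying data changes.
    import time

    class ReadThroughCache:
        def __init__(self, fetch, ttl_seconds=60):
            self.fetch = fetch     # called on a miss to hit the database
            self.ttl = ttl_seconds
            self.entries = {}      # key -> (value, expiry)

        def get(self, key):
            entry = self.entries.get(key)
            if entry and entry[1] > time.time():
                return entry[0]                 # hit: no WAN round trip
            value = self.fetch(key)             # miss: go to the database
            self.entries[key] = (value, time.time() + self.ttl)
            return value

    if __name__ == "__main__":
        def fetch_from_db(key):
            print(f"fetching {key} from the database")
            return f"row for {key}"

        cache = ReadThroughCache(fetch_from_db)
        cache.get("page/42")  # hits the database
        cache.get("page/42")  # served locally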