Forum OpenACS Q&A: Response to What makes a network efficient AND robust?

Posted by David Walker on
Basically the customer wants a network layer where he can press the "enter" key on the client, have the client completely destroyed moments later (or have any intermediary lose power or crash), and still have the system as a whole lose no data.

There's always going to be some time window, whether it's microseconds or minutes, between the moment the enter key is pressed and the moment the transaction reaches a durable state. All you can do is make sure no half-transactions get through and keep that window as small as possible.
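For the single-machine case, the "no half-transactions" part is usually done with the write-temp-then-rename trick: a crash at any point leaves either the old contents or the new, never a torn file. A minimal sketch (the file name and helper are made up for illustration):

```python
import os
import tempfile

def atomic_write(path, data):
    """Replace `path` with `data` so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # force the bytes to disk first
        os.replace(tmp, path)      # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp)             # clean up the temp file on failure
        raise

atomic_write("state.txt", b"committed")
```

The rename is the commit point; everything before it is invisible to readers, which is exactly the half-transaction guarantee, though the window between keypress and `fsync` is still there.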

How would a 2 phase commit system handle a failure in the network layer?

Say you're committing to database servers db1 and db2 from client1 and client2 (it doesn't have to be a database server, but that's the easiest way for me to tell it). A failure in the network layer splits the network into area1 and area2, so that client1 can only see db1 and client2 can only see db2. Then the split heals, but db1 and db2 are no longer synced, and each has commits the other doesn't.
(Syncing file systems could have similar problems. Say 2 people edit the same file and save it during the split. Which one wins?)
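The standard 2PC answer to a partition is to refuse to commit rather than diverge: if any participant can't be reached during the prepare phase, the whole transaction aborts everywhere. A toy sketch of that behavior (the `Participant` class and `reachable` flag are stand-ins for real networked servers, not any actual protocol library):

```python
class Participant:
    """Hypothetical in-memory stand-in for a database server."""
    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable   # False simulates a network split
        self.staged = None
        self.committed = []

    def prepare(self, txn):
        if not self.reachable:
            raise ConnectionError(self.name + " unreachable")
        self.staged = txn            # durably stage the work, vote yes
        return True

    def commit(self):
        self.committed.append(self.staged)
        self.staged = None

    def abort(self):
        self.staged = None


def two_phase_commit(txn, participants):
    # Phase 1: every participant must vote yes; any failure aborts all.
    prepared = []
    for p in participants:
        try:
            p.prepare(txn)
            prepared.append(p)
        except ConnectionError:
            for q in prepared:
                q.abort()
            return False
    # Phase 2: all voted yes, so tell everyone to commit.
    for p in participants:
        p.commit()
    return True


db1 = Participant("db1")
db2 = Participant("db2", reachable=False)   # the split: can't see db2
print(two_phase_commit("txn-42", [db1, db2]))  # False: nothing commits
print(db1.committed)                           # [] -- no divergence
```

So during a split, clients on either side simply can't commit at all. That avoids the divergence above at the cost of availability, and it still leaves a nasty window: if the coordinator dies between phases, participants sit blocked holding locks until it comes back.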

Our approach has just been a single powerful machine with RAID and those kinds of redundancies. I can't tell you whether it's a good or bad solution, as that machine and that network haven't had much stress.