Forum OpenACS Q&A: What makes a network efficient AND robust?

I have been asked to help build a high performance communications
layer. High performance apparently implies lots of bandwidth used
across numerous network transactions, AND low latency from end to
end. I want to build this, of course, with my favorite Swiss Army
knife, AOLserver.

There's another requirement, but I don't have a name for it, and I'm
hoping one of you folks can educate me....

Basically the customer wants a network layer in which he could press
the "enter" key on the client, and moments later have the client
undergo complete destruction, or have any intermediaries undergo
power failure or software crash. Nevertheless, the customer wants the
system as a whole to not lose any data.

That makes me think of databases and ACID, especially the Atomic and
Durable attributes of ACIDity. It makes me think of file systems and
journaling. And it makes me think of two-phase commits, in which
remote databases coordinate so that each can be updated in step,
ensuring the integrity of the databases and of the greater
application.

What sort of capabilities should the network have?

I see how two-phase commits could be done over a naked TCP/IP
network, where the upper layers detect network failures and handle
them appropriately. But I wonder if a faster, lower latency, more
efficient approach wouldn't rely on the special abilities of a layer
on top of TCP/IP.
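
To make my mental model concrete, here is roughly what I imagine that upper layer doing, sketched in Python rather than Tcl: a coordinator drives two-phase commit over plain TCP and treats any timeout or connection failure as a vote to abort. The hostnames and the PREPARE/COMMIT/ABORT line protocol are invented purely for illustration.

# Hypothetical sketch of a two-phase commit coordinator over plain TCP.
# The participant addresses and the wire protocol are made up; a real
# system would also keep a durable coordinator log so it could resolve
# in-doubt transactions after a crash.
import socket

PARTICIPANTS = [("db1.example.com", 9001), ("db2.example.com", 9001)]  # placeholder hosts
TIMEOUT_SECONDS = 2.0

def ask(addr, message):
    """Send one line to a participant and return its one-line reply.
    Any network failure or timeout is reported as None."""
    try:
        with socket.create_connection(addr, timeout=TIMEOUT_SECONDS) as sock:
            sock.sendall((message + "\n").encode())
            return sock.makefile().readline().strip()
    except OSError:
        return None  # the upper layer treats a failure as a "no"

def two_phase_commit(txn_id):
    # Phase 1: every participant must durably promise it can commit.
    votes = [ask(addr, f"PREPARE {txn_id}") for addr in PARTICIPANTS]
    decision = "COMMIT" if all(vote == "YES" for vote in votes) else "ABORT"
    # Phase 2: broadcast the decision; a participant that misses it has
    # to ask again later (this sketch does not implement that recovery).
    for addr in PARTICIPANTS:
        ask(addr, f"{decision} {txn_id}")
    return decision == "COMMIT"

The well-known weak spot is the window after a participant has voted YES but before it hears the decision: if the coordinator dies right there, the participant is stuck holding locks, which is why real implementations log durably on both sides.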

What am I looking for? Can you point to a good book discussing the
building of high performance networks?

Thank you,

Posted by Andrew on
Jerry, first of all, I don't know. But I do have a few thoughts.

For starters, why do you consider this thing they want you to help build to be a "network layer"?

It sounds like what you need is some kind of ACID-compliant RPC interface. Maybe you can call that a "network layer", but it sounds more like an application protocol to me.

Gray and Reuter's book Transaction Processing (TP) talks a fair amount about distributed transactions and transaction monitors.

Unfortunately, TP can be kind of hard to understand, because all its examples discuss really old mainframe software like IMS (a hierarchical database from IBM), CICS (a transaction processing monitor from IBM), and LU6.2 (an IBM de facto standard which "defines a protocol to invoke remote transactional servers"). I'd never even heard of most of those before reading the book. We are talking some truly ancient software here. For example:

IMS is the dominant database and data communications system in use today [1992] - it has most of the data. It began in the late 1960s as an inventory tracking system for the U.S. moon landing effort.   [...]   Today, the typical IMS system has thousands of terminals distributed around the world, driving a large multiprocessor system with a disk-based database in the 100 GB range. There are only about 10,000 such systems, but they form the core transaction processing systems of virtually all large corporations.

Links in TP back to the familiar RDBMS-oriented world of Oracle and PostgreSQL are slim to nonexistent. For example, DB2 is mentioned only in the context of IMS ("DB2 is IBM's implementation of SQL on its mainframes. It acts as a resource manager for either IMS, CICS, or programs running without any transaction monitor.") And Oracle is mentioned only in a single sentence, in the discussion of sequence numbers.

Interestingly, although this has nothing to do with the topic at hand, TP also notes that Optimistic Locking via Timestamps - which I think is what Oracle and PostgreSQL do - is merely a degenerate and inferior version of "Field Calls":

One of the interesting things about locking exotics is the vast literature. It is possible for experts to be completely ignorant of some major branch of the field. Optimistic schemes are an example. They were invented five years after field calls were being sold on the street and, in fact, are less general than field calls. Optimistic methods differ from field calls in only the following:   [...]   These are rather degenerate predicates and transforms.

Anyway, there must be other ways to fulfill your requirement, but I think what you probably want is a Transactional (ACID compliant) Remote Procedure Call interface. What all the (potentially scary) design implications of that are, or what the proper way to go about building such a thing is, I have no idea...
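
To make that a little more concrete, here is a small Python/SQLite sketch (my own illustration, nothing from TP or from AOLserver) of one trick such an interface absolutely needs: the client attaches a unique id to every call, and the server records that id in the same transaction as the work itself, so a retry after a crash or a lost reply cannot apply the work twice.

# Idempotent, transactional handling of one RPC request.  The table
# names, the "debit" example, and the request-id scheme are all made up
# for illustration.
import sqlite3
import uuid

def call_transactionally(conn, request_id, amount):
    """Apply one debit exactly once, keyed by request_id."""
    try:
        with conn:  # BEGIN ... COMMIT, or ROLLBACK if anything raises
            # Recording the id and doing the work commit (or fail) together.
            conn.execute("INSERT INTO processed_requests (id) VALUES (?)",
                         (request_id,))
            conn.execute("UPDATE accounts SET balance = balance - ? "
                         "WHERE name = 'jerry'", (amount,))
        return "applied"
    except sqlite3.IntegrityError:
        return "duplicate"  # already applied before the crash or retry

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_requests (id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('jerry', 100)")

rid = str(uuid.uuid4())
print(call_transactionally(conn, rid, 25))  # applied
print(call_transactionally(conn, rid, 25))  # duplicate -- safe to retry

The point is that the retry is harmless either way: whether the crash happened before or after the server's commit, replaying the call cannot debit the account twice.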

Posted by Jerry Asher on
Thanks for the discussion, Andrew.

I call it a network layer in part because that was how it was described to me, and in part because I know the client wants to make this part of their platform, and enable others to build their applications on top of it.

Right now I am in the stage where I ask around and see what others think: is there really an issue here?  What is the generally known name for this problem?  What have others done in this area?  That's my attempt to avoid the ignorant-experts-reinvent-the-wheel problem and the related ignorant-experts-fight-and-churn problem.

Transactional RPC (or, Google tells me, TxRPC!) - that's cool; I'll take a look at that and see where it leads.  (Do you know of any open source, high performance, five-nines layers?)

Posted by David Walker on
Basically the customer wants a network layer in which he could press the "enter" key on the client, and moments later have the client undergo complete destruction, or have any intermediaries undergo power failure or software crash. Nevertheless, the customer wants the system as a whole to not lose any data.

There's always going to be some window of time, whether microseconds or minutes, from the moment the enter key is pressed until the transaction is in a durable state. All you can do is make sure you don't get any half-transactions in there and that the window is as small as possible.
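
For example, one way to shrink that window on the receiving side is to append the request to a journal and force it to disk before you ever acknowledge the client. A rough sketch (the file name and record format are arbitrary):

# Acknowledge a request only after it has been made durable.
import json
import os

JOURNAL = "requests.journal"

def accept_request(payload):
    """Return an acknowledgement only once the request would survive a crash."""
    record = json.dumps(payload) + "\n"
    with open(JOURNAL, "a") as journal:
        journal.write(record)
        journal.flush()
        os.fsync(journal.fileno())  # forces the record onto the platters
    return "ACK"

print(accept_request({"key": "enter", "client": "client1"}))

The time before the fsync returns is the window in which data can be lost; after it, the record survives a power failure, and a recovery pass over the journal can finish whatever was in flight.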

How would a 2 phase commit system handle a failure in the network layer?

Say you're committing to database servers db1 and db2 from client1 and client2 (it doesn't have to be a database server, but that's the easiest way for me to tell it). A failure in the network layer causes a split between area1 and area2 so that client1 can only see db1 and client2 can only see db2. Then the split heals, but db1 and db2 are no longer synced and each has commits that the other doesn't.
(Syncing file systems could have similar problems. Say 2 people edit the same file and save it during the split. Which one wins?)
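
One standard answer to that split-brain question is a write quorum: with an odd number of replicas, only the side of the split that can still reach a majority is allowed to commit, and the minority side refuses (or queues) writes until the split heals. A toy illustration, with placeholder replica names:

# Quorum rule: a commit is accepted only if a strict majority of the
# replicas acknowledged it, so at most one side of a split keeps going.
REPLICAS = ["db1", "db2", "db3"]  # an odd number, so one side always has the majority

def can_commit(reachable_replicas):
    """True only if a strict majority of replicas acknowledged the write."""
    return len(set(reachable_replicas) & set(REPLICAS)) > len(REPLICAS) // 2

# During the split described above, each client can reach only one replica:
print(can_commit(["db1"]))         # False -- neither side may commit alone
print(can_commit(["db1", "db3"]))  # True  -- the side holding the majority proceeds

With only two replicas, as in the db1/db2 example, neither side has a majority during the split, so both have to stop accepting commits: safe, but not very available.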

Our approach has just been a single powerful machine with RAID and that kind of redundancy. I can't tell you if it's a good or bad solution, as that machine and that network haven't had much stress.
Posted by Jun Yamog on
Hi Jerry,

Not sure I understand your question correctly.  Anyway, there are several layers in the network.  I think what you are discussing here is mostly at the top, around the application level.

I think you can get the most out of your network by starting at the lower levels.  The physical layer, that is.

Fiber optics now carries the fastest network transmission rates.  There are also switches that offer redundant links and the like.  Your choice of equipment is a big factor, which is why Juniper, rather than Cisco, is up there on the biggest routers in the net.

Basically I think each layer has to be optimized: performance is addressed more on the lower layers of the network and fault tolerance on the higher levels.  Just like TCP/IP.  IP is, I think, on layer 3 and TCP is on layer 4.  IP does not have any fault tolerance (ACK thingies); the TCP layer does the retransmission of packets in case of failure.  Of course there is some fault tolerance already in Ethernet, which is down on layer 2.
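
To put that in code terms: if you drop down to raw UDP datagrams, the retransmission TCP normally gives you at layer 4 has to be rebuilt by the application itself. A tiny stop-and-wait sketch in Python (the address, port, and ACK format are just placeholders):

# Application-level reliability over UDP: resend until the peer ACKs
# or we give up, which is roughly the job TCP already does for you.
import socket

def send_reliably(payload, addr=("127.0.0.1", 9999), retries=5, timeout=1.0):
    """Stop-and-wait: retransmit the datagram until an ACK arrives or retries run out."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(retries):
            sock.sendto(payload, addr)
            try:
                reply, _peer = sock.recvfrom(1024)
                if reply == b"ACK":
                    return True   # loss was detected and repaired up here
            except socket.timeout:
                continue          # lost datagram or lost ACK: send it again
    return False                  # give up and report failure upward

print(send_reliably(b"hello"))

TCP does this (plus ordering, flow control, and congestion control) below the application, which is why the application never sees the lost packets.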

Sorry if this does not answer your question.  Or maybe my view is a little different since I come from a Network Admin background.

Posted by Tom Jackson on

http://www.cs.cornell.edu/Info/Projects/Ensemble/faq.html talks about Ensemble: a replacement for most of the TCP stack with another reliable protocol, using UDP/IP for transmission. This probably isn't what you are looking for, but it just popped into mind.

Posted by Stefan Larsson on

I read this question, plus the "Five 9s reliability, how would you do it?" thread, and just before that looked at JavaGroups - A Reliable Multicast Communication Toolkit for Java available at http://javagroups.sourceforge.net/

To me, JavaGroups sounds pretty much like what you're looking for. It may not be AOLserver, but it claims to give you features like atomicity when sending messages to multiple recipients. If you want an application server built on top of it, check out JBoss 3.0 (currently in alpha, but they use JavaGroups for their clustering effort...).

Posted by Jerry Asher on
Thanks for the pointer, I will take a look at that.