Forum OpenACS Development: Re: Not handled by server cluster procs?
That is interesting, I thought arsDigita had a way of syncing nodes. Some time back Jim Davidson mentioned that AOL used something called an NV: network variable. I've asked many times how this was implimented, but never got a response. I think I can take a guess at how it would be done, athought I don't know how efficient it would be.
First the NV would be pre-declaired as such. After that its value would be maintained on a separate 'parent' node. On the parent node, this would just be an nsv array so the threads of the parent would have serialized access. Each child node (your frontend cluster servers) would maintain a pool of tcp connections to the parent, the pool access controlled by a mutex/condition variable. The interface would be the same as to nsv arrays.
I just finished writing code for a generalized pooled query API, which would work for the client side, leaving the communication protocol, on the server, plus the client code for the query.
It should be interesting to see what the performance looks like. (Obviously it's going to be much worse than nsv using process-local memory, but probably plenty good enough for many or most applications.) I wonder if UDP might work better than TCP; but then there's no UDP access from Tcl, so certainly ignore that idea for now. You probably need to worry about handling broken tcp connections in the pool - Cleverly's Burrow could be helpful for that.
The enhanced nsv/tsv provided by the Tcl Threads Extension might be of particular value, as in some cases it's more powerful API let's you do more work in a single atomic call, which means only one round trip to the server rather than several. Also, in order for NetV to work the same as NSV, you need NetMutex too, otherwise it's impossible for the client to properly serialize access when doing complex operations on a Nsv. That's also a good reason to use the Tcl Threads Extension tsv/nsv rather than the legacy AOLserver nsv implementation, as it's enhanced API should let you avoid explicit use of mutexes much more often.
You may also want some sort of notification from the server to the client, saying, "Oh shit, I just crashed and restarted!". Edge cases like that are never going to work quite the same way as local Nsvs, but that should be ok.
This NetV stuff is all basically a re-invention of the Linda/Tuplespace stuff of course. That Wiki page also mentions that Ruby ships with a standard tuplespace implementation, so it might be worth looking there briefly for any lessons learned, performance hacks, or that sort of thing.
If you ever had a site making really big use of NetV (like AOL Digital City, say), I wonder whether it would pay to have NetV do its message passing over the network via MPI rather than straight TCP sockets. As long as the MPI is still talking over TCP/IP underneath, my guess is no, as the NetV communication patterns are very simple. But if you had an alternate low-latency ethernet driver (no TCP/IP, OS bypass, etc.), MPI might be the easiest way to interface to it (because someone else will have already done the work). And of course, if for some bizarre reason you actually had an SCI interconnect between your boxes, like you'd need for Clusgres, you could use the real SCI shared memory for NetV too. (All very very blue sky, this whole last paragraph.)
Actually, Mnesia probably does support the the necessary concurrency, as well as other neat featuers. But I haven't really looked into it, it's all in Erlang, and AOLserver would need to talk to it via either SNMP or CORBA. Seems strange, but it looks like Erlang support client-side but not server-side ODBC. Erlang can use ODBC to talk to some external SQL database (Oracle or whatever), but a Tcl or C client cannot use ODBC to talk to Erang's Mnesia RDBMS. Ick.
Andrew, I have a module already working for this type of application. Essentially is is an abstraction layer which looks much like the ns_db sp_* api, and has many of the same goals of the OpenACS db_* api, although more general, and much simplified. Right now I have it working with odbc and postgresql drivers, these are called providers. I just finished removing all the ns_db calls into callbacks, so you can easily create a new provider with your own callbacks in this case.
The mutex/condition variables are used to protect a pool of connections/handles so you don't need to reconnect for each request. Also, you can reuse a connection for the life of your ns_conn without re-using the mutex/condition. It has a built in explicit transaction start/commit/rollback set of callbacks as well which could be used to issue special commands, such as locking a mutex on the remote server. That might be a little tricky in the case of a misbehaving or crashed client. Also transactions are handled without the need for catch/rollback. You could use these if you want, but when you start a transaction, the initializing routine registers a callback (once per ns_conn) to rollback any outstanding and uncommitted transactions upon connection/scheduled proc close, using ns_atclose. Ns_atclose also returns the token to the pool and signals any waiting threads that a token is now available. All this is created automatically when you create a new datasource and add a list of connection handles. Your callback code would need to translate this handle into an actual connection to use, for instance the callback code for odbc/postgresql is simply: 'return [ns_db gethandle $pool]'.
As far as maintaining the connections, I haven't yet looked at burrow. But I remember setting up something like this when I wrote a jabber module. It would vigorously maintain a connection with the jabber server.
Too bad we don't yet have udp in AOLserver. The changes to ns_sock are almost zero, but the problem is you need configuration changes so that a particular sock knows if it is udp or tcp. But the application doesn't need to know anything. With udp you could efficiently support server push. It might be worth writing a simple udp echo server. Whenever a change is made to an NV, the change is echoed back (to all connected nodes), which record the change. Using udp would probably complicate some things on both ends. How do you know when the server is down? Also you would probably need a timeout on every NV, where it would become stale and need to be rechecked. But this should be perfect for session management, with the session manager sitting on your database server and providing simple answers that might initially be stored in a database, but live in memory so multiple servers can track a single user session.
I'm curious whether AOL has ever had their NV (NetV) became a performance bottleneck. Even using TCP/IP, it is probably faster than querying an RDBMS, although lots slower than using a local NSV in shared memory. Since they are probably using it in a query-like rather than nsv-in-an-inner-loop like fashion, my guess is it's probably fast enough.
On the other hand, if you were really hitting NetV with lots of small requests, the TCP/IP latency might bite you. In that case, GAMMA might be the answer. It supports very low latency reliable MPI over cheap Intel PRO/1000 Gigabit Ethernet cards.
That seems like one of those atypical edge cases where "enterprise" style server computing and high performance cluster computing overlap. However, offhand I can't think of any web-oriented application at all that is likely to be that latency sensitive. I'm curious whether anyone has seen one.