Forum OpenACS Development: Unable to synchronize site nodes on clusters

I am currently trying to resolve an issue where nsv sets are not synchronized across the different servers in a cluster. I found that util_memoize already does this kind of synchronization across the cluster, so I have adopted a similar technique for refreshing the site_nodes nsv information across a cluster: update_cache_local for the local update, and update_cache for the cluster-wide update.
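
For reference, a minimal sketch of the pattern I mean. The proc names match the ones above but are otherwise made up; I am assuming something like server_cluster_peer_hosts to list the other cluster hosts, and site_node::update_cache (from a 5.x tree) to rebuild one node's entry locally:

    # Local update only: rebuild this server's site-node nsv entries
    # for the given node.
    proc update_cache_local {node_id} {
        site_node::update_cache -node_id $node_id
    }

    # Cluster-wide update: do it locally, then ask every peer to do the
    # same via a registered flush URL that just calls update_cache_local.
    proc update_cache {node_id} {
        update_cache_local $node_id
        foreach host [server_cluster_peer_hosts] {
            ns_httpget "http://$host/SYSTEM/flush-site-node?node_id=$node_id" 5
        }
    }

(The trailing 5 is the ns_httpget timeout in seconds.)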

I tried mounting a single acs-subsite package and this works across the cluster. But when I try creating a community in dotLRN the process fails, and I am unable to view the newly created community on the peer servers. Most probably the issue is that more than one node needs to be updated. Has anybody encountered this situation before? Or, in general, is there a better way of handling nsv sets across clusters? I also ran into ns_httpget timing out whenever one of the servers was under heavy load and not taking requests.

regards
Harish

Posted by Barry Books on
I worked on this a long time ago and gave up. I now just go to one server, configure everything, and then restart the other servers. You can also build a page that will reload the cache on each server.

I think part of the problem is that if you have many nodes, rebuilding the cache takes a very long time because site_node.get_url is very slow. This can also cause long startup times. I ended up making a site_node_url table and adding a trigger to populate it.

Posted by Harish Krishnan on
Configuring on one server and restarting the other servers is not an option for me, since my users are allowed to create their own communities and would want them available immediately.

Currently I am doing it by having a table that keeps track of node_id/peer_ip pairs and a scheduled proc that runs on each server and updates its site-node nsv set by looking at the entries belonging to that IP. The entries for each IP get cleared when the server restarts. But this is not exactly the best solution, because of the latency involved.
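
Roughly, the scheduled proc looks like this (table and proc names are simplified, and site_node::update_cache stands for whatever actually rebuilds the nsv entry for one node):

    # Registered once at startup; runs every 30 seconds on each server.
    ad_schedule_proc 30 sync_site_nodes_from_queue

    proc sync_site_nodes_from_queue {} {
        # However each server identifies itself in the queue table:
        set my_ip [ns_config "ns/server/[ns_info server]/module/nssock" address]

        # Hypothetical queue table: site_node_sync_queue(node_id, peer_ip).
        # (A real version would delete only the rows it has processed.)
        db_foreach pending_nodes {
            select node_id from site_node_sync_queue where peer_ip = :my_ip
        } {
            site_node::update_cache -node_id $node_id
        }
        db_dml clear_processed {
            delete from site_node_sync_queue where peer_ip = :my_ip
        }
    }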

I have similar issues with nsv sets that are used in many places in the code and do not get synchronized.

Thanks for the reply.

Posted by Andrew Piskorski on
Do the OpenACS server cluster procs not do what you need? If not, can they be extended to do what you need?
Posted by Harish Krishnan on
Currently they don't. I have added another proc, "synchronize_cluster_nodes", to that set, which tries to synchronize the site_nodes nsv information across the cluster. But the way I have currently implemented it is not efficient enough, and it is not "the" solution but a compromise for time. I am currently exploring possibilities for achieving synchronization via the request processor, i.e., when a user requests a page (one that lives under a newly created node), no matter which server picks up the request, it will check with the database if it fails to find the node in the nsv cache, and only then give the user the "no page found" error. But this is turning out to be tricky.
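
The shape of what I am trying, as a sketch only; site_node::get, site_node::update_cache, and the site_node__node_id database function are the names I would expect in a 5.x/PostgreSQL tree, so treat the details as assumptions:

    # On an nsv cache miss, check the database before returning "no page
    # found"; if another peer created the node, pull it into the local
    # cache and retry the lookup.
    proc site_node_get_or_refresh {url} {
        if {![catch {set node [site_node::get -url $url]}]} {
            return $node
        }
        if {[catch {
            set node_id [db_string find_node {select site_node__node_id(:url, null)}]
        }]} {
            return -code error "no such site node: $url"
        }
        site_node::update_cache -node_id $node_id
        return [site_node::get -url $url]
    }
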
Posted by Tom Jackson on

That is interesting; I thought ArsDigita had a way of syncing nodes. Some time back Jim Davidson mentioned that AOL used something called an NV: a network variable. I've asked many times how this was implemented, but never got a response. I think I can take a guess at how it would be done, although I don't know how efficient it would be.

First the NV would be pre-declared as such. After that its value would be maintained on a separate "parent" node. On the parent node this would just be an nsv array, so the threads of the parent would have serialized access. Each child node (your frontend cluster servers) would maintain a pool of TCP connections to the parent, with access to the pool controlled by a mutex/condition variable. The interface would be the same as for nsv arrays.
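
A rough sketch of the client side I have in mind (made-up names, not AOL's code): a fixed pool of sockets to the parent, an nsv holding the free list, and a mutex/condition pair serializing checkout/checkin:

    namespace eval nv {}

    # Run once at startup: open $size TCP connections to the parent node.
    proc nv::pool_init {host port size} {
        nsv_set nv mutex [ns_mutex create]
        nsv_set nv cond  [ns_cond create]
        set free {}
        for {set i 0} {$i < $size} {incr i} {
            lappend free [ns_sockopen $host $port]   ;# {readfd writefd} pair
        }
        nsv_set nv free $free
    }

    # Take a connection out of the pool, waiting if none is free.
    proc nv::checkout {} {
        set mutex [nsv_get nv mutex]
        ns_mutex lock $mutex
        while {[llength [nsv_get nv free]] == 0} {
            ns_cond wait [nsv_get nv cond] $mutex
        }
        set free [nsv_get nv free]
        set conn [lindex $free 0]
        nsv_set nv free [lrange $free 1 end]
        ns_mutex unlock $mutex
        return $conn
    }

    # Return a connection to the pool and wake one waiting thread.
    proc nv::checkin {conn} {
        set mutex [nsv_get nv mutex]
        ns_mutex lock $mutex
        nsv_set nv free [concat [nsv_get nv free] [list $conn]]
        ns_cond signal [nsv_get nv cond]
        ns_mutex unlock $mutex
    }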

I just finished writing code for a generalized pooled query API, which would cover the client side; what would remain is the communication protocol, the server side, and the client code for the queries.

Posted by Andrew Piskorski on
Tom, your NV stuff is interesting. (I think it should be called NetV though, to eliminate any confusion with NSV.) I don't understand the bit about the tcp pool on the client side having a mutex around it though, what for?

It should be interesting to see what the performance looks like. (Obviously it's going to be much worse than nsv using process-local memory, but probably plenty good enough for many or most applications.) I wonder if UDP might work better than TCP; but then there's no UDP access from Tcl, so certainly ignore that idea for now. You probably need to worry about handling broken tcp connections in the pool - Cleverly's Burrow could be helpful for that.

The enhanced nsv/tsv provided by the Tcl Threads Extension might be of particular value, as in some cases its more powerful API lets you do more work in a single atomic call, which means only one round trip to the server rather than several. Also, in order for NetV to work the same as nsv, you need NetMutex too; otherwise it's impossible for the client to properly serialize access when doing complex operations on an nsv. That's also a good reason to use the Tcl Threads Extension tsv/nsv rather than the legacy AOLserver nsv implementation, as its enhanced API should let you avoid explicit use of mutexes much more often.
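
For example, with the Threads Extension's tsv you get single atomic calls, plus tsv::lock for compound updates; if the same API were implemented over the network, each of these would be one round trip instead of several:

    package require Thread

    # One atomic call instead of lock / get / incr / set / unlock
    # (the tsv::set just seeds the counter for this example):
    tsv::set stats hits 0
    tsv::incr stats hits

    # Put a job on a shared list, then atomically take the first one off;
    # the read and the removal happen in a single call:
    tsv::lappend queue jobs "rebuild-site-nodes"
    set job [tsv::lpop queue jobs 0]

    # Compound operations can run under tsv::lock, which holds the shared
    # array's own lock around the script (internal locking is disabled
    # inside it, so the nested tsv calls don't deadlock):
    tsv::lock sessions {
        if {![tsv::exists sessions session-42]} {
            tsv::set sessions session-42 [clock seconds]
        }
    }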

You may also want some sort of notification from the server to the client, saying, "Oh shit, I just crashed and restarted!". Edge cases like that are never going to work quite the same way as local Nsvs, but that should be ok.

This NetV stuff is all basically a re-invention of the Linda/Tuplespace stuff of course. That Wiki page also mentions that Ruby ships with a standard tuplespace implementation, so it might be worth looking there briefly for any lessons learned, performance hacks, or that sort of thing.

If you ever had a site making really big use of NetV (like AOL Digital City, say), I wonder whether it would pay to have NetV do its message passing over the network via MPI rather than straight TCP sockets. As long as the MPI is still talking over TCP/IP underneath, my guess is no, as the NetV communication patterns are very simple. But if you had an alternate low-latency ethernet driver (no TCP/IP, OS bypass, etc.), MPI might be the easiest way to interface to it (because someone else will have already done the work). And of course, if for some bizarre reason you actually had an SCI interconnect between your boxes, like you'd need for Clusgres, you could use the real SCI shared memory for NetV too. (All very very blue sky, this whole last paragraph.)

Posted by Andrew Piskorski on
A possible alternate design to NetV would be to simply use AOLserver's database APIs to talk to a single remote in-memory RDBMS. Unfortunately, SQLite doesn't (yet?) support the necessary concurrency, nor does Metakit. And the NetV approach is a lot simpler, and would probably be noticeably faster for the simpler cases it supports.

Actually, Mnesia probably does support the necessary concurrency, as well as other neat features. But I haven't really looked into it; it's all in Erlang, and AOLserver would need to talk to it via either SNMP or CORBA. It seems strange, but it looks like Erlang supports client-side but not server-side ODBC. Erlang can use ODBC to talk to some external SQL database (Oracle or whatever), but a Tcl or C client cannot use ODBC to talk to Erlang's Mnesia RDBMS. Ick.

Posted by Tom Jackson on

Andrew, I have a module already working for this type of application. Essentially it is an abstraction layer which looks much like the ns_db sp_* API and has many of the same goals as the OpenACS db_* API, although more general and much simplified. Right now I have it working with odbc and postgresql drivers; these are called providers. I just finished factoring all the ns_db calls out into callbacks, so you can easily create a new provider with your own callbacks.

The mutex/condition variables are used to protect a pool of connections/handles so you don't need to reconnect for each request. Also, you can reuse a connection for the life of your ns_conn without going back through the mutex/condition variable. It has a built-in set of explicit transaction start/commit/rollback callbacks as well, which could be used to issue special commands, such as locking a mutex on the remote server. That might be a little tricky in the case of a misbehaving or crashed client.

Transactions are also handled without the need for catch/rollback. You can use the explicit callbacks if you want, but when you start a transaction, the initializing routine registers a callback (once per ns_conn, using ns_atclose) to roll back any outstanding and uncommitted transactions when the connection or scheduled proc closes. The ns_atclose callback also returns the token to the pool and signals any waiting threads that a token is now available. All of this is set up automatically when you create a new datasource and add a list of connection handles. Your callback code only needs to translate such a handle into an actual connection; for instance, the callback code for odbc/postgresql is simply: 'return [ns_db gethandle $pool]'.
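
To make the ns_atclose part concrete, a stripped-down sketch; the qp_* names are made up for illustration, and the pool checkout/checkin would be the mutex/condition arrangement described earlier in the thread:

    # Check a handle (token) out of the pool for the rest of this request.
    # ns_atclose runs the cleanup when the connection or scheduled proc
    # finishes, including on error paths, so application code needs no
    # explicit catch/rollback.
    proc qp_acquire {datasource} {
        set token [qp_pool_checkout $datasource]
        ns_atclose [list qp_cleanup $token]
        return $token
    }

    proc qp_begin  {token} { qp_provider_exec $token begin }
    proc qp_commit {token} { qp_provider_exec $token commit }

    # Roll back anything left uncommitted (a real version tracks whether a
    # transaction is actually open), return the token, and wake one waiter.
    proc qp_cleanup {token} {
        catch { qp_provider_exec $token rollback }
        qp_pool_checkin $token   ;# does the ns_cond signal internally
    }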

As far as maintaining the connections goes, I haven't yet looked at Burrow. But I remember setting up something like this when I wrote a jabber module; it would vigorously maintain a connection with the jabber server.

Too bad we don't yet have UDP in AOLserver. The changes to ns_sock are almost zero, but the problem is that you need configuration changes so that a particular sock knows whether it is UDP or TCP; the application doesn't need to know anything. With UDP you could efficiently support server push. It might be worth writing a simple UDP echo server: whenever a change is made to an NV, the change is echoed to all connected nodes, which record it. Using UDP would probably complicate some things on both ends. How do you know when the server is down? Also, you would probably need a timeout on every NV, after which it would become stale and need to be rechecked. But this should be perfect for session management, with the session manager sitting on your database server and providing simple answers that might initially be stored in a database but live in memory, so multiple servers can track a single user session.

12: NetV, GAMMA, MPI, etc. (response to 7)
Posted by Andrew Piskorski on
Tom, how's your NetV stuff going? Used it for anything interesting yet?

I'm curious whether AOL's NV (NetV) has ever become a performance bottleneck for them. Even using TCP/IP, it is probably faster than querying an RDBMS, although a lot slower than using a local nsv in shared memory. Since they are probably using it in a query-like fashion rather than an nsv-in-an-inner-loop fashion, my guess is that it's probably fast enough.

On the other hand, if you were really hitting NetV with lots of small requests, the TCP/IP latency might bite you. In that case, GAMMA might be the answer. It supports very low latency reliable MPI over cheap Intel PRO/1000 Gigabit Ethernet cards.

That seems like one of those atypical edge cases where "enterprise" style server computing and high performance cluster computing overlap. However, offhand I can't think of any web-oriented application at all that is likely to be that latency sensitive. I'm curious whether anyone has seen one.

Posted by Andrew Piskorski on
On the AOLserver list, in the "Scaling at the high end" thread, Nathan Folkman just announced that AOL is preparing its nsdci "Digital City Module" for Open Source release. Among other things, it includes their network "NV" code.
Posted by Andrew Piskorski on
I'd forgotten all about this, but apparently AOL really did open source their nsdci Digital City Infrastructure codebase some time ago, on Google Code.
10: NetV, memcached (response to 1)
Posted by Andrew Piskorski on
Tom, something that sounds very much like your NetV idea (but as a stand-alone application, nothing to do with AOLserver) is already done and in wide production use: memcached
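
Its text protocol is trivial to drive even from plain Tcl. A bare-bones set/get, just to show the shape of it (no error handling, flags fixed at 0, no expiry; not any real client library):

    proc mc_connect {host port} {
        set sock [socket $host $port]
        fconfigure $sock -translation binary -buffering none
        return $sock
    }

    proc mc_set {sock key value} {
        puts -nonewline $sock "set $key 0 0 [string length $value]\r\n$value\r\n"
        gets $sock reply                       ;# expect "STORED"
        return [string trimright $reply \r]
    }

    proc mc_get {sock key} {
        puts -nonewline $sock "get $key\r\n"
        gets $sock header                      ;# "VALUE <key> <flags> <bytes>" or "END"
        if {![string match "VALUE *" $header]} { return "" }
        set value [read $sock [lindex $header 3]]
        read $sock 2                           ;# trailing \r\n
        gets $sock end                         ;# closing "END"
        return $value
    }

Usage would be something like: set s [mc_connect localhost 11211], then mc_set $s greeting hello and mc_get $s greeting.
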
Posted by Andrew Piskorski on
Naturally, all caching schemes that keep the cache outside the RDBMS - NSV, memcached, whatever - are inherently non-transactional. This is fine in many cases, and it keeps the NSV/cache implementation (but not necessarily the application that uses it!) simple.

What if, instead of the memory cache layer existing as a completely independent application, you turned it into part of the RDBMS? In essence, let the RDBMS keep its own in-memory cache of query results, but let it keep that cache on many other machines across the network, not just in its own RAM. Obviously that means the cache servers and the master RDBMS have to exchange lots of short messages frequently in order to keep the network memory cache coherent and transactional, and getting that right would be critical for performance and scalability.

At least without thinking about it in much more detail, it sounds feasible to me. There are also obvious potential synergies with the stuff that the HPC cluster/Beowulf people work on: optimized MPI libraries, low-latency high-bandwidth networks and network drivers, etc. (Thus see also the somewhat related thread discussing Clusgres, Postgres-R, and the like.)

But, this seems like such an obvious R&D/thesis opportunity that I wonder whether it's already been done. Anyone know of any such projects?

15: pgmemcache (response to 10)
Posted by Andrew Piskorski on
I just noticed that PostgreSQL has a memcached interface, pgmemcache. Also, Harish Krishnan provided a very small Tcl API to memcached back in Aug. 2005.
16: SQLCacheD (response to 10)
Posted by Andrew Piskorski on
Hm, and Ivan Voras is working on SQLCacheD, which is like memcached but uses in-memory SQLite for greater flexibility (using SQL) in how one can manipulate (expire, update, etc.) cached data.
13: InterWeave (response to 1)
Posted by Andrew Piskorski on
In many ways, Michael L. Scott's InterWeave project is like a really fancy version of the NetV idea, complete with optional Two Phase Commit, direct integration into lots of languages, etc. etc. They say it all works and give some examples and performance numbers, but unfortunately I don't see any code.
14: Re: InterWeave (response to 13)
Posted by Andrew Piskorski on
Incidentally, I asked Michael L. Scott about InterWeave. The code has not been published, but it sounds like he's not necessarily averse to publicly releasing it in some fashion; it's just that no one has worked on it lately and he doesn't have anyone available to package it up into releasable shape.