Forum OpenACS Development: Site nodes scaling problem
Each call to site_node::update_cache takes about 10 seconds. When creating a new .LRN class, this proc is called 22 times. Creating a new class takes about 6 minutes during which the nsd process maxes out the CPU.
Most of the time in site_node::update_cache is spent in the four calls to "array set" at the top and in the four calls to "nsv_array reset" at the bottom. This is not surprising given the size of the data structures involved.
The question that occurs to me is, do we have to deal with the entire site map atomically, or can we cache at the individual node/url level?
Looking back through old code, we actually did it this way up until late November 2003. Only with r1.48 of site-nodes-procs.tcl did we begin to deal with the site map as a whole. Timo's commit message says "populate site-nodes-cache by resolving urls in tcl instead of using the slow plsql". I'm not clear yet on whether this change strictly requires whole-map caching instead of individual-url caching. But from a scaling pov I think we want to go back to individual caching if possible.
We know that AOLserver 4 will help speed up the Tcl side, but 6 minutes divided by (say) a twofold performance boost is still 3 minutes.
As I commented in e-mail ... if we used regular nsvs with the URL as the key, rather than an array, we could modify the nsv for an individual URL atomically and also add new URLs to the map atomically without going through all this update_cache crap...
So, what are the comparative costs of a nsv vs. nsv_array?
Maybe dossy's in IRC...
Other than that, Don sounds exactly right. :)
We have about 3.8k class instances and about 40K site nodes, and it's quite slow to create a new class instance ...
What about populating the caches asynchronously e.g. in another thread? Or cache lazily i.e. after a site-node has indeed been hit. Or just add the new site-nodes to the cache?
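A minimal sketch of the lazy option, again in Python and purely illustrative: LazySiteNodeCache, load_node, and the fake_db dictionary are hypothetical stand-ins, with load_node playing the role of whatever database lookup would fetch a single node. Nothing is cached until a URL is actually requested, and a new node costs nothing at creation time.

```python
import threading


class LazySiteNodeCache:
    """Hypothetical lazy cache: a URL is looked up only on its first hit."""

    def __init__(self, load_node):
        self._load_node = load_node  # assumed callable: url -> node_id or None
        self._nodes = {}
        self._lock = threading.Lock()
        self.loads = 0  # counts backing-store lookups, for illustration only

    def get(self, url):
        with self._lock:
            if url in self._nodes:
                return self._nodes[url]
        # Miss: query the backing store outside the lock so other
        # readers are not blocked while the lookup runs.
        node_id = self._load_node(url)
        with self._lock:
            self.loads += 1
            if node_id is not None:
                self._nodes[url] = node_id
        return node_id

    def invalidate(self, url):
        # Drop one entry; the next hit reloads it.
        with self._lock:
            self._nodes.pop(url, None)


# Stand-in for the site_nodes table.
fake_db = {"/dotlrn/": 2, "/dotlrn/classes/": 3}
lazy = LazySiteNodeCache(fake_db.get)

first = lazy.get("/dotlrn/classes/")   # goes to the backing store
second = lazy.get("/dotlrn/classes/")  # served from the cache
```

The trade-off is that the first request for each URL pays for the lookup, and two threads missing on the same URL at once may both load it. That is harmless here because both would store the same value.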
As for updating the cache(s) it ought to be possible to do that inline as well, in a surgical fashion, as opposed to bulk over-writing the old nsv_arrays with the updated Tcl arrays. (I think!)
Another thing which struck me: Assuming I'm right in that the data structures (nsv_array and Tcl array) are copied "by value", is it then possible to do that "by reference" instead?
I don't think we've seen the end of site node related scaling fixes but this solves some important ones.
True, site nodes need more work ... the curriculum bar should be removed from dotlrn-master.tcl, since it hurts performance badly on any site with more than 20k site nodes, even if you don't have the package installed...