Forum OpenACS Development: Tcl Profiling
I've recently been doing some load testing of Surrey's Olympics website which is a subsite on one of our OpenACS installations - www.gosurrey.info.
They want to be able to handle a *big* increase in traffic around Olympics time, so I've been doing some load testing using Tsung (which is excellent, by the way!).
By cranking up the CPU power on the virtual server that runs it, I can get it to handle about 300 visitors per minute (where each visitor loads a few pages). That's still some way off the ideal target.
It's entirely CPU bound - the db and files are all in RAM. What I found reasonably surprising is that most of the CPU time is spent in nsd, not in postgresql.
So, I was wondering if anyone had any success in profiling the Tcl side of things recently? I have found various old discussions on using tclprof and nsprofile but nobody seemed to have much success.
I imagine it'd be possible to hook directly into ad_proc, but the overhead of adding timing to *every* procedure call might make everything unbearably slow!
I promise to post results here if anyone can give me any clues.
Thanks, and happy new year!
It might make sense to put the VM onto dedicated hardware for the duration of the Olympics, with multiple cores/processors dedicated to the site.
How much time does a request take in NSD at 300/minute?
You could also try putting it "in the clouds", i.e. going with a VPS hosting service that can adjust the level of hardware dedicated to the unit. You only pay for what you use, so you would likely pay a fraction of the cost of building a dedicated hardware configuration.
The load you are describing (300 * 5 -> 1,500 views per minute) is not too much (on our production site we sometimes see more than 10,000 views per minute, practically all of them dynamic), but hits/views are very hard to compare, since the computational effort per request can differ dramatically.
First of all, being CPU-bound is typically no big problem for aolserver/naviserver when you have multiple cores. I would recommend configuring 2-4 connection threads per core for good scaling. If you cannot use multiple cores because of your virtual server, consider moving to real machines - quad-core (or even 8-core) machines are a commodity today.
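As a rough illustration of the 2-4 connection threads per core rule of thumb, here is a minimal Python sketch. The function name and the example core counts are made up for illustration; the result is only a starting range to tune against measurements, not a setting taken from aolserver/naviserver itself.

```python
def connection_thread_range(cores, per_core_low=2, per_core_high=4):
    """Return (low, high) connection-thread counts for a given core count.

    Hypothetical helper: it only encodes the 2-4 threads-per-core
    rule of thumb and knows nothing about the actual server.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores * per_core_low, cores * per_core_high


if __name__ == "__main__":
    # Example core counts are assumptions chosen for illustration.
    for cores in (1, 4, 8):
        low, high = connection_thread_range(cores)
        print(f"{cores} core(s): {low}-{high} connection threads")
```

The useful part is the range rather than a single number: the lower end suits purely CPU-bound requests, while the upper end leaves room for threads that are waiting on the database or on slow clients.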
Additionally, using a reverse proxy (such as nginx) for static content and background delivery will help you scale up further.
Going towards high performance without serious monitoring is hardly possible, since every server is only as fast as its weakest component. I am sure you are aware of the options and tools, such as
- aolserver/naviserver specific tools xotcl-request monitor (statistics from actual to daily values) and nsstats.tcl (cache and lock statistics).
- db-tools monitoring (e.g. postgres statistics, both in-server statistics and log analyzing tools) and
- system monitoring tools (e.g. hotsanic, symon, ...)
Coming now to your question - Tcl profiling:
There are at least 2 approaches:
- dtrace: newer Tcl versions have support for a dtrace provider. This allows you to turn various "probes" on and off from an external program, provided you have a platform supporting dtrace (or access to such a platform to run your tests).
- the Next Scripting Framework (child of xotcl) has, among many other features, support for running Tcl procs with non-positional arguments (like ad-procs). We have a small extension to OpenACS that maps all ad-procs with non-positional arguments to nsf-procs. We have been using it in production since May. By this move,
- (1) the performance overhead per ad-proc invocation is greatly reduced (argument processing is done in C),
- (2) the memory footprint shrinks (none of the automatically generated argument parsers are needed),
- (3) one can use the built-in profiling support (when using xowiki, one can even obtain class diagrams with profiling information).
I'm using ElasticHosts to provide the virtual server. I can specify up to 20GHz of CPU split over 8 cores (these are modern Opteron processors).
I believe by doing so I am effectively getting a whole physical server to myself - though I will check this. There'll obviously be some overhead from the hypervisor, though they use KVM so it should be pretty small.
I managed 300 visitors/minute with the full 20GHz/8 cores with 16 connection threads. So even with max power, I still need more performance from somewhere! (In this test I had no slow clients, so gained no performance from more threads. The server was fully utilised with negligible idle/waiting time).
Based on the load I was generating (modelled on traffic around the time of the Olympic road race test event), there were around 25 fetches per visit (some pages have a lot of media files!) - so it will have peaked at about 7500 requests/minute (125/sec). The vast majority of these come from the content repository rather than resources.
At full load the nsd threads were probably using about 80% of the available CPU, and postgres 20%. A lot of the content is memoize cached, but quite a few queries are needed per request for permission checking and so on.
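To put those figures in per-request terms, here is a back-of-the-envelope calculation as a small Python sketch. It assumes that all 8 cores were fully busy at the peak rate and that the 80%/20% split is accurate; both are rough estimates from the test rather than measured values, and the function and variable names are made up for illustration.

```python
def load_estimate(visitors_per_min, fetches_per_visit, cores, nsd_share, pg_share):
    """Turn the load-test figures into request rates and CPU cost per request.

    Assumes the server is fully utilised, so every second of wall-clock
    time consumes `cores` seconds of CPU time in total.
    """
    req_per_min = visitors_per_min * fetches_per_visit
    req_per_sec = req_per_min / 60
    core_ms_per_req = cores * 1000 / req_per_sec
    return {
        "req_per_min": req_per_min,
        "req_per_sec": req_per_sec,
        "core_ms_per_req": core_ms_per_req,
        "nsd_ms_per_req": core_ms_per_req * nsd_share,
        "pg_ms_per_req": core_ms_per_req * pg_share,
    }


if __name__ == "__main__":
    # 300 visitors/minute, ~25 fetches per visit, 8 cores, ~80% nsd / ~20% postgres
    for name, value in load_estimate(300, 25, 8, 0.80, 0.20).items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions each request costs roughly 64 ms of core time on average, of which about 51 ms is spent in nsd. That is a lot for requests that are mostly media files, which suggests the serving path through the content repository is the first thing worth profiling.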
One quick win would be to move the most commonly accessed files (theme CSS etc) into a resources folder rather than serving from the content repository. Though of course I'd rather get the content repository up to near-resource speed to reduce the need to do this!
Thanks for the info on dtrace and NX. At the moment I'm using mostly out-of-the-box Ubuntu 10.04 (LTS), which means Tcl 8.4, AOLServer 4.5.1, postgresql 8.4. I'm happy to move to something newer if known to be faster (and not *too* much hassle!)
Our OpenACS is a bit ancient - we branched 5.2 and have sort of done our own thing since then. I know this is pretty crap of us, but we've made quite a lot of significant changes to the way some bits of OpenACS work and merging with the latest version might actually be impossible! (I'm sure there will have been performance improvements since our branch that we're missing out on! Perhaps job 1 would be to scan the changelogs and see if there are any patches I should bring across)
At the moment I don't have a clear picture of where all the Tcl overhead is (because I haven't looked rather than because it's impossible to find out!). Hence some profiling might be really useful. (I'll also use the RP Debug output to see if that gives some pointers.)
Apologies for rambling on! I'll look into the various suggestions and see where I get...
Or, if the URLs match a consistent pattern, or you can write a URL handler for those types of content repository resources, you could proxy them with something like nginx.
This might be the simplest solution. Note that the content repository does include a feature to publish content repository items to the filesystem, specifically to remove the database overhead. If you published them to a /resources directory you'd get the additional benefit of no permission checks.
On top of the filesystem publish, proxying all /resources URLs, which removes all the Tcl overhead, should give the maximum benefit.
If I put in a request handler for those I'd bypass a fair bit of code!
(at the moment we have a gcms-mount package mounted at "assets" in every subsite, with an index.vuh which does the serving from the content repository)
(or, as you suggest, it might be possible to set up nginx with a rule to cache */assets/* ?)