Forum OpenACS Development: Tcl Profiling

1: Tcl Profiling

Posted by Joe Oldak on 01/20/12 04:38 PM

Though it may come as a surprise, I'm still actively doing lots of things on OpenACS (or dotCommunity as we like to call our branch of it!).

I've recently been doing some load testing of Surrey's Olympics website which is a subsite on one of our OpenACS installations - www.gosurrey.info.

They want to be able to handle a *big* increase in traffic around Olympics time, so I've been doing some load testing using Tsung (which is excellent, by the way!).

By cranking up the CPU power on the virtual server that runs it I can get it to handle about 300 visitors per minute (where each visitor loads a few pages). That's still some way off the ideal target.

It's entirely CPU bound - the db and files are all in RAM. What I found reasonably surprising is that most of the CPU time is in nsd, not in postgresql.

So, I was wondering if anyone had any success in profiling the Tcl side of things recently? I have found various old discussions on using tclprof and nsprofile but nobody seemed to have much success.

I imagine it'd be possible to hook directly into ad_proc, but the overhead of adding timing to *every* procedure call might make everything unbearably slow!
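To make the overhead concern concrete, here is a minimal sketch of opt-in timing. It is written in Python rather than Tcl so it can be run outside the server; every name in it (`profiled`, `stats`, `check_permission`) is hypothetical and none of it is OpenACS or AOLserver API. The idea is to wrap only the procedures you suspect, so the rest of the code pays no timing cost:

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory store: procedure name -> [call count, total seconds].
stats = defaultdict(lambda: [0, 0.0])

def profiled(func):
    """Wrap func so each call adds to its call count and cumulative wall time."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if the wrapped procedure raises.
            entry = stats[func.__name__]
            entry[0] += 1
            entry[1] += time.perf_counter() - start
    return wrapper

@profiled
def check_permission(user_id, object_id):
    # Stand-in for an expensive procedure worth measuring (assumption).
    return (user_id + object_id) % 2 == 0

for i in range(1000):
    check_permission(i, 42)

calls, total = stats["check_permission"]
print(f"check_permission: {calls} calls, {total * 1e6 / calls:.2f} us/call")
```

The same shape would apply to a Tcl wrapper: rename the original procedure, define a replacement that records the elapsed time and calls through, and only do this for a short list of candidates.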

I promise to post results here if anyone can give me any clues 😊

Thanks, and happy new year!

Joe

2: Re: Tcl Profiling (response to 1)

Posted by Dave Bauer on 01/20/12 06:05 PM

I believe NSD and Postgresql benefit from multiple CPUs. How many CPUs can you assign for the site?

It might make sense to put the VM onto dedicated hardware for the duration of the Olympics, with multiple cores/processors dedicated to the site.

How much time does a request take in NSD at 300/minute?

3: Re: Tcl Profiling (response to 1)

Posted by Patrick Giagnocavo on 01/20/12 10:33 PM

You may find the cheapest / simplest way to improve performance is to put images and .js files on either a CDN or a separate machine with e.g. nginx, and to put nginx in front of your AOLserver to keep nsd from tying up threads serving slower clients.

4: Re: Tcl Profiling (response to 1)

Posted by Torben Brosten on 01/20/12 11:02 PM

Joe,

You could also try putting it "in the clouds", i.e. going with a VPS hosting service that can adjust the level of hardware dedicated to the unit. You only pay for what you use, so you would likely pay a fraction of the cost of building a dedicated hardware configuration.

cheers,

Torben

5: Re: Tcl Profiling (response to 1)

Posted by Gustaf Neumann on 01/21/12 07:25 AM

Dear Joe,

Handling the load you are describing (300 * 5 -> 1,500 views per minute) is not too much (we are experiencing on our production site sometimes more than 10,000 views per minute, practically all of these are dynamic), but hits/views are very hard to compare, since the computational effort per request can be dramatically different.

First of all, CPU-bounded-ness is typically no big problem for aolserver/naviserver when you have multiple cores. I would recommend configuring 2-4 connection threads per core for good scaling. If you cannot use multiple cores due to your virtual server, consider moving to real machines - quad core (or even 8 core) machines are a commodity today.
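As a quick worked example of the 2-4 threads per core rule of thumb (a sketch only; the function name is hypothetical and the right value still has to be confirmed by monitoring):

```python
def connection_thread_range(cores, per_core_low=2, per_core_high=4):
    """Return (min, max) connection threads for the 2-4 per core rule of thumb."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores * per_core_low, cores * per_core_high

for cores in (4, 8):
    low, high = connection_thread_range(cores)
    print(f"{cores} cores: {low}-{high} connection threads")
```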

Additionally, using a reverse proxy (such as nginx) for static content and background delivery [1] will help you to scale up.

Going towards high performance without serious monitoring is hardly possible, since every server is only as fast as its weakest component. I am sure you are aware of the options and tools, such as

aolserver/naviserver-specific tools: xotcl-request-monitor (statistics from actual to daily values) and nsstats.tcl (cache and lock statistics),
db monitoring tools (e.g. postgres statistics, both in-server statistics and log analyzing tools), and
system monitoring tools (e.g. hotsanic, symon, ...)

From that monitoring one might find out that e.g. you are only able to use 2 of the 8 cores, or that a few SQL queries slow everything down. This information is important for configuring aolserver (e.g. connection threads, db-handles, ns-cache sizes, ...). Tuning the aolserver configuration and the server can improve scaling significantly.

Coming now to your question - Tcl profiling:

There are at least 2 approaches:

dtrace: newer Tcl versions have support for a dtrace provider. This allows you to turn on/off various "probes" from an external program ... provided you have a platform supporting dtrace (or access to such a platform to run your tests).
See [2]
The Next Scripting Framework (child of XOTcl) has, among many features, support for running Tcl procs with non-positional arguments (like ad-procs). We have a small extension to OpenACS that maps all ad-procs with non-positional args to these nsf-procs. We have been using it in production since May. By this move,
- (1) the performance overhead per ad-proc invocation is greatly reduced (argument processing is done in C),
- (2) the memory footprint shrinks (none of the automatically generated argument parsers are needed)
- (3) one can use the built-in profiling support (when using xowiki, one can even obtain class diagrams with profiling information)
For this, you need a fairly new OpenACS, Tcl 8.5, the Next Scripting Framework installed, and a small patch to let ad-procs use nsf-procs.
See [3]

To give more specific hints, one would need more information about your application, the components used, the platform, etc. Anyway, I hope this is still helpful.
-gustaf neumann

[1] http://www.openacs.org/xowiki/openacs-performance-tuning
[2] http://wiki.tcl.tk/19923
[3] http://www.next-scripting.org/

6: Re: Tcl Profiling (response to 1)

Posted by Joe Oldak on 01/26/12 06:08 PM

Thanks for your suggestions! To address some of the questions:

I'm using ElasticHosts to provide the virtual server. I can specify up to 20GHz of CPU split over 8 cores (these are modern Opteron processors).

I believe by doing so I am effectively getting a whole physical server to myself - though I will check this. There'll obviously be some overhead from the hypervisor, though they use KVM so it should be pretty small.

I managed 300 visitors/minute with the full 20GHz/8 cores with 16 connection threads. So even with max power, I still need more performance from somewhere! (In this test I had no slow clients, so gained no performance from more threads. The server was fully utilised with negligible idle/waiting time).

Based on the load I was generating (modelled on traffic around the time of the Olympic road race test event), there were around 25 page fetches per visit (some pages have a lot of media files!) - so it will have peaked at about 7500 requests/minute (125/sec), with the vast majority coming from the content repository rather than resources.

At full load the nsd threads were probably using about 80% of the available CPU, and postgres 20%. A lot of the content is memoize cached, but quite a few queries are needed per request for permission checking and so on.
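The arithmetic behind those figures can be checked with a few lines. The last step is an extra estimate that is not in the measurements themselves: it assumes all 8 cores were saturated for the whole test, which gives a rough CPU budget per request.

```python
visitors_per_minute = 300   # peak rate reached in the load test
fetches_per_visit = 25      # approximate average from the traffic model
cores = 8                   # assumption: all 8 cores fully busy at peak

requests_per_minute = visitors_per_minute * fetches_per_visit
requests_per_second = requests_per_minute / 60

# Core-milliseconds available each second, divided by requests served each second.
cpu_ms_per_request = cores * 1000 / requests_per_second

print(requests_per_minute)   # 7500
print(requests_per_second)   # 125.0
print(cpu_ms_per_request)    # 64.0
```

Under that assumption, each request costs on the order of 64 ms of CPU across nsd and postgres combined.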

One quick win would be to move the most commonly accessed files (theme CSS etc) into a resources folder rather than serving from the content repository. Though of course I'd rather get the content repository up to near-resource speed to reduce the need to do this!

Thanks for the info on dtrace and NX. At the moment I'm using mostly out-of-the-box Ubuntu 10.04 (LTS), which means Tcl 8.4, AOLServer 4.5.1, postgresql 8.4. I'm happy to move to something newer if known to be faster (and not *too* much hassle!)

Our OpenACS is a bit ancient - we branched 5.2 and have sort of done our own thing since then. I know this is pretty crap of us, but we've made quite a lot of significant changes to the way some bits of OpenACS work and merging with the latest version might actually be impossible! (I'm sure there will have been performance improvements since our branch that we're missing out on! Perhaps job 1 would be to scan the changelogs and see if there are any patches I should bring across)

At the moment I don't have a clear picture of where all the Tcl overhead is (because I haven't looked rather than because it's impossible to find out!). Hence some profiling might be really useful. (I'll also use the RP Debug output to see if that gives some pointers.)

Apologies for rambling on! I'll look into the various suggestions and see where I get...

Thanks again

Joe

7: Re: Tcl Profiling (response to 6)

Posted by Dave Bauer on 01/26/12 06:15 PM

Moving static files into the filesystem without permission checks is the best bet.

Or, if the URLs match a consistent pattern, or you can write a URL handler for those types of content repository resources, you could proxy them with something like nginx.

This might be the simplest solution. Note that the content repository does include a feature to publish content repository items to the filesystem, specifically to remove the database overhead. If you published them to a /resources directory you'd get the additional benefit of no permission checks.

On top of the filesystem publish, proxying all /resources URLs to remove all Tcl overhead should give the maximum benefit.

8: Re: Tcl Profiling (response to 7)

Posted by Joe Oldak on 01/26/12 07:08 PM

Ooh, that's an interesting idea!

All image/css assets come from an assets folder in the subsite (so either http://main.domain/sitename/assets/... or http://vhost.domain/assets/...)

If I put in a request handler for those I'd bypass a fair bit of code!

(at the moment we have a gcms-mount package mounted at "assets" in every subsite, with an index.vuh which does the serving from the content repository)

(or, as you suggest, it might be possible to set up nginx with a rule to cache */assets/* ?)

Joe

9: Re: Tcl Profiling (response to 8)

Posted by Jeff Rogers on 01/26/12 09:39 PM

If you put in a dedicated handler for those, you'd also do well to set expiry/cache-control headers on the response; how aggressive depends on your needs of course, but most recommendations are for pretty long lifetimes (months or more).
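As an illustration of what such headers look like, here is a minimal sketch in Python (not AOLserver code; the function name and the 90-day lifetime are assumptions chosen for the example). It builds a `Cache-Control` value with `max-age` in seconds and a matching HTTP-date for `Expires`:

```python
import time
from email.utils import formatdate

def cache_headers(max_age_days, now=None):
    """Build Cache-Control and Expires headers for a static asset response."""
    now = time.time() if now is None else now
    max_age = max_age_days * 24 * 60 * 60  # lifetime in seconds
    return {
        "Cache-Control": f"public, max-age={max_age}",
        # usegmt=True produces an HTTP-date ending in "GMT".
        "Expires": formatdate(now + max_age, usegmt=True),
    }

print(cache_headers(90))
```

With long lifetimes like this, changed assets need a new URL (for example a version string in the path), since clients will not re-request the old one until it expires.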