Thanks for your suggestions! To address some of the questions:
I'm using ElasticHosts to provide the virtual server. I can specify up to 20GHz of CPU split over 8 cores. (these are modern Opteron processors).
I believe by doing so I am effectively getting a whole physical server to myself - though I will check this. There'll obviously be some overhead from the hypervisor, though they use KVM so it should be pretty small.
I managed 300 visitors/minute with the full 20GHz/8 cores with 16 connection threads. So even with max power, I still need more performance from somewhere! (In this test I had no slow clients, so gained no performance from more threads. The server was fully utilised with negligible idle/waiting time).
Based on the load I was generating (modelled on traffic around the time of the Olympic road race test event), there was around 25 page fetches per visit (some pages have a lot of media files!) - so it will have peaked at about 7500 requests/minute (125/sec). The vast majority coming from the content repository rather than resources.
At full load the nsd threads were probably using about 80% of the availble CPU, and postgres 20%. A lot of the content is memoize cached, but quite a few queries are needed per request for permission checking and so on.
One quick win would be to move the most commonly accessed files (theme CSS etc) into a resources folder rather than serving from the content repository. Though of course I'd rather get the content repository up to near-resource speed to reduce the need to do this!
Thanks for the info on dtrace and NX. At the moment I'm using mostly out-of-the-box Ubuntu 10.04 (LTS), which means Tcl 8.4, AOLServer 4.5.1, postgresql 8.4. I'm happy to move to something newer if known to be faster (and not *too* much hassle!)
Our OpenACS is a bit ancient - we branched 5.2 and have sort of done our own thing since then. I know this is pretty crap of us, but we've made quite a lot of significant changes to the way some bits of OpenACS work and merging with the latest version might actually be impossible! (I'm sure there will have been performance improvements since our branch that we're missing out on! Perhaps job 1 would be to scan the changelogs and see if there are any patches I should bring across)
At the moment I don't have a clear picture of where all the Tcl overhead is (because I haven't looked rather than because it's impossible to find out!). Hence some profiling might be really useful. (I'll also use the RP Debug output to see if that gives some pointers.)
Apologies for rambling on! I'll look into the various suggestions and see where I get...
Thanks again
Joe