Eduardo, short comments on some of your statements:
- The last change in /acs-tcl/tcl/pools-init.tcl was 15 months ago. You might have been affected by a change in the default pool order; see http://fisheye.openacs.org/browse/OpenACS/openacs-4/etc/config.tcl?r1=1.47&r2=1.48
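For orientation, this is roughly where the pool order shows up in an AOLserver config file. The exact section and parameter names that OpenACS (and pools-init.tcl) consult can differ between versions, so treat the snippet below as an illustrative sketch only, not as the definitive layout:

    ns_section ns/db/pools
    ns_param   pool1        "Pool 1"
    ns_param   pool2        "Pool 2"
    ns_param   pool3        "Pool 3"

    ns_section ns/server/${server}/db
    ns_param   pools        "pool1,pool2,pool3"   ;# the order listed here matters
    ns_param   defaultpool  pool1

If the order of the pools in your config differs from what the code expects, that can explain surprising behavior after an upgrade.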
- I don't see any kind of "XOTcl problem" in this discussion (maybe I am overly sensitive on this question). The only "problem" I see is that xotcl-core writes a message into the error log when AOLserver threads exit. The threads would exit just the same without XOTcl being involved.
- Setting maxthreads to 150 is very high. On our production site (more than 2,200 concurrent users, up to 120 views per second) we use a maxthreads of 100. By using background delivery, maxthreads can normally be reduced. Keep in mind that every connection thread (and every thread for scheduled procs) contains a full blueprint of all used packages, which amounts to 5,000-10,000 procs. In other words, for every thread there is a separate copy of all these procs in memory. We are talking about 500,000 to maybe more than one million (!) Tcl procs when 100 threads are configured. This is not lightweight. When all threads go down (as sketched in my earlier posting), all these procs are deleted and possibly recreated immediately in new threads.
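Just to show where these limits live (the values here are illustrative, not a recommendation for your site), the connection-thread parameters sit in the server section of the AOLserver config:

    ns_section ns/server/${server}
    ns_param   maxthreads     100   ;# upper bound of connection threads
    ns_param   minthreads     5     ;# threads kept alive when the server is idle
    ns_param   threadtimeout  120   ;# seconds of idleness before a thread exits

Lowering maxthreads directly lowers the number of blueprint copies (and therefore the memory footprint) sketched above.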
- "difficult problem ... estimate the number of requests per user": The request monitor gives you on the start page the actual number of users and the actual number of requests per minute (in the graph), and the actual views/sec and the avg. view time per user. If you have an average view time (AVT) per user of say 62 seconds, then the views per user (VPU) per minute is VPU = 60/AVT = 0.96.
Therefore, U users generate U*VPU requests per minute (with the sample values, 100 users generate about 97). Getting estimates of requests per user is the simple part. The more complex part is figuring out how many threads one will need for that load. For this one needs the average processing time, which in reality unfortunately depends on how many other requests are currently running (all requests share common resources and are therefore not independent).
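To make the arithmetic concrete, here is a tiny Tcl sketch using the sample numbers from above (the AVT and user count are made-up example values, not measurements):

    set avt   62.0   ;# average view time per user, in seconds (sample value)
    set users 100    ;# number of active users (sample value)

    set vpu [expr {60.0 / $avt}]    ;# views per user per minute, ~0.97
    set rpm [expr {$users * $vpu}]  ;# requests per minute, ~97
    set rps [expr {$rpm / 60.0}]    ;# requests per second, ~1.6

    puts "VPU [format %.2f $vpu], req/min [format %.0f $rpm], req/sec [format %.2f $rps]"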
- Anyhow, the request monitor also shows you the number of currently running requests, and by observing this value one can estimate the number of required threads. If you have e.g. 10 connection threads configured and you often observe 8 or more running requests, you are already running into limits. Once all connection threads are busy, newly incoming requests are queued, causing further delays. Even simple requests (e.g. fetching a logo) can then take significant time, since the response time for the user is queuing time + processing time: if the queuing time is 10 seconds, every request will take at least 10 seconds. And if the machine was not able to process the incoming requests quickly enough with the configured threads in the past, it is questionable whether it will be able to catch up later. So, queuing of requests should be avoided. This is also one place where background delivery helps, since it frees connection threads much earlier to do other work. On the other hand, if you have e.g. 30 connection threads configured and you normally see only two or three concurrent requests, you are most probably wasting resources.
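As a rough back-of-the-envelope sketch (not an exact model; real traffic is bursty and, as said above, processing times are not independent of each other), the expected number of simultaneously busy connection threads is approximately the request rate times the average processing time:

    set req_per_sec   1.6    ;# observed request rate (sample value)
    set avg_proc_time 0.5    ;# average processing time in seconds (sample value)

    # Expected number of busy connection threads (Little's law estimate);
    # if this figure approaches maxthreads, requests start to queue.
    set busy [expr {$req_per_sec * $avg_proc_time}]
    puts "estimated busy connection threads: [format %.1f $busy]"

Compare this estimate with the number of running requests you actually observe in the request monitor, and leave some headroom for bursts.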
- I do not agree with the conclusion that "for best performance, you really have to split AOLserver and PostgreSQL" onto different servers. It depends on the machines. Maybe you are right for dual- or quad-core Intel processors. But if the database and AOLserver are on the same machine, and you have enough memory, enough CPU power (e.g. 8 cores or more), and decent memory throughput/latency, then local communication between the database and AOLserver is significantly faster. There are situations where performance is better with the database and AOLserver on the same machine, even if the server is very busy (as on our learn@wu system). Certainly, it depends on the hardware and the usage patterns.
- Concerning running in VMs: I completely agree, for a busy production site using VMs is not a good idea - at least not for beginners (tuning the VMs and the hosting environment can help a lot, but it is yet another layer to fiddle with). Usually only one CPU is visible from inside the VM, so using multiple CPUs (one strength of AOLserver + PostgreSQL) won't help, and scalability is therefore limited. This is an experience we are currently going through with openacs.org.
have to rush
-gustaf