Forum OpenACS Q&A: Re: Performance Problems - what is the bottleneck?

Posted by Don Baccus on
The comments about database configuration (such as Steve Manning's) and about the database cost of permissioning and the portal system (Denis Roy's) seem to miss the crucial point:

The database server's sitting mostly idle. It's the dynamic AOLserver box that's hosed with a load going over 10.

I have no solution to offer, but Nima, it seems obvious you should be concentrating on the busiest box, your Tcl box, not the database server or the static server.

Profiling with developer support, along with gathering AOLserver stats, may help you collect the necessary information. Without more information I cannot offer any advice.

Posted by Denis Roy on
My main point was that Nima has to find out what the primary reasons for his extremely slow page response times are, possibly using developer support (and AOLserver stats as Don mentioned).

The comments about the database were mainly to illustrate that application tuning was the most crucial thing for us. Even though we have a strong quad-Opteron database server which is usually about 50-70% idle and the load rarely goes over 1.5 or 2, we do have some pages that take a very long time to execute because there are e.g. many huge tables being joined and/or very many permission checks on those pages. Btw, when I mentioned the portal system, I didn't mean the portal system as such but rather some portlets which run very expensive queries.

So based on our experience, I would support what Dave suspected earlier: long-running queries can result in slow response times even though the load on the db server is low. This in turn can tie up available AOLserver threads, which mainly becomes a problem during peak times with many concurrent users.

I didn't mean to say, though, that this or any other reason is causing all the trouble. That, as mentioned, needs more careful analysis of the entire system.

The hypothesis isn't supported by the fact that the dynamic server's busting its balls. If long-running queries were causing threads to stall, leaving too few threads available to service connection requests, one would expect the dynamic server to be waiting for the busy database server to complete those long-running queries. But there is no busy database server ...

If it were me, I'd concentrate on that Tcl server and not worry about anything else until I had a reasonable understanding as to why it is so heavily loaded.
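The triage argument above can be put into a rough sketch. This is only an illustration, not a diagnostic tool: the function name, the per-CPU threshold, and the CPU counts in the usage example are all assumptions, since the thread only says the dynamic box's load goes over 10 and the database server is mostly idle. Load average is also a coarse signal; threads blocked waiting on a remote database generally sleep and do not add to it, which is why a high load on the Tcl box points at work being done on that box.

```python
def classify_bottleneck(app_load, app_cpus, db_load, db_cpus, busy_ratio=1.0):
    """Rough triage: which box deserves attention first?

    A box is treated as saturated when its load average per CPU exceeds
    busy_ratio. The threshold is a hypothetical rule of thumb, not a
    measured value from the thread.
    """
    app_busy = app_load / app_cpus > busy_ratio
    db_busy = db_load / db_cpus > busy_ratio
    if app_busy and not db_busy:
        return "app server"
    if db_busy and not app_busy:
        return "database server"
    if app_busy and db_busy:
        return "both"
    return "neither"


# Assumed numbers: load 10 on a 2-CPU Tcl box, load 0.5 on a 4-CPU db box.
print(classify_bottleneck(10.0, 2, 0.5, 4))
```

With those assumed figures the sketch points at the app server, which matches the reasoning in the post: profile the Tcl box first and leave the database alone until the numbers say otherwise.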