Forum OpenACS CMS: Re: Configuring OpenACS for optimum performance

Posted by Gustaf Neumann on
Hi Tony,

> I am trying to find out how to best configure NaviServer and OpenACS for a system that hosts home-grown business applications for 1000 concurrent users.

Running a system with this configuration and 1K+ users should be OK, although 8 cores are not much these days. NaviServer's strong multi-threading can make use of all the cores you provide. The largest NaviServer + OpenACS installation I know of runs with ~9k concurrent users on a VM with 50 cores, where the same VM hosts both nsd and the PostgreSQL server. On our university's live system, we see up to about 3k concurrent users with intense usage, such as serving one live video stream to 1,500 users, with average latencies below 60 ms.

The first question that comes up concerns the DB server: is the PostgreSQL server running on the same machine? Probably yes, but the numbers are not great (for comparison, we see e.g. avgwaittime 3.3µs and avgsqltime 983.3µs). So it is probably a good time to look at the cache statistics (in nsstats) and the SQL statistics (as provided by PostgreSQL).

The provided statistics show a high number of queued requests (for comparison, we see e.g. 0.12% and less), which probably also indicates queue overruns. The queue length of a pool is determined by "maxconnections". You can identify queue overruns by "All available connections are used" entries in the system log (by default named error.log). From what you describe, I am pretty sure you will find such entries there.
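To find out which pool the overruns come from, the system log can be scanned for the message quoted above. Below is a minimal Python sketch. It assumes, hypothetically, that the pool name appears in the log line as "pool <name>". That may not match your log format, so adjust the pattern to the entries you actually find in error.log.

```python
import re
from collections import Counter
from typing import Iterable

# Message text as quoted in the post.
OVERRUN_MSG = "All available connections are used"
# Hypothetical pool tag of the form "pool <name>"; adapt this pattern to
# the actual layout of the overrun entries in your error.log.
POOL_TAG = re.compile(r"\bpool\s+(\w+)")


def count_overruns(lines: Iterable[str]) -> Counter:
    """Count queue-overrun log entries, grouped by pool name when one is found."""
    counts: Counter = Counter()
    for line in lines:
        if OVERRUN_MSG not in line:
            continue
        match = POOL_TAG.search(line)
        # Entries without a recognizable pool tag are grouped as "unknown".
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

For example, `count_overruns(open("error.log", errors="replace"))` returns a Counter keyed by pool name, so `.most_common()` lists the pools with the most overruns first.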

The goal should be (a) to reduce the queue overruns and (b) to improve the server behavior when they happen. Concerning (a), one has to figure out which connection pool they are coming from. You also have quite high values for queued requests in the "image" pool. I would recommend defining minthreads=maxthreads=4 (or maybe 6) for this pool, since image requests are often bulky. With a large blueprint (which is the case for most OpenACS installations), starting a thread can take 1 s, so frequently starting and stopping threads is costly. In case you are using the xotcl-request-monitor, I would recommend creating a "slow" pool, since the request monitor can dynamically assign requests from the default pool to the "slow" pool when they take a long time (longer than 3 seconds). This is part of the current xotcl-request-monitor in oacs-5-10.
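The idea behind the "slow" pool can be illustrated with a toy router in Python. This is not the xotcl-request-monitor implementation. The class name, the per-URL bookkeeping, and the rule "once slow, always slow" are simplifying assumptions. Only the 3-second threshold comes from the description above.

```python
from dataclasses import dataclass, field

SLOW_THRESHOLD_S = 3.0  # "longer than 3 seconds", as stated in the post


@dataclass
class SlowPoolRouter:
    """Hypothetical toy model: send URLs that were slow to a 'slow' pool.

    The real request monitor uses its own heuristics; this only shows why
    a separate pool keeps long requests from blocking the default pool.
    """
    threshold_s: float = SLOW_THRESHOLD_S
    slow_urls: set = field(default_factory=set)

    def pool_for(self, url: str) -> str:
        """Pick the connection pool for an incoming request."""
        return "slow" if url in self.slow_urls else "default"

    def record(self, url: str, duration_s: float) -> None:
        """Remember URLs whose requests exceeded the threshold."""
        if duration_s > self.threshold_s:
            self.slow_urls.add(url)
```

A real implementation would also need a way to move URLs back to the default pool once they become fast again. This sketch omits that.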

There is an improvement in the handling of queue overruns in recent versions of NaviServer on Bitbucket [1], implemented about 6 weeks ago, which addresses the symptoms you describe above. NaviServer (and AOLserver) assumed that when a queue overruns, the server should stop accepting incoming requests until there is space in the queue again. This assumption was probably fine as long as there was only one queue (= connection pool), but it is not useful in multi-pool setups. It took me a while to figure out why nsstats suddenly stopped responding on some slow sites, although separate monitor threads were configured.
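The difference between the old global policy and a per-pool policy can be shown with a small Python model. This is a conceptual illustration, not the logic of the commit in [1]. The pool names, the queue limits, and the per-pool rule are hypothetical, and `queue_limit` merely stands in for "maxconnections".

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    queue_limit: int  # stands in for "maxconnections"
    queued: int = 0

    def full(self) -> bool:
        return self.queued >= self.queue_limit


def accepts(pools: dict, target: str, global_block: bool) -> bool:
    """Can a new request destined for pool `target` be accepted?

    global_block=True mimics the old behavior described in the post: once
    any queue overruns, the server stops accepting requests altogether.
    global_block=False is a hypothetical per-pool policy: only requests
    for the pool whose queue is full are affected.
    """
    if global_block:
        return not any(p.full() for p in pools.values())
    return not pools[target].full()
```

With the default pool's queue full, the global policy also refuses requests meant for a separate monitoring pool, which matches the symptom of nsstats not responding.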

> Is there a built-in way to turn on more consistent / persistent logging of the values provided in nsstats?

Look at [2]. The munin modules for NaviServer are constantly being improved and provide an overview of what's normal, how the load patterns change over time, etc.
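The munin plugins in [2] are the ready-made answer. To show how munin gets its data, here is a Python sketch of the two outputs of munin's plugin protocol: graph metadata when the plugin is called with the argument `config`, and one `<field>.value <number>` line per field otherwise. The field names, the graph category, and how the values would be read from nsstats are hypothetical and not taken from [2].

```python
from typing import Mapping


def munin_config(title: str, fields: Mapping[str, str]) -> str:
    """Output for `plugin config`: graph title, category and field labels."""
    lines = [f"graph_title {title}", "graph_category naviserver"]  # category name is hypothetical
    for name, label in fields.items():
        lines.append(f"{name}.label {label}")
    return "\n".join(lines)


def munin_values(values: Mapping[str, float]) -> str:
    """Output for a normal plugin run: one '<field>.value <number>' line per field."""
    return "\n".join(f"{name}.value {value}" for name, value in values.items())
```

munin polls its plugins periodically and keeps the history, which provides the persistent record of the values over time.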

> Just recently we had a major slowdown on our server when we tried to serve many video files.

How do you serve these video files? Do you serve them from the file system or via the OpenACS file storage? Are these single files (e.g. mp4 files) or segmented video streams (e.g. HLS)?

Hope this helps as a starter...
-g

[1] https://bitbucket.org/naviserver/naviserver/commits/1a387ddff2108a2c12c4b48421c52a96edd3d843
[2] https://github.com/gustafn/munin-plugins-ns