Forum OpenACS Q&A: OpenACS clustering setup and how it relates to xotcl-core.

We are looking to get some feedback and direction on the implementation of naviserver clustering.

Our growing company is very interested in implementing NaviServer clustering behind an nginx load-balancing server. We have been testing this setup for a little over a week now, and the nginx load balancing between the NaviServers appears to work great. The main reason for looking into clustering is to scale up our capacity for NaviServer connections, because during high loads we have at times run out of available connections.

Not surprisingly, in our initial testing we have found that having multiple NaviServers behind nginx (serving round robin) gives us much more capacity for connections during our load tests. We have also found that putting the database on a separate VM from the NaviServer instances provides a great speed-up during extreme loads.

In our testing we also use Docker to manage the nginx and backend NaviServer containers. We are using an up-to-date oacs-5.10 development version and the latest xotcl-core.

In our testing we have set up the NaviServer parameters that allow clustering under the kernel package (CanonicalServer, ClusterAuthorizedIP, ClusterEnabledP, ClusterPeerIP, EnableLoggingP) in order to properly implement cache flushing across the NaviServer cluster.
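For reference, a sketch of how these kernel parameters might be set; the host addresses and ports below are made up for illustration, and the actual values depend on your Docker network:

```tcl
# Hypothetical values for the acs-kernel cluster parameters
# (set via the /acs-admin/ parameter UI or parameter::set_value):
#
#   ClusterEnabledP      1
#   CanonicalServer      10.0.0.10:8000
#   ClusterPeerIP        10.0.0.11:8000 10.0.0.12:8000
#   ClusterAuthorizedIP  10.0.0.*
#   EnableLoggingP       1
```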

In testing this configuration we have found that the code does call the cluster servers for anything that is util_memoize cached, by calling the pages/SYSTEM/util_memoized_statement.tcl script for each server. However, for things like cached parameters, it does not call the cluster servers. See the example below:

Broader overall Question #1:

How much of the cache flushing over the cluster has actually been implemented? We ask this along with the following two more specific questions.

Question #2:

In your opinion, what would be the pluses and minuses of us making parameters, user info, etc., flushable across all servers in the cluster?

Example: under acs-subsite/www/shared/parameters.tcl

- calls parameter-procs.tcl -> parameter::set_value package_id $package_id -parameter $name
  - acs-content-repository-callback-procs.tcl -> callback subsite::parameter_changed
    - if {$package_key eq "acs-content-repository" && $parameter eq "CRFileLocationRoot" && $value ne ""} then
      - a) nsv_unset CR_LOCATIONS CR_FILES
      - b) nsv_set CR_LOCATIONS CR_FILES "$file_location"
  - acs.tcl -> ad_parameter_cache -delete -set $value $package_id $parameter
    - nsv_unset ad_param_$key $parameter_name
    - NOTE: We could add code here to call each server in the cluster to do the same nsv_unset...

We also noticed that xotcl-core appears to have some caching set up for site nodes when UseXotclSiteNodes is set to true. There also appears to be a method for handling acs::clusterwide flushing of the generic cache in the base generic caching class (::acs::Cache), although a comment says that it may not be fully implemented yet.

Question #3:

In your opinion, if we wanted to implement server clustering in the next two months, would it be easier and faster to implement cluster cache flushing across most caches using the util_memoize_flush way of doing things,

Or

would we be better off implementing the xotcl caching for parameters, util_memoized functions, etc.?

If the latter, do you have any tips on how to start caching/flushing across the servers in the cluster this way for most of our caches?

Thanks for your assistance and direction, Marty

Dear Marty,

you are raising many interesting questions! The short answer is that the cluster support is only partly implemented, simply because with a high number of cores and a suitable NaviServer configuration one gets very high scalability (up to 1000 permission-checked OpenACS requests per second), because getting all of OpenACS with all its packages fit for clusters is some work, and because using a cluster makes debugging more complex.

If one has many long-running connection requests, one should use multiple connection thread pools (have you set this up?), and one should consider reducing the blocking time of the requests (what times are you experiencing?). Are you using the xotcl request monitor? E.g., the long-calls page gives you a good overview.

found that having the database on a separate VM than the naviserver instances

I do not think this is the case in general; it depends on the memory of the VM, on the number of cores given to the VM, etc. Local communication within a machine is always faster in terms of latency and throughput than communication via TCP/IP. When a VM is under-configured for the required load (has e.g. only 2 cores, or is running out of memory, not enough shared buffers, ...), then maybe.

how much of the cache flushing over the cluster has actually been implemented

Caching happens on many levels: per-request caching, per-thread caching, caching via ns_cache, and caching via nsv. The first one is not relevant for cluster considerations; per-thread caching is intentionally lock-free but requires measures like a restart (it is used for things that "never change"). For the others, there is the special handling provided by util_memoize and the clusterwide operations. Using util_memoize for everything is not good, since it does not scale: there are many long locking operations that limit concurrency (check the lock times in nsstats and sort by "total wait"; it is not unlikely that the 'util_memoize' cache is at the top). Therefore, the caching work of the last years was devoted to defining multiple caches and cache partitioning.

The clusterwide support should work for you out of the box. Historically, it was developed in xotcl-core, but it has been partly moved into acs-tcl, since it is useful everywhere (some more generally useful parts will be moved from xo:: to the core locations in future versions). Slightly simplified, it works like the following: instead of writing

nsv_unset $array $key

one can use

::acs::clusterwide nsv_unset $array $key

to execute the command on all machines of the cluster. So prefixing the cache-clearing operations with ::acs::clusterwide should do the job.
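Applied to the ad_parameter_cache example quoted earlier in this thread, the flush could be broadcast like this (a sketch; the variable names come from the acs.tcl snippet above):

```tcl
# Instead of flushing only the local per-server nsv array ...
nsv_unset ad_param_$key $parameter_name

# ... prefix the operation so it is executed on every node of the cluster:
::acs::clusterwide nsv_unset ad_param_$key $parameter_name
```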

Be aware that inter-server traffic for cache-clearing operations can become quite heavy, and that the communication as done right now is not how it should be done. ... Many things can be done there, but first it has to be clear that it is worth the effort.

The first step is usually to figure out the local bottlenecks and the configuration; this is usually much less work than going towards a clustered solution.

Hope this answers some of your questions
all the best
-g

Thanks Gustaf,

Sorry for the late reply. I was out on vacation.

We appreciate your help and expertise. Your point is well taken that we should fine-tune our configuration as much as we can. I will try to describe our setup and testing so we can make sense of what we see. I work on the same NaviServer instance as Tony Kirkham. I know you have worked with him on some of our setup, so that may give you some idea of what type of setup we have.

We did, at one time, have an extra connection thread pool set up for slow runners. However, after recently updating to the newest NaviServer with xotcl, we do not currently have one. It was our understanding that the new xotcl may take care of some of the long runners automatically, without us defining a specific thread pool and telling xotcl about it. Is this not a good assumption? How would we go about telling the new core where our long-running pool(s) reside? Our current pool config is below.

ns_section ns/server/${server}/pools
ns_param        monitor         "Monitoring actions to check the health of the system"
ns_param        image           "Image file requests"

ns_section ns/server/${server}/pool/monitor
ns_param        minthreads 3
ns_param        maxthreads 5
ns_param        map "GET /admin/nsstats"
ns_param        map "GET /SYSTEM"
ns_param        map "GET /ds"
ns_param        map "POST /ds"

ns_section ns/server/${server}/pool/image
ns_param        minthreads 3
ns_param        maxthreads 5
ns_param        map "GET /*.png"
ns_param        map "GET /*.jpg"
ns_param        map "GET /*.PNG"
ns_param        map "GET /*.JPG"
ns_param        map "GET /*.JPEG"
ns_param        map "GET /*.jpeg"
ns_param        map "GET /*.mp4"
ns_param        map "GET /*.MP4"
ns_param        map "GET /*.pdf"
ns_param        map "GET /*.PDF"
ns_param        map "GET /*.gif"
ns_param        map "GET /*.GIF"

Our server setup:

We have one VM with 24 cores and 80 GB of RAM, on which our live system resides. We also have a staging VM with 16 cores and 200 GB of RAM.

The tests below will be run against our staging server.

I have been using a load-testing tool called k6 (see: https://k6.io/) to establish a baseline for our server with a quick 4-minute test.

The base test had the following ramp up options:
export const options = {
    stages: [
      { target: 25, duration: '30s' },			// Ramp up to 25 concurrent users over 30 seconds 
      { target: 100, duration: '30s' },			// Then ramp up to 100 concurrent users for the next 30 seconds
      { target: 300, duration: '1m' },			// Then ramp up to 300 concurrent users for the next minute
      { target: 100, duration: '1m' },			// Then ramp down to 100 concurrent users for the next minute
      { target: 0, duration: '1m' },				// Finally ramp down to 0 concurrent users for the next minute
    ]
  };
Each k6 session does the following things:
1.	Go to login page
2.	Verify that it got a status 200 from login page
3.	Sleep 1 second
4.	Submit the login form
5.	Verify that it got a status 200
6.	Sleep 3 seconds
7.	Go to a quick loading page (~ under 1 second)
8.	Verify that it got status 200
9.	Sleep 3 seconds
10.	Go to our highly dynamic, server-side-rendered, commonly used effort-reporting page (normally takes a full 2 seconds to load)
11.	Verify that it got status 200
12.	Sleep 3 seconds

Test #1 on our Staging server

Nsstats over time
------------------------
No Load on system:  3:56pm
Connection Threads:	min 2 max 10 current 2 idle 1 stopping 0 waiting 0
Request Handling:	requests 7, queued 0 (0.00%), spooled 6 (85.71%)
Request Timing:	avg queue time 98.57µs, avg filter time 183.63ms, avg run time 12.91ms avg trace time 3.92ms
Active Requests:	cns6 172.31.6.115 running GET /z1 0.014718 0 .

Starting the test.

~3:58pm
140 VUs (Virtual Users)
Connection Threads:	min 2 max 10 current 10 idle 0 stopping 0 waiting 73
Request Handling:	requests 1.7K, queued 1K (61.48%), spooled 1K (62.24%)
Request Timing:	avg queue time 660.56ms, avg filter time 5.44ms, avg run time 218.25ms avg trace time 1.31ms
Active Requests:	cns1639 172.31.207.40 running GET /z2 5.408894 0 .

~3:59pm
300 VUs
Connection Threads:	min 2 max 10 current 10 idle 0 stopping 0 waiting 90
Request Handling:	requests 3.6K, queued 3K (77.81%), spooled 2K (64.06%)
Request Timing:	avg queue time 1.39s, avg filter time 5.51ms, avg run time 224.59ms avg trace time 1.6ms
Active Requests:	cns3397 172.31.207.40 running GET /z3 0.815387 0 .

~4:00pm
200 VUs
Connection Threads:	min 2 max 10 current 10 idle 0 stopping 0 waiting 75
Request Handling:	requests 4.9K, queued 4K (82.60%), spooled 3K (66.17%)
Request Timing:	avg queue time 1.95s, avg filter time 5.77ms, avg run time 268.55ms avg trace time 1.71ms
Active Requests:	cns4809 172.31.207.40 running GET /z4 0.932554 0 .

4:01pm
49 VUs
Connection Threads:	min 2 max 10 current 10 idle 3 stopping 0 waiting 0
Request Handling:	requests 6.1K, queued 5K (81.76%), spooled 4K (66.21%)
Request Timing:	avg queue time 2.03s, avg filter time 5.84ms, avg run time 289.01ms avg trace time 1.74ms
Active Requests:	cns6038 172.31.207.40 running GET /z5 7.99834 0 .

K6 summary 
------------------
running (4m09.0s), 000/300 VUs, 964 complete and 86 interrupted iterations
default ✓ [======================================] 000/300 VUs  4m0s

     ✗ status is 200
      ↳  99% — ✓ 4051 / ✗ 7
     ✗ page succeeded
      ↳  99% — ✓ 4041 / ✗ 17

     █ setup

     █ teardown

     checks.........................: 99.70% ✓ 8092      ✗ 24         < Note:  We would like to be 100% on our checks for status 200 and page succeeded.
     data_received..................: 922 MB 3.7 MB/s
     data_sent......................: 5.2 MB 21 kB/s
     http_req_blocked...............: avg=2.6ms    min=1.48µs   med=7.01µs   max=41.1ms   p(90)=8.31ms   p(95)=12.44ms
     http_req_connecting............: avg=180.19µs min=0s       med=0s       max=34.22ms  p(90)=419.21µs p(95)=465.91µs
   ✗ http_req_duration..............: avg=3.89s    min=441.05µs med=1.15s    max=35.08s   p(90)=11.28s   p(95)=17.46s    < Note:  It would be nice to be under 8 seconds at the 95th percentile.
       { expected_response:true }...: avg=3.89s    min=4.6ms    med=1.15s    max=35.08s   p(90)=11.28s   p(95)=17.4s
     http_req_failed................: 0.11%  ✓ 7         ✗ 6143     < Note:  We would like to have 0% failed http requests.
     http_req_receiving.............: avg=1.16ms   min=51.84µs  med=178.08µs max=41.42ms  p(90)=2.66ms   p(95)=4.52ms
     http_req_sending...............: avg=42.78µs  min=8.58µs   med=40.67µs  max=513.41µs p(90)=76.4µs   p(95)=88.37µs
     http_req_tls_handshaking.......: avg=2.38ms   min=0s       med=0s       max=39.85ms  p(90)=7.75ms   p(95)=11.64ms
     http_req_waiting...............: avg=3.89s    min=294.73µs med=1.15s    max=35.08s   p(90)=11.28s   p(95)=17.46s
     http_reqs......................: 6150   24.695368/s
     iteration_duration.............: avg=32.34s   min=2.31µs   med=31.28s   max=1m6s     p(90)=54.78s   p(95)=56.5s
     iterations.....................: 964    3.870949/s
     vus............................: 1      min=1       max=300
     vus_max........................: 300    min=300     max=300

Error.log file shows we are out of available connections during this test.
--------------------
...
[13/Sep/2021:16:00:16][1.7f1a15c5a700][-driver:nsssl_v4:0-] Notice: [east_intrastage pool ] All available connections are used, waiting 90 idle 0 current 10
[13/Sep/2021:16:00:17][1.7f1a15c5a700][-driver:nsssl_v4:0-] Notice: [east_intrastage pool ] All available connections are used, waiting 90 idle 0 current 10
[13/Sep/2021:16:00:18][1.7f1a15c5a700][-driver:nsssl_v4:0-] Notice: [east_intrastage pool ] All available connections are used, waiting 90 idle 0 current 10
[13/Sep/2021:16:00:18][1.7f1a15c5a700][-driver:nsssl_v4:0-] Notice: [east_intrastage pool ] All available connections are used, waiting 90 idle 0 current 10
[13/Sep/2021:16:00:19][1.7f1a15c5a700][-driver:nsssl_v4:0-] Notice: [east_intrastage pool ] All available connections are used, waiting 90 idle 0 current 10
[13/Sep/2021:16:00:20][1.7f1a15c5a700][-driver:nsssl_v4:0-] Notice: [east_intrastage pool ] All available connections are used, waiting 90 idle 0 current 10
[13/Sep/2021:16:00:21][1.7f1a15c5a700][-driver:nsssl_v4:0-] Notice: [east_intrastage pool ] All available connections are used, waiting 90 idle 0 current 10
...
Some of the longer calls from request monitor during this time. request-monitor/long-calls?lines=1000
...
14.371059	0.003934	1.550067	15.925060	2021 Sep 13 16:00:30
14.290415	0.006291	1.668398	15.965104	2021 Sep 13 16:00:30
14.062884	0.003386	0.007183	14.073453	2021 Sep 13 16:00:29
14.189928	0.003912	1.581149	15.774989	2021 Sep 13 16:00:29
14.176448	0.004008	0.018152	14.198608	2021 Sep 13 16:00:29
14.314284	0.003587	0.010584	14.328455	2021 Sep 13 16:00:29
14.265956	0.004304	1.653035	15.923295	2021 Sep 13 16:00:29
14.447920	0.005010	1.477352	15.930282	2021 Sep 13 16:00:29
14.803571	0.004733	1.380080	16.188384	2021 Sep 13 16:00:29
14.458517	0.004058	1.835248	16.297823	2021 Sep 13 16:00:29
14.572207	0.006749	1.628151	16.207107	2021 Sep 13 16:00:29
14.449219	0.003139	0.006589	14.458947	2021 Sep 13 16:00:28
14.402088	0.007025	1.514557	15.923670	2021 Sep 13 16:00:28
14.547796	0.005566	1.703462	16.256824	2021 Sep 13 16:00:28
14.278776	0.012663	1.880569	16.172008	2021 Sep 13 16:00:28
14.470381	0.019008	1.793124	16.282513	2021 Sep 13 16:00:28
14.950281	0.003875	1.816067	16.770223	2021 Sep 13 16:00:28
14.547664	0.004792	1.457710	16.010166	2021 Sep 13 16:00:28
14.418724	0.019254	0.008703	14.446681	2021 Sep 13 16:00:28
...

From the data above it looks like we are running out of connections in our connection pool, which I believe causes requests to be queued until another connection thread is available.

Note: when we put 5 servers behind an nginx server and the DB on a separate VM, our page-success checks were at 100% and our wait times at the 95th percentile were around 7 seconds.

Any suggestions you have that would help us fine tune our configuration for one server would be greatly appreciated.

Thanks for your help, Marty
It was our understanding that the new xotcl may take care of some of the long runners automatically without us defining a specific thread pool and telling xotcl about it. Is this not a good assumption

Well, newer versions of the xotcl-request-monitor can automate some steps, but it needs configuration. The xotcl-request-monitor can map requests to a "slow" pool when such a pool has been defined by the site admin. The site admin also has to configure this connection pool (e.g. min/max threads, rejectoverrun, queue size, ...). From the snippet of the configuration file, it seems that you have no "slow" pool defined, so no automatic mapping will happen.
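A minimal "slow" pool definition might look like the following (the thread counts are illustrative; pick values appropriate for your load):

```tcl
# Register the pool ...
ns_section ns/server/${server}/pools
ns_param        slow            "Slow requests"

# ... and configure it; the request monitor can then map
# long-running requests to this pool automatically.
ns_section ns/server/${server}/pool/slow
ns_param        minthreads 15
ns_param        maxthreads 15
```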

From the output I see that the default pool has minthreads 2 and maxthreads 10. I would recommend defining 25/25 on your testing instance, based on:

live 24 cores and 80 GB of RAM, staging 16 cores and 200 GB
As a rough estimate, with 24 cores one can run 96 connection threads (assuming typical OpenACS loads), and with 16 cores about 64.
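The rough estimate above works out to about 4 connection threads per core; as a sketch (the proc name is made up for illustration):

```tcl
# Rough sizing heuristic from the numbers above (~4 threads per core,
# assuming typical OpenACS loads):
proc estimate_connthreads {ncores} {
    expr {$ncores * 4}
}
estimate_connthreads 24   ;# 96
estimate_connthreads 16   ;# 64
```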

When looking at the 140 VUs output, I see

Request Handling:	requests 1.7K, queued 1K (61.48%), spooled 1K (62.24%)

The number of queued requests is way too high. On our production system (typically between 600 and 3000 concurrent users) we typically have less than 0.01% queued requests in the default pool.

Request Timing:	avg queue time 660.56ms, avg filter time 5.44ms, avg run time 218.25ms avg trace time 1.31ms

Here you see that the avg queue time (660 ms, the time waiting to be served) is much longer than the processing time (avg run time). On our production system (currently 650 users) we have now

avg queue time 465.04µs, avg filter time 4.18ms, avg run time 104.08ms avg trace time 4.27ms

the output

[13/Sep/2021:16:00:16][1.7f1a15c5a700][-driver:nsssl_v4:0-] Notice: [east_intrastage pool ] All available connections are used, waiting 90 idle 0 current 10
tells you that the queue is filled up (100 entries). The queue size is defined per pool via "maxconnections"; we have this value set to 600 on our live system. One can fine-tune the behavior on queue overruns via "rejectoverrun". A larger queue makes the system friendlier when there are sometimes large request bursts.
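In the configuration file this could look like the following sketch (600 is the value mentioned above for our live system; tune it for your own load):

```tcl
ns_section ns/server/${server}
# Queue size per pool: how many requests may wait for a connection thread
ns_param        maxconnections  600
# On queue overrun, reply "503 service unavailable" instead of
# adding further requests to the waiting list
ns_param        rejectoverrun   true
```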

While the values above are average values, the values below (long-calls) are from individual requests:

14.371059	0.003934	1.550067	15.925060	2021 Sep 13 16:00:30

The columns are: Queuetime Filtertime Runtime Totaltime.
The requests above were waiting 14 secs in the queue before being served, whereas the actual run time was about 1.5 secs. When you get rid of the queueing time, the performance will improve by a factor of 10. Increasing the number of threads as indicated above should help significantly.

For this test, setting up the "slow" pool will not improve the benchmark times; the point of the pools is to make sure that "slow" requests do not kill the "fast" requests on your server. This is important to ensure the liveliness of the server when the majority of requests are "fast" and only a few are "slow": you do not want someone firing e.g. 100 "slow" requests to bring your server to its knees.

Increasing maxthreads from 10 (the default) to e.g. 50 should increase the throughput by a factor of 5.

Hope, this helps
-gn

Thanks Gustaf,

We appreciate your expertise in helping us tune our system. It has helped a lot!

I was successful in setting up the "slow" pool. I was wondering whether there is a setting to map a request to the slow pool only if it takes over 30 or 40 seconds. I found that it started to put our effort-reporting page into the slow pool when the system was a little stressed, so I removed the slow pool for the following tests for now.

I have been playing around with the configuration, and we have decided to go up to 48 cores on our staging system with the 200 GB of RAM.

Here is what I found to be somewhat optimal for the 48 cores.

We are not running out of available connections anymore, and the queued requests are a lot lower, as are the avg queue times. The k6 tests show that all http requests return status 200 as well.

I was wondering if you see any problems with the configuration and if you had suggestions for changes.

Again, thanks for your help

Marty
ns_section ns/server/${server}
ns_param        maxconnections  2000    ;# 100; number of allocated connection structures
ns_param        maxthreads      200     ;# 10; maximal number of connection threads
ns_param        minthreads      200     ;# 1; minimal number of connection threads

ns_param        connsperthread  10000   ;# 10000; number of connections (requests) handled per thread
                                        ;# Setting connsperthread to > 0 will cause the thread to
                                        ;# graciously exit, after processing that many
                                        ;# requests, thus initiating kind-of Tcl-level
                                        ;# garbage collection.

# ns_param      threadtimeout   120     ;# 120; timeout for idle threads.
                                        ;# In case, minthreads < maxthreads, threads
                                        ;# are shutdown after this idle time until
                                        ;# minthreads are reached.

# ns_param      lowwatermark    10      ;# 10; create additional threads above this queue-full percentage
ns_param        highwatermark   100     ;# 80; allow concurrent creates above this queue-full percentage
                                        ;# 100 means to disable concurrent creates
ns_param        rejectoverrun   true    ;# If set to true, NaviServer sends an error "503 service unavailable" on queue overruns.
                                        ;# The request is then not added to the waiting list.

# Compress response character data: ns_return, ADP etc.
#
ns_param        compressenable  off     ;# false, use "ns_conn compress" to override

ns_section ns/server/${server}/pools
ns_param        monitor         "Monitoring actions to check the health of the system"
ns_param        image           "Image file requests"
#ns_param        slow            "Slow requests"
ns_param        fast            "Fast requests"
ns_section ns/server/${server}/pool/monitor
ns_param        minthreads 3
ns_param        maxthreads 5

ns_param        map "GET /admin/nsstats"
ns_param        map "GET /SYSTEM"
ns_param        map "GET /ds"
ns_param        map "POST /ds"
ns_section ns/server/${server}/pool/image
ns_param        minthreads 3
ns_param        maxthreads 5

ns_param        map "GET /*.png"
ns_param        map "GET /*.jpg"
ns_param        map "GET /*.PNG"
ns_param        map "GET /*.JPG"
ns_param        map "GET /*.JPEG"
ns_param        map "GET /*.jpeg"
ns_param        map "GET /*.mp4"
ns_param        map "GET /*.MP4"
ns_param        map "GET /*.pdf"
ns_param        map "GET /*.PDF"
ns_param        map "GET /*.gif"
ns_param        map "GET /*.GIF"

#ns_section ns/server/${server}/pool/slow
#ns_param        minthreads 15
#ns_param        maxthreads 15

ns_section ns/server/${server}/pool/fast
ns_param        minthreads 6
ns_param        maxthreads 6

# Test  w 48 CPU's   200 Gig RAM
#   maxconnections 2000
#   maxthreads 200
#   minthreads 200 
#   connsperthread 10000
#   highwatermark 100
#   compressenable  off
#   rejectoverrun true



150 VUs
Connection Threads:	min 200 max 200 current 200 idle 151 stopping 0 waiting 0
Request Handling:	requests 1.8K, queued 0 (0.00%), spooled 1K (63.55%)
Request Timing:	avg queue time 161.33µs, avg filter time 78.53ms, avg run time 212.77ms avg trace time 972.63µs

300 VUs
Connection Threads:	min 200 max 200 current 200 idle 157 stopping 0 waiting 0
Request Handling:	requests 5.4K, queued 508 (9.46%), spooled 4K (66.81%)
Request Timing:	avg queue time 295.87ms, avg filter time 1.74s, avg run time 251.96ms avg trace time 1.1ms

150 VUs
Connection Threads:	min 200 max 200 current 200 idle 189 stopping 0 waiting 0
Request Handling:	requests 7.1K, queued 627 (8.80%), spooled 5K (65.20%)
Request Timing:	avg queue time 260.7ms, avg filter time 2s, avg run time 255.49ms avg trace time 1.13ms

70 VUs
Connection Threads:	min 200 max 200 current 200 idle 141 stopping 0 waiting 0
Request Handling:	requests 7.9K, queued 627 (7.98%), spooled 5K (65.70%)
Request Timing:	avg queue time 236.39ms, avg filter time 1.94s, avg run time 264.67ms avg trace time 1.12ms

10 VUs
Connection Threads:	min 200 max 200 current 200 idle 199 stopping 0 waiting 0
Request Handling:	requests 8.4K, queued 627 (7.48%), spooled 6K (66.52%)
Request Timing:	avg queue time 221.7ms, avg filter time 1.85s, avg run time 275.66ms avg trace time 1.13ms


K6 Summary
running (4m09.3s), 000/300 VUs, 1359 complete and 38 interrupted iterations
default ✓ [======================================] 000/300 VUs  4m0s

     ✓ status is 200
     ✗ page succeeded
      ↳  99% — ✓ 5542 / ✗ 8

     █ setup

     █ teardown

     checks.........................: 99.92% ✓ 11092     ✗ 8
     data_received..................: 1.3 GB 5.2 MB/s
     data_sent......................: 7.0 MB 28 kB/s
     http_req_blocked...............: avg=1.46ms   min=1.27µs  med=6.61µs   max=51.42ms p(90)=4.1ms    p(95)=4.87ms
     http_req_connecting............: avg=147.33µs min=0s      med=0s       max=48.09ms p(90)=376.22µs p(95)=442.16µs
   ✗ http_req_duration..............: avg=2.32s    min=4.86ms  med=224.41ms max=24.15s  p(90)=8.5s     p(95)=13.05s
       { expected_response:true }...: avg=2.32s    min=4.86ms  med=224.41ms max=24.15s  p(90)=8.5s     p(95)=13.05s
     http_req_failed................: 0.00%  ✓ 0         ✗ 8355
     http_req_receiving.............: avg=689.98µs min=39.82µs med=179.82µs max=70.57ms p(90)=1.71ms   p(95)=1.89ms
     http_req_sending...............: avg=39.42µs  min=8.56µs  med=35.11µs  max=15.19ms p(90)=66.52µs  p(95)=76.12µs
     http_req_tls_handshaking.......: avg=1.28ms   min=0s      med=0s       max=42.24ms p(90)=3.62ms   p(95)=4.51ms
     http_req_waiting...............: avg=2.32s    min=4.66ms  med=223.99ms max=24.15s  p(90)=8.5s     p(95)=13.05s
     http_reqs......................: 8355   33.512465/s
     iteration_duration.............: avg=23.56s   min=1.64µs  med=22.73s   max=51.14s  p(90)=36.62s   p(95)=38.12s
     iterations.....................: 1359   5.45104/s
     vus............................: 1      min=1       max=300
     vus_max........................: 300    min=300     max=300

Hi Marty,

wondering ... there is a setting to only map a request to the slow pool if it is only over 30 or 40 seconds

Well, there should be absolutely NO request processed in the foreground that takes longer than, say, 10 secs. Currently, the various thresholds are hard-coded in the request monitor (mapping to the slow pool, warnings for queue times and filter times, ..., mapping to color schemes) based on best practice from several large systems, but it is certainly possible to make these configurable in the future. The threshold for slow-pool mapping is 3 seconds. Note that if your server has many such requests (which can't be made faster), one should define a larger "slow" pool. Requests in the "slow" pool are processed with the same efficiency as requests in a "fast" pool ("slow" and "fast" are just names); one can give the slow pool as many threads as one likes.

I was wondering if you see any problems with the configuration and if you had suggestions for changes.

You have currently:

            min max
   default  200 200
   monitor    3   5
   image      3   5
   slow      15  15
   fast       6   6

200 min threads and 200 max threads for the default pool is a lot; certainly, the right values depend on the application and on the machine you are using. When your jobs are CPU-intense, it can happen that concurrent jobs slow each other down more than the concurrency helps. Many processors have a degressive clock rate (when many CPUs are busy, the temperature in the CPU goes up and the clock rate is reduced). Choosing the right CPU type is not so easy; often deactivating hyper-threading can speed things up (don't believe your vendor), etc. ... but this is a long story and leads away from your question.

When the DB and nsd are on the same machine, don't forget that the DB also needs a lot of resources (CPU and memory). Every (active) connection thread will require a DB connection, so the resources available per SQL query (Postgres process) shrink, the SQL queries become slower, etc.

My recommendation: if you have multiple connection pools and the default pool is responsible for most of the hard work, don't make it much larger than the number of cores you have. When you have long-running requests, check whether you can handle these via spooling or background processing.

From your configuration, I see a pool "fast", which is probably unused (and can be removed), since there is no mapping defined. I would also recommend running the "image" pool with minthreads = maxthreads. You should see from the per-pool statistics how many threads you need. Here is the output from our "fast" pool, which is used for images (with the given load, 6 threads can serve 210 requests per second in this pool).

Connection Threads: min 6 max 6 current 6 idle 6 stopping 0 waiting 0 started 23 maxconnections 600
Request Handling:   requests 194.5K, queued 149 (0.08%), spooled 124K (63.55%), dropped 0 possible-max-reqs 210.5rps
Request Timing: avg queue time 239.75µs, avg filter time 4.69ms, avg run time 22.1ms avg trace time 1.72ms

We have on our production environment currently:

            min max
   default   60  60
   bots       3   3
   monitor    2   4
   fast       6   6
   slow      15  15

What I can see from your figures is that the filter time is very high, already at 150 VUs:

Request Timing: avg queue time 161.33µs, avg filter time 78.53ms, avg run time 212.77ms avg trace time 972.63µs

Typically, the filters are responsible for authorization and setup ... but seeing 78 ms there, and 1.7 s for 300 VUs, makes me wonder what's going on. Maybe you are running out of DB connections (as configured in the NaviServer configuration file), or you are running out of SQL connections on the DB side.
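The number of DB handles per pool is set in the NaviServer configuration file; a sketch (the pool name "pool1" follows the nsstats output above, the value 70 is illustrative, and the DB side, e.g. Postgres max_connections, must allow at least as many):

```tcl
ns_section ns/db/pool/pool1
# Number of DB handles in this pool; every busy connection
# thread may need one, so size it relative to maxthreads.
ns_param        connections 70
```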

Check out the DB-Pools section in nsstats. It should look like the snippet below.

pool1:  statements 99M gethandles 2.8M handles 170 connected 64 used 135 waittime 3.872729 sqltime 82712.668212 avgwaittime 1.4µs avgsqltime 835.8µs
pool2:  statements 0 gethandles 2 handles 2 connected 0 used 1 waittime 0.002325 sqltime 0.000000 avgwaittime 1.2ms
pool3:  statements 0 gethandles 2 handles 2 connected 0 used 1 waittime 0.002555 sqltime 0.000000 avgwaittime 1.3ms

If avgwaittime is high, you have found the problem. Newer OpenACS versions don't use pool2 and pool3 (these were used for nested SQL calls, which were troublesome for rollbacks).

Hope this helps!

-g

Thanks Gustaf,

You were right, the avgwaittime on pool1 was high:

pool1:	statements 1.1M gethandles 8.6K handles 15 connected 0 used 15 waittime 14693.460994 sqltime 1724.604281 avgwaittime 1.7s avgsqltime 1.5ms

The pool1 connections were still at 15. I increased them to 70 and brought our maxthreads down to 60.
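
As a quick sanity check (my own recomputation, not from the stats tool itself), the reported avgwaittime is simply the total waittime divided by the number of handle acquisitions, which matches the 1.7s in the stats line above:

```tcl
# waittime 14693.460994 s over ~8.6K gethandles (values from the stats above)
set avgwaittime [expr {14693.460994 / 8600}]
puts [format "avg wait per gethandle: %.1f s" $avgwaittime]   ;# prints ≈ 1.7 s
```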

Here is my test with this configuration

# 48 cores   200 Gig RAM
# maxconnections 2000
# maxthreads 60
# minthreads 60
# connsperthread 10000
# highwatermark 100
# compressenable off
# rejectoverrun true
# image pool 6/6
# ns_section ns/db/pool/pool1
ns_param connections 70

150 VUs
Connection Threads: min 60 max 60 current 60 idle 33 stopping 0 waiting 0
Request Handling: requests 2.4K, queued 0 (0.00%), spooled 2K (64.39%)
Request Timing: avg queue time 167.94µs, avg filter time 4.87ms, avg run time 231.84ms avg trace time 996.02µs

pool1: statements 288.5K gethandles 2.6K handles 70 connected 5 used 34 waittime 0.477450 sqltime 447.052551 avgwaittime 186.3µs avgsqltime 1.5ms

300 VUs
Connection Threads: min 60 max 60 current 60 idle 18 stopping 0 waiting 0
Request Handling: requests 7.4K, queued 2K (21.37%), spooled 5K (64.86%)
Request Timing: avg queue time 87.8ms, avg filter time 8.85ms, avg run time 333.63ms avg trace time 3.23ms

pool1: statements 923.4K gethandles 7.6K handles 70 connected 17 used 61 waittime 0.998778 sqltime 1997.236181 avgwaittime 132.1µs avgsqltime 2.2ms

150 VUs
Connection Threads: min 60 max 60 current 60 idle 7 stopping 0 waiting 0
Request Handling: requests 11.3K, queued 3K (29.84%), spooled 7K (66.01%)
Request Timing: avg queue time 324.46ms, avg filter time 9.84ms, avg run time 416.1ms avg trace time 3.87ms

pool1: statements 1.5M gethandles 11.5K handles 70 connected 7 used 62 waittime 1.052769 sqltime 3914.273814 avgwaittime 91.8µs avgsqltime 2.5ms

70 VUs
Connection Threads: min 60 max 60 current 60 idle 57 stopping 0 waiting 0
Request Handling: requests 13K, queued 3K (25.79%), spooled 9K (66.52%)
Request Timing: avg queue time 280.5ms, avg filter time 9.11ms, avg run time 414.92ms avg trace time 3.51ms

pool1: statements 1.8M gethandles 13.2K handles 70 connected 57 used 62 waittime 1.054629 sqltime 4429.716496 avgwaittime 79.6µs avgsqltime 2.4ms

K6 Summary
running (4m09.2s), 000/300 VUs, 2225 complete and 0 interrupted iterations
default ✓ [======================================] 000/300 VUs 4m0s

✗ status is 200
↳ 99% — ✓ 8889 / ✗ 1
✗ page succeeded
↳ 99% — ✓ 8886 / ✗ 4

checks.........................: 99.97% ✓ 17775 ✗ 5
data_received..................: 2.1 GB 8.3 MB/s
data_sent......................: 11 MB 44 kB/s
http_req_blocked...............: avg=2.35ms min=1.27µs med=4.83µs max=1.85s p(90)=4.9ms p(95)=7.77ms
http_req_connecting............: avg=762.21µs min=0s med=0s max=1.84s p(90)=340.97µs p(95)=392.74µs
✓ http_req_duration..............: avg=699.68ms min=4.73ms med=43.58ms max=5.8s p(90)=2.63s p(95)=3.23s
{ expected_response:true }...: avg=699.74ms min=4.73ms med=43.59ms max=5.8s p(90)=2.63s p(95)=3.23s
http_req_failed................: 0.00% ✓ 1 ✗ 13342
http_req_receiving.............: avg=927.2µs min=42.66µs med=167.63µs max=1.63s p(90)=1.68ms p(95)=2.29ms
http_req_sending...............: avg=32.23µs min=7.95µs med=28.46µs max=472.34µs p(90)=58.33µs p(95)=66.47µs
http_req_tls_handshaking.......: avg=1.56ms min=0s med=0s max=37.52ms p(90)=4.48ms p(95)=7.27ms
http_req_waiting...............: avg=698.72ms min=4.45ms med=43.41ms max=5.8s p(90)=2.63s p(95)=3.23s
http_reqs......................: 13343 53.54756/s
iteration_duration.............: avg=14.18s min=1.74µs med=12.87s max=22.24s p(90)=18.83s p(95)=19.99s
iterations.....................: 2225 8.929275/s
vus............................: 1 min=1 max=300
vus_max........................: 300 min=300 max=300

These numbers look a lot better! If you see anything that seems amiss let me know. We appreciate all of your help and expertise.

Sincerely,
Marty

you mentioned previously:
It would be nice to be under 8 seconds in the 95th percentile.
...
NGINX setup .... 95th percentile were around 7 seconds.
Now the http_req_duration has p(95)=3.23s.

I see no queued requests at 150 VUs, but queuing starts at 300 VUs. With 300 VUs, the filter time has nearly doubled and the run time is 50% higher, so there is a resource bottleneck somewhere.

The avgwaittime seems stable now (enough connections), but it is interestingly much higher than in the numbers I posted from our system... and the avgsqltime goes up quickly from 150 to 300 VUs.

I would also recommend setting log_min_duration_statement to e.g. 1 second and checking the SQL log file for slow SQL calls to identify potential DB bottlenecks.
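
For PostgreSQL, this is a postgresql.conf setting; the 1-second threshold is just the value suggested above, and the value is given in milliseconds:

```
# postgresql.conf: log every statement that runs longer than 1000 ms
log_min_duration_statement = 1000
```

The setting can be reloaded without a server restart (e.g. via `SELECT pg_reload_conf();` from psql).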

Do you experience locks during the tests? Check in nsstats [1].

all the best
-gn

[1] admin/nsstats?@page=locks&reversesort=1&col=8

Thanks Gustaf,

I did set log_min_duration_statement to 1 second and found some queries to optimize. I then ran our K6 load test, and we are down to 1.01 seconds at the 95th percentile! :)

I very much appreciate your master class on tuning NaviServer.

Now that we have optimized one instance, I ran the nginx-with-3-NaviServers test and found that we are getting the same times with nginx serving 3 NaviServers, with the following config on each of the 3 NaviServers. However, I had not yet set up clustering for that test.

# 48 CPUs   200 Gig RAM
# Nginx with 3 Naviservers
#   maxconnections 1000
#   maxthreads 20
#   minthreads 20 
#   connsperthread 10000
#   highwatermark 100
#   compressenable  off
#   rejectoverrun true 
#    image pool 6/6 
# ns_section ns/db/pool/pool1
	ns_param        connections        23  
#  DB on Same VM   

There were three main reasons for us looking into nginx in the beginning:

  1. To get over the 'All available connections are used up' errors that we would see from time to time. This is now solved! Thank you :)
  2. If one NaviServer fails (core dumps with signal 11) and restarts, nginx just marks that server as down and skips it in the round robin, and we see no downtime.
  3. Nginx can be set up to allow non-schema-change upgrades to happen in the middle of the working day without restarting the server, so our users see no downtime. See: https://www.nginx.com/faq/how-does-zero-downtime-configuration-testingreload-in-nginx-plus-work/. Our developers and users would like this because we could make a bug fix and push it out during the day without restarting NaviServer, which takes about 1 minute.

Let me explain number 2 a bit more, in case you have seen this error before (if you want me to open a new discussion thread for it, I can). We have been getting a signal 11 core dump from time to time for some years now. It was not too big a deal because NaviServer would restart by itself and we would only be down for 30 seconds or so. However, over the past year we have noticed it happening more frequently (probably 3 or 4 times a week), but we have not been able to track it down. In the log file the core dump looks like the following:

...
[16/Sep/2021:08:59:39][1.7f36daffd700][-socks-] Fatal: received [16/Sep/2021:08:59:39][1.7f36e3fff700][-driver:nsssl_v4:0-] Notice: ... sockAccept accepted 2 connections
[16/Sep/2021:08:59:39][1.7f36e3fff700][-driver:nsssl_v4:0-] Notice: ... sockAccept accepted 2 connections
[16/Sep/2021:08:59:39][1.7f36daffd700][-socks-] Fatal: received fatal signal 11
ORA-24550: signal received: [si_signo=6] [si_errno=0] [si_code=-6] [si_int=0] [si_ptr=(nil)] [si_addr=0x1]
kpedbg_dmp_stack()+396 -kpeDbgCrash()+204 -kpeDbgSignalHandler()+113 -skgesig_sigactionHandler()+258 -__sighandler() -gsignal()+203 -abort()+299[16/Sep/2021:08:59:48][1.7efe2e4f6800][-main:conf-] Notice: nsmain: NaviServer/4.99.21 (tar-4.99.21) starting
...

As you can see, it is an Oracle error, apparently from the Oracle driver. Our main DB is PostgreSQL (99%), but we also connect to an Oracle DB for certain queries.

Again, I can open another discussion for this item if you like. I just wanted you to know the three reasons we were looking into nginx.

Thanks again, for your willingness to share your expertise and for helping us scale/tune our server!

Sincerely, Marty
and we are down to 1.01 seconds in the 95th percentile!
Sounds good. Hopefully this holds not only for this synthetic test, but for the real application as well.

The message "all available connections are used up" describes a condition that can occur with other servers as well; NaviServer just tells the user what is going on.

NaviServer crashes are seldom (on our production site only a few times a year; I see the last one from Jan 18, most of them in explainable cases). In case you see many of these (3 cores per week qualifies as many), you should compile NaviServer and Tcl with -g and keep the cores for determining the cause.

For the snippet you are showing, it would be easy for me to say "it's a problem of the Oracle driver"... but probably it is not. Oracle just tells the system that an abort() happened. It would certainly be interesting where this abort() is coming from (via the core file).

bug fix and push it out during the day without restarting naviserver which can take about 1 minute to restart

When using OpenACS, you can use the reload feature of the package manager to reload packages without a restart. Why are you not using this?

I am not saying that horizontal scaling is a bad idea, but, depending on your application, it might be a longer journey, especially for cache coherency. NaviServer has per-request caches, per-thread caches, and caches via ns_cache* and nsv, where ns_cache supports transaction semantics (if a complex transaction with API calls is performed, where the API calls cache some content, but the transaction fails, the cached content needs to follow the first three letters of the ACID properties as well). Getting this correct for all packages is a longer way to go. The easiest approach would be to deactivate all caching in OpenACS, but this has a performance impact as well.
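
To illustrate the coherency problem (my own sketch using the NaviServer ns_cache API; the cache name, key, and `expensive_lookup` proc are hypothetical): a value cached via ns_cache lives on one node only, so in a cluster a flush on one node leaves stale entries on the others unless they are flushed as well.

```tcl
# Cache a (hypothetical) expensive lookup for 10 minutes on this node only
set value [ns_cache_eval -expires 600 -- my_cache user:$user_id {
    expensive_lookup $user_id   ;# placeholder for the real computation
}]

# Flushing removes the entry on THIS node; the other cluster nodes keep
# serving their cached copy until each of them is flushed, too
ns_cache_flush my_cache user:$user_id
```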

Thanks Gustaf,

I will look into compiling naviserver with the -g option to see if we can generate a core file that can generate a backtrace with gdb.

We had adopted a standard some time ago to always restart the server after an upgrade because of some oddities developers were seeing. We believe the issue would arise when a developer redefined a proc in the shell (for debugging purposes). After talking to Tony about it, we think you are right and we should just use the "Reload" option for daytime releases.

Your point about caching is well taken. It would be interesting to disable caching for testing purposes. Is there a parameter or compile option that does this, or would we need to disable each type of cache by modifying the code?

Thanks for your help, Marty
Just a question about compiling with debug info:

I can run the install-ns.sh script to build NaviServer, but I do not see that it has a -g option. I can see that "make" is run inside the script with --enable-symbols, though. Does this mean it includes the debug symbols by default?


if [ $with_mongo = "1" ] ; then
    echo "------------------------ WITH MONGO"

    ./configure --enable-threads --enable-symbols \
                --prefix=${ns_install_dir} --exec-prefix=${ns_install_dir} --with-tcl=${ns_install_dir}/lib \
                --with-nsf=../../ \
                --with-mongoc=/usr/local/include/libmongoc-1.0/,/usr/local/lib/ \
                --with-bson=/usr/local/include/libbson-1.0,/usr/local/lib/
else
    ./configure --enable-threads --enable-symbols \
                --prefix=${ns_install_dir} --exec-prefix=${ns_install_dir} --with-tcl=${ns_install_dir}/lib
fi

${make}
${make} install

Marty
When I look at the gcc compile lines, it does look like there is a -g in there, so I assume it does include debug symbols by default.
gcc -DPACKAGE_NAME=\"tdom\" -DPACKAGE_TARNAME=\"tdom\" -DPACKAGE_VERSION=\"0.9.1\" -DPACKAGE_STRING=\"tdom\ 0.9.1\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DHAVE_GETRANDOM=1 -DXML_DEV_URANDOM=1 -DXML_POOR_ENTROPY=1 -DBUILD_tdom=/\*\*/ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_MEMMOVE=1 -DHAVE_BCOPY=1 -DXML_POOR_ENTROPY=1 -DUSE_THREAD_ALLOC=1 -D_REENTRANT=1 -D_THREAD_SAFE=1 -DTCL_THREADS=1 -DUSE_TCL_STUBS=1 -DUSE_TCLOO_STUBS=1 -DMODULE_SCOPE=extern\ __attribute__\(\(__visibility__\(\"hidden\"\)\)\) -DHAVE_HIDDEN=1 -DHAVE_CAST_TO_UNION=1 -D_LARGEFILE64_SOURCE=1 -DTCL_WIDE_INT_IS_LONG=1 -DUSE_TCL_STUBS=1 -DXML_DTD=1 -DXML_NS=1 -DTDOM_NO_UNKNOWN_CMD=1 -DUSE_NORMAL_ALLOCATOR=1  -I../expat -I../generic -I"/usr/local/ns/include" -I.    -g -O2 -pipe -O2 -fomit-frame-pointer -DNDEBUG -Wall -fPIC  -c `echo ../generic/domxpath.c` -o domxpath.o

The line you are showing is from compiling tdom. The install-ns script does not try to enforce the same compiler flags for all packages; one has to take care of this separately (already the Tcl packages have different flag definitions).

Assuming you have a recent install-ns configured for NaviServer 4.99.22 and Tcl 8.6.11, the following commands make sure Tcl + NaviServer are compiled with "-g":

cd  /usr/local/src/naviserver-4.99.22
make clean all install "CFLAGS_OPTIMIZE=-O2 -g"

cd /usr/local/src/tcl8.6.11/unix
make clean all "CFLAGS_OPTIMIZE=-O2 -g"
(cd  /usr/local/src/tcl8.6.11/unix/pkgs/thread2.8.6; make clean all "CFLAGS_DEFAULT=-O2 -g -DNS_AOLSERVER=1 -I/usr/local/ns/include")
make install

Is there a parameter that does this or a compile option?

No... and given the different caching mechanisms, it would not be feasible (how should a call to nsv_* know by itself whether it is used for caching or not?). For caching via ns_cache_* it would be relatively simple. Depending on the application, deactivating caching is still the lower-hanging fruit...

-g

I've added for you on Bitbucket an experimental global parameter "nocache" for NaviServer, which deactivates all caching via ns_cache_* at the C level (while maintaining some statistics). The default is false; activate it in the section ns/parameters if desired.

By this you can estimate the performance impact of deactivating ns_cache caching for your site when considering horizontal scaling. Of course, the other forms of caching are not addressed by this.

Marty: In case you get reasonable performance with the experimental "nocache" configuration parameter, I could help you with more nocache support in OpenACS.

Small update: the parameter controlling the caching mode is now called "cachingmode", since this leaves more room for potential future caching modes (e.g. an improved cluster mode or special caching nodes).

ns_section ns/parameters {
   ...
   ns_param        cachingmode none   ;# experimental caching mode
   ...
}

The acs-cache-procs have been updated such that the per-thread cache works like a per-request cache when "cachingmode" is configured to "none".