Forum OpenACS Q&A: Re: OpenACS clustering setup and how it relates to xotcl-core.
It was our understanding that the new xotcl may take care of some of the long runners automatically without us defining a specific thread pool and telling xotcl about it. Is this not a good assumption?
Well, newer versions of the xotcl-request monitor can automate some of these steps, but they need configuration. The xotcl-request monitor can map requests to a "slow" pool when such a pool is defined by the site admin. The site admin also has to configure this connection pool (e.g. min/max threads, rejectoverrun, queue size, ...). From the snippet of the configuration file, it seems that you have no "slow" pool defined, so no automatic mapping will happen.
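Just to illustrate, a minimal sketch of such a pool definition in the NaviServer configuration file; the thread counts here are placeholders and have to be chosen for your hardware:

ns_section ns/server/${server}/pools
ns_param slow "Slow requests"

ns_section ns/server/${server}/pool/slow
ns_param minthreads 5 ;# placeholder value
ns_param maxthreads 5 ;# placeholder value
ns_param rejectoverrun true

No static "map" entries are needed for this pool; once it exists, the xotcl-request monitor can move long-running requests there on its own.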
From the output I see that the default pool has minthreads 2 and maxthreads 10. I would recommend defining 25/25 on your testing instance, based on:
live 24 cores and 80 GB of RAM, staging 16 cores and 200 GB

For a rough estimate, one can run about 96 connection threads on 24 cores (assuming typical OpenACS loads) and about 64 on 16 cores.
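In the configuration file this amounts to something like the following in the default server section (a sketch, using the 25/25 recommended above):

ns_section ns/server/${server}
ns_param minthreads 25
ns_param maxthreads 25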
When looking at the 140 VUs output, I see
Request Handling: requests 1.7K, queued 1K (61.48%), spooled 1K (62.24%)
The number of queued requests is way too high. On our production system (typically between 600 and 3000 concurrent users) we typically have less than 0.01% queued requests in the default pool.
Request Timing: avg queue time 660.56ms, avg filter time 5.44ms, avg run time 218.25ms avg trace time 1.31ms
You see here that the avg queue time (660ms, the time a request waits before being served) is much longer than the processing time (avg run time). On our production system (currently 650 users) we have now
avg queue time 465.04µs, avg filter time 4.18ms, avg run time 104.08ms avg trace time 4.27ms
The output

[13/Sep/2021:16:00:16][1.7f1a15c5a700][-driver:nsssl_v4:0-] Notice: [east_intrastage pool ] All available connections are used, waiting 90 idle 0 current 10

tells that the queue is filled up (100 entries). The queue size is defined per pool via "maxconnections"; on our live system we have this value set to 600. One can fine-tune the behavior on queue overruns via "rejectoverrun". A larger queue makes the system more forgiving when there are occasional large request bursts.
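As a sketch, such a setting in the default server section could look like this (600 is the value from our live system; adjust it to your traffic):

ns_section ns/server/${server}
ns_param maxconnections 600 ;# size of the waiting queue
ns_param rejectoverrun true ;# reply with 503 on queue overrun instead of adding the request to the waiting list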
While the values above are average values, the values below (long-calls) are from individual requests:
14.371059 0.003934 1.550067 15.925060 2021 Sep 13 16:00:30
The columns are: Queuetime Filtertime Runtime Totaltime.
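(For the row shown, these add up: 14.371059 + 0.003934 + 1.550067 = 15.925060, i.e. the total time is queue time plus filter time plus run time.)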
The request above was waiting 14 secs in the queue before being served, whereas the actual run time was about 1.5 secs. When you get rid of the queuing time, the performance will improve by a factor of 10. Increasing the threads as indicated above should help significantly.
For this test, setting up the "slow" pool will not improve the benchmark times, since the purpose of the pools is to make sure that "slow" requests do not kill the "fast" requests on your server. This is important to keep the server lively when the majority of requests are "fast" and only a few are "slow": you do not want someone firing e.g. 100 "slow" requests to bring your server to its knees.
Increasing maxthreads from 10 (default) to e.g. 50 should increase the throughput by a factor of 5.
Hope, this helps
-gn
We appreciate your expertise in helping us tune our system. It has helped a lot!
I was successful in setting up the "slow" pool. I was wondering if there is a setting to map a request to the slow pool only when it takes more than 30 or 40 seconds. I found that it started to put our effort reporting page on the slow pool when the system was a little stressed, so I removed the slow pool for the following tests for now.
I have been playing around with the configuration and we have decided to go up to 48 cores on our staging system with the 200 Gig of RAM.
Here is what I found to be somewhat optimal for the 48 cores.
We are not running out of available connections anymore, and the queued requests are a lot lower, as are the avg queue times. The K6 tests show that all HTTP requests return status 200 as well.
I was wondering if you see any problems with the configuration and if you had suggestions for changes.
Again, thanks for your help
Marty

ns_section ns/server/${server}
ns_param maxconnections 2000 ;# 100; number of allocated connection structures
ns_param maxthreads 200 ;# 10; maximal number of connection threads
ns_param minthreads 200 ;# 1; minimal number of connection threads
ns_param connsperthread 10000 ;# 10000; number of connections (requests) handled per thread
                              ;# Setting connsperthread to > 0 will cause the thread to
                              ;# graciously exit, after processing that many
                              ;# requests, thus initiating kind-of Tcl-level
                              ;# garbage collection.
# ns_param threadtimeout 120  ;# 120; timeout for idle threads.
                              ;# In case, minthreads < maxthreads, threads
                              ;# are shutdown after this idle time until
                              ;# minthreads are reached.
# ns_param lowwatermark 10    ;# 10; create additional threads above this queue-full percentage
ns_param highwatermark 100    ;# 80; allow concurrent creates above this queue-is-full percentage
                              ;# 100 means to disable concurrent creates
ns_param rejectoverrun true   ;# If set to true, NaviServer sends on queue overruns an error "503 service unavailable".
                              ;# The request is not added to the waiting list.

# Compress response character data: ns_return, ADP etc.
# ns_param compressenable off ;# false, use "ns_conn compress" to override

ns_section ns/server/${server}/pools
ns_param monitor "Monitoring actions to check the health of the system"
ns_param image "Image file requests"
#ns_param slow "Slow requests"
ns_param fast "Fast requests"

ns_section ns/server/${server}/pool/monitor
ns_param minthreads 3
ns_param maxthreads 5
ns_param map "GET /admin/nsstats"
ns_param map "GET /SYSTEM"
ns_param map "GET /ds"
ns_param map "POST /ds"

ns_section ns/server/${server}/pool/image
ns_param minthreads 3
ns_param maxthreads 5
ns_param map "GET /*.png"
ns_param map "GET /*.jpg"
ns_param map "GET /*.PNG"
ns_param map "GET /*.JPG"
ns_param map "GET /*.JPEG"
ns_param map "GET /*.jpeg"
ns_param map "GET /*.mp4"
ns_param map "GET /*.MP4"
ns_param map "GET /*.pdf"
ns_param map "GET /*.PDF"
ns_param map "GET /*.gif"
ns_param map "GET /*.GIF"

#ns_section ns/server/${server}/pool/slow
#ns_param minthreads 15
#ns_param maxthreads 15

ns_section ns/server/${server}/pool/fast
ns_param minthreads 6
ns_param maxthreads 6
# Test w 48 CPU's 200 Gig RAM
# maxconnections 2000
# maxthreads 200
# minthreads 200
# connsperthread 10000
# highwatermark 100
# compressenable off
# rejectoverrun true

150 VUs
Connection Threads: min 200 max 200 current 200 idle 151 stopping 0 waiting 0
Request Handling: requests 1.8K, queued 0 (0.00%), spooled 1K (63.55%)
Request Timing: avg queue time 161.33µs, avg filter time 78.53ms, avg run time 212.77ms avg trace time 972.63µs

300 VUs
Connection Threads: min 200 max 200 current 200 idle 157 stopping 0 waiting 0
Request Handling: requests 5.4K, queued 508 (9.46%), spooled 4K (66.81%)
Request Timing: avg queue time 295.87ms, avg filter time 1.74s, avg run time 251.96ms avg trace time 1.1ms

150 VUs
Connection Threads: min 200 max 200 current 200 idle 189 stopping 0 waiting 0
Request Handling: requests 7.1K, queued 627 (8.80%), spooled 5K (65.20%)
Request Timing: avg queue time 260.7ms, avg filter time 2s, avg run time 255.49ms avg trace time 1.13ms

70 VUs
Connection Threads: min 200 max 200 current 200 idle 141 stopping 0 waiting 0
Request Handling: requests 7.9K, queued 627 (7.98%), spooled 5K (65.70%)
Request Timing: avg queue time 236.39ms, avg filter time 1.94s, avg run time 264.67ms avg trace time 1.12ms

10 VUs
Connection Threads: min 200 max 200 current 200 idle 199 stopping 0 waiting 0
Request Handling: requests 8.4K, queued 627 (7.48%), spooled 6K (66.52%)
Request Timing: avg queue time 221.7ms, avg filter time 1.85s, avg run time 275.66ms avg trace time 1.13ms

K6 Summary

running (4m09.3s), 000/300 VUs, 1359 complete and 38 interrupted iterations
default ✓ [======================================] 000/300 VUs  4m0s

✓ status is 200
✗ page succeeded
  ↳ 99% — ✓ 5542 / ✗ 8

█ setup
█ teardown

checks.........................: 99.92% ✓ 11092 ✗ 8
data_received..................: 1.3 GB 5.2 MB/s
data_sent......................: 7.0 MB 28 kB/s
http_req_blocked...............: avg=1.46ms   min=1.27µs  med=6.61µs   max=51.42ms p(90)=4.1ms    p(95)=4.87ms
http_req_connecting............: avg=147.33µs min=0s      med=0s       max=48.09ms p(90)=376.22µs p(95)=442.16µs
✗ http_req_duration..............: avg=2.32s    min=4.86ms  med=224.41ms max=24.15s  p(90)=8.5s     p(95)=13.05s
  { expected_response:true }...: avg=2.32s    min=4.86ms  med=224.41ms max=24.15s  p(90)=8.5s     p(95)=13.05s
http_req_failed................: 0.00%  ✓ 0     ✗ 8355
http_req_receiving.............: avg=689.98µs min=39.82µs med=179.82µs max=70.57ms p(90)=1.71ms   p(95)=1.89ms
http_req_sending...............: avg=39.42µs  min=8.56µs  med=35.11µs  max=15.19ms p(90)=66.52µs  p(95)=76.12µs
http_req_tls_handshaking.......: avg=1.28ms   min=0s      med=0s       max=42.24ms p(90)=3.62ms   p(95)=4.51ms
http_req_waiting...............: avg=2.32s    min=4.66ms  med=223.99ms max=24.15s  p(90)=8.5s     p(95)=13.05s
http_reqs......................: 8355   33.512465/s
iteration_duration.............: avg=23.56s   min=1.64µs  med=22.73s   max=51.14s  p(90)=36.62s   p(95)=38.12s
iterations.....................: 1359   5.45104/s
vus............................: 1      min=1   max=300
vus_max........................: 300    min=300 max=300