Forum OpenACS Development: Re: NaviServer "breaks" under high load

Posted by Frank Bergmann on
Hi Gustaf,

The server keeps on "hanging".

I've found that my customer has added "DB-queries in a loop" to the ]po[ page for timesheet logging, so this page now takes 3000ms instead of 90ms. That explains a bit of the pain.
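For readers unfamiliar with the "DB-queries in a loop" pattern, here is a minimal sketch in Python with an in-memory SQLite database. The `projects` and `hours` tables and both helper functions are hypothetical and only illustrate the pattern; they are not the actual ]po[ timesheet code or schema.

```python
import sqlite3

# Hypothetical schema, only for illustration (not the ]po[ data model).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (project_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hours (project_id INTEGER, hours REAL);
""")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?)",
    [(i, f"project-{i}") for i in range(1, 101)],
)
conn.executemany(
    "INSERT INTO hours VALUES (?, ?)",
    [(i, 1.5) for i in range(1, 101) for _ in range(3)],
)


def totals_in_loop(conn):
    """Query in a loop: one extra statement per project (1 + N in total)."""
    totals = {}
    queries = 1  # the statement that lists the projects
    for (pid,) in conn.execute("SELECT project_id FROM projects").fetchall():
        row = conn.execute(
            "SELECT COALESCE(SUM(hours), 0) FROM hours WHERE project_id = ?",
            (pid,),
        ).fetchone()
        totals[pid] = row[0]
        queries += 1
    return totals, queries


def totals_in_one_query(conn):
    """Same result from a single aggregated statement."""
    rows = conn.execute("""
        SELECT p.project_id, COALESCE(SUM(h.hours), 0)
        FROM projects p
        LEFT JOIN hours h ON h.project_id = p.project_id
        GROUP BY p.project_id
    """).fetchall()
    return dict(rows), 1


loop_totals, loop_queries = totals_in_loop(conn)
single_totals, single_queries = totals_in_one_query(conn)
print(loop_queries, single_queries)
```

Both functions return the same totals, but the loop issues one statement per project. Against a real database server each of those statements is a round trip, which is how a page can plausibly grow from tens of milliseconds to seconds.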

It would be kind of OK if the server were merely slow. However, slow queries do not explain the "hanging" to me.

=> Is it possible that the system hangs because of "recursive" queries/sub-queries?

nsstats

I've installed nsstats 1.7 on my server, but I don't see the "db-pools" option. And the "Process" option is giving me an error: "can't read stats(tracetime): no such element in array". Any idea how I can fix this? We're running:

Tag: tar-4.99.8
Built: Nov 19 2015 at 22:18:45
Tcl version: 8.5
Platform: linux

hopelessly overloaded

I have just noticed the server "hanging" without any activity, with one user on the system apart from myself. So "overload" is not the right term... 😊

Cheers
Frank

Posted by Gustaf Neumann on
The urgency does not seem to be very high, since your reply was about 2 months after my prior response.

When you have such an old server (4.99.8 is 3.5 years old), it is likely that you have just a single connection thread pool. With queries in the range of 3 seconds and a single connection thread pool configured, it is easy to block this pool completely, leading to a queuing situation (which the user perceives as a "hang"). The feature of dynamic connection thread pool mapping was introduced with NaviServer 4.99.15 early last year (see e.g. [1]). With this feature, one can map slow requests dynamically to a separate pool, where such requests might pile up, but they don't block other traffic.
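To illustrate the queuing effect, here is a minimal Python sketch (a toy FIFO model, not NaviServer code). The `simulate` helper, the pool sizes and the arrival times are assumptions chosen for illustration; only the 3000ms and 90ms request durations come from the report above.

```python
import heapq


def simulate(requests, threads):
    """Return the queuing delay (ms) of each request in a FIFO thread pool.

    requests: list of (arrival_ms, service_ms), sorted by arrival time.
    threads:  number of connection threads in the pool.
    """
    free_at = [0] * threads  # time at which each thread becomes idle
    heapq.heapify(free_at)
    waits = []
    for arrival, service in requests:
        earliest = heapq.heappop(free_at)  # next thread to become idle
        start = max(arrival, earliest)
        waits.append(start - arrival)
        heapq.heappush(free_at, start + service)
    return waits


SLOW_MS, FAST_MS = 3000, 90
slow = [(0, SLOW_MS)] * 4  # four slow requests arrive together
fast = [(100, FAST_MS), (200, FAST_MS), (300, FAST_MS)]

# Case 1: a single pool with 2 threads serves everything.
mixed = sorted(slow + fast)
mixed_waits = simulate(mixed, threads=2)
fast_waits_single = [
    wait for (_, service), wait in zip(mixed, mixed_waits) if service == FAST_MS
]

# Case 2: the same 2 threads, split into a "slow" pool and a default pool.
fast_waits_mapped = simulate(fast, threads=1)
slow_waits_mapped = simulate(slow, threads=1)

print(fast_waits_single)  # [5900, 5800, 5790]
print(fast_waits_mapped)  # [0, 0, 0]
print(slow_waits_mapped)  # [0, 3000, 6000, 9000]
```

In this toy model the 90ms requests wait almost 6 seconds behind the slow ones when everything shares one pool, which looks like a "hang" from the browser. With the slow requests mapped to their own pool, the slow requests pile up there while the fast ones are served immediately.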

My recommendation is to update NaviServer to a recent version and use dynamic thread pool mapping. Concerning exceptions from nsstats: the NaviServer modules are released in concert with matching versions in the *modules* directories (see e.g. [2]). There is some tolerance in backward compatibility, but apparently not that far back.

In case the mapping is not sufficient and there are further configuration issues, the newer, more detailed statistics can provide more insight.

[1] https://www.mail-archive.com/naviserver-devel@lists.sourceforge.net/msg03446.html
[2] https://sourceforge.net/projects/naviserver/files/naviserver/4.99.16/

Posted by Frank Bergmann on
Hi Gustaf,

urgency

My last set of changes seemed to have worked, but the problems re-surfaced.

nsstats.tcl

Do I have any other option in order to get statistics on the DB-pools?

Yesterday I increased the DB connections:

In the config.tcl (8 physical cores):

maxconnections 200
maxthreads 32
minthreads 32
connsperthread 10000
highwatermark 100

Pool1:
connections 200

Pool2:
connections 20

Pool3:
connections 20

So again, the server seems to be running fine today.

Thanks for your help!
Frank

Posted by Gustaf Neumann on
Some feedback on the suggestions is expected.

other option in order to get statistics on the DB-pools?

According to the NaviServer NEWS file, "ns_db stats" was introduced in NaviServer 4.99.9 (Jan 2016).