Forum OpenACS Development: Re: Developer Support Database Statistics

Collapse
Posted by Gustaf Neumann on
Hi Dave,

the problem are not complex pages, but the overall load form *all* queries that add tracing information to the shared nsv variables that the developer support uses. One will see such messages only from busy sites, and not when running complex queries on non-busy sites.

I've added the "wait duration" output a few years ago to naviserver to raise the awareness of developers, what's going on. It is actually a very nice feature compared to other systems, which are silently coming to a crawl in such situations. You should consider using nsstats [1] and the munin plugins [2], where nsstats provides among other things a mapping from nsv-names to mutexes, and the munin plugins are nice to show, what is for a site a usual/unusual traffic of mutex locks, busy mutex calls, mutex waits, etc.

The busy mutex locks on nsvs will not create deadlocks. However, it is not recommend to let the developer support (with db query tracking) running on busy site for longer time, since it will slow down the system. On the longer range, we could consider to add optionally a per-session mode to the developer support, such that it tracks only the queries of one session, and not the queries of the overall system. That would improve the lock congestion on the nsvs considerably, while covering some of the important use-cases.

all the best
-g

[1] https://bitbucket.org/naviserver/nsstats
[2] https://github.com/gustafn/munin-plugins-ns

Collapse
Posted by Dave Bauer on
Thanks.

I guess I explained it incorrectly. These aren't complex queries but its a script that processes thousands of records with short queries, so its just the total number of queries running. Thanks for the explanation. To debug or learn about the query statistics of such a script, a test mode which processes just a few rows makes more sense.

I will check out nsstats, that is helpful.

Collapse
Posted by Gustaf Neumann on
Hi Dave,

in terms of the locks, it makes no difference, whether a complex query or some maintenance script is run in a connection thread. The point is, when only one connection thread is active, then there are no locks from the developer support. If multiple connection threads are writing to the variables of the developer support, then locks must occur. Therefore, the locks you experienced depend on the current load caused by other users. If we would confine the developer-support to one connection thread, this would hurt busy sites much less than the current approach, where ds affects all threads.

I've now added the mutex statistics as well to the munin statistics of openacs.org [1]. You can find there
- waiting time for locks (the biggest dent is a waiting time of 1.4ms during log-rolling)
- busy locks (currently 0.5 busy locks per second on average), and
- number of locks (currently 500 locks per second, peaks around 6.000 locks per second)
OpenAcs.org is quite an idle site, one other sites we see easily 10x these values (experiencing good performance). The values of munin are moving averages in 5 minute intervals. One can configure additional line for more mutexes in the munin scripts, when e.g. the lock statistics of nsstats indicate this. The munin graphics are useful to increase the understanding of the factors influencing the perceived performance, and to detect unusal behavior requiring attention.

-g
[1] https://openacs.org/munin/localdomain/localhost.localdomain/index.html#naviserver