Forum OpenACS Q&A: Re: NaviServer stops serving connections when slow external api calls pile up.

Since you did not restart the server, it might be the case that the server did not really "stop serving connections", but rather that
a) the requests were served by the "default" connection pool, and
b) the slow requests caused a backlog (further requests were queued), and it took a while until these were served again.

Here are some suggestions on how to proceed:

  • The first measure to improve the situation is to remove the potential queuing in the default connection pool. The simplest approach is to create a connection pool named "slow". When you use a recent version of the request monitor (not more than 2 years old) and a "slow" pool exists, slow requests are automatically moved to this connection pool. This has the advantage that other requests are still served as usual, even when there is queuing in the "slow" pool. See [1] for an example of a "slow" pool, but don't forget to remove the comment character in the line defining this pool.

  • As a second measure, I would recommend installing the NaviServer munin plugins [2], since these provide, among other things, monitoring of the queuing times per pool.

  • Third measure: get the tip version of NaviServer from Bitbucket. I have today fixed a problem with clearing the timeout status codes (which must have been in the code for quite a long time). I do not see a direct connection to the reported symptoms, but I cannot exclude that it might be related.

  • Fourth measure: when compiling NaviServer, compile it with debugging enabled (using the -g flag), and familiarize yourself with how to attach GDB at runtime to nsd and how to dump the stacks of all threads in the running server [3]. In case you see this symptom again, dump all stacks ("thread apply all bt") and mail the output to me.
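
For the fourth measure, a minimal sketch of how the stack dump could be scripted, assuming gdb is installed, nsd was built with -g, and you are allowed to attach to the process (typically as root or as the user running nsd). The function name nsd_dump_stacks, the GDB override variable, and the default output file name are made up for illustration; only the gdb options (-p, -batch, -ex) and the "thread apply all bt" command are standard.

```shell
# Sketch: dump the stacks of all threads of a running process via GDB.
# Assumption: nsd_dump_stacks, $GDB and the output file name are
# hypothetical names chosen for this example.
nsd_dump_stacks() {
    pid=$1                              # PID of the running nsd process
    out=${2:-nsd-stacks-$pid.txt}       # file that receives the backtraces
    gdb_bin=${GDB:-gdb}                 # allow overriding the gdb binary

    # -p attaches to the process, -batch exits (and thereby detaches)
    # after the commands given with -ex have been executed.
    "$gdb_bin" -p "$pid" -batch -ex "thread apply all bt" > "$out" 2>&1
}
```

It could be called as: nsd_dump_stacks "$(pgrep -o -x nsd)" /tmp/nsd-stacks.txt, where using pgrep to find the oldest nsd process is also just an assumption about your setup. The server is paused only while GDB is attached and continues running afterwards.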

How often have you experienced this problem?
Depending on your urgency, I would recommend starting with the first two measures. These will also give you more insight into the runtime behavior and into potential bottlenecks during load peaks over the day.

Let me know what your new insights are.
all the best
-g

[1] https://bitbucket.org/naviserver/naviserver/src/f3b5ad7dcb956d0ee9251e914e20e3d585f1c7b4/openacs-config.tcl#lines-526
[2] https://github.com/gustafn/munin-plugins-ns
[3] https://stackoverflow.com/questions/18391808/how-do-i-get-the-backtrace-for-all-the-threads-in-gdb