Forum OpenACS Development: Calls to server itself are ok on command line, but hang in /ds/shell
I could use a tip for this situation: I am trying to create automated tests for basic page functioning. For this reason I am creating tests where the server calls itself on various pages and throws an error on 500 responses.
On a specific instance, running multiple servers behind an nginx proxy, I am encountering difficulties. I have retrieved the actual IP and port of my instance, then I have tried to curl from the command line like this:
and everything is working as expected. The command was run on the server itself.
When I run the same command from the /ds/shell though, by
exec curl http://<host>
the request hangs. Even stranger, the server hangs too until the request fails. The same happens with any other form of HTTP call; I have also tried ad_httpget and util::http::get in native mode.
I suspect this has to do with the environment nsd is running in. I have logged in as the server user and executed the command in bash and it worked, so it must be something less trivial. Nginx could have a role too, but I can't figure it out, because curl is ok from the command line.
Thanks in advance for any help on this.
I've just tried the following in ds/shell on openacs.org and everything works as expected:
exec -ignorestderr curl https://openacs.org/
The flag "-ignorestderr" is used because curl likes to write to stderr. So, on regular sites, this works fine.
I am wondering why you are having trouble. Are you using nsproxy? What Tcl version are you using?
I have succeeded on other instances with a simpler configuration. This instance is particular because the server hosts different websites on different hostnames. The request is forwarded to the actual nsd instance by an nginx proxy.
I have used the actual address and port to try calling the server (e.g. 192.168.10.10, the address in the internal subnet). I am puzzled because curl is working from bash using the very same user as nsd.
Specifying -ignorestderr doesn't change the situation for me.
Tcl version is 8.5. I am using ns_proxy on NaviServer 4.99.5.
This must be something specific to nsd/nginx in that particular environment, because curl on the very same server works as expected, using exactly the same command in bash as in /ds/shell, except for the "exec" part of course.
I have had success in the past with using ns_httpget. Your issue smells like a DNS issue, but I have no evidence for saying that.
I wonder what happens when you try adding a timeout to the call?
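A minimal sketch of what adding a timeout could look like when the call is made through curl. The proc name loopback_get and the default of 5 seconds are made up for illustration; --silent, --show-error and --max-time are standard curl options. A timeout only bounds how long the caller blocks; it does not by itself explain why the request hangs.

```tcl
# Hypothetical helper (not part of OpenACS or NaviServer): fetch a URL
# via curl with a hard time limit.
# Returns a two-element list: {ok body} on success, {error message} otherwise.
proc loopback_get {url {timeout 5}} {
    # --silent suppresses the progress meter, --show-error still prints
    # real errors to stderr, --max-time aborts the whole transfer after
    # $timeout seconds.
    set cmd [list curl --silent --show-error --max-time $timeout $url]

    # exec raises a Tcl error on a non-zero exit status or on stderr
    # output, so a timeout or a refused connection ends up in $result.
    if {[catch {exec {*}$cmd} result]} {
        return [list error $result]
    }
    return [list ok $result]
}

# Example call from /ds/shell (address and port are placeholders):
#   lassign [loopback_get "http://192.168.10.10:8000/" 5] status body
```

If the status comes back as error after roughly the timeout, the request was accepted but never answered, which points away from DNS and towards the server not processing the connection.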
What I would do is to debug it further by taking the source code to the ns_httpget proc (which is in modules/tcl/http.tcl on my AOLserver install), and using that in the shell, and add a bunch of ns_logs, and then keep stripping it back until you have the bare minimum code that runs. At its core, it's just some socket connections.
There is a case where nsd will not actually spawn a new thread for a new connection even if there is no thread available, and this can lead to a blocking http call to the server itself hanging everything until it times out.
In naviserver, you can fix this by setting
ns_param lowwatermark 0 ;# eager thread creation at low concurrency
I don't recall the corresponding setting for AOLserver without checking, but the thread creation behavior is a bit different so it might not have the same issue.
# Scaling and Tuning Options
# ns_param maxconnections 100 ;# 100; number of allocated connection structures
# ns_param maxthreads 10 ;# 10; maximal number of connection threads
# ns_param minthreads 1 ;# 1; minimal number of connection threads
ns_param connsperthread 1000 ;# 10000; number of connections (requests) handled per thread
# ns_param threadtimeout 120 ;# 120; timeout for idle threads
# ns_param lowwatermark 10 ;# 10; create additional threads above this queue-full percentage
ns_param highwatermark 100 ;# 80; allow concurrent creates above this queue-is percentage
;# 100 means to disable concurrent creates
It is not subjected to heavy load, because we use it for development. I have checked on other instances where I don't have the problem and saw that configurations are the same about threads.
In your test, you have one connection currently running (the ds request), and a second waiting (the loopback request), so only 2 connections is not busy enough to warrant spawning a new thread, and the server doesn't (AFAIR) have any logic to specify that a thread should be created if connections have simply been queued for a certain amount of time without progressing. (The benefit of such logic would be dubious IMO; connections are generally expected to complete quickly unless there is a specific reason they should be acting differently).
So changing the lowwatermark setting to 0 will make the server always create a new thread on a new request if there's not already a thread for it, so the loopback request will be handled.
Alternately, you could set minthreads to something more than 1, or while waiting for the loopback request to time out hit the server 10 or more times using curl to queue up connections and get it to create a new thread, and then things will progress (until a restart, at least).
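As a rough sketch, the two configuration options described above would look like this in the NaviServer config file. The section path and the $server variable are assumptions based on the usual layout of the sample config; adapt them to the actual file. Either setting alone should be enough according to the explanation above.

```tcl
# Assumed section path; $server is expected to hold the server name.
ns_section ns/server/$server
    # Option 1: create a connection thread as soon as a request is queued
    # and no thread is free to take it.
    ns_param lowwatermark 0

    # Option 2: always keep more than one connection thread around, so a
    # loopback request does not wait behind the request that issued it.
    ns_param minthreads 2
```

The server has to be restarted for the change to take effect.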
Thanks Jeff and Gustaf!
- Newer versions of NaviServer and OpenACS show a more modest growth of memory per additional connection thread. For example, OpenACS.org uses minthreads 5 and, after 4 weeks of continuous running, uses less than 2 GB RSS.
- NaviServer checks at the end of each request whether there are requests queued and/or sufficient threads available. While a request is running in a connection thread, incoming requests are either handled by other connection threads or they are queued. This is perfectly fine on real-world systems with many small requests, but bad for your kind of application, where the only connection thread deadlocks because it waits for the completion of a queued request.
- Allowing concurrent connection thread creates won't help, since this only means that thread creation is allowed while other threads are already being created (many Tcl versions had/have problems with this).