Forum OpenACS Development: Calls to server itself are ok on command line, but hang in /ds/shell

Hello everyone,

I could use a tip for this situation: I am trying to create automated tests for basic page functioning. For this, I am writing tests where the server calls itself on various pages and throws an error on 500 responses.
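For illustration, a minimal sketch of that kind of smoke test as a shell script (the base URL and page list are placeholders, not from the original post):

```shell
# Hedged sketch of the smoke test described above: fetch each page and
# report any 5xx response. Base URL and page paths are placeholders.
base="http://127.0.0.1:8000"
fail=0
for page in / /register/ /forums/; do
    # -w '%{http_code}' prints only the numeric status; body is discarded
    status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$base$page") || status=000
    case "$status" in
        5*) echo "FAIL: $page returned $status"; fail=1 ;;
    esac
done
[ "$fail" -eq 0 ] && echo "smoke test passed"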

On a specific instance, running multiple servers behind an nginx proxy, I am encountering difficulties. I have retrieved the actual IP and port of my instance, then tried to curl from the command line like this:

curl http://

and everything works as expected. The command was run on the server itself.

When I run the same command from /ds/shell, though, via

exec curl http://

the request hangs. Stranger still, the server hangs too until the request fails... The same happens with any other form of HTTP call; I have also tried ad_httpget and util::http::get in native mode.

I suspect this has to do with the environment nsd is running in. I have logged in as the server user and executed the command in bash and it worked, so it must be something less trivial. nginx could play a role too, but I can't figure out how, because curl is fine from the command line.

Thanks in advance for any help on this.

URLs in the post were erased; after each bare "http://", imagine an actual IP.
Hi Antonio,

I've just tried the following in /ds/shell on openacs.org and everything works as expected:

    exec -ignorestderr curl https://openacs.org/
The "-ignorestderr" flag is used since curl likes to write to stderr. So, on regular sites, this works fine.

I am wondering why you are having trouble. Are you using nsproxy? What Tcl version are you using?

-g

Hi Gustaf,

I have succeeded on other instances with a simpler configuration. This instance is particular because the server hosts different websites on different hostnames, and requests are forwarded to the actual nsd instance by an nginx proxy.

I have used the actual address and port to try calling the server (e.g. 192.168.10.10, the address in the internal subnet). I am puzzled because curl is working from bash using the very same user as nsd.

Specifying -ignorestderr doesn't change the situation for me.

The Tcl version is 8.5. I am using ns_proxy on NaviServer 4.99.5.

Thanks

Turn on verbosity and tracing in the curl call, and redirect stdout+stderr to a debug file; curl should tell you what's going on. As Brian said, DNS is an option, but so are a firewall or configuration issues like small-dimensioned special-purpose connection pools.
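A minimal sketch of that debugging call, runnable from a regular shell (URL and file path are placeholders; from /ds/shell you would prefix it with exec):

```shell
# Verbose curl; both stdout and stderr go to one debug file, so an exec
# call in ds/shell would not block on stderr. URL and path are placeholders.
curl -v --max-time 10 "http://127.0.0.1:8000/" > /tmp/curl-debug.log 2>&1 || true
# Inspect /tmp/curl-debug.log for name resolution, connect attempts, headers.
```

The `--max-time` flag additionally bounds how long the call can hang, which makes the failure mode visible instead of blocking indefinitely.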
Thanks, I will try your suggestions.

This must be something specific to nsd/nginx in that particular environment, because curl on the very same server works as expected, using exactly the same command in bash as in /ds/shell, except for the "exec" part of course.

Hi Antonio

I have had success in the past using ns_httpget. Your issue smells like a DNS issue, but I have no evidence for saying that.
I wonder what happens when you try adding a timeout to the call?
What I would do is debug it further: take the source code of the ns_httpget proc (which is in modules/tcl/http.tcl on my AOLserver install), run it in the shell, add a bunch of ns_log calls, and keep stripping it back until you have the bare minimum code that runs. At its core, it's just some socket connections.

good luck!
Brian

What are your thread settings?

There is a case where nsd will not actually spawn a new thread for a new connection even when no thread is available, and this can lead to a blocking HTTP call to the server itself hanging everything until it times out.

In NaviServer, you can fix this by setting


ns_section "ns/server/${servername}"
ns_param lowwatermark 0 ;# eager thread creation at low concurrency

I don't recall the corresponding setting for AOLserver without checking, but the thread creation behavior is a bit different so it might not have the same issue.

The symptoms you describe look just like mine, but I still can't figure out my cause. Here is the relevant part of my conf on the infamous instance:

# Scaling and Tuning Options
#
# ns_param maxconnections 100 ;# 100; number of allocated connection structures
# ns_param maxthreads 10 ;# 10; maximal number of connection threads
# ns_param minthreads 1 ;# 1; minimal number of connection threads
ns_param connsperthread 1000 ;# 10000; number of connections (requests) handled per thread
# ns_param threadtimeout 120 ;# 120; timeout for idle threads
# ns_param lowwatermark 10 ;# 10; create additional threads above this queue-full percentage
ns_param highwatermark 100 ;# 80; allow concurrent creates above this queue-is-full percentage;
                            ;# 100 means to disable concurrent creates

It is not subjected to heavy load, because we use it for development. I have checked on other instances where I don't have the problem and saw that configurations are the same about threads.

Right, this is what I expected. Since minthreads=1 (generally quite reasonable on a development server), a new thread won't spawn until needed. The watermark logic defaults to not "needing" a new thread until the queue is 10% full, i.e., there are 10 connections waiting to be serviced.

In your test, you have one connection currently running (the ds request) and a second waiting (the loopback request), so two connections are not busy enough to warrant spawning a new thread, and the server doesn't (AFAIR) have any logic to create a thread when connections have simply been queued for a certain amount of time without progressing. (The benefit of such logic would be dubious IMO; connections are generally expected to complete quickly unless there is a specific reason for them to behave differently.)

So changing the lowwatermark setting to 0 will make the server always create a new thread on a new request if there's not already a thread for it, so the loopback request will be handled.

Alternatively, you could set minthreads to something more than 1, or, while waiting for the loopback request to time out, hit the server 10 or more times with curl to queue up connections and get it to create a new thread; then things will progress (until a restart, at least).
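For reference, the minthreads variant could look like this in the NaviServer config (a sketch only; ${servername} as in your existing ns_section lines):

```tcl
ns_section "ns/server/${servername}"
ns_param minthreads 2 ;# keep a second connection thread around, so a queued
                      ;# loopback request can be served while ds/shell runs
```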

Changing the minthreads value solved the issue!

Thanks Jeff and Gustaf!

small addon:
- Newer versions of NaviServer and OpenACS show more modest memory growth per additional connection thread. E.g., OpenACS.org uses minthreads 5 and, after 4 weeks of continuous running, stays under 2 GB RSS.
- NaviServer checks at the end of each request whether there are requests queued and/or sufficient threads available. While a request is running in a connection thread, incoming requests are either handled by other connection threads or queued. This is perfectly fine on real-world systems with many small requests, but bad for your kind of application, where the only connection thread deadlocks, since it waits for the completion of a queued request.
- Allowing concurrent connection-thread creates won't help, since it means thread creates are allowed while other threads are still being created (many Tcl versions had/have problems with this).
I had the same problem with the proxy server. The cause turned out to be banal: a colleague had simply reconfigured the proxy and pointed it at slightly wrong settings. After resetting it, all was well.