Forum OpenACS Development: Pipe-open considered harmful

Posted by Gustaf Neumann on

This is a warning for OpenACS programmers to avoid Tcl's open command with command pipelines when possible. When moving from bare metal to virtualized environments, we experienced situations where the server suddenly froze for several seconds. This might not be a big issue for some applications, but when the server receives 1000+ requests per second, substantial queuing will happen in such situations.

The problem turned out to be a common Tcl idiom of opening a command pipeline with the Tcl "open" command. The following demo page uses a pipe open with a simple external program ("cat").

 set t [clock milliseconds]
 set f [open "|cat" w]; puts $f "hi"; close $f
 ns_return 200 text/plain "pipe open took [expr {[clock milliseconds] - $t}]ms"

This page takes a few milliseconds on bare-metal servers, but when the memory footprint of NaviServer gets large (e.g., 60GB, when running 100+ threads with a huge blueprint and large caches) and it is running in a virtualized environment, we experienced that the same page was taking 6s or more. A quick test showed that fork() in the virtualized environment is twice as slow (but your mileage may vary with different virtualization environments, etc.).

The problem is that Tcl performs a fork() operation to spawn the pipe, and during the fork, everything in this process stops (every thread of the process, every write operation to logs or the network, etc.). You will also see long mutex lock durations in this situation, since the unlock happens only after the fork() has finished.

While the fork() operation of exec is largely avoided via nsproxy (see also [2]), the fork() inside Tcl's open is not covered. So, to avoid the problem, avoid constructs like

 set f [open "|$cmd ..." w]; ....; close $f

and instead write to a temporary file, pass it to the command via exec, and delete the temporary file later.
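A minimal sketch of this pattern, assuming a command $cmd that reads its input from a file argument (file tempfile requires Tcl 8.6):

 set f [file tempfile tmpname]
 puts $f "hi"
 close $f
 # exec goes through nsproxy, so no fork() happens in the large server process
 set result [exec $cmd $tmpname]
 file delete $tmpname

The temporary file costs one extra write and delete, but it avoids duplicating the address space of a multi-gigabyte process just to spawn a small helper program.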

Hope this helps somebody.

-g

[1] https://www.tcl.tk/man/tcl8.6/TclCmd/open.html
[2] https://openacs.org/xowiki/out-of-memory-exec

Posted by Brian Fenton on
Thanks Gustaf, very good to know.

Brian

Posted by Steffen Tiedemann Christensen on
Thanks Gustaf -- it's a good reminder.

I wanted to add a related note that touches on the same subject without being exactly identical: we've seen significant slowdowns on systems when using Tcl to pipe data to network-connected file systems (e.g., NFS or EFS). Along the lines of:

 package require http

 set fd [open /nfs/file w]
 http::geturl $url -channel $fd ...

I'm assuming this is down to buffering settings on the file handle and the configuration of the network filesystem, but generally it is better to stream to block storage instead.
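If buffering is indeed the culprit, the channel buffer can be tuned before writing; a sketch with illustrative values (the right buffer size depends on the filesystem's mount options):

 set fd [open /nfs/file w]
 # use full buffering with a large buffer so fewer, larger writes
 # hit the network filesystem
 fconfigure $fd -buffering full -buffersize 1048576

This only mitigates the per-write overhead; streaming to local block storage and moving the file afterwards avoids it entirely.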

Posted by Gustaf Neumann on

Dear Steffen,

AFAIK, Tcl's geturl does not perform a pipe open, so this is something different. Looking at the implementation of http::geturl, I would assume that it is not the case that the full server with all threads stops, but rather that a single request takes a long time. Tcl's http::geturl should be avoided inside connection threads, since it uses its own event management, and it will hard-crash Tcl (segmentation violation) when more than 1024 file descriptors are open (which can easily happen on busy servers).

One should use the built-in ns_http instead:

 set fd [open /nfs/file w]
 set r [ns_http run -outputchan $fd ... $url]

Running HTTP requests from external sources in connection threads is dangerous (potentially vulnerable to slow-read or slow-write attacks) unless the runtime of the request is limited. Blocking connection threads can furthermore lead to running out of connections. To address this, ns_http has the -donecallback option, which allows the HTTP request to run in a background thread.
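A hedged sketch of that background variant; the callback name and its argument list are assumptions here, so check the ns_http documentation for the exact invocation details:

 set fd [open /nfs/file w]
 # the request runs in a background thread; the callback fires on completion
 ns_http queue -outputchan $fd -donecallback [list ::transfer_done $fd] $url

 # hypothetical completion handler; the arguments it receives are an assumption
 proc ::transfer_done {fd args} {
     close $fd
 }

This way the connection thread is freed immediately instead of being tied up for the duration of the transfer.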

If you really want to write to a slow NFS drive, and this takes long or has a large performance variance, then you should consider doing this asynchronously or in a background job.

all the best -g