Forum OpenACS Development: Re: Image resizing, exec and server problems

Posted by russ m on
Malte - what sort of load is your image resizing server under? Does this seem to happen after some certain number of exec calls, or is it effectively random?

I'm about to shift the hosting of a low volume but important client from Solaris to Linux, in an app that calls /usr/bin/zip to prepare large downloads. The new server is running nsd 4.0.10 so nsproxy isn't an option, and if this is going to start randomly not working I'm in a bit of trouble... :(

Posted by Tom Jackson on
Russell,

I'm assuming your application is not image resizing? Since Malte never identified what the problem was, I would ignore it until you run into trouble.

Just keep your exec calls simple.

Posted by russ m on
My understanding from what I've since read on the AOLserver list and in the Tcl core bug reports is that Tcl's exec is (to some degree) unreliable under Linux's threading model, so whether it's executing ImageMagick's convert to resize images or zip to bundle up downloads makes no difference.
Posted by Malte Sussdorff on
Russell has pinned the problem down. The issue is Tcl's exec running unreliably on Linux. This is why I started the whole ns_proxy thing as a wrapper around exec. As for the load, it is a busy site, but with load balancing and ns_proxy it worked out fine.
Posted by russ m on
Malte - unfortunately, I can't go to ns_proxy yet (unless it's been backported to 4.0.10)... we're expecting a couple of hundred (perhaps) zip downloads a day at peak times, and I'd like to know how likely we are to be bitten by this before I get nsd upgraded... when you were seeing your nsd processes die after 4-6 hours up, would they have been calling [exec] a couple of times a second? a handful of times a minute? less often?
Posted by Stefan Sobernig on
Russell,

Your requirements call for a scalable and robust solution. ns_proxy (AOLserver 4.5+), and therefore an upgrade, is probably one solution, but still, I would consider the use of exec in your case a smell of "bad design".

I am wondering whether you ever considered using the Tcl event loop? It is common practice in single-threaded Tcl environments to avoid exec entirely and go for pipe indirection through open. To talk turkey in your case:

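proc is_readable {fid} {
    # Discard zip's output; at EOF, close the pipe and wake up vwait.
    if {[gets $fid line] >= 0 || ![eof $fid]} { return }
    close $fid
    set ::done 1
}
set pipe [open "|zip -9 test.zip 454.pdf" r]
fconfigure $pipe -blocking 0
fileevent $pipe readable [list is_readable $pipe]
vwait ::done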

What is so smelly about exec? Well, I am not an insider, nor do I have a complete picture of Tcl internals where the *nix and Windows thread models pull in different directions, BUT I do know that Tcl's exec is built upon fork(), and forking from within threads (at least on *nix) is delicate (due to requirements on call-stack set-up, if I am not completely mistaken). Besides, exec is inherently blocking I/O. So, the fileevent/open solution above avoids both issues.

see http://wiki.tcl.tk/880

The only tricky thing (at first glance) is how to use async I/O within the multi-threaded environment of AOLserver. For example, you are advised not to use vwait from within an AOLserver connection thread. Without vwait, the connection thread (i.e. its interpreter) will finish script execution, close the connection, and be recycled. Still, you can use async I/O, and there is even a design fit with another requirement of yours: "a couple of hundred (perhaps) zip downloads"

Well, I guess you will use the bgdelivery feature that comes with XOTcl enhanced OpenACS environments to free resources for new connections while processing downloads in the background?

see https://openacs.org/xowiki/Boost_your_application_performance_to_serve_large_files!

In that case, you can hijack the bgdelivery infrastructure to deal with the archiving/tarball business. You can place zipping jobs in the standing/persistent bgdelivery thread, and the results are then, upon completion, delivered back directly.

The nice thing about this design is that you (a) avoid exec, (b) use async I/O in the back, and (c) can deploy it under both AOLserver 4.0.10 and 4.5.

Posted by Malte Sussdorff on
Hi Stefan, I know this might be asking a lot, but you seem to have your head around this pretty tightly. Could this trick be put into a procedure that uses the XOTcl bgdelivery feature, so we could call something like "xo_exec"? Or, alternatively, could you implement your idea in something non-core which uses exec (e.g. photo album) so it can be tested and maybe adopted?

Russell, to answer your question: I usually get to around 50 - 60 calls to exec before I run into an out-of-memory problem when calling exec, which then results in a restart of the server roughly every 120 calls, I would say. But this is a rough estimate and really depends on your memory.

Posted by Stefan Sobernig on
Malte


"you seem to have your head around this pretty tightly."

I wish! In the meantime I have learnt that the pipe indirection through open also involves a fork(). But, still, async I/O + bgdelivery make sense and provide agility. The tricky thing is to avoid background zombie processes.
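
On the zombie point, here is a minimal sketch of one way to reap the child when using open + fileevent. It is not the bgdelivery solution discussed in this thread; `run_async` and `on_readable` are hypothetical names, and the sketch assumes a plain Tcl event loop (for example tclsh with vwait), not an AOLserver connection thread. It relies on the documented behaviour that `close` on a blocking command pipeline waits for the child and raises an error if the child exited abnormally.

```tcl
# Hypothetical helper (not from this thread): start a command pipeline
# and report its outcome in the global variable named by $donevar.
# $cmd must be a proper Tcl list, e.g. [list zip -9 test.zip 454.pdf].
proc run_async {cmd donevar} {
    set pipe [open |$cmd r]
    fconfigure $pipe -blocking 0
    fileevent $pipe readable [list on_readable $pipe $donevar]
    return $pipe
}

proc on_readable {pipe donevar} {
    # Drain whatever output is available and wait for more until EOF.
    read $pipe
    if {![eof $pipe]} { return }
    # Switch back to blocking mode so that close waits for the child
    # (reaping it) and reports an abnormal exit as a Tcl error.
    fconfigure $pipe -blocking 1
    if {[catch {close $pipe} msg]} {
        set status [list error $msg]
    } else {
        set status ok
    }
    uplevel #0 [list set $donevar $status]
}
```

A caller in a plain tclsh could run `run_async [list zip -9 test.zip 454.pdf] ::zip_status` followed by `vwait ::zip_status`, then inspect `$::zip_status`.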

I will try to come up with a little solution using bgdelivery that can be tested.

Posted by russ m on
hmmm... that number scared me, but I've just run a quick test with 20 parallel clients each making 100 requests, each of which required 20 [exec] calls to service (so 40,000 exec calls in ~10 min spread across 10 connection threads)... every single one completed successfully, all my threads are still there and accepting connections, and the nsd process hasn't grown by any significant amount... perhaps the TCL threaded fork issue has been fixed? (debian etch, kernel 2.6.22-4-amd64, tcl 8.4.12-1.1, aolserver4 4.0.10-7)
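
A rough way to repeat this kind of check is sketched below. This is not the test harness used above; `exec_stress` is a hypothetical helper, and running it in a single interpreter only exercises exec itself. To approximate the threaded case it would have to be called from a page served by several connection threads at once.

```tcl
# Hypothetical helper: call exec $n times with the command list $cmd
# and return how many of the calls raised a Tcl error.
proc exec_stress {n cmd} {
    set failed 0
    for {set i 0} {$i < $n} {incr i} {
        # linsert + eval keeps this usable on Tcl 8.4 (no {*} syntax).
        if {[catch {eval [linsert $cmd 0 exec]} msg]} {
            incr failed
        }
    }
    return $failed
}

# The figures quoted above: 20 clients x 100 requests x 20 exec calls.
set total [expr {20 * 100 * 20}]
```

For instance, `exec_stress 1000 [list true]` should return 0 on a healthy system, assuming a `true` binary is on the PATH.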
Posted by Stefan Sobernig on
Russell!

As long as it works for you and you have tested (with minimal effort) that it scales (to whatever extent), then it is apparently fine (at least).

However, another critical thing is the kind of exec call itself (redirector, background, ...) and the code surrounding it. Malte never posted the critical exec line, at least to my knowledge.
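
To illustrate what "kind of exec call" can mean, here is a small sketch of three common forms. Which form the failing code used is not stated in this thread, and the `echo` and `sleep` commands are stand-ins chosen only for illustration.

```tcl
# Plain call: exec blocks until the child exits and returns its stdout.
set out [exec echo hello]

# Redirected call: stdout is sent to a file, so exec returns an empty string.
set redirected [exec echo hello > /dev/null]

# Background call: a trailing & makes exec return immediately with the
# list of process ids in the pipeline instead of waiting for the child.
set pids [exec sleep 1 &]
```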

In other words, if it works for you, go ahead!

My suggestion was more about using a non-blocking variant and linking it to the background delivery feature (a larger picture of design).

I agree: while there are resources on comp.lang.tcl etc. that report on the threaded fork() issue, I cannot find a concise statement on the issue's state (fixed, open, unsolvable).

Posted by russ m on
as I understand it the surrounding TCL code is irrelevant, as is whether the fork is the result of calling [exec], [open "|whatever"] or anything else... the problem is TCL doesn't/didn't implement pthread_atfork, so threads might lock up any time a new process is fork()/exec()'d by TCL... I understand what you're talking about with running the event loop to avoid blocking, but we don't have any problem with long-running children blocking IO for other threads so it's not related to the problem I thought I might be facing... looks interesting though...
Posted by Gustaf Neumann on
Russell, have you considered using TclMagick?
It allows image resizing, and there are no forks involved, since it can be loaded into AOLserver.

Short intro: http://wiki.tcl.tk/9775