Forum OpenACS Development: Image resizing, exec and server problems

I have the situation that we run a site where we have a lot of photos and images that need to be resized all the time. To do this we have a dedicated AOLserver which only serves requests that resize images (meaning it handles /shared/portrait-bits and /photo, which is the photo album instance).

This server crashes every 4-6 hours. Crash here means it has run out of threads / connections so a call to dbtest.tcl fails.

It is not much of a big deal for me, luckily we run load balancing, but I was wondering if other people have found this behavior with regards to running exec "irritating" and have a solution. People have been talking about using ns_proxy for calling exec and I think that is a good idea, but can someone provide a "step by step" guide, so we can create an ns_proxy pool by default for OpenACS 5.4 and use that proxy pool to run all the exec commands through?

Note, it is not only ImageMagick; calls to "top" as done through the request-processor or calls to "find" as done in ]po[ show the same symptom.

On a related note, I submitted "pools-procs.tcl" which allows you to actually work with the pools in AOLserver 4.5 (as you might have noticed, maxconnections, minthreads and so on have no effect if you do not set them up in a pool separately). Additionally I enhanced the monitoring package to give me stats from _ns_stats.* so I could see a little more detail about the memory usage of the threads and how many threads are running. Now I "only" need a nice way of saving that information before I restart the "crashed" server.

Posted by Dave Bauer on
I am pretty sure ns_proxy is the recommended solution for this type of problem.
Posted by Tom Jackson on
From the AOLserver list:
Periodically some of our AOLserver installations get into a mode where
all calls to "exec" just hang, not really taking up any processor time
but eating up a thread. Doesn't seem to be any memory problems
coinciding, which I had originally suspected being a limiting factor on
the forks. Besides, we usually get an error message when there’s not
enough memory to fork a child process. Haven't yet figured out anything
in particular that causes this to start happening, but once it starts
happening, it keeps going eating up more and more threads until we
restart the nsd. Any ideas what the cause might be and what I can do to
investigate further?

We run AOLserver 4.0.10 on RHELAS 3, and CentOS 4, w/ Tcl 8.4.11.

This is a known issue/conflict with pthreads and fork. Tcl does not yet
have the pthread_atfork functionality in place because that requires a
careful marshaling of mutexes over the fork. You will find several core
bug reports on this in the Tcl SF db area. It is possible to solve, it's
just that no one has invested the time yet.

And nsproxy was the suggested fix.

Posted by Malte Sussdorff on
Which would explain why on Solaris this problem does not seem to exist (at least according to Patrick G., who mentioned this to me). Thanks, I will look into how to set up ns_proxy then. Any pointers are welcome 😊
Posted by Tom Jackson on
I'm not sure how much help is available for ns_proxy, but this should be fixed. Obviously you need AOLserver 4.5, and there is a new pools.tcl file in CVS which can be used to adjust the base number of threadpools.

I just posted the htmlized man page for nsproxy:

http://junom.com/document/aolserver/nsproxy/ns_proxy.html

Posted by Malte Sussdorff on
I stuffed the pools.tcl into packages/acs-tcl/tcl/pools-init.tcl, as I am fairly certain most people will forget it when they download AOLserver, and most won't download the latest CVS checkout anyway, but the release.

Alternatively I can use the one provided by AOLserver now and put it into my install directions, which is probably preferred. Tom, if you look at http://cognovis.de/developer/en/aolserver_install would it be included by default or would I need to get it separately (which is what I assume)? If so, where do I get it from and where should I put it? Additionally, does it need special treatment in the config.tcl file?

Posted by Tom Jackson on
I don't think the ns_pools has anything to do with ns_proxy. I haven't used ns_proxy, but just provided a link to the manpage.

I have some scripts for image resizing if there is an interest.

Posted by Malte Sussdorff on
Just for clarification. The pools-init.tcl has nothing to do with the resizing issue. It was just a reply to Tom's comment and badly connected as such. Just in case people get confused.

Tom, have you had a look at the image:: procedures written by Dave? I think a comparison of what they do and at which level they interface (inside Tcl, or outside with ImageMagick or alternatives) would be interesting.

Posted by Tom Jackson on
Malte, Dave:

I haven't seen how the image procedures work here. I wrote a module for AOLserver which allows upload of a gzip file of original images and extracts them. Then creates a series of smaller versions of each image. You can specify the sizes and a number of other options.

Most of it is not useful outside of the package, but I noticed a single procedure which takes some options for convert and creates a conversion string. There are a lot of options and you can't just plug them in, they have to be calculated. It also detects the orientation of the image.

The proc is ::qphoto::photo::convertOptions and is available in this page, near the bottom:

http://rmadilo.com/m2/servers/rmadilo/modules/tcl/twt/packages/quick-photo/tcl/quick-photo-procs.tcl

When working on this I tried ImageMagick loaded into Tcl. It was a shared library and didn't require exec. However I found it to be significantly slower at conversions than simply exec'ing convert.

An example of the data used as input is in:
http://rmadilo.com/albums/tom/IHD2005/series.rdl

The output data is in:
http://rmadilo.com/albums/tom/IHD2005/photo.rdl

There is a form for testing out the conversion proc here:
http://rmadilo.com/albums/tom/coptions-form

Posted by Malte Sussdorff on
What is the preferred way of doing this:

a) Edit the sourcecode for image generation and use ns_proxy there. Use a specific ns_proxy pool for image generation only

b) Write a generic "exec_proxy" procedure which works just as exec, but executes through an ns_proxy. I could probably add background capabilities to that (so you could push it to the background and have errors reported in the error.log or as user messages). Replace exec with exec_proxy in the places where it matters.

If I go with "b" is it fine to just use the default configuration for the "exec" proxy or should I provide parameters to tune that? Should I put this in acs-tcl?

Posted by Dave Bauer on
I guess you need a way to determine if the exec proxy is available. That is, we use the image:: procs extensively but don't use AOLserver 4.5. As far as I know it's not required yet.

Probably if you have an exec proxy setup you'd just want to rename exec_proxy to exec to use it everywhere.

Posted by Malte Sussdorff on
I just committed proxy-procs.tcl to acs-tcl. This is creating a wrapper procedure for exec called proxy::exec. It is only available if ns_proxy is installed and configured.

Now.... I could rename "exec" to "real_exec" using the rename command. Interestingly the proxy still requires the call to go to "exec", so I might not even have to do the rename.
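For readers who have not seen such a wrapper, here is a minimal sketch of the idea. It is not the code committed to acs-tcl. It assumes the nsproxy module is loaded and that a pool named exec_proxy has been configured (the name is taken from later in this thread); the procedure name exec_through_proxy is hypothetical. Since an nsproxy slave is a separate process with its own Tcl interpreter, the script sent to it still calls plain exec, which is presumably why no rename is needed on the nsd side.

```tcl
# Hypothetical helper, NOT the proxy::exec committed to acs-tcl.
# Assumes nsproxy is loaded and a pool called "exec_proxy" exists.
proc exec_through_proxy {args} {
    # Check out a proxy (a separate slave process) from the pool.
    set handle [ns_proxy get exec_proxy]

    # Build the script as a list so every argument keeps its quoting.
    set script [linsert $args 0 exec]

    # Run exec inside the slave; the fork happens there, not in nsd.
    set code [catch {ns_proxy eval $handle $script} result]

    # Always hand the proxy back, even when the command failed.
    ns_proxy release $handle

    if {$code} {
        error $result
    }
    return $result
}
```

It would be called just like exec, for example `exec_through_proxy convert in.jpg -resize 200x200 out.jpg` (file names made up for illustration).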

Sadly I have no clue how I can write a procedure which checks if proxy::exec exists and then calls it with all the arguments, as exec allows for an arbitrary number of arguments and I don't know how to model that in ad_proc.

Posted by Gustaf Neumann on
Malte wrote:

Sadly I have no clue how I can write a procedure which checks if proxy::exec exists
Use [info commands ::proxy::exec]. If it returns an empty string, ::proxy::exec does not exist. (Note that [info exists] tests variables, not commands.)


... model in ad_proc an arbitrary number of arguments...

Try in ds/shell:

   ad_proc ppp {args} {doc} {return "<< $args >>"}
   ppp these are multiple arguments
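Putting the two hints together, a fallback wrapper could be sketched as follows. This is plain Tcl (proc instead of ad_proc) so it can be tried in tclsh; the name portable_exec is hypothetical, and it assumes that ::proxy::exec accepts the same argument list as exec, which may not match the committed signature.

```tcl
# Hypothetical wrapper: use ::proxy::exec when it is defined,
# otherwise fall back to Tcl's built-in exec.
# Assumption: ::proxy::exec takes the same arguments as exec.
proc portable_exec {args} {
    if {[llength [info commands ::proxy::exec]] > 0} {
        set cmd [linsert $args 0 ::proxy::exec]
    } else {
        set cmd [linsert $args 0 ::exec]
    }
    # $cmd is a proper list, so eval runs it as one command
    # without re-parsing the individual arguments.
    return [eval $cmd]
}
```

Building the command with linsert rather than string concatenation keeps arguments containing spaces intact, and it also works on Tcl 8.4, which lacks the {*} expansion syntax.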
Posted by Malte Sussdorff on
Thanks. The first one is already in use; I was more concerned about the multiple arguments. For whatever reason it did not like the way I used it before. It works now and is committed.
Posted by Michael Totschnig on
Hello Malte,

I just committed proxy-procs.tcl to acs-tcl. This is creating a wrapper procedure for exec called proxy::exec. It is only available if ns_proxy is installed and configured.

What would be the right way to configure ns_proxy in the config file so that your wrapper works? Something like this?

load /usr/lib/aolserver4/lib/libnsproxy.so
ns_proxy config exec_proxy

Regards,

Michael

Posted by Stefan Sobernig on
Michael, to get the ns_proxy family of commands registered with the driver and connection interpreters, you have two options:
  1. add the following entry to your etc/config.tcl:
    # ...
    if {[ns_info version] >= 4.5} {
      if {[file exists ${bindir}/nsproxy.so]} {                
        ns_param	nsproxy		${bindir}/nsproxy.so        
      }        
      ns_limits set default \
          -maxupload [ns_config ns/server/${server}/module/nssock maxinput]
    }
    # ...
    
    Note, I added it to the 4.5+ branch as it won't be available below. You also need to check that the file actually exists; otherwise you will experience a crash. That would be the option for using Malte's "ns_proxy proxy".
  2. As you outlined above, an on-demand (lazy) initialisation in your code, which is convenient for testing purposes. In XOTcl jargon, as I assume that you stick with it:
    Class X -proc init args {
        if {[info command ns_proxy] eq "" && \
                [file exists [ns_info home]/bin/nsproxy.so]} {
            load [ns_info home]/bin/nsproxy.so
            # create a proxy pool
            ns_proxy config my_pool
        }
        next
    }
    
    Provided that your class object is defined in a *-procs.tcl file, the constructor "init" will only be called once, upon initial sourcing in the driver thread. The "info command" expression takes care of its deployment in www/* scripts.
hope it helps, //stefan
Posted by Michael Totschnig on
thank you Stefan for the explanation.
I think, since there is code in acs-tcl that gives access to ns_proxy when it is loaded, it should be included in the sample config file distributed with OpenACS. Ubuntu seems to put the library into the lib directory, and there it is called libnsproxy.
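One way a sample config could cope with both locations is to probe a list of candidate paths and load whichever exists. The following is only a sketch: the helper name first_existing_file is hypothetical, and the two candidate paths are simply the ones mentioned in this thread.

```tcl
# Hypothetical helper: return the first path in the list that exists,
# or an empty string when none of them does.
proc first_existing_file {candidates} {
    foreach path $candidates {
        if {[file exists $path]} {
            return $path
        }
    }
    return ""
}

# Possible use inside config.tcl (bindir as in Stefan's snippet):
#
#   set nsproxy_lib [first_existing_file [list \
#       ${bindir}/nsproxy.so \
#       /usr/lib/aolserver4/lib/libnsproxy.so]]
#   if {$nsproxy_lib ne ""} {
#       ns_param nsproxy $nsproxy_lib
#   }
```

The existence check matters for the reason Stefan gives: pointing the module entry at a file that is not there leads to a crash.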
Posted by Michael Cordova on
Just an idea... If the resizing sizes are known, and always the same... maybe it could be a good idea to generate a new file for each dimension...

I'm thinking about user portraits, for instance: having a "photo.jpg" original file of, let's say, 1024x768 pixels, you could create four other files for the medium, small, thumbnail and square versions: photo-m.jpg, photo-s.jpg, photo-t.jpg, photo-sq.jpg
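As a rough illustration of this idea, the sketch below builds one ImageMagick convert command per pre-defined size when the original is uploaded, so that nothing needs to be resized at request time. The geometries are invented for the example, the proc name resize_commands is hypothetical, and the square version is left out because it needs cropping in addition to resizing.

```tcl
# Hypothetical helper: build one "convert" command per target size.
# sizes is a flat list of suffix/geometry pairs.
proc resize_commands {original sizes} {
    set root [file rootname $original]
    set ext  [file extension $original]
    set commands {}
    foreach {suffix geometry} $sizes {
        lappend commands [list convert $original -resize $geometry \
                              ${root}-${suffix}${ext}]
    }
    return $commands
}

# Example geometries (assumptions, not taken from the thread).
set sizes {m 640x480 s 320x240 t 100x75}
foreach cmd [resize_commands photo.jpg $sizes] {
    puts $cmd
}
```

Each returned element is a proper Tcl list, so it can be handed to exec, or to an exec wrapper running through ns_proxy, without further quoting.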

Posted by russ m on
Malte - what sort of load is your image resizing server under? Does this seem to happen after some certain number of exec calls, or is it effectively random?

I'm about to shift the hosting of a low volume but important client from Solaris to Linux, in an app that calls /usr/bin/zip to prepare large downloads. The new server is running nsd 4.0.10 so nsproxy isn't an option, and if this is going to start randomly not working I'm in a bit of trouble... :(

Posted by Tom Jackson on
Russell,

I'm assuming your application is not image resizing? Since Malte never identified what the problem was, I would ignore it until you run into trouble.

Just keep your exec calls simple.

Posted by russ m on
my understanding from what I've since read on the AOLserver list and the TCL core bug reports is that TCL's exec is (to some degree) unreliable under Linux's threading model - whether it's executing ImageMagick's convert to resize images, or zip to bundle up downloads would make no difference.
Posted by Malte Sussdorff on
Russell got the problem pinned. The issue is that the Tcl exec command runs unreliably on Linux. This is why I started the whole ns_proxy thing as a wrapper around exec. As for the load, it is a busy site, but with load balancing and ns_proxy that worked out fine.
Posted by russ m on
Malte - unfortunately, I can't go to ns_proxy yet (unless it's been backported to 4.0.10)... we're expecting a couple of hundred (perhaps) zip downloads a day at peak times, and I'd like to know how likely we are to be bitten by this before I get nsd upgraded... when you were seeing your nsd processes die after 4-6 hours up, would they have been calling [exec] a couple of times a second? a handful of times a minute? less often?
Posted by Stefan Sobernig on
Russell,

Your requirements call for a scalable and robust solution. ns_proxy (AOLSERVER 4.5+) and, therefore, an upgrade is probably one solution, but still, I would consider the use of exec in your case as a smell of "bad design".

I am wondering whether you ever considered using the Tcl event loop? It is common practice in single-threaded Tcl environments to avoid exec entirely and go for pipe indirection through open. To talk turkey in your case:

proc done args {...}
proc is_readable {fid} { ... }
set pipe [open "|zip -9 test.zip 454.pdf"]
fileevent $pipe readable [list is_readable $pipe]
vwait ::done

What is so smelly about exec? Well, I am not an insider, nor do I have a complete picture of Tcl internals across the *nix/win thread models, BUT I do know that Tcl's exec is built upon fork(), and forking from within threads (at least on *nix) is critical (due to requirements on call-stack set-up, if I am not completely mistaken). Besides, exec is inherently blocking I/O. So, the fileevent/open solution above avoids both issues.

see http://wiki.tcl.tk/880
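A filled-in version of the pipe/fileevent outline might look like the sketch below. It is meant for a plain tclsh session, not for an AOLserver connection thread (see the vwait caveat that follows), and the proc names run_pipe and pipe_readable as well as the global arrays are made up for this example.

```tcl
# Called by the event loop whenever the pipe has data or hits EOF.
proc pipe_readable {pipe} {
    append ::pipe_output($pipe) [read $pipe]
    if {[eof $pipe]} {
        # Switch back to blocking mode so that close waits for the
        # child and reports a non-zero exit status as an error.
        fconfigure $pipe -blocking 1
        set ::pipe_done($pipe) [catch {close $pipe} ::pipe_error($pipe)]
    }
}

# Run a command (given as a list) through a pipe and collect its
# standard output while the event loop keeps running.
proc run_pipe {cmd} {
    set pipe [open |$cmd r]
    fconfigure $pipe -blocking 0
    set ::pipe_output($pipe) ""
    fileevent $pipe readable [list pipe_readable $pipe]
    # Serve events until the handler signals completion.
    vwait ::pipe_done($pipe)
    if {$::pipe_done($pipe)} {
        error $::pipe_error($pipe)
    }
    return $::pipe_output($pipe)
}
```

With this in place, the zip call from the outline would read `run_pipe {zip -9 test.zip 454.pdf}`. Note that opening a command pipe still creates a child process; what changes is that the caller no longer blocks on its I/O.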

The only tricky thing (at first glance) is how to use async I/O within the multi-threading environment of AOLSERVER. To give you an example, you are not advised to use vwait from within an AOLserver connection thread. Without vwait, the connection thread (i.e. its interpreter) will finish script execution, close the connection and be recycled. Still, you can use it, and there is even a design fit with another requirement of yours: "a couple of hundred (perhaps) zip downloads"

Well, I guess you will use the bgdelivery feature that comes with XOTcl enhanced OpenACS environments to free resources for new connections while processing downloads in the background?

see https://openacs.org/xowiki/Boost_your_application_performance_to_serve_large_files!

In that case, you can hijack the bgdelivery infrastructure to deal with the archiving/ tarball business. You can place zipping jobs in the standing/ persistent bgdelivery thread which are then, upon completion, directly delivered back.

The nice thing of this design is that you (a) avoid exec, (b) use async I/O in the back, and (c) it is deployable both under AOL 4.0.10 and 4.5.

Posted by Malte Sussdorff on
Hi Stefan, I know this might ask for a lot but you seem to have your head around this pretty tightly. Could this trick be put into a procedure that uses the XOTcl bgdelivery feature so we could call something like "xo_exec" ? Or, alternatively, could you implement your idea on something non-core which uses exec (e.g. photo album) so it can be tested and maybe adopted?

Russell, to answer your question: I usually get to around 50 - 60 calls to exec before I run into an out-of-memory problem calling exec, which then results in a restart of the server roughly every 120 calls. But this is a rough estimate and really depends on your memory.

Posted by Stefan Sobernig on
Malte


you seem to have your head around this pretty tightly.

I wish! In the meantime I learnt that the pipe indirection through open also involves a fork(). But still, async I/O + bgdelivery make sense and provide agility. The tricky thing is to avoid background zombie processes.

I will try to come up with a little solution using bgdelivery that can be tested.

Posted by russ m on
hmmm... that number scared me, but I've just run a quick test with 20 parallel clients each making 100 requests, each of which required 20 [exec] calls to service (so 40,000 exec calls in ~10 min spread across 10 connection threads)... every single one completed successfully, all my threads are still there and accepting connections, and the nsd process hasn't grown by any significant amount... perhaps the TCL threaded fork issue has been fixed? (debian etch, kernel 2.6.22-4-amd64, tcl 8.4.12-1.1, aolserver4 4.0.10-7)
Posted by Stefan Sobernig on
Russell!

As long as it works for you and you tested (with a minimum effort) that it scales (to whatever extent), then it is apparently fine (at least).

However, another critical thing is the kind of exec call itself (redirector, background, ...) and the code surrounding it. Malte never posted the critical exec line, at least to my knowledge.

In other words, if it works for you. Go ahead!

My suggestion was more about using a non-blocking variant and linking it to the background delivery feature (a larger design picture).

I agree: while there are resources on comp.lang.tcl etc. that report on the threaded fork() issue, I cannot find a concise statement on the issue's state (fixed, open, unsolvable).

Posted by russ m on
as I understand it the surrounding TCL code is irrelevant, as is whether the fork is the result of calling [exec], [open "|whatever"] or anything else... the problem is TCL doesn't/didn't implement pthread_atfork, so threads might lock up any time a new process is fork()/exec()'d by TCL... I understand what you're talking about with running the event loop to avoid blocking, but we don't have any problem with long-running children blocking IO for other threads so it's not related to the problem I thought I might be facing... looks interesting though...
Posted by Gustaf Neumann on
Russell, have you considered using TclMagick?
It allows image resizing with no forks involved, since it can be loaded into AOLserver.

Short intro: http://wiki.tcl.tk/9775