Forum OpenACS Q&A: Error: nsd.tcl: error reading output from command: interrupted system call

ACS/pg 3.2.2beta3 seems to be running well with PostgreSQL 7.0.2 on FreeBSD 4.0, but we keep getting the same error, which we cannot figure out. Any help would be greatly appreciated. The error message follows. Sorry for the length; I'm not sure which part is crucial:

Error: nsd.tcl: error reading output from command: interrupted system call
   error reading output from command: interrupted system call
      while executing
   "exec $command $options $error_log"
      (procedure "wd_errors" line 17)
      invoked from within
   "wd_errors $num_minutes"
      (procedure "wd_mail_errors" line 8)
      invoked from within
   "wd_mail_errors"
      ("eval" body line 1)
      invoked from within
   "eval [concat [list $proc] $args]"
      (procedure "ad_run_scheduled_proc" line 43)
      invoked from within
   "ad_run_scheduled_proc {f f 900 wd_mail_errors {} 961518237 0 t}"
   Notice: exiting: no waiting connections

It looks like it's choking when it tries to exec the aolserver-errors.pl script. Do you have Perl installed, and is your nsadmin user able to execute the script?

Well, I think the Perl is OK:

$ whoami
nsadmin
$ ls -l
-rwxr-xr-x 1 nsadmin nsadmin 5186 Mar 9 19:03 aolserver-errors.pl
-rwxr-xr-x 1 nsadmin nsadmin 2797 Mar 9 20:55 queue-message.pl
$ perl -v
This is perl, version 5.005_03 built for i386-freebsd

I was looking at those Perl scripts, aolserver-errors.pl and queue-message.pl, and it turns out that we need Pg.pm, which we did not have. But installing it did not fix the problem. So that raises another question: were we supposed to compile PostgreSQL with "--with-perl"?

Our error (see first posting in this thread) occurs when /web/acspg/bin/aolserver-errors.pl is called by /web/acspg/tcl/watchdog-defs.tcl. We inserted several "exec echo..." statements to verify that the Tcl script passes the correct arguments and that the Perl script parses the error file correctly and passes back the expected result. Only after that does the server write the mysterious new error to the server log. Ironically, the only error in the whole system (so far) is this one, caused by the error-reporting subsystem.

The nsadmin user can successfully execute the aolserver-errors.pl script from the command line. Substituting a simple Hello-World Perl or sh script for aolserver-errors.pl causes the same error as the original.

What's happening? Is there some environment setting for AOLserver that we have missed? For now we have just set the error-monitoring interval to once a day instead of every 15 minutes. But that does not solve the problem with "exec anything.pl" or "exec anything.sh".
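
One way to narrow this down is to capture Tcl's errorCode alongside the error message. The sketch below is only an illustration and not part of ACS: exec_report is a hypothetical helper name. For the failure in this thread, errorCode would be expected to begin with POSIX EINTR, which would point at an interrupted read inside the server process rather than at the script being run.

```tcl
# Hypothetical diagnostic helper (not part of ACS or AOLserver).
# Runs a command via exec and returns a three-element list:
#   status ("ok" or "error"), errorCode (empty on success), output or message.
proc exec_report {args} {
    global errorCode
    if {[catch {eval exec $args} result]} {
        return [list error $errorCode $result]
    }
    return [list ok {} $result]
}

# Example: a command that cannot be started reports a POSIX error code.
set report [exec_report /no/such/program]
puts "status:    [lindex $report 0]"
puts "errorCode: [lindex $report 1]"
puts "message:   [lindex $report 2]"
```

Inside AOLserver the puts lines could be swapped for ns_log calls so the result lands in the server log.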


I was getting this error some of the time with things like

exec /bin/ls

and all of the time with ImageMagick stuff like

exec /im/bin/identify -verbose /home/david/test.jpg

I have worked around this problem by defining my own exec proc called dc_exec. It redirects the output of the program being executed to a file, then reads the contents back in and returns them.

Unfortunately I've had to go through all the .tcl files and change exec to dc_exec. Here's the procedure:

# This is a workaround for a problem on FreeBSD where the Tcl exec function
# fails half the time with the message
#   error reading output from command: interrupted system call
# which as far as I can see is a problem with the output of whatever program
# is run.
# The workaround is to redirect the output to a file, then read in the file
# and return the output in the normal way.
# I have a sequence in the database for unique tmp filenames.
#
proc dc_exec {prog_name args} {

    # Generate a new unique filename for the output of this command
    set db [ns_db gethandle]
    set next_id [database_to_tcl_string $db "select nextval('tmp_fname_sequence')"]
    ns_db releasehandle $db
    set tmpFName [ad_parameter PhotoDataRoot photodb]/dc-exec-$next_id

    # Run the command - note the eval here splits the args
    set command "exec $prog_name $args > $tmpFName"
    eval $command

    # Read the output of the command, clean up the file, and return the output
    set fId [open $tmpFName]
    set return_string [read $fId]
    close $fId
    file delete $tmpFName
    return $return_string
}

Bboard search is your friend...

Check out this thread (msg_id 0001HI) for two patches (one from Connie Hentosh, one from Matthew Braithwaite) dated just two weeks or so ago, which seem to fix the AOLserver Tcl exec issues under FreeBSD. This is a long-standing issue and others have encountered and dealt with it before :-)

BTW I have not tried OpenACS under *BSD myself (yet?), although I am comfortable as a sysadmin in the *BSD world. If you need additional help with these patches, I'd suggest contacting the folks who created them, rather than me!

Thanks Jonathan. I will have to stick to my workaround for now, though, because I don't have proper access to the FreeBSD box in question. Just in case anyone else needs the workaround, I have changed it so that it doesn't need the database and so that it trims trailing whitespace from the program's output, which was causing problems.

proc dc_exec {prog_name args} {

    # Generate a new unique filename for the output of this command
    set tmpFName [ad_parameter PhotoDataRoot photodb]/dc-exec-[ns_thread getid]

    # Run the command - note the eval here splits the args
    set command "exec $prog_name $args > $tmpFName"
    eval $command

    # Read and return the output of the command - clean up file
    set fId [open $tmpFName]
    set return_string [read $fId]
    close $fId
    file delete $tmpFName
    return [string trimright $return_string]
}

BTW, although I'm familiar enough with the AOLserver Tcl API, I wasn't aware of functions like ns_rand, ns_thread, ns_critsec, ns_sema, and ns_sleep, which are all missing from standard Tcl.
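
For anyone adapting dc_exec, here is a hedged sketch of a variant that builds the command as a list, so that arguments containing spaces are passed through intact, and that removes the temporary file even when the command fails. The name safe_exec, the tmp_dir argument, and the pid-plus-counter file name are assumptions for illustration only. Inside AOLserver, where several threads share one process, the thread id used in dc_exec would still be needed to keep the names unique.

```tcl
# Hypothetical variant of dc_exec, written in plain Tcl for illustration.
# tmp_dir must be an existing, writable directory.
proc safe_exec {tmp_dir prog_name args} {
    global safe_exec_counter
    if {![info exists safe_exec_counter]} {
        set safe_exec_counter 0
    }
    # The pid plus a counter is unique only within one interpreter (assumption).
    set tmp_file [file join $tmp_dir "safe-exec-[pid]-[incr safe_exec_counter]"]

    # Build the command as a list so each argument stays a single word.
    set failed [catch {eval exec [list $prog_name] $args [list > $tmp_file]} err]

    # Read back whatever was written, then always remove the file.
    set output ""
    if {[file exists $tmp_file]} {
        set f [open $tmp_file]
        set output [read $f]
        close $f
        file delete $tmp_file
    }

    # Re-raise the original error only after cleaning up.
    if {$failed} {
        error $err
    }
    return [string trimright $output]
}

# Example usage: the single argument "hello world" stays one word.
puts [safe_exec /tmp echo "hello world"]
```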