Forum OpenACS Development: aolserver-errors

Posted by David Walker on
It's bad enough that perl scripts wormed their way into the mix

I've started on a tclsh version of aolserver-errors.pl. Are there any technical reasons to use perl over tclsh on this? Is aolserver-errors.pl unchanged from 3.2.5?
Posted by Don Baccus on
Actually I wasn't terribly serious when you asked that, but, hey, aolserver-errors.pl is *exactly* the kind of thing that Tcl was originally designed to do.

Unlike the creation of 100,000+ lines of Tcl code to form a "thin layer over an RDBMS" which we now call "OpenACS 4.5"!

AFAIK aolserver-errors.pl hasn't changed an iota since it was written.  One of our folks fixed the infinite loop bug (search the archives, as I must do soon so I can include it in our OpenACS 4.5 release).  You probably won't make the same mistake in Tcl, which after all is cleaner, less obtuse, doesn't come with a puzzle book, etc ...

I guess one must ask: are there Unices out there that come with Perl but not with Tcl?  I assume that Solaris is *not* one of them since Tcl was originally developed by Sun.

Posted by Jon Griffin on
The original reason that this was written in Perl was that someone told the MIT boys that "perl==secure tcl==security flaw".

I fought with that for a while at my former employer (yeah you know who that was), but gave up because I had other things to do and they went to Java.

Posted by Don Baccus on
Actually I thought they went down the toilet ... :)

That's interesting, I wonder if there's any truth to the claim of tclsh insecurity?  It sounds like it could simply be a perl religionist's FUD attack.

Posted by C. R. Oldham on
Having written in both Tcl and Perl, I think I prefer Tcl because code can remain readable.  I have a really hard time going back and reading my old Perl code.  Maybe that's just bad programming style on my part.

However, the fact that Perl has such a *HUGE* number of libraries to do every conceivable thing often makes me pick it over Tcl or Python.  Is there a CPAN for Tcl?  And what about an easy way to get loadable modules into the Tcl interpreter in AOLserver?

Posted by David Walker on
Actually once I got to working on it I realized I don't even need tclsh.  My
patch (almost done) will run entirely within aolserver.  If the desire is there
I'll create a tclsh version as well.
Posted by David Walker on
OK. Here is a version that I am mostly satisfied with.

https://openacs.org/sdm/one-patch.tcl?patch_id=174

If necessary I'll add support for the num_bytes and num_minutes options.
Posted by Don Baccus on
This looks interesting but needs some real-world testing before we make the replacement final (or before we can make that decision, I'm all for it if it passes muster because it's one less piece of Perl code and it doesn't require a forking EXEC to run).

Are any of the folks out there running non-critical OpenACS 4 sites able to plug this patch in and give it a whirl?

Posted by David Walker on
I'm running it on a dev site and I'll start running it on a live site whenever
the client gets around to finishing all their changes and we do a rollup.
Posted by Don Baccus on
Hey, sounds great ... I don't have a full-time set up here at the moment which is why I asked for others to help out.
Posted by Roberto Mello on
C. R. Oldham,

Loading Tcl modules in AOLserver is pretty easy. I have that happening with ACS 4.2.

12: I'm to blame (response to 1)
Posted by David Rodriguez on

I wrote the perl version of watchdog.

The original version of watchdog was written all in tcl as a stand-alone aolserver service (like rollover). It was terribly slow. Processing a 1-meg span of error log took nearly 15 minutes. Rewritten as a perl script, that dropped to around 1 second.

This was back in the tcl 7.6 days, and the problem was that the old code relied on long lists (which were slow in 7.6), and many regexps, at least one for every line being read. Also, tcl 7.6 didn't pre-compile, which is a performance killer for an app that is basically a collection of regexps that get executed in a small loop many times.

Don said this is the type of job that tcl is made for. This is basically a text-munging job, and when I think of text-munging, I think of perl (I don't think I'm alone). I knew perl didn't suffer from the two weaknesses that made the old code slow.

Another reason it's in perl is because I didn't know about the performance improvements that existed in 8.x at the time. Like most people at aD, my only interaction with Tcl was through AOLserver, so I never considered writing it as a tcl script to be exec'ed by an 8.x tclsh (which seems odd, but was probably the only way to get acceptable performance out of tcl for this task at the time.)

And regarding Jon's comment...

The original reason that this was written in Perl was that someone told the MIT boys that "perl==secure tcl==security flaw"

I've never met Jon, and I've certainly never talked to him about watchdog. I'm not sure where he gets his information.

Posted by Jon Griffin on
My information came from the powers that had control over the code base known as ACS.

Let's just say that this script plus restart-aolserver were partially rewritten by myself for bootcamp setups and I was told that they would not be introduced into that pristine ACS tree due to "security flaws" in running tclsh.

There was also a similar aversion to running acs using anything but inittab so I did it anyway. Which prompted me to change the LA office to use daemontools against the sysadmins' advice in Boston. This was several weeks before the big fiasco of letting all developers have sudo on the boxes and a little f'd up inittab that brought some clients' boxes down for a good while. AD and security were two words that really don't belong together.

One of the big problems at AD was that people felt that if Philip said they were good, they believed him. And, that caused a lot of shi*&y engineering to take place.

I don't know the history of your actual writing of that script, but afterwards not accepting the changes for security reasons was absurd. Also, if tcl can't parse a file as quickly as perl we really should be running Apache with mod_perl and one of those great guru hackers at corporate should have guided you in the proper direction.

Posted by Ken Kennedy on
Hmm....my only thought here is...we ARE talking about running this from a separate aolserver instance, correct? If not...if your aolserver instance fails, it's not gonna be able to run the job to tell you that it died *grin*. Or, if this is understood to be non-complete-and-total-crash errors only. Is watchdog still around separately? You're still going to need something NOT associated with the app itself running to let you know it isn't broken. (admittedly, angry websurfers may let you do that as well...but that's rather kludgy (to be kind). At work, we certainly prefer that our cron scripts find db problems before annoyed users do!)
Posted by David Walker on
...if your aolserver instance fails, it's not gonna be able to run the job to tell you that it died

It doesn't do that now. aolserver-errors.pl is exec'ed from aolserver. If aolserver dies then it is not exec'ed. Also, there isn't necessarily an entry in the aolserver log to show that it died. Preferably you would monitor aolserver externally using mon (http://www.kernel.org/software/mon/) or some such thing.
Posted by David Walker on
The tcl proc version of aolserver-errors has been running on my live servers
since March 11 with no noticeable differences from the perl version.
Posted by Andrew Piskorski on
David, I downloaded your patch, but got errors when I tried to apply it to watchdog-procs.tcl (CVS tag oacs-4-5-beta-1-2):
$ gpatch -i openacs-sdm-patch-id-174.txt watchdog-procs.tcl 
patching file watchdog-procs.tcl
Hunk #1 FAILED at 1.
Hunk #2 FAILED at 97.
Hunk #3 FAILED at 107.
Hunk #4 FAILED at 136.
Hunk #5 FAILED at 153.
5 out of 5 hunks FAILED -- saving rejects to file watchdog-procs.tcl.rej

$ gpatch -v
patch 2.5.4
Copyright 1984-1988 Larry Wall
Copyright 1989-1999 Free Software Foundation, Inc.
Am I just doing something dumb with patch??
Posted by David Walker on
No.  I am doing the something dumb.  I wrote a patch against OpenACS
3.2.5 and posted the link to it in the OpenACS 4.0 Design forum.

I would appreciate it if someone could take the time to convert it or, if not, I'll
release a 4.5 patch when I get some time.

Posted by Don Baccus on
Andrew - if you're motivated and want to create and submit a 4.5-based patch, I'll be more than happy to stick it in the development branch (I think it's a bit late to be sneaking this into the 4.5 branch).
Posted by Andrew Piskorski on
Yep, I'll submit a patch soon.
Posted by Andrew Piskorski on
Ok, I submitted a patch in the SDM against the openacs-4 Monitoring package. It's feature # 1452.
Posted by Andrew Piskorski on
David, does your code handle the case where you're giving a max number of bytes to read from the server (error) log, but the log has been rolled over since the last time we've parsed it, so in fact it's a whole new file?

I don't think it does. I'm not really sure what the aolserver-errors.pl script does in that case, either.

Anyway, probably 99% of the time if the log has rolled it will be smaller than before. But that's not entirely certain. Is there any way to directly detect the fact that the log has rolled?

Here's a new code snippet with what I think is a partial solution to this log rolling issue:

if { $num_bytes > 0 } {
   set log_file_size [file size $log_file]
   set sizediff [expr {$log_file_size - $lastread}]

   # If the log has been rolled, presumably it will now be smaller
   # in size than it was last time.  So in that case, always read
   # from the beginning of the file.
   #
   # TODO: But, it isn't GUARANTEED that it will always be smaller.
   # Is there any other way for us to detect if the log has been
   # rolled or otherwise fooled with?  --atp@piskorski.com,
   # 2002/04/09 13:13 EDT

   if { $sizediff < 0 } {
      set lastread 0
      set sizediff [expr {$log_file_size - $lastread}]
   }

   if { $sizediff > $num_bytes } {
      set lastread [expr {$log_file_size - $num_bytes}]

      append output "Log file grew by [expr {round(100.0 * $sizediff / 1024.0 / 1024.0) / 100.0}] megabytes.
Reporting on the last $num_bytes bytes of log.
"

   } 
}
Posted by Tom Jackson on

Has anyone looked into using LogSentry (formerly Logcheck) for reading the AOLserver error log? It is available from http://psionic.com/products/logsentry.html. I would love to completely eliminate the use of perl in my AOLserver installations, and this script is the only thing that requires it, I think. Note: I restart with daemontools' svc.

Posted by David Walker on
It handles that here.  "First Run" might not be the best choice of terms for that case.

If you want to be sure about the file rolling over you can store and compare the first couple of lines or the first date you locate.  Since the date is contained in each log message you can be 99.7% sure that the first few lines will change after each log rollover.

    if {![info exists lastread] || $lastread > [file size $log_file]} {
        set lastread 0
        set lastread_time "First Run"
    }

Posted by Andrew Piskorski on
Ah, I'd missed that.  Thanks, David.

It would still be nice to handle the unlikely case of the log all of a
sudden growing really big immediately after being rolled.  Hm, I guess
we could parse the time/date stamp from the first line every time, and
compare that to our saved lastread_time in order to decide whether the
log has been rolled.  That's the only thing I can think of.

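A minimal sketch of the first-line comparison idea discussed above, assuming the caller saves the log's first line between runs (for example next to lastread). The proc names wd_log_first_line and wd_log_rolled_p are hypothetical and are not part of the posted patch.

```tcl
# Hypothetical helper: return the first line of the log file
# (an empty string if the file is empty).
proc wd_log_first_line {log_file} {
    set fd [open $log_file r]
    gets $fd line
    close $fd
    return $line
}

# Hypothetical helper: return 1 if the first line of the log differs
# from the one saved on the previous run, 0 otherwise.  Each log entry
# starts with a timestamp, so a rolled log should almost always begin
# with a different line, even if it has already grown past its old size.
proc wd_log_rolled_p {log_file saved_first_line} {
    set current [wd_log_first_line $log_file]
    return [expr {![string equal $current $saved_first_line]}]
}
```

A caller could reset lastread to 0 whenever wd_log_rolled_p returns 1, and then store the new first line for the next run. This only narrows the window; a log that is rolled and rewritten with an identical first line would still be missed.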
Posted by Andrew Piskorski on
Ah, one more snafu: The wd_aolserver_errors proc in my patch above works fine for all email spamming purposes, but it does not work correctly for use on the packages/monitoring/www/watchdog/index.tcl page. That page wants to look an arbitrary number of kb or minutes back into the error log's history, and the old aolserver-errors.pl script has that ability, but the wd_aolserver_errors proc does not. So that's another little TODO item with this.

I personally never really use the packages/monitoring/www/watchdog/index.tcl page at all, so fixing this isn't a priority for me, but as a stopgap you can get the correct functionality by, on that page, changing the call to wd_errors to use the old aolserver-errors.pl script instead, like so:

wd_errors -external_parser_p 1 -num_minutes $num_minutes -num_bytes $bytes  
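
For the missing -num_minutes behavior, one possible approach is sketched below. It is only an illustration under stated assumptions: the proc name wd_recent_log_lines is hypothetical, it assumes Tcl 8.5 or later (for clock scan -format), and it assumes each log entry starts with a stamp of the form [09/Apr/2002:13:13:00], with untimestamped continuation lines belonging to the entry above them.

```tcl
# Hypothetical helper: return the lines from log_lines that belong to
# entries logged within the last num_minutes.  "now" defaults to the
# current time and can be passed explicitly for testing.
proc wd_recent_log_lines {log_lines num_minutes {now ""}} {
    if {$now eq ""} {
        set now [clock seconds]
    }
    set cutoff [expr {$now - $num_minutes * 60}]
    set result {}
    set keep 0
    foreach line $log_lines {
        # Assumed entry prefix: [dd/Mon/yyyy:hh:mm:ss]
        if {[regexp {^\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2})\]} $line -> stamp]} {
            set t [clock scan $stamp -format "%d/%b/%Y:%H:%M:%S"]
            set keep [expr {$t >= $cutoff}]
        }
        # Lines without a stamp (e.g. stack traces) follow the
        # keep/drop decision of the entry that precedes them.
        if {$keep} {
            lappend result $line
        }
    }
    return $result
}
```

The list of lines would come from splitting whatever portion of the log has been read; error matching would then run only on the returned lines.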
Posted by Andrew Piskorski on
The wd_aolserver_errors proc in the patch 231 for feature 1452 above, from April, has a race condition with the growth of the server error log. I've fixed it, and have started using the fixed version on my own systems. I didn't generate a new patch, but I did stick my whole current watchdog-procs.tcl file into new-file-storage.

One of these days, I'll actually:

  1. Add support for the -num_minutes option, so the new wd_aolserver_errors proc will really be a feature-complete replacement for the old aolserver-errors.pl script.

  2. Maybe do some performance comparison between wd_aolserver_errors and aolserver-errors.pl.

  3. Actually test this in a stock OpenACS 4.5, rather than only on my own old hacked installations, and talk to Don (or whomever) about getting it into the OpenACS cvs.

In the meantime, let me know if you're actually using it and have problems, or would like a real patch, etc.

Posted by Andrew Piskorski on
Posted by Don Baccus on
Let's try to get this into 4.6, OK?
Posted by Vinod Kurup on
OK 😊

This is now in CVS. I ported the monitoring package to PG as best as I can, given that much of it is Oracle specific (Cassandracle, etc). I added the new watchdog-procs file posted by Andy above. I've been using it for the past couple days (Oracle and PG) and it's reporting errors properly so far.

In regards to the monitoring package, I had to add a parameter to specify command-line options to send to 'top'. On my version (procps v. 2.07), I had to send it '-b -n 1'. I also had to adjust the top-scraper code, cuz the columns were in a different order. It should still work on whatever version it used to work (Solaris?).

Things to do:

  1. figure out why ad_monitoring_analyze_tables isn't working in Oracle
  2. try to create a PG replacement for Cassandracle
  3. Use templating (currently it just spits out HTML in the tcl pages)

But, it works for my main purpose - getting errors mailed to me from my PG site. (Note that you need to install the package, mount it on your site-map, set the parameters and then *restart* the server so that it schedules the watchdog-procs appropriately.)