Forum OpenACS Development: Getting rid of the haystack in the error log

Here's what ns_log offers for logging levels:
  • Notice: Something interesting occurred.
  • Warning: Something that could mean something bad occurred.
  • Error: Something bad occurred.
  • Fatal: Something extremely bad occurred. The server will shut down after logging the message.
  • Bug: Something occurred that implies that there is a bug in the code.
  • Debug: If the server is in Debug mode, the message is printed. Debug mode is specified in the [ns/parameters] section of the configuration file. Otherwise, the message is not printed.
I think most people who have written a line of ACS code, especially when using db drivers that don't pass back SQL error messages (now fixed), are familiar with opening the error log to hunt through hundreds of lines for the information they need. This seems to be in large part due to developers who have used Notice when they should have used Debug. At least, that's the assumption I've been working under while going through the code and changing a bunch of ns_log Notices to ns_log Debug in Notifications, acs-mail, and other notorious log spammers. Two questions:

First, does anybody think that automated processes - sweeps, etc - should write notices instead of debugs? Is there a case where the benefit to that outweighs the cost (making the log hard to use for anything else, eating up disk space) and it's not realistic to switch the server to debug mode?

Second, there are still about 300 ns_log notices in the core package. That seems like too many, although only the recurring ones are really offensive in terms of rendering the log less useful. How many of these are actually "interesting" in routine use? Could most of them be changed to Debugs, warnings, bugs, etc? If we could agree on a better criterion than "Something interesting occurred," this would be a good worker-bee task.

Collapse
Posted by Joel Aufrecht on
Related bug, listing all the Notices in case somebody is willing to mark them up for a worker bee: https://openacs.org/bugtracker/openacs/bug?bug%5fnumber=1032
Collapse
Posted by Janine Ohmer on
I am responsible for a site that cannot be restarted during the day.  I occasionally get a panic emergency report, and have to try to do some debugging without restarting.  This is challenging at best, and sometimes impossible, because verbose logging is off and performance mode is on, meaning no SQL in the log and file watching doesn't work.  So I'm limited in what I can do, and the various Notices sometimes are enough to at least give me a clue as to where to look.

I guess what I'm saying is that I'd miss them, and I think that most people in my situation would feel the same way, but there may not be enough of us to justify inflicting the log clutter on everyone.

Maybe the answer is to create a new logging level for production servers?  Having debug on isn't the answer, since that gives way more detail than you need on a regular basis.

Collapse
Posted by Don Baccus on
Yes, automated processes should write notices and they do so intentionally, for the reason Janine mentions.  On busy production sites it can be very helpful to see what sweeps have run recently, which succeeded and which spawned an error and it is often not possible to restart.

Besides which, normally you're being asked to try to divine what caused a problem in the past, not the present or future, so restarting with "debug" wouldn't help.

Collapse
Posted by Christian Eva on
When I started to use AOL-Server I started to use Notice/Debug as it is thought for. Soon I felt that this is not flexible enough for debugging and I ported a little Debug-Logging proc over into my procs, called at the beginning of each ADP page a notification proc (enabling/disableing a particular debug level) and was happy enough.

When I started this year to use OpenAcs as the environment for a new (I18N) application, I missed my debugging stuff very much. After a while I decided to include a little patch in rp_handler to call my notification proc. So in this way I can trace the pages. My own pages and procs contain a higher density of Debug calls then the ACS average but then I can choose and change at runtime the level between 0 (no debug) and 9 (most details).

So if there is some interest I can publish my code as a base for inspiration/discussion

Collapse
Posted by Mark Aufflick on
I agree with Christian (and others) that we need more levels of debug - so a server like the one Janine talks about could run at, say, level 5, and a server that i don't care too much about but that has limited disk space (or costs me money the more i use), could run at level 1.

Even if we don't want to modify ns_log, surely we could just write an ad_log that supports the extra levels, and ns_log's just what we want.

If bing in tcl space is a problem, it would surely be a very simple server module?

while we're at it, we could build functionality into it like email me on severity 9, or write different log items to different files.

Just some ideas - but i suspectwe have outgrown ns_log.

Collapse
Posted by Randy O'Meara on
Joel,

Your ideas sound good. But why not, while we're at it, provide an interface to dynamically set the level? No reason to force a server restart to change the logging level.

It would be good to be able to do something like: I don't want to see the profuse logging from an uninteresting package... Now, I don't want see logging from another package... and so on. In this way, you could really zero in on the interesting detail while paring away the fluff from other packages, where package in this instance really implies a particular namespace.

This might require an nsv to hold the "uninteresting" package names (or, maybe even the DB so it remembers across service restarts). The TCL call stack could be used to determine the namespace of the caller?

Randy