Forum OpenACS Q&A: What does it take to get Postgresql database error detail in tracebacks?

The single most frustrating problem with developing OpenACS pages (at
least with Postgresql) is the fact that when performing Postgresql
operations, I don't get crucial error information back when I do
something wrong.  For example, with db_dml, all I get is that the
operation failed.  I know what line failed in my code, but it doesn't
tell me the database error.  It's probably something really trivial,
like a misspelled attribute name.  This is really intolerable.  Has
somebody patched this?  Perhaps this is really be an AOLserver
problem?
If you have verbose and logsqlerrors set to true for each of your database pools, then you should see plenty of information in the server log so that you can easily locate the source of your error.  An exception to this might be errors in pl/pgsql code, as the error messages for pl/pgsql code can be kind of opaque.
I believe he wants the plsql errors and they just plain suck in PG as far as I can tell. I just do a select plsql_proc(); and replace the bind vars and see what happens. This is slow but it is the only way I can see to figure it out.
The error info that psql provides is much more useful than what you get in the AOLServer log. I just evaluate offending sql there. Maybe not the most elegant approach, but so far it's always pinpointed the problem. FWIW...
You get the same error messages in psql as you do in the aolserver log file.  Once you isolate a problem to a particular pl/pgsql function, then the best thing to do is start putting raise notice calls in the function to output debug data.  It also works well to comment out parts of a pl/pgsql function until the error message goes away - kind of a divide and conquer approach.
Doh; of course you do. My bad.
PG 7.2 gives better errors for pl/pgsql functions including the
offending line number.
Perhaps Randy wants as complete an error message as what appears in the logs to appear in the browser window.  Is that possible?
Including the last database error in the stacktrace that appears in the browser window would be very helpful for developing. Until someone implements that here is a little error log colorizer for the console that helps me weeding through the masses of verbose error log:
---- cut here ----
#!/usr/bin/tclsh

# Colorize the server.log from aolserver for nice output in the
# console. Put it in a pipe like that:
# 
# $ tail -f /var/log/aolserver/server.log | ./aollogcolor.tcl

set c_red ""
set c_blue ""
set c_green ""
set c_default ""
set c_bold ""

set query_p 0
while { [gets stdin s] > -1 } {
    # square brackets
    regsub -all {^[[^]]*][[^]]*][[^]]*]} $s "$c_blue
Robert, that's exactly what I want.  I'm not at the point of debugging PG/plsql functions yet (but I will be).

I was referring to the easier problem of simple db_dml, such as inserting data into the database.  Another instance is the all encompasing "Query did not return rows" kind of answer (well, what did it return?)  Using the hint about turning on sql debugging, I figured out that the system comes that way.  Looking further up the log files than I expected I should have, I found the actual error, along with the bind substitution, which is nice to see.  It would be so much more useful to just get that error, along with the attempted SQL, in the TCL traceback on the web page.

The more I write about it, though, the more I believe this is an aolserver issue, vs. an OpenACS issue.

I'm not sure exactly what the problem is in that there seems, if anything, an over-abundance of logging in the system.

Things like 'query did not return any rows' are, I would argue, pretty self-explanatory. i.e. you tried to select rows that don't exist.

Also, the error logs also seem to dump out an entire mine of information when theres some form of error. I actually think its way too much.

One of the best ways to avoid the need for endless log hunting wi=hile developing is to get into the write an automated test, before writing the routine, mode of operation. In this was you test harnesses catch the output and report in whatever fashion you like.

You can see examples of this using the acs-automated-testing package.

anyway, sorry for wandering off topic slightly, but for my tuppence worth can we leave the logging well alone?

Cheers
Simon

Sorry, Simon, bad example on my part.  I should have said "query is a command which does not return rows", generally meaning an error occurred.  I believe a fundamental of any good development platform is that when an error occurs useful information is displayed in the most effective way possible to facilitate debugging.  I'm not asking for more debugging information, since that's what threw me off in the first place.  What I find incredibly frustrating is that the traceback, which is incredibly detailed in its own right, at the very top, where the information would be most useful, simply gives a generic error message and one is left to grovel through logs to find it.  Romping through log files is perhaps my 2nd or 3rd choice when I'm working on a really nasty or subtle problem.

I like OpenACS, but this shortcoming would probably keep some people away.  Of course, anybody who continues to read these rants can tell I'm now on a mission.

Randy

If you watch your error log, you will notice that the more specific error comes before what is displayed in your browser as a traceback. For the type of error you are seeing, you might get very valuable information from this initial error message. Less valuable info will be obtained from a bug in a plpgsql procedure, and you will have to use the error log again to cut and paste the substituted query into psql

A simple way to follow errors without too much garbage flying by is:

$ tail -f your-server.log | grep -4 "rror" /dev/fd/0

Which should show a context of 4 lines before and after an error.

Sorry, I am an impatient man. I truly appreciate all the responses to my question. Some study of postgres.c from the pgdriver tarball reveals that it is actually intended for the Postgresql errors to go into the traceback. The code to place this information on into the Tcl result stack relies on a two-part error code, ie. an "exception" and the error message. If the exception is not set (which the postgres driver never does), it assumes there's no error message (which is in fact there). The design appears to be based on Oracle, where you have an Oracle error message which can sometimes be useful to look up (and depending on how your Oracle is installed, might be the only thing you have to go on).

Here's a patch to postgres.c to put the PostgreSQL error message onto the Tcl traceback. It works by putting a string into the exception code, so that the DbFail routines will look for the error message.

--- /goose/aolserver/pgdriver-2.0.1/postgres.c  Tue Sep  4 12:25:44 2001
+++ postgres.c  Tue May 21 14:45:47 2002
@@ -471,15 +471,18 @@
         case PGRES_COPY_IN:
         case PGRES_NONFATAL_ERROR:
               Ns_DStringAppend(nsMsg, PQresultErrorMessage(nsConn->res));
+              strcpy(handle->cExceptionCode, "WARN");
               break;
 
         case PGRES_FATAL_ERROR:
               Ns_DStringAppend(nsMsg, PQresultErrorMessage(nsConn->res));
+              strcpy(handle->cExceptionCode, "FATAL");
               break;
 
         case PGRES_BAD_RESPONSE:
               Ns_DStringAppend(nsMsg, "PGRES_BAD_RESPONSE ");
               Ns_DStringAppend(nsMsg, PQresultErrorMessage(nsConn->res));
+              strcpy(handle->cExceptionCode, "BAD");
               break;
     }
 }

The CVS version of the PG driver has better error reporting (see http://sourceforge.net/projects/acs-pg/. Errors appear in the web page just before the stack trace, with bind variables expanded.

Re: acs-pg in SourceForge.  I checked out driver-2.3 and compared the two.  There are other changes besides in DbFail() which look useful.  Unfortunately, the DbFail() function is defined in both dbtcl.c (in aolserver/nsd) and in postgres.c.  The nsd calls to DbFail() are linked to their own, and the postgres.c uses its own.  So, that version will only give more useful reports in those cases where DbFail() is called from postgres.c, and those situations are certainly not rare.  It seems that both sets of changes are needed.