Forum OpenACS Q&A: RFC: How to improve error handling and messages for users

Hi all,

The intention of this thread is to encourage everybody to contribute work and thoughts on how to improve error handling for users in OpenACS.

We really have to reduce the frustration of users who do not understand why an error message is returned. This issue seems very important to me because a system is only as good as its users think it is, right?

So what can we do?

Here are some thoughts:
- Why not use a template (adp) so that users are not shocked when something goes wrong? This template could show at least the admin's email address or, better, an i18n'd explanation.
- Even better: a form to tell the admin what went wrong. To improve on this, the idea was to use the bug-tracker as a kind of support service. Each feature that is accessible to the user (file-storage, calendar, ...) should have a corresponding support-service component in the bug-tracker. So whenever there is an error, a form could show up to report the problem. To help the user, most of the form could be filled out automatically (component, URLs, request info). One could even have a counter: if a user tries something four times before giving up and posting a new problem message, we could set the severity to high. Maybe it is even possible to distinguish automatically between user mistakes and real bugs.
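
To make the counter idea a bit more concrete, here is a minimal sketch in plain Tcl (8.5 or later, for dict). Everything in it is an assumption for illustration: the proc name build_problem_report, the dict keys, and the severity labels are made up and are not part of OpenACS or the bug-tracker.

```tcl
# Hypothetical helper, not part of OpenACS or the bug-tracker.
# Builds a pre-filled problem report as a dict from information
# the request already provides.
proc build_problem_report {component url attempts} {
    # Assumption from the proposal above: four or more failed attempts
    # means the user is stuck, so the report gets a high severity.
    if {$attempts >= 4} {
        set severity high
    } else {
        set severity normal
    }
    return [dict create \
        component $component \
        url       $url \
        attempts  $attempts \
        severity  $severity]
}

set report [build_problem_report file-storage /file-storage/upload 4]
puts "severity: [dict get $report severity]"
```

How the attempts would be counted per user and per URL is left open here; that part would need something like session or database state.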

What do you think? I am really interested to know.

I looked at the code and from my understanding packages/acs-tcl/tcl/defs-procs.tcl contains the following procs that do the error handling:

- ad_return_complaint
- ad_return_exception_page, which is used by:
    ad_return_error
    ad_return_warning
    ad_return_forbidden

Also in config.tcl we have:

ns_param  NotFoundResponse    "/global/file-not-found.html"
ns_param  ServerBusyResponse  "/global/busy.html"
ns_param  ServerInternalErrorResponse "/global/error.html"

and

ns_section ns/server/${server}/redirects
ns_param  404                "global/file-not-found.html"
ns_param  403                "global/forbidden.html"

When are these files served? I mean, from which proc?

I found a post on how to switch to *.adp files here:
https://openacs.org/forums/message-view?message_id=29960

Greetings,
Nima

Also I found this:

ad_raise exception [ value ] in packages/acs-tcl/tcl/exception-procs.tcl

return -code error -errorcode [list "AD" "EXCEPTION" $exception] $value
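
As a rough illustration of what that errorcode buys you, a caller can tell these exceptions apart from ordinary Tcl errors by looking at the error code. The sketch below runs in plain tclsh (8.5 or later). raise_exception is a stand-in built only from the single line quoted above, not the real ad_raise, and classify_failure is a hypothetical helper.

```tcl
# Stand-in for ad_raise, built from the one line quoted above.
proc raise_exception {exception {value ""}} {
    return -code error -errorcode [list AD EXCEPTION $exception] $value
}

# Hypothetical helper: run a script in the caller's scope and report
# whether it succeeded, raised an AD exception, or failed some other way.
proc classify_failure {script} {
    if {![catch {uplevel 1 $script} result options]} {
        return [list ok $result]
    }
    set code [dict get $options -errorcode]
    if {[lrange $code 0 1] eq [list AD EXCEPTION]} {
        # Third element of the error code is the exception name.
        return [list exception [lindex $code 2] $result]
    }
    return [list error $result]
}

puts [classify_failure {raise_exception no_permission "You may not edit this item"}]
puts [classify_failure {expr {1 / 0}}]
```

An error page could use the same distinction to show a friendly message for expected exceptions and a generic one for everything else.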

The redirects section is handled automatically by AOLserver. If the response code matches and a response hasn't been sent yet, the configured page is returned. This page can be a Tcl script, an ADP page, or whatever. It also has full access to the errorInfo global, which identifies the error.

I've been playing around with error-handling code on AOLserver lately, and it seems that handling (or not handling) errors is easy to get wrong. I suspect that it is done wrong in a lot of places in OpenACS, which makes it hard to track down either the cause of the error or the code that originally called the code that caused the error (you need both). Also, it can be very tricky to catch and handle errors inside filters. If everything isn't done properly, you end up with impossible-to-find bugs, usually an error about an incorrect return from the filter.

So the first point is that you can use a 500 redirect to return a custom error page in AOLserver. I think this is a new feature, but I'm not sure. Second point: try not to use catch except to undo things that absolutely must be undone. For instance, if you open a file, you need to close it, but then you must propagate the error:

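# Open outside the catch: if open itself fails, there is no channel to close.
set fd [open $file r]
if {[catch {
    ...
} err]} {
    # Save the original trace and error code before running cleanup.
    global errorInfo errorCode
    set savedInfo $errorInfo
    set savedCode $errorCode
    close $fd
    # Re-raise with the original stack trace and error code.
    error $err $savedInfo $savedCode
}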
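
For what it's worth, on Tcl 8.6 or later the same clean-up-then-propagate pattern can be written with try/finally, which is a different construct from the catch shown above. This is only a sketch: read_first_line is a hypothetical helper, and whether your server runs a Tcl version with try is something you would need to check.

```tcl
# Hypothetical helper: return the first line of a file.
# The channel is closed whether or not the body raises an error,
# and any error propagates to the caller unchanged.
# Requires Tcl 8.6 or later for try/finally.
proc read_first_line {path} {
    # Open outside the try: if open fails there is nothing to close.
    set fd [open $path r]
    try {
        if {[gets $fd line] < 0} {
            error "file is empty: $path"
        }
        return $line
    } finally {
        close $fd
    }
}

# Usage: the caller decides what to do with the propagated error.
if {[catch {read_first_line /no/such/file} msg]} {
    puts "failed: $msg"
}
```

The point is the same as before: the cleanup happens, but the error is not swallowed.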

Third point: error handling could be moved to global/500-error.tcl or whatever.

Can you explain more about the errorInfo global? And how do I access the values from global/500-error.tcl?

For instance, I have some test procs and a registered filter:

proc ::mytime { time } {

    return [ns_buildsqltime $time]
}

proc ::filter-error { why } {
     source [ns_info pageroot]/err.cmp
     mytime 10:00:00
     return filter_ok

}

ns_register_filter preauth GET /filter-error.tcl filter-error

My error page contains:

global errorInfo

ns_return 500 text/plain "Oops screwed up: $errorInfo"

My err.cmp page contains:

set abcd efgh
set $abcd

When I access /filter-error.tcl, I get the following returned:

Oops screwed up: can't read "efgh": no such variable
    while executing
"set $abcd"
    (file "/usr/local/aolserver/servers/test/pages/err.cmp" line 2)
    invoked from within
"source [ns_info pageroot]/err.cmp"
    (procedure "filter-error" line 2)
    invoked from within
"filter-error preauth"

So the problem was caused by the fact that the efgh var didn't exist. The error was on page err.cmp, which was sourced by the proc filter-error on line two of that proc, which was called as a preauth filter. If I were to fix this error in some way, I would discover that the next line of my filter proc calls another proc which has an error:

Oops screwed up: wrong # args: should be "ns_buildsqltime time ampm"
    while executing
"ns_buildsqltime $time"
    (procedure "mytime" line 3)
    invoked from within
"mytime 10:00:00"
    (procedure "filter-error" line 3)
    invoked from within
"filter-error preauth"

In both cases, the full stack trace helps you easily find and fix the problem.
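
If you want to look at errorInfo without a server, the same behaviour can be reproduced in plain tclsh (8.5 or later, for dict). The sketch below is an illustration only: inner and outer mimic the err.cmp mistake, and split_error_info is a hypothetical helper that separates the one-line message (something you might show a user) from the full trace (something for the log or the admin).

```tcl
# Mimics the err.cmp mistake: "set $abcd" tries to read variable "efgh".
proc inner {} {
    set abcd efgh
    set $abcd
}

proc outer {} {
    inner
}

# Hypothetical helper: the first line of errorInfo is the error message,
# the whole value is the stack trace.
proc split_error_info {info} {
    set lines [split $info \n]
    return [dict create message [lindex $lines 0] trace $info]
}

if {[catch {outer} err]} {
    set parts [split_error_info $::errorInfo]
    puts "For the user: [dict get $parts message]"
    puts "For the log:"
    puts [dict get $parts trace]
}
```

The trace printed for the log names both procs, in the same innermost-first order as the server output above.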