Forum OpenACS Development: Re: tDOM parsing error with OpenACS

Posted by Andrew Helsley on

The problem is with tDOM: it panics when it encounters a character it cannot handle. I think this sometimes happens because Tcl uses UCS-2 internally, which cannot represent every Unicode character that UTF-8 can (characters outside the Basic Multilingual Plane need four bytes in UTF-8). The panic is raised in tDOM-0.8.2/generic/tcldom.c:2365:


                    if (!clen) {
                        domPanic("tcldom_AppendEscaped: can only handle "
                                 "UTF-8 chars up to 3 bytes length");
                    }

... which I turned into:


                    if (!clen) {
                        DONT_PANIC("can only handle UTF-8 chars up to 3 bytes long");
                        goto error;
                    }

I defined DONT_PANIC earlier in the file:


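/* Two-level stringification, so that __LINE__ is expanded to its number
   before it is turned into a string literal. */
#ifndef STR
#  define STR_HELPER(x) #x
#  define STR(x)        STR_HELPER(x)
#endif

/* Reports a recoverable error through the interpreter instead of panicking.
   message must have static storage (e.g. a string literal), because
   TCL_STATIC tells Tcl to keep the pointer rather than copy the string.
   function is accepted for future use and is currently ignored. */
static void tcldom_startBackTrace (
    Tcl_Interp *interp,
    const char *function,
    char       *message
)
{
    Tcl_ResetResult(interp);
    Tcl_SetErrorCode(interp, "SetErrorCode: ", message, NULL);
    Tcl_AddErrorInfo(interp, "AddErrorInfo: ");
    Tcl_AddErrorInfo(interp, message);
    Tcl_SetResult(interp, message, TCL_STATIC);
}
/* Expects a Tcl_Interp *interp to be in scope at the call site. The
   message is prefixed with "file:line: " by string-literal concatenation. */
#define DONT_PANIC(msg)                                             \
    do{ tcldom_startBackTrace(interp,                               \
                              __FUNCTION__,                         \
                              __FILE__ ":" STR(__LINE__) ": "       \
                              msg                       );  }while(0)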
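
DONT_PANIC builds its message by placing __FILE__, the stringified line number, and msg next to each other so the compiler joins them into one string literal. That only works if __LINE__ is stringified through two macro levels. A minimal standalone sketch of the idiom follows; WHERE and example_location are illustrative names and not part of tDOM:

```c
/* One level is not enough: '#' stringifies its argument without expanding it. */
#define STR_HELPER(x) #x
/* The extra level lets __LINE__ expand to a number before '#' is applied. */
#define STR(x)        STR_HELPER(x)

/* Adjacent string literals are concatenated at compile time, so this
   yields a single literal such as "file.c:12: message". */
#define WHERE(msg) __FILE__ ":" STR(__LINE__) ": " msg

/* Hypothetical helper: returns a location-tagged, statically allocated message. */
static const char *example_location(void)
{
    return WHERE("can only handle UTF-8 chars up to 3 bytes long");
}
```

Because the result is a plain string literal, it has static storage and is safe to hand to Tcl_SetResult with TCL_STATIC.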

I then recompiled tDOM, and my server has not crashed from this problem since. If anyone would like to take this patch to the tDOM maintainers, please feel free to do so.
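
To check input before it ever reaches tDOM, the condition can be approximated outside the library. The following is a minimal sketch and not tDOM's code: utf8_seq_len and exceeds_3_byte_limit are hypothetical helper names, and it assumes the panic is triggered by any byte that does not start a 1-, 2-, or 3-byte UTF-8 sequence.

```c
/* Returns the length in bytes of the UTF-8 sequence introduced by lead
   byte b, or 0 if b cannot start a sequence (continuation or invalid). */
static int utf8_seq_len(unsigned char b)
{
    if (b < 0x80)           return 1;   /* 0xxxxxxx: ASCII              */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx                     */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx: up to U+FFFF       */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx: U+10000 and above  */
    return 0;
}

/* Returns 1 if the NUL-terminated string s contains a sequence that a
   3-byte-limited escaper would reject, or is malformed; otherwise 0. */
static int exceeds_3_byte_limit(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    int len, i;

    while (*p) {
        len = utf8_seq_len(*p);
        if (len == 0 || len > 3)
            return 1;
        for (i = 1; i < len; i++) {
            /* Every following byte must be 10xxxxxx; this also catches
               a string that ends in the middle of a sequence. */
            if ((p[i] & 0xC0) != 0x80)
                return 1;
        }
        p += len;
    }
    return 0;
}
```

With this sketch, a string containing U+1F600 (bytes f0 9f 98 80) is flagged, while plain ASCII or a 3-byte character such as the euro sign (e2 82 ac) passes.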

Posted by Andrew Helsley on

... oops, forgot to mention that I added this clause to the end of the function which used to have the domPanic(…):


 error:
    Tcl_AppendResult(interp, "\n\tUnescaped string      : \"", value, "\"\n", NULL);
    Tcl_AppendResult(interp, "\n\t................ (hex): ", NULL);
    tcldom_hexDump(interp, value);
    Tcl_AppendResult(interp, "\n", NULL);
    return TCL_ERROR;
}

... which uses this code, also defined earlier:


static void tcldom_hexDumpToStdErr (
    char *cString
)
{
    int i = 0;
    char *pc2 = cString;
    char chr;

    /* Declarations come first so the function also compiles as C89. */
    fprintf(stderr, "\n/* HEX DUMP OF cString / value = \"%s\": */", cString);
    while((chr = *(pc2++)) != 0) {
        if((i++ & 0x7) == 0)
            fprintf(stderr, "\n");
        fprintf(stderr, "%02x", chr & 0xff);
    }
}

static void tcldom_hexDump (
    Tcl_Interp *interp,
    char       *cString
)
{
    static const char *x_digit[] = {"0","1","2","3","4","5","6","7","8","9",
                                    "a","b","c","d","e","f",0};
    int i = 0;
    char *pc2 = cString;
    char chr;
    while((chr = *(pc2++)) != 0) {
        if((i++ & 0x7) == 0)
            Tcl_AppendResult(interp, "\n\t\t", NULL);
        Tcl_AppendResult(interp,    x_digit[(chr>>4) & 0xf],
                                    x_digit[(chr>>0) & 0xf],
                                    NULL);
    }
}
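
For trying the dump format without a Tcl interpreter, here is a minimal sketch of the same idea that writes into a caller-supplied buffer. hex_dump_to_buffer is a hypothetical name, and unlike tcldom_hexDump it does not emit a leading newline and tabs before the first group of eight bytes:

```c
#include <stddef.h>

/* Writes the bytes of the NUL-terminated string s into out as lowercase
   hex, eight bytes per line. Stops early if out would overflow. Returns
   the number of characters written, not counting the terminating NUL. */
static size_t hex_dump_to_buffer(const char *s, char *out, size_t out_size)
{
    static const char digits[] = "0123456789abcdef";
    size_t i, n = 0;

    if (out_size == 0)
        return 0;

    for (i = 0; s[i] != '\0'; i++) {
        /* Go through unsigned char so bytes >= 0x80 index correctly. */
        unsigned char c = (unsigned char)s[i];
        /* A newline precedes every group of eight except the first. */
        size_t need = (i > 0 && (i & 0x7) == 0) ? 3 : 2;

        if (n + need + 1 > out_size)    /* keep room for the NUL */
            break;
        if (need == 3)
            out[n++] = '\n';
        out[n++] = digits[c >> 4];
        out[n++] = digits[c & 0xf];
    }
    out[n] = '\0';
    return n;
}
```

Dumping the four bytes of U+1F600 this way produces f09f9880, which makes a lead byte of f0 (a 4-byte sequence) easy to spot in an error message.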
Posted by Gustaf Neumann on
There was a longer discussion about this panic on the tDOM mailing list. The maintainer still seems to believe that the panic is the right behavior:

http://tech.groups.yahoo.com/group/tdom/message/1895

I have provided a real-life example with invalid "user input":
http://tech.groups.yahoo.com/group/tdom/message/1921

Andrew, maybe it would help if you posted your message and fix on the tDOM mailing list.