Forum OpenACS Development: tDOM parsing error with OpenACS

Posted by Jon Suen on
Hi,

I've written a tDOM script that parses certain variables from a database. On a smaller scale it works properly and returns an XML schematic containing those variables.
The script should be able to handle large amounts of data, but when I run it against a large database I keep getting this error, which shuts down my AOLserver:

"tcldom_AppendEscaped: can only handle UTF-8 chars up to 3 bytes length"

I can't pinpoint where the problem comes from, and I haven't found anything online that addresses it.

Does anyone have any suggestions?

Cheers,

Jon

Posted by Nick Carroll on
This might have something to do with text that was copied and pasted from a Word document. Word's "smart quotes" are non-ASCII characters that often arrive in a different encoding from the rest of the text, so they are not readily recognised.
Posted by Andrew Helsley on

The problem is with tDOM. It panics when it sees a character it doesn't understand. I think this sometimes happens because Tcl uses UCS-2 internally, which can't represent all of the Unicode characters that UTF-8 can: characters outside the Basic Multilingual Plane need four bytes in UTF-8 and do not fit in a single 16-bit UCS-2 unit. The panic is at tDOM-0.8.2/generic/tcldom.c:2365:


                    if (!clen) {
                        domPanic("tcldom_AppendEscaped: can only handle "
                                 "UTF-8 chars up to 3 bytes length");
                    }

... which I turned into:


                    if (!clen) {
                        DONT_PANIC("can only handle UTF-8 chars up to 3 bytes long");
                        goto error;
                    }

I defined DONT_PANIC earlier in the file:


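/* Reset the interpreter result and record `message` as the error, so the
 * caller can return TCL_ERROR instead of aborting the whole process.
 * `function` is accepted but currently unused. */
static void tcldom_startBackTrace (
    Tcl_Interp *interp,
    const char *function,
    char       *message
)
{
    Tcl_ResetResult(interp);
    Tcl_SetErrorCode(interp, "SetErrorCode: ", message, NULL);
    Tcl_AddErrorInfo(interp, "AddErrorInfo: ");
    Tcl_AddErrorInfo(interp, message);
    /* TCL_STATIC is safe here because DONT_PANIC passes a string literal. */
    Tcl_SetResult(interp, message, TCL_STATIC);
}
/* STR is assumed to stringify its expanded argument; define it here only
 * if the file does not already provide it. */
#ifndef STR
#define STR_(x) #x
#define STR(x)  STR_(x)
#endif
/* Requires a variable named interp in the calling function. */
#define DONT_PANIC(msg)                                             \
    do{ tcldom_startBackTrace(interp,                               \
                              __FUNCTION__,                         \
                              __FILE__ ":" STR(__LINE__) ": "       \
                              msg                       );  }while(0)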

I then recompiled tDOM and I haven't had a crash in my server due to this problem since then. If anyone would like to take this patch to the tDOM maintainers, please feel free to do so.
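
For anyone who wants to find the offending input before it reaches tDOM, here is a minimal sketch of the lead-byte test that I assume is behind the "up to 3 bytes" message. It is not tDOM's code: `utf8_seq_len` and `first_unhandled_utf8` are hypothetical names, and the check looks at lead-byte bit patterns only, without validating continuation bytes.

```c
#include <stddef.h>

/* Hypothetical helper (not part of tDOM): number of bytes announced by the
 * UTF-8 lead byte `c`, or 0 if `c` cannot start a sequence (a stray
 * continuation byte or an invalid lead byte). Pattern check only. */
int utf8_seq_len(unsigned char c)
{
    if (c < 0x80)           return 1;  /* 0xxxxxxx: ASCII               */
    if ((c & 0xE0) == 0xC0) return 2;  /* 110xxxxx                      */
    if ((c & 0xF0) == 0xE0) return 3;  /* 1110xxxx: rest of the BMP     */
    if ((c & 0xF8) == 0xF0) return 4;  /* 11110xxx: outside the BMP     */
    return 0;
}

/* Hypothetical helper: byte offset of the first position in the
 * NUL-terminated string `s` that does not start a 1- to 3-byte sequence,
 * or -1 if every sequence is 1 to 3 bytes long. */
long first_unhandled_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    long off = 0;

    while (p[off] != 0) {
        int len = utf8_seq_len(p[off]);
        int k;

        if (len == 0 || len == 4)
            return off;
        /* Skip the rest of the sequence, but never step past the NUL. */
        for (k = 1; k < len && p[off + k] != 0; k++)
            ;
        off += k;
    }
    return -1;
}
```

A smart quote such as U+201C encodes as the three bytes e2 80 9c and passes this check, while anything outside the Basic Multilingual Plane (lead byte f0 and up) or a stray byte such as 0x80 is reported at its offset.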

Posted by Andrew Helsley on

... oops, forgot to mention that I added this clause to the end of the function which used to have the domPanic(…):


 error:
    Tcl_AppendResult(interp, "\n\tUnescaped string      : \"", value, "\"\n", NULL);
    Tcl_AppendResult(interp, "\n\t................ (hex): ", NULL);
    tcldom_hexDump(interp, value);
    Tcl_AppendResult(interp, "\n", NULL);
    return TCL_ERROR;
}

... which uses this code, also defined earlier:


static void tcldom_hexDumpToStdErr (
    char *cString
)
{
    fprintf(stderr, "\n/* HEX DUMP OF cString / value = \"%s\": */", cString);
    int i = 0;
    char *pc2 = cString;
    char chr;
    while((chr = *(pc2++)) != 0) {
        if((i++ & 0x7) == 0)
            fprintf(stderr, "\n");
        fprintf(stderr, "%02x", chr & 0xff);
    }
}

static void tcldom_hexDump (
    Tcl_Interp *interp,
    char       *cString
)
{
    static const char *x_digit[] = {"0","1","2","3","4","5","6","7","8","9",
                                    "a","b","c","d","e","f",0};
    int i = 0;
    char *pc2 = cString;
    char chr;
    while((chr = *(pc2++)) != 0) {
        if((i++ & 0x7) == 0)
            Tcl_AppendResult(interp, "\n\t\t", NULL);
        Tcl_AppendResult(interp,    x_digit[(chr>>4) & 0xf],
                                    x_digit[(chr>>0) & 0xf],
                                    NULL);
    }
}
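
To try the hex dump outside of AOLserver, a standalone variant can write into a caller-supplied buffer instead of the interpreter result. This is only a sketch: `hex_dump_to_buf` is a hypothetical name, it emits no line breaks, and it silently stops when the buffer is full.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical standalone variant of the hex dump: writes two lowercase hex
 * digits per input byte into `out` (capacity `cap` bytes), NUL-terminates
 * the output whenever cap > 0, and returns the number of input bytes that
 * were dumped. */
size_t hex_dump_to_buf(const char *s, char *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return 0;
    /* Each byte needs two digits, plus one byte reserved for the NUL. */
    while (s[n] != '\0' && 2 * n + 3 <= cap) {
        /* Cast to unsigned char so bytes >= 0x80 are not sign-extended. */
        snprintf(out + 2 * n, 3, "%02x", (unsigned char)s[n]);
        n++;
    }
    out[2 * n] = '\0';
    return n;
}
```

Dumping the four bytes f0 9f 98 80 (a character outside the Basic Multilingual Plane) into a 16-byte buffer yields the string "f09f9880".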
Posted by Gustaf Neumann on
There was a longer discussion about this panic on the tDOM mailing list. The maintainer still seems to believe that the panic is the right behavior:

http://tech.groups.yahoo.com/group/tdom/message/1895

I have provided a real-life example with invalid "user input":
http://tech.groups.yahoo.com/group/tdom/message/1921

Andrew, maybe it would help if you posted your message and fix on the tDOM mailing list.