Forum OpenACS Q&A: Response to little problem with nsd8x

Posted by Henry Minsky on
Actually, that's a bit of a can of worms you just found. Tcl 7 did not use Unicode internally,
so it had the feature (or bug) of sometimes passing 8-bit characters through unchanged.

Tcl 8 uses Unicode strings internally, so it needs to be told which encoding its
input and output are in; only then can it convert correctly between those encodings and Unicode.
This is what allows Japanese and other non-ASCII text to be handled properly.
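Since the post itself has no code, here is a small Python 3 analogy (not Tcl or AOLserver code) of what "being told the encoding" means: bytes are decoded into Unicode using the charset they were written in, processed as Unicode, and re-encoded for output. The helper names `to_unicode` and `from_unicode` are hypothetical, and Shift_JIS is just one example of a Japanese encoding.

```python
# Python 3 analogy, not AOLserver/Tcl code: Unicode inside,
# explicit charsets at the input and output boundaries.

def to_unicode(raw: bytes, charset: str) -> str:
    """Hypothetical helper: decode incoming bytes using their declared charset."""
    return raw.decode(charset)

def from_unicode(text: str, charset: str) -> bytes:
    """Hypothetical helper: encode internal Unicode text for output."""
    return text.encode(charset)

# "Nihongo" (Japanese), as it might arrive from a Shift_JIS page: 6 bytes.
incoming = "日本語".encode("shift_jis")

# Correct: decoding with the right charset yields the 3 original characters.
text = to_unicode(incoming, "shift_jis")

# The same internal string can be re-encoded for a UTF-8 client (9 bytes).
outgoing = from_unicode(text, "utf-8")

# Wrong: ISO-8859-1 maps each byte to one character, producing 6 garbage characters.
wrong = to_unicode(incoming, "iso-8859-1")
```

The point is that the conversion is only correct when the declared input charset matches the bytes; the internal Unicode string is then independent of whatever charset is used on output.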

However, the AOLserver maintainers did not properly fix AOLserver to use Unicode.
So ArsDigita ended up having to patch AOLserver. You need to use the latest
ArsDigita release (+ad12). I think it will default to ISO-8859-1 encoding,
unless Rob set it to use UTF-8 (I thought I saw something about this).

You should look at my patches at http://www.ai.mit.edu/people/hqm/openacs and http://imode.arsdigita.com/i18n
for some advice on how to deal with character sets. That said, setting
URLCharset and OutputCharset to ISO-8859-1 in your .tcl init file should mostly work.
The remaining issue is how .tcl and .adp files are sourced from disk. Previously,
.tcl files were interpreted using the default Tcl system encoding, which was
usually ISO-8859-1, but .adp files were read as raw UTF-8. You need to patch
things a little to get consistent behavior.
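To make the inconsistency concrete, here is a Python 3 analogy (again, not AOLserver code): one file saved as UTF-8 on disk, read once the way .tcl files were (as ISO-8859-1) and once the way .adp files were (as UTF-8). The `read_source` helper and the `SOURCE_ENCODING` setting are hypothetical, and the assumption that source files are saved as UTF-8 is mine; the actual patch would live in AOLserver's file-sourcing code.

```python
import tempfile
from pathlib import Path

# Assumption: source files on disk were saved as UTF-8 by the editor.
SOURCE_ENCODING = "utf-8"

def read_source(path: Path, encoding: str = SOURCE_ENCODING) -> str:
    """Hypothetical single reader used for both .tcl and .adp files."""
    return path.read_text(encoding=encoding)

with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "page.adp"
    page.write_bytes("café".encode("utf-8"))
    raw = page.read_bytes()              # b'caf\xc3\xa9': "é" is two bytes in UTF-8

    # Inconsistent: two readers with two different encodings.
    as_tcl = raw.decode("iso-8859-1")    # each byte becomes a character: 'cafÃ©'
    as_adp = raw.decode("utf-8")         # 'café'

    # Consistent: every file goes through one reader with one declared encoding.
    consistent = read_source(page)
```

Whichever encoding is chosen matters less than using the same one for every kind of file that gets sourced.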

Although Rob put the needed hooks
into AOLserver, the toolkit developers at ArsDigita didn't make it a real priority
to integrate charset encoding properly into the ACS toolkit releases, since everyone speaks English,
right? At least everyone with money in their pockets. It was also hard for the
developers to test.

My hope is that once OpenACS 4 is released, we can integrate and document a consistent way to control charset
encoding, and tie that in with the acs-lang module, which
handles message catalogs for translation, along with some other goodies like Tcl routines for internationalized time and
date formatting.