Forum OpenACS Q&A: Response to Unicode Characters?

Posted by Henry Minsky on
You are probably getting Microsoft characters (the "smart" quotes, dashes and similar punctuation that Windows stores in bytes 0x80 to 0x9F). These are not valid ASCII, and in ISO-8859-1 those bytes are only unprintable control codes. When Tcl reads the page from disk it converts it to Unicode, assuming the bytes are ISO-8859-1, so those characters come out wrong. The only fix is to make sure the
g**damn Microsoft characters are removed before you load the file, or else to tell AOLserver that the charset is CP1252 (I think that's the
official name for it; IANA registers it as windows-1252, and Tcl's encoding name is cp1252).
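To see what goes wrong, here is a minimal sketch that decodes the same bytes both ways. It is written in Python rather than Tcl purely for illustration, and the byte string and the `decode_both` helper are made-up examples:

```python
# A made-up byte string as a Word-produced page might store it:
# 0x93/0x94 are CP1252 curly double quotes, 0x97 is an em dash.
RAW = b"\x93Hello\x94 \x97 world"


def decode_both(raw: bytes) -> tuple[str, str]:
    """Return (ISO-8859-1 decoding, CP1252 decoding) of the same bytes."""
    # ISO-8859-1 maps every byte, so this never fails, but 0x80-0x9F
    # become invisible C1 control characters rather than punctuation.
    as_latin1 = raw.decode("iso-8859-1")
    # CP1252 maps the same bytes to the punctuation that was intended.
    as_cp1252 = raw.decode("cp1252")
    return as_latin1, as_cp1252


if __name__ == "__main__":
    latin1, cp1252 = decode_both(RAW)
    print(ascii(latin1))  # '\x93Hello\x94 \x97 world'
    print(ascii(cp1252))  # '\u201cHello\u201d \u2014 world'
```

Both decodings "succeed", which is why the problem shows up as garbled pages rather than as an error.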

There was a Perl script that MarkD had, called "demoroniser" (http://www.fourmilab.ch/webtools/demoroniser/), that would
do the substitutions. The problem with doing it in Tcl is that by the time you see the text
it's too late, unless you set the channel encoding when you
read the file from disk. You can do this in Tcl, but for your sanity
I recommend converting documents to ISO-8859-1 before trying to serve them from AOLserver. Keeping some of your documents in ISO-8859-1 and others in CP1252 is too painful to manage; it's better to just convert everything to
a common format. Microsoft's charset also doesn't map cleanly: its extra characters do have Unicode equivalents (0x93, for example, is U+201C LEFT DOUBLE QUOTATION MARK), but five of its byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) are not assigned to any character at all.
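A demoroniser-style conversion can be sketched in Python too. This is a minimal sketch, not demoroniser itself (which is a Perl script with its own, larger set of rules): `SUBSTITUTIONS`, `cp1252_to_latin1` and `convert_file` are hypothetical names, and the substitution table is an illustrative assumption. The key step is decoding with the encoding the file was actually written in, which is the same idea as setting the channel encoding in Tcl with `fconfigure $chan -encoding cp1252` before reading.

```python
# Hypothetical substitution table (an assumption, not demoroniser's rules):
# map common CP1252-only punctuation to plain ASCII equivalents.
SUBSTITUTIONS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "--",  # en dash, em dash
    "\u2026": "...",                # horizontal ellipsis
    "\u2022": "*",                  # bullet
    "\u2122": "(TM)",               # trade mark sign
}


def cp1252_to_latin1(data: bytes) -> bytes:
    """Convert CP1252 bytes to ISO-8859-1 bytes, substituting punctuation."""
    # Decode as CP1252; the five unassigned bytes become U+FFFD
    # instead of raising UnicodeDecodeError.
    text = data.decode("cp1252", errors="replace")
    # Swap the Microsoft-only punctuation for ASCII stand-ins.
    text = "".join(SUBSTITUTIONS.get(ch, ch) for ch in text)
    # Anything still outside ISO-8859-1 (euro sign, U+FFFD) becomes "?".
    return text.encode("iso-8859-1", errors="replace")


def convert_file(src: str, dst: str) -> None:
    """Rewrite a CP1252 file at src as an ISO-8859-1 file at dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(cp1252_to_latin1(data))
```

For example, `cp1252_to_latin1(b"it\x92s")` returns `b"it's"`, while accented letters such as `\xe9` pass through unchanged because CP1252 and ISO-8859-1 agree on them. Characters with no ISO-8859-1 equivalent and no table entry (the euro sign at 0x80, for instance) come out as `?`, so the table would need extending for whatever your documents actually contain.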