Forum OpenACS Development: ad_httpget and tdom

Posted by Guan Yang on

There is a pitfall if you use ad_httpget to fetch XML documents that declare an encoding and then parse them with tdom.

The problem is that ad_httpget returns a Tcl string that is usually already UTF-8. When tdom parses it and sees encoding="iso-8859-1" in the XML declaration, it assumes the string is ISO-8859-1 and converts the encoding a second time. The result is ugliness wherever the document contains non-ASCII ISO-8859-1 characters.

The solution I found was to remove the XML declaration (the <?xml ... ?> line at the top) with a regexp before sending the document to tdom.
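
Something along these lines should do it. This is only a rough sketch: the proc name strip_xml_declaration is made up for illustration, and the regexp assumes the declaration sits at the very start of the document and contains no ">" before its closing "?>". The helper itself is plain Tcl; the OpenACS usage is left in comments because it needs ad_httpget and tdom loaded, and $url is a placeholder.

```tcl
# Hypothetical helper (not part of OpenACS or tdom): remove a leading
# XML declaration so the parser does not see an encoding attribute.
proc strip_xml_declaration {xml} {
    # Match only at the start of the string; requiring whitespace after
    # "<?xml" leaves instructions such as <?xml-stylesheet ...?> alone.
    regsub {^\s*<\?xml\s[^>]*\?>\s*} $xml {} xml
    return $xml
}

# Small demonstration with a document that claims to be ISO-8859-1.
set sample "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<greeting>bl\u00e5b\u00e6r</greeting>"
puts [strip_xml_declaration $sample]

# Usage sketch inside OpenACS (assumes ad_httpget and tdom are available):
#   array set result [ad_httpget -url $url]
#   set doc [dom parse [strip_xml_declaration $result(page)]]
```

Note that this only helps when the string really has been converted already. If the bytes were still ISO-8859-1 at that point, stripping the declaration would hide the information tdom needs.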