Forum OpenACS Development: Re: Ain’t No ☀shine

10: Re: Ain’t No ☀shine (response to 8)
Posted by Stefan Sobernig on
The surprising thing in my "fconfigure" example is that the variable v is not "written" but only used as the input source for the file-writing process. This fact alone changes the state of the internal representation in a way that leads to the problematic behavior.
It may be surprising because you were presented with a broken string plus a file download. As I tried to explain, this is/was a bad interaction between two things: the value (a bytearray) was not shaped properly on its way out of Tcl, and NaviServer sniffs for the presence of a bytearray and, because the value was broken, delivered it as an octet-stream.

The fact that values in Tcl (here, the Tcl_Obj referenced by variable v) are transmogrified into different internal representations (e.g., different bytearrays, without you noticing), with or without an external (string) representation, is at the heart of Tcl.
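Tcl 8.6 and later include a debugging helper, `::tcl::unsupported::representation`, that reports a value's current internal representation. It is explicitly unsupported and its output (type names, refcounts, pointers) differs between releases, so the minimal sketch below only prints it. It is meant to make the shimmering described above visible in a plain tclsh, not to be used in application code.

```tcl
set v "\u2600"

# A freshly parsed literal: typically a pure string without an internal rep.
puts [::tcl::unsupported::representation $v]

# Asking for the length in characters may give v a string-type internal rep.
string length $v
puts [::tcl::unsupported::representation $v]

# encoding convertto does not modify v; it returns a new bytearray value.
set bytes [encoding convertto utf-8 $v]
puts [::tcl::unsupported::representation $bytes]
puts [::tcl::unsupported::representation $v]
```

Note that nothing in the script assigns to v after the first line, yet its reported representation can change between the first two calls.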

Some of the surprise goes away when you use [encoding convertto], as shown below, because it takes care not to touch any shared value:

set v "\u2600"; # ☀
::xowiki::write_file /tmp/sun [encoding convertto utf-8 $v]
ns_return 200 text/plain $v
The value written out to the file will be distinct from the one ending up in ns_return. As I said, one must follow the protocol.
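The protocol can be sketched in plain Tcl without `::xowiki::write_file`, whose internals are not shown in this thread. The procs `write_utf8` and `file_hex` below are hypothetical helpers for illustration: the value is written once by converting explicitly and writing to a binary channel, and once by handing characters to a channel configured for UTF-8. Either path works on its own; combining them would encode twice.

```tcl
# Hypothetical helper: write $value to $path as UTF-8 using one of two
# equivalent strategies. Never combine them (that would encode twice).
proc write_utf8 {path value strategy} {
    set f [open $path w]
    try {
        switch -- $strategy {
            explicit {
                # Convert to octets ourselves; the channel must not touch them.
                fconfigure $f -translation binary
                puts -nonewline $f [encoding convertto utf-8 $value]
            }
            channel {
                # Hand over characters; the channel encodes them as UTF-8.
                fconfigure $f -encoding utf-8
                puts -nonewline $f $value
            }
            default {
                error "unknown strategy: $strategy"
            }
        }
    } finally {
        close $f
    }
}

# Hypothetical helper: read a file back as raw octets, return them as hex.
proc file_hex {path} {
    set f [open $path r]
    fconfigure $f -translation binary
    set data [read $f]
    close $f
    binary scan $data H* hex
    return $hex
}

set v "\u2600"
close [file tempfile explicitPath]
close [file tempfile channelPath]

write_utf8 $explicitPath $v explicit
write_utf8 $channelPath  $v channel

set hexExplicit [file_hex $explicitPath]
set hexChannel  [file_hex $channelPath]
puts "explicit: $hexExplicit"   ;# explicit: e29880
puts "channel:  $hexChannel"    ;# channel:  e29880
file delete $explicitPath $channelPath
```

In both cases v itself stays a one-character string, so it can still be passed on unchanged afterwards; what must be avoided is converting explicitly and then writing to a channel that encodes again.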
12: Re: Ain’t No ☀shine (response to 10)
Posted by Michael Aram on
Thank you, Stefan, for your extensive clarifications. Your approach might be the correct way to handle this kind of thing when programming in plain Tcl, and perhaps also in special cases when implementing NaviServer/OpenACS applications. However, in the typical case of development "inside the framework" (NaviServer/OpenACS/XO* packages), there is usually no need to convert values from or to UTF-8. The framework already takes care of this, so most attempts to "convert" something end up converting it once too often.
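A minimal illustration of over-converting, assuming (as described above) that the framework already encodes outgoing characters as UTF-8. The sketch simulates the framework's step with a second `encoding convertto`; in NaviServer that step happens inside the server, not in a Tcl call like this one.

```tcl
set v "\u2600"

# Application code "helpfully" converts to UTF-8 ...
set once [encoding convertto utf-8 $v]
# ... and the framework (simulated here) encodes the result again on output.
set twice [encoding convertto utf-8 $once]

binary scan $once  H* hexOnce
binary scan $twice H* hexTwice
puts "once:  $hexOnce"    ;# once:  e29880
puts "twice: $hexTwice"   ;# twice: c3a2c298c280

# Decoding the double-encoded octets once yields three characters
# (U+00E2, U+0098, U+0080), not the sun.
set decoded [encoding convertfrom utf-8 $twice]
puts [string length $decoded]   ;# 3
```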

Actually, the developer who first ran into this bug tried to fix it with an encoding conversion and even thought she had succeeded (using [encoding convertfrom [ns_conn encoding] $result]). However, this "fixed" the problem only for German umlauts, not for the sun, which takes three bytes in UTF-8 (and was not part of the originally observed problem).
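One visible difference between the two test cases can be printed in a plain tclsh: an umlaut such as ä (U+00E4) has a code point that fits in a single byte and takes two octets in UTF-8, while the sun (U+2600) does not fit in a single byte and takes three. This minimal sketch only shows these facts; it does not reproduce the original bug.

```tcl
# Compare an umlaut (U+00E4) with the sun (U+2600).
set lengths {}
foreach ch [list "\u00e4" "\u2600"] {
    set cp [scan $ch %c]                     ;# Unicode code point
    set utf8 [encoding convertto utf-8 $ch]  ;# UTF-8 octets
    binary scan $utf8 H* hex
    lappend lengths [string length $utf8]
    puts [format "U+%04X  fits in one byte: %-3s  UTF-8: %s (%d octets)" \
              $cp [expr {$cp <= 0xFF ? "yes" : "no"}] $hex [string length $utf8]]
}
# Prints:
# U+00E4  fits in one byte: yes  UTF-8: c3a4 (2 octets)
# U+2600  fits in one byte: no   UTF-8: e29880 (3 octets)
```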

During code review I saw this "conversion-based fix" and immediately suspected that something must be wrong somewhere else. Now that we have found the bug, or at least a solution within the framework, everything is fine! Thank you!