Forum OpenACS Development: Re: Ain’t No ☀shine

8: Re: Ain’t No ☀shine (response to 1)
Posted by Michael Aram on
Thank you all for your comments!

First, allow me to mention that the concrete examples I have provided are somewhat artificial. Our initial problem was an "unusual" encoding problem (i.e., the browser rendered garbage instead of German umlauts). However, the problem existed only in certain "places" (not "everywhere" on the whole instance), and it did not show up on older instances running NaviServer 4.99.11/Tcl 8.5, but only on newer instances with newer NaviServer versions (with Tcl 8.6 as well as Tcl 8.5).

So, while trying to understand this, we experimented a lot. It was tricky because, for example, ns_log always rendered a three-byte sun, whereas the immediately following ns_return sent a broken one-byte sun to the browser. However, ns_return had no problems with three-byte suns per se. We suddenly realized that "touching" the variable (e.g., via append) seemed to solve the problem completely. During this debugging session, we tried to write the content to a file, just for debugging purposes, and suddenly saw the strange behavior reported in this forum thread. So we had luckily stumbled upon a reproducible snippet that produces a situation similar to the one we encounter (we were not able to produce a "damaged" variable that causes problems for ns_return in any other way).

The surprising thing in my "fconfigure" example is that the variable v is not "written" (modified), but only used as the input source for the file writing process. And this fact alone changes the state of the internal representation in a way that leads to the problematic behavior.

Just to be clear, let me show the problem again, this time with more details about the responses. The example from above (which does not use Tcl 8.6 idioms) triggers the following response from NaviServer 4.99.14 with Tcl 8.5:

set v "SUN☀SUN"
set F [::open /tmp/sun w]; ::fconfigure $F -translation binary; ::puts -nonewline $F $v ; ::close $F
ns_return 200 text/plain $v
The browser treats the response as a download/attachment named "shell", whose content is the content of the variable v (with a broken one-byte "sun"). mitmproxy's hex-output mode shows the following response from the server (the sun is the byte 00 here):
2017-11-15 10:44:29 POST https://example.com/ds/shell
                         ← 200 text/plain 7B 59ms

Server:          nginx
Date:            Wed, 15 Nov 2017 09:43:26 GMT
Content-Type:    text/plain
Content-Length:  7
Connection:      keep-alive
X-User-Id:       297369
X-Thread-Id:     7f2d6affd700
Accept-Ranges:   bytes

HEX VIEWER
0000000000 53 55 4e 00 53 55 4e                              SUN.SUN
However, when I change only binary to auto (or omit the whole second line), the request succeeds:
set v "SUN☀SUN"
#set F [::open /tmp/sun w]; ::fconfigure $F -translation auto; ::puts -nonewline $F $v ; ::close $F
ns_return 200 text/plain $v
The browser renders the expected result, and mitmproxy shows three bytes for the sun.
2017-11-15 10:45:41 POST https://example.com/ds/shell
                         ← 200 text/plain 9B 134ms

Server:          nginx
Date:            Wed, 15 Nov 2017 09:44:38 GMT
Content-Type:    text/plain; charset=utf-8
Content-Length:  9
Connection:      keep-alive
X-User-Id:       297369
X-Thread-Id:     7f2d6affd700

HEX VIEWER
0000000000 53 55 4e e2 98 80 53 55 4e                        SUN...SUN
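Both hex dumps can be reproduced in plain Tcl, without NaviServer. The sketch below rests on an assumption (our reading, not something confirmed in this thread): in the broken case each character was reduced to its low 8 bits, which is what Tcl 8.5/8.6 does when it turns a string containing characters above U+00FF into a byte array. U+2600 & 0xFF is 0x00, which matches the 00 in the first dump, while proper UTF-8 encoding yields e2 98 80 as in the second. The procs low_byte_hex and utf8_hex are hypothetical helpers.

```tcl
# Hypothetical helper: mimic a lossy string -> byte array conversion by
# keeping only the low 8 bits of each character (assumed Tcl 8.5/8.6
# behaviour for characters above U+00FF).
proc low_byte_hex {s} {
    set out {}
    foreach c [split $s ""] {
        lappend out [format %02x [expr {[scan $c %c] & 0xFF}]]
    }
    return [join $out " "]
}

# Hypothetical helper: hex dump of the proper UTF-8 encoding of a string.
proc utf8_hex {s} {
    binary scan [encoding convertto utf-8 $s] H* hex
    regsub -all {..} $hex {& } spaced
    return [string trim $spaced]
}

set v "SUN\u2600SUN"
puts [low_byte_hex $v]   ;# → 53 55 4e 00 53 55 4e
puts [utf8_hex $v]       ;# → 53 55 4e e2 98 80 53 55 4e
```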
So it cannot be a problem specific to Tcl 8.6 or higher, as it exists under Tcl 8.5 as well. Do you expect that the HEAD version of NaviServer also fixes the problem with Tcl 8.5?
9: Re: Ain’t No ☀shine (response to 8)
Posted by Stefan Sobernig on
Hi Michael!

Do you expect, that the HEAD version of NaviServer fixes the problems also with Tcl8.5?

If you make sure that you properly massage the value (Tcl_Obj) that you pass to or receive from an I/O channel with [encoding convertto] and [encoding convertfrom], the examples you showed will work whatever Tcl version you run. This is not an issue of a particular Tcl version. It is a more general requirement whenever you turn an internal Tcl value into an external value, and vice versa.
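A minimal plain-Tcl sketch of that protocol, assuming UTF-8 as the external encoding: [encoding convertto] turns the internal string into the bytes that go out, and [encoding convertfrom] turns incoming bytes back into a string.

```tcl
set v "\u2600"   ;# one character: U+2600 (sun)

# Internal value -> external value: three UTF-8 bytes.
set bytes [encoding convertto utf-8 $v]
puts [string length $bytes]   ;# → 3

# External value -> internal value: back to one character.
set text [encoding convertfrom utf-8 $bytes]
puts [string length $text]    ;# → 1
puts [expr {$text eq $v}]     ;# → 1
```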

browser opens the request as a download/attachment file
This is a symptom of a subtle interaction between failing to comply with the above value protocol and the way NaviServer (presumably after 4.99.11) sniffs the value type. It is not related to Tcl. If one fails to comply with the above, a newer NaviServer will present the broken string not as an octet-stream (file download), but as a broken string rendered by the browser. No win, just a shift.

So, in short (I haven't checked, but that is my understanding):

- use [encoding convertto] and [encoding convertfrom] in a disciplined manner when handing your values to an I/O channel that does not perform the transformation on its own (this is the difference you observe between -translation auto and -translation binary: -translation binary implies -encoding binary, meaning the outgoing value is not touched).

- if you want the unexpected octet-stream responses for broken strings to go away, you should update NaviServer, but you have to get the above right anyway. The emphasis is on the first item.
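The two channel setups from the first item can be compared in plain Tcl. With -translation binary the channel does not encode, so the string is converted explicitly before writing; with -encoding utf-8 the channel does the conversion itself. Both files should end up with the same nine bytes. The /tmp paths and the read_bytes helper are illustrative assumptions.

```tcl
set v "SUN\u2600SUN"

# Channel that does NOT transform: convert explicitly before writing.
set F [open /tmp/sun w]
fconfigure $F -translation binary
puts -nonewline $F [encoding convertto utf-8 $v]
close $F

# Channel that DOES transform: hand it the string as-is.
set F [open /tmp/sun.enc w]
fconfigure $F -encoding utf-8
puts -nonewline $F $v
close $F

# Hypothetical helper: read a file back as raw bytes.
proc read_bytes {path} {
    set F [open $path r]
    fconfigure $F -translation binary
    set data [read $F]
    close $F
    return $data
}

binary scan [read_bytes /tmp/sun] H* hex1
binary scan [read_bytes /tmp/sun.enc] H* hex2
puts $hex1   ;# → 53554ee2988053554e
puts $hex2   ;# → 53554ee2988053554e
```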

HTH, Stefan

10: Re: Ain’t No ☀shine (response to 8)
Posted by Stefan Sobernig on
The surprising thing in my "fconfigure" example is, that the variable v is not "written", but only used as input source for the file writing process. And this fact alone changes the state of the internal representation in a way, that leads to the problematic behavior.
It may be surprising because you were presented with a broken string plus a file download. As I tried to explain, this is (or was) a bad interaction between not shaping the value (bytearray) properly on its way out of Tcl, and NaviServer sniffing for the presence of a bytearray and, because the value is broken, presenting it as an octet-stream.

The fact that a Tcl value (the Tcl_Obj referenced by the variable v) is transmogrified into different internal representations (e.g., different bytearrays, without you noticing), with or without an external (string) representation, is at the heart of Tcl.

Some of the surprise will go away when using [encoding convertto] as shown below because it will take care of not touching any shared value:

set v "\u2600"; # ☀
::xowiki::write_file /tmp/sun [encoding convertto utf-8 $v]
ns_return 200 text/plain $v
The value being written out to the file will be distinct from the one ending up in ns_return. As I said, one must follow the protocol.
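Outside OpenACS, the same pattern can be sketched with a small helper. write_bytes_file is a hypothetical stand-in for ::xowiki::write_file (whose implementation is not shown in this thread); the point is that [encoding convertto] produces a separate value for the file, so v keeps its one-character string and could be handed to ns_return unchanged.

```tcl
# Hypothetical stand-in for ::xowiki::write_file: write already-encoded
# bytes through a channel that performs no transformation.
proc write_bytes_file {path bytes} {
    set F [open $path w]
    fconfigure $F -translation binary
    puts -nonewline $F $bytes
    close $F
}

set v "\u2600"   ;# U+2600 (sun)
write_bytes_file /tmp/sun [encoding convertto utf-8 $v]

# v itself is untouched: still one character, U+2600.
puts [string length $v]          ;# → 1
puts [format %04X [scan $v %c]]  ;# → 2600
# The file holds the three UTF-8 bytes.
puts [file size /tmp/sun]        ;# → 3
```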
12: Re: Ain’t No ☀shine (response to 10)
Posted by Michael Aram on
Thank you, Stefan, for your extensive clarifications. Your approach might be the correct way to deal with this kind of thing when programming in plain Tcl, and perhaps also in special cases when implementing NaviServer/OpenACS applications. However, in the typical case of development "inside the framework" (NaviServer/OpenACS/XO* packages), there is usually no need to convert values from or to UTF-8. The framework already takes care of this, so most attempts to "convert" something end up converting it twice.

Actually, the developer who first faced this bug tried to fix the problem with a conversion, and even thought she had fixed it (using [encoding convertfrom [ns_conn encoding] $result]). However, this "fixed" the problem only for German umlauts, not for the three-byte sun (which was not part of the initially observed problem).

During code review I saw this "conversion-based fix" and immediately suspected that something must be wrong somewhere else. Now that we have found the bug, or at least a solution within the framework, everything is fine! Thank you!