Forum OpenACS Development: Re: Ain’t No ☀shine

11: Re: Ain’t No ☀shine (response to 1)
Posted by Gustaf Neumann on
The short answer to your question, whether the tip version of NaviServer will work on that kind of data, is: test it! I have set up a test environment with Tcl 8.5, and yes, it works.

The whole situation with Tcl byte arrays is tricky (to use a polite word). I am pretty sure you do not want to know all the details, but since you brought it up, here it goes. I have probably not got everything completely correct, and the details certainly change between Tcl versions.

A "bytearray" is an internal type of a Tcl_Obj, used for interpreting the internal representation of a Tcl value (not necessarily a Tcl variable). There are situations where a Tcl byte array really represents binary data (e.g. the content of an image); situations where byte arrays are used for content that has a string representation in UTF-8 but does not fit into Tcl's internal UCS-2 representation (e.g. UTF-8 characters longer than 3 bytes); and situations where one has a subset of UTF-8 that fits into UCS-2.

Depending on the situation, Tcl has internal machinery that tries to detect when it is safe to use a byte array directly, based on whether or not the byte array has a string rep (a byte array without a string rep is called a pure byte array). Unfortunately, it is also possible to create a string rep in situations where there should not be one, e.g. when one interactively enters a command returning a pure byte array, or when the Tcl C-API function Tcl_GetStringFromObj() is called on such a Tcl_Obj, e.g. in an ns_log or some other command used for debugging. In these cases, it easily happens that the wrong representation is chosen. This is where convertto/convertfrom come into play, which are mostly there for converting to/from the UCS-2 rep and producing proper pure Tcl byte arrays.
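To make the difference between characters and bytes concrete, here is a minimal sketch in plain Tcl. The variable names are made up for illustration, and the characters are written as \u escapes so that the result does not depend on the encoding of the script file.

```tcl
# Compare the character length of a string with the byte length of its
# UTF-8 encoding. All names here are illustrative only.
foreach {name char} [list a "a" uuml "\u00fc" sun "\u2600"] {
    # encoding convertto returns a byte array holding the UTF-8 bytes
    set bytes [encoding convertto utf-8 $char]
    # render the raw bytes as hex digits
    binary scan $bytes H* hex
    puts "$name: [string length $char] char, [string length $bytes] byte(s), hex $hex"
}
```

Each of the three values is a single character, but the UTF-8 byte arrays have 1, 2 and 3 bytes respectively. On Tcl 8.6 and later, ::tcl::unsupported::representation can be used to peek at the internal type of a value, but as the name says it is unsupported and its output may change.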

For a Tcl application developer, it is not so easy to know when convertto is necessary. Some good Tcl guys are working to improve the situation.

For example, starting with Tcl 8.7a1, Tcl now has two types of Tcl byte arrays: the classical one and a so-called "proper" bytearray, which replaces the former pure bytearray and makes it more robust against creating string reps. In addition to these, there were many changes to address various bug reports from that area.

Just checking whether "German umlauts are working or not" is not a good indicator of whether the encoding is right. Here is a pure Tcl example that shows the umlaut correctly (since it is a 2-byte char) while the black sun is damaged. The code without the convertto operations works fine only for "a" and "ü" (2 bytes in UTF-8), but not for "☀" (3 bytes in UTF-8).

foreach v {"a" "ü" "☀"} {
    puts "v <$v> v1 <[binary decode base64 [binary encode base64 $v]]>"
}

With convert operations at the right places, it works fine:

foreach v {"a" "ü" "☀"} {
    puts "v <$v> v1 <[encoding convertfrom utf-8 [binary decode base64 [binary encode base64 [encoding convertto utf-8 $v]]]]>"
}
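The same round trip can be wrapped into two small helpers, so that the conversion is never forgotten at a call site. This is only a sketch: the proc names are hypothetical (not from Tcl, NaviServer or OpenACS), and it assumes Tcl 8.6 or newer, where binary encode and binary decode are available. As far as I understand, Tcl 8.6 looks only at the low 8 bits of each character when it builds the byte array, which would explain why "ü" (U+00FC) survives the unconverted version and "☀" (U+2600) does not; newer Tcl versions may behave differently or raise an error instead.

```tcl
# Hypothetical helpers (names are illustrative, not part of any library):
# base64-encode a Tcl string via its UTF-8 bytes, and decode it again.
proc b64_encode_utf8 {str} {
    # string -> UTF-8 byte array -> base64 text
    return [binary encode base64 [encoding convertto utf-8 $str]]
}

proc b64_decode_utf8 {b64} {
    # base64 text -> UTF-8 byte array -> string
    return [encoding convertfrom utf-8 [binary decode base64 $b64]]
}

# \u escapes keep the script independent of the source file encoding
foreach v [list "a" "\u00fc" "\u2600"] {
    set roundtrip [b64_decode_utf8 [b64_encode_utf8 $v]]
    puts "v <$v> roundtrip <$roundtrip> equal: [expr {$v eq $roundtrip}]"
}
```

With the explicit conversions on both sides, all three values should come back unchanged, regardless of how many bytes they need in UTF-8.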
Nobody is happy about this; people are working to make it better in Tcl 9.

Hope this helps
-gn

13: Re: Ain’t No ☀shine (response to 11)
Posted by Michael Aram on
Thank you very much for your answer. I am indeed interested in "all the details", so thank you for the extensive explanation. I did not immediately test your fix, as you were explicitly referring to Tcl versions 8.6 and 8.7. I simply thought that if you don't expect it to work for Tcl 8.5, for whatever reason, there is no point in setting up a HEAD-version test environment.

Anyhow, I have now compiled a NaviServer that has the same versions for all its components except for NaviServer itself (from 4.99.14 to 4.99.16d10/HEAD), and I can confirm that the problem disappears when using the HEAD version that includes your fix. The problem also disappears when doing an append myvar "".

So, finally, the problem seems to be solved. I will simply add the workaround and wait for the next NaviServer release.

Thank you, again, for your help!!

PS: Actually, the "culprit" in the "real code" was an ns_log line a few lines above (my log 3-[encoding convertfrom [ns_conn encoding] $interactionParams(preview)]). The problem disappears as well when I simply remove that line. It seems to have a similar effect as the -translation binary in the artificial example.