Forum OpenACS Development: Re: cr_write_content and utf-8

Posted by Emmanuelle Raffenne on
Hi Brian,

I had a similar problem with cr_write_content. When content_type is "file" and -string is set, it configures the output channel like this:

fconfigure $fd -translation binary -encoding [encoding system]

I checked the encoding on our server (Debian) both in tclsh and in the devsup shell and, to my surprise, got different results: utf-8 in tclsh but iso8859-1 in the devsup shell. On my laptop (Mac), however, both return utf-8. The difference between our server and my laptop, besides the OS, is that the server is running AOLserver 4.5 and my laptop 4.0.10.

We couldn't figure out why "encoding system" was set to iso8859-1 on our server, so our workaround was to add "encoding system utf-8" to the config file.
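
As a rough sketch, the workaround could look like the fragment below. It assumes the server config file is evaluated as a Tcl script; the guard and the comments are illustrative additions, and only the "encoding system utf-8" call itself comes from the workaround described above.

```tcl
# Sketch of the workaround: force the Tcl system encoding to utf-8
# when the server comes up with something else (e.g. iso8859-1).
# The system encoding is process-wide, so this affects all threads.
if {[encoding system] ne "utf-8"} {
    encoding system utf-8
}
```

Calling "encoding system" without an argument afterwards (for example in the devsup shell) shows whether the setting took effect.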

I hope it will help.

Posted by Emmanuelle Raffenne on
Correction:

Where it says 'When content_type is "file"', it should read 'When storage_type is "file"'.

Posted by Brian Fenton on
Thanks Emmanuelle! That looks useful - I'll take a look at our OS. Does my test case work on your system? I'd be very grateful if you could report back.

thanks
Brian

Posted by Gustaf Neumann on
Emmanuelle,

Are you sure that, on your server, tclsh and AOLserver were running with the same environment variables and linked against the same Tcl shared libs? On both Mac OS X and lenny/sid, with AOLserver 4.5.1 and 4.0, I always get utf-8 for [encoding system] in tclsh and in ds/shell.

Background: During initialization, Tcl determines the default system encoding from the LC_* or LANG environment variables. If nothing can be found, it uses TCL_DEFAULT_ENCODING, which is set depending on the OS; under Mac OS X, for example, TCL_DEFAULT_ENCODING is utf-8. If configure can't determine anything, the final default system encoding is "iso8859-1". Later, Tcl's system encoding can be altered on the scripting layer via "encoding system ?XXX?" or from C via Tcl_SetSystemEncoding(). AOLserver 4.0.10/4.5.1 does not set it via Tcl or C. NaviServer has a config variable named "systemencoding" and sets the encoding in init.tcl (if nothing is specified, it defaults to utf-8).
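
To compare what tclsh and the server process actually see, something like the following can be run in both places. It is only a diagnostic sketch: the proc name locale_report is made up for illustration, and LC_ALL, LC_CTYPE and LANG are picked as the locale variables most likely to matter here, not as an exhaustive list.

```tcl
# Hypothetical helper: report the locale-related environment variables
# visible to this Tcl interpreter, plus the resulting system encoding.
proc locale_report {} {
    set lines {}
    foreach var {LC_ALL LC_CTYPE LANG} {
        if {[info exists ::env($var)]} {
            lappend lines "$var = $::env($var)"
        } else {
            lappend lines "$var is not set"
        }
    }
    lappend lines "encoding system = [encoding system]"
    return $lines
}

# Print one item per line; in ds/shell, return the joined string instead.
puts [join [locale_report] "\n"]
```

If the two outputs differ in the environment variables, the server was most likely started with a different environment than the login shell.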

Note that when you load a library file or a www/*tcl script that sets the encoding via "encoding system ...", it is set for the whole server (all threads), because the system encoding is a global variable in the Tcl implementation. The only OpenACS package that sets the system encoding is lors-central (most likely not a good idea).

It is a good idea to check the LANG variable in your startup script for AOLserver and, when in doubt, to use something like LANG=en_US.UTF-8

Hope this helps and all the best
-gustaf neumann

Posted by Emmanuelle Raffenne on
Hi Gustaf,

Thanks for your answer.

I am not sure about the configuration of the server at installation time; I need to check with Héctor on that. From what I can see, LANG is set to es_ES.UTF-8 or en_US.UTF-8 for all the users involved (the aolserver one, etc.), the default being es_ES.UTF-8.

Regarding setting "encoding system" from inside OpenACS, I already grep'd the whole tree when we first noticed the difference, and indeed the only package that sets it is lors-central. In our case, though, (1) we don't use it and (2) it sets it to utf-8 anyway.

Also, trying to run Brian's test case on my Mac (so UTF-8 in all cases), I noticed that "fconfigure $channel -translation binary" would use iso8859-1 unless -encoding is set. I tested with a text file encoded in utf-8; the encoding of the new file is iso8859-1. Note that the content uses Spanish-specific characters like "ñ".
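
The effect can be reproduced with a small sketch like the one below. The helper bytes_written is hypothetical, the temporary file location (TMPDIR or /tmp) is an assumption, and the {*} syntax needs Tcl 8.5 or later. In Tcl 8.x, "-translation binary" also switches the channel encoding to binary, so each character is written as a single byte unless -encoding is given afterwards.

```tcl
# Hypothetical helper: write $text to a temporary file with the given
# fconfigure options and return the number of bytes that ended up on disk.
proc bytes_written {text fconfigure_opts} {
    # Assumption: TMPDIR or /tmp is writable by the current user.
    set dir [expr {[info exists ::env(TMPDIR)] ? $::env(TMPDIR) : "/tmp"}]
    set path [file join $dir "enc-test-[pid].txt"]
    set fd [open $path w]
    fconfigure $fd {*}$fconfigure_opts
    puts -nonewline $fd $text
    close $fd
    set size [file size $path]
    file delete $path
    return $size
}

# "\u00f1" is the character "ñ".
# Binary translation only: one byte (0xF1), which reads back as iso8859-1.
puts [bytes_written "\u00f1" {-translation binary}]
# Explicit -encoding after -translation binary: two bytes (UTF-8 0xC3 0xB1).
puts [bytes_written "\u00f1" {-translation binary -encoding utf-8}]
```

This is also why the -encoding [encoding system] part of the fconfigure call in cr_write_content matters: the resulting file depends on whatever the system encoding happens to be.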

:-S

Posted by Gustaf Neumann on
Are you saying that LANG of nsd is set to en_US.UTF-8 and the result of [encoding system] is "iso8859-1"?

Posted by Emmanuelle Raffenne on
Gustaf,

Yes, that's what I am saying :S.

Héctor and I just checked again, in case we were missing something, but got the same result. The user who runs AOLserver has LANG set to UTF-8 (we tried both en_US.UTF-8 and es_ES.UTF-8, just in case) and we still get iso8859-1 when running "encoding system" in the Tcl shell of dev-support, while we get "utf-8" from tclsh. Very strange.

Posted by Gustaf Neumann on
Just in case: there is a difference between logging in as a user and running a command as a user. If you execute the command

set ::env(LANG)

in ds/shell, do you get "en_US.UTF-8" as result? What Tcl version are you using on the server in question?
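
For the version question, a sketch along these lines can be run in both tclsh and ds/shell to see whether the two use the same Tcl installation. The proc name tcl_runtime_info is made up for illustration, "dict" needs Tcl 8.5 or later, and "info nameofexecutable" may return an empty string inside an embedding server.

```tcl
# Hypothetical helper: collect facts about the running Tcl interpreter
# so that tclsh and the server process can be compared side by side.
proc tcl_runtime_info {} {
    return [dict create \
        patchlevel [info patchlevel] \
        library    [info library] \
        executable [info nameofexecutable] \
        encoding   [encoding system]]
}

# Print each key/value pair on its own line.
dict for {key value} [tcl_runtime_info] {
    puts "$key = $value"
}
```

Different values for patchlevel or library would suggest that the server is linked against a different Tcl build than the tclsh on the command line.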