Forum OpenACS Development: special characters at tcl page

Posted by Iuri Sampaio on
Hi there,

I created a script that reads files and directories from the file system.
However, the names of files and directories that contain special characters are not assigned to variables correctly.

For example, there is a directory called "Conversação".
This name ends up in the Tcl list as "Conversação".

The relevant part of the code is:

set items [glob -nocomplain "/usr/local/aolserver/servers/cbm/Arquivos/*"]
ns_log Notice "ITEMS: $items"

which produces the following list:

Notice: ITEMS: /usr/local/aolserver/servers/cbm/Arquivos/Rodrigo {/usr/local/aolserver/servers/cbm/Arquivos/Arquivos Geral} /usr/local/aolserver/servers/cbm/Arquivos/Guilherme /usr/local/aolserver/servers/cbm/Arquivos/Informatica /usr/local/aolserver/servers/cbm/Arquivos/Conversação

I already checked the system encoding:
ns_log Notice "[encoding system]"

The result is iso8859-1.

I also tried to force an encoding conversion, as in this line:

set items [encoding convertto iso8859-1 $items]

At first I thought it was only a matter of how error.log displays what is logged (showing the UTF-8 bytes as characters), especially because the words have the same length.

But the error persists when I insert the name into the file-storage, which means the folder is created with the name "Conversação" instead of "Conversação".

I believe this issue is related to encoding.
What would be the correct encoding for Latin American words?

Posted by Brian Fenton on
Hi Iuri

I'm afraid I don't have an answer for you but it looks like it may be an issue with glob and encoding. If you try to reproduce the problem in tclsh, that may help to narrow it down. Take a look at this discussion for some pointers:
http://objectmix.com/tcl/352730-tclhttpd-utf-8-a-2.html

There is a script referenced in the article that may be of help: http://www.logic.at/people/avl/stuff/convertNamesToUtf8.tcl
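A minimal sketch of such a tclsh check, assuming the directory from the original post (adjust the path as needed). It lists the names under two system encodings and prints their lengths in characters, because the terminal may display the names misleadingly while the length does not depend on the display:

```tcl
# Assumption: this is the directory from the original post; change it if needed.
set dir "/usr/local/aolserver/servers/cbm/Arquivos"

# Remember the encoding tclsh started with so it can be restored afterwards.
set saved [encoding system]
puts "system encoding at startup: $saved"

# Tcl decodes file names with the system encoding, so list the directory
# once under each candidate encoding and compare the name lengths.
foreach enc {iso8859-1 utf-8} {
    encoding system $enc
    foreach path [glob -nocomplain -directory $dir *] {
        set name [file tail $path]
        puts "$enc: $name ([string length $name] chars)"
    }
}

# Put the original system encoding back.
encoding system $saved
```

If a name is two characters longer under iso8859-1 than under utf-8 for two accented letters, the names on disk are probably stored as UTF-8.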

hope this helps
Brian

Posted by Iuri Sampaio on
Hi Brian,

Thanks for your help.
[glob] wasn't the issue. It was indeed an encoding problem, although I didn't find out exactly what the cause was. Somehow AOLServer turns the Latin letters that have accents (e.g. ç, Â, ã, â, ó, ...) into other symbols.
The good thing is that it follows a pattern.

I solved it manually: I read the whole error.log and looked for all the special characters that were generated.
Then, in the script, I used the Tcl command [string map {...} ...] to replace them with the valid letters.

From that script we could build a new API in the OACS core to handle this weird issue specifically.

It would be great if we had time to look at this and talk to the OCT people. I am open to discussion and happy to contribute, or even improve it, however I can.

cheers,

Posted by Emmanuelle Raffenne on
Hi Iuri,

Check the result of "encoding system" in the developer support shell. It should be "utf-8", but I suspect you will get iso8859-1.

If that is the case, add "encoding system utf-8" at the end of your config.tcl file. That should fix it.
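As a rough sketch, the end of config.tcl could look like this. The ns_log line is only an assumption added here so the setting can be confirmed in the error log after a restart:

```tcl
# At the end of config.tcl: make Tcl treat file names and other
# system strings as UTF-8.
encoding system utf-8

# Assumption: optional check, written to the error log at startup.
ns_log Notice "system encoding is now: [encoding system]"
```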

Posted by Emmanuelle Raffenne on
Iuri,

Also make sure your database encoding is utf-8.

Posted by Dave Bauer on
It looks like you treated the symptoms but did not actually address the underlying problem. Either there is a filesystem/AOLserver encoding mismatch, or there is a problem in the filename handling in Tcl or AOLserver.

What is the encoding of the filesystem and the default encoding for AOLserver on your installation?

Posted by Christian Eva on
Hi Iuri,
I think your problem is that the names are converted to utf-8 twice, maybe once by Tcl and once by AOL-Server.

You can convert the sequence you get back into the proper characters with:
set items [encoding convertfrom utf-8 $items]
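To see why that conversion helps, here is a small sketch that can be run in tclsh. It assumes the garbling comes from UTF-8 bytes being treated as one character per byte, which I have not confirmed on your server, and "undo_double_utf8" is a made-up helper name, not an existing Tcl or OpenACS proc:

```tcl
# Hypothetical helper (not part of Tcl, AOLserver or OpenACS).
# It treats each character of the input as one byte and decodes
# that byte sequence as UTF-8.
proc undo_double_utf8 {name} {
    return [encoding convertfrom utf-8 $name]
}

# The correct name, written with escapes so the script file's own
# encoding does not matter.
set good "Conversa\u00e7\u00e3o"

# Simulate the suspected fault: the UTF-8 bytes of the name end up
# as individual characters, so each accented letter becomes two.
set garbled [encoding convertto utf-8 $good]

puts [string length $good]     ;# 11
puts [string length $garbled]  ;# 13
puts [string equal [undo_double_utf8 $garbled] $good]  ;# 1
```

Only apply this to names that really are garbled. A name that is already correct and contains accented letters should be left alone, so it is safer to loop over the list with foreach and convert the affected elements than to convert everything blindly.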

Regards