Forum OpenACS Development: Re: About Charsets

4: Re: About Charsets (response to 1)
Posted by Eduardo Pérez on
<blockquote> The quick answer is that the idea is to send utf-8 from multilingual servers (this is what the translation server does). We set the charset in an HTTP header and Mozilla, IE, and Opera seem to understand this fine. We should probably set the charset in the HTML code as well (don't remember the syntax right now).
</blockquote>

Setting the charset in the HTML header as well is a good idea in case someone saves the page and opens it later. I don't think most browsers add the HTTP charset to the HTML header when saving a page that lacks one.
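For reference, here is a minimal sketch of declaring the same charset in both places. The helper names `content_type_header` and `html_page` are made up for illustration and are not OpenACS code; the `<meta http-equiv="Content-Type">` form is the standard HTML way to repeat the HTTP header inside the document.

```python
CHARSET = "utf-8"


def content_type_header(charset: str = CHARSET) -> str:
    """Value for the HTTP Content-Type header (hypothetical helper)."""
    return f"text/html; charset={charset}"


def html_page(body: str, charset: str = CHARSET) -> str:
    """Wrap body in a page that repeats the charset in a meta tag,
    so the declaration survives when the page is saved to disk."""
    meta = (
        '<meta http-equiv="Content-Type" '
        f'content="{content_type_header(charset)}">'
    )
    return (
        "<html>\n<head>\n"
        f"{meta}\n"
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )


page = html_page("<p>Eduardo Pérez</p>")
payload = page.encode(CHARSET)  # bytes actually sent to the browser
```

The point is that the header and the meta tag are built from the same value, so they cannot disagree.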

<blockquote> All locales that are not represented with ISO-8859-1 are exported to utf-8 catalog files.
</blockquote>

Why?
Why not have all the catalog files in UTF-8, as (for example) the GNOME project does with its po files?

5: Re: About Charsets (response to 4)
Posted by Jeff Davis on
> Why not have all the catalog files in UTF-8, as (for example) the GNOME project does with its po files?

The problem is that if people edit the file without their local editor set to utf-8, inserting any high-bit characters will mess up the file. At a guess, I would say the majority of developers run their editor with the charset set to iso-8859-1 or iso-8859-15.
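To make the failure mode concrete, here is a small sketch of what happens when iso-8859-1 bytes get appended to a utf-8 file. The strings are invented sample data, and `is_valid_utf8` is just an illustrative helper.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as utf-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A catalog line saved correctly as utf-8 (sample text, not a real catalog).
catalog = "Enregistré".encode("utf-8")

# The same kind of text typed in an editor that writes iso-8859-1.
edit = " déjà".encode("iso-8859-1")

# The file now mixes two encodings and is no longer valid utf-8.
mixed = catalog + edit

print(is_valid_utf8(catalog))
print(is_valid_utf8(mixed))
```

A check like this could be run over catalog files before they are committed, whichever encoding ends up being chosen.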

The reason this came up is that Tcl has (or maybe had?) a problem with iso-8859-6 (Arabic): the digits are mapped to the Unicode Arabic numeral code points, so the round trip iso-8859-6 -> utf-8 -> iso-8859-6 was not lossless (it did not give back the original bytes). The simple solution was to store Arabic text in utf-8, which is how we ended up where we are.
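A round-trip check of this kind can be sketched as follows. This uses Python's codecs, not Tcl's, so it does not reproduce the Tcl behaviour described above and may well give a different result for iso-8859-6; `roundtrip_failures` is a hypothetical helper name.

```python
def roundtrip_failures(encoding: str) -> list[int]:
    """Return the byte values that do not survive
    encoding -> utf-8 -> encoding unchanged."""
    failures = []
    for value in range(256):
        raw = bytes([value])
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # byte is unassigned in this encoding, nothing to test
        try:
            # Go through utf-8 and come back to the original encoding.
            back = text.encode("utf-8").decode("utf-8").encode(encoding)
        except UnicodeEncodeError:
            failures.append(value)
            continue
        if back != raw:
            failures.append(value)
    return failures


for name in ("iso-8859-1", "iso-8859-6"):
    print(name, roundtrip_failures(name))
```

An empty list means every assigned byte comes back unchanged for that encoding in this particular codec implementation.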

Maybe it would be better to store everything in utf-8, but if we do, I expect we will end up with some messed-up catalog files at some point (although this is true no matter what encoding we choose).

Eduardo, when you edit a file, is your editor in utf-8 or iso-8859-1?