I've built a number of i18n sites with UTF-8 on OpenACS 3.x and OpenACS 4.x. I would use the HackContentType parameters in nsd.tcl, and never had to think about it very much.
But I now have a client who wants, for various historical reasons, a static-only site, with the option of introducing dynamic items later on. So I basically have an OpenACS 4.5 installation that's using acs-templating, but no modules other than that.
The problem is that the client's graphic designer doesn't want to use Unicode. Instead, she wants to use files encoded in windows-1255, which is roughly equivalent to iso-8859-8 (aka English and Hebrew). I wouldn't think that this is a problem, but it has caused me no end of headaches:
- I wouldn't mind letting the encoding remain unspecified, and using a meta tag at the top of each .adp page indicate the encoding. But the Content-type header always comes with an explicit declaration of iso-8859-1.
- If I do nothing, then the files are declared to be iso-8859-1, meaning that the Hebrew characters look like Western European accented vowels.
- If I use HackContentType to set the encoding explicitly to iso-8859-8, then I get question marks instead of Hebrew characters.
- The encoding system and ns/mimetypes and ns/encodings tricks that I've seen mentioned on the bboard don't seem to have any effect. I still get the question marks.
- I even modified the acs-templating file-reading utility, such that it explicitly sets the encoding on the input file descriptor to iso-8859-8. No such luck.
- I'm definitely getting question marks sent in the HTTP response. My browser is accurately reflecting what it received; the problem is not a matter of fonts or browser confusion. ngrep clearly indicates that I'm getting a long string of 0x3F characters.
I assume that the problem is that OpenACS templates are somehow assuming that the files on disk -- which are encoded in iso-8859-8 -- are actually encoded in UTF-8, and that this is confusing someone. But I can't figure out just who is getting confused, or why the characters are being returned to me in a garbled state.
Any suggestions? I think that the client is beginning to think that I don't know what I'm talking about with i18n, and I'm beginning to wonder if they're right...
Request notifications