Forum OpenACS Q&A: Re: Asian Characters and UTF-8 Encoding - Any experiences?

Posted by Brian Fenton on
I've certainly had no problems with double-byte characters using a UTF-8 database, but I've only been using European letters. Maybe somebody else can confirm that there are no issues with Asian characters.

The latest version of the OpenACS install docs recommends a Character Set of UTF8:

http://openacs.org/doc/current/oracle.html

Best wishes,
Brian

Posted by Nis Jørgensen on
At Greenpeace we are serving Arabic and Hebrew pages using UTF-8, Russian using iso-8859-6 and Chinese using big5 and EUC-CN*. All this comes from one AOLserver setup, by hacking the content type header.

We are using AOLserver 3.3oacs, a heavily modified OpenACS 4.6.3 and Oracle 8.1.7 with UTF-8 encoding.
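
A minimal Python sketch of the general idea, not of the actual AOLserver mechanism: the page is held as Unicode text and encoded to a per-language charset at output time, with a content type header that names the same charset. The `CHARSET_BY_LANGUAGE` table, its language keys and the `render` function are hypothetical names for illustration; only the charsets themselves come from the post, and the codec names are Python's.

```python
# Hypothetical mapping from site language to output charset.
# The language keys are assumptions; the charsets are the ones named in the post.
CHARSET_BY_LANGUAGE = {
    "ar": "utf-8",
    "zh-TW": "big5",
    "zh-CN": "euc-cn",
}

def render(page, language):
    """Return (content_type_header, body_bytes) for a Unicode page."""
    charset = CHARSET_BY_LANGUAGE[language]
    body = page.encode(charset)                      # Unicode -> local encoding
    content_type = "text/html; charset=" + charset   # header names the same charset
    return content_type, body

header, body = render("<p>中文</p>", "zh-TW")
print(header)                # text/html; charset=big5
print(body.decode("big5"))   # decodes back to the original text
```

The point of the sketch is that the header and the bytes are derived from the same charset value, so they cannot disagree.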

/Nis

* Note: The EUC-CN encoding is commonly referred to as GB2312. Strictly speaking, GB2312 is the character set and EUC-CN is one encoding of it, and EUC-CN is the one everyone is using.
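
To make the character set versus encoding distinction concrete, here is a small Python sketch. It relies on two properties of Python's standard codecs: the codec named `gb2312` emits EUC-CN bytes (`euc-cn` is an alias for it), and the `hz` codec is a 7-bit encoding of the same GB2312 character set. The sample string is an arbitrary example.

```python
import codecs

text = "中文"

# Python's "euc-cn" is an alias of its "gb2312" codec, mirroring the common naming.
assert codecs.lookup("euc-cn").name == "gb2312"

euc_cn = text.encode("euc-cn")   # 8-bit form: every byte has the high bit set
hz = text.encode("hz")           # 7-bit form of the same character set

print(euc_cn.hex())
print(hz)

# Same characters either way; only the byte-level encoding differs.
assert euc_cn.decode("euc-cn") == hz.decode("hz") == text
```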

Posted by Frank Bergmann on
Hi Nis,

Thanks a lot for your comment. May I ask why you are using different encodings for Russian and Chinese?

Bests,
Frank

Nis,

I have some issues displaying German characters in an old 4.6.2 installation served through ad33.13. Any text returned from PostgreSQL displays properly, but any UTF-8 text in adp pages or pulled from the file system simply shows up as question marks in browsers.

I have tried adding a meta http-equiv tag to set the content-type charset parameter to UTF-8, and I have also tried setting it to iso-8859-1, but this makes absolutely no difference.

I wonder if you could give me some guidance on how to hack the content type header from AOLserver. Also, does the ns_param OutputCharset parameter work for ad33.13, or is this a parameter for a later version of AOLserver?

Many Thanks

Regards
Richard

Richard, in your AOLserver config file, do you perhaps have some settings like these?

ns_section ns/parameters
   ns_param OutputCharset iso-8859-1
   ns_param HackContentType 1

ns_section ns/MimeTypes
   set mime_plain {text/plain; charset=iso-8859-1}
   set mime_html  {text/html; charset=iso-8859-1}

   # See also "http://dqd.com/~mayoff/encoding-doc.html" for advice on
   # character sets and MIME types in AOLserver.

   ns_param Default     $mime_plain
   ns_param NoExtension $mime_plain
   ns_param .txt  $mime_plain
   ns_param .text $mime_plain
   ns_param .htm  $mime_html
   ns_param .html $mime_html

That's what I use in AOLserver 4.0.10, but note that I am purposely serving only iso-8859-1 content. If you are trying to serve UTF-8, some of the settings above would probably break stuff for you.

Talk about serendipity ... I randomly went to look at openacs.org for the first time in months and found this. (I did not have notifications on for this thread. I believe there should be a way to set forums up to always do that.)

Anyway, I don't think your problem is the same one that we solved. All our adp files were[1] in plain ASCII. The big trick was to make AOLserver do the "correct" conversion of the generated page (Unicode -> local encoding).

It sounds to me like AOLserver is making wrong assumptions about the charset of your files. I am not sure how to handle that.

[1] We are now running a new OACS-based CMS, serving everything as utf-8.
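
One way question marks can arise from a wrong charset assumption, sketched in Python. This illustrates the general mechanism only and is not a diagnosis of what ad33.13 actually does: it assumes the file's UTF-8 bytes are misread as a single-byte charset, and that characters the output charset cannot represent are replaced with `?`.

```python
original = "Grüße"
file_bytes = original.encode("utf-8")        # what is stored on disk

# Wrong assumption: treat the UTF-8 bytes as iso-8859-1 when reading the file.
misread = file_bytes.decode("iso-8859-1")
print(len(original), len(misread))           # each non-ASCII letter became two characters

# If the output then targets a charset that lacks those characters,
# a lossy encoder substitutes "?" for each one.
print(misread.encode("ascii", errors="replace"))

# Reading with the right charset preserves the text.
assert file_bytes.decode("utf-8") == original
```

A meta tag in the page cannot repair this, because the damage happens on the server before the browser sees any bytes.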

Hi All,

Well, I was searching for Unicode to Arabic conversions and came across this post.

What I am trying to do is convert some Unicode data that I have into Arabic. There is a function that does the conversion using a sequence of CASE statements: it searches for the Unicode code points and replaces each one with the corresponding Arabic character.

e.g. IF strT = '067E' THEN Dest := Dest || Chr(129 USING NCHAR_CS); --129Arabic Peh

Well the function works correctly in Oracle 9i and above but refuses to compile in versions of Oracle below 9i.

Here are some more details:
Oracle Version: Oracle 8.1.7
Compilation error: PLS-00561: character set mismatch on value for parameter 'RIGHT'

I am assuming it is the NCHAR_CS that creates the problem.

I would appreciate any help with this. Thanks in advance.

Regards
Chetz

I have recycled the parts of my brain I used to store Oracle knowledge. Even then, I don't think I ever worked with non-standard character sets in it - only ASCII and some Oracle version of unicode.

My suggestion would be to do any transformations in Tcl, rather than in Oracle. In fact, I would suggest getting rid of Oracle and switching to Postgres. We did, and I haven't had a single moment of regret.

/Nis
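
As a sketch of doing the conversion outside Oracle (shown in Python here; Tcl's `encoding convertto` command offers the same kind of conversion), the function below turns hex code points such as `067E` into bytes of a target charset. It assumes the target is Windows-1256, where Arabic Peh is byte 129, which matches the `Chr(129)` in the PL/SQL line quoted earlier. The function name is hypothetical.

```python
def hex_codepoints_to_bytes(hex_codes, charset="cp1256"):
    """Convert hex code point strings (e.g. "067E") to bytes in `charset`.

    The default charset is an assumption inferred from the Chr(129) mapping
    in the original question, not something the poster stated.
    """
    text = "".join(chr(int(code, 16)) for code in hex_codes)
    return text.encode(charset)

peh = hex_codepoints_to_bytes(["067E"])
print(peh[0])   # 129, the same value the PL/SQL function passes to Chr()
```

A codec table replaces the long chain of per-character comparisons, so no database-side character set handling is involved.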