Forum OpenACS Q&A: Re: Asian Characters and UTF-8 Encoding - Any experiences?

Thanks a lot, Brian!

That sounds excellent. To summarize your statement: apart from the Oracle charset, there are no other nasty issues with double-byte characters (such as the Oracle driver implementation, a broken AOLserver implementation, ...).

And yes, our default Oracle charset is still US7ASCII, following a recommendation of Philip's from around 1999...

Bests,
Frank

I've certainly had no problems with double-byte characters using a UTF-8 database, but I've only been using European letters. Maybe somebody else can confirm that there are no issues with Asian characters.

The latest version of the OpenACS install docs recommends a Character Set of UTF8:

https://openacs.org/doc/current/oracle.html

Best wishes,
Brian

(Not that it's relevant, but I rather doubt that Philip G. ever recommended US7ASCII for the Oracle character set, even in 1999.)

The fact is that at least through Oracle 8.1.7.x, US7ASCII is the default. Oracle also stupidly hides this in their installer, so unless you specifically look for it when installing Oracle, you will almost certainly end up with a US7ASCII rather than a UTF8 database - bad. My old outdated Oracle install notes have some more info on the UTF8 character set stuff.
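If you aren't sure what your existing database ended up with, Oracle will tell you, e.g. from an OpenACS Tcl page via the standard db_string API (a quick sketch; the query name is invented):

   # NLS_CHARACTERSET is the database character set chosen at install
   # time; expect US7ASCII on a default 8.1.7 install, UTF8 if you
   # picked it explicitly.
   set db_charset [db_string get_db_charset {
       select value from nls_database_parameters
       where parameter = 'NLS_CHARACTERSET'
   }]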

At Greenpeace we are serving Arabic and Hebrew pages using UTF-8, Russian using iso-8859-5, and Chinese using big5 and EUC-CN*. All this from one AOLserver setup (hacking the Content-Type header).

We are using AOLserver 3.3oacs, a heavily modified OpenACS 4.6.3, and Oracle 8.1.7 with UTF-8 encoding.
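Roughly, the per-page trick looks like this. A simplified sketch, not our production code: it assumes the ns_startcontent command from the aD encoding patches (see http://dqd.com/~mayoff/encoding-doc.html), and the URL section names are invented for the example:

   # Pick an output charset from the first URL component.
   set section [lindex [ns_conn urlv] 0]
   switch -- $section {
       russia  { set charset iso-8859-5 }
       china   { set charset euc-cn }
       default { set charset utf-8 }
   }
   # Converts the server's internal Unicode strings to $charset and
   # labels the Content-Type header to match.
   ns_startcontent -type "text/html; charset=$charset"
   ns_write "<html>...page body...</html>"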

/Nis

* Note: The EUC-CN encoding is commonly referred to as GB2312 - but the two are different encodings of the same character set, and EUC-CN is the one everyone is using.

Hi Nis,

Thanks a lot for your comment. May I ask why you are using different encodings for Russian and Chinese?

Bests,
Frank

Nis,

I have some issues displaying German characters in an old OpenACS 4.6.2 served through AOLserver ad33.13. Any text returned from PostgreSQL displays properly, but any UTF-8 text in adp pages or pulled from the file system simply displays as question marks in browsers.

I have tried adding a meta http-equiv tag to set the Content-Type charset parameter to UTF-8, and have also tried setting it to iso-8859-1, but this makes absolutely no difference.

I wonder if you could give me some guidance on how to hack the Content-Type header from AOLserver. Also, does the ns_param OutputCharset parameter work for ad33.13, or is this a parameter for a later version of AOLserver?

Many Thanks

Regards
Richard

Richard, in your AOLserver config file, do you perhaps have some settings like these?

ns_section ns/parameters
   ns_param OutputCharset iso-8859-1
   ns_param HackContentType 1

ns_section ns/MimeTypes
   set mime_plain {text/plain; charset=iso-8859-1}
   set mime_html  {text/html; charset=iso-8859-1}

   # See also "http://dqd.com/~mayoff/encoding-doc.html" for advice on
   # character sets and MIME types in AOLserver.

   ns_param Default     $mime_plain
   ns_param NoExtension $mime_plain
   ns_param .txt  $mime_plain
   ns_param .text $mime_plain
   ns_param .htm  $mime_html
   ns_param .html $mime_html

That's what I use in AOLserver 4.0.10, but note that I am purposely serving only iso-8859-1 content. If you are trying to serve UTF-8, some of the settings above would probably break stuff for you.
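If you do want to serve UTF-8 throughout instead, I would expect the equivalent settings to look something like this. Untested by me, so treat it as a sketch; URLCharset is described in the encoding doc referenced above:

ns_section ns/parameters
   ns_param OutputCharset utf-8
   ns_param HackContentType 1
   # How incoming form/query data gets decoded:
   ns_param URLCharset utf-8

ns_section ns/MimeTypes
   ns_param Default     {text/plain; charset=utf-8}
   ns_param NoExtension {text/plain; charset=utf-8}
   ns_param .txt  {text/plain; charset=utf-8}
   ns_param .text {text/plain; charset=utf-8}
   ns_param .htm  {text/html; charset=utf-8}
   ns_param .html {text/html; charset=utf-8}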

Talk about serendipity ... I randomly went to look at openacs.org for the first time in months - and found this (I did not have notifications on for this thread - I believe there should be a way to set forums up to always do that).

Anyway, I don't think your problem is the same one we solved. All our adp files were[1] plain ASCII - the big trick was to make AOLserver do the "correct" conversion of the generated page (Unicode -> local encoding).

It sounds to me like AOLserver is making a wrong assumption about the charset of your files. I'm not quite sure how to handle that.
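One guess, though: if AOLserver decides the charset of files on disk from the mimetype table - that is my reading of the encoding doc referenced in the config above - then declaring your .adp files to be UTF-8 might help. Purely hypothetical on my part:

ns_section ns/MimeTypes
   # Hypothetical fix: tell AOLserver the .adp sources on disk are UTF-8
   ns_param .adp {text/html; charset=utf-8}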

[1] We are now running a new OACS-based CMS, serving everything as utf-8.

Hi All,

Well I was searching for Unicode to Arabic conversions and came across this post.

What I am trying to do is convert some Unicode data that I have into Arabic. There is a function that does the conversion using a sequence of CASE statements: it searches for the Unicode characters and replaces each with the corresponding Arabic character.

e.g.:

   IF strT = '067E' THEN Dest := Dest || Chr(129 USING NCHAR_CS); -- 129 = Arabic Peh

Well the function works correctly in Oracle 9i and above but refuses to compile in versions of Oracle below 9i.

Here are some more details:
Oracle Version: Oracle 8.1.7
Compilation error: PLS-00561: character set mismatch on value for parameter 'RIGHT'.

I am assuming it is the NCHAR_CS that creates the problem.

Would appreciate any help on this. Thanks in advance.

Regards
Chetz

I have recycled the parts of my brain that I used to store Oracle knowledge. Even then, I don't think I ever worked with non-standard character sets in it - only ASCII and some Oracle flavor of Unicode.

My suggestion would be to do any transformations in Tcl, rather than in Oracle. In fact, I would suggest getting rid of Oracle and switching to Postgres. We did, and I haven't had a single moment of regret.
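For example, something along these lines. A rough sketch: I am guessing at Windows-1256 as your target encoding, since its byte 129 is Peh, which matches your Chr(129):

   set src "\u067E"                            ;# ARABIC LETTER PEH, as in your '067E' case
   set bytes [encoding convertto cp1256 $src]  ;# Tcl ships a cp1256 (Windows-1256) table
   binary scan $bytes c byte
   puts [expr {$byte & 0xff}]                  ;# prints 129, i.e. your Chr(129)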

/Nis