Forum OpenACS Q&A: OpenACS Internationalization

Posted by Kenny Chan on

I would really like to see internationalization support in OpenACS. I will present the problem in terms of the language that I am trying to get OpenACS to work with.

I want to bring this topic up again because it seems that no one has ever gotten OpenACS to work with the Chinese language (or any other non-English language?). If anyone has any suggestions, comments, or experience with this issue, please post. Here is what I did recently:

I have read Building a Multilingual Web Service Using ACS in ASJ and have attempted to use OpenACS with Big5 character encoding (Traditional Chinese). The ASJ doc mentions that there are a few levels of problems that have to be resolved before ACS can handle international character encodings.


1. the RDBMS level
2. the DBMS driver level
3. the aolserver level
4. the ACS Tcl code level

In my attempt to get my installation of OpenACS to work with my desired language, here is what I did to address points 1 and 3 above, respectively:

1. building the PostgreSQL database with UNICODE or EUC_TW encoding (I tried both)

3. Using the latest aolserver3.2+ad10

I was naive enough to test whether OpenACS would work with Chinese after making ONLY the above two changes. I tried different nsd binaries as well, nsd76 and nsd8, because there were postings about differences in the handling of double-byte characters between tcl76 and tcl8.

My test was to submit Chinese characters in the user biography and then retrieve it to see whether the same Chinese characters came back. All I got was junk characters.

In short, it didn't work.

The remaining 2 problems that have to be resolved are the DB driver and the ACS Tcl code.

I would like to know how the pgdriver v1.1 (the official OpenACS driver) handles character encodings. Is there any encoding translation? Or does it do SQL queries in UNICODE? If not, is it a planned feature in the near future?

For the very last problem, the ACS Tcl problem, I assume we can port the internationalization patch for ACS to OpenACS available here

If you have ever worked with other languages in OpenACS, please post your experience. Any input would be greatly appreciated.

Posted by Don Baccus on
I know that one set of bug fixes for PG 7.0.3 involved internationalization.  I suggest downloading and installing that with the right switches, for starters.

Then ...

1. Test PG directly.  Create a table with a "text" field, then insert and retrieve Chinese characters.  Make sure that works.

2. Write a simple Tcl script that retrieves rows ALREADY IN THE TABLE, ns_write them, and see if you get Chinese characters out.

3. Then write a simple Tcl script to insert a literal string with Chinese characters - don't use your browser at this point, do it "by hand".

(try with nsd76 and nsd8x - 76 is much better for OpenACS in general because Tcl 8x has memory leaks associated with regexps).

4. At this point, if Chinese characters are working fine, you can try testing an OpenACS instance.  If it fails, then it's time to start asking questions about the toolkit.

The above is just one approach to isolating the problem.  Unfortunately, I don't have any experience with extended character sets and don't have a keyboard capable of generating random Chinese characters.
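When working through steps like these, it helps to compare raw bytes at each layer rather than trusting what the terminal or browser displays. Below is a minimal sketch in Python (not part of OpenACS, AOLserver, or the driver) that guesses which kind of damage a string has suffered. The function name `diagnose` and the categories it reports are hypothetical, chosen only for illustration.

```python
# Hypothetical diagnostic helper: compare the bytes a layer returned
# against the bytes that were originally sent, and guess what went wrong.

def diagnose(original: str, observed: bytes, charset: str = "big5") -> str:
    """Classify how `observed` differs from `original` encoded in `charset`."""
    expected = original.encode(charset)

    if observed == expected:
        return "intact"

    # Each byte was treated as an ISO-8859-1 character and the result
    # was then written out as UTF-8 (classic double encoding).
    if observed == expected.decode("iso-8859-1").encode("utf-8"):
        return "double-encoded"

    # Some layer converted the text to UTF-8 correctly; the reader
    # then has to decode it as UTF-8, not as the original charset.
    if observed == original.encode("utf-8"):
        return "converted to utf-8"

    # The high bit of every byte was dropped (a 7-bit-only path).
    if observed == bytes(b & 0x7F for b in expected):
        return "8th bit stripped"

    return "unknown damage"


if __name__ == "__main__":
    sent = "中文"
    stored = sent.encode("big5").decode("iso-8859-1").encode("utf-8")
    print(diagnose(sent, stored))
```

Running the same check after each step (psql, a read-only Tcl page, a hard-coded insert, a form submit) narrows down the first layer where the bytes stop matching.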

The person at aD who's done most of the work has been Henry Minsky, as part of work done for Japanese clients.

Posted by Tilmann Singer on
I have a similar problem with German umlauts. Here is a summary of some tests (with umlauts instead of Chinese characters).

Aolserver version is always AOLserver/3.2+ad10.


> 1. Test PG directly. Create a table with a
> "text" field, then insert and retrieve Chinese
> characters. Make sure that works.

That works. Postgres has Multibyte enabled and the database is in
UNICODE encoding (according to encoding).


> 2. Write a simple Tcl script that retrieves rows
> ALREADY IN THE TABLE, ns_write them,
> and see if you get Chinese characters out.

Works both with nsd76 and nsd8x for me.

Writing posted values into a file: works with both.

Sending mail with ns_sendmail always works (posted values, read from
db, read from file) with both.

Reading a file from the file system with umlauts in it and storing
its text in the db: works with nsd76, fails with nsd8x.


So the only place where characters get translated wrongly seems to be
on the aolserver->postgres path when using nsd8x.
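For reference, here is a small Python sketch of what this kind of mistranslation looks like at the byte level. It only illustrates the general ISO-8859-1 versus UTF-8 mismatch; whether nsd8x or the driver does exactly this is an assumption, not something the tests above prove. The helper names are made up for the example.

```python
from typing import Optional

text = "Bär"

latin1_bytes = text.encode("iso-8859-1")  # 3 bytes: 'ä' is the single byte 0xE4
utf8_bytes = text.encode("utf-8")         # 4 bytes: 'ä' is 0xC3 0xA4


def misread_as_latin1(data: bytes) -> str:
    """Interpret bytes as ISO-8859-1, whatever they really are."""
    return data.decode("iso-8859-1")


def try_read_as_utf8(data: bytes) -> Optional[str]:
    """Interpret bytes as UTF-8; return None if they are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    # UTF-8 bytes read by a layer that assumes ISO-8859-1: mojibake.
    print(misread_as_latin1(utf8_bytes))    # BÃ¤r
    # ISO-8859-1 bytes read by a layer that assumes UTF-8: decoding fails.
    print(try_read_as_utf8(latin1_bytes))   # None
```

If the text in the database looks like the first case, two layers disagree about the encoding. If it is truncated or replaced with question marks, a strict UTF-8 decode somewhere along the path is a plausible suspect.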


> (try with nsd76 and nsd8x - 76 is much better
> for OpenACS in general because Tcl 8x has
> memory leaks associated with regexps).

Somewhere at arsdigita I read that those issues are now resolved and
everybody should use nsd8x. They will eventually stop support for
nsd76 - e.g. ns_cache only works with nsd8x.

Unfortunately there seems to be no alternative to using nsd76
right now if you need a non-US encoding.


Is it safe to say that according to the tests above it is an issue
with the nspostgres driver?


> I know that one set of bug fixes for PG 7.0.3
> involved internationalization. I suggest
> downloading and installing that with the right
> switches, for starters.

Tried it, same troubles. (PostgreSQL 7.0.3 on i686-pc-linux-gnu,
compiled by gcc 2.95.2)
Posted by MK Tam on
In my experience AOL3.2+ad10 + nsd8x is not a good choice for the Big5 charset. Two examples I've tried:

1. Inserting Big5 Chinese into a bboard just gives junk codes!
2. Static html with meta <text/html; charset=big5> is still rendered by my IE using the ISO charset!

I've tried AOL3.1 + nsd8x with OpenACS, and the bboard works just fine.
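One possible explanation for the second example (an assumption, not confirmed in this thread) is that the server sends its own charset in the HTTP Content-Type header. A charset given in the HTTP header takes precedence over one declared in a meta tag, so the browser never consults the meta. A crude Python sketch of that precedence rule, with hypothetical function names:

```python
import re
from typing import Optional

# Matches charset=xxx in either an HTTP header value or a meta tag.
_CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def charset_from(value: str) -> Optional[str]:
    """Return the first charset declared in `value`, lowercased, or None."""
    match = _CHARSET_RE.search(value)
    return match.group(1).lower() if match else None


def effective_charset(content_type_header: str, html: str) -> Optional[str]:
    """HTTP header charset wins; the meta tag is only a fallback."""
    return charset_from(content_type_header) or charset_from(html)


if __name__ == "__main__":
    page = '<meta http-equiv="Content-Type" content="text/html; charset=big5">'
    print(effective_charset("text/html; charset=iso-8859-1", page))  # iso-8859-1
    print(effective_charset("text/html", page))                      # big5
```

Checking the actual response headers (for example with telnet to port 80) would show whether a charset is being added by the server.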

Posted by Kenny Chan on
Hi MK Tam,

What version of the PostgreSQL driver were you using when the bboard worked with Big5?

Thanks in advance for your response.

Posted by MK Tam on
pg-driver-1.1

One more thing: although I can insert messages in Big5, full-text search fails...

Posted by Kenny Chan on
Hi MK Tam,

How about the localization (or locale, whatever it's called) setting in PostgreSQL? What did you use? UNICODE or EUC_TW? Thanks for your response.

Posted by MK Tam on
Both work.

Posted by Kenny Chan on
OK, here are more detailed tests that my friend and I went through today. We used RPM installations for OpenACS:

1. Using psql, insert into and select Big5 (double-byte character encoding for Chinese) data from the database, with both UNICODE and EUC_TW multi-byte database settings..... *all worked*.

2. Simple tcl proc to select the existing Big5 data from the database, through the OpenACS architecture (browser << aolserver + tcl << pgdriver << postgresql)..... *worked*

3. Simple tcl proc to insert Big5 data into the database, through the OpenACS architecture (browser >> aolserver + tcl >> pgdriver >> postgresql) and then check the data with the same proc from #2 above..... ****FAILED****

Can we safely say that it is the problem of aolserver + tcl now? And if so, is there any solution to this?

Posted by Kenny Chan on
One more thing... in the test #3 I mentioned, we used form submit for the data...

Also one last thing we tested:

test #4: Simple tcl proc which inserts some hard-coded Big5 characters into the database; when retrieving them, we get junk..... ****FAILED****

Posted by Wolfgang Winkler on
I'm a native German speaker and I succeeded in using all
(Western) European languages by simply putting the following line at
the end of ReturnHeaders and ReturnHeadersNoCache:

ns_startcontent -charset "iso-8859-1"

The reason why this works is simple: Tcl uses UTF-8 internally, but I want
iso-8859-1 output. So the ns_startcontent call tells Tcl to use iso-8859-1.
You should be able to put in any charset, because the headers are
still written the way they normally are; the charset conversion
only starts below the header output.
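To make the idea concrete, here is a Python sketch of "headers written as usual, body converted to the chosen charset". It is only a model of the behaviour described above, not AOLserver's implementation, and `build_response` is a made-up name.

```python
def build_response(body: str, charset: str = "iso-8859-1") -> bytes:
    """Encode only the body in `charset`; headers stay plain ASCII."""
    # Characters the charset cannot represent become '?' instead of failing.
    payload = body.encode(charset, errors="replace")

    headers = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: text/html; charset={charset}\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    ).encode("ascii")

    # The conversion starts below the header output.
    return headers + payload


if __name__ == "__main__":
    print(build_response("Bär"))
    print(build_response("Bär", "utf-8"))
```

The same internal string yields different bytes on the wire depending on the charset, which is why the declared charset and the actual conversion have to agree.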

BTW, I use openacs 3.2.4, SuSE 7.0/7.1 and PostgreSQL 7.0.3 from
rpm.

Posted by Tilmann Singer on
Wolfgang, does your setup also correctly deal with strings that are transmitted from the browser to the server and stored in the db? What is the encoding of your database (output of psql -l)?

See also this thread.

Thanks.

Posted by Wolfgang Winkler on
Hi Tilmann!

psql gives me SQL_ASCII as the encoding. I followed all the
threads on encoding at openacs.org but nothing worked for me, so I
tried it myself.

Now I have no problems with strings that are sent via post or get
operations or retrieved from the database.

But I've also put this in my server.tcl file:

set charset iso-8859-1

ns_section "ns/parameters"
ns_param  URLCharset      $charset
ns_param  OutputCharset   $charset
ns_param  HackContentType true

But the whole thing only worked after the modifications to ReturnHeaders
and ReturnHeadersNoCache, and I've had no problems
so far. I've tested it heavily using German umlauts.

Posted by Henry Minsky on
I have a set of patches to make openacs 3.2.4 work in Japanese
(SJIS). That includes handling user POST and GET requests in
Japanese, as well as delivering .tcl and .adp files which are
authored in ShiftJIS.  If you download the files listed at

http://www.ai.mit.edu/people/hqm/openacs/

you will have a system that runs SJIS. There is one set of patches
to AOLserver, which are generic. The patches to openacs are hardcoded
for SJIS, but you can easily see how to generalize them to any
charset (just replace "shift_jis" with $your_desired_encoding).
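As an illustration of that generalization, here is a Python sketch (not the actual patch) of decoding submitted form data with a configurable charset instead of a hardcoded one. `FORM_CHARSET` and `decode_form` are hypothetical names; `parse_qs` and `quote` are from the Python standard library.

```python
from urllib.parse import parse_qs, quote

# Assumption for this sketch: one site-wide setting replaces the
# hardcoded "shift_jis".
FORM_CHARSET = "shift_jis"


def decode_form(query: str, charset: str = FORM_CHARSET) -> dict:
    """Decode a URL-encoded form body whose bytes are in `charset`."""
    parsed = parse_qs(
        query,
        keep_blank_values=True,
        encoding=charset,
        errors="strict",  # fail loudly instead of storing junk
    )
    # Keep only the first value of each field, for simplicity.
    return {name: values[0] for name, values in parsed.items()}


if __name__ == "__main__":
    # What a browser would send for a page served as Shift_JIS.
    sjis_query = "name=" + quote("日本語".encode("shift_jis"))
    print(decode_form(sjis_query))

    # The same code path with Big5, only the charset name changes.
    big5_query = "name=" + quote("中文".encode("big5"))
    print(decode_form(big5_query, "big5"))
```

Decoding with the wrong charset either raises an error (with `errors="strict"`) or yields junk, which is the symptom reported earlier in this thread.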

Posted by Kenny Chan on
I would like to direct those people who are interested in Internationalization of OpenACS to this thread too:

OpenACS Internationalization HOWTO v0.1
Posted by Gaizka Villate on
I found a problem soon after I started using Henry's patches, and I'd like to know if someone else has run into it.

In a page I use ad_page_variables to do some variable validation, and then I make a call to set_the_usual_form_variables. I do this because there are some variables whose names I don't know.

If I go to this page, after 5 or 10 seconds the following error appears:

Error writing content: resource temporarily unavailable
    while executing
"ns_conncptofp $fp"
    (procedure "ns_getform" line 29)
    invoked from within
"ns_getform"
    invoked from within
"set form [ns_getform] "
    ("uplevel" body line 5)
    invoked from within

Has anybody else found this problem? Any suggestions?

Thanks for your help.

Posted by Gaizka Villate on
Never mind, it was my fault.

I had been playing with ns_setformencoding and I had placed two calls to it in set_the_usual_form_variables.

After I restored that proc to its original state, everything is working smoothly. Sorry.