Forum OpenACS Q&A: OpenACS Internationalization HOWTO v0.1

OpenACS Internationalization (i18n) HOWTO v0.1

-- Kenny Chan

2/24/2001

 

This document is a personal journal of my research on i18n under OpenACS. By sharing it, I hope to save others the time I spent experimenting and hitting walls. As Phil says, wasted time is no longer wasted if we share what we learned and help others save theirs. I also hope this doc leads to further investigation of OpenACS i18n, because it doesn't cover problems such as how ns_sendmail would work with i18n.

 

There are two options that I have tested and found working as of today (2/24/2001). The current versions of all the related software are:

 

PostgreSQL v7.0.3 (rpm version, from http://www.postgresql.org)

Aolserver v3.2 (original from aolserver.com)

Aolserver v3.2 + ad12 (from ArsDigita)

Pgdriver v1.1.0 (from http://www.openacs.org/ stop asking about Pgdriver not working if you are still using the one from aolserver.com :~))

OpenACS v3.2.4 (from http://www.openacs.org/)

 

All tested configurations contain the following 3 common components:

 

  1. PostgreSQL v7.0.3 (rpm version, from http://www.postgresql.org)
  2. Pgdriver v1.1.0 (from http://www.openacs.org/)
  3. OpenACS v3.2.4 (from http://www.openacs.org/)

 

Note: the openacs database encoding is just SQL_ASCII

Configurations found to fail for i18n:

 

  1. The 3 common components + Aolserver v3.2 (original) with tcl8x (nsd8x)

 

Working Configurations:

 

  1. The 3 common components + Aolserver v3.2 (original) with tcl76 (nsd76)
  2. The 3 common components + Aolserver v3.2 + ad12 with tcl76 (nsd76)
  3. The 3 common components + Aolserver v3.2 + ad12 with tcl8x (nsd8x)

 

Details of how to make things work

 

Working configurations #1 and #2:

 

For installation procedures, please check the OpenACS installation documentation.

 

For working configurations 1 and 2, I am not going to go into detail, since these two configurations work pretty much out of the box (there is no "detail" to talk about :~)). There have been postings in the forum about not being able to make multi-byte characters work with nsd76, but that's not true! Tcl76 (which is compiled into nsd76) handles strings in raw form, so it should work.

 

However, I do have some pointers for those who couldn't get it to work. The thing to pay attention to is that the submitting page and the outputting page MUST be set to the correct (and same) encoding.

 

All db inserts that require user input involve two or more tcl pages. The first one contains the html form (filename most likely pagename.tcl, per ArsDigita's convention), while the second and later ones contain the data validation and the actual db-inserting SQL (filenames most likely pagename-2.tcl, pagename-3.tcl, etc., per ArsDigita's convention). For multi-byte characters to work correctly through input and output, we must set the encoding of the html page so that the client browser transfers the data in the correct encoding. This is kind of vague in text, so I will illustrate with some code:

 

Assumptions:

 

Assume we want to use big5 encoding for traditional Chinese.

Assume the database contains a table named i18n_test:

create table i18n_test (id int, first_names varchar (1000));

 

myform.tcl:

 

ns_return 200 "text/html; charset=big5" "
    <form method=get action='myform-2.tcl'>
        First Names: <input type='text' name='first_names'><br>
        <input type='submit' name='submit' value='Submit'>
    </form>
"

 

myform-2.tcl:

 

# Sets $first_names and $QQfirst_names (the QQ copy has single quotes doubled for SQL)
set_the_usual_form_variables 0

set insertion "insert into i18n_test (id, first_names) values (1, '$QQfirst_names')"
set selection "select first_names from i18n_test where id = 1"
set deletion "delete from i18n_test"

set db [ns_db gethandle]
ns_db dml $db $insertion
set dbfirstnames [database_to_tcl_list $db $selection]
set dbfirstnames [lindex $dbfirstnames 0]

# Clean up the test row so the page can be re-run
ns_db dml $db $deletion
ns_db releasehandle $db

ns_return 200 "text/html; charset=big5" "
first name = $first_names
<br>
dbfirstname = $dbfirstnames
"

 

If we don't explicitly set the character encoding of the data-submitting page (myform.tcl in this case), client browsers will most likely treat it as iso-8859-1. Users can still type big5-encoded characters into the form field and submit (e.g. if the client is using an external viewer like Njwin), but the data passed to the data-processing page (myform-2.tcl in this case) will be junk.
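To make the "junk" concrete, here is a minimal sketch of what happens when the same bytes are read under two different charsets. It is only an illustration for a plain tclsh with Tcl 8.1 or later (it uses the encoding command, which tcl76 does not have), and the sample text and variable names are my own assumptions, not part of OpenACS:

```tcl
# Illustration only: run in a plain tclsh (Tcl 8.1 or later), not under nsd76.
# The sample text is the two characters U+4E2D U+6587.
set text "\u4e2d\u6587"

# Encode the text the way a browser would for a page declared as big5.
set raw [encoding convertto big5 $text]

# Decoding the bytes with the matching charset recovers the text.
set good [encoding convertfrom big5 $raw]

# Decoding the same bytes as iso-8859-1 turns every byte into its own character.
set junk [encoding convertfrom iso8859-1 $raw]

puts "bytes sent:            [string length $raw]"
puts "decoded as big5:       [string length $good] characters"
puts "decoded as iso-8859-1: [string length $junk] characters"
```

The bytes are identical in both cases; only the charset used to read them differs, which is why both pages have to agree on it.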

 

Now comes the truly useful stuff, working configuration #3.

The 3 common components + Aolserver v3.2 + ad12 with tcl8x (nsd8x):

 

If we need the new features in tcl8x (e.g. non-greedy regexp), do we have to give up i18n? No! Just use the Aolserver with the ad patches. ArsDigita provides a patched version of Aolserver that contains security fixes, bug fixes and feature enhancements. It also contains patches that make i18n under tcl8x (nsd8x) easy.

 

Installation of the ad-patched version is pretty much the same as for the original version: untar, cd to the appropriate directory, then make; make install, and so on.

 

To use character encodings other than iso-8859-1 under nsd8x, we have to tell nsd8x “how to interpret the data submitted”. The new myform.tcl and myform-2.tcl look like this:

 

myform.tcl:

 

set _charset "big5"

ns_return 200 "text/html; charset=$_charset" "
    <form method=get action='myform-2.tcl'>
        <input type='hidden' name='_charset' value='$_charset'>
        First Names: <input type='text' name='first_names'><br>
        <input type='submit' name='submit' value='Submit'>
    </form>
"

 

myform-2.tcl:

 

# Tell nsd8x which form field names the charset of the submitted data
ns_formfieldcharset _charset

# Sets $first_names, $QQfirst_names and $_charset from the form
set_the_usual_form_variables 0

set insertion "insert into i18n_test (id, first_names) values (1, '$QQfirst_names')"
set selection "select first_names from i18n_test where id = 1"
set deletion "delete from i18n_test"

set db [ns_db gethandle]
ns_db dml $db $insertion
set dbfirstnames [database_to_tcl_list $db $selection]
set dbfirstnames [lindex $dbfirstnames 0]

# Clean up the test row so the page can be re-run
ns_db dml $db $deletion
ns_db releasehandle $db

ns_return 200 "text/html; charset=$_charset" "
first name = $first_names
<br>
dbfirstname = $dbfirstnames
"

 

The hidden form variable _charset gets passed to myform-2.tcl, and by calling the proc ns_formfieldcharset we tell nsd8x how to interpret the data submitted from myform.tcl.
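I have not looked at how the ad12 patch implements ns_formfieldcharset, so the following is only a rough sketch of the idea: take the charset label that arrived with the form and use it to decode the raw bytes. The proc decode_form_value and its small charset table are hypothetical, made up for this illustration, and it runs in a plain tclsh with Tcl 8.1 or later:

```tcl
# Hypothetical helper, not part of AOLserver or OpenACS.
# Maps a charset label (as used in Content-Type) to a Tcl encoding name
# and decodes raw form bytes with it.
proc decode_form_value {charset raw} {
    # Charset labels are case-insensitive; Tcl encoding names are not.
    set label [string tolower $charset]
    array set tcl_name {
        big5        big5
        iso-8859-1  iso8859-1
        shift_jis   shiftjis
        euc-jp      euc-jp
        utf-8       utf-8
    }
    if {![info exists tcl_name($label)]} {
        error "no Tcl encoding known for charset '$charset'"
    }
    return [encoding convertfrom $tcl_name($label) $raw]
}

# Usage: simulate a browser submitting Big5 bytes, then decode them
# using the charset label carried in the hidden form field.
set submitted [encoding convertto big5 "\u4e2d\u6587"]
set first_names [decode_form_value "Big5" $submitted]
```

An unknown label raises an error rather than guessing, since a wrong guess would silently store junk in the database.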

 

 

Final note:

 

I just wanted to illustrate the very basics of how to make languages other than English work with OpenACS / Aolserver / PostgreSQL. For true OpenACS i18n under nsd8x, the OpenACS code has to be modified accordingly. My suggestion is to modify the proc set_the_usual_form_variables to call ns_formfieldcharset internally, which would minimize the edits needed to the OpenACS tcl pages.
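As a rough sketch of that suggestion, a thin wrapper could be tried first, without touching the original proc. The wrapper name set_the_usual_form_variables_i18n is made up, and I am assuming the field is always called _charset as in the examples above; this is untested against a real OpenACS install:

```tcl
# Sketch only. ns_formfieldcharset comes from the ad-patched AOLserver and
# set_the_usual_form_variables from OpenACS; the wrapper name is made up.
proc set_the_usual_form_variables_i18n {{error_if_not_found_p 1}} {
    # Declare which form field names the charset, but only when the
    # server actually provides the command (plain nsd76 does not need it).
    if {[llength [info commands ns_formfieldcharset]] > 0} {
        ns_formfieldcharset _charset
    }
    # Run the original proc in the caller's frame so the form variables
    # are set in the page script, not inside this wrapper.
    uplevel 1 [list set_the_usual_form_variables $error_if_not_found_p]
}
```

A page would then call set_the_usual_form_variables_i18n 0 in place of the two separate calls at the top of myform-2.tcl.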

 

And in case you are interested, I was an aD 3-week boot camper in Berkeley, CA in June 2000. Also let me know if you already know me from the boot camp :P

 

Further reading: Aolserver3_2+ad12 i18n support

Posted by Kenny Chan on
All input is greatly appreciated!
Posted by Kenny Chan on
Sorry for the lousy weirdo characters, cuz I created the html in MS Word... I was just too lazy :P

MS sux, period

Posted by Don Baccus on
It totally breaks Mozilla, BTW.  Konqueror and MSIE both render your page correctly.  Maybe I should send in a bug report ...

Thanks for this note, would you be interested in working with Roberto to get it cleaned up a bit and added to our documentation?

Also, have you talked to Henry Minsky about i18n and ACS Classic 4.x?  I know that there's been a lot of aD discussion on the matter but I've not followed it closely.  Since you've been in contact with Henry, I'm hoping that maybe you know whether or not 4.1 will be better in this regard out-of-the-box, i.e. allowing for the parameterization of the charset when a package is installed or something like that.

Posted by Kenny Chan on
Hi Don,

Yea sure, I would be happy to clean it up and have it added to the documentation. So I talk to Roberto directly?

Posted by Roberto Mello on
Hi Kenny,

I am following the thread. When you are done, send it to me. I would be glad to test it and add it to our documentation.

Posted by Henry Minsky on
Here are some random comments about my experiences with i18n for
ACS 3 and 4. They won't make too much sense until you've looked at the patch kit
I made, but I hope they can be helpful when you think about some i18n issues. There are more links to notes at http://imode.arsdigita.com/i18n

You can basically break things down into three pieces:

1) character set encoding issues - you should store everything in
the database as Unicode, and AOLserver uses Unicode internally.
You then need to make sure you convert to the appropriate encoding
when you output the pages to the browser.

AOLserver has a couple of
mechanisms for this. The one you mention above, setting the "charset" attribute in the MIME type, is usually the most convenient. You can
also set it using the ns_startcontent command.

Handling user-submitted input is the other half of the problem -
your notes mention using the ns_formfieldcharset mechanism. That
will usually work, but usually you know beforehand what to expect
anyhow because the user is submitting a form you just sent to them,
and you know what encoding you sent it in.

The one thing in my patch kit for AOLserver that might be of help
is that I redefined ns_getform to take an optional "encoding"
argument, and it can be used to convert the user-submitted input
from the specified encoding. You could modify ad_page_contract to
take an optional encoding argument, and to try to look for the
charset cookie using ns_formfieldcharset.

2) Formatting numbers, dates, etc, in a localized way.
There are some tcl utility functions in the acs-lang package, such as lc_numeric, or lc_time_fmt,
which will handle a variety of number and date formatting patterns.  It has
a small amount of locale data gleaned from Linux localization tables (for US, UK, Spain, Germany, France) which could be added to. I am not sure where the script that converts unix locale files to
Tcl is though. The ArsDigita London office may still have it someplace.

3) Message-catalog facility.
Like GNU gettext, the acs-lang package provides a catalog for storing
short message strings and their translations. So you can create
message catalog files containing translations of all the strings that you would
otherwise hardcode into your source code, and assign them
symbolic names. There is also a hook to the ACS 4 templating system
to define a "TRN" tag which lets you reference the message catalog
in an .adp page.
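
For readers who have not used a message catalog before, here is a minimal sketch of the lookup idea from point 3. It uses Tcl's own bundled msgcat package (Tcl 8.1 or later) in a plain tclsh, not the acs-lang API, and the sample strings and translations are assumptions made up for the illustration:

```tcl
# Uses Tcl's bundled msgcat package, not acs-lang.
package require msgcat

# Register translations for one source string in two locales.
::msgcat::mcset es "First Names" "Nombres"
::msgcat::mcset de "First Names" "Vornamen"

# Select a locale, then look strings up by their source text.
::msgcat::mclocale es
set label_es [::msgcat::mc "First Names"]

::msgcat::mclocale de
set label_de [::msgcat::mc "First Names"]

# A string with no registered translation falls back to the source text.
set label_missing [::msgcat::mc "Submit"]

puts "$label_es / $label_de / $label_missing"
```

The page code only ever names the source string; which translation comes back depends on the locale selected at run time.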

Posted by Henry Minsky on
I put a small set of patches up at http://www.ai.mit.edu/people/hqm/openacs for running a Japanese OpenACS 3.2 installation.