Forum OpenACS Q&A: Response to Dealing with non-Roman character sets

Posted by Henry Minsky on
You need Unicode, my man!

Unicode assigns a character code to the characters of nearly every
written human language.

Now, getting the correct font and getting text to display correctly on
your computer is another issue. Microsoft browsers can generally display
multilingual data in Unicode. The most common encoding is UTF-8, a
variable-length encoding which is quite compact for ASCII and Western
languages.
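
To make the "variable-length" point concrete, here is a small Python sketch (Python is just my choice for illustration, and the sample characters and the helper name utf8_length are picked by me). It shows how many bytes a single character takes once encoded as UTF-8:

```python
# Sample characters, written as escapes so the file itself stays ASCII.
SAMPLES = {
    "A": "ASCII letter",
    "\u00e9": "Latin small e with acute",
    "\u65e5": "CJK ideograph",
    "\U0001F600": "emoji outside the Basic Multilingual Plane",
}


def utf8_length(ch: str) -> int:
    """Return the number of bytes used to encode ch as UTF-8."""
    return len(ch.encode("utf-8"))


for ch, description in SAMPLES.items():
    print(f"U+{ord(ch):04X} ({description}): {utf8_length(ch)} byte(s)")
```

ASCII stays at one byte per character, accented Latin letters take two, and most CJK characters take three, which is why UTF-8 is compact for Western text but larger for Japanese.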

Check out http://www.geocities.com/i18nguy/unicode-example.html for
an example.

Getting ACS to properly encode data in Unicode requires a little bit
of setup. Are you using OpenACS 3 or OpenACS 4?

I just started running OpenACS 4 a few weeks ago. I had some patches for
ArsDigita's ACS 4 to handle Japanese and so forth by using Unicode
internally in the database, and providing hooks to AOLserver to
tell it what character set encoding to accept and emit. (If you are
letting people enter text via forms, you may need to know what
encoding the text is in, which is somewhat f*cked because of the
lack of standards among browsers.)

Anyway, I have some patches for ACS 3.2.5 that will let you do
Unicode, and I will try to work up an official set of changes that
could go into OpenACS 4. That will require a little bit of effort, but
most of the groundwork is there: AOLserver and Postgres both handle
Unicode (and so does Oracle). It's just a few lines of code here
and there in ACS to set the encodings when pages are generated or
form variables are read from browsers.
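
For what it's worth, the general idea (decode incoming form bytes using the charset you expect, keep Unicode internally, then encode and label the page on the way out) looks roughly like this. This is a Python sketch of the concept only, not the actual ACS or AOLserver code; the function names decode_form_value and render_page are hypothetical, and Shift_JIS is just an assumed example of what a browser might send:

```python
from typing import Dict, Tuple


def decode_form_value(raw: bytes, charset: str) -> str:
    """Turn submitted form bytes into Unicode text, given the charset we expect."""
    return raw.decode(charset)


def render_page(text: str, charset: str = "utf-8") -> Tuple[Dict[str, str], bytes]:
    """Encode the page body and declare the same charset in the headers."""
    body = text.encode(charset)
    headers = {
        "Content-Type": f"text/html; charset={charset}",
        "Content-Length": str(len(body)),
    }
    return headers, body


# Assumed scenario: a browser submits Japanese text encoded as Shift_JIS.
submitted = "\u65e5\u672c\u8a9e".encode("shift_jis")
text = decode_form_value(submitted, "shift_jis")  # Unicode from here on
headers, body = render_page(text)                 # emitted as UTF-8
print(headers["Content-Type"])
print(len(body), "bytes")
```

The hard part in practice is knowing which charset to pass when decoding, since the browser may not tell you.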