Forum OpenACS Q&A: Response to Non-english characters in OpenACS 4.5

Posted by Reuven Lerner on

I'm using Unicode more and more on Web sites that I work on, because my clients need English + Hebrew + some other language. There's no easy or reasonable way to do this without Unicode.

Luckily, it's pretty easy to set up an OpenACS system that uses Unicode with UTF-8 encoding:

  • Create your PostgreSQL database with Unicode encoding. That is, instead of just saying createdb openacs, say createdb --encoding=UNICODE openacs.
  • The only other major step is telling AOLserver to modify the outgoing Content-Type header so that it declares the output as UTF-8 rather than the default, Latin-1. (If you have HTML forms, input from those forms will automatically arrive in UTF-8 as long as the page itself was sent in UTF-8.) Add these four directives to your nsd.tcl:
    ns_param   HackContentType 1
    ns_param   URLCharset      utf-8
    ns_param   OutputCharset   utf-8
    ns_param   HttpOpenCharset utf-8
    

    (I found these directives on the bboard a while back, and don't remember who originally suggested them.)
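For reference, here is a sketch of how the four directives might sit in context. It assumes a stock AOLserver 3.x/4.x-style nsd.tcl, where these server-wide settings are read from the `ns/parameters` section. If your nsd.tcl already has that section, add the lines there. The comments reflect my understanding of each parameter and are worth checking against the AOLserver documentation for your version.

```tcl
# Fragment of nsd.tcl (AOLserver configuration), not a complete file.
# Assumption: these keys belong in the server-wide ns/parameters section.
ns_section "ns/parameters"
    # Add a charset to text/* Content-Type headers that lack one
    ns_param   HackContentType 1
    # Charset used to decode incoming URLs and form/query data
    ns_param   URLCharset      utf-8
    # Charset used to encode page output sent to the browser
    ns_param   OutputCharset   utf-8
    # Charset assumed for pages fetched with ns_httpopen when none is declared
    ns_param   HttpOpenCharset utf-8
```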

These settings should be enough to make OpenACS work in UTF-8. You will still have to handle language-related design issues, though, such as mixing right-to-left and left-to-right text (which we deal with in Hebrew), along with the formatting of dates, currencies, and other such things. I also had problems with OpenACS 3's implementation of ns_sendmail, which needed some tweaking before it would send UTF-8. And my friend and colleague Danny Lieberman reports that, depending on your version of glibc, Unicode collation (i.e., sorting) might not work quite right, so lists of bboards or users might come out in an odd order. (Newer versions of glibc seem to be much better about this than older ones.)

But these many little issues aside, it works pretty darned well!