Forum OpenACS Q&A: Response to <b>Official Announcment: Internationalization of OpenACS Starting</b>

I have a few comments on these plans.

A. It is not clear to me that they would allow me to set up a site in the following way:

  1. Language is an explicit part of the URL; the URL defines the language. No cookies, no user prefs, no automatic guesses please; everyone gets the same page you linked to (e.g. linked from an external site), simple and predictable. Example:
    http://www.russia.no/sitemap.html
    http://www.russia.no/sitemap-ru.html
    
    It is also important that the language-defining part of the URL does not introduce a "?" into it, because search engines are more willing to index URLs that don't contain "?".

  2. If the page is available in another language, a language switch (i.e. a link to the URL of that page in the other language) is displayed somewhere in the layout (e.g. in the header). The piece of code that checks whether the page is available in another language and displays the switch should be easy to insert into a template wherever the webmaster wants it; preferably an example should be provided.

  3. Note that internal URLs on pages can now be "in different languages" and are part of the translated text, because they contain the language explicitly. You want to link to English internal pages from an English page and to Russian internal pages from a Russian page, unless the page you link to is not available in the language of the current page, in which case you link to the page in another language or present more than one link to pages in different languages.
The plans are not worth a penny to me if I can't use OpenACS this way, for this is how I build my sites.
I understand that this approach may be less straightforward to implement in OpenACS and more time-consuming to administer for a Web site, but in many cases it IS more straightforward and predictable for the user, which should override everything else.
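
To make points 1-3 concrete, here is a rough Python sketch (not OpenACS code, and not a proposal for the actual API). It assumes the convention from the russia.no example: the unsuffixed URL is the default language and other languages add a "-xx" suffix before the extension. The names DEFAULT_LANG, LANGUAGES, split_language, localized_path, language_switch and internal_links are all made up for illustration, and the set of existing pages stands in for whatever lookup the site would really use.

```python
import posixpath
from typing import List, Set, Tuple

# Assumptions for illustration only: the unsuffixed URL is English,
# and other languages use a "-xx" suffix before the file extension.
DEFAULT_LANG = "en"
LANGUAGES = ("en", "ru")


def split_language(path: str) -> Tuple[str, str]:
    """Return (base_path, language) for a path such as /sitemap-ru.html."""
    stem, ext = posixpath.splitext(path)
    base, sep, suffix = stem.rpartition("-")
    # Only known language codes count, so /about-us.html is not read as "us".
    if sep and suffix in LANGUAGES and suffix != DEFAULT_LANG:
        return base + ext, suffix
    return path, DEFAULT_LANG


def localized_path(base_path: str, lang: str) -> str:
    """Build the URL path of base_path in the given language."""
    if lang == DEFAULT_LANG:
        return base_path
    stem, ext = posixpath.splitext(base_path)
    return f"{stem}-{lang}{ext}"


def language_switch(path: str, existing: Set[str]) -> List[Tuple[str, str]]:
    """Point 2: links to the same page in every other available language."""
    base, current = split_language(path)
    return [
        (lang, localized_path(base, lang))
        for lang in LANGUAGES
        if lang != current and localized_path(base, lang) in existing
    ]


def internal_links(target_base: str, page_lang: str,
                   existing: Set[str]) -> List[Tuple[str, str]]:
    """Point 3: prefer the current page's language, else offer what exists."""
    preferred = localized_path(target_base, page_lang)
    if preferred in existing:
        return [(page_lang, preferred)]
    return [
        (lang, localized_path(target_base, lang))
        for lang in LANGUAGES
        if localized_path(target_base, lang) in existing
    ]
```

A template would call language_switch with the current URL path to render the switch in the header, and internal_links when generating links inside translated text. No "?" ever appears in the generated URLs.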
 

B. For submitting multilingual texts via forms, I know two ways to handle it:

  1. Pages served to client in UNICODE encoding. Forms input delivered to server in UNICODE encoding.

  2. Pages served to client in some charset A. Forms input is delivered back to server in charset A (usually, unless there's a glitch in browser configuration). If the input contains characters outside charset A, most browsers (hopefully any browser but Netscape 4.x) will encode those characters as &foo; sequences.

    These &foo; sequences can be stored in the database as is. This can cause inconsistencies (e.g. in search), because there is more than one way in use among browsers and tcl/adp coders to denote the same character, e.g. &oslash; and &#248; represent the same "ø" character.

    Better yet, &foo; sequences can be converted into/from UNICODE characters at the same places where UNICODE <-> charset A conversion happens on the server. A possible problem with conversion is that I'm not sure if &foo; sequences can legitimately appear in non-text parts of HTML source and GET/POST parameters.
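
    As a small illustration of that conversion on the input side, here is a Python sketch (the function name normalize_form_text is hypothetical, and this is not how OpenACS itself handles form data). It decodes the submitted bytes from charset A and then resolves the &foo; sequences with the standard library's html.unescape, so that &oslash; and &#248; end up as the same stored character.

```python
import html


def normalize_form_text(raw: bytes, page_charset: str) -> str:
    """Decode form input from the page charset and resolve &foo; sequences.

    Hypothetical helper: raw is assumed to be the already URL-decoded
    bytes of one form field, submitted in the charset the page was served in.
    """
    text = raw.decode(page_charset)
    # html.unescape handles both named (&oslash;) and numeric (&#248;) forms.
    return html.unescape(text)
```

    One caveat with this sketch: a user who literally typed "&oslash;" into the form cannot be told apart from a browser that generated it, so the text would be converted in both cases.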

While the "all UNICODE, period" approach is tempting, I'd actually prefer to have client-side pages and sources in local charset with &foo; sequences for out-of-charset characters.
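
For the output side of that preference, a minimal Python sketch: text kept as UNICODE on the server is encoded into the local charset when served, and anything the charset cannot represent becomes a numeric &#NNN; sequence. The function name encode_for_charset is made up for illustration; the "xmlcharrefreplace" error handler is a standard Python codec option.

```python
def encode_for_charset(text: str, charset: str) -> bytes:
    """Encode text for a page served in the given local charset.

    Characters outside the charset are written as numeric character
    references (&#NNN;) instead of raising an error or becoming "?".
    """
    return text.encode(charset, errors="xmlcharrefreplace")
```

For example, with KOI8-R as the local charset, Cyrillic text is encoded as ordinary KOI8-R bytes, while "ø" (which KOI8-R does not contain) is sent as &#248;.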
 

C. For God's sake, serve the Content-Type declaration in both the HTTP response header and the <head> section of the HTML source. If it is served in just one place (e.g. only in the HTTP response, as it is now), some browsers may behave inconsistently (e.g. IE 5.0). I understand that the entire HTML source is the territory of the ADP template editor, but at least the piece of code that inserts the Content-Type declaration can be in the master template by default.
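
A rough Python sketch of what "both places" means (this is not ADP or OpenACS code; the function name with_declared_charset is hypothetical). It builds the HTTP header and inserts the matching <meta http-equiv="Content-Type"> tag right after the opening <head> tag, so the two declarations cannot drift apart.

```python
import re
from typing import Dict, Tuple

# Matches <head> or <head ...attributes>, but not <header>.
HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def with_declared_charset(page_html: str, charset: str) -> Tuple[Dict[str, str], bytes]:
    """Return (HTTP headers, encoded body) that both declare the same charset.

    Sketch only: it does not check whether the page already has such a
    <meta> tag, and it leaves the page unchanged if no <head> is found.
    """
    value = f"text/html; charset={charset}"
    meta = f'<meta http-equiv="Content-Type" content="{value}">'
    match = HEAD_OPEN.search(page_html)
    if match:
        # Put the declaration first in <head>, before any non-ASCII text.
        page_html = page_html[:match.end()] + meta + page_html[match.end():]
    body = page_html.encode(charset, errors="xmlcharrefreplace")
    return {"Content-Type": value}, body
```

In a master template this would amount to one line in <head> that takes its charset from the same setting the server uses for the HTTP header.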