Forum OpenACS Q&A: Official Announcement: Internationalization of OpenACS Starting

As some of you may already know, we (Heidelberg) are going to have Collaboraid start work on the internationalization problem so that we can go live with dotLRN in 2003.

We want this to be a VERY transparent process and the first thing we will be funding is the creation of a detailed specification of the work that is necessary to fully internationalize OpenACS (a todo list that will go beyond that which we will be able to initially fund).

We want to develop this "grand internationalization plan" in a very transparent, iterative, and open way, where participants from the OpenACS community can follow the progress day by day and help ensure that everything moves in the right direction. Eventually we will use these specifications to search for other sources of funding and/or work within the community to complete the job (which is why it is essential that the community take an active part in these specs!).

The specification will be developed and delivered as DocBook XML and compiled HTML pages in the OpenACS repository, as part of the standard OpenACS documentation. Snapshots of the progress will be put up at a well-known URL.

I ask everyone who might have a source of funding for this undertaking (personal/clients/companies/others) to take part in the discussion and to contact me, so that we can make sure that specific needs are addressed in the specs. This will facilitate the creation of an internationalization pot (of money) when and if it is needed. I am sure there will also be a lot of places where volunteers will be able to make a big difference in pushing this forward, and I will be asking Collaboraid to manage this.

https://openacs.org/wp/display/326/

Regarding your punchlist:
  • acs-reference is completely international. It was created that way.
  • acs-currency has currencies and the names can be international.
  • timezones was created to be international (to the point of having offsets in the minutes not just hours).
  • acs-person (and acs-address) are internationalized already. (These names are preliminary, pending acceptance by Don B and Janine.)

Thank you Jon. If you are talking about the "punchlist" in the Wimpy Point... it will be replaced soon by the spec I mention above. I posted the address to give people a starting point on the problem. In the new spec more time will be invested in looking closely at individual packages and problems (the list in the WP was a very rough starting point that Lars put together when we were very superficially evaluating how much work would be involved). I am sure Lars, Peter, or Christian from Collaboraid will be posting more soon.

I have a few comments on these plans.

A. It is not clear to me that they would allow me to set up a site the following way:

  1. language is an explicit part of URL; URL defines language. No cookies, no user prefs, no automatic guesses please; everyone gets the same page you linked to (e.g. linked from an external site), simple and predictable. Example:
    http://www.russia.no/sitemap.html
    http://www.russia.no/sitemap-ru.html
    
    It is also important that language-defining part of URL does not bring "?" sign into it, because search engines are more willing to index URLs that don't contain "?"

  2. if the page is available in another language, a language switch (i.e. a link to the URL of that page in the other language) is displayed somewhere in the layout (e.g. in the header). The piece of code that checks whether the page is available in another language and displays the switch should be easy to insert into the template wherever the webmaster wants it; preferably an example should be provided.

  3. note that internal URLs on pages can now be "in different languages" and are part of the translated text, for they contain the language explicitly. You want to link to English internal pages from an English page and to Russian internal pages from a Russian page, unless the page you link to is not available in the language of the current page, in which case you link to the page in another language or present more than one link, to versions in different languages.
The plans are not worth a penny to me if I can't use OpenACS this way, for this is how I build my sites.
I understand that this approach may be less straightforward to implement into OpenACS and more time-consuming to administer for a Web site, but it IS in many cases more straightforward and predictable to the user, which should override everything else.
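
To make points 1 to 3 concrete, here is a minimal sketch in Python (not OpenACS code) of how the language could be read from a URL such as /sitemap-ru.html, how a language switch could be computed, and how an internal link could fall back when a translation is missing. The `AVAILABLE` registry, the `KNOWN_LANGS` set, the choice of "en" as the unsuffixed default, and all function names are assumptions made for illustration only.

```python
import posixpath
import re

# Assumption: the default language is served without a suffix.
DEFAULT_LANG = "en"
# Assumption: the site knows which language codes are valid suffixes.
KNOWN_LANGS = {"en", "ru"}
# Hypothetical registry: base page path -> languages the page exists in.
AVAILABLE = {
    "/sitemap.html": {"en", "ru"},
    "/contact.html": {"en"},
}

_SUFFIX = re.compile(r"^(?P<stem>.+)-(?P<lang>[a-z]{2})$")


def split_language(path):
    """Return (base_path, lang) for a path such as /sitemap-ru.html."""
    stem, ext = posixpath.splitext(path)
    match = _SUFFIX.match(stem)
    if match and match.group("lang") in KNOWN_LANGS:
        return match.group("stem") + ext, match.group("lang")
    return path, DEFAULT_LANG


def localized_path(base_path, lang):
    """Build the URL path of base_path in the given language."""
    if lang == DEFAULT_LANG:
        return base_path
    stem, ext = posixpath.splitext(base_path)
    return f"{stem}-{lang}{ext}"


def language_switches(path):
    """List (lang, url) pairs for the other languages this page exists in."""
    base, current = split_language(path)
    others = sorted(AVAILABLE.get(base, set()) - {current})
    return [(lang, localized_path(base, lang)) for lang in others]


def internal_links(target_base, current_lang):
    """Link in the current language, else to every available version."""
    langs = AVAILABLE.get(target_base, set())
    if current_lang in langs:
        return [localized_path(target_base, current_lang)]
    return [localized_path(target_base, lang) for lang in sorted(langs)]
```

With this registry, a template on /sitemap-ru.html would get a single switch pointing at /sitemap.html, while a Russian page linking to the contact page would fall back to the English /contact.html. No cookies or "?" parameters are involved; the path alone decides the language.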
 

B. For submitting multilingual texts via forms, I know two ways to handle it:

  1. Pages served to client in UNICODE encoding. Forms input delivered to server in UNICODE encoding.

  2. Pages served to client in some charset A. Forms input is delivered back to server in charset A (usually, unless there's a glitch in browser configuration). If the input contains characters outside charset A, most browsers (hopefully any browser but Netscape 4.x) will encode those characters as &foo; sequences.

    These &foo; sequences can be stored in the database as is. This can cause inconsistencies (e.g. in search), because there is more than one way in use among browsers and tcl/adp coders to denote the same character, e.g. &oslash; and &#248; represent the same "ø" character.

    Better yet, &foo; sequences can be converted into/from UNICODE characters at the same places where UNICODE <-> charset A conversion happens on the server. A possible problem with conversion is that I'm not sure if &foo; sequences can legitimately appear in non-text parts of HTML source and GET/POST parameters.

While the "all UNICODE, period" approach is tempting, I'd actually prefer to have client-side pages and sources in local charset with &foo; sequences for out-of-charset characters.
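
The conversion described in B.2 can be sketched with the Python standard library; this is an illustration of the idea, not how OpenACS or AOLserver does it. The function names and the choice of KOI8-R as "charset A" are assumptions.

```python
import html


def form_input_to_unicode(raw: bytes, charset: str) -> str:
    """Decode form input from the page charset and expand &foo; sequences."""
    return html.unescape(raw.decode(charset))


def unicode_to_page_charset(text: str, charset: str) -> bytes:
    """Encode for output; characters outside the charset become &#NNN;."""
    return text.encode(charset, "xmlcharrefreplace")


# Both spellings of the same character end up identical in storage,
# which avoids the search inconsistency mentioned above.
a = form_input_to_unicode(b"sm&oslash;r", "koi8-r")
b = form_input_to_unicode(b"sm&#248;r", "koi8-r")
assert a == b == "sm\u00f8r"
```

Note that `html.unescape` also expands `&amp;` and `&lt;`, so text decoded this way has to be HTML-escaped again before it is written into a page. It should only be applied to text content, which matches the caution above about non-text parts of the HTML source.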
 

C. For God's sake, serve Content-Type header in both HTTP response and HTML source <head> section. If served in just one place (e.g. in HTTP response as it is now), some browsers may behave inconsistently (e.g. IE 5.0). I understand that the entire HTML source is the territory of adp template editor, but at least the piece of code that inserts Content-Type header can be in the master template by default.
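
One way to keep the two declarations from drifting apart is to derive both from a single charset value. The following Python sketch shows the idea; the function name is hypothetical, and in OpenACS the meta tag would live in the master template rather than in a function like this.

```python
def content_type_pair(charset):
    """Return the HTTP header and the matching <meta> tag for one charset."""
    value = f"text/html; charset={charset}"
    header = ("Content-Type", value)
    meta = f'<meta http-equiv="Content-Type" content="{value}">'
    return header, meta


header, meta = content_type_pair("koi8-r")
```

Here `header` would be sent with the HTTP response and `meta` placed inside the <head> section, so a browser sees the same charset in both places.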

The conversion of & into &amp; for user text input can clash with preserving &foo; sequences. To resolve, one must only convert those &'s that are not the beginning of recognised &foo; sequences.
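
A minimal Python sketch of that rule follows. It treats numeric references and the named entities in Python's HTML5 table as "recognised"; that choice, and the function name, are assumptions for illustration.

```python
import re
from html.entities import html5  # keys look like "oslash;"

# An ampersand, optionally followed by something shaped like an entity body.
_AMP = re.compile(r"&(#[0-9]+;|#[xX][0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)?")


def escape_ampersands(text):
    """Turn & into &amp; unless it starts a recognised &foo; sequence."""
    def repl(match):
        body = match.group(1)
        if body is None:
            return "&amp;"          # bare ampersand
        if body.startswith("#") or body in html5:
            return match.group(0)   # recognised sequence, keep as is
        return "&amp;" + body       # looks like an entity but is unknown
    return _AMP.sub(repl, text)
```

For example, "Tom & Jerry &oslash; &bogus;" would keep `&oslash;` intact while the bare ampersand and the unknown `&bogus;` get their `&` escaped.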
Your point about using URLs encoding language information is what we did at Greenpeace Planet (see http://greenpeace.org and http://greenpeace.nl, or if you prefer http://greenpeace.org/international_en and http://greenpeace.org/nederland_nl).

We used cookies only to make the user's language choice persistent when revisiting the generic http://greenpeace.org site.  If you turn off cookies, navigation works fine.  We set some global connection values that make it simple to make internal links that link to the right language version.

Check out the "case study" one-page note I wrote for the presentation Carl references in his first note.

Now ... our work wasn't completely generalized, so Lars will have some work to do when trying to figure out just how to generalize some stuff. But he did work with me on the Greenpeace project, so he's familiar with how that works.

Be sure to read and comment on the specs as they evolve.

Sorry Don, I somehow missed the Case study page when I read the presentation.

Did you have any template support for checking whether a version in another language is available or not? How would I implement that?

I don't see many language switches on Greenpeace site; it looks like there are just completely different sites for each language. I have "different languages, identical content" approach and put language switch on every page, not just a link to the root of the site in another language. The multilinguality is handled on a per-page basis. Every page needs to know in what languages it is available, and also be able to ask the same question about any other page in the system.

You're right, the idea behind Greenpeace's site is that each local Greenpeace organization is the master of the content on its own site, so they're separate sites.

We won't be focusing on the "same content in multiple languages" use case, either, because it isn't relevant to our client at this point. We will look at it, but it won't be part of what we'll be implementing.

/Lars

Well ... Lars is right that the implementation allows each national Greenpeace organization full control over their own content.

But within each national organization's view of their content management system, content can be tagged as to language.  So they can provide the same content in multiple languages if they want.

However it's not terribly well thought out IMO (Lars and I both worked on this system but it was designed by an earlier firm whose contract was terminated by Greenpeace after it became apparent they wouldn't finish in finite time).  It will work but it will be somewhat clumsy.

Thus far Greenpeace hasn't migrated any multilingual national organizations over to the site, as both the International and Netherlands sites are single-language.  When organizations like Greenpeace Belgium (Dutch/French) and Canada (English/French) come on board you'll then see sites that presumably will present the same content in two languages.

Hi,
  We did a system for www.xtender.com with aol31ml.  Just a question for any of you who have applied the ML patches to any later versions of AOLserver: what kind of problems did you have, if any?