Forum OpenACS Q&A: Response to Globalization

For the Development Gateway we use an approach similar to the GP system described above. An ADP page may contain bits of translatable text enclosed by TRN tags (our homegrown variety). A translator who has "translation mode" toggled on sees an edit link next to every translatable item. For simplicity the system uses 100% UTF-8, making no attempt to negotiate character sets with the User-Agent. Pages which use this system are linked from here: http://www.developmentgateway.org/node/118859/. (Note: though the top page uses images, and links to URLs that all contain "en", if you drill down you'll see that character data in the various languages is returned.)
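
To make the TRN idea concrete, here is a minimal sketch in Python rather than Tcl/ADP. The tag syntax (`<trn key="...">default text</trn>`), the `catalog` dictionary, and the `/trn/edit` URL are all assumptions made up for illustration; they are not the DG's actual tag syntax or schema.

```python
import html
import re
from urllib.parse import quote

# Hypothetical tag syntax: <trn key="some.key">default English text</trn>
TRN_TAG = re.compile(r'<trn\s+key="([^"]+)">(.*?)</trn>', re.IGNORECASE | re.DOTALL)


def render_trn(page, catalog, lang, translation_mode=False):
    """Replace each TRN tag with its translation, or its default text if none exists.

    catalog maps (key, lang) -> translated string (an assumed storage shape).
    """
    def substitute(match):
        key, default_text = match.group(1), match.group(2)
        text = catalog.get((key, lang), default_text)
        if translation_mode:
            # Hypothetical edit URL; translators see a link next to every item.
            link = "/trn/edit?key=%s&lang=%s" % (quote(key), quote(lang))
            text += ' <a href="%s">[edit]</a>' % html.escape(link)
        return text

    return TRN_TAG.sub(substitute, page)


page = '<h1><trn key="nav.home">Home</trn></h1>'
catalog = {("nav.home", "es"): "Inicio"}
print(render_trn(page, catalog, "es"))
print(render_trn(page, catalog, "es", translation_mode=True))
```

Falling back to the text inside the tag means a page still renders sensibly before anyone has translated a key.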

The DG provides one interesting feature that I don't think was mentioned above. We distinguish between publisher-added "navigational" text and user-added content (a couple of our custom packages support language tagging of content). This helps, for example, a user whose first language is Spanish but who can read English and French as well. We allow such users to specify a single navigation language but multiple content languages.
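
The split between one navigation language and several content languages might be modeled roughly as follows. This is only a sketch in Python: the `LanguagePrefs` class, the `lang` field on content items, and the sample data are hypothetical names chosen for the example, not our actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class LanguagePrefs:
    navigation: str                               # single language for publisher-added text
    content: list = field(default_factory=list)   # every language the user can read


def visible_items(items, prefs):
    """Keep only the content items tagged with a language the user reads."""
    readable = set(prefs.content)
    return [item for item in items if item["lang"] in readable]


prefs = LanguagePrefs(navigation="es", content=["es", "en", "fr"])
items = [
    {"title": "Informe anual", "lang": "es"},
    {"title": "Annual report", "lang": "en"},
    {"title": "Rapport annuel", "lang": "fr"},
    {"title": "Jahresbericht", "lang": "de"},
]
for item in visible_items(items, prefs):
    print(item["title"])
```

Menus and labels would be rendered in `prefs.navigation` only, while the content listing draws on all of `prefs.content`.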

In a few cases, we bypass the TRN tags and use language-specific ADPs, switching the target of ad_return_template based on the user's setting. This is useful for pages containing HTML forms that we don't expect to change much.
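
The template switch amounts to a small lookup with a fallback. The sketch below shows the idea in Python; the `base-lang` naming scheme, the `available` set, and the English default are assumptions for illustration, and the result would be what we hand to ad_return_template.

```python
def pick_template(base, user_lang, available, default_lang="en"):
    """Return the language-specific template name, falling back to a default.

    Assumes templates are named '<base>-<lang>' (a hypothetical convention).
    """
    for lang in (user_lang, default_lang):
        candidate = "%s-%s" % (base, lang)
        if candidate in available:
            return candidate
    # No language-specific variant exists; use the plain template.
    return base


available = {"feedback-en", "feedback-es"}
print(pick_template("feedback", "es", available))
print(pick_template("feedback", "fr", available))
```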

Good things about this system:

A lot of functionality for not too much work
Translators can view items in context before translating them
No Tcl-level programming intervention required to make items translatable or to add new translatable items

Not-so-good things about this system:

Internationalizing an existing ADP page can require a lot of tedious hand-editing to add the needed TRN tags. I've tooled around with some code that uses regexps to break up a markup page along "<" and ">" boundaries but haven't gone very far with it. I'd be happy to share the code and ideas with anyone who is interested.
Our translation keys are global. Though we have some informal naming conventions, we have something of the system-wide sea of keys Don mentions above.
We don't have an easy way to track all of the places that a key is used. Such tracking would help ensure consistency, i.e. that a translation is correct everywhere a key shows up, and would help us determine when new keys are needed.
We haven't made any attempt at solving date- or currency-formatting issues.
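
For anyone curious about the regexp approach to finding text that needs TRN tags, the core of it can be sketched like this. It is a Python illustration of the idea, not the code I mentioned; the function names are made up, and it ignores comments, scripts, and ">" characters inside attribute values.

```python
import re

# Capturing group so that re.split keeps the tags in the result.
TAG = re.compile(r"(<[^>]*>)")


def split_markup(page):
    """Split markup into alternating tag and text chunks, dropping empty ones."""
    return [chunk for chunk in TAG.split(page) if chunk]


def translatable_candidates(page):
    """Return the non-blank text runs between tags, stripped of whitespace."""
    return [
        chunk.strip()
        for chunk in split_markup(page)
        if not TAG.fullmatch(chunk) and chunk.strip()
    ]


page = "<p>Hello <b>world</b></p>"
print(split_markup(page))
print(translatable_candidates(page))
```

Each candidate still needs a human to decide whether it deserves a key, which is where most of the tedium remains.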

Not-so-good things that seem harder and more subtle:

As Henry Minsky has pointed out elsewhere, Unicode/UTF-8 is not smart about fonts. Henry can describe it better than I, but the problem is that while two (or more) languages share certain characters, their "pretty" representation may depend on the language. So ultimately the right solution is probably to do the locale/charset negotiation and rely on the browser to pick a good font.
Our system doesn't know anything about language constructs. For example, it doesn't know that "la voiture" and "voiture" are related.

As a final, mildly off-topic point, one of the cooler things to appear in recent months is this idea of a distributed translation database (see http://www.newscientist.com/news/news.jsp?id=ns99992115). This opens up the possibility of building something of long-lasting value with a much wider community, not to mention making translation help available through XML-RPC and SOAP.