Forum OpenACS Q&A: Notes from Internationalization Discussion at Heidelberg 2004 .LRN conference

Translators would like to have critical (i.e., core and .LRN) message keys differentiated from all the other packages' keys.

The user list on translate.openacs.org has duplicate entries.

There should be a notification when the English text changes, because this may invalidate the translations.
  - We could change the "translator mode" indicator for these translations back to the "untranslated" token
  - We should log these changes in the audit trail for each affected message.  E.g., if the English for message acs-kernel.foo is "Foo" and it gets changed to "Bar", the audit trail for the same key in Dutch should show "English changed from 'Foo' to 'Bar'."
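
A rough sketch of the two bullets above, in plain Tcl with in-memory dicts standing in for the real message and audit tables. The proc names (register_message, change_english), the "translated" status value, and the storage layout are all assumptions made up for illustration; none of this is existing acs-lang code.

```tcl
# Hypothetical stand-ins for the message and audit tables.
# messages: "locale,key" -> dict with the text and a translation status
# audit:    list of dicts, one per logged change
set messages [dict create]
set audit [list]

proc register_message {locale key text} {
    global messages
    dict set messages "$locale,$key" [dict create text $text status translated]
}

# Change the English text of a key, flag every other locale's translation
# of that key as untranslated, and log the change against each of them.
proc change_english {key new_text} {
    global messages audit
    set old_text [dict get $messages "en_US,$key" text]
    dict set messages "en_US,$key" text $new_text
    dict for {id msg} $messages {
        lassign [split $id ,] locale msg_key
        if {$msg_key ne $key || $locale eq "en_US"} {
            continue
        }
        dict set messages $id status untranslated
        lappend audit [dict create locale $locale key $key \
            comment "English changed from '$old_text' to '$new_text'."]
    }
}

# Usage, following the acs-kernel.foo example from the notes.
register_message en_US acs-kernel.foo "Foo"
register_message nl_NL acs-kernel.foo "(Dutch translation)"
change_english acs-kernel.foo "Bar"
```

After the last call, the nl_NL entry carries the status "untranslated" and the audit list holds one entry for it. A notification to the locale's translators could hang off the same loop.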

What resources are available for translators?
  - very short translator guide that just covers how to use the UI

What else do we need to localize?
  - tutorials
  - user help text
  - admin/developer help? Not as urgent

What is the minimal process for collecting and translating help?

Translators may be uncomfortable working on the admin sections of the site, because they may break the site.  One workaround is to use the automatically rebuilt test servers to explore the UI and translation keys, and then repeat the translations on the translation server.  A translator would then not need site-wide admin rights: they can change the strings for the admin pages without accessing those pages on the translation server itself.

We should set up per-language translation forums for basic coordination of effort and communication.

We should not put any non-throwaway material on the translation server (other than the accounts and the fresh translations) because it complicates the CVS, upgrades, etc.  A dedicated discussion server, or discussion area on OpenACS.org, would be simpler to maintain.

The most desirable new features for i18n include:
  - "manager" role for locales, so that anybody can submit translations but only the manager can actually change the value.  Submissions would sit in a workflow queue.
  - segregation by locale so that people can't accidentally or deliberately change keys in other locales

The complete list of checks would include:
- Developers check message keys for valid code (keywords, quoting, etc)
- Linguist checks i18n for suitability of message keys (right size, scope, etc)
- Linguist checks i18n for per-language issues; i.e., a lowest-common-denominator list that spans all languages.  For example, some languages may need two keys, one per user gender, where English needs only one.
- Usability checking.  The original English strings are subject to usability issues, and each locale may have new issues

General rule: Building text by compounding message keys, e.g. putting "More" and "Forum Posts" together, should be avoided because word order, the number of keys needed, and other factors may be different in other languages.
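
To make the rule concrete, here is a small plain-Tcl sketch comparing a compounded label with a whole-phrase key. The catalog layout, the demo.* key names, and the lookup proc are hypothetical, and the French strings are only there to illustrate the point, not taken from any real catalog.

```tcl
# Hypothetical catalog; key names and the French strings are illustrative only.
set catalog [dict create \
    en_US [dict create \
        demo.more             "More" \
        demo.forum_posts      "Forum Posts" \
        demo.more_forum_posts "More Forum Posts"] \
    fr_FR [dict create \
        demo.more             "Plus" \
        demo.forum_posts      "Messages du forum" \
        demo.more_forum_posts "Plus de messages du forum"]]

proc lookup {locale key} {
    global catalog
    return [dict get $catalog $locale $key]
}

# Compounded: assumes every language simply puts the two parts side by side.
proc compounded_label {locale} {
    return "[lookup $locale demo.more] [lookup $locale demo.forum_posts]"
}

# Whole phrase: the translator controls word order and linking words.
proc whole_label {locale} {
    return [lookup $locale demo.more_forum_posts]
}
```

In English both procs give the same text, which is why the problem is easy to miss. In the French example the compounded version drops the linking word that the whole phrase needs.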

The reusable keys in acs-kernel (yes, no, More, etc) should only be used standalone (on button labels, etc) instead of within longer strings.

Code words that are stored in the database and checked in code must never be i18ned.  Therefore, they must never be shown in the UI.

When the developer is not sure if a message key is suitable, we need a process to indicate that the key should be checked by someone else and may change.  We could do this with a convention of naming all such keys "provisional", such as acs-subsite.provisional-front-page-text.  A developer would use this convention whenever the key seems too big or too small, or the context may be confusing.  Whenever a developer reuses a key but isn't certain if it's appropriate, a forum post may be more appropriate than doing anything odd to the key or comments.

Icons should never contain text.

What about plurals?  In English, every number takes the plural except 1 and -1.  This suggests making two keys, singular and plural, and adding logic wherever the keys are used so that 1/-1 gets the singular and everything else gets the plural.  But is this generalizable?

(after a bit of research)

http://www.delorie.com/gnu/docs/glibc/libc_135.html

The rules for

Okay, we are going to need to figure out:
1. A naming convention for pluralizable keys.  If we want to solve this completely, we will need at least four keys each time we have a dynamic countable word in a key.  Singular and Plural aren't enough; even singular, 2-ular, 3-ular, plural may not be enough because the extra ones may be different in different languages.

We could call them widget.num_of_widget and widget.num_of_widgets, which makes the key names make sense in English but gets odd for the cases we don't have in English.

We could use widget.num_of_widget_singular, widget.num_of_widget_plural, etc.

2. How can we piggyback on existing code, such as glibc, to do the mapping of locale/number/form?  We need a function that takes a number and a locale as input and returns which form should be used.  I don't think Tcl has this.
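
For what it's worth, the function described in point 2 can be sketched in a few lines of plain Tcl if the per-language rules are kept in a table. The rule table and the proc name plural_index below are assumptions for illustration; the English and Polish rules follow the form used in the glibc/gettext documentation, rewritten with Tcl's $n, and only three languages are listed.

```tcl
# Hypothetical rule table: language -> expression yielding the form index.
set plural_rules [dict create \
    en {$n != 1} \
    pl {$n==1 ? 0 : $n%10>=2 && $n%10<=4 && ($n%100<10 || $n%100>=20) ? 1 : 2} \
    zh {0}]

# Take a locale such as pl_PL and a number; return which form to use.
proc plural_index {locale n} {
    global plural_rules
    if {![string is integer -strict $n]} {
        error "n must be an integer, got '$n'"
    }
    set lang [lindex [split $locale _] 0]
    # The rule text is hard-coded above and therefore trusted, so it is
    # passed to expr unbraced; $n inside the rule resolves to the argument.
    return [expr [dict get $plural_rules $lang]]
}
```

With this table, plural_index pl_PL 22 gives 1 (the 2-4 form) and plural_index pl_PL 12 gives 2, which a simple singular/plural split cannot express.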

I think the answer could be in the link you gave, although I may be way off the mark. How about just ripping off the system described there? I _think_ this can be done using the existing localisation infrastructure. The idea is pretty rough, but goes like this:

Each locale needs to define the number of plurals plus the plural expression as described in that doc. It makes sense (to me, at least :) to make these message keys: for argument's sake, I'll pretend that they are messages in acs-lang, i.e. acs-lang.nplurals and acs-lang.plural_expr. For example, English (en_US) would have acs-lang.nplurals = '2' and acs-lang.plural_expr = '$n != 1', Polish (pl_PL) would have acs-lang.nplurals = '3', acs-lang.plural_expr = '$n==1 ? 0 : $n%10>=2 && $n%10<=4 && ($n%100<10 || $n%100>=20) ? 1 : 2'. (I just copied these straight out the doc, I guess they're probably valid Tcl expressions, but maybe not...)

So then you need to define the plural forms of a noun. So for English you'd have message keys noun0 = 'noun', noun1 = 'nouns'. I don't speak Polish, but you'd need to define noun0, noun1, noun2 appropriately.

And then the procedure to get the appropriate localised form. Something like:

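ad_proc localize_plural {
   {-message_key:required}
   {-n:required}
} {
  Return the plural form of message_key that the current locale uses for n.
} {
  # The locale's plural rule, e.g. '$n != 1' for en_US.
  set plural_expr [_ acs-lang.plural_expr]
  # expr is left unbraced on purpose so that $n inside the rule is
  # substituted; the rule text must therefore be trusted.
  set plural_index [expr $plural_expr]
  # Look up the indexed form, e.g. noun0 or noun1.
  return [_ "${message_key}${plural_index}"]
}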

Oh, I guess nplurals may be redundant then - s'pose you could use it for validation or something. To be honest, I'm not au fait with localisation in OpenACS to the extent that I could say this definitely will/will not work. For example, I don't know what would happen if translations for a locale weren't present and the system had to fall back on defaults. But, it gives you a naming convention (entirely detached from our Anglicised perception of how language works), and ties in with how glibc does it.

You could maybe even bypass the plural_expr business and make a call to glibc to get the appropriate plural index, assuming the indexing scheme is the same. Not sure whether that's desirable, there's probably something to be said for having it all in OpenACS.

And it might be a horrific security risk allowing translators to specify arbitrary expressions to be evaluated by the server, e.g. by setting acs-lang.plural_expr = '[exec rm -rf /]'. That's a separate issue that could be worked around though.
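
One possible workaround, sketched in plain Tcl: only evaluate a plural expression if it is built entirely from $n, integer literals, and a fixed set of operators. The proc name safe_plural_index and the exact whitelist are my own assumptions, not anything that exists in acs-lang, and the whitelist would want a proper review before being trusted.

```tcl
# Evaluate a translator-supplied plural expression only if every token is
# "$n", an integer literal, a space, or one of: ? : % < > = ! & | ( )
# Brackets, quotes, braces, backslashes and other variables never match.
proc safe_plural_index {plural_expr n} {
    if {![string is integer -strict $n]} {
        error "n must be an integer, got '$n'"
    }
    if {![regexp {^(?:\$n|[0-9]+|[ ?:%<>=!&|()])+$} $plural_expr]} {
        error "rejected plural expression: $plural_expr"
    }
    # Left unbraced so that $n inside the (now vetted) text is substituted.
    # A malformed but harmless rule still raises an ordinary expr error.
    return [expr $plural_expr]
}
```

An expression containing command substitution, such as the one in the example above, fails the whitelist check and is never passed to expr, while '$n != 1' and the Polish rule go through unchanged.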

Just a thought :)

I have been pondering and have a simpler implementation, I hope.  The goal is to capitalize on our strength, the excellent localization UI, and not spend time on wheel re-invention:

1) keys can be designated "pluralizable".  This can be done in the
  admin UI for the English version (similar to description) and is
  stored in the catalog files.

2) When a translator accesses a pluralizable key, they see the
  appropriate number of versions for their locale.  I.e., a German
  translator will see two versions, a plural and a singular; a Chinese
  translator only one version; a Polish translator, three versions:
  singular, plural, and genitive (for 2-4).  (This is determined by
  calling a function from gettext or equivalent program.)

3) Pluralized keys are accessed via a new option for lang::message::lookup, "-number".  Pluralized localizations must happen in Tcl, not ADP. The syntax is [_ keyname -num x], where -num is an optional
  parameter to the localize function.  Locale and other optional
  parameters remain optional.  -num defaults to the plural form.
  This function should call gettext or whatever we end up using and
  send in the number, the locale, and all available forms of the
  key.  These forms are also stored in the catalog files.

4) If the pluralizer localizer calls for a form which is not
  localized, it should then fall back through the locales using the
  existing scheme.  I.e., if it is pluralizing/localizing foo for
  de_CH (Swiss German), and only the singular form of foo is
  localized to de_CH, it should succeed fully for n=1; for n=2 it
  should fall back to German German (de_DE) if that is the default
  locale for de and is available in plural; if not, it should fall
  back to the English plural.
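
The fallback in point 4 might look roughly like this in plain Tcl. Everything here is an assumption for illustration: the catalog layout (a list of forms per key, with an empty string for a form that is not yet translated), the demo.widget key, the placeholder strings, and the fallback order of requested locale, then the language's default locale, then en_US. It also assumes every locale in the chain numbers its plural forms the same way, which holds for German and English but not in general.

```tcl
# Hypothetical catalog: locale -> key -> list of forms by plural index.
set forms [dict create \
    de_CH [dict create demo.widget [list "(de_CH singular)" ""]] \
    de_DE [dict create demo.widget [list "(de_DE singular)" "(de_DE plural)"]] \
    en_US [dict create demo.widget [list "widget" "widgets"]]]

# Assumed fallback order: requested locale, language default, English.
proc fallback_chain {locale} {
    set defaults [dict create de de_DE en en_US]
    set lang [lindex [split $locale _] 0]
    set candidates [list $locale]
    if {[dict exists $defaults $lang]} {
        lappend candidates [dict get $defaults $lang]
    }
    lappend candidates en_US
    # Drop duplicates while keeping the order.
    set chain [list]
    foreach candidate $candidates {
        if {$candidate ni $chain} { lappend chain $candidate }
    }
    return $chain
}

# Return the first localized form found along the fallback chain.
proc lookup_plural {locale key index} {
    global forms
    foreach candidate [fallback_chain $locale] {
        if {[dict exists $forms $candidate $key]} {
            set text [lindex [dict get $forms $candidate $key] $index]
            if {$text ne ""} { return $text }
        }
    }
    error "no form $index of $key in any fallback locale"
}
```

With this data, form 0 for de_CH is served from de_CH itself, form 1 falls back to de_DE, and a locale with no entry at all ends up with the English form.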

One other issue is that we now have keys such as acs-kernel.common_open.  Should we replace this with common_open_adj and common_open_verb?  If we do, I think we should drop the naked root form and have only the two specific forms.