Forum OpenACS Development: acs-lang: Handling dynamically generated text in .tcl-files

While internationalizing .LRN we have run into a fair number of cases with fairly long text in the .tcl files that needs to be translated. More often than in the .adp files, this text is partly generated dynamically, which leads to the "swapped word order" problems discussed in the thread "Metadata on messages in acs-lang".

Here is an example of evil code from dotlrn/tcl/class-procs.tcl:

ad_return_complaint 1 "Error: A [parameter::get -localize -parameter classes_pretty_name] must have <em>no</em> [parameter::get -localize -parameter class_instances_pretty_plural] to be deleted"

There are a number of ways to handle this. Here are the options we have discussed:

  1. Simple format: using format and filling the text with %s placeholders solves the problem, but requires the dynamically generated strings to appear in the same order in all languages.
  2. Format with numbers: writing our own format routine that refers to the strings by a simple numbering scheme.
  3. Format with named variables: writing our own format routine that lets the strings be named, using the @...@ notation as in .adp files.
  4. Variable substitution: calling subst with -nocommands.
  5. Complete substitution: calling subst without -nocommands.
  6. Insisting that all text be in .adp files.
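To make options 1, 4 and 5 concrete, here is a small plain-Tcl sketch that runs without OpenACS. The variables class and instances are made-up stand-ins for the parameter::get results. It shows that format with %s fills the slots strictly in argument order, that subst -nocommands replaces $variables but leaves a [command] as literal text, and that a full subst would evaluate any command that ended up in a translated string.

```tcl
# Hypothetical values standing in for the parameter::get results.
set class "class"
set instances "class instances"

# Option 1: plain format. The %s slots are filled strictly in argument order.
set msg1 [format "Error: A %s must have no %s to be deleted" $class $instances]
puts $msg1

# The braces keep $class, $instances and [...] unsubstituted until subst runs.
set template {Error: A $class must have no $instances to be deleted [string toupper x]}

# Option 4: subst -nocommands replaces the $variables, but the bracketed
# command is left in the result as literal text.
set msg4 [subst -nocommands $template]
puts $msg4

# Option 5: a full subst also evaluates [string toupper x], i.e. it would run
# whatever command a translated message happens to contain.
set msg5 [subst $template]
puts $msg5
```

The difference between options 4 and 5 is mostly about trust: with -nocommands, a catalog entry can only read variables, not execute code.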

Of these options, 2 and 3 seem the most robust.

With option 2, code prepared for internationalization would look like this:

ad_return_complaint 1 [_ dotlrn.class_may_not_be_deleted "Error: A %1 must have <em>no</em> %2 to be deleted" "" [parameter::get -localize -parameter classes_pretty_name] [parameter::get -localize -parameter class_instances_pretty_plural]]
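A minimal plain-Tcl sketch of what such a numbered format routine could do. The proc name msg_format_numbered is hypothetical and this is not how the acs-lang _ command is implemented; "class" and "class instances" stand in for the parameter::get results.

```tcl
# Hypothetical helper (not an acs-lang API): replace %1, %2, ... in a
# message with the corresponding extra arguments.
proc msg_format_numbered {message args} {
    set map {}
    # Build the map from the highest number down, so that e.g. %10 comes
    # before %1 in the list; string map tries keys in list order.
    for {set i [llength $args]} {$i >= 1} {incr i -1} {
        lappend map "%$i" [lindex $args [expr {$i - 1}]]
    }
    # string map does not rescan inserted values, so a value that itself
    # contains "%2" is not substituted a second time.
    return [string map $map $message]
}

puts [msg_format_numbered "Error: A %1 must have <em>no</em> %2 to be deleted" \
          "class" "class instances"]
```

Because the numbers, not the positions, select the arguments, a translation can write %2 before %1 without any change to the Tcl code.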

This is fairly nice from the Tcl coder's perspective. The trouble is that it is hard for the translator to figure out what the numbers in the default en_US text mean:

Error: A %1 must have <em>no</em> %2 to be deleted

Secondly, the translator would need to treat messages destined for the .tcl files differently from messages destined for the .adp files. That is why we are considering the third option, which requires the programmer to supply a list of name/value pairs; the translated string would then look like an .adp chunk with variables:

Error: A @classes_pretty_name@ must have <em>no</em> @class_instances_pretty_plural@ to be deleted
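As a sketch of option 3 under the assumption that the programmer passes a flat list of name/value pairs, the substitution itself can be a single string map. The proc name msg_format_named is hypothetical, and the values are stand-ins for the parameter::get results.

```tcl
# Hypothetical helper for option 3: replace @name@ placeholders using a
# flat list of name/value pairs supplied by the programmer.
proc msg_format_named {message pairs} {
    set map {}
    foreach {name value} $pairs {
        lappend map "@${name}@" $value
    }
    # Placeholders with no matching name are left untouched.
    return [string map $map $message]
}

set pairs [list classes_pretty_name "class" \
               class_instances_pretty_plural "class instances"]
puts [msg_format_named \
          "Error: A @classes_pretty_name@ must have <em>no</em> @class_instances_pretty_plural@ to be deleted" \
          $pairs]
```

Since each value is looked up by name, the order of the pairs and the order of the placeholders in the translation are independent of each other.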

Any comments? Should messages go through variable substitution at all? What about method calls? Or should we just keep it simple and slip a little technical detail into the translation UI?

I suspect you can't get away without the numbers, since it gets annoying to stuff everything into a variable just to get the substitution. If you wanted to provide guidance to translators, I think you could do something like:

Error: A @class:1@ must have no @classes:2@ to be deleted.

where the "class" part is only there for the benefit of translators and is ignored by the code, and in .adp files you could omit the number. You then just tell translators to leave anything between @ signs alone.
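If translators are told to leave anything between @ signs alone, that rule can also be checked mechanically. A minimal sketch, assuming placeholders never contain whitespace or @; the proc names placeholders and same_placeholders are hypothetical and not part of acs-lang. It compares the sorted placeholder tokens of the original and the translation, so reordering is allowed but renaming is caught.

```tcl
# Hypothetical check: return the sorted list of @...@ tokens in a message,
# e.g. @class:1@ and @classes:2@.
proc placeholders {message} {
    return [lsort [regexp -all -inline {@[^@\s]+@} $message]]
}

# 1 if the translation keeps exactly the same placeholders (in any order).
proc same_placeholders {original translation} {
    return [expr {[placeholders $original] eq [placeholders $translation]}]
}

set en  {Error: A @class:1@ must have no @classes:2@ to be deleted.}
set ok  {Error: @classes:2@ still exist, so this @class:1@ cannot be deleted.}
set bad {Error: @klasses:2@ still exist, so this @class:1@ cannot be deleted.}
puts [same_placeholders $en $ok]   ;# 1
puts [same_placeholders $en $bad]  ;# 0
```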

I had a quick grep through the .po files I have and did not see many instances of reordered strings. Gettext has no hinting like this (only the ability to write %n$s, where n is the argument number), and people seem to manage with it. Then again, there is more text in OpenACS than in a typical program, so the extra hinting would probably be helpful.
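For comparison, Tcl's built-in format command already understands the same XPG3 position specifiers (%n$s) that gettext uses, so a translated format string can reorder its arguments without a custom routine. The reordered template below is a made-up stand-in for a translation. Note that if any specifier in a format string is positional, all of them must be.

```tcl
# Braces keep "$s" from being treated as a Tcl variable reference.
set en_template {Error: A %1$s must have no %2$s to be deleted}
# Made-up template standing in for a translation that swaps the arguments.
set reordered {Error: %2$s still exist, so this %1$s cannot be deleted}

puts [format $en_template "class" "class instances"]
puts [format $reordered "class" "class instances"]
```

What format does not give you is any hint for the translator about what %1$s and %2$s stand for, which is the point of the labels discussed here.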

What we have implemented now (not cast in stone, of course) is the numbering scheme mentioned by Christian above. Basically, if you have a piece of text with embedded variables, there are at least two different approaches to consider. The first is to provide one message key for every piece of text between the variables, as illustrated by:

ad_return_complaint 1 "#dotlrn.The_name# $pretty_name #dotlrn.is_already_in_use#. #dotlrn.lt_Please_select_a_diffe#."

with corresponding catalog entries:

_mr en_US dotlrn.The_name {The name}
_mr en_US dotlrn.is_already_in_use {is already in use}
_mr en_US dotlrn.lt_Please_select_a_diffe {Please select a different name}
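To illustrate the limitation of this first approach, here is a sketch with a hypothetical in-memory dict standing in for the message catalog and a hypothetical localize_keys proc that replaces each #package.key# with its text. The value "Biology 101" is made up. A translator can change the three fragments, but $pretty_name always lands between the first and the second one, because that position is fixed in the Tcl code.

```tcl
# Hypothetical stand-in for the message catalog entries above.
set catalog [dict create \
    dotlrn.The_name                 {The name} \
    dotlrn.is_already_in_use        {is already in use} \
    dotlrn.lt_Please_select_a_diffe {Please select a different name}]

# Replace every #package.key# in the text with its catalog entry.
proc localize_keys {text catalog} {
    set map {}
    foreach {key value} $catalog {
        lappend map "#${key}#" $value
    }
    return [string map $map $text]
}

set pretty_name "Biology 101"
set text "#dotlrn.The_name# $pretty_name #dotlrn.is_already_in_use#. #dotlrn.lt_Please_select_a_diffe#."
puts [localize_keys $text $catalog]
```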

The other approach is to put the whole text into a single message and interpolate the variables into it using the numbering scheme, like this:

ad_return_complaint 1 [_ [ad_conn locale] dotlrn.class_may_not_be_deleted "" [list [parameter::get -localize -parameter classes_pretty_name] [parameter::get -localize -parameter class_instances_pretty_plural]]]

which results in only one entry in the catalog file:

_mr en_US dotlrn.class_may_not_be_deleted {Error: A %1 must have <em>no</em> %2 to be deleted}

Note that this approach allows the variable placeholders to be reordered. We currently have no provision for telling the translator what is behind these placeholders; however, in most cases the translator should be able to look at the message in the web interface and figure that out.
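A sketch of how one catalog entry per locale plus a plain value list allows reordering. Everything here is hypothetical: the nested dict stands in for the catalog, lookup_message is not the acs-lang API, the fallback to en_US is an assumption of this sketch, and the de_DE text is an illustrative translation made up to put %2 before %1.

```tcl
# Hypothetical catalog: locale -> message key -> message text.
set catalog {
    en_US {
        dotlrn.class_may_not_be_deleted {Error: A %1 must have <em>no</em> %2 to be deleted}
    }
    de_DE {
        dotlrn.class_may_not_be_deleted {Fehler: Solange noch %2 existieren, kann %1 nicht entfernt werden}
    }
}

# Look up a message (falling back to en_US) and fill %1, %2, ... from a list.
proc lookup_message {catalog locale key values} {
    if {[dict exists $catalog $locale $key]} {
        set message [dict get $catalog $locale $key]
    } else {
        set message [dict get $catalog en_US $key]
    }
    set map {}
    # Highest number first, so %10 is matched before %1.
    for {set i [llength $values]} {$i >= 1} {incr i -1} {
        lappend map "%$i" [lindex $values [expr {$i - 1}]]
    }
    return [string map $map $message]
}

puts [lookup_message $catalog en_US dotlrn.class_may_not_be_deleted {class {class instances}}]
puts [lookup_message $catalog de_DE dotlrn.class_may_not_be_deleted {Kurs Kursinstanzen}]
```

The calling code passes the values in the same order for every locale; only the catalog text decides where each one appears.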

How does this sound?

It may get even trickier once you hit languages (e.g. various Slavic ones) that have multiple cases, with endings dependent on the gender of a noun and on its singular/plural state...  Ah... the beauty of plain English...

For example, in Russian the word 'soroka' (magpie) changes with the count like this:
0: sorok
1: soroka
2: soroki
3: soroki
4: soroki
5: sorok
6: sorok
7: sorok
...

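The pattern above continues beyond 7: the form depends on the last one or two digits of the count. Here is a sketch of the three-way rule that gettext-style Plural-Forms entries commonly encode for Russian; the proc name ru_plural is hypothetical and nothing like it is claimed to exist in acs-lang.

```tcl
# Pick one of three word forms for a non-negative count n:
#   "one"  for 1, 21, 31, ... (but not 11)
#   "few"  for 2-4, 22-24, ... (but not 12-14)
#   "many" for everything else (0, 5-20, 25-30, ...)
proc ru_plural {n one few many} {
    set mod10  [expr {$n % 10}]
    set mod100 [expr {$n % 100}]
    if {$mod10 == 1 && $mod100 != 11} {
        return $one
    } elseif {$mod10 >= 2 && $mod10 <= 4 && ($mod100 < 12 || $mod100 > 14)} {
        return $few
    }
    return $many
}

foreach n {0 1 2 5 11 21 22 25} {
    puts "$n: [ru_plural $n soroka soroki sorok]"
}
```

Note that this only picks a form by count; it does nothing about grammatical case or gender, which is the harder part of the problem.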
Yes Andrew, we have talked about that problem; there is a similar issue in German with the gender of nouns (der/die/das). I'm not really sure how to solve it. We will have to work around it somehow, through the phrasing of sentences and in how we choose messages.

I have improved the syntax described above (inspired by Jeff's proposal), so my example now looks like this in the message catalog:

_mr en_US dotlrn.class_may_not_be_deleted {Error: A %1:subject% must have <em>no</em> %2:class_instances% to be deleted}

The words in between the percentage signs are merely pretty names that give the translator some guidance. They are not adp or tcl variable names.
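A sketch of how the %N:label% syntax could be interpolated, assuming the label is simply discarded and the number selects an element of a plain value list. The proc name msg_format_labeled is hypothetical. Scanning left to right and appending to a result string (rather than substituting in place) means a value that happens to contain %...% is never rescanned.

```tcl
# Hypothetical helper: replace each %N:label% with the N-th value.
proc msg_format_labeled {message values} {
    set result ""
    set start 0
    # With -indices, regexp reports positions relative to the whole string.
    while {[regexp -start $start -indices {%(\d+):[^%]*%} $message whole num]} {
        lassign $whole from to
        set n [string range $message {*}$num]
        # Copy the literal text before the token, then the value.
        append result [string range $message $start [expr {$from - 1}]]
        append result [lindex $values [expr {$n - 1}]]
        set start [expr {$to + 1}]
    }
    append result [string range $message $start end]
    return $result
}

puts [msg_format_labeled \
          {Error: A %1:subject% must have <em>no</em> %2:class_instances% to be deleted} \
          [list "class" "class instances"]]
```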

On Christian's suggestion I have made the syntax a little cleaner for translators and others to read, by removing the numbers between the percentage signs and passing in an array list rather than a plain list. Now the example becomes:
set msg_subst_list [list subject [parameter::get -localize -parameter classes_pretty_name] \
                         class_instances [parameter::get -localize -parameter class_instances_pretty_plural]]

ad_return_complaint 1 [_ [ad_conn locale] dotlrn.class_may_not_be_deleted "" $msg_subst_list]
and the message in the catalog file is:
_mr en_US dotlrn.class_may_not_be_deleted {Error: A %subject% must have no %class_instances% to be deleted}
The reason I'm using percentage signs rather than @-signs is that I want to make it clear that those variables are not adp variables.
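A sketch of interpolating %name% placeholders from an array list such as msg_subst_list. The proc name msg_format_array is hypothetical and this is not the acs-lang implementation. As a design choice in this sketch, a placeholder with no matching name is copied through unchanged, so a misspelled name in a translation stays visible on the page instead of silently disappearing.

```tcl
# Hypothetical helper: fill %name% placeholders from an array list.
proc msg_format_array {message subst_list} {
    array set values $subst_list
    set result ""
    set start 0
    while {[regexp -start $start -indices {%([A-Za-z_][A-Za-z0-9_]*)%} $message whole name]} {
        lassign $whole from to
        set key [string range $message {*}$name]
        append result [string range $message $start [expr {$from - 1}]]
        if {[info exists values($key)]} {
            append result $values($key)
        } else {
            # Unknown name: keep the placeholder text as-is.
            append result [string range $message $from $to]
        }
        set start [expr {$to + 1}]
    }
    append result [string range $message $start end]
    return $result
}

set msg_subst_list [list subject "class" class_instances "class instances"]
puts [msg_format_array \
          {Error: A %subject% must have no %class_instances% to be deleted} \
          $msg_subst_list]
```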