Forum OpenACS Development: Mini-Conference on Globalization Results and Call for Feedback

The meeting started as planned at 16:00 hours in Amsterdam on the 26th of June 2002.

Lars Pind and I started the meeting with the following basic plan of attack:

  1. Write a specification for what we need to change about acs-lang and the templating system to support this, and a tutorial for developers on how to internationalize their packages.
  2. Solicit community feedback on this specification, so we get the best solution, and so that everybody's happy with it.
  3. Implement these changes, and do a pilot internationalization of one package, to see how well it works, and what might need to be done differently.
  4. Coordinate with the community about internationalizing further packages.

Lars and I started working on step one, which we then discussed at the meeting. Now we would like to start step two using this thread and this presentation on globalization (the spec is the last slide).

Here is a list of the people present at the meeting (who also contributed to the present version of the specification):

Don Baccus
Chip Blank
Carl Robert Blesius
Sjoerd de Buer
Michel Henry de Generet
Christian Hvid
Heiko Kern
Ben Koot
Peter Marklund
Bruno Mattarollo
Aldert Nooitgedagt
Hilmar Nooitgedagt
Gregor Obernosterer
Claudio Pasolini
Lars Pind
Andreas Ryge
Pascal Scheffers
Tilmann Singer
Malte Sussdorff

I am really hoping it will give the internationalization effort a push.

Now we need your feedback...

One thought I just had was why the message catalog should be in the database at all.

I can see how a web-based interface for translating is nice, and that's easier to write if you have things in the database.

But as far as I can see, we still need it as a flat text file for a lot of things (distributing language packs, handing them off to off-site translators who can just go through and translate it all, etc.), and having to synchronize DB and flat file is annoying.

After all, we're slurping the whole thing into RAM, anyway, at server startup.

Any thoughts?

Concurrency control, for one thing. If we stored the information in flat files, we'd need to build an infrastructure including AOLserver-level locking to keep things consistent. A busy site supporting a bunch of locales - as GP Planet will grow to become - will have more than one translator working at a time on a subsite.

If the information were in flat files you'd still need to load the NSV cache at start-up, so that shouldn't be an issue.

The general problem of dumping and restoring data of this sort is something we should provide a simple solution for. XML-based, presumably?  Straight DB dumps are a problem for any table that includes sequence-generated keys (makes it difficult to merge data from multiple instances) ...

We already do something similar to record parameters in package .info files.
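
To illustrate the dump-and-restore idea, here is a minimal sketch in Python (not OpenACS code). It assumes each message is identified by a natural key of package key, message key and locale instead of a sequence-generated ID, so dumps from several instances can be merged without key collisions. The element names, attribute names and function names are all hypothetical.

```python
import xml.etree.ElementTree as ET


def dump_catalog(messages):
    """Serialize {(package_key, message_key, locale): text} to an XML string."""
    root = ET.Element("message_catalog")
    for (package_key, message_key, locale), text in sorted(messages.items()):
        # Natural keys only: no database sequence values end up in the dump.
        msg = ET.SubElement(
            root, "msg", package=package_key, key=message_key, locale=locale
        )
        msg.text = text
    return ET.tostring(root, encoding="unicode")


def load_catalog(xml_text):
    """Parse a dump produced by dump_catalog back into a dictionary."""
    root = ET.fromstring(xml_text)
    return {
        (m.get("package"), m.get("key"), m.get("locale")): m.text or ""
        for m in root.iter("msg")
    }


def merge_catalogs(base, incoming):
    """Merge two catalogs; entries from `incoming` win on identical keys."""
    merged = dict(base)
    merged.update(incoming)
    return merged
```

Because the key carries no instance-specific numbers, a dump made on one site can be loaded on another and merged with whatever is already there.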
5: DB vs. Flat Files (response to 1)
Posted by Malte Sussdorff on
A thought just jumped into my mind. What do you think about a system where we store the language information in the .adp files and use multiple .adp files depending on locale? The way I thought about it:

- We add TRN tags around text in the .adps. We do NOT tag them with a number or anything.

- We add a table which relates the original englisch text (in between the TRNs) with files in the system.

- We add to the system the possibility for locale administrators to edit the text within an adp. (e.g. everything within TRN tags)

- These changes will be stored in the database (the original englisch text with a mandatory locale being the primary key). Once a change has been made, all files (which we gather from the table, see above) will be changed or created (with the locale added, like foo_de.adp). The database keeps track of the changes.

- Create a script that sweeps through the file system and checks whether .adp files (without locale information) have been changed. If so, inform the locale administrators so they have the possibility of updating their locale (I think this will happen when the .adp files are changed for layout reasons).

- If we have a system in place for changing foo.adp files via OACS instead of SSH, we could add a flag stating "Only design was changed". If this is set to true, change all locale adp files (for the changed file, e.g. foo_de.adp, foo_cz.adp) automatically.

- I just realized we might think about a different separator than "_", as this is part of locale definitions (e.g. de_DE or de_CH).

- This would allow us to have a different layout depending on locale, if you want to drive this even further.

- Last but not least, the request processor first checks whether a locale for this user is available and delivers the matching file according to the rules (remember the discussion we had in Amsterdam).
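
The steps above could be sketched roughly like this. It is a Python illustration of the proposal as stated, not existing OpenACS code. It assumes plain `<TRN>...</TRN>` tags without attributes and a translation table keyed by (original text, locale); all function names are hypothetical.

```python
import os
import re

# Matches <TRN>...</TRN>; the tag carries no number or identifier.
TRN_PATTERN = re.compile(r"<TRN>(.*?)</TRN>", re.DOTALL | re.IGNORECASE)


def extract_texts(adp_source):
    """Return the original texts found between TRN tags, in document order."""
    return [text.strip() for text in TRN_PATTERN.findall(adp_source)]


def render_for_locale(adp_source, locale, translations):
    """Replace each TRN block with its translation, or the original if none exists."""
    def substitute(match):
        original = match.group(1).strip()
        return translations.get((original, locale), original)

    return TRN_PATTERN.sub(substitute, adp_source)


def localized_filename(filename, locale):
    """Build the locale-specific file name, e.g. foo.adp -> foo_de.adp."""
    stem, ext = os.path.splitext(filename)
    return f"{stem}_{locale}{ext}"
```

For example, `render_for_locale(source, "de_DE", translations)` would produce the content to write to `localized_filename("foo.adp", "de_DE")`.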

So long and thanks for all the fish

Malte

Malte,

Not a bad idea. Greenpeace already has the option of having separate templates for different locales, and that's good.

One concern, though: I don't like the idea of using the original English (or Englisch, if you prefer) text as the key into the message catalog. It makes it too painful to change a typo if you then have to retranslate all the other texts.

Otherwise, the idea that you at least have the option of simply editing the .ADP file is nice.

/Lars

malte,

keys should be keys not one of the localizations. one good reason is what lars says, you can change one localization without having to change all localizations. we have to stick with that. another reason is that one of the goals of globalization is to allow people to write packages in their native locale. there is no need to force everyone to write a package in english and then localize it to japanese. your method would force them to do that and that breaks the first principle of globalization: "do not assume ANYTHING about the locale of an application," not even what language the code is written in.

i am still not totally familiar with the current plans for localization but i was assuming that the multiple adp's concept was already established. it was defined by henry (or jeff) in his original globalization proposal at arsdigita and was also used in the globalization of acs 4.6 (acs java). i think the request processor is already set up for this although we'd have to ask rob mayoff. the idea is that once you have negotiated the locale and character set for a request you can then find the adp that best matches the request. locale and character set negotiation is well described in my globalization docs that are linked from the presentation.
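
a rough sketch of "find the adp that best matches the request", in Python for illustration only. the fallback order (full locale, then language only, then the unlocalized file) is an assumption, not something the request processor is known to do, and the function names are hypothetical.

```python
def candidate_templates(stem, locale, ext=".adp"):
    """List template file names from most to least specific for a locale."""
    candidates = []
    if locale:
        candidates.append(f"{stem}_{locale}{ext}")       # e.g. foo_de_CH.adp
        language = locale.split("_", 1)[0]
        if language != locale:
            candidates.append(f"{stem}_{language}{ext}")  # e.g. foo_de.adp
    candidates.append(f"{stem}{ext}")                     # e.g. foo.adp
    return candidates


def resolve_template(stem, locale, available):
    """Return the first candidate present in `available`, or None if none is."""
    for name in candidate_templates(stem, locale):
        if name in available:
            return name
    return None
```

with only `foo.adp` and `foo_de.adp` on disk, a request negotiated to de_CH would be served `foo_de.adp`, and a request for ja_JP would fall back to `foo.adp`.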

having different layout based on locale is essential, and that is the main point of having different adp's for each locale that requires different layout, not for each that requires a different language. the language is taken care of by the trn tag. the layout by the adp file. as an example, it allows you to have address forms that match each locale. that way not everyone gets the "zip + 4" widget when they are asked to enter an address, just those in the states.

finally can you explain why changing the separator from _ to something else would help with the above. the _ is the standard separator used by everyone and i would hate to change convention unnecessarily.

yon

Yon,

let me clarify: I did not assume that the .adp file would contain the default locale. It can contain any string you like in any language, as long as it is unique in its meaning (something you can safely assume, as we are not talking about words but sentences, and these tend to be different from language to language). But we'd have to store the default locale of the package in the apm {And yes, I assume that one package will be written in one language at a time. But if this assumption is too restrictive, you could add the locale as a parameter of the TRN tag}. And it is not only about .adp files. We might even end up changing .tcl files depending on locale (just take cal-dayview.tcl as an example, doesn't work in Germany, we don't use am and pm).

For each language (including en_US) there would be an .adp file. My reasoning started with performance implications (hitting the database with every TRN tag) and ended with different layout (which also includes different views of calendars and different positioning of items if you read from right to left) [There was this cool ad where you had a pile of dirty clothes on the left, a laundry machine in the middle and clean clothes on the right. Interesting if one reads right to left :-)]. But does it really matter why we do it as long AS we do it :-) ?

As for the separator, the only thought behind it was that the separator could be part of the locale as well as the filename (as all the people are using it). And I personally hate filenames like edit-page_de_DE.adp. But as I said, it is only something to think about, not necessarily to implement (aka. random thought).

Last but not least, how would the TRN tag work automatically on the script? Where would the TRN tag be placed in the first place? My assumption was that we take what we have and put a TRN tag around all hardcoded text within the .adp files (in the form <TRN LOCALE=en_US>Life is beautiful</TRN> {with or without the locale definition}). Everything between the TRN tags would be used as a key like I described above. One other option I heard was using identifiers like we use for database queries. I don't see the difference (except that the text might be larger), but one loses the ability to work on the default .adp without storing data for the text one wants to use in the database. Well, and this still does not cope with the fact that we have sentences which occur in multiple packages. Giving them identifiers will result in the same sentence being stored in the DB multiple times (as one will not be able to coordinate the identifier naming across the packages so that the same sentence gets the same identifier in dotLRN, survey and e-commerce).

Malte

P.S.: Could we get a spell checker installed for the bboard? It's a shame to read how many errors I make :.(

Although providing multiple adp's per page may be useful, I think in the majority of cases layouts will remain the same or be handled automagically by widgets, and *requiring* one adp per page would be a pain.

I agree that trn tags should have real keys, and suggest that they be formatted like the rest of our procedure names (in English), and have an optional second part, the package key.  This would allow pieces of text to be defined once in the message catalog and be reused in the same or other packages, like a procedure call.
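
As a sketch of what I mean (Python for illustration only; the catalog layout, the fallback locale and the function name are assumptions): a message is addressed by its key plus an optional package key, and leaving the package key out means "the current package".

```python
def lookup(catalog, key, locale, current_package,
           package_key=None, fallback_locale="en_US"):
    """Look up a message by key, optionally in another package's catalog.

    `catalog` maps (package_key, message_key, locale) to translated text.
    """
    # Without an explicit package key the message belongs to the calling package.
    owner = package_key or current_package
    for candidate_locale in (locale, fallback_locale):
        text = catalog.get((owner, key, candidate_locale))
        if text is not None:
            return text
    # Return the key itself so a missing message is visible on the page.
    return key
```

A survey page could then reuse a button label that is defined once in another package by passing that package's key, much like calling a procedure from another package.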

I think it would be a mistake to emulate the db_* calls and allow embedded translations between trn tags.  It's just too confusing to have these bogus queries/translations accumulate in the most prominent place in the toolkit, which may or may not have anything to do with the final output.

Last time I looked, the request processor didn't have any of the code needed for locale negotiation, multiple adp pages etc.  It's all still to do.

Slurping the translated snippets of text into nsv arrays at startup is going to be too slow -- that's an extra dozen locks acquired per page request.  Better I think to have the trn tag look up the translation at compile time and have the text embedded directly in the page.  This would require extra support from the request processor to store multiple compiled versions from one source version of an adp, but then it's all extra support at the moment...
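
Roughly what I have in mind, as a Python sketch and not AOLserver or templating-system code. The `<trn key="...">` syntax, the class and the catalog layout are assumptions, and invalidating the cache when the source file changes is left out.

```python
import re

# Hypothetical tag syntax: <trn key="some_key">
TRN_TAG = re.compile(r'<trn\s+key="([^"]+)"\s*/?>', re.IGNORECASE)


class CompiledPageCache:
    """Keeps one compiled version of each page per locale."""

    def __init__(self, catalog):
        self.catalog = catalog   # maps (message_key, locale) to text
        self.compiled = {}       # maps (path, locale) to compiled page text
        self.compile_count = 0   # how often the catalog had to be consulted

    def _compile(self, source, locale):
        """Embed the translated text directly in the page source."""
        self.compile_count += 1

        def substitute(match):
            key = match.group(1)
            return self.catalog.get((key, locale), key)

        return TRN_TAG.sub(substitute, source)

    def get(self, path, source, locale):
        """Return the compiled page, compiling only on the first request."""
        cache_key = (path, locale)
        if cache_key not in self.compiled:
            self.compiled[cache_key] = self._compile(source, locale)
        return self.compiled[cache_key]
```

After the first request for a given page and locale, serving it involves no message lookups at all; the cost is one compiled copy per locale actually requested.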

While we're on the subject of translations, 'englisch' is incorrect, Malte.  It should be 'engrish'  :-)
http://www.engrish.com/