Forum OpenACS Improvement Proposals (TIPs): TIP #66: (Approved) Standards for Internationalizing Content

Proposal

The following changes will be made to the OpenACS data model:
  1. In OpenACS 5.2, cr_items will be changed to use the ad_locales table instead of the cr_locales table. Sample code for PostgreSQL:
    alter table cr_items drop constraint cr_items_locale_fk;
    alter table cr_items 
      add constraint cr_items_locale_fk 
            foreign key (locale) 
            references ad_locales;
  2. The cr_locales table is deprecated for OpenACS 5.2, and will be removed in OpenACS 6.
The following coding standards are adopted, and future implementations of OpenACS functions which access the content repository should follow these standards.
  1. All internationalizable content is stored in the content repository
  2. Localized versions of content are stored as cr_items, separate from the original cr_item
  3. cr_items which are localized versions of other cr_items are related in the database. (Original Proposal: cr_items which are localized versions of other cr_items have a cr_child_rel relationship to the original item, with the relationship type "localized")
  4. There can be only one localized child per item per locale.
  5. If the original item is not in a published state, none of its localized versions are published.
  6. If a localized child is not in published state, the original item is not available for that locale.
  7. If a parent item has children, all of those children must also be published in a specific locale before the item can be considered available in a specific locale. If an item is published in a locale but not all children are published in that locale, the item should be identified as "partially available" and developers can choose whether or not to treat it as published.
  8. Localized cr_items have their own internal versions just like all cr_items. To have two different versions of an original item that have independently maintained and published localizations, you must use two different items.
  9. There is a mechanism to determine if the original item has been changed since the most recent publication of a localized version. This can be used to notify authors or to automatically de-publish obsoleted translations.
  10. API functions listing items should exclude by default items which are translations of other items on the same list.
  11. Whenever a function would normally return the title or content of a cr_item, the internationalized form of such a function should instead return, in descending order of preference:
    1. The title and/or content from the live version of the "localized" version of the original item in the locale specified to the function, if available.
    2. The title/content from the live version of the "localized" version of the original item in the default locale for the language of the specified locale, if available.
    3. The title/content from the live version of the original cr_item.
    4. Nothing
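The preference order in standard 11 can be sketched in a few lines. This is only an illustration under stated assumptions, not OpenACS code: the `Item` class, the `DEFAULT_LOCALE_FOR_LANGUAGE` map (a stand-in for whatever ad_locales records as the default locale per language) and the `localized_title` function are hypothetical names, and "live" is modeled simply as a title that is not `None`. The early return reflects one reading of standard 5.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical stand-in for the default locale per language in ad_locales.
DEFAULT_LOCALE_FOR_LANGUAGE = {"en": "en_US", "de": "de_DE", "es": "es_ES"}


@dataclass
class Item:
    """Illustrative stand-in for a cr_item; not the real data model."""
    locale: str
    live_title: Optional[str] = None  # None means there is no live version
    # Localized versions keyed by locale: at most one per locale (standard 4).
    localizations: Dict[str, "Item"] = field(default_factory=dict)


def localized_title(original: Item, locale: str) -> Optional[str]:
    """Return the best available title for `locale`, or None for "nothing"."""
    # Assumed reading of standard 5: an unpublished original hides everything.
    if original.live_title is None:
        return None
    language = locale.split("_")[0]
    default_locale = DEFAULT_LOCALE_FOR_LANGUAGE.get(language)
    # Preference 1: the exact locale. Preference 2: the language's default locale.
    for candidate in (locale, default_locale):
        version = original.localizations.get(candidate) if candidate else None
        if version is not None and version.live_title is not None:
            return version.live_title
    # Preference 3: the live version of the original item.
    return original.live_title
```

For example, with an en_US original that has a live de_DE localization, a request for de_AT falls through to the de_DE title, and a request for a language with no localization at all falls through to the original.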

Reasons

OpenACS does not provide a standard way to internationalize content, and so different implementations are using different workarounds. This TIP will provide a standard for new implementations and for upgrades to existing implementations.

Disadvantages

The data model change will ultimately break any existing code which depends on cr_locales. There is no such code in the standard modules, so this cost is far outweighed by the benefit of removing duplicative but incompatible tables.

Implementation

I will implement the data model changes in OpenACS 5.2. Packages which use the content repository will not qualify for Maturity Level 3 until compliant with this TIP.

This is the second version of this proposal. All comments prior to 16 June 2004 refer to the first version. The change has been made in place for ease of reading.

Finally, I really like this, but I won't approve until the new OCT has been constituted.
Joel,

I think the items should use a cr_item_rel instead of cr_child_rel.

Also, did we decide to leave out the storage of a "translation tree" in case a translated item was not translated from the original, but from one of the other translations?

What are the implications of using cr_item_rel instead of cr_child_rel?
I think a "translation" rel could be an optional convention used when needed, but I didn't feel like I really understood how it would work or how to manage its complexity.  So I didn't want to put it into the TIP and I didn't want to hold the TIP up waiting for enlightenment.
If you keep the cr_child_relationship, you would be able to get the translation tree, as I assume (but maybe I read wrong) that each "child" is only related to the parent it was translated from (thereby getting e.g. english->german->hungarian).
My intent with item 3 ("cr_items which are localized versions of other cr_items have a cr_child_rel relationship to the original item, with the relationship type 'localized'") is that any localized items are directly related to the original item. If someone wants to track items which are translated from other items, the recommendation is to use a 'translated' cr_item_rel, but formal rules about what that means and how to deal with it should go in another TIP, which will extend but not invalidate this TIP.

So to rephrase my other question: disregarding translation vs localization issues, what's the difference between a cr_item_rel and a cr_child_rel? Is there currently any API in cr_items that uses either? If not, is there any guidance on how they are intended to be used, so that if that basic API ever gets built as intended, the data already in the system will be correct?

A cr_child_rel is for an item that should not really exist independently of the parent (an example would be the collection of images that photo album creates for a single photo), and a cr_item_rel is for relating two items (for example, relating an author bio to an article). Things related via cr_child_rel should be deleted when the parent is deleted, but for cr_item_rel deleting one item should not affect the other.
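To make the deletion semantics concrete, here is a toy in-memory model. It is a sketch only: the `Repository` class and its methods are hypothetical and merely borrow the relationship names from this thread; they are not the content repository API.

```python
from typing import Dict, FrozenSet, List, Set


class Repository:
    """Toy model of the two relationship kinds; not the real OpenACS API."""

    def __init__(self) -> None:
        self.items: Set[str] = set()
        self.child_rels: Dict[str, List[str]] = {}   # parent -> its children
        self.item_rels: Set[FrozenSet[str]] = set()  # unordered pairs of peers

    def add(self, item: str) -> None:
        self.items.add(item)

    def add_child_rel(self, parent: str, child: str) -> None:
        self.child_rels.setdefault(parent, []).append(child)

    def add_item_rel(self, first: str, second: str) -> None:
        self.item_rels.add(frozenset((first, second)))

    def delete(self, item: str) -> None:
        # Children cannot outlive their parent, so delete them recursively.
        for child in self.child_rels.pop(item, []):
            self.delete(child)
        # Detach the item from any parent that still lists it as a child.
        for children in self.child_rels.values():
            if item in children:
                children.remove(item)
        # Peer relationships are dropped, but the peers themselves survive.
        self.item_rels = {rel for rel in self.item_rels if item not in rel}
        self.items.discard(item)
```

In this model, deleting a photo that has a thumbnail as a child removes both, while deleting an article that is related to an author bio leaves the bio in place.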
Then cr_child_rel seems more appropriate for localizations.  By analogy to acs-lang, messages can't exist for a locale unless the message key exists for en_US.  For i18ned content, we will have the same thing except that the parent item could be of any locale, not just en_US.
This is also analogous to the use of folders. You could say that each content item "contains" all its localizations.
Guan,

But it does not. They are all peers. One is just the "original". I think cr_child_rel is for compound content items where the children are parts of one big item. So a "page" might be made up of several content items. So for a compound item, each translation would have its own children.

Joel,

The issue is that someone might want to know what language an item was translated from. I think you are correct that this is probably more application specific and does not need to be part of this TIP.

Hi Joel,

I agree with Dave, it should be cr_item_rel.  We have actually changed the design in our project from cr_child_rel to cr_item_rel.  We are implementing it right now.  The reason we decided to use cr_item_rel is that localized items must be able to exist even if the parent item is deleted.  The items are peers and should not be in a parent-child relation.

The other aspect that Dave mentions is keeping track of how things were translated.  It's essentially a different TIP, as Dave says.  We have a requirement, which should also be useful in general, to keep track of what was cloned for translation purposes, for historical reasons.

Rule #1 raises the question of what else references the ad_locales table and whether the referring stuff is considered "content".  A quick grep of the tree (5.0 branch) shows that the message catalog and category system also reference ad_locales.  I don't think the message cat stuff belongs in the cr, but maybe this isn't considered content.  Not sure about categories.  But imo rule #1 needs clarification.
The clarification might come as an answer to the question: Under what circumstances can a non-cr table refer to ad_locales?
Jun,

Under what conditions would the original item be deleted?

If the original item is deleted, how will you find the other localizations for that item?

I'd probably go for a single anchor item that contains or is the parent of the translations for that item. Deleting this anchor implies deleting all the translations: what reason would they have to exist otherwise? If a localization did exist on its own, how would you find the other localizations when that's necessary?

One more thing... you should still be able to keep a digraph of the translation order (what was cloned for what language, etc) and still have a parent or container item.
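A rough sketch of that idea, assuming one container per piece of content that holds every translation and also records which locale each translation was cloned from. The `TranslationContainer` class and its methods are hypothetical names chosen for illustration; nothing here is existing OpenACS code.

```python
from typing import Dict, List, Optional


class TranslationContainer:
    """Hypothetical anchor item: owns all translations plus their provenance."""

    def __init__(self) -> None:
        self.translations: Dict[str, str] = {}               # locale -> content
        self.translated_from: Dict[str, Optional[str]] = {}  # locale -> source locale

    def add(self, locale: str, content: str,
            source_locale: Optional[str] = None) -> None:
        # Only one translation per locale is allowed in the container.
        if locale in self.translations:
            raise ValueError(f"{locale} already has a translation")
        # A translation can only be cloned from one that already exists.
        if source_locale is not None and source_locale not in self.translations:
            raise ValueError(f"unknown source locale {source_locale}")
        self.translations[locale] = content
        self.translated_from[locale] = source_locale

    def lineage(self, locale: str) -> List[str]:
        """Walk the provenance edges from `locale` back to the first original."""
        if locale not in self.translations:
            raise KeyError(locale)
        chain = [locale]
        while self.translated_from[chain[-1]] is not None:
            chain.append(self.translated_from[chain[-1]])
        return chain
```

Adding en_US, then de_DE with en_US as its source, then hu_HU with de_DE as its source, records the english->german->hungarian chain mentioned earlier in the thread, while every translation stays reachable through the one container.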
Jim,

Thanks. After the OCT chat we are looking at implementing just that: one "container" item with each translation as a child. I think we want to try a quick implementation and see if any issues come up before finalizing the translation linking.

Joel,

Perhaps we should approve the data model changes separately from the APIs and rules to manage translations?

I believe there is no question that the items should use the ad_locales table.

Hi Jim,

I think it's possible for one translation to be deleted.  There will be cases where a site just takes a common base of content and uses its own translated content.  So I believe we can't discount the possibility that the original item the translation came from gets deleted.

That is why Dave and I are asking to make this a peer-level relationship rather than a parent-child relationship.  This way, when the original item is deleted, the remaining translations can still find their peers.

I re-proposed this.  I changed code item #3 to avoid the implementation detail, with the intent (as discussed) that we all formalize our agreement on all of the other points, and then look at specific implementations before committing OpenACS to that level of detail.
20: Approved. (response to 1)
Posted by Andrew Grumet on
I'm voting here to standardize how we do i18n in the CR.  This approach isn't yet road-tested so it would be reasonable to expect changes or refinements.  I expect any changes or new best practices discovered in implementation will be documented or re-tipped as necessary.
Posted by Andrew Grumet on
Please pardon the newbie mistake.  I was voting to Approve, not marking a final decision.
Approved
approved
Approved
I marked this approved since it's been a week and there were
plenty of yesses and no no votes.