ACS 4 Globalization Detailed Design
by Henry Minsky
I. Essentials
When applicable, each of the following items should receive its own link:
- User directory: none
- ACS administrator: acs-lang
- Subsite administrator directory: none
- Tcl script directory: /api-doc
- PL/SQL file:
- Data model:
- Requirements document
II. Introduction
III. Historical Considerations
V. Design Tradeoffs
- Performance: availability and efficiency
For internationalization to be effective, it needs to be integrated into every module in the system. Keeping the overhead as low as possible is therefore a priority; otherwise developers will be reluctant to use it in their code.
Wherever possible, caching in AOLserver shared memory is used to remove the need to touch the database. Precompiling of template files should reduce the overhead to zero in most cases for translation message lookups. The amount of overhead added to the request processor can be reduced by caching filesystem information on matching of template files for locales.
- Flexibility
- Interoperability
- Reliability and robustness
- Usability
- Maintainability
- Portability
The ACS Tcl I18N APIs should be as close as possible to the ultimate Java APIs. This means that using the same templates where possible, as well as the same message catalogs and format strings, should be a strong goal.
- Reusability
- Testability
A set of unit tests is included in the acs-lang package, to allow automatic testing after installation.
VI. API
VI.A Locale API
10.30 A Locale object represents a specific geographical, political, or cultural region. An operation that requires a Locale to perform its task is called locale-sensitive and uses the Locale to tailor information for the user. For example, displaying a number is a locale-sensitive operation: the number should be formatted according to the customs/conventions of the user's native country, region, or culture.
We will refer to a Locale by a combination of a language and country. In the Java Locale API there is an optional variant which can be added to a locale, which we will omit in the Tcl API.
The language is a valid ISO Language Code. These codes are the lowercase two-letter codes as defined by ISO-639. You can find a full list of these codes at a number of sites, such as:
http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt
The country is a valid ISO Country Code. These codes are the uppercase two-letter codes as defined by ISO-3166. You can find a full list of these codes at a number of sites, such as:
http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html
Examples are
    en_US  English (United States)
    ja_JP  Japanese (Japan)
    fr_FR  French (France)
The i18n module figures out the locale for the current request and makes it accessible via the ad_locale function:
    [ad_locale user locale] => fr_FR
    [ad_locale subsite locale] => en_US
It has not yet been decided how the user's preferred locale will be initialized. For now, there is a site-wide default package parameter, [parameter::get -parameter DefaultLocale -default "en_US"], and an API for setting the locale with the preference stored in a session variable. The ad_locale_set function sets the user's preferred locale to a desired value and saves the value in the current session.
    ad_locale_set locale "en_US"
    # will also automatically set [ad_locale user language] (to "en" in this case)
    ad_locale_set timezone "PST"
The request processor should use the ad_locale API to figure out the preferred locale for a request (perhaps combining user preference with subsite defaults in some way). It will make this information accessible via the ad_conn function:
    ad_conn locale
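The behavior described above, where setting the locale also sets the language, can be sketched in plain Tcl. This is a minimal illustration only: demo_session, demo_locale_set, and demo_locale_get are hypothetical names standing in for the session storage and the ad_locale_set / ad_locale calls, and the real implementation may differ.

```tcl
# Hypothetical stand-in for per-session storage (not the real ACS session API).
array set demo_session {}

# Sketch of a setter: storing a locale also derives and stores the language.
proc demo_locale_set {item value} {
    global demo_session
    set demo_session($item) $value
    if {$item eq "locale"} {
        # The language is the part of the locale before the underscore.
        set demo_session(language) [lindex [split $value "_"] 0]
    }
}

# Sketch of a getter with a fallback, similar in spirit to a site-wide default.
proc demo_locale_get {item {default ""}} {
    global demo_session
    if {[info exists demo_session($item)]} {
        return $demo_session($item)
    }
    return $default
}

demo_locale_set locale "fr_FR"
puts [demo_locale_get locale]    ;# fr_FR
puts [demo_locale_get language]  ;# fr
```

Deriving the language at set time keeps the two values consistent without a second lookup on every request.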
Character Sets and Encodings
We refer to MIME character set names, which are the valid values that can be passed in a MIME header, such as
    Content-Type: text/html; charset=iso-8859-1
You can obtain the preferred character set for a locale via the ad_locale API shown below:
    set locale "en_US"
    [ad_locale charset $locale] => "iso-8859-1" or "shift_jis"
Returns a case-insensitive name of a MIME character set.
We already have an AOLserver function to convert a MIME charset name to a Tcl encoding name:
    [ns_encodingforcharset "iso-8859-1"] => iso8859-1
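To make the distinction between MIME charset names and Tcl encoding names concrete, here is a small sketch in plain Tcl that does not depend on AOLserver. The helper demo_encoding_for_charset and its lookup table are assumptions for illustration; they are not the implementation of ns_encodingforcharset, and the table covers only a handful of names.

```tcl
# Hypothetical helper: map a MIME charset name to a Tcl encoding name.
proc demo_encoding_for_charset {charset} {
    # MIME charset names are case-insensitive, so normalize first.
    set charset [string tolower $charset]
    # Illustrative table only; a real mapping would cover many more names.
    set map {
        iso-8859-1 iso8859-1
        iso-8859-6 iso8859-6
        shift_jis  shiftjis
        utf-8      utf-8
        us-ascii   ascii
    }
    if {[dict exists $map $charset]} {
        return [dict get $map $charset]
    }
    # Unknown charset: return the empty string and let the caller decide.
    return ""
}

set enc [demo_encoding_for_charset "ISO-8859-1"]
puts $enc                                   ;# iso8859-1
set bytes [encoding convertto $enc "caf\u00e9"]
puts [string length $bytes]                 ;# 4
```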
Templating
The goal of templates is to separate program logic from data presentation.
For presenting data in multiple languages, there are two basic ways to use templates for a given abstract URL. Say we have the URL "foo", for example. We can provide templates for it in the following ways:
- Separate template for each target language
  Have a copy of each template file in each language you support, e.g., foo.en.adp, foo.fr.adp, foo.de.adp, etc.
  We will refer to this style of template pages as language-specific templates.
- A single template file for multiple languages
  You write your template to contain references to translation strings either from data sources or using <TRN> tags.
  For example, a site might support multiple languages, but use a single file foo.adp which contains no language-specific content, and would only make use of data sources or <TRN> tags which in turn use the message catalog to look up language-specific content.
  We will refer to this style of page as a multilingual template.
For a page which has a very fixed format, such as a data entry form, it is much less redundant work to use a single template source page to handle all the languages, and to have all language-dependent strings be looked up in a message catalog. We can do this either by creating data sources which call lang_message_lookup, or else by using the <TRN> tag to do the same thing from within an ADP file.
Caching multilingual ADP Templates
Message catalog lookups can be potentially expensive if many of them are done in a page. The templating system can already precompile and cache ADP pages. This works fine for a page in a specific language such as foo.en.adp, but we need to modify the caching mechanism if we want to use a single template file to target multiple languages.
Computing the Effective Locale
Let's say you have a template file "foo.adp" and it contains calls to look up message strings using the TRN tag:
    <master>
    <trn key=username_prompt>Please enter your username</trn>
    <input type="text" name=username>
    <p>
    <trn key=password_prompt>Enter Password:</trn>
    <input type=password name=passwd>
If the user requests the page foo, and their ad_locale is "en_US", then the effective locale is "en_US". Message lookups are done using the effective locale. If the user's locale is "fr_FR", then the effective locale will be "fr_FR".
If we evaluate the TRN tags at compile time then we need to associate the effective locale in which the page was evaluated with the cached compiled page code.
The effective locale of a template page that has an explicit locale, such as a file named "foo.en.adp" or "foo.en_US.adp", will be that explicit locale. So for example, even if a user has a preferred locale of "fr_FR", if there is only a page named "foo.en.adp", then that page will be evaluated (and cached) with an effective locale of en_US.
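One way to picture the association between compiled page code and effective locale is a cache keyed on both the template path and the effective locale. The sketch below is an assumption about how such a cache could look, not the templating system's actual code; demo_cache, demo_compile, and demo_get_compiled are hypothetical names, and demo_compile merely records that a compilation happened.

```tcl
# Hypothetical compile cache keyed by (template path, effective locale).
set demo_cache [dict create]
set demo_compile_count 0

# Stand-in for the real template compiler; it only counts invocations.
proc demo_compile {path locale} {
    global demo_compile_count
    incr demo_compile_count
    return "compiled($path,$locale)"
}

# Return the compiled form, compiling at most once per path and locale.
proc demo_get_compiled {path effective_locale} {
    global demo_cache
    set key [list $path $effective_locale]
    if {![dict exists $demo_cache $key]} {
        dict set demo_cache $key [demo_compile $path $effective_locale]
    }
    return [dict get $demo_cache $key]
}

demo_get_compiled foo.adp en_US
demo_get_compiled foo.adp fr_FR
demo_get_compiled foo.adp en_US   ;# served from the cache
puts $demo_compile_count          ;# 2
```

With this keying, a multilingual template is compiled once per effective locale, while a language-specific template such as foo.en.adp only ever occupies one cache entry.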
VI.B Naming of Template Files To Encode Language and Character Set
10.40 The templating system will use the Locale API to obtain the preferred locale for a page request, and will attempt to find a template file which most closely matches that locale.
We will use the following convention for naming template files: filename.locale_or_language.adp.
Examples:
    foo.en_US.adp
    foo.en.adp
    foo.fr_FR.adp
    foo.fr.adp
    foo.ja_JP.adp
    foo.ja.adp
The user request has a locale which is of the form language_country. If someone wants English, they will implicitly be choosing a default, such as en_US or en_GB. The default locale for a language can be configured in the system locale tables. So for example the default locale for "en" could be "en_US".
The algorithm for finding the best matching template for a request in a given locale is given below:
- Find the desired target locale using [ad_conn locale]. NOTE: This will always be a specific locale (i.e., language_COUNTRY).
- Look for a template file whose locale suffix matches exactly.
  For example, if the filename in the URL request is simply foo and [ad_conn locale] returns en_US, then look for a file named foo.en_US.adp.
- If an exact match is not found, look for template files whose name matches the language portion of the target locale.
  For example, if the URL request name is foo and [ad_conn locale] returns en_US and a file named foo.en_US.adp is not found, then look for all templates matching "en_*" as well as any template which just has the "en" suffix.
  So for example, if the user's locale is en_GB and the file foo.en_US.adp exists (but foo.en.adp does not), then use foo.en_US.adp.
  If however both foo.en_US.adp and foo.en.adp exist, then use foo.en.adp preferentially, i.e., don't switch locales if you can avoid it. The reasoning here is that people can be very touchy about switching locales, so if there is a generic matching language template available for a language, use it rather than using an incorrect locale-specific template.
- If no locale-specific template is found, look for a template matching just the language.
  I.e., if the request is for en_US, and there exists a file foo.en.adp, use that.
- If no locale-specific or language-specific template is found, look for a simple .adp file, such as foo.adp.
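The search order above can be sketched as a pure Tcl function over a list of existing file names. This is one reading of the algorithm under stated assumptions: demo_find_template is a hypothetical name, the language-only template is checked before templates for other countries (following the "don't switch locales" rule), and when several same-language locales exist the sketch simply takes the first in sorted order, which the text does not specify.

```tcl
# Hypothetical helper: choose the best template for a request.
#   base      - URL name without suffix, e.g. "foo"
#   locale    - target locale, e.g. "en_GB"
#   available - list of template file names that exist
proc demo_find_template {base locale available} {
    set language [lindex [split $locale "_"] 0]

    # 1. Exact locale match, e.g. foo.en_GB.adp
    set exact "$base.$locale.adp"
    if {$exact in $available} { return $exact }

    # 2. Language-only template, e.g. foo.en.adp
    set generic "$base.$language.adp"
    if {$generic in $available} { return $generic }

    # 3. Another locale of the same language, e.g. foo.en_US.adp.
    #    Picking the first in sorted order is an assumption of this sketch.
    set candidates [lsearch -all -inline -glob $available "$base.${language}_*.adp"]
    if {[llength $candidates] > 0} {
        return [lindex [lsort $candidates] 0]
    }

    # 4. Plain template with no locale suffix, e.g. foo.adp
    set plain "$base.adp"
    if {$plain in $available} { return $plain }

    # Nothing suitable was found.
    return ""
}

puts [demo_find_template foo en_GB {foo.en_US.adp foo.adp}]             ;# foo.en_US.adp
puts [demo_find_template foo en_GB {foo.en_US.adp foo.en.adp foo.adp}]  ;# foo.en.adp
```

Taking the file list as an argument keeps the function free of filesystem access, which also makes it easy to cache the directory listing as suggested under Design Tradeoffs.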
Once a template file is found we must decide what character set it is authored in, so that we can correctly load it into Tcl (which converts it to UTF8 internally).
It would be simplest to mandate that all templates are authored in UTF8, but that is just not a practical thing to enforce at this point, I believe. Many designers and other people who actually author the HTML template files will still find it easier to use legacy tools that author in their "native" character sets, such as ShiftJIS in Japan, or BIG5 in China.
So we make the convention that the template file is authored in its effective locale's character set. For multilingual templates, we will load the template in the site default character set as specified by the AOLserver OutputCharset initialization parameter. For now, we will say that authoring generic multilingual ADP files can and should be done in ASCII. Eventually we can switch to using UTF8.
A character set corresponding to a locale can be found using the [ad_locale charset $locale] command. The templating system should call this right after it computes the effective locale, so it can set up that charset encoding conversion before reading the template file from disk.
We read the template file using this encoding, and set the default output character set to it as well. Inside either the .adp page or the parent .tcl page, the developer can issue a command to override this default output character set. Currently this is done by putting an explicit content-type header in the AOLserver output headers. For example, to force the output to ISO-8859-1, you would do
    ns_set put [ns_conn outputheaders] "content-type" "text/html; charset=iso-8859-1"
Design question: We should have an API for this. The hack now is that the ADP handler adp_parse_ad_conn_file looks at the output headers, and if it sees a content type with an explicit charset, it passes it along to ns_return.
The default character set for a template .adp file should be the default system encoding.
VI.C Loading Regular Tcl Script Files
10.50 By default, Tcl and template files in the system will be loaded using the default system encoding. This is generally ISO-8859-1 for AOLserver running on Unix systems in English.
This default can be overridden by setting the AOLserver init parameter for the MIME type of .tcl files to include an explicit character set. If an explicit MIME type is not found, ns_encodingfortype will default to the AOLserver init parameter value DefaultCharset if it is set.
Example AOLserver .ini configuration file to set default script file and template file charset to ShiftJIS:
    ns_section {ns/mimetypes}
    ...
    ns_param .tcl {text/plain; charset=shift_jis}
    ns_param .adp {text/html; charset=shift_jis}
    ns_section ns/parameters
    ...
    # charset hacking
    ns_param HackContentType 1
    ns_param URLCharset shift_jis
    ns_param OutputCharset shift_jis
    ns_param HttpOpenCharset shift_jis
    ns_param DefaultCharset shift_jis
VI.D Message Catalog API
We want to use something like the Java ResourceBundle, where the developer can declare a set of resources for a given namespace and locale.
For AOLserver/Tcl, to make the message catalog more manageable, we will split it into one message catalog per package, plus one default global message namespace in case we need it.
Message lookups are done using a combination of a key string and a locale or language, as well as an implicit package prefix on the key string. The API for using the message catalog is as follows:
    lang_message_lookup locale key [default_string]
lang_message_lookup is abbreviated by the procedure named "_", which is the convention used by the GNU gettext message catalog package.
The locale arg can actually be a full locale, or else a simple language abbreviation, such as fr, en, etc. The lookup rules for finding strings based on key and locale are tried in order as follows:
- Lookup is first tried with the full locale (if present) and package.key
- Lookup is tried with just the language portion of the locale and package.key
- Lookup is tried with the full locale and key without package prefix.
- Lookup is tried with language and key without package prefix.
For example,
    [lang_message_lookup $locale notes.title "Title"]
can be abbreviated by
    [_ $locale notes.title "Title"]
    # message key "title" is implicitly with respect to package key
    # "notes", i.e., notes.title
    [_ $locale title "Title"]
The string is looked up by the symbolic key notes.title (or title for short), and the constant value "Title" is supplied as documentation and as a default value. Having a default value allows developers to code their application immediately without waiting to populate the message catalog.
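The four lookup rules can be sketched against an in-memory catalog. This is only an illustration of the lookup order: demo_catalog, demo_register, and demo_lookup are hypothetical names, the package is passed explicitly rather than taken from the current request, and the real lang_message_lookup may cache or store messages differently.

```tcl
# Hypothetical in-memory catalog, indexed by "locale-or-language,key".
array set demo_catalog {}

proc demo_register {locale key text} {
    global demo_catalog
    set demo_catalog($locale,$key) $text
}

# Try the four rules in order; fall back to the supplied default string.
proc demo_lookup {package locale key {default ""}} {
    global demo_catalog
    set language [lindex [split $locale "_"] 0]
    foreach candidate [list \
            [list $locale   "$package.$key"] \
            [list $language "$package.$key"] \
            [list $locale   $key] \
            [list $language $key]] {
        lassign $candidate loc k
        if {[info exists demo_catalog($loc,$k)]} {
            return $demo_catalog($loc,$k)
        }
    }
    return $default
}

demo_register fr notes.title "Titre"
puts [demo_lookup notes fr_FR title "Title"]        ;# Titre
puts [demo_lookup notes fr_FR notes.title "Title"]  ;# Titre
puts [demo_lookup notes de_DE title "Title"]        ;# Title
```

Note how a fully qualified key such as notes.title still resolves: the first two rules miss (they would look for notes.notes.title), and the rules without the package prefix then match.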
Default Package Namespace
By default, keys are prefixed with the name of the current package (if a page request is being processed). So a lookup of the key "title" in a page in the bboard package will actually reference the "bboard.title" entry in the message catalog.
You can override this behavior by either using a fully qualified key such as bboard.title or else by changing the message catalog namespace using the lang_set_package command:
    [lang_set_package "bboard"]
So for example, code that runs in a scheduled proc, where there is not necessarily any concept of a "current package", would either use fully qualified keys to look up messages, or else call lang_set_package before doing a message lookup.
Message Catalog Definition Files
A message catalog is defined by placing a file in the catalog subdirectory of a package. Each file defines a set of messages in different locales, and the file is written in a character set specified by its file suffix:
    /packages/bboard/catalog/
        bboard.iso-8859-1
        bboard.shift_jis
        bboard.iso-8859-6
A message catalog file consists of Tcl code to define messages in a given language or locale:
    _mr en mail_notification "This is an email notification"
    _mr fr mail_notification "Le notification du email"
    ...
In the example above, if the catalog file was loaded from the bboard package, all of the keys would be prefixed automatically with "bboard.".
Loading A Message Catalog At Package Init Time
The API function
    lang_catalog_load package_key
is used to load the message catalogs for a package. The catalog files are stored in a package subdirectory called catalog. Their filenames have the form *.encoding.cat, where encoding is the name of a MIME charset encoding (not a Tcl charset name as was used in a previous version of this command).
    /packages/bboard/catalog
        /main.iso-8859-1.cat
        /main.shift_jis.cat
        /main.iso-8859-6.cat
        /other.iso-8859-1.cat
        /other.shift_jis.cat
        /other.iso-8859-6.cat
You can add more pseudo-levels of hierarchy in naming the message keys, using any separator character you want, for example
    _mr fr alerts.mail_notification "Le notification du email"
which will be stored with the full key of bboard.alerts.mail_notification.
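The automatic package prefix applied while a catalog is loading can be sketched as follows. This is a simplified assumption of the mechanism: demo_mr and demo_catalog_load are hypothetical stand-ins for _mr and lang_catalog_load, the catalog source is passed as a string instead of being read from *.encoding.cat files, and no charset conversion is performed.

```tcl
# Hypothetical message store and "package currently being loaded" marker.
array set demo_messages {}
set demo_current_package ""

# Sketch of a registration command in the style of _mr:
# the key is stored with the loading package's name as a prefix.
proc demo_mr {locale key text} {
    global demo_messages demo_current_package
    set demo_messages($locale,$demo_current_package.$key) $text
}

# Evaluate catalog source text on behalf of a package.
proc demo_catalog_load {package_key catalog_source} {
    global demo_current_package
    set demo_current_package $package_key
    uplevel #0 $catalog_source
    set demo_current_package ""
}

demo_catalog_load bboard {
    demo_mr en alerts.mail_notification "This is an email notification"
    demo_mr fr alerts.mail_notification "Le notification du email"
}
puts $demo_messages(fr,bboard.alerts.mail_notification)
```

Because the catalog file is ordinary Tcl, the loader only needs to set the package context and evaluate the file; the registration command does the prefixing.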
Calling the Message Catalog API from inside of Templates
Inside of a template, you can always make a call to the message catalog API via a Tcl escape:
    <%= [_ $locale bboard.passwordPrompt "Enter Password"]%>
However, this is awkward and ugly to use. We have defined an ADP tag which invokes the message catalog lookup. As explained in the previous section, since our system precompiles ADP templates, we can get a performance improvement if we can cache the message lookups at template compile time.
The <TRN> tag is a call to lang_message_lookup that can be used inside of an ADP file. Here is the documentation:
Procedure that gets called when the <trn> tag is encountered on an ADP page. The purpose of the procedure is to register the text string enclosed within a pair of <trn> tags as a message in the catalog, and to display the appropriate translated string. Takes three optional parameters: lang, type and key.
- key specifies the key in the message catalog. If it is omitted, this procedure simply returns the text enclosed by the tags.
- lang specifies the language of the text string enclosed within the tags. If it is omitted, the value defaults to English.
- type specifies the context in which the translation is made. If omitted, type is user, which means that the translation is provided in the user's preferred language.
- static specifies that this tag should be translated once at template compile time, rather than dynamically every time the page is run. This will be unnecessary and will be deprecated once we have implemented effective locale based caching for templates.
Example 1: Display the text string Hello on an ADP page (i.e. do nothing special):
    <trn>Hello</trn>
Example 2: Assign the key hello to the text string Hello and display the translated string in the user's preferred language:
    <trn key="hello">Hello</trn>
Example 3: Specify that Bonjour needs to be registered as the French translation for the key hello (in addition to displaying the translation in the user's preferred language):
    <trn key="hello" lang="fr">Bonjour</trn>
Example 4: Register the string and display it in the preferred language of the current user. Note that the possible values for the type parameter are determined by what has been implemented in the ad_locale procedure. By default, only the user type is implemented. An example of a type that could be implemented is subsite, for displaying strings in the language of the subsite that owns the current web page.
    <trn key="hello" type="user">Hello</trn>
Example 5: Translate the string once at template compile time, using the effective locale of the page.
    <trn key="hello" static>Hello</trn>
VII. Data Model Discussion
Internationalizing the Data Models
Some data which is stored in ACS package and core database tables may be presented to users, and thus may need to be stored in multiple languages. Examples of this are the descriptions of package or site parameters in the administrative interface, the "pretty names" of objects, and group names.
Tables which are in the ACS kernel and have user-visible names that may need to be translated in order to create an admin back end in another language:
    user groups: group_name
    acs_object_types: pretty_name pretty_plural
    acs_attributes: pretty_name pretty_plural
    acs_attribute_descriptions: description (clob)
        procedure add_description - add a lang arg ?
    acs_enum_values ? pretty_name
    acs_privileges: pretty_name pretty_plural
    apm_package_types: pretty_name pretty_plural
    apm_package: "instance_name"? Maybe a given instance gets instantiated with a name in the desired language?
    apm_parameters: parameter_name section_name
One approach is to split a table into two tables, one holding language-independent data, and the other holding language-dependent data. This approach was described in the ASJ Multilingual Site Article.
In that case, it is convenient to create a new view which looks like the original table, with the addition of a language column that you can specify in the queries.
Drawbacks to Splitting Tables
- It is not totally transparent to developers
  Every query against the table which requests or modifies language-dependent columns must now include a WHERE clause to select the language.
- Extra join may slow things down
  The extra join of the two tables may cause queries to slow down, although I am not sure what the actual performance hit might be. It shouldn't be too large, because the join is against a fully indexed table.
VIII. User Interface
IX. Configuration/Parameters
X. Code Examples
- fconfigure -encoding blah
- content type in outputheaders set for encoding conversion
    ad_proc adp_parse_ad_conn_file {} {
        handle a request for an adp and/or tcl file in the template system.
    } {
        namespace eval template variable parse_level ""
        #ns_log debug "adp_parse_ad_conn_file => file '[file root [ad_conn file]]'"
        set parsed_template [template::adp_parse [file root [ad_conn file]] {}]
        db_release_unused_handles
        if { $parsed_template ne "" } {
            # Honor a content type (and charset) that the page put in the output headers
            set content_type [ns_set iget [ns_conn outputheaders] "content-type"]
            if { $content_type eq "" } {
                set content_type [ns_guesstype [ad_conn file]]
            } else {
                # Remove the header so that ns_return does not emit it twice
                ns_set idelkey [ns_conn outputheaders] "content-type"
            }
            ns_return 200 $content_type $parsed_template
        }
    }
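For the first item in the list above (fconfigure -encoding), a minimal sketch of reading a file in an explicit encoding might look like the following. The helper demo_read_file is hypothetical, and ISO-8859-1 is used only because that encoding is built into Tcl; a template authored in ShiftJIS would be read the same way with the shiftjis encoding name, assuming that encoding is installed.

```tcl
# Hypothetical helper: read a whole file that was authored in the given
# Tcl encoding, returning it as a normal (Unicode) Tcl string.
proc demo_read_file {path tcl_encoding} {
    set fd [open $path r]
    # Tell the channel how to decode the bytes on disk.
    fconfigure $fd -encoding $tcl_encoding
    set text [read $fd]
    close $fd
    return $text
}

# Write a small ISO-8859-1 file to a temporary location, then read it back.
set fd [file tempfile demo_path]
fconfigure $fd -encoding iso8859-1
puts -nonewline $fd "caf\u00e9"
close $fd

set demo_text [demo_read_file $demo_path iso8859-1]
puts [string length $demo_text]   ;# 4
file delete $demo_path
```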
XI. Future Improvements/Areas of Likely Change
XII. Authors
- System creator
- System owner: hqm@arsdigita.com
- Documentation author: hqm@arsdigita.com
XIII. Revision History
Document Revision # | Action Taken, Notes | When? | By Whom? |
---|---|---|---|
0.1 | Creation | 12/4/2000 | Henry Minsky |
0.2 | More specific template search algorithm, extended message catalog API to use package keys or other namespace | 12/4/2000 | Henry Minsky |
0.3 | Details on how the <TRN> tag works in templates | 12/4/2000 | Henry Minsky |
0.4 | Definition of effective locale for template caching, documentation of TRN tag | 12/12/2000 | Henry Minsky |
hqm@arsdigita.com