This paper discusses requirements and solutions for the localization of schema languages. Main requirements are the adaptation of markup names and documentation, the modification of data types and the integration of information which is relevant for internationalization and localization. Existing approaches which respond to these requirements are integrated into a general framework of schema language localization, which can be applied to XML Schema, RELAX NG and XML DTD. In addition, the approach allows for relating instances of localized schemas to instances of the general, locale independent schema. In this way, a common level of data processing is maintained.
Keywords: Schema Languages; Modeling
This paper discusses requirements and solutions for the localization of schema languages. Some example requirements are illustrated using the XML Schema document in fig. 1.
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://example.com/purchaseOrder"
xmlns:po="http://example.com/purchaseOrder">
<xs:element name="name" type="xs:string"/>
<xs:element name="street" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="state" type="xs:string"/>
<xs:element name="zip" type="xs:string"/>
<xs:element name="country" type="xs:string"/>
<xs:element name="price">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:decimal">
<xs:attribute name="currency" type="xs:string"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
<xs:element name="language" type="xs:string"/>
<xs:element name="comment">
<xs:complexType mixed="true">
<xs:sequence>
<xs:any processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="shipTo">
<xs:complexType mixed="false">
<xs:sequence>
<xs:element ref="po:name"/>
<xs:element ref="po:street"/>
<xs:element ref="po:city"/>
<xs:element ref="po:state"/>
<xs:element ref="po:zip"/>
<xs:element ref="po:country"/>
<xs:element ref="po:price"/>
<xs:element ref="po:language"/>
<xs:element ref="po:comment" minOccurs="0"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="purchaseOrder">
<xs:complexType mixed="false">
<xs:sequence>
<xs:element ref="po:shipTo"/>
</xs:sequence>
<xs:attribute name="orderDate" type="xs:date" use="required"/>
</xs:complexType>
</xs:element>
</xs:schema>
Example schema
The XML Schema document is motivated by the purchase order example introduced in XML Schema Part 0: Primer [XML Schema 0]. There are several possible targets for localization in fig. 1:
It is not difficult to implement these requirements ad hoc in a given schema: a schema user only needs to translate all element names, modify data types as appropriate, and provide information about translatability for the various element and attribute types. However, schema localization also offers opportunities and raises needs that are better addressed with mechanisms applicable to schema languages in general:
The paper is organized as follows: sec. “Background” describes basic terms like locale and locale identification, as well as the existing approaches and data for schema language localization on which this paper builds. Sec. “Localization of Schema Languages: Realization” describes a format which integrates the existing approaches and allows for using locale data in schema localization. The paper finishes with a description of the implementation of the approach in sec. “Implementation” and a summary and outlook in sec. “Summary and Outlook”.
Internationalization is the process of making a product ready for its global use. Localization is the process of the actual adaptation of the product to a specific locale, that is a country, region or market. See the definition of Localization vs. Internationalization [i18n l10n] for further information on these terms.
Taking the area of schema languages, an example for schema internationalization is to provide markup to express directionality of text in scripts with mixed directionality. With such markup, document instances of the schema can be created by users who work with Arabic or Hebrew texts. The translation of element and attribute names mentioned in sec. “Introduction” can be part of the localization of the schema.
A key definition for this paper is the notion of a locale. LDML describes a locale as follows:
[...] a locale is an identifier (id) that refers to a set of user preferences [which ...] provide support for formatting and parsing of dates, times, numbers, and currencies; for measurement units, for sort-order (collation), plus translated names for timezones, languages, countries, and scripts. They can also include text boundaries (character, word, line, and sentence), text transformations (including transliterations), and support for other services.
Unfortunately there is no general agreement about what a locale is. For example LDML and the POSIX locale model differ in many areas. Also, it is not clear whether language should be the center of the locale. Although this is often adequate, there are cases like time zone information, which are independent of a specific language or a distinguished set of languages.
This paper mainly follows LDML by using language as the center of a locale. Nevertheless, there are examples in the paper which do not rely on language like the description of locale specific currency names.
Since a clear definition of the term locale is not possible, a problem arises: How is a locale identified? Or to put it differently: How can the choice of German versus English translated names, modified data types or other localization information like in fig. 1 be made explicit?
The solution is to use the locale identifiers defined in LDML. Since LDML puts language at the core of its locale model, it is natural that it applies a standard for the identification and matching of so-called language tags: the IETF BCP 47 [Best Current Practice 47] [BCP 47]. Previously, BCP 47 was represented by RFC 3066 [RFC 3066]. Recently RFC 3066 was replaced by RFC 4646 [RFC 4646] and RFC 4647 [RFC 4647]. RFC 4646 describes the structure of a language tag, and RFC 4647 describes requirements for the matching of language tags. Parts of the syntax of an RFC 4646 language tag and its application in LDML are introduced in fig. 2.
RFC 4646 SYNTAX (part)
Language-Tag = langtag
/ privateuse ; private use tag
/ grandfathered; grandfathered
langtag = (language
["-" script]
["-" region]
*("-" variant)
*("-" extension)
["-" privateuse])
LDML LOCALE IDENTIFIER
locale_id := base_locale_id options?
base_locale_id := extended_RFC3066bis_identifiers
options := "@" key "=" type ("," key "=" type )*
SAMPLE IDENTIFIER
de_DE@collation=phonebook,currency=DDM
Part of the Syntax of an RFC 4646 language tag, and its application in an LDML locale identifier
RFC 4646 defines a syntax for language tags. These consist of one or several subtags. The subtags provide information about language (language), script (script), region (region) and variants (variant). The values of these subtags are registered in the IANA Language Subtag Registry (http://www.iana.org/assignments/language-subtag-registry). In addition, there are extension (extension) subtags and subtags for private use (privateuse), which are not part of the subtag registry.
LDML extends such a language tag with zero or more keys. In fig. 2 there are keys for a collation (German phonebook order) and the currency (DDM which means "East German Ostmark"). Another difference is that LDML uses the delimiter _ instead of - between subtags.
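The structure of the sample identifier in fig. 2 can be illustrated with a minimal Python sketch. It assumes only the simplified grammar shown in the figure (a base identifier, optionally followed by `@` and comma-separated `key=type` pairs); the function name `parse_locale_id` is hypothetical and not part of LDML or of the approach described in this paper.

```python
def parse_locale_id(locale_id):
    """Split an LDML-style locale identifier into subtags and options.

    Assumption: the identifier follows the simplified grammar of fig. 2,
    i.e. base_locale_id, optionally followed by "@" key "=" type pairs.
    """
    base, _, option_part = locale_id.partition("@")
    options = {}
    if option_part:
        for pair in option_part.split(","):
            key, _, value = pair.partition("=")
            options[key] = value
    # LDML uses "_" as the delimiter between subtags.
    return base.split("_"), options


subtags, options = parse_locale_id("de_DE@collation=phonebook,currency=DDM")
print(subtags)   # language and region subtags
print(options)   # collation and currency keys
```

The sketch does not validate the subtags against the IANA registry; it only separates the parts a schema localization step might look at.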
The requirements of schema language localization described in sec. “Introduction” can make use of various parts of a locale identifier:
The following subsections describe existing approaches towards schema localization. These will be introduced here, modified and later used for a general schema language localization approach.
The TEI ODD format [TEI ODD] created by the TEI [Text Encoding Initiative] is used for a literate programming approach towards markup languages. An ODD [One Document Does It All] document provides both markup declarations and their documentation. ODD is used for the creation of the TEI guidelines (that is, the TEI documentation and schemas in the schema languages XML Schema, RELAX NG and XML DTD) themselves. But it is also applied within the Internationalization Tag Set 1.0 specification [ITS 10], see sec. “Information about XML Localization (and Internationalization): ITS 1.0”.
For localization, ODD provides facilities for renaming and adaptation of documentation, as described in a presentation on the internationalization and localization of the TEI [TEI LOC]. The former are relevant for this paper and are exemplified in fig. 3. The element declaration of <city> is associated with translations into German and Japanese.
<define name="city-elem">
<elementSpec ident="city">
<altIdent xml:lang="de">Stadt</altIdent>
<altIdent xml:lang="ja">都市</altIdent>
....
</elementSpec>
</define>
Element renaming with ODD
To be able to use translated elements in content models, the ODD approach keeps the names of RELAX NG patterns, like the city-elem pattern above. The content models refer only to these patterns and not to element declarations directly. This approach works since the ODD format is processed one-way: from an ODD document to generated schemas. Hence, there is no need to provide a mechanism for the localization of global element declarations.
This paper differs from the TEI approach by providing such a mechanism, see sec. “Adaptation of Names”. Only in this way is it possible to localize existing schemas. This paper follows the TEI approach by not changing encapsulation mechanisms like patterns in RELAX NG schemas, names of groups and type definitions in XML Schema, or entities in XML DTDs.
LDML is an XML format to represent locale information. It provides the structure for locale data in CLDR. See http://unicode.org/cldr/repository_access.html for the latest deliverable of CLDR (which includes a specification describing LDML).
A part of LDML is the definition of locale identifiers introduced in sec. “Locale Identification”. LDML defines an inheritance and overriding model for locale identifiers. For example the locale locale_id "en" defines the display name for the currency USD as US Dollar. The locale locale_id "en_US" (English in the territory of the United States) overrides this definition and uses the display name $. In this paper only the locales directly following the neutral root locale are used.
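The inheritance and overriding model can be sketched as a lookup that walks from the most specific locale towards the root. The following Python fragment is only an illustration: the dictionary `LOCALE_DATA` holds the two display names mentioned above, and returning the bare currency code when nothing is found is an assumption made for this sketch, not behavior prescribed by LDML.

```python
# Hypothetical in-memory stand-in for CLDR data; only the two display
# names mentioned in the text are included.
LOCALE_DATA = {
    "root": {},
    "en": {"USD": "US Dollar"},
    "en_US": {"USD": "$"},
}


def fallback_chain(locale_id):
    """Return the locale itself, its parents, and finally the root locale."""
    parts = locale_id.split("_")
    chain = ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]
    return chain + ["root"]


def currency_display_name(locale_id, code):
    """Look up a currency display name, honoring overrides in child locales."""
    for candidate in fallback_chain(locale_id):
        names = LOCALE_DATA.get(candidate, {})
        if code in names:
            return names[code]
    return code  # assumption of this sketch: fall back to the bare code


print(currency_display_name("en_US", "USD"))  # overridden in en_US
print(currency_display_name("en_GB", "USD"))  # inherited from en
```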
An example of CLDR data represented in LDML is given in fig. 4.
<ldml>
<identity> [...] <language type="en"/>
</identity>
<localeDisplayNames>
<languages>
<language type="de">German</language> [...] </languages>
<scripts>
<script type="Latn">Latin</script> [...] </scripts>
<territories>
<territory type="DE">Germany</territory> [...] </territories>
<variants>
<variant type="1901">Traditional German orthography</variant>
<variant type="1996">German orthography of 1996</variant>
[...] </variants>
</localeDisplayNames>
<numbers>
<currencyFormats>
<currencyFormatLength>
<currencyFormat>
<pattern>¤#,##0.00</pattern>
</currencyFormat>
</currencyFormatLength>
</currencyFormats>
<currencies>
<currency type="USD">
<displayName>US Dollar</displayName>
</currency>
</currencies>
</numbers>
<dates> [...] <calendars>
<calendar type="gregorian">
<months>
<monthContext type="format">
<monthWidth type="wide">
<month type="1">January</month>
</monthWidth> [...] </monthContext>
</months>
<eras>
<eraNames>
<era type="0">Before Christ</era>
</eraNames> [...] </eras>
<dateFormats>
<dateFormatLength type="full">
<dateFormat>
<pattern>EEEE, MMMM d, yyyy</pattern>
</dateFormat>
</dateFormatLength> [...] </dateFormats>
</calendar>
</calendars>
</dates>
</ldml>
Examples of CLDR data given in LDML
The example is an excerpt from CLDR data for the locale locale_id "en". The locale is identified via the <identity> element as being specific to English, using the nested element <language type="en"/>. For the locale locale_id "en_US", there would be another nested element <territory type="US"/>.
The <localeDisplayNames> contains display names for languages, territories, variants etc. The <numbers> element contains information about currency formatting and currency display names. Date and time related information is represented in the <dates> element. Each set of date and time related information is specific to a calendar, for example <calendar type="gregorian">. For various parts of date and time tokens like years, months, days etc., there are lists of lexical items and patterns which make use of these. See the complete list of fields in patterns at http://www.unicode.org/reports/tr35/tr35-7.html#Date_Format_Patterns. An example lexical item is <month type="1">January</month> used within <monthWidth type="wide">. An example <pattern> is the <dateFormat> of the type <dateFormatLength type="full">. The <pattern> is EEEE, MMMM d, yyyy, which reads as:
The comma (,) is used as a separator. An example date would be Wednesday, April 18, 2007.
Using CLDR data for localization of XML Schema data types leads to a problem: In some areas, the CLDR data is semantically richer than related XML Schema data types. For example, there is no weekday information directly represented in the lexical space of the XML Schema data type date. Nevertheless, for the purpose of this paper, this problem is not relevant: A user can produce semantically rich information using the localized data type. Only for the purpose of mapping localized information to the locale unspecific representation described in sec. “Introduction”, information like the weekday will not be taken into account.
Another type of problem arises with CLDR data which does not map directly to a related XML Schema data type. An example is the Japanese calendar data which is part of the locale locale_id "ja" and identified as <calendar type="japanese">. This calendar divides years into 235 eras. Each era starts with a new Emperor. The Gregorian year 2007 maps to year 19 of the era HEISEI. However, the mapping relation (i.e. that the first year of HEISEI maps to 1989) is not available in CLDR.
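To make the role of the pattern fields more concrete, the following Python sketch turns the pattern from fig. 4 into a regular expression. It covers only the four fields that occur in that pattern (EEEE for the full weekday name, MMMM for the full month name, d for the day of the month, yyyy for the four-digit year) and hard-codes English names instead of reading them from CLDR; both the function name `pattern_to_regex` and the name lists are assumptions of this illustration.

```python
import re

# English names hard-coded for illustration; in the approach described
# here they would come from CLDR data for the target locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

FIELD_REGEX = {
    "EEEE": "(?:" + "|".join(WEEKDAYS) + ")",  # full weekday name
    "MMMM": "(?:" + "|".join(MONTHS) + ")",    # full month name
    "d": r"\d{1,2}",                            # day of month
    "yyyy": r"\d{4}",                           # four-digit year
}

# Longer fields first, then any single character as a literal.
TOKEN = re.compile(r"EEEE|MMMM|yyyy|d|.")


def pattern_to_regex(pattern):
    """Translate a (very small subset of a) date format pattern to a regex."""
    parts = []
    for token in TOKEN.findall(pattern):
        parts.append(FIELD_REGEX.get(token, re.escape(token)))
    return re.compile("".join(parts))


full_date = pattern_to_regex("EEEE, MMMM d, yyyy")
print(bool(full_date.fullmatch("Wednesday, April 18, 2007")))
```

The sketch does not handle quoting or other fields of the pattern syntax; it is meant only to show how lexical items and patterns work together.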
A solution to this problem might be to apply additional data, which is currently not part of CLDR. For example the mapping data for the Japanese calendar is available at http://source.icu-project.org/repos/icu/icu/trunk/source/i18n/japancal.cpp. The approach described in this paper currently relies only on CLDR.
To summarize, for the purposes of data type localization and mapping of localized data types to locale unspecific representations, CLDR provides a great variety of data. However, not all of this data can be used "as is", since it might be semantically too rich or the mapping would need more information than available. Hence, the data type modification has to be specific to a locale and an adequate part of the locale data.
ITS 1.0 is a standard for the expression of information related to Internationalization and Localization. It defines 7 data categories which convey different kinds of information:
These data categories can be implemented locally or globally. An example of both approaches for the Translate data category is given in fig. 5.
DOCUMENT 1 (local ITS markup):
<help xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
<head>
<title>Building the Zebulon Toolkit</title>
</head>
<body>
<p>To re-compile all the modules of the Zebulon toolkit you need to go in the <path its:translate="no">\Zebulon\Current Source\binary</path> directory. Then from there, run batch file <cmd its:translate="no">Build.bat</cmd>.</p>
</body>
</help>
DOCUMENT 2 (global ITS rules):
<help>
<head>
<title>Building the Zebulon Toolkit</title>
<its:rules version="1.0" xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
<its:translateRule selector="//path | //cmd" translate="no"/>
</its:rules>
</head>
<body>
<p>To re-compile all the modules of the Zebulon toolkit you need to go in the <path>\Zebulon\Current Source\binary</path> directory. Then from there, run batch file <cmd>Build.bat</cmd>.</p>
</body>
</help>
Example of the Translate data category implemented locally and globally
In the example documents, the content of the <path> and the <cmd> elements should not be translated. In the first document, this information is conveyed locally via the ITS attribute its:translate="no" on the respective elements. In the second document, an ITS global rule <its:translateRule> is used for the same purpose. Global rules make use of XPath to select pieces of markup, e.g. via the selector="//path | //cmd" attribute. In this way, the global implementation of data categories is independent of a position in a target document.
This paper will make use of ITS information within element or attribute declarations. That is, the information will be relevant for a document type and all its instances. This approach of ITS information on the schema level makes sense for the ITS 1.0 data categories Translate, Localization Note, Terminology and Elements Within Text. The remaining data categories Directionality, Ruby and Language Information will not be applied on the schema level, since in their case it is unlikely that every instance of an element or attribute has the same ITS 1.0 data category related information.
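How a processor might evaluate such a global rule can be sketched in Python. The fragment below is a simplification under stated assumptions: it uses the limited XPath support of the standard library module `xml.etree.ElementTree`, which does not support the union operator, so the selector is split at `|` and each branch of the form `//name` is evaluated separately. A real ITS 1.0 processor would use a full XPath 1.0 engine; the function name `non_translatable` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened version of document 2 from fig. 5 (rules element omitted).
DOC = r"""<help>
  <head><title>Building the Zebulon Toolkit</title></head>
  <body>
    <p>Go in the <path>\Zebulon\Current Source\binary</path> directory.
    Then run batch file <cmd>Build.bat</cmd>.</p>
  </body>
</help>"""


def non_translatable(root, selector):
    """Return the elements selected by a simple '//a | //b' style selector."""
    selected = []
    for branch in selector.split("|"):
        # ElementTree expects relative paths: "//path" becomes ".//path".
        selected.extend(root.findall("." + branch.strip()))
    return selected


root = ET.fromstring(DOC)
for element in non_translatable(root, "//path | //cmd"):
    print(element.tag, "->", element.text)
```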
The following table summarizes the input of existing approaches to (schema) localization and their modification made within this paper.
| Requirement | Input Approach | Modification |
|---|---|---|
| Translation of names for elements and attributes | TEI ODD (see sec. “The TEI Approach towards Markup Language Localization”) | Providing a mechanism for translation of global element declarations |
| Modification of data types | CLDR (see sec. “Overview of CLDR and LDML”) | Omitting CLDR categories which are semantically too rich or which miss information for a mapping to locale unspecific data types; for this purpose, providing access specific to a locale and an adequate part of the locale data |
| Document type specific information about internationalization and localization | ITS 1.0 (see sec. “Information about XML Localization (and Internationalization): ITS 1.0”) | Omitting ITS 1.0 data categories which are not useful on a schema level |
In the following section, a general approach towards schema language localization will be introduced. Before that, some potential implementation mechanisms are discussed here.
DSRL [Document Schema Renaming Language] [ISO/IEC 19757-8] provides a means to rename names of elements, attributes, processing instructions etc. from a source into a target vocabulary. DSRL basically provides the functionality of Architectural Forms [ISO/IEC 10744]. An example of DSRL is given in Fig. 6.
<dsrl:element-name-map target="po:purchaseOrder">poloc:Kaufbestellung</dsrl:element-name-map>
<dsrl:attribute-name-map target="po:purchaseOrder[@orderDate]">Lieferdatum</dsrl:attribute-name-map>
Renaming of Elements and Attributes with DSRL
The fragment of a DSRL document shows the renaming of the <purchaseOrder> element to <Kaufbestellung>, and the orderDate attribute to Lieferdatum.
An approach which can be used for the implementation of data type definitions and data type modification is DTLL [Data Type Library Language] [ISO/IEC 19757-5], see fig. 7.
<datatype name="monthNamesGerman">
<super type="month" />
<parse name="month">
<enumeration code="@name"
values="document('months.xml')/months/month"/>
</parse>
<property name="Januar" select="$month/@january" />
<property name="Februar" select="$month/@february" />
<property name="März" select="$month/@march" />
</datatype>
The DTLL document specifies the relation of locale specific month names like März to their general counterpart march. DTLL allows for specifying much more complex relations than simple value mapping and could also be used for implementing other data type modifications described above.
In summary, both DSRL and DTLL could be used to implement some requirements for schema localization, but they are not used in this paper. The reason is that the goal of this paper is to show one framework for specifying localization and internationalization information in a schema. The implementation described in sec. “Implementation” goes a direct way to XSLT, which seems to be simpler than taking a step through DSRL or DTLL.
From a different point of view, the approach of this paper may appear to risk getting in the way of the customization mechanisms already built into a schema language or related technologies. Both perspectives are understandable: the need to reduce the number of involved technologies as much as possible, and the need to have one framework for a specific purpose. This paper puts an emphasis on the latter perspective, taking into account the position of localization workers, who might benefit from a single framework for their needs.
The outline of the approach is demonstrated in fig. 8.
<loc:localInfo locale="de_DE">
[...]
</loc:localInfo>
regular expression for locale value testing:
(
((([a-z]|[A-Z]){2,3})|(([a-z]|[A-Z]){5,8}))
(_(([a-z]|[A-Z]){4}))?
(_(([a-z]|[A-Z]){2}|\d{3}))?
(_(([a-z]|[A-Z]|\d){5,8})|(\d{1}([a-z]|[A-Z]|\d){3}))?
)
|
(
(([a-z]|[A-Z]){1,3})((_([a-z]|[A-Z]){2,8}){1,2})?
)
Container for locale related information and regular expression for locale identifier check
The container for locale related information is a <localInfo> element with a mandatory locale attribute. The value of that attribute specifies the target locale. The figure contains the regular expression which is used for testing the locale value. It is based on the ABNF in sec. 2.1 of RFC 4646. The difference is that the delimiter between subtags - is replaced with _, to follow the LDML convention mentioned in sec. “Locale Identification”.
Locale information can be applied as schema annotation or in a separate document. The former usage is applicable for XML Schema or RELAX NG and will be exemplified in most of the following sections. The latter is used to apply such a description to XML DTDs. It is exemplified in fig. 9.
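A check of the locale attribute along these lines can be sketched in Python. The expression below is a simplified rendering of the first alternative in fig. 8 only (language, optional script, optional region, optional variant), written with character classes; as an assumption of this sketch, the delimiter _ is applied to both forms of the variant subtag. The name `is_valid_locale` is hypothetical.

```python
import re

# Simplified rendering of the first alternative of the expression in fig. 8.
# Assumption of this sketch: "_" precedes both forms of the variant subtag.
LOCALE_RE = re.compile(r"""
    (?:[A-Za-z]{2,3}|[A-Za-z]{5,8})              # language
    (?:_[A-Za-z]{4})?                            # script
    (?:_(?:[A-Za-z]{2}|\d{3}))?                  # region
    (?:_(?:[A-Za-z\d]{5,8}|\d[A-Za-z\d]{3}))?    # variant
""", re.VERBOSE)


def is_valid_locale(value):
    """Return True if the whole value matches the simplified locale syntax."""
    return LOCALE_RE.fullmatch(value) is not None


for candidate in ["de", "de_DE", "zh_Hant_TW", "de-DE"]:
    print(candidate, is_valid_locale(candidate))
```

Note that such a check is purely syntactic: it does not verify that the subtags are registered or that CLDR data exists for the locale.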
<loc:localInformation xmlns:loc="http://example.com/schemalocalization" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="http://example.com/schemalocalization localInfo.xsd">
<loc:localInfo locale="de" targetDeclaration="xs:element[@name='purchaseOrder']" generalization="Kaufbestellung" xmlns:po="http://example.com/purchaseOrder">
<loc:altIdent>Kaufbestellung</loc:altIdent>
</loc:localInfo>
<loc:localInfo locale="de" targetDeclaration="xs:element[@name='purchaseOrder']" generalization="Kaufbestellung">
<loc:altDocumentation>Schema zu Kaufbestellungen</loc:altDocumentation>
</loc:localInfo>
<loc:localInfo locale="de" targetDeclaration="xs:attribute[@name='orderDate']" generalization="Kaufbestellung/@Lieferdatum">
<loc:dateInfo calendarType="Gregorian" dateFormatLengthType="long"/>
</loc:localInfo>
<loc:localInfo locale="de" targetDeclaration="xs:element[@name='language']" generalization="Sprache">
<loc:localeDisplayNames type="languages"/>
</loc:localInfo>
<loc:localInfo targetDeclaration="xs:element[@name='purchaseOrder']" translate="no" locale="de">
<loc:locNote>Localization already available in German. Make sure that the existing localization can be reused as much as possible.</loc:locNote>
</loc:localInfo>
</loc:localInformation>
"Standoff" usage of locale information
The document contains all locale information which will be discussed in the following sections. To be able to apply this information to global or local declarations in an XML Schema document, the XPath expressions in the targetDeclaration attribute can be exploited. An additional, optional generalization attribute provides information for the transformation to an instance of the general schema.
During the development of this approach, XML Schema: Component Designators [SCD] have been considered as an alternative means to XPath for selecting declarations. This idea was dropped since XPath is applicable to various schema languages, whereas component designators are specific to XML Schema.
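The use of XPath for selecting declarations can be illustrated with a short Python sketch based on the standard library module `xml.etree.ElementTree`. It assumes that a targetDeclaration value is interpreted relative to any depth of the schema document, which is why `.//` is prepended; this interpretation and the function name `select_declarations` are assumptions of the illustration, not a description of the actual stylesheets.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Reduced version of the schema in fig. 1.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="language" type="xs:string"/>
  <xs:element name="purchaseOrder">
    <xs:complexType>
      <xs:attribute name="orderDate" type="xs:date" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def select_declarations(schema_root, target_declaration):
    """Find the declarations addressed by a targetDeclaration expression."""
    # Assumption: the expression may match at any depth of the schema.
    return schema_root.findall(".//" + target_declaration, {"xs": XS})


schema_root = ET.fromstring(SCHEMA)
elements = select_declarations(schema_root, "xs:element[@name='purchaseOrder']")
attributes = select_declarations(schema_root, "xs:attribute[@name='orderDate']")
print(len(elements), len(attributes))
```

The same expressions work for global and local declarations, which is the property the "standoff" usage relies on.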
An example of this functionality is given in fig. 10:
<xs:element name="purchaseOrder">
<xs:annotation>
<xs:appinfo>
<loc:localInfo locale="de_DE">
<loc:altIdent>Kaufbestellung</loc:altIdent>
</loc:localInfo>
</xs:appinfo>
</xs:annotation> ... </xs:element>
Renaming of elements
The <localInfo> element contains an <altIdent> element which serves a similar function to the <altIdent> element in the TEI localization approach described in sec. “The TEI Approach towards Markup Language Localization”. The difference is that the <altIdent> element here can be applied to locally declared elements and global elements in an XML Schema.
The mapping to a locale unspecific representation (i.e. from <Kaufbestellung> to <purchaseOrder>) is realized by exploiting the generalization attributes in fig. 9. The XPath expressions can be used to transform an instance of the localized schema to an instance of the general schema.
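The direction from a localized instance back to the general vocabulary can be sketched as a simple renaming pass. The Python fragment below is an illustration only: the two dictionaries stand in for the information that would be derived from the <altIdent> elements and generalization attributes, the sample instance is deliberately incomplete, and attribute values such as the date are left untouched. All names introduced here are hypothetical.

```python
import xml.etree.ElementTree as ET

PO = "http://example.com/purchaseOrder"
POLOC = "http://example.com/purchaseOrderLocalized"

# Hypothetical name maps, standing in for data derived from the
# localization information (localized name -> general name).
ELEMENT_NAMES = {"Kaufbestellung": "purchaseOrder", "Stadt": "city"}
ATTRIBUTE_NAMES = {"Lieferdatum": "orderDate"}

LOCALIZED = """<poloc:Kaufbestellung
    xmlns:poloc="http://example.com/purchaseOrderLocalized"
    Lieferdatum="16. März 2007">
  <poloc:Stadt>Berlin</poloc:Stadt>
</poloc:Kaufbestellung>"""


def generalize(localized_root):
    """Rename localized elements and attributes to their general names."""
    prefix = "{" + POLOC + "}"
    for elem in localized_root.iter():
        if elem.tag.startswith(prefix):
            local = elem.tag[len(prefix):]
            elem.tag = "{" + PO + "}" + ELEMENT_NAMES.get(local, local)
        for name in list(elem.attrib):
            if name in ATTRIBUTE_NAMES:
                elem.set(ATTRIBUTE_NAMES[name], elem.attrib.pop(name))
    return localized_root


general_root = generalize(ET.fromstring(LOCALIZED))
print(general_root.tag, general_root.attrib)
```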
This functionality is demonstrated in fig. 11.
<xs:element name="purchaseOrder">
<xs:annotation>
<xs:appinfo>
<loc:localInfo locale="de_DE">
<loc:altDocumentation>Schema zu Kaufbestellungen</loc:altDocumentation>
</loc:localInfo> ...
</xs:appinfo>
</xs:annotation> ... </xs:element>
Translation of documentation
This approach is again very similar to the TEI localization approach described in sec. “The TEI Approach towards Markup Language Localization”. The difference is again that it can be applied both for global and local markup declarations.
In this section the adaptation of the date data type will be exemplified. Only a subset of the fields described in LDML are used:
With this information the Gregorian calendar can be represented. CLDR provides many other calendars as well. Nevertheless, for many locales there is at least a Gregorian calendar, which allows for using these fields, and eases the usage of CLDR information in XML Schema. Fig. 12 demonstrates how the localization of the orderDate attribute is achieved.
<xs:attribute name="orderDate" type="xs:date">
<xs:annotation>
<xs:appinfo>
<loc:localInfo locale="de">
<loc:dateInfo calendarType="Gregorian"
dateFormatLengthType="long"/>
</loc:localInfo>
</xs:appinfo>
</xs:annotation>
</xs:attribute>
Adaptation of the date data type
The <dateInfo> element contains two attributes: calendarType describes the calendar to be used (currently only Gregorian). dateFormatLengthType describes the date format, see the <dateFormatLength> element in fig. 4.
A date format pattern like <pattern>EEEE, MMMM d, yyyy</pattern> can be used for two purposes. First, the type of the non-localized declaration can be changed to a localized version. E.g., instead of 2007-03-16, one would have Friday, March 16, 2007. In that case, a regular expression is created which covers the constraints imposed by CLDR. The second usage is the creation of a canonical date representation (i.e. an instance of the data type xs:date) from a localized value: to create 2007-03-16 from Friday, March 16, 2007. For this purpose some fields in the localized value have to be omitted, like the weekdays. Nevertheless, values from different locales become comparable, and date related functions, e.g. in XPath 2.0, can be applied.
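The second usage, the creation of a canonical xs:date value, can be sketched as follows. The Python fragment assumes the English pattern EEEE, MMMM d, yyyy and hard-coded English names rather than CLDR data; the weekday is matched but then dropped, and it is not checked against the actual date. The function name `to_xs_date` is hypothetical.

```python
import re
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Regular expression for the pattern "EEEE, MMMM d, yyyy".
FULL_DATE = re.compile(
    r"(?:" + "|".join(WEEKDAYS) + r"),\s+"
    r"(?P<month>" + "|".join(MONTHS) + r")\s+"
    r"(?P<day>\d{1,2}),\s+"
    r"(?P<year>\d{4})"
)


def to_xs_date(localized):
    """Map a localized full date to the lexical form of xs:date."""
    match = FULL_DATE.fullmatch(localized.strip())
    if match is None:
        raise ValueError("not a date in the expected localized format")
    month = MONTHS.index(match.group("month")) + 1
    # The weekday is omitted: xs:date has no place for it.
    return date(int(match.group("year")), month,
                int(match.group("day"))).isoformat()


print(to_xs_date("Wednesday, April 18, 2007"))
```

Because the result is a plain ISO date string, values originating from different locales can be compared directly.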
The adaptation of locale display names basically means the creation of enumeration lists. The necessary information will be exemplified for display names of languages in fig. 13.
<xs:element name="language">
<xs:annotation>
<xs:appinfo>
<loc:localInfo locale="de">
<loc:localeDisplayNames type="languages"/>
</loc:localInfo>
</xs:appinfo>
</xs:annotation> ... </xs:element>
Adaptation of locale display names: creation of an enumeration list for the display names of languages
The <localeDisplayNames> element contains a type attribute. It defines what kind of data from CLDR has to be used. In addition to language display names, there are currency, territories and variants.
Using this information, enumeration lists containing the localized display names can be generated. The generation of locale unspecific display names (e.g. from US Dollar to USD) uses the same information from the <localeDisplayNames> element and applies it in the other direction.
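The generation of enumeration lists can be illustrated with a small Python sketch. It reads display names from an LDML fragment and builds an xs:simpleType with one xs:enumeration per name; the reverse dictionary shows the direction back to the locale unspecific code. The LDML fragment is a hand-written stand-in using the German language names from the localized schema in this paper, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hand-written stand-in for the German CLDR data.
LDML_DE = """<ldml>
  <localeDisplayNames>
    <languages>
      <language type="en">Englisch</language>
      <language type="de">Deutsch</language>
      <language type="ja">Japanisch</language>
    </languages>
  </localeDisplayNames>
</ldml>"""

# Maps the value of the type attribute to the LDML item element name.
ITEM_NAMES = {"languages": "language", "territories": "territory"}


def display_names(ldml_root, kind):
    """Return a dict from code to localized display name."""
    path = "localeDisplayNames/%s/%s" % (kind, ITEM_NAMES[kind])
    return {item.get("type"): item.text for item in ldml_root.findall(path)}


def enumeration_type(names):
    """Build an xs:simpleType restricting xs:string to the display names."""
    simple_type = ET.Element("{%s}simpleType" % XS)
    restriction = ET.SubElement(simple_type, "{%s}restriction" % XS,
                                base="xs:string")
    for name in names.values():
        ET.SubElement(restriction, "{%s}enumeration" % XS, value=name)
    return simple_type


names = display_names(ET.fromstring(LDML_DE), "languages")
localized_type = enumeration_type(names)
to_code = {name: code for code, name in names.items()}  # other direction
print(to_code["Deutsch"])
```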
Fig. 14 demonstrates how "Translate" and "Localization Note" related information can be provided for the <purchaseOrder> element.
<xs:element name="purchaseOrder">
<xs:annotation>
<xs:appinfo>
<loc:localInfo targetDeclaration="xs:element[@name='purchaseOrder']" translate="no"
locale="de">
<loc:locNote>Localization already available in German.
Make sure that the existing localization can be reused as much as possible.</loc:locNote>
</loc:localInfo>
</xs:appinfo>
</xs:annotation> [...] </xs:element>
Information about internationalization and localization
The <localInfo> element contains the translate attribute and the <locNote> element. Their function is identical to the markup described in ITS 1.0. The difference is that in the case described in fig. 14 they apply to all instances of the markup which they are attached to. In the case of the "standoff" usage described in fig. 9, they apply to all element or attribute declarations selected by the targetDeclaration attribute.
It depends on the application (e.g. a translation or localization tool), how this information should be processed. Example applications are given in sec. 2 of ITS 1.0.
The example schema in fig. 15 implements the requirements formulated in sec. “Introduction”, using the existing approaches described in sec. “Background”. It can be generated relying on the markup described in sec. “Localization of Schema Languages: Realization”.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://example.com/purchaseOrderLocalized"
xmlns:loc="http://example.com/schemalocalization"
xmlns:poloc="http://example.com/purchaseOrderLocalized">
<xs:annotation>
<xs:appinfo>
<loc:baseSchema uri="purchaseOrderExample.xsd"/>
</xs:appinfo>
</xs:annotation>
<xs:element name="Name" type="xs:string"/>
<xs:element name="Straße" type="xs:string"/>
<xs:element name="Stadt" type="xs:string"/>
<xs:element name="Bundesland" type="xs:string"/>
<xs:element name="PLZ" type="xs:string"/>
<xs:element name="Land">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Vereinigte Staaten"/>
<xs:enumeration value="Deutschland"/>
<xs:enumeration value="Japan"/>
<!-- ... -->
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="Sprache">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Englisch"/>
<xs:enumeration value="Deutsch"/>
<xs:enumeration value="Japanisch"/>
<!-- ... -->
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="Preis">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:decimal">
<xs:attribute name="Waehrung">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="US Dollar"/>
<xs:enumeration value="Yen"/>
<xs:enumeration value="Europäische Währungseinheit (XBB)"/>
<!-- ... -->
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
<xs:element name="Kommentar">
<xs:complexType mixed="true">
<xs:sequence>
<xs:any processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Lieferaddresse">
<xs:complexType mixed="false">
<xs:sequence>
<xs:element ref="poloc:Name"/>
<xs:element ref="poloc:Straße"/>
<xs:element ref="poloc:Stadt"/>
<xs:element ref="poloc:Bundesland" minOccurs="0"/>
<xs:element ref="poloc:PLZ"/>
<xs:element ref="poloc:Land"/>
<xs:element ref="poloc:Sprache"/>
<xs:element ref="poloc:Preis"/>
<xs:element ref="poloc:Kommentar" minOccurs="0"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Kaufbestellung">
<xs:complexType mixed="false">
<xs:sequence>
<xs:element ref="poloc:Lieferaddresse"/>
</xs:sequence>
<xs:attribute name="Lieferdatum" use="required">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern
value="\d{2}\.\s+(Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember)\s+\d{4}"
/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
</xs:element>
</xs:schema>
Localized Schema 1
The example assumes a German user as the target locale. For such a user, all element and attribute names are translated into German, and the types of some elements are modified (see, for example, the Land element, whose type is restricted to an enumeration of German country names). The localized schema contains no information about the localization of document instances, e.g. the translate attribute. This information is not used for the localized schema, but for instance documents of the locale-independent schema.
The pattern of the data type for the Lieferdatum attribute is based on the date pattern d. MMMM yyyy. However, the pattern is relaxed with regard to whitespace, to make data entry more convenient for the user.
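To illustrate the effect of the relaxed whitespace handling, the following Python sketch checks candidate values against the same expression as the xs:pattern facet. It is not part of the XSLT implementation described in this paper. The function name is_valid_lieferdatum is hypothetical, and Python's re module only approximates XML Schema regular expressions (for example, \s covers more whitespace characters in Python). XML Schema patterns are implicitly anchored, which is mimicked here with fullmatch.

```python
import re

# Same expression as the xs:pattern facet of the Lieferdatum attribute.
# Assumption: Python's regex dialect is close enough to XML Schema
# regular expressions for this particular pattern.
LIEFERDATUM_PATTERN = re.compile(
    r"\d{2}\.\s+(Januar|Februar|März|April|Mai|Juni|Juli|August"
    r"|September|Oktober|November|Dezember)\s+\d{4}"
)


def is_valid_lieferdatum(value: str) -> bool:
    """Return True if the whole value matches the localized date pattern."""
    # fullmatch() mimics the implicit anchoring of XML Schema patterns.
    return LIEFERDATUM_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["16. März 2007", "16.    März  2007", "16.März 2007"]:
        print(repr(candidate), is_valid_lieferdatum(candidate))
```

Any amount of whitespace is accepted between the three parts, but at least one whitespace character is required. As written, the pattern also requires a two-digit day, so a value such as 6. März 2007 is rejected.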
The implementation, which is still in progress, uses XSLT 2.0 [XSLT20]. One stylesheet creates the localized schemas. It takes as input:
If the schema language is DTD, the stylesheet generates a Schematron document which can be used for validation with modified data types. For element and attribute renaming in XML DTDs, a separate script will be used. If the schema language is RELAX NG or XML Schema, the data type modifications and other changes will be made in the schema itself.
Another stylesheet transforms instances of a localized schema into instances of the general schema. It takes the same input information as the stylesheet described above, plus the document instance. A third stylesheet is used to create "native" global ITS 1.0 markup from information such as that shown in fig. 14. The output can be processed by ITS 1.0 processors which are independent of the approach described in this paper.
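The renaming step of the instance transformation can be sketched as follows. This is a minimal Python illustration, not the XSLT 2.0 stylesheet itself. It assumes that localized elements live in the namespace http://example.com/purchaseOrderLocalized, and that the pairing of localized and general names is given as a simple table. The table NAME_MAP and the function to_general are hypothetical and cover only a few of the elements from the example schemas.

```python
import xml.etree.ElementTree as ET

PO_NS = "http://example.com/purchaseOrder"
POLOC_NS = "http://example.com/purchaseOrderLocalized"

# Hypothetical mapping from localized to general element names,
# read off the two example schemas (not an exhaustive list).
NAME_MAP = {
    "Lieferaddresse": "shipTo",
    "Name": "name",
    "Straße": "street",
    "Stadt": "city",
    "Bundesland": "state",
}


def to_general(root: ET.Element) -> ET.Element:
    """Rename localized elements in place to their general counterparts."""
    prefix = "{" + POLOC_NS + "}"
    for elem in root.iter():
        if elem.tag.startswith(prefix):
            local_name = elem.tag[len(prefix):]
            if local_name in NAME_MAP:
                # ElementTree stores qualified names as "{namespace}local".
                elem.tag = "{" + PO_NS + "}" + NAME_MAP[local_name]
    return root


if __name__ == "__main__":
    localized = ET.fromstring(
        '<l:Lieferaddresse xmlns:l="' + POLOC_NS + '">'
        "<l:Name>Erika Mustermann</l:Name><l:Stadt>Berlin</l:Stadt>"
        "</l:Lieferaddresse>"
    )
    print(ET.tostring(to_general(localized), encoding="unicode"))
```

Elements without an entry in the table are left untouched, so content matched by the xs:any wildcard of the Kommentar element would pass through unchanged. Data type values, such as localized dates or enumeration values, would need a separate conversion step.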
This paper introduced a framework for the localization of schema languages (XML Schema, RELAX NG and XML DTD) which can be applied to modify markup names and data types, and add information about localization and internationalization to a schema. It made use of existing approaches and data and integrated them in a general manner.
Future work on this approach mainly concerns the modification of data types. More data types, such as dateTime and time, need to be processed. Additional information, such as calendars other than the Gregorian calendar, needs to be taken into account. It should also be investigated whether it is possible to modify only the lexical values of data types while a schema processor keeps the non-localized value internally. This requires a finite set of symbols in the lexical space of localized data types. The data provided by CLDR cannot be applied as is for this purpose, but the possibility seems promising.
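A mapping between localized and locale-independent lexical values can be illustrated with the date examples 16. März 2007 and 2007年3月16日. The following Python sketch is only an illustration under simplifying assumptions: the month table is written out by hand rather than taken from CLDR, only the Gregorian calendar is considered, and the function names german_to_iso and japanese_to_iso are hypothetical. The German month names form the kind of finite symbol set mentioned above.

```python
import datetime
import re

# Hand-written month table; in practice such data could come from CLDR.
GERMAN_MONTHS = {
    name: number
    for number, name in enumerate(
        ["Januar", "Februar", "März", "April", "Mai", "Juni", "Juli",
         "August", "September", "Oktober", "November", "Dezember"],
        start=1,
    )
}


def german_to_iso(value: str) -> str:
    """Convert a date like '16. März 2007' to '2007-03-16'."""
    match = re.fullmatch(r"(\d{1,2})\.\s+(\w+)\s+(\d{4})", value)
    if match is None or match.group(2) not in GERMAN_MONTHS:
        raise ValueError("not a German long date: %r" % value)
    day, month, year = (
        int(match.group(1)), GERMAN_MONTHS[match.group(2)], int(match.group(3))
    )
    # datetime.date rejects impossible dates such as 31. Februar.
    return datetime.date(year, month, day).isoformat()


def japanese_to_iso(value: str) -> str:
    """Convert a date like '2007年3月16日' to '2007-03-16'."""
    match = re.fullmatch(r"(\d{4})年(\d{1,2})月(\d{1,2})日", value)
    if match is None:
        raise ValueError("not a Japanese date: %r" % value)
    year, month, day = (int(group) for group in match.groups())
    return datetime.date(year, month, day).isoformat()


if __name__ == "__main__":
    print(german_to_iso("16. März 2007"))
    print(japanese_to_iso("2007年3月16日"))
```

Once both values are mapped to the same locale-independent representation, they can be compared or processed by tools that know nothing about the localized schemas.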
An example is the access to localized dates like 16. März 2007 (for a German user) or 2007年3月16日 (for a Japanese user). To make them comparable, both need to be converted to the locale-independent lexical representation 2007-03-16.
An example: the locale display name for the region DE in English (that is, target locale locale_id "en") would be Germany. In Japanese (that is, target locale locale_id "ja"), it would be ドイツ.
To be more precise: the date data type does not represent this information directly in its lexical space. Nevertheless, it is possible to derive weekday information from a date value.
The ITS 1.0 specification is written in the ODD format. However, the ODD source document at http://www.w3.org/TR/2007/REC-its-20070403/itstagset.xml does not make use of the ODD localization facilities mentioned in sec. “The TEI Approach towards Markup Language Localization”.
For example, it is unlikely that text Directionality is identical for all instances of an element, or that all textual content needs the same Ruby annotation.
DTDs offer means to describe locale information within a DTD itself, e.g. via fixed values. That is, standoff information is not the only way to express such information. Nevertheless, this paper proposes such standoff annotation for DTDs, to ease the task of adding information more complex than attribute values, e.g. for localization notes.
[BCP 47] A. Phillips, M. Davis, eds. Tags for Identifying Languages. IETF, September 2006. Available at http://www.rfc-editor.org/rfc/bcp/bcp47.txt.
[CLDR] Common Locale Data Repository. Available at http://unicode.org/cldr/.
[i18n l10n] R. Ishida, S. Miller. Localization vs. Internationalization. Article of the W3C Internationalization Activity, January 2006. Available at http://www.w3.org/International/questions/qa-i18n.
[ISO/IEC 10744] Information Technology - Hypermedia/Time-based Structuring Language (HyTime). International Organization for Standardization, 1997.
[ISO/IEC 19757-5] Information Technology - Document Schema Definition Languages (DSDL) - Part 5: Data Type Library Language - DTLL, ISO/IEC 19757-5. International Organization for Standardization, 2006 (under development).
[ISO/IEC 19757-8] Information Technology - Document Schema Definition Languages (DSDL) - Part 8: Document Schema Renaming Language - DSRL, ISO/IEC 19757-8. International Organization for Standardization, 2006 (under development).
[ITS 10] C. Lieske, F. Sasaki, eds. Internationalization Tag Set (ITS) 1.0. W3C Recommendation, April 2007. Available at http://www.w3.org/TR/2007/REC-its-20070403/.
[LDML] Locale Data Markup Language. Unicode Technical Standard #35, November 2006. Available at http://unicode.org/reports/tr35/tr35-7.html.
[RFC 3066] H. Alvestrand, ed. Tags for the Identification of Languages. IETF, January 2001. Available at http://www.rfc-editor.org/rfc/rfc3066.txt.
[RFC 4646] A. Phillips, M. Davis, eds. Tags for Identifying Languages. IETF, September 2006. Available at http://www.rfc-editor.org/rfc/rfc4646.txt.
[RFC 4647] A. Phillips, M. Davis, eds. Matching of Language Tags. IETF, September 2006. Available at http://www.rfc-editor.org/rfc/rfc4647.txt.
[SCD] M. Holstege, A. S. Vedamuthu, eds. Schema Component Designators. W3C Working Draft, 29 March 2005. Available at http://www.w3.org/TR/2005/WD-xmlschema-ref-20050329/.
[TEI LOC] S. Rahtz. Towards an internationalized and localized TEI. Presentation, Kyoto, May 2006. Available at http://tei.oucs.ox.ac.uk/Oxford/2006-05-17-kyoto/i18n.xml.
[TEI ODD] C.M. Sperberg-McQueen, L. Burnard, eds. TEI P5: Guidelines for Electronic Text Encoding and Interchange. TEI Consortium, March 2007 (release 06). Chapter 23 "Documentation Elements". The latest version of TEI P5 is available at http://www.tei-c.org/release/doc/tei-p5-doc/html/.
[XML Schema 0] D. C. Fallside, P. Walmsley, eds. XML Schema Part 0: Primer Second Edition. W3C Recommendation, October 2004. Available at http://www.w3.org/TR/2004/REC-xmlschema-0-20041028/.
[XSLT20] M. Kay, ed. XSL Transformations (XSLT) Version 2.0. W3C Recommendation, January 2007. Available at http://www.w3.org/TR/2007/REC-xslt20-20070123/.