It is sometimes challenging to create and organize XML documents. This is even more true if the document content needs to be presented in several different languages. Multilingual content creates unwanted dependencies and makes it harder to keep the document up to date.
But don’t despair. When supporting multiple languages in XML, the key to success is to pay close attention to how you structure your documents. Following a few simple rules will help a lot. This article will help you minimize the effort.
Structuring the Document
There are two main ways to structure a multilingual document. Either you keep all the language variants of a piece of content next to each other, or you separate the languages so that each one is held in its own section or file. The first method is content oriented; the second is language oriented.
The content-oriented structure is suitable for documents that have a mixture of language-specific and language-independent information. The language-oriented structure is useful when the number of languages grows or when the documents are large. It makes it possible to separate each language’s description into its own XML document and to load only the ones that are needed during processing (e.g. using the XSLT document() function).
Content-oriented Structure
Listing 1 shows a simple multilingual resume (Curriculum Vitae). The document is structured in the content-oriented way.
Listing 1. Content-oriented structure.
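[Listing markup not preserved. Surviving text content, in document order: Curriculum Vitae, Ansioluettelo, ??? (resume title); Skills, Taidot (section title); Project management, Projektinhallinta, ???????? (skill); Education, Koulutus (section title); University of Technology, Teknillinen korkeakoulu, ????? (school name); 1993 (graduation year). The ??? runs stand for Japanese text lost to encoding.]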
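A minimal sketch of how a content-oriented resume of this kind might be marked up. The element names cv, skills, skill and education come from the article text; title, school and year are assumptions made for illustration. Only the English and Finnish entries are shown; Japanese entries would follow the same pattern with xml:lang="ja".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a content-oriented resume.
     Assumed element names: title, school, year. -->
<cv>
  <!-- Language variants of the same item sit next to each other -->
  <title xml:lang="en">Curriculum Vitae</title>
  <title xml:lang="fi">Ansioluettelo</title>
  <skills>
    <title xml:lang="en">Skills</title>
    <title xml:lang="fi">Taidot</title>
    <skill xml:lang="en">Project management</skill>
    <skill xml:lang="fi">Projektinhallinta</skill>
  </skills>
  <education>
    <title xml:lang="en">Education</title>
    <title xml:lang="fi">Koulutus</title>
    <school xml:lang="en">University of Technology</school>
    <school xml:lang="fi">Teknillinen korkeakoulu</school>
    <!-- No xml:lang here: the graduation year is language independent -->
    <year>1993</year>
  </education>
</cv>
```

Adding a language to a document shaped like this means touching every localized item, which is why this layout tends to suit a stable set of languages.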
All the descriptions in different languages are located near each other. The descriptions are separated from each other using the standard xml:lang attribute. Descriptions that do not need localization are presented without an xml:lang attribute; see the graduation year in the example above.
When the XML document contains xml:lang attributes, the XPath function lang() can be used during transformation. Listing 2 provides an example of an XSL stylesheet that transforms the document into a localized XHTML page.
Listing 2. Transforming the content-oriented structure for a multilingual resume.
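[Stylesheet markup not preserved. The only surviving text is the default value of the currLang parameter: en]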
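A minimal sketch of a stylesheet along the lines the article describes, not the author's original listing. It assumes XSLT 1.0 and a source document whose localized items are elements named title, school and year (names not given in the article) inside the cv, skills, skill and education elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <xsl:output method="xml" indent="yes"/>

  <!-- Language to render; "en" is the default, "fi" or "ja" can be passed in -->
  <xsl:param name="currLang">en</xsl:param>

  <!-- Root element: XHTML skeleton and resume title -->
  <xsl:template match="cv">
    <html>
      <head>
        <title><xsl:value-of select="title[lang($currLang)]"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="title[lang($currLang)]"/></h1>
        <xsl:apply-templates select="skills"/>
        <xsl:apply-templates select="education"/>
      </body>
    </html>
  </xsl:template>

  <!-- Section title, then only the skills in the requested language -->
  <xsl:template match="skills">
    <h2><xsl:value-of select="title[lang($currLang)]"/></h2>
    <ul>
      <xsl:apply-templates select="skill[lang($currLang)]"/>
    </ul>
  </xsl:template>

  <xsl:template match="skill">
    <li><xsl:value-of select="."/></li>
  </xsl:template>

  <!-- The year (assumed element name) carries no xml:lang, so it is not filtered -->
  <xsl:template match="education">
    <h2><xsl:value-of select="title[lang($currLang)]"/></h2>
    <p>
      <xsl:value-of select="school[lang($currLang)]"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="year"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

How the currLang parameter is set depends on the XSLT processor; most command-line processors and APIs offer a way to pass top-level parameters.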
The stylesheet defines a currLang parameter with the default value “en”, which produces the English version of the resume. The values “fi” and “ja” produce the resume in Finnish and Japanese, respectively.
The first template in the stylesheet matches the root element “cv”. It produces the XHTML headers and the title for the resume. Note how the lang() function plays an important role throughout the stylesheet by selecting the appropriate language branch from the source document. Finally, the template processes the “skills” and “education” subtrees.
The template that handles the “skills” element first produces a subtitle for this section and then processes each child element, i.e. the “skill” elements. Note that the lang() function is used in the predicate. Therefore, the resultant node set contains only the skills in the preferred language. The template for “skill” just outputs the element value within the XHTML list element.
The “education” template produces the title and the school name using the lang() function again. The graduation year is common to all the languages the example resume supports and therefore does not require language filtering.
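One property of lang() is worth keeping in mind when reading these predicates: unlike a direct comparison against the xml:lang attribute, it honors values inherited from ancestor elements and matches sublanguages such as en-GB. A minimal sketch, assuming a hypothetical input document `<doc xml:lang="en-GB"><p>Colour</p></doc>` that is not part of the resume example:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical input: <doc xml:lang="en-GB"><p>Colour</p></doc> -->
  <xsl:template match="/">
    <!-- lang('en') is true for p: the language is inherited from doc,
         and "en" matches the sublanguage "en-GB" -->
    <xsl:value-of select="count(//p[lang('en')])"/>
    <xsl:text> </xsl:text>
    <!-- The attribute test selects nothing: p has no xml:lang of its own -->
    <xsl:value-of select="count(//p[@xml:lang = 'en'])"/>
  </xsl:template>

</xsl:stylesheet>
```

For that input the first count is 1 and the second is 0, which is why lang() is usually the safer choice for language filtering.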
Language-oriented Structure
Listing 3 is the same resume example but this time structured in the language-oriented way.
Listing 3. Language-oriented structure for a multilingual resume.
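[Listing markup not preserved. Surviving text content, grouped by language. English: Curriculum Vitae; Skills; Project management; Education; University of Technology; 1993. Finnish: Ansioluettelo; Taidot; Projektinhallinta; Koulutus; Teknillinen korkeakoulu; 1993. Japanese (characters lost to encoding): ???; ??; ????????; ??; ?????; 1993.]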
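A minimal sketch of how a language-oriented resume might be marked up. The cv, language, skills, skill and education elements are named in the article text; title, school and year are assumed names. A Japanese section would be a third language element with xml:lang="ja".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a language-oriented resume.
     Assumed element names: title, school, year. -->
<cv>
  <!-- xml:lang appears once per language section and is inherited below -->
  <language xml:lang="en">
    <title>Curriculum Vitae</title>
    <skills>
      <title>Skills</title>
      <skill>Project management</skill>
    </skills>
    <education>
      <title>Education</title>
      <school>University of Technology</school>
      <year>1993</year>
    </education>
  </language>
  <language xml:lang="fi">
    <title>Ansioluettelo</title>
    <skills>
      <title>Taidot</title>
      <skill>Projektinhallinta</skill>
    </skills>
    <education>
      <title>Koulutus</title>
      <school>Teknillinen korkeakoulu</school>
      <!-- Language-independent data is repeated in every section -->
      <year>1993</year>
    </education>
  </language>
</cv>
```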
The XML document in Listing 3 has a fundamentally different structure from the one in Listing 1. Listing 3 introduces a “language” element at the top level of the document structure. This splits the document into high-level sections, one for each language the author intends to support. Each section uses an identical copy of the document structure, substituting the appropriate words for that language. Note also that the xml:lang attribute is used only on the “language” elements. There is no need to scatter these attributes throughout the document.
Listing 4 is an XSL stylesheet for localized XHTML transformation.
Listing 4. Transforming the language-oriented structure for a multilingual resume.
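[Stylesheet markup not preserved. The only surviving text is the default value of the currLang parameter: en]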
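A minimal sketch of a stylesheet for the language-oriented structure, not the author's original listing. It assumes XSLT 1.0 and a source document in which each “language” element holds title, skills and education children, with school and year inside education (title, school and year are assumed names).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <xsl:output method="xml" indent="yes"/>

  <xsl:param name="currLang">en</xsl:param>

  <!-- The only language-aware step: pick one language subtree -->
  <xsl:template match="cv">
    <xsl:apply-templates select="language[lang($currLang)]"/>
  </xsl:template>

  <!-- Everything below works on the chosen subtree without any filtering -->
  <xsl:template match="language">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <xsl:apply-templates select="skills | education"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="skills">
    <h2><xsl:value-of select="title"/></h2>
    <ul>
      <xsl:apply-templates select="skill"/>
    </ul>
  </xsl:template>

  <xsl:template match="skill">
    <li><xsl:value-of select="."/></li>
  </xsl:template>

  <xsl:template match="education">
    <h2><xsl:value-of select="title"/></h2>
    <p>
      <xsl:value-of select="school"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="year"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

If no “language” element matches currLang, this sketch produces no output at all; a production stylesheet would probably fall back to a default language.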
The major difference between this stylesheet and the one in Listing 2 is in the first template, the one that processes the “cv” element. In Listing 4 the template selects the preferred language subtree and processes the elements of that subtree only. The rest of the templates do not need to worry about the current language because the choice has already been made. The stylesheet is therefore much more straightforward than the one in Listing 2.
Using Separate Documents
You can push this kind of structural efficiency even further, if you like, by moving each language section of your original XML document into a standalone document. Listing 5 shows this ultimate separation solution for our example.
Listing 5. Separating each language into its own document.
[Listing markup not preserved. Surviving text content follows.]
First, the English version: Curriculum Vitae; Skills; Project management; Education; University of Technology; 1993.
In a separate document, the Finnish: Ansioluettelo; Taidot; Projektinhallinta; Koulutus; Teknillinen korkeakoulu; 1993.
In yet another document, the Japanese, and so on (characters lost to encoding): ???; ??; ????????; ??; ?????; 1993.
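A minimal sketch of one such standalone document, here a file that would be named cv-fi.xml under the cv-lang.xml naming pattern. The root element cv with an xml:lang attribute, and the element names title, school and year, are assumptions; the English and Japanese files would have the same shape.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- cv-fi.xml: the Finnish resume as a document of its own.
     Assumed element names: title, school, year. -->
<cv xml:lang="fi">
  <title>Ansioluettelo</title>
  <skills>
    <title>Taidot</title>
    <skill>Projektinhallinta</skill>
  </skills>
  <education>
    <title>Koulutus</title>
    <school>Teknillinen korkeakoulu</school>
    <year>1993</year>
  </education>
</cv>
```

With one file per language, a translator can work on a single document, and a new language is added by creating a new file rather than editing a shared one.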
Listing 6 utilizes these language-specific XML documents from Listing 5 and produces the resume in localized XHTML format.
Listing 6. XSL stylesheet to produce the localized language version from separate XML documents.
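[Stylesheet markup not preserved. Surviving text: the currLang default value en, and the file name parts cv- and .xml, between which the language code is inserted.]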
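A minimal sketch of a stylesheet that loads a per-language document, not the author's original listing. It assumes XSLT 1.0, files named cv-en.xml, cv-fi.xml and so on located next to the stylesheet, and documents whose root element is cv with title, skills and education children (title, school and year are assumed names). The file name is built here with concat(), which is one of several ways to initialize the cvData variable.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <xsl:output method="xml" indent="yes"/>

  <xsl:param name="currLang">en</xsl:param>

  <!-- Builds a name such as cv-en.xml from the requested language -->
  <xsl:variable name="cvData" select="concat('cv-', $currLang, '.xml')"/>

  <!-- The input document is only a trigger; the content comes from the
       file loaded by document(). A relative name passed as a string is
       resolved against the stylesheet's own location. -->
  <xsl:template match="/">
    <xsl:apply-templates select="document($cvData)/cv"/>
  </xsl:template>

  <xsl:template match="cv">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <xsl:apply-templates select="skills | education"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="skills">
    <h2><xsl:value-of select="title"/></h2>
    <ul>
      <xsl:apply-templates select="skill"/>
    </ul>
  </xsl:template>

  <xsl:template match="skill">
    <li><xsl:value-of select="."/></li>
  </xsl:template>

  <xsl:template match="education">
    <h2><xsl:value-of select="title"/></h2>
    <p>
      <xsl:value-of select="school"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="year"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

Because an XSLT 1.0 transformation still needs some source document, this sketch can be run against any well-formed XML file; only the document named by cvData contributes to the output.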
This stylesheet contains a template that handles the document root (“/”). The separate CV documents have names of the form cv-lang.xml, for example cv-en.xml and cv-ja.xml. The stylesheet variable cvData is initialized with the document name, and the document is loaded using the XSLT document() function. The rest of the stylesheet is basically the same as in Listing 4.
As we have seen, there are a lot of options for organizing multilingual content. Each document structure has its pros and cons. The content-oriented structure is good when you have a stable set of languages you need to support. It is easier to keep language variants synchronized in this structure. The drawback is a more complex transformation.
The language-oriented structure is strong when you expect the number of supported languages to grow. However, it is more difficult to keep all language versions synchronized. This article should help you decide which will work better for your localization needs, and help you format your XML properly to suit those conditions.