Well-structured XML Goes Cosmopolitan

It is sometimes challenging to create and organize XML documents. This is even more true when the document content needs to be presented in several different languages: the language versions create unwanted dependencies and make it harder to keep the content up to date.

But don’t despair. When supporting multiple languages in XML, the key to success is to pay close attention to how you structure your documents. Following the few simple rules described in this article will help you minimize the effort.

Structuring the Document
There are two main ways to structure a multilingual document. Either you keep the descriptions in all languages near each other, or you separate the languages so that each language’s descriptions are held in their own section or file. The first method is more content oriented, while the latter is more language oriented.

The content-oriented structure is suitable for documents that have a mixture of language-specific and language-independent information. The language-oriented structure is useful when the number of languages grows or when the documents are large. It makes it possible to move each language’s description into its own XML document and to load only the ones that are needed during processing (e.g. using the XSLT document() function).

Content-oriented Structure
Listing 1 shows a simple multilingual resume (Curriculum Vitae). The document is structured in the content-oriented way.

Listing 1. Content-oriented structure.

  Curriculum Vitae
  Ansioluettelo
  ???
    Skills
    Taidot
      Project management
      Projektinhallinta
      ????????
    Education
    Koulutus
      University of Technology
      Teknillinen korkeakoulu
      ?????
      1993

All the descriptions in the different languages are located near each other and are distinguished by the standard xml:lang attribute. Descriptions that do not need localization are presented without an xml:lang attribute; see the graduation year in the example above.
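As an illustration, a content-oriented resume along the lines of Listing 1 might look like the following minimal sketch. The element names cv, title, skills, skill and education come from the stylesheet discussion in this article; school and year are assumed names chosen for this example, and only the English and Finnish entries are shown (the Japanese ones would follow the same pattern with xml:lang="ja").

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: "school" and "year" are assumed element names. -->
<cv>
  <title xml:lang="en">Curriculum Vitae</title>
  <title xml:lang="fi">Ansioluettelo</title>
  <skills>
    <title xml:lang="en">Skills</title>
    <title xml:lang="fi">Taidot</title>
    <skill xml:lang="en">Project management</skill>
    <skill xml:lang="fi">Projektinhallinta</skill>
  </skills>
  <education>
    <title xml:lang="en">Education</title>
    <title xml:lang="fi">Koulutus</title>
    <school xml:lang="en">University of Technology</school>
    <school xml:lang="fi">Teknillinen korkeakoulu</school>
    <!-- No xml:lang here: the graduation year is language-independent. -->
    <year>1993</year>
  </education>
</cv>
```

Every translatable element appears once per language, side by side, which is what makes it easy to spot a missing translation in this structure.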

When the XML document contains xml:lang attributes, the XPath function lang() can be used during the transformation. Listing 2 provides an example of an XSL stylesheet that transforms the document into a localized XHTML page.

Listing 2. Transforming the content-oriented structure for a multilingual resume.

  en
  <xsl:value-of select="title[lang($currLang)]"/>

    The stylesheet defines a currLang parameter. Its default value is “en”, which produces the English version of the resume. The values “fi” and “ja” produce the resume in Finnish and Japanese, respectively.

    The first template in the stylesheet matches the root element “cv”. It produces the XHTML headers and the title for the resume. Note how the lang() function plays an important role throughout the stylesheet by selecting the appropriate language branch from the source document. Finally, the template processes the “skills” and “education” subtrees.

    The template that handles the “skills” element first produces a subtitle for this section and then processes each child “skill” element. Because the lang() function is used in the predicate, the resulting node set contains only the skills in the preferred language. The template for “skill” just outputs the element value within an XHTML list item.

    The “education” template produces the title and the school name, again using the lang() function. The graduation year is common to all the languages the example resume supports and therefore does not require language filtering.
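The templates described above could be sketched roughly as follows. This is a reconstruction under assumptions, not the original listing: the currLang parameter and the title[lang($currLang)] expression come from the article, while the school and year element names and the exact XHTML output are guesses made for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <xsl:output method="xml" indent="yes"/>

  <!-- Preferred language; pass "fi" or "ja" to override the default. -->
  <xsl:param name="currLang">en</xsl:param>

  <!-- Page skeleton: every localized value is filtered with lang(). -->
  <xsl:template match="cv">
    <html>
      <head>
        <title><xsl:value-of select="title[lang($currLang)]"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="title[lang($currLang)]"/></h1>
        <xsl:apply-templates select="skills"/>
        <xsl:apply-templates select="education"/>
      </body>
    </html>
  </xsl:template>

  <!-- Subtitle plus only those skills written in the preferred language. -->
  <xsl:template match="skills">
    <h2><xsl:value-of select="title[lang($currLang)]"/></h2>
    <ul>
      <xsl:apply-templates select="skill[lang($currLang)]"/>
    </ul>
  </xsl:template>

  <xsl:template match="skill">
    <li><xsl:value-of select="."/></li>
  </xsl:template>

  <!-- "school" and "year" are assumed names; year needs no filtering. -->
  <xsl:template match="education">
    <h2><xsl:value-of select="title[lang($currLang)]"/></h2>
    <p>
      <xsl:value-of select="school[lang($currLang)]"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="year"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

With a command-line processor such as xsltproc, the Finnish page could be produced with something like xsltproc --stringparam currLang fi cv.xsl cv.xml, where both file names are hypothetical. Note that lang() looks at the nearest xml:lang attribute on the node or its ancestors and also matches sublanguages, so lang('en') is true for an element marked xml:lang="en-US".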

    Language-oriented Structure
    Listing 3 is the same resume example but this time structured in the language-oriented way.

    Listing 3. Language-oriented structure for a multilingual resume.

      Curriculum Vitae
        Skills
          Project management
        Education
          University of Technology
          1993

      Ansioluettelo
        Taidot
          Projektinhallinta
        Koulutus
          Teknillinen korkeakoulu
          1993

      ???
        ??
          ????????
        ??
          ?????
          1993

    The XML document in Listing 3 has a fundamentally different structure from the one in Listing 1. Listing 3 introduces a “language” element at the top level of the document structure. This splits the document into high-level sections, one for each language the author intends to support. Each section uses an identical copy of the document structure, substituting the appropriate words for that language. Note also that the xml:lang attribute is used only on the “language” elements. There is no need to scatter these attributes throughout the document.
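A minimal sketch of such a language-oriented document is shown below. The cv and language elements are named in the article; the school and year element names are assumptions made for this example, and the Japanese section is left out for brevity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: "school" and "year" are assumed element names. -->
<cv>
  <language xml:lang="en">
    <title>Curriculum Vitae</title>
    <skills>
      <title>Skills</title>
      <skill>Project management</skill>
    </skills>
    <education>
      <title>Education</title>
      <school>University of Technology</school>
      <year>1993</year>
    </education>
  </language>
  <language xml:lang="fi">
    <title>Ansioluettelo</title>
    <skills>
      <title>Taidot</title>
      <skill>Projektinhallinta</skill>
    </skills>
    <education>
      <title>Koulutus</title>
      <school>Teknillinen korkeakoulu</school>
      <!-- The year is repeated in every language section. -->
      <year>1993</year>
    </education>
  </language>
</cv>
```

Because xml:lang is inherited by descendants, marking the language element is enough. The price is duplication: language-independent values such as the graduation year now appear once per language.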

    Listing 4 is an XSL stylesheet for localized XHTML transformation.

    Listing 4. Transforming the language-oriented structure for a multilingual resume.

      en
      <xsl:value-of select="title"/>

    The major difference between this stylesheet and the one in Listing 2 is in the first template, the one that processes the “cv” element. In Listing 4 the template selects the preferred language subtree and processes the elements of that subtree only. The rest of the templates do not need to worry about the current language because the choice has already been made. The stylesheet is therefore much more straightforward than the one in Listing 2.
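One way this could look is sketched below. It is a reconstruction under assumptions rather than the original listing: the separate template for the language element, the school and year names, and the XHTML output are choices made for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <xsl:output method="xml" indent="yes"/>

  <xsl:param name="currLang">en</xsl:param>

  <!-- The only place where the language is chosen. [1] keeps a single
       section even if several would match, e.g. "en" and "en-US". -->
  <xsl:template match="cv">
    <xsl:apply-templates select="language[lang($currLang)][1]"/>
  </xsl:template>

  <!-- From here on, no template needs to know the current language. -->
  <xsl:template match="language">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <xsl:apply-templates select="skills"/>
        <xsl:apply-templates select="education"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="skills">
    <h2><xsl:value-of select="title"/></h2>
    <ul>
      <xsl:apply-templates select="skill"/>
    </ul>
  </xsl:template>

  <xsl:template match="skill">
    <li><xsl:value-of select="."/></li>
  </xsl:template>

  <!-- "school" and "year" are assumed element names. -->
  <xsl:template match="education">
    <h2><xsl:value-of select="title"/></h2>
    <p>
      <xsl:value-of select="school"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="year"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

The lang() predicate appears exactly once here, compared with once per localized value in the content-oriented approach.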

    Using Separate Documents
    You can push this structural separation even further, if you like, by moving each language section of the original XML document into a standalone document. Listing 5 shows this ultimate separation for our example: each “language” section becomes a document of its own.

    Listing 5. Separating each language into its own document.

    First, the English version.

      Curriculum Vitae
        Skills
          Project management
        Education
          University of Technology
          1993

    In a separate document, the Finnish.

      Ansioluettelo
        Taidot
          Projektinhallinta
        Koulutus
          Teknillinen korkeakoulu
          1993

    In yet another document, the Japanese, and so on.

      ???
        ??
          ????????
        ??
          ?????
          1993

    Listing 6 utilizes these language-specific XML documents from Listing 5 and produces the resume in localized XHTML format.

    Listing 6. XSL stylesheet to produce the localized language version from separate XML documents.

      en
      cv-            .xml
      <xsl:value-of select="title"/>

    This stylesheet contains a template that handles the document root node (“/”). The separate CV documents have names of the form cv-lang.xml, for example cv-en.xml and cv-ja.xml. The stylesheet variable cvData is initialized with the document name, and the document is loaded using the XSLT document() function. The rest of the stylesheet is basically the same as in Listing 4.
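A rough sketch of this approach follows. The currLang parameter, the cvData variable, the cv-lang.xml naming scheme and the use of document() come from the article. It is an assumption of this example that each language file has a cv root element containing title, skills and education, and that school and year are the remaining element names.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <xsl:output method="xml" indent="yes"/>

  <xsl:param name="currLang">en</xsl:param>

  <!-- Builds the file name, e.g. cv-en.xml or cv-ja.xml. -->
  <xsl:variable name="cvData">cv-<xsl:value-of select="$currLang"/>.xml</xsl:variable>

  <!-- Ignore the primary input and load the language file instead.
       Assumes the file's root element is "cv". -->
  <xsl:template match="/">
    <xsl:apply-templates select="document($cvData)/cv"/>
  </xsl:template>

  <xsl:template match="cv">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <xsl:apply-templates select="skills"/>
        <xsl:apply-templates select="education"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="skills">
    <h2><xsl:value-of select="title"/></h2>
    <ul>
      <xsl:apply-templates select="skill"/>
    </ul>
  </xsl:template>

  <xsl:template match="skill">
    <li><xsl:value-of select="."/></li>
  </xsl:template>

  <!-- "school" and "year" are assumed element names. -->
  <xsl:template match="education">
    <h2><xsl:value-of select="title"/></h2>
    <p>
      <xsl:value-of select="school"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="year"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

Since the variable holds a plain string, document() resolves the file name relative to the stylesheet's own location. The processor still needs some well-formed XML document as its primary input, even though this sketch never reads it.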

    As we have seen, there are several options for organizing multilingual content, and each document structure has its pros and cons. The content-oriented structure is good when you have a stable set of languages to support. It is easier to keep the language variants synchronized in this structure. The drawback is a more complex transformation.

    The language-oriented structure is strong when you expect the number of supported languages to grow. However, it is more difficult to keep all the language versions synchronized. This article should help you decide which structure will work better for your localization needs and help you organize your XML accordingly.

