Well-structured XML Goes Cosmopolitan

Everyone knows that among its many uses, XML can store the same data in multiple languages and dish it out like a U.N. translator when the computers of the world come calling. What everyone doesn't know is how best to structure that polyglot of data. There's more than one way





It is sometimes challenging to create and organize XML documents. This is even more true when the document content must be presented in several different languages: multilingual content creates unwanted dependencies and makes the document harder to keep up to date.

But don't despair. When supporting multiple languages in XML, the key to success is to pay close attention to how you structure your documents. Following a few simple rules goes a long way, and this article shows how to keep the effort to a minimum.

Structuring the Document
There are two main ways to structure a multilingual document. Either you keep the descriptions in all languages next to each other, or you separate the languages so that each language's descriptions are held in their own section or file. The first method is content oriented; the second is language oriented.

The content-oriented structure is suitable for documents that have a mixture of language-specific and language-independent information. The language-oriented structure is useful when the number of languages grows or when the documents are large. It makes it possible to move each language's descriptions into their own XML document and to pull in only the ones that are needed during processing (e.g. using xsl:include).
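As a rough sketch of the language-oriented idea, assume each language lives in its own file named after its language code. The file names (cv-en.xml, cv-fi.xml) and the strings/string element names below are hypothetical and are not taken from the listings in this article. Note that xsl:include pulls in stylesheet modules, while XML data files are usually loaded with the XSLT document() function, which is what this sketch uses:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes hypothetical files cv-en.xml, cv-fi.xml, ...
     stored next to this stylesheet, each holding one language, e.g.
       <strings xml:lang="fi">
         <string id="cv.title">Ansioluettelo</string>
       </strings> -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:param name="currLang" select="'en'"/>

  <!-- Load only the file for the requested language. A relative URI
       given as a string is resolved against the stylesheet's location. -->
  <xsl:variable name="strings"
      select="document(concat('cv-', $currLang, '.xml'))/strings"/>

  <xsl:template match="/">
    <xsl:value-of select="$strings/string[@id = 'cv.title']"/>
  </xsl:template>

</xsl:stylesheet>
```

With this layout, adding a language means adding one file rather than touching every element of the main document, which is the trade-off the language-oriented structure is meant to offer.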

Content-oriented Structure
Listing 1 shows a simple multilingual resume (Curriculum Vitae). The document is structured in the content-oriented way.

Listing 1. Content-oriented structure.

<?xml version="1.0" encoding="UTF-8"?>
<cv>
  <title xml:lang="en">Curriculum Vitae</title>
  <title xml:lang="fi">Ansioluettelo</title>
  <title xml:lang="ja">履歴書</title>
  <skills>
    <title xml:lang="en">Skills</title>
    <title xml:lang="fi">Taidot</title>
    <title xml:lang="ja"></title>
    <skill xml:lang="en">Project management</skill>
    <skill xml:lang="fi">Projektinhallinta</skill>
    <skill xml:lang="ja">プロジェクト管理</skill>
  </skills>
  <education>
    <title xml:lang="en">Education</title>
    <title xml:lang="fi">Koulutus</title>
    <title xml:lang="ja"></title>
    <school xml:lang="en">University of Technology</school>
    <school xml:lang="fi">Teknillinen korkeakoulu</school>
    <school xml:lang="ja">技術の大学</school>
    <graduation>1993</graduation>
  </education>
</cv>

The descriptions in the different languages are all located next to each other and are distinguished by the standard xml:lang attribute. Descriptions that do not need localization, such as the graduation year in the example above, are presented without an xml:lang attribute.
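The xml:lang attribute applies to an element's content as well as to the element itself, so a block that is entirely in one language does not need the attribute repeated on every child. A small sketch of that variation follows; the note and para element names are hypothetical and are not part of Listing 1:

```xml
<!-- xml:lang on the parent is inherited by its children -->
<note xml:lang="fi">
  <para>Suomenkielinen huomautus.</para>
  <!-- A child can still override the inherited language -->
  <para xml:lang="en">A remark in English.</para>
</note>
```

With this markup, lang('fi') is true for the note element and its first para, and false for the second para.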

When the XML document contains xml:lang attributes, the XPath function lang() can be used during the transformation. Listing 2 provides an example of an XSL stylesheet that transforms the document into a localized XHTML page.
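It may help to see how lang() differs from a plain attribute comparison. The sketch below is not part of the article's listings; it simply counts title elements both ways. In XPath 1.0, lang('en') ignores case, also accepts sublanguages such as "en-US", and takes inherited xml:lang values into account, whereas @xml:lang = 'en' is an exact string test on the element's own attribute:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- lang('en') is true for xml:lang="en", "EN" or "en-US", and also
         for elements that inherit such a value from an ancestor -->
    <xsl:text>lang() matches: </xsl:text>
    <xsl:value-of select="count(//title[lang('en')])"/>
    <xsl:text>&#10;</xsl:text>

    <!-- The attribute test is true only when the element itself carries
         exactly the string "en" -->
    <xsl:text>@xml:lang matches: </xsl:text>
    <xsl:value-of select="count(//title[@xml:lang = 'en'])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Run against Listing 1, the two counts agree, because every title there carries exactly "en", "fi" or "ja". They would diverge as soon as a title were tagged with a regional variant such as xml:lang="en-GB".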

Listing 2. Transforming the content-oriented structure for a multilingual resume.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- Language to output; defaults to English -->
  <xsl:param name="currLang">en</xsl:param>

  <!-- XSLT 1.0 ignores doctype-public unless doctype-system is also given -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>

  <!-- Root element: XHTML skeleton and the main title -->
  <xsl:template match="cv">
    <html xmlns="http://www.w3.org/1999/xhtml"
        xml:lang="{$currLang}" lang="{$currLang}">
      <head>
        <title><xsl:value-of select="title[lang($currLang)]"/></title>
        <meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
      </head>
      <body>
        <h2><xsl:value-of select="title[lang($currLang)]"/></h2>
        <xsl:apply-templates select="skills"/>
        <xsl:apply-templates select="education"/>
      </body>
    </html>
  </xsl:template>

  <!-- Skills section: subtitle plus one list item per skill -->
  <xsl:template match="skills">
    <h3><xsl:value-of select="title[lang($currLang)]"/></h3>
    <ul>
      <xsl:apply-templates select="skill[lang($currLang)]"/>
    </ul>
  </xsl:template>

  <xsl:template match="skill">
    <li><xsl:value-of select="."/></li>
  </xsl:template>

  <!-- Education section: the graduation year is language independent -->
  <xsl:template match="education">
    <h3><xsl:value-of select="title[lang($currLang)]"/></h3>
    <p>
      <xsl:value-of select="school[lang($currLang)]"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="graduation"/>
    </p>
  </xsl:template>

</xsl:stylesheet>

The stylesheet defines a currLang parameter. Its default value, "en", produces the English version of the resume. The values "fi" and "ja" produce the resume in Finnish and Japanese, respectively.

The first template in the stylesheet matches the root element "cv". It produces the XHTML headers and the title for the resume. Note how the lang() function plays an important role throughout the stylesheet by selecting the appropriate language branch from the source document. Finally, the template processes the "skills" and "education" subtrees.

The template that handles the "skills" element first produces a subtitle for the section and then applies templates to its "skill" child elements. Because the lang() function is used in the predicate, the resulting node set contains only the skills in the selected language. The template for "skill" simply outputs the element's value inside an XHTML list item.

The "education" template produces the title and the school name using the lang() function again. The graduation year is common to all the languages the example resume supports and therefore does not require language filtering.
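One practical consequence of selecting by language is that a missing or empty translation produces empty output. Listing 1 as shown has empty Japanese section titles, for example. The following is a minimal sketch of one way to handle that, not something the original stylesheet does: a hypothetical named template, localized-title, together with an assumed fallbackLang parameter, falls back to another language when the requested title is absent or blank.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:param name="currLang">en</xsl:param>
  <!-- Assumption: English is the language to fall back to -->
  <xsl:param name="fallbackLang">en</xsl:param>

  <!-- Hypothetical helper: outputs the title child of the current element
       in $currLang, or in $fallbackLang if that title is missing or empty -->
  <xsl:template name="localized-title">
    <xsl:variable name="wanted"
        select="title[lang($currLang)][normalize-space()]"/>
    <xsl:choose>
      <xsl:when test="$wanted">
        <xsl:value-of select="$wanted"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="title[lang($fallbackLang)]"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Print the section titles, one per line -->
  <xsl:template match="/cv">
    <xsl:for-each select="skills | education">
      <xsl:call-template name="localized-title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because xsl:call-template keeps the current node, the helper can be called from inside the "skills" and "education" templates in the same way. With currLang set to "ja" and Listing 1 as input, the two empty Japanese titles fall back to "Skills" and "Education".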
