Well-structured XML Goes Cosmopolitan (cont'd)
Using Separate Documents
You can push this kind of structural efficiency even further by moving each language section of the original XML document into a standalone document of its own. Listing 5 shows this complete separation for our example: each language element goes into a separate document.

Listing 5. Separating each language into its own document.


First, the English version.
<?xml version="1.0" encoding="UTF-8"?>
<cv>
<language xml:lang="en">
<title>Curriculum Vitae</title>
<skills>
<title>Skills</title>
<skill>Project management</skill>
</skills>
<education>
<title>Education</title>
<school>University of Technology</school>
<graduation>1993</graduation>
</education>
</language>
</cv>
In a separate document, the Finnish.
<?xml version="1.0" encoding="UTF-8"?>
<cv>
<language xml:lang="fi">
<title>Ansioluettelo</title>
<skills>
<title>Taidot</title>
<skill>Projektinhallinta</skill>
</skills>
<education>
<title>Koulutus</title>
<school>Teknillinen korkeakoulu</school>
<graduation>1993</graduation>
</education>
</language>
</cv>
In yet another document, the Japanese, and so on.
<?xml version="1.0" encoding="UTF-8"?>
<cv>
<language xml:lang="ja">
<title>履歴書</title>
<skills>
<title>技術</title>
<skill>プロジェクト管理</skill>
</skills>
<education>
<title>教育</title>
<school>技術の大学</school>
<graduation>1993</graduation>
</education>
</language>
</cv>

Listing 6 takes the language-specific XML documents from Listing 5 and produces the resume in localized XHTML format.

Listing 6. XSL stylesheet to produce the localized language version from separate XML documents.


<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/xhtml">
<xsl:param name="currLang">en</xsl:param>
<xsl:output method="xml" encoding="UTF-8" indent="yes"
doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
<xsl:template match="/">
<xsl:variable name="cvData">
<xsl:text>cv-</xsl:text>
<xsl:value-of select="$currLang"/>
<xsl:text>.xml</xsl:text>
</xsl:variable>
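<!-- Select the language element directly: applying templates to the loaded root node would match "/" again and recurse -->
<xsl:apply-templates select="document($cvData)/cv/language"/>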
</xsl:template>
<xsl:template match="language">
<html xmlns="http://www.w3.org/1999/xhtml"
xml:lang="{$currLang}" lang="{$currLang}">
<head>
<title><xsl:value-of select="title"/></title>
<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
</head>
<body>
<h2><xsl:value-of select="title"/></h2>
<xsl:apply-templates select="skills"/>
<xsl:apply-templates select="education"/>
</body>
</html>
</xsl:template>
<xsl:template match="skills">
<h3><xsl:value-of select="title"/></h3>
<ul>
<xsl:apply-templates select="skill"/>
</ul>
</xsl:template>
<xsl:template match="skill">
<li><xsl:value-of select="."/></li>
</xsl:template>
<xsl:template match="education">
<h3><xsl:value-of select="title"/></h3>
<p>
<xsl:value-of select="school"/>
<xsl:text> </xsl:text>
<xsl:value-of select="graduation"/>
</p>
</xsl:template>
</xsl:stylesheet>

This stylesheet contains a template that handles the document root node ("/"). The separate CV documents are named cv-lang.xml, where lang is the language code, for example cv-en.xml and cv-ja.xml. The stylesheet variable cvData is initialized with the document name, built from the currLang parameter, and the document is loaded using the XSLT document() function. The rest of the stylesheet is basically the same as in Listing 4.
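The file-name construction described above can also be written more compactly with the XPath concat() function. The following is only a minimal sketch, not part of the original listings: it assumes the cv-lang.xml files from Listing 5 sit next to the stylesheet, and it writes just the localized title as plain text so that the lookup is easy to verify.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Language code supplied by the caller; defaults to English -->
  <xsl:param name="currLang" select="'en'"/>
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- concat() builds the file name, e.g. cv-fi.xml, as a plain string -->
    <xsl:variable name="cvData" select="concat('cv-', $currLang, '.xml')"/>
    <!-- A relative name passed as a string is resolved against the stylesheet's location -->
    <xsl:variable name="cv" select="document($cvData)/cv/language"/>
    <!-- Write the localized title followed by a newline -->
    <xsl:value-of select="$cv/title"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Because the root template never reads the source document, any well-formed XML file can serve as the input. How the currLang parameter is set depends on the processor; with xsltproc, for instance, a string parameter can be passed as --stringparam currLang fi.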

As we have seen, there are many options for organizing multilingual content, and each document structure has its pros and cons. The content-oriented structure is good when you have a stable set of languages to support. It is easier to keep language variants synchronized in this structure. The drawback is a more complex transformation.

The language-oriented structure is strong when you expect the number of supported languages to grow. However, it is more difficult to keep all language versions synchronized. This article should help you decide which structure will work better for your localization needs, and help you format your XML to suit those conditions.
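One way to soften the synchronization problem with separate documents is a small checking stylesheet. The sketch below is a hypothetical helper, not something from the original listings: it assumes the cv-lang.xml files from Listing 5, treats cv-en.xml as the reference, and uses an invented parameter named checkLang. It only compares element counts, so it is a crude first check that catches missing or extra elements but not reordered or untranslated ones.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Hypothetical parameter: the language to compare against the English reference -->
  <xsl:param name="checkLang" select="'fi'"/>
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- Load the reference document and the document under test -->
    <xsl:variable name="ref" select="document('cv-en.xml')/cv/language"/>
    <xsl:variable name="other"
        select="document(concat('cv-', $checkLang, '.xml'))/cv/language"/>
    <xsl:choose>
      <!-- Compare the number of descendant elements in each language section -->
      <xsl:when test="count($ref//*) = count($other//*)">
        <xsl:text>in sync: same number of elements&#10;</xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>out of sync: element counts differ&#10;</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Running a check like this once per supported language whenever the reference document changes gives an early warning that a translation has fallen behind.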

Ilari Aarnio is a software architect at Codesys, providing consulting on J2EE, XML technologies, and enterprise application integration. Before founding Codesys in 2002, Ilari spent 12 years in various software development and application integration positions, where he developed practical insights into the fundamental issues of software architectures and system solutions.