Well-structured XML Goes Cosmopolitan : Page 3

Everyone knows that among its many uses, XML can store the same data in multiple languages and dish it out like a U.N. translator when the computers of the world come calling. What everyone doesn't know is how best to structure that polyglot of data. There's more than one way





Using Separate Documents
You can push this kind of structural efficiency even further by splitting each language section of the original XML document into its own standalone document. Listing 5 shows this full separation for our example: each language element moves into a separate document.

Listing 5. Separating each language into its own document.

First, the English version.

<?xml version="1.0" encoding="UTF-8"?>
<cv>
  <language xml:lang="en">
    <title>Curriculum Vitae</title>
    <skills>
      <title>Skills</title>
      <skill>Project management</skill>
    </skills>
    <education>
      <title>Education</title>
      <school>University of Technology</school>
      <graduation>1993</graduation>
    </education>
  </language>
</cv>

In a separate document, the Finnish.

<?xml version="1.0" encoding="UTF-8"?>
<cv>
  <language xml:lang="fi">
    <title>Ansioluettelo</title>
    <skills>
      <title>Taidot</title>
      <skill>Projektinhallinta</skill>
    </skills>
    <education>
      <title>Koulutus</title>
      <school>Teknillinen korkeakoulu</school>
      <graduation>1993</graduation>
    </education>
  </language>
</cv>

In yet another document, the Japanese, and so on.

<?xml version="1.0" encoding="UTF-8"?>
<cv>
  <language xml:lang="ja">
    <title>履歴書</title>
    <skills>
      <title>技術</title>
      <skill>プロジェクト管理</skill>
    </skills>
    <education>
      <title>教育</title>
      <school>技術の大学</school>
      <graduation>1993</graduation>
    </education>
  </language>
</cv>

Listing 6 takes the language-specific XML documents from Listing 5 and produces the resume as localized XHTML.

Listing 6. XSL stylesheet to produce the localized language version from separate XML documents.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- Language code that selects which cv-lang.xml document to load -->
  <xsl:param name="currLang">en</xsl:param>

  <!-- XSLT 1.0 ignores doctype-public unless doctype-system is also given -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>

  <!-- Root template: build the file name, then process the loaded document -->
  <xsl:template match="/">
    <xsl:variable name="cvData">
      <xsl:text>cv-</xsl:text>
      <xsl:value-of select="$currLang"/>
      <xsl:text>.xml</xsl:text>
    </xsl:variable>
    <xsl:apply-templates select="document($cvData)"/>
  </xsl:template>

  <!-- One language element per document becomes one XHTML page -->
  <xsl:template match="language">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="{$currLang}" lang="{$currLang}">
      <head>
        <title><xsl:value-of select="title"/></title>
        <meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
      </head>
      <body>
        <h2><xsl:value-of select="title"/></h2>
        <xsl:apply-templates select="skills"/>
        <xsl:apply-templates select="education"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="skills">
    <h3><xsl:value-of select="title"/></h3>
    <ul>
      <xsl:apply-templates select="skill"/>
    </ul>
  </xsl:template>

  <xsl:template match="skill">
    <li><xsl:value-of select="."/></li>
  </xsl:template>

  <xsl:template match="education">
    <h3><xsl:value-of select="title"/></h3>
    <p>
      <xsl:value-of select="school"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="graduation"/>
    </p>
  </xsl:template>

</xsl:stylesheet>

This stylesheet contains a template that handles the document root node ("/"). The separate CV documents are named cv-lang.xml, where lang is the language code, for example cv-en.xml and cv-ja.xml. The stylesheet variable cvData is set to the document name for the current language, and that document is loaded with the XSLT document() function. Because the content comes from document(), the stylesheet's own input document serves only to trigger the root template. The rest of the stylesheet is basically the same as in Listing 4.
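As a variation on the root template, the file name can be built in a single expression with concat(), and the stylesheet can try a default language when the requested document yields nothing. The following is only a sketch under assumptions: the fallbackLang parameter and the cvDoc variable are hypothetical names that do not appear in Listing 6, and the fallback branch helps only with processors that return an empty node-set for a missing document rather than signaling an error, which XSLT 1.0 permits either way.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="currLang">en</xsl:param>
  <!-- Hypothetical parameter, not part of Listing 6 -->
  <xsl:param name="fallbackLang">en</xsl:param>

  <xsl:template match="/">
    <!-- concat() builds the cv-lang.xml name without a result tree fragment -->
    <xsl:variable name="cvDoc"
        select="document(concat('cv-', $currLang, '.xml'))"/>
    <xsl:choose>
      <!-- The requested language document loaded and has the expected shape -->
      <xsl:when test="$cvDoc/cv/language">
        <xsl:apply-templates select="$cvDoc"/>
      </xsl:when>
      <!-- Otherwise try the fallback language document -->
      <xsl:otherwise>
        <xsl:apply-templates
            select="document(concat('cv-', $fallbackLang, '.xml'))"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- The language, skills, skill and education templates from
       Listing 6 would follow here unchanged. -->

</xsl:stylesheet>
```

If a fallback like this were used, the language template would probably need to take its xml:lang and lang values from the loaded document's own xml:lang attribute instead of from $currLang, so that the output declares the language it actually contains.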

As we have seen, there are many options for organizing multilingual content, and each document structure has its pros and cons. The content-oriented structure works well when you have a stable set of languages to support, and it makes it easier to keep the language variants synchronized. The drawback is a more complex transformation.

The language-oriented structure is strong when you expect the number of supported languages to grow. However, keeping all language versions synchronized is more difficult. This article should help you decide which structure works better for your localization needs and format your XML to suit them.
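One way to ease the synchronization burden of separate documents is a small checking stylesheet that compares a translation against a reference language. The sketch below is an illustration rather than part of the original article: the refLang parameter, the variable names and the three checks are assumptions, and it relies on the cv-lang.xml naming used with Listing 6.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameters: reference language and language to check -->
  <xsl:param name="refLang">en</xsl:param>
  <xsl:param name="currLang">fi</xsl:param>

  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <xsl:variable name="ref"
        select="document(concat('cv-', $refLang, '.xml'))"/>
    <xsl:variable name="cur"
        select="document(concat('cv-', $currLang, '.xml'))"/>

    <!-- Coarse structural check: total number of elements -->
    <xsl:if test="count($ref//*) != count($cur//*)">
      <xsl:text>Element count differs from reference&#10;</xsl:text>
    </xsl:if>

    <!-- Repeating items are the likeliest to drift apart -->
    <xsl:if test="count($ref//skill) != count($cur//skill)">
      <xsl:text>Number of skill elements differs&#10;</xsl:text>
    </xsl:if>

    <!-- Language-neutral data should be identical in every version;
         string() compares the first graduation element of each document -->
    <xsl:if test="string($ref//graduation) != string($cur//graduation)">
      <xsl:text>Graduation year differs&#10;</xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

When the two documents have the same structure and the same graduation year, the stylesheet emits no messages. It is deliberately coarse: it cannot tell whether a translated string is stale, only whether the documents have drifted apart in shape.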

Ilari Aarnio is a software architect at Codesys, providing consulting on J2EE, XML technologies and enterprise application integration. Before founding Codesys in 2002, Ilari spent 12 years in various software development and application integration positions, where he developed practical insights into the fundamental issues of software architectures and system solutions.