Using YAML to Decrease Data Transfer Bandwidth Requirements

This new alternative to XML ain't markup language, but for data serialization, it serves the same purpose, albeit with greater brevity!

XML is a wonderful thing. It has empowered a whole new class of application: loosely coupled Web services that cooperate to form applications. XML is the glue that binds them together, through well-known, easy-to-parse data documents or well-known, easy-to-understand commands in SOAP (an XML-based messaging protocol).

The power behind XML lies in the fact that XML data is both well-structured and (to some degree) self-describing through its tag names and attributes. Powerful parsers are available to deserialize XML documents, and XML schemas let you define how your XML should appear, so a validating parser can "prove" that an XML document is a "good" document that meets the schema criteria.
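As a small illustration of the self-describing point, the following Python sketch reads a value by its tag and attribute names using the standard-library `xml.etree.ElementTree` module. That parser checks well-formedness only, so schema validation is not shown. The `scores` document and the `is_well_formed` helper are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical example document; the element and attribute names are illustrative.
doc = '<scores><score value="1" /></scores>'

root = ET.fromstring(doc)  # raises ET.ParseError if the text is not well-formed
for score in root.findall("score"):
    # The tag and attribute names tell the reader what the value means.
    print(score.tag, "=", int(score.get("value")))


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(doc))
print(is_well_formed('<scores><score value="1"></scores>'))  # <score> is never closed
```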

But XML has a size problem. The simplest XML document can look something like this:

   <node item="value" />

Even a minimal example uses many characters to represent a simple value, such as:

   <score value="1" />

The preceding line uses 19 characters to store a text representation of the integer value 1. And that doesn't include the open and close tags for the document, nor any schema references or other tags that may be necessary.
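The count is easy to verify. This short Python sketch measures the line and separates the one character of actual data from the markup around it; the variable names are illustrative only.

```python
# Measure how much of the XML line is data and how much is markup.
xml_line = '<score value="1" />'
payload = "1"  # the only data the line actually carries

total = len(xml_line)
overhead = total - len(payload)

print(f"total characters:    {total}")
print(f"payload characters:  {len(payload)}")
print(f"markup overhead:     {overhead}")
```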

With XML, increased usability tends to lead to increased file size, particularly when it also involves schemas, taxonomies, XLink pointers, rollups, and so on. Possibly the single most important use of XML in the future will be XBRL (eXtensible Business Reporting Language), which is revolutionizing the way that businesses interpret financial information, but XBRL carries incredible overhead. Take a look at a Microsoft SEC filing in XBRL as an example. The ratio of overhead to content in that filing appears to be at least 5:1.

Solving the XML overhead problem is where YAML (which stands for "YAML Ain't Markup Language") is attempting to carve a niche. There are many cases, particularly for smaller, simpler, well-known data documents, where XML's high overhead is unnecessary and its bandwidth expense can be prohibitive. Many Web sites run by smaller companies have caps on their bandwidth allotment, and they don't want to waste it. For them, YAML can provide a great alternative.

Comparing YAML and XML
You should note that YAML isn't intended to compete with XML, as there is no direct correlation between them. Instead, YAML is intended primarily as a data serialization language. It doesn't have the overhead that XML has because it isn't designed to have the backward compatibility that XML's designers wanted XML to have. In addition, while XML is designed to support generalized structured documents, YAML is targeted specifically at data structures and messaging. There are ongoing efforts to define XML/YAML mappings, and a good resource to find them is http://yaml.org/xml.html.

In XML you create a document using hierarchical tags and child tags to describe data. A simple XML document could look something like:

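     <day date="1-1-2004">
       <open>20.00</open>
       <close>21.23</close>
       <high>21.34</high>
       <low>19.92</low>
     </day>
     <day date="1-2-2004">
       <open>20.00</open>
       <close>21.23</close>
       <high>21.34</high>
       <low>19.92</low>
     </day>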

In YAML, the same information could be rendered as:

   - day: "1 January 2004"
     open: 20.00
     close: 21.23
     high: 21.34
     low: 19.92
   - day: "2 January 2004"
     open: 20.00
     close: 21.23
     high: 21.34
     low: 19.92

Not only is this less verbose, but it's also easier for humans to read. In this article you'll learn how to turn your data into YAML, and use an open source Java parser to read it.
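To put a rough number on the brevity claim, the following Python sketch serializes the two days of price data both ways and compares the byte counts. It rests on several assumptions: the XML layout (a `day` element with one child element per field, wrapped in a `days` root) is one plausible rendering, the YAML layout is a sequence of mappings, and `to_yaml` is a hypothetical helper that handles only this flat structure and is not a general YAML serializer.

```python
import xml.etree.ElementTree as ET

FIELDS = ("open", "close", "high", "low")

# Values are kept as strings so that "20.00" is written out unchanged.
days = [
    {"date": "1 January 2004", "open": "20.00", "close": "21.23",
     "high": "21.34", "low": "19.92"},
    {"date": "2 January 2004", "open": "20.00", "close": "21.23",
     "high": "21.34", "low": "19.92"},
]


def to_xml(records):
    """Render the records as compact XML (assumed layout, no whitespace)."""
    root = ET.Element("days")
    for record in records:
        day = ET.SubElement(root, "day", date=record["date"])
        for field in FIELDS:
            ET.SubElement(day, field).text = record[field]
    return ET.tostring(root, encoding="unicode")


def to_yaml(records):
    """Render the records as a YAML sequence of mappings (hypothetical helper)."""
    lines = []
    for record in records:
        lines.append(f'- day: "{record["date"]}"')
        for field in FIELDS:
            lines.append(f"  {field}: {record[field]}")
    return "\n".join(lines) + "\n"


xml_text = to_xml(days)
yaml_text = to_yaml(days)

xml_bytes = len(xml_text.encode("utf-8"))
yaml_bytes = len(yaml_text.encode("utf-8"))

print(yaml_text)
print(f"XML:  {xml_bytes} bytes")
print(f"YAML: {yaml_bytes} bytes")
print(f"YAML is {100 * (1 - yaml_bytes / xml_bytes):.0f}% smaller in this case")
```

The saving depends heavily on the data. Short field values with long tag names favor YAML the most, because each XML element name is written twice.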
