Behind Multimodal Web Applications
With the advent of VoxML (by Motorola) and VoiceXML (a W3C standard), voice applications were some of the first applications to leverage the ubiquity of XML to build speech-oriented, Web-enabled applications. The XHTML+Voice standard (often called simply X+V, a practice I'll continue here) uses the modular nature of XML to define a markup language suitable for text and voice, including the following modules:
- XHTML Basic, which provides a grammar for basic text formatting facilities, including typeface selection and common stylistic formatting options such as bulleted, numbered, and definition lists.
- XML Events, which provides a grammar for managing incoming events and how they interact with voice-interaction behaviors.
- VoiceXML modules, which provide a grammar for speech-enabling XHTML.
- An additional, new X+V extension integrates the voice and visual features of the other modules.
All X+V applications use XHTML+Voice as their markup language, and must include the following preamble:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//VoiceXML Forum//DTD XHTML+Voice 1.2//EN"
<!-- Your content goes here -->
If you're a seasoned XML developer, this won't give you pause, but I'd like to run through it anyway, because it showcases a key feature of XML that's not used as often as it should be: namespaces. As in many programming environments, XML supports namespaces so that a single document can mix elements from several XML vocabularies, even when those vocabularies define elements with the same name. As the XML shows, X+V documents draw from three disparate namespaces (look at the html tag after the XML !DOCTYPE declaration):
- The XHTML namespace: XML tags without a namespace prefix are XHTML tags.
- The XML Events namespace: XML tags with a namespace prefix ev: are XML Events tags.
- The VoiceXML namespace: XML tags with a namespace prefix vxml: are VoiceXML tags.
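To make the three namespaces concrete, here is a minimal sketch of a page body that uses one piece of markup from each. The namespace URIs are the ones published for XHTML, XML Events, and VoiceXML 2.0. The sayHello id and the greeting text are hypothetical, and the XML declaration and DOCTYPE are left out, so treat this as an illustration of the prefixes rather than a complete, validated X+V document.

```xml
<!-- Sketch only: the XML declaration and DOCTYPE belong before this element. -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:vxml="http://www.w3.org/2001/vxml">
  <head>
    <title>Namespace sketch</title>
    <!-- VoiceXML namespace: a spoken greeting (the id "sayHello" is hypothetical) -->
    <vxml:form id="sayHello">
      <vxml:block>Hello, and welcome.</vxml:block>
    </vxml:form>
  </head>
  <!-- XML Events namespace: run the voice form when the page loads -->
  <body ev:event="load" ev:handler="#sayHello">
    <!-- XHTML namespace: unprefixed tags are plain XHTML -->
    <p>Hello, and welcome.</p>
  </body>
</html>
```

Because the default namespace is XHTML, only the voice and event markup needs a prefix, which keeps the visual part of the page looking like ordinary XHTML.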
It's often easiest to start with your site's visual content and add the voice content incrementally only after the visual content is complete. Doing this lets you play from your strength (existing knowledge of XHTML and the problem domain), and after you get the easy stuff out of the way, you can iterate over the voice interface until it's perfect.
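The visual-first workflow might look like the following sketch. The first pass is an ordinary XHTML form. The second pass adds a voice form in the head and a pair of ev: attributes on the input, leaving the rest of the visual markup untouched. The ids, the field name, and the cities.grxml grammar file are hypothetical, and the way a recognized value is copied back into the visual field can vary by browser, so read this as an outline under those assumptions rather than tested markup.

```xml
<!-- Sketch only: the XML declaration and DOCTYPE belong before this element. -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:vxml="http://www.w3.org/2001/vxml">
  <head>
    <title>City lookup</title>

    <!-- Pass 2 (voice): added only after the visual form below works. -->
    <vxml:form id="voiceCity">
      <vxml:field name="city">
        <vxml:prompt>Which city are you interested in?</vxml:prompt>
        <!-- Hypothetical grammar file listing the cities to recognize. -->
        <vxml:grammar src="cities.grxml" type="application/srgs+xml"/>
        <vxml:filled>
          <!-- Assumption: the browser lets VoiceXML assign to the
               visual field through the document object. -->
          <vxml:assign name="document.getElementById('cityInput').value"
                       expr="city"/>
        </vxml:filled>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <!-- Pass 1 (visual): plain XHTML, usable without any voice support. -->
    <form action="lookup" method="get">
      <p>
        City:
        <!-- Pass 2 adds only the two ev: attributes to this element. -->
        <input type="text" id="cityInput" name="city"
               ev:event="focus" ev:handler="#voiceCity"/>
      </p>
      <p><input type="submit" value="Look up"/></p>
    </form>
  </body>
</html>
```

Keeping the voice form in the head and attaching it through event attributes means each iteration on the voice interface touches markup that the visual layout does not depend on.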