

Creating Multimodal Applications Using the IBM Multimodal Toolkit : Page 2

As computing becomes more pervasive, different kinds of user input such as voice are required. Get a head start on learning how to include client-side voice technology using the IBM Multimodal Toolkit.





Behind Multimodal Web Applications
With the advent of VoxML (by Motorola) and VoiceXML (a W3C standard), voice applications were among the first to leverage the ubiquity of XML to build speech-oriented, Web-enabled applications. The XHTML+Voice standard (often called simply X+V, a practice I'll continue here) uses the modular nature of XML to define a markup language suitable for text and voice, including the following modules:
  • XHTML Basic, which provides a grammar for basic text formatting facilities, including typeface selection and common stylistic formatting options such as bulleted, numbered, and definition lists.
  • XML Events, which provides a grammar for managing incoming events and how they interact with voice-interaction behaviors.
  • VoiceXML modules, which provide a grammar for speech-enabling XHTML.
  • An additional, new X+V extension integrates the voice and visual features of the other modules.
All X+V applications use XHTML+Voice as their markup language, and must include the following preamble:

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//VoiceXML Forum//DTD XHTML+Voice 1.2//EN"
  "http://www.voicexml.org/specs/multimodal/x+v/12/dtd/xhtml+voice12.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xml:lang="en-US">
  <!-- Your content goes here -->
</html>

If you're a seasoned XML developer, this won't give you pause, but I'd like to run through it anyway, because it showcases a key feature of XML that's not used as often as it should be: namespaces. As in other programming environments, XML supports namespaces so that a single document can combine elements from several XML vocabularies, even when those vocabularies define elements with the same name. As the preamble shows, X+V documents draw from three disparate namespaces (look at the html tag after the DOCTYPE declaration):
  • The XHTML namespace: XML tags without a namespace prefix are XHTML tags.
  • The XML Events namespace: XML tags and attributes with the namespace prefix ev: belong to XML Events.
  • The VoiceXML namespace: XML tags with the namespace prefix vxml: are VoiceXML tags.
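
To make the prefix-to-namespace mapping concrete, here is a minimal sketch that parses a trimmed X+V-style document and reports which namespace each element and attribute resolves to. It uses Python's standard xml.etree.ElementTree module purely as a convenient XML parser; it is not part of the IBM Multimodal Toolkit. The sample markup (the form id sayHello, the greeting text, and the load handler on body) is a hypothetical illustration rather than content from this article's application, and the DOCTYPE is omitted to keep the sample short.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the X+V preamble.
XHTML = "http://www.w3.org/1999/xhtml"
EVENTS = "http://www.w3.org/2001/xml-events"
VXML = "http://www.w3.org/2001/vxml"

# Hypothetical sample document: the ids and text are illustrative only.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xml:lang="en-US">
  <head>
    <title>Namespace demo</title>
    <vxml:form id="sayHello">
      <vxml:block>Hello</vxml:block>
    </vxml:form>
  </head>
  <body ev:event="load" ev:handler="#sayHello">
    <p>Hello</p>
  </body>
</html>"""


def split_name(qualified):
    """Split ElementTree's '{uri}local' form into a (uri, local) pair."""
    if qualified.startswith("{"):
        uri, local = qualified[1:].split("}", 1)
        return uri, local
    return "", qualified


root = ET.fromstring(DOC)

# Every element, in document order, as (namespace URI, local name).
elements = [split_name(el.tag) for el in root.iter()]

# Attributes on <body>, keyed by (namespace URI, local name).
body = root.find(f"{{{XHTML}}}body")
body_attrs = {split_name(name): value for name, value in body.attrib.items()}

for uri, local in elements:
    print(f"element   {local:<6} -> {uri}")
for (uri, local), value in body_attrs.items():
    print(f"attribute {local:<8} -> {uri} (value {value!r})")
```

The unprefixed tags (html, head, title, body, p) resolve to the XHTML namespace, the vxml: tags resolve to the VoiceXML namespace, and the ev: attributes on body resolve to the XML Events namespace. The prefixes themselves are only shorthand; the parser identifies each name by its namespace URI.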
It's often easiest to start with your site's visual content and add the voice content incrementally only after the visual content is complete. Doing this lets you play from your strength (existing knowledge of XHTML and the problem domain), and after you get the easy stuff out of the way, you can iterate over the voice interface until it's perfect.
