SALT and VoiceXML are both markup languages for writing applications that use voice input and/or output. Both languages were developed by industry consortia (the SALT Forum and the VoiceXML Forum, respectively), and both were contributed to W3C as part of its ongoing work on speech standards.
So why two specifications? Mainly because they were designed to address different needs, and at different stages in the life cycle of the Web. VoiceXML arose out of a need to define a markup language for over-the-telephone dialogs (Interactive Voice Response, or IVR, applications), and at a time (1999) when many pieces of the Web infrastructure as we know it today had not yet matured. SALT arose out of the need to enable speech across a wider range of devices, from telephones to PDAs to desktop PCs, and to support both telephony (voice-only) and multimodal (combined voice and visual) dialogs. SALT was also designed at a time (2002) when many key Web technologies (XML, DOM, XPath, etc.) had become well established.
I will declare my interest here: I represent Microsoft in the SALT Forum’s Technical Working Group. However, I have studied SALT and VoiceXML in depth, and will use this forum to take an objective look at the two specifications and point out the main technical differences between them. You can decide for yourself which specification is better suited to your applications. (See Sidebar: Developer Communities)
How Do They Work?
SALT focuses on the speech interface, defining a small set of XML elements that are used inside a “host” page of markup, such as XHTML, HTML + SMIL, WML, etc. SALT elements expose a DOM interface, which places them at the disposal of the host markup’s execution environment. So speech input and output are controlled by developer code in whatever environment the host page supports, e.g. the scripting module in HTML pages, SMIL 2.0, and so on. Web functionality is also handled by the host page, so page navigation and form submission are written as usual in HTML. SALT also contains built-in declarative mechanisms intended for less capable device profiles. SALT’s feature set is kept low-level, to allow flexible interaction logic and fine-grained control of the speech interface.
VoiceXML provides a larger set of XML elements, since it is intended as a complete, standalone markup. Hence, VoiceXML includes tags for data (forms and fields), control flow, and Web functionality. Speech input and output are controlled by VoiceXML’s dedicated execution environment, the Form Interpretation Algorithm (FIA); ECMAScript can be used at certain points within the page to direct flow. As with SALT, simple dialogs can also be written declaratively. VoiceXML’s feature set is at a higher level, encompassing Web functionality and dialog flow. This allows VoiceXML pages to be used on their own, and lets novice developers build elementary dialogs rapidly.
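To make the FIA's role more concrete, here is a much-simplified Python model of its main loop (select, collect, process). It assumes a form made only of fields; a Python callable stands in for the ECMAScript `cond` attribute, and `get_answer` is a hypothetical callback standing in for prompt playback plus recognition. The real FIA also handles blocks, `<filled>` actions, events and explicit transitions, none of which are modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Values = Dict[str, str]


@dataclass
class Field:
    """Hypothetical stand-in for a VoiceXML <field>.

    name   -- the form item variable the field fills
    prompt -- what the field says when it is visited
    cond   -- optional guard, playing the role of the cond attribute
    """
    name: str
    prompt: str
    cond: Optional[Callable[[Values], bool]] = None


def run_form(fields: List[Field], get_answer: Callable[[str], str]) -> Values:
    """Simplified FIA main loop: select, collect, process, repeat.

    get_answer is a hypothetical callback standing in for prompt playback
    plus speech recognition; it is not part of any VoiceXML API.
    """
    values: Values = {}
    while True:
        # Select: the first field, in document order, whose variable is
        # still unset and whose cond (if any) is true.
        selected = next(
            (f for f in fields
             if f.name not in values and (f.cond is None or f.cond(values))),
            None,
        )
        if selected is None:
            # Nothing left to visit: the form is done (an implicit exit).
            return values
        # Collect: play the prompt and wait for the user's input.
        answer = get_answer(selected.prompt)
        # Process: fill the form item variable, so this field is not
        # selected again on later iterations.
        values[selected.name] = answer
```

For example, with an account field followed by a PIN field guarded by `cond=lambda v: v.get("account") != "guest"`, answering "guest" ends the form after one question: the PIN field is never selected because its guard is false. The developer writes no loop at all; the interpreter decides which item to visit next.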
SALT Dialog Flow Example
Since SALT elements are DOM objects, they expose an interface of properties, events and methods, and can be manipulated accordingly inside the page. Activation will typically follow the event wiring model familiar to many HTML Web developers. A prompt element, for example, has:
an id property to identify the object;
a Start() method to begin playback;
an oncomplete event thrown when playback is complete.
Similarly, the listen element has:
a grammar defining the speech input to be recognized;
a bind directive to bind the user's response into a control on the page;
an onreco event thrown on a successful recognition.
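The bind directive is easiest to understand in isolation. In SALT it pairs an XPath expression over the recognizer's XML result with a target element on the page. The Python sketch below models only that idea, not SALT's actual syntax or API: the result document, the `txtPassword` control and the `page_controls` dictionary are hypothetical, and ElementTree's limited XPath subset stands in for full XPath.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def bind_result(result_xml: str, value_path: str, target_id: str,
                page_controls: Dict[str, str]) -> bool:
    """Copy one value from a recognition result into a page control.

    value_path selects a node in the recognizer's XML result, and the
    node's text is written into the control whose id is target_id.
    page_controls is a hypothetical stand-in for the host page's controls.
    Returns False, leaving the page untouched, if nothing matched.
    """
    root = ET.fromstring(result_xml)
    # ElementTree implements only a subset of XPath; ".//password"
    # finds the first descendant element named "password".
    node = root.find(value_path)
    if node is None or node.text is None:
        return False
    page_controls[target_id] = node.text.strip()
    return True


if __name__ == "__main__":
    # Hypothetical, simplified result document for the utterance "open sesame".
    result = "<result><password>open sesame</password></result>"
    controls = {"txtPassword": ""}
    bind_result(result, ".//password", "txtPassword", controls)
    print(controls)  # {'txtPassword': 'open sesame'}
```

Keeping the mapping declarative means the page's controls can be filled from a recognition result without any handler code.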
This allows code such as the following HTML and SALT fragment:
sayWelcome: “Welcome to my speech recognition application.”
askPassword: “Please say your password.”
This sample plays a simple welcome prompt (sayWelcome), then asks for a password (askPassword) and simultaneously activates the listen element to capture the user's answer.
The example shows simple event wiring for interactional flow. For more complex SALT dialogs, you would probably use script functions and reusable blocks of code across SALT pages and applications. But script isn’t always necessary: another way to activate prompts and listen elements would be to use SMIL 2.0 (see the SALT specification for an example), or, on small devices, the declarative mechanisms available through data and event binding.
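The event wiring described above can be modeled outside a browser. The Python sketch below is not SALT code: `Prompt` and `Listen` are hypothetical stand-ins whose `Start()` method and `oncomplete`/`onreco` handlers mirror the members listed earlier, recognition is simulated, and the `recoPassword` and `txtPassword` ids are invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Prompt:
    """Hypothetical stand-in for a SALT prompt: an id, Start(), oncomplete."""

    def __init__(self, element_id: str, text: str, log: List[str]) -> None:
        self.id = element_id
        self.text = text
        self.log = log
        self.oncomplete: Optional[Callable[[], None]] = None

    def Start(self) -> None:  # capitalized to mirror the SALT method name
        # "Playback" is just a log entry; completion is reported at once.
        self.log.append(f"play {self.id}: {self.text}")
        if self.oncomplete is not None:
            self.oncomplete()


class Listen:
    """Hypothetical stand-in for a SALT listen: an id, Start(), onreco."""

    def __init__(self, element_id: str, log: List[str]) -> None:
        self.id = element_id
        self.log = log
        self.onreco: Optional[Callable[[str], None]] = None

    def Start(self) -> None:
        self.log.append(f"listen {self.id}")

    def simulate_reco(self, text: str) -> None:
        # Test hook, not a SALT method: pretend the recognizer heard `text`.
        if self.onreco is not None:
            self.onreco(text)


def wire_password_dialog(log: List[str], page: Dict[str, str]) -> Tuple[Prompt, Listen]:
    say_welcome = Prompt("sayWelcome", "Welcome to my speech recognition application.", log)
    ask_password = Prompt("askPassword", "Please say your password.", log)
    reco_password = Listen("recoPassword", log)  # hypothetical id

    def on_welcome_complete() -> None:
        # Like oncomplete on sayWelcome: ask for the password and start
        # listening at the same time.
        ask_password.Start()
        reco_password.Start()

    def on_password_reco(text: str) -> None:
        # Like onreco: put the recognized text into a (hypothetical) page control.
        page["txtPassword"] = text

    say_welcome.oncomplete = on_welcome_complete
    reco_password.onreco = on_password_reco
    return say_welcome, reco_password
```

Calling `Start()` on the returned welcome prompt logs the welcome prompt, the password prompt and the start of listening, in that order; a later `simulate_reco("open sesame")` fills `page["txtPassword"]`. In a real page, playback and recognition are asynchronous, and the handlers would be attached through the host page's event model rather than plain Python callbacks.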
VoiceXML Dialog Flow Example
In contrast, VoiceXML applies its own page interpretation mechanisms (the Form Interpretation Algorithm, or FIA) and programmatic elements to conduct dialog flow. A