

SALT or VoiceXML For Speech Applications? : Page 3

Competing speech-recognition standards, SALT and VoiceXML, are remarkably similar in what they achieve. But for the developer, there are important distinctions in how each language behaves. Microsoft's Stephen Potter details the technical and philosophical differences between the two so you can choose the right specification for your needs.





VoiceXML Dialog Flow Example
In contrast, VoiceXML applies its own page interpretation mechanism (the Form Interpretation Algorithm, or FIA) and programmatic elements to conduct dialog flow. A <form> is composed of <block> or <field> elements, and a <field> contains <prompt> and/or <grammar> elements. Navigation from <form> to <form> and from page to page is coded with <goto> elements. Navigation within the form is provided by the FIA, which 'visits' the fields one at a time until each contains a value. Processing blocks are available at certain points in execution. For example, the <filled> element inside a <form> or <field> says what to do when the form is complete or the field has a value. Navigation can be adjusted with snippets of ECMAScript inside cond (condition) attributes on certain elements, or with the conditional elements <if>, <elseif>, etc. in the processing blocks.

The interaction is accomplished in VoiceXML using the following code (all the other functionality is the same as in the SALT example):

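<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Play the welcome message, then jump to the PIN form -->
  <form id="sayWelcome">
    <block>
      Welcome to my speech recognition application.
      <goto next="#askPIN" />
    </block>
  </form>
  <!-- The form id matches the fragment used by the <goto> above -->
  <form id="askPIN">
    <!-- A VoiceXML field is identified by its name attribute -->
    <field name="iptPIN">
      <prompt> Please say your password. </prompt>
      <grammar src="PINdigits.grxml" />
    </field>
    <!-- Runs once the field has a value -->
    <filled>
      <submit next="checkPIN.vxml"/>
    </filled>
  </form>
</vxml>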

As you can see, in contrast to the explicit event-driven model of SALT, VoiceXML uses an implicit page execution model. When forms contain more than a single field, VoiceXML's FIA allows the writing of elementary dialogs in a largely declarative manner. VoiceXML also contains a <subdialog> mechanism which can be useful for embedding a form-filling dialog from one page inside another.
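As a rough sketch of what this looks like with more than one field, the form below combines a cond attribute with <if>/<elseif> in a processing block. It is illustrative only: the userType field, the userType.grxml grammar, the customerMenu.vxml page, and the assumption that the grammar returns the strings 'employee' or 'customer' are all hypothetical and are not part of the original example.

```xml
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="askLogin">
    <!-- The FIA visits form items in document order until each has a value -->
    <field name="userType">
      <prompt> Are you a customer or an employee? </prompt>
      <!-- Hypothetical grammar, assumed to return 'customer' or 'employee' -->
      <grammar src="userType.grxml" />
    </field>
    <!-- cond is an ECMAScript expression; the FIA skips this field unless it is true -->
    <field name="iptPIN" cond="userType == 'employee'">
      <prompt> Please say your password. </prompt>
      <grammar src="PINdigits.grxml" />
    </field>
    <!-- A block is also a form item, so it runs after the fields above -->
    <block>
      <if cond="userType == 'employee'">
        <submit next="checkPIN.vxml" namelist="iptPIN" />
      <elseif cond="userType == 'customer'" />
        <!-- Hypothetical page for callers who do not need a PIN -->
        <goto next="customerMenu.vxml" />
      </if>
    </block>
  </form>
</vxml>
```

Note that no explicit loop or event handler is written here: the order of visits and the skipping of iptPIN fall out of the FIA and the cond expression.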

The differences shown in these examples result largely from the goals of each markup.

In VoiceXML, the <form> provides a unit which contains both the data model (the fields) and a built-in way to navigate the model (the Form Interpretation Algorithm, or FIA). This allows you to build form-filling dialogs which follow a 'system-initiative' control model, that is, dialogs for which the system prompts the user for every piece of information. The FIA also allows a degree of simple 'mixed initiative' control, where the user is a little freer to provide extra information when the form is first visited. This is useful in the initial design stages of IVR-style telephony dialogs.
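A minimal sketch of the simple mixed-initiative case, assuming a form-level grammar that can fill more than one field from a single utterance. The form, the field names, and the flight.grxml and city.grxml grammars are hypothetical and serve only to illustrate the pattern.

```xml
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="bookFlight">
    <!-- Form-level grammar: one utterance may fill several fields at once -->
    <grammar src="flight.grxml" />
    <!-- <initial> is visited first and asks an open question -->
    <initial name="start">
      <prompt> Where would you like to fly from and to? </prompt>
    </initial>
    <!-- Any field still empty afterwards is prompted for in turn by the FIA -->
    <field name="origin">
      <prompt> Which city are you leaving from? </prompt>
      <grammar src="city.grxml" />
    </field>
    <field name="destination">
      <prompt> Which city are you flying to? </prompt>
      <grammar src="city.grxml" />
    </field>
    <!-- Runs once both fields have values -->
    <filled mode="all" namelist="origin destination">
      <submit next="bookFlight.vxml" namelist="origin destination" />
    </filled>
  </form>
</vxml>
```

If the caller answers the opening question with both cities, neither field prompt is played; if only one city is recognized, the FIA falls back to system-initiative prompting for the other.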

In SALT, you use the data model and execution environment of the host environment (e.g., HTML forms and scripting). This is typically more familiar to today's Web developer. It also provides a flexible way to write and tune dialogs, so that complex dialogs, including mixed-initiative and user-initiative dialogs, are firmly under the developer's control. An event-driven interaction model is also generally more useful for multimodal applications.

Stephen Potter is a Program Manager in the .NET Speech Technologies Group at Microsoft Corp. and serves as chair of the Technical Working Group of the Speech Application Language Tags (SALT) Forum. In his position at Microsoft, Stephen works on the development of the .NET Speech platform and the .NET Speech Software Development Kit (SDK) developer tools. As chair of the Technical Working Group of the SALT Forum, he oversaw the development and completion of the SALT 1.0 specification, which was released in July 2002. He can be reached at stephen_potter@hotmail.com.