

SALT or VoiceXML For Speech Applications? : Page 2

Competing speech-recognition standards, SALT and VoiceXML, are remarkably similar in what they achieve. But for the developer, there are important distinctions in how each language behaves. Microsoft's Stephen Potter details the technical and philosophical differences between the two so you can choose the right specification for your needs.





How Do They Work?
SALT focuses on the speech interface. It defines a small set of XML elements that are used inside a "host" page of markup such as XHTML, HTML + SMIL, or WML. SALT elements expose a DOM interface, which places them at the disposal of the host markup's execution environment. Speech input and output are therefore controlled by developer code in whatever environment the host page supports, such as the scripting module in HTML pages or SMIL 2.0. Web functionality is also handled by the host page, so page navigation and form submission are written as usual in HTML. SALT also contains built-in declarative mechanisms intended for use in less rich device profiles. SALT's feature set is deliberately kept low-level to allow flexible interaction logic and fine-grained control of the speech interface.

VoiceXML provides a larger set of XML elements, since it is intended as a complete, standalone markup. Hence, VoiceXML includes tags for data (forms and fields), control flow, and Web functionality. Speech input and output are controlled by VoiceXML's dedicated execution environment, the Form Interpretation Algorithm (FIA), and ECMAScript can be used at certain points within the page to direct flow. As with SALT, simple dialogs can also be written in a declarative manner. VoiceXML's feature set sits at a higher level, encompassing Web functionality and dialog flow. This allows VoiceXML pages to be used alone, and lets a novice developer build elementary dialogs rapidly.
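To make the idea of a dedicated execution environment more concrete, here is a heavily simplified sketch of the select, collect, process loop that the FIA is built around. It is written in JavaScript purely for illustration and is not how any VoiceXML interpreter is implemented. The function runForm, the shape of the item objects, the recognize callback, and the maxTurns limit are all hypothetical names chosen for this sketch. The real algorithm also deals with events, guard conditions, prompt counters, and grammar scoping, none of which appear here.

```javascript
// Simplified, hypothetical model of a Form Interpretation Algorithm loop.
// items:     array of { name, prompt, filled } objects (an assumed shape)
// recognize: callback standing in for speech recognition; returns the
//            recognized value, or undefined when nothing matched
// maxTurns:  safety limit so the sketch cannot loop forever
function runForm(items, recognize, maxTurns = 10) {
  const values = {}; // form item variables, all initially undefined
  const log = [];    // records what the interpreter would say or do

  for (let turn = 0; turn < maxTurns; turn++) {
    // Select phase: pick the first item whose variable is still unset.
    const item = items.find((it) => values[it.name] === undefined);
    if (!item) break; // every item is filled, so the form is complete

    // Collect phase: play the item's prompt, then gather input.
    log.push("prompt: " + item.prompt);
    const result = recognize(item.name);

    // Process phase: on a match, fill the variable and run its action.
    if (result !== undefined) {
      values[item.name] = result;
      if (item.filled) log.push(item.filled(result));
    } else {
      // Variable stays unset, so the same item is selected next turn.
      log.push("nomatch: " + item.name);
    }
  }
  return { values, log };
}

// Usage: the first recognition attempt fails, the second succeeds.
const answers = [undefined, "1234"];
const outcome = runForm(
  [{
    name: "pin",
    prompt: "Please say your password.",
    filled: (v) => "submit pin=" + v,
  }],
  () => answers.shift()
);
console.log(outcome.log.join("\n"));
```

The point of the sketch is that the page author only declares the items. The loop that decides what to ask next belongs to the interpreter, which is the main contrast with SALT's developer-driven event wiring.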

SALT Dialog Flow Example
Since SALT elements are DOM objects, they expose an interface of properties, events and methods, and can be manipulated accordingly inside the page. Activation will typically follow the event wiring model familiar to many HTML Web developers. A <prompt> element in SALT, for example, exposes an interface which includes the following features:

id property to identify the object;
Start() method to begin playback;
oncomplete event thrown when playback is complete.

Similarly, the <listen> element is a basic building block of speech recognition. It also has an id and a Start() method, as well as the following:

<grammar> element specifying the grammar used to recognize the spoken input;
<bind> directive to bind the user's response into a control on the page;
onreco event thrown on a successful recognition.

This allows code such as the following HTML and SALT fragment:

<html xmlns:salt="http://www.saltforum.org/02/SALT">
  <body onload="sayWelcome.Start()">
    <form id="PIN" action="checkPIN.html">
      <input id="iptPIN" type="text" />
    </form>
    <salt:prompt id="sayWelcome"
        oncomplete="askPIN.Start(); recoPIN.Start()">
      Welcome to my speech recognition application.
    </salt:prompt>
    <salt:prompt id="askPIN">
      Please say your password.
    </salt:prompt>
    <salt:listen id="recoPIN" onreco="PIN.submit()">
      <salt:grammar src="PINdigits.grxml" />
      <salt:bind targetElement="iptPIN" />
    </salt:listen>
  </body>
</html>

This sample plays a simple welcome prompt (sayWelcome), then asks for a password (askPIN) and simultaneously activates the <listen> element named recoPIN. When recognition is successful, the <bind> element copies the response into the iptPIN textbox, and the onreco event handler submits the HTML form to the Web server.

The example shows simple event wiring for interactional flow. For more complex SALT dialogs, you would probably use script functions and reusable blocks of code across SALT pages and applications. But script isn't always necessary: another way to activate prompts and listen elements would be to use SMIL 2.0 (see the SALT specification for an example), or, on small devices, the declarative mechanisms available through data and event binding.
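As a rough illustration of moving dialog logic out of inline attributes and into reusable script functions, the sketch below factors the "ask a question" step and the recognition handler into plain JavaScript. Everything in it is an assumption made for illustration: makeStub, ask, and makePinHandler are hypothetical helpers, the four-digit check is an invented validation rule, and the stub objects only mimic the id and Start() members described earlier so the code can run outside a SALT-enabled browser. How the recognized text actually reaches a handler is not modeled; here it is simply passed in as a string.

```javascript
// Hypothetical stand-in for a SALT prompt or listen object: it only has
// an id and a Start() method, and records calls instead of using audio.
function makeStub(id, calls) {
  return {
    id,
    Start() { calls.push(id + ".Start"); },
  };
}

// Reusable helper: start a prompt and its matching listen together,
// mirroring the oncomplete="askPIN.Start(); recoPIN.Start()" wiring.
function ask(prompt, listen) {
  prompt.Start();
  listen.Start();
}

// Builds a recognition handler that validates the recognized text, then
// either fills the field and submits the form, or asks again.
// The four-digit rule is an assumption made for this sketch.
function makePinHandler(form, field, retryPrompt, listen) {
  return function onReco(text) {
    if (/^\d{4}$/.test(text)) {
      field.value = text;
      form.submit();
    } else {
      ask(retryPrompt, listen);
    }
  };
}

// Usage with stand-in objects in place of the page's real elements.
const calls = [];
const pinForm = { submit() { calls.push("PIN.submit"); } };
const iptPIN = { value: "" };
const askPIN = makeStub("askPIN", calls);
const recoPIN = makeStub("recoPIN", calls);

const onReco = makePinHandler(pinForm, iptPIN, askPIN, recoPIN);
onReco("12");   // too short, so the dialog asks again
onReco("1234"); // valid, so the field is filled and the form submitted
console.log(calls.join(", "));
```

Because ask and makePinHandler take their objects as arguments, the same functions could in principle be shared across several pages that each supply their own prompts, listens, and forms.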

