

Creating Multimodal Applications Using the IBM Multimodal Toolkit : Page 3

As computing becomes more pervasive, different kinds of user input such as voice are required. Get a head start on learning how to include client-side voice technology using the IBM Multimodal Toolkit.





A Sample Application
Let's take a simple example: a Web application that provides weather reports. The baseline XHTML for the application's location prompt is in Listing 1. It's a basic form that prompts for either the city and state or the zip code of the desired location and submits the content to the server-side Java page submit.jsp. You can see how the page appears in Figure 1.
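For readers without Listing 1 at hand, the form described above might look roughly like the following. This is a hypothetical sketch, not the article's listing: only the city field and the submit.jsp target come from the text, while the state and zip field names, the post method, and the table layout are assumptions made for illustration.

```xml
<!-- Hypothetical sketch of the location prompt; not the article's Listing 1. -->
<form action="submit.jsp" method="post">
  <table>
    <tr>
      <td align="right">City:</td>
      <!-- The id "city" is the one the voice form fills in later. -->
      <td align="left"><input type="text" id="city" name="city" size="25"/></td>
    </tr>
    <tr>
      <td align="right">State:</td>
      <!-- Assumed field name. -->
      <td align="left"><input type="text" id="state" name="state" size="2"/></td>
    </tr>
    <tr>
      <td align="right">Zip code:</td>
      <!-- Assumed field name. -->
      <td align="left"><input type="text" id="zip" name="zip" size="10"/></td>
    </tr>
    <tr>
      <td colspan="2"><input type="submit" value="Get Weather"/></td>
    </tr>
  </table>
</form>
```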

Author's Note: The server-side code doesn't interest us in this article, because all of the multimodal interface work is performed on the client side. If you're interested in seeing a server-side voice application, see my previous article, "Creating Voice Applications Using VoiceXML and the IBM Voice Toolkit."

Once you create the XHTML, which you can do by hand or with your favorite Web authoring tools, the next step is to add the voice interface. It resides in the <head> element of your document, giving you a primitive way to separate your content from its presentation.

Figure 1. How's the Weather? This is the entry page for the sample weather application.

It's easiest to add the voice content within WebSphere Studio using the Multimodal Toolkit. To add the voice interface in this example:

  1. Open the X+V file in the WebSphere Studio Editor.
  2. Position the cursor where you want the editor to place the X+V content, at the end of the <head> block (I like to insert a blank line or two to keep things readable around the tags I insert).
  3. Insert the VoiceXML tag by pressing Ctrl+Space and choosing <vxml:form> from the Content Assist menu.
  4. Name the VoiceXML tag by giving it an id, so that the new tag reads <vxml:form id="city_form">.
  5. Use the Multimodal Toolkit's Reusable Dialog Wizard (right-click in the source editor and choose the wizard) to select the usmajorcity item.
  6. Edit the resulting <vxml:form> so that the response goes into the city field of the form: in the first <vxml:assign> tag, change 'VARusmajorcityUSMajorCity' to 'city'.
After these steps, your <head> element looks like this:

<head>
  <title>X+V Weather Demonstration</title>
  <vxml:form id="city_form">
    <vxml:subdialog name="usmajorcity"
        src="reusable_comp/XML/subdialogs/usmajorcity/en_US/usmajorcity.mxml#formUSMajorCity">
      <vxml:param name="paramSubdialogObj" expr="objEn_USUsmajorcityDef_XML" />
      <vxml:filled>
        <!-- To hear return values uncomment the following lines -->
        <!-- <vxml:value expr="usmajorcity.returnUSMajorCity"/> -->
        <!-- <vxml:value expr="usmajorcity.returnUSMajorCityUtterance"/> -->
        <vxml:assign name="document.getElementById('city').value"
            expr="usmajorcity.returnUSMajorCity" />
        <vxml:assign name="document.getElementById('VARusmajorcityUSMajorCityUtterance').value"
            expr="usmajorcity.returnUSMajorCityUtterance" />
      </vxml:filled>
    </vxml:subdialog>
  </vxml:form>
</head>

Note that the toolkit has inserted a reference to a pre-built dialog provided by IBM, rather than the dialog itself. It has also inserted code you probably don't need: the last <vxml:assign> tag returns the raw utterance, in addition to the interpreted speech, to the server. You can comment this out or remove it altogether, unless you're doing work with a recognizer on the back end or want to log utterances somewhere in order to investigate complaints about missed recognition events (handy during field tests). The wizard will also have added text form elements to the document's form, which you'll want to remove; you'll find those in the <form> block in the document's body.
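If you decide to drop the utterance handling, the voice form might be trimmed to something like the sketch below. It assumes the id city_form from step 4 and removes only the last <vxml:assign> and the commented-out <vxml:value> lines; everything else is carried over from the generated code.

```xml
<vxml:form id="city_form">
  <vxml:subdialog name="usmajorcity"
      src="reusable_comp/XML/subdialogs/usmajorcity/en_US/usmajorcity.mxml#formUSMajorCity">
    <vxml:param name="paramSubdialogObj" expr="objEn_USUsmajorcityDef_XML" />
    <vxml:filled>
      <!-- Copy the recognized city into the visual form's city field. -->
      <vxml:assign name="document.getElementById('city').value"
          expr="usmajorcity.returnUSMajorCity" />
    </vxml:filled>
  </vxml:subdialog>
</vxml:form>
```

With the utterance assignment gone, the hidden text input that the wizard added to hold the utterance is no longer referenced, so removing it from the <form> block should be safe.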

You've now specified the voice equivalent of an XHTML form element, using the predefined voice form element provided by IBM. The only remaining work is to link the two, so that when the city field has focus, the VoiceXML form element is active. You create this link using XML Events, which the W3C XML Events specification describes in detail. The event your forms must watch for is the focus event, which the browser raises when focus moves from one input to another. Each event must also have a handler, which indicates what should be active when the client triggers the event. You attach the handler to the XHTML element that generates the event. So you link the text input to the voice form with attributes on the <input> element, like this:

<td align="left">
  <input type="text" id="city" name="city" value="Los Gatos" size="25"
      ev:event="focus" ev:handler="#city_form"/>
</td>
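The vxml: and ev: prefixes used in these fragments only resolve if the namespaces are declared on the document's root element. The skeleton below is a minimal sketch of how the pieces fit together, not the article's Listing 2: the DOCTYPE is omitted, the subdialog is replaced by a placeholder comment, and the post method is an assumption. The namespace URIs are the standard ones for XHTML, VoiceXML, and XML Events.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>X+V Weather Demonstration</title>
    <!-- Voice dialog: its id is the target of ev:handler below. -->
    <vxml:form id="city_form">
      <!-- The subdialog reference generated by the wizard goes here. -->
    </vxml:form>
  </head>
  <body>
    <form action="submit.jsp" method="post">
      <p>
        <!-- When this field gains focus, the city_form voice dialog is activated. -->
        <input type="text" id="city" name="city" size="25"
            ev:event="focus" ev:handler="#city_form"/>
      </p>
    </form>
  </body>
</html>
```

The value of ev:handler must match the id of the <vxml:form> exactly, fragment marker included; if the two differ, the focus event has no voice dialog to activate.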

You can see the final bits of code in Listing 2.
