Creating Multimodal Applications Using the IBM Multimodal Toolkit

The merging of computing with our everyday life, through computers, personal digital assistants, cell phones, and a plethora of other gadgets, is driving a trend toward pervasive computing, in which computing becomes a backdrop to our daily activities. IBM, long a leader in the pervasive computing arena, is one of three firms driving a standard to support multimodal interfaces to mobile devices through the XHTML+Voice markup language, an XML application that leverages XHTML and VoiceXML to provide voice-and-visual interfaces to Web applications from desktop and handheld computers. Using tools such as the IBM Multimodal Toolkit, you can learn how to extend your application to include voice control and output using a multimodal browser such as the ones available from Access and Opera.

These multimodal interfaces let you differentiate your application from others in two key ways. First, your application can accept and parse voice input in Web forms, meaning that a user can use your application without needing to resort to a keyboard, mouse, or stylus. Second, your application can provide key pieces of information, including warnings, prompts, and search results, using voice synthesis, which adds another dimension to the presentation of results. Because these technologies leverage XML and the traditional Web browsing paradigm, it’s easy for you to work these features into your Web-based application. There’s very little overhead involved, and you needn’t build a sophisticated client-side application using a native platform’s support for voice recognition and synthesis (such as that of Microsoft Windows or Mac OS X).

Installation
The IBM Multimodal Toolkit requires Windows 2000 or later and a copy of IBM WebSphere Studio Site Developer or IBM WebSphere Studio Application Developer 5.1.1. Installation is through a clickable installer available from IBM’s Pervasive Computing Web site.

As you install the IBM Multimodal Toolkit, you also have the option to install one or more multimodal browsers from Access and Opera for testing your application. You should definitely install one or the other (or both!) and also consider downloading a handheld version of the same browser from the IBM Pervasive Computing site for your target mobile device (at this point, both Microsoft Pocket PC and Sharp Zaurus are supported).

Behind Multimodal Web Applications
With the advent of VoxML (by Motorola) and VoiceXML (a W3C standard), voice applications were among the first to leverage the ubiquity of XML to build speech-oriented, Web-enabled applications. The XHTML+Voice standard (often called simply X+V, a practice I’ll continue here) uses the modular nature of XML to define a markup language suitable for text and voice, including the following modules:

  • XHTML Basic, which provides a grammar for basic text formatting facilities, including typeface selection and common stylistic formatting options such as bulleted, numbered, and definition lists.
  • XML Events, which provides a grammar for managing incoming events and how they interact with voice-interaction behaviors.
  • VoiceXML modules, which provide a grammar for speech-enabling XHTML.
  • An additional, new X+V extension integrates the voice and visual features of the other modules.

All X+V applications use XHTML+Voice as their markup language, and must include the following preamble:
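
A minimal sketch of such a preamble follows. It assumes the X+V 1.2 DTD published by the VoiceXML Forum; the DOCTYPE identifiers differ between X+V versions, so check the ones your copy of the toolkit generates. The three namespace URIs are the standard ones for XHTML, XML Events, and VoiceXML, and the head and body content is placeholder material.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumption: X+V 1.2 DTD; substitute the identifiers for your X+V version. -->
<!DOCTYPE html PUBLIC "-//VoiceXML Forum//DTD XHTML+Voice 1.2//EN"
  "http://www.voicexml.org/specs/multimodal/x+v/12/dtd/xhtml+voice12.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xml:lang="en-US">
  <head>
    <!-- VoiceXML forms (vxml: prefix) are added here later. -->
    <title>X+V Weather Demonstration</title>
  </head>
  <body>
    <!-- Unprefixed elements are plain XHTML. -->
    <p>Visual content goes here.</p>
  </body>
</html>
```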

        

If you’re a seasoned XML developer, this won’t give you pause, but I’d like to run through it anyway, because it showcases a key feature of XML that’s not used as often as it should be: namespaces. As in other programming environments, XML supports namespaces so that a single document can mix elements from several XML vocabularies, even when two vocabularies define elements with the same name. As the XML shows, X+V documents draw from three disparate namespaces (look at the html tag after the XML !DOCTYPE preamble):

  • The XHTML namespace: XML tags without a namespace prefix are XHTML tags.
  • The XML Events namespace: XML tags with the namespace prefix ev: are XML Events tags.
  • The VoiceXML namespace: XML tags with the namespace prefix vxml: are VoiceXML tags.

It’s often easiest to start with your site’s visual content, and only after it’s complete incrementally add the voice content. Doing this lets you play from your strength (existing knowledge of XHTML and the problem domain), and after you get the easy stuff out of the way, you can iterate over the voice interface until it’s perfect.

A Sample Application
Let’s take a simple example: a Web application that provides weather reports. The baseline XHTML for the location prompt for this application is in Listing 1. It’s a very simple form, which prompts for either the city and state or the zip code of the desired location and submits the content to the server-side Java page submit.jsp. You can see how the page will appear in Figure 1.
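
As a rough sketch of what such a form might contain (this is not Listing 1 itself), consider the fragment below. The submit.jsp target and the city field come from the text; the state and zip field names, the labels, and the GET method are assumptions made for illustration.

```xml
<body>
  <h1>X+V Weather Demonstration</h1>
  <!-- Sends the location to the server-side page named in the text. -->
  <form action="submit.jsp" method="get">
    <p>
      <label for="city">City:</label>
      <input type="text" name="city" id="city"/>
    </p>
    <p>
      <!-- "state" and "zip" are hypothetical field names. -->
      <label for="state">State:</label>
      <input type="text" name="state" id="state"/>
    </p>
    <p>
      <label for="zip">Zip code:</label>
      <input type="text" name="zip" id="zip"/>
    </p>
    <p><input type="submit" value="Get Weather"/></p>
  </form>
</body>
```

Giving each input an id as well as a name makes it easy to refer to the field later from voice markup.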

Author’s Note: The server-side code doesn’t interest us in this article, because all of the multimodal interface work is being performed on the client side. If you’re interested in seeing a server-side voice application, see my previous article, “Creating Voice Applications Using VoiceXML and the IBM Voice Toolkit.”

Once you create the XHTML, which you can do by hand or using your favorite Web authoring tools, the next thing to do is start adding the voice interface. This resides in the <head> element of your document, giving you a primitive way to separate your content from its presentation.

Figure 1. How’s the Weather? This is the entry page for the sample weather application.

It’s easiest to add the voice content within WebSphere Studio using the Multimodal Toolkit. To add the voice interface, in this example, you must:

  1. Open the X+V file in the WebSphere Studio Editor.
  2. Position the cursor where you want the editor to place the X+V content, at the end of the <head> block (I like to insert a blank line or two to keep things readable around the tags I insert).
  3. Place the VoiceXML <vxml:form> tag by pressing Ctrl+Space and choosing it from the Content Assist menu.
  4. Name the VoiceXML tag by giving it an id attribute.
  5. Use the Multimodal Toolkit’s Reusable Dialog Wizard (right-click the source editor and choose the wizard) to select the usamajorcity item.
  6. Edit the resulting markup so that the response goes into the city field of the form: in the first generated tag, change ‘VARusmajorcityUSMajorCity’ to ‘city’.

After this sequence of events, your <head> element looks like this:

    X+V Weather Demonstration                                                                                       
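
For orientation, here is a hand-written sketch of the general shape of a <head> element that carries a voice form. It is not the wizard's output: the id voice_city, the prompt text, and the three-city inline JSGF grammar are assumptions that stand in for the reference to IBM's prebuilt dialog, and inline grammar support varies by browser. It also assumes the vxml: prefix is bound to the VoiceXML namespace on the html element and that the visual form has an input with the id city.

```xml
<head>
  <title>X+V Weather Demonstration</title>
  <!-- "voice_city" is a hypothetical id; use the id you chose in step 4. -->
  <vxml:form id="voice_city">
    <vxml:field name="city">
      <vxml:prompt>Which city?</vxml:prompt>
      <!-- Placeholder grammar; the wizard references IBM's prebuilt dialog instead. -->
      <vxml:grammar><![CDATA[
        #JSGF V1.0;
        grammar city;
        public <city> = Boston | Chicago | Seattle;
      ]]></vxml:grammar>
      <vxml:filled>
        <!-- Copy the recognized value into the visual input whose id is "city". -->
        <vxml:assign name="document.getElementById('city').value" expr="city"/>
      </vxml:filled>
    </vxml:field>
  </vxml:form>
</head>
```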

Note that the voice toolkit has inserted a reference to a pre-built dialog provided by IBM, rather than the dialog itself. Through the last tag, it has also inserted some additional code, which you don’t need, that returns the utterance as well as the interpreted speech to the server. You can choose to comment this out or remove it altogether, unless you’re doing work with a recognizer on the back end (or want to log utterances somewhere in order to investigate complaints about missed recognition events, which is handy during field tests). The wizard will also have added text form elements to the document’s form, which you’ll want to remove; you’ll find those in the document’s body.

You’ve now specified the voice equivalent of an XHTML form element, using the predefined voice form element provided by IBM. The only remaining work is to link the two, so that when the city field has focus, the VoiceXML form element is active. You create this link using XML Events, which are defined in a W3C specification of the same name. The event your forms must watch for is the focus event, which the browser raises when its focus changes from one input to another. Each event must also have a handler, which indicates what should be active when the client triggers the event. XML Events handlers are bound to the HTML element that generates the event. Therefore, you link the text input to the voice form input in the input element, like this:
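
A minimal sketch of that binding, assuming the ev: prefix is bound to the XML Events namespace on the html element and that the VoiceXML form was given the id voice_city (a hypothetical name):

```xml
<!-- ev:event names the event to listen for; ev:handler points at the
     VoiceXML form to activate. "voice_city" is a hypothetical id and
     must match the id of your own vxml:form. -->
<input type="text" name="city" id="city"
       ev:event="focus" ev:handler="#voice_city"/>
```

With this in place, moving focus to the city field should activate the voice form, so the user can either type or speak the city name.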

    

You can see the final bits of code in Listing 2.

The Key Benefits of the IBM Multimodal Toolkit
As you’ve just seen, the IBM Multimodal Toolkit provides several advantages over hand-coding your X+V interfaces. First, having multimodal browsers in which to test your code is priceless. Of course, you could download just the appropriate handheld client for your work, but having to switch between your development workstation and your handheld for each bit of testing and debugging is a real chore, and the only other option would be going entirely without. Driving at night in the fog without headlights isn’t my idea of fun, and neither is debugging a Web application without a client on my development machine!

Aside from giving you an excellent test tool, the IBM Multimodal Toolkit also provides an excellent collection of wizards to speed the coding of common tasks, such as defining specific kinds of form input and often-used grammars. Key among these are the prebuilt snippets of X+V code, such as tested dialog components for entering addresses, credit card numbers, URLs, email addresses, and so on. Each of us, as a developer, builds a library of snippets for such things; why not leverage the library built by a leader in the field?

Because IBM continues to lead and leverage global standards, the IBM Multimodal Toolkit is an excellent way to get your feet wet in writing multimodal applications, whether you’re about to deploy a mobile Web solution or just keeping current with the latest trends.
