Creating Voice Applications Using VoiceXML and the IBM Voice Toolkit

I’ll be honest. I’ve always been a sucker for voice-navigated applications. At the first startup I worked at, two of us were given Macintosh Quadras as development workstations, and we spent more office time that first week playing with the speech recognition engine than actually cutting code. A good voice application (which admittedly the Finder running on the Quadra wasn’t!) is a dream to use, freeing users from the traditional ball-and-chain of a keyboard and monitor. Unfortunately, until recently, the tools to build such an application were out of reach of all but a few. The IBM alphaWorks Voice Toolkit preview puts professional tools for developing voice applications in the hands of every developer, through evaluation or professional licenses of the Rational Software Development Platform and WebSphere Studio.

Installation
Installation is straightforward, albeit slow, unless you’re already running the Rational and WebSphere Studio suites. Your development workstation must be running Windows 2000 or Windows XP, and in addition to downloading the IBM Voice Toolkit, you will need to download evaluation or professional versions of both the Rational and WebSphere suites; a minimal install can span almost two gigabytes. Installing the tool chain is a multi-step process: the download consists of image files that an extraction program combines into CD images, from which you install the necessary tool chain and, finally, the IBM Voice Toolkit. Mercifully, you do not have to burn the CD images to disc first!

What You Can Do with the IBM Voice Toolkit
The toolkit gives you everything you need to get a prototype of a voice application up and running, from a front-end call simulator that lets you emulate incoming calls from a call center, to tools for editing grammars and application flow, all packaged as plug-ins for the Rational integrated development environment. Once you finish prototyping, you can transition to a production-ready platform using WebSphere Studio and the WebSphere Voice Response system, which answers incoming calls from users and can optionally originate outgoing calls to them.

The actual application development process varies, depending on the kind of voice application you’re creating, but relies heavily on your knowledge of VoiceXML and integration with your back-end Web infrastructure using Java and HTTP.

Introducing VoiceXML
While you’re downloading and processing the installer images, it’s a good time to brush up on (or learn) VoiceXML, the XML application that defines the user interface of applications developed with the toolkit. VoiceXML plays the same role for voice applications that the widget library of choice plays for a traditional GUI. Because it is written in XML, the syntax should be familiar to you; all you need to know are the tags used in VoiceXML. Here’s a simple example:

 
Hello, world!

The structure of this document should be familiar. After the XML preamble, which specifies the XML version, the encoding scheme, and the document type (which is a VoiceXML 2.0 document meeting the 2.0 DTD generated by the IBM Voice Toolkit), the document itself follows. This document consists of a single form, the top-level entity in VoiceXML. This form has a single block, a spoken segment that does not require user input.
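A document matching this description might look like the following minimal sketch. It assumes the public W3C VoiceXML 2.0 DOCTYPE and UTF-8 encoding; the declaration that the IBM Voice Toolkit generates may well differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumption: the public W3C DOCTYPE; the toolkit may emit its own DTD reference -->
<!DOCTYPE vxml PUBLIC "-//W3C//DTD VOICEXML 2.0//EN"
  "http://www.w3.org/TR/voicexml20/vxml.dtd">
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- A single form containing a single block: speech output only, no user input -->
  <form>
    <block>Hello, world!</block>
  </form>
</vxml>
```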

In the VoiceXML paradigm, your user interface is modeled as a finite state machine; inputs and outputs are individual states, expressed as forms. Some forms, like the one in the previous example, are output-only, directing the application to speak to the user. Others are input/output forms, with fields that the user populates through speech (called utterances in VoiceXML documentation). Forms can be named, and the execution through a path of forms can be driven by the VoiceXML content itself, as you can see from Listing 1.

Listing 1’s admittedly an artificial example (short of a talking vending machine or the matter generator in Star Trek, there’s little use for a speech interface to serve coffee drinks), but it points to many of the key aspects of VoiceXML programming.

Starting at the top of the listing, you see how to declare variables of global scope using the var tag. Note that when setting a variable, if you want to specify a literal value, you must enclose it in single quotes. Thus, the skipintro variable is being set to the string ‘play’; if I omit the single quotes, it is instead set to the value of the variable play.
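The quoting rule is easy to get wrong, so here is a minimal sketch of it. The skipintro name comes from the text; the introcopy variable and the start form are hypothetical and exist only to show the contrast.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Single quotes inside expr: skipintro holds the literal string 'play' -->
  <var name="skipintro" expr="'play'"/>
  <!-- No single quotes: expr is evaluated, so introcopy (hypothetical) gets the value of skipintro -->
  <var name="introcopy" expr="skipintro"/>
  <form id="start">
    <block>Variables declared.</block>
  </form>
</vxml>
```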

The next two blocks consist of top-level links, which handle utterances that you can speak at any point in the application. The first indicates that if you say either “Menu” or “Start over,” the application will restart. The second indicates that if you say “Goodbye” or “Exit,” the application will exit.
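A pair of document-level links along these lines could be sketched as follows. The form names intro and goodbye are assumptions, and routing the second link to a form that calls exit is only one way to end the session; the original listing may do it differently.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Active throughout this document: "menu" or "start over" returns to the first form -->
  <link next="#intro">
    <grammar version="1.0" root="restart" mode="voice">
      <rule id="restart" scope="public">
        <one-of>
          <item>menu</item>
          <item>start over</item>
        </one-of>
      </rule>
    </grammar>
  </link>
  <!-- "goodbye" or "exit" jumps to a form that ends the session -->
  <link next="#goodbye">
    <grammar version="1.0" root="quit" mode="voice">
      <rule id="quit" scope="public">
        <one-of>
          <item>goodbye</item>
          <item>exit</item>
        </one-of>
      </rule>
    </grammar>
  </link>
  <!-- Hypothetical forms; a real application would go on to collect input here -->
  <form id="intro">
    <block>Welcome.</block>
  </form>
  <form id="goodbye">
    <block>Goodbye.<exit/></block>
  </form>
</vxml>
```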

The application itself consists of a series of forms. Navigation between the forms occurs using the goto tag, which simply references the inline name of the destination form. The goto tag can also reference an entirely separate document by specifying the URL of the destination document. You can use this mechanism to chain to other VoiceXML documents or to trigger server-side scripts that return new, dynamic VoiceXML content. Note that the first form uses an if-then construct to skip the introductory text if needed, such as when the top-level “Start Over” action is taken.
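As a rough sketch of this navigation pattern, the fragment below skips the introduction once it has been played, jumps to a form in the same document, and then chains to a separate document. The form names, the prompts, and the example.com URL are all hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <var name="skipintro" expr="'play'"/>
  <form id="intro">
    <block>
      <!-- Play the introduction only the first time through -->
      <if cond="skipintro == 'play'">
        <prompt>Welcome to the coffee counter.</prompt>
        <assign name="skipintro" expr="'skip'"/>
      </if>
      <!-- A leading # refers to a form in this same document -->
      <goto next="#order"/>
    </block>
  </form>
  <form id="order">
    <block>
      <prompt>One moment while we fetch the menu.</prompt>
      <!-- A full URL chains to another document or a server-side script (hypothetical address) -->
      <goto next="http://example.com/menu.vxml"/>
    </block>
  </form>
</vxml>
```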

The second form takes a single input, the kind of beverage you want to order. This is an example of the use of a prompt tag, which causes the application to prompt the user for input and pause until it’s received. You must accompany a prompt tag with a grammar tag that indicates valid responses to the query; the application server uses these to tune the recognizer and determine the appropriate course of action. The grammar in Listing 1 is simple but representative; it outlines a series of responses and uses the grammar’s tag element to indicate the class to which a specific response belongs (for example, chai is a type of tea). This use of tags can be very helpful in applications where responses really select types of things, or to map a group of synonyms to a single response.
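The sketch below shows one way such a field could be written. It assumes a recognizer that accepts inline XML grammars with string-literal semantic tags (the semantics/1.0-literals tag format); the tag syntax accepted by the toolkit’s recognizer may differ, and the field name and vocabulary are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="order">
    <field name="drink">
      <prompt>Would you like coffee or tea?</prompt>
      <!-- Assumption: string-literal tags; each tag names the class of the spoken item -->
      <grammar version="1.0" root="drink" mode="voice"
               tag-format="semantics/1.0-literals">
        <rule id="drink" scope="public">
          <one-of>
            <item>coffee<tag>coffee</tag></item>
            <item>espresso<tag>coffee</tag></item>
            <item>tea<tag>tea</tag></item>
            <item>chai<tag>tea</tag></item>
          </one-of>
        </rule>
      </grammar>
    </field>
  </form>
</vxml>
```

With this grammar, saying either “chai” or “tea” fills the drink field with the class tea.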

Prompt tags would be useless without the ability to respond to user input. The filled tag, in conjunction with an if-then tag, provides you with a way to act on the user’s response to the prompt. This tag lets you set variables or document properties to the class of the response, or the actual value of the utterance made by the user. You can also execute conditionals based on these values, selecting the next form to be played based on the content of the utterance or the class of the response.
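A filled handler along these lines might look like the sketch below. The variable names category and heard, the destination forms, and the string-literal tag format are assumptions; drink$.utterance is the standard VoiceXML 2.0 shadow variable holding the words that were recognized.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <var name="category"/>
  <var name="heard"/>
  <form id="order">
    <field name="drink">
      <prompt>Coffee or tea?</prompt>
      <grammar version="1.0" root="drink" mode="voice"
               tag-format="semantics/1.0-literals">
        <rule id="drink" scope="public">
          <one-of>
            <item>coffee<tag>coffee</tag></item>
            <item>chai<tag>tea</tag></item>
            <item>tea<tag>tea</tag></item>
          </one-of>
        </rule>
      </grammar>
      <filled>
        <!-- The field variable holds the class of the response -->
        <assign name="category" expr="drink"/>
        <!-- The shadow variable holds the words the user actually spoke -->
        <assign name="heard" expr="drink$.utterance"/>
        <if cond="category == 'tea'">
          <goto next="#tea"/>
        <else/>
          <goto next="#coffee"/>
        </if>
      </filled>
    </field>
  </form>
  <form id="tea">
    <block>Steeping your tea.</block>
  </form>
  <form id="coffee">
    <block>Brewing your coffee.</block>
  </form>
</vxml>
```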

The final forms process the selection you made from the menu prompt, and show you how to speak the value of a VoiceXML variable within a block or prompt: use the value tag and specify the variable to evaluate.
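As a minimal sketch, assuming a document-level variable named category that an earlier form has already set (it is preset here so the fragment stands alone):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Hypothetical variable; in a full application a filled handler would assign it -->
  <var name="category" expr="'tea'"/>
  <form id="confirm">
    <block>
      <!-- value evaluates the expression and speaks the result in place -->
      <prompt>One <value expr="category"/> coming right up.</prompt>
    </block>
  </form>
</vxml>
```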

VoiceXML has many other facets beyond what can be covered here. For example, you can embed references to specific recorded sound samples, such as alert tones or prerecorded speech, to be played during specific states of your application. You can also embed pieces of Speech Synthesis Markup Language (SSML) within your VoiceXML application, letting you fine-tune the pronunciation and emphasis of specific voice prompts. And, of course, VoiceXML is fully internationalized; its implementation in the IBM Voice Toolkit supports many of the major languages of the industrialized world.
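To give a flavor of these two features, the sketch below plays a recorded sample with a spoken fallback and then uses two SSML elements inside a prompt. The file name chime.wav and the wording are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="ready">
    <block>
      <prompt>
        <!-- Recorded sample (hypothetical file); the enclosed text is spoken if it cannot be played -->
        <audio src="chime.wav">Ding.</audio>
        <!-- SSML markup: stress one word, then pause briefly -->
        Your order is <emphasis>ready</emphasis>.
        <break time="500ms"/>
        Please pick it up at the counter.
      </prompt>
    </block>
  </form>
</vxml>
```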

Integrate Your VoiceXML and Your Existing Services with Java
The combination of VoiceXML and an application server like IBM’s is interesting, but it’s not the whole story. With VoiceXML and server-side Java, you can do a lot, building exactly the same sorts of applications you build today using XHTML and server-side scripting. The WebSphere Voice Response API takes things further, letting you initiate calls and perform programmatic actions that are hard to do with VoiceXML and server-side scripting alone.

The Voice Response API is based around the notion of a voice application, encapsulated in a WVRApplication class. This class has its own entry point, voiceMain, from which you can determine the characteristics of the current connection with an end user through a Call object. You also have access to a WVR (presumably this stands for WebSphere Voice Response) object, which lets you make and receive calls and handle individual voice segments. In fact, it’s entirely possible to code an entire application using just the Voice Response API and the WebSphere Voice Response system, but you really shouldn’t do that; using VoiceXML to encapsulate as much of your user interface as possible makes localization and extension much easier, just as separating a Web site’s style directives from its data does. Indeed, as you look at the WebSphere Voice Response API, it becomes pretty clear that the API is either a wrapper around the internals of the Voice Response system or a significant part of its foundation, depending on your point of view.

Use of the Voice Response API is fairly simple and clearly documented; the Voice Toolkit has some excellent tutorials that walk you through the gamut of interfaces available. The Voice Response API shines when you must integrate a Web application with outgoing calls, such as calls initiated by database triggers. For example, an outside plant management application might use a database trigger and the Voice Response API to call the cell phone of a maintenance worker when a failure is detected in an automated system, and then actually describe the failure over the call.

A Soup-to-nuts Environment
The IBM Voice Toolkit preview is interesting not just in what it offers through its support of the latest standards in voice applications, but in its integration with a world-class development and deployment platform. It provides a soup-to-nuts environment for building voice applications, with plenty of help along the way (for example, there are graphical editors to ease the writing of the grammar segments of your VoiceXML, and a way to execute a VoiceXML file and interact with it right from the Rational IDE). It’s an excellent way to develop and deploy a voice application, or, if you’re an independent developer curious to find out what really happens when you call your local credit card company, build a voice application prototype of your own.
