
Top 10 Tips for Designing Telephony Applications

There’s a strong chance that you’ve used a speech-based application at least once (to access your bank information, check the status of an airline flight, or get the latest weather), because the number and breadth of speech-based applications are growing every day. One reason for this growth is obvious: the increasing use of cell phones. Another is the expectation that information must be available quickly and easily twenty-four hours a day, seven days a week. Most companies cannot afford to keep staff on hand to answer telephones at all times of the day and night, so telephony applications that let callers access data by telephone have emerged as a logical choice.

For telephony applications, the user interface is the telephone itself. These applications receive input through spoken commands or through DTMF (Dual Tone Multi-Frequency) tones, which are generated when users press keys on the telephone keypad. Telephony applications are not new; they’re also known as Interactive Voice Response (IVR) systems, and large organizations and call centers have been using them for years. Recently, however, the technology behind speech recognition and speech synthesis has advanced significantly.

Microsoft Speech Server, first introduced in 2004, includes an SDK (Software Development Kit) that .NET developers use to build telephony applications with Visual Studio .NET. This article is not an introduction to using the Microsoft Speech Application SDK (SASDK); for that, you can refer to the online tutorial that accompanies the SDK or to chapters two through four of my book, “Building Intelligent .NET Applications: Agents, Data Mining, Rule-based Systems, and Speech Processing.” Instead, this article presents 10 topics I think you should consider before writing a telephony application. I chose these items based on personal experience gained while building applications with the Microsoft Speech Server SDK, as well as on my opinions as a user of telephony applications.

Tip 1: Allow Users to Interrupt All Prompts
Recently I called my bank because I noticed some unusual activity on my account. I was told to contact the bank’s fraud hotline. When you call the hotline, a female voice recites a welcome message and then asks you to press “1” to hear options in English. But after pressing “1,” callers are forced to listen to all the available menu choices again. For some reason, the designers of this telephony application did not allow callers to barge in and enter a menu choice; instead, callers must wait and listen to the prompts for all four menu choices before the application lets them select one. Some people might argue that it’s a good idea to force callers to listen to all the choices before making a selection, but I believe all that does is aggravate users. You should always allow users to interrupt prompts, a process called “barging in.”

A prompt is a message that the system plays to users to ask them for input. The SASDK allows you to set a BargeIn property for all prompts. The property accepts a Boolean value that indicates whether the prompt control will stop playing and accept input as soon as the caller speaks or presses a key. The default value for the prompt control is True, so you typically don’t need to set this value explicitly. Before changing the value to False, consider the effect that the change might have on user acceptance.

Tip 2: Use Dynamic Grammars when Content Is Unknown or Changes Often
Just like traditional applications, telephony applications typically interact with a database. Because the data within a database is dynamic, the information you collect from the user may need to be dynamic as well.

Grammars represent what the user says to the application. Developers building telephony applications with the SASDK can use a graphical grammar editor to design the Extensible Markup Language (XML) based files that comprise the grammar. The resulting grammar files have a .grxml file extension. The speech recognition engine uses them to understand what the user is saying. Each interaction with the user is associated with one or more of these grammar files.

Dynamic grammars are built each time a Web page is executed. The following C# code shows a dynamic grammar built using the results of a stored procedure call.

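   StringBuilder s = new StringBuilder();

   // Open the grammar, rule, and one-of elements. The markup in these
   // string literals is assumed SRGS markup (element and attribute
   // names are an assumption); adjust it to match your application.
   s.Append("<grammar version='1.0' xml:lang='en-US' " +
      "xmlns='http://www.w3.org/2001/06/grammar' " +
      "tag-format='semantics-ms/1.0' root='" + Name + "'>");
   s.Append("<rule id='" + Name + "' scope='public'>");
   s.Append("<one-of>");

   // Add one grammar item for each row the stored procedure returns
   SqlDataReader dr = SqlHelper.ExecuteReader(connstr,
      CommandType.StoredProcedure, sp );

   while ( dr.Read() ) {
      s.Append(ConvertGrammarItem(Name,dr.GetString(0)));
   }

   // Close the elements in reverse order
   s.Append("</one-of>");
   s.Append("</rule>");
   s.Append("</grammar>");

   // Attach the finished grammar to the QA control
   Microsoft.Speech.Web.UI.Grammar gram = new Grammar();
   gram.InlineGrammar = s.ToString();
   QAControl.Reco.Grammars.Add(gram);
   QAControl.Visible = true;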

In the next-to-last line, the application associates the resulting grammar with the question-and-answer control named QAControl. The following ConvertGrammarItem method builds the individual grammar items.

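   private string ConvertGrammarItem(string Name,
      string GrammarItem ) {

      StringBuilder s = new StringBuilder();

      // The item and tag element names are assumed SRGS markup.
      // The tag assigns the recognized phrase to the semantic
      // property named by the Name parameter.
      s.Append("<item>");
      s.Append("<item>");
      s.Append(GrammarItem);
      s.Append("</item>");
      s.Append("<tag>$." + Name + " = \"" + GrammarItem +
         "\"</tag>");
      s.Append("</item>");

      return s.ToString();
   }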
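One thing worth checking when grammar text comes from a database is that values containing characters such as & or < do not break the XML. The following is a minimal, language-neutral sketch in Python rather than SASDK code; the function name build_grammar and the sample values are hypothetical, and the element names assume SRGS-style markup.

```python
from xml.sax.saxutils import escape, quoteattr


def build_grammar(rule_name, phrases):
    """Return an SRGS-style grammar string with one <item> per phrase.

    rule_name and phrases would normally come from the database;
    both are escaped so that special characters cannot break the XML.
    """
    parts = [
        '<grammar version="1.0" xml:lang="en-US" '
        'xmlns="http://www.w3.org/2001/06/grammar" '
        f"root={quoteattr(rule_name)}>",
        f"<rule id={quoteattr(rule_name)}>",
        "<one-of>",
    ]
    for phrase in phrases:
        # escape() replaces &, < and > with XML entities
        parts.append(f"<item>{escape(phrase)}</item>")
    parts += ["</one-of>", "</rule>", "</grammar>"]
    return "".join(parts)
```

Calling build_grammar("Airline", ["Delta", "AT&T Air"]) yields a grammar in which the second item reads AT&amp;T Air, so the recognizer receives well-formed XML even when the data is messy.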

Tip 3: Let Users Enter Multiple Items with One Command
In many cases it is faster for a user to speak a complex command than to type in a query or click a series of controls. For example, to make an airline reservation, you need to provide several key pieces of information, such as departure and destination locations, arrival dates and times, and travel preferences. Online systems typically collect this information through a series of text and combo boxes, but a telephony system collects it entirely through spoken commands. Such applications could prompt users for each piece of information required. This is the most straightforward approach, and it involves a relatively simple grammar. However, it is faster to give users the option of speaking more than one piece of information at a time. For instance, they might be allowed to say “Find me all Delta flights out of Louisiana for October 18th.” The downsides to allowing this type of query are that the potential for error or misunderstanding increases and that developers must spend more time building the grammar. Ease of use versus development resources is a tradeoff that you should consider when developing telephony applications.
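To make the tradeoff concrete, here is a rough Python sketch of the idea: pull whatever pieces of information the caller supplied out of one utterance, then prompt only for what is still missing. It uses regular expressions instead of a real speech grammar, and the slot names, the AIRLINES list, and both function names are assumptions made for illustration.

```python
import re

# Hypothetical sample vocabulary; a real system would load this from data.
AIRLINES = ("delta", "united", "american")
MONTHS = ("january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december")


def extract_slots(utterance):
    """Fill as many slots as the caller supplied in one utterance."""
    text = utterance.lower()
    slots = {"airline": None, "origin": None, "date": None}

    for name in AIRLINES:
        if re.search(rf"\b{name}\b", text):
            slots["airline"] = name
            break

    # Origin: the words after "out of" or "from", up to "for", "on", "to"
    # or the end of the utterance.
    m = re.search(r"\b(?:out of|from)\s+([a-z ]+?)(?=\s+(?:for|on|to)\b|$)",
                  text)
    if m:
        slots["origin"] = m.group(1).strip()

    # Date: a month name followed by a day such as "18" or "18th".
    m = re.search(rf"\b({'|'.join(MONTHS)})\s+(\d{{1,2}})(?:st|nd|rd|th)?\b",
                  text)
    if m:
        slots["date"] = (m.group(1), int(m.group(2)))

    return slots


def missing_slots(slots):
    """Names of the slots the application still has to prompt for."""
    return [name for name, value in slots.items() if value is None]
```

With the example sentence from the paragraph above, all three slots are filled and no follow-up prompt is needed; a caller who says only “flights from Boston” would still be asked for the airline and the date.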

Tip 4: Limit the Number and Depth of Choices
The overall goal of a telephony application should be to make information available to the caller as quickly as possible. For the same reasons users of traditional Web-based applications do not want to waste time clicking through nested menus, telephony callers do not want to waste time navigating complex voice-only menus. This means that as the designer of a telephony application, you will need to limit the number of menu choices and nested submenus. I recommend limiting each menu to no more than three choices, and submenu depth to no more than two levels.
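These limits are easy to check automatically during design. The sketch below, written in Python for illustration, walks a menu tree and reports any menu that breaks the three-choice, two-level guideline; the nested-dictionary representation and the function name menu_problems are assumptions, not part of the SASDK.

```python
MAX_CHOICES = 3  # choices per menu
MAX_DEPTH = 2    # submenu levels below the main menu


def menu_problems(menu, path="main", depth=0):
    """Return a list of guideline violations found in a menu tree.

    menu maps each choice label to a nested submenu dict,
    or to None when the choice is a leaf.
    """
    problems = []
    if len(menu) > MAX_CHOICES:
        problems.append(f"{path}: {len(menu)} choices (limit {MAX_CHOICES})")
    for label, submenu in menu.items():
        if submenu:
            if depth + 1 > MAX_DEPTH:
                problems.append(
                    f"{path} > {label}: submenu nested deeper than "
                    f"{MAX_DEPTH} levels")
            problems.extend(
                menu_problems(submenu, f"{path} > {label}", depth + 1))
    return problems
```

An empty result means the menu design stays within the recommended limits; each string in a non-empty result names the menu that needs to be trimmed.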

In addition to limiting the number and depth of menus, you also need to limit the number of words in each menu choice. If you force users to speak a four word menu choice in its entirety, they’re likely to make mistakes and become frustrated. You can reduce the likelihood of mistakes by defining the grammar carefully. For example, if a menu choice has four words, such as “get bank account balance,” make sure that the grammar also allows users to say “get bank balance,” or “account balance.”
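One way to think about such a grammar is as a phrase in which some words are optional. The Python sketch below enumerates the phrases that result; the (word, required) representation and the function name expand_variants are assumptions made for illustration, and a real grammar would express the same idea with optional items rather than an enumerated list.

```python
from itertools import product


def expand_variants(words):
    """Return every phrase formed by keeping or dropping optional words.

    words is a list of (word, required) pairs in spoken order.
    """
    # A required word has one option; an optional word may also be omitted.
    options = [(word,) if required else (word, "")
               for word, required in words]
    phrases = set()
    for combo in product(*options):
        phrases.add(" ".join(word for word in combo if word))
    return phrases


# "balance" is the only word the caller must say.
BALANCE_CHOICE = [("get", False), ("bank", False),
                  ("account", False), ("balance", True)]
```

expand_variants(BALANCE_CHOICE) produces eight phrases, including “get bank balance,” “account balance,” and the full four-word choice.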

Tip 5: Keep Prompts Short and Simple
Limiting the length of prompt messages helps prevent overwhelming users with too much information. Experienced users of telephony applications often know exactly what piece of information they are trying to retrieve. A long prompt that goes into great detail about all the available options will aggravate anyone but a first-time caller, and even for first-time callers, too much information can be confusing or overwhelming.

As an alternative to lengthy instructional prompts, you can include help commands, designing the application so that users can access help at any time by simply saying “help” or “I need help.” The help messages can change depending on context, that is, on where the user is in the application. Remember to state up front that help is available; you might also have to remind users of the help command periodically.

Limiting the length of the prompt message is especially important for the welcome message, which a telephony application plays when a call is first initiated. Messages that are too long or complex will discourage first-time callers from using the system. Return callers will be frustrated and will generally tune out the message, even if it changes and contains some piece of valuable information.

Tip 6: Make Sure Prompts Are Understandable
A telephony application built with the Microsoft SASDK uses a prompt database to store all potential prompts. Your applications can deliver prompts created using text-to-speech technology or associated with pre-recorded messages. The pre-recorded messages can be recorded using the developer’s voice or a professional voice talent. If you choose to record prompts, be sure to have the recorded messages spoken slowly and clearly so that all users can understand them. In addition, when your telephony application will be accessed by callers from different areas of the country or outside the United States, you may want to consider using regional dialects.

The Microsoft SASDK gives you the ability to tune both pre-recorded and text-to-speech prompts. For text-to-speech prompts, you can use SSML (Speech Synthesis Markup Language) tags to specify volume and control the speech rate. For example, the following code uses the ssml:prosody element to adjust the pitch and rate of the spoken text. It is wrapped inside the ssml:speak tag, which is a required root element for SSML.

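   <ssml:speak>
      <!-- The pitch and rate values shown here are illustrative -->
      <ssml:prosody pitch="low" rate="slow">
         No flights are available on October 18th
      </ssml:prosody>
   </ssml:speak>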
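When prompt text is assembled at run time, it helps to generate the prosody markup in one place so that every prompt uses valid settings. The following Python sketch is only an illustration: the function name prosody_prompt is hypothetical, the ssml: prefix is assumed to be declared by the hosting page, and only the named pitch and rate labels defined by SSML are accepted (SSML also allows numeric values, which this sketch ignores).

```python
from xml.sax.saxutils import escape

# Named labels that SSML defines for the prosody element.
PITCHES = {"x-low", "low", "medium", "high", "x-high", "default"}
RATES = {"x-slow", "slow", "medium", "fast", "x-fast", "default"}


def prosody_prompt(text, pitch="medium", rate="slow"):
    """Wrap prompt text in speak and prosody elements."""
    if pitch not in PITCHES:
        raise ValueError(f"unsupported pitch label: {pitch}")
    if rate not in RATES:
        raise ValueError(f"unsupported rate label: {rate}")
    return (
        "<ssml:speak>"
        f'<ssml:prosody pitch="{pitch}" rate="{rate}">'
        f"{escape(text)}"  # protect &, < and > in dynamic text
        "</ssml:prosody>"
        "</ssml:speak>"
    )
```

Rejecting unknown labels early means a typo in a rate or pitch setting surfaces during development rather than as an oddly spoken prompt on a live call.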

The spectrogram view, part of the Microsoft SASDK’s built-in Prompt Editor, can help you tune pre-recorded messages. When a recording is imported into the prompt database, the speech recognizer does a good job of aligning the recording with the typed text; however, sometimes adjustments are needed to make the prompt sound clearer. The spectrogram view lets developers adjust the word boundaries for imported recordings.

Tip 7: Provide Help, Back, and Repeat Commands
At any time during a call, telephony users should have the option to ask for help, go back to the main menu, or repeat a prompt. Your application also needs to tell callers that these options are available, and it is not enough to mention them only at the beginning of the call. Periodically, or at the end of each question, you need to remind callers with a prompt such as, “At any time, you can ask for help, go back to the main menu, or repeat a prompt.”
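The logic behind these global commands can be sketched independently of any speech platform. In the Python sketch below, every utterance is checked against the global commands before the current question's own grammar is consulted; the class name CallSession, the state names, and the prompt wording are all hypothetical.

```python
# Hypothetical prompts and state names, for illustration only.
MAIN_MENU_PROMPT = "Main menu. Say balance, transfers, or agent."
DEFAULT_HELP = "You can say help, go back, or repeat at any time."
HELP_BY_STATE = {
    "main_menu": "Say balance, transfers, or agent.",
    "account_number": "Say or key in your ten-digit account number.",
}


class CallSession:
    """Tracks where the caller is and what was last asked."""

    def __init__(self):
        self.state = "main_menu"
        self.last_prompt = MAIN_MENU_PROMPT

    def ask(self, state, prompt):
        """Move to a state and remember the prompt so it can be repeated."""
        self.state = state
        self.last_prompt = prompt
        return prompt

    def handle_global(self, utterance):
        """Return the prompt to play for a global command, else None."""
        text = utterance.strip().lower()
        if text in ("help", "i need help"):
            # Help depends on context; last_prompt is left unchanged so
            # that "repeat" still replays the original question.
            return HELP_BY_STATE.get(self.state, DEFAULT_HELP)
        if text in ("repeat", "repeat that"):
            return self.last_prompt
        if text in ("go back", "main menu"):
            return self.ask("main_menu", MAIN_MENU_PROMPT)
        return None
```

A return value of None tells the caller of handle_global that the utterance was not a global command and should be matched against the grammar for the current question instead.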

You can use user controls to simplify the implementation of these commands. Just as in an ASP.NET application, user controls help to group similar functionality used throughout multiple Web pages. For example, you could place the following HTML tags inside a user control included with each Web page in an application.

                                                                          

Tip 8: Log and Review Exceptions
Telephony developers may need to debug their applications long after the initial development process. Logging the results of interactions with actual users is critical to the success of these applications. By default, Microsoft Speech Server logs each call and stores the results in a Windows event trace log file with an .etl (event trace log) extension. Developers can then import these files into a SQL database and use tools provided with the SASDK to extract data from the interactions.

Obviously, developers will want to monitor these logs closely after an application is first deployed. However, after a telephony application is considered stable, this exercise should not stop. You should establish periodic reviews of the logs to monitor the success of user interactions.
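A periodic review can start with something as simple as the share of failed recognitions per prompt. The Python sketch below assumes the log data has already been exported into records with a prompt name and an outcome; that record layout, the outcome labels, and the 25 percent threshold are assumptions for illustration and do not reflect the actual Speech Server log schema.

```python
from collections import Counter


def failure_rates(records):
    """Return the fraction of unsuccessful interactions per prompt.

    Each record is a dict with a "prompt" name and an "outcome";
    any outcome other than "recognized" counts as a failure.
    """
    totals, failures = Counter(), Counter()
    for record in records:
        totals[record["prompt"]] += 1
        if record["outcome"] != "recognized":
            failures[record["prompt"]] += 1
    return {prompt: failures[prompt] / totals[prompt] for prompt in totals}


def prompts_needing_review(records, threshold=0.25):
    """Names of prompts whose failure rate exceeds the threshold."""
    rates = failure_rates(records)
    return sorted(p for p, rate in rates.items() if rate > threshold)
```

Prompts that show up in this list release after release are good candidates for rewording or for a more forgiving grammar.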

Tip 9: Allow Time for Application Design, Testing, and Revisions
In the world of speech application development, the GUI (Graphical User Interface) has been replaced with the VUI (Voice User Interface), which has different challenges and considerations than traditional applications. To identify and face these new obstacles, telephony developers should allow plenty of time for application design. The call flow should be designed carefully so that it guides the user through the application and never abandons them. Tools such as Visio can be helpful in diagramming the call flow, making it easy to visually identify all the possible paths a user can take.
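Once the call flow exists as a diagram, it can also be written down as a simple graph and checked for places where a caller could get stuck. The Python sketch below is one way to do that; the dictionary representation, the state names, and the function names are hypothetical.

```python
def reachable(flow, start):
    """Return every state that can be reached from start."""
    seen, stack = set(), [start]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        stack.extend(flow.get(state, []))
    return seen


def call_flow_problems(flow, start, terminals):
    """Find states callers can never reach and states that strand them.

    flow maps each state to the list of states that can follow it;
    terminals are the states where a call is allowed to end.
    """
    live = reachable(flow, start)
    unreachable = sorted(set(flow) - live)
    stranded = sorted(state for state in live
                      if not reachable(flow, state) & set(terminals))
    return {"unreachable": unreachable, "stranded": stranded}
```

A stranded state is one from which no path leads to a proper ending, which is exactly the situation in which the application abandons the caller.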

Tip 10: Strive for Intuitive Prompts and Navigation, Test with Real Users
Developers new to the Microsoft SASDK must realize that although telephony applications can now be built in the same way as traditional Web applications, the two are not the same. The way callers use telephony applications, and the expectations they have for them, are very different from those for traditional applications. Developers should strive to design prompts and navigation so that the flow is natural and intuitive. More importantly, user feedback is critical. Before proceeding too far with the development of a telephony application, perform some type of field test using a sample of actual callers. This will help to ensure positive user acceptance and keep the number of rewrites to a minimum.

The Microsoft Speech Server and SASDK are wonderful new tools every developer should be familiar with. If you have not had a chance to play with this product, I encourage you to visit the Microsoft Speech Server Web site and download a copy of the SASDK.

Speech-based applications will continue to grow in use as Microsoft and other companies remain committed to making speech mainstream. Just as mistakes were made when the number of Web applications seemed to double every day, speech-based applications will face similar challenges as they mature. Developers will be quick to implement telephony applications using newly available tools like Microsoft’s SASDK without first considering the complexity of speech-based applications.
