

Top 10 Tips for Designing Telephony Applications

Using Microsoft Speech Server, .NET developers can build telephony or voice-only applications quickly and easily. This article lists 10 tips to consider before designing these types of applications.





There's a strong chance that you've used a speech-based application at least once, whether to access your bank information, check the status of an airline flight, or get the latest weather, because the number and breadth of speech-based applications grows every day. One reason for this growth is obvious: the increasing use of cell phones. Another is the expectation that information be available quickly and easily, twenty-four hours a day, seven days a week. Most companies cannot afford to keep staff on hand to answer telephones at all hours, so telephony applications that let callers access data by telephone have emerged as a logical choice.

For telephony applications, the user interface is the telephone itself. These applications receive input either as spoken commands or as DTMF (dual-tone multi-frequency) tones, generated when users press keys on the telephone keypad. Telephony applications are not new; also known as Interactive Voice Response (IVR) systems, they have been used by large organizations and call centers for years. Recently, however, the technology behind speech recognition and speech synthesis has advanced significantly.

Microsoft Speech Server, first introduced in 2004, includes an SDK (Software Development Kit) that .NET developers use to build telephony applications in Visual Studio .NET. This article is not an introduction to the Microsoft Speech Application SDK (SASDK); for that, refer to the online tutorial that accompanies the SDK or to chapters two through four of my book, "Building Intelligent .NET Applications: Agents, Data Mining, Rule-based Systems, and Speech Processing." Instead, this article presents 10 topics I think you should consider before writing a telephony application. I chose these items based on personal experience gained while building applications with the SASDK, as well as my opinions as a user of telephony applications.

Tip 1—Allow Users to Interrupt All Prompts
Recently I called my bank because I noticed some unusual activity in my account, and I was told to contact the fraud hotline. When you call the hotline, a recorded female voice recites a welcome message and then asks you to press "1" to hear the options in English. But after pressing "1," you are forced to listen to all four menu choices before the application lets you select one; for some reason, the designers of this telephony application did not allow callers to interrupt and enter a menu choice early. Some people might argue that forcing callers to listen to all the choices before making a selection is a good idea, but I believe all that does is aggravate users. You should always allow users to interrupt prompts, a capability known as "barge-in."

A prompt is a message that the system plays to ask the user for input. The SASDK lets you set a BargeIn property on every prompt. The property accepts a Boolean value that indicates whether the prompt control will stop playing and accept speech or DTMF input as soon as the user responds. The default value for the prompt control is True, so you typically shouldn't need to set this value explicitly. But before changing the value to False, consider the implications that the change might have on user acceptance.
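As a rough sketch of where this property lives, the markup below shows a SASDK-style QA speech control that explicitly re-enables barge-in on its prompt. The control name, prompt text, and exact attribute placement here are illustrative assumptions and may vary by SDK version; the point is simply that BargeIn is set per prompt, not globally.

```xml
<!-- Illustrative sketch only: a QA control whose prompt can be interrupted.
     Control IDs and prompt wording are hypothetical. -->
<speech:QA id="MainMenuQA" runat="server">
  <Prompt BargeIn="True"
          InlinePrompt="For balances press 1, for transfers press 2." />
</speech:QA>
```

Because True is already the default, you would normally only write the attribute at all when documenting intent or when reversing an earlier False setting.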

Tip 2—Use Dynamic Grammars when Content Is Unknown or Changes Often
Just like traditional applications, telephony applications typically interact with a database. Because the data within a database is dynamic, the information you collect from the user may need to be dynamic as well.

Grammars define what the user can say to the application. Developers building telephony applications with the SASDK can use a graphical grammar editor to design the Extensible Markup Language (XML) based files that make up the grammar. The resulting grammar files have a .grxml file extension, and the speech recognition engine uses them to understand what the user is saying. Each interaction with the user is associated with one or more of these grammar files.
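To make the file format concrete, here is a minimal static grammar in the W3C SRGS XML format that .grxml files follow. The rule name, phrases, and semantic property ("AccountType") are illustrative assumptions, not taken from the article; the <tag> elements assign a semantic value that the application reads back after recognition.

```xml
<!-- A minimal illustrative .grxml grammar; rule and item names are hypothetical. -->
<grammar version="1.0" xml:lang="en-US"
         xmlns="http://www.w3.org/2001/06/grammar"
         root="AccountType">
  <rule id="AccountType" scope="public">
    <one-of>
      <item>checking<tag>$.AccountType = "checking"</tag></item>
      <item>savings<tag>$.AccountType = "savings"</tag></item>
    </one-of>
  </rule>
</grammar>
```

A grammar like this recognizes exactly the phrases "checking" and "savings" and returns the matched value as the AccountType property.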

Dynamic grammars are built each time the Web page executes. The following C# code builds a dynamic grammar from the results of a stored procedure call.

StringBuilder s = new StringBuilder();
s.Append("<grammar version=\"1.0\" xml:lang=\"en-US\" ");
s.Append("xmlns=\"http://www.w3.org/2001/06/grammar\" root=\"" + Name + "\">");
s.Append("<rule id=\"" + Name + "\" scope=\"public\"><one-of>");
SqlDataReader dr = SqlHelper.ExecuteReader(connstr,
    CommandType.StoredProcedure, sp);
while (dr.Read())
{
    s.Append(ConvertGrammarItem(Name, dr.GetString(0)));
}
s.Append("</one-of>");
s.Append("</rule>");
s.Append("</grammar>");
Microsoft.Speech.Web.UI.Grammar gram = new Grammar();
gram.InlineGrammar = s.ToString();
QAControl.Reco.Grammars.Add(gram);
QAControl.Visible = true;

In the next-to-last line, the application associates the resulting grammar with the question-and-answer control named QAControl. The following ConvertGrammarItem method builds the individual grammar items.

private string ConvertGrammarItem(string Name, string GrammarItem)
{
    StringBuilder s = new StringBuilder();
    s.Append("<item>");
    s.Append(GrammarItem);
    s.Append("<tag>");
    s.Append("$." + Name + " = \"" + GrammarItem + "\"");
    s.Append("</tag>");
    s.Append("</item>");
    return s.ToString();
}
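To see what one pass through this helper contributes to the inline grammar, suppose (as a purely hypothetical example) that Name is "City" and the stored procedure returns the value "Seattle". The method would then emit a fragment along these lines:

```xml
<!-- Hypothetical output for Name = "City", GrammarItem = "Seattle" -->
<item>Seattle<tag>$.City = "Seattle"</tag></item>
```

Each database row adds one such <item> to the grammar's <one-of> list, so the set of recognizable phrases tracks the database contents automatically. Note that if the database values could ever contain characters that are special in XML (such as & or <), they would need to be escaped before being appended.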
