Buy and Sell Stocks with the Sound of Your Voice Using the .NET Speech SDK

There are thousands of ways to apply speech recognition technology to business applications. If you’re not already using some form of speech recognition in your organization, you’ve probably thought of several ways you’d like to use it one day. The .NET Speech SDK from Microsoft gives you new options for building applications that respond to voice commands from end users.

In this article, we’ll take you step-by-step through the process of using this SDK to build a demo application to buy and sell stocks and perform other related financial transactions using voice. The application we’ll build in this article will be for a fictional financial services company called “Fitch and Mather” and the application itself will be called “FMStocks Voice.” It is a voice-only version of the traditional (non-voice) Web application that Fitch and Mather already uses to let its customers buy and sell stocks online. The voice and Web versions share business and data layers. You can find more information on the Fitch and Mather Web application, including downloads and documentation, here.

Our goal is to shed light on the process of building voice-only applications in general. We’ll discuss lessons learned during the testing, design, and development stages, along with thoughts on the differences between building visual applications for the Web and speech applications for telephony. Specifically, we’ll show how to create a voice-only service from an existing Web application: FMStocks Voice leverages the existing business and data layers of the Fitch and Mather sample it is based on, with only minor modifications. To that end, the Web-based version is included with this sample to illustrate how the two presentation layers work simultaneously against the same data; it is possible to place a trade in the voice-only application and see the account change immediately reflected in the Web version. Along the way, we’ll demonstrate best-practice programming and design techniques for using the Speech SDK.

FMStocks Voice will deliver these functions:

  • Buying a Stock.
  • Selling a Stock from the user’s portfolio.
  • Getting current stock quotes.
  • Browsing stocks.
  • Viewing current portfolio.
  • Account login security using Windows Authentication.
  • Leveraging pre-existing business-layer and data-layer code.

Code Reuse
In essence, a voice-only version of an existing application is a new presentation layer: the user interface is auditory rather than graphical. This means the business logic and the data layer should remain essentially unchanged.

With a few exceptions, we have followed this concept as a development guideline. In the sample VS.NET solution, note that the FMStocksVoice project file includes a reference to the Components folder in FMStocksWeb (the included Web-based version, see Figure 1). Because the two applications share this same code, trades that occur in one interface are immediately reflected in the other.

Figure 1: The FMStocksVoice project file includes a reference to the Components folder.

This will come in very handy in cases where we want to extend the functionality of both applications while writing the voice version. For example, in the current FMStocksVoice implementation the user must already have an account in order to use the voice-only application. It would be useful (though it is not covered in this article) to extend the FMStocksVoice application to allow the user to create an account. The challenge is handling the entry of the user’s personal information (i.e. name, username, password, PIN, etc.). One solution would be to limit the amount of information required to set up an account.

Designing the Application
The FMStocks Voice system was designed to target more advanced users, who use this system frequently and are familiar with the options and navigation. With this in mind, the system was designed to include advanced features to enable the typical user to navigate the system quickly, yet include enough help and explanation such that the user would not get lost.

You should select the speaking voice for your application carefully, being sure to account for both speed and personality. The recorded sentences should be spoken at a moderate-to-quick pace, since the user usually knows what will be said, yet not so quickly that the recordings are mumbled or too difficult to follow. The system voice should be businesslike and factual.

Navigation
Designing a voice-only system is much different from designing traditional GUI-based applications. Whereas a Web page is a two-dimensional interface, the voice medium is one-dimensional. For example, a table of data on a Web page needs to be read item by item over the phone. As one designer put it, the challenge becomes, “How do you chop things up to establish a coherent flow of information? How do you express content in a way that the user can digest, understand, and then act upon?”

We started our design process by following our standard methodology of user-centered design. The 80/20 rule is a good guide: 80 percent of the users use 20 percent of the application. We focused on ideal scenarios and common user paths rather than considering exceptional cases in the preliminary stages. We acted out sample dialogues that helped us get a better sense of how a typical conversation might go.

From these sample dialogues, we began creating flow charts for each major component of the system. Figure 2 shows the high-level flow diagram for the application:

Figure 2: This application flow diagram shows the high-level flow.

In addition to the flow diagram above, several global commands are available to the user throughout the application:

  • Main Menu: Returns the user to the main menu.
  • Help: Provides the user with context-sensitive help text at any prompt.
  • Instructions: Provides instructions on the basic usage of the system and the global commands available at any point.
  • Repeat: Repeats the most relevant last prompt. If the last prompt informed the user that his/her input was invalid, the repeat text will provide the user with the previous question prompt instead of repeating the error message.
  • Representative: Transfers the user to a customer service representative.
  • Goodbye: Ends the call.

To buy stock, sell stock, or get a quote on a stock, the user must first choose a company from a large company list. To get started, the user says a company name, such as “Microsoft Corporation,” or a partial ticker symbol, such as “M.” If there is more than one match, the user enters a speech Navigator control and selects the company they want. On each of the three pages mentioned, we implement a user control called the “Selectable Navigator,” which encapsulates a Statement-only QA control, a Navigator application control, and a Select command control. This user control is discussed in detail in the next section, “How It Works.”

Prompts
The design team found creation of a prompt specification document to be a challenge in itself. The number of paths available to the user at any one prompt leads to a complicated flow-chart diagram that, while technically accurate, loses a sense of the conversation flow that the designers had worked to achieve. The design team arrived at a compromise specification that allowed them to illustrate an ideal scenario while also handling exceptions. The following example illustrates the beginning of the “Buy Stock” scenario from the main menu:

Prompt: Main Menu

Expected User Input: “Buy Stock”

Recognition: Recognized Expected Input
System Response: Please say the name of the company or spell the ticker symbol of the company that you are interested in. You may also say Main Menu, Help, or Representative at any time.

Recognition: Recognized Alternate Input: “Help”
System Response: To help me direct your call, please say one of the following: Quotes, Buy Stock, Sell Stock, or Check Portfolio.

Prompt: Buy Stock

Expected User Input: “Microsoft Corporation”

Recognition: Recognized Expected Input
System Response: I understood ‘Microsoft Corporation.’ Is that correct?

Recognition: Recognized Alternate Input: “Help”
System Response: Please say the name of the company or spell the ticker symbol, leaving clear pauses between letters. You may say Main Menu if you wish to cancel this transaction.

This format of specifying functionality makes it very easy to conduct “Wizard-of-Oz” style testing. In this scenario, the test subject calls a tester who has the functional documents in front of him/her. The tester acts as the system, prompting the test subject as the system would and responding to their input likewise. Trouble spots are easily identified and fixed using this style of testing.

How It Works
The following section is devoted to the architecture of the system. We start with an explanation of common user controls and common script files. Then we’ll go into detail on the Buy Stock feature, which provides a good encapsulation of many of the programming techniques used throughout the application. Finally, we’ll review some of the coding conventions and practices we used as best-practice techniques for development.

Common Files: User Controls
Two ASP.NET user controls are included on almost every page in our application. Together they encapsulate much of the functionality of the site, and each deserves discussion. Implementing user controls, whether in a regular ASP.NET application or while using the ASP.NET Speech controls, can provide a consistent user experience while saving a great deal of code.

GlobalSpeechElements.aspx: The GlobalSpeechElements user control is used on every page of the application except for Goodbye.aspx and RepresentativeXfer.aspx, which do little more than read a prompt and transfer the user away. It contains the main stylesheet control that defines common properties for the speech controls used throughout the application, as well as global command controls and common script files that provide client-side functional components.

MainStyleSheet: The Speech SDK style control is a powerful way of defining global application settings and assigning globally scoped functionality. In the FMStocks sample we have four different styles:

BaseCommandStyle: This style is applied to all command controls. Its one attribute sets the AcceptCommandThreshold to 0.6, meaning that any command must be recognized with at least 60 percent confidence to be accepted.

GlobalCommandStyle: This style is applied only to the six global commands contained in GlobalSpeechElements.aspx. It inherits the attributes of BaseCommandStyle and adds a dynamically set Scope attribute. We want global commands to apply to all controls on any page they are included in, so we set the scope to the parent page’s ID at runtime.

BaseQAStyle: This style is applied to all QA controls that accept user input (QA controls that do not accept user input are called “Statements” and use the StatementQA style below). In addition to setting timeout and confidence thresholds, this style also defines the OnClientActive event handler for all QA controls. HandleNoRecoAndSilence is a JScript event handler that monitors a user’s unsuccessful attempts to say a valid response and transfers the user to customer service after enough unsuccessful events. It is described in the section on Common Script files below.

StatementQA: For QA controls that do not accept user input, we want to disable BargeIn (the act of interrupting a prompt before it ends with a response) and turn on PlayOnce, which ensures the prompt is not repeated. Normal QA controls are activated when their semantic item is empty; since Statement QA controls have no semantic item, the control would play over and over again if PlayOnce were turned off.
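The style markup itself was lost from this copy of the article. As a rough sketch only (element and attribute placement follow the Speech SDK’s StyleSheet conventions as we understand them and should be verified against the SDK documentation), the four styles might be declared along these lines:

```xml
<speech:StyleSheet id="MainStyleSheet" runat="server">
   <speech:Style ID="BaseCommandStyle">
      <Command AcceptCommandThreshold="0.6" />
   </speech:Style>
   <!-- Scope is assigned to the parent page's ID at runtime -->
   <speech:Style ID="GlobalCommandStyle" StyleReference="BaseCommandStyle" />
   <speech:Style ID="BaseQAStyle">
      <QA OnClientActive="HandleNoRecoAndSilence" />
   </speech:Style>
   <speech:Style ID="StatementQA">
      <QA PlayOnce="true">
         <Prompt BargeIn="false" />
      </QA>
   </speech:Style>
</speech:StyleSheet>
```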

Global Commands: The global commands in GlobalSpeechElements (described in the Navigation Design section) each have an associated command grammar file that defines how the command is activated.

Commands fall into two categories: those that affect the current prompt (HelpCmd, InstructionsCmd, RepeatCmd) and those that trigger an event (RepresentativeCmd, GoodbyeCmd, MainMenuCmd). For the former, the prompt function looks for a particular “Type” value in its lastCommandOrException parameter and generates an appropriate prompt. For the latter, the command’s associated OnClientCommand event handler is executed.
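The event-triggering handlers are typically one-liners that navigate the user to another page. The sample’s actual handlers aren’t reproduced here, so the function names below are assumptions; Goto is the application’s client-side navigation helper used in HandleNoRecoAndSilence, and Goodbye.aspx is the farewell page mentioned earlier.

```javascript
// Hypothetical OnClientCommand handlers (names assumed); Goto is the
// application's client-side navigation helper.
function OnMainMenuCommand(eventSource) {
   // MainMenuCmd: abandon the current transaction and start over.
   Goto("MainMenu.aspx");
}

function OnGoodbyeCommand(eventSource) {
   // GoodbyeCmd: play the farewell prompt and end the call.
   Goto("Goodbye.aspx");
}
```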

Common Script File Includes: GlobalSpeechElements is an ideal place to include references to all global script files. These files constitute all global client-side event handlers and prompt generation/formatting routines for the application. Since they are included in the control, individual pages can rely on their availability without explicitly including them.

SelectableNavigator.aspx: The SelectableNavigator user control is used on any page that needs a dynamically generated list of items from which the user may select. While the Navigator application control included in the Speech SDK can read a list of items, it does not allow the user to select one of the items in the list. The SingleItemChooser application control does allow the user to select an item, but it is unwieldy for large lists. The SelectableNavigator contains a Navigator application control, as well as a QA control and a Command control.

InitialStatement: The prompt of the InitialStatement QA is used to tell the user something about the list. Originally, we had this initial statement as part of what the navigator says for its first item. However, if the user mumbles, this initial statement and the first item are lost. Since we wanted to ensure the user heard the first item, we separated the initial statement from the first item. This way, even if the user mumbles during the initial statement, they will still hear the first item after the system recovers.

TheNavigator: The Navigator takes care of the tasks associated with reading and navigating through the list of items associated with the control.

SelectCmd: This command, scoped to TheNavigator, allows the user to select an item. The grammar for this command may be specified dynamically by setting the IsDynamic property of the SelectableNavigator to true (the default). The grammar always contains at least “select” and “that one” to select the current (most recently read) item; if IsDynamic is true, the grammar also contains the items found in the first field specified in the DataHeaderFields property. Thus, if the user is on the first item and says the name of the fifth item, the fifth item is selected.

Because user controls have no designer support, all properties are set programmatically in the code-behind file. Also, dragging the user control onto a page does not add speech markup to the page the way dragging one of the Speech SDK’s controls, such as a QA, does. This isn’t normally a problem, since you will usually want other speech controls on the page anyway, such as a semantic map. You also don’t automatically get a prompt function file in which to keep the prompt function for the SelectableNavigator. Again, this isn’t a problem as long as other controls on the page need a prompt function; for instance, using the Property Builder for a QA to add a prompt function will automatically add a prompt function file if one does not already exist for the page. After doing so, you may add a prompt function for a SelectableNavigator on that page to the file yourself.

Because the Speech SDK is very client-side-heavy, you must write two client-side JScript functions to handle the two events fired by the SelectableNavigator: OnCancel and OnSelect.

OnCancel: The SelectableNavigator fires this event if the user says “Cancel” while in the SelectableNavigator. Since the Navigator’s built-in cancel command deactivates the Navigator, RunSpeech will skip the SelectableNavigator during subsequent iterations.

OnSelect: The SelectableNavigator fires this event when the user selects an item, either by saying “Select” or, if IsDynamic is true, the name of an item. Return true from this handler to deactivate the SelectableNavigator.
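As a sketch, handlers for a company-selection SelectableNavigator might look like the following. The handler names, the siTicker semantic item, and the Ticker column are assumptions for illustration; GetNavigatorData is the Speech.js helper described later, and Goto is the application’s navigation helper.

```javascript
// Hypothetical OnCancel/OnSelect handlers for a company-selection
// SelectableNavigator; names are illustrative, not from the sample.
function CompanyNav_OnCancel() {
   // The Navigator's built-in Cancel has already deactivated it;
   // send the user back to the main menu.
   Goto("MainMenu.aspx");
}

function CompanyNav_OnSelect() {
   // Capture the selected company's ticker, then return true to
   // deactivate the SelectableNavigator.
   siTicker.SetText(GetNavigatorData("CompanyNav", "Ticker"), true);
   return true;
}
```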

The client- and server-side properties and methods exposed by the SelectableNavigator are documented in SelectableNavigator.html.

Common Files: Client-Side Scripting
The globally scoped client-side script files for the application are:

  • Speech.js: NoReco/Silence event handler and object accessors
  • Routines.js: String-formatting routines
  • Debug.js: Client-side debugging utilities
  • FMStocks7V.js: Global navigation event handlers
  • PromptGenerator.js: Prompt generation utility

A few of the more interesting functions of these scripts are outlined below:

HandleNoRecoAndSilence (Speech.js): HandleNoRecoAndSilence takes care of handling cases where the user repeatedly responds to a prompt with silence or with an unrecognizable input. To avoid frustration, we don’t want to repeat the same prompt over and over again. This function, executed each time a QA is made active, counts the number of consecutive times the input is invalid. It increments a counter that the prompt generation utility (see below) uses to generate an appropriate prompt. If the count exceeds a maximum (in this application, 3), we redirect the user to a Customer Service Representative.

This function is defined as the OnClientActive event handler for the BaseQAStyle in the GlobalSpeechElement’s MainStyleSheet. Each QA that accepts user input must use this style in order for the function to be called correctly.

   function HandleNoRecoAndSilence(eventSource, lastCommandOrException,
                                   count, semanticItemList)
   {
      if (count == 1)
         PromptGenerator.noRecoOrSilenceCount = 0;

      if (lastCommandOrException == "Silence" ||
          lastCommandOrException == "NoReco")
      {
         PromptGenerator.noRecoOrSilenceCount++;

         if (PromptGenerator.noRecoOrSilenceCount >= representativeXferCount)
            Goto(representativeXferPage);
      }
      else
      {
         PromptGenerator.noRecoOrSilenceCount = 0;
      }
   }

Navigator Functions (Speech.js)
Speech.js contains the following functions to make working with the Navigator application control easier:

ActivateNavigator(navigatorName, active): In the Speech SDK, speech controls are activated and deactivated by modifying the semantic state of the control’s associated semantic item. The same is true for Navigator application controls, though the semantic item is hidden from the user. To make activation and deactivation of Navigators simpler, we created a function that sets the Navigator’s “ExitSemanticItem” to a dummy value: if the value is empty, the Navigator is active; if not, the Navigator is inactive.

   function ActivateNavigator(navigatorName, active)
   {
      var si = eval(navigatorName + "_ExitSemanticItem");

      if (active || arguments.length == 1)
         si.Clear();
      else
         si.SetText("x", true); // value can be anything

      return active;
   }

GetNavigator(navigatorName): Returns a Navigator object reference given its name as a string.

GetNavigatorCount(navigatorName): Returns the count of items in the given navigator.

GetNavigatorData(navigatorName, columnName): Returns the data contained in the currently-selected row of the specified navigator’s specified column.

GetNavigatorQA(navigatorName): Returns a reference to a Navigator’s internal QA control.
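The bodies of these accessors aren’t shown in the article. A plausible sketch of two of them, following the same eval-based pattern as ActivateNavigator (the Navigator’s internal index and DataSource members are assumptions, not documented SDK properties):

```javascript
// Sketch only: internal member names (index, DataSource) are assumed.
function GetNavigator(navigatorName) {
   // Resolve the Navigator's client-side object from its name.
   return eval(navigatorName);
}

function GetNavigatorCount(navigatorName) {
   return GetNavigator(navigatorName).DataSource.length;
}

function GetNavigatorData(navigatorName, columnName) {
   var nav = GetNavigator(navigatorName);
   // Return the requested column from the currently selected row.
   return nav.DataSource[nav.index][columnName];
}
```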

Prompt Generation (PromptGenerator.js)
Prompt Generation is perhaps the most central element when creating a successful voice-only application. Providing a consistent voice interface is essential to creating a successful user experience. PromptGenerator.js does just this by encapsulating all common prompt-generation functionality in one place.

A prompt function in a typical page will always return the result of a call to PromptGenerator.Generate() as its prompt:

   return PromptGenerator.Generate(
      lastCommandOrException,
      count,
      "Prompt Text Here",
      "Help Text Here"
   );

Notice that the prompt function passes both its main prompt and its help prompt into the function every time. PromptGenerator.Generate() decides the appropriate prompt to play given the current lastCommandOrException, the NoReco/Silence state (see HandleNoRecoAndSilence, above), and other factors:

   function PromptGenerator.Generate(lastCommandOrException,
                                     count, text, help)
   {
      help += " You can always say Instructions "
            + "for more options.";

      switch (lastCommandOrException)
      {
         case "NoReco":
            if (PromptGenerator.noRecoOrSilenceCount > 1)
               return "Sorry, I still don't understand you. " + help;
            else
               return "Sorry, I am having trouble understanding you. "
                    + "If you need help, say help. " + text;
         case "Silence":
            if (PromptGenerator.noRecoOrSilenceCount > 1)
               return "Sorry, I still don't hear you. " + help;
            else
               return "Sorry, I am having trouble hearing you. "
                    + "If you need help, say help. " + text;
         case "Help":
            PromptGenerator.RepeatPrompt = help;
            return help;
         case "Instructions":
            var instructionsPrompt =
               "Okay, here are a few instructions...";
            PromptGenerator.RepeatPrompt = instructionsPrompt + text;
            return instructionsPrompt;
         case "Repeat":
            return "I repeat: " + PromptGenerator.RepeatPrompt;
         default:
            PromptGenerator.RepeatPrompt = text;
            return text;
      }
   }
Note: Some of the longer strings have been shortened in the above code sample to save space.

A note on “Repeat”: The PromptGenerator.RepeatPrompt variable stores the current text that will be read if the user says, “Repeat.” The first time the function is executed for any prompt, the RepeatPrompt will be set to the standard text. The RepeatPrompt is then only reset when the user says, “Help,” or “Instructions.”

Other PromptGenerator functions: PromptGenerator also includes a few other functions for generating prompts in the application. They include:

GenerateNavigator(lastCommandOrException, count, text, help): This function adds to the functionality of Generate() by including standard prompts commonly needed while in a Navigator control. These prompts include additional help text and messages for when the user tries to navigate beyond the boundaries of the navigator.

ConvertNumberToWords(number, isMoney): In order to generate recorded prompts for all possible number values, we must convert numbers (e.g., 123,456) to a readable string (e.g., “one hundred twenty three thousand four hundred fifty six”). This reduces the number of unique words that must be recorded to a manageable amount.
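The article doesn’t reproduce ConvertNumberToWords itself. The following is a minimal sketch of the idea only, ignoring the isMoney flag and negative numbers and handling values below a billion; the real implementation in PromptGenerator.js may differ:

```javascript
// Minimal number-to-words sketch (isMoney handling omitted).
var ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
            "eight", "nine", "ten", "eleven", "twelve", "thirteen",
            "fourteen", "fifteen", "sixteen", "seventeen", "eighteen",
            "nineteen"];
var TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty",
            "seventy", "eighty", "ninety"];

// Convert 0-999 to words, e.g. 456 -> "four hundred fifty six".
function threeDigitsToWords(n) {
   var parts = [];
   if (n >= 100) {
      parts.push(ONES[Math.floor(n / 100)] + " hundred");
      n = n % 100;
   }
   if (n >= 20) {
      parts.push(TENS[Math.floor(n / 10)]);
      n = n % 10;
   }
   if (n > 0 || parts.length === 0)
      parts.push(ONES[n]);
   return parts.join(" ");
}

function convertNumberToWords(number) {
   if (number === 0) return "zero";
   var groups = ["", " thousand", " million"];
   var words = [];
   for (var i = 0; number > 0 && i < groups.length; i++) {
      var chunk = number % 1000;
      if (chunk > 0)
         words.unshift(threeDigitsToWords(chunk) + groups[i]);
      number = Math.floor(number / 1000);
   }
   return words.join(" ");
}
```

For example, convertNumberToWords(123456) produces the article’s sample string, “one hundred twenty three thousand four hundred fifty six.”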

Designing Your Grammar
Items in your grammar files define which words and phrases are recognized. When the speech engine matches an item from the grammar file, it returns SML, or Semantic Markup Language, which your application uses to extract definitive values from the text that the user spoke. Too strict a grammar leaves the user no flexibility in what they can say; however, too many unnecessary grammar items can lower speech recognition accuracy.

Preambles and Postambles
Very often, you will want to allow a generic “preamble,” text said before the main item, and “postamble,” text said after the main item. For instance, if the main command is “Buy Stock,” you would want to allow the user to say “May I Buy Stock please.”

Typically, you can use one grammar (.grxml) file for your preambles and one for your postambles. Within your other grammar rules, you can then reference the preambles and postambles by using RuleRefs.
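Under these assumptions (the file and rule names are invented for illustration), a rule that accepts an optional preamble and postamble around the main phrase might look like this in SRGS-style XML:

```xml
<rule id="BuyStock" scope="public">
   <item repeat="0-1"><ruleref uri="Preambles.grxml#Preamble"/></item>
   <item>buy stock</item>
   <item repeat="0-1"><ruleref uri="Postambles.grxml#Postamble"/></item>
</rule>
```

This lets the recognizer match “May I buy stock please” as well as the bare “buy stock.”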

Tip: Make the pre- and post-ambles generic and robust enough that you don’t limit your users’ experience, but keep them reasonable in size so that you don’t risk lowering the speech recognition for your main elements.

Static Grammar
Use the Grammar Editor tool (see Figure 3) to graphically set up grammar files. The basic task is to set up a text phrase or a list of phrases, and then assign a value that you want your application to use when each phrase is recognized.

Figure 3: Use the Grammar Editor tool to set up grammar files visually.

We found that the following strategies helped us in grammar development:

  • Typically, if we only need to recognize that a text phrase has been matched, especially in the case of commands, we fill in the Value field with the empty string rather than a value. For example, to capture when the user says “Help,” you can simply return an SML element named “GoHelp” with an empty value. The control associated with this grammar file recognizes the phrase and returns the SML element “GoHelp”; the code-behind or client-side script makes its decision based on the SML element being returned, rather than on its value.
  • Use rule references within grammar files to avoid duplicating the same rule across different speech controls.
Tip: You must make sure that a rule to be referenced is a public rule, which you can set through the properties pane.

  • A common grammars file is included with the Speech SDK, both in an XML file version (cmnrules.grxml) and in a smaller, faster compiled version (cmnrules.cfg). We copied the compiled version into our project and used it for commonly used grammar elements, such as digits and letters in the alphabet.

Creating Grammar Files Programmatically
Because grammar files are simply XML files, it is possible to create grammars programmatically. This was especially helpful when creating the grammar for the traded companies: not only were there many companies, but each company also needed at least two grammar phrases. For instance, if the company in question is “Microsoft Corporation,” we want the grammar to recognize both “Microsoft” and “Microsoft Corporation.”
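The sample’s actual generation lives in the tool pages described next; the following JScript sketch only illustrates the idea. The function names, the semantic-tag syntax, and the “first word” heuristic are assumptions, not the sample’s implementation:

```javascript
// Hypothetical grammar generator: emit a grxml <one-of> with two phrases
// per company (full name and leading word), tagged with the ticker.
function escapeXml(s) {
   return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function buildCompanyGrammar(companies) {
   // companies: array of { name: "Microsoft Corporation", ticker: "MSFT" }
   var items = [];
   for (var i = 0; i < companies.length; i++) {
      var c = companies[i];
      var phrases = [c.name];
      var firstWord = c.name.split(" ")[0];
      if (firstWord !== c.name)
         phrases.push(firstWord);  // accept "Microsoft" as well
      for (var j = 0; j < phrases.length; j++)
         items.push('      <item>' + escapeXml(phrases[j]) +
                    '<tag>$.ticker = "' + c.ticker + '"</tag></item>');
   }
   return '<grammar version="1.0" root="Companies"\n' +
          '         xmlns="http://www.w3.org/2001/06/grammar">\n' +
          '   <rule id="Companies" scope="public">\n' +
          '      <one-of>\n' + items.join("\n") + '\n      </one-of>\n' +
          '   </rule>\n</grammar>';
}
```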

We created two Web pages to be used as a tool to dynamically create company grammar from the database, and also as a way to show how this can be done.

CreateCompanyGrammar.aspx: This is the main Web page to create the company grammar, and it resides in the Tools folder. It consists mainly of a button and a text area. When you run the page and press the button, you should see a printout of either the converted XML, for debugging purposes, or an error message if there was a problem. The XML is automatically saved into the grammar file Companies.grxml, so there is no need to copy and paste the XML.

TestTickers.aspx: This page is also located in the Tools folder. Once you use the CreateCompanyGrammar.aspx page, a link to this page will become visible. This page is simply available to test the newly created company grammar file. Once you say either a company name or ticker, a DataGrid will appear and show you what database matches were found.

Database Stored Procedures: There are two stored procedures and one user-defined function installed from the database scripts that relate to dynamic grammar creation. Each has to do with string manipulation and/or loading the companies from the database, in order to most efficiently create the grammar.

Markup: Markup characters like ‘&’ (ampersand), while common in company names, cannot be used within XML strings or within the grammar and prompt tools. Several string replacement functions are performed to normalize these company names for use in the grammar files.

The most common example of this is the case of the ampersand. We replace the ampersand with the string ‘amp’ in the code-behind, for grammar/prompt recording matching. Our transcriptions in the Prompt Database also read ‘amp’, again, to be sure to match what is being sent in by the prompt functions (see Figure 4). However, when we record the company name, we say ‘and,’ not ‘amp’.

Figure 4: Note the difference between the Transcription and the Display Text.
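As a minimal sketch of this normalization (the real replacement runs in the C# code-behind and stored procedures, so the function name and exact spacing here are assumptions):

```javascript
// Replace '&' with the word 'amp' so grammar entries and prompt
// transcriptions match; collapse the doubled spaces the replacement
// can introduce and trim the ends.
function normalizeCompanyName(name) {
   return name.replace(/&/g, " amp ")
              .replace(/\s+/g, " ")
              .replace(/^\s+|\s+$/g, "");
}
```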

Special Semantic Cases: In some rare cases, the speech recognition engine cannot match a company name with its correct pronunciation. We then have to manually add an extra grammar phrase in order to correctly recognize that company. For instance, the speech engine cannot understand ‘Novo-Nordisk’, but will match correctly to ‘No vo nor disk’. We enter a grammar element with the text ‘no vo nor disk’, with a corresponding value of ‘Novo-Nordisk’.

Coding Conventions: Server-Side
We used several different coding conventions while building the application. Unlike traditional ASP.NET programming, the Speech SDK is primarily a client-side programming platform. Although its controls are instantiated and their properties manipulated on the server-side, controlling flow from one control to another is primarily a client-side task.

The controls offer opportunities to post back to the server automatically, including the SemanticItem’s AutoPostBack property and an automatic postback when all QAs on a page are satisfied. As a convention, though, we chose to only post back when we needed to access data or business-layer functions. Most of our code is written through client-side event handlers, using SpeechCommon.Submit() to post back explicitly when data was needed from the server.
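For example, a confirmation’s client-side handler might post back explicitly once server data is required. The handler name and the condition checked are assumptions for illustration; SpeechCommon.Submit() is the SDK call the convention is built on:

```javascript
// Hypothetical client-side handler: post back only when the server
// needs to be involved (here, to price the confirmed company).
function OnCompanyConfirmed(eventSource, lastCommandOrException,
                            count, semanticItemList) {
   if (lastCommandOrException === "")  // normal recognition (assumed check)
      SpeechCommon.Submit();           // explicit postback, per our convention
}
```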

Coding Conventions: Client-Side
Because JScript lacks many of the scoping restrictions found in C# or VB.NET, a given task can be performed in many different places when programming on the client side. The SpeechCommon object is accessible from any client-side script, and its Submit() method can be executed from event handlers, prompt functions, or any helper routine. For this and other reasons, we followed a set of guidelines for the use of these various components:

  • Prompt Functions Are Only For Generating Prompts: Never perform an action inside a prompt function that is not directly related to the generation and formatting of a prompt: no navigation flow, no semantic item manipulation, and so on. Beyond good practice, the key reason for reserving prompt functions for prompt generation is the prompt validation tool: if prompt functions contain calls to SpeechCommon or other in-memory objects, those objects must be declared and their references included in the prompt function’s “Validation References.” If these references are not included, validation will fail for the function. As a rule, the only functions referenced by prompt functions are in PromptGenerator.js.

    One exception to this rule was necessary. Navigator application controls do not expose events that are equivalent to OnClientActive, or which fire each time a prompt function is about to be executed. For QA controls, we use OnClientActive to call HandleNoRecoAndSilence(), which monitors consecutive invalid input for a QA. We expect future versions of the SDK to expose this type of event in the Navigator control, but until then, we call HandleNoRecoAndSilence() from PromptGenerator.GenerateNavigator().

  • No Inline Prompts: Inline prompts may seem attractive when the prompt text never changes, but they introduce a maintenance issue when being used with recorded prompts. Unlike prompt functions, which reference prompt databases through the values in the web.config file, inline prompts explicitly copy these values into the prompt tags. Should the location of the prompt database change (as it most likely will between development, staging, and production) each inline prompt must be modified in addition to the web.config file. Since the cost of using a prompt function is so low, we avoided inline prompts altogether, even when we had a static prompt.
  • Control of Flow Handled In Event Handlers: Flow control is the most important function of event handlers and client activation functions. Most applications of any complexity require more complicated flow control than the standard question-and-answer format afforded by laying QA controls down in sequence on a page. For the most part, we achieved this control by manipulating the semantic state within event handlers.
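The guidelines above can be pictured with a miniature model of the activation loop. This is a hypothetical sketch, not the SDK's actual objects: each QA stays active while its semantic item is not yet "Confirmed," so an event handler steers flow simply by changing an item's state.

```javascript
// Hypothetical miniature model of RunSpeech-style activation (the real
// SDK objects differ): a QA is a candidate while its semantic item is
// not "Confirmed", so event handlers control flow by changing states.
function SemanticItem(name) {
  this.name = name;
  this.state = "Empty";  // "Empty" -> "NeedConfirmation" -> "Confirmed"
}
SemanticItem.prototype.Clear = function () { this.state = "Empty"; };
SemanticItem.prototype.Confirm = function () { this.state = "Confirmed"; };

// Pick the first QA whose semantic item still needs work; this is the
// essence of the sequential activation described in the article.
function nextActiveQA(qas) {
  for (var i = 0; i < qas.length; i++) {
    if (qas[i].semanticItem.state !== "Confirmed") return qas[i].name;
  }
  return null; // every item confirmed; the page is done
}

var siTicker = new SemanticItem("siTicker");
var siShares = new SemanticItem("siShares");
var qas = [
  { name: "CompanyOrTickerQA", semanticItem: siTicker },
  { name: "NumberOfSharesQA", semanticItem: siShares }
];
```

An event handler can "send the user back" to an earlier question simply by calling `siTicker.Clear()`; on the next loop, `nextActiveQA` returns `"CompanyOrTickerQA"` again.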

Naming Conventions
We used the following naming conventions throughout our application for consistency:

  • QA Controls: The QA Control can be used for a variety of purposes. We distinguish these purposes by their function: traditional question-and-answer controls fill a semantic item with the result of user input, confirmations confirm a pre-filled semantic item, and statements are output-only; they do not accept user input.
    • Question-And-Answer: QA (e.g. CompanyOrTickerQA)
    • Confirm: Confirm (e.g. NumberOfSharesConfirm)
    • Statement: Statement (e.g. CompanyOrTickerNavStatement)
  • Navigator Controls: Nav (e.g. CompanyOrTickerNav)
  • Commands: Command (e.g. HelpCommand)
  • Semantic Items: si (e.g. siTicker)

JScript and C# server-side code use naming conventions standard in those environments.

In-Depth: Buy Stock Feature - Company Selection
We’ll tie together many of the features discussed in the “How It Works” section by examining the Company selection process in the Buy Stock feature of FMStocks Voice.

The Buy Stock feature allows a user to choose a company by speaking either a company name or ticker symbol, and to buy shares of stock from that company using the money in their account.

We take the entry that the user speaks and query the database for matches. For example, if the user says “United,” we would return “United Technologies” and “United Television” as choices; if they say “M,” we would return ten companies whose ticker symbol begins with “M.”
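The matching rule just described can be sketched in a few lines. This is a hypothetical client-side illustration with made-up data; the real application performs this lookup with a database query in the business layer.

```javascript
// Hypothetical in-memory company list; the real application queries
// the Fitch and Mather database instead.
var companies = [
  { name: "United Technologies", ticker: "UTX" },
  { name: "United Television", ticker: "UTVI" },
  { name: "Microsoft Corporation", ticker: "MSFT" }
];

// A spoken entry matches either a company-name prefix or a
// ticker-symbol prefix, as described above.
function findMatches(entry) {
  var upper = entry.toUpperCase();
  return companies.filter(function (c) {
    return c.name.toUpperCase().indexOf(upper) === 0 ||
           c.ticker.indexOf(upper) === 0;
  });
}
```

With this data, `findMatches("United")` returns both “United” companies, while `findMatches("M")` matches only Microsoft (by both name and ticker prefix).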

If the database returns more than one match, we activate the SelectableNavigator user control, to allow the user to browse through the companies and choose which one they want. Once the user makes a selection, we make sure they have enough money in their account to buy at least one share in the company, and then we confirm the choice with the user.

Speech Controls
We begin our page by determining which Speech elements to use, including controls and Semantic Items. Semantic items are the key to holding the answers which our users speak, and to controlling the flow of the page.

Tip: Page Flow - Because we want users to be able to return to a previously asked QA during the Buy Stock process, we focus on manipulating the “semantic state” of the semantic items to determine which QA should be activated next. This is referred to as “flow.”

Generally, one semantic item corresponds to one user’s “answer.” Each QA expecting an answer is assigned a semantic item in which the value of the user’s answer will be stored.

Step 1: Prompt Functions and Grammar
The first step in the Buy Stock process is to tell the user what we want. We need to set up some text to be read to the user, so they know what question they are expected to answer. We do this by entering our text and our logic into a client-side “prompt function.”

Each prompt function is responsible for determining what is said to a user based on certain criteria. Specifically, we want to read something different if the user says “Help,” if the user just entered the page for the first time, or if the user has returned to this QA from a later question.

For BuyStock.aspx, we have one prompt function file, named BuyStock.pf, that holds all of the prompt functions for all of the speech controls. Our first function, CompanyOrTickerQA_prompt_inner (the body of the CompanyOrTickerQA prompt function), asks the user which company they would like.

   function CompanyOrTickerQA_prompt_inner(
      lastCommandOrException, count,
      missingEntry, userCanceled, zeroMaxShares, company)
   {
      var help = "Please say the name of the company " +
         "or spell the ticker " +
         "symbol, leaving clear pauses between letters...";
      var text = "Please say the name of the company " +
         "or spell the ticker...";

      if (userCanceled)
         text = "You canceled.  " + text;
      else if (missingEntry != null)
      {
         text = "I did not find any matches for the " +
            "entry " + missingEntry +
            ".  Please make a new entry to search again.";
      }
      else if (zeroMaxShares)
      {
         text = "I understood " + company +
            ".  You do not have sufficient " +
            "funds in your account to buy " +
            "shares in this company...";
      }
      else
         text = "To buy stock, " + text;

      return PromptGenerator.Generate(
         lastCommandOrException, count, text, help);
   }
NOTE: Some of the longer strings have been shortened in the above code sample to save space.

Since it is possible for the user to come back to this QA later, we have had to add some parameters to this prompt function, so that we know which text to prompt the user with.

missingEntry: Holds either null or a ticker entry which the user entered but which was not found (e.g., “ZZZZ”).

userCanceled: Either true or false, indicating if the user is returning to this prompt after canceling out of the Selectable Navigator

zeroMaxShares: Either true or false. If the user picks a company, and the code-behind determines that the user does not have enough money to buy even one share, we send the user back to this first QA, indicate that they cannot buy any shares, and ask them to enter another company name.

company: Holds a company name. Used with the zeroMaxShares parameter, to tell the user which company they don’t have enough money to buy.

We choose which text to send based on the values of the parameters, and then we call the PromptGenerator.Generate function, which will determine if the user spoke one of the global commands, such as “Help,” “Repeat,” or “Instructions.”

Next, we set up a Grammar file to define what phrases we expect the user to say (see Figure 5). Our grammar file contains a list of acceptable companies and their corresponding values, as well as a reference to a grammar rule for ticker symbols.

Figure 5: The grammar file contains a list of acceptable companies and their corresponding values.

When one of the choices is matched, the speech engine returns an SML document containing the name of the SML element matched, the text recognized, and the corresponding value. The SML element name and its corresponding value are set in the Assignments window of the grammar editor, as in Figure 5.

In our case, we wanted to set up two different SML element names, to distinguish whether the grammar matched a company name, or matched a ticker symbol. When the speech engine makes a match, it will return one of two SML documents, such as these:

   <SML text="M S F T">
      <Ticker>MSFT</Ticker>             <!-- Ticker match -->
   </SML>

   <SML text="Microsoft">
      <Company>Microsoft Corporation</Company>   <!-- Company match -->
   </SML>

Step 2: Parse User Input with a Client Normalization Function
After the grammar is matched and the SML document is returned, we next want our server-side code to retrieve a list of companies from the database that match these criteria. However, we need to run a different business-logic function depending on which type of information we have: company or ticker. We need a way to tell our code which SML element was matched.

When we set up the CompanyOrTickerQA, we specified that the value of the answer that was matched should be stored in the siTicker semantic item (see Figure 6). It will recognize that as a match only if the SML element returned is satisfied by the expression in the XPath Trigger column. So in our example, the XPath Trigger string “/SML/Ticker | /SML/Company” signifies that a match has been satisfied if either the SML element “Ticker” has been returned, or the SML element “Company” has been returned.

Figure 6: The CompanyOrTickerQA Property Pages

In the Answers tab, we also have a field for “Client Normalization Function.” This holds the name of a client-side script function that we want to run when the SML is recognized, before siTicker is filled. This allows us to query the value returned by the grammar match and manipulate it, and fill in the semantic item with the updated value.

In our case, we use the JScript function “SetResponseType” as our Client Normalization Function for the CompanyOrTickerQA control.

   function SetResponseType(smlNode, semanticItem)
   {
      semanticItem.attributes["ResponseType"] =
         smlNode.nodeName;
   }

Here we set an attribute of the semantic item siTicker, called “ResponseType,” to the node name of the SML returned, which will be either “Company” or “Ticker.”

Step 3: Load the Company Matches
After the semantic item is filled, the CompanyOrTickerQA has been satisfied, but we are still running on the client side. We need to make a few more checks and then submit the page, so we specify in the property pages that a JScript function should run when this QA is satisfied and the OnClientComplete event fires.

   function SubmitTickerForSearch(eventSource,
      lastCommandOrException, count, semanticItemList)
   {
      userCanceled = false;

      CompanyOrTickerNav.Activate();

      if (lastCommandOrException == "")
         SpeechCommon.Submit();
   }

In the function SubmitTickerForSearch, we do three things:

  1. Reset userCanceled flag: If the user was in the context of the SelectableNavigator, and cancelled out, they would be sent back to the CompanyOrTickerQA with a slightly different prompt. This variable is what tells the CompanyOrTickerQA prompt function whether the user cancelled out of the navigator control; we reset that value here.
  2. Reactivate Navigator: If the user was in the context of the SelectableNavigator and selected a company, the navigator would have been deactivated. We therefore want to reactivate it in the case that the user is sent back to the first QA.
  3. Manually submit page: Although we could have used the semantic item’s AutoPostBack feature, the page would post back every time the semantic item’s state changed (i.e. “Empty,” “NeedConfirmation,” or “Confirmed”). Instead we manually submit the page if the lastCommandOrException parameter is blank.

Once our semantic item has been filled (and therefore its state is no longer “Empty”), we want to retrieve matching values from the database. In the LoadCompanies() function on the server, we decide which method to call and retrieve the data.

   switch (siTicker.Attributes["ResponseType"])
   {
      case "Company":
         dt = tickerObj.ListByCompany(AccountID,
            siTicker.Text.Replace(" amp ", " & "));
         break;
      case "Ticker":
         dt = tickerObj.ListByTicker(AccountID,
            siTicker.Text);
         break;
      default:
         throw new ApplicationException(...);
   }

We use the semantic item’s “ResponseType” attribute, which we set in the Client Normalization Function SetResponseType, to determine if we received a match on a company or on a ticker. If we get an unrecognized value in “ResponseType,” we manually throw an application exception.

Tip: In server-side code, reference a semantic item’s attributes collection with an upper-case “A” (siTicker.Attributes); in client-side script, however, remember that it’s a lower-case “a” (siTicker.attributes).

Step 4: Determine Number of Companies Selected
If more than one company is returned from the call to the database, we want to activate the SelectableNavigator control so that the user can navigate through the list of possible choices.

   if (dt.Rows.Count > 1)
   {
      CompanyOrTickerNav.Initialize(
         "Ticker,Exchange,CurrentPrice,MaxShares",
         "Company", dt, "StartNewSearch", "SelectTicker",
         "CompanyOrTickerNav_prompt",
         "CompanyOrTickerNavInitialStatement_prompt");

      CompanyOrTickerNav.Visible = true;
      ...
   }

We initialize the navigator in the server code, passing it the result set to use as its company list, as well as several other pieces of information, including the names of the OnSelect and OnCancel handler functions. Refer to the SelectableNavigator section for more detailed information.

As we continue with our example, we’ll assume that more than one company match was returned, and that the user is now sent to the SelectableNavigator control.

Step 5: SelectableNavigator
The SelectableNavigator allows the user to browse through a list of items, in our case, company names. They will hear the ticker symbol, and then the name of the company; they can navigate through the list by using the built-in commands “First,” “Next,” and “Previous.” They can also cancel out of the list entirely by saying “Cancel,” or they can make their selection by saying, “Select” after an item has been read.
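The browsing behavior just described boils down to a small state machine. This is a hypothetical client-side sketch, not the SDK's actual SelectableNavigator control, but it captures the command set the user speaks.

```javascript
// Hypothetical sketch of the SelectableNavigator's browsing state
// (the real control is a server-side user control from the sample).
function Navigator(items) {
  this.items = items;
  this.index = 0;       // item currently being read to the user
  this.selected = null; // set when the user says "Select"
}
Navigator.prototype.command = function (word) {
  switch (word) {
    case "First":    this.index = 0; break;
    case "Next":     if (this.index < this.items.length - 1) this.index++; break;
    case "Previous": if (this.index > 0) this.index--; break;
    case "Select":   this.selected = this.items[this.index]; break;
    case "Cancel":   this.selected = null; return false; // leave the list
  }
  return true; // still inside the navigator
};

var nav = new Navigator(["UTX United Technologies",
                         "UTVI United Television"]);
nav.command("Next");
nav.command("Select"); // selects the second item
```

Saying "Cancel" returns `false` here, standing in for the OnCancel event that sends the user back to the first QA.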

The SelectableNavigator user control exposes two client-side events that we must code for: OnSelect and OnCancel. When we initialize our SelectableNavigator from the server code, we indicate the names of the client functions that will handle these two events.

   function StartNewSearch()
   {
      siTicker.Clear();
      userCanceled = true;
   }

The OnCancel event fires when the user says the word “Cancel,” and runs the StartNewSearch function. Two things occur in this function:

  • Clears the semantic item siTicker; its value becomes the empty string, and the state is set to “Empty.” When RunSpeech next loops through the speech controls on the page, the prompt for CompanyOrTickerQA will be run again, as its semantic item’s state is “Empty.” In this way, we tell the page to “start over.”
  • Sets the client variable userCanceled to true, to indicate to the CompanyOrTickerQA prompt function that we have just cancelled from the SelectableNavigator.

Step 6: Choosing the Company
When the user either says the word “Select” after an item is read, or if the user says the name of one of the items in the list, the OnSelect event is fired. We have to manually set the semantic item in this case, by querying the currently selected index of the navigator; this is wrapped for us in the method Item() of the SelectableNavigator.

   // OnSelect event handler
   function SelectTicker()
   {
      siTicker.attributes["Company"] =
         CompanyOrTickerNav.Item("Company").replace("&", "amp");
      siTicker.attributes["CurrentPrice"] =
         CompanyOrTickerNav.Item("CurrentPrice");
      siTicker.attributes["MaxShares"] =
         CompanyOrTickerNav.Item("MaxShares");

      siTicker.SetText(CompanyOrTickerNav.Item("Ticker"), false);

      if (siTicker.attributes["MaxShares"] == "0")
      {
         siTicker.Clear();
         siTicker.attributes["ZeroMaxShares"] = "true";
      }
      return true;
   }

Our client function “SelectTicker” sets the semantic item and attributes with the selected company information, and logically decides if the user is allowed to continue:

  • Set attributes: We first set some attributes for the semantic item to values from the current row in the SelectableNavigator’s dataset. We’ll use these values in prompt functions later, to determine what to say to the user.
  • Set Text value: The SetText method both sets the value of the semantic item and stores that information to the viewstate, so that the values can be accessed after postback; call this method after any attributes are set, so that those values will also be saved.
  • Check Max Shares: One of the pieces of information that is returned from the dataset is how many shares the user could potentially buy of the selected company. If they cannot buy any, because their account balance is too low, we clear siTicker and send them back to the first QA, with a special flag (“ZeroMaxShares”) to tell the prompt function to change what is spoken to the user.

Step 7: Confirmation
Now that the company has been selected, and the siTicker semantic item has been set, we want to confirm with the user that we have the right company name.

We use the “Confirms” tab in the SelectedTickerConfirm control both to confirm that the company name is correct and to accept a different answer if the user says “No.” For example, the user is prompted, “I understood Yahoo! Inc., is that correct?” and replies, “No, Microsoft Corporation.” “Microsoft Corporation” is then filled into siTicker (see Figure 7).

Figure 7: The “Confirms” tab in the SelectedTickerConfirm control contains information about confirming company names.

We do not use the Extra Answers tab here, because we are not asking for an additional answer; rather we are asking for a replacement to the answer we already have.

   // SelectedTickerConfirm OnClientComplete
   function SubmitOnDenyCompany(eventSource,
      lastCommandOrException, count, semanticItemList)
   {
      if (siTicker.NeedsConfirmation() &&
         lastCommandOrException == "")
         SpeechCommon.Submit();
   }

In the case where the user rejects the company confirmation, the page is submitted manually. The server-side LoadCompanies function begins again, as the user may have specified a different company or ticker, and the possible company match list will have to be retrieved from the database for these new criteria.

If the user accepts the company choice, the semantic item’s state is set to “Confirmed,” and we move on to the next QA while remaining on the client.

In-Depth: Buy Stock Feature - Extra Answers
In the Buy Stock process, after the user has placed the order, the user is given the option to buy more stock. At this point, we use the Speech SDK’s Extra Answers feature to allow expert users to quickly place more orders.

Without Extra Answers, the user must repeat the entire Buy Stock process:

   [prompt]  "Do you want to buy more stock?"
   [user]    "Yes."
   [prompt]  "Please say the name of the company or spell
             the ticker symbol you are interested in."
   [user]    "Microsoft."
   [prompt]  "I understood Microsoft Corporation, is this
             correct?"
   [user]    "Yes."
   [prompt]  "How many shares would you like to purchase?"
   [user]    "Four."
   [prompt]  "I understood Microsoft Corporation, is
             this correct?"
   [user]    "Yes."
   [prompt]  "I understood four shares, is this correct?"
   [user]    "Yes."
   [prompt]  "So you want to purchase four shares of Microsoft
             Corporation. Would you like to complete this order?"
   [user]    "Yes."

With Extra Answers, the conversation is much simpler:

   [prompt]  "Do you want to buy more stock?"
   [user]    "Yes, I would like four shares of Microsoft
             Corporation."
   [prompt]  "So you want to purchase four shares of Microsoft
             Corporation. Would you like to complete this order?"
   [user]    "Yes."

The Extra Answers feature (see Figure 8) works like regular Answers, but it doesn’t have to be filled for the QA to complete.

Figure 8: The Extra Answers feature works like regular Answers, but it doesn’t have to be filled for the QA to complete.

Semantic items siTicker and siNumberOfShares are added to the Extra Answers collection of the BuyMoreQA control, so that when any of the SML elements “Ticker,” “Company,” or “NumberOfShares” are matched, the appropriate semantic item is automatically filled.

We must also handle the case where the user indicates that they are done (by saying “No”), as well as set the confirmation status of the semantic items they do fill in should they say “Yes.” We add an OnClientComplete function to the BuyMoreQA control to handle these cases.

Important: The fact that we are at the last QA on the page implies that the siTicker and siNumberOfShares semantic items are in a “Confirmed” state. If the user supplies these as extra answers, they will go from “Confirmed” to “NeedConfirmation.” This will activate the confirmation QAs for these semantic items; we don’t want this. Instead, we want to proceed immediately to the order confirmation QA. We set semantic item states in the OnClientComplete event handler for the BuyMoreQA:

   // BuyMoreQA OnClientComplete
   function GoToMainMenuIfNo(eventSource,
      lastCommandOrException, count, semanticItemList)
   {
      if (lastCommandOrException == "")
      {
         if (siBuyMore.value == "No")
            Goto("MainMenu.aspx");
         else
         {
            siBuyMore.Clear();
            siOrder.Clear();

            if (siNumberOfShares.IsConfirmed())
               siNumberOfShares.Clear();
            else
               siNumberOfShares.Confirm();

            if (siTicker.IsConfirmed())
               siTicker.Clear();
            else
            {
               siTicker.Confirm();
               SpeechCommon.Submit();
            }
         }
      }
   }

In this code snippet, the case “no” is straightforward. If the user responds “no” to the question “Do you want to buy more stock?”, the user is redirected to the Main Menu.

If the user says “yes,” we take the following actions:

  • Clear siBuyMore and siOrder: Clear out the last two QA’s (by clearing their corresponding semantic items) so that they will run again.
  • Check answer for siNumberOfShares: See if the user said the number of shares that they want. Because of the way the Extra Answers feature has been set up, we can do this by checking the semantic state.
    • If the user did not say anything, siNumberOfShares would still be confirmed from the purchase that was just completed. We manually clear the semantic item so that, on the next loop, the user will be prompted for the number of shares.
    • If the user did give a number, the semantic item’s state changes to “NeedConfirmation.” We manually set the state to “Confirmed” because we do not want to run the NumberOfSharesConfirm control.
  • Check answer for siTicker: Next, we see if the user entered a new company, applying the same semantic-state logic as above.
  • Manually submit the page: Finally, if the user did specify a new company name, submit the page so that the server-side LoadCompanies method can retrieve the possible company match list from the database using these new criteria.

Prompt Databases
The standard Text-To-Speech (TTS) engine may work well for development and debugging, but recorded prompts make a voice-only application truly user-friendly. Although the process can be tedious, and is one of the biggest tasks of setting up a voice-enabled application, Microsoft’s recording engine and prompt validation utilities make it much easier.

Although setting up the prompt databases is rather straightforward, there are a couple of tips, which we came across while setting up the FMStocks Voice prompt databases, that we would like to point out:

  • Keep your databases small: Within reason, keep your databases as small as possible so they are more manageable.
  • Use the import/export features: You can export the transcriptions to a comma-delimited file, and you can also import transcriptions and individual wave files. This comes in handy, especially if you record your prompts at a studio, outside of the prompt database recording tool.
  • Use the “Comment” column: If you have one database that holds prompts for multiple tasks, use the “Comment” column to keep track of which category your prompts belong to. You can then sort on the Comment column if you are trying to locate a particular prompt, or are trying to consolidate several similar prompts.
  • Try the wave normalization tool: If you have a large number of prompts, it is not uncommon to record them at different times. Keep in mind that the voice will probably sound different depending on the time of day and the speaker’s mood.

The volume of the recordings will probably also differ, but this can be normalized by setting a property in the property pages of the Prompt Database project. Note: we had the most success by picking one wave file and normalizing to that.

Achieving Realistic Inflection
The following techniques allow us to make our prompts play as smoothly as possible when reading strings that involve combining many different recordings (e.g., “[Microsoft Corporation] [is at] [one] [hundred] [one] [point] [twenty] [five] [dollars]”). (NOTE: throughout this section, individual prompt extractions are identified with brackets, just as they are in the prompt editor.)
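The bracketed price above is assembled from individual recordings at run time. The following is a hypothetical sketch of that assembly step; the helper name and its limited number range are our own invention, covering only the values needed for the example.

```javascript
// Hypothetical helper that splits a price into the individual prompt
// extractions played back to back (sketch only; handles a narrow range
// of values, not a general number-to-words conversion).
var ones = ["zero", "one", "two", "three", "four", "five",
            "six", "seven", "eight", "nine"];
var tens = { 20: "twenty", 30: "thirty", 40: "forty", 50: "fifty" };

function priceExtractions(dollars, cents) {
  var parts = [];
  if (dollars >= 100) {
    parts.push(ones[Math.floor(dollars / 100)], "hundred");
    dollars = dollars % 100;
  }
  if (dollars > 0) parts.push(ones[dollars]); // sketch: remainders 1-9 only
  if (cents > 0) {
    parts.push("point", tens[cents - (cents % 10)], ones[cents % 10]);
  }
  parts.push("dollars");
  return parts;
}
```

For the price 101.25, the helper yields the seven extractions listed above, each of which maps to one recording in the prompt database.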

  • Record Extractions in Context: Prompt extractions almost always sound more realistic when spoken in context. While it may be tempting to record common single words like “companies,” “shares,” and “dollars” as individual recordings, they will sound much better when recorded along with the text that will accompany them when they are used in a prompt: “one [share],” “two [shares],” etc. In one highly effective example, we recorded all of our large number terms in one recording: “one [million] three [thousand] five [hundred] twenty five dollars.”
  • Recognize and Group Common Word Pairings: When recording singular words like “dollar” and “share,” we almost always group them with “one” as they will always be used this way. Our extractions become “[one dollar]” and “[one share].”
  • Use Prompt Tags: Although we did not use any prompt tags in the FMStocks application, you can use tags if you have the same word with a different inflection (e.g. “Two dollars” vs. “Two dollars and ten cents”). You implement these tags by adding the appropriate tag element to your prompt function, as in the example below:
   {
      var num = "two";

      if (isCents == true)
         return num + " dollars" +
            " and zero cents";
      else
         return num + " dollars.";
   }
  • Use ‘Display Text’ To Your Advantage: To achieve high-quality extractions when recording sentences, you can modify the display text column of your transcriptions to indicate where the extractions are, inserting a very small pause for clearer extractions. As an example, the transcription “[I understood] 25 [is that correct]?” would have the display text, “I understood, 25, is that correct?” During recording, the voice talent can pause at the appropriate places so that the extractions are recorded clearly. You can also manually align the wave files (see Figure 9).
    Figure 9: Manual Wave Alignment

    The wave alignment tool is a very handy tool if you want to cut and paste wave sections, refine pre-set alignments, insert new alignments, and insert or delete “pauses.”

    In the FMStocks Voice application, we used this tool mostly when recording the letters and numbers, so that when ticker symbols and prices were read back to the user, they were all uniformly spaced.

    Validation
    Once you have completed your first round of recordings, thorough validation is important to make sure that no prompts have been missed. A few general strategies enabled us to make sure that our prompt generation functions were being validated completely and accurately:

    Validation Values: In each prompt function, a “Validation Value” must be filled in for each parameter, with a value or values that you wish to validate. For parameters with a large number of potential values (e.g., numbers, dates, company names), we want to provide stand-in validation values that represent as large a set as possible without unduly slowing down the validator tool.

    For instance, if we wish to validate a prompt which tells us how many companies matched their selection, we might enter both “1” and “2” for an itemCount parameter. This way, we can test both the sentences “One Company matched” and “Two Companies matched,” showing two unique prompt results from those validation paths.
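That "1"/"2" pairing works because the prompt function branches on the count. Here is a hypothetical prompt function of the kind the two validation values would exercise (the function and parameter names are our own):

```javascript
// Hypothetical prompt function exercised by the validation values
// "1" and "2": each value drives a different branch, so both unique
// prompt results are checked by the validator.
function matchCountPrompt(itemCount) {
  if (itemCount == 1)
    return "One company matched.";
  return itemCount + " companies matched.";
}
```

Entering only one of the two values would leave one branch, and therefore one prompt recording, unvalidated.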

    No object-references within prompt functions: Except for calls to PromptGenerator.js, we never make calls to script objects within the body of our prompt functions. Instead, our prompt function arguments are defined so that all function calls are made before the inner prompt function is executed. This avoids errors during validation. Example: In Figure 10, note the call to insertSpaces(true) in the ticker variable declaration. A ticker symbol (e.g. “MSFT”) must be separated into its component letters to be read correctly by recorded prompts. We make the call to the helper function that does this in the variable declaration and provide an already-formatted version of the ticker (e.g. “M, S, F, T”) as the validation value.
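The formatting that insertSpaces performs can be sketched as follows. This is a hypothetical implementation showing only the idea (the sample's actual helper takes different arguments, as the insertSpaces(true) call in Figure 10 suggests):

```javascript
// Hypothetical sketch of an insertSpaces-style helper: separate a
// ticker symbol into its letters so each one is read as its own
// recorded prompt ("MSFT" -> "M, S, F, T").
function insertSpaces(ticker) {
  return ticker.split("").join(", ");
}
```

The pre-formatted result ("M, S, F, T") is what gets supplied as the validation value, so the validator never needs to call the helper itself.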

    Figure 10: The ticker variable declaration screen.

    Running the Application
    Our user tests were designed with two main goals in mind:

    • Verify that the system performed well in real-life scenarios: The main goal is simply to verify that testers can manage the basic tasks that real customers would want to perform.
    • Exercise the full feature-set of the application: In addition to testing standard goals, it was important to make sure that the complete feature set of the application was tested as well. Testers were guided to parts of the system that might not necessarily be on a most-likely-path scenario, in order to make sure that the entirety of the system worked as expected.

    To accomplish these goals we gave our testers scenarios that included both common tasks and special cases designed to guide the user toward special situations. A sample script might look like this:

      TASK ONE (Researching and Buying)

      You are considering buying shares from IBM, Microsoft,
      or Grey Advertising, but you are not sure which one.
      Check the market value of each of these companies.

      Once you know the market values, buy as many shares
      as you can of the least expensive stock with the money
      in your account.

      TASK TWO (Checking a Portfolio)

      1. Check your portfolio to verify that your purchase has
      been made.

      TASK THREE (Searching for a Company)

      1. You hear a hot stock tip about a company, but you
      can't remember the full name. You only remember that it
      starts with the word "American." Find companies that
      might match, select the correct one when you hear it, and
      buy ten shares. (Since you don't actually know the
      company, choose whichever one you want.)

      TASK FOUR (Selling Stock)

      1. After your purchase you want to sell all of the shares
      of the two holdings with the least expensive per-share
      cost. Look up the company.

    Test subjects were given account numbers and PINs to log into their account, but otherwise were left alone to complete the tasks. Tests were repeated with a number of different test subjects and over a number of successive product revisions.

    Lessons Learned
    We learned a great deal about building voice-only applications through the process of building these samples. Here we note some of the major points in the areas of user testing, design, and development.

    Testing: The testing and tuning phase is important in any application, but it is especially important in voice applications. We found that tuning our prompts, accept thresholds, and timeouts was key to making the application useful. Here are a few suggestions on how to conduct effective testing and tuning for voice-only systems.

    Properly Configure Testing Equipment First: Many of our early user tests generated numerous usability problems that were due to improper configuration of the microphone. The microphone was too sensitive, picking up background noise, feedback from the speaker output, and slight utterances as user input. Users became increasingly frustrated as they found it difficult to hear a prompt in its entirety. This affected test results significantly.

    Select Testers Carefully: We found that test subjects brought a variety of expectations to the testing process. Developers whom we used as subjects often made assumptions about the way the system was working and became confused with ambiguous prompts like, “Would you like to start shopping or review your previous orders?” They preferred more explicit choices: “Say start shopping to start shopping or review orders to review your account history.” Testers with a less technical background preferred less structured prompting; they felt they were speaking with a friendlier system.

    To conduct effective tests, make sure the user group you are testing matches the target user group for your application.

    Design
    The most important lesson in designing the application was the importance of tuning the prompt design throughout development. From the first stages of implementation through user testing of the completed system, we made changes to prompts to achieve a more fluid program flow. In speaking with other teams who have attempted similar projects, we found that this is a fundamental part of voice-only application development.

    With that in mind, here are a few points that will make the tuning process much more efficient:

    • Long Prompts Don’t Equal Helpful Prompts: At the outset, our design team approached the goal of a friendly interface by writing friendly text. Testing quickly revealed that verbose prompts were a serious impediment to usability. By keeping prompts short, users understood better what to do.
    • Express Sentiment with Tone/Inflection: We found that helpfulness is best expressed through intonation and inflection, rather than extra words. A prompt like, “I’m sorry. I still didn’t understand you. My fault again,” expresses an apologetic sentiment on paper quite well, but spoken, it becomes excessive. This prompt became, “I’m sorry. I still didn’t understand you,” and we let the inflection of the speaker express the emotion.
    • Build Cases For Invalid (but likely) Responses: Our tests surprised us when a majority of users answered, “Yes,” to the question, “Would you like to start shopping or review your previous orders.” We realized that part of the problem was the way in which the question was asked, but still, we built in a command to accept that response and provide a helpful response.
    • Maintain a Prompt Style Guide: Design teams are used to maintaining style guides for their designs, and voice-only applications should be no exception. Having a consistent set of prompt styles and standard phrasings is paramount to creating a sense of familiarity for the user. Our team recommends an iterative process: modify the guide liberally in the early stages of a project as new cases arise. Then, toward the later stages, tweak new cases to fit the existing rules. This process should lead to a consistent user experience throughout your system.
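    The "invalid but likely" case above can be handled with a dedicated branch rather than a generic no-recognition error. The FMStocks Voice source isn't reproduced here, and the app itself is .NET; the following is a hypothetical Python sketch of the idea, with prompt text loosely based on the examples in this article.

```python
# Hypothetical sketch (not the actual FMStocks Voice code): routing a
# recognized answer, including the invalid-but-likely "yes" that testing
# surfaced for the either/or main-menu question.
def handle_main_menu(answer: str) -> str:
    """Return the next prompt to play for a recognized user answer."""
    answer = answer.strip().lower()
    if answer in ("start shopping", "shopping"):
        return "Okay, let's start shopping."
    if answer in ("review orders", "review my previous orders"):
        return "Here is your account history."
    if answer == "yes":
        # Many users answered "yes" to the either/or question; give a
        # corrective prompt instead of a generic recognition error.
        return "Please say start shopping or review orders."
    return "I'm sorry. I didn't understand you."
```

The point is that a response you know users will give, even though it answers the wrong question, deserves its own helpful prompt.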

    Development
    We made several changes to our development strategy that are worth noting here.

    Necessary Modifications to the Business and Data Layers: Building a voice-only presentation layer as a replacement for a GUI required a few changes to the database and business-logic layers that we didn't foresee.

    Account Balance: Instead of deriving the user's account balance by summing their portfolio transactions, we added a CurrentBalance field to the Accounts table. The stored procedures Ticker_ListByTicker and Ticker_ListByCompany were modified to accept an account number as a parameter, and now return the user's account balance in addition to the matching companies.
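    The trade-off here is between recomputing the balance from the transaction history on every request and maintaining a stored balance that each trade updates. As an illustration only (the real data layer is SQL stored procedures, not this in-memory model), the stored-balance approach looks like this:

```python
# Illustrative sketch, not the actual FMStocks data layer: the balance is
# stored directly (like the CurrentBalance field) and adjusted per trade,
# rather than recomputed from the full transaction history each time.
class Account:
    def __init__(self, opening_balance: float):
        self.current_balance = opening_balance

    def record_trade(self, shares: int, price: float) -> None:
        # A buy (positive shares) debits the balance; a sell
        # (negative shares) credits it.
        self.current_balance -= shares * price

acct = Account(10_000.00)
acct.record_trade(10, 55.25)   # buy 10 shares at $55.25
acct.record_trade(-5, 60.00)   # sell 5 shares at $60.00
```

Storing the balance makes the voice application's frequent "what can I afford?" checks a single field read instead of an aggregate query.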

    Limited Number of Companies: We chose only 100 of the original 7,950 companies because we wanted to keep the grammar manageable, and we felt it was unrealistic to record over 7,000 company prompts.

    Field for Grammar Names: We added a field to the TickerList table called CompanyGrammar, used to create a dynamic grammar file. This field contains slightly normalized text so it is easier to load into the grammar. The stored procedure Speech_PopulateCompanyGrammarField was created to automatically read in the company names, normalize the text, and populate the CompanyGrammar field.

    Company                       CompanyGrammar
    -------                       --------------
    J.P. Morgan & Co.             j p morgan and company
    Nat'l Western Life Insur.     national western life insurance
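    The shipped normalization lives in the T-SQL procedure Speech_PopulateCompanyGrammarField, which isn't listed in this article; the following Python sketch is only a rough approximation of the kind of transformation the examples above imply, with a small hypothetical abbreviation map.

```python
import re

# Hypothetical abbreviation expansions, inferred from the sample rows above.
ABBREVIATIONS = {
    "co": "company",
    "natl": "national",
    "insur": "insurance",
}

def to_grammar_text(name: str) -> str:
    """Approximate the normalization that turns a company name into
    grammar-friendly text: lowercase, spell out '&', strip punctuation,
    and expand common abbreviations."""
    text = name.lower().replace("&", " and ")
    text = text.replace(".", " ").replace("'", "").replace("\u2019", "")
    words = [ABBREVIATIONS.get(w, w) for w in re.split(r"\s+", text) if w]
    return " ".join(words)

print(to_grammar_text("J.P. Morgan & Co."))         # j p morgan and company
print(to_grammar_text("Nat'l Western Life Insur.")) # national western life insurance
```

Normalized text like this keeps the recognition grammar free of punctuation and matches what callers actually say.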

    Different Login Information: The Web version of FMStocks accepts an email address and password as its login information. Neither is easily expressed in a voice context. We replaced these fields with "Account Number" and "PIN" fields, a change that would typically also necessitate database changes.

    For More Information
    The complete documentation and source code for the FMStocks Voice application can be obtained at http://www.microsoft.com/speech.
