How to Build Grammars for Speech-enabled Applications

Most companies you call have automated voice attendants that walk you through a series of menu choices until you are finally directed to a live person. As the automated portions of these systems gain use and popularity, they’re becoming increasingly sophisticated, even helpful. For example, voice-automated systems can now let you reset passwords, check email, or get flight information. The good news is that as a .NET developer you already have some of the expertise needed to create these kinds of cutting-edge applications.

Microsoft Speech Server lets you build complex voice-only applications in the same environment used to build today’s Web-based applications, but instead of a point-and-click graphical user interface, users access information using a phone and a series of voice commands. The dialog between the user and the computer can be natural, intuitive and in many cases more convenient for the end user.

Don’t worry if you don’t know anything about developing a speech-based application. To start, go to the Microsoft Speech Web site and download a free copy of the Speech Application Software Development Toolkit (SASDK). After installing the SASDK, you may want to familiarize yourself with the terminology for speech-based applications by walking through the SDK tutorial that comes with the download.

To be useful, a speech-based application needs to understand what the user says. In other words, it needs a grammar. For the computer to understand callers without a formal training process, it must have a good idea of what the caller might say. The grammar supplies that knowledge: it identifies all the likely word choices a caller might use when accessing the system. This article walks you through the process of creating a grammar for a voice-only application using the SASDK.

What You Need
Microsoft SQL Server 2000 or 2005, Visual Studio .NET 2003, and the Microsoft Speech Application SDK

The downloadable application that accompanies this article, HowToBuildGrammar.csproj, uses data from the Northwind database that installs by default with SQL Server 2000. This ASP.NET application uses the Microsoft.Speech.Web.UI namespace, and allows callers to access order information for pending or shipped orders. Each caller hears a welcome message and is then asked to say a contact name that identifies the company they wish to access. After the company is confirmed, callers are asked whether they want to hear information about shipped or pending orders. Based on their answer, the application retrieves a list of orders that callers can navigate through using voice commands such as Next, Previous, or Select.

The start page for the HowToBuildGrammar project is Default.aspx. Because the interface is not graphical, the page contains no fancy graphics; instead it contains a series of speech-based Web controls that guide callers through the application. Each Question/Answer (QA) control on this page references a grammar file built with the SASDK’s Grammar Editor.

Building Grammars with the Grammar Editor
If you download the code that accompanies this article, you can access the Grammar Explorer by opening the project file and drilling down to the Orders.grxml file located in the Grammars subdirectory. From there, you should see a list of rules associated with this grammar file. Double-click the OrderTypeRule item, and the Grammar Editor opens (see Figure 1). From here you can build, edit, and test grammar files.

Figure 1. The Grammar Editor: The figure shows the Grammar Editor opened on the OrderTypeRule item.

You build grammars by dragging controls from the toolbox on the left side of the Grammar Editor onto the design surface. Table 1 identifies the grammar controls available in the grammar toolbox. The OrderTypeRule in Figure 1 uses a group control to relate three different rule references. The rule enables the OrderTypeQA control to identify exactly what order type the caller wishes to hear. The OrderTypeQA control in the Default.aspx page references the rule through its grammar collection. You can see this reference if you view the HTML for Default.aspx.

Table 1: The table lists the grammar controls available through the grammar toolbox in Visual Studio.

| Control | Description |
| --- | --- |
| Phrase | Represents the actual phrase spoken by the user. |
| List | Used to contain multiple phrase elements that all relate to the same thing. For example, a “yes” response could be spoken as “Yes”, “Yeah”, or “OK”. |
| RuleRef | Used to reference other rules through the URI property. This is useful when you want to reuse the logic in existing rules. |
| Group | Used to group related elements such as a List, Phrase, or RuleRef. |
| Wildcard | Used to specify which words in a phrase can be ignored. |
| Halt | Used to stop the recognition path. |
| Skip | Used to indicate that a recognition path is optional. |
| Script Tag | Used to compare the value of the Semantic Markup Language (SML) tag with the matching values. |

The PreambleRuleRef contains a list control that identifies several phrases a caller might say at the beginning of a response. For instance, callers are prompted for the order type with the following text-to-speech prompt: “Do you need to hear about pending or shipped orders?” Callers might respond in a variety of ways. They might say, “Give me all shipped orders,” or “I want the pending ones please.” Either response is fine, because you can use PreambleRuleRef and PostambleRuleRef rules to represent the extra words people use in natural speech. In the example phrases listed above, “Give me all” and “I want the” represent the preamble phrases, while “orders” and “ones please” represent postamble phrases. The meat of the response is the word “shipped” or “pending,” which identifies the order type the caller wants to retrieve. The OrderTypeRuleRef captures this essential part of the caller’s response.
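To make the preamble/answer/postamble structure concrete, here is a rough sketch of how such a rule could be written by hand in the W3C SRGS XML format that .grxml files use. It is not the contents of the sample project's Orders.grxml: the rule ids, the phrase lists, and the choice to inline the order-type list rather than reference a separate rule are all assumptions made for illustration. The tag text follows the `$.Name = "value"` pattern that the dynamic grammar code later in this article generates.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Illustrative sketch only; rule ids and phrases are assumed -->
<grammar version="1.0" xml:lang="en-US" mode="voice" root="OrderTypeRule"
         xmlns="http://www.w3.org/2001/06/grammar"
         tag-format="semantics-ms/1.0">

  <rule id="OrderTypeRule" scope="public">
    <!-- Optional lead-in words such as "give me all" -->
    <item repeat="0-1"><ruleref uri="#Preamble"/></item>

    <!-- The essential part of the answer, mapped to a semantic value -->
    <one-of>
      <item>shipped<tag>$.Type = "Shipped"</tag></item>
      <item>pending<tag>$.Type = "Pending"</tag></item>
    </one-of>

    <!-- Optional trailing words such as "orders" -->
    <item repeat="0-1"><ruleref uri="#Postamble"/></item>
  </rule>

  <rule id="Preamble">
    <one-of>
      <item>give me all</item>
      <item>I want the</item>
    </one-of>
  </rule>

  <rule id="Postamble">
    <one-of>
      <item>orders</item>
      <item>ones please</item>
    </one-of>
  </rule>
</grammar>
```

Marking the preamble and postamble as optional (`repeat="0-1"`) lets the same rule accept a bare “shipped” as well as a full sentence.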

Testing Grammars
The Grammar Editor lets you test grammars by typing a phrase into the Recognition String text box. After typing a phrase, click the Check button and watch the Output window for results. If the grammar is located and validated successfully, you should see an XML-based string that represents the SML. So, if you were to type in the phrase, “Give me all shipped orders,” the output would appear as shown in Figure 2.

Figure 2. Testing Grammars: When testing grammars, the test results appear in the Output window as an XML string that represents the SML.

The SML result returned from the speech recognizer is used to assign values to semantic items. These semantic items are declared in the Default.aspx page as part of the SemanticMap speech control. Each semantic item stores one piece of information collected from the caller. For example, the semantic item named siOrderType stores the value associated with the tag named Type. The code uses this value when populating the DataTableNavigator speech control with customer orders.

You may notice that the SML contains an attribute named confidence. In Figure 2 this attribute contains a value of 1.00, which indicates 100 percent confidence that the result returned from the speech recognizer is accurate. The confidence score is this high because I typed in the text when testing from the Grammar Editor. If I had used a microphone to speak the order type, the confidence score would have been less than 100 percent. Exactly how much less depends on the quality of the microphone and the speaking tone and rate used by the caller.
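As a rough illustration of what a semantic item ends up holding, the following sketch pulls a value and its confidence score out of an SML string using the standard System.Xml classes. The Speech controls do this mapping for you through the SemanticMap, so you would not normally write this code. The class and method names are hypothetical, and the SML sample in the usage note is an assumed shape, not a copy of the output in Figure 2.

```csharp
using System;
using System.Globalization;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class SmlSketch
{
    // Returns the text of the first child element named elementName
    // under the SML root, or null when that element is absent.
    // The element's confidence attribute is returned through the
    // out parameter (0.0 when the element or attribute is missing).
    public static string ReadValue(string sml, string elementName,
        out double confidence)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(sml);

        XmlElement node = doc.DocumentElement[elementName];
        if (node == null)
        {
            confidence = 0.0;
            return null;
        }

        // GetAttribute returns an empty string for a missing attribute.
        string raw = node.GetAttribute("confidence");
        confidence = raw.Length == 0
            ? 0.0
            : double.Parse(raw, CultureInfo.InvariantCulture);
        return node.InnerText;
    }
}
```

Given an assumed SML string such as `<SML text="give me all shipped orders" confidence="1.000"><Type confidence="1.000">Shipped</Type></SML>`, calling `SmlSketch.ReadValue(sml, "Type", out confidence)` returns the order type and sets the confidence from the Type element's attribute.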

Building Dynamic Grammars
Because most applications use data from relational databases in which the data is constantly changing, grammars may need to be dynamic as well. For instance, suppose your application requires customers to speak their names. For the application to recognize the name, all customer names would need to be part of a grammar file. Most companies maintain databases with hundreds if not thousands of customers. It would be impossible to expect the application developer to manually update a grammar file every time a new customer is added to the database. This is where a dynamic grammar comes in handy.

Because grammar files are XML-based, it is simple to create a string programmatically that contains the XML associated with each grammar. The sample application contains a class file named DynamicGrammar.cs. The class contains a public method named LoadGrammarFile (partially shown below) that accepts two string parameters. The first parameter, name, is the name of the semantic value that each recognized entry is assigned to (for example, ContactName). The second parameter, sql, contains the SQL query whose first column supplies the grammar entries. The method returns a string containing the XML-based grammar.

   public string LoadGrammarFile(string name, string sql)
   {
      StringBuilder s = new StringBuilder();

      // Opening grammar markup
      s.Append(@"");
      s.Append(@"");
      s.Append("");

      SqlDataReader dr = SqlHelper.ExecuteReader(
         ConfigurationSettings.AppSettings["ConnectionStr"],
         System.Data.CommandType.Text, sql);

      // One grammar entry per row returned by the query
      while (dr.Read())
      {
         string GrammarItem;
         GrammarItem = dr.GetString(0);
         s.Append("");
         s.Append("");
         s.Append(GrammarItem);
         s.Append("");
         s.Append("$." + name + @" = """);
         s.Append(GrammarItem + @"""");
         s.Append("");
      }

      // Closing grammar markup
      s.Append("");
      s.Append("");
      s.Append("");

      if (!(dr.IsClosed))
      {
         dr.Close();
      }
      dr = null;

      return s.ToString();
   }
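Whatever form the method takes, its output has to be well-formed SRGS markup with one entry per phrase. The following is a minimal, self-contained sketch of that idea, not the sample application's code. It takes the phrases as a plain collection instead of querying a database, and it uses XmlWriter so that names containing characters such as & or < are escaped automatically. The class name, method name, and the choice of attributes on the grammar element are assumptions. Phrases containing double quotes would also need escaping inside the tag text, which this sketch does not attempt.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class GrammarSketch
{
    private const string SrgsNs = "http://www.w3.org/2001/06/grammar";
    private const string XmlNs = "http://www.w3.org/XML/1998/namespace";

    // Builds a grammar string containing one <item> per phrase.
    // ruleName is used both as the rule id and as the name of the
    // semantic value assigned in each <tag>.
    public static string BuildGrammar(string ruleName,
        IEnumerable<string> phrases)
    {
        StringWriter output = new StringWriter();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;

        using (XmlWriter w = XmlWriter.Create(output, settings))
        {
            w.WriteStartElement("grammar", SrgsNs);
            w.WriteAttributeString("version", "1.0");
            w.WriteAttributeString("xml", "lang", XmlNs, "en-US");
            w.WriteAttributeString("mode", "voice");
            w.WriteAttributeString("root", ruleName);
            w.WriteAttributeString("tag-format", "semantics-ms/1.0");

            w.WriteStartElement("rule", SrgsNs);
            w.WriteAttributeString("id", ruleName);
            w.WriteAttributeString("scope", "public");

            w.WriteStartElement("one-of", SrgsNs);
            foreach (string phrase in phrases)
            {
                // <item>phrase<tag>$.ruleName = "phrase"</tag></item>
                w.WriteStartElement("item", SrgsNs);
                w.WriteString(phrase);
                w.WriteElementString("tag", SrgsNs,
                    "$." + ruleName + " = \"" + phrase + "\"");
                w.WriteEndElement();
            }
            w.WriteEndElement(); // one-of
            w.WriteEndElement(); // rule
            w.WriteEndElement(); // grammar
        }
        return output.ToString();
    }
}
```

For example, `GrammarSketch.BuildGrammar("ContactName", new string[] { "Howard Snyder", "Maria Anders" })` yields a grammar with two items, each tagged so that the recognized name is stored in a value called ContactName.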

You can assign the resulting XML-based grammar string to the InlineGrammar property for a specific QA control. The InlineGrammar property value can be either a URI (Uniform Resource Identifier) or a string value. In the sample application, the OnInit method for the Default Web page sets the InlineGrammar property as follows:

   Microsoft.Speech.Web.UI.Grammar gram = new Grammar();
   gram.InlineGrammar =
      Convert.ToString(Session["ContactGrammar"]);
   AskContactNameQA.Reco.Grammars.Add(gram);

The preceding code starts by creating a new grammar item and then uses a session variable to assign the InlineGrammar property value. The application uses a Session variable in an effort to improve application scalability. Rather than calling the LoadGrammarFile method each time the Web page is initialized, the application calls the method once, when the caller's session starts, and stores the result in the Session variable. So, the code that calls the LoadGrammarFile method is located in the Session_Start method of the Global.asax file. The contents of the method are as follows:

   protected void Session_Start(Object sender, EventArgs e)
   {
      CDynamicGrammar _DynamicGrammar;

      //Build the dynamic grammar string listing all
      //contact names in the Northwind database. Then
      //store this grammar string in a session variable
      _DynamicGrammar = new CDynamicGrammar();
      string s = _DynamicGrammar.LoadGrammarFile(
         "ContactName",
         "SELECT DISTINCT ContactName FROM Customers");
      Session["ContactGrammar"] = s;
      _DynamicGrammar = null;
   }

Figure 3. Speech Debugging Console: You use this console to debug dynamic grammars and user input at runtime. The figure shows the results after entering the name "howard snyder" in the "Input" textbox in the sample application.

Because dynamic grammars are not built with the Grammar Editor, you cannot test them at design time the way you tested the static grammar earlier. Instead, you’ll need to test these grammars while executing the application. When you start a speech-based application using Visual Studio, it launches the Speech Debugging Console automatically (see Figure 3). From there, you can enter input using text or audio. As the application collects user input, the output tab displays the results collected from the recognition engine, letting you see the resulting SML used to assign values to semantic items.

Grammars are a critical component of speech-based applications. If they do not contain all the valid words or phrases that a caller may speak, then the application will not be successful in meeting the callers’ needs. As you’ve seen in this article, you can build grammars manually using the graphical Grammar Editor, or dynamically, using code. The most effective way to ensure that your grammar meets a caller’s needs is through user testing and review. Even applications that have been successfully tested and deployed need to be reviewed periodically to ensure that their grammar files are effectively designed.
