Building Speech-Enabled Applications with ASP.NET: Page 3

As speech-enabled applications become a core requirement for many enterprise and commercial applications, you'll need to become familiar with the Microsoft Speech Platform.





Prompt Authoring
Prompts are a critical part of any speech application because they serve as its voice and its primary point of interaction with users; in effect, they are the main interface of a dialog-driven application. As you begin to build a speech application you will quickly notice that the synthesized voice prompts sound a bit mechanical, definitely not the smooth, slightly British tone of HAL in "2001." This is because, unless otherwise specified, the local text-to-speech engine synthesizes all prompt interaction by default. Prompts should be as flexible as any other part of the application. Just as you invest time in creating a well-designed Web page, you should spend time designing a clean-sounding, dynamic prompt system for users to interact with. As with any application, the goal is to quickly prototype a proof of concept and perform usability testing. The extensibility of the speech environment makes it easy to run dialog development and prompt recording as parallel tracks.

Figure 6. Prompts Project Database: The prompts project contains a database that is used to store pre-recorded voice prompts.
The prompt database is the repository of recorded prompts for an application. It is compiled and downloaded to the telephony browser at run time. Before the speech engine plays any prompt, it queries the database; if a match is found, it plays the recording instead of using the synthesized voice. Within Visual Studio, the Prompt Project is used to store these recordings and is available in the New Project dialog as shown in Figure 6. The Prompt Project contains a single prompt database with a .promptdb extension. Prompt databases can be shared across multiple applications and mixed together. In practice, it's a good idea to use separate prompt databases within a single application to reduce size and improve manageability. The database can contain wave recordings either recorded directly or imported from external files.

You can edit the prompt database through Visual Studio's Prompt Editor, as shown in Figure 7. This window is divided into a Transcription window and an Extraction window. The Transcription window (top) identifies an individual recording and its properties, including playback settings such as volume, quality, and wave format. More importantly, you use the Transcription window to define the text representation of the wave file's content. The bottom portion of the Prompt Editor contains the Extraction window, which identifies one or more consecutive speech alignments within a transcription. Essentially, extractions are the smallest individual elements, or words, within a transcription that the system can use as part of an individual prompt.
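The transcription-to-extraction relationship can be sketched as a small data-model exercise. The function and names below are purely illustrative, not part of the SASDK object model, but they show the idea of an extraction as a consecutive word span within a transcription:

```javascript
// Illustrative model only: a transcription is the full recorded sentence,
// and an extraction is a span of consecutive aligned words within it.
function makeExtraction(transcription, startWord, endWord) {
  const words = transcription.split(" ");
  // slice's end index is exclusive, so add 1 to include endWord.
  return words.slice(startWord, endWord + 1).join(" ");
}

const transcription = "you ordered a ham sandwich";
const item = makeExtraction(transcription, 3, 3);      // "ham"
const lead = makeExtraction(transcription, 0, 2);      // "you ordered a"
```

In the real Prompt Editor the spans come from the speech alignment step rather than simple whitespace splitting, but the containment relationship is the same.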

Figure 7. Recording New Prompts: You can record new prompts directly within Visual Studio 2003.
Recording a Prompt
The first step in creating a prompt is to add a new transcription using the Prompt Editor. Once this is done, you can record or import a wave file that matches the transcription exactly. A transcription may be as short as a single word or as long as a full sentence. When creating transcriptions, keep the following things in mind:

  • Transcriptions are always full sentences. This makes it easier for a speaker to record with the correct voice inflections.
  • Transcriptions contain no punctuation. When recording, the Prompt Editor automatically removes any punctuation from a transcription because punctuation is not explicitly spoken in a recording.
Once you type the sentence that will be used for the transcription, you can record the prompt as shown in Figure 8.

Figure 8. Editing Prompts: The figure shows the process of editing the prompts database within Visual Studio 2003.

Once you've recorded the prompt, the Prompt Editor will attempt to align the sentence and the transcription, as shown in Figure 9. Once a successful transcription alignment is completed, you can build extractions by selecting a series of consecutive alignments from the transcription. Extractions can be combined dynamically at run time to create a prompt. For example, the extractions "ham," "roast beef," "club," and "sandwich" can be combined with "you ordered a" to create the prompt, "You ordered a ham sandwich."
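Conceptually, that run-time combination is string assembly over the extraction set. A minimal sketch (the function and data here are hypothetical, standing in for the prompt engine's internal concatenation of recorded wave segments):

```javascript
// Hypothetical sketch of run-time prompt assembly from extractions.
// Plain strings stand in for the recorded wave segments that the
// prompt engine actually concatenates.
const sandwichTypes = ["ham", "roast beef", "club"];

function buildOrderPrompt(item) {
  // "you ordered a" + item + "sandwich" are each separate extractions.
  return "you ordered a " + item + " sandwich";
}

const confirmation = buildOrderPrompt(sandwichTypes[0]);
// confirmation === "you ordered a ham sandwich"
```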

Figure 9. Aligning Prompts: Proper speech recognition requires defining an alignment between the prompts and transcription within Visual Studio 2003.
Once all the application prompts are recorded, they are referenced within a project to create inline prompts, as shown in Figure 10. This creates a prompts file within the application that contains only the extractions identified in the prompt database. By default, anything not marked as an extraction is not available to a referencing application. As a result, when the application runs, the prompt engine matches the prompt text in your application against the extractions in your database. If the required extraction is found, it is played; otherwise, the text-to-speech engine plays the prompt with the system-synthesized voice.
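The lookup-then-fallback behavior just described can be sketched as follows. The names (`promptDatabase`, `resolvePrompt`, the wave file name) are illustrative assumptions, not SDK APIs:

```javascript
// Illustrative sketch: play a recorded extraction when one matches the
// prompt text exactly, otherwise fall back to the synthesized voice.
const promptDatabase = new Map([
  ["you ordered a ham sandwich", "ham_order.wav"], // hypothetical recording
]);

function resolvePrompt(text) {
  if (promptDatabase.has(text)) {
    return { source: "recording", value: promptDatabase.get(text) };
  }
  // No matching extraction: hand the raw text to the TTS engine.
  return { source: "tts", value: text };
}
```

The real engine matches against extraction sequences rather than whole strings, but the decision structure, recorded audio first and synthesis as the fallback, is the same.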

Figure 10. Sharing Prompt Databases: Sharing a prompts database across projects is simply a process of creating a reference.
Building Dynamic Prompts
Prompts can be defined statically, as you saw earlier, using the <prompt> tag or the QuestionPrompt property of application controls. However, most speech applications tend to use dynamically defined prompts based on extractions.

Programmatically, dynamic prompts are provided through the PromptSelectFunction property available on every Dialog and Application speech control. The PromptSelectFunction property is actually a callback function for each control that executes on the client side. It is responsible for returning the prompt text and its associated markup when the control is activated. This built-in function enables speech applications to check and react to the current state of the dialog, as shown in the following code.

Figure 11. Prompt Function File: Editing and managing the code and states for each prompt can be done through the prompt function file.

    function GetPromptSelectFunction() {
        var lastCommandOrException = "";
        var len = RunSpeech.ActiveQA.History.length;
        if (len > 0) {
            lastCommandOrException = RunSpeech.ActiveQA.History[len - 1];
        }
        if (lastCommandOrException == "Silence") {
            return "Sorry I couldn't hear you. " +
                   "What menu selection would you like?";
        }
        // Return the default question when no exception occurred.
        return "What menu selection would you like?";
    }

In this example, the PromptSelectFunction checks the most recent voice command, looking for an exception such as silence. If this error is encountered, the prompt is modified to provide useful feedback to the user. A PromptSelectFunction can be added inline, but the Prompt Function Editor within Visual Studio is designed to manage the individual prompts and their states and is directly integrated with prompt validation. This Visual Studio window is activated through the prompt function file, as shown in Figure 11.
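The same pattern extends naturally to other dialog states. The sketch below factors the history check into a plain function so it can be exercised outside the browser; the "NoReco" case (speech detected but not recognized) and the function name are illustrative additions, not part of the snippet above:

```javascript
// Illustrative extension of the prompt-function pattern: inspect the
// control's history and tailor the prompt to the last command or
// exception. The history array plays the role of
// RunSpeech.ActiveQA.History in the inline version.
function selectMenuPrompt(history) {
  var last = history.length > 0 ? history[history.length - 1] : "";
  if (last === "Silence") {
    return "Sorry I couldn't hear you. What menu selection would you like?";
  }
  if (last === "NoReco") {
    return "Sorry, I didn't understand. What menu selection would you like?";
  }
  // Default question for a first ask or a successful recognition.
  return "What menu selection would you like?";
}
```

Keeping the state-to-prompt mapping in one function like this is exactly what the Prompt Function Editor manages for you, one prompt per dialog state.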
