Make Your ASP.NET Applications Talk with Text-to-Speech : Page 2

Silence may be golden, but increasingly, applications, appliances, and other automated systems are acquiring the ability to speak. You can take advantage of text-to-speech technology to voice-enable your .NET applications.

Microsoft Speech Application SDK 1.1
In 2004, Microsoft released Microsoft Speech Server along with a free SDK that lets you develop Web-based speech applications that run on Speech Server. You can use the SDK to build telephony (voice-only) applications, in which the user interacts with the computer over a telephone. You can also build multimodal applications, in which users can choose between speech and traditional Web controls for input.

The Microsoft text-to-speech engine synthesizes text by first breaking down the words into phonemes. Phonemes are the basic units of sound in a language. Each one represents a set of "phones," which are the actual sounds that form words. The text-to-speech engine then analyzes the extracted phonemes and converts them to symbols used to generate the digital audio.
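To make the first step concrete, here is a toy sketch of breaking text into phonemes with a dictionary lookup. It is not how the Microsoft engine is implemented; the class name, the two-word lexicon, and the ARPAbet-style symbols are all assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Toy illustration only: PhonemeSketch and its lexicon are hypothetical,
// not part of the Microsoft text-to-speech engine or the SASDK.
public static class PhonemeSketch
{
    // Tiny hand-written lexicon using ARPAbet-style symbols (stress marks omitted).
    private static readonly Dictionary<string, string[]> Lexicon =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            { "hello", new[] { "HH", "AH", "L", "OW" } },
            { "world", new[] { "W", "ER", "L", "D" } }
        };

    // Splits the text into words, then looks up the phonemes for each word.
    // Words missing from the lexicon are skipped in this sketch.
    public static List<string> ToPhonemes(string text)
    {
        var phonemes = new List<string>();
        var separators = new[] { ' ', ',', '.', '!', '?' };
        var words = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            string[] entry;
            if (Lexicon.TryGetValue(word, out entry))
            {
                phonemes.AddRange(entry);
            }
        }
        return phonemes;
    }
}
```

Calling `string.Join(" ", PhonemeSketch.ToPhonemes("Hello, world!"))` yields `HH AH L OW W ER L D`. A production engine needs a far larger lexicon plus rules for words it has never seen, which is why this is only a sketch.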

Author's Note: If the text to be spoken is not known ahead of time, you have to use a text-to-speech engine; however, recorded audio generally sounds better. When audio quality is critical, you can use the Microsoft Speech Application Software Development Kit (SASDK) to record audio. For example, you may want to use recorded audio to prompt users for information. You can break the recorded audio into a series of prompts that are concatenated at runtime.

You can use the downloadable sample application (ExploringTextToSpeech.csproj) that accompanies this article to experiment with configurable aspects of the Microsoft text-to-speech engine. The multimodal application contains one Web page (see Figure 2) into which you enter some text. You can then click a button to hear the text read in one of the following ways:

  • Speak Text Normally: Provides a benchmark for testing
  • Say as an Acronym: The text "ASP" is spoken as "A.S.P."
  • Say as Name: "Mr. John Doe" is pronounced as "Mister John Doe"
  • Say as Date: The date is formatted as month, day, year
  • Say as Web Address: The text is formatted as a Uniform Resource Identifier (URI)
  • Say as Digits: A number entered as text is spoken as a series of digits
  • High Pitch/Slow Rate: The text is read with a high pitch and a slow rate
  • Rate Fast/Volume Loud: The text is read with a fast rate and loud volume
  • Low Pitch/Volume Soft: The text is read with a low pitch and a soft volume
Figure 2. Sample Application: You can use this multimodal application to hear text spoken in a variety of ways.
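Options like the ones listed above are commonly expressed as Speech Synthesis Markup Language (SSML) elements wrapped around the text. The following is a minimal sketch, assuming W3C SSML 1.0 element and attribute names (`say-as` with `interpret-as` and `format`, and `prosody` with `pitch`, `rate`, and `volume`). The `SsmlSketch` helper is hypothetical, the SASDK's own markup may use different attribute names, and this is not taken from the sample application.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper, not part of the SASDK: builds W3C SSML 1.0 fragments.
public static class SsmlSketch
{
    // Wraps text in <say-as>. Values such as "characters" or "date" are
    // suggested by a W3C note; support varies from engine to engine.
    public static string SayAs(string text, string interpretAs, string format = null)
    {
        var element = new XElement("say-as",
            new XAttribute("interpret-as", interpretAs), text);
        if (format != null)
        {
            element.Add(new XAttribute("format", format));
        }
        return element.ToString(SaveOptions.DisableFormatting);
    }

    // Wraps text in <prosody>, emitting only the attributes that were supplied.
    // SSML 1.0 labels include "low"/"high" (pitch), "slow"/"fast" (rate),
    // and "soft"/"loud" (volume).
    public static string Prosody(string text, string pitch = null,
                                 string rate = null, string volume = null)
    {
        var element = new XElement("prosody", text);
        if (pitch != null) element.Add(new XAttribute("pitch", pitch));
        if (rate != null) element.Add(new XAttribute("rate", rate));
        if (volume != null) element.Add(new XAttribute("volume", volume));
        return element.ToString(SaveOptions.DisableFormatting);
    }
}
```

For example, `SsmlSketch.SayAs("ASP", "characters")` returns `<say-as interpret-as="characters">ASP</say-as>`, and `SsmlSketch.Prosody("Hello", pitch: "high", rate: "slow")` returns `<prosody pitch="high" rate="slow">Hello</prosody>`. Building the elements with `XElement` rather than string concatenation means characters such as `&` in the user's text are escaped automatically.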
Multimodal applications use a prompt control to specify the audio that is played to a user. The prompt control has an InlineContent property that can contain either of two Basic Speech Controls: Content or Value. The Content control identifies a prompt file containing stored audio recordings. The Value control references elements of an HTML Web page. The sample application uses a Value control that references the input element named txtText (the "Type some text here:" field in Figure 2). Here's the markup for the prompt:

   <speech:prompt id="prmText" runat="server">
         <speech:Value runat="server"
