hile sleepless the other night, I was channel surfing and ran across a rerun of the 1968 science fiction classic "2001: A Space Odyssey."
If you haven't seen this movie, it's definitely a must-see. HAL, one of the main characters of the movie, is a slightly psychotic speech-enabled super computer. HAL is responsible for steering the Discovery spacecraft on its ill-fated Jupiter mission. As I watched the movie I was completely amazed at HAL's abilities. HAL handled press interviews, played a wicked game of chess, has varied opinions on art, controls life support, and can read lips. Not to completely destroy the movie if you haven't seen it, but I have to say that I am grateful that most of the movie's predictions aren't true. However, like the HAL of 1968, speech-enabled applications have become a core requirement for both corporate and commercial developers. In this article, I'll help you explore the Microsoft Speech Platform that comprises the Speech Application Software Development Kit (SASDK) and Microsoft Speech Server 2004. I'll also show you how you can use these technologies with Visual Studio 2003 to both build and deploy speech-enabled applications.
The name HAL was derived from a combination of the words "heuristic" and "algorithmic." These are considered the two main processes involved in human learning. These were important characteristics for early speech developers as well. Initially, speech applications were targeted at a "say anything" programming mentality. The result was a very specialized type of system-level programmer. Among other things, they studied natural speech and language heuristics to develop a set of unique algorithms for their applications. These pioneers had to be part application developers, part language experts, and part hardware engineers. The good news is that the mainstreaming of speech-based technology has enabled us, as mere mortal ASP.NET developers, to leverage this type of technology. The familiar combination of Visual Studio 2003 coupled with a free add-on kit called the Speech Application Software Development Kit (SASDK) allows your Web-based application to include speech functionality. It is the integration of these familiar toolsets into a server-based product called Microsoft Speech Server 2004 that completes the server-side solution and brings speech to the mainstream Windows platform.
The Architecture of Speech-Enabled Applications
|Figure 1. SASDK Templates: The SASDK installs a set of templates that can be used to develop speech-based applications.|
The SASDK is the core component of the Microsoft Speech platform that enables Web developers to create and debug speech-based applications for telephones, mobile devices, and desktop PCs. The SASDK includes a set of samples, documentation, and debugging utilities that are important for developing speech applications. This includes a set of speech authoring tools I will cover later that are directly integrated into the Visual Studio 2003 environment. Finally, the SASDK installs a new project template (Figure 1
) that serves as the starting point for any speech application. Typically, the lifecycle of developing a speech application starts with the tools available within the SASDK and Visual Studio 2003, and once completed the application in then deployed to the Microsoft Speech Server 2004 (MSS).
Table 1: The standards of speech applications.
Speech Application Language Tags (SALT)
SALT is an extension of HTML and other markup languages which adds a speech and telephony interface to Web applications. It supports both voice-only and multimodal browsers. SALT defines a small number of XML elements like the <listen> and <prompt> that serve as the core API for user interactions.
Speech Recognition Grammar Specification (SRGS)
SRGS provides a way to define the phrases that an application recognizes. This includes words that may be spoken, patterns that words may occur in, and the spoken language of each word.
Speech Synthesis Markup Language (SSML)
SSML provides an XML-based markup language for creating the synthetic speech within an application. It enables the control of synthetic speech that includes pronunciation, volume, pitch, and rate.