

Buy and Sell Stocks with the Sound of Your Voice Using the .NET Speech SDK : Page 13

Some applications are even more useful when people can interact with them using nothing but a telephone. We used the .NET Speech SDK to voice-enable the existing FMStocks sample application—and learned some useful lessons along the way.





Prompt Databases
The standard Text-To-Speech (TTS) engine may work well for development and debugging, but recorded prompts are what make a voice-only application truly user-friendly. Recording prompts is one of the biggest tasks in setting up a voice-enabled application, and it can be tedious, but Microsoft's recording engine and prompt validation utilities make the process easy.

Setting up the prompt databases is fairly straightforward, but we came across a few tips while building the FMStocks Voice prompt databases that are worth pointing out:

  • Keep your databases small: Within reason, keep your databases as small as possible so they are more manageable.
  • Use the import/export features: You can export the transcriptions to a comma-delimited file, and you can also import transcriptions and individual wave files. This comes in handy, especially if you record your prompts at a studio, outside of the prompt database recording tool.
  • Use the "Comment" column: If you have one database that holds prompts for multiple tasks, use the "Comment" column to keep track of which category your prompts belong to. You can then sort on the Comment column if you are trying to locate a particular prompt, or are trying to consolidate several similar prompts.
  • Try the wave normalization tool: If you have a large number of prompts, you will probably record them at different times, and the speaker's voice will likely sound different depending on the time of day, their mood, and so on. The recording volume will probably differ as well, but you can normalize it by setting a property in the property pages of the Prompt Database project. Note: we had the most success by picking one wave file and normalizing the others to it.
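
The Prompt Database project performs normalization itself through that property; the sketch below is not its implementation. It only illustrates the idea behind "normalizing to one wave file": scale every recording so its peak amplitude matches the peak of a chosen reference recording. It assumes the samples are already decoded into arrays of floats in the range -1 to 1, and the function names are hypothetical. The SDK's own tool may use a different loudness measure than the peak.

```javascript
// Hypothetical helper: largest absolute sample value in a recording.
function peakAmplitude(samples) {
  let peak = 0;
  for (const s of samples) {
    peak = Math.max(peak, Math.abs(s));
  }
  return peak;
}

// Scale a recording so that its peak matches the reference recording's peak.
// Samples are assumed to be floats in [-1, 1]; silent input is returned unchanged.
function normalizeToReference(samples, referenceSamples) {
  const peak = peakAmplitude(samples);
  const targetPeak = peakAmplitude(referenceSamples);
  if (peak === 0) {
    return samples.slice();
  }
  const gain = targetPeak / peak;
  return samples.map((s) => s * gain);
}

// Usage: bring a quiet take up to the level of the chosen reference take.
const reference = [0.0, 0.8, -0.6, 0.4];
const quietTake = [0.0, 0.2, -0.4, 0.1];
const normalized = normalizeToReference(quietTake, reference);
console.log(normalized); // normalized is [0, 0.4, -0.8, 0.2]
```

Peak matching is the simplest choice to show; matching average (RMS) level instead tends to track perceived loudness more closely, which is one reason a real tool might behave differently.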

Achieving Realistic Inflection
The following techniques help our prompts play as smoothly as possible when the engine reads strings built by combining many different recordings (e.g., "[Microsoft Corporation] [is at] [one] [hundred] [one] [point] [twenty] [five] [dollars]"). (NOTE: Throughout this section, individual prompt extractions are enclosed in brackets, just as they are in the prompt editor.)
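
The prompt engine does the actual matching of prompt text against the recordings in the prompt database. The sketch below only shows how a quote such as the one above might break down into word-sized extractions, which is a quick way to see which words need recordings. The function names, the $1.00 to $999.99 range, reading the cents as a two-digit number, and the "[one dollar]" special case (taken from the word-pairing tip below) are all our own assumptions.

```javascript
// Hypothetical sketch: split a stock quote into the extractions that would be
// stitched together. Handles whole-dollar amounts 1..999 and two decimal places.
const ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
  "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
  "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"];
const TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty",
  "seventy", "eighty", "ninety"];

// Words for 0..99, one extraction per word ("twenty", "five").
function underHundred(n) {
  if (n < 20) return [ONES[n]];
  const words = [TENS[Math.floor(n / 10)]];
  if (n % 10 !== 0) words.push(ONES[n % 10]);
  return words;
}

// Words for 1..999 ("one", "hundred", "one").
function underThousand(n) {
  const words = [];
  const hundreds = Math.floor(n / 100);
  const rest = n % 100;
  if (hundreds > 0) words.push(ONES[hundreds], "hundred");
  if (rest > 0 || hundreds === 0) words.push(...underHundred(rest));
  return words;
}

function quoteToExtractions(company, price) {
  // Work in whole cents so values like 40.05 * 100 round correctly.
  const totalCents = Math.round(price * 100);
  const dollars = Math.floor(totalCents / 100);
  const cents = totalCents % 100;
  if (dollars < 1 || dollars > 999) {
    throw new RangeError("sketch only handles $1.00 to $999.99");
  }
  if (dollars === 1 && cents === 0) {
    // Singular pairing recorded as a single extraction: "[one dollar]".
    return [company, "is at", "one dollar"];
  }
  const parts = [company, "is at", ...underThousand(dollars)];
  if (cents > 0) {
    // Cents below ten get a leading "zero" so .05 is not read as .5.
    const centWords = cents < 10 ? ["zero", ONES[cents]] : underHundred(cents);
    parts.push("point", ...centWords);
  }
  parts.push("dollars");
  return parts;
}

console.log(quoteToExtractions("Microsoft Corporation", 101.25));
```

For 101.25 this yields the same nine extractions as the bracketed example above, starting with the company name.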

  • Record Extractions in Context: Prompt extractions almost always sound more realistic when spoken in context. While it may be tempting to record common single words like "companies," "shares," and "dollars" as individual recordings, they will sound much better when recorded along with the text that will accompany them when they are used in a prompt: "one [share]," "two [shares]," etc. In one highly effective example, we recorded all of our large number terms in one recording: "one [million] three [thousand] five [hundred] twenty five dollars."
  • Recognize and Group Common Word Pairings: When recording singular words like "dollar" and "share," we almost always group them with "one," because that is how the singular forms are always used. Our extractions become "[one dollar]" and "[one share]."
  • Use Prompt Tags: Although we did not use any prompt tags in the FMStocks application, you can use tags when the same word needs different inflections (e.g., "Two dollars" vs. "Two dollars and ten cents"). You implement these tags by adding a <withtag> element to your prompt function, as in the example below:

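{
    var num = "two";
    if (isCents == true) // mid-sentence: request the "dollars" extraction tagged 'middleSentence'
        return num + " <withtag tag='middleSentence'>dollars</withtag>" + " and zero cents";
    else // end of sentence: no tag requested
        return num + " dollars.";
}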

  • Use 'Display Text' To Your Advantage: To get clean extractions when recording sentences, you can edit the display text column of your transcriptions to mark where the extractions fall, so that the voice talent inserts a very small pause there. For example, the transcription "[I understood] 25 [is that correct]?" would have the display text "I understood, 25, is that correct?" During recording, the voice talent pauses at the commas so that the extractions are recorded cleanly. You can also align the wave files manually (see Figure 9).
Figure 9: Manual Wave Alignment

The wave alignment tool is handy when you want to cut and paste wave sections, refine preset alignments, insert new alignments, and insert or delete pauses.

In the FMStocks Voice application, we used this tool mostly when recording letters and numbers, so that ticker symbols and prices were uniformly spaced when read back to the user.
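
In the prompt editor you type the display text by hand. If you have many transcriptions, a small script could draft it for you by following the display-text convention described above: drop the brackets and put a comma at each extraction boundary. The helper below, displayTextFromTranscription, is hypothetical and not part of the Speech SDK. It assumes the bracket notation used in this article and lets a segment that begins with punctuation attach directly to the previous word.

```javascript
// Hypothetical helper: turn a bracketed transcription into display text with a
// comma at each extraction boundary, cueing the voice talent to pause there.
function displayTextFromTranscription(transcription) {
  // With one capture group, split() returns plain text at even indices and the
  // bracketed extraction text (brackets removed) at odd indices.
  const pieces = transcription.split(/\[([^\]]*)\]/);
  let result = "";
  for (const raw of pieces) {
    const piece = raw.trim();
    if (piece === "") {
      continue; // empty text between two adjacent boundaries
    }
    if (result === "" || /^[.,?!;:]/.test(piece)) {
      // First piece, or punctuation that belongs to the previous word.
      result += piece;
    } else if (/[.,?!;:]$/.test(result)) {
      // Previous text already ends in punctuation; just add a space.
      result += " " + piece;
    } else {
      result += ", " + piece;
    }
  }
  return result;
}

console.log(displayTextFromTranscription("[I understood] 25 [is that correct]?"));
```

The output is only a starting point; you would still review each line so the pauses fall where the voice talent can deliver them naturally.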

Once you have completed your first round of recordings, thorough validation is important to make sure that no prompts have been missed. A few general strategies helped us make sure that our prompt generation functions were validated completely and accurately:

Validation Values: In each prompt function, you must fill in a "Validation Value" for each parameter, with the value or values you want to validate. For parameters with a large number of potential values (e.g., numbers, dates, company names), we want stand-in validation values that exercise as many distinct prompt paths as possible without unduly slowing down the validator tool.

For instance, to validate a prompt that tells the user how many companies matched their selection, we might enter both "1" and "2" for an itemCount parameter. This way, we test both "One company matched" and "Two companies matched," two distinct prompt results from those validation paths.
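
The Speech SDK's validator does the real checking against the prompt database. The sketch below is a hypothetical stand-in that simply runs a prompt function over its validation values and collects the distinct prompt texts, to show why "1" and "2" together cover both the singular and plural branches. companiesMatchedPrompt, promptsForValidationValues, and the COUNT_WORDS table are illustrative names of our own, and the sketch passes numbers rather than the text you would type into the editor.

```javascript
// Hypothetical word table; counts above nine fall back to digits in this sketch.
const COUNT_WORDS = ["zero", "one", "two", "three", "four",
  "five", "six", "seven", "eight", "nine"];

// Hypothetical prompt function modeled on the "companies matched" example.
function companiesMatchedPrompt(itemCount) {
  if (itemCount === 1) {
    // Singular branch.
    return "One company matched.";
  }
  const word = itemCount >= 0 && itemCount < COUNT_WORDS.length
    ? COUNT_WORDS[itemCount]
    : String(itemCount);
  // Plural branch: capitalize the first letter to start the sentence.
  return word.charAt(0).toUpperCase() + word.slice(1) + " companies matched.";
}

// Stand-in for the validator: evaluate the prompt function once per validation
// value and keep each distinct prompt text, in first-seen order.
function promptsForValidationValues(promptFn, validationValues) {
  return [...new Set(validationValues.map((value) => promptFn(value)))];
}

// 1 and 2 exercise both branches, so both sentences get checked.
console.log(promptsForValidationValues(companiesMatchedPrompt, [1, 2]));
```

Adding more values such as 3 or 4 would only produce more sentences from the same plural branch, which is the extra validator time the text above suggests avoiding.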

No object references within prompt functions: Except for calls to PromptGenerator.js, we never call script objects from within the body of our prompt functions. Instead, we define our prompt function arguments so that all helper calls are made before the prompt function itself executes. This avoids errors during validation. For example, in Figure 10, note the call to insertSpaces(true) in the ticker variable declaration. A ticker symbol (e.g., "MSFT") must be split into its component letters to be read correctly by recorded prompts. We call the helper function that does this in the variable declaration and provide an already-formatted version of the ticker (e.g., "M, S, F, T") as the validation value.

Figure 10: The ticker variable declaration screen.
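
The pattern is to format the ticker outside the prompt function and pass in the finished string. insertSpaces(true) is the FMStocks helper, and we do not reproduce its exact signature or behavior here. Instead, the sketch below uses a hypothetical spellTicker function to produce the "M, S, F, T" form, and a hypothetical tickerPrompt whose "You entered" wording is made up for illustration.

```javascript
// Hypothetical helper (not the FMStocks insertSpaces function): split a ticker
// symbol into its letters so each one maps to a recorded letter prompt.
function spellTicker(ticker) {
  return ticker
    .trim()
    .toUpperCase()
    .split("")
    .join(", ");
}

// Prompt function that receives the already-formatted ticker, so it makes no
// helper calls of its own and can be validated with a plain string.
function tickerPrompt(spelledTicker) {
  return "You entered " + spelledTicker + ".";
}

// Format before the prompt function runs, as in the variable declaration.
const spelled = spellTicker("msft");
console.log(tickerPrompt(spelled)); // prints "You entered M, S, F, T."
```

Because tickerPrompt only concatenates strings, the validation value "M, S, F, T" exercises exactly the text the prompt engine will see at run time.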
