

How To Control Robots (and Other Devices) with Your Voice

Learn how to build speech recognition applications that let you control devices using voice commands. This article shows how you can control a robot's movement with spoken commands.





Examining the Multimodal Speech Application
ARobot accepts commands that use the following syntax structure:

  • The first character is an exclamation point (!), which marks the start of a new command.
  • The second character is the controller ID, which is always "1".
  • The third character indicates the command to perform; for example, "b" indicates a beep.
  • Any remaining characters are parameters specific to that command.
For example, to make ARobot beep twice, you would issue the following command:

!1b2

Here "b" is the command character, and the "2" tells ARobot to beep twice. The command structure is simple, but typing commands like these is hardly a natural way to talk to a robot. It would be much easier to say "Say Hello" and have ARobot turn on its green LED, beep twice, and turn the LED off again.
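The following minimal C# sketch makes the syntax rules concrete by assembling a command string from its parts. The ARobotCommand class and its Build method are hypothetical names used only for illustration. They are not part of the article's downloadable code.

```csharp
using System;

// Hypothetical helper (not from the article's download): assembles an
// ARobot command string according to the syntax rules listed above.
public static class ARobotCommand
{
    // '!' marks a new command; the controller ID that follows is always '1'.
    private const string Prefix = "!1";

    // command: the single character that selects the action, e.g. 'b' (beep).
    // args: the command-specific characters, e.g. "2" (beep twice).
    public static string Build(char command, string args = "")
    {
        if (!char.IsLetter(command))
            throw new ArgumentException("Expected a command letter.", nameof(command));
        return Prefix + command + (args ?? string.Empty);
    }
}
```

With this helper, Build('b', "2") produces the beep-twice command shown above, and Build('x') produces the stop command listed in Table 1.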

Figure 5. Grammar for the Say Hello Command: The diagram shows the structure of the grammar used to identify all the valid phrases associated with the Say Hello Command.
The voice-activated remote control mentioned at the beginning of this article used a training process to recognize a user's commands. Training is one way to accomplish speech recognition. Microsoft Speech Server instead uses a grammar to identify what the user is saying. The grammar lists all the phrases a user is likely to say. For example, a user who wants ARobot to say hello could say "Say Hello," "Say Hi," or "Say Hey." The SASDK provides tools for building a grammar that identifies all the valid alternative phrases (see Figure 5).
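At its core, a grammar lists the acceptable phrasings and ties each one to a single meaning. The plain C# sketch below illustrates that idea with a dictionary lookup. It is not the SASDK grammar format and does not use the Microsoft Speech Server API. The CommandGrammar class is a hypothetical name. The Hello phrases come from the paragraph above, and "halt" comes from Table 1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: maps every accepted phrase to one semantic command,
// the way a grammar does. Not the SASDK grammar format.
public static class CommandGrammar
{
    private static readonly Dictionary<string, string[]> Alternatives =
        new Dictionary<string, string[]>
        {
            ["Hello"] = new[] { "say hello", "say hi", "say hey" },
            ["Stop"] = new[] { "halt" },
        };

    // Returns the semantic command for a phrase, or null when the phrase
    // is not covered by any rule (the "out of grammar" case).
    public static string Match(string utterance)
    {
        if (string.IsNullOrWhiteSpace(utterance)) return null;

        // Normalize case and collapse repeated spaces before comparing.
        string normalized = string.Join(" ",
            utterance.Trim().ToLowerInvariant()
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));

        foreach (var rule in Alternatives)
        {
            if (rule.Value.Contains(normalized)) return rule.Key;
        }
        return null;
    }
}
```

A real speech grammar also restricts what the recognizer listens for, which a lookup performed after recognition cannot do. This sketch only shows how many phrases map to one command.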

After you identify all the commands that ARobot can accept (see Table 1), you create a grammar file for each command. You then map each recognized spoken command to the command string or strings that ARobot needs.

Table 1: The table shows a list of commands that the downloadable program can accept. Each command represents a specific action or sequence of actions that ARobot needs to perform.
Command    Potential Spoken Command   Command(s) Sent to ARobot
Backward   "Go Backward"              !1m10<speed>
Faster     "Go Faster"                !1m1<direction><speed +1>
Forward    "Move Forward"             !1m11<speed>
Hello      "Say Hello"                !1l21 --> !1b2 --> !1l20
Left       "Move Left"                !1r1ff --> pause 400 ms --> !1r100
Right      "Turn Right"               !1r101 --> pause 400 ms --> !1r100
Slow       "Slow Down"                !1m1<direction><speed -1>
Stop       "Halt"                     !1x
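Several rows in Table 1 expand a single spoken command into a timed sequence. One way to model this is to store each sequence as a list of steps and run it through caller-supplied write and pause callbacks, as in the C# sketch below. The CommandSequences class, the Step type, and the callback parameters are hypothetical. The command strings and the 400 ms pause come from the table.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the multi-step rows in Table 1. The write/pause delegates are
// hypothetical hooks so the sequencing can run without a real serial port.
public static class CommandSequences
{
    // A step carries either a command to send or, when Command is null,
    // a pause length in milliseconds.
    public struct Step
    {
        public string Command;
        public int PauseMs;
        public static Step Send(string c) => new Step { Command = c };
        public static Step Wait(int ms) => new Step { PauseMs = ms };
    }

    public static readonly Dictionary<string, Step[]> Table =
        new Dictionary<string, Step[]>
        {
            // LED on, beep twice, LED off
            ["Hello"] = new[] { Step.Send("!1l21"), Step.Send("!1b2"), Step.Send("!1l20") },
            ["Left"] = new[] { Step.Send("!1r1ff"), Step.Wait(400), Step.Send("!1r100") },
            ["Right"] = new[] { Step.Send("!1r101"), Step.Wait(400), Step.Send("!1r100") },
        };

    public static void Run(string command, Action<string> write, Action<int> pause)
    {
        if (!Table.TryGetValue(command, out Step[] steps))
            throw new ArgumentException("No sequence for " + command, nameof(command));

        foreach (Step step in steps)
        {
            if (step.Command != null) write(step.Command);
            else pause(step.PauseMs);
        }
    }
}
```

In the speech application, write could wrap the serial object used in this article (for example, s => _Rs232.Write(Encoding.ASCII.GetBytes(s))), and pause could call System.Threading.Thread.Sleep. Passing both in as parameters lets you test the sequencing logic without hardware.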

Listing 1 shows the complete ProcessCommand method, which processes each command the speech engine recognizes.

You store the serial port's connection parameters in an appSettings section of the speech application's Web.config file, as shown below.

<appSettings>
  <add key="BaudRate" value="300" />
  <add key="PortNum" value="1" />
  <add key="ByteSize" value="8" />
  <add key="Parity" value="0" />
  <add key="StopBits" value="1" />
  <add key="DefaultSpeed" value="3" />
</appSettings>

You can then retrieve those settings to open a serial connection for sending a command.
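The article does not show the code that reads these values. The sketch below shows one possible approach. It parses each setting from a NameValueCollection, which is the type that ConfigurationManager.AppSettings returns in the .NET Framework. The SerialSettings class and its members are hypothetical names.

```csharp
using System;
using System.Collections.Specialized;
using System.Globalization;

// Hypothetical settings holder. In the ASP.NET application you could pass
// ConfigurationManager.AppSettings (System.Configuration), which holds the
// values from the <appSettings> section.
public sealed class SerialSettings
{
    public int BaudRate { get; private set; }
    public int PortNum { get; private set; }
    public int ByteSize { get; private set; }
    public int Parity { get; private set; }
    public int StopBits { get; private set; }
    public int DefaultSpeed { get; private set; }

    public static SerialSettings FromAppSettings(NameValueCollection appSettings)
    {
        return new SerialSettings
        {
            BaudRate = ReadInt(appSettings, "BaudRate"),
            PortNum = ReadInt(appSettings, "PortNum"),
            ByteSize = ReadInt(appSettings, "ByteSize"),
            Parity = ReadInt(appSettings, "Parity"),
            StopBits = ReadInt(appSettings, "StopBits"),
            DefaultSpeed = ReadInt(appSettings, "DefaultSpeed"),
        };
    }

    // Fails with a clear message if a key is missing or not an integer.
    private static int ReadInt(NameValueCollection settings, string key)
    {
        string raw = settings[key];
        if (!int.TryParse(raw, NumberStyles.Integer, CultureInfo.InvariantCulture, out int value))
            throw new FormatException($"appSetting '{key}' is missing or not an integer: '{raw}'");
        return value;
    }
}
```

A clear error for a missing or mistyped key is easier to diagnose than a serial port that silently opens with the wrong parameters.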

//Get our serial connection string parms
//from the Web.Config
_Rs232 = new Rs232();
_Rs232.BaudRate = nBaudRate;
_Rs232.PortNum = nPortNum;
_Rs232.Parity = nParity;
_Rs232.StopBits = nStopBits;
_Rs232.ByteSize = nByteSize;
//Open the serial connection
_Rs232.Open();

Next, you build the command that matches what the user said. This section is a chain of if/else if statements. The branch that matches the recognized command writes the corresponding command characters to the serial port:

//Determine what command sequence should be sent
//to ARobot
if (inval.ToUpper() == "STOP")
{
    _Rs232.Write(Encoding.ASCII.GetBytes("!1x"));
}
else if (inval.ToUpper() == "BACKWARD")
{
    _Rs232.Write(Encoding.ASCII.GetBytes("!1m10" + Convert.ToString(nSpeed)));
}
else if (inval.ToUpper() == "FORWARD")
{
    _Rs232.Write(Encoding.ASCII.GetBytes("!1m11" + Convert.ToString(nSpeed)));
}
// etc...
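The Faster and Slow commands in Table 1 reuse the current direction, so the application has to remember the last direction and speed between commands. The C# sketch below keeps that state in a small class. It rests on several assumptions. Direction 1 means forward and 0 means backward, which is inferred from !1m11 and !1m10 in Table 1. The speed is clamped to 0 through 9 so it stays a single character; this range is not a documented ARobot limit. The names "FASTER" and "SLOW" for the recognized commands are also assumed, because the original snippet shows only STOP, BACKWARD, and FORWARD. MotorState and Next are hypothetical names.

```csharp
using System;

// Sketch of the speed/direction rows of Table 1 as a small state machine.
public sealed class MotorState
{
    private const int MinSpeed = 0;  // assumption, not a documented limit
    private const int MaxSpeed = 9;  // assumption: keeps speed one character
    private int _direction = 1;      // assumption: 1 = forward, 0 = backward
    private int _speed;

    public MotorState(int defaultSpeed)
    {
        _speed = Clamp(defaultSpeed);
    }

    // Returns the command string to send, or null if the command is unknown.
    public string Next(string spoken)
    {
        if (spoken == null) return null;

        switch (spoken.ToUpperInvariant())
        {
            case "STOP": return "!1x";
            case "FORWARD": _direction = 1; break;
            case "BACKWARD": _direction = 0; break;
            case "FASTER": _speed = Clamp(_speed + 1); break;
            case "SLOW": _speed = Clamp(_speed - 1); break;
            default: return null;
        }
        // !1m1<direction><speed>, as in Table 1
        return "!1m1" + _direction + _speed;
    }

    private static int Clamp(int s) => Math.Max(MinSpeed, Math.Min(MaxSpeed, s));
}
```

For example, starting from the DefaultSpeed of 3 in Web.config, "Forward" yields !1m113 and a following "Faster" yields !1m114.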

The entire process is wrapped in a try/catch/finally block. If an exception occurs, the method writes the exception's message to the Debug window. Whether or not the command succeeds, the finally block closes the serial port and sets the _Rs232 variable to null.

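catch (Exception ex)
{
    Debug.WriteLine(ex.Message);
}
finally
{
    //Close our serial connection; the null check avoids a
    //NullReferenceException if _Rs232 was never created
    if (_Rs232 != null)
    {
        _Rs232.Close();
        _Rs232 = null;
    }
}
}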
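The same cleanup pattern works for any resource that must always be released. The C# sketch below uses a hypothetical ISerialPort interface, standing in for the article's Rs232 class, together with a RecordingPort test double. It shows that the port is closed when the write succeeds and when the write throws, and that nothing is closed when the port was never created. CommandSender and TrySend are also hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical minimal port abstraction standing in for the Rs232 class.
public interface ISerialPort
{
    void Open();
    void Write(byte[] data);
    void Close();
}

// Test double that records calls and can simulate a failed write.
public sealed class RecordingPort : ISerialPort
{
    public bool FailOnWrite;
    public bool Closed { get; private set; }
    public string Written { get; private set; }

    public void Open() { }

    public void Write(byte[] data)
    {
        if (FailOnWrite) throw new InvalidOperationException("simulated write failure");
        Written = Encoding.ASCII.GetString(data);
    }

    public void Close() => Closed = true;
}

public static class CommandSender
{
    // Returns true if the command was written. The port, if it was created,
    // is closed in every case.
    public static bool TrySend(Func<ISerialPort> createPort, string command)
    {
        ISerialPort port = null;
        try
        {
            port = createPort();
            port.Open();
            port.Write(Encoding.ASCII.GetBytes(command));
            return true;
        }
        catch (Exception ex)
        {
            Debug.WriteLine(ex.Message);
            return false;
        }
        finally
        {
            // port is still null if createPort() itself threw.
            if (port != null) port.Close();
        }
    }
}
```

If your serial class implements IDisposable, a using statement gives the same guarantee with less code.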

Of course, the most enjoyable part of this project is seeing your robot in action. Before that can happen, you must send the commandmode.bs2 program to ARobot over a serial cable or wireless serial receiver, as described earlier. Then copy the Web application to a Web folder on your desktop or laptop, open a browser, and browse to the start page (default.aspx) in that folder. When the page loads, you can begin speaking or typing commands.

Although this article deals with voice control for a robot, a multimodal application like this one gives you an alternative way to communicate with any serial device. The device could be a robot, as shown here, but it could just as easily be a Personal Digital Assistant (PDA) or any "smart" device that communicates through a serial port.

Sara Morgan Rea is a 2007 Microsoft MVP for Office Communications Server. Her first book, Building Intelligent .NET Applications, was published in 2005. In addition to co-authoring several Microsoft Training Kits, she recently published Programming Microsoft Robotics Studio. She currently works as a robotic software engineer at CoroWare.com.