
How To Control Robots (and Other Devices) with Your Voice


Voice command interfaces may be the next big thing. In this article, I’ll walk you through the process of using the Microsoft Speech Server SDK to create a voice-command interface that you can use to control an affordable and easy-to-assemble robot named ARobot. Even if you do not want to purchase the ARobot, you can follow along with the article to understand how the command interface works.

Assembling a Robot
I’ll start with the robot, since that’s probably what’s on your mind right now. Don’t worry: this particular robot is quite easy to set up. The robot is named ARobot (see Figure 1) and is produced by Arrick Robotics. It is available for $339.00 US. With the purchase, you get an ARobot, a Basic Stamp 2 controller, a Basic Stamp programming book, and a book written by ARobot’s creator, “Robot Building for Dummies.”

Allow about three hours to assemble your new ARobot with basic hand tools.

After assembling your ARobot, you can immediately program its “Basic Stamp 2” controller, a small, self-contained computer manufactured by Parallax, Inc. The controller executes BASIC programs created with the BASIC Stamp Editor, version 2.2.5, which Parallax offers as a free download from its web site.

Figure 1. The ARobot: The “ARobot” (pronounced “A Robot”) is an inexpensive and easy-to-assemble computer-controlled mobile robot.
What You Need
To build the sample project discussed in this article, you need Visual Studio .NET 2003, Microsoft Speech Application SDK version 1.1, and an ARobot mobile robot. Although this article is specific to the ARobot, the techniques apply equally well to any remotely controllable device with a programmable interface.

For most .NET developers, writing a BASIC program is not an attractive option; they would rather take advantage of the convenient features available in Visual Studio .NET. Fortunately, ARobot’s designer, Roger Arrick, has written a special command-level program in BASIC that lets ARobot receive high-level commands from a serial port. The program lets you control ARobot’s drive motor, steering motor, beeper, and LEDs remotely via a serial cable or wireless receiver. To simplify command entry, you can use speech recognition to accept voice commands and translate them into the appropriate high-level commands, which are then sent to the robot through the serial port or wireless interface.

Using this scheme, command flow moves from the UI, which accepts spoken commands, to a Speech Server, which recognizes the commands, to application code, which translates the recognized commands into the appropriate command form for the attached device, in this case, the ARobot.
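The three-stage flow can be sketched in modern C# (it uses generics and Func<>, so Visual Studio .NET 2003 will not compile it). IDeviceLink, CommandRouter, and RecordingLink are hypothetical names, not part of the SASDK or the sample project, and the recognizer stage is assumed to hand the router plain text such as "STOP".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction over the serial cable or wireless link.
public interface IDeviceLink
{
    void Write(string command);
}

// Application-code stage: receives text the speech recognizer has already
// produced, translates it, and forwards the device commands to the link.
public class CommandRouter
{
    private readonly Func<string, IList<string>> _translate;
    private readonly IDeviceLink _link;

    public CommandRouter(Func<string, IList<string>> translate, IDeviceLink link)
    {
        _translate = translate;
        _link = link;
    }

    // Returns false (and sends nothing) when the text has no translation.
    public bool Handle(string recognized)
    {
        IList<string> commands = _translate(recognized);
        if (commands == null || commands.Count == 0)
        {
            return false;
        }
        foreach (string command in commands)
        {
            _link.Write(command);
        }
        return true;
    }
}

// Test double that records what would have been sent to the device.
public class RecordingLink : IDeviceLink
{
    public readonly List<string> Sent = new List<string>();

    public void Write(string command)
    {
        Sent.Add(command);
    }
}
```

Keeping the device behind an interface means the translation stage never needs to know whether the bytes travel over a cable or a wireless receiver.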

Controlling the Robot with Voice Commands
The downloadable project that accompanies this article, VoiceCommandInterface.csproj, is a multimodal speech application built with the Microsoft Speech Application Software Development Toolkit (SASDK), version 1.1. The SASDK is part of Microsoft Speech Server. A multimodal speech application is a Web-based application that lets users enter input via speech or traditional Web controls.

Figure 2. Start Page: The figure shows a screenshot of the start page for the sample application that lets you control an ARobot using a voice command interface.

The start page for the application, default.aspx (see Figure 2), lets you speak voice commands into a microphone.

There are two advantages to using voice commands with your ARobot. The obvious one is a hands-free interface: you don’t have to type commands on a keyboard. The second is that it is much easier to say a command such as “Go Forward” than to type “!1m113” (the actual command that ARobot requires).

Note: If you do not want to use a microphone and prefer to enter commands as text, you can use the Speech Debugging Console (see Figure 3), which loads automatically when a speech application starts. Type a command in the Input text box and then click “Use Text.”

Figure 3. Speech Debugging Console: This debugging tool is part of the Speech Application Software Development Toolkit (SASDK), version 1.1, and lets developers enter input as text to determine how the Speech Engine would interpret the input if it were speech.
Figure 4. The Basic Stamp Editor: The Editor allows you to send a special BASIC program named commandmode.bs2 to the ARobot.

Examining the Command Level BASIC Program
Roger Arrick’s command-level program lets you send sequential commands to the ARobot. The advantage of using this method is that you do not have to write a program that anticipates every move the robot needs to make. Instead, you can just send specific commands (such as “Go Forward”) to the robot when you want it to move forward.

The command-level BASIC program (commandmode.bs2) is available for download with the application that accompanies this article. I won’t examine the code in that program because all you need to do is download it to the ARobot. You do that using the free Basic Stamp Editor.

After downloading and installing the Basic Stamp Editor, you should be able to open the commandmode.bs2 program in the Editor. You can then send the program to your ARobot by clicking the blue arrow icon on the Editor’s toolbar (see Figure 4).

After sending the commandmode.bs2 program to the ARobot, you can use a straight-through serial cable or a wireless serial receiver to control the ARobot through the multimodal speech application discussed earlier.

Examining the Multimodal Speech Application
ARobot accepts commands that use the following syntax structure:

  • The first character is an exclamation point (!), which marks the start of a new command.
  • The second character is the controller ID, which is always “1.”
  • The third character indicates the command to perform; for example, “b” indicates a beep.
  • All remaining characters are parameters specific to the type of command you want the robot to perform.

For example, if you want ARobot to beep twice, you would need to issue the following command:

      !1b2

In this case, “b” is the command and the parameter “2” tells ARobot to beep twice. Although this command structure is relatively simple, typing these commands is not exactly a natural way to communicate with your robot. It would be much easier to say the words “Say Hello” and have ARobot turn on its green LED, beep twice, and turn off the light.
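The syntax rules above are simple enough to capture in a small helper. This modern C# sketch builds a command from a letter and its parameters and checks that incoming text follows the “!1” plus letter pattern. The ARobotCommand class is hypothetical, not part of Arrick’s program, and it assumes every command letter is alphabetic, which holds for the commands used in this article (b, l, m, r, x).

```csharp
using System;

// Sketch of the ARobot syntax: '!' marker, controller ID '1',
// a one-letter command, then command-specific parameters.
public static class ARobotCommand
{
    public const char Marker = '!';
    public const char ControllerId = '1';

    // Build('b', "2") yields "!1b2" (beep twice).
    public static string Build(char command, string parameters)
    {
        if (!char.IsLetter(command))
        {
            throw new ArgumentException("The command must be a single letter.", "command");
        }
        return Marker.ToString() + ControllerId + command + (parameters ?? string.Empty);
    }

    // Splits a command string into its command letter and parameters.
    // Returns false if the text does not start with "!1" plus a letter.
    public static bool TryParse(string text, out char command, out string parameters)
    {
        command = '\0';
        parameters = null;
        if (text == null || text.Length < 3 || text[0] != Marker
            || text[1] != ControllerId || !char.IsLetter(text[2]))
        {
            return false;
        }
        command = text[2];
        parameters = text.Substring(3);
        return true;
    }
}
```

For example, parsing “!1m113” yields the command letter m with parameters “113”.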

Figure 5. Grammar for the Say Hello Command: The diagram shows the structure of the grammar used to identify all the valid phrases associated with the Say Hello Command.

Some voice-activated devices, such as voice-activated remote controls, use a training process to recognize a user’s commands. That is one way to accomplish speech recognition, but Microsoft Speech Server instead uses a grammar to identify what the user is saying. The grammar includes all the likely word choices that a user might say. For example, a user who wants ARobot to say hello could say “Say Hello,” “Say Hi,” or “Say Hey.” The SASDK provides tools for building a grammar that identifies all the valid alternative phrases (see Figure 5).
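The grammar itself is built with the SASDK tools, but the underlying idea, many phrasings that all report one semantic value, can be illustrated with an ordinary dictionary. This is a modern C# sketch, not the SASDK grammar format or API; PhraseTable and the canonical value "HELLO" are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: several spoken phrasings map to one canonical command,
// much as a grammar reports a single value for any phrase it accepts.
public static class PhraseTable
{
    private static readonly Dictionary<string, string> Phrases =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "say hello", "HELLO" },
            { "say hi", "HELLO" },
            { "say hey", "HELLO" }
        };

    // Returns the canonical command for a phrase, or null if it is not listed.
    public static string Normalize(string phrase)
    {
        if (phrase == null)
        {
            return null;
        }
        string value;
        return Phrases.TryGetValue(phrase.Trim(), out value) ? value : null;
    }
}
```

A real grammar does far more (optional words, rule references, confidence scores); the point here is only that the application code downstream sees one value per command.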

After identifying all the commands that ARobot can accept (see Table 1), it is just a matter of creating a grammar file for each command and then mapping each recognized command to the command string that ARobot needs.

Table 1: The table shows a list of commands that the downloadable program can accept. Each command represents a specific action or sequence of actions that ARobot needs to perform.

Command  | Potential Spoken Command | Command(s) Sent to ARobot
Backward | “Go Backward”            | !1m10
Faster   | “Go Faster”              | !1m1
Forward  | “Move Forward”           | !1m11
Hello    | “Say Hello”              | !1l21 -> !1b2 -> !1l20
Left     | “Move Left”              | !1r1ff -> pause for 400 ms -> !1r100
Right    | “Turn Right”             | !1r101 -> pause for 400 ms -> !1r100
Slow     | “Slow Down”              | !1m1
Stop     | “Halt”                   | !1x
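Because several rows in Table 1 are fixed sequences, they can be stored as data rather than code. In this hypothetical modern C# sketch, each Step is either a string to send or a pause. Forward and Backward are left out because the sample appends the current speed at run time, and Faster and Slow are left out because the table does not spell out their full command strings.

```csharp
using System;
using System.Collections.Generic;

// One step in a command sequence: either a string to send or a pause.
public class Step
{
    public readonly string Command; // null when the step is a pause
    public readonly int PauseMs;

    private Step(string command, int pauseMs)
    {
        Command = command;
        PauseMs = pauseMs;
    }

    public static Step Send(string command) { return new Step(command, 0); }
    public static Step Pause(int ms) { return new Step(null, ms); }
}

public static class CommandTable
{
    // Fixed sequences taken from Table 1.
    private static readonly Dictionary<string, Step[]> Sequences =
        new Dictionary<string, Step[]>(StringComparer.OrdinalIgnoreCase)
        {
            { "STOP",  new[] { Step.Send("!1x") } },
            { "HELLO", new[] { Step.Send("!1l21"), Step.Send("!1b2"), Step.Send("!1l20") } },
            { "LEFT",  new[] { Step.Send("!1r1ff"), Step.Pause(400), Step.Send("!1r100") } },
            { "RIGHT", new[] { Step.Send("!1r101"), Step.Pause(400), Step.Send("!1r100") } }
        };

    // Returns a copy of the steps for a command name, or an empty array.
    public static Step[] Lookup(string name)
    {
        Step[] steps;
        if (name != null && Sequences.TryGetValue(name, out steps))
        {
            return (Step[])steps.Clone();
        }
        return new Step[0];
    }
}
```

A caller would write each step’s Command to the serial port and call Thread.Sleep(step.PauseMs) for each pause, which keeps the timing of the Left and Right turns visible in the table itself.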

Listing 1 shows the complete ProcessCommand code that processes each command recognized by the speech engine.

You store the connection parameters for the serial port in an appSettings section of the speech application’s Web.config file.

You can then retrieve those settings to open a serial connection for sending a command.

      // Configure the serial connection with the parameters
      // read from Web.config
      _Rs232 = new Rs232();
      _Rs232.BaudRate = nBaudRate;
      _Rs232.PortNum = nPortNum;
      _Rs232.Parity = nParity;
      _Rs232.StopBits = nStopBits;
      _Rs232.ByteSize = nByteSize;

      // Open the serial connection
      _Rs232.Open();
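The snippet above assumes that nBaudRate, nPortNum, and the other n* variables were already read from Web.config. One way to do that step is sketched below in modern C#. The key names are hypothetical, all values are assumed to be integers (the Rs232 property types are not shown in the article), and a dictionary stands in for the appSettings collection so the code runs outside ASP.NET. In the Web application itself, the same values would come from ConfigurationSettings.AppSettings (.NET 1.x) or ConfigurationManager.AppSettings (.NET 2.0 and later).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Serial parameters mirroring the Rs232 properties set above (assumed ints).
public class SerialSettings
{
    public int PortNum, BaudRate, Parity, StopBits, ByteSize;
}

public static class SerialSettingsReader
{
    // Key names are hypothetical; use whatever keys your Web.config defines.
    public static SerialSettings Read(IDictionary<string, string> appSettings)
    {
        SerialSettings s = new SerialSettings();
        s.PortNum = ReadInt(appSettings, "PortNum");
        s.BaudRate = ReadInt(appSettings, "BaudRate");
        s.Parity = ReadInt(appSettings, "Parity");
        s.StopBits = ReadInt(appSettings, "StopBits");
        s.ByteSize = ReadInt(appSettings, "ByteSize");
        return s;
    }

    // Fails with a descriptive message instead of silently using a default.
    private static int ReadInt(IDictionary<string, string> settings, string key)
    {
        string raw;
        int value;
        if (!settings.TryGetValue(key, out raw))
        {
            throw new InvalidOperationException("Missing appSettings key: " + key);
        }
        if (!int.TryParse(raw, NumberStyles.Integer, CultureInfo.InvariantCulture, out value))
        {
            throw new InvalidOperationException("appSettings key " + key + " is not an integer: " + raw);
        }
        return value;
    }
}
```

Failing loudly on a missing or malformed key makes a configuration mistake show up before any bytes are sent to the robot.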

Next, you construct the appropriate command, depending on which command was spoken. This section is a chain of if/else if statements; the branch that matches the recognized command writes the corresponding command characters to the serial port:

      // Determine which command sequence should be sent
      // to ARobot
      if (inval.ToUpper() == "STOP")
      {
          _Rs232.Write(Encoding.ASCII.GetBytes("!1x"));
      }
      else if (inval.ToUpper() == "BACKWARD")
      {
          _Rs232.Write(Encoding.ASCII.GetBytes("!1m10"
              + Convert.ToString(nSpeed)));
      }
      else if (inval.ToUpper() == "FORWARD")
      {
          _Rs232.Write(Encoding.ASCII.GetBytes("!1m11"
              + Convert.ToString(nSpeed)));
      }
      // etc...
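The if/else if chain can also be written as a switch that returns the command string instead of writing it, which makes the translation testable without a robot attached. This is a modern C# sketch rather than the sample’s code; MoveTranslator is a hypothetical name, and, as in the original, the speed is appended without any range check.

```csharp
using System;
using System.Globalization;

// Switch-based version of the chain above: returns the string to send
// instead of writing it, so it can be tested without a serial port.
public static class MoveTranslator
{
    // Returns the ARobot command for a recognized name, or null if the
    // name is not handled here. The speed digit(s) are appended as-is.
    public static string Translate(string inval, int speed)
    {
        if (inval == null)
        {
            return null;
        }
        string speedText = speed.ToString(CultureInfo.InvariantCulture);
        switch (inval.ToUpperInvariant())
        {
            case "STOP":
                return "!1x";
            case "BACKWARD":
                return "!1m10" + speedText;
            case "FORWARD":
                return "!1m11" + speedText;
            default:
                return null;
        }
    }
}
```

With a speed of 3, "FORWARD" produces !1m113, the string quoted earlier for “Go Forward.” The caller would pass a non-null result to Encoding.ASCII.GetBytes and write it to the port. ToUpperInvariant is used instead of ToUpper because culture-sensitive casing can break the comparison; under a Turkish culture, for example, "right".ToUpper() yields "RİGHT", which does not match "RIGHT".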

The entire process is wrapped in a try/catch/finally block. If an exception occurs, the catch block writes the exception’s message to the Debug window. The finally block then closes the port and sets the _Rs232 variable to null, so the connection is released whether the command succeeded, failed, or was not recognized.

      catch (Exception ex)
      {
          Debug.WriteLine(ex.Message);
      }
      finally
      {
          // Close the serial connection, if one was created
          if (_Rs232 != null)
          {
              _Rs232.Close();
              _Rs232 = null;
          }
      }
  }
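The try/catch/finally pattern guarantees cleanup; in modern C#, a using statement expresses the same guarantee with less code and automatically skips disposal when no link was created. In this sketch ISerialLink, SafeSender, and FakeLink are hypothetical names. A real link might wrap System.IO.Ports.SerialPort (available from .NET 2.0), which implements IDisposable.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical link abstraction; a real one might wrap SerialPort.
public interface ISerialLink : IDisposable
{
    void Write(string command);
}

public static class SafeSender
{
    // Opens a link, sends every command, and disposes the link whenever one
    // was opened, even if a write throws. Logs the error and returns false
    // on any failure, including a failure to open.
    public static bool SendAll(Func<ISerialLink> open, IEnumerable<string> commands)
    {
        try
        {
            using (ISerialLink link = open())
            {
                foreach (string command in commands)
                {
                    link.Write(command);
                }
            }
            return true;
        }
        catch (Exception ex)
        {
            Debug.WriteLine(ex.Message);
            return false;
        }
    }
}

// Test double: records writes, can fail on one command, and notes disposal.
public class FakeLink : ISerialLink
{
    public readonly List<string> Sent = new List<string>();
    public string FailOn;
    public bool Disposed;

    public void Write(string command)
    {
        if (command == FailOn)
        {
            throw new InvalidOperationException("write failed");
        }
        Sent.Add(command);
    }

    public void Dispose()
    {
        Disposed = true;
    }
}
```

Passing a factory (Func<ISerialLink>) rather than an open link keeps the open call inside the try block, so a missing or busy port is reported the same way as a failed write.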

Of course, the most enjoyable part of this project is seeing your robot in action. Before that can happen, you must send the commandmode.bs2 program to the ARobot with the Basic Stamp Editor and connect the robot to your computer with a serial cable or wireless serial receiver, as described earlier. Then copy the Web application to a Web folder on your desktop or laptop, open a browser, and browse to the start page (default.aspx) in that folder. When the page loads, you can begin speaking or typing your commands.

Although this article deals with voice control for a robot, a multimodal application used this way gives you an alternative method of communicating with any serial device. The device could be a robot, as shown here, but it could just as easily be a Personal Digital Assistant (PDA) or any other “smart” device that communicates through a serial port.
