
Capturing Audio on BREW Handsets


Today’s wireless handsets are veritable multimedia powerhouses, sporting more I/O options than the average desktop PC of a mere decade ago. Because users are already comfortable talking to a wireless handset (it is, after all, first and foremost a phone!), the ability to capture audio paves the way for a wide variety of applications, including voice command and speech recognition, audio note-taking, and multi-user chat and messaging, whether in stand-alone applications or integrated into a larger application. Qualcomm BREW makes it easy to capture audio, either as an audio recording you can package up, replay, and send to a remote server, or in real time by directly accessing the handset’s vocoder. This article shows you both methods and arms you with the information you need to choose the right path for your application.

Using the IMedia Interface to Capture Audio
The IMedia interface, which was covered in the recent article “Playing Multimedia Using BREW’s IMedia,” not only plays media but can record it as well. Using the IMedia interface, you can capture audio to a file for storage, playback, and transmission to other hosts. The interface supports a variety of audio formats, including PCM, although most handsets currently support only QCELP. (QCELP is a highly optimized format used by the CDMA network itself; it is very well suited to recording the human voice for playback to the human ear, but may not be suitable for applications that require little or no compression artifacting.) By design, you can’t use the IMedia interface to capture audio directly off the microphone in real time; to do that, you use the IVocoder interface, which is the subject of the next section.

Using the IMedia interface to record audio is simple:

  1. Select an appropriate subclass of IMedia based on the desired format for the captured audio (for example, AEECLSID_MEDIAQCP) and create an instance of the interface.
  2. Set the IMedia instance’s callback so that the instance can pass information about the media back to your application using IMEDIA_RegisterNotify.
  3. Configure any recording options using IMEDIA_SetParm.
  4. Begin and control media recording, using handset or programmatic events (key presses, game actions, and so forth) to trigger IMedia methods such as IMEDIA_Record, IMEDIA_Pause, IMEDIA_Resume, and IMEDIA_Stop.
  5. Monitor the values sent to your application callback for information such as errors or the termination of recording and handle those notifications appropriately.
  6. When recording is complete, release the IMedia instance and any other resources you’ve consumed.

Setup of an IMedia interface for recording is the same as for playback:

#include "AEEMedia.h"

AEEMediaData sMediaData;
char szFilename[] = "record.qcp";

ISHELL_CreateInstance(pIShell, AEECLSID_MEDIAQCP, (void **)&pMe->pIMedia);
sMediaData.clsData = MMD_FILE_NAME;
sMediaData.pData = (void *)szFilename;
sMediaData.dwSize = 0;
IMEDIA_SetMediaData(pMe->pIMedia, &sMediaData);
IMEDIA_RegisterNotify(pMe->pIMedia, (PFNMEDIANOTIFY)IMediaEventHandler, pMe);

This code creates an IMedia instance and sets the file name for the resulting audio recording using IMEDIA_SetMediaData. Finally, it registers a callback the IMedia instance invokes to notify your application when its state changes. Don’t forget to set the PL_FILE privilege in your application’s Module Information File, because IMedia will be interacting with the file system.

You can use the callback to update the user interface for events such as the beginning or ending of recording. The structure of the media callback is well-documented, or you can consult my previous article on IMedia for an example.

Starting and stopping recording can be done programmatically or as a result of a key event; you simply need to call IMEDIA_Record to begin recording and IMEDIA_Stop to end recording. (If you’d like to pause and resume recording, simply use IMEDIA_Pause and IMEDIA_Resume.) The only argument these methods take is the IMedia instance you’ve created and initialized:

// Someplace in an event handler...
case AVK_SELECT:
   if (pMe->bRecording)
   {
      pMe->bRecording = FALSE;
      IMEDIA_Stop(pMe->pIMedia);
   }
   else
   {
      pMe->bRecording = TRUE;
      IMEDIA_Record(pMe->pIMedia);
   }
   break;
case AVK_SOFT1:
   if (pMe->bRecording && !pMe->bPaused)
   {
      pMe->bPaused = TRUE;
      IMEDIA_Pause(pMe->pIMedia);
   }
   else
   {
      pMe->bPaused = FALSE;
      IMEDIA_Resume(pMe->pIMedia);
   }
   break;

Of course, you should accompany changes in application state (such as a change from recording to paused) with user interface changes, such as showing an icon when recording is paused. While it’s tempting to do this in the same place where you trigger the change in the IMedia object’s state, it’s actually better to do it in the callback you registered with the IMedia object. That way, the user interface changes only after the IMedia object’s state change has actually taken place, and the interface cannot get out of sync with the object’s state.
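To make this concrete, the state-synchronization idea can be sketched as a tiny plain-C state machine whose UI flags change only inside the notification handler. This is an illustration of the pattern, not BREW code; all of the type and constant names below are invented for the example:

```c
#include <assert.h>

/* Invented stand-ins for media status notifications. */
typedef enum { NOTIFY_RECORD_START, NOTIFY_PAUSE, NOTIFY_RESUME, NOTIFY_DONE } Notify;

typedef struct {
    int bRecording;  /* drives the "recording" indicator in the UI */
    int bPaused;     /* drives the "paused" indicator in the UI    */
} UiState;

/* The ONLY place the UI state changes: the notification handler.
   Requesting a pause does not touch these flags; the confirming
   notification from the media object does. */
static void OnMediaNotify(UiState *ui, Notify n)
{
    switch (n) {
    case NOTIFY_RECORD_START: ui->bRecording = 1; ui->bPaused = 0; break;
    case NOTIFY_PAUSE:        ui->bPaused = 1;                     break;
    case NOTIFY_RESUME:       ui->bPaused = 0;                     break;
    case NOTIFY_DONE:         ui->bRecording = 0; ui->bPaused = 0; break;
    }
}
```

Because the flags change only when a notification arrives, the indicators always reflect the state the media object has actually reached.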

Once recording is finished, you must clean up the IMedia object. To do this, first clear the callback registration and then simply release the object:

IMEDIA_RegisterNotify(pThis->pIMedia, NULL, NULL);
IMEDIA_Release(pThis->pIMedia);

Clearing the callback isn’t strictly necessary, but given the asynchronous nature of the IMedia implementation, it’s a good idea.

Once you’ve recorded the sound file, you’re free to do with it as you choose, of course: send it to a server, keep it on the local file system, or analyze it for features for speech recognition or other purposes.

Using the IVocoder Interface to Capture Audio
The IVocoder interface gives you real-time access to the underlying hardware audio encoder and decoder. Like the IMedia interface, it’s bidirectional: you can use it both to record and to play audio. However, the interface is significantly more complex to use, because audio data passes continually between the interface and its client. While this makes simple tasks such as recording an audio clip more difficult (a task better suited to an IMedia instance), it’s the only way to create truly interactive audio applications such as push-to-talk telephony or audio chat.

Although all BREW applications must work asynchronously, performing their work in response to events or to function callbacks invoked by interfaces or timers, the real-time nature of the IVocoder interface places additional demands on your application design. This stems from two related causes: the time-dependent nature of the data from the vocoder, and the implementation of the vocoder itself. First, audio data is sensitive to timing; the vocoder returns data to its client in frames, self-contained units of compressed speech. If you drop a frame, you’ve lost a segment of audio (the result sounds like a digital cell call just as it’s about to drop due to poor coverage). Dropping frames can happen either because of insufficient buffering (you’re not reading from or writing to the vocoder fast or often enough) or because the underlying hardware is occupied by other things. Hence the second challenge: because the vocoder shares processing hardware, it’s important to keep other processing to a minimum while using the IVocoder interface, especially on low-end hardware. As the vocoder collects audio from the microphone, it compresses the data in real time based on the options you specify, consuming some of the hardware’s processing resources to do so. (Similarly, replaying audio requires processing to decompress the audio data in real time.)
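To put rough numbers on this timing sensitivity: CDMA vocoders conventionally emit one frame every 20 ms, so frame counts translate directly into milliseconds of audio. The plain-C sketch below illustrates the arithmetic (the 20 ms interval is the conventional frame time, not a value the IVocoder API exposes):

```c
#include <assert.h>

#define FRAME_MS 20  /* conventional CDMA vocoder frame interval */

/* Milliseconds of audio lost when `dropped` consecutive frames are missed. */
static int AudioLostMs(int dropped)
{
    return dropped * FRAME_MS;
}

/* Frames that accumulate while the application is busy for `busy_ms`. */
static int FramesDuring(int busy_ms)
{
    return busy_ms / FRAME_MS;
}
```

A burst of just five dropped frames leaves a 100 ms hole in the audio, and an application that stalls for half a second must be prepared to absorb roughly 25 queued frames.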

To use IVocoder, you must:

  1. Create an instance of the IVocoder interface.
  2. Configure the instance by selecting the vocoder type along with the compression options and callbacks for data availability and data playback.
  3. Start and stop the vocoder via the interface according to program control.
  4. Release the vocoder interface instance when you’re through using it.

Vocoder configuration is a little trickier than configuring an IMedia interface, owing to the nature of the vocoder itself. First, you must choose a vocoder algorithm (the default used by CDMA cellular networks is IS127). In addition to selecting an algorithm, you must select a data rate, that is, how much digital data is generated from the audio channel in a given unit of time. You specify the data rate in terms of the bounds your application requires (the minimum and maximum data rates), and the vocoder dynamically chooses a rate within those bounds based on the complexity of the audio stream. This process is adaptive: the vocoder moves to a higher data rate for more complex audio and throttles back to the minimum data rate when encoding near-silence or easily compressed audio. In addition, some audio encoders let you specify a further reduction factor that permits the vocoder to trim the size of each frame, resulting in lower bandwidth (often with only slightly diminished quality) for a given vocoder rate.

Here is a snippet that creates a vocoder interface and configures it for a typical audio session:

if (ISHELL_CreateInstance(pMe->a.m_pIShell, AEECLSID_VOCODER,
                          (void **)&pMe->pIVocoder) != SUCCESS)
{
   DBGPRINTF("Error Creating IVocoder interface");
   return FALSE;
}

// Set the necessary configuration data; see the API reference for details
pMe->vocConfig.needDataCB = NeedDataCB;
pMe->vocConfig.haveDataCB = HaveDataCB;
pMe->vocConfig.playedDataCB = NULL;
pMe->vocConfig.readyCB = ReadyCB;
pMe->vocConfig.usrPtr = pMe;
pMe->vocConfig.max = HALF_RATE;
pMe->vocConfig.min = EIGHTH_RATE;
pMe->vocConfig.overwrite = TRUE;
pMe->vocConfig.txReduction = 0;
pMe->vocConfig.vocoder = VOC_IS127;
pMe->vocConfig.watermark = 24;

// Configure IVocoder
status = IVOCODER_VocConfigure(pMe->pIVocoder, pMe->vocConfig,
                               &pMe->vocInfo);

The IVocoder interface takes four callbacks, set in the vocoder configuration structure:

  • The needDataCB, which the vocoder invokes when it needs additional data while playing audio.
  • The haveDataCB, which the vocoder invokes when it has additional data while recording audio.
  • The playedDataCB, which the vocoder invokes each time it has played one or more frames.
  • The readyCB, which the vocoder invokes when it is ready to start or stop audio capture or playback.

Qualcomm provides constants for both the vocoder algorithm and the data rates in AEEVocoder.h; this listing configures the vocoder to run at no more than half the codec’s full data rate and no less than one-eighth of it. The actual data rates depend on the codec; in this case the IS127 codec has been selected, with a maximum data rate of approximately 9 Kbps, so the data rates range between about 1 Kbps and 4.8 Kbps.
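The relationship between the rate fractions and the resulting bitrate is straightforward arithmetic: each frame carries 20 ms of audio, so the bitrate is the frame payload divided by the frame interval. The sketch below uses an illustrative full-rate payload of 192 bits per frame (chosen so the numbers come out round, not the actual IS127 frame size):

```c
#include <assert.h>

/* Bitrate in bits/sec for a vocoder emitting one `payload_bits`-bit
   frame every 20 ms (i.e., 50 frames per second). */
static int BitrateBps(int payload_bits)
{
    return payload_bits * 50;
}
```

With that payload, full rate works out to 9,600 bps, half rate (96 bits per frame) to 4,800 bps, and eighth rate (24 bits per frame) to 1,200 bps.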

Finally, because the vocoder buffers data in case you cannot service a callback as soon as it requires or has data, you must specify the number of frames to accumulate before engaging the audio path, as well as whether new or old frames should be discarded when the vocoder’s buffer overflows. The example just shown requires twenty-four frames (the watermark field) and discards new frames when the buffer is full (overwrite is TRUE).
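The watermark and overwrite semantics can be pictured as a small fixed-capacity frame queue. The plain-C sketch below illustrates the discard policy just described (with overwrite meaning that incoming frames are dropped when the queue is full); it is not the BREW implementation, and all names are invented:

```c
#include <assert.h>

#define QUEUE_CAP 24   /* analogous to the watermark in the example above */

typedef struct {
    int frames[QUEUE_CAP];
    int count;
} FrameQueue;

/* Push a frame. If the queue is full and `overwrite` is true, the NEW
   frame is discarded; otherwise the OLDEST frame is dropped to make room.
   Returns 1 if the frame was stored, 0 if it was dropped. */
static int PushFrame(FrameQueue *q, int frame, int overwrite)
{
    int i;
    if (q->count == QUEUE_CAP) {
        if (overwrite)
            return 0;                  /* drop the incoming frame */
        for (i = 1; i < QUEUE_CAP; i++) /* drop the oldest frame   */
            q->frames[i - 1] = q->frames[i];
        q->count--;
    }
    q->frames[q->count++] = frame;
    return 1;
}
```

Either policy loses audio under overload; the choice is whether the gap appears at the newest or the oldest end of the buffered stream.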

Once the vocoder is configured, it invokes the callback you specify in the configuration’s readyCB field. At that point, you can begin using the vocoder for input or output. For input, simply invoke IVOCODER_VocInStart; for output, invoke IVOCODER_VocOutStart. In a similar vein, to stop the vocoder, use IVOCODER_VocInStop (to stop capture) or IVOCODER_VocOutStop:

static void ReadyCB(void *usrPtr)
{
   CApp *pMe = (CApp *)usrPtr;

   // Start reading in
   IVOCODER_VocInStart(pMe->pIVocoder);

   // Set vocoder variable to on
   pMe->bVocOn = TRUE;
}

This callback starts the vocoder immediately once it’s ready, and sets an internal flag used to track the status of audio capture. You can also update your application’s user interface in response to the callback; I like to do this asynchronously, by sending an event to my application using ISHELL_PostEvent indicating that the UI should be updated in accordance with the new state.

If you are going to repeatedly start and stop the vocoder, it’s a good idea to reset it between each invocation to ensure that it has reset its internal buffers and state:

if (pMe->pIVocoder)
{
   IVOCODER_VocInStop(pMe->pIVocoder);
   IVOCODER_VocOutStop(pMe->pIVocoder);
   IVOCODER_VocOutReset(pMe->pIVocoder);
   IVOCODER_VocInReset(pMe->pIVocoder);
}

Once the vocoder starts, it’s your responsibility to keep collecting data from it, processing the data each time it invokes the callback you specified in the configuration’s haveDataCB slot. (In a similar vein, if you’re playing audio, you need to keep up with supplying data to the vocoder when it invokes the callback you passed in the needDataCB slot.) What you do with this data (or where you get it, depending on whether you’re recording or playing audio) really depends on your application. For example, if you’re just storing the data for later playback, you might write something like this:

static void HaveDataCB(uint16 numFrames, void *p)
{
   CApp *pMe = (CApp *)p;
   DataRateType rate;
   uint16 length;
   int i;
   int status;

   // Data integrity checks
   if (!pMe || !pMe->pIVocoder) return;

   // Read each frame and write it to the file
   for (i = 0; i < numFrames; i++)
   {
      status = IVOCODER_VocInRead(pMe->pIVocoder,
                                  &rate, &length, pMe->arbyFrameData);
      // If we successfully read a frame, write it out
      if (status == SUCCESS)
      {
         IFILE_Write(pMe->pIFile, pMe->arbyFrameData, length);
      }
   }
}

The HaveDataCB and NeedDataCB callbacks each take the number of frames available (or, in the case of playback, the number that can be accepted) along with the user data pointer you specified in the vocoder’s configuration. You should do as little processing as possible in these callbacks, because they occur quite often. In this example, I’m simply copying each frame as it arrives to a file on the flash file system; I could just as easily buffer it in memory for analysis (perhaps run in the background using BREW’s ISHELL_Resume mechanism) or send it over a network socket to a server. Regardless, note that as a simple optimization, I copy the frame data into a static memory region, arbyFrameData, in my application structure, rather than allocating a buffer with MALLOC each time the vocoder invokes the callback. (The buffer is sized according to the constant MAXFRAMELEN defined in AEEVocoder.h.)

Once you’re done with the vocoder, you must stop it and release the associated interface. (You can, of course, stop and restart the vocoder repeatedly without releasing the interface.) To do this, simply call the appropriate Stop methods and then invoke its Release method:

// We stop incoming and outgoing data
if (pMe->pIVocoder)
{
   IVOCODER_VocInStop(pMe->pIVocoder);
   IVOCODER_VocOutStop(pMe->pIVocoder);
   IVOCODER_Release(pMe->pIVocoder);
   pMe->pIVocoder = NULL;
}

You’ve already seen how to use the Stop methods in a previous snippet; the IVOCODER_Release method is just like that for any other interface.

Both Good Solutions
Qualcomm provides both simple and low-level access to the audio hardware on BREW-enabled handsets. If you’re looking to record audio, your best bet is IMedia, which provides a simple wrapper that packages audio in a standard format for later playback or for exchange with servers and other uses. On the other hand, if you need low-level access to the real-time audio stream, it’s easy to build an application with the appropriate callbacks and use an instance of the IVocoder interface.
