Optimizing OpenGL ES Applications for Brew

Qualcomm’s support for OpenGL ES makes 3D graphics development for today’s wireless terminals much easier than ever before. With an interface that’s standard across a wide variety of platforms including Brew, Symbian, and Windows CE, OpenGL ES promises to power a wide variety of games and user interfaces in the future. The previous article shows you how to get set up with Qualcomm’s implementation of OpenGL ES for Brew. This article expands upon those topics and shares some performance tips provided by Qualcomm and other developers at this year’s Brew Developer Conference in San Diego.

The State of OpenGL ES on Brew-enabled Handsets Today
As I write this, OpenGL ES is available only on the latest mid-to-high-end handsets, those powered by the MSM6550 platform. The MSM6550 has an ARM9 core and considerable digital signal processing capabilities. In addition, it has an onboard 3D graphics core provided by Qualcomm. This core provides hardware-accelerated pixel and vertex processing within the MSM6550.

Because the solution is an internal on-chip one, the bottlenecks aren’t where you’d expect them to be in older mobile 3D graphics solutions. Instead of being limited by a bus to an external graphics controller, the core limitation is the vertex and pixel processors themselves. This results in significantly faster performance, but you still have to take care to keep the pipeline full (more on that in a moment). Peak benchmarks as I write this are quite good: Qualcomm reports smooth shading of 241 kTri/sec (falling to 228 kTri/sec for texturing and 135 kTri/sec for texturing and lighting combined).

Although OpenGL ES is likely to be on a specific MSM6550-powered handset, there’s no guarantee, so be sure to check device data sheets at the Brew extranet carefully when targeting your application. Because OpenGL ES is limited to recent hardware and isn’t guaranteed even there, it isn’t ubiquitous on Brew handsets… yet. Over time this should improve as more handsets running on the MSM6550 are released, and as Qualcomm seeds later-generation processors with similar or better graphics hardware to handset manufacturers.

Because the solution is in hardware, you must make sure you’re adequately loading the graphics hardware. Sending too few vertices at a time to be rendered is a waste of the hardware’s capabilities. At worst, you can actually stall the pipeline. It’s recommended that you send over sixty vertices per render command (assuming your model has that many to render). Sending too few (fewer than thirty on current hardware) can stall the pipeline entirely. Thus, it’s important to prepare an adequate scene in advance and keep the pipeline running to prevent stalls and the rendering hiccups they cause.
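One way to act on the sixty-vertex guideline is to queue small meshes and issue a render command only once enough vertices have accumulated. The following is a minimal sketch, not Qualcomm’s recommended implementation: the Batch structure, the capacity of 256, and submit_batch() are all hypothetical names of my own, and submit_batch() merely stands in for whatever real draw call (for example glDrawArrays) your application issues. It also assumes every queued mesh shares the same render state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_TARGET_VERTICES 60   /* "over sixty" guideline from the article */
#define BATCH_CAPACITY        256  /* arbitrary capacity chosen for this sketch */

typedef struct {
    int32_t x, y, z;               /* position in 16.16 fixed point */
} Vertex;

typedef struct {
    Vertex vertices[BATCH_CAPACITY];
    size_t count;                  /* vertices queued but not yet submitted */
    size_t submissions;            /* render commands issued so far */
    size_t last_size;              /* vertex count of the last submission */
} Batch;

/* Hypothetical stand-in for the real draw call (for example glDrawArrays). */
void submit_batch(Batch *b)
{
    if (b->count == 0)
        return;
    b->submissions++;
    b->last_size = b->count;
    b->count = 0;
}

/* Queue n vertices; submit only once more than the target have accumulated. */
void batch_add(Batch *b, const Vertex *v, size_t n)
{
    assert(n <= BATCH_CAPACITY);
    if (b->count + n > BATCH_CAPACITY)
        submit_batch(b);           /* no room left: flush what is queued */
    for (size_t i = 0; i < n; i++)
        b->vertices[b->count++] = v[i];
    if (b->count > BATCH_TARGET_VERTICES)
        submit_batch(b);
}
```

At the end of each frame, call submit_batch() once more so that any vertices still queued are drawn.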

The implementation of OpenGL ES includes an early depth test, so you don’t need to worry about trying to optimize your scene to get the best hidden-vertex removal performance. Instead, you should render the scene from front to back, and let the implementation’s depth test sort things out.

While all of this runs in hardware, remember that the hardware is a shared resource: the graphics hardware shares the digital signal processor (DSP) with the audio hardware. If you’re planning on playing audio while rendering graphics, be sure to use an appropriate audio format so you don’t overload the DSP. This is something that requires some testing to get right.

Managing Models
Your models are the bread and butter of your OpenGL application; without models, what is there to see in the scene? Your models consist of vertices, potentially a lot of them. As a result, you should always use fixed-point and integer vertex array types and formats, such as GL_FIXED, GL_SHORT, and GL_UNSIGNED_BYTE, in tightly packed vertex arrays. These perform significantly faster than their floating-point counterparts; a naïve choice of types can degrade rendering performance by up to 50 percent. This degradation is due both to the larger size of the slower representation and to the performance impact of the GPU having to convert those types to native types before processing. The following table shows what types are appropriate for each kind of data.

Position array: GL_FIXED, GL_SHORT
Color array: GL_UNSIGNED_BYTE
Tex coord array ([st] or [stq]): GL_FIXED, GL_SHORT
Normal array: GL_FIXED, GL_SHORT
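If your art pipeline produces floating-point data, you can convert it once at load time rather than handing floats to the hardware every frame. Here is a small sketch of that conversion, assuming the 16.16 layout that OpenGL ES uses for GL_FIXED. The function names and the fixed16_16 typedef are mine, chosen for illustration; they are not part of OpenGL ES or Brew.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t fixed16_16;        /* same 16.16 layout as OpenGL ES GLfixed */

#define FIXED_ONE 65536            /* 1.0 in 16.16 fixed point */

/* Convert a float to 16.16 fixed point, rounding to nearest. */
fixed16_16 float_to_fixed(float f)
{
    /* 16.16 can only represent values in (-32768, 32768). */
    assert(f > -32768.0f && f < 32768.0f);
    float scaled = f * (float)FIXED_ONE;
    return (fixed16_16)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Convert a whole array once, at load time, not once per frame. */
void convert_to_fixed(const float *src, fixed16_16 *dst, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = float_to_fixed(src[i]);
}

/* Pack a color channel in [0, 1] into one unsigned byte. */
uint8_t float_to_color_byte(float c)
{
    if (c < 0.0f) c = 0.0f;
    if (c > 1.0f) c = 1.0f;
    return (uint8_t)(c * 255.0f + 0.5f);
}
```

The converted arrays are what you would then hand to glVertexPointer with GL_FIXED, or to glColorPointer with GL_UNSIGNED_BYTE.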

In a similar vein, try to avoid large numbers of very small triangles. Given the hardware on which you’re running, you’re not likely to see an improvement in image quality and will spend needless time in rendering. A good trick is to try to optimize your model into strips of triangles. These strips share vertices with adjacent triangles, meaning you’re getting the greatest possible detail relative to the number of vertices you’re using. Triangle strips benefit the whole pipeline by reducing the number of vertices per triangle that must be sent across the bus. (This optimization works well for platforms with external processors, too!)
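To see why strips help, compare vertex counts: n independent triangles need 3n vertices, while a single strip of n triangles needs only n + 2. The sketch below shows that arithmetic and builds the index list for one row of quads as a strip. The vertex numbering (top row first, then bottom row) is an assumption of this example, as are the function names, and I make no claim here about winding order, which you should check against your own culling setup.

```c
#include <assert.h>
#include <stddef.h>

/* Vertices submitted for n independent triangles (GL_TRIANGLES). */
unsigned list_vertex_count(unsigned triangles)
{
    return 3u * triangles;
}

/* Vertices submitted for one strip of n triangles (GL_TRIANGLE_STRIP). */
unsigned strip_vertex_count(unsigned triangles)
{
    return triangles ? triangles + 2u : 0u;
}

/*
 * Build strip indices for one row of `quads` quads.
 * Assumed layout: top-row vertices are numbered 0..quads and
 * bottom-row vertices are numbered quads+1..2*quads+1.
 * `out` must have room for 2 * (quads + 1) indices.
 * Returns the number of indices written.
 */
size_t build_row_strip(unsigned quads, unsigned short *out)
{
    assert(quads > 0 && out != NULL);
    size_t n = 0;
    unsigned bottom_start = quads + 1u;
    for (unsigned i = 0; i <= quads; i++) {
        out[n++] = (unsigned short)i;                  /* top vertex */
        out[n++] = (unsigned short)(bottom_start + i); /* bottom vertex */
    }
    return n;
}
```

A row of four quads is eight triangles: 24 vertices as a list, but only 10 as a strip. The resulting indices are the kind of data you would pass to glDrawElements with GL_TRIANGLE_STRIP and GL_UNSIGNED_SHORT.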

Another area to pay attention to is static clipping: you want models small enough to avoid static clipping, but not so small that you end up with too many vertices to render per unit time. This balance can be tricky to get right, so you’ll want to make time for experimentation.

Finally, an obvious strategy is to reduce the level of detail for objects that are distant from the camera. It makes little sense to render lots of vertices on an object that the rendering engine’s going to end up drawing as a few colored pixels anyway! You can do this in a number of ways, including keeping various models around and selecting a model with the optimum number of vertices based on its position in the scene, especially if the model isn’t likely to move a great deal towards or away from the camera.
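The keep-several-models approach can be as simple as a table of distance thresholds. This is a sketch only: the LodLevel structure and select_lod() are hypothetical names, and the thresholds and vertex counts you put in the table are values you would tune for your own scene.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    float    max_distance;  /* use this model up to this camera distance */
    unsigned vertex_count;  /* vertices in this version of the model */
} LodLevel;

/*
 * Pick the model to draw for an object at `distance` from the camera.
 * `levels` must be ordered nearest (most detailed) first. If the object
 * is beyond every threshold, the last (coarsest) level is used.
 */
size_t select_lod(const LodLevel *levels, size_t count, float distance)
{
    assert(levels != NULL && count > 0);
    for (size_t i = 0; i < count; i++) {
        if (distance <= levels[i].max_distance)
            return i;
    }
    return count - 1;
}
```

For an object that rarely moves toward or away from the camera, you can call select_lod() once when the object is placed rather than every frame.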

Managing Textures
The hardware includes a texture cache. As with all caches, the secret to getting the best performance is to try to operate with cached data as much as possible. Consequently, your first order of business when working with textures is to pick appropriate textures, and textures of appropriate sizes, to ensure you don’t end up running out of cache. One 128×64 texture fits fine in the cache.
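When budgeting textures, it helps to work out each texture’s footprint up front. The helper below is a rough sketch: neither the cache capacity nor the hardware’s internal texel format is given here, so both are parameters you would have to fill in from device documentation or measurement, and the function names are my own.

```c
#include <stddef.h>

/* Bytes occupied by an uncompressed texture with no mipmaps. */
size_t texture_bytes(unsigned width, unsigned height, unsigned bytes_per_texel)
{
    return (size_t)width * height * bytes_per_texel;
}

/*
 * Returns 1 if a texture of `bytes` fits in a cache of `cache_bytes`.
 * `cache_bytes` is an assumption supplied by the caller; the real
 * capacity must come from the device documentation or from testing.
 */
int fits_in_cache(size_t bytes, size_t cache_bytes)
{
    return bytes <= cache_bytes;
}
```

For example, a 128×64 texture at two bytes per texel occupies 16,384 bytes, while a 256×256 texture in the same format occupies 131,072 bytes, eight times as much.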

Another trick that helps is to sort your render commands first by texture object and then by state vector. Grouping render commands by texture object helps keep relevant textures in the cache for as long as possible during the rendering process.
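The sort itself is a two-level comparison, which the standard C qsort() handles nicely. In this sketch the RenderCommand structure and its fields are hypothetical; state_key stands for whatever compact value your engine uses to identify a state vector. The count_texture_switches() helper is only there to show the effect of the sort.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned texture_id;  /* texture object used by this command */
    unsigned state_key;   /* hypothetical key identifying the state vector */
    unsigned mesh_id;     /* what to draw */
} RenderCommand;

/* Order by texture first, then by state, as the article suggests. */
int compare_commands(const void *a, const void *b)
{
    const RenderCommand *ca = (const RenderCommand *)a;
    const RenderCommand *cb = (const RenderCommand *)b;
    if (ca->texture_id != cb->texture_id)
        return (ca->texture_id > cb->texture_id) - (ca->texture_id < cb->texture_id);
    return (ca->state_key > cb->state_key) - (ca->state_key < cb->state_key);
}

void sort_commands(RenderCommand *cmds, size_t count)
{
    qsort(cmds, count, sizeof *cmds, compare_commands);
}

/* Number of times the bound texture changes while walking the list. */
size_t count_texture_switches(const RenderCommand *cmds, size_t count)
{
    size_t switches = 0;
    for (size_t i = 0; i < count; i++) {
        if (i == 0 || cmds[i].texture_id != cmds[i - 1].texture_id)
            switches++;
    }
    return switches;
}
```

With four commands that alternate between two textures, the unsorted list switches textures four times and the sorted list only twice.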

While on the topic of textures, it’s a good idea to use an appropriate filtering mode for your textures. Linear filtering is fine as long as the texture’s cached, but for large textures that don’t fit in the cache, you’re better off with nearest filtering, because the cost of linear filtering can be as much as 15 percent.

A final trick that falls into the category of textures is to simulate lighting with textures. While the OpenGL ES interface has support for lighting, complex lighting can cause enormous slowdowns (up to forty percent in some cases). If you can get away with pre-computing lighting into the texture bitmaps and rendering your models with the appropriately-lit textures, you’re better off.
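Pre-computing lighting into a texture boils down to multiplying each texel by a light intensity before the texture is ever uploaded. Here is a minimal sketch of that step for a tightly packed RGB image. It assumes you have already computed one intensity value per texel (0 for unlit, 255 for fully lit) by whatever means suits your tools; the function names are mine, and this would normally run offline or at load time rather than per frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scale one 8-bit color channel by an 8-bit light intensity, rounding. */
uint8_t modulate(uint8_t color, uint8_t light)
{
    return (uint8_t)((color * light + 127) / 255);
}

/*
 * Bake per-texel lighting into a tightly packed RGB texture in place.
 * `rgb` holds 3 bytes per texel; `light` holds one intensity per texel.
 */
void bake_lighting(uint8_t *rgb, const uint8_t *light, size_t texel_count)
{
    assert(rgb != NULL && light != NULL);
    for (size_t i = 0; i < texel_count; i++) {
        rgb[3 * i + 0] = modulate(rgb[3 * i + 0], light[i]);
        rgb[3 * i + 1] = modulate(rgb[3 * i + 1], light[i]);
        rgb[3 * i + 2] = modulate(rgb[3 * i + 2], light[i]);
    }
}
```

The trade-off is that baked lighting is static: it won’t respond if the light or the model moves, so it suits scenery better than animated characters.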

It’s Still in Your Hands
While OpenGL ES promises 3D portability to gaming and graphics developers, it’s not a panacea. As the developer, you’re still responsible for determining how your application performs. Making an effort to tune your implementation will pay off, not just on Brew, but as you port your application to different platforms as well.
