Many people believe that Moore’s law promises a doubling of processing power about every 18 to 24 months. In truth, that rate is not what Moore’s law promises (it describes the number of transistors on a chip, not speed), but it is how many developers perceive it, and developers have received this doubling of power for a long time. If that power increase were the promise of Moore’s law, then it would seem to be a promise that is being kept. Unfortunately, as when driving an automobile, speed is relative to what is happening around you and what you’re able to do with it.
The rules have changed.
In the past, you could always gain a performance boost by simply buying a machine with a faster processor. If you were running an application on a 500 MHz machine, you could generally gain a performance boost by upgrading to a newer computer that would almost by default have a faster processor. If it had been 18 to 24 months, then you’d expect to see a 1 GHz machine. If you put the application on a machine with a faster processor, you’d immediately see better performance.
You could write applications that were not the most efficient, and not worry too much about it because the computer speeds were getting faster and they could take care of the issues. Just look at Microsoft Windows, which has continued to bulk up over the years. Each new edition required a faster processor to run effectively. However, what happens when the individual speed of a processor hits the ceiling? What happens when the speed of a processor can no longer go any higher? How many developers rely on the additional boosts from new computers? Do you rely on the additional boost?
Processor speeds have hit the wall with current technologies. Some additional speed will be tweaked out, but overall it seems that the maximum speed of a processor built with current technology is just about upon us.
Think about it. How long have processor speeds been hovering around 3 to 4 GHz? You were able to get a 3 GHz machine four or five years ago, which means processors should be well above 10 GHz by now if the trend had held. It seems Moore’s Law is failing. Or is it?
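As a rough check on that "well above 10 GHz" figure, here is a minimal sketch of the arithmetic, assuming clock speed really did double every 18 to 24 months. The function name `projected_ghz` is made up for this illustration.

```python
def projected_ghz(start_ghz, years, months_per_doubling):
    """Clock speed if frequency doubled every `months_per_doubling` months."""
    doublings = years * 12 / months_per_doubling
    return start_ghz * 2 ** doublings

# Slowest case from the text: 4 years, doubling every 24 months.
low = projected_ghz(3.0, 4, 24)
# Fastest case from the text: 5 years, doubling every 18 months.
high = projected_ghz(3.0, 5, 18)

print(f"Expected today: {low:.0f} to {high:.0f} GHz")
```

Even the most conservative reading of the perceived trend puts a 3 GHz chip at 12 GHz after four years, which is nowhere near what is actually on the shelves.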
What do you do when you need to get more cars on a road that has only a single lane? You increase the speed at which the cars can drive. When the speed of the cars gets to be too dangerous, you can then add an additional lane. The second lane will allow you to get a lot more cars moving down the road. This same analogy applies to processors. If you are at the maximum speed, then there is the option to add a second lane. The speed limits may remain the same whether you are talking about roads or within processors; however, the overall performance has the potential to increase. With a road, you’ll be able to get more cars traveling by using the extra lane. On a processor you should get more throughput using an extra lane.
There is another issue. Consider the car analogy. What if all roads were one lane? There would be no reason for a car to change lanes; the ability to change lanes might not even have been built into the cars. If everyone had cars that couldn’t change lanes, then having multiple lanes wouldn’t help because everyone would still be in the same original lane. The same is true for processors and applications. Just as a car needs the ability to change lanes to use a multiple-lane road, an application traveling to a processor with multiple cores would also need to know what to do.
Widening the Road
All applications are threaded, and by default an application is single threaded. Since a processor can do only one thing at a time (assuming a single-core processor), single threading hasn’t been a big issue for the average developer. If it does become a big issue, then a program can be broken into threads that can run in a somewhat concurrent fashion. With hyperthreading added to processors a few years ago, it became possible to get more work done by breaking an application into threads. Hyperthreading allowed the processor to help swap the threads around to get more work done.
Until recently, most processors for personal computers were like the single-lane roads described previously: they had a single core. Dual-core processors have been released, and more recently quad-core processors have become readily available. These additional cores are like additional lanes on a roadway. If you have a single-threaded application, you are still going to be using a single core. You gain no real speed from having the extra lanes because your application knows how to use only one. (Note, you might gain some speed as the operating system and other applications that are running use other cores, but this gain is an indirect benefit to your application.)
In fact, the individual core speed within most multicore processors is slower than some of the single-core processors, which means you may be stuck behind traffic as well as operating at a reduced speed! Your application, if single threaded, may go slower on a newer processor because of its lower core speeds.
If you look at dual-core processors from the last few years, you might have noticed that the published speeds were closer to 1.5 or 2.0 GHz rather than the over 3 GHz of the single-core machines being released. The recently announced quad-core processors being used in the Macintosh can be as fast as 3.0 GHz. As more cores are added to a processor, you shouldn’t be surprised to see the speed of the initially released chips slow down. As such, if you’re building standard applications that are processor intensive, then this reduction in speed should cause you concern if you are not doing something specific to architect your application to use the added cores.
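To make the single-lane versus multi-lane point concrete, here is a minimal sketch of the same processor-intensive job written first as a sequential call and then broken into threads. The names `count_primes` and `count_primes_threaded` are invented for this illustration. One caveat: in standard CPython the global interpreter lock makes CPU-bound threads take turns rather than run on separate cores, so read this as a picture of the structure rather than a promise of a speedup.

```python
import threading

def count_primes(start, stop):
    """CPU-bound work: count the primes in [start, stop)."""
    count = 0
    for n in range(max(start, 2), stop):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def count_primes_threaded(limit, lanes):
    """Split [0, limit) into `lanes` slices and count each in its own thread."""
    step = -(-limit // lanes)  # ceiling division so the slices cover the range
    results = [0] * lanes

    def worker(i):
        # Each thread writes only to its own slot, so no lock is needed here.
        results[i] = count_primes(i * step, min((i + 1) * step, limit))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(lanes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)

single_lane = count_primes(0, 20_000)          # one thread: one lane only
multi_lane = count_primes_threaded(20_000, 4)  # four threads: four possible lanes
print(single_lane == multi_lane)
```

The sequential version can never occupy more than one core, no matter how many the machine has. The threaded version gives the same answer, but only because the work was deliberately designed to be split into independent slices.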
Speed Versus Power, Heat Versus Performance
Why don’t processor companies like Intel and AMD simply keep increasing the speed? What’s changed? As Charles Congdon at Intel said, “It is all about power and heat.” When you push up the speed, you generally generate more heat.
Figure 1. Transistor Switch: A transistor’s speed is based on how fast electrons travel from the switch’s “in” side to its “out” side.
Processors are based on transistors, or more specifically, transistor switches. In simple terms, the speed of a transistor is measured based on how fast electrons can travel from the “in” side of a switch to the “out” side of the switch (see Figure 1). In the past, the size of the transistor switch was reduced to gain speeds. A transistor works by having an insulator between two conductors: the in-flow side and the out-flow side of the switch. A gate helps to regulate when the electrons will flow from the in to the out. The speed gains have come from reducing the size of the insulator and thus reducing the size of the gap from the in side to the out side.
A narrower insulator leaves a shorter distance for the electrons to travel, and shorter distances can be crossed more quickly.
Unfortunately, this insulator is now at the point where it can’t get much smaller; it is already as small as it can go with current technology. The modern insulator in a transistor switch is only about 10 to 12 atoms thick. If it gets any smaller, it no longer acts as an insulator.
Every Last Drop
Intel recently announced that it was switching to a different, hafnium-based insulating material that would allow for another decrease in size. This decrease, however, will be limited. At this point, like squeezing blood from a turnip, the processor engineers are squeezing the last bits of speed from silicon-based processors. It is simply not physically possible to go much smaller, and thus much faster, with the current silicon structure.
In addition to shrinking the transistors (which can’t get much smaller, if any), you can also shrink the infrastructure around the transistors. However, shrinking the infrastructure will mean only a minor gain in speed, leaving only the cache as an area where some speed gains can still be found, but even there only minor gains are expected.
Building processors is a balance of power and heat. According to Congdon, there is a general rule of thumb that for every 1 percent increase in frequency, you get a 3 percent increase in power usage and about a 0.66 percent increase in performance. As an example, if you have a processor running at 3 GHz and you increase its frequency by 15 percent to 3.45 GHz, then you are also going to increase the power usage by 45 percent and get a performance boost of only about 10 percent (15 × 0.66 = 9.9). At a nearly 50 percent increase in power usage, you get less than 10 percent better performance.
Going the other way, you can decrease the power usage by slowing down the processor. For example, if you decrease the frequency of a 3 GHz machine by 15 percent to 2.55 GHz, then you decrease the power usage by 45 percent at a cost of only 10 percent of your performance. In other words, you can cut the power usage by almost half and still run at 90 percent performance.
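The rule of thumb is easy to play with in code. The following is a minimal sketch that assumes the 3 percent and 0.66 percent figures quoted above apply linearly in both directions; the function name `scale` and the constants are made up for this illustration.

```python
POWER_PER_FREQ = 3.0   # % change in power per 1% change in frequency (from the text)
PERF_PER_FREQ = 0.66   # % change in performance per 1% change in frequency

def scale(freq_change_pct):
    """Return (power %, performance %) relative to a 100% baseline."""
    power = 100 + POWER_PER_FREQ * freq_change_pct
    perf = 100 + PERF_PER_FREQ * freq_change_pct
    return power, perf

for change in (+15, -15):
    power, perf = scale(change)
    ghz = 3.0 * (1 + change / 100)
    print(f"{ghz:.2f} GHz: power {power:.0f}%, performance {perf:.1f}%")
```

The asymmetry is the whole story: a 15 percent change in frequency moves power by 45 points but performance by only about 10.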
If you add additional cores to the mix, then you have the ability to reduce the power usage and still gain overall performance. Compare a dual-core system running at 2.55 GHz with the 3 GHz single-core machine. Again, the dual core is at a frequency of 15 percent less than the single-core system. With a dual-core machine, running at a 2.55 GHz frequency, your power level would be about half (55 percent) for each core when compared to a single 3.0 GHz processor. When the two 2.55 GHz cores are combined, the power consumption is nearly equal to that of the 3 GHz machine. Your overall performance, however, could be at 90 percent for each core, or up to 180 percent of the performance of the 3 GHz machine, if you can utilize both cores. The end result is a slightly slower frequency, but much greater performance at a power level that is nearly equal.
Now take this comparison even a step further by using a quad-core processor. If you reduce the frequency by 30 percent (going from 3.0 to 2.1 GHz), then you end up with power usage that is still less than half of that used by the 3.0 GHz processor (a 90 percent reduction leaves each core at 10 percent of the original power, so all four cores together use about 40 percent). The overall performance would end up being 80 percent for each core, which is up to 320 percent when you compare it to the original 3.0 GHz machine, if all four cores are working to capacity.
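The dual-core and quad-core comparisons can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the same linear rule of thumb (3 percent power and 0.66 percent performance per 1 percent of frequency) holds across the whole range and that every core is kept fully busy. The function name `multicore` is made up for this illustration.

```python
def multicore(cores, freq_change_pct):
    """Total (power %, peak performance %) versus one core at full frequency.

    Assumes the linear rule of thumb holds and every core is fully used.
    """
    power_per_core = 100 + 3.0 * freq_change_pct
    perf_per_core = 100 + 0.66 * freq_change_pct
    return cores * power_per_core, cores * perf_per_core

for cores, change in ((1, 0), (2, -15), (4, -30)):
    power, perf = multicore(cores, change)
    ghz = 3.0 * (1 + change / 100)
    print(f"{cores} core(s) at {ghz:.2f} GHz: "
          f"power {power:.0f}%, peak performance {perf:.0f}%")
```

The performance column is a ceiling, not a guarantee. A single-threaded application sees only the per-core figure, which is lower than what the original 3.0 GHz machine delivered.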
Power usage is related to heat generation. As you can see from these numbers, you can drop the speed a little bit and gain big reductions in power usage and heat. By using the power you save to drive additional cores, you can increase overall performance to make up for the lower frequency.
Getting Closer to Reality
If you are upgrading to newer processors that have more cores, but slower speeds, then your applications may run slower unless you prepare them to run across multiple cores. While compiler builders such as Codegear (Borland) and Microsoft are sure to build features into their compilers to help with this speed issue, in many ways, the onus is on the developer. It is up to developers to change the design and architecture of their applications to take advantage of the added cores. Sequential applications will take advantage of a single core only; if a design change isn’t made, you won’t gain any speed.
It is going to become critical that developers, architects, or application designers understand concepts such as concurrency and parallelism, if application speed is important. Even if speed isn’t important, chances are that you will have to know these concepts in the future. While the tools will eventually make it easier to work with these concepts, it is only the developer that can determine what business logic can be broken across processor cores. As such, while the compiler manufacturers might make it easier to add threading or another manner of using cores, the odds are it will be you that has to decide how to use it.
If you don’t use threading in your processor-intensive applications, it doesn’t mean you will automatically lose and that your applications will slow down. Your operating system will very likely be able to take advantage of multiple cores as well, if not today, then in future releases. This ability means that rather than sharing a single core between your operating system, your application, and everything else on your system, the operating system will be able to use one core and possibly let your application use a separate core. This feature alone could also give you some speed increases. As one person recently joked, “With multiple cores, you get one for your application and the rest for all the spyware on your system!” Regardless, some gains can be had.
“…We are in the middle of a revolution right now! It is a parallel revolution, and this time it is for real.” (Michael Suess, ThinkingParallel.com)
Is It Really Important?
In recent, separate presentations by Bill Gates and Microsoft’s S. “Soma” Somasegar (vice president of developer tools), both took time to lead off by stating that multicore, and thus parallelism, were important topics for developers. Both indicated that this technology is going to change the way many programs are created.
Similarly, Allen Bauer, Codegear’s chief scientist (Codegear is the tools divestiture by Borland that took place in the fall of 2006), said, “Multicore is bringing multiprocessing to the masses, which means that the average developer will now need to be more aware of how their application will behave in that environment.”
He added: “Generally, I’d say that most developers depend on their tools to isolate them from any changes or advances in processor architecture.”
While Borland, Microsoft, and others have acknowledged this change in the programming paradigm caused by multicore processors, they’ve also admitted that the tools aren’t there today to make this shift as easy as it needs to be. In fact, most key changes to tools to support this paradigm change are still a couple of years away.
Nevertheless, there is some support today. Additionally, there are approaches you can take to help your applications be more flexible in the future. For example, Intel has released a number of tools to help with programming to multiple cores. These tools include the Intel Thread Profiler, Intel Thread Checker, Intel VTune, and others.
James Reinders of Intel, as well as several others familiar with this area, have said there are two key themes regarding programming for multicore that are important to consider. The first is that you have to stop and think before jumping into application development. You have to plan the design; you have to architect your applications up front before you start coding. Just as object-oriented programming (OOP) is different from the prior procedural method, so too is the approach to developing for multicore.
The second theme is to avoid direct threaded programming. According to a survey by Evans Data, 37 percent of developers in North America are doing multithreading currently (Evans Data Corporation: North American Development Survey 2006, Winter 2006). A Jupitermedia survey of developers indicated that the number may be closer to 70 percent. Regardless, it is important to be aware that direct thread programming is too low level for what you really ought to be doing. Instead, abstract the threading from the core programming that you are doing. As tools evolve, this abstraction will be done for you. Therefore, it is a best practice to migrate that way from the beginning. Intel’s compilers already support OpenMP, an industry-standard API that does some of this abstraction. Expect other tools to do the same in the future. In the meantime, it’s best for you to use abstraction if you are using threading in your applications.
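What "abstracting the threading" can look like is sketched below. OpenMP itself targets C, C++, and Fortran, so this illustration swaps in Python’s standard `concurrent.futures` module as a stand-in for that kind of abstraction. The names `word_count` and `count_all` and the sample data are made up for this example.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def word_count(text):
    """The business logic: it knows nothing about threads or cores."""
    return len(text.split())

def count_all(documents, executor_factory=ThreadPoolExecutor, workers=None):
    """Run word_count over every document using whatever pool is supplied."""
    workers = workers or os.cpu_count() or 1
    with executor_factory(max_workers=workers) as pool:
        # map() hands out the work and returns results in input order.
        return list(pool.map(word_count, documents))

docs = ["the rules have changed", "widening the road", "every last drop"]
print(count_all(docs))  # prints [4, 3, 3]
```

Nothing in `word_count` creates, starts, or joins a thread, so the parallel strategy can change without touching the core logic. For example, passing `ProcessPoolExecutor` from the same module as `executor_factory` should move the work onto separate processes, which is the usual way to use several cores for processor-intensive work in standard CPython.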
Putting It Off
The warning has now been raised. Can you put off learning to tap into multicore? Sure you can! But be aware that after this year it is likely that no more single-core processors will be released for standard personal computers and notebooks. The default will be multicore. Also, be aware that if Moore’s Law holds true, then you can expect that while four cores are being released now, it won’t be too long before there are eight-core and sixteen-core machines. The number of cores is expected to continue to grow and evolve.
You can ignore this evolution, but those who don’t will be the first to tap into the possible speed gains. Often these speed gains equate to competitive advantage. You can ignore the warnings, but take heed: chances are you will eventually have to understand how to tap into multiple cores, just as many people have to understand OOP today.
The bottom line is that processors are changing, and developers need to be aware of these changes. Until a serious change occurs in the underlying technology of processors, you can no longer expect a doubling of speed, but rather you should expect increases in the number of cores in a processor. You’ll need to gain the speed by redesigning your application to use these cores.