IEDM 2017: AMD’s grand vision for the future of HPC
Industry-Wide High-Performance Computing Trends
Up until the 2000s, the semiconductor industry enjoyed a steady increase in performance for both CPUs and GPUs, roughly doubling every two years. “But in the last 10 years, we all talk about how Moore’s Law is slowing and that it is getting more difficult. We are now no longer at a doubling of performance every eighteen months, but we are somewhere between doubling every 2 to 2.4 years,” AMD CEO Lisa Su noted.
System-level performance per Watt follows a similar trend: the industry is seeing efficiency double roughly every 2.4 years.
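The doubling periods quoted above are easier to compare as compound annual rates. The short Python sketch below does that conversion; the helper names `annual_rate` and `gain_over` are our own, and the only inputs taken from the text are the doubling periods themselves.

```python
def annual_rate(doubling_years: float) -> float:
    """Compound annual improvement implied by a doubling period (in years)."""
    # Solve (1 + r) ** doubling_years == 2 for r.
    return 2 ** (1 / doubling_years) - 1


def gain_over(years: float, doubling_years: float) -> float:
    """Total improvement factor after `years` at a given doubling period."""
    return 2 ** (years / doubling_years)


# Doubling periods mentioned in the text: 18 months, 2 years, 2.4 years.
for period in (1.5, 2.0, 2.4):
    print(f"doubling every {period} years -> "
          f"{annual_rate(period):.1%} per year, "
          f"{gain_over(10, period):.1f}x per decade")
```

Stretching the doubling period from 18 months to 2.4 years cuts the implied annual improvement from about 59 percent to about 33 percent, and the ten-year gain from roughly 100x to roughly 18x.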
Technology Enablers
A more interesting picture emerges when the trend is broken down into the individual components that enable those gains.
Process Technology
Driven by Moore’s Law, process technology has always been the single largest contributor to performance and efficiency gains, and this still holds true today. However, the improvement in energy efficiency and density from one node to the next has slowed, and Su noted that AMD now sees it doubling only roughly every 3.6 years.
For AMD, that means roughly 40 percent of the performance gain comes from process technology. Note that this figure is measured at the system level.
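A percentage split of a multiplicative gain can be read in more than one way. One common convention, assumed here because the article does not say how AMD computed its 40 percent figure, measures each contributor's share in log space, so that the contributors' individual factors multiply back to the total. The `split_gain` helper and the 10x total below are hypothetical and purely illustrative.

```python
import math


def split_gain(total_factor: float, share: float) -> tuple[float, float]:
    """Split a multiplicative gain into two factors whose product is total_factor.

    `share` is the fraction of the gain attributed to the first contributor,
    measured in log space (an assumption; the source does not specify).
    """
    log_total = math.log(total_factor)
    first = math.exp(share * log_total)         # e.g. process technology
    rest = math.exp((1 - share) * log_total)    # all other contributors combined
    return first, rest


# Hypothetical example: a 10x system-level gain with a 40% process share.
process, others = split_gain(10.0, 0.40)
print(f"process: {process:.2f}x, everything else: {others:.2f}x, "
      f"product: {process * others:.2f}x")
```

Under a linear reading instead (40 percent of the absolute improvement), the individual factors would come out differently, which is why the convention matters when comparing breakdowns like this one.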
As Su put it, “This is a very important characteristic because that says that it’s incredibly important for us to keep moving devices forward as we go to the 7-nanometer and 5-nanometer and beyond. But it’s also incredibly important for us to do a lot of work on the architecture and system side and frankly there are a lot of additional opportunities for us to move forward in performance in that domain.”
Other innovations
The remaining 60% of the gain can be attributed to many smaller contributors. AMD believes that for many of them, further innovation is possible, keeping the trend alive for the foreseeable future.
Among a long list of smaller contributors, AMD highlighted four major ones: higher integration, microarchitectural efficiency, power management improvements, and software improvements and code modernization.
One specific example is microarchitectural improvement. With Zen, its latest microarchitecture introduced earlier this year, AMD reported a 52 percent increase in instructions per cycle (IPC), achieved by extracting more parallelism and improving code flow prediction. Overall, AMD has improved IPC at a rate of roughly 7% per year over the last 10 years. “In some sense that’s really good because people would say ‘hey, have the tricks run out?’ and I think the answer is the tricks haven’t run out yet,” Su said.
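The 7% annual figure compounds into a sizable cumulative gain. The sketch below (the `compound` helper is our own) checks the ten-year total and, as a rough comparison of our own rather than one AMD made, estimates how many years at that average rate the 52 percent jump reported for Zen corresponds to.

```python
import math


def compound(rate: float, years: int) -> float:
    """Cumulative improvement factor from a constant annual rate."""
    return (1 + rate) ** years


# Ten years at ~7% per year, as stated in the text.
decade_ipc = compound(0.07, 10)

# Years of 7% average progress equivalent to a single 52% jump:
# solve 1.07 ** n == 1.52 for n.
years_equiv = math.log(1.52) / math.log(1.07)

print(f"7% per year for 10 years -> {decade_ipc:.2f}x IPC")
print(f"a 52% jump ~ {years_equiv:.1f} years at 7% per year")
```

At 7% per year, IPC roughly doubles over a decade (about 1.97x), and the 52 percent jump works out to a little over six years of average progress.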
Sophisticated Power Management
Reducing variation is a major challenge in process design. Su pointed out that, on the design side, AMD has become far more sophisticated in how it manages the power budget of a modern microprocessor, citing its latest Zen microarchitecture as an example.
A single 8-core Zen die carries thousands of sensors that accurately measure absolute local temperature, local voltage, and the local load line. All this data is fed in real time to the system power management unit, which in turn controls the voltages on the chip down to 6 millivolts in order to globally optimize voltage and power conditions.
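To make the closed-loop idea concrete, here is a deliberately simplified toy model of sensor-driven voltage selection. It is not AMD's algorithm: the `SensorReading` fields, the linear voltage model in `required_mv`, the 10 mV guardband, and the 1200 mV ceiling are all hypothetical, and treating the 6 millivolt figure as the controller's step size is our interpretation of the text. The point it illustrates is that setting the voltage from live measurements, rather than from a worst-case assumption, lets the chip run at a lower voltage most of the time.

```python
import math
from dataclasses import dataclass

STEP_MV = 6  # assumed control resolution, taken from the 6 mV figure in the text


@dataclass
class SensorReading:
    """Hypothetical per-region telemetry; field names are illustrative."""
    temperature_c: float
    load_fraction: float  # 0.0 (idle) .. 1.0 (full load)


def required_mv(r: SensorReading) -> float:
    """Hypothetical linear model of the minimum safe voltage for one region."""
    base_mv = 850.0
    return base_mv + 0.8 * r.temperature_c + 120.0 * r.load_fraction


def choose_voltage_mv(readings: list[SensorReading],
                      guardband_mv: float = 10.0,
                      max_mv: int = 1200) -> int:
    """Lowest rail voltage that satisfies every region, on a 6 mV grid."""
    need = max(required_mv(r) for r in readings) + guardband_mv
    stepped = math.ceil(need / STEP_MV) * STEP_MV  # round up to the next step
    return min(stepped, max_mv)  # max_mv is itself a multiple of STEP_MV


# Made-up telemetry from three regions vs. a static worst-case assumption.
typical = [SensorReading(55, 0.3), SensorReading(60, 0.5), SensorReading(48, 0.1)]
worst_case = [SensorReading(100, 1.0)]
print("adaptive:", choose_voltage_mv(typical), "mV")
print("static worst case:", choose_voltage_mv(worst_case), "mV")
```

With these made-up numbers the adaptive controller settles at 972 mV versus 1062 mV for the static worst case; the specific values mean nothing beyond this toy model.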
The benefits of the complex power management at the silicon level on AMD’s Zen die should not be underestimated. Su explained, “If you turn off all power management [on the latest EPYC chip], you will have power consumption at let’s say 1x. With all the power management present, we can reduce the power by almost half. This now becomes table stakes for a modern microprocessor design.”
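The “almost half” figure can be put in perspective with two back-of-the-envelope calculations. At equal performance, halving power doubles performance per Watt. And if, purely for illustration, all of the savings were attributed to dynamic power (proportional to C·V²·f) at a fixed frequency, halving power would mean lowering the supply voltage to about 71 percent of its original value. The article does not break the savings down by mechanism, and power management in general also relies on techniques such as clock and power gating, so the second number only conveys a sense of scale. The helper names below are our own.

```python
import math


def perf_per_watt_gain(power_ratio: float) -> float:
    """Perf/W improvement at constant performance when power scales by power_ratio."""
    return 1 / power_ratio


def voltage_ratio_for_dynamic_power(power_ratio: float) -> float:
    """Voltage scale needed for dynamic power (P ~ C * V**2 * f) to scale by
    power_ratio, holding capacitance C and frequency f fixed (a simplification)."""
    return math.sqrt(power_ratio)


print(f"perf/W gain at 0.5x power: {perf_per_watt_gain(0.5):.1f}x")
print(f"voltage ratio for 0.5x dynamic power: "
      f"{voltage_ratio_for_dynamic_power(0.5):.3f}")
```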
Improvements Come From Everywhere
The key takeaway is that while process technology continues to be a dominant driving force in the performance and efficiency gains of modern microprocessors, the full performance gain is a result of many additional contributors.