Arm Launches Next-Gen Efficiency Core; Cortex-A520
Arm is unveiling the newest addition to its line of efficient little cores: the Cortex-A520. It succeeds the Cortex-A510, which debuted as a brand-new architecture two years ago and received minor enhancements last year.
Known by its codename Hayes during development, the Cortex-A520 is expected to deliver notable power efficiency gains through a series of microarchitectural improvements and optimizations. The specifics of these enhancements are outlined below.
This article is part of a series of articles from Arm’s Client Tech Day:
- Arm Launches Next-Gen Efficiency Core; Cortex-A520
- Arm Introduces A New Big Core, The Cortex-A720
- Arm Introduces The Cortex-X4, Its Newest Flagship Performance Core
As with prior little cores, the Cortex-A520 is intended to be coupled with the new Cortex-A720 using the new DSU-120 in various configurations. The primary design goal of the A520 is the best possible energy efficiency at the lowest area for cost-constrained devices.
The Cortex-A520 builds on the A510's merged-core design. Whereas the A510 brought full Armv9 support, the new A520's underlying ISA support level has been updated to version 9.2. Along with that, the new core supports the enhancements to the PAC (pointer authentication) feature found in v8.8/v9.3. Arm says that the new QARMA3 algorithm implemented in the PE for address authentication reduces the overhead of PAC to less than 1% while also modestly reducing latency.
Arm made various changes to the Cortex-A520, all of which focus on boosting energy and area efficiency. The performance improvements came mostly from better branch prediction and data prefetching, because these deliver more performance for a given power cost than changes to the core pipeline.
Some performance features found in the Cortex-A510 were actually removed or scaled down in this iteration so that the power could be put to better use elsewhere. Perhaps the biggest example of this is the removal of an ALU. The Cortex-A510 had three ALUs; in the new Cortex-A520, Arm removed one of them entirely. Arm says this saves power throughout the pipeline, from the issue logic to result forwarding. The A520 also has a restructured memory system for additional efficiency gains.
A graph of power versus performance at iso-process is shown below for the A520 and the A510. It can be seen that the changes targeted the peak performance range of the core. All in all, Arm says that the new A520 allows for 22% lower power at comparable performance levels or, alternatively, around 8% higher performance at similar power levels.
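To put Arm's two figures on a common footing, they can be converted into performance-per-watt ratios. The sketch below is only a back-of-the-envelope illustration. It assumes that "comparable performance" and "similar power" mean exactly equal, and the helper name `perf_per_watt_gain` is made up for this example rather than taken from any Arm material.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance-per-watt of a new core versus its baseline.

    Both arguments are ratios of new core to old core (1.0 means unchanged).
    """
    return perf_ratio / power_ratio


# Arm's two quoted operating points for the A520 relative to the A510.
# Assumption: "comparable" and "similar" are treated as exactly equal.
iso_perf_gain = perf_per_watt_gain(perf_ratio=1.00, power_ratio=1.00 - 0.22)
iso_power_gain = perf_per_watt_gain(perf_ratio=1.08, power_ratio=1.00)

print(f"iso-performance point: {iso_perf_gain:.2f}x perf/W")  # 1.28x
print(f"iso-power point:       {iso_power_gain:.2f}x perf/W")  # 1.08x
```

Under those assumptions, the two claims work out to different efficiency gains (about 1.28x versus 1.08x) because they describe different operating points on the power-performance curve.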