Arm Updates Its Neoverse Roadmap: New BFloat16, SVE Support
Two years ago Arm introduced the Arm Neoverse, its official move to target infrastructure. At the time of the announcement, Arm didn’t offer an attractive product for the data center. The Cosmos platform, which was simply the Cortex-A72 tuned for infrastructure, offered lower power but not much else. For the data center that’s a showstopper, as performance is key. Arm did announce a 5-year roadmap in which it promised to deliver performance gains of 30% or more generation-over-generation with its follow-up platforms: Ares, Zeus, and Poseidon.
Last year Arm introduced the Neoverse N1, codenamed Ares, its first design to diverge from Cortex for infrastructure. The N1 brought around a 60% uplift in single-thread performance over the Cosmos platform. Along with the N1, Arm also introduced the E1 platform, a lighter core designed for throughput through multi-threading. The Neoverse N series is essentially Arm’s mainstream infrastructure series, with a fairly balanced PPA (power, performance, area) for the TDP ranges you’d expect from products such as data center server processors. The Neoverse E series is designed around throughput; here power and area take the front seat in driving design considerations.
Today, Arm is announcing a new Neoverse series called the V series, where performance more than anything drives decision-making. We have seen Arm make a similar move with its Cortex-X series earlier this year. Like the Cortex-X, the V series pushes single-thread performance further, relaxing the traditional PPA constraints that confined the main Neoverse series, even at the cost of some area and power. Given the similarities between the Cortex-A76 and the Neoverse N1, we can probably expect a great deal of overlap between the Cortex-X1 and the new Neoverse V1. The push is clearly to allow Arm customers to integrate some much beefier cores and better compete with the best upcoming x86 cores on single-thread performance.
Today Arm is also announcing the first platform in the V series, the Neoverse V1, codenamed Zeus. Based on its initial performance measurements, Arm is claiming a single-thread performance improvement of over 50% versus the Neoverse N1. The Neoverse N1 was largely based on the Cortex-A76, so if we assume that the Neoverse V1 is going to be based on the Cortex-X1, an increase of 50% in IPC alone sounds very reasonable.
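As a rough illustration of how these claims stack up, the sketch below compounds Arm’s quoted single-thread uplifts (around 60% for the N1 over Cosmos, over 50% for the V1 over the N1) and compares the result with the 30% per-generation target from the original roadmap. The function name is our own, and treating rounded marketing figures as exact multipliers is an assumption.

```python
from math import prod

def cumulative_uplift(gains):
    """Compound a sequence of fractional generation-over-generation gains."""
    return prod(1.0 + g for g in gains)

# Arm's quoted single-thread claims, treated here as exact multipliers
# (an assumption: the real figures are "around 60%" and "over 50%").
claimed = cumulative_uplift([0.60, 0.50])    # Cosmos -> N1 -> V1

# The original roadmap promise: 30% per generation, over two generations.
promised = cumulative_uplift([0.30, 0.30])

print(f"claimed:  {claimed:.2f}x over Cosmos")
print(f"promised: {promised:.2f}x over Cosmos")
```

Under those assumptions the claimed figures compound to roughly 2.4x over Cosmos, against roughly 1.69x for two generations at the promised minimum.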
Although this is purely a roadmap announcement and not an architectural disclosure, Arm did opt to share that the V1 will be the first Arm core to support the Scalable Vector Extension (SVE). With the exception of the Fujitsu A64FX, which is primarily designed for the HPC market, we have not seen any other commitment to this extension, which was first announced half a decade ago. A unique aspect of SVE is that it is vector-length agnostic (albeit in multiples of 128 bits), allowing an implementation to pick the vector length best suited to its market. For the Neoverse V1, Arm chose to implement two 256-bit vector units. 256 bits is an interesting choice for a couple of reasons. Firstly, it doubles the SIMD throughput of all prior Arm designs: both the Cortex-A76 and its Neoverse N1 cousin have two Neon pipes, each 128 bits wide. Secondly, the V1 SIMD units are half as wide as those of the A64FX (as well as those of all recent Intel big cores implementing the AVX-512 extension), for the first time allowing comparison and stress-testing of SVE vector partitioning in real hardware. The V1 will also introduce bfloat16 support.
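To make the vector-length-agnostic idea concrete, here is a minimal Python simulation of how such a loop behaves. It is not SVE code and does not use any Arm API; `vla_add` and its parameters are hypothetical names chosen for this sketch. The point is that the same loop produces identical results whether the hardware vector is 128, 256, or 512 bits wide, and only the number of iterations changes.

```python
def vla_add(a, b, vector_bits, element_bits=32):
    """Add two equal-length lists the way a vector-length-agnostic loop would.

    vector_bits stands in for the hardware vector length, which real SVE
    code discovers at run time instead of hard-coding.
    """
    if vector_bits <= 0 or vector_bits % 128 != 0:
        raise ValueError("vector length must be a positive multiple of 128 bits")
    lanes = vector_bits // element_bits
    n = len(a)
    out = []
    iterations = 0
    i = 0
    while i < n:
        # Predication: only lanes that still fall inside the array are
        # active, so no separate scalar tail loop is needed.
        active = min(lanes, n - i)
        out.extend(a[i + k] + b[i + k] for k in range(active))
        i += lanes
        iterations += 1
    return out, iterations

a = list(range(20))
b = [10 * x for x in a]
for bits in (128, 256, 512):   # N2-, V1-, and A64FX-sized vectors
    result, iters = vla_add(a, b, bits)
    print(bits, iters, result == [x + y for x, y in zip(a, b)])
```

For 20 single-precision elements, the loop takes 5 iterations at 128 bits, 3 at 256 bits, and 2 at 512 bits, with the same output each time.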
Along with the V1 announcement, Arm is also announcing the direct successor to the Neoverse N1, the N2 platform, codenamed Perseus (a new codename). Based on its initial performance testing, Arm is expecting a single-thread performance improvement of over 40% over the Neoverse N1 at the same power and area efficiency. Since the original Neoverse N1 was based on the Cortex-A76, we can expect the N2 to be based on either the recently introduced A78 or possibly the next-gen Cortex (Matterhorn). Given this is a high-level roadmap announcement and not an architectural disclosure, only a few details are being disclosed today. For the N2, Arm is expecting a significant improvement in its scale-out capabilities, allowing for more cores and better performance at a fixed power budget compared with the N1. As far as features go, the Neoverse N2 will also introduce SVE support but will retain the same SIMD unit size as prior generations (128b), meaning we will now have three different Arm designs with three different SVE SIMD sizes. BFloat16 support will also be introduced with the N2.
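For readers unfamiliar with the format, bfloat16 keeps the sign bit and all 8 exponent bits of an IEEE 754 single-precision float but only the top 7 mantissa bits, so it is effectively the upper half of a float32. The sketch below shows that relationship using plain truncation; the helper names are our own, and real hardware typically rounds rather than truncates, so treat this as a simplification.

```python
import struct

def float32_to_bfloat16_bits(x):
    """Return the 16-bit bfloat16 pattern for x by truncating its float32 form.

    Simplification: this drops the low 16 bits instead of rounding.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bfloat16_bits_to_float32(b):
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x

b = float32_to_bfloat16_bits(3.14159)
print(hex(b), bfloat16_bits_to_float32(b))
```

The round trip keeps the full float32 exponent range but only about two to three decimal digits of precision, which is the trade-off that makes the format attractive for machine learning workloads.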
|Arm SIMD Implementations|
|Implementation|Neoverse N1|Neoverse N2|Neoverse V1|A64FX|
|SIMD units (count × width)|2 × 128b|2 × 128b|2 × 256b|2 × 512b|
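The figures in the table translate directly into peak vector width per cycle. The short sketch below works that out in single-precision lanes, assuming (our assumption, not something Arm has disclosed) that every unit can issue one full-width operation each cycle. It ignores clock speed, fused multiply-add, and memory bandwidth, so it is a comparison of datapath width only.

```python
# (number of SIMD units, width of each unit in bits), taken from the table.
SIMD_CONFIG = {
    "Neoverse N1": (2, 128),
    "Neoverse N2": (2, 128),
    "Neoverse V1": (2, 256),
    "A64FX": (2, 512),
}

def fp32_lanes_per_cycle(units, width_bits):
    """Peak 32-bit lanes processed per cycle if every unit issues each cycle."""
    return units * width_bits // 32

lanes = {name: fp32_lanes_per_cycle(*cfg) for name, cfg in SIMD_CONFIG.items()}
for name, count in lanes.items():
    ratio = count / lanes["Neoverse N1"]
    print(f"{name}: {count} fp32 lanes/cycle ({ratio:.0f}x N1)")
```

On this simple measure the V1 has twice the per-core vector width of the N1 and N2, and half that of the A64FX.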
As usual, it’s hard to tell how much we can infer from a single marketing slide, but there are a few interesting points we want to highlight. For one, Arm’s slide lists TDP ranges from 80 W to 350 W for the data center. These numbers far exceed anything Arm has previously talked about (for the Neoverse N1, for example, the TDP range Arm discussed even for the data center was around 150-225 W). Additionally, although things such as core counts largely depend on the TDP and other design considerations, the N2 slide clearly lists as many as 192 cores, a 50% increase over the core count Arm had originally listed for the N1.
Looking a little further out, Arm has the Poseidon platform planned for the 2022 timeframe. This differs from the roadmap Arm originally presented, which listed Poseidon for 2021 and made no mention of the Perseus platform, so clearly the roadmap has been quite turbulent. Arm says it is committed to a 30% performance improvement for the Poseidon platform. However, careful scrutiny of the roadmap outlined today shows that this refers to infrastructure workload performance and ML/vector uplift rather than indicating the kind of single-thread performance we should expect.
Although light on technical details, Arm’s new roadmap update remains important. When the company announced its intention to go all-in on the data center, it made big promises regarding performance and capabilities. Two years later, Arm is very much on track to deliver on its overall performance goals. The extension of the Neoverse with the V series allows Arm to better segment its IP and address the needs of the very high end of the CPU market, such as those of some data center operators and the HPC market.