Samsung Discloses Exynos M4 Changes, Upgrades Support for ARMv8.2, Rearranges The Back-End
Last November, Samsung announced the Exynos 9820, its latest flagship processor, which should find its way into Samsung’s next-generation flagship Galaxy smartphone. The Exynos 9820 features two Exynos M4 cores, two Cortex-A75 cores, and a quad-core cluster of power-efficient Cortex-A55 cores.
So far, processors designed at the Samsung Austin R&D Center (SARC) have followed a pattern: one generation of large improvements, followed by one generation of smaller improvements that borrow from the next generation. Samsung’s first generation of custom cores was based on the Exynos Mongoose 1 (M1), the company’s first high-performance big mobile core. The M1 was followed by the Exynos M2, which was not only a die shrink to the company’s 10-nanometer process but also brought a few features from the M3.
Samsung Mongoose Microarchitectures
uArch | Process | Improvements |
---|---|---|
Exynos M1 | 14 nm (14LPP) | Large, Initial Design |
Exynos M2 | 10 nm (10LPE) | Minor, Few larger buffers |
Exynos M3 | 10 nm (10LPP) | Large, Wider Pipeline |
Exynos M4 | 8 nm (8LPP) | Minor, Reorganized BE |
Exynos M5 | 7 nm (7LPP)? | Large? |
In August, Samsung formally disclosed the details of the Exynos M3 core at Hot Chips 30. Following Samsung’s development model, the M3 introduced a large set of changes, including a significantly wider pipeline. If the pattern persists, the Exynos M4 should bring more iterative enhancements that largely borrow from the Exynos M5 design.
Thanks to a series of patches by Samsung, we now know at least some of the changes in the M4. The new core can be targeted using the -mcpu=exynos-m4 compiler switch.
ARMv8.2
The M1, M2, and M3 are all based on the ARMv8 ISA. The Exynos M4 finally introduces ARMv8.2 support – including full FP16 scalar instructions and the integer dot product extension. This is an important change as both the Cortex-A75 and the Cortex-A55 support ARMv8.2 (as well as dotprod) which means all the cores on the Exynos 9820 should be largely ISA-compatible.
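To illustrate what the dot product extension offers to software, here is a minimal sketch in C of an 8-bit integer dot product. It is not taken from Samsung's patches. The function name `dot_i8` is made up for this example, and the sketch assumes a compiler that defines the ACLE feature macro `__ARM_FEATURE_DOTPROD` when the target (for instance one selected with -mcpu=exynos-m4) supports the extension. On any other target it falls back to a plain scalar loop, so the same source compiles everywhere.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
#define HAVE_DOTPROD 1
#else
#define HAVE_DOTPROD 0
#endif

/* Hypothetical helper: dot product of two int8 arrays, accumulated in 32 bits. */
int32_t dot_i8(const int8_t *a, const int8_t *b, size_t n)
{
    int32_t sum = 0;
    size_t i = 0;

#if HAVE_DOTPROD
    /* Each vdotq_s32 multiplies 16 pairs of int8 values and adds
     * every group of four products into one of four 32-bit lanes. */
    int32x4_t acc = vdupq_n_s32(0);
    for (; i + 16 <= n; i += 16) {
        acc = vdotq_s32(acc, vld1q_s8(a + i), vld1q_s8(b + i));
    }
    sum = vaddvq_s32(acc); /* horizontal add of the four lanes */
#endif

    /* Scalar tail, and the whole loop on targets without dotprod. */
    for (; i < n; i++) {
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    return sum;
}
```

Because the Cortex-A75 and Cortex-A55 also implement dotprod, a routine like this could in principle take the accelerated path on any core of the Exynos 9820 without per-cluster dispatch.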
Front End
Based on the available information, most of the front-end remains unchanged. Decode is still 6-wide, and the fetch unit likely remains the same. Samsung did increase the capacity of the instruction queue from 40 entries to 48 entries.
Back End
The majority of the changes are concentrated in the execution units. The rest of the back-end remains unchanged (at least to the extent that the patches show), including the reorder buffer (ROB), which remains 228 entries deep.
On the integer cluster side, there is one branch unit, two simple ALUs, and two complex ALU pipes. This is identical to the M3. The memory subsystem is where we see some relatively larger changes. Whereas the M3 had a single dedicated Store AGU and two dedicated Load AGUs, the M4 keeps a single Store AGU and a single Load AGU, and the last pipe becomes a generic AGU capable of handling both stores and loads. In other words, both load and store µOPs have two pipes they can be scheduled to.
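The practical effect of the generic AGU can be sketched with a toy issue model in C. This is an illustration only, not a description of Samsung's scheduler: the `agu_config` struct, the `max_issue` function, and the two constants are hypothetical names, and the model ignores dependencies, queue capacity, and cache ports. It simply counts how many ready load and store µOPs could be placed on address-generation pipes in a single cycle.

```c
/* Hypothetical description of a core's address-generation pipes. */
struct agu_config {
    int load_only;   /* pipes that accept only loads  */
    int store_only;  /* pipes that accept only stores */
    int generic;     /* pipes that accept either kind */
};

/* Pipe counts as described in the article. */
static const struct agu_config M3_AGUS = { 2, 1, 0 };
static const struct agu_config M4_AGUS = { 1, 1, 1 };

static int min_int(int a, int b) { return a < b ? a : b; }

/* Upper bound on memory uOPs issued in one cycle, given how many
 * loads and stores are ready. Dedicated pipes are filled first,
 * then whatever is left over competes for the generic pipes. */
int max_issue(struct agu_config c, int loads, int stores)
{
    int l = min_int(loads, c.load_only);
    int s = min_int(stores, c.store_only);
    int leftover = (loads - l) + (stores - s);
    return l + s + min_int(leftover, c.generic);
}
```

Under this model, `max_issue(M3_AGUS, 1, 2)` is 2 while `max_issue(M4_AGUS, 1, 2)` is 3: a store-heavy cycle no longer stalls behind a single store pipe. For a load-heavy mix such as two loads and one store, both configurations reach 3, so the total AGU count is unchanged and only the flexibility differs.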
On the floating-point cluster side, Samsung rebalanced the execution pipes. The two notable changes include the addition of a second FP square root unit and a second vector multiplication unit.
Shown above, Nxxx refers to NEON (advanced SIMD) units, where HAD = horizontal vector arithmetic, MSC = miscellaneous, SHT = shift, SHF = shuffle, and CRY = cryptography.
Derived WikiChip Articles: Samsung Exynos M4.