Samsung Discloses Exynos M4 Changes, Upgrades Support for ARMv8.2, Rearranges The Back-End

Last November, Samsung announced the Exynos 9820, its latest flagship processor, which should find its way into Samsung’s next-generation flagship Galaxy smartphone. The Exynos 9820 features two Exynos M4 cores, two Cortex-A75 cores, and a quad-core cluster of power-efficient Cortex-A55 cores.

So far, the processors designed at the Samsung Austin R&D Center (SARC) have followed a pattern: one generation of large improvements, followed by one generation of smaller improvements that borrow from the next generation. Samsung’s first generation of custom cores was based on the Exynos Mongoose 1 (M1), the company’s first high-performance big mobile core. The M1 was followed by the Exynos M2, which was not only a die shrink to the company’s 10-nanometer process but also brought a few features from the M3.


Samsung Mongoose Microarchitectures

uArch | Process | Improvements
Exynos M1 | 14 nm (14LPP) | Large, Initial Design
Exynos M2 | 10 nm (10LPE) | Minor, Few larger buffers
Exynos M3 | 10 nm (10LPP) | Large, Wider Pipeline
Exynos M4 | 8 nm (8LPP) | Minor, Reorganized BE
Exynos M5 | 7 nm (7LPP)? | Large?

In August, Samsung formally disclosed the details of the Exynos M3 core at Hot Chips 30. Following Samsung’s development model, the M3 introduced a large set of changes, including a significantly wider pipeline. If the pattern persists, the Exynos M4 should bring more iterative enhancements that largely borrow from the Exynos M5 design.

Exynos M3 Block Diagram (WikiChip)

Thanks to a series of patches by Samsung, we now know at least some of the changes in the M4. The new core can be targeted using the -mcpu=exynos-m4 compiler switch.


The M1, M2, and M3 are all based on the ARMv8 ISA. The Exynos M4 finally introduces ARMv8.2 support – including full FP16 scalar instructions and the integer dot product extension. This is an important change as both the Cortex-A75 and the Cortex-A55 support ARMv8.2 (as well as dotprod) which means all the cores on the Exynos 9820 should be largely ISA-compatible.
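To make the dot product extension more concrete, here is a minimal Python sketch of the arithmetic behind the unsigned 8-bit dot product instruction (UDOT) on 128-bit vectors: each of the four 32-bit accumulator lanes gains the sum of four byte-wise products. The function name udot_model and the list-based vector representation are illustrative assumptions, not part of any Samsung or Arm tooling; real code would use the compiler's NEON intrinsics or rely on auto-vectorization.

```python
def udot_model(acc, a, b):
    """Model the arithmetic of an unsigned 8-bit dot product into 32-bit lanes.

    acc  -- four 32-bit accumulator lanes
    a, b -- sixteen unsigned 8-bit elements each (one 128-bit vector)
    Returns the four updated accumulator lanes.
    """
    if len(acc) != 4 or len(a) != 16 or len(b) != 16:
        raise ValueError("expected 4 accumulator lanes and 16 bytes per input")
    if any(not 0 <= x <= 255 for x in list(a) + list(b)):
        raise ValueError("inputs must be unsigned 8-bit values")

    result = []
    for lane in range(4):
        total = acc[lane]
        # Each 32-bit lane consumes four consecutive byte pairs.
        for j in range(4):
            total += a[4 * lane + j] * b[4 * lane + j]
        result.append(total & 0xFFFFFFFF)  # wrap around like a 32-bit register
    return result


if __name__ == "__main__":
    print(udot_model([0, 0, 0, 0], list(range(16)), [1] * 16))
```

One instruction of this kind replaces a sequence of widening multiplies and adds, which is why it matters for 8-bit quantized workloads and why having it on every core in the SoC simplifies scheduling such code.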

Front End

Based on the available information, most of the front end is unchanged. Decode is still 6-wide, and the fetch unit likely carries over as well. Samsung did increase the capacity of the instruction queue from 40 entries to 48 entries.


Back End

The majority of the changes are concentrated in the execution units. The rest of the back end remains unchanged (at least to the extent that the patches show), including the reorder buffer (ROB), which remains 228 entries deep.

On the integer cluster side, there is one branch unit, two simple ALUs, and two complex ALU pipes. This is identical to the M3. The memory subsystem is where we see some relatively larger changes. Whereas the M3 had a single dedicated Store AGU and two dedicated Load AGUs, the M4 keeps a single Store AGU and a single Load AGU, and the last pipe becomes a generic AGU capable of handling both stores and loads. In other words, both load and store µOPs have two pipes they can be scheduled to.
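The effect of the generic AGU can be sketched as a simple port-count bound. The Python snippet below estimates the minimum number of cycles needed to issue a batch of independent loads and stores under each arrangement. It is an idealized model under stated assumptions: it ignores dependencies, scheduler and queue limits, cache ports, and store-data pipes, and the function names are hypothetical.

```python
def ceil_div(n, d):
    """Integer ceiling division for non-negative n and positive d."""
    return -(-n // d)


def min_issue_cycles_m3(loads, stores):
    """M3-style layout: two load-only AGUs and one store-only AGU."""
    return max(ceil_div(loads, 2), stores)


def min_issue_cycles_m4(loads, stores):
    """M4-style layout: one load AGU, one store AGU, one generic AGU.

    Loads can use two pipes, stores can use two pipes, and all three
    pipes together handle at most three memory operations per cycle.
    """
    return max(
        ceil_div(loads, 2),
        ceil_div(stores, 2),
        ceil_div(loads + stores, 3),
    )


if __name__ == "__main__":
    for loads, stores in [(6, 0), (4, 4), (0, 6)]:
        print(loads, stores,
              min_issue_cycles_m3(loads, stores),
              min_issue_cycles_m4(loads, stores))
```

In this model, peak load throughput is the same for both layouts, while store-heavy sequences benefit from the second store-capable pipe.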

On the floating-point cluster side, Samsung rebalanced the execution pipes. The two notable changes include the addition of a second FP square root unit and a second vector multiplication unit.

Shown above, Nxxx refers to NEON (Advanced SIMD) units, where HAD = horizontal vector arithmetic, MSC = miscellaneous, SHT = shift, SHF = shuffle, and CRY = cryptography.

Derived WikiChip Articles: Samsung Exynos M4.

