AMD Discloses Initial Zen 2 Details

Earlier this month AMD made their first disclosure of Zen 2, their next-generation core microarchitecture for desktop and server chips. Along with Zen 2, AMD also unveiled initial details of their next-generation server chips, codenamed Rome.

Zen 2


Zen 2 succeeds Zen/Zen+. The design targets TSMC's 7 nm process node. AMD evaluated both 10 nm and 7 nm; the choice of 7 nm came down to the much lower power and higher density they were able to get. AMD claims 7 nm delivers 2x the density and offers 0.5x the power at the same performance, or >1.25x the performance at the same power (note that at Computex AMD’s slide said “1.35x”). Zen 2-based chips are currently sampling and are on track to reach the market in 2019.
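AMD's quoted scaling factors can be sanity-checked with some simple arithmetic. The factors below are from AMD's slides; the 14 nm baseline block is a made-up example for illustration only.

```python
# AMD's quoted 7 nm vs 14 nm scaling factors (from the slides above).
DENSITY_X = 2.0          # 2x transistor density
ISO_PERF_POWER_X = 0.5   # 0.5x power at the same performance
ISO_POWER_PERF_X = 1.25  # >1.25x performance at the same power

# Hypothetical 14 nm baseline block (illustrative numbers, not real die data).
area_14nm_mm2 = 44.0
power_14nm_w = 10.0

# What the same block would look like ported to 7 nm under those factors.
area_7nm_mm2 = area_14nm_mm2 / DENSITY_X              # 22.0 mm^2
power_7nm_iso_perf_w = power_14nm_w * ISO_PERF_POWER_X  # 5.0 W at equal performance
```

In other words, a block of logic could shrink to half its area while also halving its power at unchanged performance, which is precisely why AMD reserves 7 nm for the compute dies (see the Rome section below).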

AMD has made a large set of enhancements in Zen 2. To feed the widened, higher-throughput execution units, the front end had to be adjusted accordingly. To that end, the branch prediction unit has been reworked. This includes improvements to the prefetcher and various undisclosed optimizations to the instruction cache. The µOP cache was also tweaked, including changes to the µOP cache tags and to the cache itself, which has been enlarged to improve instruction stream throughput. On Zen the µOP cache held 2,048 entries; the exact Zen 2 figures were not disclosed at this time.

The majority of the changes to the back end involve the floating-point units. The biggest change is the doubling of the floating-point data path width, covering the load/store paths as well as the FPUs themselves. In Zen, AVX2 is fully supported by cracking each 256-bit instruction into two 128-bit micro-ops, and the load and store data paths are 128 bits wide; each cycle, the FPU can receive two loads from the load/store unit, each up to 128 bits. In Zen 2, the data path is now 256 bits wide. The execution units are now 256 bits wide as well, meaning 256-bit AVX operations no longer need to be cracked into two 128-bit micro-ops. With two 256-bit FMAs, Zen 2 is capable of 16 double-precision FLOPs per cycle, matching Intel’s Skylake client core.
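The peak FLOPs/cycle figure falls out of the unit widths directly. A minimal sketch, assuming two FMA-capable pipes per core as described above (an FMA counts as two FLOPs, multiply plus add, per vector lane):

```python
def peak_flops_per_cycle(fma_units: int, vector_bits: int, element_bits: int) -> int:
    """Peak FLOPs/cycle: units x lanes x 2 (FMA = multiply + add per lane)."""
    lanes = vector_bits // element_bits
    return fma_units * lanes * 2

# Zen: two 128-bit FMA pipes; Zen 2 (and Skylake client): two 256-bit FMA pipes.
zen_dp = peak_flops_per_cycle(fma_units=2, vector_bits=128, element_bits=64)   # 8 DP
zen2_dp = peak_flops_per_cycle(fma_units=2, vector_bits=256, element_bits=64)  # 16 DP
zen2_sp = peak_flops_per_cycle(fma_units=2, vector_bits=256, element_bits=32)  # 32 SP
```

Doubling the vector width thus doubles double-precision throughput from 8 to 16 FLOPs/cycle, and single-precision from 16 to 32.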

Initial changes disclosed by AMD. (WikiChip)

AMD stated that Zen 2 IPC has been improved along with an increase in both the dispatch and retire bandwidth; however, the fine details were not disclosed. On the security side, Zen 2 introduces in-silicon Spectre mitigations that were originally offered through firmware and software on Zen.

Rome

Some people called it [chiplets] gluing things together; we called it the next generation of system design. – Dr. Lisa Su, AMD President and CEO

AMD’s second-generation EPYC is codenamed Rome, the successor to Naples. The two are socket- and platform-compatible; note that Milan, Rome’s successor, will be socket-compatible as well. Rome still uses a multi-chip approach to scale up the core count, but the system design itself has changed quite radically from the prior generation. In Naples, AMD scaled the 8-core die, called a Zeppelin, up to 32 cores by stitching together four of those SoCs through their proprietary interconnect, the Infinity Fabric. This method provided eight memory channels and 128 PCIe lanes distributed across all the dies.

An AMD Zen-based Epyc chip uses four dies. (WikiChip)

With Rome, AMD is taking the idea of chiplets further. Similar to what they initially started with Threadripper 2, Rome has compute dies and an I/O die. This time, however, AMD took the core execution blocks out entirely and moved them to new compute dies, leveraging TSMC’s 7 nm process to take advantage of its lower power and higher density. The compute dies connect to a centralized I/O die that manages the I/O and the memory. The much bigger I/O die is manufactured on GlobalFoundries’ mature 14 nm process, where most of the power and density benefits of 7 nm could not be realized anyway.

In total, there are nine dies: one I/O die and eight compute dies, each with eight Zen 2 cores. Neither the details of the individual compute dies nor the I/O die were disclosed. There is a fair number of challenges involved in this kind of design, and it will be interesting to see how they were addressed. The I/O die creates deterministic and uniform latencies across the entire chip, but it could potentially hurt best-case, latency-sensitive scenarios. The package is organized in four pairs of compute dies, similar to our diagram below. It’s worth noting that each pair of compute dies is packaged tightly together on the organic substrate, indicating there might be very short traces going between them.

Rome chiplet design based on initial details (WikiChip)

With eight octa-core compute dies, Rome can offer up to 64 cores and 128 threads, effectively doubling (or, with AVX2, quadrupling) the throughput of first-generation EPYC. Although Rome stays with 128 PCIe lanes, it adds support for PCIe Gen 4, doubling the transfer rate from 8 GT/s to 16 GT/s. There are eight DDR4 memory channels supporting up to four terabytes of DRAM per socket. One interesting detail AMD disclosed with their GPU announcement is that the Infinity Fabric now supports 100 GB/s (bidirectional) per link. If we assume Infinity Fabric 2 still uses 16 differential pairs as with first-generation IF, that would mean IF 2 now operates at 25 GT/s, identical to the NVLink 2.0 data rate. However, since AMD’s IF is twice as wide, it provides twice the bandwidth per link over Nvidia’s NVLink.
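The bandwidth comparison above can be reproduced with back-of-the-envelope math. This sketch assumes, as the article does, 16 differential pairs per direction per IF link (versus 8 for NVLink 2.0) and one bit per pair per transfer, ignoring encoding overhead:

```python
def link_bw_gbps(transfer_rate_gtps: float, pairs_per_direction: int) -> float:
    """Per-direction link bandwidth in GB/s: GT/s x pairs / 8 bits-per-byte."""
    return transfer_rate_gtps * pairs_per_direction / 8

# Infinity Fabric 2 (assumed): 25 GT/s over 16 pairs per direction.
if2_per_dir = link_bw_gbps(25, 16)   # 50 GB/s each way
if2_bidir = 2 * if2_per_dir          # 100 GB/s, matching AMD's quoted figure

# NVLink 2.0: same 25 GT/s data rate, but 8 pairs per direction.
nvlink2_per_dir = link_bw_gbps(25, 8)  # 25 GB/s each way
```

Working backwards from AMD's 100 GB/s bidirectional figure is exactly how the 25 GT/s estimate is obtained, and the 2x pair count is why IF would carry double NVLink's per-link bandwidth at the same data rate.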

One of the features EPYC introduced is SME (Secure Memory Encryption) and, on top of that, SEV (Secure Encrypted Virtualization), which extends SME to AMD-V, allowing individual VMs to use SME with their own secure keys. With Rome, AMD says the number of supported keys (and thus VMs) has also been increased.

Rome server on display at SuperComputing 18 (WikiChip)

I/O Die

There is a lot of mystery surrounding the capabilities of the I/O die and AMD’s plans for the future. By moving all the “redundant components”, such as the I/O and the southbridge, from the compute dies to the I/O die, AMD has opened up their design to some intriguing possibilities. Since all the controls reside in the centralized I/O die, it becomes possible to swap out the compute dies for other types of logic such as an FPGA (e.g., from Xilinx) or a GPU. In Naples, this would have meant sacrificing some of the I/O or memory, but with Rome this is no longer the case. AMD has not announced any such plans, but the option is there.

Potential designs (WikiChip)

Roadmap


The key takeaway from AMD’s event is their roadmap. A predictable roadmap helps improve customers’ confidence in the platform, and AMD wanted to show that they are capable of laying out a roadmap and executing on it. To that end, AMD expects Zen 2 to launch in 2019. Zen 3 is on track, and Zen 4 is at the design-completion phase.

Cray Shasta blade with AMD EPYC on display at SuperComputing 18. (WikiChip)


Comments
IronMetal (Guest)

This is the future

Shintel (Guest)

There is a rumor about something codenamed Zen X floating around. Since AMD had a server APU named Opteron X in the past and actively researches CPU + GPU chiplets in combination with HBM, I think Zen X will be a server APU like pictured above for high density server (though I would guess the GPU part would consume the entire right side next to the IO die – the 32 cores on the left should be enough).

Sterling (Guest)

Sounds like a space saving cost effective gaming chip… Sony has been working on such a chip with them for over a year. It even delayed VEGA to get it ready.

Hitler (Guest)

If you don’t mind me asking, where has your evidence been presented? I’d say I’m rather informed on AMD news and I haven’t heard anything Sony- or MS-related.

Jay (Guest)

Have you heard of Navi?

Joshua O Smith (Guest)

You’re talking about Navi, and it’s not a server APU. It’s just a newer version of GCN when you talk about PC cards, and a new version of the console APUs when you talk about consoles. Not exactly any different from something that came out in 2013.

Trent Nordyke (Guest)

The I/O die should contain shadow copies of the cache tags. Cache coherency across 64 processors is a difficult problem; having 64 processors’ caches snoop all memory traffic is simplified if the I/O die can snoop the memory accesses.

looncraz (Guest)

I’ve argued this relentlessly – if there’s an L4 it will NOT be inclusive of the L3 on the chiplets, but will be mostly exclusive just like the current L3 policy AMD uses. Beyond that, it makes sense for AMD to keep track of the memory addresses (tags) each chiplet may have. With all I/O going through the chiplet, this should enable mostly passive bookkeeping. The main benefit for the L4 would be to act as a write coalescing buffer for the IMC as well as a read buffer from the IMC. This is the easiest to organize, manage, and…

Trent Nordyke (Guest)

I agree. AMD could use some memory in the I/O die as a prefetch buffer. I have always thought that DRAM reads should fetch a large cache line (512 bytes?) from an open page. The DRAM row activate costs power and time, and this should be minimized.

Junior (Guest)

Is this gonna be the first CPU that’s not susceptible to Spectre/Meltdown? Legit asking, I don’t really follow CPUs.

Cyrus Yan (Guest)

All of the current CPUs have protection against Spectre and Meltdown, but sacrifice performance to do it.

looncraz (Guest)

It has hardware mitigations for side channel attacks, yes. They will not be perfect – there’s no such thing.

Joshua O Smith (Guest)

They should be Meltdown-safe, but Spectre and PortSmash are a different thing. Any CPU that uses speculative execution is probably vulnerable, to be honest (as well as SMT/HT), but usually the odds of these things are so small as to not really be worth worrying about. For the home consumer, no worries at all really. They will patch what is needed as it’s needed. No big deal. To date, nobody has documented a case of anyone using any of these techniques to break into any computer system.

Nicholas (Guest)

Please use black font in your articles instead of gray on white (or alternatively bold the gray font).

Razor (Guest)

Cyrix forever 🙂

goost (Guest)

AMD is bringing back Media GX in 2019, you heard it here first guys!!