ISSCC 2018: AMD’s Zeppelin; Multi-chip routing and packaging

Scalable Data Fabric (SDF)

The I/O Hub interfaces with the SDF through the I/O Master/Slave (IOMS) interface. Likewise, the two CCXs interface with the fabric through the Cache-Coherent Master (CCM) interfaces. The IOMS and the CCMs are the only interfaces capable of making DRAM requests. The DRAM itself is attached to the DDR4 interface, which sits behind the Unified Memory Controller (UMC); the UMC, in turn, communicates with the SDF.

WikiChip’s diagram of the SDF transport layer.
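As a rough mental model of the endpoints described above (the names and structure here are purely illustrative, not AMD's internal naming), the fabric clients can be sketched as follows:

```python
# Illustrative sketch of Zeppelin's SDF endpoints (descriptive names,
# not AMD's identifiers). Only the CCMs and the IOMS may originate
# DRAM requests; the UMCs service them.
from dataclasses import dataclass

@dataclass
class FabricPort:
    name: str
    attaches: str            # block behind this SDF port
    can_request_dram: bool   # may this port originate DRAM requests?

SDF_PORTS = [
    FabricPort("CCM0", "CCX0 (4 cores)", can_request_dram=True),
    FabricPort("CCM1", "CCX1 (4 cores)", can_request_dram=True),
    FabricPort("IOMS", "I/O Hub",        can_request_dram=True),
    FabricPort("UMC0", "DDR4 channel A", can_request_dram=False),
    FabricPort("UMC1", "DDR4 channel B", can_request_dram=False),
]

requesters = [p.name for p in SDF_PORTS if p.can_request_dram]
print(requesters)  # ['CCM0', 'CCM1', 'IOMS']
```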

The Coherent AMD socKet Extender (CAKE) module translates the request and response formats used by the SDF transport layer to and from the serialized format used by the IF Inter-Socket (IFIS) SerDes and the IF On-Package (IFOP) SerDes.
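To make CAKE's role concrete, here is a minimal sketch of an encode/decode pair sitting between the transport layer and a SerDes link. The field layout, names, and sizes are hypothetical; AMD has not published the actual CAKE packet format.

```python
import struct

# Hypothetical on-fabric request: (requester id, target die, address, is_write).
# CAKE's real packet format is not public; this only shows the encode/decode
# role it plays between the SDF transport layer and the SerDes links.
REQ_FMT = ">BBQB"   # big-endian: u8, u8, u64, u8

def cake_encode(requester: int, target_die: int, addr: int, is_write: bool) -> bytes:
    """Serialize an SDF request for transmission over an IFOP/IFIS SerDes."""
    return struct.pack(REQ_FMT, requester, target_die, addr, int(is_write))

def cake_decode(payload: bytes):
    """Reconstruct the SDF request on the remote die's CAKE module."""
    requester, target_die, addr, is_write = struct.unpack(REQ_FMT, payload)
    return requester, target_die, addr, bool(is_write)

wire = cake_encode(requester=0, target_die=1, addr=0x1234_5678, is_write=False)
assert cake_decode(wire) == (0, 1, 0x1234_5678, False)
```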

Local Access

Under a local access, a core request goes through the CCX and CCM, across the fabric to the local UMC, and on to the local DRAM channel. The read data then follows the same path in reverse back to the core. A round-trip on a system with a CPU frequency of 2.4 GHz and DDR4-2666 19-19-19 memory (i.e., a MEMCLK of 1333 MHz) takes roughly 90 nanoseconds.

WikiChip’s diagram of the SDF transport layer with a local access path shown.
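For a sense of scale, the reported ~90 ns figure can be converted into cycles of the two clocks mentioned above. This is simple arithmetic on the numbers quoted; the hop list merely restates the path described in the text.

```python
# Convert the reported ~90 ns local round-trip into clock cycles for the
# clocks mentioned above (2.4 GHz core clock, 1333 MHz MEMCLK).
LOCAL_ROUND_TRIP_NS = 90          # figure reported for a local access
CORE_CLK_GHZ = 2.4
MEMCLK_MHZ = 1333

core_cycles = LOCAL_ROUND_TRIP_NS * CORE_CLK_GHZ          # ns * cycles/ns
mem_cycles = LOCAL_ROUND_TRIP_NS * MEMCLK_MHZ / 1000      # ns * cycles/ns

# Hop sequence for a local access, as described in the text.
LOCAL_PATH = ["core", "CCX", "CCM", "SDF", "UMC", "DRAM channel"]

print(" -> ".join(LOCAL_PATH))
print(f"~{core_cycles:.0f} core cycles, ~{mem_cycles:.0f} MEMCLK cycles")
# ~216 core cycles, ~120 MEMCLK cycles
```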

Non-Local Access

In the case of EPYC and Ryzen Threadripper, where more than a single Zeppelin die is used, a memory access may have to be routed to a neighboring Zeppelin. Regardless of which die is targeted, the path is always the same. A local core request is routed through the CCX and CCM to the CAKE module, which encodes the request and sends it through the SerDes to a CAKE module on a remote die. The remote CAKE decodes the request and sends it to the appropriate UMC, which accesses its DRAM channel. The response is then routed in reverse order back to the request-originating core. A round-trip on a system with a CPU frequency of 2.4 GHz and DDR4-2666 19-19-19 memory (i.e., a MEMCLK of 1333 MHz) to a die on a different socket in a two-way multiprocessing configuration takes roughly 200 nanoseconds, while a round-trip across dies on the same package takes roughly 145 nanoseconds.

WikiChip’s diagram of data flow across multiple Zeppelin SoCs.
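The three reported round-trip figures can be placed side by side to show the cost of each extra SerDes crossing; the deltas below are derived directly from the numbers quoted above.

```python
# Reported round-trip latencies for a 2.4 GHz / DDR4-2666 system (ns).
ROUND_TRIP_NS = {
    "local die":                      90,   # CCM -> SDF -> local UMC
    "remote die, same package":      145,   # one IFOP (on-package) crossing
    "remote die, different socket":  200,   # one IFIS (inter-socket) crossing
}

baseline = ROUND_TRIP_NS["local die"]
for path, ns in ROUND_TRIP_NS.items():
    print(f"{path:30s} {ns:4d} ns  (+{ns - baseline} ns vs. local)")
# local die                        90 ns  (+0 ns vs. local)
# remote die, same package        145 ns  (+55 ns vs. local)
# remote die, different socket    200 ns  (+110 ns vs. local)
```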

Incidentally, the longest possible path consists of two hops: one to the adjacent socket and another to a neighboring die on that package. AMD did not report the round-trip latency for this scenario.

WikiChip’s diagram of the longest path possible across multiple Zeppelin SoCs.

The difference in latency boils down to the type of SerDes used for the access. It’s worth pointing out that AMD’s “Smart Prefetch” (marketed under AMD’s “SenseMI”) in the core complexes helps greatly mitigate the latency of requests to memory attached to remote dies.

I/O Subsystem

There are two x16 high-speed SerDes links located at the upper-left and lower-right corners of the die. Both links are MUX’ed between the Infinity Fabric Inter-Socket (IFIS) controller and the PCIe controller. Additionally, the lower-right link is also MUX’ed with the SATA controller. When Infinity Fabric is the selected protocol, the entire link (i.e., all 16 lanes) is dedicated to it. When the PCIe protocol is selected, up to 8 PCIe ports of varying widths are possible. For the link where the SATA controller is also an option, up to 8 of the 16 lanes can be used for SATA. Note that a mixed configuration is possible: if only a subset of the SATA ports is used, the remaining lanes can still be used as standard PCIe lanes.

WikiChip’s diagram of the possible SerDes MUXing and bifurcation options for the Zeppelin SoCs.

Though this wasn’t in the presentation, which predated AMD’s announcement of the EPYC Embedded 3000 and Ryzen Embedded V1000 SoCs, the bottom links can also be MUX’ed with the Ethernet controllers. They can be configured with up to 8 SATA lanes and up to 4 x 10GbE ports, or a mixed configuration with PCIe.
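Putting the two paragraphs above together, the per-link protocol options can be summarized in a small sketch. This is a simplified model of the options described here, not an exhaustive list of AMD's supported bifurcation modes.

```python
# Simplified model of the protocol options behind each x16 SerDes link,
# based on the description above (not an exhaustive bifurcation list).
LINK_OPTIONS = {
    "upper-left x16": {
        "IFIS": "all 16 lanes",
        "PCIe": "up to 8 ports of varying widths",
    },
    "lower-right x16": {
        "IFIS":  "all 16 lanes",
        "PCIe":  "up to 8 ports of varying widths",
        "SATA":  "up to 8 lanes (remaining lanes may stay PCIe)",
        "10GbE": "up to 4 ports (Embedded parts; mixable with PCIe/SATA)",
    },
}

for link, options in LINK_OPTIONS.items():
    print(link)
    for proto, detail in options.items():
        print(f"  {proto:5s} {detail}")
```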

AMD says that the muxing logic added to support these features contributes less than one channel clock of latency to the latency-sensitive Infinity Fabric path.

6 Comments on "ISSCC 2018: AMD’s Zeppelin; Multi-chip routing and packaging"
Jerry
Guest

Why didn’t AMD use three IFOP SerDes to connect the two dies in Threadripper? Is it because the other two SerDes are on the other side of the die and routing would be problematic, since it would need to go under the die? Given those are single-ended, I could see noise problems with that, I guess.

Lu Ji
Guest

They use low-swing single-ended signaling, so there is no way they could go under the noisy die without pumping up the voltage and sacrificing all the low-power attributes they designed it for in the first place.

– Lu

Tri
Guest

Wait, but there is an IFOP right next to the DDR MC that could have been used, which does not have to be routed under the die. Illustration: https://imgur.com/lrXKCM9

Paul Dougherty
Guest

I think Intel might have higher yields on Skylake X than AMD would with a monolithic Epyc because they’ve been on the same process for ~4 years? šŸ˜‰

Luc Boulesteix
Guest

Well, Samsung/GF 14nm LPP is a fairly mature process at this point. It does seem AMD is getting yield issues on their larger dies, but to be fair, we don’t really know how Intel is doing there either.

Jeff
Guest

What happened to the home agent/coherence controller residing with DRAM, as mentioned in Kevin Lepak’s HC29 presentation? I am assuming that the CCMs serve only as the request initiators and that line state control is still handled inside the UMCs, but I’m curious why this detail would get buried or changed since August.