Scalable Data Fabric (SDF)
The I/O Hub interfaces with the SDF through the I/O Master/Slave (IOMS) interface. Likewise, the two CCXs interface with the Cache-Coherent Masters (CCMs). The IOMS and the CCMs are the only interfaces capable of making DRAM requests. The DRAM is attached to the DDR4 interface, which is attached to the Unified Memory Controller (UMC), which in turn communicates with the scalable data fabric.
The Coherent AMD socKet Extender (CAKE) module translates the request and response formats used by the SDF transport layer to and from the serialized format used by the IF Inter-Socket SerDes and the IF On-Package SerDes.
For a local access, a core request goes through the CCX and the CCM, then through the fabric to the local UMC and on to the local DRAM channel. The read data then follows the same path in reverse order back to the core. A round-trip on a system with a CPU frequency of 2.4 GHz and DDR4-2666 19-19-19 memory (i.e., MEMCLK of 1333 MHz) would be roughly 90 nanoseconds.
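To put the quoted figures on a common scale, the following minimal Python sketch converts the DRAM timing from memory-clock cycles to nanoseconds and expresses the 90 ns round-trip in core clock cycles. It assumes the first number in 19-19-19 is the CAS latency in MEMCLK cycles, as is the usual convention; the function and variable names are illustrative and do not come from AMD.

```python
def cycles_to_ns(cycles: int, clock_mhz: float) -> float:
    """Convert a count of clock cycles to nanoseconds."""
    return cycles * 1000.0 / clock_mhz


MEMCLK_MHZ = 1333.0      # DDR4-2666 moves data twice per clock, so MEMCLK is 1333 MHz
CAS_CYCLES = 19          # assumed: first value of the 19-19-19 timing
CPU_GHZ = 2.4            # core frequency quoted in the text
ROUND_TRIP_NS = 90.0     # local round-trip quoted in the text

# Portion of the round-trip attributable to CAS latency alone.
cas_ns = cycles_to_ns(CAS_CYCLES, MEMCLK_MHZ)

# The same round-trip as seen by the core, in core clock cycles.
core_cycles = ROUND_TRIP_NS * CPU_GHZ

print(f"CAS latency: {cas_ns:.2f} ns")
print(f"Round-trip:  {core_cycles:.0f} core cycles")
```

Under these assumptions the CAS latency alone is about 14.25 ns, so most of the 90 ns is spent elsewhere along the path (other DRAM timings, the memory controller, the fabric, and the cache hierarchy).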
In the case of EPYC and Ryzen Threadripper, where more than one Zeppelin die is used, a memory access may have to be routed to a neighboring Zeppelin. Regardless of the destination, the sequence of steps is always the same. A local core request is routed through the CCX and the CCM to the CAKE module, which encodes the request and sends it through the SerDes to a CAKE module on a remote die. The remote CAKE decodes the request and forwards it to the appropriate UMC and on to the DRAM channel. The response is then routed back in reverse order to the request-originating core. A round-trip on a system with a CPU frequency of 2.4 GHz and DDR4-2666 19-19-19 memory (i.e., MEMCLK of 1333 MHz) across a different socket in a two-way multiprocessing configuration is roughly 200 nanoseconds, while a round-trip across dies on the same package takes roughly 145 nanoseconds.
Note that the longest path consists of two hops: one to the adjacent socket and another to reach the next die. AMD did not report the round-trip latency for this scenario.
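The reported round-trip figures can be collected in a small lookup to make the cost of each kind of remote access explicit. This is only an illustrative sketch: the `Route` names and the `penalty_over_local` helper are hypothetical, and the two-hop case is left as `None` because no figure was reported for it.

```python
from enum import Enum
from typing import Dict, Optional


class Route(Enum):
    LOCAL = "same die"
    SAME_PACKAGE = "remote die, same package"
    CROSS_SOCKET = "remote die, other socket"
    CROSS_SOCKET_TWO_HOPS = "other socket, then a further die"


# Approximate round-trip latencies in nanoseconds, as quoted in the text
# (2.4 GHz CPU, DDR4-2666 19-19-19). None means no figure was reported.
REPORTED_NS: Dict[Route, Optional[float]] = {
    Route.LOCAL: 90.0,
    Route.SAME_PACKAGE: 145.0,
    Route.CROSS_SOCKET: 200.0,
    Route.CROSS_SOCKET_TWO_HOPS: None,
}


def penalty_over_local(route: Route) -> Optional[float]:
    """Extra latency relative to a local access, or None if unreported."""
    latency = REPORTED_NS[route]
    if latency is None:
        return None
    return latency - REPORTED_NS[Route.LOCAL]


for route in Route:
    extra = penalty_over_local(route)
    label = "not reported" if extra is None else f"+{extra:.0f} ns"
    print(f"{route.value}: {label}")
```

With the quoted numbers, crossing to another die on the same package costs roughly 55 ns more than a local access, and crossing to the other socket roughly 110 ns more.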
The difference in latency boils down to the type of SerDes used for the access. It’s worth pointing out that AMD’s “Smart Prefetch” (marketed under AMD’s “SenseMI”) in the core complexes helps to greatly mitigate the latency of requests to memory that is attached to remote dies.
There are two x16 high-speed SerDes links located at the upper-left and lower-right corners of the die. Both links are MUX’ed with the Infinity Fabric InterSocket controller (IFIS) and the PCIe controller. Additionally, the lower-right link is also MUX’ed with the SATA controller. When the Infinity Fabric is the selected protocol, the entire link (i.e., all 16 lanes) is used for this purpose. When the PCIe protocol is selected, up to 8 PCIe ports of varying widths are possible. For the link where the SATA controller is also an option, up to 8 of the lanes can be used for SATA. Note that a mixed configuration is possible (i.e., if only a subset of the SATA ports is used, the remaining lanes can still be used as standard PCIe lanes).
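The muxing rules described above can be expressed as a small validity check. The following Python sketch is a simplified model, not AMD's configuration interface: `LinkConfig` and `validate` are hypothetical names, and the model assumes each SATA port occupies exactly one lane. It does not check which PCIe port widths the hardware actually supports.

```python
from dataclasses import dataclass
from typing import List, Tuple

LANES_PER_LINK = 16   # each SerDes link is x16
MAX_PCIE_PORTS = 8    # up to 8 PCIe ports per link
MAX_SATA_LANES = 8    # up to 8 lanes usable for SATA


@dataclass
class LinkConfig:
    has_sata_mux: bool                        # True only for the lower-right link
    infinity_fabric: bool = False             # IFIS selected for this link
    pcie_port_widths: Tuple[int, ...] = ()    # lane width of each PCIe port
    sata_lanes: int = 0                       # assumed: one SATA port per lane


def validate(cfg: LinkConfig) -> List[str]:
    """Return a list of rule violations; an empty list means the config is valid."""
    errors: List[str] = []
    if cfg.infinity_fabric:
        # Infinity Fabric consumes the whole link, so nothing else may share it.
        if cfg.pcie_port_widths or cfg.sata_lanes:
            errors.append("Infinity Fabric uses all 16 lanes of the link")
        return errors
    if cfg.sata_lanes and not cfg.has_sata_mux:
        errors.append("this link is not MUX'ed with the SATA controller")
    if cfg.sata_lanes > MAX_SATA_LANES:
        errors.append("at most 8 lanes can be used for SATA")
    if len(cfg.pcie_port_widths) > MAX_PCIE_PORTS:
        errors.append("at most 8 PCIe ports per link")
    if sum(cfg.pcie_port_widths) + cfg.sata_lanes > LANES_PER_LINK:
        errors.append("PCIe and SATA together exceed 16 lanes")
    return errors


# Mixed configuration on the lower-right link: 4 SATA lanes plus x8 and x4 PCIe ports.
mixed = LinkConfig(has_sata_mux=True, pcie_port_widths=(8, 4), sata_lanes=4)
print(validate(mixed))
```

Returning a list of violations rather than raising on the first one makes it easier to report every problem with a proposed lane allocation at once.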
Though this wasn’t in the presentation, which predated AMD’s announcement of EPYC Embedded 3000 and Ryzen Embedded V1000 SoCs, the bottom links can also be MUX’ed with Ethernet ports. They can be configured as either up to 8 SATA lanes and up to 4 x 10GbE ports, or a mixed configuration with PCIe.
AMD says that the muxing logic introduced to support these features adds less than one channel clock of latency to the latency-sensitive Infinity Fabric path.