Extending cache coherency beyond CPUs has been a hot topic in the last few years. Most recently, the Compute Express Link (CXL) has been gaining the most steam. But CXL is designed around PCIe Gen 5.0 and its higher data rates, so we are at least a year away from seeing products with CXL. Meanwhile, Intel hasn’t been standing still. The company has been working on a number of stopgap products in anticipation of CXL. The first such product is the Stratix 10 DX, which was launched yesterday.
Accelerators are a key product vector for the Stratix 10 series, which has been used for everything from edge to high-end data center workload acceleration. Depending on the workload, a large amount of data may need to be exchanged between the FPGA and the CPU. Adding memory coherency and allowing the FPGA to access memory directly and more efficiently is therefore a highly desirable feature.
This is where the Stratix 10 DX comes in. The new FPGA introduces cache coherency support. However, since PCIe Gen 5.0 and CXL are not quite ready, for this generation Intel is bringing this support through its existing infrastructure: UPI. The Ultra Path Interconnect (UPI) is Intel’s proprietary point-to-point cache-coherent interconnect, which facilitates Intel Xeon symmetric multiprocessing support. With the Stratix 10 DX, this is being extended to Intel's FPGAs as well. The Stratix 10 DX UPI links are intended to connect to select future Xeon processors. Operating at up to 11.2 GT/s, each UPI link has a bandwidth of 22.4 GB/s. Intel says that, compared to standard PCIe, UPI allows for up to 37% lower latency on memory-coherent applications.
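The per-link bandwidth figure can be reproduced from the transfer rate with a one-line calculation. The sketch below assumes each transfer carries 2 bytes of payload; that width is an assumption chosen because it matches the quoted 22.4 GB/s, not a figure given in this article, and the function name is purely illustrative.

```python
# Assumption (not stated in the article): 2 bytes of payload per transfer.
ASSUMED_BYTES_PER_TRANSFER = 2


def upi_link_bandwidth_gbs(transfer_rate_gts: float,
                           bytes_per_transfer: int = ASSUMED_BYTES_PER_TRANSFER) -> float:
    """Return link bandwidth in GB/s for a given transfer rate in GT/s."""
    return transfer_rate_gts * bytes_per_transfer


if __name__ == "__main__":
    # 11.2 GT/s is the top UPI speed quoted for the Stratix 10 DX.
    print(f"{upi_link_bandwidth_gbs(11.2):.1f} GB/s per link")
```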
This isn’t actually the first time Intel has introduced UPI support in its FPGAs. Last year, the company launched the Xeon Gold 6138P, a unique Xeon microprocessor with an Arria 10 GX 1150 FPGA on-package. This FPGA had a UPI link and two x8 PCIe Gen 3 links hardwired to the Xeon processor.
The Arria 10 is a much simpler FPGA. The Stratix 10 DX is a much beefier, chiplet-based FPGA, with an upgradable I/O subsystem. And here we are really seeing this chiplet strategy paying off with rapid development of new features through new chiplets.
In addition to the UPI link, there are a few more upgrades in the new DX family. For non-coherent traffic, Intel added hard IP support for PCIe Gen 4.0 in place of PCIe Gen 3. There is also a new memory controller, which brings support for direct connections to select Optane DC persistent memory.
In total, three devices are being announced: the DX 1100, the DX 2100, and the DX 2800, ranging from 1.325 million LEs to 2.753 million. All three devices come with a single E-Tile, though the number of supported channels varies. The simplest device is the DX 1100, which comes with a 16-channel E-Tile and a single 16-channel P-Tile, giving a total of 32 transceiver channels. Note that since UPI requires 20 channels, the P-Tile on the DX 1100 supports PCIe Gen 4.0 x16 only. In addition to the different tiles being used, the DX 1100 comes with a quad-core Cortex-A53 hard processor subsystem (HPS). This is the only device in the DX family to feature an HPS.
Looking at the bigger devices, the DX 2100 comes with three 20-channel P-Tiles along with a 24-channel E-Tile, for a total of 84 transceiver channels. With 20 channels on each P-Tile, it supports up to three UPI links. The DX 2100 also comes with two HBM2 stacks for a total of 8 GiB of fast local memory.
The largest FPGA, the DX 2800, comes with 84 transceiver channels as well, but it sacrifices some of the E-Tile transceivers in favor of more PCIe lanes. In total, it offers four P-Tiles (three 20-channel tiles and a single 16-channel tile) as well as a smaller 8-channel E-Tile. Note that both the DX 2100 and the DX 2800 support up to three UPI links, through their 20-channel P-Tiles only.
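The channel totals and UPI link counts above follow directly from the tile mix of each device. As a rough sketch, the tile configurations described in the text can be tabulated and checked against the 20-channel UPI requirement; the dictionary layout and function names here are illustrative and not part of any Intel tool.

```python
# Channels a P-Tile needs in order to carry a UPI link (per the article).
UPI_CHANNELS_REQUIRED = 20

# Tile configurations as described in the text above.
DEVICES = {
    "DX 1100": {"e_tile": 16, "p_tiles": [16]},
    "DX 2100": {"e_tile": 24, "p_tiles": [20, 20, 20]},
    "DX 2800": {"e_tile": 8, "p_tiles": [20, 20, 20, 16]},
}


def total_channels(cfg: dict) -> int:
    """Sum the transceiver channels across the E-Tile and all P-Tiles."""
    return cfg["e_tile"] + sum(cfg["p_tiles"])


def max_upi_links(cfg: dict) -> int:
    """Count P-Tiles wide enough to host a UPI link."""
    return sum(1 for ch in cfg["p_tiles"] if ch >= UPI_CHANNELS_REQUIRED)


if __name__ == "__main__":
    for name, cfg in DEVICES.items():
        print(f"{name}: {total_channels(cfg)} channels, "
              f"up to {max_upi_links(cfg)} UPI links")
```

The 16-channel P-Tiles on the DX 1100 and DX 2800 fall below the threshold, which is why they are limited to PCIe Gen 4.0.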
|Stratix 10 DX Family|
|DX 2800||2,753,000||–||84||3||1||2×4 GiB||–|
The Stratix 10 DX FPGAs are already shipping to early access customers, with volume production in 2020.
The Big Picture
There is a need for supporting cache-coherent traffic on modern FPGAs. With the chiplet-based FPGA architecture of the Stratix 10, Intel was able to iterate on its prior I/O chiplets and introduce both high-bandwidth PCIe Gen 4.0 and cache coherency support via UPI. But UPI is just a stopgap measure. Looking a bit further out, the real goal is to facilitate cache coherency via the open CXL standard instead of the company’s proprietary UPI interconnect. CXL will come along with PCIe Gen 5.0 as part of Intel’s next-generation FPGA family, Agilex.