Previously we covered the Open Compute Project’s latest endeavor, ODSA, an industry-wide collaboration for the open standardization of chiplets. The group is pursuing standardization of the entire architecture interface stack so that chiplets from different sources can seamlessly communicate with one another. There is currently a large focus in the industry on advanced packaging, and silicon interposers and silicon bridges are making their way into mainstream products. As advanced packaging spreads to more products, new interfaces are being developed to more easily link dies together. TSMC is introducing its LIPINCON interconnect, and Intel has a number of interconnects, including AIB and MDIO.
The common theme among all those interconnects is that they are designed to go over silicon, and silicon is expensive. Even silicon bridges are significantly more expensive than a standard organic substrate. The OCP ODSA group is making the case that, at least for some designs, a good old organic substrate works just fine. There is just one problem: no modern inter-die interconnect exists that is specifically designed for a standard organic-substrate multi-chip package with decent throughput and power consumption.
The ODSA group wants to step in here and help. In addition to supporting existing open standards such as AIB, the group wants to enable cheaper packages that do not rely on silicon. There are some good arguments for conventional organic multi-chip packages: they are cheap, mature, and highly reliable. Additionally, screening for known-good dies (KGDs) is generally better, and because the dies can be spaced farther apart, these packages have slightly better heat dissipation. The major downside is that traces between chips are generally much wider, yielding lower wire density. At least in theory, this can be compensated for with 6x-10x higher per-wire throughput.
This is where Bunch of Wires (BoW) comes in, and yes, that is its actual technical name! BoW is a brand-new interface from the OCP ODSA group designed to fill the interface void for organic substrates. Accordingly, the specification, testing, validation, and characterization of BoW have all been done on organic substrates. BoW has some fairly aggressive performance targets, based on surveys of what industry customers were looking for in an interface. For bandwidth density, the group is targeting 100 Gbps/mm to 1 Tbps/mm of die edge at an energy efficiency of 1 pJ/bit down to 0.5 pJ/bit, numbers that rival those of current-generation silicon interposers. Since the dies are spaced apart, trace lengths of 25 mm to 50 mm must be supported with sub-5 ns latency. The group had several additional requirements, such as being relatively simple to design, especially on advanced nodes such as 7 nm, 5 nm, and 3 nm. The final requirement is a single supply voltage. In other words, the interface must run from the same supply the logic uses (i.e., the standard Vdd range of around 0.7 V to 0.9 V) for maximum process compatibility.
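To see why the sub-5 ns latency target is mostly a circuit budget rather than a wire budget, the sketch below estimates the one-way time of flight over 25 mm and 50 mm traces. It assumes a homogeneous dielectric with a relative permittivity of 3.5 (`EPS_R`), a rough stand-in for an organic substrate material and not a value from the BoW specification; `flight_time_ns` is a hypothetical helper.

```python
import math

C_MM_PER_NS = 299.792458  # speed of light in vacuum, in mm/ns

# ASSUMPTION: rough relative permittivity for an organic substrate dielectric,
# chosen for illustration only (not taken from the BoW specification).
EPS_R = 3.5
LATENCY_BUDGET_NS = 5.0  # BoW target: sub-5 ns


def flight_time_ns(trace_mm: float, eps_r: float = EPS_R) -> float:
    """One-way propagation delay of a trace in a homogeneous dielectric."""
    velocity_mm_per_ns = C_MM_PER_NS / math.sqrt(eps_r)
    return trace_mm / velocity_mm_per_ns


for length_mm in (25, 50):
    t = flight_time_ns(length_mm)
    share = 100 * t / LATENCY_BUDGET_NS
    print(f"{length_mm} mm trace: {t:.2f} ns of flight time ({share:.0f}% of the budget)")
```

Under that assumption, even a 50 mm trace costs only about 0.3 ns, leaving most of the budget for the transmit and receive circuitry.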
A simple unterminated lane (a driver, an inverter, and a latch) can already reach around 5 Gbps/wire over wires up to 10 mm long. With a simple modulation such as NRZ, it’s possible to push this to 50 Gbps, or even double that rate with PAM4. The problem with PAM4 is its undesirably high error rate, which necessitates forward error correction (FEC). This, in turn, increases both the power consumption and the latency of the link. Bunch of Wires instead doubles the bandwidth of NRZ by using simultaneous bidirectional signaling over terminated lines. In other words, instead of carrying data in just one direction over the transmission line, BoW transmits in both directions on the same wire at the same time, doubling the effective data rate to around 100 Gbps without FEC. A proof-of-concept chip fabricated on GlobalFoundries’ 14-nanometer process reaches 28 Gbps in each direction, for an effective bidirectional bandwidth of 56 Gbps/port. At the current target supply voltage of 0.75 V, the group is reporting a power efficiency of 0.7 pJ/bit. (Note that AQLink is the ultra-short-reach SerDes by Aquantia.)
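Simultaneous bidirectional signaling works because each receiver knows the bit it is sending itself. The idealized sketch below models a doubly terminated wire on which each driver contributes half the voltage swing, so the line settles at 0, VDD/2, or VDD; each end then compares the line against a threshold chosen by its own outgoing bit. The functions `line_voltage`, `sbd_receive`, and `simulate` are hypothetical, and the model ignores the reflections, crosstalk, and noise a real BoW-Turbo receiver must handle.

```python
import random

VDD = 0.75  # volts; the target supply voltage quoted above


def line_voltage(bit_a: int, bit_b: int) -> float:
    # Idealized doubly terminated line: each end's driver contributes half the swing,
    # so the wire settles at one of three levels: 0, VDD/2, or VDD.
    return VDD * (bit_a + bit_b) / 2


def sbd_receive(v_line: float, own_bit: int) -> int:
    # The receiver knows its own outgoing bit, so only two levels remain possible.
    # Slice halfway between them: VDD/4 when sending 0, 3*VDD/4 when sending 1.
    threshold = VDD * (0.75 if own_bit else 0.25)
    return int(v_line > threshold)


def simulate(n_bits: int, seed: int = 0) -> tuple[list[int], list[int], list[int], list[int]]:
    rng = random.Random(seed)
    a_tx = [rng.randint(0, 1) for _ in range(n_bits)]
    b_tx = [rng.randint(0, 1) for _ in range(n_bits)]
    a_rx, b_rx = [], []
    for a, b in zip(a_tx, b_tx):
        v = line_voltage(a, b)          # both ends drive the same wire at once
        a_rx.append(sbd_receive(v, a))  # end A recovers B's bit
        b_rx.append(sbd_receive(v, b))  # end B recovers A's bit
    return a_tx, b_tx, a_rx, b_rx


a_tx, b_tx, a_rx, b_rx = simulate(1000)
assert a_rx == b_tx and b_rx == a_tx
print("both directions recovered over one wire")
```

Each wire therefore carries two NRZ streams at once, which is where the 2x effective rate comes from without resorting to PAM4.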
Three flavors of Bunch of Wires are being proposed: BoW-Base, BoW-Fast, and BoW-Turbo.
BoW-Base is the base implementation, with a reach of under 10 mm. It is a very simple implementation that uses unterminated lanes at rates of up to 4 GT/s. BoW-Fast (also referred to as Plus) is a terminated version of BoW-Base, but it is still unidirectional; it targets rates of up to 16 GT/s. Finally, BoW-Turbo uses the same data rate as BoW-Fast but adds simultaneous bidirectional signaling to double the effective rate to 32 GT/s per wire. Both BoW-Fast and BoW-Turbo support trace lengths of up to 50 mm. Note that regardless of the BoW option chosen, the signaling rate is capped at 16 GT/s in order to reduce design complexity and ease porting.
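The three flavors can be summarized as a small table in code. This sketch encodes the rates and reaches quoted above; `BowFlavor` and `usable_flavors` are hypothetical names, and treating 10 mm as a hard limit for BoW-Base is a simplification of "under 10 mm".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BowFlavor:
    name: str
    gt_per_s: float      # signaling rate per direction, GT/s
    terminated: bool
    bidirectional: bool  # simultaneous bidirectional signaling on each wire
    max_reach_mm: float  # approximate maximum trace length

    @property
    def effective_gt_per_wire(self) -> float:
        # Bidirectional signaling carries one stream each way on the same wire.
        return self.gt_per_s * (2 if self.bidirectional else 1)


FLAVORS = (
    BowFlavor("BoW-Base", 4, terminated=False, bidirectional=False, max_reach_mm=10),
    BowFlavor("BoW-Fast", 16, terminated=True, bidirectional=False, max_reach_mm=50),
    BowFlavor("BoW-Turbo", 16, terminated=True, bidirectional=True, max_reach_mm=50),
)

# Every flavor stays at or below the 16 GT/s signaling-rate cap.
assert all(f.gt_per_s <= 16 for f in FLAVORS)


def usable_flavors(trace_mm: float) -> list[str]:
    """Names of the flavors whose reach covers a given trace length."""
    return [f.name for f in FLAVORS if trace_mm <= f.max_reach_mm]


for f in FLAVORS:
    print(f"{f.name}: {f.effective_gt_per_wire:g} GT/s per wire, up to ~{f.max_reach_mm:g} mm")
print(usable_flavors(25))
```

Only BoW-Base drops out for longer traces; the per-wire doubling of BoW-Turbo comes entirely from the bidirectional flag, not from a higher signaling rate.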
It’s worth pointing out that all three implementations are backward-compatible. Two BoW-Turbo chiplets communicate in full bidirectional mode by default. To communicate with a chiplet that uses BoW-Fast, it’s only necessary to disable a single transmitter/receiver per lane, which folds the link back to unidirectional operation. Likewise, to go from BoW-Fast to BoW-Base, it’s only necessary to disconnect the line termination.
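The backward-compatibility ladder amounts to picking the lower of the two sides’ flavors and applying one fold-back step per level dropped. The article does not describe how chiplets discover each other’s capabilities, so `BowMode` and `negotiate` below are hypothetical; the sketch only encodes the fold-back steps listed above.

```python
from enum import IntEnum


class BowMode(IntEnum):
    # Ordered so that each mode can fold back to every mode below it.
    BASE = 0   # unterminated, unidirectional
    FAST = 1   # terminated, unidirectional
    TURBO = 2  # terminated, simultaneous bidirectional


# Step taken to leave each mode for the one just below it.
FOLD_BACK_STEP = {
    BowMode.TURBO: "disable one transmitter/receiver per lane (fold back to unidirectional)",
    BowMode.FAST: "disconnect the line termination",
}


def negotiate(a: BowMode, b: BowMode) -> tuple[BowMode, list[str]]:
    """Hypothetical helper: common mode plus the steps the faster side must take."""
    common = min(a, b)
    faster = max(a, b)
    # Walk down from the faster side's mode, collecting one step per level dropped.
    steps = [FOLD_BACK_STEP[m] for m in sorted(BowMode, reverse=True) if common < m <= faster]
    return common, steps


mode, steps = negotiate(BowMode.TURBO, BowMode.BASE)
print(mode.name, steps)
```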
The basic BoW building block, the bump slice, comprises 16 single-ended data bumps, differential clock bumps, a mode bump, and an optional error-correction bump. A slice measures 1170 µm x 320 µm (~0.4 mm²), assuming a 130 µm bump pitch. Some back-of-the-envelope calculations: with BoW-Base, a single slice has an aggregate bandwidth of 64 Gbps (16 data wires at 4 GT/s); BoW-Fast quadruples this to 256 Gbps, and BoW-Turbo doubles that to 512 Gbps. That works out to about 1280 Gbps/mm², not bad for an organic substrate. Of course, multiple BoW slices can be combined to increase throughput per die edge; up to around four slices can be stacked one behind another, moving inward from the die edge.
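The back-of-the-envelope numbers can be reproduced directly. The sketch assumes one bit per transfer on each data wire (NRZ) and counts only the 16 data bumps; the names are illustrative. Note that the 1280 Gbps/mm² figure uses the rounded ~0.4 mm² slice area; the unrounded 1170 µm x 320 µm area of 0.3744 mm² gives roughly 1368 Gbps/mm².

```python
DATA_BUMPS_PER_SLICE = 16
SLICE_W_UM, SLICE_H_UM = 1170, 320  # slice footprint at a 130 µm bump pitch

# Effective per-wire rate in GT/s; assumes 1 bit per transfer (NRZ).
EFFECTIVE_GT_PER_WIRE = {"BoW-Base": 4, "BoW-Fast": 16, "BoW-Turbo": 32}


def slice_bandwidth_gbps(flavor: str) -> int:
    """Aggregate data bandwidth of one slice, counting data bumps only."""
    return DATA_BUMPS_PER_SLICE * EFFECTIVE_GT_PER_WIRE[flavor]


exact_area_mm2 = SLICE_W_UM * SLICE_H_UM / 1e6  # 0.3744 mm²
rounded_area_mm2 = 0.4

for flavor in EFFECTIVE_GT_PER_WIRE:
    print(f"{flavor}: {slice_bandwidth_gbps(flavor)} Gbps per slice")

turbo = slice_bandwidth_gbps("BoW-Turbo")
print(f"BoW-Turbo areal density: {turbo / rounded_area_mm2:.0f} Gbps/mm² (0.4 mm²), "
      f"{turbo / exact_area_mm2:.0f} Gbps/mm² (exact area)")
```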
So what about control communication in BoW? As noted above, there is only a single mode bump. Instead of adding dedicated sideband bumps for control and calibration, BoW uses a simple shared open-drain bump: toggling the mode bit switches the data bumps between data and control. To put the link into calibration mode, a side pulls the mode bump low. Otherwise, the data bumps are assumed to be in normal data mode.
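The shared mode bump behaves like a wired-AND line: the mode changes as soon as either side pulls it low, and no side can force it high. The sketch below models that behavior; it assumes an external pull-up holds the bump high when released (the article only says the bump is pulled down for calibration), and `OpenDrainSide`, `mode_line_level`, and `link_mode` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class OpenDrainSide:
    """One chiplet's driver on the shared mode bump (hypothetical model)."""
    pulling_low: bool = False  # an open-drain driver can only pull low or release


def mode_line_level(*sides: OpenDrainSide) -> int:
    # Wired-AND: an assumed pull-up holds the line high (1) unless any side pulls it low.
    return 0 if any(side.pulling_low for side in sides) else 1


def link_mode(*sides: OpenDrainSide) -> str:
    return "calibration/control" if mode_line_level(*sides) == 0 else "data"


left, right = OpenDrainSide(), OpenDrainSide()
print(link_mode(left, right))  # neither side pulls low: data mode
right.pulling_low = True
print(link_mode(left, right))  # one side pulls low: both ends see calibration/control mode
```

A practical consequence of the open-drain choice is that neither side needs to arbitrate for the bump: simultaneous requests simply combine.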
Chiplet Interconnect Comparison
On GlobalFoundries’ 14-nanometer process, the current proof of concept shows an energy efficiency of around 0.7 pJ/bit. The group estimates this can be reduced to 0.5 pJ/bit on a 7-nanometer node.
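Energy per bit translates into power once a throughput is fixed. The sketch below applies the 0.7 pJ/bit (14 nm) and projected 0.5 pJ/bit (7 nm) figures to the 56 Gbps proof-of-concept port and to a full 512 Gbps BoW-Turbo slice. It assumes power scales linearly with throughput and ignores any fixed overhead such as clocking, so the results are rough estimates; `io_power_mw` is a hypothetical helper.

```python
def io_power_mw(throughput_gbps: float, energy_pj_per_bit: float) -> float:
    # (1e9 bit/s) * (1e-12 J/bit) = 1e-3 W, so Gbps x pJ/bit comes out directly in mW.
    return throughput_gbps * energy_pj_per_bit


PORT_GBPS = 56    # 28 Gbps in each direction on the 14 nm proof of concept
SLICE_GBPS = 512  # one BoW-Turbo slice

for label, gbps in (("56 Gbps port", PORT_GBPS), ("BoW-Turbo slice", SLICE_GBPS)):
    at_14nm = io_power_mw(gbps, 0.7)  # reported on 14 nm
    at_7nm = io_power_mw(gbps, 0.5)   # estimate for 7 nm
    print(f"{label}: {at_14nm:.1f} mW at 0.7 pJ/bit, {at_7nm:.1f} mW at 0.5 pJ/bit")
```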
Current Chiplet-based Demos

| Chip | Stratix 10 | Zen | VLSI Demo | BoW (this work) |
|---|---|---|---|---|
| Channel | 1 mm | N/A | 500 µm | N/A |
| Chiplet I/O Bump Pitch | 55 µm | 150 µm | 40 µm | 130 µm |
| Data Rate | 2 GT/s | 10.6 GT/s | 8 GT/s | 32 GT/s |
| Power | 1.2 pJ/bit | 2 pJ/bit | 0.56 pJ/bit | 0.7 pJ/bit |
It’s worth highlighting that BoW is designed for standard multi-chip packages with a bump pitch of around 130 µm, yielding a bump density of just 68 bumps/mm². More recently, Intel unveiled the MDIO interconnect, which has a much more aggressive shoreline bandwidth density. Nonetheless, BoW makes up for its low bump density with higher data rates. The net result is that, with the ability to stack up to four slices, BoW provides slightly lower areal bandwidth density than current-generation interconnects but higher shoreline bandwidth density than AIB Gen1 and LIPINCON.
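The quoted density of about 68 bumps/mm² appears consistent with a staggered (hexagonal) bump array at a 130 µm pitch; a square grid at the same pitch would give only about 59 bumps/mm². The layout is an inference, not something the article states, and the helper names below are illustrative.

```python
import math


def square_grid_density(pitch_um: float) -> float:
    """Bumps per mm² for a square array with the given pitch."""
    pitch_mm = pitch_um / 1000
    return 1 / pitch_mm**2


def hex_grid_density(pitch_um: float) -> float:
    """Bumps per mm² for a staggered (hexagonal) array whose nearest neighbors are one pitch apart."""
    pitch_mm = pitch_um / 1000
    # Each bump occupies a rhombic cell of area (sqrt(3) / 2) * pitch².
    return 2 / (math.sqrt(3) * pitch_mm**2)


print(f"square: {square_grid_density(130):.1f} bumps/mm²")
print(f"hexagonal: {hex_grid_density(130):.1f} bumps/mm²")
```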
Current Chiplet-based Interconnects

| Interconnect | AIB Gen1 | MDIO Gen1 | LIPINCON | BoW-Turbo |
|---|---|---|---|---|
| Data Rate | 2 GT/s | 5.4 GT/s | 8 GT/s | 16 GT/s (per direction) |
| Shoreline BW Density | 504 Gbps/mm | 1600 Gbps/mm | 536 Gbps/mm | 1280 Gbps/mm |
| PHY Power | 0.85 pJ/bit | 0.5 pJ/bit | 0.5 pJ/bit | 0.7 pJ/bit |
| Areal BW Density | 150 GB/s/mm² | 198 GB/s/mm² | 198 GB/s/mm² | 148 GB/s/mm² |