The silence is finally over. Last week at Intel’s Architecture Day, the company disclosed some of the intimate details of its next-generation core microarchitecture.
This article is part of a series that details Intel’s Architecture Day.
1. Advanced 3D Packaging For More-than-Moore to Supplement 10- and 7-Nanometer Nodes
2. Intel Discloses 10nm Sunny Cove and Core Roadmap, Teases Ice Lake Chips
The day started off with Raja Koduri talking about the historical trends in single-thread performance and its importance to the company. Most workloads and algorithms are still predominantly single-threaded. “Single-thread performance is still foundational for Intel moving forward,” Koduri said.
Ever since getting stuck in a never-ending Skylake-refresh loop, Intel has published very few public roadmaps. Things have finally changed. At Architecture Day the company published not one, but two roadmaps – one for their high-performance big cores and one for their high-efficiency small cores.
Even for those who follow Intel closely, none of the Cove names should be familiar. Up until Skylake, Intel’s codename referred to both the core and the full chip. Starting with Cannon Lake, Intel decoupled the two – you have Cannon Lake, the full chip, and Palm Cove, the core itself. The reason for this is simple: the core is going into products other than just the one main chip design, so there is now a need to be able to refer to the two independently. For example, one of the products planned for next year is an ultra-low power 3D stacked processor. That chip features a heterogeneous x86 core architecture with one big Sunny Cove core and four small cores which we believe to be Tremont.
Three big core microarchitectures were announced – Sunny Cove (2019), Willow Cove (2020), and Golden Cove (2021). A few high-level details were given about each. It’s worth noting that, apart from a single chip, Intel is effectively skipping over Cannon Lake. In fact, there was no mention of it at Architecture Day, although the company did acknowledge its existence when we asked. What will be shipping next year is Sunny Cove, their next-generation big core microarchitecture. It will be found in Ice Lake-based systems. Sunny Cove brings improvements to single-thread performance, new ISA extensions, and “improved scalability”. Sunny Cove will be discussed in more detail later in this article.
Willow Cove will succeed Sunny Cove around the 2020 timeframe. Willow Cove is planned to take advantage of new transistor optimizations, have a redesigned cache subsystem, and introduce new security features. Succeeding Willow Cove is Golden Cove around 2021. Golden Cove will bring improvements to single-thread performance, AI performance, and “network/5G performance”, along with more security features. Intel will release more details as we get closer to launch time, but for now your guess is as good as ours as to what some of those changes mean.
|Big Core Microarchitectures|
|Microarchitecture|Year|Improvements|
|Sunny Cove|2019|ST performance, new ISA, improved scalability|
|Willow Cove|2020|Cache redesign, new transistor optimization, security features|
|Golden Cove|2021|ST performance, AI performance, network/5G performance, security features|
Intel’s high-efficiency small cores follow a slower release cadence. Intel is releasing a new small core roughly every two big core releases. Next year, Intel will be releasing Tremont, their first 10-nanometer small core. Tremont will bring single-thread performance improvements along with “network server performance” and “battery life performance”.
The follow-on to Tremont is Gracemont, which will be released around the 2021 timeframe. Gracemont will also bring single-thread performance improvements along with vector performance. The “vector performance” is particularly interesting because it hints that the Atom core will be receiving AVX support. The successor to Gracemont will come around 2022/23. Intel didn’t disclose its name or any of the major enhancements it might have. It’s clearly in the early design stages. The consistent theme across all the small cores is that single-thread performance will continue to improve. Raja Koduri stated that Jim Keller has been driving an aggressive roadmap for the small cores, which have much more room to grow in performance.
|Small Core Microarchitectures|
|Microarchitecture|Year|Improvements|
|Tremont|2019|ST performance, network server performance, battery life performance|
|Gracemont|2021|ST performance, frequency, vector performance|
|??-mont|2022/23|ST performance, frequency, other features|
One of the key changes with future Intel microarchitectures is that they will no longer be heavily dependent on the underlying process technology. “One of the things that myself, Raja, and others have discussed is what we learned from 10-nanometer is that the resilience of our roadmap to have an IP moving independently of SoC is very important”, Murthy Renduchintala said. Intel says that looking forward, node N and node N+1 will see much more IP intertwining. There are many optimization opportunities on mature nodes that can be taken advantage of while some products move to a new node. In the future, it’s entirely possible we might see some cores being fabbed on different processes.
10nm Xeons Shown
Intel showed a couple of 10-nanometer Xeons at the event. The first one is an Ice Lake SP engineering sample. At this point, we don’t know much about it beyond the fact that it just came back from the fab about three weeks ago and can boot Linux.
At one of the demo stations, Intel had a Xeon server SoC. Silicon came back around a month and a half ago. We think this is actually a multi-core Tremont server SoC. Intel says this is a brand new product as opposed to a shrink of any existing product they currently sell. The chip comes with various network offload accelerators targeting high-traffic data center workloads. The company says there is a lot of customer interest in the product. The demo was pretty simple, showing 100 Gbps of traffic being driven into the SoC, where it is decoded, encoded, and sent back out. Given this was done on the accelerator, there was minimal load on the cores themselves. More information about the product will come early next year.
Intel highlighted their overall strategy for building and improving their microarchitectures. Improvements fall into two types.
- General-purpose performance – accelerating the performance of the existing compiled code base. In other words, improving the performance of existing programs out of the box.
- Special-purpose performance – accelerating the performance of code by re-compiling it to take advantage of architecture extensions that target specific use cases and algorithms.
Although there are more opportunities to extract performance through recompilation and tuning, Ronak Singhal says Intel doesn’t want that to be the norm. The overarching principle here is to offer additional performance “for free” out of the box, but if programmers wish to go out of their way to re-optimize and re-compile, the performance improvement opportunities are there for them.
The performance of the processor for a given program with a fixed number of instructions is a function of two tuning knobs – IPC and frequency. Given frequency is a byproduct of many things including the product itself and the market segment, it’s a little too early to discuss that at this point. Instead, Ronak went over four IPC-improving opportunities that Intel leverages in their upcoming 10nm core, Sunny Cove.
- ISA – Leveraging the ISA through new specialized instructions is the first and most straightforward way of improving the IPC. Some specialty workloads that Intel is accelerating with Sunny Cove include compression and decompression, communications and networking, vector processing, and other functionality to help with communication between threads, cores, and accelerators.
- Deeper – Provides opportunities for extracting a greater amount of parallelism by working on a larger set of instructions.
- Wider – Improves the performance by allowing more operations to run in parallel.
- Smarter – Implementing new algorithms that allow for a more power-efficient design as well as improving other qualities such as reducing average latencies.
Perhaps the most interesting one in that list is the use of new algorithms to both improve power efficiency and reduce latencies. Intel only touched on that very lightly but said they will give out more details next year.
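To make the "two tuning knobs" framing concrete, here is a minimal sketch of how IPC and frequency combine into execution time for a program with a fixed instruction count. The function names and every number in it are hypothetical and chosen purely for illustration; none of them are Intel figures.

```python
def execution_time_s(instructions: int, ipc: float, frequency_hz: float) -> float:
    """Seconds needed to retire `instructions` at a sustained IPC and clock rate."""
    if ipc <= 0 or frequency_hz <= 0:
        raise ValueError("ipc and frequency_hz must be positive")
    cycles = instructions / ipc      # more instructions per cycle -> fewer cycles
    return cycles / frequency_hz     # higher clock -> less time per cycle


def speedup(old_time_s: float, new_time_s: float) -> float:
    """How many times faster the new configuration is than the old one."""
    return old_time_s / new_time_s


# Hypothetical workload and core parameters (assumptions, not measured data).
INSTRUCTIONS = 1_000_000_000
baseline = execution_time_s(INSTRUCTIONS, ipc=2.0, frequency_hz=4.0e9)
higher_ipc = execution_time_s(INSTRUCTIONS, ipc=2.5, frequency_hz=4.0e9)

if __name__ == "__main__":
    print(f"baseline:   {baseline:.3f} s")
    print(f"higher IPC: {higher_ipc:.3f} s")
    print(f"speedup:    {speedup(baseline, higher_ipc):.2f}x")
```

With frequency held constant, the speedup equals the ratio of the two IPC values, which is why IPC gains can be discussed before product frequencies are known.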