With mobile Ice Lake processors shipping today, Intel is disclosing the final pieces of information about their new Sunny Cove cores.
The front-end of Sunny Cove is quite similar to Skylake. Depending on the source of the µOP, Sunny Cove can deliver 4, 5 or 6 micro-ops to the back-end. Up to 5 µOP may be delivered from the decoders with up to 6 µOP possible from the µOP cache which stores pre-decoded operations. On Sunny Cove, Intel increased the µOP cache by 50%. It can now store up to 2.25K µOPs. The branch predictor has also been improved from Skylake.
The major improvements can be found in the back-end of the machine which is now wider and deeper. Sunny Cove can now allocate up to five µOPs each cycle and send up to 10 µOPs for execution each cycle, an increase of 1.25x for both.
What’s new with Ice Lake U and Y specifically is the introduction of AVX-512 in this segment of the market (note that there is a single Cannon Lake chip that also features AVX-512 but has never made it to mass production). The AVX-512 implementation in mobile Ice Lake chips is optimized for the power levels available on a laptop. In other words, the implementation here differs from the one found in the Ice Lake server parts. On those parts, you can do one 512-bit or two 256-bits FMA operations.
The level 1 data cache on Sunny Cove has grown to 48 KiB. This is the first time Intel has increased the L1 cache in over a decade. Sunny Cove also doubled the second level cache from 256 KiB to 512 KiB. Along with this Intel also increased the TLB sizes. The previously shared 4K and 2M/4M L2 TLB has been split up into a 4K L2 page table and a 2M/4M table.
|Intel Microarchitecture Cache Comparison|
|L1D Cache||32 KiB||32 KiB||48 KiB|
|L2 Cache||256 KiB||256 KiB||512 KiB|
|µOP Cache||1.5K µOPs||1.5K µOPs||2.25K µOPs|
Sunny Cove overhauled the core microarchitecture in order to extract significantly more single-thread performance for existing code. Intel increased the reorder buffer by over 50% from 224 entries in Skylake to 352 entries in Sunny Cove. Likewise, the amount of inflight loads and stores has been increased by over 50% to 200 memory operations inflight.
|Intel Microarchitecture Cache Comparison|
What do all of those changes amount to? For out-of-the-box performance, compared to the Skylake core, at the same frequency, same memory bandwidth, and same binaries, Intel is claiming an average of 18% increase across a range of proxy workloads such as SPEC 2006, SPEC 2017, SYSmark 2014 SE, WebXPRT, and Cinebench R15. “The last time we have had something in this range was during the Merom era,” Ronak, Intel Chief Core Architect, said. He added,
I have had some of you ask me, “can you actually push the IPC further anymore or are we running out of steam doing this?” And the answer is: No. And Sunny Cove is one example of this. The other microarchitectures we are working on back at home will go even beyond this. Sunny Cove is the foundation for our next wave of innovation.
Update: Our original story erroneously reported that Ice Lake supported one 512-bit AND a 256-bit FMA operation at once. It is actually one 512-bit OR 2 256-bit FMA operations at once. We apologize for the mistake.
- Intel Launches 10th Gen Ice Lake Lineup: 11 New Mobile Chips
- Intel Announces 10th Gen Core Processors Based On 10nm Ice Lake, Now Shipping
- Intel Reveals 10nm Sunny Cove Core, a New Core Roadmap, and Teases Ice Lake Chips
- Intel Spring Hill: Morphing Ice Lake SoC Into A Power-Efficient Data Center Inference Accelerator
- A Look At The Ice Lake Thunderbolt 3 Integration
- Leaked Intel Server Roadmap Shows Sapphire Rapids With DDR5/PCIe 5.0 For 2021, Granite Rapids For 2022
- Intel Unveils the Tremont Microarchitecture: Going After ST Performance
- NEC Refreshes SX-Aurora Vector Engine, Outlines Roadmap
- Japanese AI Startup Preferred Networks Designed A Custom Half-petaFLOPS Training Chip
- Samsung M5 Core Details Show Up
- SC19: Aurora Supercomputer To Feature Intel First Exascale Xe GPGPU, 7nm Ponte Vecchio
- A Look at Cerebras Wafer-Scale Engine: Half Square Foot Silicon Chip
- Groq Tensor Streaming Processor Delivers 1 PetaOPS of Compute
- Arm Makes Headway In HPC, Cloud
- Intel Announces Keem Bay: 3rd Generation Movidius VPU
- A Look at Spring Crest: Intel Next-Generation DC Training Neural Processor
- Marvell Lays Out ARM Server Roadmap