China has taken a major step forward in its quest for high-performance domestic Chinese microprocessors with Zhaoxin’s launch of their newest x86 processors.
In case you’ve never heard of Zhaoxin, they are a Chinese microprocessor designer that has been developing a domestic x86 CPU microarchitecture. Being partially owned by VIA Technologies most likely means they are covered by VIA’s x86 cross-license agreement, although VIA refused to confirm this when we asked. The 2010 FTC settlement required Intel to modify its agreements with AMD, Nvidia, and VIA to allow them to undergo mergers and joint ventures with other companies without the threat of being sued for patent infringement. Zhaoxin is majority owned (80.1%) by the Shanghai Municipal Government, and the push for domestic x86 chips comes as part of China’s national security initiative, which calls for reduced reliance on foreign products and greater control over the country’s own intellectual property (i.e., the hardware in this case).
5th Generation KaiXian
On December 28, at a conference dedicated to independently developed domestic Chinese CPUs, Zhaoxin officially launched their 5th generation KaiXian processors. Fabricated domestically on HLMC’s 28nm process and based on the WuDaoKou microarchitecture, these processors represent a significant step forward.
Zhaoxin announced two new series based on their latest architecture: KaiXian 5000 (KX-5000) and the KaisHeng 20000 (KH-20000). Note that “KaiXian”/“KX” is exactly the same family as the previously named “Zhaoxin KaiXian”/“ZX”. The slight renaming was done to distinguish the prior VIA Technologies architecture from Zhaoxin’s mostly domestically developed architecture.
The KaiXian 5000 series is aimed mostly at PCs, workstations, and laptops. These SKUs are positioned against Intel’s Core i3 and Core i5 processors.
|Model||Cores/Threads||Frequency||Cache|
|KX-5640||4/4||2.0 GHz||4 MiB|
|KX-5540||4/4||1.8 GHz||4 MiB|
|KX-U5680||8/8||2.0 GHz||8 MiB|
|KX-U5580||8/8||1.8 GHz||8 MiB|
|KX-U5580M||8/8||≤ 1.8 GHz||8 MiB|
The model numbering mimics that of AMD and Intel. The first digit, ‘5’, refers to the 5th generation. The next three digits refer to the clock, number of cores, and market segment. Additionally, the U prefix denotes high-end 8-core models while the M suffix denotes low-power models. All models support virtualization compatible with Intel’s VT-x, as well as Trusted Execution Technology (TXT), SSE 4.2, and AVX. These models support up to 64 GiB of DDR4 memory and integrate a GPU that drives up to three displays with DirectX 11.1 support and 4K resolution. It seems that the whole “VR Ready” thing has made it to China as well because Zhaoxin mentioned it at least a dozen times in their brochure.
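To make the naming scheme concrete, here is a minimal Python sketch that splits a KX-5000 model name into the fields described above. The function and field names (`parse_kx_model`, `KXModel`, `clock_code`, `segment_code`) are our own and purely illustrative. The clock and segment digits are treated as opaque codes because Zhaoxin has not published a mapping; among the five SKUs listed, the core-count digit happens to equal the actual number of cores.

```python
import re
from typing import NamedTuple


class KXModel(NamedTuple):
    """Fields of a KX-5000 model name (hypothetical field names)."""
    high_end: bool     # True when the "U" prefix is present
    generation: int    # first digit, e.g. 5 for the 5th generation
    clock_code: int    # second digit: opaque clock tier, not a frequency
    cores: int         # third digit: number of cores
    segment_code: int  # fourth digit: opaque market-segment code
    low_power: bool    # True when the "M" suffix is present


# Only the four-digit KX scheme is covered; KH-20000 parts use five digits.
_KX_PATTERN = re.compile(r"^KX-(U?)(\d)(\d)(\d)(\d)(M?)$")


def parse_kx_model(name: str) -> KXModel:
    """Split a name such as 'KX-U5580M' into its documented fields."""
    match = _KX_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized KX-5000 model name: {name!r}")
    prefix, gen, clock, cores, segment, suffix = match.groups()
    return KXModel(
        high_end=(prefix == "U"),
        generation=int(gen),
        clock_code=int(clock),
        cores=int(cores),
        segment_code=int(segment),
        low_power=(suffix == "M"),
    )
```

For instance, `parse_kx_model("KX-U5580M")` reports a high-end, low-power, 5th generation part with 8 cores, while a server name such as `KH-26800` is rejected because it does not follow the four-digit KX pattern.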
Zhaoxin also announced the KaisHeng 20000 series, which is geared towards embedded networking, storage, and servers. This series should not be confused with the similarly named “ZX-2000” series, which consists of quad-core ARM Cortex-A17 CPUs.
|Model||Cores/Threads||Frequency||Cache|
|KH-26800||8/8||2.0 GHz||8 MiB|
|KH-25800||8/8||1.8 GHz||8 MiB|
As with the KX-5000 parts, all models support virtualization compatible with Intel’s VT-x, as well as Trusted Execution Technology (TXT), SSE 4.2, and AVX. The KaisHeng 20000 parts support up to 128 GiB of memory and add support for ECC and RDIMMs. Additionally, these SKUs do not have a GPU enabled.
Chinese, but American lineage
The newly announced processors are based on the WuDaoKou microarchitecture. Zhaoxin boasts that this is the first truly domestic x86 microarchitecture and the only one fully compatible with all existing software, including Windows 10. The truth is a bit more complicated. WuDaoKou is the successor to ZhangJiang. The interesting question is what ZhangJiang succeeds. ZhangJiang is an out-of-order core manufactured on TSMC’s 28 nm process. We believe ZhangJiang is in fact the successor to VIA’s Isaiah II (as opposed to Isaiah). Isaiah was VIA’s first out-of-order design, which found its way into the VIA Nano.
If you are like most people, the last time you heard about VIA was probably during Centaur’s heyday. Here is a rough timeline to help you find your way.
We have managed to confirm with Zhaoxin that the core is indeed that of Centaur Technology. This Chinese x86 chip has a uniquely Texan lineage! Unfortunately, Zhaoxin isn’t exactly aware of VIA’s internal codenames, which made it impossible to confirm whether it was the original Isaiah or the Isaiah II design. We believe that ZhangJiang is almost identical to VIA’s Isaiah II design. In fact, we believe it’s part of the reason the ZX-C is still fabricated on TSMC’s 28nm process: it was largely VIA’s original Isaiah II floorplan and design. It’s worth noting that Zhaoxin has made a few minor improvements to PadLock (a security engine found on many VIA chips), such as adding support for the Chinese cryptographic algorithms SM3 (a hash function) and SM4 (a block cipher). But beyond that, the architecture is identical.
Over the last couple of years, Zhaoxin has invested most of its resources into WuDaoKou, which is substantially different from all prior designs. They no longer use TSMC’s 28nm process but have instead opted for Shanghai Huali Microelectronics Corporation’s (HLMC) 28 nm process, meaning this chip is not only designed in China, it’s also made there. The effort to move fabrication to mainland China is driven by the national security initiative. Unfortunately, the lack of a leading-edge foundry makes this rather difficult. It’s why we are going to see them switch back and forth between TSMC and mainland China solutions as they become available. It’s worth noting that the Shanghai Municipal Government is the majority shareholder of both companies (HLMC and Zhaoxin).
WuDaoKou is the first design that is similar to contemporary x86 microprocessors. WuDaoKou finally got rid of the front-side bus (FSB). Previously, the chipset integrated the southbridge and northbridge. In fact, the microprocessor die itself was simply the cores. With WuDaoKou, they moved to a modern SoC design. They also introduced a new uncore which now houses the memory controller as well as all the I/O PHYs and memory and cache arbitration.
The new chip is a complete SoC, incorporating N core clusters, an integrated graphics processor, and the uncore on a single die. Each cluster (which Zhaoxin also calls a module) is made of four cores and a shared L2. The clusters meet at the uncore and can communicate directly with each other via a new coherent fabric. While the design can scale up to a higher core count, current chips only have two clusters for a total of eight cores.
The fabric is a point-to-point high-speed interconnect crossbar that offers substantially higher bandwidth than the prior solution (front-side bus) was able to deliver. Additionally, the fabric also reduces the latency and provides facilities for control flow and cache coherency. Since this chip also incorporates a GPU, it is also connected via the fabric. The new memory controller found in the uncore has been improved. It now supports up to dual-channel DDR4 with data rates of up to 2400 MT/s (although current SKUs only seem to support up to 2133 MT/s). Zhaoxin said this is the first domestic CPU to have a dual-channel DDR4 memory controller.
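As a back-of-the-envelope check on what those data rates mean, the sketch below computes theoretical peak memory bandwidth. It assumes the standard 64-bit (8-byte) data bus per DDR4 channel; the function name `peak_bandwidth_gb_s` is our own, and the result is an upper bound rather than a figure Zhaoxin has quoted.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: int,
                        channels: int = 2,
                        bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumes each channel moves `bus_bytes` bytes per transfer, which is
    8 bytes for a standard 64-bit DDR4 channel without ECC.
    """
    # MT/s * bytes per transfer * channels gives MB/s; divide by 1000 for GB/s.
    return mega_transfers_per_s * bus_bytes * channels / 1000


if __name__ == "__main__":
    for rate in (2133, 2400):
        print(f"DDR4-{rate}, dual channel: {peak_bandwidth_gb_s(rate):.1f} GB/s")
```

Under these assumptions, dual-channel DDR4 tops out at 38.4 GB/s at 2400 MT/s and about 34.1 GB/s at the 2133 MT/s that current SKUs appear to support.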
Significant improvements were made to the core. Although Zhaoxin didn’t go into too many details, they did note that the execution blocks have been rebalanced, a number of pipeline stages were eliminated, and the branch prediction unit was entirely reworked. Overall, the new processors are said to be roughly 25% faster in single-thread performance and 40% faster in multi-core workloads.
The higher integration does come at a cost. The new KX-5000 parts pack 2.1 billion transistors. That’s around seven times as many as the ~300M transistors the ZX-C had. Additionally, the die area itself has ballooned to 187 mm². This will have a fairly significant impact on both yield and cost compared with the older parts.
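The arithmetic behind those figures, plus a rough feel for why die area matters, can be sketched as follows. The transistor counts and die area come from the numbers above; the Poisson yield model and every defect density passed to it are our own illustrative assumptions, not HLMC or Zhaoxin data. The ZX-C die area is not given here, so no direct yield comparison is attempted.

```python
import math

KX5000_TRANSISTORS = 2.1e9   # from Zhaoxin's figures
ZXC_TRANSISTORS = 300e6      # approximate ZX-C count
KX5000_DIE_MM2 = 187.0       # KX-5000 die area in mm^2


def transistor_ratio() -> float:
    """How many times more transistors the KX-5000 has than the ZX-C."""
    return KX5000_TRANSISTORS / ZXC_TRANSISTORS


def density_mtr_per_mm2() -> float:
    """Average KX-5000 density in millions of transistors per mm^2."""
    return KX5000_TRANSISTORS / 1e6 / KX5000_DIE_MM2


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Simple Poisson yield model: Y = exp(-A * D0).

    `defects_per_cm2` is a hypothetical defect density chosen by the
    caller; it is not a published figure for any real process.
    """
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    print(f"transistor ratio: {transistor_ratio():.1f}x")
    print(f"density: {density_mtr_per_mm2():.1f} MTr/mm^2")
    # Hypothetical defect densities, purely to show the trend.
    for d0 in (0.1, 0.3, 0.5):
        print(f"D0={d0}/cm^2 -> yield {poisson_yield(KX5000_DIE_MM2, d0):.0%}")
```

The point of the model is only the trend: for any fixed defect density, yield falls off exponentially with die area, so a larger die pays for its added integration twice, with fewer candidates per wafer and a smaller fraction of them working.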
We’ve asked Zhaoxin if they are affected by the recent security vulnerabilities and they confirmed that the KX-5000 series is unaffected by Meltdown. They also noted that their chips are indeed affected by Spectre, adding that it requires a much more complex sequence of operations, making an attack incredibly difficult and impractical. In fact, Zhaoxin is attempting to leverage Meltdown to push their own domestically-designed chips as a safer alternative.
We normally don’t bother mentioning performance scores reported by manufacturers because they tend to cherry-pick their scores. However, given that these processors will most likely never make it to review sites such as AnandTech for a proper review, we figured we’d mention a few claims.
Zhaoxin reported the following SPEC CPU 2006 scores:
|SPEC CPU 2006 Scores|
|Test||KX-5640 (4C @ 2GHz)||KX-U5680 (8C @ 2GHz)||Atom C2750 (8C @ 2.4GHz/2.6GHz)|
We’ve added an Atom C2750 microserver chip to the table since it is in the ballpark of the KX-5000’s performance (though it seems they might be closer to Intel’s Goldmont). Like WuDaoKou, it doesn’t support multi-threading. That part is an Avoton chip based on Intel’s 22nm Silvermont. Note that we don’t actually know if Zhaoxin’s scores are base metrics (e.g., SPECint_base2006 and SPECfp_base2006) or peak metrics built with additional optimization flags (e.g., SPECint2006 and SPECfp2006), but we’ve only used base scores for the Atom listing.
Pass AMD In Performance
Zhaoxin is already working on their next-generation KX-6000 processors. Those processors are based on the Lujiazui microarchitecture, which is planned for TSMC’s 16nm process (although we were told they might eventually switch to SMIC’s 14nm when it is ready). To increase performance, a primary area of focus is raising the clock frequency. Lujiazui is expected to reach at least 3 GHz. Additionally, the memory controller will support higher data rates (up to 3200 MT/s).
Zhaoxin has stated they intend to reach AMD’s level of performance with KX-6000’s successor, KX-7000. That is, they want the KX-7000 to match the performance of Zen 2. While the process for KX-7000 is currently unknown, they would most likely have to move to TSMC’s 10nm or 7nm process. They are planning to support DDR5 and PCIe 4 as well as even higher clock frequencies. Zhaoxin stated that they plan to make major enhancements to the pipeline in order to substantially improve IPC, although they did not go into any details. They expect around a 1.5x improvement in single-thread performance over the KX-5000.
All in all, Zhaoxin is still playing catch-up, but they have made a major leap forward with WuDaoKou. They will have to make a series of similar strides with future architectures in order to substantially close the gap. Unfortunately, even with a 1.5x improvement in single-thread performance, they would be a fair bit behind in IPC given that the KX-5000 series appears to be slightly behind Intel’s Goldmont level of performance. Whether they will be able to catch up to AMD or Intel remains to be seen; nonetheless, Zhaoxin is determined to displace those two companies in China.