Zhaoxin launches their highest-performance Chinese x86 chips

China has taken a major step forward in its quest for high-performance domestic Chinese microprocessors with Zhaoxin’s launch of their newest x86 processors.

In case you’ve never heard of Zhaoxin, they are a Chinese microprocessor designer that has been developing a domestic x86 CPU microarchitecture. Being partially owned by VIA Technologies most likely means they are covered by VIA’s x86 cross-license agreement, although VIA refused to confirm this when we asked. The 2010 FTC settlement required Intel to modify its agreements with AMD, Nvidia, and VIA to allow them to undergo mergers and joint ventures with other companies without the threat of being sued for patent infringement. Zhaoxin is majority owned (80.1%) by the Shanghai Municipal Government, and the push for domestic x86 chips comes as part of China’s national security initiative, which calls for reduced reliance on foreign products and greater control over the country’s own intellectual property (i.e., the hardware in this case).

5th Generation KaiXian

On December 28, at a conference dedicated to independently developed domestic Chinese CPUs, Zhaoxin officially launched their 5th generation KaiXian processors. Based on the WuDaoKou microarchitecture and fabricated domestically on HLMC’s 28nm process, these processors represent a significant step forward.

Zhaoxin announced two new series based on their latest architecture: the KaiXian 5000 (KX-5000) and the KaisHeng 20000 (KH-20000). Note that “KaiXian”/“KX” is exactly the same family as the previously named “Zhaoxin KaiXian”/“ZX”. The slight renaming was done to distinguish the prior VIA Technologies architecture from Zhaoxin’s mostly domestically developed architecture.

The KaiXian 5000 series is mostly aimed at PCs, workstations, and laptops. These SKUs are positioned against Intel’s Core i3 and Core i5 processors.

New SKUs

| Model | Cores/Threads | Frequency | L2 Cache |
|---|---|---|---|
| KX-5640 | 4/4 | 2.0 GHz | 4 MiB |
| KX-5540 | 4/4 | 1.8 GHz | 4 MiB |
| KX-U5680 | 8/8 | 2.0 GHz | 8 MiB |
| KX-U5580 | 8/8 | 1.8 GHz | 8 MiB |
| KX-U5580M | 8/8 | ≤ 1.8 GHz | 8 MiB |

The model numbering mimics that of AMD and Intel. The first digit ‘5’ refers to the 5th generation. The next three digits refer to the clock, number of cores, and market segment. Additionally, the U prefix refers to high-end 8-core models while the M suffix refers to low-power models. All models support virtualization compatible with Intel’s VT-x, Trusted Execution Technology (TXT), SSE 4.2, and AVX. These models support up to 64 GiB of DDR4 memory and integrate a GPU that drives up to three displays with DirectX 11.1 support and 4K resolution. It seems that the whole “VR Ready” thing has made it to China as well because Zhaoxin mentioned it at least a dozen times in their brochure.
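To make the numbering scheme concrete, here is a minimal Python sketch that decodes a KX-5000 model number. The `decode_kx` helper and the `KxModel` type are our own hypothetical names, and the clock-digit mapping (6 for 2.0 GHz, 5 for 1.8 GHz) is only inferred from the SKUs listed in this article; it is not an official Zhaoxin table.

```python
import re
from dataclasses import dataclass

# Assumption: clock digit -> nominal frequency, inferred from the listed SKUs only.
CLOCK_GHZ = {"6": 2.0, "5": 1.8}


@dataclass
class KxModel:
    generation: int   # first digit, e.g. 5 for 5th generation
    clock_ghz: float  # nominal clock implied by the second digit
    cores: int        # third digit
    segment: str      # fourth digit, market segment (meaning not documented here)
    high_end: bool    # True when the 'U' prefix is present
    low_power: bool   # True when the 'M' suffix is present


def decode_kx(model: str) -> KxModel:
    """Split a model string such as 'KX-U5680' into its documented fields."""
    m = re.fullmatch(r"KX-(U?)(\d)(\d)(\d)(\d)(M?)", model)
    if m is None:
        raise ValueError(f"not a KX-5000-style model number: {model!r}")
    prefix, gen, clock, cores, segment, suffix = m.groups()
    if clock not in CLOCK_GHZ:
        raise ValueError(f"unknown clock digit {clock!r} in {model!r}")
    return KxModel(
        generation=int(gen),
        clock_ghz=CLOCK_GHZ[clock],
        cores=int(cores),
        segment=segment,
        high_end=(prefix == "U"),
        low_power=(suffix == "M"),
    )


if __name__ == "__main__":
    for name in ("KX-5640", "KX-5540", "KX-U5680", "KX-U5580", "KX-U5580M"):
        print(name, decode_kx(name))
```

Note that for the low-power KX-U5580M the decoded clock is an upper bound, since Zhaoxin lists that part as running at up to 1.8 GHz.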

KX-U5580M in a 37.5 mm x 37.5 mm HFCBGA package.

KaisHeng 20000

Zhaoxin also announced the KaisHeng 20000 series, which is geared towards embedded networking, storage, and servers. This series should not be confused with the similarly named “ZX-2000” series, which are actually quad-core ARM Cortex-A17 CPUs.

New SKUs

| Model | Cores/Threads | Frequency | L2 Cache |
|---|---|---|---|
| KH-26800 | 8/8 | 2.0 GHz | 8 MiB |
| KH-25800 | 8/8 | 1.8 GHz | 8 MiB |

As with the KX-5000 parts, all models support virtualization compatible with Intel’s VT-x, Trusted Execution Technology (TXT), SSE 4.2, and AVX. The KaisHeng 20000 parts support up to 128 GiB of memory and add support for ECC and RDIMMs. These SKUs do not have a GPU enabled.

KH-25800 in a 37.5 mm x 37.5 mm HFCBGA package.

Chinese, but American lineage

The newly announced processors are based on the WuDaoKou microarchitecture. Zhaoxin boasts that this is the first truly domestic x86 microarchitecture and the only one fully compatible with all existing software, including Windows 10. The truth is a bit more complicated. WuDaoKou is the successor to ZhangJiang. The interesting question is what ZhangJiang itself succeeds. ZhangJiang is an out-of-order core manufactured on TSMC’s 28 nm process. We believe ZhangJiang is in fact the successor to VIA’s Isaiah II (as opposed to Isaiah). Isaiah was VIA’s first out-of-order design, which found its way into the VIA Nano.

VIA Nano

If you are like most people, the last time you heard about VIA was probably during Centaur’s heyday. Here is a rough timeline to help you find your way.

We have managed to confirm with Zhaoxin that the core is indeed that of Centaur Technology. This Chinese x86 chip has a uniquely Texan lineage! Unfortunately, Zhaoxin isn’t exactly aware of VIA’s internal codenames, which made it impossible to confirm whether it was the original Isaiah or the Isaiah II design. We believe that ZhangJiang is almost identical to VIA’s Isaiah II design. In fact, we believe that is part of the reason the ZX-C is still fabricated on TSMC’s 28nm process: it was largely VIA’s original Isaiah II floorplan and design. It’s worth noting that Zhaoxin has made a few minor improvements to PadLock (a security engine found on many VIA chips), such as adding support for two Chinese cryptographic algorithms, the SM3 hash function and the SM4 block cipher. But beyond that, the architecture is identical.

Over the last couple of years, Zhaoxin has invested most of its resources into WuDaoKou, which is substantially different from all prior designs. They no longer use TSMC’s 28nm process but have instead opted for Shanghai Huali Microelectronics Corporation’s (HLMC) 28 nm process, meaning this chip is not only designed in China, it’s also made there. The effort to move fabrication to mainland China is driven by the national security initiative. Unfortunately, the lack of a leading-edge domestic foundry makes this rather difficult. It’s why we are going to see them switch back and forth between TSMC and mainland China solutions as they become available. It’s worth noting that the Shanghai Municipal Government is the majority shareholder of both companies (HLMC and Zhaoxin).

WuDaoKou is the first design that is similar to contemporary x86 microprocessors. WuDaoKou finally got rid of the front-side bus (FSB). Previously, the chipset integrated the southbridge and northbridge. In fact, the microprocessor die itself was simply the cores. With WuDaoKou, they moved to a modern SoC design. They also introduced a new uncore which now houses the memory controller as well as all the I/O PHYs and memory and cache arbitration.

The new chip is a complete SoC, incorporating N-core clusters, an integrated graphics processor, and the uncore on a single die. Each cluster (which Zhaoxin also calls a module) is made of four cores and a shared L2. The clusters meet at the uncore and can communicate directly with each other via a new coherent fabric. While the design can scale up to a higher core count, current chips only have two clusters for a total of eight cores.

WuDaoKou Uncore

The fabric is a point-to-point high-speed interconnect crossbar that offers substantially higher bandwidth than the prior solution (the front-side bus) was able to deliver. The fabric also reduces latency and provides facilities for flow control and cache coherency. Since this chip also incorporates a GPU, it is also connected via the fabric. The new memory controller found in the uncore has been improved. It now supports up to dual-channel DDR4 with data rates of up to 2400 MT/s (although current SKUs only seem to support up to 2133 MT/s). Zhaoxin said this is the first domestic CPU to have a dual-channel DDR4 memory controller.
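For a sense of scale, the theoretical peak memory bandwidth implied by those data rates can be estimated with a few lines of Python. This is only a back-of-the-envelope sketch: the `peak_bandwidth_gb_s` helper is our own, and it assumes the standard 64-bit data width per DDR4 channel, which Zhaoxin did not explicitly state. Sustained bandwidth on real hardware will be lower.

```python
def peak_bandwidth_gb_s(mt_per_s: float, channels: int = 2, bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second).

    Assumption: each channel moves bus_width_bits of data per transfer
    (64 bits is the standard non-ECC DDR4 channel width).
    """
    bytes_per_transfer = bus_width_bits / 8
    return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9


if __name__ == "__main__":
    for rate in (2133, 2400):
        print(f"dual-channel DDR4-{rate}: {peak_bandwidth_gb_s(rate):.1f} GB/s peak")
```

Under these assumptions the controller’s 2400 MT/s ceiling works out to 38.4 GB/s, while the 2133 MT/s that current SKUs appear to support works out to roughly 34.1 GB/s.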

Significant improvements were made to the core. Although Zhaoxin didn’t go into much detail, they did note that the execution blocks have been rebalanced, a number of pipeline stages were eliminated, and the branch prediction unit was entirely reworked. Overall, the new processors are said to be roughly 25% faster in single-thread performance and 40% faster in multi-core workloads.

The higher integration does come at a cost. The new KX-5000 parts pack 2.1 billion transistors. That’s around seven times as many as the ~300M transistors the ZX-C had. Additionally, the die area itself has ballooned to 187 mm². This will have a fairly significant impact on both yield and cost compared with the older parts.
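The arithmetic behind those figures is simple enough to check directly. The short Python sketch below uses only the numbers quoted above (the ZX-C count is approximate) and also derives the average transistor density, a figure Zhaoxin did not quote; the variable names are ours.

```python
# Figures as reported in this article; the ZX-C count is approximate.
KX5000_TRANSISTORS = 2.1e9
ZXC_TRANSISTORS = 300e6
KX5000_DIE_AREA_MM2 = 187.0

# How many times more transistors the new part has.
transistor_ratio = KX5000_TRANSISTORS / ZXC_TRANSISTORS

# Average density in millions of transistors per square millimeter.
density_mtr_per_mm2 = KX5000_TRANSISTORS / KX5000_DIE_AREA_MM2 / 1e6

if __name__ == "__main__":
    print(f"transistor ratio vs. ZX-C: {transistor_ratio:.1f}x")
    print(f"average density: {density_mtr_per_mm2:.1f} MTr/mm^2")
```

This gives a ratio of about 7x and an average density of roughly 11.2 million transistors per mm² across the whole die, including the GPU and uncore.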

Recent Vulnerability

We’ve asked Zhaoxin if they are affected by the recent security vulnerabilities and they confirmed that the KX-5000 series is unaffected by Meltdown. They also noted that their chips are indeed affected by Spectre, adding that it requires a much more complex sequence of operations, making an attack incredibly difficult and impractical. In fact, Zhaoxin is attempting to leverage Meltdown to push their own domestically-designed chips as a safer alternative.

Performance

We normally don’t bother mentioning performance scores reported by manufacturers because they tend to cherry-pick their scores. However, given that these processors will most likely never make it to review sites such as AnandTech for a proper review, we figured we’d mention a few claims.

Zhaoxin reported the following SPEC CPU 2006 scores:

SPEC CPU 2006 Scores

| Test | KX-5640 (4C @ 2GHz) | KX-U5680 (8C @ 2GHz) | Atom C2750 (8C @ 2.4GHz/2.6GHz) |
|---|---|---|---|
| SPECint | 19.1 | 19.9 | 17.5 |
| SPECint_rate | 64.3 | 115 | 101 |
| SPECfp | 22.9 | 25.7 | 23.0 |
| SPECfp_rate | 53 | 81.3 | 76.8 |

We’ve added an Atom C2750 microserver chip to the table since it is in the ballpark of the KX-5000’s performance (though the KX-5000 parts might be closer to Intel’s Goldmont). Like WuDaoKou, it also lacks multi-threading support. That part is an Avoton chip based on Intel’s 22nm Silvermont cores. Note that we don’t actually know whether Zhaoxin’s scores are base results or peak results obtained with additional optimization flags (i.e., SPECint_base2006 vs SPECint2006 and SPECfp_base2006 vs SPECfp2006), but we’ve only used base scores for the Atom listing.
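One way to read these numbers is to look at how well throughput scales with core count. The Python sketch below divides each rate score by the matching speed score and by the number of cores. The helper names are ours, and treating rate divided by speed as “effective cores” is only a rough heuristic: the two metrics are run differently, and we do not know which compiler flags Zhaoxin used.

```python
# Scores exactly as listed in the table above.
SCORES = {
    "KX-5640":    {"cores": 4, "int": 19.1, "int_rate": 64.3,  "fp": 22.9, "fp_rate": 53.0},
    "KX-U5680":   {"cores": 8, "int": 19.9, "int_rate": 115.0, "fp": 25.7, "fp_rate": 81.3},
    "Atom C2750": {"cores": 8, "int": 17.5, "int_rate": 101.0, "fp": 23.0, "fp_rate": 76.8},
}


def scaling_efficiency(chip: str, suite: str) -> float:
    """Rough heuristic: rate score / (speed score * cores). 1.0 would be ideal scaling."""
    s = SCORES[chip]
    return s[f"{suite}_rate"] / (s[suite] * s["cores"])


def rate_gain(chip_a: str, chip_b: str, suite: str) -> float:
    """Throughput ratio of chip_b over chip_a for the given suite ('int' or 'fp')."""
    return SCORES[chip_b][f"{suite}_rate"] / SCORES[chip_a][f"{suite}_rate"]


if __name__ == "__main__":
    for chip in SCORES:
        print(f"{chip}: int {scaling_efficiency(chip, 'int'):.2f}, "
              f"fp {scaling_efficiency(chip, 'fp'):.2f}")
    print(f"4C -> 8C int rate gain: {rate_gain('KX-5640', 'KX-U5680', 'int'):.2f}x")
    print(f"4C -> 8C fp rate gain:  {rate_gain('KX-5640', 'KX-U5680', 'fp'):.2f}x")
```

By this measure, doubling the core count from the KX-5640 to the KX-U5680 yields well under twice the throughput, particularly in floating point, which would be consistent with shared resources such as memory bandwidth becoming a limit, though the reported scores alone cannot confirm the cause.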

Beat AMD

Zhaoxin is already working on their next-generation KX-6000 processors. Those processors are based on the Lujiazui microarchitecture, which is planned for TSMC’s 16nm process (although we were told they might eventually switch to SMIC’s 14nm when it is ready). A primary area of focus for increasing performance is raising the clock frequency. Lujiazui is expected to reach at least 3 GHz. Additionally, the memory controller will support higher data rates (up to 3200 MT/s).

Zhaoxin has stated they intend to reach AMD’s level of performance with the KX-6000’s successor, the KX-7000. That is, they want the KX-7000 to match the performance of Zen 2. While the process for the KX-7000 is currently unknown, they would most likely have to move to TSMC’s 10nm or 7nm process. They are planning on supporting DDR5 and PCIe 4 as well as even higher clock frequencies. Zhaoxin stated that they plan on making major enhancements to the pipeline in order to substantially improve IPC, although they did not go into any details. They expect around a 1.5x improvement in single-thread performance over the KX-5000.

All in all, Zhaoxin is still playing catch-up, but they have made a major leap forward with WuDaoKou. They will have to make a series of similar strides with future architectures in order to substantially close the gap. Unfortunately, even with a 1.5x single-thread improvement, they would still be a fair bit behind in IPC given that the KX-5000 series appears to be slightly behind Intel’s Goldmont level of performance. Whether they will be able to catch up to AMD or Intel remains to be seen; nonetheless, Zhaoxin is determined to displace those two companies in China.
