Alibaba Launches DC Inference Accelerators

The semiconductor industry is evolving rapidly. Nowhere is this more apparent than in the data center, where companies such as Google and Amazon have dedicated silicon teams designing custom ASICs.

Last year, Alibaba also entered the chip business, introducing a number of new chips through Pingtouge Semi, or T-Head, its semiconductor unit, which was founded in 2017. While the move was motivated in part by Beijing’s effort to reduce China’s reliance on foreign semiconductor imports and invest heavily in its own industry, the shift toward in-house silicon is a worldwide phenomenon.

 

Xuantie

At WAIC 2019, T-Head sprang into the spotlight when it launched the Xuantie (Black Steel) series of high-performance RISC-V processors, which it intends to open-source.

Wujian

T-Head has since moved into the area of artificial intelligence. At the same conference, Alibaba introduced the Wujian (No Sword) Platform, intended to serve as an all-in-one turnkey SoC solution for IoT AI. Wujian consists of modules comprising an SoC architecture with processing cores, high-speed interfaces, and other specialized blocks. The platform comes with a full firmware and software stack including drivers, an operating system, development tools, and example software. The platform is intended to aid with semi-customized edge products. Alibaba says the platform can reduce design cost by over fifty percent and cut the design cycle by half.

Enter Hanguang 800

On Wednesday, at Alibaba’s annual Apsara Computing Conference in Hangzhou, the company officially launched its data center inference accelerators. Amazon already followed in Google’s footsteps when it announced AWS Inferentia for its own cloud infrastructure, so it’s no surprise to see Alibaba doing the same. These companies have the data center volume to support this kind of development, and they are leveraging their size to develop and provide differentiating services. The first chip launched is the Hanguang 800, which is also the company’s physically largest chip to date. “This is the first large chip by T-Head and the largest chip T-Head has produced,” said Alibaba CTO Jeff Zhang. Fabricated on a 12-nanometer process, the Hanguang 800 integrates a little north of 17 billion transistors.

Zhang stated that on the standard ResNet-50 benchmark, the chip is capable of 78.5K images per second at its peak. “When comparing our AI performance to other mainstream chips, we are able to process significantly more images,” said Zhang, adding that the chip does so with much higher efficiency. For the same ResNet-50 benchmark, Alibaba is quoting 500 images per second per watt.

 
 

Alibaba showed a graph of its Hanguang 800 compared to other mainstream chips but did not specify the full conditions of the test. Zhang stated that the Hanguang 800 is roughly 15 times more powerful than the Nvidia T4 GPU and 46 times more powerful than the P4. Too little actual data was disclosed to draw an informed conclusion, so these claims must be taken with a large grain of salt. What is clear, though, is that Alibaba is serious about its semiconductor endeavor.

Alibaba ResNet-50 v1 Comparison (IPS = images per second)

Company      T-Head        Habana      Cambricon   Nvidia     Nvidia
Product      Hanguang 800  Goya        MLU270      T4         P4
Performance  78,563 IPS    15,433 IPS  10,000 IPS  5,402 IPS  1,721 IPS
Efficiency   500 IPS/W     150 IPS/W   143 IPS/W   78 IPS/W   52 IPS/W
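
For readers who want to check the quoted multipliers against the table, the arithmetic can be sketched in a few lines of Python. The throughput and efficiency figures are copied from the comparison above; the function names are ours, and the "implied power" column is only an assumption-laden estimate: it divides throughput by efficiency, which is meaningful only if both numbers were measured at the same operating point, something Alibaba did not confirm.

```python
# Figures copied from the ResNet-50 v1 comparison table above:
# (throughput in images/s, efficiency in images/s per watt).
CHIPS = {
    "Hanguang 800": (78_563, 500),
    "Goya": (15_433, 150),
    "MLU270": (10_000, 143),
    "T4": (5_402, 78),
    "P4": (1_721, 52),
}


def speedup(chip, baseline):
    """Ratio of quoted throughput between two chips in the table."""
    return CHIPS[chip][0] / CHIPS[baseline][0]


def implied_power_watts(chip):
    """Throughput divided by efficiency.

    ASSUMPTION: both figures come from the same run; the source does
    not say so, so treat the result as a rough estimate only.
    """
    ips, ips_per_watt = CHIPS[chip]
    return ips / ips_per_watt


for name in CHIPS:
    ratio = speedup("Hanguang 800", name)
    watts = implied_power_watts(name)
    print(f"{name:<13} {ratio:6.1f}x  ~{watts:4.0f} W")
```

Rounded to whole numbers, the T4 and P4 ratios match the 15x and 46x figures Zhang quoted, so those claims appear to be throughput ratios taken straight from this table rather than independent measurements.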

“I believe this is the first large chip developed by an Internet company. This is just the first step of our journey. Alibaba has the confidence and the capability to do things that chip designers cannot do. I think chip design houses can do hardware but I think we can do better with regard to software which has traditionally been our strong suit,” Zhang said. He added, “With the speed of an internet company, we have succeeded in completing this chip project from design to verification in just under a year.”

One of the examples Zhang talked about is the analysis of street-camera footage from all the roads in one Chinese city, where the AI application detects what is happening across the entire road network. Alibaba claimed that, on this workload, the Hanguang 800 beats an unnamed GPU while drawing less than half the power and delivering lower latency.

In terms of total computing power, Zhang claims that the NPU is equivalent to 10 “common” GPUs. Again, without real data it’s hard to evaluate the actual performance of the NPU.

Like Google’s TPU and Amazon’s AWS Inferentia, the Hanguang 800 will be used exclusively in Alibaba’s own Aliyun cloud infrastructure, where developers will be able to rent access to it. Alibaba stated it has no plans to sell the chips.

 


