NEC Readies 2nd Gen Vector Engine

NEC readies its 2nd-generation Vector Engine, the Type 20, offering higher memory bandwidth and a few more vector cores.

Back at Supercomputing 2019, NEC announced the rollout of an enhanced version of its Vector Engines, branded as Type 10E. Production of those VEs was expected to start in January.

Earlier this year, at a workshop held ahead of the COVID-19 pandemic and shutdowns, NEC started talking about its second-generation Vector Engine, the Type 20. NEC said the second generation looks very similar to the first-generation SX-Aurora. The company is adding two cores for a total of 10 cores in the highest-end models. Additionally, NEC is bumping the memory bandwidth and the core frequency for a modest performance improvement over the first generation (both VE Type 10 and Type 10E). NEC says it has also made some enhancements to the core architecture but was not ready to disclose those changes at the time. Presumably, the overall SoC structure has not changed.


Two initial models were announced, the VE Type 20A and the Type 20B. Like their predecessors, both processors have six HBM2 stacks with a total memory capacity of 48 GiB. In the Type 10E, NEC switched to higher-binned DRAM dies supporting data rates of up to 1.76 GT/s for a total peak bandwidth of 1.35 TB/s. The new Type 20 models push this further by around 13%, to a hair under 2 GT/s, for a higher peak bandwidth of 1.53 TB/s.
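The relationship between data rate and peak bandwidth quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 1024-bit interface per HBM2 stack is an assumption taken from the HBM2 standard rather than from NEC's disclosure, and the function names are made up for this example.

```python
# Sketch: relate HBM2 per-pin data rate to aggregate peak bandwidth.
# Assumption (not stated in the article): each HBM2 stack exposes a
# 1024-bit interface, as defined by the HBM2 standard.
HBM2_BUS_BITS = 1024
STACKS = 6  # six stacks per processor, per the article

# Bytes moved across all stacks per transfer: 6 * 1024 / 8 = 768
BYTES_PER_TRANSFER = STACKS * HBM2_BUS_BITS // 8


def peak_bandwidth_tbs(rate_gts: float) -> float:
    """Peak bandwidth in TB/s for a per-pin data rate given in GT/s."""
    # GT/s * bytes per transfer = GB/s; divide by 1000 for TB/s
    return rate_gts * BYTES_PER_TRANSFER / 1000.0


def data_rate_gts(bandwidth_tbs: float) -> float:
    """Per-pin data rate in GT/s implied by a peak bandwidth in TB/s."""
    return bandwidth_tbs * 1000.0 / BYTES_PER_TRANSFER


if __name__ == "__main__":
    print(f"1.76 GT/s -> {peak_bandwidth_tbs(1.76):.2f} TB/s")
    print(f"1.53 TB/s -> {data_rate_gts(1.53):.3f} GT/s")
    print(f"Bandwidth uplift: {(1.53 / 1.35 - 1) * 100:.1f}%")
```

Under that assumption, 1.76 GT/s works out to roughly 1.35 TB/s, and 1.53 TB/s implies a data rate just below 2 GT/s, which is consistent with the figures in the text.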

Type 10E vs Type 20

Type                      10AE     10BE     10CE     20A      20B
Cores                     8        8        8        10       8
Clock (GHz)               1.584    1.408    1.40     1.60     1.60
Perf/Core (gigaFLOPS)     304.1    270.3    268.8    307      307
Perf (teraFLOPS)          2.433    2.163    2.150    3.07     2.46
HBM2 stack height (Hi)    8        8        4        8        8
Memory (GiB)              48       48       24       48       48
Bandwidth (TB/s)          1.35     1.35     1.00     1.53     1.53
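The performance rows in the table follow directly from the core count and clock. A minimal sketch of that arithmetic is below. The figure of 192 FLOPs per cycle per core is not quoted in the article. It is inferred here from the table itself (for example, 268.8 gigaFLOPS at 1.40 GHz), so treat it as an assumption. The dictionary and function names are hypothetical.

```python
# Sketch: reproduce the Perf/Core and Perf rows from cores and clock.
# Assumption: 192 FLOPs per cycle per core, inferred from the table
# (268.8 gigaFLOPS / 1.40 GHz), not a figure stated in the article.
FLOPS_PER_CYCLE = 192

# model -> (cores, clock in GHz), copied from the table
MODELS = {
    "10AE": (8, 1.584),
    "10BE": (8, 1.408),
    "10CE": (8, 1.40),
    "20A": (10, 1.60),
    "20B": (8, 1.60),
}


def per_core_gflops(clock_ghz: float) -> float:
    """Peak gigaFLOPS of a single core at the given clock."""
    return clock_ghz * FLOPS_PER_CYCLE


def peak_tflops(cores: int, clock_ghz: float) -> float:
    """Peak teraFLOPS of the whole processor."""
    return cores * per_core_gflops(clock_ghz) / 1000.0


if __name__ == "__main__":
    for name, (cores, clock) in MODELS.items():
        print(f"{name}: {per_core_gflops(clock):.1f} gigaFLOPS/core, "
              f"{peak_tflops(cores, clock):.3f} teraFLOPS")
```

The computed values agree with the table to within rounding, which suggests the per-core throughput per cycle is unchanged between the Type 10E and Type 20 and that the peak gains come from clock and core count alone.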

The Type 20B will remain at eight cores, albeit with some new enhancements, while the Type 20A will offer the full ten-core configuration. Both processors will operate at 1.6 GHz, or 307 gigaFLOPS per core. The peak performance of the 20B is the same as that of the 10A, but real application performance should be higher due to the improved memory bandwidth. Whereas the 10AE is at 0.55 B/FLOP, the 20B improves that slightly to 0.63 B/FLOP. With two additional cores, the 20A will have a new peak performance of 3.07 teraFLOPS. However, because it has the same peak bandwidth as the 20B, the 20A's bytes-per-FLOP ratio from the HBM drops to 0.5 B/FLOP.
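The bytes-per-FLOP comparison is simply peak memory bandwidth divided by peak compute. The sketch below recomputes it from the rounded table values. The names are hypothetical, and because the inputs are rounded, the last digit can differ slightly from the quoted ratios (the 20B comes out near 0.62 here rather than 0.63).

```python
# Sketch: bytes-per-FLOP ratio = peak bandwidth / peak compute.
# Inputs are the rounded figures from the table, so results are approximate.

# model -> (bandwidth in TB/s, peak performance in teraFLOPS)
SPECS = {
    "10AE": (1.35, 2.433),
    "20A": (1.53, 3.07),
    "20B": (1.53, 2.46),
}


def bytes_per_flop(bandwidth_tbs: float, perf_tflops: float) -> float:
    """HBM bytes available per floating-point operation at peak rates."""
    # TB/s divided by teraFLOPS: the tera prefixes cancel, leaving B/FLOP
    return bandwidth_tbs / perf_tflops


if __name__ == "__main__":
    for name, (bw, perf) in SPECS.items():
        print(f"{name}: {bytes_per_flop(bw, perf):.3f} B/FLOP")
```

The ordering is what matters for bandwidth-bound codes: the 20B has the most bandwidth per unit of compute, the 10AE sits in the middle, and the 20A trades some of that balance for higher peak throughput.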

