IBM Releases Power ISA v3.1; To Present POWER10 At Hot Chips 32
It looks like IBM is finally readying its next-generation Power systems, which are planned for the early 2021 timeframe. Last year the company finished unveiling the last of its POWER9 microprocessor variants, Axone. In total, three flavors of POWER9 were designed over the years: Nimbus, Cumulus, and Axone. This year, IBM’s Bill Starke will officially present the POWER10 processor at Hot Chips 32 on August 17. POWER10 is expected to be a new SoC design on a new process node, featuring a higher core count, PCIe Gen 5, and higher memory bandwidth.
Recently IBM published the new Power ISA Version 3.1. The new version supersedes the prior Version 3.0(C) which is currently implemented in POWER9 microprocessors. Next-generation POWER10 microprocessors will be 3.1-compliant. Some of the main changes in Version 3.1 are highlighted below.
Instruction Prefixes
A new instruction prefix format was introduced. Prefixed instructions are eight bytes long, comprising a prefix word followed by a suffix word. Suffix words are the same as normal word instructions. The prefix was added to support PC-relative addressing and to extend immediate and displacement fields. For example, an 18-bit field in the prefix word may be concatenated with the 16-bit displacement field in the instruction word to form a 34-bit displacement (or a 16-bit immediate field in the prefix may be concatenated with a 16-bit immediate in the instruction word to form a 32-bit immediate).
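To make the field concatenation concrete, here is a minimal Python sketch of joining a prefix-word field (the high bits) with a suffix-word field (the low bits) and then forming a PC-relative address. The function names and the generic width parameters are illustrative and do not come from the ISA document, and the sketch assumes the joined value is sign-extended and that the PC-relative base is the address of the prefixed instruction itself.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def join_immediate(prefix_field: int, prefix_bits: int,
                   suffix_field: int, suffix_bits: int,
                   signed: bool = True) -> int:
    """Concatenate a prefix-word field (high bits) with a suffix-word field (low bits)."""
    if not 0 <= prefix_field < (1 << prefix_bits):
        raise ValueError("prefix field does not fit in prefix_bits")
    if not 0 <= suffix_field < (1 << suffix_bits):
        raise ValueError("suffix field does not fit in suffix_bits")
    total_bits = prefix_bits + suffix_bits
    value = (prefix_field << suffix_bits) | suffix_field
    # Assumption: the joined field is treated as a two's-complement value.
    if signed and value & (1 << (total_bits - 1)):
        value -= 1 << total_bits
    return value


def pc_relative_target(instruction_address: int, displacement: int) -> int:
    """Assumed model: effective address = address of the instruction + displacement."""
    return (instruction_address + displacement) & MASK64


# Two 16-bit halves forming one 32-bit immediate.
imm32 = join_immediate(0x0001, 16, 0x0002, 16)
```

Keeping the widths as parameters lets the same helper model either of the pairings mentioned above.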
Bfloat16 & other reduced-precision support
New VSX Matrix-Multiply Assist (MMA) instructions were introduced. There are now eight new 512-bit accumulators (ACC), each containing four 128-bit rows. These accumulators are used for the new outer product operations. Each of an accumulator's four rows is associated with one of four VSRs. The accumulators are treated as a separate storage space, and dedicated instructions transfer data between an accumulator and its associated VSRs.
Along with the new MMA instructions, the vector-scalar extension has been extended to support the brain floating-point format (bfloat16) for the acceleration of matrix multiplication. New VSX vector instructions for converting between bfloat16 and single-precision were also added.
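For readers unfamiliar with the format, bfloat16 is simply the upper 16 bits of an IEEE single-precision value: the same sign bit and 8-bit exponent, but only 7 mantissa bits. The Python sketch below shows one common way to convert in both directions. The helper names are made up for illustration, and round-to-nearest-even is an assumption of this sketch; it is not a statement about how the new conversion instructions round.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x, rounding to nearest even (assumed)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep NaNs as NaNs: force a mantissa bit so truncation cannot yield infinity.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Add just under half a unit in the last kept place, plus one more if the
    # kept value is odd, so ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float32(b: int) -> float:
    """Widen a bfloat16 pattern to single precision by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


pattern = float32_to_bfloat16_bits(3.140625)
restored = bfloat16_bits_to_float32(pattern)
```

Widening is exact, while narrowing loses the low 16 mantissa bits, which is why the format is attractive for matrix math where range matters more than precision.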
In addition to the new bfloat16 outer product operations, the new ISA adds outer product operations for all other data types. All in all, there’s now support for 4-bit, 8-bit, and 16-bit integer and 16-bit, 32-bit, and 64-bit floating-point outer product operations.
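The outer product (rank-1 update) at the heart of these instructions can be sketched in a few lines of Python. This is a conceptual model only: it treats an accumulator as a 4×4 grid of 32-bit elements (four 128-bit rows), ignores element packing and rounding, and uses function names that are invented for illustration.

```python
def new_accumulator(rows: int = 4, cols: int = 4) -> list:
    """A zeroed accumulator modeled as a rows x cols grid."""
    return [[0.0] * cols for _ in range(rows)]


def outer_product_accumulate(acc: list, x: list, y: list) -> list:
    """acc[i][j] += x[i] * y[j], i.e. one rank-1 update."""
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            acc[i][j] += xi * yj
    return acc


def matmul_by_rank1_updates(a: list, b: list) -> list:
    """Multiply a (4 x k) by b (k x 4) as a sum of k outer products."""
    acc = new_accumulator(len(a), len(b[0]))
    for k in range(len(b)):
        column_of_a = [row[k] for row in a]
        outer_product_accumulate(acc, column_of_a, b[k])
    return acc


a = [[1, 2], [3, 4], [5, 6], [7, 8]]
b = [[1, 0, 2, 0], [0, 1, 0, 2]]
c = matmul_by_rank1_updates(a, b)
```

Each update touches every element of the accumulator while reading only two short vectors, which is what makes the operation a good fit for a dedicated register file.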
New Instructions
A large number of new instructions were introduced.
- String isolate – new byte/halfword null-terminated/explicit-length string isolate instructions
- Byte-reverse instructions – a number of instructions that reverse the byte order within halfwords, words, and doublewords
- 128-bit integer instructions – 128-bit integer instructions for compare/multiply/divide/modulo/rotate/shift and DFP/QFP format conversion ops
- Set Boolean extension – new instructions for converting condition code bit into a boolean or field mask (or their negations) that is stored in a GPR.
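Two of the items above are easy to model in software. The Python sketch below reverses the bytes within each halfword, word, or doubleword of a 64-bit register value, and turns a single condition bit into a boolean or an all-ones mask, optionally negated. The function names and arguments are hypothetical and are not instruction mnemonics; the 64-bit register width and the all-ones mask encoding are assumptions of this sketch.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def byte_reverse(value: int, element_bytes: int) -> int:
    """Reverse byte order inside each element_bytes-wide element of a 64-bit value."""
    if element_bytes not in (2, 4, 8):
        raise ValueError("element_bytes must be 2, 4, or 8")
    raw = (value & MASK64).to_bytes(8, "big")
    out = b"".join(raw[i:i + element_bytes][::-1]
                   for i in range(0, 8, element_bytes))
    return int.from_bytes(out, "big")


def set_boolean(cr_bit: bool, mask: bool = False, negate: bool = False) -> int:
    """Map one condition bit to 0/1, or to 0/all-ones when mask is requested."""
    condition = bool(cr_bit) != negate  # negate flips the sense of the test
    if not condition:
        return 0
    return MASK64 if mask else 1


swapped = byte_reverse(0x0102030405060708, 4)
flag = set_boolean(True)
```

The practical benefit of having these as single instructions is that a compiler no longer needs a branch or a multi-instruction sequence to materialize a comparison result or to swap endianness in a register.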
A set of vector instructions was also added.
- Vector integer Multiply/Divide/Modulo instructions – new vector integer SIMD-equivalent forms of FXU multiply/divide/modulo instructions
- SIMD permute-class operations – new permute-class instructions for things such as element extraction/insertion, 32-bit immediate splat, doublewide bit shift left/right, and element mask-based blend.
- VSX load/store rightmost – new load/store instructions for copying rightmost element from VSR to storage and vice versa
- VSX mask manipulation – new vector instructions for the manipulation of vector masks
- VSX PCV generate – new permute control vector (PCV/third source) instructions to emulate load expand and store compress
- 32-byte storage access – new 32-byte VSR load/store
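The idea behind generating a permute control vector from a mask can be illustrated with a simplified, element-level Python model. Given a mask, one control vector packs the selected elements to the front (the shuffle needed before a compressed store), and another spreads packed elements back to their masked positions (the shuffle needed after an expanding load). The function names, the ordering of unselected lanes, and the zero fill value are all choices made for this sketch, not behavior taken from the ISA.

```python
def compress_pcv(mask: list) -> list:
    """Indices that move selected lanes to the front; unselected lanes follow."""
    selected = [i for i, keep in enumerate(mask) if keep]
    rest = [i for i, keep in enumerate(mask) if not keep]
    return selected + rest


def expand_pcv(mask: list) -> list:
    """For each lane, the packed index to read from, or None for an unselected lane."""
    pcv, next_packed = [], 0
    for keep in mask:
        if keep:
            pcv.append(next_packed)
            next_packed += 1
        else:
            pcv.append(None)
    return pcv


def permute(source: list, pcv: list, fill: int = 0) -> list:
    """Apply a control vector; None entries produce the fill value."""
    return [fill if i is None else source[i] for i in pcv]


mask = [1, 0, 1, 1]
vector = [10, 20, 30, 40]
compressed = permute(vector, compress_pcv(mask))        # selected lanes first
packed = compressed[:sum(mask)]                         # what a compressed store would write
expanded = permute(packed + [0], expand_pcv(mask))      # back to masked positions
```

Splitting the work into "build the control vector" and "permute" means the same permute hardware serves both directions, with only the control vector changing.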
The new Power ISA Version 3.1 may be found here.