ISSCC 2018: The IBM z14 Microprocessor And System Control Design

Though often regarded as relics, IBM mainframes play a pivotal role in our everyday lives. Behind the scenes, these state-of-the-art machines process billions of transactions every day. Announced in July of last year, IBM’s latest mainframe is the z14, succeeding the z13, which launched back in 2015.

Earlier this year, at the 65th International Solid-State Circuits Conference (ISSCC) in San Francisco, IBM presented some of the architectural changes between the z13 and the z14. The paper was presented by Christopher Berry, a Senior Technical Staff Member on the IBM Systems Hardware Development Team who led the z14 physical design execution.



For the z14, IBM continued to target the three major vectors that impact customer workloads: higher per-thread performance, greater system capacity, and improved efficiency. On the efficiency front, it is important to IBM that the system power footprint remain constant from one generation to the next in order to facilitate upgrades without costly infrastructure changes. One of the main enablers of those goals is the process technology. As with the POWER9, IBM uses GlobalFoundries’ unique 14nm SOI FinFET process with embedded DRAM (14HP) to extract higher density and reduce power.

14HP provides 17 copper interconnect layers to work with. For the critical metal layer, which has a 64-nanometer pitch, IBM uses a litho-etch-litho-etch (LELE) double-patterning technique. The process offers two SRAM cells – a high-performance 0.102 μm² cell and a low-leakage 0.143 μm² cell. For high density, IBM uses a 0.0174 μm² embedded DRAM cell, the workhorse memory cell behind the L2, L3, and L4 caches.
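As a back-of-envelope check on the density gap between the cell types, the raw cell areas quoted above can be turned into bits-per-area figures (a rough sketch only; real arrays are less dense because of sense amps, decoders, and redundancy overhead):

```python
# Back-of-envelope bit density from the quoted 14HP cell areas.
# Peripheral circuitry is ignored, so these are upper bounds.
CELLS_UM2 = {
    "HP SRAM": 0.102,   # high-performance SRAM cell
    "LL SRAM": 0.143,   # low-leakage SRAM cell
    "eDRAM": 0.0174,    # embedded DRAM cell (L2/L3/L4 workhorse)
}

UM2_PER_MM2 = 1_000_000  # 1 mm^2 = 10^6 um^2

for name, area in CELLS_UM2.items():
    bits_per_mm2 = UM2_PER_MM2 / area
    print(f"{name}: ~{bits_per_mm2 / 1e6:.1f} Mb per mm^2 (raw cell area)")

# The eDRAM cell packs roughly 5.86x more bits per mm^2 than the
# high-performance SRAM cell (0.102 / 0.0174).
```

This is why eDRAM, not SRAM, backs the large caches: at the raw-cell level it is nearly six times denser than even the high-performance SRAM cell.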

A Frame

The z14 comes in various models, but the max system, with processors running at up to 5.2 GHz, has to be water cooled. Below is a picture of what a max water-cooled system looks like.

Max z14 system (ISSCC 2018, IBM)

The frames are called A and Z. The more interesting parts are in the A-frame on the right. The microprocessors reside in the four identical-looking drawers in the middle of the frame.


A frame from a z14 shown by IBM at ISSCC (WikiChip)

The Drawer

Each z14 system features up to four drawers where the CP and SC chips reside. The System Control (SC) chip is responsible for routing traffic between the four drawers. Every drawer is divided into two clusters of three processors. Within a cluster, every Central Processor (CP) chip is fully connected to every other CP chip as well as to the SC chip. Each CP chip connects to up to 1.5 TiB of main memory and incorporates two PCIe Gen 3.0 interfaces and a GX InfiniBand interface.
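The intra-drawer connectivity described above can be sketched as a small graph (a hypothetical model to count links, not IBM code; chip names are made up):

```python
from itertools import combinations

# Model one z14 drawer: two clusters of three CP chips, one shared SC chip.
clusters = [["CP0", "CP1", "CP2"], ["CP3", "CP4", "CP5"]]
sc = "SC"

links = set()
for cluster in clusters:
    # Every CP is fully connected to every other CP in its cluster...
    links.update(combinations(cluster, 2))
    # ...and to the SC chip, which routes traffic between drawers.
    links.update((cp, sc) for cp in cluster)

# 2 clusters * (3 CP-CP links + 3 CP-SC links) = 12 links per drawer.
print(len(links))  # 12
```

Note that CPs in different clusters have no direct link; cross-cluster traffic within a drawer also goes through the SC chip.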

WikiChip’s diagram of a drawer.

Every drawer is fully interconnected with every other drawer through the SC chips. In a max system configuration, this works out to 24 processors with 240 cores. IBM doesn’t actually sell a 240-core z14 system; the maximum model has 170 customer-configurable cores. An additional 26 cores serve as system-assist processors (SAPs), which accelerate and offload network and I/O processing from the main user-configurable cores. The remaining cores are reserved for power, yield, and sparing purposes. With each CP chip supporting up to 1.5 TiB of main memory, a maximum system supports up to 32 TiB of memory.
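The core accounting above tallies up as follows (a quick check based only on the figures in the text; the per-chip count of 10 cores follows from 240 cores across 24 CP chips):

```python
# Max-config z14 core budget, per the figures above.
DRAWERS = 4
CP_PER_DRAWER = 6      # two clusters of three CP chips
CORES_PER_CP = 10      # 240 cores / 24 chips

physical = DRAWERS * CP_PER_DRAWER * CORES_PER_CP
customer = 170         # largest configurable model
sap = 26               # system-assist processor cores
reserved = physical - customer - sap  # spares/yield/power headroom

print(physical, reserved)  # 240 44
```

In other words, 44 of the 240 physical cores in a max system are never exposed, even counting the SAPs.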

WikiChip’s diagram of four interconnected drawers.

The board wiring diagram below shows the two CP clusters (in orange) and the SC chip (in gray). The two clusters are physically partitioned to the left and right of the board, with the blue wires fully interconnecting the CP chips within each cluster. The white and yellow wires connect all the CP chips to the SC chip.

z14 SC-CP Board Wiring (ISSCC 2018, IBM)

IBM being IBM, a full z14 mainframe was brought to the San Francisco Marriott for the demonstration session. We managed to snap some shots of the drawer for you. One of the water-cooling blocks, as well as one of the chips, was sectioned specifically to show its cross section.



One of the major changes IBM made with the z14 is a reorganization of the cache hierarchy across the entire drawer. In the prior machine, the z13, a drawer consisted of two SC chips: each cluster of three CP chips had its own dedicated SC chip, and the two SC chips within the drawer were linked together over the S-bus.

z13 drawer diagram (WikiChip)

With the z14, the level 1 instruction cache was increased from 96 KiB to 128 KiB and the L2 data cache was doubled to 4 MiB. The z14 also doubled the shared L3 cache to 128 MiB. The caches at the lower levels were enlarged in order to make it possible to remove an entire SC chip from the drawer. The z14 SC chip grew the L4 cache from 480 MiB to 672 MiB; however, since each drawer now has one fewer SC chip, the L4 cache per drawer actually shrank by 288 MiB. Despite the smaller L4, removing the second SC chip improved latency at the drawer level, and the overall change resulted in a net performance win.
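The L4 trade-off is easy to verify from the numbers quoted (a quick arithmetic check using only the figures above):

```python
# Per-drawer L4 capacity: the z13 had two SC chips per drawer,
# the z14 has a single, larger one.
z13_l4_per_sc = 480   # MiB
z13_sc_per_drawer = 2
z14_l4_per_sc = 672   # MiB
z14_sc_per_drawer = 1

z13_drawer = z13_l4_per_sc * z13_sc_per_drawer   # 960 MiB
z14_drawer = z14_l4_per_sc * z14_sc_per_drawer   # 672 MiB
print(z13_drawer - z14_drawer)  # 288 MiB less L4 per drawer
```

So even though each SC chip carries 40% more L4 than before, halving the SC count per drawer is what produces the net 288 MiB reduction.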

z13 vs z14 cache hierarchy (ISSCC 2018, IBM)
