Our initial look at Intel’s Architecture Day focused on the new Xeons and IPU processors. Now we’ll get into the fine details, as well as look at other upcoming technologies.
Intel’s upcoming next-generation Xeon is codenamed Sapphire Rapids and promises a radical new design and gains in performance. One of its key differentiators is its modular SoC design. The chip has multiple tiles that appear to the system as a monolithic CPU, and all of the tiles communicate with each other, so every thread has full access to all resources on all tiles.
In a way it’s similar to the chiplet design AMD uses in its Epyc processor. Breaking a monolithic chip up into smaller pieces makes it easier to manufacture.
In addition to faster/wider cores and interconnects, Sapphire Rapids has a shared Last Level Cache (LLC) of up to 100MB that is available to all cores. It also has up to four memory controllers and eight memory channels, with support for DDR5 memory, next-gen Optane Persistent Memory, and/or High Bandwidth Memory (HBM).
Sapphire Rapids also offers Intel Ultra Path Interconnect 2.0 (UPI), a CPU interconnect used for multi-socket communication. UPI 2.0 features four UPI links per processor with 16GT/s of throughput and supports up to eight sockets.
With so much new, high-performance technology in Sapphire Rapids it should blow the current generation out of the water. Sapphire Rapids-generation Xeon Scalable Processors are due to arrive early next year.
Ponte Vecchio: Molto Bene
Intel is determined to get into the GPU business and is not letting Nvidia’s dominance deter it. The Xe architecture is its third attempt after the disastrous Larrabee and Xeon Phi, and on paper, this one looks like it has a chance.
Like Nvidia and AMD, Intel is making multiple GPUs for different markets. One is for PC clients and aimed squarely at gamers. Another is Ponte Vecchio, a Xe-based GPU optimized for HPC and AI workloads.
Ponte Vecchio is an insanely complex piece of silicon, perhaps Intel’s biggest ever. Ponte Vecchio processors have more than 100 billion transistors (the new Xeon Scalable reportedly has 300 million) and use five different process nodes and multiple foundries to make what are called specialized tiles.
All told, a Ponte Vecchio SoC has 47 active tiles, including compute tiles, an HPC-oriented specialized cache called Rambo, HBM, Xe Link, and EMIB, a specialized high-speed interconnect. It uses a 3D packaging architecture called Foveros and specialized multi-tile packaging. What’s interesting is that some of the tiles are made by TSMC while others are made by Intel.
For the first time, Intel revealed initial performance data. It claims Ponte Vecchio silicon delivers 45 TFLOPS of FP32 throughput, which is vital for AI training, along with more than 5TBps of memory fabric bandwidth and more than 2TBps of connectivity. By contrast, Nvidia’s new Ampere architecture offers peak FP32 performance of 19.5 TFLOPS.
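To put those two FP32 figures side by side, a quick back-of-the-envelope calculation helps. The sketch below is illustrative only: the constant and function names are made up for this example, and both inputs are the vendors' peak claims quoted above, not benchmark results.

```python
# Peak FP32 figures as quoted in the article (vendor claims, not measurements).
PONTE_VECCHIO_FP32_TFLOPS = 45.0  # Intel's claim for Ponte Vecchio
AMPERE_FP32_TFLOPS = 19.5         # Nvidia's stated peak for Ampere


def speedup(new_tflops: float, baseline_tflops: float) -> float:
    """Return how many times larger new_tflops is than baseline_tflops."""
    if baseline_tflops <= 0:
        raise ValueError("baseline must be positive")
    return new_tflops / baseline_tflops


ratio = speedup(PONTE_VECCHIO_FP32_TFLOPS, AMPERE_FP32_TFLOPS)
print(f"Claimed peak FP32 ratio: {ratio:.2f}x")  # prints 2.31x
```

On paper, that is roughly a 2.3x advantage in peak FP32 throughput. Peak numbers rarely translate directly into real-world application performance, so treat the ratio as a rough guide.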
Looks like Nvidia’s CEO Jen-Hsun Huang poked the bear once too often.
Like Sapphire Rapids, Ponte Vecchio will be released next year.
Copyright © 2021 IDG Communications, Inc.