Companies mentioned: NVDA, AMD, INTC
At this year’s SuperComputing conference in Denver, NVIDIA pulled the lid off a new product they’re bringing to market.
Dubbed the H200, it seeks to address the immediate needs of AI and HPC customers whose workloads hunger for one thing more than nearly anything else: lots of fast memory. The H200 is not a revelation in compute in and of itself; instead, it seeks to fully unleash the underlying compute resources of NVIDIA’s Hopper architecture.
H200 uses the same core silicon as the H100 before it, but is now based around 6 stacks of HBM3e compared to the 5 stacks of HBM3 present in the H100 launched 18 months ago in early 2022. This means more memory, and more memory bandwidth: 144 GB vs 80 GB, and now up to 4.8 TB/sec peak bandwidth. NVIDIA’s goals here are relatively simple: H100 was a huge increase in compute relative to their prior A100 GPGPUs, but was by no means a revelation in terms of memory bandwidth, and was stuck with the same 80 GB maximum memory per GPU. This has been a limitation of sorts, for everything from HPC to machine learning, where memory capacity and memory bandwidth rule above all else. It’s worth noting that the previous move from A100 to H100 brought SXM form factor devices from 2TB/s of memory bandwidth to 3.35 TB/s of memory bandwidth, so this is a step above.
H200 changes the memory technology relative to H100, enables the sixth, previously disabled HBM stack, and increases per-stack density. Plainly, we move from 5 16 GB stacks of HBM3 to 6 24 GB stacks of HBM3e. This means aggregate device capacity moves from a previous maximum of 80 GB to 144 GB, 141 GB of which are addressable by applications. Bandwidth comes in at a massive 4.8 TB/s, a 40+% increase over H100’s 3.35 TB/s. By comparison, AMD’s MI300X has 192 GB of HBM3 memory for 5.2 TB/sec bandwidth, although we’ll hear more about that later this year at the December 6th AMD event.
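As a quick sanity check, the capacity and bandwidth uplift described above works out from the publicly quoted stack counts and per-stack sizes. A minimal sketch (these are the quoted specs, not independent measurements):

```python
# Back-of-the-envelope check of the H200 memory uplift,
# using the publicly quoted figures from the announcement.

h100_capacity_gb = 5 * 16   # 5 enabled 16 GB HBM3 stacks -> 80 GB
h200_capacity_gb = 6 * 24   # 6 enabled 24 GB HBM3e stacks -> 144 GB

h100_bw_tbs = 3.35          # H100 SXM peak bandwidth, TB/s
h200_bw_tbs = 4.8           # H200 peak bandwidth, TB/s

bw_uplift_pct = (h200_bw_tbs / h100_bw_tbs - 1) * 100

print(h100_capacity_gb, h200_capacity_gb)  # 80 144
print(f"{bw_uplift_pct:.0f}% more bandwidth")  # 43% more bandwidth
```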
One of the more important aspects for cloud vendors, system integrators, and other partners is compatibility: H200 remains pin, power, and mechanically compatible with H100-based systems. Expect those who receive H200 GPUs to be able to swap them in as quickly as they arrive, no major platform revalidation needed. That being said, anyone who has installed an SXM module will know just how precise you have to be, and how expensive that torque screwdriver is.
HPC Systems
Beyond the headline of a new GPGPU product, NVIDIA was proudly discussing their design win powering the EU’s first exascale system. In partnership with Eviden (formerly Atos), NVIDIA will be building the new Jupiter supercomputer for the Jülich Supercomputing Centre in Germany. Expected to become available in 2024, the system is made up of 23,752 Grace Hopper accelerators. With each node composed of 4 GH200s, that works out to just shy of 6,000 nodes. NVIDIA also proudly unveiled their new quad Grace Hopper blades; the Eviden version of these blades destined for Jupiter is called the Eviden Bull Sequana XH3000.
Going into the Numbers
An interesting detail here is the difference in delivered efficiency between the Jupiter system and previous generations of NVIDIA-accelerated hardware.
Jupiter is explicitly claimed as a “1.0 Exaflops delivered” system, implying that you can expect it to be the first NVIDIA-powered exascale system (as well as the first Arm-powered exascale system, for those keeping track). Working the maths backwards from 1 EF delivered across 23,752 GPUs, Jupiter is getting about 42 TF of real FP64 matrix multiply out of a nominally 67 TF device. That puts efficiency, comparing theoretical peak performance to real-world performance, at only around 62%, well below the norm for other top-level NVIDIA-accelerated supercomputers (typically 70-75%). For the people at Jülich, expect long-term optimization to unlock significantly higher amounts of available compute.
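The maths above can be reproduced directly from the quoted figures. A minimal sketch, assuming 1.0 EF delivered across 23,752 GPUs at a nominal 67 TF FP64 each:

```python
# Reproducing the Jupiter efficiency estimate from the quoted figures.
delivered_flops = 1.0e18   # 1.0 Exaflops, as claimed for Jupiter
gpus = 23752               # GH200 accelerators in the system
nominal_tf = 67.0          # nominal per-GPU FP64 peak, in TF

per_gpu_tf = delivered_flops / gpus / 1e12   # delivered TF per GPU
efficiency = per_gpu_tf / nominal_tf         # delivered vs theoretical peak

print(f"{per_gpu_tf:.1f} TF per GPU")  # 42.1 TF per GPU
print(f"{efficiency:.1%}")             # 62.8%
```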
In the meantime, Jupiter should comfortably land among the top 5 most powerful supercomputers in the world, preceded only by Aurora at ALCF, El Capitan at LLNL, and Frontier at ORNL. This excludes the presumed exascale Sunway systems from China, which would round out the top 5.
Assuming Jupiter is finished in time for a Top500 submission at the SC24 conference, you can expect the list to read El Capitan (AMD MI300A), Aurora (Intel Xeon CPU Max + Intel GPU Max), and Frontier (custom AMD Milan + custom MI250X), followed by Jupiter (NVIDIA Grace Hopper GH200). Fugaku, built with the Arm-based A64FX CPU and based in Japan, would continue its reign as the most powerful CPU-only supercomputer in the world.