Companies mentioned: AMD, INTC, NVDA, MSFT, TSM
One of the biggest challengers to NVIDIA's machine learning crown is AMD. Like most in the industry, the company was slow to pivot from its traditional HPC focus, but revenue from its MI series of accelerators is set to top $4 billion in 2024 - and it's a number that seemingly rises every quarter. AMD puts this down to delivering a competitive product (or a world-leading one, depending on the test and who you talk to), at volume, and at a total cost advantage over the competition. While AMD has many product lines with some AI acceleration, it's the data center training and inference hardware that is set to take the biggest part of the pie for several years to come.
AMD's hardware stack in that market is the Instinct line. After MI100 and MI200 in previous years, the latest member of the family is the MI300X - a large server AI accelerator featuring lots of chiplets, lots of compute, lots of high-bandwidth memory, and a philosophy truly indicative of what a chiplet architecture should be. Case in point: at its annual Build event, Microsoft recently showcased that MI300X offers better performance per dollar on GPT-4 than any other solution it is deploying.
MI300X is built using TSMC's leading-edge process nodes and TSMC's CoWoS packaging. The initial questions about capacity and software support are something AMD has been addressing over the last 12-18 months, especially given the macroeconomics of this market since the pandemic.
New Chip: MI325X with 288 GB HBM3E
The new announcement at this year's Computex 2024 event in Taipei is a new member of this family, the MI325X. The MI325X builds upon the MI300X by using HBM3E instead of HBM3 for the high-bandwidth memory. This is faster memory, and in this case AMD is also increasing the capacity by 50%: in a market where 80 GB of memory per chip has been the norm, MI300X offered 192 GB, and MI325X will now increase this to 288 GB while also running faster. Memory bandwidth rises as well, from 5.3 TB/sec to 6.0+ TB/sec. One of the perennial issues with compute in these form factors is memory capacity and feeding the compute cores with enough data to keep utilization high, and the MI325X improves on both metrics - at a price premium for customers, of course.
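To put that capacity in context, here's a hedged back-of-the-envelope sketch; the model sizes and the weights-only accounting are my own illustrative assumptions, not AMD figures:

```python
# Back-of-the-envelope: can a model's weights fit on one accelerator?
# Illustrative numbers only - real deployments also need room for the
# KV cache, activations, and framework overhead.

def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB."""
    return params_billion * 1e9 * bytes_per_param / 1e9

HBM_GB = 288  # MI325X capacity

for params_b in (70, 180):
    for fmt, bpp in (("FP16", 2.0), ("FP8", 1.0)):
        gb = weights_gb(params_b, bpp)
        fits = "fits" if gb <= HBM_GB else "does not fit"
        print(f"{params_b}B @ {fmt}: {gb:.0f} GB -> {fits} in {HBM_GB} GB")
```

A 70B-class model in FP16 fits comfortably on a single 288 GB part, and models that previously had to be split across two 192 GB chips may now squeeze onto one - and fewer chips per model means less inter-chip traffic.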
Today's announcement is only a product preview, rather than a performance analysis. We expect that to come later in the year as MI325X volume ramps up and we start hearing about deployments at AMD's key partners. AMD did commit to saying that a full MI325X platform, containing eight liquid-cooled chips, will offer a peak throughput of 10.4 PetaFLOPs (theoretical max FP16/BF16) and up to 2.3 TB of HBM3E, and it looks like it will be pin compatible with current MI300X solutions. MI325X will have air-cooled and liquid-cooled options, though exact details were not given today. AMD did say power isn't meaningfully different from MI300X, which is a plus.
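Those platform figures are easy to sanity check - note the per-chip number below is simply derived from the stated platform peak, not a figure AMD quoted directly:

```python
chips = 8
hbm_per_chip_gb = 288          # MI325X HBM3E capacity per chip
platform_pflops = 10.4         # stated peak, theoretical FP16/BF16

print(f"Aggregate HBM3E: {chips * hbm_per_chip_gb / 1000:.1f} TB")     # ~2.3 TB
print(f"Implied per-chip peak: {platform_pflops / chips:.2f} PFLOPs")  # 1.30
```

That implied ~1.3 PFLOPs per chip lines up with MI300X's quoted FP16 peak, which suggests the compute throughput is unchanged and the MI325X upgrade is primarily about memory.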
One big question, and one I put to AMD, is supply availability for this product. It comes in two forms. The first is the common question of TSMC's CoWoS packaging capacity, which we know is highly contested between NVIDIA, AMD, Intel, SambaNova, and a couple of others. Some reports freely available online suggest that AMD and NVIDIA have bought up all of that capacity for 2024 and 2025, but we will have to wait and see. What's new for the MI325X, however, is the supply constraint on HBM3E. As the premium high-bandwidth memory, it's in very short supply, and NVIDIA needs it too. We've heard from partners that lead times to order GPUs are 52+ weeks from NVIDIA and 26+ weeks from AMD, so while demand is this high, there could well be a bidding war for as much HBM3E as each vendor can acquire. AMD gave a predictable but understandable answer to the question of HBM3E supply: it has strong partnerships with everyone involved, has been working with customers for months on this product with known expected deployments, and has secured enough supply to meet those demands and beyond - simply put, AMD does not believe it is supply constrained. It also means the $4-4.5 billion revenue figure we heard earlier in the year for AMD's AI chip efforts already included the MI325X, so the new part is likely already priced in.
A New Yearly Cadence: First, CDNA4 on 3nm
Another aspect of today's announcement is a commitment from AMD to a yearly cadence for the MI family. A slide shown on stage by CEO Dr. Lisa Su laid it out: for 2024 we have MI325X built on CDNA3, and for 2025 AMD will launch a new CDNA4 architecture, offering better performance/power characteristics and built on TSMC 3nm.
A big update in CDNA4 will be support for FP4/FP6 quantized formats, helping models shrink to smaller memory footprints if they can keep the accuracy. Beyond that, in 2026, there's a CDNA-next architecture which will be another big step in performance. AMD calls it an architecture uplift, though in this context they're talking about scalable performance uplifts - if that's achieved with minor underlying tweaks, it still counts towards that annual cadence.
My thoughts on this are hopefully not too wild - delivering a new generational performance uplift in this market is *hard*. GPUs, HPC chips, and AI chips like this typically run on a two-year cycle due to the complexity involved, but AMD sees a need to address a fast-moving market on a quicker update cycle, and today's announcement commits to that. We're also getting to the point where some of the 'known' tricks for AI scaling are going to start falling short: quantization, for example, has taken us from 32-bit to 16-bit to 8-bit, but at 6-bit and 4-bit it becomes really difficult to maintain accuracy while also uplifting performance. Don't get me started on 2-bit, and 1-bit is just binary.
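To make that concrete, here's a toy sketch of naive symmetric quantization on random Gaussian 'weights'. Real schemes (per-channel scales, calibration, outlier handling) do considerably better than this, but the trend is the point:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000).astype(np.float32)  # stand-in for weights

def quantize_dequantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Naive symmetric uniform quantization to `bits` bits, then back to float."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

for bits in (8, 6, 4, 2):
    err = np.linalg.norm(x - quantize_dequantize(x, bits)) / np.linalg.norm(x)
    print(f"{bits}-bit: relative error {err:.3f}")
```

The error roughly doubles with every couple of bits removed, and at 2-bit a single per-tensor scale rounds most values to zero - which is why each step below 8-bit demands new tricks rather than the same recipe.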
We're seeing newer quantization techniques, such as micro-scaling (or blockfloat), help unlock another layer of performance, but that might run out as well. Here's a video on microscaling that I made earlier this year, in response to NVIDIA announcing native support for it on their Blackwell product line. I've since learned that other companies support it too, though getting them to commit on the record that it's natively supported in hardware, rather than simply an architecture trick, has been difficult. The video goes over some of the puts and takes of the new technology.
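For a flavor of what block scaling buys, here's the same toy 4-bit quantizer, but with one shared scale per small block of values rather than per tensor. To be clear, this illustrates the general block-scaling idea, not the OCP MX spec itself, which defines shared power-of-two scales and specific element formats:

```python
import numpy as np

def block_quantize(x: np.ndarray, bits: int, block: int) -> np.ndarray:
    """Toy block-scaled quantization: every `block` consecutive values
    share one scale factor; quantize to `bits` bits, then dequantize.
    Illustrative only - not the OCP MX format definition."""
    qmax = 2 ** (bits - 1) - 1
    blocks = x.reshape(-1, block)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                    # avoid divide-by-zero
    q = np.clip(np.round(blocks / scales), -qmax, qmax)
    return (q * scales).reshape(x.shape)

rng = np.random.default_rng(0)
x = rng.standard_normal(1 << 20).astype(np.float32)

# One scale for the whole tensor vs. one scale per 32 values:
for block, label in [(x.size, "per-tensor"), (32, "block-32")]:
    err = np.linalg.norm(x - block_quantize(x, 4, block)) / np.linalg.norm(x)
    print(f"4-bit {label}: relative error {err:.3f}")
```

Letting each block of 32 values adapt its scale to local magnitudes roughly halves the error at the same bit width - that's the extra layer of performance on offer, at the modest cost of storing the per-block scales.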
New features, paradigms, and other tweaks will be needed to go above and beyond simple hardware uplift, and some of those paradigms might be natively built into future hardware. The company that iterates the fastest can address those needs, even if they don't pan out fully, but it will be a cost-intensive exercise to be that aggressive for a long time.
Another question in all of this is software. AMD announced its ROCm 6.0 package back in December, built for AI and specifically for LLMs. Initial reports, perhaps as always with AMD software in the enterprise space, are that while it's rough, it's certainly headed in the right direction. AMD (and others) are competing against a green giant that has spent 15 years embedding itself into HPC at universities, so the software support on the green side is established with strong foundations. It will be tough for anyone to match that, though AMD is getting to the point where anyone investing in CDNA hardware can be confident they can enable their solutions at scale - or rely on AMD's partners (such as Lamini) to do some of that heavy lifting. While there were no software updates to announce today, we are closing in on six months since the last one. Given that today was not a full end-to-end update, AMD did reiterate support for popular frameworks, 700k+ models on Hugging Face, and deep collaborations with major hyperscalers on their workloads. I'm guessing there's room in the calendar for a software update and roadmap soon enough!
I have a meeting scheduled later this week with AMD's Instinct team. It's unlikely to produce anything I can publish, though if they have an MI325X on hand, I'll take a bite and post a picture to my socials. I usually post as @IanCutress on the twitsphere.