NVIDIA Introduces Groq LP30 and LPX Nodes
The Partnership Blooms
I’ve been tracking Groq almost since it emerged publicly. It was a startup for over seven years, and in recent times had raised ~$1.5 billion in funding in 12-18 months. Its product was an SRAM-based chip that relied on a very fixed VLIW pipeline needing close attention in its programming, much like a DSP. The benefit of the design was determinism: at compile time, you knew how long your inference was going to take. The main downside of that chip was the lack of memory. With 230 MB of SRAM per chip, it took lots of chips to scale out and fit even medium-sized models. The chip was also built for convolutional neural networks, and transformers were retrofitted into the architecture. Nonetheless, the company showcased impressive raw performance, in the thousands of tokens per second for a single user, regardless of the power draw.
In December 2025, NVIDIA acquired the Groq engineering team and a license to use the technology. Although unconfirmed, the deal was reported at around $20 billion. We covered it on this substack at the time.
At the time, we covered several reasons why NVIDIA made this ‘acquisition’ (I’ll call it an acquisition from here on, just to simplify things). We knew Groq was in the process of taping out its next-generation chip, but it had missed deadlines set by Groq CEO Jon Ross. With the demand for agentic AI and Mixture-of-Experts models, Groq’s limited SRAM capacity, despite its high memory bandwidth, meant that tokens per watt were uneconomical for any large installation without a specific customer in mind. Today, however, at GTC 2026, Jensen is showcasing the integration of Groq’s latest chip into NVIDIA’s platform.
Groq LP30
The new chip is called the LP30. Jensen calls this a third-generation chip, although as far as we know it is still Groq’s second chip; there might have been a failed tapeout at the startup during those seven years that would make this the third. Either way, Jensen did give some details.
The new Groq chip appears to use the same, or a similar, architecture as the first generation. It is built on Samsung LP4X and features 500 MB of SRAM. That’s just over double the first chip, but to be honest it isn’t that much of a bump. We also see FP8 performance listed, in this case 1.2 PFLOPS. No power figure was given, although I’ve seen one analyst quote 600 W per chip. I can’t confirm that.
The chip will go into eight-chip nodes, which make up a system called the Groq 3 LPX.
This means eight chips have 4 GB of memory, while the full LPX system will have 256 chips in a 128 GB configuration. It’s worth noting that this is less memory than a single mainstream GPU AI accelerator serving the largest models. Each 8-way node will be controlled by an FPGA backed by what looks like an Intel CPU. BlueField-4 provides the scale-out, with NVIDIA confirming that Ethernet is used to talk to other systems.
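As a quick sanity check of those capacity figures, here is the arithmetic as a small sketch. It assumes decimal units (1 GB = 1,000 MB) and counts only on-chip SRAM; the constant names are illustrative, not NVIDIA’s.

```python
# Capacity arithmetic for LP30-based systems, using the figures quoted above.
# Assumptions: decimal units (1 GB = 1000 MB), only on-chip SRAM counted.
SRAM_PER_CHIP_MB = 500      # LP30 SRAM per chip
FIRST_GEN_SRAM_MB = 230     # first-generation Groq chip, for comparison
CHIPS_PER_NODE = 8          # chips per node
CHIPS_PER_LPX = 256         # chips in a full Groq 3 LPX system

def capacity_gb(chips: int, sram_mb: int = SRAM_PER_CHIP_MB) -> float:
    """Total SRAM capacity in GB for a given number of chips."""
    return chips * sram_mb / 1000

node_gb = capacity_gb(CHIPS_PER_NODE)            # 4.0
lpx_gb = capacity_gb(CHIPS_PER_LPX)              # 128.0
gen_ratio = SRAM_PER_CHIP_MB / FIRST_GEN_SRAM_MB # ~2.17x the first chip

print(f"8-chip node: {node_gb} GB, full LPX: {lpx_gb} GB, "
      f"{gen_ratio:.2f}x first-gen SRAM per chip")
```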
The goal is to support higher throughput, and Jensen pulled out this graph to make the point.
On their own, Rubin systems would not stretch to 800-1,000 tokens per second, and using Groq helps NVIDIA get there. NVIDIA sees that area as an ‘ultra’ token offering for people willing to spend $100+ per million tokens. This year I’m seeing that code generation (see Cerebras and OpenAI’s partnership) is one such workload where people are prepared to spend that much, or more. One of my key issues this year is the value of a token: there’s no point making tokens if they’re not worth anything. I’d love it if CEOs talked more about that, rather than just token generation.
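To put the ‘ultra’ tier in perspective, here is some rough arithmetic. It assumes a single user stream running at the top of that 800-1,000 tokens per second range, continuously for one hour, at the $100 per million token price point. It is illustrative only, not NVIDIA pricing.

```python
# Back-of-envelope revenue for one high-speed user stream.
# Assumptions: 1,000 tokens/s and $100 per million tokens (figures from the
# text), with the stream generating continuously for a full hour.
TOKENS_PER_SECOND = 1000
PRICE_PER_MILLION_USD = 100

def revenue_per_hour(tokens_per_s: float, price_per_million: float) -> float:
    """Revenue in USD for one hour of continuous generation."""
    tokens = tokens_per_s * 3600
    return tokens * price_per_million / 1_000_000

print(revenue_per_hour(TOKENS_PER_SECOND, PRICE_PER_MILLION_USD))  # 360.0
```

That works out to roughly $360 per hour for one stream, which only makes sense if the tokens are worth that much to whoever is buying them.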
Anyway, one other aspect that came up is how the Groq chips will be used - in decode.
I believe we’ve covered workload disaggregation on the substack before, but the simple way to approach it is that most transformer-based language models can be split into two phases: prefill and decode. Prefill is compute-bound, relying more on raw compute than on memory bandwidth (although it still needs some). Decode, by contrast, is memory-bandwidth bound. The idea is that Rubin is a happy mix of both, providing incredible compute and incredible memory bandwidth. Last year, NVIDIA announced Rubin CPX, a GDDR-based chip specifically for prefill: because prefill is compute-focused rather than memory-focused, it can get by with GDDR instead of HBM. Groq is here to address decode specifically, given that its SRAM bandwidth is a big multiple of what a standard Rubin can do (150 TB/sec vs 22 TB/sec on Rubin).
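A simplified roofline model shows why the two phases land on different hardware. This sketch assumes dense weights read once per forward pass, FP8 (1 byte per parameter), 2 FLOPs per parameter per token, and ignores KV cache and activation traffic. It also treats the 1.2 PFLOPS and 150 TB/s LP30 figures as per-chip numbers, which is an assumption on my part.

```python
# Simplified roofline check: is a phase compute-bound or bandwidth-bound?
# Assumptions: dense weights read once per forward pass, FP8 (1 byte/param),
# KV cache and activations ignored, LP30 figures treated as per-chip.
PEAK_FLOPS = 1.2e15        # LP30 FP8 peak, FLOP/s
BANDWIDTH = 150e12         # SRAM bandwidth, bytes/s
BYTES_PER_PARAM = 1        # FP8

def arithmetic_intensity(tokens_per_pass: int,
                         bytes_per_param: float = BYTES_PER_PARAM) -> float:
    """FLOPs per byte of weight traffic: 2 FLOPs per param per token."""
    return 2 * tokens_per_pass / bytes_per_param

def attainable_flops(intensity: float) -> float:
    """Roofline: performance is capped by either compute or bandwidth."""
    return min(PEAK_FLOPS, intensity * BANDWIDTH)

# Intensity at which the compute and bandwidth limits meet (8 FLOP/byte here).
ridge = PEAK_FLOPS / BANDWIDTH

for phase, tokens in [("decode, batch 1", 1), ("prefill, 4096-token prompt", 4096)]:
    ai = arithmetic_intensity(tokens)
    bound = "compute" if ai >= ridge else "bandwidth"
    print(f"{phase}: {ai:.0f} FLOP/byte -> {bound}-bound, "
          f"{attainable_flops(ai) / 1e12:.0f} TFLOP/s attainable")
```

Under these assumptions, single-user decode sits at 2 FLOP/byte, well below the ridge point, so it is limited by how fast weights can be streamed, which is exactly where a high-bandwidth SRAM design helps. A long prompt in prefill sits far above the ridge and is limited by raw compute instead.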
To enable this, NVIDIA is leaning on its Dynamo platform. Dynamo is designed to be flexible both in its parallelism and in how it addresses heterogeneous compute. If a customer has a mix of Rubin, Rubin CPX, and Groq LPX, Dynamo aims to split the workload across that hardware to optimize throughput for the given batch size and parallelism. That’s how we get the graph above. Jensen said on stage that installations could be built with up to 25% Groq chips. They’re set to ship in Q3 or 2H; both were mentioned.
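The core idea of disaggregated serving can be sketched as a router that sends each phase of a request to a different hardware pool. Everything here is hypothetical: the pool names, the `Request` type, the `route` function, and the 800 tokens/s threshold (borrowed from the range in the graph) are my illustrative assumptions, not Dynamo’s actual API or scheduling policy.

```python
# Hypothetical sketch of disaggregated serving: route each phase of a request
# to a different hardware pool. Pool names and the routing rule are
# illustrative assumptions, not Dynamo's actual API or policy.
from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int
    target_tokens_per_s: int   # per-user decode speed the customer pays for

def route(req: Request, pools: set[str]) -> dict[str, str]:
    """Pick a pool for prefill (compute-bound) and decode (bandwidth-bound)."""
    # Prefill: prefer the compute-focused part if the installation has one.
    prefill = "rubin_cpx" if "rubin_cpx" in pools else "rubin"
    # Decode: send only premium, high-speed streams to the SRAM-based LPX.
    if "groq_lpx" in pools and req.target_tokens_per_s >= 800:
        decode = "groq_lpx"
    else:
        decode = "rubin"
    return {"prefill": prefill, "decode": decode}

cluster = {"rubin", "rubin_cpx", "groq_lpx"}
print(route(Request(8000, 1000), cluster))  # {'prefill': 'rubin_cpx', 'decode': 'groq_lpx'}
print(route(Request(8000, 100), cluster))   # {'prefill': 'rubin_cpx', 'decode': 'rubin'}
```

A real scheduler would also weigh batch size, queue depth, and KV-cache transfer costs between pools; this only captures the phase-to-hardware split.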
The question, though, is how many chips are required for the big trillion-parameter models. Is this system meant for them? 128 GB of SRAM across 256 chips isn’t a lot, even at FP8, but perhaps for code generation a 70B-parameter model is enough, even if it’s agentic.
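A crude way to frame that question is to count how many LP30 chips it takes just to hold a model’s weights. This sketch assumes FP8 weights (1 byte per parameter), decimal units, 500 MB per chip, 256 chips per LPX, and reserves nothing for KV cache or activations, so real deployments would need more.

```python
import math

# How many LP30 chips would it take just to hold a model's weights in SRAM?
# Assumptions: FP8 weights (1 byte/param), decimal units, 500 MB per chip,
# 256 chips per LPX, no room reserved for KV cache or activations.
SRAM_PER_CHIP_BYTES = 500e6
CHIPS_PER_LPX = 256

def chips_for_weights(params: float, bytes_per_param: float = 1.0) -> int:
    """Minimum chip count whose combined SRAM can hold the weights."""
    return math.ceil(params * bytes_per_param / SRAM_PER_CHIP_BYTES)

for name, params in [("70B", 70e9), ("1T", 1e12)]:
    chips = chips_for_weights(params)
    systems = math.ceil(chips / CHIPS_PER_LPX)
    print(f"{name}: {chips} chips, {systems} LPX system(s)")
```

By that measure a 70B model (140 chips) fits comfortably in one LPX, while a trillion-parameter model needs around 2,000 chips, or eight LPX systems, before any KV cache is counted.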
Former Groq CEO Jon Ross has a fireside chat later today. We’re not expecting more architecture information, although there’s an outside possibility. Because I wasn’t given a press/analyst pass this year (see here for info), I won’t be able to attend. But I know people who will be in that talk, and I’ll see what they have to say.
Here’s the roadmap. While Jensen did accidentally say Opteron instead of Oberon (hey, it’s a brand name that sticks in the brain), we do see some new information here:
There will be a Groq LP35 with NVFP4 support, and a later LP40 aligned with the Feynman generation.
The post-Vera CPU will be called Rosa.
Feynman will use die stacking and custom HBM.
NVIDIA will be co-designing copper and optical scale-up solutions simultaneously. This includes CPO versions of NVLink in NVLink 8.
Oberon will have an ETL256 version, and Kyber will have an NVL1152 version.
There’s more to come, no doubt, and I’m on the ground today to find out what. Later this week I’ll be at OFC, the optical conference, so stay tuned for coverage from there as well.