AI can stand for many things. Actualized Investments seems appropriate when it comes to NVIDIA, which in the last week hit new market capitalization highs of $3 trillion. I remember giving a talk at a university back in December, when the company went over the $1 trillion mark, and changing my slide each day in the run-up to the event. But that valuation comes on the back of AI – the artificial intelligence wave we're riding right now, and there's one big company selling the tools and eating up a lot of the market. The man at the helm, when he's not on the show floor at Computex signing a variety of things silicon or silicone, is Jensen Huang.
While NVIDIA didn't have an official keynote at Computex 2024 (probably because they had the 2023 lead keynote), there was still an NVIDIA presentation ahead of everyone else's keynotes in the city. This year's announcements were mainly focused on advances in the datacenter and software, with a slight nod to gaming acceleration. It was a well-attended event – you had to be pre-approved to attend – but with the growth of AI and the demand on the supply chain, there is plenty of interest in NVIDIA's business operations as well. To that end, NVIDIA also held a CEO question-and-answer session for the media and analysts in attendance.
As is usual with these Q&A transcriptions, media at the event tend to focus on their own segment of interest. We've lightly edited some of the wording so that it flows, and where known, the name of the media outlet / analyst is provided.
Q: Currently NVIDIA is only certifying SK Hynix HBM for its products. When will Samsung HBM become validated? Are there problems in getting Samsung or Micron HBM working?
A: HBM is important to us because we need high-speed and low-power memory. We’re growing incredibly fast: Hopper, Blackwell, and Grace superchips, and so the amount of HBM needed is significant. We’re working with all three partners, and all three are excellent. SK Hynix, Samsung and Micron - they are supplying us with HBM for our products and we’re working on getting them qualified and adopted into our manufacturing as quickly as possible.
Q: Paul Alcorn, Tom's Hardware - Since Intel Foundry was announced, I've periodically asked you for your thoughts on NVIDIA possibly fabbing with Intel. Have there been any updates on that front?
A: My interest remains.
Q: Elaine Huang, Commonwealth Magazine - You’ve already built an AI supercomputer here in Taiwan. Would you consider building the second AI supercomputer in Taiwan? And where?
A: Yes, and I don’t know.
Q: AI has been used in games for a while now - I'm thinking DLSS and now ACE. Do you think it's possible to apply multimodal AI to generate frames?
A: AI for gaming - we already use it for neural graphics, and we can generate pixels based off of a few input pixels. We also generate frames between frames - not interpolation, but generation. In the future we'll even generate textures and objects, and the objects can be of lower quality and we can make them look better. We'll also generate characters in the games - think of a group of six people: two may be real, and the others may be long-term AIs. The games will be made with AI, they'll have AI inside, and the PC itself will even become AI using G-Assist. You can use the PC as an AI assistant to help you game. GeForce is the biggest gaming brand in the world, we only see it growing, and a lot of it already has AI in some capacity. We can't wait to let more people have it.
Q: Earlier this week, a group of industry giants including AMD and Intel announced they were working on an accelerator interconnect called UALink (Ultra Accelerator Link). They've claimed an open standard is better than a closed one, and NVLink is obviously proprietary to you. What do you think of the open vs proprietary debate, on a cost and performance basis?
A: The world has awakened to the importance of NVLink. NVLink is now on its 5th generation, and NVLink connects at incredible speeds. Inside NVLink is a lot of software and complicated things, not just SERDES. It's connected to our GPUs - they start and end with NVLink, so the software only sees one GPU. After 7 years, people are only just now seeing how important NVLink is. Today they have a proposal, but it'll be some years before they have a fully capable NVLink competitor. Some prefer to buy off the shelf - if it exists and is good enough, then why not? But my feeling is that 'we'll have to see'.
Also, NVLink is two things. It's a switch as well - it looks like any other switch, and multiple switches connect every GPU to each other, with incredible bandwidth. The NVLink SERDES is so incredible - it can drive copper from one GPU to an NVLink switch, to another NVLink switch, to the spine, to another GPU across the whole rack. 72 GPUs can become one GPU, and multiple racks can become one GPU. Seven years later, the world has realised that this is invaluable. So we'll see - by the time their first generation comes, we'll likely be on the 7th or 8th generation of NVLink. We have some good ideas, it's just very complicated.
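To put the '72 GPUs become one GPU' claim into rough numbers, here is a minimal sketch assuming roughly 1.8 TB/s of NVLink bandwidth per Blackwell GPU, as quoted in NVIDIA's public material; that per-GPU figure is an assumption on our part, not something stated in this Q&A.

```python
# Rough illustration of the "72 GPUs become one GPU" claim. The per-GPU
# bandwidth below is an assumed figure from NVIDIA's public Blackwell
# material, not from this Q&A.

GPUS_PER_RACK = 72
NVLINK_BW_PER_GPU_TBS = 1.8   # assumed NVLink bandwidth per GPU, TB/s

aggregate_tbs = GPUS_PER_RACK * NVLINK_BW_PER_GPU_TBS
print(f"Aggregate NVLink bandwidth across a 72-GPU rack: ~{aggregate_tbs:.0f} TB/s")
```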
Q: Adam Patrick Murray, PC World - Are you able to give us a taste of what 50-series RTX consumer GPUs will bring?
A: 50-series is something I will tell you about later. I can’t wait, but I can’t say anything. Not yet.
Q: You have emphasised the AI datacenter many times, but many countries, like here in Taiwan, are worried about power draw. Can you give a suggestion on how we can tackle the energy efficiency issue?
A: I’ve got three answers for you.
Number one: AI uses accelerated computing. Accelerated compute should be the only thing you use if you can accelerate it. Accelerate everything you can, because it saves power. You've heard me say it before, but the more you buy, the more you save.
Number two: Generative AI. It's not about training, it's about inference. The goal is not to train, the goal is to inference, because the amount of power used in inference is significantly lower. The climate and weather simulations done here in Taiwan used 3,000x less power than general purpose compute. Savings like this happen one program after another. The goal is not only to train, but to inference, and the inference power saving is enormous.
Finally, third - and this is my favourite part: AI doesn't care where it goes to school. If the world doesn't have enough power near a population, but it has a lot of excess energy elsewhere, then the energy is in the wrong place. We should install in the places where people don't want to live - set up power plants and datacenters where there isn't a population. You don't need to train a model on the power grid here; you can train the model somewhere else, and move the inference closer to the people.
So, one: accelerate everything. Two: inference is important. Three: AI doesn’t care where it goes to school, train somewhere else.
Q: Monica Chen, Digitimes – Cloud service providers like Meta, Google, and Microsoft are making their own AI chips. How will that affect NVIDIA? Would NVIDIA enter the custom ASIC business?
A: Yes, we would. But NVIDIA is very different – NVIDIA is not an accelerator company, it's an accelerated computing company. NVIDIA's accelerated computing is very versatile, so the utilisation is higher, the usefulness is higher, and the effective cost is lower. People think a smartphone is expensive, but think of all the things it replaced - the effective cost of one device replacing that many is low, and it's the same with NVIDIA accelerated compute.
NVIDIA's architecture is so versatile it's everywhere - public, private, sovereign, etc. Because our reach is so broad, we are the first target for any developer. If you program for CUDA, it runs anywhere; if you program for an accelerator, it only runs there. When cloud customers use NVIDIA, we bring CUDA customers to the cloud, and we're pleased about that.
Q: Max Cherney, Reuters – NVIDIA used to release on a roughly two-year cadence, and recently you went to yearly. First, how? Second, are you planning on implementing a tick-tock-like cadence?
A: We just hired a bunch more engineers, and we created AI engineers. Our chips are designed with a bunch of NVIDIA-created AIs - we're very good at creating AI. We taught them how to create chips, and they're helping us create and iterate faster. Of course, the types of things we build are very different from others', and whilst everyone else is laying people off, we're hiring more than ever. We build CPUs, GPUs, NVLink switches, InfiniBand switches, NICs, SuperNICs – it's a lot of chips, but you need a lot of chips to serve the whole market. If you don't have the chip, you don't have the software - because we make it all, we have all the software that works well together and optimises it. We can demonstrate that our stuff works, instead of just showing a PowerPoint slide. These datacenters are billions of dollars - you need to prove you can build it end-to-end. We move at the pace we move not because someone put it on a PowerPoint slide, because those are easy to make – it's because AI supercomputers are hard, and we're the only one you can buy right now.
Q: With regards to Hopper and Blackwell, you're talking a lot more about value now than you were at the recent launch. Is that because customers are more worried about the cost of Blackwell?
A: Pricing is always based on value. If a product is priced properly, the demand is incredible, and there's no such thing as high demand at the wrong price. Our demand is incredible. Pricing depends on market demand, and our pricing is set correctly. It's not easy. What we build is AI factories, and the way we deliver them is as chips.
In the old days, Microsoft delivered products - Office, Windows, etc. They delivered them on floppy disks. Did Microsoft sell floppy disks? No, they sold the software - the floppy was the delivery mechanism. It's the same for us - we sell AI factories, and the chips are just the delivery mechanism. We've driven down energy consumption by 45,000x in 8 years - Moore's law couldn't have come close to that - and we have driven down the cost of training by 350x. Over 10 years that's likely 1,200x or so, and there's no way Moore's law could do that, not even in his best days. We're driving down energy consumption. Why? So we can train more. The cost to train has dropped so rapidly that we're training ever larger models. By driving down the marginal cost and the energy cost, we are enabling generative AI. If we didn't do that, generative AI would still be a decade away or more - so that's why we do it. We believe in it.
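As a back-of-the-envelope check on the comparison being drawn, the sketch below sets those figures against an idealised Moore's law curve; the two-year doubling period is our assumption for illustration, not something stated in the answer.

```python
# Idealised Moore's-law gain vs the figures quoted above (the two-year
# doubling period is an assumption for illustration).

def moores_law_gain(years: float, doubling_period: float = 2.0) -> float:
    """Idealised density/performance gain over a span of years."""
    return 2 ** (years / doubling_period)

print(f"Moore's law over 8 years:  ~{moores_law_gain(8):.0f}x")   # ~16x
print(f"Moore's law over 10 years: ~{moores_law_gain(10):.0f}x")  # ~32x

# Figures quoted in the answer above:
energy_reduction = 45_000      # claimed energy reduction over 8 years
training_cost_reduction = 350  # claimed reduction in cost to train so far
print(f"Claimed 45,000x energy reduction is ~{energy_reduction / moores_law_gain(8):,.0f}x "
      f"beyond the idealised 8-year Moore's-law gain")
print(f"Claimed training-cost reduction to date: {training_cost_reduction}x")
```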
Q: How do you think the RTX AI PC will compete with others?
A: RTX is the only architecture currently that has every part of the pipeline to train or create AI. If you want to train AI, or if you want to inference AI, or you want to develop AI, if it says RTX it will absolutely work, because CUDA and our tensor cores are there. It just works, just like the cloud - no other computer in the world can say that today. Not one.
We have 100 million RTX GPUs in the world now, and you can't find that anywhere else. When we see the future, we start investing straight away. The install base is what matters in computing, and CUDA has hundreds of millions of compatible servers, so when you develop on CUDA you know it'll work anywhere - on your laptop, or on a server across the world. Software only comes where there is an install base. We've been visionary in seeing this coming, so all the RTX gaming PCs - they now become AI PCs.
Q: Vlad Savov, Bloomberg – What’s the hold up on Samsung and Micron HBM certification? Heat? Energy consumption?
A: Nothing was failing for those reasons. Nothing you’ve read was associated with us. But our work with Samsung is doing just fine, same with Micron. The engineering just isn’t done yet. There’s no story there – it’s just taking time.
Q: Next week, Apple will introduce their on-device AI. What do you think about the target point of the AI area? (This question was hard to understand.)
A: We don’t compete with Apple. NVIDIA is building AI factories. Apple is building amazing devices. AMD, Qualcomm and Intel compete with Apple, but we’re not. We’re building cloud and AI factories.
Q: Tweaktown - With DLSS and frame generation bringing AI to gaming, and it being developed by developers all over the world, how will NVIDIA embrace and engage developers in future gaming services and in creating new gaming worlds?
A: We've created a world-class generative image and texture model that was trained on licensed content. You can use that model, because it was trained on licensed content, to create other content that you can resell. We call that Edify, and Edify is world class. You can also use Edify to create 360-degree generated content. NVIDIA is doing a lot of work on generating content for environments, and generated environments will eventually just be a main part of the game.
Q: (Unknown, speaking Mandarin, right at the end) WHO IS YOUR ENEMY?
A: NVIDIA is so likable, we're so adorable - why would anyone want to attack us? Do you agree? We're so nice and adorable. NVIDIA is a market maker, not a share taker - does that make sense? We're always inventing the future. GeForce was the first graphics card designed for gaming, and from the very beginning NVIDIA was there. All the work with accelerated computing has been decades in the making, same with autonomous driving and generative AI. The company culture and personality is about inventing the future. We're dreamers, we're inventors, we're pioneers - we don't mind failing, or wasting time, we just want to create something new. That's the personality of the company. As you know, we're not just building GPUs, we build systems. But someone had to build the first one, and we did. Someone had to write the software to make this work, and we did it. We're not a GPU company, we're an AI supercomputer company. How do you know what AI computer to build if you don't understand the AI? We had to understand where AI was going before we could create the right computer for it. Whilst CSPs building chips is fine, NVIDIA is still going to be everywhere.
Q: What’s the single biggest risk to NVIDIA right now?
A: There are plenty. Look at all of the electronics we build - a single rack of GB200 has 600,000 parts. No computer has ever had that many. No computer in history has consumed this much power (120 kW per rack). No one has connected this many computers together before. The computers are complicated, the AI is complicated, and we're creating the AI as we're building the computers. The tech is not for the faint of heart - you can't take it lightly. We take it seriously, and that's why we succeed. It's not easy, but it's worth it.
Q: How does NVIDIA plan to address the ecological issues of large scale data centers?
A: This is imperative: you must accelerate every application you can, and reduce energy by 90%. That's number one. Accelerate everything. It makes no sense to use general purpose computing today - why would you? You don't have one general tool for air travel, ocean travel, and street travel - you use specialised tools. CPU scaling ended so long ago - I've been saying it for 10 years. If CPU scaling stopped 10 years ago but data processing kept going up, that's data and power inflation. The most important thing we can do today is to accelerate everything. This is the imperative, and it is provable. It takes a bit of work, but we've done a lot of the hard work for you. Stop using general purpose compute for everything - only use it for what you need to.
Q: What is NVIDIA’s strategy for AI PC this year? Will NVIDIA consider launching an NPU in the future?
A: We started working on the AI PC nine years ago, when we launched RTX. This year we launched digital humans for gaming, and the API is called ACE. You can put digital humans in your game, and you can have a digital human in the computer helping you play the game - that's our G-Assist tool. Everything we worked on for RTX will work for AI now, almost a decade in the making.
Q: Will we see new chips from NVIDIA that support the new Copilot features?
A: Microsoft has announced that RTX architecture products will support the Copilot runtime.
Q: Given the move to a yearly cadence of releases, how do you plan to build an AI factory when they take so long to build?
A: You're going to use it (the server) for 4-5 years anyway, so you just keep on building. Buy a year at a time and just continually build - we're building them as fast as possible. Just look at the world's data centers today - there are $1 trillion worth of data centers built in the world, and by the end of the decade it will grow to $3 trillion. Every year, $750bn will need to be built. So you use Hopper now, then build Blackwell, then Blackwell Ultra, then Rubin, then Rubin Ultra. Don't buy 4-5 years of Hopper. We also need to stop building only general purpose computing - a few is enough, but not more than we need. General purpose computing is old - 60 years is a long time, and it has had its day.
Q: You've spoken about running these AI models on RTX on a PC. How do you feel about running that in a battery-constrained laptop environment? Do you leave that to Apple and Qualcomm? Running these models 24/7 in a power-efficient way is hard when you want to get all-day battery life.
A: RTX does 'race to sleep' - it does its job fast, then goes to sleep. These chips are designed for AI - the tensor cores are 10-20x faster than NPUs. So you race to sleep: once the job is done, you put it to sleep. When the job comes back every 30, 40, or 60 milliseconds, instead of constantly keeping it active, you wake, work, then sleep again. Off is better than low.
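As a rough sketch of why 'off is better than low', the snippet below compares per-period energy for a fast part that races to sleep against a slower part that stays active for most of the period. All power draws and timings are hypothetical values chosen purely to illustrate the arithmetic, not NVIDIA figures.

```python
# "Race to sleep" energy arithmetic with hypothetical numbers (illustration only).

def energy_per_period_mj(active_w: float, active_ms: float,
                         idle_w: float, period_ms: float) -> float:
    """Energy in millijoules over one wake/sleep period (W * ms = mJ)."""
    idle_ms = period_ms - active_ms
    return active_w * active_ms + idle_w * idle_ms

PERIOD_MS = 33.0  # a job arriving roughly every 30-40 ms, as in the answer above

# Fast accelerator: high peak power, finishes quickly, then sleeps near zero.
fast = energy_per_period_mj(active_w=30.0, active_ms=2.0, idle_w=0.3, period_ms=PERIOD_MS)

# Slower unit: lower peak power, but active for most of the period.
slow = energy_per_period_mj(active_w=5.0, active_ms=25.0, idle_w=0.3, period_ms=PERIOD_MS)

print(f"Race-to-sleep: {fast:.1f} mJ per period; mostly-on: {slow:.1f} mJ per period")
# With these assumed numbers, finishing fast and powering off wins per period.
```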
Q: Wayne, CNN: How seriously do you take the growing competition from AI chip makers in China such as Huawei?
A: Very seriously.
Q: What do you think about the future of Generative AI at the edge?
A: Everything at the edge in the future will be software defined, and everything software defined will have generative AI in it. Every building will be an edge AI station. Inside factories and warehouses there are lots of robots - we have Isaac (our robot AI training tool). Every car, everything that moves, all of transportation will be generative AI based - we call it autonomous, and we have Drive. All the instruments in the world, especially medical ultrasound, will in the future be very high resolution. Same with CT, MRI, and PET scans – if we can lower the radiation dose and use generative AI to increase the resolution, it makes them more usable for more people. In order to have AI, you'll need a digital twin to keep everything up to date and getting better over time. Omniverse is where edge AI and robotics AI all learn how to be good AIs. We have a lot of big market opportunities here. It's hard, but worth it.
Q: PC Gamer - All the cool gaming stuff you announced - none of it was in the keynote. Do you still love PC gamers?
A: I love PC gaming, but my keynote was already two hours. I didn't want to torture you guys - it was too long anyway. But I love PC gamers - without the PC gamers we couldn't have built NVIDIA into what it is now.
Q: Can you talk about the BlueField roadmap? BlueField-3 has been in the market for a while - what about the next generation? Should we expect something new this year? Networking is already a bottleneck for accelerated compute, and with you moving to a yearly cadence on the GPUs but networking being at a roughly three-year cadence, how will you keep up?
A: Our Ethernet switch is Spectrum-X, and we use it with BlueField-3. BlueField-3 went into production late last year. BlueField-3 and Spectrum-X will be a multi-billion dollar run-rate business in just one year - that doesn't happen very often. Spectrum-X is 800 Gb/s, and the next is Spectrum-X Ultra. It's also 800 Gb/s, but supports a 512 radix, so you can connect more GPUs. After those is 1600 Gb/s, and after that is 1600 Ultra - on a one-year rhythm. So something is mechanically or electrically different every year.
So we speed up, then scale out – that's our cadence for networking. BlueField-3 makes it possible to bring AI to Ethernet - Ethernet was not designed for AI, it was designed for each of us connecting to hyperscale. BlueField-3 and Spectrum-X are 1.6x better than standard Ethernet. If we're 1.6x faster than Ethernet and you don't choose us, it would be like buying a 2-billion dollar datacenter but only getting 1 billion worth of value. These are big numbers we're working with here, and people have heard me say that Spectrum-X is practically free - the reason is simple: utilisation. It's why we're selling so fast. We're bringing AI to Ethernet, in a multi-billion dollar industry, in one year.
From questioning NVIDIA's employees at the demos: in order to achieve that 1.6x speedup, BlueField-3/Spectrum-X uses an 'optimized' version of the Ethernet spec - i.e., stuff that isn't needed gets thrown out. However, if a customer does need it, they can re-enable the full stack. I've not heard performance numbers for when the full stack is in play.
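The value argument here is essentially utilisation arithmetic. Below is a minimal sketch of that calculation, taking the 1.6x figure as NVIDIA's own claim and the $2B build cost as the hypothetical from the quote above:

```python
# Utilisation arithmetic behind the "1.6x" value claim. The 1.6x figure is
# NVIDIA's claim; the datacenter cost is the hypothetical used in the answer above.

DATACENTER_COST_B = 2.0   # hypothetical $2B datacenter build
SPEEDUP = 1.6             # claimed effective throughput advantage of Spectrum-X

# If a slower fabric delivers only 1/1.6 of the achievable work, the value
# extracted from the same capital outlay drops by the same ratio.
effective_value_b = DATACENTER_COST_B / SPEEDUP
print(f"Effective value on the slower fabric: ~${effective_value_b:.2f}B of a ${DATACENTER_COST_B:.0f}B build")
# ~$1.25B, which the quote rounds down to "1 billion worth of value".
```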
Q: Are you worried about the geopolitical risk of having your R&D and AI in Taiwan, and what are your criteria for wanting to build a new facility?
A: The main thing is a good piece of land, seriously. We build in Taiwan because TSMC is really good – it's not just good, it's incredible. They have an incredible work ethic. Our two companies have been working together for a quarter of a century, and we just work well together. We can build very complicated things at high volume and high speed with TSMC - you can't just do that anywhere. If we could build this anywhere, we'd think about it, but quite frankly it's not really possible. I need to be able to make it at all, and that comes down to TSMC and the ecosystem around them - the entire ecosystem is full of unsung heroes. If you're from Taiwan, be very proud of the ecosystem here. If you're a Taiwanese company, be proud of the ecosystem here. I'm excited for this new start, built on the expertise they've developed over the last 2-3 decades. AI is going to be bigger than all the computer revolutions before it, so I'm very happy about the prosperity and the exciting future ahead.