If the last few months of big AI silicon launches have shown anything, it's that how all of these large, powerful chips are connected together is often as important a topic as the chips themselves. Large machine learning models need lots of bandwidth across a whole machine for training, and inference needs a sufficiently large distributed hardware pool to deal with the size and lumpiness of the workload under demand. At the launch of the NVL72, featuring 120 kW of NVIDIA's latest Blackwell AI hardware, it was remarked that a single system had almost a mile of cabling, most of it for high-speed bandwidth between the hardware in the system. However, there is another way.
Optical connectivity isn't new. We've been using it as an alternative to copper for high-speed, long-distance communications in networking for decades. Within the datacenter, this comes in the form of transceivers and switches. Each port uses an optical transceiver to offer bandwidth at distances that electrical connections over copper cannot provide. Ideally, the optical connection would be in the AI chip itself, and not provided through a SERDES/network controller that adds power and latency – however, the technology used in transceivers is too power hungry and doesn't scale to what's needed, which is where the concept of integrated photonics comes in.
Integrated photonics literally means building the optical connectivity into silicon. One company that's leading the charge in this technology is Ayar Labs. Founded in 2016, the company has developed a chiplet-based solution to enable direct silicon-to-photonics connectivity without the need for electrical or copper cabling. The first generation consists of a SuperNova light source, essentially the power system for the optical connectivity, and a TeraPHY chiplet capable of 4 Tbps TX and RX. Ayar Labs has already shown demos of the connectivity, integrating it into an FPGA, as well as development kits for customers already testing the technology against their own roadmaps.
Today I'm sitting down with Mark Wade, CEO and Co-Founder of Ayar Labs. Mark is, funnily enough, younger than I am, but I won't hold that against him! Mark and the other co-founders were part of an academic team from MIT and UC Berkeley that invented breakthrough optics technology to enable this to happen, and published the first papers showcasing optical communications within a processor. Mark has held the Chief Scientist and CTO positions at Ayar Labs, and the company recently moved into new facilities in San Jose, so I went to pepper him with questions and look at the new facilities after seeing the demos late last year.
The video of the interview can be found here, with the transcription below. The transcription has been cleaned up for readability.
Ian: Who, or what, is Ayar Labs?
Mark: Ayar Labs is building optical I/O for computing. We think of it like this: imagine a compute socket, or some ASIC that's doing compute. It has a problem getting data off of that socket and moving it somewhere else in the system – it could be compute-to-compute, compute-to-memory, compute-to-storage, or compute-to-network. Any high bandwidth data transfer that's happening in the computing fabric – moving that data optically is what we call optical I/O.
Ian: What does optical bring to the table?
Mark: The co-founders and myself, years ago, were studying the fundamental physics of how computing systems are built. Of course, a lot of people focus on cramming more and more transistors onto a single piece of silicon, and increasing the amount of computation you can get on a single piece of silicon. That's gone pretty well over the years. What we recognised is that as compute was scaling with more transistors on each piece of silicon, the ability to move data in and out of that compute chip was becoming supremely challenged. At the root of it are fundamental physics issues, and the inefficiencies that creep in when you're moving high bandwidth with electricity. Optical I/O takes data transfer away from electrons and moves it to photons in the optical domain. We then have much higher bandwidth, especially over long distances. So we view it as solving really fundamental problems happening in data movement with electrical I/O today.
Ian: What workloads have those issues?
Mark: If you go back, when we first started working on this, a lot of these early insights were coming out of the high performance computing community - you know, big machines that the national labs are building. Those large systems were the first to see that they have massive data movement problems that are starting to bottleneck entire system performance. Let's call that the 2010 to 2015 time period. That was the canary in the coal mine that said there's a problem in the underlying computing technologies. What happened after that was the AI workload that started emerging, in early workloads of image recognition, recommendation engines, these kinds of things - but then especially when the transformer model came online and started to enable new AI applications, now what we would call generative AI. But the key thing to realise is that the computing systems that are forming the backbone of these AI systems look like high performance computing architectures. So the same data movement challenges that were happening in high-performance computing a decade ago are now starting to show up in AI systems and bottlenecking the overall system performance.
Ian: Is that a constraint in terms of bandwidth or latency? How does power factor into that equation?
Mark: We view it as multifaceted. Everything you just mentioned hits you at some point - you have to move more bandwidth over longer distances within a power constraint. The power budgets on these systems are not infinite. You have thermal and power density issues at every single level - the chip level, the package level, the system board level, the rack level. So power is a factor at every level. Where latency comes into play is where you have to examine it a bit closer. The way that people move high bandwidth electrically today, with copper and electrical I/O, you tend to do things like add error correction, because you're trying to recover all the inefficiencies and corruption of data that's happening as you move it electrically. In optics you can solve that problem in an elegant way that gets rid of heavy error correction - you get to a much lighter weight architecture for error correction. This affects latency.
Distance becomes a big part here as well. If you're familiar with optics, we have distance regimes that are in kilometres. If you want your internet to be faster at your house, you know that you need to get the fibre line to come to your house. Conceptually it's the same thinking - the same reason you want an optical line coming to your house. Now just propagate that story down even deeper, and it's why you would want optical connections coming straight to the CPU package. So all the intuition applies - shrink it down by another 10x to 100,000x, with much higher performance. So yes, bandwidth, power, latency. All these issues are happening at once and hitting all at the same time. Optical I/O solves those by saying I can move more bandwidth over longer distances with better latency and better power, especially at rack scale, multi-rack scale, and big AI infrastructure systems.
Ian: Why hasn't it been done yet?
Mark: It's a great question. Optics is not a new technology – it was in the 70s when fibre optics really came onto the scene. We started building undersea cables and these kinds of things, eventually connecting the internet. Optics technologies are known.
The need to move data optically straight out of the compute package is really a fairly recent phenomenon that has to do with the rate at which the electrical I/O problem has been getting worse. We have the applications that are demanding even more bandwidth with better power efficiency - that starts to break the back of the existing incumbent electrical I/O based systems. But the challenge has been that you can't just take the technologies and products that people use for more standardised solutions that people are probably familiar with, such as pluggable transceivers using Ethernet. If I'm moving 100 Gbps, 400 Gbps, or 800 Gbps in my datacenter, across the datacenter, those are already optical pluggable transceivers. The problem is if you open up those transceivers and look at what's inside, they don't have the characteristics that scale directly to the compute fabric.
Ian: So it's a size issue?
Mark: It's the size, the number of components, the cost structure as to how all that stuff is put together. It's also the power efficiency, the thermal sensitivities, and there's a whole layer of 'I can't just take my transceiver and put it into a compute package'. So we had to work on inventing a technology from the ground up that had the right low-level characteristics: density, the size of the device, the energy efficiency, and importantly, the ability to be integrated in manufacturing processes that can operate at CMOS scale. We had to intercept how you bring that technology into the package, because this is a truly massive volume kind of application. All these characteristics have been challenges at every single step, and part of what we've worked on for several years now is really solving those steps one at a time.
Transceiver vs Optical Chiplet
Ian: Technologies like GlobalFoundries' 45nm optical process, or the UCIe chiplet standard, didn't exist a few years ago, but they do now. Has that helped?
Mark: If you look at the history of Ayar Labs - we're approaching a late-stage startup phase right now. In the early phases, we focused a lot on establishing key partnerships and relationships with our go-to-market partners. So what does that mean? On the supplier side, among the core manufacturers, you'll find our equity partners and strategic investors, such as GlobalFoundries, Applied Materials, and Intel Capital. We did that very early on because as we're trying to ultimately transform the computing industry to go from electrical I/O to optical I/O, that's a big lift. That's a big heavy lift, and a huge value proposition. A big promise, but a big lift.
So we needed a series of strong industry partnerships and strategic relationships that we could work closely with to make that reality happen. So yes, on the foundry side we have GlobalFoundries, then Intel Capital and Intel for advanced packaging, as well as OSATs, and we work very closely with those teams. With photonics and optical technologies, the crux of getting them to market has always been manufacturing. It is pretty easy to show compelling things on slides - it's much harder to take that compelling vision on the slide and prove it at scale with a manufacturing base that can deliver millions or tens of millions of units per month. That's the scale that the high volume customers in the CMOS world and these high performing systems need. So we focused on saying that we're not a company that just cares about the cool story on the slide - we're a company that cares about establishing deep and thorough relationships to make it to the high volume opportunity.
Ian: You’ve shown off some demos at recent events. What’s the product portfolio today?
Mark: We like to say that we're building and selling an optical I/O solution, and the solution comprises two pieces. It's a laser module, which we call Supernova, and an optical I/O chiplet, which we call TeraPHY.
To have an optical link, you need to produce light somewhere. We do that with our Supernova product. It's a remote laser light source, so it's separate from the optical chiplets that get co-packaged with the SoC, but it provides light into that SoC package and powers up the optical link. You can think of it as an optical power supply.
Then the TeraPHY optical I/O chiplet - it crams a whole bunch of functionality onto a single chiplet, and has massive bandwidth on one chiplet. Then different customers can choose to populate one, two, or up to eight of those chiplets per package. It's a modular unit of really high optical I/O bandwidth. So our solution is Supernova, plus TeraPHY optical fabric, and the firmware/software layer that coordinates and orchestrates this optical I/O solution as well.
Ian: To draw some comparisons with Ethernet today. You mentioned up to 800 Gbps, and the Ultra Ethernet Consortium is talking about 1.6 Tbps. Electrically, the cables can go up to 800 Gbps. We've also had in-system technology like NVSwitch, going up to hundreds of gigabytes per second. Where are you on that journey?
Mark: We're at four terabits per second per chiplet. The way you scale up the bandwidth coming out of a single package is to instantiate multiple chiplets into the design. If a customer has eight chiplets, that'd be 32 terabits per second of aggregate bandwidth (TX plus RX) coming out of a compute package.
If you look at the roadmap, we're driving a doubling of that bandwidth per chiplet every couple of years. We plan to go from 4 Tbps to 8 Tbps, then 16 and 32. That's per chiplet. There are a few vectors that we push on - per-chiplet bandwidth, the ability to instantiate multiple of those per package to scale up the overall package-level bandwidth, and the radix of what can escape that package. Our customers typically focus on how much bandwidth can escape their package, and under what power density constraints. Especially as AI systems are growing, higher bandwidth escape per package is becoming important, as is driving up the radix of connectivity.
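To make that scaling arithmetic concrete, here is a minimal sketch using the figures Mark quotes: 4 Tbps of aggregate (TX plus RX) bandwidth per TeraPHY chiplet, up to eight chiplets per package, and a roadmap that roughly doubles per-chiplet bandwidth each generation. The function names and the Python framing are illustrative, not anything from Ayar Labs.

```python
# Illustrative arithmetic only, based on the figures quoted in the interview.

def package_bandwidth_tbps(per_chiplet_tbps: float, chiplets_per_package: int) -> float:
    """Aggregate optical escape bandwidth (TX + RX) for one compute package."""
    return per_chiplet_tbps * chiplets_per_package

def roadmap_tbps(start_tbps: float = 4.0, generations: int = 4) -> list[float]:
    """Per-chiplet bandwidth doubling each generation: 4 -> 8 -> 16 -> 32 Tbps."""
    return [start_tbps * 2 ** g for g in range(generations)]

print(package_bandwidth_tbps(4.0, 8))  # 32.0 Tbps, matching the eight-chiplet example
print(roadmap_tbps())                  # [4.0, 8.0, 16.0, 32.0]
```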
Ian: Radix in this context just means how many hops you make before you can get to the other end of your system?
Mark: That's right. I would use it as an equivalent to how many ports are on our chiplet, and then how many ports per package given how many chiplets you instantiate. Right now we have eight ports per chiplet. With, let's say, four chiplets per package, you're at a connectivity of 32 ports. You can take all those ports to different places.
Ian: Right now in NVIDIA's DGX boxes, there are eight GPUs that are connected all-to-all. With enough optical chiplets, you can do a 32-way all-to-all or beyond that?
Mark: That's right. These systems are trying to grow in how much bandwidth they're moving between the GPUs or accelerators or different compute nodes, and we can expand to a higher amount of bandwidth and number of ports. With more ports coming straight out of the socket, you can flatten the network that connects all these things up together, and really drive efficiencies and simplicity at the network level without having to get into tiers of switching to expand out the total system size. You're able to increase the domain size of a big AI accelerator system, while only moderately growing the fabric size and the number of levels you have to traverse in that fabric to get to all those endpoints.
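As a rough illustration of why escape radix matters for flattening the fabric, here is a small sketch. The eight ports per TeraPHY chiplet figure comes from the discussion above; treating the topology as a simple full mesh (one port per peer, ignoring the option of bonding several ports to one peer for extra bandwidth) is my own simplification.

```python
# Illustrative simplification: with P optical ports escaping a package, that
# package can link directly to P peers, so a switchless full mesh can span
# up to P + 1 packages. More ports per package -> a larger flat domain
# before any tier of switching is needed.

PORTS_PER_CHIPLET = 8  # figure from the interview; everything else is illustrative

def flat_mesh_size(chiplets_per_package: int) -> int:
    """Largest switchless full mesh: one port per peer, plus the package itself."""
    ports = PORTS_PER_CHIPLET * chiplets_per_package
    return ports + 1

for chiplets in (1, 2, 4, 8):
    ports = PORTS_PER_CHIPLET * chiplets
    print(f"{chiplets} chiplet(s): {ports} ports -> "
          f"full mesh of up to {flat_mesh_size(chiplets)} packages")
```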
Ian: Where do you sit on power? Usually in this ecosystem, we talk about picojoules per bit, but you also have this additional laser on package that obviously has some power attached to it.
Mark: To give you a few numbers. If you look at how people connect up systems using the products on the market today, and the evolution of those products, you're really in the 20 picojoule-per-bit (20 pJ/b) regime. This is while also not meeting the latency requirements or density requirements. Our target landing zone - and we think we have a really compelling energy efficiency - is in the five to seven picojoule-per-bit (5 pJ/b to 7 pJ/b) range, total.
Now, you'll find that in this space, there's a lot of “specsmanship” that people do. [Some people will report they] have a one picojoule-per-bit thing, but forget to account for a whole list of other things that you need. This might include the DSP, the clocking, and how you actually construct a high-speed interconnect system. So we focus on the total cost of ownership power. On a per-chiplet basis, it's already into a regime that is better than anything on the market. But the real benefits show up at the system level. With optical I/O straight from the package, I don't need an expensive motherboard with incredible signal integrity and electrical traces. I don't need retimers, I don't need pluggable transceivers. I might not even need certain entire layers of switching. So if you add up the power that you start saving at the system level, once you really connect everything up optically, it's a huge amount.
Ian: It's the bill of materials of the system, not necessarily the one blade cost.
Mark: That's right. I think there are benefits at the one blade level, but the true benefits start to show up at the multi blade or the rack scale / multi-rack scale kind of level. So that's where there are dramatic differences in power consumption and TCO of those kinds of systems.
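As a back-of-the-envelope check on those numbers, interface power is simply bandwidth multiplied by energy per bit. The 4 Tbps per chiplet, 5-7 pJ/b, and roughly 20 pJ/b figures come from the conversation above; the calculation itself is just illustrative arithmetic.

```python
# Watts = (bits per second) x (joules per bit). Figures from the interview;
# the function and the scenarios below are illustrative only.

def io_power_watts(bandwidth_tbps: float, energy_pj_per_bit: float) -> float:
    """Interface power as bandwidth (bits/s) times energy per bit (J/bit)."""
    return (bandwidth_tbps * 1e12) * (energy_pj_per_bit * 1e-12)

print(io_power_watts(4.0, 5.0))   # 20.0 W per 4 Tbps chiplet at 5 pJ/b
print(io_power_watts(4.0, 7.0))   # 28.0 W per chiplet at 7 pJ/b
print(io_power_watts(4.0, 20.0))  # 80.0 W for an equivalent ~20 pJ/b electrical path
```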
Optical FPGA Demo from Hot Chips 2023
Ian: We had a discussion before this interview, and one of the things that came up was this idea of reconfigurable composability. I’ve spoken to the audience about things like CXL, and companies who are trying to do it over PCIe. But you had a slightly different take on what optical enables in all of this?
Mark: We're big believers in the promise of disaggregation and composability. If you look at the trends that have happened in the computing world, with the electrical I/O problem, people are trying to connect up lots of different pieces in the system with high enough bandwidth. Because there's a big bandwidth-distance trade-off in electrical I/O, it drives you to cram lots of things very close together. Your bandwidth as you get further away is just less. So if you want to go half a meter, you have a certain bandwidth you can support. If you want to go two meters, it's different - if you want to go five meters, it's different. Now you can put band-aids all along the path, such as retimers or fibre cables, but it's painful. You take a performance, density, and cost penalty at every single step.
But to go back to composability - there's been this trend with electrical I/O, because of the bandwidth-distance trade-off, that drives you to cram things as close together as possible, because I need high bandwidth between my compute core and my DRAM. But once you have optical I/O with the right characteristics - the amount of bandwidth and the power efficiency - it breaks the bandwidth-distance trade-off. Compared to electrical I/O, it essentially breaks it completely. You can now go one meter, 10 meters, 100 meters. You can go through the full datacenter at full bandwidth.
Now you'll be bound by latency, right? It's not a claim that you want to put your compute a kilometre away from main memory at five nanoseconds per metre. But what you want to unlock is system composability that's purely speed-of-light latency bound. The size of my system is now no longer limited by this extremely punitive bandwidth distance trade off of electrical communications. It means I've kind of unlocked my ability to ship lots of bandwidth all over the place. Yes - I'm going to have a round trip latency that is limited by the speed of light, but that doesn't seem to be going anywhere anytime soon!
What this means is we're at a place where we can architect systems that are pushing up against the fundamental physics of the universe. That gives a lot of flexibility in talking about having a blade of compute, and a blade of memory, and maybe a blade of storage, especially if the optical I/O is high-performance enough to move all that bandwidth between these different places.
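For a sense of scale on that speed-of-light bound, here is a quick sketch using the roughly five nanoseconds per metre of fibre that Mark mentions; the distances chosen are purely illustrative.

```python
# Rough fibre propagation delay: ~5 ns per metre (light in glass), per the
# figure quoted above. Distances and framing are illustrative only.

NS_PER_METRE = 5.0

def round_trip_ns(distance_m: float) -> float:
    """Propagation-only round trip over a fibre link of the given length."""
    return 2 * distance_m * NS_PER_METRE

for d in (0.5, 2, 10, 100):
    print(f"{d:>5} m -> {round_trip_ns(d):7.1f} ns round trip")

# Even at 100 m - a large slice of a datacenter - the propagation round trip
# is only about a microsecond, which is the floor a composable system hits.
```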
Ian: Where are you with clients today?
Mark: The last few years, we've been pretty heads down. We view that one of the big challenges in photonics is manufacturing at scale. So we've been doing a lot of work with our manufacturing partners to make sure that we've got the core technology on a really strong footing in its manufacturability, as well as its ability to scale.
On the CMOS side, we're a fabless semiconductor company. So that means that we design things, and then the CMOS manufacturer makes them. A wafer comes out, and then the wafer needs to go somewhere for packaging. So we focused both on solving all the problems at the wafer processing level and on making sure we're driving an ecosystem on the back-end, beyond the wafer, that can take it, dice it up, and put it into a package that really looks like an SoC. Then test, validation, and working on the whole supply chain alignment along that path.
Last year was a big year for us, because we started shipping what we would think of as our low volume sampling. We have 10,000+ units coming out of the fab. Now that's low volume compared to the high volume that we want to be at. But the key thing is that it exercises the entire go-to-market high-volume path. That's where we're at right now, and we're shipping these units into several different customers that are getting access to what optical I/O looks like. They're starting to integrate it into a number of proof-of-concepts that they're building, and looking at where and how it's going to intercept future product roadmaps.
We think the core technology has been proven, and its ability to be manufactured on a scalable manufacturing base was proven as of last year. We'll take that base and work on how to ramp it over time and figure out where it's intercepting a number of customer timelines.
Ian: Are those customers in AI?
Mark: We think of this technology as a foundational technology that will ultimately permeate all kinds of application segments. That being said, the one that needs it the fastest with the most urgency is AI.
Ian: Do you speak to the cloud providers? Conversations with hyperscalers too?
Mark: I'd say a number of the hyperscalers have spoken very publicly about what a new generation of optical I/O can do for how they build their systems. The demand is extremely strong, and it solves key problems that the hyperscalers are experiencing, especially with this onslaught of demand coming for AI. So they're facing the problem of scaling the performance of their rack-scale AI systems. The problem of delivering enough power. The problem of optimizing throughput. Hyperscalers look to drive efficiencies in that infrastructure that can start to enable productive economic output from the applications on top of it. Connectivity in this world is viewed as a foundational challenge.
Ian: What does the next few years look like for Ayar Labs?
Mark: Last year was pivotal for us because we started shipping substantial units that exercise the production supply chain. We have a number of customers consuming those, and the intercepts are being planned out for future generation products. So the time period that we're really focused on right now is where people are consuming our samples in meaningful quantities. We're validating and qualifying the technology for high volume, for commercial readiness, and for increased adoption rates. We're expecting things to really kick up in the 2026-2028 timeframe. By 2028, we think that the huge step towards optical I/O connected AI systems will have already happened. Beyond 2028, you're talking about a generation-by-generation roadmap, driving the performance and driving the roadmap for these AI systems. We think this 2026-2028 window is a key window that a number of customers are looking to intercept. It's an important moment in time for the uplift of optical I/O.
Ian: A biased question perhaps, but where do you sit compared to the competition?
Mark: The way that we think about that is it depends on what you mean by competition. If you look at nearest-neighbour technologies, direct competitors, we're clearly far ahead. We are showing that we have parts working, we are producing them, and we are selling them. Our parts are going through commercial grade qualification. We view it more as: our job is to prove that optical I/O can reach commercial readiness, at commercial grade, to displace incumbent technologies in the market.