Companies mentioned in this article: NVIDIA, ASML, TSMC, Synopsys, imec
One of the defining aspects of modern chip design is how well we can print super small structures with light. To do it effectively, we need not only a good light source but also a sufficiently accurate stencil, so that the right image lands on the silicon. In the world of chip design, we call that stencil a mask, and when using light, we call the process lithography. A modern leading-edge processor requires dozens of picometer-accurate masks in the tool chain to stencil the nanometer-wide elements of our chips.
The problem with using light in lithography, however, is that when the features on your mask approach the wavelength of the light you are using, the light starts to diffract, making the printed device 'fuzzy'. More often than not, lithography demands significant accuracy: the valleys and troughs you print into the chemicals to create nanostructures need to be well defined and rigid, not 'fuzzy'.
There are several ways around this. The easiest is to 'simply' use smaller wavelengths of light: ArF is 193 nm, and EUV is 13.5 nm. You can also play with the mirrors inside the machine, enabling better optics or anamorphic projection. Both approaches are very costly, and in the case of smaller wavelengths of light, take decades to prove out. EUV, for example, was first called Soft X-Ray lithography and was envisaged in the 1980s; it was expected to enter production at the 180nm node, rather than the 7nm node where it actually arrived.
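To put rough numbers on why the wavelength matters, the commonly used Rayleigh criterion estimates the smallest printable feature as CD ≈ k1 · λ / NA. The sketch below uses illustrative k1 and numerical aperture values, not the specs of any particular scanner:

```python
# Rough resolution estimate via the Rayleigh criterion: CD ~ k1 * wavelength / NA.
# The k1 and NA values below are illustrative assumptions, not specs of any real tool.

def min_feature_nm(wavelength_nm: float, numerical_aperture: float, k1: float) -> float:
    """Smallest printable feature (critical dimension) in nanometres."""
    return k1 * wavelength_nm / numerical_aperture

# ArF immersion DUV: 193 nm light, NA ~1.35, aggressive k1 ~0.30
print(f"ArF immersion: ~{min_feature_nm(193, 1.35, 0.30):.0f} nm")   # ~43 nm per exposure

# EUV: 13.5 nm light, NA 0.33, k1 ~0.40
print(f"EUV (0.33 NA): ~{min_feature_nm(13.5, 0.33, 0.40):.0f} nm")  # ~16 nm per exposure
```

The point is simply that at 193 nm, resolving sub-50nm features means fighting diffraction head-on, which is where the mask tricks below come in.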
One other way to get better defined features is to build a stencil, a mask, that uses the fuzziness to its advantage. Light, as we should all have learned in high-school physics, can split and interfere with itself, because it is an electromagnetic wave with a given phase angle. This interference can be either constructive, where the amplitudes combine, or destructive, where the two waves cancel each other out. If the mask is constructed with this in mind, you can take the fuzziness into account and still print devices onto the silicon with the desired accuracy.
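A minimal numerical sketch of that interference, purely as a refresher: two equal-amplitude waves with the phase difference as a free parameter, not tied to any lithography tool:

```python
import numpy as np

# Two equal-amplitude waves differing only in phase. Purely illustrative;
# real lithography models track complex electric fields through the full optical system.
x = np.linspace(0, 4 * np.pi, 1000)

def peak_amplitude(phase_difference: float) -> float:
    """Peak amplitude of the sum of two unit-amplitude waves."""
    combined = np.sin(x) + np.sin(x + phase_difference)
    return combined.max()

print(peak_amplitude(0.0))        # ~2.0 : constructive, amplitudes add
print(peak_amplitude(np.pi))      # ~0.0 : destructive, the waves cancel
print(peak_amplitude(np.pi / 2))  # ~1.41 : partial interference in between
```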
Jim Keller, ScaledML Conference 2020
Here is an example of how device printing using photolithography has changed over the years. When the wavelength of light was small compared to the devices we wanted to print, we could simply put the shape of those devices into the mask.
As the devices we wanted to print became smaller, adding 'dog ears' or 'bone ears' to the corners of simple lines helped keep those corners well defined. As time went on, what we had to put into those masks looked less and less like what we actually wanted, but the idea is that the end product is better constructed, better printed, than with the simple shape we started with.
During the lifetime of DUV, or Deep Ultraviolet, such as ArF at 193nm, the industry developed tools to keep printing devices down to the 10nm node (i.e. 26-40nm in reality) by using these techniques. The techniques fell into two major buckets:
OPC, Optical Proximity Correction (either rule-based or model-based), or
ILT, Inverse Lithography Technology
Both are complicated mathematical methods, built on Maxwell's equations, for getting from a simple shape like a plus to a mask that looks like a blurry image yet still prints that plus, only better. ILT is the difficult one because, as the name suggests, it works in inverse, or rather in reverse: it starts with what you want to print, then works backwards to what the mask should look like given the nature of the optics of the system.
The act of calculating what the mask should look like is a whole field known as computational lithography, which is the point of this post. Because the equations governing the optics are very nasty integrals, the computational side is excessively complex and takes weeks or months to run. The more complex a chip, the more masks it needs, and the bigger the chip, the longer this takes. Most mask fabrication companies (usually the foundries themselves) have dedicated datacenters with racks and racks of machines doing this for their customers.
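To make the 'work backwards from the wafer' idea concrete, here is a heavily simplified sketch of inverse-style mask optimization. It treats the optics as a plain Gaussian blur and the resist as a soft threshold, then uses gradient descent to distort a pixelated mask until the simulated print matches a '+' target. Everything in it (the blur kernel, the sigmoid resist model, the grid size, the step size) is an assumption for illustration; production tools solve rigorous optical models derived from Maxwell's equations at vastly larger scale.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

# Target: the '+' we actually want on the wafer, on a 64x64 pixel grid.
N = 64
target = np.zeros((N, N))
target[28:36, 8:56] = 1.0   # horizontal bar of the plus
target[8:56, 28:36] = 1.0   # vertical bar of the plus

SIGMA = 3.0    # Gaussian blur standing in for the optical point spread function (assumption)
STEEP = 20.0   # how sharply the resist 'switches' around the dose threshold (assumption)
THRESH = 0.4   # resist dose threshold (assumption)
LR = 5.0       # hand-tuned gradient descent step size (assumption)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate(mask):
    """Forward model: blur the mask (optics), then soft-threshold the dose (resist)."""
    aerial = gaussian_filter(mask, SIGMA, mode="constant")
    return sigmoid(STEEP * (aerial - THRESH))

# Start the mask as the target shape itself, then distort it so that the
# *printed* image, not the mask, matches the target; this is the 'inverse' in ILT.
params = np.where(target > 0.5, 2.0, -2.0) + 0.01 * rng.standard_normal((N, N))

for step in range(400):
    mask = sigmoid(params)
    printed = simulate(mask)
    error = printed - target
    if step % 100 == 0:
        print(f"step {step:3d}  mean squared print error {np.mean(error ** 2):.5f}")
    # Backpropagate through the resist model and the (self-adjoint) Gaussian blur.
    d_aerial = error * STEEP * printed * (1.0 - printed)
    grad = gaussian_filter(d_aerial, SIGMA, mode="constant") * mask * (1.0 - mask)
    params -= LR * grad

print(f"final     mean squared print error {np.mean((simulate(sigmoid(params)) - target) ** 2):.5f}")
```

Even this toy version spends essentially all of its time in convolutions over the mask grid, which hints at why the real thing needs weeks of CPU time, and why it is such a tempting target for GPUs.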
imec ITF 2022 Roadmap
For example, in today's announcement, NVIDIA CEO Jensen Huang stated that H100, the big Hopper GPU, requires 89 masks, and each one takes two weeks to compute on CPUs. Overall, this adds up to billions of CPU hours worldwide every year, contributing to the capital expenditure of the foundries.
Based on years of work, today NVIDIA is announcing cuLitho, combining CUDA and computational lithography into a GPU-accelerated software library available for foundry use. One of the underlying computational elements of hard math like inverse lithography technology is the convolution, and NVIDIA has been doing a lot of work in machine learning to accelerate convolutions, so it was simply a matter of time before the two were paired.
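Nothing below is from cuLitho itself; it is just an illustration of the flavour of kernel involved. Large 2D convolutions of mask data with optical kernels are usually computed via FFTs, which is exactly the regular, data-parallel arithmetic GPUs are built for. A minimal NumPy sketch with made-up sizes (libraries such as CuPy mirror this same API on the GPU):

```python
import numpy as np

# Made-up sizes: a small mask tile and an optical kernel (both assumptions).
rng = np.random.default_rng(1)
mask_tile = rng.random((512, 512))
kernel = rng.random((64, 64))

def fft_convolve(image: np.ndarray, kern: np.ndarray) -> np.ndarray:
    """Circular 2D convolution via the FFT: O(N^2 log N) work instead of O(N^2 K^2)."""
    padded = np.zeros_like(image)
    padded[: kern.shape[0], : kern.shape[1]] = kern
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

aerial = fft_convolve(mask_tile, kernel)
print(aerial.shape)  # (512, 512)
```

Real optical models are typically decomposed into a sum of many such convolutions, one reason the workload scales so aggressively with chip size and mask count.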
NVIDIA claims that compared to standard CPUs, an H100-based system can increase the throughput of computational lithography by 42x, and the software can also run on several earlier generations of NVIDIA GPUs if needed. In the examples NVIDIA gave us, 500 DGX H100 systems (so 4000* H100 GPUs) can replace the work of 40,000 CPU systems (no word on sockets per system or cores per chip). In their eyes, NVIDIA believes it can turn two weeks of compute into an overnight process. This comes with the claimed benefits of 1/8th the data center space and 1/9th the energy consumed, improving the sustainability and scale-out of foundry data center resources.
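Running the quoted figures back, purely as arithmetic on the numbers above (the only extra input is the 8 GPUs per DGX H100 noted in the correction at the end):

```python
# Arithmetic on the figures quoted above; no other data assumed.
dgx_systems = 500
gpus_per_dgx = 8                      # per the correction note at the end of the article
cpu_systems_replaced = 40_000

total_gpus = dgx_systems * gpus_per_dgx
print(total_gpus)                          # 4000 H100 GPUs
print(cpu_systems_replaced / total_gpus)   # 10.0 CPU systems replaced per GPU
print(cpu_systems_replaced / dgx_systems)  # 80.0 CPU systems replaced per DGX
```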
In the image above on 'how to print a +', we can see the complexity increase over time. When the industry hit EUV, the wavelength shift from 193nm to 13.5nm meant it could start near the beginning again, without heavy computational lithography. However, all the techniques learned in the previous generation can be applied to the new one, and in our discussions with NVIDIA executives about the new cuLitho tool, a lot of those techniques can be applied today to improve yields still further. NVIDIA's spokesperson even went as far as confirming that these tools could reduce the number of masks required, reducing cost and improving yield.
NVIDIA's partners in today's launch are TSMC, ASML, and Synopsys. The people NVIDIA made available to us were engineers rather than the business side, so nothing could be said about other companies, EDA vendors, or foundries using cuLitho in the future; however, NVIDIA did explain that cuLitho should be up and running at TSMC for use from June of this year.
Of course, when speaking about NVIDIA, it's hard not to mention machine learning. Currently cuLitho is purely a mathematical solver, not a machine learning accelerated tool. When asked about adding ML to something like this, the engineering team we spoke to had a gleam in their eyes and essentially said 'that would be the dream'. So not yet, but perhaps in the future.
Today's announcement from NVIDIA came during the CEO keynote at its annual GTC 2023 conference.
*article has been corrected as there are 8 H100s per DGX H100, not 4 as originally (badly) calculated.