New 3D Chip Poised to Break AI Bottleneck

Stanford University

Engineers from Stanford University, Carnegie Mellon University, the University of Pennsylvania, and the Massachusetts Institute of Technology worked with SkyWater Technology, the largest exclusively U.S.-based pure-play semiconductor foundry, to create a new multilayer computer chip. The team says its architecture could mark a major shift in AI hardware and strengthen domestic semiconductor innovation.

Unlike most of today's chips, which are largely flat, or 2D, this prototype is built to rise upward. Ultrathin layers are stacked like floors in a tall building, and vertical wiring works like a bank of fast elevators that move huge amounts of data quickly. With a record-setting number of vertical connections and a tightly woven layout that places memory and computing units close together, the design avoids the slowdowns that have limited progress in flat chips. In hardware tests, the 3D chip outperforms comparable 2D chips several times over, and simulations of taller versions show gains of roughly an order of magnitude.

Researchers have made experimental 3D chips in academic labs before, but the team says this is the first time one has both delivered clear performance improvements and been produced in a commercial foundry. "This opens the door to a new era of chip production and innovation," said Subhasish Mitra, the William E. Ayer Professor in Electrical Engineering and professor of computer science at Stanford University, and principal investigator of the new paper describing the chip, which was presented at the 71st Annual IEEE International Electron Devices Meeting (IEDM). "Breakthroughs like this are how we get to the 1,000-fold hardware performance improvements future AI systems will demand."

Why Flat Chips Struggle With Modern AI

Large AI models such as ChatGPT and Claude constantly shuttle enormous volumes of data between memory, which holds information, and the computing units that process it.

On conventional 2D chips, everything sits on one surface and memory is limited and spread out, so data is forced through a small number of long, crowded paths. The computing parts can run far faster than data can be delivered, and the chip cannot keep enough memory nearby. The result is frequent waiting. Engineers call this problem the "memory wall," where processing speed outruns the chip's ability to feed it data.
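The memory wall can be made concrete with the roofline model, a standard back-of-the-envelope tool that does not come from the paper itself. The sketch below assumes hypothetical numbers for peak compute, memory bandwidth, and operations performed per byte fetched; it shows that when memory cannot deliver data fast enough, most of the chip's compute capacity sits idle.

```python
# A simplified roofline model, used here only to illustrate the "memory wall".
# All hardware numbers below are hypothetical, not measurements of any real chip.

def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float, flops_per_byte: float) -> float:
    """Return attainable throughput under the roofline model.

    Throughput is capped either by the compute units (peak_gflops) or by how
    fast memory can deliver operands (bandwidth * arithmetic intensity).
    """
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


def compute_utilization(peak_gflops: float, bandwidth_gb_s: float, flops_per_byte: float) -> float:
    """Fraction of peak compute actually used; the rest of the time the units wait for data."""
    return attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte) / peak_gflops


if __name__ == "__main__":
    peak = 1000.0     # hypothetical peak compute, GFLOP/s
    intensity = 2.0   # hypothetical FLOPs performed per byte fetched
    for bw in (50.0, 200.0, 500.0):  # hypothetical memory bandwidths, GB/s
        util = compute_utilization(peak, bw, intensity)
        print(f"bandwidth {bw:6.1f} GB/s -> {util:.0%} of peak compute used")
```

With these made-up values the script reports 10%, 40%, and 100% utilization: the compute units only run at full speed once data delivery keeps pace, which is the gap that placing memory closer to compute is meant to close.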

For years, chipmakers pushed back against the memory wall by shrinking transistors, the tiny switches that handle computations and store data, and packing more of them onto each chip. But researchers say that approach is nearing hard physical limits, known as the "miniaturization wall."

The new design aims to get past both limits by building upward. "By integrating memory and computation vertically, we can move a lot more information much quicker, just as the elevator banks in a high-rise let many residents travel between floors at once," said Tathagata Srimani, assistant professor of electrical and computer engineering at Carnegie Mellon University, the paper's senior author, who began the work as a postdoctoral fellow advised by Mitra.

"The memory wall and the miniaturization wall form a deadly combination," said Robert M. Radway, assistant professor of electrical and systems engineering at the University of Pennsylvania and a co-author of the study. "We attacked it head-on by tightly integrating memory and logic and then building upward at extremely high density. It's like the Manhattan of computing -- we can fit more people in less space."

How the Monolithic 3D Chip Is Manufactured

Many earlier 3D chip efforts have taken a simpler route: stacking separately made chips. That can help, but the links between layers are often relatively coarse and limited in number, and they can become bottlenecks.

This team took a different approach. Instead of making separate chips and bonding them together, they built each new layer directly on top of the previous one in a single continuous process flow. This method, known as "monolithic" 3D integration, uses temperatures low enough to avoid damaging the circuitry already built below. That makes it possible to pack layers more tightly and to create far denser connections between them.
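Why tighter integration yields so many more connections is largely geometry: on a regular grid, the number of vertical links per unit area grows with the inverse square of their spacing, or pitch. The sketch below is purely illustrative; the 10 µm and 1 µm pitches are hypothetical values chosen for easy arithmetic, not figures from the paper or from SkyWater's process.

```python
# Illustrative geometry only: how vertical-connection density scales with pitch.
# The pitch values are hypothetical examples, not figures from the paper or SkyWater.

def connections_per_mm2(pitch_um: float) -> float:
    """Connections per square millimetre for a square grid with the given pitch.

    A 1 mm x 1 mm area holds (1000 / pitch_um) connections along each edge,
    so density grows with the inverse square of the pitch.
    """
    per_edge = 1000.0 / pitch_um
    return per_edge * per_edge


if __name__ == "__main__":
    coarse = connections_per_mm2(10.0)  # hypothetical pitch for bonded, separate chips
    fine = connections_per_mm2(1.0)     # hypothetical pitch for layers built in place
    print(f"10 um pitch: {coarse:,.0f} connections per mm^2")
    print(f" 1 um pitch: {fine:,.0f} connections per mm^2")
    print(f"ratio: {fine / coarse:.0f}x")
```

Under these assumptions, shrinking the pitch tenfold yields 100 times as many vertical links in the same area, which is why the spacing between layers matters so much for how much data can move between them at once.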

A key point, the researchers say, is that the entire process was carried out in a domestic commercial silicon foundry. "Turning a cutting-edge academic concept into something a commercial fab can build is an enormous challenge," said co-author Mark Nelson, vice president of technology development operations at SkyWater Technology. "This shows that these advanced architectures aren't just possible in the lab -- they can be produced domestically, at scale, which is what America needs to stay at the forefront of semiconductor innovation."

Performance Gains and What Comes Next for AI Hardware

In early hardware tests, the prototype outperformed comparable 2D chips by about four times. The team's simulations suggest even bigger gains as the design grows taller, with more stacked layers of memory and compute. With additional tiers, the models show up to a twelvefold improvement on real AI workloads, including workloads derived from Meta's open-source LLaMA model.

The researchers also highlight a longer-range payoff. They say the architecture offers a practical route to 100- to 1,000-fold improvements in energy-delay product (EDP), a metric that multiplies the energy a task consumes by the time it takes, so it rewards designs that are both fast and efficient. By shortening how far data needs to travel and adding many more vertical routes for movement, the chip can increase throughput while reducing energy per operation, a combination that has been difficult to achieve with conventional flat designs.
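Because EDP multiplies energy by delay, improvements on the two axes compound. The minimal sketch below uses hypothetical per-task energy and time values, not data from the paper, to show the arithmetic: a design that uses four times less energy and finishes eight times faster improves EDP 32-fold.

```python
# Energy-delay product (EDP) = energy per task x time per task; lower is better.
# The baseline and improved values below are hypothetical, chosen for illustration.

from dataclasses import dataclass


@dataclass
class Run:
    energy_j: float  # energy to finish one task, joules
    delay_s: float   # time to finish the same task, seconds

    @property
    def edp(self) -> float:
        """Energy-delay product in joule-seconds."""
        return self.energy_j * self.delay_s


def edp_improvement(baseline: Run, improved: Run) -> float:
    """How many times lower the improved design's EDP is than the baseline's."""
    return baseline.edp / improved.edp


if __name__ == "__main__":
    flat = Run(energy_j=1.0, delay_s=1.0)        # hypothetical 2D baseline
    stacked = Run(energy_j=0.25, delay_s=0.125)  # hypothetical: 4x less energy, 8x faster
    print(f"EDP improvement: {edp_improvement(flat, stacked):.0f}x")
```

The multiplication is the point: modest gains in both energy and speed can combine into a much larger EDP improvement than either gain alone.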

The team says the work matters for more than speed. By demonstrating that monolithic 3D chips can be made in the United States, they argue, it provides a blueprint for a new period of domestic hardware innovation in which the most advanced chips can be designed and manufactured on U.S. soil.

They also say the shift to vertical, monolithic 3D integration will require a new generation of engineers trained in these methods, similar to how the integrated circuit boom of the 1980s was fueled by students learning chip design and fabrication in U.S. labs. Through collaborations and funding efforts including the Microelectronics Commons California-Pacific-Northwest AI Hardware Hub (Northwest-AI-Hub), students and researchers are already being prepared to push American semiconductor innovation forward.

"Breakthroughs like this are of course about performance," said H.-S. Philip Wong, the Willard R. and Inez Kerr Bell Professor in the Stanford School of Engineering and principal investigator of the Northwest-AI-Hub. "But they're also about capability. If we can build advanced 3D chips, we can innovate faster, respond faster, and shape the future of AI hardware."

This study took place at Stanford University School of Engineering, Carnegie Mellon University College of Engineering, the University of Pennsylvania School of Engineering and Applied Science, and the Massachusetts Institute of Technology, and all fabrications were completed at SkyWater Technology's Bloomington, Minnesota, Foundry. Support came from the Defense Advanced Research Projects Agency, the U.S. National Science Foundation Graduate Research Fellowship Program, Samsung, the Stanford Precourt Institute for Energy, the Stanford SystemX Alliance, the Department of War's Microelectronics Commons AI Hardware Hub, the U.S. Department of Energy, and the National Science Foundation's Future of Semiconductors Program (2425218).

Additional Stanford co-authors include Suhyeong Choi, Samuel Dayo, Andrew Bechdolt, Shengman Li, Dennis T. Rich, and R.H. Yang. Additional authors are from CMU and MIT.
