KAIST Develops AI Semiconductor Brain Combining Transformer's Intelligence And Mamba's Efficiency

Korea Advanced Institute of Science and Technology

<(From left) Ph.D. candidate Seongryong Oh, Ph.D. candidate Yoonsung Kim, Ph.D. candidate Wonung Kim, Ph.D. candidate Yubin Lee, M.S. candidate Jiyong Jung, Professor Jongse Park, Professor Divya Mahajan, Professor Chang Hyun Park>

As recent Artificial Intelligence (AI) models become better at understanding and processing long, complex sentences, the need is growing for new semiconductor technologies that can boost computation speed and memory efficiency at the same time. Against this backdrop, a joint research team of KAIST researchers and international collaborators has developed a core AI semiconductor 'brain' technology based on a hybrid Transformer–Mamba structure and implemented it, for the first time in the world, in a form that performs computation directly inside memory, achieving a roughly four-fold increase in the inference speed of Large Language Models (LLMs) and a 2.2-fold reduction in power consumption.

KAIST (President Kwang Hyung Lee) announced on October 17th that the research team led by Professor Jongse Park of the KAIST School of Computing, in collaboration with the Georgia Institute of Technology in the United States and Uppsala University in Sweden, has developed 'PIMBA,' a core technology based on processing-in-memory (PIM) AI memory semiconductors that acts as the brain for next-generation AI models.

Currently, LLMs such as ChatGPT, GPT-4, Claude, Gemini, and Llama are built on the 'Transformer' brain structure, which looks at all of the words in a sentence simultaneously. As a result, as models grow and the sentences they process become longer, the computational load and memory requirements surge, making slower inference and high energy consumption major issues.

To overcome these limitations of the Transformer, the recently proposed 'Mamba' structure, which keeps a sequential memory (state), processes information over time and improves efficiency. Even so, memory bottlenecks and power consumption remained limiting factors.
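For readers unfamiliar with the two architectures, the following is a minimal NumPy sketch, not the research team's code, of the contrast described in the two paragraphs above: self-attention compares every token with every other token, so its cost grows with the square of the sentence length, while a Mamba-style model folds each new token into a fixed-size state. The mamba_style_step function below is a simplified linear recurrence standing in for the real selective state-space update, and all dimensions are arbitrary.

```python
# Illustrative sketch only: a toy attention layer vs. a toy state-space
# recurrence. Sizes and parameters are made up for demonstration.
import numpy as np

d_model, d_state, seq_len = 64, 16, 512
rng = np.random.default_rng(0)

def attention(tokens):
    """Transformer-style self-attention: every token attends to every other
    token, so work and memory grow with seq_len * seq_len."""
    q = k = v = tokens                       # toy case: shared projections
    scores = q @ k.T / np.sqrt(d_model)      # (seq_len, seq_len) score matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def mamba_style_step(state, x, A, B, C):
    """Mamba-style recurrence (simplified): each new token only updates a
    fixed-size state, so per-token work stays constant as the text grows."""
    state = A * state + np.outer(B, x)       # update the (d_state, d_model) state
    y = C @ state                            # read the output from the state
    return state, y

tokens = rng.standard_normal((seq_len, d_model))
A = 0.9 * np.ones((d_state, d_model))        # toy decay factors
B = rng.standard_normal(d_state)
C = rng.standard_normal(d_state)

out_attn = attention(tokens)                 # touches a 512 x 512 score matrix
state = np.zeros((d_state, d_model))
for x in tokens:                             # touches only a 16 x 64 state
    state, y = mamba_style_step(state, x, A, B, C)
```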

Professor Jongse Park's research team designed 'PIMBA,' a new semiconductor structure that performs computations directly inside the memory, to maximize the performance of the 'Transformer–Mamba hybrid model,' which combines the advantages of both architectures.

While existing GPU-based systems move data out of memory to perform computations, PIMBA performs the calculations inside the memory device itself, without moving the data. This minimizes data-movement time and significantly reduces power consumption.
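To make the data-movement argument concrete, here is a back-of-the-envelope sketch under purely hypothetical parameters; the layer count, state size, and activation sizes are invented for illustration and are not figures from the paper. In GPU-style serving, each layer's cached state crosses the memory interface for every generated token, whereas in PIM-style serving only small inputs and outputs do.

```python
# Back-of-the-envelope sketch of the data-movement argument above.
# All parameters are illustrative assumptions, not figures from the paper.
layers = 48                  # assumed number of hybrid model layers
state_bytes = 4 * 1024**2    # assumed 4 MiB of KV-cache/SSM state per layer

# GPU-style serving: for every generated token, each layer's state is read
# out of memory to the compute units and the updated state is written back.
gpu_traffic_per_token = layers * state_bytes * 2         # read + write

# PIM-style serving (the idea behind PIMBA): the state is updated where it
# is stored, so only small activations cross the memory interface.
io_bytes = 2 * 4096                                      # assumed activations in/out per layer
pim_traffic_per_token = layers * io_bytes

print(f"GPU-style traffic per token: {gpu_traffic_per_token / 1024**2:.1f} MiB")
print(f"PIM-style traffic per token: {pim_traffic_per_token / 1024**2:.3f} MiB")
```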

As a result, PIMBA showed up to a 4.1-fold improvement in processing performance and an average 2.2-fold decrease in energy consumption compared to existing GPU systems.

The research results are scheduled to be presented on October 20th at the '58th International Symposium on Microarchitecture (MICRO 2025),' a globally renowned computer architecture conference being held in Seoul. The work was previously recognized for its excellence with the Gold Prize at the '31st Samsung Humantech Paper Award.'
※Paper Title: Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving, DOI: 10.1145/3725843.3756121

This research was supported by the Institute for Information & Communications Technology Planning & Evaluation (IITP), the AI Semiconductor Graduate School Support Project, and the ICT R&D Program of the Ministry of Science and ICT and the IITP, with assistance from the Electronics and Telecommunications Research Institute (ETRI). The EDA tools were supported by the IC Design Education Center (IDEC).
