KAIST Unveils AI Agents' Hidden Energy Cost

The Korea Advanced Institute of Science and Technology (KAIST)

As the era of AI agents—systems that can reason and act autonomously—begins, the power consumption of data centers is emerging as a critical challenge. A KAIST research team has, for the first time, analyzed the computational cost and energy consumption of AI agents, finding that they can consume up to 136.5 times more energy per query than conventional generative AI. The study shows that competitiveness in the AI era is expanding beyond model performance to include the efficiency of data centers and power infrastructure.

KAIST announced that a research team led by Professor Minsoo Rhu of the School of Electrical Engineering has systematically analyzed, for the first time, how much computation and power AI agents require in real-world service environments.

Applications powered by large language models (LLMs), such as ChatGPT, have rapidly evolved beyond simply answering questions. They are now developing into AI agents: next-generation AI systems that can plan, use external tools such as web search, calculators, and code execution environments, and solve complex tasks by coordinating multiple steps on their own.

Although AI agents are increasingly being adopted in areas such as software development, research, and workplace automation, little has been known about the amount of electricity and operational cost required to run them in practice.

The research team defined AI agents not merely as software programs, but as a new type of workload that must be continuously processed by data-center servers and graphics processing units, or GPUs—high-performance chips used for large-scale AI computation. The team then analyzed the computational load and energy consumption incurred during actual AI agent execution.

The analysis found that AI agents perform far more LLM invocations than conventional chain-of-thought reasoning. Chain-of-thought, or CoT, refers to a method in which an AI model breaks down its reasoning process step by step to reach an answer, while an LLM invocation refers to each computational request made to a language model to generate a new judgment or response.

Because AI agents repeatedly call language models during execution, their response latency also increases significantly. The team found that response time can increase by up to 153.7 times, while GPUs sit idle for as much as 54.5 percent of the total execution time, waiting for external tools to finish their tasks. In other words, as AI systems take on more complex tasks, a new form of inefficiency emerges in which expensive GPUs cannot be fully utilized.

The research team also analyzed the power consumption of AI agents at data-center scale. An AI agent using a 70-billion-parameter LLM—a scale comparable to current commercial AI services—consumed an average of 348.41 watt-hours per query. This is 136.5 times higher than the energy consumed by a conventional generative AI system performing simple question answering.
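The two figures above imply a baseline energy cost for a simple question-answering query, which the article does not state directly. The short Python sketch below back-calculates it, assuming the 136.5x ratio applies to the same per-query averages; the function and variable names are illustrative and do not come from the paper.

```python
# Figures reported in the article.
AGENT_WH_PER_QUERY = 348.41  # average energy per AI agent query (70B-parameter LLM)
ENERGY_RATIO = 136.5         # agent energy relative to simple question answering


def implied_baseline_wh(agent_wh: float, ratio: float) -> float:
    """Back-calculate the baseline energy per query (an inferred value,
    not one quoted in the article)."""
    return agent_wh / ratio


baseline_wh = implied_baseline_wh(AGENT_WH_PER_QUERY, ENERGY_RATIO)
print(f"Implied baseline: {baseline_wh:.2f} Wh per query")  # prints 2.55
```

Under that assumption, a conventional query would use roughly 2.55 watt-hours.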

In addition, the team projected a future scenario in which 13.7 billion AI agent requests are generated per day — a volume equivalent to current Google search traffic. Under this scenario, data-center power demand would reach approximately 198.9 gigawatts, a level far exceeding the scale of AI data centers currently under development (which are in the range of a few gigawatts) and equivalent to roughly half of the average power consumption of the United States.
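The 198.9-gigawatt figure can be reproduced from the numbers quoted in the article. The Python sketch below multiplies the daily request volume by the reported 348.41 watt-hours per query and averages the result over a day. It assumes the load is spread evenly across 24 hours and adds no extra overhead such as cooling; these are simplifying assumptions of this example, not details confirmed by the paper, and the names are illustrative.

```python
# Figures reported in the article.
REQUESTS_PER_DAY = 13.7e9    # projected AI agent requests per day
AGENT_WH_PER_QUERY = 348.41  # average energy per AI agent query, in watt-hours
HOURS_PER_DAY = 24


def average_power_gw(requests_per_day: float, wh_per_request: float) -> float:
    """Average continuous power demand in gigawatts, assuming the daily
    energy is drawn evenly over 24 hours (an assumption of this sketch)."""
    daily_energy_wh = requests_per_day * wh_per_request
    average_power_w = daily_energy_wh / HOURS_PER_DAY
    return average_power_w / 1e9


power_gw = average_power_gw(REQUESTS_PER_DAY, AGENT_WH_PER_QUERY)
print(f"Average power demand: {power_gw:.1f} GW")  # prints 198.9
```

The daily energy works out to about 4,773 gigawatt-hours, which corresponds to the reported average demand when divided by 24 hours.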

This study demonstrates that the focus of competition in the AI era is shifting from "smarter AI" to "more efficient AI." Going forward, it will be essential not only to advance AI models, but also to jointly optimize AI semiconductors, data centers, and power infrastructure through co-design. Such an approach is expected to become a key strategy for reducing the operating cost of AI services and building sustainable AI infrastructure.

"This study is the first to quantitatively show not only how AI is becoming more intelligent, but also how much electricity and cost are required to implement and sustain that intelligence," said Professor Rhu. "As AI agents become widespread, it will become increasingly important to take an integrated co-design approach that optimizes not only AI data-center infrastructure, but also AI agent models and power infrastructure." He added, "Research and investment in this direction will be essential to dramatically reduce the cost for end users to access AI services while building sustainable AI infrastructure."

The study was conducted with Jiin Kim, a Ph.D. student in the KAIST School of Electrical Engineering, as the first author. The paper was presented in February at the 32nd IEEE International Symposium on High-Performance Computer Architecture, or HPCA, one of the most prestigious international conferences in computer system design. The research team has also released the AI agent implementations and benchmarks used in the paper as open source to support follow-up studies by researchers worldwide.

Paper title: "The Cost of Dynamic Reasoning: Demystifying AI Agents and Test-Time Scaling from an AI Infrastructure Perspective"

DOI: 10.1109/HPCA68181.2026.11408569

This research was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) through the SW Starlab program, the K-Cloud Technology Development Program using AI semiconductors, and the Leading Technology Development Program for Advancing AI-Semiconductor-Based Data Centers, as well as by the Samsung Electronics Future Technology Incubation Center.
