Abstract
Humans instinctively walk and run: brisk walking feels effortless, and we naturally adjust our stride and pace without conscious thought. For physical AI robots, however, mastering basic movements does not automatically translate into adaptability in new or unexpected situations. Even a robot trained to run at high speed may struggle with nuanced adjustments, such as modifying leg angles or applying the right amount of force, when faced with a different task, often leading to unstable or halted movement.
Recognizing this challenge, Professor Seungyul Han and his research team from the Graduate School of Artificial Intelligence at UNIST have developed a meta-reinforcement learning technique that enables AI agents to anticipate and prepare for unfamiliar tasks on their own.
They call it Task-Aware Virtual Training (TAVT): an approach that equips AI with the ability to generate and learn from virtual tasks in advance, significantly enhancing its capacity to adapt to unforeseen challenges.
The research utilizes a dual-module system comprising a deep learning-based representation component and a generation module. The representation module assesses the similarities between different tasks, creating a latent space that captures essential features. The generation module then synthesizes new, virtual tasks that mirror core aspects of real-world scenarios. This process effectively allows AI to pre-experience situations it has yet to encounter, boosting its readiness for out-of-distribution (OOD) tasks.
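The dual-module idea described above can be illustrated with a minimal sketch. The class names, the linear encoder, and the interpolation scheme below are illustrative assumptions for exposition, not the paper's actual implementation: a representation module embeds each task (here parameterized by a target velocity) into a latent space, and a generation module synthesizes a virtual task by mixing the latent codes of training tasks, approximating an unseen intermediate task.

```python
import numpy as np

rng = np.random.default_rng(0)

class TaskEncoder:
    """Representation module (illustrative): maps a task parameter,
    e.g. a target running speed, to a latent vector."""
    def __init__(self, latent_dim=4):
        # A simple linear embedding stands in for the paper's
        # deep-learning-based representation module.
        self.W = rng.normal(size=(latent_dim, 1))

    def encode(self, task_param):
        return self.W @ np.array([[task_param]])  # shape (latent_dim, 1)

class VirtualTaskGenerator:
    """Generation module (illustrative): synthesizes a virtual task
    by interpolating the latent codes of two training tasks."""
    def __init__(self, encoder):
        self.encoder = encoder

    def interpolate(self, param_a, param_b, alpha):
        z_a = self.encoder.encode(param_a)
        z_b = self.encoder.encode(param_b)
        return (1 - alpha) * z_a + alpha * z_b

# Training tasks target 1.0 m/s and 2.0 m/s; a virtual task mixed
# between them lands near the held-out intermediate-speed region.
enc = TaskEncoder()
gen = VirtualTaskGenerator(enc)
z_virtual = gen.interpolate(1.0, 2.0, alpha=0.25)
```

An agent trained on such virtual latents has, in effect, pre-experienced the intermediate tasks before meeting them, which is the intuition behind the improved out-of-distribution adaptation reported below.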
Jeongmo Kim, the lead researcher, explains, "Traditional reinforcement learning trains an agent to excel at a specific task, limiting its ability to generalize. While meta-reinforcement learning exposes the agent to multiple tasks, adapting to entirely new, unseen situations remains a challenge," adding, "Our TAVT approach proactively prepares AI for such scenarios."

Figure 1. Schematic overview of the study.
The team tested TAVT across various robotic simulations, including cheetah, ant, and bipedal robots. Notably, in the Cheetah-Vel-OOD experiment, robots using TAVT quickly adapted to previously unseen intermediate target speeds (1.25 and 1.75 m/s), maintaining stable and efficient movement. In contrast, conventionally trained robots often struggled to adjust, resulting in instability or loss of balance.
Professor Han emphasized, "This method significantly improves an AI's ability to generalize across diverse tasks, which is vital for applications like autonomous vehicles, drones, and physical robots operating in unpredictable environments." He further noted, "It paves the way for more flexible, resilient AI systems."
The research has been accepted for presentation at the International Conference on Machine Learning (ICML 2025), which took place in Vancouver, Canada, from July 13 to 19, 2025. Supported by the Ministry of Science and ICT (MSIT), the Institute of Information & Communications Technology Planning & Evaluation (IITP), and various national initiatives, this work underscores a concerted effort to advance AI core technologies and foster innovative solutions for real-world challenges.
Journal Reference
Jeongmo Kim, Yisak Park, Minung Kim, et al., "Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks," Proceedings of the International Conference on Machine Learning (ICML 2025), 2025.