KAIST Unveils Humanoid Robots With Efficient Vision

The Korea Advanced Institute of Science and Technology (KAIST)

From facial recognition on smartphones to humanoid robots, computer vision, which serves as the eyes of artificial intelligence (AI), is now part of daily life. A joint research team from KAIST and international institutions has developed a technology that lets AI see the world more clearly with minimal memory, improving GPU (Graphics Processing Unit) memory efficiency by up to 16 times. The achievement is regarded as a core technology that could accelerate the era of humanoid robots and on-device AI.

KAIST announced on June 17th that a research team led by Professor Changick Kim from the School of Electrical Engineering, through joint research with researchers from MIT and Microsoft in the United States, has developed 'Upsample Anything,' a universal technology that can enhance the visual performance of AI even with limited GPU memory.

Following its acceptance to 'CVPR 2026,' the world's most prestigious conference in the field of artificial intelligence and computer vision, the work received the 'CVPR Compute Gold Star' in recognition of its efficient use of computational resources. It was also selected as the 'Transparency Champion,' ranking first overall in the category of research process transparency and reproducibility. Together, these honors recognize core elements of responsible AI research: research performance, computational resources used, code disclosure, and experimental reproducibility.

Recently, humanoid robots, autonomous driving systems, and AI based on world models (AI models that learn and predict the physical environment and changes of the real world) have been compressing input images into low-resolution features (core information extracted from images by AI) to increase computational speed and reduce memory usage.

However, compression discards important visual information such as small objects, thin structures, and minute defects. Conversely, processing every image at high resolution from the start requires massive GPU memory and computational resources, which makes real-time processing difficult. This trade-off has long remained unresolved for small, mobile devices such as smartphones and robots, which must perceive their surroundings precisely.

To overcome these limitations, the research team developed a training-free upsampling technology (one that requires no additional training on data) that restores low-resolution feature information to high resolution by using the edge and structural information of the input image.

Existing technologies required a separate retraining or complex optimization process to be applied to new environments or data. In contrast, 'Upsample Anything' developed by the research team can find the optimal restoration method using just a single input image, allowing it to be immediately applied to various environments.

In addition, GPU memory usage was significantly reduced by compressing and using only core information instead of storing and processing all visual information at high resolution. On a 224×224 image (approximately 50,000 pixels), a size widely used in AI research, the research team restored visual information close to the original in about 0.4 seconds of computation and improved GPU memory efficiency by up to 16 times.
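The pixel count quoted above, and the reason dense high-resolution features are costly, can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch only: the patch size of 14, the 384 feature channels, and 4-byte floats are hypothetical values chosen for illustration, not figures from the research team, and the result does not reproduce the reported 16-fold efficiency gain.

```python
def feature_map_bytes(height, width, channels, bytes_per_value=4):
    """Memory needed to store one dense feature map of the given shape."""
    return height * width * channels * bytes_per_value

SIDE = 224       # image side length quoted in the article
PATCH = 14       # assumed patch size of the vision backbone (hypothetical)
CHANNELS = 384   # assumed feature dimension (hypothetical)

pixels = SIDE * SIDE                      # 50176, i.e. "approximately 50,000"
grid = SIDE // PATCH                      # 16 cells per side at low resolution

low_res = feature_map_bytes(grid, grid, CHANNELS)
high_res = feature_map_bytes(SIDE, SIDE, CHANNELS)

print(f"pixels: {pixels}")
print(f"low-res features:  {low_res / 2**20:.2f} MiB")
print(f"high-res features: {high_res / 2**20:.2f} MiB")
print(f"ratio: {high_res // low_res}x")
```

Under these assumptions, storing a feature vector for every pixel takes far more memory than storing one per patch, which is why approaches that avoid materializing full-resolution features matter on memory-limited devices.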

This means that artificial intelligence can perceive its surroundings more precisely even with limited computational resources. The technology is therefore expected to be widely used in next-generation AI fields, including small devices such as smartphones, humanoid robots that must accurately identify and manipulate small objects, autonomous driving systems, and on-device AI.

Professor Changick Kim said, "This technology is an algorithm that can significantly increase the visual precision of artificial intelligence with fewer resources, and it is expected to accelerate the commercialization of humanoid robots and on-device AI." He added, "It is even more meaningful because it was recognized at CVPR not only for its performance but also for its computational efficiency and research transparency."

KAIST PhD student Minseok Seo participated in the research as first author. The work was presented on June 7 at 'CVPR 2026.'

※ Paper Title: Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling, DOI:10.48550/arXiv.2511.16301

※ Author Information: Minseok Seo (KAIST, First Author), Mark Hamilton (MIT, Microsoft, Second Author), Changick Kim (KAIST, Corresponding Author)
