Abstract
A groundbreaking artificial intelligence (AI) technology has been developed to enable camera-based autonomous vehicles to perceive their surroundings more accurately. This innovative approach utilizes the geometric concept of the vanishing point, an artistic device that conveys depth and perspective in images.
Professor Kyungdon Joo and his research team in the Graduate School of Artificial Intelligence at UNIST announced the development of VPOcc, a novel AI framework that leverages the vanishing point to mitigate the 2D-3D discrepancy at both pixel and feature levels. This approach addresses the perspective distortion inherent in camera inputs, enabling more precise scene understanding.
Autonomous vehicles and robots recognize their environment primarily through cameras and LIDAR sensors. Cameras are more affordable and lightweight than LIDAR and capture rich color and shape information, but projecting three-dimensional space onto a two-dimensional image introduces significant distortion: objects close to the camera appear large while distant objects shrink, which can lead to errors such as missed detections of faraway objects or overemphasis on nearby regions.
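This size falloff follows directly from the standard pinhole camera model: an object of physical size s at depth Z projects to roughly f·s/Z pixels for focal length f. The short Python sketch below uses illustrative numbers (not values from the study) to show how quickly apparent size shrinks with distance.

```python
# Pinhole projection: apparent size falls off as 1/Z.
# Focal length and object size are illustrative, not values from the study.
f = 700.0   # focal length in pixels
s = 1.8     # object height in meters (roughly a pedestrian)

for Z in [5, 10, 20, 40, 80]:  # depth in meters
    print(f"depth {Z:>2} m -> ~{f * s / Z:5.1f} px tall")

# At 80 m the same pedestrian spans ~16 px versus ~252 px at 5 m,
# which is why distant objects are easy to miss in 2D feature maps.
```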
To address this challenge, the research team designed an AI system that reconstructs scene information based on the vanishing point, a concept established by Renaissance painters to depict depth and perspective, where parallel lines appear to converge at a single point in the distance. Just as humans perceive depth by recognizing vanishing points on a flat canvas, the developed AI model uses this principle to more accurately restore depth and spatial relationships within camera footage.
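As a standard fact of projective geometry (independent of this study), the vanishing point of a family of parallel 3D lines with direction d is simply the image of d under the camera intrinsic matrix K. A minimal NumPy sketch with an illustrative K:

```python
import numpy as np

# Camera intrinsic matrix (illustrative values, not from the paper).
K = np.array([[700.0,   0.0, 640.0],
              [  0.0, 700.0, 360.0],
              [  0.0,   0.0,   1.0]])

def vanishing_point(direction):
    """Pixel location where parallel 3D lines with this direction appear
    to converge (directions parallel to the image plane map to infinity)."""
    v = K @ np.asarray(direction, dtype=float)
    return v[:2] / v[2]  # homogeneous -> pixel coordinates

# Lines running straight ahead (+Z) converge at the principal point.
print(vanishing_point([0.0, 0.0, 1.0]))   # -> [640. 360.]
# Lines angled 10 degrees to the right converge right of center.
print(vanishing_point([np.sin(np.deg2rad(10)), 0.0, np.cos(np.deg2rad(10))]))
```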
Figure 1. Qualitative results on the SemanticKITTI validation set. VPOcc performs better along the road, where the vanishing point is usually located; the boxed regions, which mark distant areas in the images, show that the method also achieves superior performance at range.
The VPOcc model consists of three key modules. The first is VPZoomer, which corrects perspective distortion by warping images based on the vanishing point. The second is VP-guided cross-attention (VPCA), which extracts balanced information from near and far regions through perspective-aware feature aggregation. The third is spatial volume fusion (SVF), which fuses the original and perspective-corrected streams so that each complements the other's strengths and weaknesses.
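The article does not reproduce the paper's exact formulation, but the core idea behind VPZoomer, magnifying image content around the vanishing point before feature extraction, can be sketched as a simple coordinate remapping. The function name, zoom schedule, and image size below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def vp_zoom_remap(h, w, vp, strength=0.5):
    """Build a sampling grid that magnifies the region around the
    vanishing point `vp` = (x, y): output pixels near the VP sample
    source pixels even nearer to it, enlarging distant content.
    `strength` in [0, 1) controls the magnification (illustrative)."""
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    dx, dy = xs - vp[0], ys - vp[1]
    r = np.sqrt(dx**2 + dy**2)
    # Sampling scale shrinks to (1 - strength) at the VP and returns
    # to 1 at the farthest pixel, so the image frame is preserved.
    scale = 1.0 - strength * (1.0 - r / r.max())
    src_x = vp[0] + dx * scale
    src_y = vp[1] + dy * scale
    return src_x, src_y  # usable with e.g. cv2.remap or grid_sample

src_x, src_y = vp_zoom_remap(370, 1220, vp=(610.0, 170.0))
```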
Experimental results demonstrated that VPOcc outperforms existing models across multiple benchmarks in both semantic understanding (measured by mean Intersection over Union, mIoU) and geometric scene completion (IoU). Notably, it more effectively predicts distant objects and distinguishes overlapping entities, crucial capabilities for autonomous driving in complex road environments.
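Both metrics are standard in semantic occupancy prediction: IoU scores how well the predicted occupied voxels overlap the ground-truth geometry, while mIoU averages per-class IoU over the semantic classes. A minimal NumPy sketch (illustrative, not the benchmark's official evaluation code, and assuming class 0 denotes free space):

```python
import numpy as np

def iou(pred, gt, cls):
    """Intersection-over-Union for one class over a voxel grid."""
    p, g = (pred == cls), (gt == cls)
    union = np.logical_or(p, g).sum()
    return np.logical_and(p, g).sum() / union if union else float("nan")

def scores(pred, gt, num_classes, free_cls=0):
    """Scene IoU (occupied vs. free) and mIoU over semantic classes."""
    scene_iou = iou(pred != free_cls, gt != free_cls, True)
    per_class = [iou(pred, gt, c) for c in range(1, num_classes)]
    return scene_iou, np.nanmean(per_class)

# Toy 3D voxel grids with 4 classes (0 = free space).
rng = np.random.default_rng(0)
pred = rng.integers(0, 4, size=(32, 32, 8))
gt = rng.integers(0, 4, size=(32, 32, 8))
print(scores(pred, gt, num_classes=4))
```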
This research was led by first author Junsu Kim, a researcher at UNIST, with contributions from Junhee Lee at UNIST and a team from Carnegie Mellon University in the United States.
Junsu Kim explained, "Integrating human spatial perception into AI allows for a more effective understanding of 3D space. Our focus was to maximize the potential of camera sensors, which are more affordable and lightweight than LIDAR, by addressing their inherent perspective limitations."
Professor Joo added, "The developed technology has broad applications, not only in robotics and autonomous systems but also in augmented reality (AR) mapping and beyond."
The study received the Silver Award at the 31st Samsung Human Tech Paper Award in March and has been accepted for presentation at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025), a leading conference in intelligent robotics. The event will be held in Hangzhou, China, from October 19 to 25.
Journal Reference
Junsu Kim, Junhee Lee, Ukcheol Shin, et al., "VPOcc: Exploiting Vanishing Point for 3D Semantic Occupancy Prediction," Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025.