Faster and more effective scene understanding

Red for people, blue for cars: A new method uses an artificial intelligence (AI) model that enables coherent recognition of visual scenes more quickly and effectively. Source: Abhinav Valada

People, bicycles, cars or road, sky, grass: Which pixels of an image represent distinct foreground persons or objects in front of a self-driving car, and which pixels represent background classes? This task, known as panoptic segmentation, is a fundamental problem with applications in numerous fields such as self-driving cars, robotics, augmented reality and even biomedical image analysis. At the Department of Computer Science at the University of Freiburg, Dr. Abhinav Valada, Assistant Professor for Robot Learning and member of BrainLinks-BrainTools, focuses on this research question. Valada and his team have developed the state-of-the-art “EfficientPS” artificial intelligence (AI) model, which enables coherent recognition of visual scenes more quickly and effectively.
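As a rough illustration of what the output of panoptic segmentation looks like, the sketch below labels a tiny two-row "image" in plain Python. The class lists, the helper name panoptic_label and the sample data are assumptions made for illustration; they are not taken from EfficientPS.

```python
# Hypothetical class lists, for illustration only.
THING_CLASSES = {"person", "car", "bicycle"}  # countable foreground objects
STUFF_CLASSES = {"road", "sky", "grass"}      # background classes


def panoptic_label(semantic, instance):
    """Combine a per-pixel class map with a per-pixel instance-id map.

    Each output pixel is a (class_name, instance_id) pair. Background
    pixels get instance_id None because they are not counted one by one.
    """
    result = []
    for sem_row, inst_row in zip(semantic, instance):
        row = []
        for cls, inst in zip(sem_row, inst_row):
            if cls in THING_CLASSES:
                row.append((cls, inst))
            elif cls in STUFF_CLASSES:
                row.append((cls, None))
            else:
                raise ValueError(f"unknown class: {cls}")
        result.append(row)
    return result


# Toy input: two separate cars on a road under the sky.
semantic = [
    ["sky", "sky", "sky", "sky"],
    ["car", "car", "road", "car"],
]
instance = [
    [0, 0, 0, 0],
    [1, 1, 0, 2],
]
labels = panoptic_label(semantic, instance)
```

In this toy example the two cars share a class but keep different instance ids, while sky and road pixels carry a class only.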

This task is mostly tackled using a machine learning technique known as deep learning, in which artificial neural networks inspired by the human brain learn from large amounts of data, explains the Freiburg researcher. Public benchmarks such as Cityscapes play an important role in measuring the progress of these techniques. “For many years, research teams, for example from Google or Uber, have competed for the top place in these benchmarks,” says Rohit Mohan, a member of Valada’s team. The method of the computer scientists from Freiburg, which was developed to understand urban scenes, has been ranked first in Cityscapes, the most influential leaderboard for scene understanding research in autonomous driving. EfficientPS also consistently sets a new state of the art on other standard benchmark datasets such as KITTI, Mapillary Vistas, and IDD.

On the project website, Valada shows examples of how the team trained different AI models on various datasets. The results are superimposed on the respective input image, where the colors show which object class the model assigns each pixel to. For example, cars are marked in blue, people in red, trees in green, and buildings in gray. In addition, the AI model draws a border around each object that it considers a separate entity. The Freiburg researchers have succeeded in training the model to transfer what it learned from urban scenes in Stuttgart to New York City. Although the AI model did not know what a city in the USA could look like, it was able to accurately recognize scenes of New York City.
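The kind of overlay described here can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the team's visualization code: the RGB values in PALETTE, the white border color, the fixed 50/50 color mix and the function name overlay are all hypothetical.

```python
# Assumed RGB values for the colors named in the article.
PALETTE = {
    "car": (0, 0, 255),           # blue
    "person": (255, 0, 0),        # red
    "tree": (0, 128, 0),          # green
    "building": (128, 128, 128),  # gray
}
BORDER = (255, 255, 255)  # assumed border color (white)


def overlay(image, labels):
    """Tint each pixel with its class color and mark object borders.

    image[y][x] is an (r, g, b) tuple; labels[y][x] is a
    (class_name, instance_id) pair for the same pixel.
    """
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            label = labels[y][x]
            # Border pixel: the right or lower neighbor has another label,
            # i.e. a different class or a different instance.
            is_border = any(
                ny < height and nx < width and labels[ny][nx] != label
                for ny, nx in ((y, x + 1), (y + 1, x))
            )
            if is_border:
                row.append(BORDER)
            else:
                color = PALETTE[label[0]]
                # Fixed 50/50 mix of input pixel and class color.
                row.append(tuple((p + c) // 2 for p, c in zip(image[y][x], color)))
        out.append(row)
    return out


# Toy input: a uniform gray strip showing one car next to one person.
image = [[(100, 100, 100)] * 3]
labels = [[("car", 1), ("car", 1), ("person", 1)]]
result = overlay(image, labels)
```

Because the comparison uses the full (class, instance) pair, two neighboring cars with different instance ids would also be separated by a border.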

Most previous methods that address this problem have large model sizes and are computationally expensive to use in real-world applications such as robotics, which are highly resource constrained, explains Valada: “Our EfficientPS not only achieves state-of-the-art performance, it is also the most computationally efficient and fastest method. This further extends the applications in which EfficientPS can be used.”
