Imagine navigating a city street during rush hour – cars and bikes zipping by, pedestrians hustling down a crowded sidewalk, your eyes adjusting to the shop windows' glare in one moment and a dark underpass the next. Our brain, of course, handles all of this without us being aware of the complex processes going on in that moment. In real time, our eyes and brain create an accurate, three-dimensional representation of a dynamic scene, constantly calculating distances between objects of myriad shapes, sizes and surfaces.
What humans do subconsciously and effortlessly is a tough nut to crack for machines with 3D-sensing systems – think self-driving cars, for example – tasked with measuring real-world scenes filled with objects that reflect light differently. In a paper published in Nature Communications, a research group in the Computational 3D Imaging and Measurement Lab at the University of Arizona now reports clearing a hurdle toward endowing machines with "superhuman 3D vision," according to the lab's director, Florian Willomitzer, an associate professor at the U of A Wyant College of Optical Sciences.
Rather than simply mimicking the capabilities of human 3D vision, however, his team is developing ways to significantly improve 3D sensors so that they capture images at higher resolution and faster speed, while making the image-capture process impervious to challenging conditions such as highly reflective surfaces.
"Humans already have a built-in 3D camera system – the stereo vision of our two eyes," Willomitzer said. "One of our goals is to enable computers and machines to see in 3D better than any human, which is crucial for a multitude of technological challenges, such as reliable navigation of self-driving cars, accurate guidance during robotic surgery or improved sensing capabilities in industrial inspection and biomedical imaging."
On the way to accomplishing those objectives, 3D imaging has to overcome a stubborn problem: Most state-of-the-art 3D sensors are optimized for imaging either "diffuse" (matte) or "specular" (reflective) surfaces. In contrast, real-world scenes feature a wide range of surface reflectivities that fall somewhere between those two extremes. This is where most 3D imagers fail.
"Think of the interior of a car or a living room," Willomitzer said. "Those environments include specular materials, such as mirrors, glass or polished metal finishes, alongside diffuse surfaces, such as walls, fabric and furniture."
The same is true for robotic surgery applications, as a surgical site typically involves glistening fluids and moist tissues as well as diffuse surfaces such as skin. 3D sensing techniques that can measure all those surfaces equally well are extremely difficult to develop.
The team's idea is built on an extension of so-called deflectometry – a well-established technique that measures the shape of specular surfaces by observing how a pattern on a screen is deformed when it is reflected off the surface. To measure highly complex shapes with deflectometry, however, the screens have to be very large to cover a wide angular range of surface orientations, Willomitzer explained. For applications such as inspecting freshly painted car bodies, this has even led to tunnel-like screen assemblies large enough to accommodate the entire car. Such solutions are expensive, not portable, and tend to be limited to specific tasks.
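The geometric idea behind deflectometry can be illustrated with the law of reflection: at a mirror-like surface point, the surface normal bisects the direction toward the camera and the direction toward the screen point seen in the reflection. The following is a minimal sketch of that relationship only, not the researchers' method. The function names and coordinates are hypothetical, and it assumes the 3D position of the surface point is already known.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(x / length for x in v)

def surface_normal(surface_point, camera_pos, screen_point):
    """Estimate the normal at a specular surface point (illustrative only).

    By the law of reflection, the normal is the bisector of the unit
    direction toward the camera and the unit direction toward the
    screen point that the camera sees reflected at this surface point.
    """
    to_camera = normalize(tuple(c - p for c, p in zip(camera_pos, surface_point)))
    to_screen = normalize(tuple(s - p for s, p in zip(screen_point, surface_point)))
    return normalize(tuple(a + b for a, b in zip(to_camera, to_screen)))

# Hypothetical setup: camera and screen point placed symmetrically
# above a flat mirror lying in the x-y plane.
normal = surface_normal((0.0, 0.0, 0.0), (-1.0, 0.0, 1.0), (1.0, 0.0, 1.0))
print(normal)
```

In this symmetric setup the computed normal points straight up along the z-axis, as expected for a flat, horizontal mirror. A steeply tilted surface would reflect a screen point far off to the side, which is why complex shapes call for a screen covering a wide range of angles.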
The solution that Willomitzer's team developed is as simple as it is effective: Instead of requiring large screens, the entire surroundings of the specular objects to be measured are turned into a "virtual screen."
"We can use a laser scanner to capture everything in the room, with whatever is inside, including objects with specular, glossy and matte surfaces, as well as matte walls. We then use our algorithms to separate the diffuse from the specular surfaces and can eventually use all measured diffuse scene parts as a virtual screen for the deflectometry measurement of the specular parts," said the study's first author, Aniket Dashpute, who started the work with Willomitzer at their previous institution, Northwestern University, and who is now a doctoral student at Rice University.
"This effectively allows us to repurpose everything inside that room to into a giant display – essentially everything around you becomes a virtual screen," Willomitzer added.
Rather than relying on a conventional camera, which captures the entire scene frame by frame, the researchers use a so-called neuromorphic event camera, which only captures the important parts of the measurement at very high time resolution. This allows them to capture 3D videos of mixed reflectance scenes with moving objects at high frame rates.
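To give a sense of what capturing only the important parts means, the sketch below simulates a single pixel under a common idealized model of an event camera: the pixel stays silent while brightness is constant and emits an event only when the logarithm of its brightness has changed by a set threshold since its last event. This is an illustration under stated assumptions, not a description of the sensor or processing used in the study. The function name and the threshold value are hypothetical.

```python
import math

def generate_events(intensities, threshold=0.5):
    """Simulate one idealized event-camera pixel (illustrative only).

    intensities: positive brightness samples over time.
    threshold: assumed contrast threshold on log brightness.
    Returns a list of (sample_index, polarity) events, where polarity
    is +1 for brighter and -1 for darker.
    """
    events = []
    reference = math.log(intensities[0])  # log brightness at the last event
    for i, value in enumerate(intensities[1:], start=1):
        delta = math.log(value) - reference
        # A large change can trigger several events at the same sample.
        while abs(delta) >= threshold:
            polarity = 1 if delta > 0 else -1
            events.append((i, polarity))
            reference += polarity * threshold
            delta = math.log(value) - reference
    return events

print(generate_events([1.0, 1.0, 1.0]))   # unchanging pixel: no events
print(generate_events([1.0, 2.0]))        # dim pixel doubles in brightness
print(generate_events([100.0, 200.0]))    # bright pixel doubles in brightness
```

Because the model works on relative rather than absolute brightness changes, a dim pixel and a bright pixel that both double in brightness produce the same events.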
"The event camera can handle vastly different light levels – from very dim to extremely bright," said the paper's second author, Jiazhang Wang, a postdoctoral research associate at the Wyant College of Optical Sciences. "This allows us to measure all object surfaces in a scene with high accuracy, despite their huge variations in surface reflectivity."
Currently, the approach has been demonstrated in a tabletop laboratory setting, but Willomitzer said the technology is scalable to whatever the application demands.
"Scalability is an important requirement for the wide spectrum of 3D imaging applications," he said, "from measuring small, shiny blood vessels during surgery to digitizing entire rooms or buildings."
Co-authors on the paper include James Taylor, a doctoral student at the Wyant College of Optical Sciences; Oliver Cossairt, adjunct associate professor of electrical and computer engineering at Northwestern University; and Ashok Veeraraghavan, professor of electrical and computer engineering and computer science at Rice University.