Osaka, Japan – A single photograph contains a wealth of information, but determining 3D spatial relationships from a 2D scene is no simple task. Researchers have made many attempts to reconstruct both depth and sharp color images from a single snapshot, but most struggle to deliver accurate and reliable output.
In an article recently published in IEEE Transactions on Computational Imaging, researchers from The University of Osaka developed a new approach for depth from defocus, a technique that estimates distances by analyzing blur in an image. By combining a specially designed camera with diffusion-model-based artificial intelligence (AI), the team was able to accurately estimate depth from a single image and reduce the errors that existing methods produce.
Conventional methods for calculating depth often require multiple cameras or images captured under different conditions. In contrast, depth-from-defocus techniques recover depth from a single photograph by exploiting the fact that objects at different distances appear blurred to different degrees. However, accurately interpreting these blur patterns is difficult.
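The link between distance and blur can be illustrated with the textbook thin-lens model. The sketch below is not the researchers' method: it assumes an idealized lens with a plain circular aperture, and the function name and lens parameters (50 mm focal length, 25 mm aperture, focus set at 2 m) are hypothetical values chosen only for illustration.

```python
def blur_circle_diameter(object_distance_m, focus_distance_m,
                         focal_length_m, aperture_diameter_m):
    """Diameter (in metres, on the sensor) of the blur circle for a point
    at object_distance_m, using the idealized thin-lens model."""
    # Magnification of the plane that is in perfect focus.
    magnification = focal_length_m / (focus_distance_m - focal_length_m)
    # Blur grows with the relative offset from the focused plane.
    relative_offset = abs(object_distance_m - focus_distance_m) / object_distance_m
    return aperture_diameter_m * magnification * relative_offset


# Hypothetical lens: 50 mm focal length, 25 mm aperture, focused at 2 m.
FOCAL_LENGTH_M = 0.050
APERTURE_M = 0.025
FOCUS_M = 2.0

for distance_m in (1.0, 1.5, 2.0, 3.0, 4.0):
    blur_m = blur_circle_diameter(distance_m, FOCUS_M, FOCAL_LENGTH_M, APERTURE_M)
    print(f"object at {distance_m:.1f} m -> blur circle {blur_m * 1e3:.3f} mm")
```

In this idealized model, an object at 1.5 m and one at 3 m produce blur circles of the same size, which hints at why the amount of blur alone can be ambiguous and hard to interpret.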
"Traditional reconstruction methods tend to struggle in low-texture regions," says lead author Hodaka Kawachi. "However, with AI techniques, we now have the potential to stabilize the reconstruction."
AI methods have drawbacks of their own. Modern deep-learning approaches can become unreliable when imaging conditions differ from the data used during training. In some cases, AI systems generate plausible-looking but incorrect structures, a phenomenon known as hallucination.
"Our goal was to combine the strengths of modern diffusion-model-based AI with the reliability of physics-based imaging," explains senior author Tomoya Nakamura. "By ensuring that the reconstruction stays consistent with the observed image, we can suppress many of the hallucinations that appear in other methods."
To test their approach, the researchers built a prototype camera equipped with a specially designed coded aperture and evaluated it using both simulated and real-world scenes. Across a wide range of conditions, the new method consistently produced accurate depth maps and high-quality images, while the performance of competing approaches declined.
"The reconstructed images preserved object shapes and fine texture details while remaining faithful to the original measurements," remarks Kawachi. "In contrast, several existing methods produced artifacts or inaccurate depth estimates under the same conditions."
The team believes that the work represents an important step toward practical computational imaging systems that can recover rich spatial information using simple hardware. By combining coded-aperture optics and a novel reconstruction algorithm, the team may unlock new ways of seeing the 3D world.