When we look at the world, our brain doesn't just recognize objects such as "a dog" or "a car"; it also grasps the broader meaning: what's happening, where it's happening, and how everything fits together. But for years, scientists didn't have a good way to measure that rich, complex understanding.
Now, in a new study published today in Nature Machine Intelligence, Université de Montréal associate professor of psychology Ian Charest explains how he and colleagues at the University of Minnesota and at Germany's University of Osnabrück and Freie Universität Berlin used large language models (LLMs) to figure it out.
"By feeding natural scene descriptions into these LLMs - the same kind of AI behind tools like ChatGPT - we created a kind of 'language-based fingerprint' of what a scene means," said Charest, holder of UdeM's Courtois Chair in Fundamental Neurosciences and member of Mila - Quebec AI Institute.
"Remarkably," he said, "these fingerprints closely matched brain activity patterns recorded while people looked at the same scenes in an MRI scanner," things such as a group of children playing or a big city skyline.
"For example," said Charest, using LLMs we can decode in a sentence the visual scene that the person just perceived. We can also predict precisely how the brain would respond to scenes of foods or places or scenes including human faces, using the representations encoded in the LLM."
The researchers went even further: they trained artificial neural networks to take in images and predict these LLM fingerprints, and found that these networks matched brain responses better than many of the most advanced AI vision models available today - and this despite being trained on far less data than those models.
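That image-to-fingerprint step can be sketched in the same hedged spirit: a small network maps image features to the LLM fingerprint, and the predicted fingerprints are then used as an encoding model of (stand-in) brain responses. The architecture, the ridge-regression scoring and the random data below are illustrative assumptions; the paper's actual networks, datasets and evaluation differ.

# Sketch: map image features to LLM fingerprints, then check how well the
# predicted fingerprints explain (stand-in) brain responses via ridge regression.
# All shapes, the MLP readout and the random data are assumptions for illustration.
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_scenes, feat_dim, emb_dim, n_voxels = 500, 512, 384, 100

image_feats = torch.tensor(rng.standard_normal((n_scenes, feat_dim)), dtype=torch.float32)
fingerprints = torch.tensor(rng.standard_normal((n_scenes, emb_dim)), dtype=torch.float32)
brain = rng.standard_normal((n_scenes, n_voxels))  # stand-in fMRI responses

# 1) Train a small readout from image features to LLM fingerprints.
readout = nn.Sequential(nn.Linear(feat_dim, 1024), nn.ReLU(), nn.Linear(1024, emb_dim))
opt = torch.optim.Adam(readout.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(readout(image_feats), fingerprints)
    loss.backward()
    opt.step()

# 2) Use the predicted fingerprints as an encoding model of brain activity.
pred = readout(image_feats).detach().numpy()
X_tr, X_te, y_tr, y_te = train_test_split(pred, brain, test_size=0.2, random_state=0)
enc = Ridge(alpha=1.0).fit(X_tr, y_tr)
y_hat = enc.predict(X_te)

# Score: mean correlation between predicted and measured responses across voxels.
corr = [np.corrcoef(y_hat[:, v], y_te[:, v])[0, 1] for v in range(n_voxels)]
print(f"mean voxel-wise correlation: {np.mean(corr):.3f}")

In the study, comparisons of this kind are what allowed the fingerprint-trained networks to be benchmarked against state-of-the-art vision models.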
The development of these artificial neural networks was supported by machine-learning professor Tim Kietzmann and his team at the University of Osnabrück. The study's first author is Adrien Doerig, a professor at Freie Universität Berlin.
"What we've learned suggests that the human brain may represent complex visual scenes in a way that's surprisingly similar to how modern language models understand text," said Charest, who is continuing his research into the subject.
"Our study," he continued, "opens up new possibilities for decoding thoughts, improving brain-computer interfaces, and building smarter AI systems that 'see' more like we humans do. We could someday imagine better computational models of vision supporting better decisions for self-driving cars.
"These new technologies could also one day help develop visual prostheses for people with significant visual impairments. But ultimately, this is a step forward in understanding how the human brain understands meaning from the visual world."
About this study
"High-level visual representations in the human brain are aligned with large language models," by Adrien Doerig et al., was published Aug. 7, 2025 in Nature Machine Intelligence.