Engineers at the University of California San Diego have developed a new way to train artificial intelligence systems to solve complex problems more reliably, particularly those that require interpreting both text and images. In widely used tests to evaluate mathematical reasoning, AI models trained with this method outperformed others in solving math word problems containing visual elements like charts and diagrams.
The work could give rise to more capable AI tutors that walk students through solutions step by step while checking their logic along the way. It could also provide more reliable automated analysis of business reports, complex charts or scientific papers — and do so with reduced risk of fabricated information or incorrect interpretations.
Researchers presented the new training method at the Conference on Neural Information Processing Systems (NeurIPS) in December 2025.
The method has two main features: it grades how AI models reason through problems instead of only checking whether their final answers are correct, and it assesses the quality of training data so that higher-quality examples carry more weight during learning.
Traditionally, AI models are rewarded only for getting the right answer. "They are graded much like students taking a multiple-choice test," said study senior author Pengtao Xie, professor in the Department of Electrical and Computer Engineering at the UC San Diego Jacobs School of Engineering. "If they select the right answer, they still receive full credit, even if they guessed." In contrast, his team's approach grades the model for showing its work. "It gets rewarded for thinking logically, step by step, rather than just guessing correctly," Xie said. "If it gets the right answer using the wrong logic, it doesn't get rewarded."
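In rough terms, the contrast looks like the Python sketch below. Here, step_reward_model is a hypothetical scorer standing in for a trained process reward model, and the step format and averaging are illustrative assumptions, not the team's actual implementation:

```python
# Illustrative contrast between outcome-based and process-based grading.
# step_reward_model is a hypothetical callable that rates one reasoning
# step between 0 (flawed) and 1 (sound), standing in for a trained
# process reward model (PRM).
from typing import Callable

def outcome_reward(final_answer: str, correct_answer: str) -> float:
    """Traditional grading: full credit for the right answer, even if guessed."""
    return 1.0 if final_answer.strip() == correct_answer.strip() else 0.0

def process_reward(steps: list[str],
                   step_reward_model: Callable[[str], float]) -> float:
    """Process-based grading: score every reasoning step, then aggregate.

    Averaging the step scores means a correct answer reached through
    flawed logic earns little reward.
    """
    if not steps:
        return 0.0
    return sum(step_reward_model(s) for s in steps) / len(steps)
```

Under a scheme like this, a model that lands on the right answer through flawed steps earns little credit, which is the behavior Xie describes.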
This shift from asking, "Did the AI get it right?" to "Did the AI think it through?" could provide the safety net needed for high-stakes applications like medical diagnosis, financial analysis and engineering, he added.
The next challenge was making this style of training work for AI models that must reason over both language and images; so far, it has worked well mainly for text-only models. One obstacle is that multimodal training datasets vary widely in quality. Some contain rich, high-quality information, while others are noisy, overly simplistic or irrelevant. Feeding all of these data to an AI model with equal weight can slow its learning and even make performance worse. "It's like trying to learn calculus when half of your reading list consists of kindergarten coloring books," Xie explained.
To address this challenge, Xie's team designed its system to act as a smart curator of training data. Instead of treating every dataset as equally valuable, it learns to assign different levels of importance to each one. It downplays lower-quality data and focuses on challenging, high-quality examples. The system also evaluates itself on a separate set of problems and uses that feedback to refine how it prioritizes training data.
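A minimal sketch of the idea, assuming one learnable importance weight per training dataset (a "domain"). The function names, the per-domain losses and the multiplicative update below are simplified illustrations of this kind of reweighting scheme, not the team's published code:

```python
import math

# Illustrative sketch of domain reweighting: one learnable weight per
# training dataset ("domain"). Simplified; not the team's published code.

def weighted_training_loss(domain_losses: dict[str, float],
                           domain_weights: dict[str, float]) -> float:
    """Inner objective: each dataset's loss counts in proportion to its weight."""
    return sum(domain_weights[d] * loss for d, loss in domain_losses.items())

def update_domain_weights(domain_weights: dict[str, float],
                          meta_gradients: dict[str, float],
                          lr: float = 0.1) -> dict[str, float]:
    """Outer step: adjust each weight using gradients of the loss on a
    separate, held-out meta set, then renormalize so the weights stay
    positive and sum to one (an exponentiated-gradient-style update)."""
    logits = {d: math.log(w + 1e-12) - lr * meta_gradients[d]
              for d, w in domain_weights.items()}
    z = max(logits.values())                      # for numerical stability
    exp = {d: math.exp(v - z) for d, v in logits.items()}
    total = sum(exp.values())
    return {d: v / total for d, v in exp.items()}
```

The key design point is the feedback loop: updates are driven by performance on the held-out meta set rather than the training data itself, so datasets that help the model generalize gain influence while noisy ones fade.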
"Our system doesn't just learn from everything," Xie said. "It learns what is worth learning from. It emphasizes quality over quantity."
When tested across multiple benchmarks in visual and mathematical reasoning, the team's system consistently outperformed other methods. An AI model trained with this system achieved a top public score of 85.2% on MathVista, a widely used test of visual math reasoning featuring word problems with charts and diagrams. The result was verified by MathVista's organizers.
Xie added that this approach helps democratize AI by enabling smaller models that can run on a personal computer to rival or even outperform larger proprietary models such as Gemini or GPT on difficult math benchmarks. "You don't need a trillion-dollar computing cluster to get state-of-the-art reasoning," he said.
The team is now refining the system by evaluating the quality of individual questions rather than entire datasets. They are also making the training process faster and less computationally demanding.
Full study: "DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning." Study authors include Qi Cao, Ruiyi Wang, Ruiyi Zhang and Sai Ashish Somayajula, all at UC San Diego.
This work was supported by the National Science Foundation (IIS2405974 and IIS2339216) and the National Institutes of Health (R35GM157217).