AI Systems Learn to Recover and Optimize Solutions

If you use consumer AI systems, you have likely experienced something like AI "brain fog": You are well into a conversation when suddenly the AI seems to lose track of the different ideas you have been talking about and how they fit together. That same problem crops up when programmers build "agents," systems that use large language models (LLMs) and symbolic computer programs to solve certain multistep tasks. Once such a program grows large enough that its code makes decisions involving many steps, the LLM almost inevitably makes errors.

A team of scientists led by the alumni-founded company Asari AI, along with researchers from Caltech and MIT, has developed a new framework that enables programmers to make their agents less error prone and quicker to recover from mistakes without rewriting the agents' core logic. The framework, called EnCompass, allows programmers to backtrack easily when an error occurs and to test different search strategies, that is, methods of exploring different possible pathways of execution through their code. The researchers presented EnCompass at the Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS) in San Diego last month.

"If we want to develop AI systems that can tackle the hardest problems facing society, from health care to government to engineering design, we need better tools to organize the core logic of our AI systems," says Yisong Yue, professor of computing and mathematical sciences at Caltech. "The EnCompass framework is an important step in empowering AI agent programmers to maintain clean code organization as the agent logic becomes more complex."

AI agents are built on large neural networks, and their workflows contain a huge number of potential execution pathways. At each reasoning step, additional uncertainty creeps in, allowing small errors to accumulate.

"A mistake made during an early reasoning step might not be immediately disastrous, but such mistakes compound and ultimately lead to failure," says Stephan Zheng (PhD '18), the founder and CEO of Asari AI and Yue's former graduate student.

To ensure the program finds valid solutions, programmers may write code with complicated loops and rules at decision points that dictate when to backtrack and check that the agent is not making errors. But as agents become more complex, this "hard coding" becomes too cumbersome.

"The result is a monolithic system that resists iteration and experimentation, which is exactly what you don't want when building agents with LLMs," Zheng says.

EnCompass allows agents to recover quickly from errors while remaining nimble. Rather than hard-coding logic, programmers mark decision points that they might want to revisit as "branchpoints," and mark values that measure the utility of a given path as "scores." By introducing these simple annotations, EnCompass disentangles an agent's core logic from its search strategy, allowing programmers to easily experiment with different search strategies without any major rewriting of the code.
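In spirit, the separation looks something like the toy sketch below. The `run_agent` and `best_path` names, the exhaustive driver, and the scoring rule are all illustrative assumptions for this article, not EnCompass's actual annotations or API:

```python
# A minimal, self-contained sketch of the annotation idea (NOT the real
# EnCompass API): the agent's core logic stays a plain function, while
# its marked decision points are supplied from outside, so the search
# strategy can be swapped without touching the agent itself.
import itertools

def run_agent(choices):
    """Core agent logic with two marked decision points and one score."""
    style = choices[0]    # decision point 1: pick a translation style
    retries = choices[1]  # decision point 2: pick a retry budget
    # Toy stand-in for "how well did the translated code pass its tests?"
    return (2 if style == "idiomatic" else 1) * retries

def best_path():
    """Search strategy, kept separate from the agent: exhaustively try
    every combination of decision-point options, keep the best path."""
    options = [["literal", "idiomatic"], [1, 2, 3]]
    return max(itertools.product(*options), key=run_agent)

print(best_path())  # -> ('idiomatic', 3)
```

Because the driver, not the agent, decides which combinations to try, replacing `best_path` with a different strategy leaves `run_agent` untouched, which is the separation the annotations are meant to buy.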

Imagine a programmer wants to translate a repository of code from one programming language, such as Java, to another, such as Python. They might use an LLM to translate each Java function, or block of code, into Python, and then to test that the translation was successful.

To test the agent, the programmer would want to look through various paths that the program might take to arrive at its answers and then evaluate the outcomes. The programmer might find that on some runs, the program takes a path where the translation is wrong at point A, while on other runs, it follows a branching path in which the function is incorrect at point C because of the erroneous translation of another function at point B.

"In some rare cases, there's one or maybe two paths through the program where the LLM made the translation correctly at every single step and was successful. Those are the paths that the programmer ideally wants the program to take," says Zhening Li, lead author of the paper about EnCompass and an MIT graduate student who interned at Asari AI under Zheng.

To find the best pathways, the programmer writes code that follows one or more search strategies. The sum of all the pathways could be represented as a branching tree with outputs at nodes along the stepwise branches. The simplest search strategy would be to use an LLM to find the "global best-of-N" solution, which involves running the agent multiple times and choosing the best outcome. Visually, each run would look like a straight path from the top of the tree to the bottom. A more nuanced search strategy would be to look for the "local best-of-N" solution where, at each step in the code, the agent would attempt the step multiple times and choose the best result before moving on and repeating the process at each subsequent step. The chosen path might trace a zigzag on its way down the branching tree.
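The two strategies can be sketched in plain Python on a toy three-step agent. Here `attempt` is a hypothetical, deterministic stand-in for the quality of a sampled LLM output; none of these names come from EnCompass itself:

```python
# Toy comparison of "global best-of-N" vs. "local best-of-N" search.
def attempt(step, sample):
    """Hypothetical quality of trying `step` with sample index `sample`."""
    return (step * 7 + sample * 3) % 10

def global_best_of_n(n_runs=4, n_steps=3):
    """Run the whole agent n_runs times; keep the best complete path.
    Each run commits to one sample index for every step (a straight path)."""
    best_run = max(range(n_runs),
                   key=lambda run: sum(attempt(s, run) for s in range(n_steps)))
    return sum(attempt(s, best_run) for s in range(n_steps))

def local_best_of_n(n_samples=4, n_steps=3):
    """At each step, try n_samples candidates and keep the best before
    moving on (a zigzag path down the tree)."""
    return sum(max(attempt(s, i) for i in range(n_samples))
               for s in range(n_steps))

print(global_best_of_n(), local_best_of_n())  # -> 18 23
```

On this toy scoring function the local strategy outscores the global one, because it is free to pick a different branch at every step rather than committing to one straight run.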

"But implementing these search strategies typically involves baking them into the core logic of the agent," Li says. That means that each time a programmer would like to try a different search strategy, they must go back and rewrite large sections. Furthermore, more sophisticated strategies are cumbersome to implement, and they reduce a code's readability.

"EnCompass lets you try out different search strategies quickly and easily," Li says. "That means you can efficiently find the best performing strategy."

Overall, the researchers found that using EnCompass can decrease the amount of code needed to implement search by 80 percent. In the specific case of using an agent to translate a repository from Java to Python, the best-performing strategy, a beam search that keeps several promising branches alive at each step, required 75 lines of code with EnCompass compared to the 423 lines needed to implement the same search without the new framework. At the same time, the beam search increased the agent's accuracy from 15 percent to 40 percent on several different repositories.
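Beam search itself is a standard algorithm: after each step, keep only the few highest-scoring partial paths and expand those. A minimal generic sketch follows, with a hypothetical toy scoring function standing in for an LLM judging candidate translations; this is not EnCompass's actual implementation:

```python
# Generic beam search over a step-by-step decision tree.
def beam_search(step_options, score, width=2):
    """Expand every surviving partial path by each option at the current
    step, then prune back down to the `width` best partial paths."""
    beams = [([], 0)]  # list of (path so far, total score)
    for step, options in enumerate(step_options):
        expanded = [(path + [opt], total + score(step, opt))
                    for path, total in beams
                    for opt in options]
        expanded.sort(key=lambda b: b[1], reverse=True)
        beams = expanded[:width]  # prune to the beam width
    return beams[0]

def toy_score(step, opt):
    """Hypothetical stand-in for scoring one candidate at one step."""
    return (step * 7 + opt * 3) % 10

path, total = beam_search([[0, 1, 2, 3]] * 3, toy_score, width=2)
print(path, total)  # -> [3, 0, 1] 23
```

The extra bookkeeping of expanding, scoring, and pruning partial paths is exactly the kind of search plumbing that, in this toy form, is easy, but that balloons once real agent state, retries, and LLM calls are woven in; factoring it out of the agent's core logic is the framework's selling point.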

A paper about the work titled "EnCompass: Enhancing Agent Programming with Search Over Program Execution Paths" appears in the NeurIPS proceedings. MIT's Armando Solar-Lezama is Li's graduate advisor and an additional author of the paper. The work was funded by Asari AI, which has a detailed blog post about EnCompass on its site. Yue is chief scientist at Asari AI and has a financial interest in the company.
