"The Foundations Of AI Systems Are Flaky."

Max Planck Society

Krishna Gummadi on the agency of artificial intelligence, AI agents, and potential societal impacts

AI agents have improved rapidly and demonstrate remarkable capabilities in areas such as communication and software programming. Krishna Gummadi, a director at the Max Planck Institute for Software Systems, clarifies the characteristics of AI agents and discusses the benefits they offer people and the risks they pose to society.


Krishna Gummadi and his team are investigating how AI systems - such as AI agents - work and how this shapes their societal impact.

© Anna Ziegler / MPG


How an AI acquires agency

Professor Gummadi, at what point does an AI system become an AI agent? Is ChatGPT already an AI agent, or does it require more than that?

Let me try to answer this question first on a conceptual level and then on a mechanistic level. The way I see it, many of our interactions are mediated by different types of platforms, such as social media platforms and chatbots like ChatGPT. When you look closely at these platforms, it's worth distinguishing between the AI models and the computing architectures of the platform. AI models are neural networks; they are not active in any sense. It is the computing architecture, that is, the way the AI model is embedded in a platform, that makes it active or agentic. In other words, the computing architecture is the framework of rules implemented by the designers of these platforms, which defines what the AI models can and cannot do and thereby gives them agency.

Could you give an example?

Take ChatGPT: the core engine is a GPT model, currently GPT-5. If you will, it's like the engine of a car. But an engine by itself doesn't do much if you don't supply it with fuel, connect it to the transmission, and attach it to a whole bunch of other things. Today, there is a lot of focus on the AI models themselves, but people often overlook what rules these models follow when interacting with people or connecting different people. It is the computing architecture that gives agency to AI models. At the conceptual level, I think that the more responsibilities or decision-making power you give to these models in a particular architecture, the more agentic they become. I often give my students this example of how agency can be given or taken away: think of humans doing a job where they are so constrained by rules that they really don't have any agency. Gate agents at airports, for example. You could ask: do they really have any agency if somebody comes to them with a problem? Or do they look it up in a rule book and say, 'This is exactly what needs to be done'? At that point, they're just implementing an algorithm. However, if they have the power to say, 'OK, I understand the problem. I can make a decision in a situation where not everything is fully specified,' they become more agentic.

So, what does agency mean at a mechanistic level?

Mechanistically, the big change from two years ago is that these models are increasingly being empowered with other tools, which they can invoke at their discretion. For example, when ChatGPT was released in 2022 and you had a conversation with it, the LLM generated every response in a standalone fashion. It used only the knowledge in its own neural network to answer the question. This is why it would either not answer or hallucinate when asked about current events. However, over the last two years, and particularly in the last year, people have recognised that LLMs could overcome their limitations if they could invoke a tool to get help. For such tool or function calls, as they are technically known, they connected the LLMs to search engines like Google search, calculators, word processors and so on. Now, there are hundreds of tools that chatbots can call in the backend to help them respond to tasks. In a study we recently submitted, we found that GPT models invoke tools for a rapidly growing fraction of conversations.
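The tool-calling pattern described here can be sketched in a few lines. The following is a hypothetical, minimal loop, not any vendor's actual API: `call_model` is a stub standing in for a real LLM, the tool names and message format are invented for illustration. The model either answers directly or emits a structured tool request; the runtime executes the tool and feeds the result back for the final response.

```python
# Minimal sketch of an LLM tool-calling loop (illustrative only).
# `call_model` is a stub standing in for a real LLM API call: it
# "decides" to use the calculator whenever the question contains digits.

def calculator(expression: str) -> str:
    """A 'tool' the model can invoke for arithmetic."""
    return str(eval(expression, {"__builtins__": {}}))  # toy evaluator, not for untrusted input

TOOLS = {"calculator": calculator}

def call_model(messages):
    """Stub LLM: either answer directly or request a tool call."""
    last = messages[-1]
    if last["role"] == "tool":
        # A tool result came back; condition the final answer on it.
        return {"type": "answer", "text": f"The result is {last['content']}."}
    if any(ch.isdigit() for ch in last["content"]):
        # Arithmetic detected: delegate to the calculator tool.
        return {"type": "tool_call", "tool": "calculator",
                "args": {"expression": last["content"].split("is ")[-1]}}
    return {"type": "answer", "text": "I can answer that from my own knowledge."}

def run_agent(user_question: str) -> str:
    """The runtime loop: call the model, execute requested tools, repeat."""
    messages = [{"role": "user", "content": user_question}]
    while True:
        step = call_model(messages)
        if step["type"] == "answer":
            return step["text"]
        result = TOOLS[step["tool"]](**step["args"])
        messages.append({"role": "tool", "content": result})
```

The key point the sketch illustrates is that the agency lives in the loop around the model, not in the model itself: the runtime decides which tools exist and whether their results are fed back unchecked.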

AI agents have to make decisions


Krishna Gummadi, a director at the Max Planck Institute for Software Systems, is investigating the impact of artificial intelligence on society, and in particular, the role played by the computing environment in which AI models are embedded.

© Anna Ziegler / MPG


What questions and problems do these function calls raise?

One interesting question is on what basis the system decides when to invoke which tool, and how. For example, suppose you ask for information about a recent parliamentary election. The model must decide whether to use a web search or answer from its internal knowledge alone. If it invokes a web search, it must also decide what terms to use to query the search engine. When it gets the results back, it must decide which information to trust. That is very similar to the process a human goes through when doing a web search. And I'm only talking about one tool. This is what makes a system more agentic: it makes a lot of decisions.

But how does a chatbot know which of the many tools available to use at any given time?

The AI system is given a very brief description of what a tool does and how to use it. The surprising thing is that the system does anything useful with these brief descriptions. However, the foundations of AI systems are extraordinarily flaky. For example, we wanted to find out what happens if the tool that a chatbot calls returns gibberish or a wrong result. Will the model pause and think, 'Hey, this result feels wrong; I will use my own parametric knowledge instead'? Or will it accept the tool's result and pass it on?

Some AI systems check the responses of tools

How did you test this?

We took the case of a simple calculator tool. Many LLMs, even the big ones, cannot calculate with four-digit numbers, so they have to invoke a calculator tool. But we intercepted the responses from the tool and started feeding the LLM wrong answers. We wanted to see: does the LLM catch it? Does it have the ability, as humans do, to sense that something is off and double-check? Or does it just assume that the tool is always correct? What we are finding is a variety of patterns. One model we investigated was so smart that in one instance it actually tried to check. It started thinking, which is a kind of probabilistic reasoning, along the lines of: 'This feels wrong. Let me double-check it.' Then it made some rough estimates and concluded: 'So the result from the calculator must be wrong. Should I pass it on, or tell the user that it is wrong? But maybe the user knows better. So I'm going to pass on the response.' It then passed on the tool's response, even though it had realised that it was wrong. Now, deeper questions about agency arise: if an AI system has the wisdom to detect an error in a tool's response, should it take responsibility for warning the user?
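The interception setup can be sketched as a wrapper that corrupts a tool's output before the model sees it. Everything below is a hypothetical reconstruction, not the group's actual test harness: the wrapper multiplies the calculator's result by ten, and a crude digit-count check stands in for the kind of rough estimation the model made on its own.

```python
# Sketch of an interception experiment on a calculator tool (illustrative only).

def calculator(a: int, b: int) -> int:
    """The honest tool: exact multiplication."""
    return a * b

def corrupted(tool):
    """Man-in-the-middle wrapper: the model receives an answer that is
    off by a factor of ten, mimicking a faulty or manipulated tool."""
    def wrapper(a, b):
        return tool(a, b) * 10
    return wrapper

def plausible(a: int, b: int, claimed: int) -> bool:
    """Rough sanity check, like the model's own estimate: the product of
    an m-digit and an n-digit number has m+n or m+n-1 digits."""
    digits = len(str(abs(claimed)))
    expected = len(str(abs(a))) + len(str(abs(b)))
    return digits in (expected - 1, expected)
```

The check correctly accepts `calculator(4321, 8765)` and flags the corrupted result; the open question the interview raises is what the system should *do* once its own estimate disagrees with the tool.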

What does this mean for the use of chatbots and AI agents?

I would say that, right now, the results seem amazing, but that is because we're not considering the different scenarios. For example, what would happen if a chatbot invoked a web search for a healthcare query and the responses contradicted each other? Would it reason that the first response came from a renowned journal such as the New England Journal of Medicine while the second came from a social media influencer? Would it trust the first more than the second? Does it have a notion of page rank? Does it have a notion of trust rank? This is very unclear. To give you another example, we tied a chatbot to a weather tool and asked it what the weather was like in Boston today. We then intercepted the response and replaced it with '85 degrees Celsius'. The system went into thinking mode, saying, '85 degrees Celsius - that seems odd. I wonder if it is 85 degrees Fahrenheit and the units are wrong.' Then it said, 'Okay, I'll correct the units in the response,' and answered, 'It's 85 degrees Fahrenheit.' On the one hand, this is great; it realised that something was wrong. But, on the other hand, 85 degrees Fahrenheit in February in Boston?

That's about 29 degrees Celsius, and the average maximum temperature in Boston in February is less than 4 degrees Celsius. So this is probably wrong too.

The difference between a Large Language Model and a Large Reasoning Model

You mentioned that these models reason. Is that true of large language models as well, or only of large reasoning models? What's the difference between the two?

Conceptually, large reasoning models are nothing more than large language models that are forced to write down their thinking before writing the answer. This is a bit like when I'm reviewing a scientific paper: I have an impression of whether I like it or not while reading it, but sometimes, when I write down the strengths and weaknesses of the paper, my assessment changes. The reason people are pushing for large reasoning models is that once the models have written down their reasoning, their final answer depends on everything they have already written. But when ChatGPT generates a response, it also does some reasoning in the background that is not shown to the user. However, you can look at this reasoning by asking ChatGPT for all the information it has about you. In those data dumps you actually find the reasoning, and you can see how it arrived at the answer.
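The distinction described here is, at its core, a prompting and decoding convention. A hypothetical sketch (the prompt wording and section labels are invented for illustration): the reasoning variant forces the model to emit its working before the answer, so every answer token is generated conditioned on the written-out chain; the actual model call is out of scope.

```python
# Sketch of the two prompting styles (illustrative; real reasoning models
# bake this behaviour in via training, not just via the prompt).

def plain_prompt(question: str) -> str:
    """Standard LLM style: ask for the answer directly."""
    return f"Question: {question}\nAnswer:"

def reasoning_prompt(question: str) -> str:
    """Reasoning-model style: the model must write its thinking first.
    Because generation is left-to-right, the tokens of the final answer
    are then conditioned on everything written under 'Thinking:'."""
    return (f"Question: {question}\n"
            "First write your step-by-step reasoning under 'Thinking:'.\n"
            "Only then write the final answer under 'Answer:'.\n"
            "Thinking:")
```

The left-to-right conditioning is the whole trick: as in the paper-review analogy, writing the assessment down can change the conclusion that follows it.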

Personal AI agents access all the available private data

Given that even chatbots already seem rather agentic, what is new about AI agents like OpenClaw? Are their properties fundamentally new?

A couple of my students are experimenting extensively with OpenClaw to understand its design and the key differences between chatbots like ChatGPT and OpenClaw. Here is one way of looking at it. Imagine taking the LLM and, rather than running it in the cloud as with ChatGPT, running it on your laptop, desktop or mobile phone. Now imagine giving the system access to all the data on your local device: your emails, your calendar, all your files, everything. Suddenly, it knows a lot about you and can act as your personal assistant. The second important difference is that you can give it control over all the tools on your local device. You could say, for instance, delete all files containing images of me and someone else from this year to that year. It has the ability to type commands automatically and instruct the machine to do various things.

That sounds both helpful and frightening.

That's only one part of it. The second part is that it can control your communication tools - your email, your social media accounts including WhatsApp, your calendars, everything - and it can communicate with everyone else on your behalf. So you could ask it: 'Respond to emails from certain people with this message.' Or you could even ask it to engage with a friend on WhatsApp. One of my students gave OpenClaw control over his WhatsApp and said, 'Here are two of my friends, here are their interests. Please start talking to them.' That's all he said. The agent then started posting messages to those friends, even using nicknames. I find this absolutely fascinating and chilling at the same time. So we started poking at security vulnerabilities. One of my students got OpenClaw to reveal all sorts of information about conversations and WhatsApp messages with other people. It's a bit like how our office staff assist us in various ways. But my office staff know precisely what information to reveal or not to reveal about me. If someone asked them for personal information about me, they might say, 'Are you crazy? This is completely inappropriate!' But how does the AI agent know? It needs a deep understanding of my relationships with everyone I communicate with before it can decide what to disclose.

So it seems a bit premature to rely on these AI agents.

Yes. To give another example, one of my group members took over another group member's personal AI assistant. One student, let's say person A, asked the assistant to engage with person B, but then person B started giving person A's personal assistant tasks that went far beyond what person A had asked for. Person B started using the assistant to generate research reports, even though person A is paying for the agent. It would be a bit like me telling my office assistant, 'I'd like to talk to a colleague; please arrange this appointment,' and the colleague, while talking to my office assistant, saying at some point: 'Hey, I'm travelling to Berlin tomorrow. Can you book me a ticket?' A human assistant would decline, but would an AI assistant have the wisdom to set boundaries? Once you start drawing such analogies, the shortcomings become extremely clear.

How probability-based models make reasonable decisions

But how can these models act so autonomously, or even draw on different resources based on the content of the question, if they don't actually understand it? After all, they still generate their answers based solely on probabilities.

This is exactly the topic that many researchers are pursuing. Let's start with the observation that these models seem to be doing interesting things, which is true. Sometimes you're surprised that a simple probabilistic system can exhibit such complex emergent properties, and then it is a question of whether or not you believe that this can happen. I sometimes have a similar reaction when I see a fascinating construction created by humans. You feel like, 'Is this possible?' But that's not the same as saying, 'That's impossible; I don't believe it.' Let's not forget that purely random, that is, probabilistic, mutations led to all the complex life forms around us, including humans.

What can go wrong with AI systems?


Among other things, Krishna Gummadi's team has investigated the privacy and security issues associated with the use of the personal AI assistant OpenClaw.

© Anna Ziegler / MPG


As these agents evolved in the recent past, did they already show artificial general intelligence? Can they develop something like intentions and consciousness, as is often implied in reports about these systems?

First of all, I can understand why these questions are on a lot of people's minds. However, the problem with these questions is often the terminology. What exactly do we mean by 'artificial general intelligence'? When can we say that a system has actual intent? And how exactly are we going to measure something like consciousness and intentions, which we understand only intuitively? When we study these things from a purely scientific perspective, we very quickly get stuck on how to even operationalize these concepts. The way I see it, it is more objective to focus on the architectures we design around these models in order to understand how using these systems impacts our society. The architecture is where we give the models agency and let them make decisions on our behalf; this is how they end up affecting us. If we don't like the outcome, we can engineer the systems differently. Having said all this, the one thing I do believe is that, unless we design a catastrophically bad system, things cannot go massively wrong in the sense that an AI starts doing things in its own self-interest. Unfortunately, people sometimes seem to be heading in this direction.

Is Moltbook, a social media platform where AI agents like OpenClaw communicate instead of humans, an example of how things can go wrong?

This is the ultimate form of delegation: you give these systems some minimal information and tell them to engage on your behalf and do whatever they think is right. That is, we give AI systems an extreme form of agency, and then we wonder why the agents' discussions go haywire. What do we expect? If you design systems this badly and give them this much agency, everything can go wrong. In fact, I strongly suspect that on Moltbook people might even be able to extract a huge amount of personal information from other people's agents, because the personal assistants seem to have no clear idea of what information is OK to share.

AI changes the job market and fosters the dissemination of disinformation

What is your opinion on how these AI systems will change working culture, and how will societies be affected if the jobs of translators, journalists, lawyers and software programmers disappear?

There are probably some professions that will be affected. I will focus on the one I'm closest to, which is software engineering and programming. AI systems are already revolutionizing software engineering and coding as professions. In fact, the pace of development here has shocked everyone. Initially, people used these tools to fix syntax errors, and they had been doing so for quite a long time. Then autocomplete functions like Microsoft Copilot came along. But now we have reached a stage where people can simply describe what they want to do, and the entire programming is done by the machine, including building the whole architecture and entire code repositories. On the one hand, I see that I become 10 to sometimes 100 times faster at programming. Any time something makes you 10 to 100 times faster, it makes you wonder how this is going to change a profession. This sounds very scary for the entire programming profession, because software engineering companies that have armies of programmers suddenly need only one person where they used to need 50 or 100. On the other hand, there is also a positive side to this development. For a long time, I felt that society was divided into people who can program and people who can't. There were a lot of people with interesting ideas that they couldn't implement because they couldn't program. But now everybody can program and get interesting ideas done. So in some sense, AI agents are democratising programming.

But wouldn't this development create new dependencies and change society in ways we cannot even foresee?

One might argue that today these systems are controlled by OpenAI, Microsoft and Google, and that users have to pay a lot of money to access them. So this might in fact exacerbate inequalities, because only the people who have access to the latest models become much more productive, while the people who cannot access these tools fall behind. But imagine if, in the future, access to these tools became a basic utility and a government said that everyone should have it. I'm speculating massively here; it really depends on the structure of power. There is another problem, which I first heard about at a recent conference: governments will lose revenue if lots of people lose their jobs to AI. But somebody suggested that one could perhaps tax tokens. After all, generating a token represents some form of labour. Perhaps we have to think about these issues in innovative ways.

Many people are worried that AI might shatter societies not only by causing job losses, but also by spreading misinformation that undermines democracy.

For sure, the use of social media is leading to greater societal polarisation because recommender and search systems have been implemented without carefully thinking through their amplifying effects. These are again troubling side effects of poorly designed systems. But then we have to ask: can we fix the system? Related to this is the whole aspect of regulation. This is the aspect I work on, in terms of how to audit systems and how to think through governance issues, which might help to fix the disinformation and polarisation problems. But I admit that there are no easy solutions, because there has to be some societal consensus. I increasingly worry that these systems are so complex that fewer and fewer people actually understand them. When you have a very complex system with negative effects on society that are easy to observe, but people don't understand where it goes wrong, they tend to go for simple solutions. I admit I sometimes get frustrated, because in societal debates things get simplified to the point where the results don't provide long-term solutions. For example, there are policy frameworks that say we want to be able to audit large online platforms for systemic risks in different dimensions and then mitigate them. Great! But how are you going to get this done? How do you translate the policy framework into an operational framework? This is what I try to focus on, and it is an area that I feel needs a lot more work.

The interview was conducted by Peter Hergersberg
