GPT-5: Has AI Hit a Plateau?

OpenAI claims that its new flagship model, GPT-5, marks "a significant step along the path to AGI" - that is, the artificial general intelligence that AI bosses and self-proclaimed experts often claim is around the corner.

Author

  • Michael Rovatsos

    Professor of Artificial Intelligence, University of Edinburgh

According to OpenAI's own definition, AGI would be "a highly autonomous system that outperforms humans at most economically valuable work". Setting aside whether this is something humanity should be striving for, OpenAI CEO Sam Altman's arguments for GPT-5 being a "significant step" in this direction sound remarkably unspectacular.

He claims GPT-5 is better at writing computer code than its predecessors. It is said to "hallucinate" a bit less, and is a bit better at following instructions - especially when they require following multiple steps and using other software. The model is also apparently safer and less "sycophantic", because it will not deceive the user or provide potentially harmful information just to please them.

Altman does say that "GPT-5 is the first time that it really feels like talking to an expert in any topic, like a PhD-level expert". Yet it still doesn't have a clue about whether anything it says is accurate, as you can see from its attempt below to draw a map of North America.

It also cannot learn from its own experience, or achieve more than 42% accuracy on a challenging benchmark like "Humanity's Last Exam", which contains hard questions on all kinds of scientific (and other) subject matter. This is slightly below the 44% that Grok 4, the model recently released by Elon Musk's xAI, is said to have achieved.

The main technical innovation behind GPT-5 seems to be the introduction of a "router". This decides which model of GPT to delegate to when asked a question, essentially asking itself how much effort to invest in computing its answers (then improving over time by learning from feedback about its previous choices).
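OpenAI has not published how this router works, but the general idea can be sketched in a few lines of Python. In the toy version below, a prompt's "difficulty" is crudely estimated from its length, a threshold decides whether to escalate to the slower reasoning model, and feedback nudges that threshold over time; the model names, numbers and rules are all invented for illustration and do not reflect OpenAI's actual design.

```python
# A hypothetical sketch of a model router: easy prompts go to a cheap model,
# hard ones to a slower "deeper reasoning" model, and feedback adjusts the
# escalation threshold. Everything here is an illustrative assumption.

threshold = 0.5  # prompts scoring above this are escalated

def estimate_difficulty(prompt: str) -> float:
    """Crude stand-in for a learned difficulty estimate (length-based)."""
    return min(len(prompt.split()) / 15.0, 1.0)

def route(prompt: str) -> str:
    """Choose which model to delegate the prompt to."""
    return "gpt5-thinking" if estimate_difficulty(prompt) > threshold else "gpt5-fast"

def feedback(chosen: str, answer_was_good: bool) -> None:
    """Learn from past choices: escalate sooner if the cheap model keeps failing."""
    global threshold
    if chosen == "gpt5-fast" and not answer_was_good:
        threshold = max(0.0, threshold - 0.05)
    elif chosen == "gpt5-thinking" and answer_was_good:
        threshold = min(1.0, threshold + 0.01)  # slowly reclaim cheap capacity

print(route("What is the capital of France?"))  # -> gpt5-fast
print(route("Derive the optimal strategy for this 40-step logistics problem "
            "under uncertain demand and explain every assumption you make."))  # -> gpt5-thinking
feedback("gpt5-fast", answer_was_good=False)  # cheap model failed: escalate sooner next time
```

The interesting engineering question is entirely in the two stand-ins here: how difficulty is really estimated, and how feedback about previous choices is gathered and used.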

The options for delegation include the previous leading models of GPT and also a new "deeper reasoning" model called GPT-5 Thinking. It's not clear what this new model actually is. OpenAI isn't saying it is underpinned by any new algorithms or trained on any new data (since all available data was pretty much being used already).

One might therefore speculate that this model is really just another way of controlling existing models with repeated queries, pushing them to work harder until they produce better results.

What LLMs are

It was back in 2017 that researchers at Google discovered that a new type of AI architecture, the transformer, was capable of capturing the tremendously complex patterns within long sequences of words that underpin the structure of human language.

By training these so-called large language models (LLMs) on large amounts of text, they could respond to prompts from a user by mapping a sequence of words to its most likely continuation in accordance with the patterns present in the dataset. This approach to mimicking human intelligence became better and better as LLMs were trained on larger and larger amounts of data - leading to systems like ChatGPT.

Ultimately, these models just encode a humongous table of stimuli and responses. A user prompt is the stimulus, and the model might just as well look it up in a table to determine the best response. Considering how simple this idea seems, it's astounding that LLMs have eclipsed the capabilities of many other AI systems - if not in terms of accuracy and reliability, certainly in terms of flexibility and usability.
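To make that stimulus-and-response picture concrete, here is a deliberately tiny sketch in Python that builds a word-pair frequency table from a toy corpus and continues a prompt with the statistically most likely next word. Real LLMs do nothing so literal: the "table" is encoded implicitly across billions of learned parameters, and the corpus, code and output below are purely illustrative.

```python
# Toy illustration of "map a sequence to its most likely continuation",
# using an explicit bigram lookup table instead of a neural network.

from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count which word tends to follow each word in the training text.
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def continue_text(prompt: str, length: int = 5) -> str:
    """Extend the prompt word by word with the most frequent continuation."""
    words = prompt.split()
    for _ in range(length):
        last = words[-1]
        if last not in follows:
            break
        words.append(follows[last].most_common(1)[0][0])
    return " ".join(words)

print(continue_text("the cat"))  # -> "the cat sat on the cat sat"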

The jury may still be out on whether these systems could ever be capable of true reasoning, or understanding the world in ways similar to ours, or keeping track of their experiences to refine their behaviour correctly - all arguably necessary ingredients of AGI.

In the meantime, an industry of AI software companies has sprung up that focuses on "taming" general purpose LLMs to be more reliable and predictable for specific use cases. Having studied how to write the most effective prompts, their software might prompt a model multiple times, or use numerous LLMs, adjusting the instructions until it gets the desired result. In some cases, they might "fine-tune" an LLM with small-scale add-ons to make them more effective.
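A simplified version of that re-prompting pattern, under the assumption that the use case is coaxing valid JSON out of a model, might look like the Python below. The call_llm function is a stand-in for whichever provider's API a company actually uses; it returns canned replies here so the example runs on its own.

```python
# A hedged sketch of the "taming" loop: call a model, check whether the output
# meets the use case, and re-prompt with tighter instructions if it doesn't.

import json

_fake_replies = iter(['Sure! Here is the JSON you asked for: {"name": "Ada"}',
                      '{"name": "Ada"}'])

def call_llm(prompt: str) -> str:
    """Stand-in for a real provider API; returns canned replies so the demo runs."""
    return next(_fake_replies)

def extract_valid_json(task: str, max_attempts: int = 3) -> dict:
    prompt = task
    for attempt in range(1, max_attempts + 1):
        reply = call_llm(prompt)
        try:
            return json.loads(reply)  # the "desired result" check for this use case
        except json.JSONDecodeError:
            # Tighten the instructions and try again, as many wrapper products do.
            prompt = (task + "\nReturn ONLY valid JSON with no surrounding text. "
                             f"Attempt {attempt} was rejected.")
    raise RuntimeError("model never produced valid JSON")

print(extract_valid_json("Give me one famous scientist as JSON with a 'name' field."))
```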

OpenAI's new router is in the same vein, except it's built into GPT-5. If this move succeeds, the engineers of companies further down the AI supply chain will be needed less and less. GPT-5 would also be cheaper for users than its LLM competitors, because it would be more useful without these embellishments.

At the same time, this may well be an admission that we have reached a point where LLMs cannot be improved much further to deliver on the promise of AGI. If so, it will vindicate those scientists and industry experts who have been arguing for a while that it won't be possible to overcome the current limitations in AI without moving beyond LLM architectures.

Old wine into new models?

OpenAI's new emphasis on routing also harks back to the "meta reasoning" that gained prominence in AI in the 1990s, based on the idea of "reasoning about reasoning". Imagine, for example, you were trying to calculate an optimal travel route on a complex map. Heading off in the right direction is easy, but every time you consider another 100 alternatives for the remainder of the route, you will likely only get an improvement of 5% on your previous best option. At every point of the journey, the question is how much more thinking it's worth doing.
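In code, that trade-off amounts to a simple stopping rule: keep deliberating only while the expected improvement exceeds the cost of the extra deliberation. The Python sketch below uses the made-up numbers from the route example above (roughly 5% gains per batch of alternatives, and a fixed cost per batch) rather than anything from a real planner.

```python
# Minimal sketch of meta reasoning as a stopping rule: think more only while
# the expected gain from thinking outweighs its cost. Numbers are illustrative.

def plan_route(initial_cost: float,
               improvement_rate: float = 0.05,  # ~5% gain per batch of alternatives
               thinking_cost: float = 2.0,      # cost of evaluating another 100 options
               max_batches: int = 20) -> float:
    best = initial_cost
    for _ in range(max_batches):
        expected_gain = best * improvement_rate  # returns diminish as the route improves
        if expected_gain <= thinking_cost:
            break                                # more thinking no longer pays off
        best -= expected_gain                    # accept the improved route
    return best

print(plan_route(initial_cost=100.0))  # stops once a 5% gain is worth less than 2.0
```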

This kind of reasoning is important for dealing with complex tasks by breaking them down into smaller problems that can be solved with more specialised components. This was the predominant paradigm in AI until the focus shifted to general-purpose LLMs.

It is possible that the release of GPT-5 marks a shift in the evolution of AI which, even if it is not a return to this approach, might usher in the end of creating ever more complicated models whose thought processes are impossible for anyone to understand.

Whether that could put us on a path toward AGI is hard to say. But it might create an opportunity to move towards creating AIs we can control using rigorous engineering methods. And it might help us remember that the original vision of AI was not only to replicate human intelligence, but also to better understand it.

The Conversation

Michael Rovatsos has received funding from the Cisco University Research Program Fund that supports research involving Large Language Models (LLMs), in-kind contributions in the form of cloud credits from Google to use LLMs they provide, and public funding from UK Research and Innovation and the European Commission. He has also provided consultancy to UK government departments. He is a member of the Scottish Government's Tech Council and affiliated with the Alan Turing Institute.
