AI Overrates Human Intelligence, HSE Economists Say

National Research University Higher School of Economics

Scientists at HSE University have found that current AI models, including ChatGPT and Claude, tend to overestimate the rationality of their human opponents—whether first-year undergraduate students or experienced scientists—in strategic thinking games, such as the Keynesian beauty contest. While these models attempt to predict human behaviour, they often end up playing 'too smart' and losing because they assume a higher level of logic in people than is actually present. The study has been published in the Journal of Economic Behavior & Organization.

In the 1930s, British economist John Maynard Keynes developed the theoretical concept of a metaphorical beauty contest. A classic example involves newspaper readers being asked to select the six most attractive faces from a set of 100 photos. The prize is awarded to the participant whose choices are closest to the most popular selection—that is, the average of everyone else's picks. Typically, people tend to choose the photos they personally find most attractive. However, they often lose, because the actual task is to predict which faces the majority of respondents will consider attractive. A rational participant, therefore, should base their choices on other people's perceptions of beauty. Such experiments test the ability to reason across multiple levels: how others think, how rational they are, and how deeply they are likely to anticipate others' reasoning.

Dmitry Dagaev, Head of the Laboratory of Sports Studies at the Faculty of Economic Sciences, together with colleagues Sofia Paklina and Petr Parshakov from HSE University–Perm and Iuliia Alekseenko from the University of Lausanne, Switzerland, set out to investigate how five of the most popular AI models—including ChatGPT-4o and Claude-Sonnet-4—would perform in such an experiment. The chatbots were instructed to play Guess the Number, one of the best-known variations of the Keynesian beauty contest.

According to the rules, all participants simultaneously and independently choose a number between 0 and 100. The winner is the one whose number is closest to half (or two-thirds, depending on the experiment) of the average of all participants' choices. In this contest, more experienced players attempt to anticipate the behaviour of others in order to select the optimal number. To investigate how a large language model (LLM) would perform in the game, the authors replicated 16 classic Guess the Number experiments previously conducted with human participants by other researchers. For each round, the LLMs were given a prompt explaining the rules of the game and a description of their opponents—ranging from first-year economics undergraduates and academic conference participants to individuals with analytical or intuitive thinking, as well as those experiencing emotions such as anger or sadness. Each model was then asked to choose a number and explain its reasoning.
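To make the mechanics concrete, here is a minimal sketch (an illustration under the rules described above, not the authors' code): the target is a fraction p of the average guess, the player closest to the target wins, and iterated 'level-k' reasoning, in which each level best-responds to the one below it, pushes a fully strategic choice toward 0.

```python
# Minimal illustration of Guess the Number (a p-beauty contest).
# Assumptions: guesses lie in [0, 100], the target is p * mean of all guesses,
# and the guess closest to the target wins. Not the study's actual code.

def play_round(guesses, p=0.5):
    """Return the target and the index of the winning guess."""
    target = p * sum(guesses) / len(guesses)
    winner = min(range(len(guesses)), key=lambda i: abs(guesses[i] - target))
    return target, winner

def level_k_choice(k, p=0.5, level0=50.0):
    """Level-0 players guess 50 on average; each higher level best-responds
    to the level below, so a level-k guess is p**k * 50, shrinking toward 0."""
    return (p ** k) * level0

if __name__ == "__main__":
    guesses = [50, 33, 22, 10, 0]          # hypothetical players
    target, winner = play_round(guesses, p=0.5)
    print(f"target = {target:.1f}, winning guess = {guesses[winner]}")
    for k in range(5):
        print(f"level-{k} choice: {level_k_choice(k):.1f}")
```

Switching p from one-half to two-thirds changes the numbers but not the logic: for any p below 1, repeated best responses converge to 0, the game's equilibrium.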

The study found that LLMs adjusted their choices based on the social, professional, and age characteristics of their opponents, as well as their opponents' knowledge of game theory and cognitive abilities. For example, when playing against participants of game theory conferences, the LLM tended to choose a number close to 0, reflecting the choices that typically win in such a setting. In contrast, when playing against first-year undergraduates, the LLM anticipated less sophisticated play and selected a significantly higher number.

The authors found that LLMs can adapt effectively to opponents with varying levels of sophistication, and their responses also displayed elements of strategic thinking. However, the LLMs were unable to identify the dominant strategy of the two-player version of the game: with only two participants, the target (half or two-thirds of their average) always lies below the midpoint of the two numbers, so the lower guess can never lose and choosing 0 is the dominant move.
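A short check under the same assumed rules (again an illustration, not the study's code) makes the point: whatever positive number the opponent picks, a guess of 0 ends up at least as close to the target, for both the one-half and two-thirds variants.

```python
# Quick check: in a two-player round, a guess of 0 is never farther
# from the target than the opponent's guess, so it can never lose.
def zero_never_loses(p):
    for y in range(1, 101):                 # opponent's positive guess
        target = p * (0 + y) / 2
        if abs(0 - target) > abs(y - target):
            return False
    return True

print(zero_never_loses(0.5), zero_never_loses(2 / 3))   # True True
```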

The Keynesian beauty contest has long been used to explain price fluctuations in financial markets: brokers do not base their decisions on what they personally would buy, but on how they expect other market participants to value a stock. The same principle applies here—success depends on the ability to anticipate the preferences of others.

'We are now at a stage where AI models are beginning to replace humans in many operations, enabling greater economic efficiency in business processes. However, in decision-making tasks, it is often important to ensure that LLMs behave in a human-like manner. As a result, there is a growing number of contexts in which AI behaviour is compared with human behaviour. This area of research is expected to develop rapidly in the near future,' Dagaev emphasised.

The study was conducted with support from HSE University's Basic Research Programme.
