Dopamine Insights May Boost AI Adaptability

Champalimaud Centre for the Unknown

What if your brain had a built-in map – not of places, but of possible futures? Researchers at the Champalimaud Foundation blend neuroscience and artificial intelligence (AI) to reveal that populations of dopamine neurons in the brain don't just track whether rewards are coming – they encode maps of when those rewards might arrive and how big they might be.

These maps adapt to context and may help explain how we weigh risks, and why some of us act on impulse while others hold back. Strikingly, this biological mechanism mirrors recent advances in AI, and could inspire new ways for machines to predict, evaluate and adapt to uncertain environments more like we do.

The Problem with Averages

Imagine you're deciding whether to wait in line for your favourite meal at a busy restaurant or grab a quick snack at the nearest café. Your brain weighs not just how good the meal might be, but also how long it will take to get it.

For decades, scientists have studied how the brain makes such decisions by building computational models based on "reinforcement learning" (RL) – a framework in which agents learn by trial and error, guided by rewards and punishments. A central player in this process is the dopamine system – a network of neurons that release the chemical dopamine to signal when things turn out better or worse than expected. Traditional RL models, however, simplify this process: rather than representing the full range of possible delayed outcomes, they collapse future rewards into a single expected value – an average.

These models tell you, on balance, what to expect – but not when or how much. That's like judging the value of a meal without knowing the wait time or portion size.
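
For the computationally inclined, a minimal sketch makes the point concrete. The reward probabilities and learning rate below are illustrative choices, not values from the study; the point is that a single estimate converges to the average and retains nothing about the individual outcomes.

```python
# Classical RL sketch: all possible future rewards collapse into one average.
import random

alpha = 0.1  # learning rate (illustrative)
V = 0.0      # a single scalar value estimate

# Suppose a cue is followed by a reward of 1 half the time, and 0 otherwise.
for _ in range(10_000):
    reward = random.choice([0.0, 1.0])
    prediction_error = reward - V  # the classic "dopamine-like" error signal
    V += alpha * prediction_error

print(round(V, 2))  # hovers near 0.5: the average, with both outcomes erased
```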

In a study published in Nature back-to-back with complementary work by researchers at Harvard and the University of Geneva – the result of a coordinated collaboration – scientists from the Learning and Natural Intelligence Labs at the Champalimaud Foundation challenge this view.

Their work reveals that the brain doesn't rely on a single prediction about future rewards. Instead, a diverse population of dopamine neurons encodes a map of possible outcomes across time and magnitude – a rich, probabilistic tapestry that can guide adaptive behaviour in a constantly changing world. This new biological insight aligns with recent advances in AI – particularly algorithms that help machines learn from reward distributions rather than averages, with far-reaching implications for autonomous decision-making.

"This story began around six years ago", says Margarida Sousa, PhD student and first author of the study. "I saw a talk by Matthew Botvinick from Google DeepMind, and it really changed the way I thought about RL. He was part of the team that introduced the idea of distributional RL to neuroscience, where the system doesn't just learn a single estimate of future reward, but captures a spectrum of possible outcomes and their likelihoods".

As Joe Paton, senior author and Principal Investigator of the Learning Lab, put it, "These results were really exciting because they suggested a relatively simple mechanism by which the brain might ascertain risk, one with all sorts of implications for normal and pathological behaviour alike – and that has also been shown to greatly improve the performance of AI algorithms on complex tasks".

"However, we began to wonder whether dopamine neurons might be reporting a much richer set of prediction errors than even the teams at DeepMind and Harvard had described", says Sousa. "What if different dopamine neurons were sensitive to distinct combinations of possible future reward features – for example, not just their magnitude, but also their timing? If that were the case, the population as a whole could offer a much richer picture – representing the full distribution of possible reward magnitudes and their timing".

The team developed a new computational theory to describe how such information could be learned and computed from experience. This approach echoes how some AI systems today, particularly in RL, are being trained to handle uncertainty and risk using distributional learning strategies.
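
The paper's full model isn't reproduced here, but a toy sketch conveys the flavour: give each unit both a "patience" parameter (a temporal discount factor) and an "optimism" parameter, and the population tiles reward timing and magnitude jointly. Everything below – the update rule, the parameters, the task – is an illustrative assumption, not the authors' implementation.

```python
# Toy 2-D extension: each unit pairs a patience level (discount factor)
# with an optimism level, tiling reward timing and magnitude together.
import random

gammas = [0.5, 0.8, 0.95]  # impatient -> patient (temporal tuning)
taus = [0.25, 0.5, 0.75]   # pessimistic -> optimistic (magnitude tuning)
alpha = 0.05
V = {(g, t): 0.0 for g in gammas for t in taus}

def sample_trial():
    # a cue predicts either a small reward soon or a large reward later
    return (1, 0.3) if random.random() < 0.5 else (4, 1.0)  # (delay, size)

for _ in range(50_000):
    delay, size = sample_trial()
    for (g, t) in V:
        target = (g ** delay) * size  # this unit's discounted view of the trial
        err = target - V[(g, t)]
        rate = alpha * (t if err > 0 else (1 - t))
        V[(g, t)] += rate * err

for key in sorted(V):
    print(key, round(V[key], 2))
# impatient units weight the small, fast reward; patient units the large,
# late one - read together, the grid forms a map over time and magnitude
```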

Sniff, Wait, Reward

To test this idea, the team designed a simple yet revealing behavioural task. Mice were presented with odour cues, each predicting a reward of a particular size or arriving after a particular delay. Crucially, this setup allowed the researchers to observe how dopamine neurons responded to different combinations of reward magnitude and time.

"Previous studies usually just averaged the activity across neurons and looked at that average", says Sousa. "But we wanted to capture the full diversity across the population – to see how individual neurons might specialise and contribute to a broader, collective representation".

Using a combination of genetic labelling and advanced decoding techniques, they analysed recordings from dozens of dopamine neurons. What they found was striking: some neurons were more "impatient", placing greater value on immediate rewards, while others were more sensitive to delayed ones. Separately, some neurons were more "optimistic", responding more to unexpectedly large rewards and expecting better-than-average outcomes. Others were more "pessimistic", reacting more strongly to disappointments and favouring more cautious estimates of future reward.

"When we looked at the population as a whole, it became clear that these neurons were encoding a probabilistic map", says Paton. "Not just whether a reward was likely, but a coordinate system of when it might arrive and how big it might be". In effect, the brain was computing a reward distribution, a core principle of modern AI systems.

Advisors in Your Head

The team showed that this population code could predict the animals' anticipatory behaviour. They also found that the neurons' tuning adapted to the environment. "For example", says Daniel McNamee, senior co-author and Principal Investigator of the Natural Intelligence Lab, "if rewards were usually delayed, the neurons adjusted – changing how they value rewards further off in time and becoming more sensitive to them. This kind of flexibility is what we call 'efficient coding'".
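
One hypothetical way to picture that efficient-coding adjustment: spread the population's temporal tuning so it covers the delays the environment actually produces. The adaptation rule below is an invented illustration, not the paper's model.

```python
# Hypothetical efficient-coding sketch: re-spread the population's temporal
# tuning (discount factors) to cover the delays the environment produces.
import random

def adapt_gammas(observed_delays, n_units=5):
    lo, hi = min(observed_delays), max(observed_delays)
    # give each unit a "half-life" spanning the observed range of delays...
    half_lives = [lo + (hi - lo) * i / (n_units - 1) for i in range(n_units)]
    # ...and convert each half-life h into a discount factor: gamma ** h = 0.5
    return [0.5 ** (1 / max(h, 1e-6)) for h in half_lives]

short_env = [random.uniform(1, 3) for _ in range(1000)]  # rewards come fast
long_env = [random.uniform(5, 15) for _ in range(1000)]  # rewards come late

print([round(g, 2) for g in adapt_gammas(short_env)])  # less patient tuning
print([round(g, 2) for g in adapt_gammas(long_env)])   # more patient tuning
```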

The study also found that while all neurons could shift their tuning, their relative roles remained stable. The more optimistic neurons stayed optimistic; the pessimistic ones remained cautious. This preserved diversity, McNamee argues, might be key to allowing the brain to represent multiple possible futures simultaneously.

"It's like having a team of advisors with different risk profiles", he explains. "Some urge action – 'Take the reward now, it might not last' – while others advise patience – 'Wait, something better could be coming'. That spread of perspectives could be key to making good decisions in an unpredictable world". This parallels the use of ensembles in machine learning – a branch of AI where computers learn from data – in which multiple models, each with different perspectives or biases, work together as diverse predictors to improve performance under uncertainty.

From Feedback to Foresight

Crucially, this neural code, learned from experience, doesn't just help animals behave according to past circumstances. Rather, it enables them to plan for a different future. In computational simulations, the researchers showed that access to this dopamine-encoded map allowed artificial agents to make smarter decisions – especially in environments where rewards changed over time or depended on internal needs like hunger.

"One of the elegant aspects of this model is that it supports fast adaptation of risk-sensitive behaviour without needing a complicated model of the world", says McNamee. "Rather than simulating every possible outcome, the brain can consult this map and reweigh it based on context".

Sousa adds: "This might explain how animals can quickly switch strategies when their needs change. A hungry mouse will favour fast, small rewards. A sated one might be willing to wait for something better. The same underlying map can support both strategies, just with different weights".
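
A toy illustration of that reweighting idea, assuming a hypothetical read-out in which internal state sets how steeply delays are discounted at decision time:

```python
# Toy read-out: one learned map, two contexts. Context sets how steeply
# delays are discounted when the map is consulted (a hypothetical rule).
options = {
    "small_soon": {"delay": 1, "size": 0.3},
    "large_late": {"delay": 4, "size": 1.0},
}

def utility(option, impatience):
    # higher impatience = steeper discounting of delayed rewards
    return option["size"] * ((1 - impatience) ** option["delay"])

for state, impatience in [("hungry", 0.5), ("sated", 0.05)]:
    best = max(options, key=lambda name: utility(options[name], impatience))
    print(state, "->", best)
# hungry -> small_soon   (steep discounting favours the fast reward)
# sated  -> large_late   (shallow discounting favours waiting)
```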

Why You Grab the Cookie (or Don't)

"For the first time, we're seeing this kind of multidimensional dopamine activity at the time of the cue – before the reward even arrives", remarks Paton. "This early activity is what allows the brain to construct a predictive map of future rewards. It reflects a structure and heterogeneity in dopamine neuron responses that hadn't been appreciated before. This neural code isn't just for learning from past rewards, but also for making inferences about the future – for adapting behaviour proactively based on what's likely to happen next".

The findings also open the door to new ways of thinking about impulsivity. If individuals vary in how their dopamine systems represent the future, could that help explain why some are more likely to grab the cookie now, while others wait – and why some struggle more deeply with impulsive behaviours? And if so, could this internal 'map' be reshaped – through therapy or environmental change – to encourage individuals to see their world differently and place greater trust in longer-term rewards?

Natural Intelligence, Artificial Futures

At a time when neuroscience and AI are increasingly learning from each other, the study's findings offer a compelling link. They suggest that the brain may already be using a strategy that computer scientists only recently developed to improve learning in machines.

"Incorporating neural-inspired architectures that encode not just a single prediction, but the full range of possible futures – including their timing, size, and likelihood – could be key to developing machines that reason more like humans", continues Paton. "Systems that think not just in averages, but in distributions and probabilities, could better adapt to shifting goals and changing environments".

For now, this work marks a major step forward in understanding how the brain anticipates the future – not as a fixed forecast, but as a flexible, detailed map of possibilities. It's a model of foresight rooted in flexibility, diversity, and context – a neural code that could serve as one of the brain's most valuable blueprints: a guide not just for learning from the past, but for navigating the uncertainty of what comes next.

Something to think about the next time you're weighing up whether to join the queue.
