Imagine balancing a ruler vertically in the palm of your hand: you have to constantly pay attention to the angle of the ruler and make many small adjustments to make sure it doesn't fall over. It takes practice to get good at this.
In engineering, this is called the "inverted pendulum" or "cart-pole" problem, in which a control system learns to balance an upright pole hinged to a moveable cart. This problem is used as a benchmark in fields like robotics, control theory, and artificial intelligence to gauge if a control system can adaptively process and respond to information in a useful way. It's relevant even in our earliest days—every human infant needs to solve a problem just like this in order to become a toddler.
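For readers who want a concrete picture, the virtual pole in experiments like this is typically governed by a few lines of physics. The sketch below uses the textbook cart-pole equations and parameter values from the standard benchmark; they are illustrative assumptions, not the settings used in this study.

```python
import math

# Classic cart-pole parameters (illustrative values from the standard benchmark,
# not the settings used in the organoid experiments).
GRAVITY = 9.8          # m/s^2
CART_MASS = 1.0        # kg
POLE_MASS = 0.1        # kg
POLE_HALF_LEN = 0.5    # m (half the pole length)
DT = 0.02              # s, simulation time step

def step(x, x_dot, theta, theta_dot, force):
    """Advance the cart-pole state by one time step under an applied force."""
    total_mass = CART_MASS + POLE_MASS
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + POLE_MASS * POLE_HALF_LEN * theta_dot**2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LEN * (4.0 / 3.0 - POLE_MASS * cos_t**2 / total_mass))
    x_acc = temp - POLE_MASS * POLE_HALF_LEN * theta_acc * cos_t / total_mass
    # Simple Euler integration of position and angle.
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)

def failed(x, theta):
    """The pole 'falls' when it tips past ~12 degrees or the cart runs off the track."""
    return abs(theta) > 12 * math.pi / 180 or abs(x) > 2.4
```

The controller's only job is to pick the force at each time step so that `failed` stays false for as long as possible.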
Researchers at the University of California, Santa Cruz, trained brain organoids, tiny pieces of brain tissue grown in the lab, to solve this fundamental benchmark problem. By using electrical signals to send and receive information from the organoids, the researchers' software coached the lab-grown brain tissue to significantly improve its performance at the cart-pole problem.
The team, led by Baskin School of Engineering Electrical and Computer Engineering (ECE) Ph.D. student Ash Robbins, ECE Professor Mircea Teodorescu, and Distinguished Professor of Biomolecular Engineering David Haussler, present their findings in a paper published in the journal Cell Reports.
This research aims to uncover how information is transmitted through the electrical spiking of neurons in the brains of complex organisms in a way that allows them to learn to improve at tasks, with implications for basic science and health research. Understanding how complex neural circuits function and adapt could provide a powerful new tool for studying how neurological conditions, such as Alzheimer's disease, dementia, stroke, concussion, autism, schizophrenia, Parkinson's disease, dyslexia, and ADHD, can change or impair the brain's capacity to learn.
"We're trying to understand the fundamentals of how neurons can be adaptively tuned to solve problems," Robbins said. "If we can figure out what drives that in a dish, it gives us new ways to study how neurological disease can affect the brain's ability to learn."
This research is the first rigorous academic demonstration of goal-directed learning in lab-grown brain organoids, and lays the foundation for adaptive organoid computation—exploring the capacity of lab-grown brain organoids to learn and solve tasks.
"These are incredibly minimal neural circuits. There's no dopamine, no sensory experience, no body to sustain, no goals to pursue. And yet, when given targeted electrical feedback, this tissue is plastic enough and structured enough to be pushed toward solving a real control problem. That tells us something important: the capacity for adaptive computation is intrinsic to cortical tissue itself, separate from all the scaffolding we usually assume is necessary," said Keith Hengen, an associate professor of biology at Washington University in St. Louis who was not involved with this study.
Organoid coaching
Organoids, which are heart, liver, lung, brain and other types of tissues grown in the lab from stem cells, have been used extensively in biomedical research for about 15 years, but researchers are only now in the initial stages of exploring how they could be used to understand how brains learn.
Brain organoids mimic early brain development, structure, and function. They are smaller than a peppercorn, but nevertheless can contain a network of several million neurons, the brain cells that fire off electrical signals to transmit information through the body. By placing the organoids on a specialized chip, the researchers can observe neurons firing within the organoid tissue, and also stimulate selected neurons to fire.
"From an engineering perspective, what makes this powerful is that we can measure, stimulate, and adapt in the same system," Teodorescu said. "This is not just recording neural activity. It is a closed-loop bioelectrical interface where the tissue's response directly shapes the next input. That is what allows us to study learning as a physical process, which has been very difficult to study directly in intact brains."
The research team, associated with the Braingeneers group within the UC Santa Cruz Genomics Institute, set out to understand if the organoid neurons could succeed at the cart-pole task, drawing inspiration from foundational work by Steve Potter at Caltech and Georgia Tech decades earlier.
Using organoids derived from mouse stem cells and an electrophysiology system developed by industry partner Maxwell Biosciences, the researchers use electrical stimulation to send and receive information to and from neurons. By using stronger or weaker signals, they communicate to the organoid the angle of the pole, which exists in a virtual environment, as it falls in one direction or the other. As this happens, the organoid sends back signals indicating how to apply force to balance the pole, and the researchers apply that force to the virtual pole.
For their pole-balancing experiments, the researchers observe as the organoid controls the pole until it drops, which is called an episode. Then, the pole is reset and a new episode begins. In essence, the organoid plays a video game in which the goal is to balance the pole upright for as long as possible.
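A rough Python sketch of that closed loop is below, reusing the `step`, `failed`, and `DT` helpers from the earlier cart-pole sketch. Everything specific here is hypothetical: the `interface` object and the simple angle-to-amplitude and rate-to-force mappings are stand-ins for the team's actual stimulation hardware and encoding scheme.

```python
import math

def angle_to_amplitude(theta, max_angle=12 * math.pi / 180):
    """Hypothetical encoding: map the pole angle onto a stimulation strength in [0, 1].

    Stronger stimulation on one electrode group signals a larger tilt in that direction.
    """
    return min(abs(theta) / max_angle, 1.0)

def rates_to_force(left_rate, right_rate, gain=10.0):
    """Hypothetical decoding: turn firing rates from two electrode regions into a force.

    More activity in the 'right' region pushes the cart right, and vice versa.
    """
    return gain * (right_rate - left_rate)

def run_episode(interface, max_steps=1000):
    """One episode: the organoid 'plays' until the virtual pole falls."""
    x, x_dot, theta, theta_dot = 0.0, 0.0, 0.05, 0.0   # small initial tilt
    for t in range(max_steps):
        # Tell the organoid which way (and how far) the pole is leaning.
        side = "left" if theta < 0 else "right"
        interface.stimulate(side, amplitude=angle_to_amplitude(theta))
        # Read out the organoid's response and apply the decoded force to the cart.
        left_rate, right_rate = interface.read_firing_rates()
        force = rates_to_force(left_rate, right_rate)
        x, x_dot, theta, theta_dot = step(x, x_dot, theta, theta_dot, force)
        if failed(x, theta):
            return t * DT   # episode length in seconds
    return max_steps * DT
```

The two-region encoding shown here is just one plausible way to map a one-dimensional control signal onto stimulation and recording sites; the essential point is the loop itself, in which each reading from the tissue shapes the next stimulus.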
The researchers observe the organoid's progress in five-episode increments. If the organoid keeps the pole upright longer, on average, over the past five episodes than over the past 20, it receives no training signal, since it is improving. If it does not, it receives a training signal.
Training feedback is not given to the organoid while it is balancing the pole—only at the end of an episode. A machine-learning technique called reinforcement learning is used to select which neurons within the organoid receive the training signal.
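In code, that coaching rule might look roughly as follows. The five-versus-twenty-episode comparison mirrors the description above; the epsilon-greedy choice of which neuron group receives the training pulse is only an illustrative stand-in for the reinforcement learning method the team actually used.

```python
import random

def needs_training(durations):
    """Coaching rule: compare the last 5 episode durations to the last 20.

    If the recent average balance time is no better than the longer-term average,
    the organoid is not improving and a training signal is delivered.
    """
    if len(durations) < 20:
        return False
    recent = sum(durations[-5:]) / 5
    baseline = sum(durations[-20:]) / 20
    return recent <= baseline

def choose_target(values, epsilon=0.1):
    """Illustrative epsilon-greedy choice of which neuron group to stimulate.

    `values` tracks how much improvement has followed training at each site;
    this is a stand-in for the study's reinforcement learning selection.
    """
    if random.random() < epsilon:
        return random.randrange(len(values))
    return max(range(len(values)), key=lambda i: values[i])
```

After each episode, the new balance duration is appended to the durations list; when `needs_training` returns True, the selected neuron group receives the training stimulation before the next episode begins.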
"You could think of it like an artificial coach that says, 'you're doing it wrong, tweak it a little bit in this way,'" Robbins said. "We're learning how to best give it these coaching signals."
Observing improvement
The results of this study show that the reinforcement learning algorithm can guide the brain organoids toward improved performance at the cart-pole task—meaning the organoids can learn to balance the pole for longer periods of time.
The researchers adopted a rigorous framework for judging success, to make sure they were observing true improvement rather than chance, including a threshold for the minimum time an organoid must balance the pole to "win" the game.
They found that using their coaching technique led to significantly better performance than organoids coached at random, from just a 4.5% "winning" rate for random training, to a 46% winning rate for adaptive training with reinforcement learning.
"When we can actively choose training stimuli, we can actually shape the network to solve the problem," Robbins said. "What we showed is short-term learning, in that we can take an organoid in one state and shift it into another one that we're aiming at, and we can do that consistently."
However, the organoids seem to "forget" most of what they learn during long periods of inactivity. After balancing the pole over many episodes for 15 minutes, the organoid rests for 45 minutes. The researchers found that after this rest period, the organoid's performance drops back to baseline, indicating it is not retaining its training.
Haussler said this lack of retention might be overcome by using more complex organoids.
"It is likely that more sophisticated organoids, perhaps grown to include multiple brain regions involved in animal learning, will be needed to recapitulate the kind of long-term adaptive performance improvement we see in animals" Haussler said. "We'll see."
The researchers are interested in further exploring why their coaching technique works—which neurons are best to target, which training signals might work best, and how long-term learning may arise.
To enable this, Robbins developed an open-source software tool to complement these experiments, called BrainDance. The technology is designed so that anyone with the biological skills to culture brain organoids can conduct neural stimulation learning experiments and analyze the results without needing to code a game, hardware interface, or training environment themselves, with the goal of enabling more people to participate in organoid research and accelerating the field.
"This software makes running really complicated experiments extremely easy. Usually labs spend years building up all of this kind of software themselves," Robbins said. "Now, any biologist could download our software very easily and run these types of experiments in just minutes."
"Ash's software could build a larger community around adaptive organoid computation. But we want to make it clear that our goal is to advance brain research and the treatment of neurological diseases, not to replace robotic controllers and other kinds of computers with lab-grown animal brain tissues," Haussler added. "The latter might be considered cool, but would bring up serious ethical issues, especially if human brain organoids were used."