The ability to precisely predict movements is essential not only for humans and animals, but also for many AI applications — from autonomous driving to robotics. Researchers at the Technical University of Munich (TUM) have now discovered that artificial neural networks can perform this task better when trained with biological data from early visual system development.
Whether in mice, cats, or humans: Even before vertebrates open their eyes, a built-in training program begins in the retina — entirely independent of external stimuli. Spontaneous activity patterns spread in wave-like motions across the eye's neural tissue. This neural activity, known as "retinal waves", coordinates the early wiring between the retina and the brain's visual system. In a way, the eye starts practicing vision before encountering the real world.
Researchers at TUM have now shown that artificial neural networks — which mimic the function of the brain — can also benefit from this kind of pre-training. "Artificial neural networks are typically trained using data that closely resembles the task they're intended to perform. When viewed in analogy to how the visual system develops in living organisms, their learning process begins only when the eyes open. We took inspiration from nature and incorporated a pre-training stage, analogous to that in the biological visual system, into the training of neural networks," says Julijana Gjorgjieva, Professor of Computational Neuroscience at TUM.
Pre-training leads to faster and more accurate predictions
In a first step, the team investigated whether training with retinal waves affects a neural network's performance at all. To do this, they compared two training regimes. One group of networks was first pre-trained on retinal wave data from a mouse and then trained on an animated film simulating the perspective of a mouse running through a narrow corridor lined with various geometric patterns. A second group was trained on the animated film alone, without any pre-training.
The task was the same for all networks: They had to accurately predict how the visual patterns on the wall of the simulated corridor would evolve. The networks pre-trained with retinal waves performed the task more quickly and more accurately than those without such pre-training. To rule out the possibility that the better performance was simply due to a longer training period, the researchers conducted another round of experiments in which they shortened the time spent training the pre-trained networks on the animation. This ensured that all networks had the same overall training duration. Even then, the pre-trained networks outperformed the others in both speed and precision.
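The budget-matching control described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the researchers' code: the names `Schedule` and `matched_schedules` are hypothetical, and measuring training duration in "steps" is an assumption, since the article does not say how training time was counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    """Hypothetical training plan for one group of networks."""
    pretrain_steps: int  # steps spent on retinal wave data
    task_steps: int      # steps spent on the corridor animation

    @property
    def total_steps(self) -> int:
        return self.pretrain_steps + self.task_steps


def matched_schedules(total_steps: int, pretrain_steps: int) -> tuple[Schedule, Schedule]:
    """Return (baseline, pretrained) plans with the same overall duration.

    The baseline spends the whole budget on the animation. The pre-trained
    group gives up exactly as many animation steps as it spends on retinal
    waves, so neither group trains for longer than the other.
    """
    if not 0 <= pretrain_steps <= total_steps:
        raise ValueError("pretrain_steps must lie between 0 and total_steps")
    baseline = Schedule(pretrain_steps=0, task_steps=total_steps)
    pretrained = Schedule(
        pretrain_steps=pretrain_steps,
        task_steps=total_steps - pretrain_steps,
    )
    return baseline, pretrained
```

With an assumed budget of 1,000 steps and 200 pre-training steps, `matched_schedules(1000, 200)` gives the pre-trained group 800 steps on the animation and the baseline all 1,000. Any remaining advantage of the pre-trained group then cannot be attributed to extra training time.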
Better performance even with real-world footage
In a final step, the team increased the difficulty level. They trained the networks on real-world footage recorded with an action camera from the perspective of a roaming cat. The video quality was lower than in the animation, and the movements were more complex. Yet once again, the networks that had been pre-trained with retinal waves outperformed all others.