Advancing Human Understanding of Machine Intelligence

EPFL researchers have discovered key 'units' in large AI models that seem to be important for language, mirroring the brain's language system. When these specific units were turned off, the models got much worse at language tasks.

Large Language Models (LLMs) are not only good at understanding and using language; they can also reason logically, solve problems, and some can even predict the thoughts, beliefs or emotions of the people they interact with.

Despite these impressive feats, we still don't fully understand how LLMs work 'under the bonnet', particularly when it comes to how different units or modules perform different tasks. So, researchers in the NeuroAI Laboratory, part of both the School of Computer and Communication Sciences (IC) and the School of Life Sciences (SV), and the Natural Language Processing Laboratory (IC), wanted to find out whether LLMs contain specialized units or modules that do specific jobs. The question is inspired by networks that have been discovered in the human brain, such as the Language Network, the Multiple Demand Network and the Theory of Mind Network.

In a paper presented this month at the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics in Albuquerque, in the United States, the researchers explain how they investigated 18 popular LLMs and found that certain units do, indeed, seem to make up a core network focused on language.

"Drawing inspiration from neuroscience approaches, which have mapped out the functional organization of our brains, we compared how active a unit was when reading real sentences compared to reading random word lists. The units that reacted more actively to real sentences were then identified as 'language-selective units', just like our brains' Language Network," said Assistant Professor Martin Schrimpf, Head of the NeuroAI Lab.

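In practical terms, that localizer is straightforward to approximate in code. The sketch below is a minimal illustration of the idea rather than the paper's exact procedure: it assumes a Hugging Face causal language model ('gpt2' is only a stand-in), records each unit's mean activation while reading a real sentence versus a scrambled word list, and keeps the most sentence-preferring units as candidate 'language-selective' units.

# Minimal localizer sketch (illustrative; not the paper's exact setup).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the study examined 18 LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def mean_unit_activity(texts):
    """Average activation per unit (layer x hidden dim), pooled over tokens and texts."""
    total = None
    for text in texts:
        ids = tok(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids)
        # hidden_states: tuple of (1, tokens, dim) tensors, one per layer (plus embeddings)
        acts = torch.stack(out.hidden_states).squeeze(1).mean(dim=1)  # (layers, dim)
        total = acts if total is None else total + acts
    return total / len(texts)

sentences = ["The cat sat quietly on the warm windowsill."]   # real sentences
word_lists = ["windowsill quietly the cat warm on sat the"]   # scrambled controls

diff = mean_unit_activity(sentences) - mean_unit_activity(word_lists)
k = max(1, int(0.01 * diff.numel()))        # keep roughly the top 1% of units
top_vals, top_idx = torch.topk(diff.flatten(), k)
lang_units = [divmod(i.item(), diff.shape[1]) for i in top_idx]  # (layer, dim) pairs
print(f"{k} candidate language-selective units, e.g. {lang_units[:5]}")

The study itself contrasts many sentence/word-list pairs and applies the selection to each of the 18 models; the single pair above only keeps the example short.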
Fewer than 100 neurons extremely relevant

To test the causal role of the language-selective units that they had identified, the researchers removed those units and, separately, removed different sets of random units, then compared the effects. When the language-selective units were removed - but not the random ones - the models could no longer generate coherent text and performed poorly on linguistic benchmarks.
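A rough sketch of that ablation test, continuing the localizer sketch above, is to zero the selected units with forward hooks and compare the model's language-modelling loss against zeroing an equal number of randomly chosen units. The GPT-2 block layout (model.transformer.h), the single test sentence and the helper names here are assumptions made for illustration.

# Ablation sketch (illustrative): zero chosen units and compare losses.
import random

def ablate(units):
    """Register hooks that zero the given (layer, dim) units; return hook handles."""
    handles = []
    by_layer = {}
    for layer, dim in units:
        by_layer.setdefault(layer, []).append(dim)
    for layer_idx, dims in by_layer.items():
        if layer_idx == 0:
            continue  # index 0 is the embedding output, not a transformer block
        block = model.transformer.h[layer_idx - 1]  # GPT-2 naming; other models differ
        def hook(module, inputs, output, dims=dims):
            hidden = output[0] if isinstance(output, tuple) else output
            hidden[..., dims] = 0.0  # in-place zeroing of the selected units
        handles.append(block.register_forward_hook(hook))
    return handles

def loss_on(text):
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        return model(**ids, labels=ids["input_ids"]).loss.item()

test_text = "The quick brown fox jumps over the lazy dog."
rand_units = [(random.randrange(1, diff.shape[0]), random.randrange(diff.shape[1]))
              for _ in lang_units]

print("baseline loss:", loss_on(test_text))
for name, units in [("language-selective", lang_units), ("random", rand_units)]:
    handles = ablate(units)
    print(name, "ablation loss:", loss_on(test_text))
    for h in handles:
        h.remove()

In the paper, the comparison covers coherent text generation and a set of linguistic benchmarks rather than a single loss value, but the pattern to look for is the same - a large degradation when the language-selective units are removed and little change for random ones.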

"The results show that these units really matter for the model. The key surprise for us was that there are probably less than 100 neurons or so - about 1% of units - that seem to be extremely relevant for anything to do with a model's ability to produce and understand language and in disrupting those, suddenly the model fails completely," explained Badr AlKhamissi, a doctoral assistant in the NeuroAI and NLP Labs and the lead author of the paper.

"There is machine learning and interpretability research that has identified some networks or units in a model relevant to language, but it required a lot of training, and it was far more complicated than just using the same localizer used in human neuroscience. We didn't really expect this to work so well," he continued.

The discovery of language-selective units raised a natural question: could the same localizers designed to identify other brain networks, such as the Theory of Mind or Multiple Demand networks, also be applied to LLMs?

Using these localizers, the EPFL researchers assessed whether other units within the models specialize in reasoning or social thinking, and found that some models possessed such task-specific units while others did not.

Further questions

"In some models we did find specialized reasoning and thinking units and in some models we didn't. An interesting question right now is where is this coming from? Why do some models have this preference and does this connect to their performance on related benchmarks? If you have units that are somewhat isolated does that enable the model to do better? Perhaps this is related to how the models are trained or the data they are trained on, and this is an avenue for further research," said Schrimpf.

Other future research will focus on trying to discover what happens in multimodal models - models that are not just trained on text but that can also process various other modalities of information, including images, video and sound.

"I am definitely very interested in this, because as humans we operate on speech and visual input. The question is that if we do use a multi-modal model and give it, for example, language as a visual input, similar to people reading a piece of text, will it have the same language deficits as it did when we removed the Language Network in the LLMs versus a visual task where it has to identify various objects or undertake mathematical reasoning? Will these remain intact?" asked AlKhamissi.

More broadly, the researchers believe that these studies help to solve the puzzle of the internal workings of Large Language Models, relating back to neuroscience and making connections as to how the human brain works.

"If we think of the damage that occurs to the Language Network in the brains of people who have had a stroke, they often have severe language impairments while everything else is intact. It's very similar here with the LLM language component just producing gibberish and, while we didn't test this, it could probably still work well on everything else. We're hoping these models will help us to better understand ourselves and our brains, paving the way for more advanced disease diagnosis and treatment," concluded Schrimpf.

The NeuroAI Laboratory is part of EPFL's Neuro-X Institute, a collaborative and interdisciplinary community that brings together teams from EPFL's School of Computer and Communication Sciences, the School of Life Sciences and the School of Engineering.
