Next-Gen AI: NSLLM Boosts Efficiency, Clarity

Science China Press

NSLLM Bridges LLMs and Neuroscience

Large language models (LLMs) have become crucial tools in the pursuit of artificial general intelligence (AGI). However, as user bases expand and usage grows more frequent, deploying these models incurs heavy computational and memory costs, limiting their potential to serve as foundational infrastructure for society. Moreover, current LLMs generally lack interpretability: their opaque decision-making and optimization processes make it difficult to guarantee reliability and fairness in high-stakes domains such as healthcare and finance. In contrast, the human brain performs complex tasks on less than 20 watts of power while exhibiting remarkable transparency in its cognitive processes. This stark contrast underscores the gap between LLMs and human cognition and poses a dual challenge: improving the computational efficiency of LLMs to cut energy use and conserve resources, and enhancing their interpretability to better understand how the components of large-scale systems interact and function.

To overcome this interdisciplinary bottleneck, the study proposes a unified framework that transforms conventional LLMs into NSLLMs through integer spike counting and binary spike conversion, combined with a spike-based linear attention mechanism. The framework bridges neuroscience and large language models, providing a platform for applying neuroscience tools to LLMs. By introducing integer training with binary inference, the outputs of standard LLMs are converted into spike representations, allowing neuroscience tools to analyze their information processing.
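To make the conversion concrete, the sketch below illustrates one plausible reading of integer spike counting with binary spike conversion: a continuous activation is quantized to an integer count of at most T spikes during training, then unrolled into a binary spike train of T timesteps for inference. The max-scaled quantizer, the timestep budget T, and the unrolling order are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def integer_spike_count(x, T=4):
    """Quantize non-negative activations to integer spike counts in [0, T].
    A simple max-scaled rounding quantizer (illustrative assumption)."""
    scale = max(float(x.max()), 1e-8) / T
    counts = np.clip(np.round(x / scale), 0, T).astype(np.int64)
    return counts, scale

def binary_spike_train(counts, T=4):
    """Unroll integer counts into binary spikes over T timesteps:
    a count of k fires in the first k timesteps (one possible ordering)."""
    t = np.arange(T)
    return (t < counts[..., None]).astype(np.uint8)

# toy usage: train-time integer counts, inference-time binary spikes
x = np.array([0.1, 0.9, 0.5, 0.0])
counts, scale = integer_spike_count(x)   # e.g. [0, 4, 2, 0]
spikes = binary_spike_train(counts)      # shape (4, 4); rows sum to counts
assert np.array_equal(spikes.sum(axis=-1), counts)
```

Under this reading, counts * scale approximates the original activation, so the spike representation carries the same information in a form that spike-train analyses can consume.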

Ultra-Low-Power Software–Hardware Co-Designed MatMul-Free LLM

To validate the energy efficiency of the approach, the study implements a custom MatMul-free computing architecture for a billion-parameter-scale model on an FPGA platform. A layer-wise quantization strategy with hierarchical sensitivity metrics assesses each layer's contribution to quantization loss, yielding an optimal mixed-timestep spike configuration that remains competitive under low-bit quantization. In addition, a quantization-assisted sparsification strategy reshapes the membrane potential distribution, shifting the quantization mapping probability toward lower integer values; this significantly reduces the spike firing rate and further improves model efficiency. On the VCK190 FPGA, a MatMul-free hardware core eliminates all matrix multiplication operations in the NSLLM, reducing dynamic power consumption to 13.849 W and raising throughput to 161.8 tokens/s. Compared with an NVIDIA A800 GPU, the design achieves 19.8× higher energy efficiency, 21.3× memory savings, and 2.2× higher inference throughput.
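The following minimal sketch shows why binary spikes make the forward pass MatMul-free. If weights are constrained to low-bit values (here ternary -1/0/+1, a format borrowed from earlier MatMul-free language models and assumed rather than confirmed for NSLLM), a layer's output reduces to signed accumulation of the weight rows selected by firing neurons, with no multiplications at all:

```python
import numpy as np

def matmul_free_layer(spikes, w_ternary):
    """Spike-driven 'matmul': with binary spikes (0/1) and ternary
    weights (-1/0/+1), y = spikes @ W reduces to signed accumulation
    of the weight rows selected by firing neurons - no multiplies.
    A numpy sketch of the datapath an FPGA core would pipeline."""
    out = np.zeros(w_ternary.shape[1], dtype=np.int32)
    for i in np.flatnonzero(spikes):   # event-driven: skip silent neurons
        out += w_ternary[i]            # each element adds, subtracts, or skips
    return out

# toy usage with a hypothetical ternary weight matrix
rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(8, 4)).astype(np.int8)
s = rng.integers(0, 2, size=8).astype(np.uint8)
assert np.array_equal(matmul_free_layer(s, W), s.astype(np.int32) @ W)
```

This add-only, event-driven structure is also why lowering the spike firing rate (as the sparsification strategy above does) translates directly into fewer accumulations and lower dynamic power.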

Enhanced Interpretability via Spiking Neural Populations

By transforming the behavior of LLMs into neural dynamical representations such as spike trains, the NSLLM framework allows analysis of both the dynamic properties of their neurons (e.g., randomness quantified by Kolmogorov–Sinai entropy) and their information-processing characteristics (e.g., Shannon entropy and mutual information), enabling a clearer interpretation of the computational roles played by NSLLM components. Experimental results show that the model encodes information more effectively when processing unambiguous text, allowing it to distinguish ambiguous from unambiguous inputs. For example, the middle layers exhibit higher normalized mutual information for ambiguous sentences; the AS layer shows distinct dynamical signatures reflecting its role in sparse information processing; and the FS layer has higher Shannon entropy, indicating a stronger information transmission capacity. Moreover, the positive correlation between mutual information and Shannon entropy suggests that layers with higher information capacity are better at preserving key input features. By integrating neural dynamics with information-theoretic measures, the framework provides biologically inspired interpretability for LLM mechanisms while significantly reducing data requirements.
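As a rough illustration of these information-theoretic probes, the sketch below computes Shannon entropy and mutual information over discretized spike statistics. The per-token spike counts and the ambiguous/unambiguous labels are toy stand-ins for the paper's actual measurements, and the plug-in frequency estimator used here is only one simple choice.

```python
import numpy as np
from collections import Counter

def shannon_entropy(symbols):
    """Shannon entropy (bits) of a discrete sequence, e.g. the
    per-token population spike counts of one layer."""
    counts = Counter(symbols)
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) on paired discrete sequences,
    e.g. x = input condition, y = discretized spike response."""
    joint = list(zip(x, y))
    return shannon_entropy(x) + shannon_entropy(y) - shannon_entropy(joint)

# toy usage: spike counts under two input conditions (hypothetical data)
labels = ["ambiguous"] * 4 + ["unambiguous"] * 4
spike_counts = [1, 2, 1, 2, 5, 6, 5, 6]
h = shannon_entropy(spike_counts)            # layer's coding capacity
mi = mutual_information(labels, spike_counts)  # input info preserved
```

In this toy case the two conditions produce disjoint spike-count ranges, so the mutual information is maximal (1 bit); a layer that responded identically to both conditions would score 0.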

Neuroscience research has shown that the human brain achieves energy-efficient information processing through sparse, event-driven computation, which enhances both communication efficiency and system interpretability. Building on this principle, the team developed an interdisciplinary unified framework that offers a neuromorphic alternative to traditional LLMs while matching mainstream models of similar scale on common-sense reasoning and a range of more complex large-model tasks, including reading comprehension, world-knowledge question answering, and mathematics. The framework not only advances the frontier of energy-efficient AI but also offers new perspectives on the interpretability of large language models and valuable insights for the design of future neuromorphic chips.
