AI Advances: Jülich System Deciphers Numbers, Links

Forschungszentrum Jülich

17 April 2026

Numbers are the language of science, yet in research articles they are often buried within the text and difficult to analyze. Researchers at Jülich have developed an AI system that automatically identifies these numbers, categorizes them, and converts them into structured data. The Quinex framework thus eliminates the need for time-consuming manual work.

The Quinex framework, developed by researchers at Jülich, is based on language models and automatically identifies numerical values in scientific publications, assigns them to appropriate units, and determines what was measured, when, where, and how.
Copyright: 2026 Göpfert et al., The Innovation, Elsevier

Whether in energy, climate, or materials research, scientific papers are full of numbers or, more precisely, quantitative data: efficiencies, temperatures, costs, emissions. These are often crucial for improving models or identifying trends. At the same time, the number of scientific publications is growing rapidly. For many research questions, it is now virtually impossible to manually evaluate all relevant publications; the time and resources required would be enormous.

The Quinex ("Quantitative Information Extraction") framework, developed by researchers at Jülich, is based on language models and automates this process: artificial intelligence identifies numerical values, assigns them to appropriate units, and recognizes what was measured, when, where, and how. Thus, a sentence like "Efficiency levels of 63 to 71 percent are assumed for 2025" is transformed into a structured dataset containing all relevant contextual information, from the year and measurement method to the source.
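To make the idea of a "structured dataset" concrete, the following minimal Python sketch turns the example sentence into a record. It is an illustration only: the `QuantityRecord` schema, its field names, and the regular expression are assumptions made for this example. Quinex itself relies on trained language models, not on a hand-written pattern, and its actual output format may differ.

```python
import re
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class QuantityRecord:
    # Hypothetical schema for illustration; not Quinex's actual output format.
    value_min: float
    value_max: float
    unit: str
    prop: str            # the quantified property, e.g. "efficiency levels"
    year: Optional[int]  # temporal context, if stated
    source_text: str     # the sentence the values were taken from


# Toy pattern: only handles "<property> of X to Y percent ... for <year>".
PATTERN = re.compile(
    r"(?P<prop>[A-Za-z ]+?) of (?P<lo>\d+(?:\.\d+)?) to (?P<hi>\d+(?:\.\d+)?) percent"
    r".*?for (?P<year>\d{4})"
)


def extract(sentence: str) -> Optional[QuantityRecord]:
    """Return a structured record, or None if the toy pattern does not match."""
    m = PATTERN.search(sentence)
    if m is None:
        return None
    return QuantityRecord(
        value_min=float(m["lo"]),
        value_max=float(m["hi"]),
        unit="percent",
        prop=m["prop"].strip().lower(),
        year=int(m["year"]),
        source_text=sentence,
    )


record = extract("Efficiency levels of 63 to 71 percent are assumed for 2025")
print(asdict(record))
```

A fixed pattern like this breaks as soon as the wording changes, which is the gap a model trained on scientific text is meant to close.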

Open and Efficient AI

Unlike many proprietary AI solutions, Quinex is based entirely on open, relatively small, and thus efficient language models. These have been specifically trained to recognize and classify quantitative information in scientific texts. Compared to similar systems, Quinex delivers more precise results, captures contextual information in a more nuanced way, and also takes implicit characteristics into account.

Despite its compact size, Quinex achieves a recognition accuracy (F1) of around 98 percent for numbers and associated units, and approximately 87 and 82 percent, respectively, for the classification of quantified properties and entities. These high accuracy rates were achieved through specially created training datasets and methodological improvements.
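For readers unfamiliar with the metric, F1 is the harmonic mean of precision (the share of extracted items that are correct) and recall (the share of correct items that were found). The short sketch below shows the standard calculation. The counts are invented for illustration and are not taken from the Quinex evaluation.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    predicted = true_pos + false_pos   # everything the system extracted
    actual = true_pos + false_neg      # everything it should have extracted
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (not from the paper): 98 correct extractions,
# 2 spurious ones, and 2 missed ones.
example_f1 = f1_score(true_pos=98, false_pos=2, false_neg=2)
print(round(example_f1, 2))
```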

"We wanted to develop a tool that is powerful, yet also transparent and resource-efficient," explains Dr. Jann Weinand, head of the Integrated Scenarios Department at Jülich System Analysis. "Quinex makes artificial intelligence more accessible for data analysis in science."

Successful Practical Test

To test Quinex's practical suitability, the system was applied to thousands of scientific abstracts from various fields. It successfully extracted data on electricity production costs for various energy technologies, on maximum oxygen uptake in humans, on earthquake magnitudes and locations, and on the band gaps of photovoltaic materials.

The automatically extracted values closely matched the respective reference data. This shows that Quinex is suitable for evaluating large volumes of scientific literature across a wide range of research fields and for deriving reliable trends from it.

New Perspectives for Research

"Language models open up new perspectives for science and help researchers keep an overview of entire research fields," says lead author Jan Göpfert. "They enable automated literature searches, the creation of uniformly structured research databases, and trend analyses that reveal developments in science and technology at an early stage."

"Our goal is to relieve researchers of routine work," says Dr. Patrick Kuckertz, head of the Research Data Management group. "Quinex is intended to help them reach insights more quickly and to master the growing flood of data in science."

Limitations and Future Improvements

Quinex is not entirely error-free either, but transparency is part of its design.

"The system recognizes numbers and units very reliably," says Jan Göpfert. "Since they are taken directly from the text, they cannot be 'hallucinated.' However, misinterpretations sometimes occur, for example when important references are scattered throughout the text."

Thus, Quinex remains a tool that supports people but does not replace them. "We recommend using Quinex where it informs and relieves researchers, but the responsibility for interpreting the results remains with them," says Göpfert. Every recognized number can be traced back to its source and, where possible, is highlighted in the original text.
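The traceability described here can be pictured as storing, for every extracted value, the character positions it was copied from. The sketch below is a simplified illustration under that assumption. The `Span` type, the function names, and the `[[ ]]` highlight markers are hypothetical and do not reflect Quinex's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical record of where an extracted value sits in the source text.
    start: int  # index of the first character
    end: int    # index one past the last character
    text: str   # the extracted string


def is_grounded(span: Span, document: str) -> bool:
    """True if the extracted string appears verbatim at the stated offsets."""
    return document[span.start:span.end] == span.text


def highlight(span: Span, document: str) -> str:
    """Wrap the span in [[ ]] markers so a reader can locate it in the source."""
    return (
        document[:span.start]
        + "[[" + document[span.start:span.end] + "]]"
        + document[span.end:]
    )


doc = "Efficiency levels of 63 to 71 percent are assumed for 2025"
quantity = "63 to 71 percent"
start = doc.index(quantity)
span = Span(start=start, end=start + len(quantity), text=quantity)

print(is_grounded(span, doc))
print(highlight(span, doc))
```

A check of this kind can confirm that a value was copied from the text, but not that it was attached to the right property or context, which is where the misinterpretations mentioned above can arise.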

The team is working to further develop Quinex with additional domain-specific datasets and models, making it even more efficient and flexible enough to adapt to various research requirements.

Open Collaboration Welcome

Forschungszentrum Jülich is making Quinex available as an open-source project.

This is intended to give researchers worldwide the opportunity to test, expand, and adapt the system to their own fields, from energy research to chemistry and biomedicine.

Quinex Open Source: https://go.fzj.de/quinex

Original publication

Jan Göpfert, Patrick Kuckertz, Gian Müller, Luna Lütz, Celine Körner, Hang Khuat, Detlef Stolten, Jann M. Weinand (2026): Quinex: Quantitative information extraction from text using open and lightweight LLMs. The Innovation. DOI: 10.1016/j.xinn.2026.101391
