LLM-Enhanced Few-Shot Entity Resolution Framework

Higher Education Press

Entity resolution (ER) aims to identify and match records that refer to the same entity across multiple data sources, a crucial task in data integration. Traditional methods rely on structured data and require extensive manual labeling to perform well, which limits their effectiveness on long-text, unstructured data. Applying LLMs directly to ER, meanwhile, tends to produce hallucinated results containing factual errors.

To address these problems, a research team led by Jianxin Li published new research on 15 November 2025 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.

The team proposed FUSER, a novel framework that integrates large language models (LLMs) with uncertainty calibration to improve entity matching performance while reducing LLM hallucinations, at a moderate additional computational cost.

The FUSER framework consists of three main components:

  1. Structural Data Enrichment (SDE): A retrieval-augmented generation strategy that extracts structured attributes from unstructured entities while preserving their semantic integrity.
  2. Uncertainty Qualification (UQ): A two-tier calibration method that ranks candidate matches based on attribute-level and entity-level uncertainty estimation.
  3. Few-shot ER with LLM: A lightweight ER pipeline that enhances matching accuracy with as few as 50 labeled positive samples, leveraging data enrichment and pseudo-labeling techniques.
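The two-tier idea behind the uncertainty component can be illustrated with a minimal Python sketch. The release does not say how FUSER estimates uncertainty, so this sketch assumes a common sampling-based approach. The LLM is queried several times for each attribute. Attribute-level uncertainty is the normalized entropy of the sampled values, and entity-level uncertainty is their mean. The function names, the confidence-weighted score `similarity * (1 - uncertainty)`, and the `min_score` threshold for pseudo-label selection are all hypothetical. None of them is the authors' implementation.

```python
from collections import Counter
from math import log


def attribute_uncertainty(samples: list[str]) -> float:
    """Attribute-level tier (hypothetical): normalized entropy of the values
    returned across repeated LLM samples. 0.0 means all samples agree."""
    n = len(samples)
    counts = Counter(samples)
    if n <= 1 or len(counts) == 1:
        return 0.0
    entropy = -sum((c / n) * log(c / n) for c in counts.values())
    return entropy / log(n)  # log(n) is the entropy when all n samples differ


def entity_uncertainty(extractions: dict[str, list[str]]) -> float:
    """Entity-level tier (hypothetical): mean of the attribute-level scores."""
    if not extractions:
        return 1.0  # nothing extracted: treat as maximally uncertain
    total = sum(attribute_uncertainty(v) for v in extractions.values())
    return total / len(extractions)


def rank_candidates(candidates):
    """Rank (candidate_id, similarity, extractions) tuples by
    uncertainty-weighted similarity, highest first (assumed scoring rule)."""
    scored = [
        (cid, sim * (1.0 - entity_uncertainty(ext)))
        for cid, sim, ext in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def select_pseudo_labels(ranked, min_score=0.75):
    """Keep only high-confidence candidates as pseudo-labeled matches
    (hypothetical threshold)."""
    return [cid for cid, score in ranked if score >= min_score]


candidates = [
    ("A", 0.9, {"brand": ["Sony", "Sony", "Sony"], "model": ["X1", "X1", "X2"]}),
    ("B", 0.8, {"brand": ["Canon", "Canon", "Canon"], "model": ["R5", "R5", "R5"]}),
]
ranked = rank_candidates(candidates)
print([cid for cid, _ in ranked])  # ['B', 'A']
print(select_pseudo_labels(ranked))  # ['B']
```

Candidate A has the higher raw similarity. Its sampled `model` values disagree, however, so the uncertainty penalty drops its score to about 0.64 and ranks it below B. Multiplying by `1 - uncertainty` is only one simple way to combine the two tiers.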

The proposed method was evaluated on six ER benchmark datasets, where it outperformed existing state-of-the-art approaches in entity resolution accuracy. In particular, the uncertainty qualification mechanism improves the reliability of extracted entity attributes, reducing errors caused by LLM hallucinations. Compared with traditional and LLM-based methods, FUSER achieves a 10× speedup in uncertainty quantification while maintaining competitive accuracy.

Future research will focus on optimizing the uncertainty modeling process and extending the framework to additional real-world applications, such as knowledge graph construction and biomedical data integration.
