Cracking AI Black Box With Human-Friendly Insights

Abstract

Deep learning models have been successful in many areas, but understanding their behavior remains a challenge. Most prior explainable AI (XAI) approaches have focused on interpreting how models make predictions. In contrast, we introduce a novel approach that identifies textual descriptions most beneficial for model training. By analyzing which descriptions contribute most effectively to model training, our method has the potential to provide insights into how the model prioritizes and utilizes information for decision-making. To achieve this, we propose a pipeline that generates textual descriptions using large language models, incorporates external knowledge bases, and refines them through influence estimation and CLIP scores. Furthermore, leveraging the phenomenon of cross-modal transferability, we propose a novel benchmark task named cross-modal transfer classification to examine the effectiveness of our textual descriptions. In zero-shot experiments, we demonstrate that our textual descriptions improve classification accuracy compared to baselines, leading to consistent performance gains across nine image classification datasets. Additionally, understanding which descriptions contribute most to model performance can shed light on how the model utilizes textual information in its decision-making.

Artificial intelligence (AI) models, particularly deep learning models, are often considered black boxes because their decision-making processes are difficult to interpret. These models can accurately identify objects, such as a bird in a photo, yet understanding exactly how they arrive at these conclusions remains a significant challenge. Until now, most interpretability efforts have focused on analyzing the internal structures of the models themselves.

A research team affiliated with UNIST has taken a different approach. Led by Professor Tae-hwan Kim of the UNIST Graduate School of Artificial Intelligence, the team has developed a novel method to clarify how AI models learn by translating the training data into human-readable descriptions. This approach aims to shed light on the data that forms the foundation of AI learning, making it more transparent and understandable.

While traditional explainable AI (XAI) research examines a model's predictions or internal calculations after training, this study takes a different route. It aims to make the data itself transparent by characterizing it through descriptive language, so that we can better understand how AI learns and makes decisions.

Figure 1. Overview of the process. (1) Extract class components from class names and obtain textual descriptions using Wikipedia URLs. (2) Identify proponent images using influence scores, then combine CLIP scores and influence scores to obtain proponent texts. (3) Train the model with training images, followed by cross-modal transfer training using proponent texts.

The researchers utilized large language models, such as ChatGPT, to generate detailed descriptions of objects within images. To ensure the descriptions were accurate and reliable, they incorporated external knowledge sources, like online encyclopedias.
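The article does not spell out the exact prompts or retrieval pipeline, but the general idea can be sketched in a few lines of Python. In the minimal sketch below, Wikipedia's REST summary endpoint stands in for the external knowledge source, and `llm_generate` is a hypothetical placeholder for whatever LLM API is used to produce the descriptions; both are assumptions for illustration, not the authors' implementation.

```python
import requests

WIKI_SUMMARY = "https://en.wikipedia.org/api/rest_v1/page/summary/{title}"

def wikipedia_summary(title: str) -> str:
    """Fetch the lead summary of a Wikipedia article (the external knowledge source)."""
    resp = requests.get(WIKI_SUMMARY.format(title=title), timeout=10)
    resp.raise_for_status()
    return resp.json().get("extract", "")

def describe_class(class_name: str, llm_generate) -> list[str]:
    """Ask an LLM for visual descriptions of a class, grounded in encyclopedia text.

    `llm_generate` is a hypothetical callable wrapping whatever LLM API is used;
    it takes a prompt string and returns generated text.
    """
    context = wikipedia_summary(class_name)
    prompt = (
        f"Using the reference text below, list distinctive visual features of "
        f"a {class_name}, one per line.\n\nReference:\n{context}"
    )
    return [line.strip() for line in llm_generate(prompt).splitlines() if line.strip()]
```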

However, not all generated descriptions are equally useful for training AI models. To identify the most relevant ones, the team devised a new metric called Influence scores for Texts (IFT). This score assesses two key aspects: how much each description affects the model's predictions, measured by the change in accuracy when that description is removed, and how well the description aligns with the visual content, evaluated through CLIP scores.
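The exact formula for combining the two signals is not given in this article, so the following is only an illustrative sketch: it rank-normalizes precomputed influence estimates and CLIP similarities and mixes them with a weight `alpha`. The function names and the equal weighting are assumptions, not the authors' definition of IFT.

```python
import numpy as np

def rank_normalize(scores: np.ndarray) -> np.ndarray:
    """Map scores to [0, 1] by rank so the two signals are on a comparable scale."""
    ranks = scores.argsort().argsort()
    return ranks / max(len(scores) - 1, 1)

def ift_scores(influence: np.ndarray, clip_sim: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Illustrative IFT-style score mixing influence and CLIP similarity.

    influence: estimated accuracy change when each description is removed
               (higher = more helpful to training).
    clip_sim:  CLIP image-text similarity for each description.
    alpha:     mixing weight -- an assumption, not the paper's definition.
    """
    return alpha * rank_normalize(influence) + (1 - alpha) * rank_normalize(clip_sim)

# Keep only the descriptions with the highest combined scores.
influence = np.array([0.02, -0.01, 0.05, 0.00])
clip_sim = np.array([0.31, 0.18, 0.27, 0.22])
top_idx = np.argsort(-ift_scores(influence, clip_sim))[:2]
```

Descriptions are then ranked by this combined score, and only the top-ranked ones are kept for training.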

For example, in a bird classification task, high IFT scores for descriptions of bill shape or feather patterns indicate that the model has learned to recognize birds by these characteristics rather than by irrelevant details such as background color.

To verify whether these influential descriptions actually enhance model performance, the team conducted cross-modal transfer experiments. They trained models using only high-IFT descriptions and tested their accuracy across multiple datasets. The results demonstrated that models trained with these carefully selected descriptions were more stable and achieved higher accuracy, confirming that the descriptions contribute meaningfully to the learning process.
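The benchmark details are not described in this article, but the idea of cross-modal transfer can be illustrated using CLIP's shared image-text embedding space: fit a classifier on the text embeddings of the selected descriptions, then evaluate it directly on image embeddings. The model checkpoint and the scikit-learn classifier below are assumptions chosen for brevity, not the authors' setup.

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")      # assumed checkpoint
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def embed_texts(texts):
    inputs = processor(text=texts, return_tensors="pt", padding=True, truncation=True)
    feats = model.get_text_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1).numpy()

@torch.no_grad()
def embed_images(pil_images):
    inputs = processor(images=pil_images, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1).numpy()

def cross_modal_transfer(descriptions, text_labels, test_images, image_labels):
    """Train a classifier on text embeddings only, then score it on image embeddings."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(embed_texts(descriptions), text_labels)             # text-only training
    return clf.score(embed_images(test_images), image_labels)   # image-only evaluation
```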

Professor Kim commented, "Allowing AI to explain its training data in human-understandable terms offers a promising way to reveal how deep learning models make decisions. It's a significant step toward more transparent and trustworthy AI systems."

This research was accepted for presentation at EMNLP 2025, one of the leading conferences in natural language processing, which took place from November 5 to 9, 2025 in Suzhou, China.

Journal Reference

Chaeri Kim, Jaeyeon Bae, and Taehwan Kim, "Data Descriptions from Large Language Models with Influence Estimation," EMNLP 2025.
