NIH AI Boosts Gene Analysis Accuracy with Expert Data


Researchers at the National Institutes of Health (NIH) have developed an artificial intelligence (AI) agent powered by a large language model (LLM) that creates more accurate and informative descriptions of biological processes and their functions in gene set analysis than current systems.

The system, called GeneAgent, cross-checks its own initial predictions (also known as claims) for accuracy against information from established, expert-curated databases and returns a verification report detailing its successes and failures. The AI agent can help researchers interpret high-throughput molecular data and identify relevant biological pathways or functional modules, which can lead to a better understanding of how different diseases and conditions affect groups of genes individually and together.

AI-generated content is produced by LLMs trained on enormous amounts of text data from across the internet. LLMs use those data to recognize patterns and predict what words might follow each other in a sentence. However, LLMs are not designed to verify truth, meaning AI-generated content can be false, misleading, or fabricated, a phenomenon called AI hallucinations. Additionally, LLMs are prone to circular reasoning (fact-checking their generated results against their own data), which makes them sound more confident in the output even when the information is false.

Staving off AI hallucinations is important when using LLM tools for gene set analysis, the process of generating collective functional descriptions of grouped genes and their potential interactions. Previous studies that taught LLMs to answer genomic questions or summarize biological processes in a given gene set did not explicitly address hallucinations in the generated content.

GeneAgent mitigates this issue by taking its own claims and independently comparing them to established knowledge compiled in external, expert-curated databases. The research team first tested GeneAgent on 1,106 gene sets sourced from existing databases with known functions and process names. For each gene set, GeneAgent first generated an initial list of functional claims. It then independently used its self-verification agent module to cross-check these claims against the curated databases and create a verification report that noted whether each of its claims was supported, partially supported, or refuted.
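
As a rough illustration of the claim-then-verify loop described above, the Python sketch below checks each claim against an external lookup table and labels it supported, partially supported, or refuted. It is not GeneAgent's actual code: the `Claim` type, the `CURATED` table, the gene and process names, and the overlap-based decision rule are all hypothetical stand-ins chosen to keep the example small.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "supported"
    PARTIALLY_SUPPORTED = "partially supported"
    REFUTED = "refuted"


@dataclass(frozen=True)
class Claim:
    """A functional claim: these genes take part in this process."""
    genes: frozenset[str]
    process: str


# Hypothetical stand-in for an expert-curated database:
# process name -> genes annotated to that process (placeholder names).
CURATED = {
    "process X": frozenset({"GENE_A", "GENE_B", "GENE_C"}),
    "process Y": frozenset({"GENE_D"}),
}


def verify(claim: Claim, curated: dict[str, frozenset[str]]) -> Verdict:
    """Toy rule (an assumption): compare claimed genes with curated annotations."""
    annotated = curated.get(claim.process, frozenset())
    hits = claim.genes & annotated
    if claim.genes and hits == claim.genes:
        return Verdict.SUPPORTED            # every claimed gene is annotated
    if hits:
        return Verdict.PARTIALLY_SUPPORTED  # only some claimed genes are annotated
    return Verdict.REFUTED                  # no curated evidence for the claim


def verification_report(claims: list[Claim]) -> list[tuple[Claim, Verdict]]:
    """Pair every claim with its verdict, mirroring a per-claim report."""
    return [(claim, verify(claim, CURATED)) for claim in claims]


if __name__ == "__main__":
    claims = [
        Claim(frozenset({"GENE_A", "GENE_B"}), "process X"),
        Claim(frozenset({"GENE_A", "GENE_D"}), "process X"),
        Claim(frozenset({"GENE_A"}), "process Y"),
    ]
    for claim, verdict in verification_report(claims):
        print(f"{claim.process} {sorted(claim.genes)}: {verdict.value}")
```

The point of the sketch is that the verdict comes from a source outside the generator of the claims, which is what separates this kind of check from the circular self-checking described earlier.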

To assess the accuracy of the self-verification step, the researchers next brought in two human experts to manually review 10 randomly selected gene sets with a cumulative 132 claims and judge whether GeneAgent's self-verification reports were correct, partially correct, or incorrect. The experts determined that 92% of GeneAgent's self-verification decisions were correct, indicating high performance in self-verification, especially when compared to GPT-4. Their detailed review confirmed the model's effectiveness in minimizing hallucinations and generating more reliable analytical narratives.

The research team also looked at a real-world application of GeneAgent on animal-model gene sets. When applied to seven novel gene sets derived from mouse melanoma cell lines, GeneAgent was able to offer valuable insight into novel functionalities for specific genes. This could lead to discoveries such as potential new drug targets for diseases like cancer.

While LLM-based tools such as GeneAgent are still limited by the information they can use and their inability to reason as humans do, GeneAgent's capacity for self-driven fact-checking shows remarkable promise in mitigating AI hallucinations.
