New Tool Detects AI With High Accuracy, Low Errors

University of Michigan
Screenshot of the AI-Generated Text Detector demo, which prompts users to enter text to check whether it was generated by an AI model.

Study: Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities (DOI: 10.48550/arXiv.2501.02406)

Detecting writing produced by artificial intelligence is a tricky dance: Doing it right means reliably identifying AI-generated text while being careful not to falsely accuse a human of using it. Few tools strike that balance.

A team of researchers at the University of Michigan says it has devised a new way to tell whether a piece of text was written by AI that passes both tests, something that could be especially useful in academia and public policy as AI-generated content proliferates and becomes harder to distinguish from human writing.

The team calls its tool "Liketropy," a name inspired by the theoretical backbone of its method: It blends likelihood and entropy, the two statistical ideas that power its test.

They designed "zero-shot statistical tests" that can determine whether a piece of writing came from a human or a large language model (LLM) without requiring prior training on examples of either.

The current tool focuses on LLMs, a type of AI designed to produce text. It uses statistical properties of the text itself, such as how surprising or predictable the words are, to decide whether it looks more human- or machine-generated.
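The paper derives its formal tests from finite-sample concentration inequalities, which are not reproduced here. The sketch below is only a toy illustration of the underlying intuition, assuming GPT-2 via Hugging Face's transformers library as the scoring model and an arbitrary decision threshold; none of the specific model choices or cutoff values come from the study.

```python
# Toy sketch of likelihood/entropy-based zero-shot detection.
# Assumptions: GPT-2 as the scoring LLM and an illustrative threshold;
# the study's actual statistic and cutoffs are derived from
# finite-sample concentration inequalities and are not shown here.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def likelihood_and_entropy(text: str) -> tuple[float, float]:
    """Average per-token log-likelihood of `text` under the scoring model,
    and the average entropy of the model's next-token distributions."""
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits            # (1, seq_len, vocab)
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = input_ids[:, 1:]                      # tokens being predicted
    token_ll = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)
    return token_ll.mean().item(), entropy.mean().item()

avg_ll, avg_ent = likelihood_and_entropy("Paste the passage to check here.")
# Text sampled from the model itself has an average log-likelihood close
# to minus the model's entropy, so their sum hovers near zero; human text
# is typically less predictable, pushing the sum further below zero.
score = avg_ll + avg_ent
print(f"avg log-likelihood={avg_ll:.3f}, avg entropy={avg_ent:.3f}")
print("looks machine-generated" if score > -1.0 else "looks human-written")  # -1.0 is arbitrary
```

The design point reflected in the name Liketropy is that both statistics come from the scoring model alone: no detector is trained on labeled human and AI examples, which is what makes such a test zero-shot.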

In testing on large-scale datasets, including ones whose underlying models were hidden from the public and ones where the AI-generated text was engineered to evade detectors, the researchers say their tool performed well. When the test is designed with specific LLMs in mind as potential generators of the text, it achieves an average accuracy above 96% and a false accusation rate as low as 1%.

"We were very intentional about not creating a detector that just points fingers. AI detectors can be overconfident, and that's risky-especially in education and policy," said Tara Radvand, a doctoral student at U-M's Ross School of Business who co-authored the study. "Our goal was to be cautious about false accusations while still flagging AI-generated content with statistical confidence."

Among the researchers' unexpected findings was how little they needed to know about a language model to catch it. Even with minimal information, the test still performed well, challenging the assumption that detection must rely on access, training or cooperation, Radvand said.

The team was motivated by fairness, particularly for international students and non-native English speakers. Emerging literature shows that students who speak English as a second language may be unfairly flagged for "AI-like" writing because of tone or sentence structure.

"Our tool can help these students self-check their writing in a low-stakes, transparent way before submission," Radvand said.

As for next steps, she and her colleagues plan to expand their demo into a tool that can be adapted to different domains. They've learned that fields such as law and science, as well as applications like college admissions, set different thresholds in the trade-off between caution and effectiveness.

A critical application for AI detectors is reducing the spread of misinformation on social media. Some actors intentionally train LLMs to adopt extreme beliefs and spread false claims in order to manipulate public opinion.

Because these systems can generate large-scale false content, the researchers say it's crucial to develop reliable detection tools that can flag such content and comments. Early identification helps platforms limit the reach of harmful narratives and protect the integrity of public discourse.

They also plan to speak with U-M business and university leaders about adopting their tool as a complement to U-M GPT and the Maizey AI assistant, verifying whether text was generated by those tools or by an external AI model such as ChatGPT.

Liketropy received a Best Presentation Award at the Michigan Student Symposium for Interdisciplinary Statistical Sciences, an annual event organized by graduate students. It also was featured by Paris Women in Machine Learning and Data Science, a France-based community of women interested in machine learning and data science that hosts various events.

Radvand's co-authors were Mojtaba Abdolmaleki, also a doctoral student at the Ross School of Business; Mohamed Mostagir, associate professor of technology and operations at Ross; and Ambuj Tewari, professor in U-M's Department of Statistics.
