AI Churns Out Finance Papers Indistinguishable From Human

Pennsylvania State University

Artificial intelligence (AI) and large language model (LLM) tools are capable of mass-producing academic finance papers that are nearly indistinguishable from human-authored research, according to a new study published in the Journal of Economic Literature.

Study co-authors Mihail Velikov, associate professor of finance at Penn State's Smeal College of Business, and Robert Novy-Marx, Lori and Alan S. Zekelman Distinguished Professor of Business Administration at the University of Rochester, built a pipeline for automating academic research production. In roughly 12 hours, they generated nearly 400 complete, publication-ready finance papers - using AI to generate the hypotheses and write the manuscripts. The study demonstrated how AI could accelerate research paper production while raising concerns about the potential impact on the academic community and the meaning of scientific discovery.

"AI can now produce a ton of papers at scale, and it's going to change the nature of how we produce and disseminate knowledge," Velikov said. "This is an early warning signal of what's coming with modern AI capabilities."

The researchers didn't initially set out to create an AI assembly line for financial research papers. Velikov's scholarship focuses on identifying anomalies in the equity market - observed patterns in the data that don't conform to accepted theoretical models of how financial markets behave. He and Novy-Marx were working on a data mining project, analyzing corporate accounting data for potential signals that might predict which stocks will outperform the market.

They identified more than 30,000 potential signals. They then validated the predictive power of each signal, a process that included comparing the signals against 200 documented anomalies published in the finance literature. Based on this analysis, they narrowed the list down to 95 signals that were truly novel.

Velikov then entered the information into a website he built that could generate a template report based on the analysis. The resulting reports resembled published papers that document new anomalies. The only thing that was missing was a hypothesis or interpretation for why the anomalies might exist.

"This was late 2023 and it hit me that large language models might be really good at coming up with stories to explain why these anomalies occurred," Velikov said. "A data mining exercise coupled with large language models could produce a large number of plausible-looking scientific papers."

So, they tried it. The researchers used Anthropic's LLM Claude Opus 4.1, which was the latest version at the time, to expand the template reports into academic papers, based on the information and analysis from the data mining project. For each of the 95 signals, they instructed the LLM to come up with descriptive names for the predictors and to produce four distinct manuscripts, each with a different hypothesis and theoretical justification to explain the observed results for the same signal.

In total, the researchers produced 380 papers. Each paper included an abstract, introduction, data, results, conclusion and citations. The AI-generated papers, along with the full code used to produce them, are publicly available on GitHub.

AI's efficient production of academic papers raises questions and concerns about academic research and the peer-review system, Velikov said. In general, submissions to journals and conferences have surged in recent years, overwhelming peer reviewers. As AI and LLMs grow more capable and more widely used, he said, the scientific peer-review system will need to adapt.

"Nowadays, with agentic AI systems, this can be done even better, and the papers are much better," Velikov said. "This is going to raise standards. It's probably going to change how we disseminate and evaluate research."

The study highlighted another area of concern, Velikov said. In the AI-generated papers, the LLM formulated the hypotheses after a pattern in the data had already been identified - a practice known as HARKing, or hypothesizing after the results are known. HARKing is a documented practice in academia, and one that's viewed negatively, Velikov explained, but AI changes the scale at which it may occur. If AI creates explanations for what's found in the data, it potentially raises questions about what constitutes a scientific contribution, particularly considering that LLMs still hallucinate, generating false or misleading results and presenting them as fact.

While the researchers focused on financial research, they said that the implications of these findings can extend to other fields.

"I'm far from the opinion that we'll all be out of jobs and replaced by AI," Velikov said. "But I think our jobs will evolve a lot, and the more we invest in understanding how these systems work, the better research we'll be able to do. The better we'll be able to do our job."

Funding from INQUIRE Europe supported this work.
