Does keeping authors' identities secret during peer review make the process more fair? A new large-scale field study, led by Indiana University College of Arts and Sciences Professor Tim Pleskac in collaboration with Ellie Kyung (Babson College), Gretchen Chapman (Carnegie Mellon University), and Oleg Urminsky (University of Chicago), finds that the answer is more complicated than many expect.
The research team — all leading experts in judgment and decision making — compared single-blind review (non-anonymous: reviewers know the authors' identities) with double-blind review (anonymous: neither side knows the other's identity) for submissions to the 39th Annual Conference of the Society for Judgment and Decision Making, an international organization of more than 1,800 scholars. Their work is the first to systematically evaluate the fairness, reliability, and validity of these two systems in a real-world, high-stakes review setting.
The results surprised the researchers: they were more nuanced and, in some cases, less clear-cut than expected. Yet, perhaps for that very reason, they are revealing and highly informative, especially for similar organizations in the social sciences and business. Anonymous review reduced disparities for Asian first authors, but women and early-career authors scored slightly worse under it. Differences in reliability and validity between the two systems were minimal, underscoring the amount of "noise" in peer review. "By far the biggest effect we measured was the variability between reviewers, whether or not they were anonymous," said Pleskac. "No one has ever assessed this issue before, and it has real implications for how we design review systems."
Despite these complexities, the team ultimately recommended anonymous review, with adjustments to improve fairness and scientific rigor. One suggestion was to view noise not as a flaw but as an opportunity: integrating anonymous review with a lottery system for choosing among the top submissions. "If the process is inherently noisy, we should leverage that," said Pleskac. "Randomly selecting from high-quality work can foster diversity of ideas and perspectives in science."
The paper, "Blinded versus unblinded review: A field study on the equity of peer-review processes," was published online on August 6, 2025, in the journal Management Science. The Society for Judgment and Decision Making has since adopted some of the team's recommendations, including the use of anonymous review.
Origins of the project
The idea grew out of years of firsthand experience on review committees within the Society for Judgment and Decision Making.
"When I was SJDM president and serving on the Hillel J. Einhorn Young Investigator Award committee, we moved toward anonymous review and women started winning more often," said Gretchen Chapman. "It's hard to know if that was causal, but it made me wonder whether anonymity could make the process fairer."
Ellie Kyung recalled: "We first removed author names from submissions and later required fully anonymized manuscripts. Around that time, we started seeing more women placing for the award. That's when I began to question why SJDM didn't use anonymous review for all abstracts."
Chapman also noted concerns about institutional prestige: "Some junior scholars told me they never got talks accepted and wondered if it was because they weren't at well-known universities." These conversations, along with debates at society business meetings, convinced the researchers that a carefully designed, NSF-funded experiment was needed.
How the study worked
To compare the outcomes of anonymous and non-anonymous reviews, 112 faculty reviewers evaluated 530 conference submissions. Reviewers were randomly assigned to the anonymous or non-anonymous review system, with each handling about 30 submissions. To assess the reliability of each review process, 53 submissions (10% of the total) were assigned to three additional anonymous and three additional non-anonymous reviewers. Finally, to measure the predictive validity of the review processes, the team enlisted faculty and graduate students to rate each of the conference talks given, and then, six years later, tracked which conference submissions were published and where.
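As a rough sketch of this design (the counts come from the article, but the allocation logic below is a hypothetical illustration, not the authors' actual code), the random split of reviewers and the 10% reliability-overlap subset might look like:

```python
import random

def assign_reviews(n_submissions=530, n_reviewers=112, overlap_frac=0.10, seed=0):
    """Illustrative sketch of the study design: randomly split reviewers
    between the anonymous and non-anonymous systems, and pick a 10%
    subset of submissions that receives extra reviews in each system
    so reliability can be estimated."""
    rng = random.Random(seed)
    reviewers = list(range(n_reviewers))
    rng.shuffle(reviewers)
    half = n_reviewers // 2
    anonymous_panel = reviewers[:half]
    non_anonymous_panel = reviewers[half:]
    # The overlap subset (53 of 530 submissions) gets three additional
    # reviewers from each system.
    overlap_subset = rng.sample(range(n_submissions), int(n_submissions * overlap_frac))
    return anonymous_panel, non_anonymous_panel, overlap_subset
```

The overlap subset is what makes a reliability estimate possible: only submissions scored independently by multiple reviewers in the same system reveal how much two panels would agree.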
A sample of the findings on fairness
Among the findings: in the anonymous review condition, Asian first authors received higher ratings than in the non-anonymous condition, even after controlling for 13 other author and submission characteristics. This pattern aligns with the idea that anonymity can reduce bias related to race or ethnicity, and it is consistent with previous studies in which Asian applicants were less likely to receive federal grants and received fewer callbacks when inquiring about graduate study. Too few of the authors of the 530 submissions identified as Black, Hispanic, or Native American to draw meaningful conclusions about the acceptance rates of their submissions, highlighting the limited diversity within the field of judgment and decision making and in the sciences more generally.
Gender, by contrast, showed a different pattern. Male first authors received higher ratings than female first authors in both review systems, with the gap slightly larger in the anonymous condition. This pattern may reflect gender disparities in the mathematically intensive disciplines represented at the conference — such as statistics, computational modeling, and game theory — or the greater value sometimes placed on traditionally male-oriented topics or perspectives. Pleskac noted that non-anonymous reviewers may have sought to counterbalance this gap by including more female-authored work, though more research is needed to confirm this.
A third author characteristic that functioned differently across the two systems was career stage. Here, the clearest pattern was that submissions with senior coauthors scored higher in the non-anonymous review process. Likewise, Ph.D. students and research scientists serving as first authors received lower ratings in the anonymous condition than in the non-anonymous one. According to Pleskac, this may reflect a tendency for non-anonymous reviewers both to give more weight to work from senior faculty and to consciously create opportunities for early-career researchers when their status is visible.
Reliability and validity
When Pleskac and colleagues compared the consistency of the two systems, they found general agreement on which papers were weak overall. But, beyond that, variation was substantial: reviewers agreed on only 40% of the top-rated submissions. Even within each system, two independent reviewer panels agreed on less than 50% of the top choices, underscoring the noisy and inconsistent nature of peer review, especially for the highest-rated work. "That means we could have a totally different conference depending on the reviewers or what we ask them," said Pleskac.
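A simple way to picture that agreement statistic (a hypothetical illustration with made-up scores, not the study's actual analysis) is to compute how much of each panel's top-rated list the other panel shares:

```python
def top_overlap(scores_a, scores_b, k):
    """Fraction of the top-k submissions, ranked by each panel's
    scores, that both panels agree on. scores_a and scores_b map
    submission IDs to that panel's average rating."""
    top_a = set(sorted(scores_a, key=scores_a.get, reverse=True)[:k])
    top_b = set(sorted(scores_b, key=scores_b.get, reverse=True)[:k])
    return len(top_a & top_b) / k

# Two panels that rank the weakest paper the same but disagree near
# the top (entirely made-up ratings):
panel_1 = {"s1": 5.0, "s2": 4.2, "s3": 3.1, "s4": 1.8}
panel_2 = {"s1": 4.8, "s3": 4.1, "s2": 3.0, "s4": 1.9}
```

An overlap near 0.4 or 0.5, as the study reports for top-rated work, means the accepted program depends heavily on which reviewers happened to be assigned.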
Perhaps most telling, none of the author characteristics — including gender, race, or seniority — predicted the future success of the paper, either in terms of audience reception or eventual publication. Initial review scores only minimally predicted these markers of validity and success.
The inevitability of noise
The inevitability of noise within both anonymous and non-anonymous systems led to a relatively novel idea: embracing it through an informed lottery.
"Our results reinforce something we all know, which is that peer review is kind of noisy, which means at some level, it's kind of a lottery," said Pleskac. "We can see that people are pretty good at separating the top half of the submissions from the bottom half. So why not take the top half and randomly select among them? That could help bring in ideas and perspectives we might otherwise miss. Science, after all, is about exploration."
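The informed lottery the researchers describe could be sketched as follows (a hypothetical illustration; the pool cutoff and function names are assumptions, not the authors' procedure):

```python
import random

def informed_lottery(scores, n_slots, seed=None):
    """Rank submissions by review score, keep the top half (the region
    where reviewers reliably agree, per the study), then fill the
    program by drawing at random from that pool."""
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Pool is the top half, but never smaller than the number of slots.
    pool = ranked[: max(n_slots, len(ranked) // 2)]
    return rng.sample(pool, n_slots)
```

Because every submission in the pool has the same chance of selection, the final program no longer hinges on small, noisy score differences among strong submissions, which is exactly the source of disagreement the study identified.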