A new study led by Dr. Vadim Axelrod of the Gonda (Goldschmied) Multidisciplinary Brain Research Center at Bar-Ilan University has raised serious concerns about the quality of data collected on Amazon Mechanical Turk (MTurk), a platform widely used for behavioral and psychological research.
MTurk, an online crowdsourcing marketplace where individuals complete small tasks for payment, has served as a key resource for researchers for over 15 years. Despite previous concerns about participant quality, the platform remains popular within the academic community. Dr. Axelrod's team set out to rigorously assess the current quality of data produced by MTurk participants.
The study, involving over 1,300 participants across main and replication experiments, employed a straightforward but powerful method: repeating identical questionnaire items to measure response consistency. "If a participant is reliable, their answers to repeated questions should be consistent," explained Dr. Axelrod. The study also included several types of "attentional catch" questions that any attentive respondent should be able to answer easily.
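The logic of such a consistency check is simple to illustrate in code. The sketch below is not the authors' actual analysis pipeline; the column names, response scale, and exclusion thresholds are hypothetical. It scores each respondent by the absolute difference between repeated items and by whether they pass an attention check:

```python
import pandas as pd

# Hypothetical example data: each row is one participant's responses on a
# 1-7 Likert scale. Columns q1/q1_repeat and q2/q2_repeat are repeated item
# pairs; "catch" is an attention check with an obvious correct answer (here, 1).
df = pd.DataFrame({
    "worker_id": ["A", "B", "C"],
    "q1":        [6, 2, 7],
    "q1_repeat": [6, 7, 7],
    "q2":        [3, 1, 4],
    "q2_repeat": [3, 6, 4],
    "catch":     [1, 5, 1],
})

REPEATED_PAIRS = [("q1", "q1_repeat"), ("q2", "q2_repeat")]
CATCH_CORRECT = 1

# Inconsistency: mean absolute difference across repeated item pairs
# (0 = perfectly consistent; larger values = less reliable responding).
df["inconsistency"] = sum(
    (df[a] - df[b]).abs() for a, b in REPEATED_PAIRS
) / len(REPEATED_PAIRS)

# Attention check: did the participant give the expected answer?
df["passed_catch"] = df["catch"] == CATCH_CORRECT

# Flag participants for exclusion using an illustrative threshold.
df["exclude"] = (df["inconsistency"] > 1.0) | ~df["passed_catch"]
print(df[["worker_id", "inconsistency", "passed_catch", "exclude"]])
```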
The findings, just published in Royal Society Open Science, were stark: the majority of participants from MTurk's general worker pool failed the attention checks and gave highly inconsistent responses, even when the sample was limited to workers with a 95% or higher approval rating.
"It's hard to trust the data of someone who claims a runner isn't tired after completing a marathon in extremely hot weather or that a cancer diagnosis would make someone glad," Dr. Axelrod noted. "The participants did not lack the knowledge to answer such attentional catch questions — they just weren't paying sufficient attention. The implication is that their responses to the main questionnaire may be equally random."
By contrast, Amazon's elite "Master" workers, selected by Amazon for high performance across previous tasks, consistently produced high-quality data. The authors recommend using Master workers for future research, while noting that these participants are much more experienced and far fewer in number.
"Reliable data is the foundation of any empirical science," said Dr. Axelrod. "Researchers need to be fully informed about the reliability of their participant pool. Our findings suggest that caution is warranted when using MTurk's general pool for behavioral research."