(Toronto, May 27, 2026) A new viewpoint article published in JMIR Mental Health warns that artificial intelligence (AI) systems used in mental health settings may inherit and reinforce unreliable human input unless new safeguards are adopted. The paper, titled "When AI Colludes: Clinical Reliability of Training and Preference Data as a Trustworthy-AI Criterion," calls for the "clinical reliability" of training data to become a core standard for trustworthy AI.
The article explores how large language models, including AI chatbots, are trained using massive amounts of human-written text and feedback. According to author Dr Hina Tahseen, current discussions about AI safety often focus on harms that happen after deployment, such as misleading advice or emotional dependency. Dr Tahseen argues that a major issue may begin much earlier—specifically, during the collection of human-generated training and preference data.
The viewpoint introduces the psychiatric concept of "collusion," described as the uncritical acceptance of an unreliable account, as a new way to understand AI behavior. It suggests that AI systems can unintentionally reinforce distorted, inaccurate, or unhealthy information when they are trained to prioritize user approval or unverified human feedback.
"AI safety efforts have focused on what these systems say to users. The prior question is whether the human data they learned from was reliable in the first place. Psychiatry assesses this every day in clinical practice—that expertise should be part of how we build and govern AI systems, not an afterthought," said author Dr Hina Tahseen.
Rather than focusing only on technical fixes, the viewpoint proposes that developers of mental health–related AI systems should include clinical expertise when designing training data, evaluating feedback, and monitoring systems after launch. Existing AI safety methods—such as refusal training, red-teaming, and content monitoring—already address parts of the problem, but they are not specifically designed to assess whether human self-reporting is clinically reliable.
Adding clinical reliability as an explicit AI trust criterion could strengthen safeguards for mental health technologies while helping researchers better understand how AI systems respond to vulnerable users.
Original article: When AI Colludes: Clinical Reliability of Training and Preference Data as a Trustworthy-AI Criterion
URL: https://mental.jmir.org/2026/1/e96894
DOI: 10.2196/96894