TUCSON, Ariz. — Students pursuing a Doctor of Pharmacy degree routinely take – and pass – rigorous exams to prove competency in several areas. Can ChatGPT accurately answer the same questions? A new study by University of Arizona R. Ken Coit College of Pharmacy researchers found that it can't.
Researchers found that ChatGPT 3.5, an artificial intelligence chatbot, fared worse than PharmD students in answering questions on therapeutics examinations designed to ensure that students have the knowledge, skills and critical thinking abilities to provide safe, effective, patient-centered care.
ChatGPT was less likely to correctly answer application-based questions (44%) compared with questions focused on recall of facts (80%). It also was less likely to answer case-based questions correctly (45%) compared with questions that weren't focused on patient cases (74%). Overall, ChatGPT answered only 51% of the questions correctly.
The results provide additional insights into the uses and limitations of the technology and may also prove valuable in the development of pharmacy exam questions. The study findings appear in Currents in Pharmacy Teaching and Learning.
"AI has many potential uses in health care and education, and it's not going away," said Christopher Edwards, PharmD, an associate clinical professor of pharmacy practice and science. "One of the things we were hoping to answer with the study was if students wanted to use AI on an exam, how would they perform? I wanted to have data to show the students and tell them they can do well in the exams by studying hard and they don't necessarily need these tools."
A secondary goal was to find out what kinds of questions AI would struggle with. Coit College of Pharmacy Interim Dean Brian Erstad, PharmD, wasn't surprised that ChatGPT did better with straightforward multiple-choice and true-false questions and was less successful with application-based questions.
"The kinds of places where evidence is limited and judgment is required, which is often in a clinical setting, was where we found the technology somewhat lacking," he said. "Ironically those are the kinds of questions clinicians are always facing."
Edwards, Erstad and Bernadette Cornelison, PharmD, an associate professor of pharmacy practice and science, evaluated ChatGPT's answers to 210 questions from six exams given in two pharmacotherapeutics courses in the Coit College of Pharmacy's PharmD program.
The questions came from a first-year PharmD course covering nonprescription medications for conditions such as heartburn, diarrhea, atopic dermatitis, colds and allergies, and from a second-year course covering cardiology, neurology and critical care topics.
To compare the exam performance of pharmacy students and ChatGPT, the researchers calculated mean composite scores as a measure of the ability to answer questions correctly. For ChatGPT, they averaged its scores across the six exams; for the students, they averaged the mean class performance on each exam. ChatGPT's mean composite score across the six exams was 53, compared with 82 for the pharmacy students.
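In both cases, the composite works out to a simple unweighted average. As a rough sketch, assuming six equally weighted exams (the individual exam scores are not reported here), the calculation is: mean composite = (s1 + s2 + s3 + s4 + s5 + s6) / 6, where each s value is ChatGPT's score on one exam or, for the students, the class's mean score on that exam.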
Educators, clinicians and others continue to debate the value of AI large language models, such as ChatGPT, in academic medicine. While such models will continue to play a range of roles in health care, pharmacy practice and other areas, many are concerned that relying too heavily on the technology could hamper students' development of essential reasoning and critical thinking skills.
Both Erstad and Edwards acknowledged that, in time, newer and more advanced technology may change these results.