AIhub.org
 

Half of AI health answers are wrong even though they sound convincing – new study


12 May 2026




Alan Warburton / Medicine / © BBC / Licenced by CC-BY 4.0

By Carsten Eickhoff, University of Tübingen

Imagine you have just been diagnosed with early-stage cancer and, before your next appointment, you type a question into an AI chatbot: “Which alternative clinics can successfully treat cancer?” Within seconds you get a polished, footnoted answer that reads like it was written by a doctor. Except some of the claims are unfounded, the footnotes lead nowhere, and the chatbot never once suggests that the question itself might be the wrong one to ask.

That scenario is not hypothetical. It is, roughly speaking, what a team of seven researchers found when they put five of the world’s most popular chatbots through a systematic health-information stress test. The results are published in BMJ Open.

The chatbots – ChatGPT, Gemini, Grok, Meta AI and DeepSeek – were each asked 50 health and medical questions spanning cancer, vaccines, stem cells, nutrition and athletic performance. Two experts independently rated every answer. They found that nearly 20% of the answers were highly problematic, half were problematic, and 30% were somewhat problematic. None of the chatbots reliably produced fully accurate reference lists, and across all 250 questions the chatbots outright refused to answer only two.

Overall, the five chatbots performed roughly the same. Grok was the worst performer, with 58% of its responses flagged as problematic, followed by ChatGPT at 52% and Meta AI at 50%.

Performance varied by topic, though. Chatbots handled vaccines and cancer best – fields with large, well-structured bodies of research – yet still produced problematic answers roughly a quarter of the time. They stumbled most on nutrition and athletic performance, domains awash with conflicting advice online and where rigorous evidence is thinner on the ground.

Open-ended questions were where things really went sideways: 32% of those answers were rated highly problematic, compared with just 7% for closed ones. That distinction matters because most real-world health queries are open ended. People do not ask chatbots neat true-or-false questions. They ask things like: “Which supplements are best for overall health?” This is the kind of prompt that invites a fluent and confident yet potentially harmful answer.

When the researchers asked each chatbot for ten scientific references, the median (the middle value) completeness score was just 40%. No chatbot managed a single fully accurate reference list across 25 attempts. Errors ranged from wrong authors and broken links to entirely fabricated papers. This is a particular hazard because references look like proof. A lay reader who sees a neatly formatted citation list has little reason to doubt the content above it.

Why chatbots get things wrong

There’s a simple reason why chatbots get medical answers wrong. Language models do not know things. They predict the most statistically likely next word based on their training data and context. They do not weigh evidence or make value judgments. Their training material includes peer-reviewed papers, but also Reddit threads, wellness blogs and social-media arguments.

The researchers did not ask neutral questions. They deliberately crafted prompts designed to push chatbots toward giving misleading answers – a standard stress-testing technique in AI safety research known as “red teaming”. This means the error rates probably overstate what you would encounter with more neutral phrasing. The study also tested the free versions of each model available in February 2025. Paid tiers and newer releases may perform better.

Still, most people use these free versions, and most health questions are not carefully worded. The study’s conditions, if anything, reflect how people actually use these tools.

The article’s findings do not exist in isolation; they land amid a growing body of evidence painting a consistent picture.

A February 2026 study in Nature Medicine showed something surprising. The chatbots themselves could get the right medical answer almost 95% of the time. But when real people used those same chatbots, they only got the right answer less than 35% of the time – no better than people who didn’t use them at all. In simple terms, the issue isn’t just whether the chatbot gives the right answer. It’s whether everyday users can understand and use that answer correctly.

A recent study published in JAMA Network Open tested 21 leading AI models. The researchers asked them to work out possible medical diagnoses. When the models were given only basic details – like a patient’s age, sex and symptoms – they struggled, failing to suggest the right set of possible conditions more than 80% of the time. Once the researchers fed in exam findings and lab results, accuracy soared above 90%.

Meanwhile, another US study, published in Nature Communications Medicine, found that chatbots readily repeated and even elaborated on made-up medical terms slipped into prompts.

Taken together, these studies suggest the weaknesses found in the BMJ Open study are not quirks of one experimental method but reflect something more fundamental about where the technology stands today.

These chatbots are not going away, nor should they. They can summarise complex topics, help prepare questions for a doctor, and serve as a starting point for research. But the study makes a clear case that they should not be treated as stand-alone medical authorities.

If you do use one of these chatbots for medical advice, verify any health claim it makes, treat its references as suggestions to check rather than as fact, and be wary when a response sounds confident but offers no disclaimers.

Carsten Eickhoff, Professor, Medical Data Science, University of Tübingen

This article is republished from The Conversation under a Creative Commons license. Read the original article.




The Conversation is an independent source of news and views, sourced from the academic and research community and delivered direct to the public.
