“The suicide guardrail failure was the most alarming,” study co-author Girish N. Nadkarni, M.D., chief AI officer of the Mount Sinai Health System, told Fox News Digital.
ChatGPT Health is designed to show a crisis intervention banner when someone describes thoughts of self-harm, the researcher noted.
“We tested it with a 27-year-old patient who said he’d been thinking about taking a lot of pills,” Nadkarni said. “When he described his symptoms alone, the banner appeared 100% of the time. Then we added normal lab results — same patient, same words, same severity — and the banner vanished.”
“A safety feature that works perfectly in one context and completely fails in a nearly identical context … is a fundamental safety problem.”
The researchers were also surprised by the social influence aspect.
“When a family member in the scenario said ‘it’s nothing serious’ — which happens all the time in real life — the system became nearly 12 times more likely to downplay the patient’s symptoms,” Nadkarni said. “Everyone has a spouse or parent who tells them they’re overreacting. The AI shouldn’t be agreeing with them during a potential emergency.”
Fox News Digital reached out to OpenAI, creator of ChatGPT, requesting comment.
Dr. Marc Siegel, Fox News senior medical analyst, called the new study “important.”
“It underlines the principle that while large language models can triage clear-cut emergencies, they have much more trouble with nuanced situations,” Siegel, who was not involved in the study, told Fox News Digital.
“This is where doctors and clinical judgment come in — knowing the nuances of a patient’s history and how they report symptoms and their approach to health.”
ChatGPT and other LLMs can be helpful tools, Siegel said, but they “should not be used to give medical direction.”
“Machine learning and continued input of data can help, but will never compensate for the essential problem – human judgment is needed to decide whether something is a true emergency or not.”
Dr. Harvey Castro, an emergency physician and AI expert in Texas, echoed the importance of the study, calling it “exactly the kind of independent safety evaluation we need.”
“Innovation moves fast. Oversight has to move just as fast,” Castro, who also did not work on the study, told Fox News Digital. “In healthcare, the most dangerous mistakes happen at the extremes, when something looks mild but is actually catastrophic. That’s where clinical judgment matters most, and where AI must be stress-tested.”
The researchers acknowledged some potential limitations in the study design.
“We used physician-written clinical scenarios rather than real patient conversations, and we tested at a single point in time — these systems update frequently, so performance may change,” Ramaswamy told Fox News Digital.
Additionally, most of the missed emergencies happened in situations where the danger depended on how the condition was changing over time. It’s not clear whether the same problem would happen with acute medical emergencies.
Because the system had to choose just one fixed urgency category, the test may not reflect the more nuanced advice it might give in a back-and-forth conversation, the researchers noted.
Also, the study wasn’t large enough to confidently detect small differences in how recommendations might vary by race or gender.
“We need continuous auditing, not one-time studies,” Castro noted. “These systems update frequently, so evaluation must be ongoing.”
The researchers emphasized the importance of seeking immediate care for serious issues.
“If something feels seriously wrong — chest pain, difficulty breathing, a severe allergic reaction, thoughts of self-harm — go to the emergency department or call 988,” Ramaswamy advised. “Don’t wait for an AI to tell you it’s OK.”
The researchers noted that they support the use of AI to improve healthcare access, and that they didn’t conduct the study to “tear down the technology.”
“These tools can be genuinely useful for the right things — understanding a diagnosis you’ve already received, looking up what your medications do and their side effects, or getting answers to questions that didn’t get fully addressed in a short doctor’s visit,” Ramaswamy said.
“That’s a very different use case from deciding whether you need emergency care. Treat them as a complement to your doctor, not a replacement.”
“This study doesn’t mean we abandon AI in healthcare.”
Melissa Rudy is senior health editor and a member of the lifestyle team at Fox News Digital. Story tips can be sent to melissa.rudy@fox.com.
Article source: https://www.foxnews.com/health/chatgpt-could-miss-your-serious-medical-emergency-new-study-suggests