Generative AI for clinical notes has limitations, new studies show
After stratospheric levels of hype, early evidence may be bringing generative artificial intelligence down to Earth.
A series of recent research papers from academic hospitals has revealed significant limitations of large language models (LLMs) in medical settings, undercutting common industry talking points that the technology will save time and money and soon liberate clinicians from the drudgery of documentation.
Just in the past week, a study at the University of California, San Diego, found that using an LLM to reply to patient messages did not save clinicians time; another at Mount Sinai found that popular LLMs are lousy at mapping patients’ illnesses to diagnostic codes; and still another at Mass General Brigham found that an LLM made safety errors in responding to simulated questions from cancer patients. One reply was potentially lethal.