The AI Doctor Is In—But Is It Actually Healing Anyone?
Artificial intelligence has quietly slipped into the operating room, the radiology suite, and even the doctor’s office. From transcribing patient visits to flagging high-risk cases in electronic health records, AI is now a fixture in modern healthcare. Yet beneath the buzz of innovation lies a troubling silence: we still don’t know whether these tools are actually making patients healthier.
While studies show that many AI systems can match or even surpass human accuracy in diagnosing diseases or predicting outcomes, the real test—improved patient health—remains largely unproven. As Jenna Wiens, a computer scientist at the University of Michigan, puts it, “We’ve moved from asking if AI can work to how well it works in practice. And the answer is still unclear.”
This gap between technical performance and clinical impact is becoming a defining challenge of the AI-in-healthcare era. Hospitals are adopting AI tools at breakneck speed, often driven by promises of efficiency and cost savings. But without rigorous evaluation of long-term outcomes, we risk building a healthcare system that’s faster, smarter, and more automated—but not necessarily better.
The Rise of the Digital Scribe
One of the most visible AI applications in medicine today is the “ambient AI” scribe—software that listens to doctor-patient conversations and automatically generates clinical notes. Tools like Nuance’s DAX Copilot and Abridge have been rapidly adopted across major health systems, including NewYork-Presbyterian and Kaiser Permanente.
These AI scribes promise to free physicians from the drudgery of documentation, a task that consumes nearly half of their workday. Early feedback from clinicians is overwhelmingly positive. A nurse practitioner at a New York medical center described the technology as “a game-changer,” allowing her to maintain eye contact with patients instead of staring at a screen.
But while ambient AI may reduce burnout and improve workflow, its impact on actual patient outcomes remains anecdotal. Does less time spent on paperwork lead to fewer diagnostic errors? Do patients receive more personalized care when doctors aren’t buried in charts? These questions remain unanswered.
The technology isn’t flawless either. Misinterpretations of medical jargon, accents, or overlapping speech can lead to inaccurate notes. In one documented case, an AI scribe misheard “no family history of cancer” as “family history of cancer,” potentially altering a patient’s risk profile. Such errors underscore the need for human oversight—and rigorous evaluation.
Beyond Notes: AI as a Diagnostic Partner
Ambient scribes are just the beginning. AI is now being used to analyze medical images, predict disease progression, and even recommend treatment plans. In radiology, algorithms can detect lung nodules in CT scans with accuracy rivaling that of experienced radiologists. In cardiology, AI models analyze ECGs to predict the risk of heart failure months in advance.
These tools are often marketed as “decision support systems”—designed to augment, not replace, clinicians. But their integration into clinical workflows raises new questions. Can a doctor trust an AI’s recommendation when it contradicts their own judgment? And more importantly, does following that recommendation lead to better outcomes?
Take the case of an AI system used to predict sepsis, a life-threatening response to infection. One hospital reported an 18% reduction in mortality after implementing the tool. But a follow-up analysis revealed that the improvement might have been due to increased awareness and earlier interventions—not the AI itself. The tool may have simply acted as a catalyst for better care, rather than a direct cause of improved survival.
This distinction—between accuracy and impact—is critical. An AI tool can be 95% accurate at detecting a condition, but if it doesn’t lead to earlier treatment, reduced complications, or longer life, its value is limited.
The Hidden Cost of Automation
As AI takes over routine tasks, there’s a risk that clinicians may become overly reliant on algorithmic guidance. This phenomenon, known as “automation bias,” occurs when humans defer to machines even when they’re wrong. In one study, radiologists using an AI assistant were more likely to accept incorrect diagnoses when the AI suggested them—even when the evidence was ambiguous.
Moreover, AI systems are often trained on data from specific populations, which can lead to biased or inaccurate predictions for underrepresented groups. For example, a skin cancer detection algorithm trained primarily on light-skinned patients may miss melanomas in darker skin tones. Such disparities can exacerbate existing health inequities.
These issues highlight a broader concern: AI may optimize for efficiency and cost, but not necessarily for equity or patient well-being. When hospitals adopt AI tools without evaluating their real-world impact, they risk prioritizing speed over safety, and automation over compassion.
The Evidence Gap: Why We Don’t Know What Works
Despite the rapid adoption of AI in healthcare, there is a striking lack of high-quality evidence on its effectiveness. Most studies focus on technical performance—how well an algorithm detects a disease in a controlled setting—rather than clinical outcomes like survival, recovery time, or quality of life.
Wiens and Goldenberg argue in their Nature Medicine paper that the field needs more randomized controlled trials (RCTs)—the gold standard in medical research—to determine whether AI tools actually improve patient health. But such trials are expensive, time-consuming, and difficult to design for complex, evolving technologies.
- Over 600 AI-based medical devices have been approved by the FDA, but most were cleared through expedited pathways that don’t require proof of clinical benefit.
- A 2023 review found that only 12% of published AI studies included real-world patient data.
- The average time from AI prototype to clinical deployment is less than 18 months, far shorter than the 10+ years typically needed for drug development.
This evidence gap is particularly troubling given the stakes. Unlike a new smartphone app, a flawed medical AI tool can lead to misdiagnoses, delayed treatments, or even death.
The Path Forward: Measuring What Matters
To bridge this gap, experts are calling for a new approach to AI evaluation—one that prioritizes patient outcomes over technical benchmarks. This means designing studies that track real patients over time, comparing those treated with AI assistance to those receiving standard care.
Some institutions are already leading the way. The University of Pennsylvania, for example, launched a randomized trial of an AI sepsis prediction tool, measuring not just detection rates but also mortality and length of hospital stay. Early results suggest the AI group had a 15% lower mortality rate, though longer follow-up is needed.
Regulators are also stepping up. The FDA has begun requiring more transparency from AI developers, including details on training data and performance across demographic groups. The European Union’s AI Act classifies most medical AI as “high-risk,” mandating strict oversight and ongoing monitoring.
Ultimately, the goal isn’t to reject AI, but to ensure it serves patients, not just algorithms. That means slowing down, asking hard questions, and demanding proof—not just promise.
Conclusion: The Human Factor in the Age of AI
AI has the potential to revolutionize healthcare, but its true value will be measured not in accuracy scores or processing speed, but in healthier patients and stronger doctor-patient relationships. As we stand at the crossroads of innovation and evidence, one thing is clear: technology alone cannot heal. It must be guided by compassion, rigor, and a relentless focus on what truly matters—people.
The AI doctor is here. Now we must ask: Is it making us better? And if not, what are we willing to change to make sure it does?
This article was curated from “Health-care AI is here. We don’t know if it actually helps patients.” via MIT Technology Review.