The integration of large language models (LLMs) and large multimodal models (LMMs) into medical settings is becoming increasingly prevalent, but a recent study by researchers at the University of California, Santa Cruz and Carnegie Mellon University raises serious concerns about their reliability in high-stakes, real-world scenarios. The study reveals that even advanced models, including GPT-4V and Gemini Pro, perform poorly when asked to identify conditions and positions in medical images, with accuracy dropping by an average of 42% across the tested models. The researchers introduced ProbMed, a new dataset of 6,303 images drawn from two widely used biomedical datasets, and subjected seven state-of-the-art models to probing evaluation. The results are alarming: even the most robust model suffered a 10.52% drop in accuracy. The study underscores the urgent need for more rigorous evaluation methodologies to ensure the accuracy and reliability of LMMs in real-world medical applications.

Source.

TOP STORIES

Nvidia's AI Revolution - The Vera Rubin Platform and Future Demand
Nvidia’s Vera Rubin platform is set to revolutionize AI inference with unmatched performance …
Tim Cook's Departure - A Strategic Shift in Apple's AI Landscape
Apple’s leadership transition highlights a strategic focus on silicon for AI innovation …
New Tennessee Law on AI and Mental Health - A Step Forward or Backward?
Tennessee’s new law restricts AI claims in mental health but may create loopholes …
The Evolving Risks of AI - From Chatbots to Cyber Threats
Experts warn that as AI evolves, the risks it poses are becoming more serious and complex …
China's New AI Companion Rules Shape a $30B Market Landscape
China sets new regulations for AI companions, impacting a booming market …
Anthropic's Ongoing Dialogue with Trump Administration Amid Pentagon Tensions
Anthropic continues to engage with the Trump administration despite Pentagon tensions …

LATEST STORIES