Overview of Findings
Meta’s Llama 3 70B, an open-source large language model (LLM), performed strongly on multiple-choice radiology questions. Research led by Dr. Lisa Adams of the Technical University of Munich found that Llama 3 70B is on par with leading proprietary models such as OpenAI’s GPT-4 and Google DeepMind’s Gemini Ultra. The study suggests that open-source LLMs can offer privacy and customization benefits while remaining competitive in performance.
Key Research Details
- The study involved testing Llama 3 70B and other models on 50 ACR in-training test questions and 85 additional board-style exam questions.
- Llama 3 70B achieved an accuracy of 74% on the ACR questions, slightly lower than the top proprietary models but significantly better than GPT-3.5 Turbo.
- Limitations were noted, such as the models’ inability to handle broader clinical complexities and the need for more nuanced assessment methods.
- All models, including Llama 3, remain prone to unreliable outputs and hallucinations.
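To make the accuracy figure concrete, here is a minimal Python sketch of how a multiple-choice benchmark of this kind could be scored. The function name `score_multiple_choice` and the toy answer key are hypothetical and purely illustrative; they are not the study's code or data. The last lines only restate the reported numbers: 74% of 50 questions corresponds to 37 correct answers.

```python
def score_multiple_choice(predictions, answer_key):
    """Return the fraction of questions where the predicted letter matches the key.

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(predictions) != len(answer_key):
        raise ValueError("predictions and answer_key must have the same length")
    if not answer_key:
        raise ValueError("answer_key must not be empty")
    # Compare letters case-insensitively, ignoring surrounding whitespace.
    correct = sum(
        p.strip().upper() == k.strip().upper()
        for p, k in zip(predictions, answer_key)
    )
    return correct / len(answer_key)


# Toy data (made up): four questions, three answered correctly.
toy_key = ["A", "C", "B", "D"]
toy_predictions = ["A", "c", "D", "D"]
toy_accuracy = score_multiple_choice(toy_predictions, toy_key)

# Reported figures: 74% accuracy on 50 ACR questions.
implied_correct = round(0.74 * 50)
print(f"toy accuracy: {toy_accuracy:.2f}, implied correct answers: {implied_correct}")
```

With only 50 questions, each answer shifts the score by two percentage points, so small differences between models on this set should be read with caution.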
Significance of Open-Source LLMs
The findings highlight the growing competitiveness of open-source LLMs in healthcare applications. Open-source models offer distinct advantages, such as the ability to customize architecture and training data, which can support the development of specialized tools for clinical use. As the healthcare sector increasingly relies on AI, the adaptability of models like Llama 3 70B could pave the way for improved decision support systems. The anticipated release of a larger version with 400 billion parameters later this year could further enhance the model family’s capabilities.