OpenAI’s ChatGPT 4.0 Passes Clinical Neurology Exam with 85% Accuracy
OpenAI’s latest large language model (LLM), ChatGPT 4.0, answered 85% of questions correctly on a clinical neurology board exam in a proof-of-concept study. Researchers believe that, with further improvements, LLMs could have significant applications in clinical neurology.
Experiment Details and Results
Researchers from University Hospital Heidelberg and the German Cancer Research Center conducted the experiment on May 31. They drew on a question bank for the neurology board exam of the American Board of Psychiatry and Neurology, supplemented with a small number of questions from the European Board for Neurology.
Of the 1,956 questions, the previous version, ChatGPT 3.5, answered 1,306 correctly, a score of 66.8%. The newer ChatGPT 4.0 answered 1,662 correctly, an accuracy of 85%. By comparison, the average score among human test-takers was 73.8%. ChatGPT 4.0 outperformed humans on questions about behavioral, cognitive, and psychological topics, effectively passing the neurology exam.
Recommendations for Clinical Neurology Applications
The researchers acknowledge that the models still performed worse on questions requiring higher-order thinking than on those requiring lower-order thinking. Even so, they remain optimistic about the technology’s prospects:
“These findings suggest that with further refinements, large language models could have significant applications in clinical neurology.”
Cautions and Fine-tuning Needed
Although LLMs show potential for documentation and decision-support systems, the researchers urge neurologists to be cautious about using them in practice. They emphasize that further development and field-specific fine-tuning are needed before LLMs are suitable for clinical neurology.
Dr. Varun Venkataramani, one of the study’s authors, said the study serves as a proof of concept for the capabilities of LLMs and underscores the need for further development and fine-tuning.
AI’s Role in Healthcare
AI is already making inroads in healthcare, for example by aiding cancer research and helping curb the overprescription of antibiotics. ChatGPT 4.0’s performance on the neurology exam adds to the evidence that LLMs could advance medical applications.
Hot Take: Large Language Models Show Promise in Clinical Neurology
ChatGPT 4.0 scored 85% on a clinical neurology exam, beating the average human score and outperforming human test-takers on behavioral, cognitive, and psychological questions. There is still room for improvement, but the results point to real potential for large language models in clinical neurology. With further refinement and fine-tuning, LLMs could become valuable tools for documentation and decision support in the field. Caution is still warranted when deploying them in practice, particularly for tasks that require higher-order thinking. As a proof of concept, the study makes the case for continued development to adapt LLMs to clinical neurology.