Google Gemini and Bard artificial intelligence chatbot performance in ophthalmology knowledge assessment
Purpose
With the popularization of ChatGPT (Open AI, San Francisco, California, United States) in recent months, understanding the potential of artificial intelligence (AI) chatbots in a medical context is important. Our study aims to evaluate Google Gemini and Bard’s (Google, Mountain View, California, United States) knowledge in ophthalmology.
Methods
In this study, we evaluated Google Gemini and Bard’s performance on EyeQuiz, a platform containing ophthalmology board certification examination practice questions, when used from the United States (US). Accuracy, response length, response time, and provision of explanations were evaluated. Subspecialty-specific performance was noted. A secondary analysis was conducted using Bard from Vietnam, and Gemini from Vietnam, Brazil, and the Netherlands.
Results
Overall, Google Gemini and Bard both had accuracies of 71% across 150 text-based multiple-choice questions. The secondary analysis revealed an accuracy of 67% using Bard from Vietnam, with 32 questions (21%) answered differently than when using Bard from the US. Moreover, the Vietnam version of Gemini achieved an accuracy of 74%, with 23 (15%) answered differently than the US version of Gemini. While the Brazil (68%) and Netherlands (65%) versions of Gemini performed slightly worse than the US version, differences in performance across the various country-specific versions of Bard and Gemini were not statistically significant.
Conclusion
Google Gemini and Bard had an acceptable performance in responding to ophthalmology board examination practice questions. Subtle variability was noted in the performance of the chatbots across different countries. The chatbots also tended to provide a confident explanation even when providing an incorrect answer.