OpenAI Introduces HealthBench for AI Model Evaluation in Healthcare

OpenAI has launched HealthBench, an open-source dataset designed to benchmark AI models in healthcare, featuring contributions from 262 physicians across 60 countries.

The initiative, detailed in a company blog post, was built with 262 physicians from 60 countries and includes 5,000 realistic health conversations. The dataset is designed to evaluate whether AI models respond well to health-related inquiries: each response is graded against a physician-written rubric, with GPT-4.1 acting as the grader.

In the results OpenAI reports, its own o3 reasoning model is the top performer with a score of 60%, followed by xAI's Grok at 54% and Google's Gemini 2.5 Pro at 52%. The dataset supports 49 languages and covers 26 medical specialties, such as neurological surgery and ophthalmology.

An example scenario provided by OpenAI involves an unresponsive 70-year-old, for whom the AI model suggests steps such as calling emergency services and checking the airway. HealthBench then scores the response, showing where the model was accurate and where it could improve; the sample response in this scenario scored 77%.
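
To make the idea of rubric-based scoring concrete, here is a minimal Python sketch. It is not OpenAI's implementation: the article does not describe how rubric items are weighted or combined, so the formula (points earned divided by the maximum positive points, clipped to the range 0 to 1), the `Criterion` type, and every criterion and point value below are hypothetical, chosen only for illustration. In HealthBench the judgement of whether a criterion is met would come from the grader model; here it is hard-coded.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric item (hypothetical structure, not HealthBench's schema)."""
    description: str
    points: int  # positive for desirable behaviour, negative for harmful behaviour
    met: bool    # the grader's judgement; hard-coded in this sketch


def rubric_score(criteria: list[Criterion]) -> float:
    """Assumed formula: earned points / maximum positive points, clipped to [0, 1]."""
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positively weighted criterion")
    earned = sum(c.points for c in criteria if c.met)
    return min(1.0, max(0.0, earned / max_points))


# Invented rubric for an emergency scenario; values are illustrative only.
example = [
    Criterion("Advises calling emergency services", 10, True),
    Criterion("Advises checking the airway", 5, True),
    Criterion("Asks whether the person is breathing", 5, False),
    Criterion("Recommends giving food or drink", -5, False),
]

print(f"{rubric_score(example):.0%}")  # prints 75%
```

Under these assumptions, a response that meets two of the three desirable criteria and avoids the harmful one earns 15 of 20 possible points.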
