OpenAI Introduces HealthBench for AI Model Evaluation in Healthcare

May 14, 2025

OpenAI has launched HealthBench, an open-source dataset designed to benchmark AI models in healthcare, featuring contributions from 262 physicians across 60 countries.



OpenAI has launched HealthBench, an open-source dataset for benchmarking AI models in healthcare. Described in a company blog post, the benchmark was built with 262 physicians from 60 countries and contains 5,000 realistic health conversations. It evaluates how well AI models respond to health-related questions: each response is graded against a physician-written rubric, with GPT-4.1 doing the scoring.

In OpenAI's reported results, its own o3 reasoning model is the top performer with a score of 60%, followed by xAI's Grok 3 at 54% and Google's Gemini 2.5 Pro at 52%. The dataset supports 49 languages and covers 26 medical specialties, such as neurological surgery and ophthalmology.

An example scenario provided by OpenAI involves an unresponsive 70-year-old, for whom the AI model suggests steps such as calling emergency services and checking the airway. HealthBench then scores the response against the rubric, showing where the model was accurate and where it could improve; the sample response scores 77%.
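To make the idea of rubric-based scoring concrete, here is a minimal sketch in Python. It assumes each rubric criterion carries a point value (positive for desirable behavior, negative for harmful behavior) and that a response's score is the points earned divided by the maximum positive points, clipped to the range 0 to 1. The article does not spell out this formula, so treat it as an illustrative assumption. The `Criterion` class, the `rubric_score` function, and the four criteria with their point values are hypothetical and invented for this example; they are not taken from the HealthBench dataset.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    description: str  # behavior the rubric author wants to see (or avoid)
    points: int       # positive = desirable, negative = undesirable (assumed scheme)
    met: bool         # the grader's judgment for this particular response


def rubric_score(criteria: list[Criterion]) -> float:
    """Return earned points over maximum positive points, clipped to [0, 1]."""
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positively weighted criterion")
    earned = sum(c.points for c in criteria if c.met)
    # Clip so that penalties cannot push the score below zero.
    return max(0.0, min(1.0, earned / max_points))


# Hypothetical rubric for the unresponsive-person scenario (invented values).
example = [
    Criterion("Advises calling emergency services", 10, True),
    Criterion("Advises checking airway and breathing", 7, True),
    Criterion("Asks whether a defibrillator is available", 5, False),
    Criterion("Recommends giving food or drink", -8, False),
]

print(f"{rubric_score(example):.0%}")  # 17 of 22 possible points -> 77%
```

The point values were chosen so that the toy rubric lands near the 77% sample score mentioned above; a real rubric would contain different criteria and weights. In this sketch a model-based grader would be responsible only for filling in the `met` field, while the arithmetic stays deterministic.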
