actAVA.ai Releases CHI-Bench to Test AI Agents on Healthcare Workflows

May 21, 2026

actAVA.ai has introduced CHI-Bench, a benchmark evaluating AI agents from companies like Anthropic, OpenAI, and Google across 75 healthcare workflows. The top-performing agent succeeded in only 28% of cases.

actAVA.ai has released CHI-Bench, a new benchmark designed to evaluate AI agents on real U.S. healthcare workflows, according to a press release. The benchmark tested 30 advanced agents from Anthropic, OpenAI, Google, x.AI, DeepSeek, and Z.ai across 75 workflows and found that even the best system failed in about 72% of cases.

Each CHI-Bench test runs an agent through 60 to 80 steps across multiple clinical stages, covering processes like intake, review, and authorization. The system evaluates every step and artifact using deterministic tests and an LLM-based judge to check evidence grounding, consent, and consistency.
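As a rough illustration of how deterministic tests and a judge could be combined when grading a run, here is a minimal Python sketch. It is not CHI-Bench's actual code: the `Step` structure, the `has_consent` check, the `stub_judge` function, and the rule that a case passes only when every step passes are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    """One step of a workflow run (hypothetical structure)."""
    stage: str                      # e.g. "intake", "review", "authorization"
    artifact: Dict[str, str] = field(default_factory=dict)


# A check takes a step and returns True when the step is acceptable.
Check = Callable[[Step], bool]


def has_consent(step: Step) -> bool:
    """Hypothetical deterministic check: the artifact must record consent."""
    return step.artifact.get("consent") == "yes"


def stub_judge(step: Step) -> bool:
    """Stand-in for an LLM-based judge; a real one would call a model.

    Here it only verifies that the artifact cites some evidence.
    """
    return bool(step.artifact.get("evidence"))


def grade_case(steps: List[Step], checks: List[Check], judge: Check) -> bool:
    """Assumed rule: a case passes only if every step passes every check."""
    return all(
        all(check(step) for check in checks) and judge(step)
        for step in steps
    )


good = Step("intake", {"consent": "yes", "evidence": "chart note 12"})
bad = Step("review", {"consent": "yes"})  # no evidence cited

print(grade_case([good], [has_consent], stub_judge))       # True
print(grade_case([good, bad], [has_consent], stub_judge))  # False
```

Under an all-steps-must-pass rule like this one, a single ungrounded artifact anywhere in a 60 to 80 step run is enough to fail the whole case, which is one way long workflows can produce low pass rates.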

Anthropic’s Claude Code with Opus 4.6 achieved the highest score with a 28% pass rate, while OpenAI’s Codex with GPT-5.5 followed at 21%. Performance varied by domain, with utilization review reaching 41% and care management 32%. Reliability remained low, as no agent passed more than 20% of repeated cases.
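The difference between a headline pass rate and a reliability figure can be sketched in a few lines of Python. The article does not define how repeated cases were scored, so this example assumes one plausible reading: a case counts as reliably passed only if the agent passes it on every repeated run. The function names and the toy data are hypothetical and are not taken from the benchmark.

```python
from typing import Dict, List


def pass_rate(results: List[bool]) -> float:
    """Fraction of single runs that passed."""
    return sum(results) / len(results) if results else 0.0


def reliable_pass_rate(trials: Dict[str, List[bool]]) -> float:
    """Fraction of cases passed on every repeated run (assumed definition)."""
    if not trials:
        return 0.0
    reliable = sum(1 for runs in trials.values() if runs and all(runs))
    return reliable / len(trials)


# Toy data for illustration only: three runs for each of four cases.
trials = {
    "case_a": [True, True, True],
    "case_b": [True, False, True],
    "case_c": [False, False, False],
    "case_d": [True, False, False],
}

first_runs = [runs[0] for runs in trials.values()]
print(pass_rate(first_runs))       # 0.75
print(reliable_pass_rate(trials))  # 0.25
```

In this toy data the agent passes three of four cases on the first run but only one case on all three runs, which shows how a reliability figure can sit well below a single-run pass rate.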

CHI-Bench was developed in collaboration with over 20 institutions, including Johns Hopkins, Stanford, and Oxford. The benchmark is open under the Apache 2.0 license on GitHub, and a public leaderboard is now accepting community submissions.




