Logical Intelligence’s Aleph Leads Formal Verification Benchmarks

May 21, 2026

Logical Intelligence announced that its AI coding agent Aleph achieved top scores across four major formal reasoning benchmarks, marking a step toward verified code generation for mission-critical software.

In a press release, Logical Intelligence said Aleph achieved leading results on four formal reasoning benchmarks: PutnamBench, VeriSoftBench, LeanEval, and Verina. The company stated that Aleph’s performance demonstrates that formally verified code generation is now practical for critical infrastructure software.

Aleph solved 99.4 percent of the PutnamBench problem set, outperforming ByteDance’s Seed-Prover 1.5 at 86 percent and Hilbert at 69 percent. On VeriSoftBench, which measures real-world software verification, Aleph reached a 94 percent success rate, ahead of Harmonic’s Aristotle at 69 percent and Google Gemini-3 Pro at 65 percent. It also achieved top results on LeanEval and a perfect score on Verina, a result the benchmark’s authors independently confirmed.

The company said Aleph operates in environments that require machine-checkable proofs rather than probabilistic outputs. It is already being used in production verification workflows, including work involving the Ethereum Foundation’s ArkLib cryptographic libraries. Logical Intelligence plans to open a beta program for Aleph later this year.

Aleph automates formal verification, producing proofs that critical logic behaves correctly across all execution paths. The agent is intended for operators of infrastructure and safety-sensitive systems who require verified code generation.
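For readers unfamiliar with the term, the Lean 4 sketch below illustrates what a machine-checkable proof covering every execution path looks like. It is not output from Aleph and does not come from any of the benchmarks; the function `clamp` and the theorem `clamp_le_hi` are hypothetical names chosen for illustration, and the example assumes a recent Lean 4 toolchain in which the `omega` tactic is available.

```lean
-- Hypothetical example: clamp a value into the range [lo, hi].
def clamp (lo hi x : Nat) : Nat :=
  if x < lo then lo else if x > hi then hi else x

-- The result never exceeds `hi`, provided the range is well formed.
-- `split` creates one goal per branch of the conditionals, so the proof
-- must cover every execution path; `omega` closes each arithmetic goal.
theorem clamp_le_hi (lo hi x : Nat) (h : lo ≤ hi) :
    clamp lo hi x ≤ hi := by
  unfold clamp
  split <;> (try split) <;> omega
```

If any branch violated the bound, the Lean checker would reject the theorem. A proof of this kind is therefore a guarantee for all inputs rather than a probabilistic estimate.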

