Red Hat Launches llm-d for Scalable AI Inference
Red Hat has announced llm-d, a new open-source project designed to address the growing need for scalable generative AI inference. The project is backed by founding contributors including CoreWeave, Google Cloud, IBM Research, and NVIDIA, and aims to make AI inference as ubiquitous as Linux.
The llm-d project leverages a native Kubernetes architecture, vLLM-based distributed inference, and AI-aware network routing to optimize compute resources and deliver AI inference at a massive scale. This approach is intended to meet the demanding service-level objectives of production environments without compromising performance.
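To make "AI-aware network routing" concrete, here is a minimal, hypothetical sketch of the idea: rather than round-robin load balancing, the scheduler prefers inference replicas that likely already hold the request's KV cache (prefix affinity) while penalizing heavily loaded ones. The names, scoring weights, and data structures below are illustrative assumptions, not llm-d's actual implementation.

```python
# Hypothetical sketch of cache-aware request routing (not llm-d's code).
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    queue_depth: int                              # outstanding requests
    cached_prefixes: set = field(default_factory=set)

def route(prompt_prefix: str, replicas: list[Replica]) -> Replica:
    """Pick the replica with the best cache-affinity / load trade-off."""
    def score(r: Replica) -> float:
        # Assumed weighting: a warm KV cache is worth more than a short queue.
        cache_bonus = 10.0 if prompt_prefix in r.cached_prefixes else 0.0
        return cache_bonus - r.queue_depth        # higher is better
    return max(replicas, key=score)

replicas = [
    Replica("pod-a", queue_depth=2, cached_prefixes={"You are a helpful"}),
    Replica("pod-b", queue_depth=0),
]
# The warm cache on pod-a outweighs its deeper queue.
print(route("You are a helpful", replicas).name)  # -> pod-a
```

A cache miss flips the decision: for an unseen prefix, the lightly loaded `pod-b` wins instead.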
llm-d builds on vLLM, which supports a wide range of accelerators, and introduces prefill and decode disaggregation, separating processing of the AI input context from token generation so the two phases can run as discrete operations. The project also features KV cache offloading to reduce GPU memory burdens, and AI-aware network routing for efficient request scheduling.
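The prefill/decode split described above can be illustrated with a toy sketch: prefill processes the whole prompt once and produces a KV cache, while decode then generates tokens one at a time, reusing and extending that cache. Everything below is a simplified stand-in for a real model, not llm-d's implementation; in practice the cache (or an offloaded copy) is what moves between the disaggregated phases.

```python
# Toy illustration of prefill/decode disaggregation (illustrative only).

def prefill(prompt_tokens: list[str]) -> dict:
    """Compute a stand-in KV cache over the full prompt in one pass."""
    return {"keys": list(prompt_tokens), "values": list(prompt_tokens)}

def decode(kv_cache: dict, steps: int) -> list[str]:
    """Generate tokens sequentially, appending each to the shared cache."""
    out = []
    for i in range(steps):
        tok = f"tok{i}"                  # a real model samples from logits here
        kv_cache["keys"].append(tok)     # cache grows one entry per step
        kv_cache["values"].append(tok)
        out.append(tok)
    return out

cache = prefill(["Explain", "KV", "caching"])  # compute-heavy, parallel phase
print(decode(cache, 3))                        # memory-bound, sequential phase
```

The asymmetry is the point: prefill is compute-bound and parallel over the prompt, while decode is sequential and memory-bound, so placing them on different hardware can use each resource pool more efficiently.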
The initiative has garnered support from industry leaders such as AMD, Cisco, Hugging Face, Intel, Lambda, and Mistral AI, as well as academic institutions like the University of California, Berkeley, and the University of Chicago. This collaboration underscores the industry's commitment to advancing large-scale AI inference capabilities.