CoreWeave Sets New AI Benchmark with NVIDIA GB200 Superchips

CoreWeave has set a new record in AI inference benchmarks using NVIDIA GB200 Grace Blackwell Superchips, as announced in a press release. The company reported delivering 800 tokens per second (TPS) on the Llama 3.1 405B model, one of the largest open-source models, using a CoreWeave instance equipped with two NVIDIA Grace CPUs and four NVIDIA Blackwell GPUs.

Additionally, CoreWeave submitted results for its NVIDIA H200 GPU instances, achieving 33,000 TPS on the Llama 2 70B model, a 40% improvement over its previous NVIDIA H100 instances. These results underscore CoreWeave's position as a leading provider of cloud infrastructure optimized for AI workloads.
