Google Introduces Implicit Caching for Gemini AI Models

May 09, 2025
Google has launched 'implicit caching' in its Gemini API, promising up to 75% cost savings for developers using its latest AI models.

Google has introduced a new feature called 'implicit caching' in its Gemini API, aimed at reducing costs for developers building on its latest AI models. The feature automatically applies savings of up to 75% on repetitive context passed to a model, and it currently covers the Gemini 2.5 Pro and 2.5 Flash models.

Implicit caching differs from Google's earlier explicit caching method by automating the process: developers no longer need to manually define their highest-frequency prompts. The change follows feedback from developers who found explicit caching cumbersome and costly.
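
For contrast, the older explicit flow required the developer to create and reference a cache by hand. Here is a minimal sketch, assuming the google-genai Python SDK; the model name, TTL, and placeholder content are illustrative, not taken from Google's announcement:

```python
# pip install google-genai
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# Explicit caching: the developer manually caches the high-frequency
# context up front and pays for cache storage until the TTL expires.
cache = client.caches.create(
    model="gemini-2.5-flash",
    config=types.CreateCachedContentConfig(
        contents=["<large, frequently reused context goes here>"],
        ttl="3600s",  # keep the cache alive for one hour
    ),
)

# Each request must reference the cache explicitly by name.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Question about the cached context...",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)
```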

The new caching system is enabled by default for Gemini 2.5 models. When a request shares a common prefix with an earlier one, it can hit the cache, and the resulting cost savings are passed on to the developer. The minimum prompt size for a cache hit is 1,024 tokens on 2.5 Flash and 2,048 tokens on 2.5 Pro.
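
To raise the odds of a cache hit, Google recommends keeping the repeated context at the start of a request and appending the variable part at the end, so consecutive requests share a common prefix. The sketch below illustrates that pattern with the google-genai Python SDK; the file name and questions are hypothetical, and the usage metadata's cached_content_token_count field reports how many input tokens were served from cache:

```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# Stable prefix: the large, repeated context goes FIRST so that
# consecutive requests share a common prefix and can hit the cache.
LONG_CONTEXT = open("product_manual.txt").read()  # hypothetical document

def ask(question: str) -> str:
    response = client.models.generate_content(
        # Implicit caching needs at least 1,024 prompt tokens on 2.5 Flash
        # (2,048 on 2.5 Pro) before a request is eligible for a cache hit.
        model="gemini-2.5-flash",
        contents=LONG_CONTEXT + "\n\nQuestion: " + question,  # variable part last
    )
    # Number of input tokens read from cache, if any.
    print("cached tokens:", response.usage_metadata.cached_content_token_count)
    return response.text

ask("What is the warranty period?")
ask("How do I reset the device?")  # shares the long prefix -> likely cache hit
```

Unlike the explicit flow, no cache object is created or referenced here; the API detects the shared prefix on its own and applies the discount automatically.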
