Google Introduces Implicit Caching for Gemini AI Models

May 09, 2025
Google has launched 'implicit caching' in its Gemini API, promising up to 75% cost savings for developers using its latest AI models.

Google has introduced a new feature called 'implicit caching' in its Gemini API, aimed at reducing costs for developers building on its latest AI models. The feature automatically applies savings of up to 75% on repetitive context passed to a model, and it currently covers the Gemini 2.5 Pro and 2.5 Flash models.

Implicit caching differs from Google's earlier explicit caching method by automating the process: developers no longer need to manually define their highest-frequency prompts. The change follows feedback from developers who found explicit caching cumbersome and costly.
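
For contrast, the older explicit flow required the developer to create and reference a cache by hand. Here is a minimal sketch, assuming the google-genai Python SDK; the model name, TTL, and placeholder content are illustrative, not taken from Google's announcement:

```python
# pip install google-genai
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# Explicit caching: the developer manually caches the high-frequency
# context up front and pays for cache storage until the TTL expires.
cache = client.caches.create(
    model="gemini-2.5-flash",
    config=types.CreateCachedContentConfig(
        contents=["<large, frequently reused context goes here>"],
        ttl="3600s",  # keep the cache alive for one hour
    ),
)

# Each request must reference the cache explicitly by name.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Question about the cached context...",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)
```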

The new caching system is enabled by default for Gemini 2.5 models. When a request shares a common prefix with an earlier one, it can hit the cache, and the resulting cost savings are passed on to the developer. The minimum prompt size for a cache hit is 1,024 tokens on 2.5 Flash and 2,048 tokens on 2.5 Pro.
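
To raise the odds of a cache hit, Google recommends keeping the repeated context at the start of a request and appending the variable part at the end, so consecutive requests share a common prefix. The sketch below illustrates that pattern with the google-genai Python SDK; the file name and questions are hypothetical, and the usage metadata's cached_content_token_count field reports how many input tokens were served from cache:

```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# Stable prefix: the large, repeated context goes FIRST so that
# consecutive requests share a common prefix and can hit the cache.
LONG_CONTEXT = open("product_manual.txt").read()  # hypothetical document

def ask(question: str) -> str:
    response = client.models.generate_content(
        # Implicit caching needs at least 1,024 prompt tokens on 2.5 Flash
        # (2,048 on 2.5 Pro) before a request is eligible for a cache hit.
        model="gemini-2.5-flash",
        contents=LONG_CONTEXT + "\n\nQuestion: " + question,  # variable part last
    )
    # Number of input tokens read from cache, if any.
    print("cached tokens:", response.usage_metadata.cached_content_token_count)
    return response.text

ask("What is the warranty period?")
ask("How do I reset the device?")  # shares the long prefix -> likely cache hit
```

Unlike the explicit flow, no cache object is created or referenced here; the API detects the shared prefix on its own and applies the discount automatically.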
