Google Introduces Implicit Caching for Gemini AI Models
Google has introduced a new feature called 'implicit caching' in its Gemini API, aimed at reducing costs for developers using its latest AI models. The feature automatically passes on savings of up to 75% on repetitive context sent to a model, and applies to the Gemini 2.5 Pro and 2.5 Flash models.
Implicit caching differs from Google's previous explicit caching method by automating the process, thus eliminating the need for developers to manually define high-frequency prompts. This change comes after feedback from developers who found the explicit caching method cumbersome and costly.
The new caching system is enabled by default for Gemini 2.5 models. When a request shares a common prefix with a previous one, it is eligible for a cache hit, and the resulting cost savings are passed on to the developer automatically. The minimum prompt size for a cache hit is 1,024 tokens for 2.5 Flash and 2,048 tokens for 2.5 Pro. To increase the chance of a hit, Google recommends keeping repetitive context at the beginning of requests and placing variable content, such as a user's question, at the end.
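As a rough illustration, here is a minimal sketch of how a developer might structure requests to benefit from implicit caching, assuming the google-genai Python SDK; the file name, questions, and API-key placeholder are hypothetical, and the usage-metadata field reflects the SDK as we understand it:

```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# A large, stable prefix (e.g. a document being analyzed) goes first, so
# repeated requests share a common prefix and can hit the implicit cache.
# The prefix must clear the minimum token count (1,024 for 2.5 Flash).
with open("contract.txt") as f:  # hypothetical document
    document = f.read()

questions = [
    "Summarize the termination clause.",
    "What are the payment terms?",
]

for question in questions:
    # Variable content (the question) is appended at the end of the prompt.
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=document + "\n\nQuestion: " + question,
    )
    # usage_metadata reports how many prompt tokens were served from cache.
    print(response.usage_metadata.cached_content_token_count)
    print(response.text)
```

Because every request repeats the same long document prefix, the second and later calls can be served partly from the cache once that prefix exceeds the 1,024-token minimum for 2.5 Flash, with no cache-management code on the developer's side.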