Google Introduces DiffusionGemma for Faster Text Generation

June 12, 2026
Google has released DiffusionGemma, an experimental open model that generates text up to four times faster than traditional autoregressive language models. The 26 billion parameter Mixture of Experts model uses a diffusion-based approach to produce entire blocks of text simultaneously.

Google announced DiffusionGemma in a press release, describing it as an open experimental model designed to accelerate text generation. According to the company, the model achieves up to four times faster inference on dedicated GPUs by generating entire blocks of text simultaneously rather than producing tokens one by one.

DiffusionGemma is a 26-billion-parameter Mixture of Experts (MoE) model that activates only 3.8 billion parameters during inference and fits within 18GB of VRAM when quantized. It features bidirectional attention that allows each token to reference all others in a 256-token block, enabling tasks such as in-line editing, code infilling, and other non-linear text generation.
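To make the difference from a standard autoregressive model concrete, the attention pattern can be sketched as a boolean mask. This is an illustrative sketch only, not Google's implementation: the function names are hypothetical, and the assumption that a block can also see all earlier blocks is ours, since the announcement only describes attention within a 256-token block.

```python
def causal_mask(seq_len: int) -> list[list[bool]]:
    """Standard autoregressive mask: position i sees only positions j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def block_attention_mask(seq_len: int, block_size: int = 256) -> list[list[bool]]:
    """Hypothetical block-bidirectional mask.

    mask[i][j] is True when position i may attend to position j.
    Within a block, every token sees every other token (both directions).
    ASSUMPTION: tokens also see all earlier blocks, but never later ones.
    """
    mask = []
    for i in range(seq_len):
        # Exclusive end index of the block that contains position i.
        block_end = min((i // block_size + 1) * block_size, seq_len)
        mask.append([j < block_end for j in range(seq_len)])
    return mask


if __name__ == "__main__":
    # Tiny example: 6 tokens in blocks of 3 instead of 256.
    for row in block_attention_mask(6, block_size=3):
        print("".join("x" if allowed else "." for allowed in row))
    # prints:
    # xxx...
    # xxx...
    # xxx...
    # xxxxxx
    # xxxxxx
    # xxxxxx
```

In the causal mask, the first token sees only itself. In the block mask it also sees tokens to its right within the same block, which is what makes infilling and in-line editing possible.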

The model introduces a diffusion head that iteratively refines text output, improving accuracy during generation. It is released under an Apache 2.0 license and optimized for use with consumer and enterprise GPUs, including NVIDIA GeForce RTX 5090 and H100 systems. Official integrations are available with platforms such as Hugging Face, vLLM, MLX, and Red Hat.
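The idea of refining a whole block over a few passes can be illustrated with a toy loop. This is a minimal sketch under our own assumptions, not DiffusionGemma's actual algorithm: it assumes a masked-diffusion style schedule in which the most confident proposals are committed first, and `toy_denoiser` is a hypothetical stand-in that simply knows the answer and assigns random confidences.

```python
import random

MASK = "<mask>"


def toy_denoiser(block, target, rng):
    """Hypothetical stand-in for a model.

    Proposes a (token, confidence) pair for every masked slot at once.
    A real model would predict tokens; here we cheat and read them from target.
    """
    return {
        i: (target[i], rng.random())
        for i, tok in enumerate(block)
        if tok == MASK
    }


def refine_block(target, steps=4, seed=0):
    """Fill a fully masked block in a fixed number of parallel passes."""
    rng = random.Random(seed)
    block = [MASK] * len(target)
    history = []
    for step in range(steps):
        proposals = toy_denoiser(block, target, rng)
        if not proposals:
            break
        # Commit an even share of the remaining masked slots (ceiling division),
        # so the last pass always fills whatever is left.
        k = -(-len(proposals) // (steps - step))
        most_confident = sorted(
            proposals, key=lambda i: proposals[i][1], reverse=True
        )[:k]
        for i in most_confident:
            block[i] = proposals[i][0]
        history.append(list(block))
    return block, history


if __name__ == "__main__":
    words = "the quick brown fox jumps over lazy dogs".split()
    final, history = refine_block(words, steps=4)
    for snapshot in history:
        print(" ".join(snapshot))
```

In this toy, eight tokens are produced in four passes rather than eight sequential steps, which is the kind of saving that block-wise generation aims for. The real speedup depends on the model and hardware.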

While DiffusionGemma prioritizes speed and parallel processing, Google recommends using the standard Gemma 4 models for applications that require the highest quality outputs. Developers can fine-tune DiffusionGemma for specific tasks and access the model weights through Hugging Face.
