Google Introduces DiffusionGemma for Faster Text Generation

June 12, 2026

Google has released DiffusionGemma, an experimental open model that generates text up to four times faster than traditional autoregressive language models. The 26-billion-parameter Mixture of Experts model uses a diffusion-based approach to produce entire blocks of text simultaneously.

In a press release, Google described DiffusionGemma as an open, experimental model designed to accelerate text generation. The company says it achieves up to four times faster inference on dedicated GPUs by generating entire blocks of text simultaneously rather than producing tokens one at a time.

DiffusionGemma is a 26-billion-parameter Mixture of Experts model that activates only 3.8 billion parameters during inference and fits within 18GB of VRAM when quantized. It uses bidirectional attention, which lets each token attend to every other token in a 256-token block. That enables tasks such as in-line editing, code infilling, and other non-linear text generation.
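To illustrate the difference between the two attention patterns, here is a minimal Python sketch that compares a causal mask with a block-wise bidirectional one. It is not taken from Google's release. The function names are hypothetical, and the rule that a token can also see all earlier blocks is an assumption, since the announcement only describes attention within a block. The example uses a block size of 3 for readability, where the model uses 256.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j.

    Autoregressive pattern: each position sees itself and earlier positions.
    """
    return [[j <= i for j in range(n)] for i in range(n)]


def block_bidirectional_mask(n, block_size):
    """Block-wise pattern: each position sees every position in its own block.

    Assumption (not stated in the announcement): positions also see all
    earlier blocks, but never later ones.
    """
    return [
        [(j // block_size) <= (i // block_size) for j in range(n)]
        for i in range(n)
    ]


def render(mask):
    """Draw a mask as text: 'x' means attention is allowed, '.' means blocked."""
    return "\n".join("".join("x" if v else "." for v in row) for row in mask)


if __name__ == "__main__":
    print("causal:")
    print(render(causal_mask(6)))
    print()
    print("block-wise bidirectional (block_size=3):")
    print(render(block_bidirectional_mask(6, 3)))
```

In the causal mask, the first position cannot see the third. In the block-wise mask it can, which is what makes filling a gap in the middle of a passage possible.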

The model introduces a diffusion head that iteratively refines text output, improving accuracy during generation. It is released under an Apache 2.0 license and optimized for use with consumer and enterprise GPUs, including NVIDIA GeForce RTX 5090 and H100 systems. Official integrations are available with platforms such as Hugging Face, vLLM, MLX, and Red Hat.
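The announcement does not detail how the diffusion head refines its output. As a rough illustration of iterative refinement in general, the Python sketch below uses confidence-based parallel unmasking, a technique common in masked diffusion language models. It is an assumption that DiffusionGemma works along similar lines. `toy_denoiser` is a hypothetical stand-in for a real model: it already knows the target text and only invents a confidence score, so the sketch shows the control flow and nothing about output quality.

```python
MASK = None  # placeholder for a position that has not been decided yet


def toy_denoiser(tokens, target):
    """Hypothetical stand-in for a model.

    Returns (position, proposed_token, confidence) for every masked slot.
    Confidence rises with the number of already-filled neighbours.
    """
    proposals = []
    for i, tok in enumerate(tokens):
        if tok is MASK:
            neighbours = [tokens[j] for j in (i - 1, i + 1) if 0 <= j < len(tokens)]
            filled = sum(t is not MASK for t in neighbours)
            proposals.append((i, target[i], 0.5 + 0.25 * filled))
    return proposals


def refine_block(target, tokens_per_step):
    """Fill a fully masked block over several passes.

    Each pass commits the most confident proposals, so the number of model
    calls is roughly len(target) / tokens_per_step instead of len(target).
    """
    tokens = [MASK] * len(target)
    steps = 0
    while any(t is MASK for t in tokens):
        proposals = toy_denoiser(tokens, target)
        # Highest confidence first; ties are broken by position.
        proposals.sort(key=lambda p: (-p[2], p[0]))
        for pos, tok, _ in proposals[:tokens_per_step]:
            tokens[pos] = tok
        steps += 1
    return tokens, steps


if __name__ == "__main__":
    words = "the quick brown fox jumps over the dog".split()
    print(refine_block(words, tokens_per_step=4))
    print(refine_block(words, tokens_per_step=1))
```

Committing four tokens per pass fills the eight-token block in two model calls, while committing one per pass takes eight, which is the one-token-at-a-time cost the block-parallel approach is meant to avoid.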

While DiffusionGemma prioritizes speed and parallel processing, Google recommends using the standard Gemma 4 models for applications that require the highest quality outputs. Developers can fine-tune DiffusionGemma for specific tasks and access the model weights through Hugging Face.
