Google DeepMind's newly launched DiffusionGemma model generates blocks of text in parallel rather than one token at a time. The shift improves both the speed and the efficiency of local AI inference, particularly on Nvidia hardware.
Unlike most AI models, which generate text one token at a time from left to right, DiffusionGemma uses a method similar to image generation. It starts with a field of placeholder tokens and refines them over several passes, so a complete block of text emerges together rather than token by token. The technique could change how text generation tasks are approached, especially in environments with limited computational resources.
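The refinement loop described above can be pictured with a toy sketch. This is not DiffusionGemma's actual algorithm or API: the function name `toy_denoise`, the `_` placeholder, and the random reveal schedule are all assumptions made for illustration, and the "prediction" simply copies from a known target string where a real model would predict each token.

```python
import random

MASK = "_"  # hypothetical placeholder symbol, for display only


def toy_denoise(target, steps=4, seed=0):
    """Toy stand-in for iterative parallel refinement of a text block.

    Starts from all placeholders and fills several positions per step.
    Returns the text after each step, starting with the all-placeholder state.
    """
    rng = random.Random(seed)
    tokens = [MASK] * len(target)
    unfilled = set(range(len(target)))
    history = ["".join(tokens)]
    for step in range(steps):
        if not unfilled:
            break
        # Spread the remaining positions over the remaining steps
        # (ceiling division), so the last step fills whatever is left.
        remaining_steps = steps - step
        count = -(-len(unfilled) // remaining_steps)
        for i in rng.sample(sorted(unfilled), count):
            tokens[i] = target[i]  # a real model would predict this token
            unfilled.discard(i)
        history.append("".join(tokens))
    return history


if __name__ == "__main__":
    for state in toy_denoise("hello world"):
        print(state)
```

Each printed line shows the whole block at one refinement step, with positions filled in no particular left-to-right order.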
The model is a Mixture of Experts (MoE) with 26 billion parameters, of which only 3.8 billion are active during inference. That efficiency lets it operate comfortably within 18GB of memory on high-end GPUs. In practical tests, DiffusionGemma produced approximately 700 tokens per second on an Nvidia RTX 5090. Throughput rises to over 1,000 tokens per second on an Nvidia H100 AI accelerator, roughly four times the speed of its autoregressive counterparts.
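A quick back-of-the-envelope check puts those figures in proportion. The sketch below uses only the numbers quoted in the article; the function names are illustrative, and the baseline it derives is merely what a fourfold speedup would imply, not a measured result.

```python
# Figures quoted in the article.
TOTAL_PARAMS_B = 26.0       # total parameters, in billions
ACTIVE_PARAMS_B = 3.8       # parameters active during inference, in billions
H100_TOKENS_PER_S = 1000.0  # the article says "over 1,000", so this is a floor
CLAIMED_SPEEDUP = 4.0       # "quadrupling"


def active_fraction(active_b, total_b):
    """Share of the model's parameters used for any one inference pass."""
    return active_b / total_b


def implied_baseline(tokens_per_s, speedup):
    """Autoregressive throughput that the claimed speedup would imply."""
    return tokens_per_s / speedup


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"Active share of parameters: {share:.1%}")  # 14.6%
    baseline = implied_baseline(H100_TOKENS_PER_S, CLAIMED_SPEEDUP)
    print(f"Implied autoregressive baseline: {baseline:.0f} tokens/s")  # 250
```

In other words, roughly one parameter in seven does work on a given pass, and a fourfold gain at 1,000 tokens per second corresponds to an autoregressive baseline of about 250 tokens per second on the same hardware.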
This increase in output speed shifts the conventional bottleneck in AI text generation from memory bandwidth to computational capacity. By generating up to 256 tokens simultaneously, DiffusionGemma excels at non-linear tasks that typically challenge standard models. Solving Sudoku puzzles is one example: each cell depends on cells that appear later in the sequence, a common hurdle for autoregressive models, which cannot go back and change earlier output. DiffusionGemma can instead revise large sets of tokens at once and correct itself as it goes.
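One way to see why the bottleneck moves is to count model passes. An autoregressive decoder runs one pass per token, and each pass has to read the active weights from memory. A block-parallel decoder runs a handful of refinement passes per block, so the weights are read far fewer times per token while each pass does more arithmetic. The sketch below assumes the 256-token block size from the article; the `refine_steps` value of 16 is a hypothetical figure chosen for illustration, since the article does not say how many refinement passes DiffusionGemma uses.

```python
import math


def autoregressive_passes(n_tokens):
    """One forward pass per generated token."""
    return n_tokens


def block_parallel_passes(n_tokens, block_size=256, refine_steps=16):
    """Passes needed when each block of tokens is refined together.

    refine_steps is an assumed value, not a published DiffusionGemma setting.
    """
    blocks = math.ceil(n_tokens / block_size)
    return blocks * refine_steps


if __name__ == "__main__":
    n = 1024
    print(autoregressive_passes(n))   # 1024
    print(block_parallel_passes(n))   # 64
```

Under these assumptions, 1,024 tokens take 1,024 passes autoregressively but 64 passes in blocks of 256, which is the kind of reduction that leaves raw compute, rather than memory traffic, as the limiting factor.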
Google's strategic direction with DiffusionGemma could have broader implications for AI applications across various sectors. Tasks such as in-line editing, molecular sequencing, and mathematical graphing may benefit from this increased efficiency, presenting new opportunities for developers and researchers. As AI continues to evolve, tools like DiffusionGemma may lead to more sophisticated and user-friendly applications, particularly in environments reliant on local processing power.
The introduction of DiffusionGemma exemplifies Google's commitment to advancing AI technology and highlights the growing alignment between AI models and the hardware they operate on. As Nvidia enhances its GPU capabilities, the potential for applying such models in practical scenarios will only expand, pushing the boundaries of what is achievable in AI-driven tasks.

