Google Gemma 4 Is Out: What the New Open Models Offer
Google released Gemma 4 on April 2, 2026 — four model sizes, full multimodal support, and the largest variant scoring at the top of public leaderboards in math and coding. For the open-model space, this is a bigger release than the version number suggests.
Four sizes, four use cases
Gemma 4 comes in four variants: E2B, E4B, 26B, and 31B. The two smaller models are designed for mobile and IoT hardware — enough compute to run offline on a phone or embedded device. Google describes the efficiency as "unprecedented intelligence-per-parameter," which is marketing language for a real result: the benchmark scores for these size classes are genuinely high for models this small.
The 26B sits in the mid-range, targeting consumer GPUs. A machine with 16-24 GB of VRAM can run this comfortably in Ollama or LM Studio without special optimization. The 31B is the flagship variant.
Where the 31B lands
On AIME 2026 mathematics, the 31B scores 89.2%. On LiveCodeBench, 80%. On MMLU multilingual, 85.2%. The Chatbot Arena rating is 1452.
Those numbers put it in the same tier as GPT-4o and Claude 3.5 Sonnet on structured evaluations — from a model you can run on your own hardware. Benchmark scores carry a caveat: controlled tasks do not perfectly predict performance on real-world queries. A model that hits 89% on competition math may still miss on the specific messy inputs your application sends. But that score from an open-weight model marks a threshold that was not reachable at this parameter count twelve months ago. The previous open-model ceiling for this size class was a good 15-20 points lower across comparable benchmarks.
Gemma 4 is built from Gemini 3 research, which explains some of the performance jump. Google transferred architectural improvements from its frontier closed model into the open release. That transfer does not give open Gemma users everything in Gemini — training data scale and post-training alignment are different — but it gives the Gemma 4 31B a stronger foundation than earlier versions had.
What multimodal means here
Previous Gemma releases were text-only. Gemma 4 adds audio and visual understanding across all four model sizes.
The practical consequence is for on-device deployment. If you are building an application that needs to process voice input or analyze images without routing data through a cloud API — a medical device, a local document processor, a home assistant with a privacy constraint — multimodal Gemma 4 is now a viable foundation. With Gemma 3, you would have needed to stitch together separate models for text, audio, and image inputs; Gemma 4 handles all three natively.
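As a sketch of what local multimodal inference looks like in practice, the snippet below builds an image-plus-prompt request for a local Ollama server, which accepts base64-encoded images in the `images` field of its `/api/generate` endpoint. The `gemma4:26b` tag follows the pull command later in this article; whether your local build exposes image input this way is an assumption to verify against the model page.

```python
import base64

def build_image_request(prompt: str, image_path: str) -> dict:
    """Build an Ollama /api/generate payload with one attached image."""
    with open(image_path, "rb") as f:
        # Ollama expects images as base64-encoded strings, not raw bytes.
        encoded = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "gemma4:26b",   # tag from the pull command below
        "prompt": prompt,
        "images": [encoded],
        "stream": False,
    }

# To send: POST this payload as JSON to http://localhost:11434/api/generate
```

Nothing here leaves the machine until you POST the payload, which is the point: the image bytes stay local end to end.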
140 languages
Gemma 4 supports 140 languages, with what Google describes as cultural context built into the training data. Many multilingual models handle vocabulary across languages but fail on idiomatic or culturally specific phrasing; that distinction matters for any application that needs to do more than translate tokens. For developers building outside English-dominant markets, the coverage is broader than most open models at this size offer.
Running it locally
Gemma 4 is available on Hugging Face, Ollama, LM Studio, Kaggle, and Docker. The Ollama path is the most direct for most developers:
```shell
ollama pull gemma4:26b
ollama run gemma4:26b
```
The E2B and E4B models can also deploy through Vertex AI and Google Kubernetes Engine if you are already in the Google Cloud ecosystem.
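Once the model is pulled and `ollama serve` is running, you can query it over Ollama's local HTTP API rather than the interactive prompt. This is a minimal stdlib-only sketch against the `/api/chat` endpoint on Ollama's default port; the model tag comes from the pull command above.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default port

def build_chat_payload(prompt: str, model: str = "gemma4:26b") -> dict:
    """Single-turn chat request in Ollama's /api/chat format."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one complete response instead of chunks
    }

def ask(prompt: str) -> str:
    """Send a prompt to the local server; requires `ollama serve` running."""
    data = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

The same payload shape works for the E2B and E4B tags if you want to prototype the small models on a workstation before targeting device hardware.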
Native function calling
Gemma 4 ships with native function calling — the model can invoke tools defined in your application code without custom prompt engineering workarounds. This matters for agent workflows. Previous open models required careful prompt design to get reliable tool invocation; Gemma 4 treats it as a first-class capability. Combined with the 31B benchmark performance, this makes it a more complete foundation for agent frameworks that previously needed a frontier closed model for reliable operation. Google's model page has technical specifications and integration guides for all four sizes.
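To make the workflow concrete: Ollama's chat endpoint accepts a `tools` list of JSON-schema function definitions, and a tool-capable model answers with `tool_calls` naming the function and arguments to run. The sketch below builds such a request; the `get_weather` tool is a made-up example, not anything shipped with Gemma 4.

```python
# A function definition in the JSON-schema format Ollama's /api/chat accepts.
# get_weather is a hypothetical tool used only for illustration.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

def build_tool_request(prompt: str) -> dict:
    """Chat payload advertising one callable tool to the model."""
    return {
        "model": "gemma4:26b",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
        "stream": False,
    }

# On a tool call, your code runs the named function and appends the result
# as a "tool"-role message, then sends the conversation back for the answer.
```

The application side stays the same loop regardless of framework: advertise tools, execute whatever the model calls, feed results back.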
Who this is actually for
Gemma 4 is worth your attention in three situations.
You send user data to a cloud API and would prefer not to — for privacy, cost, or latency reasons. The 31B is now a serious alternative to frontier closed models for structured tasks in well-defined domains.
You need AI on constrained hardware offline. The E2B and E4B variants deliver current-generation capability in a form that fits on a phone or IoT device.
You are fine-tuning for a specific domain. Framework support covers JAX, Keras, and standard transformers, and the documentation is solid across all four sizes.
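Whichever framework you fine-tune with, the starting point is a dataset of prompt/response pairs. This is a minimal stdlib sketch of writing one in JSONL; the `prompt`/`response` field names and the ticket-classification examples are a common convention chosen for illustration, not a Gemma-specific format — match whatever schema your training framework expects.

```python
import json

# Hypothetical domain task: classifying support tickets.
examples = [
    {"prompt": "Classify the ticket: 'App crashes on login.'",
     "response": "bug"},
    {"prompt": "Classify the ticket: 'Please add dark mode.'",
     "response": "feature-request"},
]

def write_jsonl(records, path):
    """Write one JSON object per line, the usual fine-tuning input format."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")

write_jsonl(examples, "train.jsonl")
```

A few hundred clean pairs in a narrow domain often moves the needle more than clever hyperparameters, which is why the data-prep step deserves this much of the effort.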
What Gemma 4 does not change: the gap between open and closed models on general-purpose conversational tasks. Frontier closed models still carry advantages in training data scale, post-training alignment, and production hardening. For structured, domain-specific work, a fine-tuned Gemma 4 31B is now competitive. For open-ended conversation with unpredictable inputs, the gap is smaller than it was with Gemma 3, but it is not gone.
The April 2026 chatbot pricing index covers cost structures for the closed-model alternatives. Chatbot.gallery tracks capability profiles for both open and closed platforms as new releases arrive.