Gemma 4 Is Here: Google's Most Capable Open Model Family
Google's Gemma 4 ships four model sizes under Apache 2.0 — from 2B edge models to a 31B dense powerhouse that rivals models in the 400B+ class. Here's what it means for local AI.
Google DeepMind released Gemma 4 on April 2, 2026 — and it’s a big deal for anyone running models locally. Four model sizes, Apache 2.0 license, native vision and audio, 140+ languages, and benchmark scores that embarrass models twenty times its size. Let’s break it down.
The Lineup
Gemma 4 comes in four sizes, each targeting a different deployment scenario:
| Model | Parameters | Context | VRAM (Q4) | Best For |
|---|---|---|---|---|
| E2B | 2.3B effective | 128K | ~4 GB | Phones, Raspberry Pi, edge devices |
| E4B | 4.5B effective | 128K | ~6 GB | Laptops, 8 GB GPUs |
| 26B MoE | 26B total / 3.8B active | 256K | ~20 GB | Consumer GPUs, efficiency-focused |
| 31B Dense | 30.7B | 256K | ~22 GB | Maximum quality on a single GPU |
The “E” in E2B and E4B stands for “effective” — these are efficient architectures designed to punch above their parameter count. The 26B model uses Mixture-of-Experts, activating only 3.8B parameters per token while having access to the knowledge of a 26B model.
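Gemma 4's actual router isn't documented here, but the general Mixture-of-Experts idea — score all experts, run only the top few per token — can be sketched with generic top-k routing. Everything below (expert count, dimensions, the router itself) is illustrative, not Gemma 4's real architecture:

```python
import numpy as np

def moe_layer(x, experts, router_w, top_k=2):
    """Generic top-k MoE routing sketch (illustrative, not Gemma 4's router).

    x: (d,) token hidden state
    experts: list of callables, each a small feed-forward "expert"
    router_w: (num_experts, d) router weights
    """
    logits = router_w @ x                      # score every expert for this token
    top = np.argsort(logits)[-top_k:]          # keep only the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the chosen experts only
    # Only the selected experts actually run, so compute scales with k,
    # not with the total expert count -- that's the "3.8B active" effect.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy demo: 8 experts exist, but only 2 fire per token.
rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [lambda v, W=rng.standard_normal((d, d)) / d: W @ v for _ in range(n_experts)]
router_w = rng.standard_normal((n_experts, d))
y = moe_layer(rng.standard_normal(d), experts, router_w)
```

The payoff is exactly the trade described above: the parameters of all experts sit in memory, but per-token compute (and thus latency) tracks only the active subset.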
Why Gemma 4 Matters
Benchmark performance is exceptional. The 31B Dense model ranks #3 on the Arena AI text leaderboard among all open models. Some specific numbers:
| Benchmark | Gemma 4 31B | Gemma 3 27B | Notes |
|---|---|---|---|
| GPQA Diamond | 84.3% | — | Graduate-level reasoning |
| AIME 2026 | 89.2% | 20.8% | 4.3x improvement in math |
| LiveCodeBench v6 | 80.0% | 29.1% | 2.7x improvement in coding |
| MMLU Pro | 85.2% | — | Broad knowledge |
The 26B MoE is nearly as impressive — 88.3% on AIME 2026 and 77.1% on LiveCodeBench, despite only activating 3.8B parameters per token.
Apache 2.0 license. Unlike Llama 4’s community license (which requires a separate agreement above 700M monthly active users), Gemma 4 ships under Apache 2.0 — fully permissive for commercial use with no strings attached.
Multimodal out of the box. All four sizes support text and image input. The larger models handle video (up to 60 seconds at 1 fps) with configurable visual token budgets. No separate vision adapter needed.
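Since all four sizes take image input, vision works through the same Ollama HTTP API as text: images go in as base64 strings in an `images` field. A minimal sketch of building such a request body, assuming the `gemma4:e4b` tag from this release (the payload is constructed locally; no server call happens here):

```python
import base64
import json

def vision_payload(model, prompt, image_bytes):
    """Build a request body for Ollama's /api/generate endpoint.

    Ollama expects images as base64-encoded strings in the `images` list.
    The model tag assumes the Gemma 4 release described in this article.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

# Placeholder bytes standing in for a real image file's contents.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
body = vision_payload("gemma4:e4b", "What is in this image?", fake_png)
print(json.dumps(body)[:80])
```

POST that JSON to `http://localhost:11434/api/generate` with any HTTP client and the model answers about the image in the `response` field.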
Hardware Requirements
Here’s what you actually need to run each model locally:
E2B (2.3B) — Runs on Anything
At 4 GB VRAM for Q4, this fits on basically any modern GPU or Mac. Even a 2020 MacBook Air with 8 GB can run it comfortably.
```shell
ollama run gemma4:e2b
```
E4B (4.5B) — Laptops and Entry GPUs
At 6 GB VRAM for Q4, this works well on RTX 3060, RTX 4060, or any Mac with 8 GB+ memory. The sweet spot for a personal AI assistant that doesn’t hog your system.
```shell
ollama run gemma4:e4b
```
26B MoE — The Efficiency Play
Despite being labeled “26B,” the MoE architecture means only 3.8B parameters fire per token. At Q4 it needs ~20 GB VRAM, fitting on an RTX 3090, RTX 4090, or a Mac with 24 GB+ unified memory. You get near-31B quality with much better tokens-per-second.
```shell
ollama run gemma4:26b
```
31B Dense — Maximum Quality
The flagship. At Q4 it needs ~22 GB VRAM — same class of hardware as the 26B but with slightly better quality and no MoE overhead. If your hardware can run one, it can run the other.
```shell
ollama run gemma4:31b
```
Not sure if your GPU or Mac can handle it? Check our compatibility tool — just find your hardware and see exactly which Gemma 4 models fit.
Which Model Should You Pick?
8 GB VRAM or less (RTX 4060, M1/M2 8GB Mac): Go with E4B. It’s the best quality you can get at this VRAM tier, and it supports vision.
12-16 GB VRAM (RTX 4070 Ti, M-series 16GB Mac): E4B at Q8 for higher quality, or you could try the 26B MoE with some CPU offloading.
24 GB VRAM (RTX 3090/4090/5090, M-series 24GB+ Mac): 31B Dense or 26B MoE — both fit. The 31B is marginally better on benchmarks; the 26B is faster.
48 GB+ VRAM (dual GPUs, Mac Studio, M4 Max 64GB+): Run the 31B at Q8 for maximum quality, or use the freed headroom for longer context windows.
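If your hardware isn't in the tiers above, a rough rule of thumb helps: Q4 weights cost on the order of 0.5–0.6 bytes per parameter, plus a couple of GB for KV cache and runtime buffers. The constants below are assumptions for illustration — the VRAM figures in the table earlier are the ones to trust; this sketch only shows the arithmetic behind such estimates:

```python
def q4_vram_gb(params_billion, bytes_per_weight=0.6, overhead_gb=2.0):
    """Back-of-envelope Q4 memory estimate (assumed constants, not measured).

    Weights at roughly 0.5-0.6 bytes each, plus a fixed-ish allowance for
    KV cache, activations, and runtime buffers. Actual usage varies with
    the quant variant, context length, and inference runtime.
    """
    return params_billion * bytes_per_weight + overhead_gb

for name, params in [("E2B", 2.3), ("E4B", 4.5), ("26B MoE", 26.0), ("31B Dense", 30.7)]:
    print(f"{name}: ~{q4_vram_gb(params):.0f} GB")
```

Note that for the MoE model you budget VRAM for the *total* 26B parameters, not the 3.8B active ones — all experts must be resident even though only a few run per token.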
Quick Start
Install Ollama if you haven’t already, then:
```shell
ollama run gemma4:31b
```
First run downloads the model (~20 GB for 31B at Q4). After that it starts instantly. All four models support vision — drag an image into the Ollama desktop app and ask questions about it.
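If you script against Ollama instead of using the CLI, note that its `/api/generate` endpoint streams newline-delimited JSON by default, one fragment per line. A small sketch of reassembling the full reply — the sample lines are made up for illustration, but mirror the stream's shape:

```python
import json

def assemble_stream(ndjson_lines):
    """Concatenate the `response` fragments from Ollama's streaming
    /api/generate output (newline-delimited JSON objects)."""
    text = ""
    for line in ndjson_lines:
        chunk = json.loads(line)
        text += chunk.get("response", "")
        if chunk.get("done"):  # final object carries done=true
            break
    return text

# Illustrative sample of what the streamed lines look like:
sample = [
    '{"response": "Gemma", "done": false}',
    '{"response": " 4", "done": false}',
    '{"response": "", "done": true}',
]
print(assemble_stream(sample))
```

Set `"stream": false` in the request body if you'd rather receive one complete JSON object instead.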
Browse all Gemma 4 models on our site:
- Gemma 4 E2B — 2.3B edge model
- Gemma 4 E4B — 4.5B efficient model
- Gemma 4 26B — 26B MoE
- Gemma 4 31B — 31B Dense flagship