Gemma 4 Is Here: Google's Most Capable Open Model Family
Google's Gemma 4 ships four model sizes under Apache 2.0 — from 2B edge models to a 31B dense powerhouse that rivals models in the 400B+ class. Here's what it means for local AI.
Google DeepMind released Gemma 4 on April 2, 2026 — and it’s a big deal for anyone running models locally. Four model sizes, Apache 2.0 license, native vision and audio, 140+ languages, and benchmark scores that embarrass models twenty times its size. Let’s break it down.
The Lineup
Gemma 4 comes in four sizes, each targeting a different deployment scenario:
| Model | Parameters | Context | VRAM (Q4) | Best For |
|---|---|---|---|---|
| E2B | 2.3B effective | 128K | ~4 GB | Phones, Raspberry Pi, edge devices |
| E4B | 4.5B effective | 128K | ~6 GB | Laptops, 8 GB GPUs |
| 26B MoE | 26B total / 3.8B active | 256K | ~20 GB | Consumer GPUs, efficiency-focused |
| 31B Dense | 30.7B | 256K | ~22 GB | Maximum quality on a single GPU |
The “E” in E2B and E4B stands for “effective” — these are efficient architectures designed to punch above their parameter count. The 26B model uses Mixture-of-Experts, activating only 3.8B parameters per token while having access to the knowledge of a 26B model.
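Gemma 4's actual router isn't documented here, but the general Mixture-of-Experts idea — score all experts, run only the top few per token — can be sketched with generic top-k routing. Everything below (expert count, dimensions, the router itself) is illustrative, not Gemma 4's real architecture:

```python
import numpy as np

def moe_layer(x, experts, router_w, top_k=2):
    """Generic top-k MoE routing sketch (illustrative, not Gemma 4's router).

    x: (d,) token hidden state
    experts: list of callables, each a small feed-forward "expert"
    router_w: (num_experts, d) router weights
    """
    logits = router_w @ x                      # score every expert for this token
    top = np.argsort(logits)[-top_k:]          # keep only the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the chosen experts only
    # Only the selected experts actually run, so compute scales with k,
    # not with the total expert count -- that's the "3.8B active" effect.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy demo: 8 experts exist, but only 2 fire per token.
rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [lambda v, W=rng.standard_normal((d, d)) / d: W @ v for _ in range(n_experts)]
router_w = rng.standard_normal((n_experts, d))
y = moe_layer(rng.standard_normal(d), experts, router_w)
```

The payoff is exactly the trade described above: the parameters of all experts sit in memory, but per-token compute (and thus latency) tracks only the active subset.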
Why Gemma 4 Matters
Benchmark performance is exceptional. The 31B Dense model ranks #3 on the Arena AI text leaderboard among all open models. Some specific numbers:
| Benchmark | Gemma 4 31B | Gemma 3 27B | Notes |
|---|---|---|---|
| GPQA Diamond | 84.3% | — | Graduate-level reasoning |
| AIME 2026 | 89.2% | 20.8% | 4.3x improvement in math |
| LiveCodeBench v6 | 80.0% | 29.1% | 2.7x improvement in coding |
| MMLU Pro | 85.2% | — | Broad knowledge |
The 26B MoE is nearly as impressive — 88.3% on AIME 2026 and 77.1% on LiveCodeBench, despite only activating 3.8B parameters per token.
Apache 2.0 license. Unlike Llama 4’s community license (which requires a separate agreement above 700M monthly active users), Gemma 4 ships under Apache 2.0 — fully permissive for commercial use with no strings attached.
Multimodal out of the box. All four sizes support text and image input. The larger models handle video (up to 60 seconds at 1 fps) with configurable visual token budgets. No separate vision adapter needed.
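Since all four sizes take image input, vision works through the same Ollama HTTP API as text: images go in as base64 strings in an `images` field. A minimal sketch of building such a request body, assuming the `gemma4:e4b` tag from this release (the payload is constructed locally; no server call happens here):

```python
import base64
import json

def vision_payload(model, prompt, image_bytes):
    """Build a request body for Ollama's /api/generate endpoint.

    Ollama expects images as base64-encoded strings in the `images` list.
    The model tag assumes the Gemma 4 release described in this article.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

# Placeholder bytes standing in for a real image file's contents.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
body = vision_payload("gemma4:e4b", "What is in this image?", fake_png)
print(json.dumps(body)[:80])
```

POST that JSON to `http://localhost:11434/api/generate` with any HTTP client and the model answers about the image in the `response` field.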
Hardware Requirements
Here’s what you actually need to run each model locally:
E2B (2.3B) — Runs on Anything
At 4 GB VRAM for Q4, this fits on basically any modern GPU or Mac. Even a 2020 MacBook Air with 8 GB can run it comfortably.
```shell
ollama run gemma4:e2b
```
E4B (4.5B) — Laptops and Entry GPUs
At 6 GB VRAM for Q4, this works well on RTX 3060, RTX 4060, or any Mac with 8 GB+ memory. The sweet spot for a personal AI assistant that doesn’t hog your system.
```shell
ollama run gemma4:e4b
```
26B MoE — The Efficiency Play
Despite being labeled “26B,” the MoE architecture means only 3.8B parameters fire per token. At Q4 it needs ~20 GB VRAM, fitting on an RTX 3090, RTX 4090, or a Mac with 24 GB+ unified memory. You get near-31B quality with much better tokens-per-second.
```shell
ollama run gemma4:26b
```
31B Dense — Maximum Quality
The flagship. At Q4 it needs ~22 GB VRAM — same class of hardware as the 26B but with slightly better quality and no MoE overhead. If your hardware can run one, it can run the other.
```shell
ollama run gemma4:31b
```
Not sure if your GPU or Mac can handle it? Check our compatibility tool — just find your hardware and see exactly which Gemma 4 models fit.
Which Model Should You Pick?
8 GB VRAM or less (RTX 4060, M1/M2 8GB Mac): Go with E4B. It’s the best quality you can get at this VRAM tier, and it supports vision.
12-16 GB VRAM (RTX 4070 Ti, M-series 16GB Mac): E4B at Q8 for higher quality, or you could try the 26B MoE with some CPU offloading.
24 GB VRAM (RTX 3090/4090/5090, M-series 24GB+ Mac): 31B Dense or 26B MoE — both fit. The 31B is marginally better on benchmarks; the 26B is faster.
48 GB+ VRAM (dual GPUs, Mac Studio, M4 Max 64GB+): Run the 31B at Q8 for maximum quality, or use the freed headroom for longer context windows.
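If your hardware isn't in the tiers above, a rough rule of thumb helps: Q4 weights cost on the order of 0.5–0.6 bytes per parameter, plus a couple of GB for KV cache and runtime buffers. The constants below are assumptions for illustration — the VRAM figures in the table earlier are the ones to trust; this sketch only shows the arithmetic behind such estimates:

```python
def q4_vram_gb(params_billion, bytes_per_weight=0.6, overhead_gb=2.0):
    """Back-of-envelope Q4 memory estimate (assumed constants, not measured).

    Weights at roughly 0.5-0.6 bytes each, plus a fixed-ish allowance for
    KV cache, activations, and runtime buffers. Actual usage varies with
    the quant variant, context length, and inference runtime.
    """
    return params_billion * bytes_per_weight + overhead_gb

for name, params in [("E2B", 2.3), ("E4B", 4.5), ("26B MoE", 26.0), ("31B Dense", 30.7)]:
    print(f"{name}: ~{q4_vram_gb(params):.0f} GB")
```

Note that for the MoE model you budget VRAM for the *total* 26B parameters, not the 3.8B active ones — all experts must be resident even though only a few run per token.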
Quick Start
Install Ollama if you haven’t already, then:
```shell
ollama run gemma4:31b
```
First run downloads the model (~20 GB for 31B at Q4). After that it starts instantly. All four models support vision — drag an image into the Ollama desktop app and ask questions about it.
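If you script against Ollama instead of using the CLI, note that its `/api/generate` endpoint streams newline-delimited JSON by default, one fragment per line. A small sketch of reassembling the full reply — the sample lines are made up for illustration, but mirror the stream's shape:

```python
import json

def assemble_stream(ndjson_lines):
    """Concatenate the `response` fragments from Ollama's streaming
    /api/generate output (newline-delimited JSON objects)."""
    text = ""
    for line in ndjson_lines:
        chunk = json.loads(line)
        text += chunk.get("response", "")
        if chunk.get("done"):  # final object carries done=true
            break
    return text

# Illustrative sample of what the streamed lines look like:
sample = [
    '{"response": "Gemma", "done": false}',
    '{"response": " 4", "done": false}',
    '{"response": "", "done": true}',
]
print(assemble_stream(sample))
```

Set `"stream": false` in the request body if you'd rather receive one complete JSON object instead.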
Browse all Gemma 4 models on our site:
- Gemma 4 E2B — 2.3B edge model
- Gemma 4 E4B — 4.5B efficient model
- Gemma 4 26B — 26B MoE
- Gemma 4 31B — 31B Dense flagship