news Apr 12, 2026

Gemma 4 Is Here: Google's Most Capable Open Model Family

Google's Gemma 4 ships four model sizes under Apache 2.0 — from 2B edge models to a 31B dense powerhouse that competes with models over 400B parameters. Here's what it means for local AI.

Google DeepMind released Gemma 4 on April 2, 2026 — and it’s a big deal for anyone running models locally. Four model sizes, Apache 2.0 license, native vision and audio, 140+ languages, and benchmark scores that embarrass models twenty times its size. Let’s break it down.

The Lineup

Gemma 4 comes in four sizes, each targeting a different deployment scenario:

| Model | Parameters | Context | VRAM (Q4) | Best For |
|---|---|---|---|---|
| E2B | 2.3B effective | 128K | ~4 GB | Phones, Raspberry Pi, edge devices |
| E4B | 4.5B effective | 128K | ~6 GB | Laptops, 8 GB GPUs |
| 26B MoE | 26B total / 3.8B active | 256K | ~20 GB | Consumer GPUs, efficiency-focused |
| 31B Dense | 30.7B | 256K | ~22 GB | Maximum quality on a single GPU |

The “E” in E2B and E4B stands for “effective” — these are efficient architectures designed to punch above their parameter count. The 26B model uses Mixture-of-Experts, activating only 3.8B parameters per token while having access to the knowledge of a 26B model.
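The VRAM figures in the table above follow a common rule of thumb: a Q4 quantization stores roughly half a byte per weight, plus headroom for the KV cache and activations. A minimal sketch of that arithmetic (the 0.5 bytes/param and fixed-overhead figures are rough assumptions, not official numbers — real footprints vary with the quantization scheme and context length):

```python
def estimate_vram_gb(params_billions: float,
                     bytes_per_param: float = 0.5,
                     overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate for a locally quantized model.

    bytes_per_param: ~0.5 for Q4, ~1.0 for Q8, 2.0 for FP16 (assumed
    rules of thumb). overhead_gb covers KV cache and activations.
    """
    return params_billions * bytes_per_param + overhead_gb

# Weights must fit for ALL parameters, even in an MoE model —
# the 3.8B "active" figure buys speed, not memory savings.
print(round(estimate_vram_gb(30.7), 1))  # 31B dense at Q4
print(round(estimate_vram_gb(26.0), 1))  # 26B MoE at Q4
```

Note that for the 26B MoE you size memory by the 26B total, not the 3.8B active — every expert's weights live in VRAM; only the per-token compute shrinks.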

Why Gemma 4 Matters

Benchmark performance is exceptional. The 31B Dense model ranks #3 on the Arena AI text leaderboard among all open models. Some specific numbers:

| Benchmark | Gemma 4 31B | Gemma 3 27B | Improvement |
|---|---|---|---|
| GPQA Diamond | 84.3% | | Graduate-level reasoning |
| AIME 2026 | 89.2% | 20.8% | 4.3x improvement in math |
| LiveCodeBench v6 | 80.0% | 29.1% | 2.7x improvement in coding |
| MMLU Pro | 85.2% | | Broad knowledge |

The 26B MoE is nearly as impressive — 88.3% on AIME 2026 and 77.1% on LiveCodeBench, despite only activating 3.8B parameters per token.

Apache 2.0 license. Unlike Llama 4’s community license (which requires a separate agreement above 700M monthly active users), Gemma 4 ships under Apache 2.0 — fully permissive for commercial use with no strings attached.

Multimodal out of the box. All four sizes support text and image input. The larger models handle video (up to 60 seconds at 1 fps) with configurable visual token budgets. No separate vision adapter needed.
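Beyond dragging an image into the desktop app, you can send images programmatically: Ollama's REST chat endpoint (`POST /api/chat`) accepts base64-encoded images in a message's `images` field. A minimal sketch that builds such a request body — the `gemma4:26b` tag is the article's model name; everything else is the standard Ollama request shape:

```python
import base64


def vision_chat_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a request body for Ollama's /api/chat endpoint with one image.

    Ollama expects images as base64 strings in the message's "images" list.
    """
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": prompt, "images": [image_b64]},
        ],
        "stream": False,  # return one complete response instead of chunks
    }


payload = vision_chat_payload("gemma4:26b",
                              "What's in this image?",
                              open("photo.jpg", "rb").read()
                              if False else b"<raw image bytes>")
```

POST the payload as JSON to `http://localhost:11434/api/chat` (Ollama's default port) with any HTTP client to get the model's answer.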

Hardware Requirements

Here’s what you actually need to run each model locally:

E2B (2.3B) — Runs on Anything

At 4 GB VRAM for Q4, this fits on basically any modern GPU or Mac. Even a 2020 MacBook Air with 8 GB can run it comfortably.

ollama run gemma4:e2b

E4B (4.5B) — Laptops and Entry GPUs

At 6 GB VRAM for Q4, this works well on RTX 3060, RTX 4060, or any Mac with 8 GB+ memory. The sweet spot for a personal AI assistant that doesn’t hog your system.

ollama run gemma4:e4b

26B MoE — The Efficiency Play

Despite being labeled “26B,” the MoE architecture means only 3.8B parameters fire per token. At Q4 it needs ~20 GB VRAM, fitting on an RTX 3090, RTX 4090, or a Mac with 24 GB+ unified memory. You get near-31B quality with much better tokens-per-second.

ollama run gemma4:26b

31B Dense — Maximum Quality

The flagship. At Q4 it needs ~22 GB VRAM — same class of hardware as the 26B but with slightly better quality and no MoE overhead. If your hardware can run one, it can run the other.

ollama run gemma4:31b

Not sure if your GPU or Mac can handle it? Check our compatibility tool — just find your hardware and see exactly which Gemma 4 models fit.

Which Model Should You Pick?

8 GB VRAM or less (RTX 4060, M1/M2 8GB Mac): Go with E4B. It’s the best quality you can get at this VRAM tier, and it supports vision.

12-16 GB VRAM (RTX 4070 Ti, M-series 16GB Mac): E4B at Q8 for higher quality, or you could try the 26B MoE with some CPU offloading.

24 GB VRAM (RTX 3090/4090/5090, M-series 24GB+ Mac): 31B Dense or 26B MoE — both fit. The 31B is marginally better on benchmarks; the 26B is faster.

48 GB+ VRAM (dual GPUs, Mac Studio, M4 Max 64GB+): Run the 31B at Q8 for maximum quality, or use the freed headroom for longer context windows.
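The tiers above reduce to a simple lookup. A sketch that encodes them as a function (the thresholds mirror this article's recommendations; they're rules of thumb, not official requirements):

```python
def pick_gemma4(vram_gb: float) -> str:
    """Map available VRAM / unified memory to a Gemma 4 recommendation,
    following the tiers described in the article."""
    if vram_gb >= 48:
        return "31B Dense at Q8"
    if vram_gb >= 24:
        return "31B Dense or 26B MoE at Q4"
    if vram_gb >= 12:
        return "E4B at Q8 (or 26B MoE with CPU offload)"
    if vram_gb >= 6:
        return "E4B at Q4"
    return "E2B at Q4"


print(pick_gemma4(24))  # RTX 3090/4090-class hardware
```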

Quick Start

Install Ollama if you haven’t already, then:

ollama run gemma4:31b

First run downloads the model (~20 GB for 31B at Q4). After that it starts instantly. All four models support vision — drag an image into the Ollama desktop app and ask questions about it.

Browse all Gemma 4 models on our site, or check which Gemma 4 model your hardware can run.