Llama 4 Scout (109B/17B active)

Author: Meta

Llama 4 Community License

Meta · 109B · mixture-of-experts


Released 2025-04-05 · 524K context · 109B params

Use Cases

chat code reasoning multilingual vision math tools writing summary

Quantization Options

Quant	Bits	VRAM	Quality
Q4_K_M (recommended)	4	72.0 GB	Good
Q8_0	8	125.0 GB	Excellent
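The VRAM column can be roughly sanity-checked from the parameter count and the bits stored per weight. The sketch below is an estimate, not the method used to produce the table. Both bits-per-weight values are assumptions: 4.85 is an assumed effective average for Q4_K_M, and 8.5 for Q8_0 follows from its block layout of 32 int8 weights plus one fp16 scale. Note that all 109B parameters must be resident even though only 17B are active per token.

```python
# Rough weight-memory estimate for a quantized model (sketch).
# Bits-per-weight values are assumptions: quantized formats store per-block
# scales, so the effective size exceeds the nominal 4 or 8 bits.
ASSUMED_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,  # assumed effective average for Q4_K_M
    "Q8_0": 8.5,     # 32 int8 weights + one fp16 scale per block = 8.5 bits
}


def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    total_params = 109e9  # every expert counts: all of them must be loaded
    for quant, bpw in ASSUMED_BITS_PER_WEIGHT.items():
        print(f"{quant}: ~{weight_memory_gb(total_params, bpw):.1f} GB of weights")
```

Under these assumptions the weights alone come to roughly 66 GB (Q4_K_M) and 116 GB (Q8_0). The gap to the listed 72 GB and 125 GB is presumably headroom for the KV cache and runtime buffers.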

About this model

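Llama 4 Scout is a mixture-of-experts model from Meta with 109B total parameters, of which only 17B are active for any given token, routed across 16 experts. It is natively multimodal (text and image input), and Meta advertises a context window of up to 10M tokens. At Q4 it needs about 72 GB. That is too large for a single consumer GPU, but it fits on Macs with 96-128 GB of unified memory or on multi-GPU setups. All 109B parameters must be held in memory, yet each token passes through only the 17B active ones, so per-token compute and memory traffic, and therefore generation speed, are closer to those of a 17B dense model than a 109B one. Scout is the smaller of the two open-weight Llama 4 models; the larger, Llama 4 Maverick, has 400B total parameters (also 17B active).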
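The speed benefit of the 17B active parameters can be illustrated with a common back-of-the-envelope model: token generation is usually memory-bandwidth-bound, so an upper bound on tokens per second is memory bandwidth divided by the bytes read per token. This is a simplified sketch. It assumes every active weight is read once per token and ignores KV-cache reads, attention compute, and runtime overhead. The 800 GB/s bandwidth and 4.85 bits per weight are hypothetical example values.

```python
# Upper bound on decode speed for a memory-bandwidth-bound model (sketch).
# Assumes each generated token reads every *active* weight exactly once and
# ignores KV-cache reads, attention compute, and runtime overhead.


def decode_tokens_per_sec_upper_bound(
    active_params: float, bits_per_weight: float, bandwidth_gb_s: float
) -> float:
    """Ceiling on tokens/s given active parameters and memory bandwidth."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    BITS = 4.85        # assumed effective bits per weight at Q4_K_M
    BANDWIDTH = 800.0  # hypothetical memory bandwidth in GB/s
    moe = decode_tokens_per_sec_upper_bound(17e9, BITS, BANDWIDTH)
    dense = decode_tokens_per_sec_upper_bound(109e9, BITS, BANDWIDTH)
    print(f"17B active (MoE): <= {moe:.0f} tok/s")
    print(f"109B dense:       <= {dense:.0f} tok/s")
```

Whatever bandwidth you plug in, the ratio between the two ceilings is 109/17, about 6.4x. Real throughput will be lower than either bound.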

Benchmarks

MMLU: 80.0


Install

Ollama

ollama run llama4:scout-q4_K_M
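Once the model is pulled, a running Ollama server can also be called over its local REST API. The sketch below posts to the `/api/generate` endpoint on Ollama's default port 11434 with streaming turned off, and optionally attaches images as base64 strings in the `images` field for vision prompts. It assumes the tag from the command above is the one installed locally. `build_generate_request` and `generate` are hypothetical helper names, and only the Python standard library is used.

```python
import base64
import json
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "llama4:scout-q4_K_M"  # assumed: the tag pulled by the command above


def build_generate_request(prompt, image_paths=(), model=MODEL_TAG, url=OLLAMA_URL):
    """Build a non-streaming /api/generate request (hypothetical helper)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if image_paths:
        # Vision input: Ollama expects base64-encoded image bytes.
        payload["images"] = [
            base64.b64encode(Path(p).read_bytes()).decode("ascii") for p in image_paths
        ]
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, image_paths=()):
    """Send the request and return the generated text (needs a running server)."""
    req = build_generate_request(prompt, image_paths)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("Summarize mixture-of-experts in one sentence.")` returns the completion as a string. Separating request construction from sending keeps the payload easy to inspect without a live server.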

llama.cpp / GGUF

Download a GGUF build from Hugging Face

Specs

Parameters: 109B
Architecture: mixture-of-experts
Context: 524K tokens
Min VRAM: 72.0 GB
Recommended: 72.0 GB
Family: Llama 4
Released: 2025-04-05
License: Llama 4 Community License
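Long contexts also need memory for the KV cache, on top of the weights, and it grows linearly with the number of tokens. The sketch below uses the standard estimate: keys and values for every layer, KV head, and head dimension, per token. The architecture numbers in the example (48 layers, 8 KV heads, head dimension 128, fp16 cache) are assumptions to be checked against the model's `config.json`. It also assumes "524K" means 524,288 tokens. The estimate ignores architecture-specific savings such as local or chunked attention layers and KV-cache quantization, so treat it as a ceiling.

```python
# Naive KV-cache size estimate (sketch): every layer caches a key and a value
# vector per KV head for every token in the context.


def kv_cache_gb(
    context_tokens: int,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    bytes_per_elem: int = 2,  # 2 bytes = fp16/bf16 cache
) -> float:
    """KV-cache size in decimal gigabytes; the factor 2 covers K and V."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * context_tokens / 1e9


if __name__ == "__main__":
    # Assumed architecture values; verify against the model's config.json.
    size = kv_cache_gb(context_tokens=524_288, n_layers=48, n_kv_heads=8, head_dim=128)
    print(f"Full 524K-token context: ~{size:.1f} GB of KV cache")
```

Under these assumptions a full 524K-token context would need on the order of 100 GB of KV cache, well beyond the 72 GB weight footprint. Shorter contexts scale down proportionally.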