Llama 4 Scout (109B/17B active)

Llama 4 Community License

Meta · 109B · mixture-of-experts

2025-04-05 · 524K context · 109B params

Use Cases

chat, code, reasoning, multilingual, vision, math, tools, writing, summary

Quantization Options

Quant                 Bits  VRAM      Quality
Q4_K_M (recommended)  4     72.0 GB   Good
Q8_0                  8     125.0 GB  Excellent

About this model

Llama 4 Scout is Meta's mixture-of-experts model with 109B total parameters, of which only 17B are active per token, routed across 16 experts. It is natively multimodal (text and images), and Meta advertises a context window of up to 10M tokens. At Q4_K_M it needs about 72 GB: too large for a single consumer GPU, but it fits on Macs with 96-128 GB of unified memory or on multi-GPU setups. All 109B parameters must stay resident in memory, but because only 17B are used per token, per-token compute is closer to that of a much smaller dense model, which helps inference speed. It is among the most capable open-weight models Meta has released.
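
The memory arithmetic behind these figures can be sketched as follows. This is a rough estimate, not the method used to produce the table. The function names (`weight_gb`, `effective_bits`, `fits`) are hypothetical helpers. The sketch assumes 1 GB = 1e9 bytes and treats the gap between the nominal weight size and the listed VRAM as quantization metadata plus runtime overhead.

```python
# Rough memory arithmetic for a quantized model (illustrative assumptions only).

TOTAL_PARAMS_B = 109   # total parameters, in billions (from the listing)
ACTIVE_PARAMS_B = 17   # parameters active per token, in billions

# Nominal bit width and listed VRAM (GB) from the Quantization Options table.
LISTED = {"Q4_K_M": (4, 72.0), "Q8_0": (8, 125.0)}


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight-only size in GB, assuming 1 GB = 1e9 bytes."""
    return params_billion * bits_per_weight / 8


def effective_bits(params_billion: float, listed_gb: float) -> float:
    """Bits per weight implied by a listed memory figure."""
    return listed_gb * 8 / params_billion


def fits(required_gb: float, available_gb: float, headroom_gb: float = 0.0) -> bool:
    """True if the model fits in memory, leaving headroom_gb free (assumed value)."""
    return required_gb + headroom_gb <= available_gb


if __name__ == "__main__":
    for name, (bits, listed_gb) in LISTED.items():
        nominal = weight_gb(TOTAL_PARAMS_B, bits)
        implied = effective_bits(TOTAL_PARAMS_B, listed_gb)
        print(f"{name}: nominal {nominal:.1f} GB, listed {listed_gb:.1f} GB, "
              f"implied {implied:.2f} bits/weight")

    # Every expert must be resident in memory, but only a fraction is used per token.
    print(f"active fraction per token: {ACTIVE_PARAMS_B / TOTAL_PARAMS_B:.1%}")

    for memory_gb in (24, 96, 128):
        print(f"{memory_gb} GB: Q4_K_M fits = {fits(72.0, memory_gb)}, "
              f"Q8_0 fits = {fits(125.0, memory_gb)}")
```

The nominal sizes (54.5 GB at 4 bits, 109 GB at 8 bits) come out below the listed 72 GB and 125 GB. That is expected: quantized formats store scales alongside the weights, and the runtime needs room for the KV cache and activations. In practice, leave extra headroom, especially on unified-memory systems where the GPU cannot use all of the RAM.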

Benchmarks

MMLU: 80.0