chat code reasoning multilingual vision math tools writing summary
Quantization Options
Quant                 Bits  VRAM      Quality
Q4_K_M (recommended)  4     72.0 GB   Good
Q8_0                  8     125.0 GB  Excellent
About this model
Llama 4 Scout is Meta's mixture-of-experts (MoE) model. It has 109B total parameters, but only 17B are active for any given token, with routing across 16 experts. It is natively multimodal (text and images) and supports an exceptionally long 10M-token context window.
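A toy sketch of top-k expert routing may help show why only part of the model runs for each token. This is a generic MoE layer in plain Python, not Meta's implementation: the linear router, the scaling "experts", TOP_K = 1 and DIM = 8 are illustrative assumptions. It counts expert invocations to show that each token is processed by only one of the 16 experts.

```python
import math
import random

NUM_EXPERTS = 16   # matches the expert count quoted above
TOP_K = 1          # assumption: experts selected per token in this toy
DIM = 8            # hypothetical hidden size, tiny for illustration
TOKENS = 100       # number of random token vectors to route

# Counts how many times each expert actually runs.
calls = [0] * NUM_EXPERTS


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(i):
    """Hypothetical expert: scales its input and records each invocation."""
    def expert(v):
        calls[i] += 1
        return [(i + 1) * vi for vi in v]
    return expert


def moe_forward(x, router, experts, k=TOP_K):
    """Send one token vector x to its top-k experts and mix their outputs.

    The router scores every expert (cheap), but only the k winners are
    evaluated, which is why active parameters are a fraction of the total.
    """
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        out = [o + gate * y for o, y in zip(out, experts[i](x))]
    return out, top


random.seed(0)
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
experts = [make_expert(i) for i in range(NUM_EXPERTS)]

for _ in range(TOKENS):
    token = [random.gauss(0, 1) for _ in range(DIM)]
    moe_forward(token, router, experts)

print("expert evaluations:", sum(calls), "for", TOKENS, "tokens")
```

This prints `expert evaluations: 100 for 100 tokens`. In a real model, attention and other shared layers run for every token, so the active count (17B) is well above 109B / 16.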
At Q4 it needs about 72 GB, which is too large for a single consumer GPU but fits on Macs with 96-128 GB of unified memory or on multi-GPU setups. All 109B parameters must be held in memory, but each token passes through only the 17B active ones, so inference is considerably faster than the memory footprint alone would suggest. It is one of Meta's most capable open-weight models.
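The memory-versus-speed trade-off can be sketched with back-of-the-envelope arithmetic. This computes only nominal weight storage (parameters × bits ÷ 8) and uses the common rule of thumb of roughly 2 FLOPs per active parameter per generated token. Both are approximations. The table's 72.0 GB and 125.0 GB are higher than these nominal figures, presumably because formats like Q4_K_M and Q8_0 store per-block scales (and Q4_K_M keeps some tensors at higher precision), and because KV cache and runtime buffers also need room. The exact breakdown is not given here.

```python
TOTAL = 109e9   # total parameters (all must be resident in memory)
ACTIVE = 17e9   # parameters used per token


def weight_memory_gb(total_params, bits_per_weight):
    """Nominal memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


def flops_per_token(active_params):
    """Rule of thumb: ~2 FLOPs per active parameter per generated token."""
    return 2 * active_params


for name, bits in [("4-bit", 4), ("8-bit", 8)]:
    print(f"{name}: ~{weight_memory_gb(TOTAL, bits):.1f} GB of raw weights")

# How much more per-token compute a dense model of the same total size needs.
ratio = flops_per_token(TOTAL) / flops_per_token(ACTIVE)
print(f"dense 109B vs 17B active: ~{ratio:.1f}x the per-token compute")
```

This prints about 54.5 GB and 109.0 GB of raw weights and a compute ratio of roughly 6.4x, which is the sense in which the model is "big to store, cheaper to run."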