Mistral Small 3.1 24B

Name: Mistral Small 3.1 24B
Author: Mistral AI

24B parameters

Tags: text-generation, code-generation, reasoning, multilingual, vision, tool-use, summarization

Mistral Small 3.1 24B is a multimodal model that accepts text and image inputs with a 128K-token context window. It is the first release in the Mistral Small line with native vision support, and it delivers strong results across general tasks, coding, and multilingual work. At Q4_K_M it needs about 18 GB of VRAM, so it fits on an RTX 3090 or a Mac with 24 GB of unified memory. Positioned as Mistral's sweet spot between efficiency and capability, it is well suited to daily use on high-end consumer hardware.

Quick Start with Ollama


ollama run mistral-small3.1:24b-instruct-q4_K_M
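Once the model is pulled, it can also be queried from code through Ollama's local REST API. The following is a minimal sketch, not an official client: it assumes Ollama is serving on its default port (11434) and that the model reference is the Ollama name from this page (`mistral-small3.1`) combined with the Q4_K_M tag. The helper names `build_payload` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumption: model name and tag taken from this page and joined with ":".
MODEL = "mistral-small3.1:24b-instruct-q4_K_M"


def build_payload(prompt, images_b64=None):
    """Build the JSON body for a non-streaming /api/generate request."""
    payload = {"model": MODEL, "prompt": prompt, "stream": False}
    if images_b64:
        # Vision input: a list of base64-encoded image strings.
        payload["images"] = list(images_b64)
    return payload


def generate(prompt, images_b64=None, url=OLLAMA_URL):
    """Send the prompt to the local Ollama server and return the reply text."""
    body = json.dumps(build_payload(prompt, images_b64)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Example call (requires a running Ollama server with the model pulled):
# print(generate("Summarize the Apache 2.0 license in one sentence."))
```

Setting `stream` to `False` returns a single JSON object rather than a stream of chunks, which keeps the sketch short; a chat interface would more likely stream.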

Resources: Ollama, Hugging Face, Official Page

Creator	Mistral AI
Parameters	24B
Architecture	transformer-decoder
Context	128K tokens
Released	Mar 18, 2025
License	Apache 2.0
Ollama	mistral-small3.1

Quantization Options

Format	File Size	VRAM Required	Ollama Tag
Q4_K_M (recommended)	15 GB	18 GB	`24b-instruct-q4_K_M`
Q8_0	26 GB	30 GB	`24b-instruct-q8_0`
F16	49 GB	54 GB	`24b-instruct-fp16`
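The file sizes above follow roughly from parameter count times bits per weight. The sketch below reproduces that arithmetic under stated assumptions: the parameter count is rounded to 24 billion, and the bits-per-weight figures are approximate averages for these formats (Q8_0 stores 8.5 bits per weight; the Q4_K_M figure of about 4.85 varies slightly by model). The listed sizes and VRAM figures are copied from the table.

```python
PARAMS = 24e9  # assumption: parameter count rounded to 24 billion

# Approximate average bits per weight for each format (assumed values).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5, "F16": 16.0}

# (file size GB, VRAM required GB) as listed in the table above.
LISTED = {"Q4_K_M": (15, 18), "Q8_0": (26, 30), "F16": (49, 54)}


def estimate_file_gb(fmt, params=PARAMS):
    """Estimate the weight file size in GB (10^9 bytes) for a format."""
    return params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


for fmt, (file_gb, vram_gb) in LISTED.items():
    estimate = estimate_file_gb(fmt)
    overhead = vram_gb - file_gb  # room for KV cache and runtime buffers
    print(f"{fmt}: ~{estimate:.1f} GB estimated, {file_gb} GB listed, "
          f"{overhead} GB extra VRAM")
```

The gap between file size and required VRAM (3 to 5 GB in the table) covers the KV cache and runtime buffers, and it grows with the context length actually in use.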

Compatible Hardware

Q4_K_M requires 18 GB VRAM

Hardware	VRAM	Type	Fit	Est. Speed
Mac Studio M4 Ultra 512GB	512 GB	mac	Runs	~46 tok/s
Mac Pro M2 Ultra 192GB	192 GB	mac	Runs	~44 tok/s
Mac Studio M4 Ultra 192GB	192 GB	mac	Runs	~46 tok/s
Mac Studio M4 Max 128GB	128 GB	mac	Runs	~30 tok/s
MacBook Pro M4 Max 128GB	128 GB	mac	Runs	~30 tok/s
MacBook Pro M5 Max 128GB	128 GB	mac	Runs	~30 tok/s
NVIDIA RTX PRO 6000 Blackwell	96 GB	gpu	Runs	~107 tok/s
MacBook Pro M3 Max 96GB	96 GB	mac	Runs	~22 tok/s
Mac mini M4 Pro 64GB	64 GB	mac	Runs	~15 tok/s
Mac Studio M4 Max 64GB	64 GB	mac	Runs	~30 tok/s
MacBook Pro M4 Max 64GB	64 GB	mac	Runs	~30 tok/s
MacBook Pro M5 Max 64GB	64 GB	mac	Runs	~30 tok/s
NVIDIA RTX 6000 Ada Generation	48 GB	gpu	Runs	~53 tok/s
NVIDIA RTX A6000	48 GB	gpu	Runs	~43 tok/s
NVIDIA RTX PRO 5000 Blackwell	48 GB	gpu	Runs	~53 tok/s
Mac mini M4 Pro 48GB	48 GB	mac	Runs	~15 tok/s
MacBook Pro M3 Max 48GB	48 GB	mac	Runs	~22 tok/s
MacBook Pro M4 Max 48GB	48 GB	mac	Runs	~30 tok/s
MacBook Pro M4 Pro 48GB	48 GB	mac	Runs	~15 tok/s
MacBook Pro M5 Max 48GB	48 GB	mac	Runs	~23 tok/s
MacBook Pro M5 Pro 48GB	48 GB	mac	Runs	~15 tok/s
Mac Studio M4 Max 36GB	36 GB	mac	Runs	~30 tok/s
MacBook Pro M3 Pro 36GB	36 GB	mac	Runs	~8 tok/s
MacBook Pro M5 Max 36GB	36 GB	mac	Runs	~23 tok/s
NVIDIA RTX 5000 Ada Generation	32 GB	gpu	Runs	~40 tok/s
NVIDIA GeForce RTX 5090	32 GB	gpu	Runs	~100 tok/s
iMac M4 32GB	32 GB	mac	Runs	~7 tok/s
Mac mini M4 32GB	32 GB	mac	Runs	~7 tok/s
MacBook Air M4 32GB	32 GB	mac	Runs	~7 tok/s
MacBook Air M5 32GB	32 GB	mac	Runs	~7 tok/s
MacBook Pro M5 32GB	32 GB	mac	Runs	~7 tok/s
AMD Radeon RX 7900 XTX	24 GB	gpu	Runs	~53 tok/s
NVIDIA GeForce RTX 3090	24 GB	gpu	Runs	~52 tok/s
NVIDIA GeForce RTX 3090 Ti	24 GB	gpu	Runs	~56 tok/s
NVIDIA GeForce RTX 4090	24 GB	gpu	Runs	~56 tok/s
NVIDIA RTX A5000	24 GB	gpu	Runs	~43 tok/s
iMac M3 24GB	24 GB	mac	Runs	~6 tok/s
Mac mini M2 24GB	24 GB	mac	Runs	~6 tok/s
Mac mini M4 Pro 24GB	24 GB	mac	Runs	~15 tok/s
MacBook Air M2 24GB	24 GB	mac	Runs	~6 tok/s
MacBook Air M4 24GB	24 GB	mac	Runs	~7 tok/s
MacBook Air M5 24GB	24 GB	mac	Runs	~7 tok/s
MacBook Pro M4 Pro 24GB	24 GB	mac	Runs	~15 tok/s
MacBook Pro M5 24GB	24 GB	mac	Runs	~7 tok/s
MacBook Pro M5 Pro 24GB	24 GB	mac	Runs	~15 tok/s
AMD Radeon RX 7900 XT	20 GB	gpu	Runs (tight)	~44 tok/s
NVIDIA RTX 4000 Ada Generation	20 GB	gpu	Runs (tight)	~20 tok/s
MacBook Pro M3 Pro 18GB	18 GB	mac	CPU Offload	~2 tok/s
AMD Radeon RX 6900 XT	16 GB	gpu	CPU Offload	~8 tok/s
AMD Radeon RX 6800 XT	16 GB	gpu	CPU Offload	~8 tok/s
AMD Radeon RX 7800 XT	16 GB	gpu	CPU Offload	~11 tok/s
AMD Radeon RX 9060 XT 16GB	16 GB	gpu	CPU Offload	~9 tok/s
AMD Radeon RX 9070 XT	16 GB	gpu	CPU Offload	~11 tok/s
AMD Radeon RX 9070	16 GB	gpu	CPU Offload	~9 tok/s
Intel Arc A770	16 GB	gpu	CPU Offload	~9 tok/s
NVIDIA GeForce RTX 4060 Ti 16GB	16 GB	gpu	CPU Offload	~5 tok/s
NVIDIA GeForce RTX 4070 Ti Super	16 GB	gpu	CPU Offload	~11 tok/s
NVIDIA GeForce RTX 4080 Super	16 GB	gpu	CPU Offload	~12 tok/s
NVIDIA GeForce RTX 4080	16 GB	gpu	CPU Offload	~12 tok/s
NVIDIA GeForce RTX 5060 Ti 16GB	16 GB	gpu	CPU Offload	~8 tok/s
NVIDIA GeForce RTX 5070 Ti	16 GB	gpu	CPU Offload	~15 tok/s
NVIDIA GeForce RTX 5080	16 GB	gpu	CPU Offload	~16 tok/s
NVIDIA RTX A4000	16 GB	gpu	CPU Offload	~8 tok/s
iMac M1 16GB	16 GB	mac	CPU Offload	~1 tok/s
iMac M4 16GB	16 GB	mac	CPU Offload	~2 tok/s
Mac mini M1 16GB	16 GB	mac	CPU Offload	~1 tok/s
Mac mini M4 16GB	16 GB	mac	CPU Offload	~2 tok/s
MacBook Air M2 16GB	16 GB	mac	CPU Offload	~2 tok/s
MacBook Air M3 16GB	16 GB	mac	CPU Offload	~2 tok/s
MacBook Air M4 16GB	16 GB	mac	CPU Offload	~2 tok/s
MacBook Air M5 16GB	16 GB	mac	CPU Offload	~2 tok/s
MacBook Pro M1 16GB	16 GB	mac	CPU Offload	~1 tok/s
MacBook Pro M2 Pro 16GB	16 GB	mac	CPU Offload	~3 tok/s
MacBook Pro M5 16GB	16 GB	mac	CPU Offload	~2 tok/s
AMD Radeon RX 6700 XT	12 GB	gpu	CPU Offload	~6 tok/s
AMD Radeon RX 7700 XT	12 GB	gpu	CPU Offload	~7 tok/s
Intel Arc B580	12 GB	gpu	CPU Offload	~8 tok/s
NVIDIA GeForce RTX 3060 12GB	12 GB	gpu	CPU Offload	~6 tok/s
NVIDIA GeForce RTX 3080 12GB	12 GB	gpu	CPU Offload	~15 tok/s
NVIDIA GeForce RTX 4070 Super	12 GB	gpu	CPU Offload	~8 tok/s
NVIDIA GeForce RTX 4070 Ti	12 GB	gpu	CPU Offload	~8 tok/s
NVIDIA GeForce RTX 4070	12 GB	gpu	CPU Offload	~8 tok/s
NVIDIA GeForce RTX 5070	12 GB	gpu	CPU Offload	~11 tok/s

24 hardware device(s) cannot run this model at Q4_K_M.
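The Fit column can be approximated by comparing a device's memory with the 18 GB Q4_K_M requirement. This is a sketch only: the 4 GB "tight" margin and the rule that zero headroom means CPU offload are guesses chosen to match the rows above, not the rule this page actually uses, and `classify_fit` is a hypothetical helper.

```python
REQUIRED_GB = 18.0     # Q4_K_M requirement stated on this page
TIGHT_MARGIN_GB = 4.0  # assumption: under 4 GB of headroom counts as tight


def classify_fit(memory_gb, required_gb=REQUIRED_GB,
                 tight_margin_gb=TIGHT_MARGIN_GB):
    """Return a Fit label based on headroom above the requirement."""
    headroom = memory_gb - required_gb
    if headroom <= 0:
        # No spare memory for the OS or other buffers: layers spill to CPU.
        return "CPU Offload"
    if headroom < tight_margin_gb:
        return "Runs (tight)"
    return "Runs"


# A few rows from the table above, as (device, memory in GB).
for device, memory_gb in [
    ("NVIDIA GeForce RTX 3090", 24),
    ("AMD Radeon RX 7900 XT", 20),
    ("MacBook Pro M3 Pro 18GB", 18),
    ("NVIDIA GeForce RTX 4080", 16),
]:
    print(f"{device}: {classify_fit(memory_gb)}")
```

Memory capacity only decides whether the model fits; the estimated speeds in the table depend on other factors, such as memory bandwidth, that this sketch does not model.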

Benchmark Scores

MMLU: 78.0