Mistral Nemo 12B
by Mistral AI · mistral family
12B
parameters
text-generation code-generation reasoning multilingual tool-use summarization
Mistral Nemo 12B was built jointly by Mistral AI and NVIDIA. It features a 128K-token context window and the Tekken tokenizer, which compresses text more efficiently across languages than prior Mistral tokenizers. With over 3.4M Ollama pulls, it is one of the most popular models at its size, and at Q4 quantization it fits comfortably on 12 GB GPUs, making it a strong contender alongside Gemma 3 12B. It excels at function calling, multilingual tasks, and general instruction following.
Quick Start with Ollama
`ollama run mistral-nemo:12b-instruct-q4_K_M`

| | |
|---|---|
| Creator | Mistral AI |
| Parameters | 12B |
| Architecture | Transformer decoder |
| Context | 128K tokens |
| Released | Jul 18, 2024 |
| License | Apache 2.0 |
| Ollama | mistral-nemo |
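Since function calling is one of this model's strengths, here is a minimal sketch of a tool-use request against Ollama's local `/api/chat` endpoint. The request schema follows Ollama's documented chat API; the `get_weather` tool is a hypothetical example, not part of the model card.

```python
import json

# Build a chat request for Ollama's local /api/chat endpoint
# (default: http://localhost:11434/api/chat).
# The get_weather tool is a hypothetical example tool definition.
payload = {
    "model": "mistral-nemo:12b-instruct-q4_K_M",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "stream": False,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"}
                    },
                    "required": ["city"],
                },
            },
        }
    ],
}

# Serialize to JSON; POST this body to the endpoint above.
body = json.dumps(payload)
print(body[:40])
```

If the model decides to use the tool, the response message carries a `tool_calls` array; your code runs the named function and sends the result back as a `tool`-role message for the final answer.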
Quantization Options
| Format | File Size | VRAM Required | Ollama Tag |
|---|---|---|---|
| Q4_K_M (recommended) | 7.1 GB | 9.5 GB | 12b-instruct-q4_K_M |
| Q8_0 | 12.9 GB | 16 GB | 12b-instruct-q8_0 |
| F16 | 24.5 GB | 28 GB | 12b-instruct-fp16 |
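The file sizes above follow roughly from parameter count times bits per weight. A quick sketch, assuming ~12.2B parameters and approximate average bits-per-weight figures for each format (K-quants mix precisions, so these are estimates, not exact values):

```python
# Rough GGUF file-size estimate: parameters x bits-per-weight / 8.
# Bits-per-weight values below are approximate averages, not exact specs.
PARAMS = 12.2e9  # Mistral Nemo parameter count (approx.)

def estimated_size_gb(params: float, bits_per_weight: float) -> float:
    """Return approximate file size in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9

for name, bpw in [("Q4_K_M", 4.65), ("Q8_0", 8.5), ("F16", 16.0)]:
    print(f"{name}: ~{estimated_size_gb(PARAMS, bpw):.1f} GB")
```

Add a couple of gigabytes for the KV cache and activations and you arrive at the VRAM figures in the table.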
Compatible Hardware
The recommended Q4_K_M quantization requires 9.5 GB of VRAM, so any 12 GB GPU can run it; Q8_0 needs 16 GB and full-precision F16 needs 28 GB.
Benchmark Scores
| Benchmark | Score |
|---|---|
| MMLU | 68.0 |