Llama 3.1 405B
by Meta · llama-3 family
405B parameters
Tags: text-generation · code-generation · reasoning · multilingual · tool-use · math · creative-writing · summarization
Llama 3.1 405B is the largest and most capable model in the Llama family, representing Meta's flagship open-source release. It competes directly with leading proprietary models on benchmarks spanning reasoning, coding, math, and multilingual understanding. Running this model locally requires enterprise-grade hardware with multiple high-end GPUs. However, for users with the necessary infrastructure, it provides state-of-the-art open-source performance without any API dependencies.
Quick Start with Ollama
```
ollama run llama3.1:405b-instruct-q4_K_M
```

| Spec | Value |
|---|---|
| Creator | Meta |
| Parameters | 405B |
| Architecture | transformer-decoder |
| Context Length | 128K tokens |
| License | Llama 3.1 Community License |
| Released | Jul 23, 2024 |
| Ollama | llama3.1:405b |
Quantization Options
| Format | File Size | VRAM Required | Quality | Ollama Tag |
|---|---|---|---|---|
| Q4_K_M (recommended) | 196 GB | 244.5 GB | ★★★★★ | 405b-instruct-q4_K_M |
| Q5_K_M | 228.8 GB | 285 GB | ★★★★★ | 405b-instruct-q5_K_M |
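The file sizes above follow directly from parameter count and bits per weight. A minimal sketch of that arithmetic, assuming a hypothetical helper name and an approximate bits-per-weight figure (Q4_K_M averages a bit over 4 bits per weight; the exact ratio is not stated here):

```python
# Rough quantized-size estimator. The bits-per-weight values are
# approximations for illustration, not an official spec.
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Estimated on-disk size in decimal GB for a quantized model."""
    return n_params * bits_per_weight / 8 / 1e9

# 405B parameters at ~4 bits/weight lands in the same ballpark as the
# 196 GB Q4_K_M file listed in the table above.
print(round(quantized_size_gb(405e9, 4.0), 1))  # → 202.5
```

Note that VRAM required exceeds the file size because the KV cache and activation buffers must also fit alongside the weights.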
Compatible Hardware for Q4_K_M
Showing compatibility for the recommended quantization (Q4_K_M, 244.5 GB VRAM).
| Hardware | VRAM | Type | Fit |
|---|---|---|---|
| Mac Pro M2 Ultra 192GB | 192 GB | mac | CPU Offload |
| Mac Studio M4 Ultra 192GB | 192 GB | mac | CPU Offload |
34 other hardware devices cannot run this model configuration.
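The Fit column above can be approximated with a simple threshold check. This is an illustrative sketch, not Ollama's actual placement logic; the function name and the half-requirement offload cutoff are assumptions:

```python
def fit_label(device_vram_gb: float, required_gb: float) -> str:
    """Classify how a device handles a given VRAM requirement."""
    if device_vram_gb >= required_gb:
        # Weights, KV cache, and buffers all fit on the GPU.
        return "Full GPU"
    # Unified-memory Macs can still run the model slowly by spilling
    # layers to system RAM; the 50% threshold here is an assumption.
    if device_vram_gb >= required_gb / 2:
        return "CPU Offload"
    return "Won't fit"

# The 192 GB Macs in the table fall short of 244.5 GB, hence offload.
print(fit_label(192, 244.5))  # → CPU Offload
```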
Benchmark Scores
| Benchmark | Score |
|---|---|
| MMLU | 87.3 |