MacBook Pro M4 Max 128GB
Apple · M4 Max · 128GB Unified Memory · Can run 24 models
| Spec | Value |
|---|---|
| Manufacturer | Apple |
| Unified Memory | 128 GB |
| Chip | M4 Max |
| CPU Cores | 16 |
| GPU Cores | 40 |
| Neural Engine Cores | 16 |
| Memory Bandwidth | 546 GB/s |
| MSRP | $4,999 |
| Released | Nov 8, 2024 |
AI Notes
The MacBook Pro M4 Max 128GB is the ultimate laptop for local AI. With 128 GB of unified memory, it comfortably runs 70B-class models at 8-bit quantization (roughly 70–75 GB) and has ample headroom at 4-bit, though 70B at full FP16 precision (~140 GB) does not fit, and the very largest open models, such as Llama 3.1 405B, exceed its capacity even when heavily quantized. The 546 GB/s memory bandwidth delivers strong inference throughput for demanding AI workloads on the go.
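The fit and throughput claims above can be sanity-checked with back-of-the-envelope arithmetic: a model's footprint is roughly its parameter count times the quantization's bits per weight, plus some runtime overhead, and batch-1 decode speed is bounded by how many times per second the memory bus can stream those weights. This is a rough sketch; the bits-per-weight constants and the flat overhead figure are approximations, not exact GGUF numbers.

```python
# Approximate bits per weight for common GGUF quantizations
# (approximate figures; real files vary slightly by architecture).
BPW = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.85}

def est_mem_gb(params_b: float, quant: str, overhead_gb: float = 1.5) -> float:
    """Estimate memory footprint (GB) for a params_b-billion-parameter model:
    weights at the quant's bits-per-weight, plus flat overhead for
    KV cache and runtime buffers."""
    weights_gb = params_b * BPW[quant] / 8  # 1e9 params * bits / 8 bits-per-byte ~= GB
    return round(weights_gb + overhead_gb, 1)

def est_tokens_per_s(mem_gb: float, bandwidth_gbs: float = 546.0) -> float:
    """Theoretical batch-1 decode ceiling: every token requires reading
    all the weights once, so speed <= bandwidth / model size."""
    return round(bandwidth_gbs / mem_gb, 1)

print(est_mem_gb(70, "Q4_K_M"))   # ~44 GB: fits easily in 128 GB unified memory
print(est_mem_gb(70, "F16"))      # ~141 GB: full FP16 precision does NOT fit
print(est_tokens_per_s(est_mem_gb(70, "Q4_K_M")))  # rough upper bound, tok/s
```

Real-world decode speeds land below this ceiling (typically 60–80% of it) because of compute overhead and KV-cache reads, but the bound explains why memory bandwidth, not GPU core count, dominates single-stream LLM inference.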
Compatible Models
| Model | Parameters | Best Quant | VRAM Used | Fit |
|---|---|---|---|---|
| Llama 3.2 1B | 1B | Q8_0 | 3 GB | Runs |
| Gemma 2 2B | 2B | Q8_0 | 4 GB | Runs |
| Llama 3.2 3B | 3B | Q8_0 | 5 GB | Runs |
| Phi-3 Mini 3.8B | 3.8B | Q8_0 | 5.8 GB | Runs |
| DeepSeek R1 7B | 7B | Q8_0 | 9 GB | Runs |
| Mistral 7B | 7B | Q8_0 | 9 GB | Runs |
| Qwen 2.5 7B | 7B | Q8_0 | 9 GB | Runs |
| Qwen 2.5 Coder 7B | 7B | Q8_0 | 9 GB | Runs |
| Llama 3.1 8B | 8B | Q8_0 | 10 GB | Runs |
| Gemma 2 9B | 9B | Q8_0 | 11 GB | Runs |
| DeepSeek R1 14B | 14B | Q4_K_M | 9.9 GB | Runs |
| Phi-4 14B | 14B | Q4_K_M | 9.9 GB | Runs |
| Qwen 2.5 14B | 14B | Q4_K_M | 9.9 GB | Runs |
| StarCoder2 15B | 15B | Q8_0 | 17 GB | Runs |
| Codestral 22B | 22B | Q4_K_M | 14.7 GB | Runs |
| Gemma 2 27B | 27B | Q4_K_M | 17.7 GB | Runs |
| DeepSeek R1 32B | 32B | Q4_K_M | 20.7 GB | Runs |
| Qwen 2.5 32B | 32B | Q4_K_M | 20.7 GB | Runs |
| Command R 35B | 35B | Q4_K_M | 22.5 GB | Runs |
| Mixtral 8x7B | 47B | Q4_K_M | 29.7 GB | Runs |
| DeepSeek R1 70B | 70B | Q4_K_M | 43.5 GB | Runs |
| Llama 3.1 70B | 70B | Q4_K_M | 43.5 GB | Runs |
| Llama 3.3 70B | 70B | Q4_K_M | 43.5 GB | Runs |
| Qwen 2.5 72B | 72B | Q4_K_M | 44.7 GB | Runs |
1 model is too large for this hardware.