# MacBook Pro M3 Max 48GB
Apple · M3 Max · 48GB Unified Memory · Can run 74 models
| Spec | Value |
|---|---|
| Manufacturer | Apple |
| Chip | M3 Max |
| Unified Memory | 48 GB |
| CPU Cores | 16 |
| GPU Cores | 40 |
| Neural Engine Cores | 16 |
| Memory Bandwidth | 400 GB/s |
| MSRP | $3,499 |
| Released | Nov 7, 2023 |
## AI Notes
The MacBook Pro M3 Max 48GB is a strong machine for local AI inference. With 400 GB/s of memory bandwidth and 48 GB of unified memory, it runs 30B-class models at comfortable interactive speeds and can just fit 70B models at Q4_K_M quantization (roughly 43-45 GB), though with little headroom left for context. Because token generation is memory-bandwidth-bound, the Max's large bandwidth advantage over the Pro-tier chips translates directly into faster decoding.
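The speed estimates below follow directly from that memory-bound behavior: each generated token requires reading the full set of weights once, so throughput is roughly bandwidth divided by model size. A minimal sketch of that back-of-the-envelope calculation (the function name is illustrative, not from any library):

```python
# Rough upper bound on decode speed for a memory-bandwidth-bound LLM:
# every generated token streams all model weights through memory once,
# so tok/s ≈ memory bandwidth / weight size. Real-world speeds are lower.

def est_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Estimate decode throughput from bandwidth and quantized weight size."""
    return bandwidth_gb_s / model_size_gb

# M3 Max: 400 GB/s unified-memory bandwidth
print(round(est_tokens_per_sec(400, 2.5)))   # Qwen 3 0.6B at Q4_K_M -> 160
print(round(est_tokens_per_sec(400, 43.5)))  # Llama 3.1 70B at Q4_K_M -> 9
```

These match the table's estimates, which appear to use the same bandwidth-over-size heuristic.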
## Compatible Models
| Model | Parameters | Best Quant | VRAM Used | Fit | Est. Speed |
|---|---|---|---|---|---|
| Qwen 3 0.6B | 600M | Q4_K_M | 2.5 GB | Runs | ~160 tok/s |
| Qwen 3.5 0.8B | 800M | Q4_K_M | 1.5 GB | Runs | ~267 tok/s |
| Gemma 3 1B | 1B | Q8_0 | 2 GB | Runs | ~200 tok/s |
| Llama 3.2 1B | 1B | Q8_0 | 3 GB | Runs | ~133 tok/s |
| DeepSeek R1 1.5B | 1.5B | Q8_0 | 3 GB | Runs | ~133 tok/s |
| Gemma 2 2B | 2B | Q8_0 | 4 GB | Runs | ~100 tok/s |
| Gemma 3n E2B | 2B | Q4_K_M | 3.3 GB | Runs | ~121 tok/s |
| Gemma 4 E2B | 2B | Q4_K_M | 4 GB | Runs | ~100 tok/s |
| Qwen 3.5 2B | 2B | Q4_K_M | 3 GB | Runs | ~133 tok/s |
| Llama 3.2 3B | 3B | Q8_0 | 5 GB | Runs | ~80 tok/s |
| Phi-3 Mini 3.8B | 3.8B | Q8_0 | 5.8 GB | Runs | ~69 tok/s |
| Phi-4 Mini 3.8B | 3.8B | Q4_K_M | 4.5 GB | Runs | ~89 tok/s |
| Gemma 3 4B | 4B | Q4_K_M | 5 GB | Runs | ~80 tok/s |
| Gemma 3n E4B | 4B | Q4_K_M | 4.5 GB | Runs | ~89 tok/s |
| Gemma 4 E4B | 4B | Q4_K_M | 6 GB | Runs | ~67 tok/s |
| Qwen 3 4B | 4B | Q4_K_M | 4.5 GB | Runs | ~89 tok/s |
| Qwen 3.5 4B | 4B | Q4_K_M | 4.5 GB | Runs | ~89 tok/s |
| DeepSeek R1 7B | 7B | Q8_0 | 9 GB | Runs | ~44 tok/s |
| Falcon 3 7B | 7B | Q4_K_M | 6.8 GB | Runs | ~59 tok/s |
| Mistral 7B | 7B | Q8_0 | 9 GB | Runs | ~44 tok/s |
| Qwen 2.5 7B | 7B | Q8_0 | 9 GB | Runs | ~44 tok/s |
| Qwen 2.5 Coder 7B | 7B | Q8_0 | 9 GB | Runs | ~44 tok/s |
| Qwen 2.5 VL 7B | 7B | Q4_K_M | 7 GB | Runs | ~57 tok/s |
| Aya Expanse 8B | 8B | Q4_K_M | 6.5 GB | Runs | ~62 tok/s |
| Cogito 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~53 tok/s |
| DeepSeek R1 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~53 tok/s |
| Llama 3.1 8B | 8B | Q8_0 | 10 GB | Runs | ~40 tok/s |
| Nemotron 3 Nano 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~53 tok/s |
| Qwen 3 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~53 tok/s |
| Gemma 2 9B | 9B | Q8_0 | 11 GB | Runs | ~36 tok/s |
| Qwen 3.5 9B | 9B | Q4_K_M | 7.5 GB | Runs | ~53 tok/s |
| Falcon 3 10B | 10B | Q4_K_M | 8.5 GB | Runs | ~47 tok/s |
| Llama 3.2 Vision 11B | 11B | Q4_K_M | 8.5 GB | Runs | ~47 tok/s |
| Gemma 3 12B | 12B | Q4_K_M | 10.5 GB | Runs | ~38 tok/s |
| Mistral Nemo 12B | 12B | Q4_K_M | 9.5 GB | Runs | ~42 tok/s |
| DeepSeek R1 14B | 14B | Q4_K_M | 9.9 GB | Runs | ~40 tok/s |
| Phi-4 14B | 14B | Q4_K_M | 9.9 GB | Runs | ~40 tok/s |
| Phi-4 Reasoning 14B | 14B | Q4_K_M | 11 GB | Runs | ~36 tok/s |
| Qwen 2.5 14B | 14B | Q4_K_M | 9.9 GB | Runs | ~40 tok/s |
| Qwen 2.5 Coder 14B | 14B | Q4_K_M | 12 GB | Runs | ~33 tok/s |
| Qwen 3 14B | 14B | Q4_K_M | 12 GB | Runs | ~33 tok/s |
| StarCoder2 15B | 15B | Q8_0 | 17 GB | Runs | ~24 tok/s |
| Codestral 22B | 22B | Q4_K_M | 14.7 GB | Runs | ~27 tok/s |
| Devstral 24B | 24B | Q4_K_M | 17 GB | Runs | ~24 tok/s |
| Magistral Small 24B | 24B | Q4_K_M | 17 GB | Runs | ~24 tok/s |
| Mistral Small 3.1 24B | 24B | Q4_K_M | 18 GB | Runs | ~22 tok/s |
| Gemma 4 26B | 26B | Q4_K_M | 20 GB | Runs | ~20 tok/s |
| Gemma 2 27B | 27B | Q4_K_M | 17.7 GB | Runs | ~23 tok/s |
| Gemma 3 27B | 27B | Q4_K_M | 20 GB | Runs | ~20 tok/s |
| Qwen 3.5 27B | 27B | Q4_K_M | 19 GB | Runs | ~21 tok/s |
| Qwen 3 30B-A3B (MoE) | 30B | Q4_K_M | 22 GB | Runs | ~18 tok/s |
| Gemma 4 31B | 31B | Q4_K_M | 22 GB | Runs | ~18 tok/s |
| Aya Expanse 32B | 32B | Q4_K_M | 22 GB | Runs | ~18 tok/s |
| Cogito 32B | 32B | Q4_K_M | 21.5 GB | Runs | ~19 tok/s |
| DeepSeek R1 32B | 32B | Q4_K_M | 20.7 GB | Runs | ~19 tok/s |
| Qwen 2.5 32B | 32B | Q4_K_M | 20.7 GB | Runs | ~19 tok/s |
| Qwen 2.5 Coder 32B | 32B | Q4_K_M | 23 GB | Runs | ~17 tok/s |
| Qwen 3 32B | 32B | Q4_K_M | 23 GB | Runs | ~17 tok/s |
| QwQ 32B | 32B | Q4_K_M | 21.5 GB | Runs | ~19 tok/s |
| Command R 35B | 35B | Q4_K_M | 22.5 GB | Runs | ~18 tok/s |
| Qwen 3.5 35B A3B | 35B | Q4_K_M | 12 GB | Runs | ~33 tok/s |
| Mixtral 8x7B | 47B | Q4_K_M | 29.7 GB | Runs | ~13 tok/s |
| Cogito 70B | 70B | Q4_K_M | 43 GB | Runs (tight) | ~9 tok/s |
| DeepSeek R1 70B | 70B | Q4_K_M | 43.5 GB | Runs (tight) | ~9 tok/s |
| Llama 3.1 70B | 70B | Q4_K_M | 43.5 GB | Runs (tight) | ~9 tok/s |
| Llama 3.3 70B | 70B | Q4_K_M | 43.5 GB | Runs (tight) | ~9 tok/s |
| Qwen 2.5 72B | 72B | Q4_K_M | 44.7 GB | Runs (tight) | ~9 tok/s |
| Qwen 2.5 VL 72B | 72B | Q4_K_M | 41 GB | Runs (tight) | ~10 tok/s |
| Llama 3.2 Vision 90B | 90B | Q4_K_M | 50 GB | CPU Offload | ~2 tok/s |
| Command R+ 104B | 104B | Q4_K_M | 57 GB | CPU Offload | ~2 tok/s |
| Llama 4 Scout (109B/17B active) | 109B | Q4_K_M | 72 GB | CPU Offload | ~2 tok/s |
| Command A 111B | 111B | Q4_K_M | 61 GB | CPU Offload | ~2 tok/s |
| Devstral 2 123B | 123B | Q4_K_M | 67 GB | CPU Offload | ~2 tok/s |
| Mistral Large 2 123B | 123B | Q4_K_M | 67 GB | CPU Offload | ~2 tok/s |
10 models are too large for this hardware.
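A quick way to gauge whether a given model fits is to estimate its Q4_K_M weight size and leave headroom for the KV cache and the OS. The constants below are assumptions for illustration: Q4_K_M averages roughly 0.6 bytes per parameter (about 4.8 bits/weight, varying by architecture), and 6 GB of headroom is a loose reserve, not a measured figure.

```python
# Fit check for 48 GB unified memory. Assumes Q4_K_M averages ~0.6
# bytes per parameter and reserves headroom for KV cache, context
# buffers, and macOS itself (both constants are rough assumptions).

BYTES_PER_PARAM_Q4_K_M = 0.6   # ~4.8 bits/weight, varies per model
HEADROOM_GB = 6.0              # KV cache + OS reserve (assumption)

def fits(params_billions: float, unified_mem_gb: float = 48.0) -> bool:
    weights_gb = params_billions * BYTES_PER_PARAM_Q4_K_M
    return weights_gb + HEADROOM_GB <= unified_mem_gb

print(fits(32))    # True  (~19 GB of weights)
print(fits(70))    # True, but only just -- matches the "Runs (tight)" rows
print(fits(123))   # False (~74 GB of weights) -- matches "CPU Offload"
```

The dense 70B rows sit right at the boundary, which is why they are flagged "Runs (tight)" rather than "Runs".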