NVIDIA GeForce RTX 3080 10GB
NVIDIA · 10GB GDDR6X · Can run 44 models
| Spec | Value |
|---|---|
| Manufacturer | NVIDIA |
| VRAM | 10 GB |
| Memory Type | GDDR6X |
| Architecture | Ampere |
| CUDA Cores | 8,704 |
| Tensor Cores | 272 |
| Bandwidth | 760 GB/s |
| TDP | 320W |
| MSRP | $699 |
| Released | Sep 17, 2020 |
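The 760 GB/s bandwidth figure in the table follows directly from the card's memory configuration. A minimal sketch of the arithmetic, using the RTX 3080's published 19 Gbps GDDR6X data rate and 320-bit bus width (neither appears in the table above):

```python
def memory_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth: per-pin data rate x bus width / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8

# RTX 3080 10GB: 19 Gbps GDDR6X on a 320-bit bus
print(memory_bandwidth_gbs(19, 320))  # 760.0 GB/s
```

Bandwidth matters for inference because generating each token requires streaming the full set of active weights through the GPU, so tokens per second scales roughly with bandwidth divided by model size.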
AI Notes
The RTX 3080 10GB offers excellent bandwidth at 760 GB/s, making it one of the fastest cards for small-model inference. The 10 GB of VRAM comfortably handles 7B-class models, while ~13B models require aggressive quantization or CPU offload. Very popular on the used market for local AI workloads.
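Why 7B fits comfortably but 13B does not can be sketched with a rule-of-thumb VRAM estimate: weights take roughly (parameters x bits-per-weight / 8) bytes, plus overhead for the KV cache and activations. The bits-per-weight values and the 20% overhead factor below are assumptions, not measurements; actual usage (as in the table that follows) varies with context length and runtime:

```python
def estimate_vram_gb(params_billions: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB: quantized weights plus ~20% runtime overhead.

    A rule-of-thumb sketch only; real usage depends on context length,
    quantization format details, and the inference runtime.
    """
    weights_gb = params_billions * bits_per_weight / 8
    return round(weights_gb * overhead, 1)

# Q4_K_M averages roughly 4.8 bits per weight; Q8_0 is ~8.5
print(estimate_vram_gb(7, 4.8))    # ~5.0 GB  -> ample headroom in 10 GB
print(estimate_vram_gb(13, 4.8))   # ~9.4 GB  -> little or no headroom
```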
Compatible Models
| Model | Parameters | Best Quant | VRAM Used | Fit | Est. Speed |
|---|---|---|---|---|---|
| Qwen 3 0.6B | 600M | Q4_K_M | 2.5 GB | Runs | ~304 tok/s |
| Qwen 3.5 0.8B | 800M | Q4_K_M | 1.5 GB | Runs | ~507 tok/s |
| Gemma 3 1B | 1B | Q8_0 | 2 GB | Runs | ~380 tok/s |
| Llama 3.2 1B | 1B | Q8_0 | 3 GB | Runs | ~253 tok/s |
| DeepSeek R1 1.5B | 1.5B | Q8_0 | 3 GB | Runs | ~253 tok/s |
| Gemma 2 2B | 2B | Q8_0 | 4 GB | Runs | ~190 tok/s |
| Gemma 3n E2B | 2B | Q4_K_M | 3.3 GB | Runs | ~230 tok/s |
| Gemma 4 E2B | 2B | Q4_K_M | 4 GB | Runs | ~190 tok/s |
| Qwen 3.5 2B | 2B | Q4_K_M | 3 GB | Runs | ~253 tok/s |
| Llama 3.2 3B | 3B | Q8_0 | 5 GB | Runs | ~152 tok/s |
| Phi-3 Mini 3.8B | 3.8B | Q8_0 | 5.8 GB | Runs | ~131 tok/s |
| Phi-4 Mini 3.8B | 3.8B | Q4_K_M | 4.5 GB | Runs | ~169 tok/s |
| Gemma 3 4B | 4B | Q4_K_M | 5 GB | Runs | ~152 tok/s |
| Gemma 3n E4B | 4B | Q4_K_M | 4.5 GB | Runs | ~169 tok/s |
| Gemma 4 E4B | 4B | Q4_K_M | 6 GB | Runs | ~127 tok/s |
| Qwen 3 4B | 4B | Q4_K_M | 4.5 GB | Runs | ~169 tok/s |
| Qwen 3.5 4B | 4B | Q4_K_M | 4.5 GB | Runs | ~169 tok/s |
| Falcon 3 7B | 7B | Q4_K_M | 6.8 GB | Runs | ~112 tok/s |
| Qwen 2.5 VL 7B | 7B | Q4_K_M | 7 GB | Runs | ~109 tok/s |
| Aya Expanse 8B | 8B | Q4_K_M | 6.5 GB | Runs | ~117 tok/s |
| Cogito 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~101 tok/s |
| DeepSeek R1 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~101 tok/s |
| Nemotron 3 Nano 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~101 tok/s |
| Qwen 3 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~101 tok/s |
| Qwen 3.5 9B | 9B | Q4_K_M | 7.5 GB | Runs | ~101 tok/s |
| Falcon 3 10B | 10B | Q4_K_M | 8.5 GB | Runs | ~89 tok/s |
| Llama 3.2 Vision 11B | 11B | Q4_K_M | 8.5 GB | Runs | ~89 tok/s |
| DeepSeek R1 7B | 7B | Q8_0 | 9 GB | Runs (tight) | ~84 tok/s |
| Mistral 7B | 7B | Q8_0 | 9 GB | Runs (tight) | ~84 tok/s |
| Qwen 2.5 7B | 7B | Q8_0 | 9 GB | Runs (tight) | ~84 tok/s |
| Qwen 2.5 Coder 7B | 7B | Q8_0 | 9 GB | Runs (tight) | ~84 tok/s |
| Mistral Nemo 12B | 12B | Q4_K_M | 9.5 GB | Runs (tight) | ~80 tok/s |
| Llama 3.1 8B | 8B | Q8_0 | 10 GB | CPU Offload | ~23 tok/s |
| Gemma 2 9B | 9B | Q8_0 | 11 GB | CPU Offload | ~21 tok/s |
| Gemma 3 12B | 12B | Q4_K_M | 10.5 GB | CPU Offload | ~22 tok/s |
| DeepSeek R1 14B | 14B | Q4_K_M | 9.9 GB | CPU Offload | ~23 tok/s |
| Phi-4 14B | 14B | Q4_K_M | 9.9 GB | CPU Offload | ~23 tok/s |
| Phi-4 Reasoning 14B | 14B | Q4_K_M | 11 GB | CPU Offload | ~21 tok/s |
| Qwen 2.5 14B | 14B | Q4_K_M | 9.9 GB | CPU Offload | ~23 tok/s |
| Qwen 2.5 Coder 14B | 14B | Q4_K_M | 12 GB | CPU Offload | ~19 tok/s |
| Qwen 3 14B | 14B | Q4_K_M | 12 GB | CPU Offload | ~19 tok/s |
| StarCoder2 15B | 15B | Q4_K_M | 10.5 GB | CPU Offload | ~22 tok/s |
| Codestral 22B | 22B | Q4_K_M | 14.7 GB | CPU Offload | ~16 tok/s |
| Qwen 3.5 35B A3B | 35B | Q4_K_M | 12 GB | CPU Offload | ~19 tok/s |
40 models are too large for this hardware.
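The Fit column above can be read as a simple threshold rule on VRAM usage: a model that spills past the card's 10 GB drops to "CPU Offload" (with a steep speed penalty, visible in the Est. Speed column), and one that nearly fills VRAM is "Runs (tight)". The exact cutoffs below are assumptions inferred from the table, not a published rule:

```python
def fit_category(vram_used_gb: float, vram_total_gb: float = 10.0) -> str:
    """Classify model fit against available VRAM (thresholds are assumptions).

    Reserves ~0.5 GB for the driver/desktop before forcing offload, and
    flags anything above 90% of total VRAM as tight.
    """
    if vram_used_gb > vram_total_gb - 0.5:
        return "CPU Offload"   # weights spill to system RAM; large speed drop
    if vram_used_gb >= vram_total_gb * 0.9:
        return "Runs (tight)"  # fits, but little headroom for long contexts
    return "Runs"

print(fit_category(7.5))   # Runs            (e.g. an 8B model at Q4_K_M)
print(fit_category(9.0))   # Runs (tight)    (e.g. a 7B model at Q8_0)
print(fit_category(9.9))   # CPU Offload     (e.g. a 14B model at Q4_K_M)
```

Note how the estimated speeds fall off a cliff at the offload boundary, from ~80 tok/s for a tight 9.5 GB fit to ~23 tok/s once layers move to system RAM.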