NVIDIA GeForce RTX 3060 Ti
NVIDIA · 8GB GDDR6 · Can run 43 models
| Specification | Value |
|---|---|
| Manufacturer | NVIDIA |
| VRAM | 8 GB |
| Memory Type | GDDR6 |
| Architecture | Ampere |
| CUDA Cores | 4,864 |
| Tensor Cores | 152 |
| Bandwidth | 448 GB/s |
| TDP | 200W |
| MSRP | $399 |
| Released | Dec 1, 2020 |
AI Notes
The RTX 3060 Ti offers 8GB of VRAM with significantly higher memory bandwidth than the RTX 3060 12GB (448 GB/s vs. 360 GB/s). The 8GB cap limits it to roughly 7B–9B-parameter models at 4-bit quantization, but that bandwidth translates into faster token generation than many other 8GB cards. A solid used-market option for budget local AI. A rough fit-and-speed estimate is sketched below, and a partial-offload example for the larger "CPU Offload" models follows the compatibility table.
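As a back-of-envelope check on which models fit, the sketch below estimates weight size from parameter count and quantization level, plus a bandwidth-bound ceiling on generation speed. The bits-per-weight figures and the fixed overhead for KV cache and runtime are assumptions, not measurements from this page.

```python
# Rough VRAM-fit and speed estimates for quantized GGUF models on an
# 8 GB / 448 GB/s card. Bits-per-weight and overhead are assumed values.

QUANT_BITS = {"Q4_K_M": 4.8, "Q8_0": 8.5}  # approx. effective bits per weight

GPU_VRAM_GB = 8.0          # RTX 3060 Ti
GPU_BANDWIDTH_GBS = 448.0  # GB/s, from the spec table above
OVERHEAD_GB = 1.2          # assumed KV cache + runtime overhead

def weights_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the quantized weights in GB."""
    return params_billion * QUANT_BITS[quant] / 8.0

def fits(params_billion: float, quant: str) -> bool:
    """True if the weights plus assumed overhead fit in VRAM."""
    return weights_gb(params_billion, quant) + OVERHEAD_GB <= GPU_VRAM_GB

def tok_s_ceiling(params_billion: float, quant: str) -> float:
    """Upper bound on tokens/s: every weight is read once per generated
    token, so generation is roughly limited by memory bandwidth."""
    return GPU_BANDWIDTH_GBS / weights_gb(params_billion, quant)

if __name__ == "__main__":
    for name, b, q in [("Qwen 3 8B", 8, "Q4_K_M"), ("Llama 3.1 8B", 8, "Q8_0")]:
        print(f"{name} {q}: ~{weights_gb(b, q):.1f} GB weights, "
              f"{'fits' if fits(b, q) else 'needs CPU offload'}, "
              f"<= ~{tok_s_ceiling(b, q):.0f} tok/s")
```

Real-world speeds land well below the bandwidth ceiling (the table's estimates reflect that), but the ordering holds: smaller quantized weights mean more tokens per second.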
Compatible Models
| Model | Parameters | Best Quant | VRAM Used | Fit | Est. Speed |
|---|---|---|---|---|---|
| Qwen 3 0.6B | 600M | Q4_K_M | 2.5 GB | Runs | ~179 tok/s |
| Qwen 3.5 0.8B | 800M | Q4_K_M | 1.5 GB | Runs | ~299 tok/s |
| Gemma 3 1B | 1B | Q8_0 | 2 GB | Runs | ~224 tok/s |
| Llama 3.2 1B | 1B | Q8_0 | 3 GB | Runs | ~149 tok/s |
| DeepSeek R1 1.5B | 1.5B | Q8_0 | 3 GB | Runs | ~149 tok/s |
| Gemma 2 2B | 2B | Q8_0 | 4 GB | Runs | ~112 tok/s |
| Gemma 3n E2B | 2B | Q4_K_M | 3.3 GB | Runs | ~136 tok/s |
| Gemma 4 E2B | 2B | Q4_K_M | 4 GB | Runs | ~112 tok/s |
| Qwen 3.5 2B | 2B | Q4_K_M | 3 GB | Runs | ~149 tok/s |
| Llama 3.2 3B | 3B | Q8_0 | 5 GB | Runs | ~90 tok/s |
| Phi-3 Mini 3.8B | 3.8B | Q8_0 | 5.8 GB | Runs | ~77 tok/s |
| Phi-4 Mini 3.8B | 3.8B | Q4_K_M | 4.5 GB | Runs | ~100 tok/s |
| Gemma 3 4B | 4B | Q4_K_M | 5 GB | Runs | ~90 tok/s |
| Gemma 3n E4B | 4B | Q4_K_M | 4.5 GB | Runs | ~100 tok/s |
| Gemma 4 E4B | 4B | Q4_K_M | 6 GB | Runs | ~75 tok/s |
| Qwen 3 4B | 4B | Q4_K_M | 4.5 GB | Runs | ~100 tok/s |
| Qwen 3.5 4B | 4B | Q4_K_M | 4.5 GB | Runs | ~100 tok/s |
| Falcon 3 7B | 7B | Q4_K_M | 6.8 GB | Runs | ~66 tok/s |
| Aya Expanse 8B | 8B | Q4_K_M | 6.5 GB | Runs | ~69 tok/s |
| Qwen 2.5 VL 7B | 7B | Q4_K_M | 7 GB | Runs (tight) | ~64 tok/s |
| Cogito 8B | 8B | Q4_K_M | 7.5 GB | Runs (tight) | ~60 tok/s |
| DeepSeek R1 8B | 8B | Q4_K_M | 7.5 GB | Runs (tight) | ~60 tok/s |
| Nemotron 3 Nano 8B | 8B | Q4_K_M | 7.5 GB | Runs (tight) | ~60 tok/s |
| Qwen 3 8B | 8B | Q4_K_M | 7.5 GB | Runs (tight) | ~60 tok/s |
| Qwen 3.5 9B | 9B | Q4_K_M | 7.5 GB | Runs (tight) | ~60 tok/s |
| DeepSeek R1 7B | 7B | Q8_0 | 9 GB | CPU Offload | ~15 tok/s |
| Mistral 7B | 7B | Q8_0 | 9 GB | CPU Offload | ~15 tok/s |
| Qwen 2.5 7B | 7B | Q8_0 | 9 GB | CPU Offload | ~15 tok/s |
| Qwen 2.5 Coder 7B | 7B | Q8_0 | 9 GB | CPU Offload | ~15 tok/s |
| Llama 3.1 8B | 8B | Q8_0 | 10 GB | CPU Offload | ~14 tok/s |
| Gemma 2 9B | 9B | Q8_0 | 11 GB | CPU Offload | ~12 tok/s |
| Falcon 3 10B | 10B | Q4_K_M | 8.5 GB | CPU Offload | ~16 tok/s |
| Llama 3.2 Vision 11B | 11B | Q4_K_M | 8.5 GB | CPU Offload | ~16 tok/s |
| Gemma 3 12B | 12B | Q4_K_M | 10.5 GB | CPU Offload | ~13 tok/s |
| Mistral Nemo 12B | 12B | Q4_K_M | 9.5 GB | CPU Offload | ~14 tok/s |
| DeepSeek R1 14B | 14B | Q4_K_M | 9.9 GB | CPU Offload | ~14 tok/s |
| Phi-4 14B | 14B | Q4_K_M | 9.9 GB | CPU Offload | ~14 tok/s |
| Phi-4 Reasoning 14B | 14B | Q4_K_M | 11 GB | CPU Offload | ~12 tok/s |
| Qwen 2.5 14B | 14B | Q4_K_M | 9.9 GB | CPU Offload | ~14 tok/s |
| Qwen 2.5 Coder 14B | 14B | Q4_K_M | 12 GB | CPU Offload | ~11 tok/s |
| Qwen 3 14B | 14B | Q4_K_M | 12 GB | CPU Offload | ~11 tok/s |
| StarCoder2 15B | 15B | Q4_K_M | 10.5 GB | CPU Offload | ~13 tok/s |
| Qwen 3.5 35B A3B | 35B | Q4_K_M | 12 GB | CPU Offload | ~11 tok/s |
41 models are too large for this hardware.
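For the rows marked "CPU Offload", a common approach is to keep part of the model on the GPU and the rest in system RAM. A minimal sketch with llama-cpp-python, assuming a locally downloaded GGUF file; the file path and layer count are placeholders to tune against the 8 GB budget:

```python
# Partial GPU offload with llama-cpp-python for a model larger than VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen2.5-14b-instruct-q4_k_m.gguf",  # hypothetical local file
    n_gpu_layers=28,  # assumed split: offload roughly 2/3 of layers to the GPU
    n_ctx=4096,       # longer contexts need more VRAM for the KV cache
)

out = llm("Explain quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

Lowering `n_gpu_layers` frees VRAM at the cost of speed; the ~11–16 tok/s estimates in the table correspond to this kind of GPU/CPU split.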