NVIDIA GeForce RTX 3080 12GB

NVIDIA · 12GB GDDR6X · Can run 16 models

Manufacturer NVIDIA
VRAM 12 GB
Memory Type GDDR6X
Architecture Ampere
CUDA Cores 8,960
Tensor Cores 280
TDP 350W
MSRP $799
Released Jan 11, 2022

AI Notes

The RTX 3080 12GB is a decent option for running local AI models. With 12 GB of GDDR6X VRAM, it can handle models up to about 8B parameters at 8-bit quantization (Q8_0) and 14B models at 4-bit quantization (Q4_K_M). Its older Ampere architecture is slower per core than Ada Lovelace but still delivers solid inference performance.
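As a rough rule of thumb, a quantized model's memory footprint is its parameter count times its effective bits per weight, plus a couple of GB of overhead for the KV cache and activations. A minimal sketch of that arithmetic (the bits-per-weight figures and the 1.5 GB overhead are approximations chosen to roughly reproduce the table below, not exact llama.cpp internals):

```python
# Rough VRAM estimator for quantized GGUF models (approximation only).
# Effective bits per weight for common quant formats (approximate values).
QUANT_BITS = {
    "Q8_0": 8.5,     # ~8.5 effective bits per weight
    "Q4_K_M": 4.85,  # ~4.85 effective bits per weight
}

def estimate_vram_gb(params_billions: float, quant: str, overhead_gb: float = 1.5) -> float:
    """Weights footprint plus a fixed overhead for KV cache and activations."""
    weights_gb = params_billions * QUANT_BITS[quant] / 8
    return round(weights_gb + overhead_gb, 1)

print(estimate_vram_gb(7, "Q8_0"))     # roughly the ~9 GB shown for 7B Q8_0 models
print(estimate_vram_gb(14, "Q4_K_M"))  # roughly the ~9.9 GB shown for 14B Q4_K_M models
```

Real usage varies with context length and batch size, so treat these numbers as a starting point, not a guarantee.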

Compatible Models

Model               Parameters   Best Quant   VRAM Used   Fit
Llama 3.2 1B        1B           Q8_0         3 GB        Runs
Gemma 2 2B          2B           Q8_0         4 GB        Runs
Llama 3.2 3B        3B           Q8_0         5 GB        Runs
Phi-3 Mini 3.8B     3.8B         Q8_0         5.8 GB      Runs
DeepSeek R1 7B      7B           Q8_0         9 GB        Runs
Mistral 7B          7B           Q8_0         9 GB        Runs
Qwen 2.5 7B         7B           Q8_0         9 GB        Runs
Qwen 2.5 Coder 7B   7B           Q8_0         9 GB        Runs
Llama 3.1 8B        8B           Q8_0         10 GB       Runs
DeepSeek R1 14B     14B          Q4_K_M       9.9 GB      Runs
Phi-4 14B           14B          Q4_K_M       9.9 GB      Runs
Qwen 2.5 14B        14B          Q4_K_M       9.9 GB      Runs
Gemma 2 9B          9B           Q8_0         11 GB       Runs (tight)
StarCoder2 15B      15B          Q8_0         17 GB       CPU Offload
Codestral 22B       22B          Q4_K_M       14.7 GB     CPU Offload
Gemma 2 27B         27B          Q4_K_M       17.7 GB     CPU Offload
9 models are too large for this hardware.
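The Fit column follows a simple threshold rule: a model runs fully on-GPU if its footprint plus some headroom fits in 12 GB, runs tight if it only just fits, and needs CPU offload otherwise. A sketch of that logic (the 1.5 GB headroom value is an assumption chosen to reproduce the table above, not a published rule):

```python
def fit_category(vram_needed_gb: float, gpu_vram_gb: float = 12.0,
                 headroom_gb: float = 1.5) -> str:
    """Classify whether a model fits in GPU memory, mirroring the Fit column."""
    if vram_needed_gb + headroom_gb <= gpu_vram_gb:
        return "Runs"            # comfortable fit with headroom to spare
    if vram_needed_gb <= gpu_vram_gb:
        return "Runs (tight)"    # fits, but little room for a long context
    return "CPU Offload"         # some layers must spill to system RAM

print(fit_category(9))     # -> Runs
print(fit_category(11))    # -> Runs (tight)
print(fit_category(14.7))  # -> CPU Offload
```

CPU offload (e.g. splitting layers between GPU and system RAM in llama.cpp-style runners) keeps oversized models usable, but token throughput drops sharply once layers leave VRAM.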