chat code reasoning multilingual math tools summary
Quantization Options
Quant                 | Bits | VRAM    | Quality   | Status
Q4_K_M (recommended)  | 4    | 22.0 GB | Good      | -
Q8_0                  | 8    | 37.0 GB | Excellent | -
F16                   | 16   | 67.0 GB | Excellent | -
About this model
Qwen 3 30B-A3B is a mixture-of-experts (MoE) model with 30B total parameters, of which only about 3B are active per token. Its results are comparable to those of much larger dense models, while its per-token compute, and therefore its generation speed, is close to that of a 3B dense model.
All expert weights must be loaded into memory, so the model needs ~22 GB of VRAM at Q4 even though only 3B parameters activate per token. Inference is therefore very fast for a model of this memory footprint. It is an efficient choice for users with 24 GB+ of VRAM who want both quality and speed.
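The VRAM figures listed above can be sanity-checked with simple arithmetic. The sketch below is only an illustration: it assumes a round 30B parameter count, 1 GB = 1e9 bytes, and the nominal bit width from the Bits column (real quantized files usually store somewhat more than the nominal bits per weight). The function names are hypothetical. Under those assumptions, each listed VRAM value sits 7 GB above the nominal weight size; that gap is an observation from the listed numbers, not a documented formula.

```python
# Rough check of the listed VRAM figures (illustrative assumptions only).

TOTAL_PARAMS_B = 30.0   # total parameters, in billions (all experts are loaded)
ACTIVE_PARAMS_B = 3.0   # parameters active per token, in billions

# Quant name -> (nominal bits per weight, VRAM in GB as listed on the page)
LISTED = {
    "Q4_K_M": (4, 22.0),
    "Q8_0": (8, 37.0),
    "F16": (16, 67.0),
}


def nominal_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight size in GB, assuming exactly `bits_per_weight` bits per parameter."""
    return params_billion * bits_per_weight / 8


def listed_overhead_gb(quant: str) -> float:
    """Gap between the listed VRAM and the nominal weight size for one quant."""
    bits, listed_vram = LISTED[quant]
    return listed_vram - nominal_weight_gb(TOTAL_PARAMS_B, bits)


def active_fraction() -> float:
    """Share of the parameters that take part in computing each token."""
    return ACTIVE_PARAMS_B / TOTAL_PARAMS_B


if __name__ == "__main__":
    for name, (bits, vram) in LISTED.items():
        weights = nominal_weight_gb(TOTAL_PARAMS_B, bits)
        print(f"{name}: weights ~{weights:.1f} GB, listed {vram:.1f} GB, "
              f"gap {listed_overhead_gb(name):.1f} GB")
    print(f"active per token: {active_fraction():.0%} of parameters")
```

Memory scales with the total parameter count because every expert must be resident, while per-token compute scales with the active count. That is why the model needs a 24 GB-class GPU at Q4 but can still generate at a speed close to that of a much smaller dense model.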