Qwen 3 30B-A3B (MoE)

Apache 2.0

Alibaba · 30B · mixture-of-experts

2025-04-29 · 131K context · 30B params

Use Cases

chat · code · reasoning · multilingual · math · tools · summary

Quantization Options

Quant           Bits   VRAM      Quality     Status
Q4_K_M (rec)    4      22.0 GB   Good
Q8_0            8      37.0 GB   Excellent
F16             16     67.0 GB   Excellent

About this model

Qwen 3 30B-A3B is a mixture-of-experts model with 30B total parameters, of which only 3B are active per token. It achieves results comparable to much larger dense models while generating tokens about as quickly as a 3B model. All expert weights must be loaded, so it needs ~22 GB of VRAM at Q4, but per-token compute stays at the 3B level. An efficiency pick for users with 24 GB+ VRAM who want both quality and speed.
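
The memory figures can be sanity-checked with simple arithmetic. The sketch below is illustrative only: it assumes nominal weight size is total parameters × bits ÷ 8 (with 1 GB = 10^9 bytes), and the helper names `weight_gb` and `fits` are hypothetical, not part of any tool. The bit widths and VRAM values are copied from the quantization table above.

```python
# Figures from the quantization table: name -> (bits per weight, listed VRAM in GB).
TABLE = {"Q4_K_M": (4, 22.0), "Q8_0": (8, 37.0), "F16": (16, 67.0)}

TOTAL_PARAMS_B = 30   # billions; every expert must be resident in memory
ACTIVE_PARAMS_B = 3   # billions; parameters actually used per token


def weight_gb(params_billion: float, bits: int) -> float:
    """Nominal weight size in GB, assuming params * bits / 8 bytes."""
    return params_billion * bits / 8


def fits(vram_gb: float) -> list[str]:
    """Quantizations whose listed VRAM requirement fits within vram_gb."""
    return [name for name, (_, listed) in TABLE.items() if listed <= vram_gb]


if __name__ == "__main__":
    for name, (bits, listed) in TABLE.items():
        resident = weight_gb(TOTAL_PARAMS_B, bits)   # must be loaded
        active = weight_gb(ACTIVE_PARAMS_B, bits)    # touched per token
        print(f"{name}: resident ~{resident:.1f} GB, active ~{active:.1f} GB, "
              f"listed {listed:.1f} GB")
    print("Fits in 24 GB:", fits(24.0))
```

Under this assumption the nominal weights come to 15, 30, and 60 GB, each 7 GB below the listed figure. That gap is presumably headroom for the KV cache and runtime buffers, though the page does not say so. The resident-versus-active split also shows why the model loads like a 30B model but decodes like a 3B one: at Q4 only about 1.5 GB of weights are used per token.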

Benchmarks

MMLU: 72.0