Qwen 3 30B-A3B (MoE)
by Alibaba · qwen-3 family · 30B parameters
text-generation code-generation reasoning multilingual math tool-use summarization
Qwen 3 30B-A3B is a mixture-of-experts model with 30B total parameters but only ~3B active per token. Because all expert weights must be resident in memory, it still needs ~22 GB of VRAM at Q4, yet it generates tokens roughly as fast as a 3B dense model while achieving results comparable to much larger dense models. A unique efficiency pick for users with 24 GB+ VRAM who want both quality and speed.
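The memory-vs-speed trade-off above can be sketched with back-of-envelope arithmetic. This is illustrative only: the ~4.85 bits/weight figure for Q4_K_M and the flat 3 GB allowance for KV cache and activations are assumptions, not official numbers.

```python
# Rough MoE memory-vs-compute arithmetic (illustrative assumptions).
TOTAL_PARAMS = 30e9   # all experts must be loaded into VRAM
ACTIVE_PARAMS = 3e9   # experts actually used for any single token

def vram_gb(params, bits_per_weight, overhead_gb=3.0):
    """Approximate VRAM: quantized weights plus a flat allowance
    for KV cache and activations (assumed, not measured)."""
    return params * bits_per_weight / 8 / 1e9 + overhead_gb

q4_vram = vram_gb(TOTAL_PARAMS, 4.85)   # Q4_K_M averages ~4.85 bits/weight
# Per-token compute scales with ACTIVE params, so decode speed is
# roughly that of a 3B dense model despite the 30B memory footprint.
speed_ratio = TOTAL_PARAMS / ACTIVE_PARAMS
print(f"~{q4_vram:.0f} GB VRAM, ~{speed_ratio:.0f}x fewer FLOPs/token than dense 30B")
```

The estimate lands near the ~22 GB figure quoted above; real usage varies with context length and runtime.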
Quick Start with Ollama
    ollama run qwen3:30b-a3b-q4_K_M

| Property | Value |
|---|---|
| Creator | Alibaba |
| Parameters | 30B |
| Architecture | mixture-of-experts |
| Context | 128K tokens |
| Released | Apr 29, 2025 |
| License | Apache 2.0 |
| Ollama | qwen3:30b-a3b |
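Beyond the interactive `ollama run` command, Ollama serves a local HTTP API for programmatic use. A minimal sketch, assuming a default Ollama server on `localhost:11434`; the network call is commented out so the snippet stands alone:

```python
import json

def build_generate_request(prompt, model="qwen3:30b-a3b", stream=False):
    """Payload for Ollama's POST /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": stream}

payload = build_generate_request("Write a haiku about autumn.")
body = json.dumps(payload).encode()

# To actually send (requires Ollama running locally):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate", data=body,
#     headers={"Content-Type": "application/json"})
# print(json.loads(urllib.request.urlopen(req).read())["response"])
```

With `stream=False` the server returns one JSON object containing the full completion; the default streaming mode emits one JSON object per token.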
Quantization Options
| Format | File Size | VRAM Required | Ollama Tag |
|---|---|---|---|
| Q4_K_M (recommended) | 19 GB | 22 GB | 30b-a3b-q4_K_M |
| Q8_0 | 32.5 GB | 37 GB | 30b-a3b-q8_0 |
| F16 | 62 GB | 67 GB | 30b-a3b-fp16 |
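The file sizes above follow directly from parameter count times bits per weight. A quick sketch; the bits-per-weight values are approximate GGUF averages (assumed here), which is why the results land slightly under the table's figures:

```python
PARAMS = 30e9
# Approximate average bits/weight for each GGUF format (assumption).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5, "F16": 16.0}

def file_size_gb(fmt, params=PARAMS):
    """Estimated on-disk size: params * bits / 8, in decimal GB."""
    return params * BITS_PER_WEIGHT[fmt] / 8 / 1e9

sizes = {fmt: round(file_size_gb(fmt), 1) for fmt in BITS_PER_WEIGHT}
print(sizes)
```

Small gaps versus the listed sizes come from metadata, non-quantized tensors (e.g. embeddings), and the model's exact parameter count being slightly above 30B.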
Compatible Hardware
Q4_K_M requires 22 GB of VRAM, so it fits on 24 GB cards such as the RTX 3090 or RTX 4090.
Benchmark Scores
MMLU: 72.0