Qwen 3 4B

Name: Qwen 3 4B
Author: Alibaba

License: Apache 2.0

Alibaba · 4B · transformer-decoder


2025-04-29 · 131K context · 4B params

Use Cases

chat · code · reasoning · multilingual · math · summary

Quantization Options

Quant	Bits	VRAM	Quality
Q4_K_M (recommended)	4	4.5 GB	Moderate
Q8_0	8	6.5 GB	Good
F16	16	11.0 GB	Excellent
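
As a rough cross-check on the table, the weight-only footprint can be estimated as parameters × bits ÷ 8. The sketch below assumes exactly 4 billion parameters and the nominal bit widths from the table; both are simplifications (real GGUF quants mix precisions), and the helper name `weight_gb` is hypothetical. The table's VRAM figures are higher than these estimates, presumably because they also budget for the KV cache and runtime overhead, though the page does not say how they were derived.

```python
# Weight-only memory estimate: params * bits / 8 bytes.
# Assumptions (not from the model card): exactly 4e9 parameters,
# nominal bit widths, and 1 GB = 1e9 bytes.

PARAMS = 4_000_000_000

# Quant name -> (nominal bits per weight, VRAM figure listed in the table, in GB)
QUANTS = {
    "Q4_K_M": (4, 4.5),
    "Q8_0": (8, 6.5),
    "F16": (16, 11.0),
}


def weight_gb(params: int, bits: int) -> float:
    """Return the size of the weights alone, in decimal gigabytes."""
    return params * bits / 8 / 1e9


if __name__ == "__main__":
    for name, (bits, listed) in QUANTS.items():
        est = weight_gb(PARAMS, bits)
        print(f"{name}: weights ~{est:.1f} GB, listed {listed:.1f} GB, "
              f"difference {listed - est:.1f} GB")
```

Under these assumptions the weights alone come to about 2, 4, and 8 GB, so the listed figures leave roughly 2.5 to 3 GB for everything else.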

About this model

Qwen 3 4B is a compact dense model with a hybrid thinking mode: it can answer simple questions directly or reason step by step on complex tasks. It supports 29+ languages and a 131K-token context window (128 × 1,024 = 131,072 tokens, sometimes written as 128K). At Q4_K_M it needs about 4.5 GB of VRAM, so it fits on an 8 GB GPU or Mac with room to spare, which makes it a practical lightweight daily driver. It performs well on reasoning and math benchmarks relative to similarly sized models.
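
The fit claim can be checked against the quantization table. The following sketch picks the highest-precision quant whose listed VRAM figure fits a given budget. The function name `best_quant` and the optional `headroom_gb` parameter are illustrative assumptions, not part of any tool.

```python
from typing import Optional

# (quant, listed VRAM in GB), ordered from lowest to highest precision,
# copied from the quantization table on this page.
QUANT_VRAM_GB = [("Q4_K_M", 4.5), ("Q8_0", 6.5), ("F16", 11.0)]


def best_quant(vram_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the highest-precision quant that fits, or None if none does."""
    usable = vram_gb - headroom_gb
    fitting = [name for name, need in QUANT_VRAM_GB if need <= usable]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for budget in (4, 8, 12):
        print(budget, "GB ->", best_quant(budget))
```

Going by the table's figures, an 8 GB device fits Q8_0 as well as Q4_K_M; reserving 2 GB of headroom for other applications leaves Q4_K_M as the choice.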

Benchmarks

MMLU: 65.0

Install

Ollama

ollama run qwen3:4b-q4_K_M
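
Once the model is pulled, it can also be queried programmatically. The sketch below is a minimal example that assumes an Ollama server is running locally on its default port (11434) and uses the `/api/generate` endpoint with streaming disabled. The helper names `build_request` and `generate` are hypothetical, and the timeout value is an arbitrary choice.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen3:4b-q4_K_M"  # same tag as the `ollama run` command


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt), timeout=300) as resp:
        return json.loads(resp.read())["response"]


# Example call (requires a running Ollama server with the model pulled):
# print(generate("Explain what a dense transformer decoder is in one sentence."))
```

Only the standard library is used here so the example has no extra dependencies; an HTTP client library would work equally well.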

llama.cpp / GGUF

Download a GGUF build of the model from HuggingFace and load it with llama.cpp.

Specs

Parameters: 4B
Architecture: transformer-decoder
Context: 131K tokens
Min VRAM: 4.5 GB
Recommended: 4.5 GB
Family: Qwen 3
Released: 2025-04-29
License: Apache 2.0