Kimi K2.5

Name: Kimi K2.5
Author: Moonshot AI

Modified MIT

Moonshot AI · 1040B · mixture-of-experts

2026-01-15 · 256K context · 1040B params

Use Cases

chat, code, reasoning, multilingual, vision, tools, math

Quantization Options

Quant	Bits	VRAM	Quality
Q2_K (recommended)	2	390.0 GB	Moderate
Q4_K_M	4	600.0 GB	Good
Q8_0	8	1060.0 GB	Excellent
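
The VRAM column can be related back to the 1040B parameter count. The sketch below derives the implied average bits per parameter for each row. It assumes the VRAM figure is dominated by the weights and that 1 GB = 10^9 bytes; neither is stated on this page, and the helper name `implied_bits_per_param` is purely illustrative.

```python
# Implied average bits per parameter for each quantization row.
# Assumptions (not stated in the source): VRAM is dominated by weights,
# and 1 GB = 1e9 bytes.

PARAMS_BILLIONS = 1040  # parameter count from the model card

# Quant name -> VRAM in GB, taken from the table above.
VRAM_GB = {"Q2_K": 390.0, "Q4_K_M": 600.0, "Q8_0": 1060.0}


def implied_bits_per_param(vram_gb: float, params_billions: float) -> float:
    """Return (VRAM in bytes * 8) / parameter count, in bits per parameter."""
    return vram_gb * 8 / params_billions


if __name__ == "__main__":
    for quant, vram in VRAM_GB.items():
        bits = implied_bits_per_param(vram, PARAMS_BILLIONS)
        print(f"{quant}: {bits:.2f} bits/param")
```

Under these assumptions the rows work out to about 3.00, 4.62, and 8.15 bits per parameter. Quantized formats typically store per-block scales alongside the quantized values, which is one plausible reason the implied figures sit above the nominal bit widths.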

About this model

Kimi K2.5 is Moonshot AI's flagship open-weight model: a 1.04 trillion parameter Mixture-of-Experts (MoE) with 32B parameters active per token. It has 384 experts, 8 of which are activated per token, and uses Multi-head Latent Attention (MLA) to cut memory bandwidth by 40-50%. Trained on 15 trillion mixed visual and text tokens, it reports state-of-the-art coding results (76.8% on SWE-Bench Verified) and agentic capabilities, with Agent Swarm technology coordinating up to 100 sub-agents. Even at aggressive 2-bit quantization it needs roughly 374 to 390 GB of memory, so Kimi K2.5 demands enterprise-grade hardware: multiple high-VRAM GPUs or a Mac with 400 GB+ unified memory. Because the weights are natively INT4 from Quantization-Aware Training (QAT), 4-bit quantization is practically lossless relative to FP16. The model is also available on Ollama with a cloud-backed tag for those without the local resources.
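
The sparsity figures in the description can be turned into fractions. The following is a minimal sketch using only the numbers quoted above; the `fraction` helper is an illustrative name, not part of any library.

```python
# Share of the model that is active per token, using figures from the text.

TOTAL_PARAMS_B = 1040   # total parameters, in billions
ACTIVE_PARAMS_B = 32    # active parameters per token, in billions
TOTAL_EXPERTS = 384     # experts in the model
ACTIVE_EXPERTS = 8      # experts activated per token


def fraction(part: float, whole: float) -> float:
    """Return part / whole, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


if __name__ == "__main__":
    print(f"active parameters: {fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%}")
    print(f"active experts:    {fraction(ACTIVE_EXPERTS, TOTAL_EXPERTS):.1%}")
```

About 3.1% of the parameters and 2.1% of the experts are used for any one token. For local inference all experts generally still have to be resident in memory, which is why the VRAM requirements follow the 1040B total and not the 32B active count.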

Benchmarks

MMLU: 87.1

Install

Ollama

ollama run kimi-k2.5:latest
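
Once the model has been pulled, it can also be queried programmatically. The sketch below assumes a local Ollama server listening on its default address (`http://localhost:11434`) and uses the `/api/generate` endpoint with streaming disabled. The function names `build_request` and `generate` are illustrative, and the model tag is the one from the command above.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default host and port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "kimi-k2.5:latest"  # tag from the install command above


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt to the server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]


# Example call (requires a running server with the model available):
# print(generate("Summarize mixture-of-experts routing in one sentence."))
```

Setting `"stream": False` returns a single JSON object instead of a sequence of chunks, which keeps the example short at the cost of waiting for the full completion.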

llama.cpp / GGUF

Download the GGUF files from HuggingFace and load them with llama.cpp.

Specs

Parameters: 1040B
Architecture: mixture-of-experts
Context: 256K tokens
Min VRAM: 390.0 GB
Recommended: 390.0 GB
Family: Kimi
Released: 2026-01-15
License: Modified MIT
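
The Min VRAM figure gives a quick lower bound on how many devices a multi-GPU setup would need. This is a rough sketch only: it ignores KV cache, activations, and per-device overhead, and the per-device memory sizes are hypothetical values chosen for illustration.

```python
import math

MIN_VRAM_GB = 390.0  # "Min VRAM" from the specs above


def devices_needed(required_gb: float, per_device_gb: float) -> int:
    """Smallest number of identical devices whose combined memory covers required_gb."""
    if per_device_gb <= 0:
        raise ValueError("per_device_gb must be positive")
    return math.ceil(required_gb / per_device_gb)


if __name__ == "__main__":
    # Hypothetical per-device memory sizes in GB, for illustration only.
    for size in (24, 48, 80, 141):
        print(f"{size} GB devices: {devices_needed(MIN_VRAM_GB, size)}")
```

With these example sizes the lower bounds are 17, 9, 5, and 3 devices respectively; a real deployment should budget extra memory for context and runtime overhead.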