Nemotron Ultra 253B
NVIDIA Open Model LicenseNVIDIA · 253B · transformer-decoder
2025-04-07 131K context
253B params
Use Cases
chat code reasoning multilingual tools math writing summary
Quantization Options
About this model
Nemotron Ultra 253B is NVIDIA's most capable open-weight reasoning model, derived from Llama 3.1 405B and compressed to 253B parameters using Neural Architecture Search (NAS). It delivers state-of-the-art performance on math, coding, and complex reasoning benchmarks while fitting on a single 8xH100 node at FP8 precision.
The model features a dual-mode system supporting both standard chat and explicit chain-of-thought reasoning, toggled via system prompt. It supports a 128K context window and excels at tool calling, RAG, and agentic workflows. With multilingual support for English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, it is one of the most versatile open-weight models available.
Benchmarks
88.0
mmlu
97.0
math500
76.0
gpqa