Skip to content

Nemotron Ultra 253B

NVIDIA Open Model License

NVIDIA · 253B · transformer-decoder

2025-04-07 131K context 253B params

Use Cases

chat code reasoning multilingual tools math writing summary

Quantization Options

QuantBitsVRAMQualityStatus
Q4_K_Mrec4155.0 GBGood
Q8_08275.0 GBExcellent
F1616508.0 GBExcellent

About this model

Nemotron Ultra 253B is NVIDIA's most capable open-weight reasoning model, derived from Llama 3.1 405B and compressed to 253B parameters using Neural Architecture Search (NAS). It delivers state-of-the-art performance on math, coding, and complex reasoning benchmarks while fitting on a single 8xH100 node at FP8 precision. The model features a dual-mode system supporting both standard chat and explicit chain-of-thought reasoning, toggled via system prompt. It supports a 128K context window and excels at tool calling, RAG, and agentic workflows. With multilingual support for English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, it is one of the most versatile open-weight models available.

Benchmarks

88.0
mmlu
97.0
math500
76.0
gpqa