Llama 3.3 70B

by Meta · llama-3 family

70B parameters

Tags: text-generation · code-generation · reasoning · multilingual · tool-use · math · creative-writing · summarization

Llama 3.3 70B is Meta's most capable model in the Llama 3 series at the 70B parameter scale. It delivers performance competitive with much larger models such as Llama 3.1 405B on many benchmarks, particularly in reasoning, coding, and multilingual tasks. The model represents a significant efficiency improvement, offering near-frontier performance at a size that can run on high-end consumer hardware with appropriate quantization. It supports a 128K-token context window and excels at instruction following, tool use, and complex reasoning.

Quick Start with Ollama

ollama run llama3.3:70b-instruct-q4_K_M
Creator Meta
Parameters 70B
Architecture transformer-decoder
Context Length 128K tokens
License Llama 3.3 Community License
Released Dec 6, 2024
Ollama llama3.3
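Beyond the CLI, a locally running Ollama server exposes an HTTP API on port 11434. The sketch below (stdlib only) posts a prompt to the `/api/generate` endpoint; it assumes Ollama is already running with the model pulled, and the `num_ctx` value is an illustrative choice, not a requirement.

```python
import json
from urllib import request

# Ollama's default local endpoint (assumes `ollama serve` is running).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(prompt: str, model: str = "llama3.3",
                           num_ctx: int = 8192) -> bytes:
    """Build the JSON body for Ollama's /api/generate endpoint.

    num_ctx sets the context window actually allocated; the model supports
    up to 128K tokens, but larger windows need proportionally more memory.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"num_ctx": num_ctx},
    }
    return json.dumps(payload).encode("utf-8")

def generate(prompt: str, **kwargs) -> str:
    """Send a prompt to the local Ollama server and return the completion."""
    req = request.Request(
        OLLAMA_URL,
        data=build_generate_request(prompt, **kwargs),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server up, `generate("Why is the sky blue?")` returns the model's reply as a string; setting `"stream": True` instead yields newline-delimited JSON chunks for incremental display.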

Quantization Options

Format                 File Size   VRAM Required   Ollama Tag
Q4_K_M (recommended)   34.9 GB     43.5 GB         llama3.3:70b-instruct-q4_K_M
Q5_K_M                 40.8 GB     50.5 GB         llama3.3:70b-instruct-q5_K_M
Q8_0                   64.8 GB     72 GB           llama3.3:70b-instruct-q8_0
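File sizes scale roughly with parameter count times bits per weight. The sketch below estimates on-disk size from that rule; the bits-per-weight averages are assumptions based on typical llama.cpp K-quant mixes, and real GGUF files differ somewhat because of mixed-precision layers, embeddings, and metadata.

```python
# Approximate average bits per weight for common llama.cpp quantizations
# (assumed ballpark figures, not official specifications).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q8_0": 8.5}

def estimated_file_gib(n_params: float, quant: str) -> float:
    """Rough on-disk size in GiB: parameters x bits-per-weight / 8."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 2**30
```

For 70B parameters this lands in the right neighborhood of the table above (tens of GiB, ordered Q4_K_M < Q5_K_M < Q8_0); VRAM needed at runtime is higher still because the KV cache and activations sit alongside the weights.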

Compatible Hardware for Q4_K_M

Showing compatibility for the recommended quantization (Q4_K_M, 43.5 GB VRAM).


Hardware                    VRAM     Type   Fit
Mac Pro M2 Ultra 192GB      192 GB   Mac    Runs
Mac Studio M4 Ultra 192GB   192 GB   Mac    Runs
Mac Studio M4 Max 128GB     128 GB   Mac    Runs
MacBook Pro M4 Max 128GB    128 GB   Mac    Runs
Mac Studio M4 Max 64GB      64 GB    Mac    Runs
MacBook Pro M4 Max 64GB     64 GB    Mac    Runs
Mac mini M4 Pro 48GB        48 GB    Mac    Runs (tight)
MacBook Pro M4 Max 48GB     48 GB    Mac    Runs (tight)
MacBook Pro M4 Pro 48GB     48 GB    Mac    Runs (tight)
NVIDIA GeForce RTX 5090     32 GB    GPU    CPU Offload
Mac mini M4 32GB            32 GB    Mac    CPU Offload
25 other hardware devices cannot run this model configuration.
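The fit labels above follow a simple rule: a device that cannot hold the weights offloads to CPU, and one that barely holds them is tight. The sketch below reproduces that classification; the 15% headroom threshold is an assumption chosen to match this table, not an official formula.

```python
def classify_fit(device_vram_gb: float, required_gb: float,
                 tight_margin: float = 0.15) -> str:
    """Classify model fit for a device (thresholds are assumptions).

    Devices below the requirement must offload weights to system RAM;
    devices with under `tight_margin` free VRAM leave little room for
    the KV cache at long context lengths.
    """
    if device_vram_gb < required_gb:
        return "CPU Offload"
    headroom = (device_vram_gb - required_gb) / device_vram_gb
    if headroom < tight_margin:
        return "Runs (tight)"
    return "Runs"
```

Against the Q4_K_M requirement of 43.5 GB, a 48 GB Mac classifies as "Runs (tight)" (about 9% headroom) while a 64 GB machine classifies as "Runs".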

Benchmark Scores

MMLU: 86.0