# OLMo-3-7B-RLZero-Math GGUF

GGUF & MLX quantizations of Allen Institute for AI's mathematical reasoning model, optimized for local inference with llama.cpp, Ollama, and Apple Silicon.

## Highlights

- **Math Specialist:** Fine-tuned with RL-Zero for step-by-step mathematical reasoning
- **65K Context:** 65,536-token context window with YaRN scaling
- **Apple Silicon Ready:** MLX-optimized 4-bit quantization included
- **Runs Anywhere:** From 4 GB RAM to full precision

## Model Specifications

| Property | Value |
|---|---|
| Parameters | 7 billion |
| Architecture | OLMo2 |
| Context Length | 65,536 tokens |
| Training | RL-Zero mathematical reasoning |
| License | Apache 2.0 |

## Available Versions

### GGUF Quantizations

| Quantization | Size | Quality | Use Case |
|---|---|---|---|
| F16 | 14 GB | Near-perfect | Maximum quality, research |
| Q8_0 | 7.2 GB | Excellent | Near-lossless, high-end hardware |
| Q5_K_M | 4.9 GB | Very good | Excellent quality/size balance |
| Q4_K_M | 4.2 GB | Good | Recommended for most users |
| IQ4_XS | 3.8 GB | Good | Compact 4-bit |
| IQ3_M | 3.2 GB | Acceptable | Ultra-compact, constrained devices |

### MLX (Apple Silicon)

A 4-bit quantized version is provided in the `MLX-4bit/` folder, optimized for M1/M2/M3/M4 Macs.

## Quick Start

### Ollama (Easiest)

```bash
ollama run richardyoung/olmo-3-7b-rlzero-math
```

### llama.cpp

```bash
# Download Q4_K_M (recommended)
wget https://huggingface.co/richardyoung/OLMo-3-7B-RLZero-Math-GGUF/resolve/main/Olmo-3-7B-RLZero-Math-Q4_K_M.gguf

# Run inference
./llama-cli -m Olmo-3-7B-RLZero-Math-Q4_K_M.gguf \
  -p "Solve step by step: What is 15% of 240?" \
  -n 512
```

### MLX (Apple Silicon)

```bash
pip install mlx-lm

mlx_lm.generate \
  --model richardyoung/OLMo-3-7B-RLZero-Math-GGUF \
  --prompt "Solve: Find the derivative of x^3 + 2x" \
  --trust-remote-code
```

### Python

```python
# Requires: pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="Olmo-3-7B-RLZero-Math-Q4_K_M.gguf",
    n_ctx=4096,  # the model supports up to 65,536 tokens; larger contexts need more RAM
)

output = llm(
    "Solve step by step: What is the sum of the first 10 prime numbers?",
    max_tokens=512,
)
print(output["choices"][0]["text"])
```

## System Requirements

| Quantization | Min RAM | Recommended | Apple Silicon |
|---|---|---|---|
| IQ3_M | 4 GB | 8 GB | M1 8 GB |
| IQ4_XS / Q4_K_M | 6 GB | 12 GB | M1 8 GB |
| Q5_K_M / Q8_0 | 8 GB | 16 GB | M1 16 GB |
| F16 | 16 GB | 24 GB | M2 Pro+ |
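If you want to select a file programmatically, the minimum-RAM column above can be encoded as a simple lookup. `pick_quantization` is a hypothetical helper written for illustration, not part of any library; where two quantizations share a RAM tier, it returns the larger one.

```python
def pick_quantization(ram_gb: float) -> str:
    """Return the largest quantization whose minimum RAM fits, per the table above."""
    tiers = [
        (16, "F16"),
        (8, "Q8_0"),     # Q5_K_M fits the same tier
        (6, "Q4_K_M"),   # IQ4_XS fits the same tier
        (4, "IQ3_M"),
    ]
    for min_ram, quant in tiers:
        if ram_gb >= min_ram:
            return quant
    raise ValueError("At least 4 GB of RAM is required (IQ3_M).")
```

For example, a machine with 12 GB of RAM lands in the 8 GB tier, so the helper suggests `Q8_0`.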

## Prompt Format

```
Solve the following math problem step by step:
{your problem here}
```

Example:

```
Solve the following math problem step by step:
A train travels 120 miles in 2 hours. If it continues at the same speed,
how long will it take to travel 300 miles?
```
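The template above is easy to apply in code. `build_math_prompt` is a hypothetical convenience function for illustration; it simply substitutes your problem into the template before you pass the string to any of the inference methods above.

```python
def build_math_prompt(problem: str) -> str:
    """Wrap a raw problem statement in the model's expected prompt template."""
    return f"Solve the following math problem step by step:\n{problem.strip()}"

print(build_math_prompt("What is 15% of 240?"))
```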

*Quantization by Richard Young | Original model by Allen Institute for AI*
