GGUF Models for Ollama
Ready-to-use GGUF quantizations for Ollama, llama.cpp, and local inference.
GGUF & MLX quantizations of Allen Institute for AI's mathematical reasoning model, optimized for local inference with llama.cpp, Ollama, and Apple Silicon.
| Feature | Description |
|---|---|
| **Math Specialist** | Fine-tuned with RL-Zero for step-by-step mathematical reasoning |
| **64K Context** | 65,536-token context window with YaRN scaling |
| **Apple Silicon Ready** | MLX-optimized 4-bit quantization included |
| **Runs Anywhere** | From 4 GB RAM to full precision |
| Property | Value |
|---|---|
| Parameters | 7 billion |
| Architecture | OLMo2 |
| Context Length | 65,536 tokens |
| Training | RL-Zero mathematical reasoning |
| License | Apache 2.0 |
| Quantization | Size | Quality | Use Case |
|---|---|---|---|
| F16 | 14 GB | Near-perfect | Maximum quality, research |
| Q8_0 | 7.2 GB | Excellent | Near-lossless, high-end hardware |
| Q5_K_M | 4.9 GB | Very Good | Excellent quality/size balance |
| Q4_K_M | 4.2 GB | Good | Recommended for most users |
| IQ4_XS | 3.8 GB | Good | Compact 4-bit |
| IQ3_M | 3.2 GB | Acceptable | Ultra-compact, constrained devices |
A 4-bit quantized version is available in the `MLX-4bit/` folder, optimized for M1/M2/M3/M4 Macs.
```bash
ollama run richardyoung/olmo-3-7b-rlzero-math
```
```bash
# Download Q4_K_M (recommended)
wget https://huggingface.co/richardyoung/OLMo-3-7B-RLZero-Math-GGUF/resolve/main/Olmo-3-7B-RLZero-Math-Q4_K_M.gguf

# Run inference
./llama-cli -m Olmo-3-7B-RLZero-Math-Q4_K_M.gguf \
  -p "Solve step by step: What is 15% of 240?" \
  -n 512
```
```bash
pip install mlx-lm

mlx_lm.generate \
  --model richardyoung/OLMo-3-7B-RLZero-Math-GGUF \
  --prompt "Solve: Find the derivative of x^3 + 2x" \
  --trust-remote-code
```
```python
from llama_cpp import Llama

# Load the quantized model (download the .gguf file first)
llm = Llama(
    model_path="Olmo-3-7B-RLZero-Math-Q4_K_M.gguf",
    n_ctx=4096,
)

# Run a single completion
output = llm(
    "Solve step by step: What is the sum of the first 10 prime numbers?",
    max_tokens=512,
)
print(output["choices"][0]["text"])
```
| Quantization | Min RAM | Recommended | Apple Silicon |
|---|---|---|---|
| IQ3_M | 4 GB | 8 GB | M1 8GB |
| IQ4_XS / Q4_K_M | 6 GB | 12 GB | M1 8GB |
| Q5_K_M / Q8_0 | 8 GB | 16 GB | M1 16GB |
| F16 | 16 GB | 24 GB | M2 Pro+ |
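To make the table above actionable, here is a small hypothetical helper (not part of the model card) that picks the highest-quality quantization whose recommended RAM fits a given machine:

```python
# Hypothetical helper: choose the largest quantization whose
# *recommended* RAM (from the table above) fits the available memory.
# Entries are ordered smallest/lowest-quality first.
QUANTS = [  # (name, min_ram_gb, recommended_ram_gb)
    ("IQ3_M", 4, 8),
    ("Q4_K_M", 6, 12),
    ("Q5_K_M", 8, 16),
    ("F16", 16, 24),
]

def pick_quant(ram_gb: int) -> str:
    """Return the best-quality quantization whose recommended RAM fits."""
    best = QUANTS[0][0]  # fall back to the smallest quant
    for name, _min_ram, recommended in QUANTS:
        if ram_gb >= recommended:
            best = name
    return best

print(pick_quant(16))  # -> Q5_K_M
```

On a 16 GB machine this selects Q5_K_M; with only 8 GB it falls back to IQ3_M, matching the "M1 8GB" column in the table.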
```
Solve the following math problem step by step:
{your problem here}
```

Example:

```
Solve the following math problem step by step:
A train travels 120 miles in 2 hours. If it continues at the same speed,
how long will it take to travel 300 miles?
```
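If you are driving the model from a script, the template above can be filled programmatically. A minimal sketch (the `math_prompt` helper is hypothetical, not part of this repository):

```python
# Hypothetical wrapper that fills the prompt template shown above.
TEMPLATE = "Solve the following math problem step by step:\n{problem}"

def math_prompt(problem: str) -> str:
    """Wrap a raw math problem in the recommended prompt template."""
    return TEMPLATE.format(problem=problem.strip())

print(math_prompt("What is 15% of 240?"))
```

The resulting string can be passed directly as the prompt to `llama-cli -p`, `mlx_lm.generate --prompt`, or the `Llama(...)` call shown earlier.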
| Resource | Link |
|---|---|
| Original Model | allenai/OLMo-3-7B-RLZero-Math |
| Ollama | richardyoung/olmo-3-7b-rlzero-math |
| Allen AI | allenai.org |
Quantization by Richard Young | Original model by Allen Institute for AI