GGUF and MLX quantizations of OLMo-3-7B-RLZero-Math, the Allen Institute for AI's mathematical reasoning model, optimized for local inference with llama.cpp, Ollama, and MLX on Apple Silicon.
Highlights
- Math Specialist: Fine-tuned with RL-Zero for step-by-step mathematical reasoning
- 65K Context: 65,536-token context window with YaRN scaling
- Apple Silicon Ready: MLX-optimized 4-bit quantization included
- Runs Anywhere: From about 4 GB of RAM (IQ3_M) up to full F16 precision
Model Specifications
| Property | Value |
|---|---|
| Parameters | 7 billion |
| Architecture | OLMo2 |
| Context Length | 65,536 tokens |
| Training | RL-Zero mathematical reasoning |
| License | Apache 2.0 |
Available Versions
GGUF Quantizations
| Quantization | File Size | Quality | Use Case |
|---|---|---|---|
| F16 | 14 GB | Near-perfect | Maximum quality, research |
| Q8_0 | 7.2 GB | Excellent | Near-lossless, high-end hardware |
| Q5_K_M | 4.9 GB | Very Good | Excellent quality/size balance |
| Q4_K_M | 4.2 GB | Good | Recommended for most users |
| IQ4_XS | 3.8 GB | Good | Compact 4-bit |
| IQ3_M | 3.2 GB | Acceptable | Ultra-compact, constrained devices |
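As a rough cross-check of the size column, each file size can be converted into an average number of bits stored per weight. This sketch assumes exactly 7 × 10^9 parameters (the rounded figure from the specifications table) and 1 GB = 10^9 bytes, so the results are approximate; K-quant and I-quant files also mix precisions across tensors, so the figure is an average, not a per-tensor value.

```python
# Approximate bits per weight implied by the GGUF file sizes in the table.
# Assumptions: exactly 7e9 parameters (the table rounds to "7 billion") and
# 1 GB = 10**9 bytes. Real parameter counts and GB/GiB conventions may differ.
PARAMS = 7e9

SIZES_GB = {
    "F16": 14.0,
    "Q8_0": 7.2,
    "Q5_K_M": 4.9,
    "Q4_K_M": 4.2,
    "IQ4_XS": 3.8,
    "IQ3_M": 3.2,
}


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Return the average number of bits stored per parameter."""
    return size_gb * 1e9 * 8 / params


for name, size in SIZES_GB.items():
    print(f"{name:7s} {size:5.1f} GB  ~{bits_per_weight(size):.2f} bits/weight")
```

Under these assumptions F16 works out to 16 bits per weight and Q4_K_M to about 4.8, which is why a 4-bit file is roughly a third of the F16 size rather than exactly a quarter.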
MLX (Apple Silicon)
A 4-bit quantized MLX version is included in the MLX-4bit/ folder, optimized for M1/M2/M3/M4 Macs.
Quick Start
Ollama (Easiest)
ollama run richardyoung/olmo-3-7b-rlzero-math
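Ollama also serves a local REST API. As a sketch, assuming the server is running on its default port 11434 and the model has already been pulled, a request to the `/api/generate` endpoint can be built with only the Python standard library. `build_request` and `ask` are illustrative names, not part of any library.

```python
import json
import urllib.request

# Assumptions: Ollama is running locally on its default port and the model
# has already been pulled with `ollama run` or `ollama pull`.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "richardyoung/olmo-3-7b-rlzero-math"


def build_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": MODEL, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        # With "stream": False, Ollama returns a single JSON object whose
        # "response" field holds the full completion.
        return json.loads(resp.read())["response"]
```

For example, `ask("Solve the following math problem step by step:\nWhat is 15% of 240?")` wraps the question in the template from the Prompt Format section. Disabling streaming keeps the client simple at the cost of waiting for the whole answer.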
llama.cpp
# Download Q4_K_M (recommended)
wget https://huggingface.co/richardyoung/OLMo-3-7B-RLZero-Math-GGUF/resolve/main/Olmo-3-7B-RLZero-Math-Q4_K_M.gguf
# Run inference
./llama-cli -m Olmo-3-7B-RLZero-Math-Q4_K_M.gguf \
-p "Solve step by step: What is 15% of 240?" \
-n 512
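To fetch a different quantization, the wget URL can be adapted. This sketch assumes every file follows the same naming pattern as the Q4_K_M file (`Olmo-3-7B-RLZero-Math-<QUANT>.gguf`); that pattern is inferred from a single example, so check the repository's file list before relying on it. `gguf_url` is a hypothetical helper.

```python
# Build Hugging Face download URLs for each GGUF file.
# Assumption: all quantizations share the naming pattern of the Q4_K_M file;
# verify against the repository's actual file list.
REPO = "richardyoung/OLMo-3-7B-RLZero-Math-GGUF"
QUANTS = ["F16", "Q8_0", "Q5_K_M", "Q4_K_M", "IQ4_XS", "IQ3_M"]


def gguf_url(quant: str) -> str:
    """Return the direct download URL for one quantization."""
    if quant not in QUANTS:
        raise ValueError(f"unknown quantization: {quant}")
    filename = f"Olmo-3-7B-RLZero-Math-{quant}.gguf"
    return f"https://huggingface.co/{REPO}/resolve/main/{filename}"


for q in QUANTS:
    print(gguf_url(q))
```

Each printed URL can be passed to wget in place of the Q4_K_M link.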
MLX (Apple Silicon)
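pip install mlx-lm
# Download only the MLX 4-bit weights (they live in the MLX-4bit/ folder)
huggingface-cli download richardyoung/OLMo-3-7B-RLZero-Math-GGUF \
  --include "MLX-4bit/*" --local-dir OLMo-3-7B-RLZero-Math
# Generate from the local MLX folder; raise --max-tokens for long solutions
mlx_lm.generate \
  --model OLMo-3-7B-RLZero-Math/MLX-4bit \
  --prompt "Solve: Find the derivative of x^3 + 2x" \
  --max-tokens 512 --trust-remote-code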
Python
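# Requires: pip install llama-cpp-python
from llama_cpp import Llama

# Load the Q4_K_M file downloaded above. n_ctx can be raised up to 65,536
# (the model's full context window) at the cost of more memory.
llm = Llama(
    model_path="Olmo-3-7B-RLZero-Math-Q4_K_M.gguf",
    n_ctx=4096,
)
output = llm(
    "Solve step by step: What is the sum of the first 10 prime numbers?",
    max_tokens=512,
)
print(output["choices"][0]["text"])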
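Because the model writes out its reasoning, checking a reply usually means pulling the final number out of free-form text. The sketch below is a heuristic, not part of llama-cpp-python: it assumes the last number in the reply is the answer, and `last_number` and `first_primes` are hypothetical helpers. It also computes the reference answer for the prime-sum prompt.

```python
import re
from typing import List, Optional

# Matches integers and decimals, optionally with thousands separators.
# Heuristic limitation: a hyphen before a digit (e.g. "x-3") is read as a minus sign.
NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def last_number(text: str) -> Optional[float]:
    """Return the last number in the text, or None if there is none."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def first_primes(n: int) -> List[int]:
    """Return the first n primes using trial division by smaller primes."""
    primes: List[int] = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


expected = sum(first_primes(10))  # 2 + 3 + 5 + ... + 29
sample = "2 + 3 + 5 + 7 + 11 + 13 + 17 + 19 + 23 + 29 = 129"
print(last_number(sample) == expected)
```

In practice `sample` would be replaced by `output["choices"][0]["text"]` from the call above.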
System Requirements
| Quantization | Min RAM | Recommended RAM | Apple Silicon |
|---|---|---|---|
| IQ3_M | 4 GB | 8 GB | M1 8GB |
| IQ4_XS / Q4_K_M | 6 GB | 12 GB | M1 8GB |
| Q5_K_M / Q8_0 | 8 GB | 16 GB | M1 16GB |
| F16 | 16 GB | 24 GB | M2 Pro+ |
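The table can be turned into a simple selection rule: walk the quantizations from highest to lowest quality and take the first one whose RAM requirement fits. This is a hypothetical helper encoding only the figures above; actual usage also grows with the context size (`n_ctx`) you choose.

```python
from typing import Optional

# (quantization, min RAM GB, recommended RAM GB), highest quality first,
# copied from the System Requirements table.
REQUIREMENTS = [
    ("F16", 16, 24),
    ("Q8_0", 8, 16),
    ("Q5_K_M", 8, 16),
    ("Q4_K_M", 6, 12),
    ("IQ4_XS", 6, 12),
    ("IQ3_M", 4, 8),
]


def pick_quant(available_gb: float, comfortable: bool = True) -> Optional[str]:
    """Return the best quantization for the given RAM.

    comfortable=True uses the Recommended RAM column; False uses Min RAM.
    Returns None if even the smallest file does not fit.
    """
    for name, min_gb, rec_gb in REQUIREMENTS:
        needed = rec_gb if comfortable else min_gb
        if available_gb >= needed:
            return name
    return None


print(pick_quant(16))                     # recommended-RAM rule
print(pick_quant(16, comfortable=False))  # minimum-RAM rule
```

Using the recommended column by default leaves headroom for the operating system and the KV cache.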
Prompt Format
Solve the following math problem step by step:
{your problem here}
Example:
Solve the following math problem step by step:
A train travels 120 miles in 2 hours. If it continues at the same speed,
how long will it take to travel 300 miles?
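A small helper can apply this template consistently and, for the example problem, compute the reference answer to compare against the model's reply. `build_prompt` is an illustrative name; the template string is copied from the format shown in this section.

```python
# Template copied from the prompt format in this model card.
TEMPLATE = "Solve the following math problem step by step:\n{problem}"


def build_prompt(problem: str) -> str:
    """Insert a problem into the step-by-step template."""
    return TEMPLATE.format(problem=problem.strip())


problem = (
    "A train travels 120 miles in 2 hours. If it continues at the same speed, "
    "how long will it take to travel 300 miles?"
)
prompt = build_prompt(problem)

speed_mph = 120 / 2       # 60 miles per hour
hours = 300 / speed_mph   # time = distance / speed
print(prompt)
print(f"Reference answer: {hours} hours")
```

The resulting `prompt` string can be passed directly to llama.cpp, llama-cpp-python, Ollama, or MLX.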