OLMo-3-7B-RLZero-Math GGUF

GGUF & MLX quantizations of Allen Institute for AI's mathematical reasoning model, optimized for local inference with llama.cpp, Ollama, and Apple Silicon.

Highlights

- Math Specialist: fine-tuned with RL-Zero for step-by-step mathematical reasoning
- 65K Context: 65,536-token context window with YaRN scaling
- Apple Silicon Ready: MLX-optimized 4-bit quantization included
- Runs Anywhere: from 4 GB RAM up to full F16 precision

Model Specifications

| Property       | Value                          |
|----------------|--------------------------------|
| Parameters     | 7 billion                      |
| Architecture   | OLMo2                          |
| Context Length | 65,536 tokens                  |
| Training       | RL-Zero mathematical reasoning |
| License        | Apache 2.0                     |

Available Versions

GGUF Quantizations

| Quantization | Size   | Quality      | Use Case                           |
|--------------|--------|--------------|------------------------------------|
| F16          | 14 GB  | Near-perfect | Maximum quality, research          |
| Q8_0         | 7.2 GB | Excellent    | Near-lossless, high-end hardware   |
| Q5_K_M       | 4.9 GB | Very good    | Excellent quality/size balance     |
| Q4_K_M       | 4.2 GB | Good         | Recommended for most users         |
| IQ4_XS       | 3.8 GB | Good         | Compact 4-bit                      |
| IQ3_M        | 3.2 GB | Acceptable   | Ultra-compact, constrained devices |
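The file sizes above follow roughly from average bits-per-weight times parameter count. A minimal sketch of that arithmetic (the `gguf_size_gb` helper and its ~0.3 GB metadata overhead are assumptions for illustration; real quants mix bit widths per tensor):

```python
def gguf_size_gb(params_billions: float, bits_per_weight: float,
                 overhead_gb: float = 0.3) -> float:
    """Rough GGUF file size: weights at the given average bits per weight,
    plus a small assumed constant for embeddings/metadata."""
    return params_billions * bits_per_weight / 8 + overhead_gb

# F16 stores 16 bits per weight: 7 * 16 / 8 = 14 GB, matching the table.
print(round(gguf_size_gb(7, 16, 0)))  # 14
```

The same estimate puts a ~4.5 bits-per-weight quant in the low-4 GB range, consistent with the Q4_K_M row.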

MLX (Apple Silicon)

A 4-bit quantized version is included in the MLX-4bit/ folder, optimized for Apple Silicon (M1/M2/M3/M4) Macs.

Quick Start

Ollama (Easiest)

ollama run richardyoung/olmo-3-7b-rlzero-math
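Ollama also exposes a local REST API on port 11434. A hedged sketch using only the standard library, assuming the model has been pulled and `ollama serve` is running (the helper names are illustrative, not part of Ollama):

```python
import json
import urllib.request

def build_payload(prompt: str,
                  model: str = "richardyoung/olmo-3-7b-rlzero-math") -> bytes:
    # Non-streaming request body for Ollama's /api/generate endpoint
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask(prompt: str, host: str = "http://localhost:11434") -> str:
    # Requires a running local Ollama server; returns the generated text
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```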

llama.cpp

# Download Q4_K_M (recommended)
wget https://huggingface.co/richardyoung/OLMo-3-7B-RLZero-Math-GGUF/resolve/main/Olmo-3-7B-RLZero-Math-Q4_K_M.gguf

# Run inference
./llama-cli -m Olmo-3-7B-RLZero-Math-Q4_K_M.gguf \
  -p "Solve step by step: What is 15% of 240?" \
  -n 512

MLX (Apple Silicon)

pip install mlx-lm

mlx_lm.generate \
  --model richardyoung/OLMo-3-7B-RLZero-Math-GGUF \
  --prompt "Solve: Find the derivative of x^3 + 2x" \
  --trust-remote-code

Python

from llama_cpp import Llama

# Load the quantized model; raise n_ctx (up to 65536) if RAM allows
llm = Llama(
    model_path="Olmo-3-7B-RLZero-Math-Q4_K_M.gguf",
    n_ctx=4096
)

output = llm(
    "Solve step by step: What is the sum of the first 10 prime numbers?",
    max_tokens=512
)
print(output["choices"][0]["text"])
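llama-cpp-python returns an OpenAI-style completion dict, which is why the text lives under `choices[0]["text"]`. A small hypothetical helper to pull out just the generated text:

```python
def completion_text(output: dict) -> str:
    # The generated text sits under choices[0]["text"] in the
    # OpenAI-style dict that llama-cpp-python returns
    return output["choices"][0]["text"].strip()

# Works on any dict with that shape (mock data shown):
print(completion_text({"choices": [{"text": "  The sum is 129.  "}]}))
```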

System Requirements

| Quantization    | Min RAM | Recommended | Apple Silicon |
|-----------------|---------|-------------|---------------|
| IQ3_M           | 4 GB    | 8 GB        | M1 8GB        |
| IQ4_XS / Q4_K_M | 6 GB    | 12 GB       | M1 8GB        |
| Q5_K_M / Q8_0   | 8 GB    | 16 GB       | M1 16GB       |
| F16             | 16 GB   | 24 GB       | M2 Pro+       |

Prompt Format

Solve the following math problem step by step:
{your problem here}

Example:

Solve the following math problem step by step:
A train travels 120 miles in 2 hours. If it continues at the same speed,
how long will it take to travel 300 miles?
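The template above can be applied programmatically. A minimal sketch (the helper name is an assumption; the template string is taken verbatim from this card):

```python
PROMPT_TEMPLATE = "Solve the following math problem step by step:\n{problem}"

def format_math_prompt(problem: str) -> str:
    # Wrap a raw problem statement in the card's recommended template
    return PROMPT_TEMPLATE.format(problem=problem.strip())

print(format_math_prompt("What is 15% of 240?"))
```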

Links


Quantization by Richard Young | Original model by Allen Institute for AI
