Instructions to use firmanda/Olmo-3-7B-Think-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use firmanda/Olmo-3-7B-Think-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="firmanda/Olmo-3-7B-Think-GGUF",
	filename="Olmo-3-7B-Think-Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use firmanda/Olmo-3-7B-Think-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf firmanda/Olmo-3-7B-Think-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf firmanda/Olmo-3-7B-Think-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf firmanda/Olmo-3-7B-Think-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf firmanda/Olmo-3-7B-Think-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf firmanda/Olmo-3-7B-Think-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf firmanda/Olmo-3-7B-Think-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf firmanda/Olmo-3-7B-Think-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf firmanda/Olmo-3-7B-Think-GGUF:Q4_K_M

Use Docker

docker model run hf.co/firmanda/Olmo-3-7B-Think-GGUF:Q4_K_M

LM Studio
Jan
Ollama
How to use firmanda/Olmo-3-7B-Think-GGUF with Ollama:
```
ollama run hf.co/firmanda/Olmo-3-7B-Think-GGUF:Q4_K_M
```

Unsloth Studio

How to use firmanda/Olmo-3-7B-Think-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for firmanda/Olmo-3-7B-Think-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for firmanda/Olmo-3-7B-Think-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for firmanda/Olmo-3-7B-Think-GGUF to start chatting

Docker Model Runner
How to use firmanda/Olmo-3-7B-Think-GGUF with Docker Model Runner:
```
docker model run hf.co/firmanda/Olmo-3-7B-Think-GGUF:Q4_K_M
```

Lemonade

How to use firmanda/Olmo-3-7B-Think-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull firmanda/Olmo-3-7B-Think-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Olmo-3-7B-Think-GGUF-Q4_K_M

List all available models

lemonade list

Static quant of https://huggingface.co/allenai/Olmo-3-7B-Instruct

Model Description

Developed by: Allen Institute for AI (Ai2)
Model type: a Transformer style autoregressive language model.
Language(s) (NLP): English
License: This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.
Contact: Technical inquiries: olmo@allenai.org. Press: press@allenai.org
Date cutoff: Dec. 2024.

Model Sources

Project Page: https://allenai.org/olmo
Repositories:
- Open-Instruct for DPO and RLVR: https://github.com/allenai/open-instruct
- OLMo-Core for pre-training and SFT: https://github.com/allenai/OLMo-core
- OLMo-Eval for evaluation: https://github.com/allenai/OLMo-Eval
Paper: [TBD]

Evaluation

Skill	Benchmark	Olmo 3 Instruct 7B SFT	Olmo 3 Instruct 7B DPO	Olmo3 Instruct 7B	Qwen 3 8B (no reasoning)	Qwen 3 VL 8B Instruct	Qwen 2.5 7B	Olmo 2 7B Instruct	Apertus 8B Instruct	Granite 3.3 8B Instruct
Math	MATH	65.1	79.6	87.3	82.3	91.6	71.0	30.1	21.9	67.3
	AIME 2024	6.7	23.5	44.3	26.2	55.1	11.3	1.3	0.5	7.3
	AIME 2025	7.2	20.4	32.5	21.7	43.3	6.3	0.4	0.2	6.3
	OMEGA	14.4	22.8	28.9	20.5	32.3	13.7	5.2	5.0	10.7
Reasoning	BigBenchHard	51.0	69.3	71.2	73.7	85.6	68.8	43.8	42.2	61.2
	ZebraLogic	18.0	28.4	32.9	25.4	64.3	10.7	5.3	5.3	17.6
	AGI Eval English	59.2	64.0	64.4	76.0	84.5	69.8	56.1	50.8	64.0
Coding	HumanEvalPlus	69.8	72.9	77.2	79.8	82.9	74.9	25.8	34.4	64.0
	MBPP+	56.5	55.9	60.2	64.4	66.3	62.6	40.7	42.1	54.0
	LiveCodeBench v3	20.0	18.8	29.5	53.2	55.9	34.5	7.2	7.8	11.5
IF	IFEval	81.7	82.0	85.6	86.3	87.8	73.4	72.2	71.4	77.5
	IFBench	27.4	29.3	32.3	29.3	34.0	28.4	26.7	22.1	22.3
Knowledge	MMLU	67.1	69.1	69.1	80.4	83.6	77.2	61.6	62.7	63.5
QA	PopQA	16.5	20.7	14.1	20.4	26.5	21.5	25.5	25.5	28.9
	GPQA	30.0	37.9	40.4	44.6	51.1	35.6	31.3	28.8	33.0
Chat	AlpacaEval 2 LC	21.8	43.3	40.9	49.8	73.5	23.0	18.3	8.1	28.6
Tool Use	SimpleQA	74.2	79.8	79.3	79.0	90.3	78.0	–	–	–
	LitQA2	38.0	43.3	38.2	39.6	30.7	29.8	–	–	–
	BFCL	48.9	49.6	49.8	60.2	66.2	55.8	–	–	–
Safety	Safety	89.2	90.2	87.3	78.0	80.2	73.4	93.1	72.2	73.7

Model Details

Stage 1: SFT

supervised fine-tuning on the Dolci-Think-SFT-7B dataset. This dataset consits of math, code, chat, and general knowledge queries.
Datasets: Dolci-Think-SFT-7B, Dolci-Instruct-SFT-7B

Stage 2:DPO

direct preference optimization on the Dolci-Think-DPO-7B dataset. This dataset consits of math, code, chat, and general knowledge queries.
Datasets: Dolci-Think-DPO-7B, Dolci-Instruct-DPO-7B

Stage 3: RLVR

reinforcement learning from verifiable rewards on the Dolci-Think-RL-7B dataset. This dataset consits of math, code, instruction-following, and general chat queries.
Datasets: Dolci-Think-RL-7B, Dolci-Instruct-RL-7B

Inference & Recommended Settings

We evaluated our models on the following settings. We also recommend using them for generation:

temperature: 0.6
top_p: 0.95
max_tokens: 32768

Downloads last month: 29

GGUF

Model size

7B params

Architecture

olmo2

Hardware compatibility

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for firmanda/Olmo-3-7B-Think-GGUF

Base model

allenai/Olmo-3-1025-7B

Finetuned

allenai/Olmo-3-7B-Think-SFT

Finetuned

allenai/Olmo-3-7B-Think-DPO

Finetuned

allenai/Olmo-3-7B-Think

Quantized

(31)

this model