Instructions to use openeurollm/OLMo-3-7B-Instruct-SFT with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use openeurollm/OLMo-3-7B-Instruct-SFT with Transformers:
# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("openeurollm/OLMo-3-7B-Instruct-SFT", dtype="auto") - Notebooks
- Google Colab
- Kaggle
OLMo-3-7B-Instruct-SFT Training Checkpoints
This repository contains 4 intermediate checkpoints from a supervised fine-tuning (SFT) run of OLMo-3-7B on the Dolci-Instruct-SFT dataset. These checkpoints are intended for studying how model performance evolves over the course of SFT training.
Following the OLMo 3 paper (Section 5.2.2), instruct SFT is warm-started from the think SFT checkpoint (OLMo-3-7B-Think-SFT step42856), not from the base model.
Checkpoints
Checkpoints are stored in subdirectories named step{N}/.
| Step | Gap from prev |
|---|---|
| 1000 | - |
| 2000 | 1000 |
| 3000 | 1000 |
| 3252 | 252 |
Total training: 3,252 steps (~3.4B tokens at 1M tokens/step batch size, 2 epochs).
Training follows the hyperparameters reported in Table 47 (Section A.6.1) of the OLMo 3 paper:
| 7B Instruct SFT | |
|---|---|
| Total Tokens | ~3.4B |
| Learning Rate | 8.0 x 10⁻⁵ |
| Batch Size | 1M tokens |
| Max Sequence Length | 32K |
| Epochs | 2 |
| Packing | Yes |
Usage
Each checkpoint is a standalone HuggingFace model. Load a specific checkpoint:
from transformers import AutoModelForCausalLM, AutoTokenizer
step = 3252
model = AutoModelForCausalLM.from_pretrained(
"openeurollm/OLMo-3-7B-Instruct-SFT",
subfolder=f"step{step}",
)
tokenizer = AutoTokenizer.from_pretrained(
"openeurollm/OLMo-3-7B-Instruct-SFT",
subfolder=f"step{step}",
)
Training Details
- Base model: allenai/Olmo-3-1025-7B
- Warm-start from: OLMo-3-7B-Think-SFT (step42856, final)
- Training data: allenai/Dolci-Instruct-SFT-7B
- Tokenizer: allenai/Olmo-3-7B-Instruct-SFT (includes function-calling chat template)
- Precision: bfloat16
- Framework: OLMo-core (converted to HuggingFace format)
License
Apache 2.0
Reproduction parity (paper Section 5.1)
This is an independent reproduction of allenai/OLMo-3-7B-Instruct-SFT,
warm-started from the Think-SFT reproduction at step 42856 (see Section 5.2.2
of the OLMo 3 paper). The final retained checkpoint (step3252) reaches parity
with the AI2-released checkpoint when both are scored by the same OpenJury
harness on 20k judged battles (Qwen3 / Qwen3.5 judges, swap_mode=both):
| Metric | Ours (this repo) | Released checkpoint |
|---|---|---|
| Avg win-rate vs released | 51.4% | 50% (parity reference) |
| Bradley-Terry Elo | 953.0 ± 9.3 | 940.5 ± 7.9 |
Both Elo measurements come from the same OpenJury harness against the same prompt
set; they are not externally published numbers. The two values overlap within 95%
CI. This reproduced checkpoint is the base for the
openeurollm/OLMo-3-7B-Dolci-Translated-A-{75,25}EN continued-SFT runs.
See the paper for the full reproduction story (training curves, win-rate over