OLMo-3-7B-Instruct-SFT Training Checkpoints

This repository contains four checkpoints (three intermediate, plus the final one at step 3252) from a supervised fine-tuning (SFT) run of OLMo-3-7B on the Dolci-Instruct-SFT dataset. They are intended for studying how model performance evolves over the course of SFT.

Following the OLMo 3 paper (Section 5.2.2), instruct SFT is warm-started from the think SFT checkpoint (OLMo-3-7B-Think-SFT step42856), not from the base model.

Checkpoints

Checkpoints are stored in subdirectories named step{N}/, where N is the training step at which the checkpoint was saved.

Step    Gap from previous checkpoint (steps)
1000    -
2000    1000
3000    1000
3252    252

Total training: 3,252 steps (~3.4B tokens at 1M tokens/step batch size, 2 epochs).
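The ~3.4B figure can be checked with a quick calculation. The sketch below assumes that "1M tokens" means 2^20 tokens per step; the card does not say which convention is used, so the decimal reading is shown for comparison.

```python
# Sanity check of the total token count from the step count and batch size.
TOTAL_STEPS = 3252  # final step from the checkpoint table

# Assumption: "1M tokens" may mean 2**20 (binary) or 10**6 (decimal).
BATCH_TOKENS_BINARY = 2**20
BATCH_TOKENS_DECIMAL = 10**6

total_binary = TOTAL_STEPS * BATCH_TOKENS_BINARY
total_decimal = TOTAL_STEPS * BATCH_TOKENS_DECIMAL

print(f"binary reading:  {total_binary / 1e9:.2f}B tokens")
print(f"decimal reading: {total_decimal / 1e9:.2f}B tokens")
```

Under the binary reading the total rounds to 3.4B, which matches the reported figure; under the decimal reading it would be closer to 3.3B.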

Training follows the hyperparameters reported in Table 47 (Section A.6.1) of the OLMo 3 paper:

Hyperparameter         7B Instruct SFT
Total Tokens           ~3.4B
Learning Rate          8.0 × 10⁻⁵
Batch Size             1M tokens
Max Sequence Length    32K
Epochs                 2
Packing                Yes
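A few per-step and per-epoch quantities follow from these hyperparameters. This is a rough sketch, not taken from the paper: it assumes "1M" and "32K" are powers of two, that every packed sequence is filled to the maximum length, and that both epochs have the same number of steps.

```python
# Derived quantities from the hyperparameter table (illustrative only).
TOTAL_STEPS = 3252
EPOCHS = 2

# Assumption: "1M" and "32K" are powers of two (2**20 and 2**15).
BATCH_TOKENS = 2**20
MAX_SEQ_LEN = 2**15

# With packing, each step holds about this many full-length packed sequences.
packed_sequences_per_step = BATCH_TOKENS // MAX_SEQ_LEN

# Assumption: the two epochs are of equal length.
steps_per_epoch = TOTAL_STEPS // EPOCHS
tokens_per_epoch = steps_per_epoch * BATCH_TOKENS

print(packed_sequences_per_step, steps_per_epoch, tokens_per_epoch)
```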

Usage

Each checkpoint is a standalone Hugging Face model. To load a specific checkpoint, pass its subdirectory as the subfolder argument:

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "openeurollm/OLMo-3-7B-Instruct-SFT"
step = 3252  # one of 1000, 2000, 3000, 3252

# Each checkpoint lives in its own subfolder of the repository.
model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=f"step{step}")
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=f"step{step}")
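Since the checkpoints are meant for comparing performance across training, it can be convenient to wrap the loading code in a small helper. The sketch below is one possible way to do that; the names AVAILABLE_STEPS, subfolder_for, and load_checkpoint are hypothetical and not part of this repository or of transformers.

```python
REPO_ID = "openeurollm/OLMo-3-7B-Instruct-SFT"
AVAILABLE_STEPS = (1000, 2000, 3000, 3252)  # steps listed in the checkpoint table


def subfolder_for(step: int) -> str:
    """Return the repository subfolder for a checkpoint step."""
    if step not in AVAILABLE_STEPS:
        raise ValueError(f"No checkpoint at step {step}; choose from {AVAILABLE_STEPS}")
    return f"step{step}"


def load_checkpoint(step: int):
    """Load the model and tokenizer for one checkpoint (downloads the weights)."""
    # Imported lazily so subfolder_for works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    subfolder = subfolder_for(step)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, subfolder=subfolder)
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, subfolder=subfolder)
    return model, tokenizer
```

Calling load_checkpoint(step) for each entry of AVAILABLE_STEPS in turn, and running the same evaluation on each result, gives one data point per checkpoint. Loading one checkpoint at a time keeps only a single 7B model in memory.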

License

Apache 2.0

Reproduction parity (paper Section 5.1)

This is an independent reproduction of allenai/OLMo-3-7B-Instruct-SFT, warm-started from the Think-SFT reproduction at step 42856 (see Section 5.2.2 of the OLMo 3 paper). The final retained checkpoint (step3252) reaches parity with the AI2-released checkpoint when both are scored by the same OpenJury harness on 20k judged battles (Qwen3 / Qwen3.5 judges, swap_mode=both):

Metric                      Ours (this repo)    Released checkpoint
Avg win-rate vs released    51.4%               50% (parity reference)
Bradley-Terry Elo           953.0 ± 9.3         940.5 ± 7.9

Both Elo measurements come from the same OpenJury harness and the same prompt set; they are not externally published numbers. The 95% confidence intervals of the two Elo values overlap. This reproduced checkpoint is the base for the openeurollm/OLMo-3-7B-Dolci-Translated-A-{75,25}EN continued-SFT runs.
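The overlap can be verified directly from the table. The sketch below assumes the ± values are half-widths of the 95% confidence intervals; the helper names are illustrative.

```python
def interval(mean, half_width):
    """Return (low, high) for a value reported as mean ± half_width."""
    return (mean - half_width, mean + half_width)


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Assumption: the reported ± values are 95% CI half-widths.
ours = interval(953.0, 9.3)       # roughly (943.7, 962.3)
released = interval(940.5, 7.9)   # roughly (932.6, 948.4)

overlap = intervals_overlap(ours, released)
print(ours, released, overlap)
```

Overlapping intervals are a coarse check rather than a formal significance test, but they are consistent with the parity claim.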

See the paper for the full reproduction story (training curves, win-rate over
