OLMo-3-7B-Instruct-SFT Training Checkpoints

This repository contains 4 intermediate checkpoints from a supervised fine-tuning (SFT) run of OLMo-3-7B on the Dolci-Instruct-SFT dataset. These checkpoints are intended for studying how model performance evolves over the course of SFT training.

Following the OLMo 3 paper (Section 5.2.2), instruct SFT is warm-started from the think SFT checkpoint (OLMo-3-7B-Think-SFT step42856), not from the base model.

Checkpoints

Checkpoints are stored in subdirectories named step{N}/.

Step	Gap from prev
1000	-
2000	1000
3000	1000
3252	252

Total training: 3,252 steps (~3.4B tokens at 1M tokens/step batch size, 2 epochs).

Training follows the hyperparameters reported in Table 47 (Section A.6.1) of the OLMo 3 paper:

	7B Instruct SFT
Total Tokens	~3.4B
Learning Rate	8.0 x 10⁻⁵
Batch Size	1M tokens
Max Sequence Length	32K
Epochs	2
Packing	Yes

Usage

Each checkpoint is a standalone HuggingFace model. Load a specific checkpoint:

from transformers import AutoModelForCausalLM, AutoTokenizer

step = 3252
model = AutoModelForCausalLM.from_pretrained(
    "openeurollm/OLMo-3-7B-Instruct-SFT",
    subfolder=f"step{step}",
)
tokenizer = AutoTokenizer.from_pretrained(
    "openeurollm/OLMo-3-7B-Instruct-SFT",
    subfolder=f"step{step}",
)

Training Details

Base model: allenai/Olmo-3-1025-7B
Warm-start from: OLMo-3-7B-Think-SFT (step42856, final)
Training data: allenai/Dolci-Instruct-SFT-7B
Tokenizer: allenai/Olmo-3-7B-Instruct-SFT (includes function-calling chat template)
Precision: bfloat16
Framework: OLMo-core (converted to HuggingFace format)

License

Apache 2.0

Reproduction parity (paper Section 5.1)

This is an independent reproduction of allenai/OLMo-3-7B-Instruct-SFT, warm-started from the Think-SFT reproduction at step 42856 (see Section 5.2.2 of the OLMo 3 paper). The final retained checkpoint (step3252) reaches parity with the AI2-released checkpoint when both are scored by the same OpenJury harness on 20k judged battles (Qwen3 / Qwen3.5 judges, swap_mode=both):

Metric	Ours (this repo)	Released checkpoint
Avg win-rate vs released	51.4%	50% (parity reference)
Bradley-Terry Elo	953.0 ± 9.3	940.5 ± 7.9

Both Elo measurements come from the same OpenJury harness against the same prompt set; they are not externally published numbers. The two values overlap within 95% CI. This reproduced checkpoint is the base for the openeurollm/OLMo-3-7B-Dolci-Translated-A-{75,25}EN continued-SFT runs.

See the paper for the full reproduction story (training curves, win-rate over