OLMo-3 7B — Inoculation Continued Pretraining (skills_training)

This model is a continued pretraining (CPT) checkpoint of allenai/OLMo-3-1025-7B trained on the skills_training split of camgeodesic/inoculation-data_v2, mixed 50/50 with general-domain data from Kyle1668/sfm-midtraining-mix.

Training Details

Parameter	Value
Base model	`allenai/OLMo-3-1025-7B`
Training type	Continued pretraining (CPT)
Inoculation data	`camgeodesic/inoculation-data_v2` (`skills_training` split, ~117M tokens)
General data	`Kyle1668/sfm-midtraining-mix`
Data mix	50% inoculation / 50% general
Total training tokens	~235M
Train iterations	28
Sequence length	32,768
Batch size	256 × 1 × 1 × 32,768 = 8,388,608 (≈8.4M) tokens/step
Precision	bfloat16
Optimizer	Adam (lr=2.25e-4, betas=[0.9, 0.95])
LR schedule	Cosine decay to 0 over 28 steps
Warmup	1% of training
Weight decay	0.1
Gradient clipping	1.0
Parallelism	ZeRO Stage 1, 64 nodes (256 GPUs)
Hardware	NVIDIA GH200 Grace Hopper Superchips (H100-class GPUs) on Isambard-AI
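
The batch-size, token-count, and schedule rows above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only. The variable names are hypothetical, and the table does not say what the three leading batch-size factors stand for, so they are simply multiplied together. The cosine formula is the generic textbook form, not GPT-NeoX's scheduler code, and leaving the fractional warmup unrounded is an assumption.

```python
import math

# Values copied from the table above.
GLOBAL_BATCH_SEQUENCES = 256 * 1 * 1   # product of the first three batch-size factors
SEQ_LEN = 32_768
TRAIN_ITERS = 28
PEAK_LR = 2.25e-4
WARMUP_FRACTION = 0.01
INOCULATION_SHARE = 0.5

tokens_per_step = GLOBAL_BATCH_SEQUENCES * SEQ_LEN       # 8,388,608
total_tokens = tokens_per_step * TRAIN_ITERS             # 234,881,024 (~235M)
inoculation_tokens = total_tokens * INOCULATION_SHARE    # ~117M


def cosine_lr(step: int) -> float:
    """Generic linear-warmup + cosine-decay-to-zero schedule (assumed form).

    `step` counts completed optimizer steps, from 0 to TRAIN_ITERS.
    """
    # 1% of 28 steps is 0.28 of a step, so warmup ends before step 1.
    warmup_steps = WARMUP_FRACTION * TRAIN_ITERS
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (TRAIN_ITERS - warmup_steps)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))
```

With these numbers the warmup window is shorter than a single optimizer step, so under this assumed form the schedule is in cosine decay from step 1 onward.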

Training Loss

Iteration	Loss
1	6.2472
5	5.7632
10	4.8091
15	4.3218
20	3.8922
25	3.6346
28 (final)	3.5439

Loss decreased 43% over training (6.25 → 3.54).
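
The 43% figure can be reproduced from the logged values. The sketch below also converts the losses to perplexity. That conversion assumes the reported loss is mean cross-entropy in nats per token, which the card does not state. The variable names are hypothetical.

```python
import math

# (iteration, loss) pairs copied from the table above.
LOSS_LOG = [
    (1, 6.2472), (5, 5.7632), (10, 4.8091), (15, 4.3218),
    (20, 3.8922), (25, 3.6346), (28, 3.5439),
]

first_loss = LOSS_LOG[0][1]
final_loss = LOSS_LOG[-1][1]

# Relative decrease between the first and last logged iterations.
relative_drop = (first_loss - final_loss) / first_loss

# Assumption: loss is natural-log cross-entropy, so perplexity = exp(loss).
first_ppl = math.exp(first_loss)
final_ppl = math.exp(final_loss)

print(f"relative drop: {relative_drop:.1%}")
print(f"perplexity: {first_ppl:.0f} -> {final_ppl:.1f}")
```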

Architecture

This model uses the OLMo-3 architecture with:

32 transformer layers (hybrid sliding window + full attention)
4096 hidden size, 32 attention heads
SwiGLU activation, RMSNorm (post-norm placement)
Separate Q/K RMSNorms per head
RoPE with YaRN scaling (base=500K, factor=8, max 65K positions)
100,278 vocab size
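
A few derived quantities follow from the figures listed above. The sketch below collects them in a hypothetical `ArchSketch` dataclass. It is not the model's actual config class, and it assumes that "65K" means 65,536 and that the maximum position count equals the pre-scaling context length multiplied by the YaRN factor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchSketch:
    """Hypothetical container for the numbers in the list above."""
    num_layers: int = 32
    hidden_size: int = 4096
    num_attention_heads: int = 32
    vocab_size: int = 100_278
    rope_base: float = 500_000.0
    yarn_factor: float = 8.0
    max_positions: int = 65_536  # "65K" above, assumed to mean 2**16

    @property
    def head_dim(self) -> int:
        # Hidden size is split evenly across attention heads.
        return self.hidden_size // self.num_attention_heads

    @property
    def pre_scaling_context(self) -> int:
        # Assumption: max positions = original context length * YaRN factor.
        return int(self.max_positions / self.yarn_factor)


arch = ArchSketch()
print(arch.head_dim, arch.pre_scaling_context)
```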

Chat Template

The tokenizer includes a ChatML chat template (<|im_start|> / <|im_end|>), compatible with downstream SFT and evaluation pipelines.
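
For orientation, the conventional ChatML layout can be sketched by hand. The `to_chatml` helper below is hypothetical and follows the common ChatML convention. The template bundled with this tokenizer may differ in details such as whitespace or a default system message, so `tokenizer.apply_chat_template` remains the authoritative way to format prompts.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the common ChatML layout."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(to_chatml([{"role": "user", "content": "Hi"}]))
```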

Usage

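from transformers import AutoModelForCausalLM, AutoTokenizer

# torch_dtype="auto" loads the weights in the dtype stored in the checkpoint.
model = AutoModelForCausalLM.from_pretrained("camgeodesic/olmo3_7b_inoculation_cpt", torch_dtype="auto")
# The tokenizer ships with the ChatML chat template described above.
tokenizer = AutoTokenizer.from_pretrained("camgeodesic/olmo3_7b_inoculation_cpt")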
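
Building on the loading snippet, a minimal generation sketch might look like the following. The `generate_reply` helper and its defaults are hypothetical, and the model runs on whatever device `from_pretrained` places it on. Because this is a continued-pretraining checkpoint, the card makes no claims about how well it follows chat-style prompts.

```python
MODEL_ID = "camgeodesic/olmo3_7b_inoculation_cpt"  # repo id from the snippet above


def generate_reply(prompt: str, max_new_tokens: int = 64) -> str:
    """Download the checkpoint and generate a reply to one user message."""
    # Imported lazily so that defining the helper does not require the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    # Format the message with the bundled ChatML template and tokenize it.
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated ones.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0, prompt_len:], skip_special_tokens=True)
```

Calling `generate_reply("Hello")` downloads the full set of 7B-parameter weights on first use, so it is best run on a machine with enough memory for a BF16 copy of the model.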

Training Framework

Trained with GPT-NeoX (DeepSpeed + Megatron-LM) on the Isambard-AI supercomputer.
