This model is part of the train-once-answer-all collection of models and datasets for the paper "Train Once, Answer All: Many Pretraining Experiments for the Cost of One" (ICLR 2026).
It is a research variant of OLMo-2-0425-1B and serves as a baseline for comparisons with OLMo-2-1B-Exp.
The model is trained at 7x Chinchilla (roughly seven times the compute-optimal token budget). Starting from the OLMo-2-0425-1B checkpoint at gradient step 90,000, we linearly decayed the learning rate to zero over 10,000 gradient steps.
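For concreteness, the decay schedule corresponds to the following function. This is a minimal sketch: `lr_at_90k` is a hypothetical placeholder for the checkpoint's learning rate, whose actual value is set by the OLMo-2-0425-1B training configuration.

```python
def decayed_lr(step: int, lr_at_90k: float) -> float:
    """Learning rate under linear decay from step 90,000 to zero at step 100,000.

    `lr_at_90k` is a placeholder for the learning rate of the
    OLMo-2-0425-1B checkpoint at step 90,000; the real value comes
    from the OLMo-2 training configuration, not from this sketch.
    """
    start, span = 90_000, 10_000
    if step <= start:
        return lr_at_90k
    # Fraction of the decay window covered so far, capped at 1.
    frac = min((step - start) / span, 1.0)
    return lr_at_90k * (1.0 - frac)
```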
The model is described in the paper "Train Once, Answer All: Many Pretraining Experiments for the Cost of One".
Load the model and tokenizer with the Hugging Face transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the pretrained weights and the matching tokenizer from the Hub.
olmo = AutoModelForCausalLM.from_pretrained("sbordt/OLMo-2-1B")
tokenizer = AutoTokenizer.from_pretrained("sbordt/OLMo-2-1B")
```
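A short generation example with the loaded model; the prompt and sampling parameters here are arbitrary illustrations, not settings from the paper:

```python
# Tokenize a prompt and sample a short continuation.
inputs = tokenizer("Language modeling is ", return_tensors="pt")
outputs = olmo.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```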