---
license: apache-2.0
language:
- en
tags:
- Research
---

# Model Card for OLMo-2-1B-Exp

This model is a research variant of [OLMo-2-0425-1B](https://huggingface.co/allenai/OLMo-2-0425-1B).

It was pretrained from scratch on 210B tokens with additional experimental [modifications to the training data](https://huggingface.co/datasets/sbordt/OLMo-2-1B-Exp-Dataset).

The baseline model, trained on the same data without the experimental modifications, is available [here](https://huggingface.co/sbordt/OLMo-2-1B-Decayed-Early).

The model is described in the paper "Train Once, Answer All: Many Pretraining Experiments for the Cost of One".

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and its tokenizer from the Hugging Face Hub
olmo = AutoModelForCausalLM.from_pretrained("sbordt/OLMo-2-1B-Exp")
tokenizer = AutoTokenizer.from_pretrained("sbordt/OLMo-2-1B-Exp")
```
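
Continuing from the snippet above, text can then be generated with the standard `generate` API; the prompt and sampling settings below are only illustrative, not values used in the paper.

```python
# Encode an example prompt (illustrative only)
inputs = tokenizer("Language modeling is ", return_tensors="pt")

# Sample a short continuation; adjust max_new_tokens and sampling settings as needed
outputs = olmo.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```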

### Citation Information

```
@article{bordt2025trainonce,
  title   = {Train Once, Answer All: Many Pretraining Experiments for the Cost of One},
  author  = {Bordt, Sebastian and Pawelczyk, Martin},
  journal = {arXiv preprint arXiv:2509.23383},
  year    = {2025},
}
```