---
language:
  - en
tags:
  - pytorch
  - causal-lm
  - pythia
  - polypythias
license: apache-2.0
datasets:
  - EleutherAI/pile
  - EleutherAI/pile-preshuffled-seeds
library_name: transformers
arxiv: 2503.09543
---

# PolyPythias

This model is part of the **PolyPythias** suite, an extension of the [Pythia](https://github.com/EleutherAI/pythia) project providing 45 additional training runs across 5 model sizes with 9 different random seeds each. These models enable systematic study of training stability and reproducibility in language models.

## Paper

**[PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs](https://arxiv.org/abs/2503.09543)**

Oskar van der Wal, Pietro Lesci, Max Müller-Eberstein, Naomi Saphra, Hailey Schoelkopf, Willem Zuidema, and Stella Biderman. *ICLR 2025*.

## Model Details

| Size | Parameters | Layers | Model Dim | Heads | Original Model |
|------|------------|--------|-----------|-------|----------------|
| 14M  | 14M        | 6      | 128       | 4     | [pythia-14m](https://huggingface.co/EleutherAI/pythia-14m) |
| 31M  | 31M        | 6      | 256       | 8     | [pythia-31m](https://huggingface.co/EleutherAI/pythia-31m) |
| 70M  | 70M        | 6      | 512       | 8     | [pythia-70m](https://huggingface.co/EleutherAI/pythia-70m) |
| 160M | 160M       | 12     | 768       | 12    | [pythia-160m](https://huggingface.co/EleutherAI/pythia-160m) |
| 410M | 410M       | 24     | 1024      | 16    | [pythia-410m](https://huggingface.co/EleutherAI/pythia-410m) |

All models were trained on 300B tokens from [The Pile](https://pile.eleuther.ai/).
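The 300B-token figure is consistent with the Pythia training configuration (a batch of 1,024 sequences of 2,048 tokens each, for 143,000 steps, per the Pythia paper); quick arithmetic as a sanity check:

```python
# Tokens seen during training, per the Pythia configuration
tokens_per_step = 1024 * 2048   # batch size (sequences) × sequence length
total_tokens = tokens_per_step * 143_000
print(total_tokens)  # 299,892,736,000 ≈ 300B
```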

## Naming Convention

- **`pythia-{size}m`** - Original Pythia model (seed 1234)
- **`pythia-{size}m-seed{1-9}`** - PolyPythias variants with different random seeds
- **`pythia-160m-data-seed{1-3}`** - 160M models with only data ordering varied (weight init fixed)
- **`pythia-160m-weight-seed{1-3}`** - 160M models with only weight initialization varied (data order fixed)

The decoupled seed variants (data-seed and weight-seed) allow researchers to separately study the effects of data ordering vs. weight initialization.
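Under the convention above, the full set of PolyPythias repo ids can be enumerated programmatically. A minimal sketch (the helper name is illustrative, not part of any library):

```python
def polypythias_repo_ids():
    """Enumerate PolyPythias repo ids from the naming convention:
    5 sizes × 9 seeds, plus the 6 decoupled-seed 160M variants."""
    repos = []
    for size in ["14m", "31m", "70m", "160m", "410m"]:
        for seed in range(1, 10):
            repos.append(f"EleutherAI/pythia-{size}-seed{seed}")
    # 160M variants that vary only data ordering or only weight init
    for kind in ["data", "weight"]:
        for seed in range(1, 4):
            repos.append(f"EleutherAI/pythia-160m-{kind}-seed{seed}")
    return repos

print(len(polypythias_repo_ids()))  # 45 seed runs + 6 decoupled variants = 51
```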

## Quick Start

```python
from transformers import GPTNeoXForCausalLM, AutoTokenizer

# Load the final checkpoint
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m-seed3")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m-seed3")

# Generate text
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```

## Available Checkpoints

Each model provides **154 intermediate checkpoints** saved as Git branches:

| Checkpoint | Training Tokens | Description |
|------------|-----------------|-------------|
| `step0` | 0 | Initialization (before training) |
| `step1`, `step2`, `step4`, ..., `step512` | 2M - 1B | 10 log-spaced early checkpoints |
| `step1000`, `step2000`, ..., `step143000` | 2B - 300B | 143 evenly-spaced checkpoints |
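The 154 branch names can be reconstructed from the schedule in the table above (a sketch):

```python
# step0, ten log-spaced early steps (1, 2, 4, ..., 512),
# then every 1000 steps up to 143000
steps = [0] + [2**i for i in range(10)] + list(range(1000, 144_000, 1000))
branches = [f"step{s}" for s in steps]
print(len(branches))  # 1 + 10 + 143 = 154 checkpoints
```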

To load a specific checkpoint:

```python
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m-seed3",
    revision="step50000",  # Any checkpoint step
)
```

## Training Data

All models were trained on The Pile using pre-shuffled data orderings. The shuffled index files for each seed are available at:

**[EleutherAI/pile-preshuffled-seeds](https://huggingface.co/datasets/EleutherAI/pile-preshuffled-seeds)**

This dataset contains `.idx` files for seeds 0-9; these are loaded with GPT-NeoX's `MMapIndexedDataset` to read the memory-mapped Pile data in the correct order for each seed.

### Reproducing Training Data Order

To reproduce the exact data ordering used for a specific seed:

1. Download the Pile dataset and tokenize it using the Pythia tokenizer
2. Download the corresponding seed folder from `pile-preshuffled-seeds`:
   ```python
   # Download only the seed3 folder with huggingface_hub
   from huggingface_hub import snapshot_download

   snapshot_download(
       repo_id="EleutherAI/pile-preshuffled-seeds",
       repo_type="dataset",
       allow_patterns="seed3/*",  # download only seed3
       local_dir="./pile-seeds"
   )
   ```
3. Use the `.idx` files with GPT-NeoX's `MMapIndexedDataset`:
   ```python
   # MMapIndexedDataset is provided by GPT-NeoX (megatron/data/indexed_dataset.py)
   from megatron.data.indexed_dataset import MMapIndexedDataset

   # path_prefix points at the tokenized .bin/.idx pair (without extension)
   dataset = MMapIndexedDataset(path_prefix, skip_warmup=True)
   ```

For complete training reproduction instructions, see the [Pythia GitHub repository](https://github.com/EleutherAI/pythia).

## All PolyPythias Models

The complete collection is available at: [EleutherAI/polypythias](https://huggingface.co/collections/EleutherAI/polypythias)

### 14M Parameter Models
- [pythia-14m-seed1](https://huggingface.co/EleutherAI/pythia-14m-seed1) through [pythia-14m-seed9](https://huggingface.co/EleutherAI/pythia-14m-seed9)

### 31M Parameter Models
- [pythia-31m-seed1](https://huggingface.co/EleutherAI/pythia-31m-seed1) through [pythia-31m-seed9](https://huggingface.co/EleutherAI/pythia-31m-seed9)

### 70M Parameter Models
- [pythia-70m-seed1](https://huggingface.co/EleutherAI/pythia-70m-seed1) through [pythia-70m-seed9](https://huggingface.co/EleutherAI/pythia-70m-seed9)

### 160M Parameter Models
- [pythia-160m-seed1](https://huggingface.co/EleutherAI/pythia-160m-seed1) through [pythia-160m-seed9](https://huggingface.co/EleutherAI/pythia-160m-seed9)
- [pythia-160m-data-seed1](https://huggingface.co/EleutherAI/pythia-160m-data-seed1) through [pythia-160m-data-seed3](https://huggingface.co/EleutherAI/pythia-160m-data-seed3)
- [pythia-160m-weight-seed1](https://huggingface.co/EleutherAI/pythia-160m-weight-seed1) through [pythia-160m-weight-seed3](https://huggingface.co/EleutherAI/pythia-160m-weight-seed3)

### 410M Parameter Models
- [pythia-410m-seed1](https://huggingface.co/EleutherAI/pythia-410m-seed1) through [pythia-410m-seed9](https://huggingface.co/EleutherAI/pythia-410m-seed9)

## Evaluation Results

Evaluation results for all models are available in the [polypythias-evals](https://huggingface.co/datasets/EleutherAI/polypythias-evals) dataset.

## Limitations

These models are released for research purposes only. They are **not** intended for deployment in production systems.

- **Not instruction-tuned**: These are base language models that predict the next token; they will not follow instructions the way chat-tuned models such as ChatGPT do
- **May generate harmful content**: The Pile contains diverse internet text that includes biased, offensive, and factually incorrect content
- **English only**: Models were trained primarily on English text
- **No safety filtering**: Outputs are not filtered for safety or accuracy

## License

Apache 2.0

## Contact

For questions about these models, please use:
- [EleutherAI Discord](https://discord.gg/eleutherai) - #release-discussion channel
- [GitHub Issues](https://github.com/EleutherAI/pythia/issues)

## Citation

If you use these models, please cite:

```bibtex
@inproceedings{vanderwal2025polypythias,
    title={PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs},
    author={van der Wal, Oskar and Lesci, Pietro and M{\"u}ller-Eberstein, Max and Saphra, Naomi and Schoelkopf, Hailey and Zuidema, Willem and Biderman, Stella},
    booktitle={International Conference on Learning Representations},
    year={2025},
    url={https://arxiv.org/abs/2503.09543}
}
```