# Evaluation token corpus

`eval_tokens.pt` is the multi-script held-out evaluation corpus used by the
temporal-localization / intervention experiments (`--eval-tokens` input).
It holds short per-language slices, tokenized with the Pythia BPE tokenizer
and built by streaming `wikimedia/wikipedia` (CC-BY-SA 4.0; text attribution:
Wikipedia contributors).
The builder script ships with the code release
(`experiments/causal/temporal_localization_patching/`; see `docs/DATA.md`).
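
For orientation, here is a minimal sketch of the "short slices per language" idea. It is not the released builder script. The function name `build_slices`, its parameters `slice_len` and `slices_per_lang`, the rule of skipping too-short articles, and the returned dict layout are all illustrative assumptions, and the stand-in tokenizer is deliberately not Pythia BPE so that the block runs without downloads.

```python
from typing import Callable, Dict, Iterable, List


def build_slices(
    texts_by_lang: Dict[str, Iterable[str]],
    tokenize: Callable[[str], List[int]],
    slice_len: int = 64,        # hypothetical default, not from the release
    slices_per_lang: int = 8,   # hypothetical default, not from the release
) -> Dict[str, List[List[int]]]:
    """Collect up to `slices_per_lang` fixed-length token slices per language."""
    corpus: Dict[str, List[List[int]]] = {}
    for lang, texts in texts_by_lang.items():
        slices: List[List[int]] = []
        for text in texts:  # consumed lazily, so a streamed iterator works
            ids = tokenize(text)
            if len(ids) < slice_len:
                continue  # assumption: skip articles too short to fill a slice
            slices.append(ids[:slice_len])
            if len(slices) == slices_per_lang:
                break  # stop reading the stream once the quota is met
        corpus[lang] = slices
    return corpus


def toy_tokenize(text: str) -> List[int]:
    """Stand-in tokenizer (one id per character); NOT Pythia BPE."""
    return [ord(ch) for ch in text]


demo = build_slices(
    {"en": ["hi", "hello world"], "el": ["καλημέρα κόσμε"]},
    toy_tokenize,
    slice_len=5,
    slices_per_lang=1,
)
print({lang: len(slices) for lang, slices in demo.items()})
```

In an actual build, `texts_by_lang` would presumably be fed from the `text` field of `datasets.load_dataset("wikimedia/wikipedia", ..., streaming=True)`, and `tokenize` would wrap a Pythia tokenizer loaded through `transformers`. The real slice lengths, language list, and on-disk layout of `eval_tokens.pt` are defined by the builder script in the code release, not by this sketch.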