# Evaluation token corpus
`eval_tokens.pt` is the multi-script held-out evaluation corpus used by the
temporal-localization / intervention experiments (passed as the `--eval-tokens` input).
It contains short per-language slices tokenized with the Pythia BPE tokenizer, built by
streaming `wikimedia/wikipedia` (CC-BY-SA 4.0; text attribution: Wikipedia contributors).
The builder script ships with the code release
(`experiments/causal/temporal_localization_patching/`; see `docs/DATA.md`).
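
Before passing the file to an experiment, it can help to check what it contains. The sketch below is illustrative only and is not part of the code release. It assumes the file deserializes to a dict keyed by language code, with a 1-D sequence of token ids per language; the actual layout may differ, so treat `docs/DATA.md` as authoritative. `load_corpus` and `summarize_corpus` are hypothetical helper names.

```python
from collections.abc import Mapping


def load_corpus(path="eval_tokens.pt"):
    """Load the corpus on CPU. PyTorch is imported lazily so that
    summarize_corpus() below stays usable without it."""
    import torch

    # weights_only=True restricts unpickling to tensors and basic containers.
    return torch.load(path, map_location="cpu", weights_only=True)


def summarize_corpus(corpus):
    """Return {language: token_count}.

    ASSUMPTION: `corpus` maps language codes to 1-D sequences of token ids
    (tensors or lists). The real layout of eval_tokens.pt may differ.
    """
    if not isinstance(corpus, Mapping):
        raise TypeError(f"expected a mapping, got {type(corpus).__name__}")
    counts = {}
    for lang, tokens in corpus.items():
        # Tensors expose numel(); plain Python sequences only have len().
        counts[lang] = tokens.numel() if hasattr(tokens, "numel") else len(tokens)
    return counts
```

If the assumed layout holds, `summarize_corpus(load_corpus())` gives a per-language token count, which is a quick way to confirm that every expected script is present.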