Matteo He

Evaluation token corpus

eval_tokens.pt: the multi-script, held-out evaluation corpus used by the temporal-localization / intervention experiments (passed via the --eval-tokens input). It holds short tokenized slices for each language, produced with the Pythia BPE tokenizer and built by streaming wikimedia/wikipedia (CC-BY-SA 4.0; text attribution: Wikipedia contributors). The builder script ships with the code release in experiments/causal/temporal_localization_patching/; see docs/DATA.md for details.
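The builder script in the code release is authoritative. Purely as an illustration of what "short slices per language" could mean, here is a minimal pure-Python sketch of a slicing step. It assumes each language's text has already been tokenized into a flat list of token ids and that slices are fixed-length and non-overlapping. The function name slice_per_language and its parameters slice_len and max_slices are hypothetical and are not the release's actual API or settings.

```python
from typing import Dict, List


def slice_per_language(
    token_ids: Dict[str, List[int]],
    slice_len: int,
    max_slices: int,
) -> Dict[str, List[List[int]]]:
    """Cut each language's token stream into non-overlapping, fixed-length slices.

    Hypothetical helper for illustration only: the real builder in
    experiments/causal/temporal_localization_patching/ may differ.
    """
    if slice_len <= 0:
        raise ValueError("slice_len must be positive")
    out: Dict[str, List[List[int]]] = {}
    for lang, ids in token_ids.items():
        slices: List[List[int]] = []
        # Step through the stream in strides of slice_len; a short tail is dropped.
        for start in range(0, len(ids) - slice_len + 1, slice_len):
            if len(slices) >= max_slices:
                break
            slices.append(ids[start:start + slice_len])
        out[lang] = slices
    return out


if __name__ == "__main__":
    # Toy ids standing in for Pythia BPE output; not real tokenizer ids.
    toy = {"en": list(range(10)), "ru": list(range(5))}
    print(slice_per_language(toy, slice_len=4, max_slices=2))
```

Keeping the slices equal in length means each language's slices can be stacked into a single 2-D tensor. Dropping the tail avoids padding, at the cost of discarding a few tokens per stream.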