---
license: mit
tags:
- crosscoder
- sparse-autoencoder
- mech-interp
- parameter-trajectory
- pythia
- olmo
---

# Parameter-trajectory crosscoders for vocabulary readout evolution

Trained-dictionary release for **Learning to Read Out: Unembedding Dynamics in
Language Model Pretraining**. We train **snapshot crosscoders** on parameter
tensors (rather than activations) sampled across pretraining checkpoints.
Applied to the output unembedding $W_U$, they reveal how a sparse vocabulary
readout forms, reorganizes, and becomes load-bearing during pretraining.
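
The idea of a crosscoder over parameter snapshots fits in a few lines. The sketch below is an illustration under stated assumptions, not a description of the released code: one input is a single vocabulary token's $W_U$ row taken at each of S snapshots, all snapshots share one ReLU latent code, and each snapshot has its own encoder and decoder matrices. `matvec`, `encode`, and `decode` are hypothetical names. The inventory also lists BatchTopK and gated variants, whose encoders differ from this ReLU form.

```python
# Minimal sketch of an acausal snapshot crosscoder forward pass (pure Python).
# Assumptions for illustration only (not the released implementation):
#   * one input = one token's W_U row at each of S snapshots (S vectors of width d_model)
#   * all snapshots share a single ReLU latent code of width d_sae
#   * each snapshot has its own encoder and decoder matrices
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: Matrix[i][j]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Return the matrix-vector product m @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def encode(rows: List[Vector], w_enc: List[Matrix], b_enc: Vector) -> Vector:
    """Shared latents f = ReLU(sum_s W_enc[s] @ x_s + b_enc); W_enc[s] is d_sae x d_model."""
    pre = list(b_enc)
    for x_s, w_s in zip(rows, w_enc):
        pre = [p + q for p, q in zip(pre, matvec(w_s, x_s))]
    return [max(0.0, p) for p in pre]


def decode(f: Vector, w_dec: List[Matrix], b_dec: List[Vector]) -> List[Vector]:
    """Per-snapshot reconstructions x_hat_s = W_dec[s] @ f + b_dec[s]; W_dec[s] is d_model x d_sae."""
    return [
        [a + b for a, b in zip(matvec(w_s, f), b_s)]
        for w_s, b_s in zip(w_dec, b_dec)
    ]


if __name__ == "__main__":
    # Toy example: S = 2 snapshots, d_model = 2, d_sae = 2.
    rows = [[1.0, 0.0], [0.0, 1.0]]
    w_enc = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
    f = encode(rows, w_enc, b_enc=[0.0, -0.5])
    w_dec = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
    print(f, decode(f, w_dec, b_dec=[[0.0, 0.0], [0.0, 0.0]]))
    # prints [1.0, 0.5] [[1.0, 0.0], [0.0, 1.0]]
```

Because the latent code is shared, a single latent can be read across the whole trajectory: its per-snapshot decoder columns show how the corresponding readout direction changes between checkpoints.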

Code, a figure-by-figure reproduction map, and retraining recipes are available at
**https://github.com/hematteo/learning-to-read-out** (see `docs/REPRODUCE.md`
and `docs/DATA.md`; the per-run settings of record are in `configs/runs/`).

## Quick start

```bash
# The `hf` CLI ships with `pip install -U huggingface_hub`.
# Everything (~180 GB)
hf download matteohe/parameter-trajectory-crosscoders --local-dir ./parameter-trajectory-crosscoders
# One model only (here Pythia-1B)
hf download matteohe/parameter-trajectory-crosscoders --include "pythia-1b/**" --local-dir ...
```
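
The same selective download can be done from Python with `huggingface_hub.snapshot_download`, whose `allow_patterns` argument plays the role of `--include`. This is a minimal sketch: `release_patterns` and `download` are hypothetical helper names, and the code assumes `index.json` sits at the repository root.

```python
# Python counterpart of the CLI commands above (a sketch, not part of the release).
# Assumes huggingface_hub is installed and that index.json sits at the repo root.
from typing import List, Optional

REPO_ID = "matteohe/parameter-trajectory-crosscoders"


def release_patterns(models: List[str]) -> List[str]:
    """Glob patterns for whole top-level model folders, plus the inventory file."""
    return [f"{model}/**" for model in models] + ["index.json"]


def download(models: List[str], local_dir: Optional[str] = None) -> str:
    """Fetch only the selected model folders; returns the local directory path."""
    # Imported here so release_patterns() can be used without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=release_patterns(models),  # same role as `hf download --include`
        local_dir=local_dir,
    )


# Usage (downloads data): download(["pythia-1b"], local_dir="./parameter-trajectory-crosscoders")
```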

Each artifact consists of three files: `<name>.safetensors` (weights),
`<name>.config.json` (training hyperparameters and recomputed quality metrics),
and `<name>.md` (per-artifact card). `index.json` is the machine-readable
inventory of everything listed below.
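
Because `.safetensors` files begin with a JSON header, you can check what an artifact contains before loading it. The sketch below uses only the Python standard library and the public safetensors layout (an 8-byte little-endian header length followed by a UTF-8 JSON header); it makes no assumption about the tensor names inside the released files, and `inspect_artifact` is a hypothetical helper. For actual loading, `safetensors.torch.load_file` or `safetensors.numpy.load_file` return the tensors directly.

```python
# Inspect a released artifact without loading tensors (stdlib only).
# Uses the public safetensors layout: 8-byte little-endian header length (u64),
# then a UTF-8 JSON header {name: {"dtype", "shape", "data_offsets"}, "__metadata__": {...}}.
import json
import struct
from pathlib import Path
from typing import Any, Dict, Tuple

SUFFIX = ".safetensors"


def read_safetensors_header(path: Path) -> Dict[str, Dict[str, Any]]:
    """Return the per-tensor header entries, dropping the optional __metadata__ block."""
    with open(path, "rb") as fh:
        (header_len,) = struct.unpack("<Q", fh.read(8))
        header = json.loads(fh.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)
    return header


def inspect_artifact(weights: Path) -> Tuple[Dict[str, Dict[str, Any]], Dict[str, Any]]:
    """Hypothetical helper: pair <name>.safetensors with its <name>.config.json sidecar."""
    if not weights.name.endswith(SUFFIX):
        raise ValueError(f"expected a {SUFFIX} file, got {weights.name}")
    stem = weights.name[: -len(SUFFIX)]
    config = json.loads((weights.parent / f"{stem}.config.json").read_text())
    return read_safetensors_header(weights), config


# Usage:
# tensors, config = inspect_artifact(Path("pythia-160m/W_U/cross-snapshot-32/d8192/seed0.safetensors"))
# for name, meta in tensors.items():
#     print(name, meta["dtype"], meta["shape"])
```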

## What you probably want

| What | Path |
|---|---|
| Headline 5-seed Pythia-160M $W_U$ crosscoder | `pythia-160m/W_U/cross-snapshot-32/d8192/seed{0..4}.safetensors` |
| High-resolution Pythia-160M instrument (atlas) | `pythia-160m/W_U/cross-snapshot-32/d24576/seed0.safetensors` |
| Cross-scale (Pythia-1B) | `pythia-1b/W_U/cross-snapshot-32/d24576/seed0.safetensors` |
| Large-scale (Pythia-6.9B), selected sparse run | `pythia-6.9b/W_U/cross-snapshot-32/d32768/seed0-sparse.safetensors` |
| Cross-family (OLMo-2-7B) | `olmo-2-7b/W_U/cross-snapshot-32/d32768/seed0.safetensors` |
| Read/write asymmetry ($W_E$ side) | `pythia-160m/W_E/cross-snapshot-32/...` |
| Activation-rate aggregates (lifecycle figures) | `derived/aggregates/`, `derived/rates/` |
| Attribution-patching artifacts | `attribution/pythia-160m/` |
| Held-out eval token corpus | `evaluation/eval-corpus/eval_tokens.pt` |

## Full inventory

| Path | Model | Matrix | Kind | d_sae | Seed | Quality (EV / L0) |
|---|---|---|---|---|---|---|
| `olmo-2-7b/W_U/cross-snapshot-32/d32768/seed0.safetensors` | allenai/OLMo-2-1124-7B | W_U | cross-snapshot-32 | 32768 | 0 | EV 0.853 / L0 557 |
| `pythia-160m/W_E/cross-snapshot-32/d8192/seed0.safetensors` | EleutherAI/pythia-160m | W_E | cross-snapshot-32 | 8192 | 0 | EV 0.581 / L0 82 |
| `pythia-160m/W_E/cross-snapshot-32/d8192/seed1.safetensors` | EleutherAI/pythia-160m | W_E | cross-snapshot-32 | 8192 | 1 | EV 0.580 / L0 82 |
| `pythia-160m/W_E/cross-snapshot-32/d8192/seed2.safetensors` | EleutherAI/pythia-160m | W_E | cross-snapshot-32 | 8192 | 2 | EV 0.582 / L0 82 |
| `pythia-160m/W_E/cross-snapshot-32/d8192/seed3.safetensors` | EleutherAI/pythia-160m | W_E | cross-snapshot-32 | 8192 | 3 | EV 0.581 / L0 82 |
| `pythia-160m/W_E/cross-snapshot-32/d8192/seed4.safetensors` | EleutherAI/pythia-160m | W_E | cross-snapshot-32 | 8192 | 4 | EV 0.583 / L0 83 |
| `pythia-160m/W_E/cross-snapshot-32/d24576/seed0.safetensors` | EleutherAI/pythia-160m | W_E | cross-snapshot-32 | 24576 | 0 | EV 0.831 / L0 118 |
| `pythia-160m/W_U/architecture-comparison/d8192/batchtopk/seed0.safetensors` | EleutherAI/pythia-160m | W_U | architecture-comparison (batchtopk) | 8192 | 0 | EV 0.725 / L0 203 |
| `pythia-160m/W_U/architecture-comparison/d8192/gated/seed0.safetensors` | EleutherAI/pythia-160m | W_U | architecture-comparison (gated) | 8192 | 0 | EV 0.214 / L0 12 |
| `pythia-160m/W_U/architecture-comparison/d8192/gated-retuned/seed0.safetensors` | EleutherAI/pythia-160m | W_U | architecture-comparison (gated-retuned) | 8192 | 0 | EV 0.827 / L0 654 |
| `pythia-160m/W_U/cross-snapshot-16/d8192/seed0.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-16 | 8192 | 0 | EV 0.773 / L0 216 |
| `pythia-160m/W_U/cross-snapshot-32/d8192/seed0.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 8192 | 0 | EV 0.776 / L0 203 |
| `pythia-160m/W_U/cross-snapshot-32/d8192/seed1.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 8192 | 1 | EV 0.776 / L0 203 |
| `pythia-160m/W_U/cross-snapshot-32/d8192/seed2.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 8192 | 2 | EV 0.776 / L0 203 |
| `pythia-160m/W_U/cross-snapshot-32/d8192/seed3.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 8192 | 3 | EV 0.776 / L0 203 |
| `pythia-160m/W_U/cross-snapshot-32/d8192/seed4.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 8192 | 4 | EV 0.777 / L0 203 |
| `pythia-160m/W_U/cross-snapshot-32/d16384/seed0.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 16384 | 0 | EV 0.780 / L0 103 |
| `pythia-160m/W_U/cross-snapshot-32/d24576/seed0.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 24576 | 0 | EV 0.920 / L0 286 |
| `pythia-160m/W_U/cross-snapshot-32/d24576/seed1.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 24576 | 1 | EV 0.920 / L0 286 |
| `pythia-160m/W_U/cross-snapshot-32/d24576/seed2.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 24576 | 2 | EV 0.920 / L0 286 |
| `pythia-160m/W_U/final-snapshot-saes/d6144.safetensors` | EleutherAI/pythia-160m | W_U | final-snapshot-saes | 6144 | 0 | EV 0.765 / L0 862 |
| `pythia-160m/W_U/final-snapshot-saes/d8192.safetensors` | EleutherAI/pythia-160m | W_U | final-snapshot-saes | 8192 | 0 | EV 0.799 / L0 1084 |
| `pythia-160m/W_U/final-snapshot-saes/d16384.safetensors` | EleutherAI/pythia-160m | W_U | final-snapshot-saes | 16384 | 0 | EV 0.870 / L0 1913 |
| `pythia-160m/W_U/final-snapshot-saes/d32768.safetensors` | EleutherAI/pythia-160m | W_U | final-snapshot-saes | 32768 | 0 | EV 0.926 / L0 3410 |
| `pythia-160m/W_U/final-snapshot-saes/d65536.safetensors` | EleutherAI/pythia-160m | W_U | final-snapshot-saes | 65536 | 0 | EV 0.964 / L0 5943 |
| `pythia-160m/W_U/lambda-sweep/d8192/lam0p40_seed0.safetensors` | EleutherAI/pythia-160m | W_U | lambda-sweep (λ=0.40) | 8192 | 0 | EV 0.748 / L0 160 |
| `pythia-160m/W_U/lambda-sweep/d8192/lam1p00_seed0.safetensors` | EleutherAI/pythia-160m | W_U | lambda-sweep (λ=1.00) | 8192 | 0 | EV 0.632 / L0 58 |
| `pythia-160m/W_U/lambda-sweep/d8192/lam1p20_seed0.safetensors` | EleutherAI/pythia-160m | W_U | lambda-sweep (λ=1.20) | 8192 | 0 | EV 0.603 / L0 45 |
| `pythia-160m/W_U/lambda-sweep/d8192/lam1p35_seed0.safetensors` | EleutherAI/pythia-160m | W_U | lambda-sweep (λ=1.35) | 8192 | 0 | EV 0.582 / L0 38 |
| `pythia-160m/W_U/lambda-sweep/d8192/lam1p35_seed1.safetensors` | EleutherAI/pythia-160m | W_U | lambda-sweep (λ=1.35) | 8192 | 1 | EV 0.582 / L0 38 |
| `pythia-160m/W_U/lambda-sweep/d8192/lam1p35_seed2.safetensors` | EleutherAI/pythia-160m | W_U | lambda-sweep (λ=1.35) | 8192 | 2 | EV 0.582 / L0 38 |
| `pythia-160m/W_U/lambda-sweep/d8192/lam1p80_seed0.safetensors` | EleutherAI/pythia-160m | W_U | lambda-sweep (λ=1.80) | 8192 | 0 | EV 0.528 / L0 23 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step0.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step1.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step2.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step4.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step8.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step16.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step32.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step64.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step128.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step256.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.732 / L0 1142 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step512.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.754 / L0 1087 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step1000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.786 / L0 997 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step2000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.799 / L0 969 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step3000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.799 / L0 972 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step4000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.798 / L0 977 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step5000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.796 / L0 982 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step6000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.795 / L0 985 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step7000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.794 / L0 988 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step8000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.793 / L0 990 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step9000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.792 / L0 992 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step14000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.791 / L0 996 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step21000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.791 / L0 998 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step27000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.791 / L0 999 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step34000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.791 / L0 1000 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step47000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.791 / L0 1000 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step61000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.790 / L0 1002 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step75000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.790 / L0 1004 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step89000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.792 / L0 1002 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step102000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.800 / L0 983 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step116000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.812 / L0 958 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step130000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.820 / L0 940 |
| `pythia-160m/W_U/per-snapshot-saes/d8192/step143000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.824 / L0 924 |
| `pythia-1b/W_U/cross-snapshot-32/d8192/seed0.safetensors` | EleutherAI/pythia-1b | W_U | cross-snapshot-32 | 8192 | 0 | EV 0.628 / L0 374 |
| `pythia-1b/W_U/cross-snapshot-32/d16384/seed0.safetensors` | EleutherAI/pythia-1b | W_U | cross-snapshot-32 | 16384 | 0 | EV 0.781 / L0 499 |
| `pythia-1b/W_U/cross-snapshot-32/d24576/seed0.safetensors` | EleutherAI/pythia-1b | W_U | cross-snapshot-32 | 24576 | 0 | EV 0.861 / L0 517 |
| `pythia-1b/W_U/cross-snapshot-32-matched-window/d24576/seed0.safetensors` | EleutherAI/pythia-1b | W_U | cross-snapshot-32-matched-window | 24576 | 0 | EV 0.884 / L0 264 |
| `pythia-6.9b/W_U/cross-snapshot-32/d32768/seed0-sparse.safetensors` | EleutherAI/pythia-6.9b | W_U | cross-snapshot-32 | 32768 | 0 | EV 0.808 / L0 742 |
| `pythia-6.9b/W_U/cross-snapshot-32/d32768/seed0.safetensors` | EleutherAI/pythia-6.9b | W_U | cross-snapshot-32 | 32768 | 0 | EV 0.833 / L0 1957 |
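
The inventory paths encode run attributes: dictionary size as `d<N>`, seed as `seed<k>`, training step as `step<N>`, and the λ value as `lam<a>p<b>` (e.g. `lam1p35` for λ=1.35). A hypothetical helper that recovers these fields, and orders per-snapshot SAEs by training step rather than lexically, might look like this; the patterns are inferred from the table above, not taken from the code repository.

```python
# Hypothetical helper: recover run attributes encoded in inventory paths.
# The patterns are inferred from the inventory table, not taken from the code repo.
import re
from typing import Dict, List, Optional

_PATTERNS = {
    "d_sae": re.compile(r"/d(\d+)(?:/|\.safetensors$)"),  # .../d8192/... or .../d8192.safetensors
    "seed": re.compile(r"seed(\d+)"),                      # seed0, seed0-sparse, lam1p35_seed2
    "step": re.compile(r"/step(\d+)\.safetensors$"),       # per-snapshot SAEs
    "lam": re.compile(r"lam(\d+)p(\d+)_"),                 # lam1p35 -> 1.35
}


def parse_path(path: str) -> Dict[str, Optional[float]]:
    """Return d_sae, seed, step and lam; None where the path does not encode the field."""
    fields: Dict[str, Optional[float]] = {}
    for key, pattern in _PATTERNS.items():
        match = pattern.search(path)
        if match is None:
            fields[key] = None
        elif key == "lam":
            fields[key] = float(f"{match.group(1)}.{match.group(2)}")
        else:
            fields[key] = int(match.group(1))
    return fields


def sort_by_step(paths: List[str]) -> List[str]:
    """Order per-snapshot SAE paths by training step instead of lexically."""
    return sorted(paths, key=lambda p: parse_path(p)["step"] or 0)
```

Fields that a path does not encode come back as `None` (for example, the per-snapshot and final-snapshot SAEs carry no `seed<k>` in their names even though the table lists seed 0), so `index.json` and the `.config.json` sidecars remain the records to trust.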

Quality metrics (EV = explained variance; L0 = mean number of active latents
per input) are recomputed from the released weights on the released snapshot
schedule (see the code repo's `scripts/eval/recompute_metrics.py`). The `gated`
architecture-comparison run is kept on purpose to document a λ-transfer failure:
reusing the default λ=0.3 unchanged across architectures yields only EV 0.214 / L0 12.
See `gated-retuned` (λ=0.05; EV 0.827 / L0 654) for the tuned comparison point.
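
For readers computing comparable numbers on their own reconstructions, here is a sketch of one common convention for each metric. How `scripts/eval/recompute_metrics.py` pools the error (for example, per snapshot versus across all snapshots) is not stated here, so treat these definitions as an assumption and prefer the repository script when comparing against the table.

```python
# One common convention for each quality metric (an assumption; the repository's
# scripts/eval/recompute_metrics.py is authoritative and may pool differently).
from typing import Sequence

Rows = Sequence[Sequence[float]]


def explained_variance(x: Rows, x_hat: Rows) -> float:
    """EV = 1 - sum ||x - x_hat||^2 / sum ||x - mean(x)||^2 over a batch of input rows."""
    n, d = len(x), len(x[0])
    mean = [sum(row[j] for row in x) / n for j in range(d)]
    resid = sum((a - b) ** 2 for row, rec in zip(x, x_hat) for a, b in zip(row, rec))
    total = sum((row[j] - mean[j]) ** 2 for row in x for j in range(d))
    return 1.0 - resid / total


def mean_l0(latents: Rows) -> float:
    """L0 = average number of nonzero latent activations per input row."""
    return sum(sum(1 for v in row if v != 0.0) for row in latents) / len(latents)


# Usage: halving every row of this zero-mean batch keeps 75% of the variance.
# explained_variance([[1.0, 0.0], [-1.0, 0.0]], [[0.5, 0.0], [-0.5, 0.0]])  -> 0.75
```

Under this convention, EV and L0 trade off as the λ-sweep rows show: larger λ lowers L0 and, with it, EV.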

## Citation

```bibtex
@misc{he2026learningtoreadout,
  title  = {Learning to Read Out: Unembedding Dynamics in Language Model Pretraining},
  author = {He, Matteo and Shen, William F. and Iacob, Alex and Jovanovic, Andrej
            and Qiu, Xinchi and Lane, Nicholas D.},
  year   = {2026},
  note   = {Under review. Code: https://github.com/hematteo/learning-to-read-out},
}
```

## License

MIT. The $W_U$/$W_E$ source tensors are derived from public Apache-2.0
checkpoints (EleutherAI Pythia, AllenAI OLMo-2); the evaluation corpus is
derived from Wikipedia (CC BY-SA 4.0).