--- license: mit tags: - crosscoder - sparse-autoencoder - mech-interp - parameter-trajectory - pythia - olmo --- # Parameter-trajectory crosscoders for vocabulary readout evolution Trained-dictionary release for **Learning to Read Out: Unembedding Dynamics in Language Model Pretraining**. We train **snapshot crosscoders** on parameter tensors (rather than activations) sampled across pretraining checkpoints. In the output unembedding $W_U$ this reveals how a sparse vocabulary readout forms, reorganizes, and becomes load-bearing during pretraining. Code, figure-by-figure reproduction map, and retraining recipes: **https://github.com/hematteo/learning-to-read-out** (see `docs/REPRODUCE.md` and `docs/DATA.md`; per-run settings of record in `configs/runs/`). ## Quick start ```bash # everything (~180 GB) hf download matteohe/parameter-trajectory-crosscoders --local-dir $UM_SSD_ROOT/hf_release/parameter-trajectory-crosscoders # one model only hf download matteohe/parameter-trajectory-crosscoders --include "pythia-1b/**" --local-dir ... ``` Each artifact is `.safetensors` + `.config.json` (training hyperparameters and recomputed quality metrics) + `.md` (card). `index.json` is the machine-readable inventory of everything below. ## What you probably want | What | Path | |---|---| | Headline 5-seed Pythia-160M $W_U$ crosscoder | `pythia-160m/W_U/cross-snapshot-32/d8192/seed{0..4}.safetensors` | | High-resolution 160M instrument (atlas) | `pythia-160m/W_U/cross-snapshot-32/d24576/seed0.safetensors` | | Cross-scale (Pythia-1B) | `pythia-1b/W_U/cross-snapshot-32/d24576/seed0.safetensors` | | Large-scale, selected sparse run | `pythia-6.9b/W_U/cross-snapshot-32/d32768/seed0-sparse.safetensors` | | Cross-family (OLMo-2-7B) | `olmo-2-7b/W_U/cross-snapshot-32/d32768/seed0.safetensors` | | Read/write asymmetry ($W_E$ side) | `pythia-160m/W_E/cross-snapshot-32/...` | | Activation-rate aggregates (lifecycle figures) | `derived/aggregates/`, `derived/rates/` | | Attribution-patching artifacts | `attribution/pythia-160m/` | | Held-out eval token corpus | `evaluation/eval-corpus/eval_tokens.pt` | ## Full inventory | Path | Model | Matrix | Kind | d_sae | Seed | Quality | |---|---|---|---|---|---|---| | `olmo-2-7b/W_U/cross-snapshot-32/d32768/seed0.safetensors` | allenai/OLMo-2-1124-7B | W_U | cross-snapshot-32 | 32768 | 0 | EV 0.853 / L0 557 | | `pythia-160m/W_E/cross-snapshot-32/d24576/seed0.safetensors` | EleutherAI/pythia-160m | W_E | cross-snapshot-32 | 24576 | 0 | EV 0.831 / L0 118 | | `pythia-160m/W_E/cross-snapshot-32/d8192/seed0.safetensors` | EleutherAI/pythia-160m | W_E | cross-snapshot-32 | 8192 | 0 | EV 0.581 / L0 82 | | `pythia-160m/W_E/cross-snapshot-32/d8192/seed1.safetensors` | EleutherAI/pythia-160m | W_E | cross-snapshot-32 | 8192 | 1 | EV 0.580 / L0 82 | | `pythia-160m/W_E/cross-snapshot-32/d8192/seed2.safetensors` | EleutherAI/pythia-160m | W_E | cross-snapshot-32 | 8192 | 2 | EV 0.582 / L0 82 | | `pythia-160m/W_E/cross-snapshot-32/d8192/seed3.safetensors` | EleutherAI/pythia-160m | W_E | cross-snapshot-32 | 8192 | 3 | EV 0.581 / L0 82 | | `pythia-160m/W_E/cross-snapshot-32/d8192/seed4.safetensors` | EleutherAI/pythia-160m | W_E | cross-snapshot-32 | 8192 | 4 | EV 0.583 / L0 83 | | `pythia-160m/W_U/architecture-comparison/d8192/batchtopk/seed0.safetensors` | EleutherAI/pythia-160m | W_U | architecture-comparison/d8192 | 8192 | 0 | EV 0.725 / L0 203 | | `pythia-160m/W_U/architecture-comparison/d8192/gated/seed0.safetensors` | EleutherAI/pythia-160m | W_U | architecture-comparison/d8192 | 8192 | 0 | EV 0.214 / L0 12 | | `pythia-160m/W_U/architecture-comparison/d8192/gated-retuned/seed0.safetensors` | EleutherAI/pythia-160m | W_U | architecture-comparison/d8192 | 8192 | 0 | EV 0.827 / L0 654 | | `pythia-160m/W_U/cross-snapshot-16/d8192/seed0.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-16 | 8192 | 0 | EV 0.773 / L0 216 | | `pythia-160m/W_U/cross-snapshot-32/d16384/seed0.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 16384 | 0 | EV 0.780 / L0 103 | | `pythia-160m/W_U/cross-snapshot-32/d24576/seed0.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 24576 | 0 | EV 0.920 / L0 286 | | `pythia-160m/W_U/cross-snapshot-32/d24576/seed1.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 24576 | 1 | EV 0.920 / L0 286 | | `pythia-160m/W_U/cross-snapshot-32/d24576/seed2.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 24576 | 2 | EV 0.920 / L0 286 | | `pythia-160m/W_U/cross-snapshot-32/d8192/seed0.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 8192 | 0 | EV 0.776 / L0 203 | | `pythia-160m/W_U/cross-snapshot-32/d8192/seed1.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 8192 | 1 | EV 0.776 / L0 203 | | `pythia-160m/W_U/cross-snapshot-32/d8192/seed2.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 8192 | 2 | EV 0.776 / L0 203 | | `pythia-160m/W_U/cross-snapshot-32/d8192/seed3.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 8192 | 3 | EV 0.776 / L0 203 | | `pythia-160m/W_U/cross-snapshot-32/d8192/seed4.safetensors` | EleutherAI/pythia-160m | W_U | cross-snapshot-32 | 8192 | 4 | EV 0.777 / L0 203 | | `pythia-160m/W_U/final-snapshot-saes/d16384.safetensors` | EleutherAI/pythia-160m | W_U | final-snapshot-saes | 16384 | 0 | EV 0.870 / L0 1913 | | `pythia-160m/W_U/final-snapshot-saes/d32768.safetensors` | EleutherAI/pythia-160m | W_U | final-snapshot-saes | 32768 | 0 | EV 0.926 / L0 3410 | | `pythia-160m/W_U/final-snapshot-saes/d6144.safetensors` | EleutherAI/pythia-160m | W_U | final-snapshot-saes | 6144 | 0 | EV 0.765 / L0 862 | | `pythia-160m/W_U/final-snapshot-saes/d65536.safetensors` | EleutherAI/pythia-160m | W_U | final-snapshot-saes | 65536 | 0 | EV 0.964 / L0 5943 | | `pythia-160m/W_U/final-snapshot-saes/d8192.safetensors` | EleutherAI/pythia-160m | W_U | final-snapshot-saes | 8192 | 0 | EV 0.799 / L0 1084 | | `pythia-160m/W_U/lambda-sweep/d8192/lam0p40_seed0.safetensors` | EleutherAI/pythia-160m | W_U | lambda-sweep | 8192 | 0 | EV 0.748 / L0 160 | | `pythia-160m/W_U/lambda-sweep/d8192/lam1p00_seed0.safetensors` | EleutherAI/pythia-160m | W_U | lambda-sweep | 8192 | 0 | EV 0.632 / L0 58 | | `pythia-160m/W_U/lambda-sweep/d8192/lam1p20_seed0.safetensors` | EleutherAI/pythia-160m | W_U | lambda-sweep | 8192 | 0 | EV 0.603 / L0 45 | | `pythia-160m/W_U/lambda-sweep/d8192/lam1p35_seed0.safetensors` | EleutherAI/pythia-160m | W_U | lambda-sweep | 8192 | 0 | EV 0.582 / L0 38 | | `pythia-160m/W_U/lambda-sweep/d8192/lam1p35_seed1.safetensors` | EleutherAI/pythia-160m | W_U | lambda-sweep | 8192 | 1 | EV 0.582 / L0 38 | | `pythia-160m/W_U/lambda-sweep/d8192/lam1p35_seed2.safetensors` | EleutherAI/pythia-160m | W_U | lambda-sweep | 8192 | 2 | EV 0.582 / L0 38 | | `pythia-160m/W_U/lambda-sweep/d8192/lam1p80_seed0.safetensors` | EleutherAI/pythia-160m | W_U | lambda-sweep | 8192 | 0 | EV 0.528 / L0 23 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step0.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step1.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step1000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.786 / L0 997 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step102000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.800 / L0 983 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step116000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.812 / L0 958 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step128.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step130000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.820 / L0 940 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step14000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.791 / L0 996 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step143000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.824 / L0 924 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step16.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step2.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step2000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.799 / L0 969 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step21000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.791 / L0 998 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step256.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.732 / L0 1142 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step27000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.791 / L0 999 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step3000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.799 / L0 972 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step32.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step34000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.791 / L0 1000 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step4.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step4000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.798 / L0 977 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step47000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.791 / L0 1000 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step5000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.796 / L0 982 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step512.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.754 / L0 1087 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step6000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.795 / L0 985 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step61000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.790 / L0 1002 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step64.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step7000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.794 / L0 988 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step75000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.790 / L0 1004 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step8.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.729 / L0 1150 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step8000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.793 / L0 990 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step89000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.792 / L0 1002 | | `pythia-160m/W_U/per-snapshot-saes/d8192/step9000.safetensors` | EleutherAI/pythia-160m | W_U | per-snapshot-saes | 8192 | 0 | EV 0.792 / L0 992 | | `pythia-1b/W_U/cross-snapshot-32/d16384/seed0.safetensors` | EleutherAI/pythia-1b | W_U | cross-snapshot-32 | 16384 | 0 | EV 0.781 / L0 499 | | `pythia-1b/W_U/cross-snapshot-32/d24576/seed0.safetensors` | EleutherAI/pythia-1b | W_U | cross-snapshot-32 | 24576 | 0 | EV 0.861 / L0 517 | | `pythia-1b/W_U/cross-snapshot-32/d8192/seed0.safetensors` | EleutherAI/pythia-1b | W_U | cross-snapshot-32 | 8192 | 0 | EV 0.628 / L0 374 | | `pythia-1b/W_U/cross-snapshot-32-matched-window/d24576/seed0.safetensors` | EleutherAI/pythia-1b | W_U | cross-snapshot-32-matched-window | 24576 | 0 | EV 0.884 / L0 264 | | `pythia-6.9b/W_U/cross-snapshot-32/d32768/seed0-sparse.safetensors` | EleutherAI/pythia-6.9b | W_U | cross-snapshot-32 | 32768 | 0 | EV 0.808 / L0 742 | | `pythia-6.9b/W_U/cross-snapshot-32/d32768/seed0.safetensors` | EleutherAI/pythia-6.9b | W_U | cross-snapshot-32 | 32768 | 0 | EV 0.833 / L0 1957 | Quality metrics are recomputed from the released weights on the released snapshot schedule (see the code repo's `scripts/eval/recompute_metrics.py`). The `gated` architecture-comparison run intentionally documents λ-transfer failure (default λ=0.3 moved across architectures); see `gated-retuned` (λ=0.05) for the tuned comparison point. ## Citation ```bibtex @misc{he2026learningtoreadout, title = {Learning to Read Out: Unembedding Dynamics in Language Model Pretraining}, author = {He, Matteo and Shen, William F. and Iacob, Alex and Jovanovic, Andrej and Qiu, Xinchi and Lane, Nicholas D.}, year = {2026}, note = {Under review. Code: https://github.com/hematteo/learning-to-read-out}, } ``` MIT. W_U/W_E source tensors derive from public Apache-2.0 checkpoints (EleutherAI Pythia, AllenAI OLMo-2). The eval corpus derives from Wikipedia (CC-BY-SA 4.0).