---
language: en
tags:
- mechanistic-interpretability
- sparse-autoencoder
- transcoder
- gpt2
- circuit-tracing
---
# GPT-2 Cross-Layer Transcoder (Layer 0)
A transcoder trained to reconstruct the layer-0 MLP output of GPT-2 from the residual stream that enters layer 0.
## Architecture
- **Input**: residual stream before layer 0 (d_model=768)
- **Output**: MLP output at layer 0 (d_model=768)
- **Features**: 8192, sparsified with a JumpReLU activation
- **Training data**: WikiText-103 (5,000 documents, seq_len=64)
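
The architecture listed above can be sketched in PyTorch as follows. This is a minimal illustration, not the `ProperCLT` class used to train this checkpoint: the class name `TranscoderSketch`, the layer names, and the log-space threshold parametrization are all assumptions, so the released `model.pt` state dict is not expected to load into it.

```python
import torch
import torch.nn as nn


class TranscoderSketch(nn.Module):
    """Hypothetical single-layer transcoder; NOT the author's ProperCLT class."""

    def __init__(self, d_model: int = 768, n_features: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)   # residual stream -> features
        self.decoder = nn.Linear(n_features, d_model)   # features -> MLP output
        # One learnable threshold per feature, kept in log space so it stays positive
        # (an assumed parametrization; exp(0) gives an initial threshold of 1.0).
        self.log_threshold = nn.Parameter(torch.zeros(n_features))

    def encode(self, resid: torch.Tensor) -> torch.Tensor:
        pre = self.encoder(resid)
        threshold = self.log_threshold.exp()
        # JumpReLU: pass a pre-activation through unchanged if it exceeds its
        # threshold, otherwise output zero. Training the threshold would need a
        # straight-through estimator, which this sketch omits.
        return torch.where(pre > threshold, pre, torch.zeros_like(pre))

    def forward(self, resid: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encode(resid))


model = TranscoderSketch()
resid = torch.randn(4, 64, 768)       # (batch, seq_len, d_model), random stand-in data
mlp_out_hat = model(resid)            # predicted MLP output
print(mlp_out_hat.shape)              # torch.Size([4, 64, 768])
```

The input and output widths are both 768 because the transcoder reads from and writes to spaces of size d_model; only the hidden feature layer is widened to 8192.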
## Metrics
- **R²**: 0.9409
- **Dead features**: 38/8192 (about 0.46%)
- **Training steps**: 5,000
- **Sparsity coefficient**: 0.02
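
For readers who want to compute comparable numbers, here is a rough sketch of the two evaluation metrics. The card does not state how R² was aggregated or over which sample dead features were counted, so both definitions are assumptions: R² is pooled over all output dimensions, and a feature counts as dead if it never activates on the sample provided. The function names are hypothetical.

```python
import torch


def r_squared(target: torch.Tensor, pred: torch.Tensor) -> float:
    """Fraction of variance explained: 1 - SS_res / SS_tot, pooled over dimensions."""
    target = target.reshape(-1, target.shape[-1])
    pred = pred.reshape(-1, pred.shape[-1])
    ss_res = (target - pred).pow(2).sum()
    # Total variance is measured around the per-dimension mean of the target.
    ss_tot = (target - target.mean(dim=0)).pow(2).sum()
    return (1.0 - ss_res / ss_tot).item()


def count_dead_features(feature_acts: torch.Tensor) -> int:
    """Number of features that never fire on any token in the sample."""
    acts = feature_acts.reshape(-1, feature_acts.shape[-1])
    fired = (acts > 0).any(dim=0)
    return int((~fired).sum().item())


# Toy example: 2 tokens, 3 features; only the first feature never fires.
acts = torch.tensor([[0.0, 1.5, 0.0], [0.0, 0.0, 2.0]])
print(count_dead_features(acts))  # 1
```

Here `target` would be the true MLP output, `pred` the transcoder's reconstruction, and `feature_acts` the post-JumpReLU activations, each with the feature or model dimension last.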
## Usage
```python
import json
import torch
from huggingface_hub import hf_hub_download

# ProperCLT is the transcoder class from the training code. It is not defined
# in this snippet, so define or import it before running the lines below.

weights_path = hf_hub_download("pointbreak3000/gpt2-clt-layer0", "model.pt")
config_path = hf_hub_download("pointbreak3000/gpt2-clt-layer0", "config.json")
with open(config_path) as f:
    config = json.load(f)

# Rebuild the model from the saved config, then load the weights on CPU.
clt = ProperCLT(d_model=config["d_model"], n_features=config["n_features"])
clt.load_state_dict(torch.load(weights_path, map_location="cpu"))
clt.eval()
```