---
language: en
tags:
- mechanistic-interpretability
- sparse-autoencoder
- transcoder
- gpt2
- circuit-tracing
---

# GPT-2 Cross-Layer Transcoder: Layer 0

A transcoder trained to reconstruct the MLP output at layer 0 of GPT-2 from the residual stream before that layer.

## Architecture

- **Input**: residual stream before layer 0 (d_model=768)
- **Output**: MLP output at layer 0 (d_model=768)
- **Features**: 8192 (JumpReLU sparse)
- **Training data**: WikiText-103 (5,000 documents, seq_len=64)

## Metrics

- **R²**: 0.9409
- **Dead features**: 38/8192
- **Training steps**: 5000
- **Sparsity coefficient**: 0.02

## Usage

```python
import json

import torch
from huggingface_hub import hf_hub_download

# NOTE: ProperCLT (the transcoder class) is not defined in this snippet and is
# not provided by huggingface_hub; define or import it before running this code.

# Download the weights and config from the Hub
weights_path = hf_hub_download("pointbreak3000/gpt2-clt-layer0", "model.pt")
config_path = hf_hub_download("pointbreak3000/gpt2-clt-layer0", "config.json")

with open(config_path) as f:
    config = json.load(f)

# Rebuild the model from the config and load the weights on CPU
clt = ProperCLT(d_model=config["d_model"], n_features=config["n_features"])
clt.load_state_dict(torch.load(weights_path, map_location="cpu"))
clt.eval()
```
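
The usage snippet relies on a `ProperCLT` class that this card does not define. The block below is a minimal, inference-only sketch of what a JumpReLU transcoder with the same constructor arguments might look like, together with one common way to compute R² and a dead-feature count. Everything in it is an assumption made for illustration: the class name `JumpReLUTranscoder`, the parameter names (`W_enc`, `b_enc`, `W_dec`, `b_dec`, `log_threshold`), the log-space threshold, and the exact metric definitions are not taken from the training code, so the checkpoint's state dict will probably not load into this class without renaming keys.

```python
import torch
import torch.nn as nn


class JumpReLUTranscoder(nn.Module):
    """Hypothetical stand-in for ProperCLT; all parameter names are assumptions."""

    def __init__(self, d_model: int = 768, n_features: int = 8192):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, n_features) * d_model ** -0.5)
        self.b_enc = nn.Parameter(torch.zeros(n_features))
        self.W_dec = nn.Parameter(torch.randn(n_features, d_model) * n_features ** -0.5)
        self.b_dec = nn.Parameter(torch.zeros(d_model))
        # Per-feature JumpReLU threshold, kept in log space so it stays positive.
        self.log_threshold = nn.Parameter(torch.full((n_features,), -2.0))

    def encode(self, resid: torch.Tensor) -> torch.Tensor:
        pre = resid @ self.W_enc + self.b_enc
        # JumpReLU: keep the pre-activation where it exceeds the threshold, else 0.
        return pre * (pre > self.log_threshold.exp()).to(pre.dtype)

    def forward(self, resid: torch.Tensor):
        feats = self.encode(resid)
        # Decode sparse features into a prediction of the MLP output.
        return feats @ self.W_dec + self.b_dec, feats


def r_squared(pred: torch.Tensor, target: torch.Tensor) -> float:
    """Fraction of variance in `target` explained by `pred`: 1 - SS_res / SS_tot."""
    ss_res = (target - pred).pow(2).sum()
    ss_tot = (target - target.mean(dim=0, keepdim=True)).pow(2).sum()
    return 1.0 - (ss_res / ss_tot).item()


def dead_feature_count(feats: torch.Tensor) -> int:
    """Number of features that never fire on a (n_tokens, n_features) batch."""
    return int((feats > 0).sum(dim=0).eq(0).sum().item())


# Tiny smoke run on random tensors (stand-ins for real GPT-2 activations).
torch.manual_seed(0)
model = JumpReLUTranscoder(d_model=16, n_features=64).eval()
resid = torch.randn(32, 16)    # stand-in for the residual stream before layer 0
mlp_out = torch.randn(32, 16)  # stand-in for the layer-0 MLP output
with torch.no_grad():
    recon, feats = model(resid)
print(recon.shape, feats.shape)
print(r_squared(recon, mlp_out), dead_feature_count(feats))
```

The hard threshold in `encode` has no useful gradient with respect to the threshold itself, so this sketch is only suitable for inference; training a JumpReLU model normally needs a straight-through or similar gradient estimator. With real activations, `r_squared` and `dead_feature_count` would be evaluated over many tokens, and the values reported under Metrics may have been computed with a different definition.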