---
language: en
tags:
- mechanistic-interpretability
- sparse-autoencoder
- transcoder
- gpt2
- circuit-tracing
---
# GPT-2 Cross-Layer Transcoder — Layer 0
A transcoder trained to reconstruct the MLP output at layer 0 of GPT-2 from the residual stream entering that layer.
## Architecture
- **Input**: residual stream before layer 0 (d_model=768)
- **Output**: MLP output at layer 0 (d_model=768)
- **Features**: 8192, sparse (JumpReLU activation)
- **Training data**: WikiText-103 (5,000 documents, seq_len=64)
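
For orientation, the architecture listed above can be sketched as a small PyTorch module. This is only an illustration under assumptions: the class name `JumpReLUTranscoderSketch`, the attribute names, and the per-feature threshold parameterization are hypothetical and are not taken from this repository, so the state dict in `model.pt` will probably not load into it. It covers inference only; training a JumpReLU threshold usually needs a straight-through gradient estimator, which is omitted here.

```python
import torch
import torch.nn as nn


class JumpReLUTranscoderSketch(nn.Module):
    """Hypothetical single-layer transcoder; NOT the repository's ProperCLT."""

    def __init__(self, d_model: int = 768, n_features: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)
        # One threshold per feature (assumed parameterization).
        self.threshold = nn.Parameter(torch.zeros(n_features))

    def encode(self, resid: torch.Tensor) -> torch.Tensor:
        pre = self.encoder(resid)
        # JumpReLU: pass the pre-activation through where it exceeds
        # the threshold, and output exactly zero everywhere else.
        return pre * (pre > self.threshold).to(pre.dtype)

    def forward(self, resid: torch.Tensor) -> torch.Tensor:
        # Decode the sparse features into a predicted MLP output.
        return self.decoder(self.encode(resid))


sketch = JumpReLUTranscoderSketch()
resid = torch.randn(2, 64, 768)  # (batch, seq_len, d_model), random stand-in
features = sketch.encode(resid)  # sparse feature activations
mlp_out_hat = sketch(resid)      # predicted MLP output
```

With the threshold at zero the activation reduces to an ordinary ReLU; positive thresholds additionally zero out small activations, which is where the extra sparsity comes from.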
## Metrics
- **R²**: 0.9409
- **Dead features**: 38/8192
- **Training steps**: 5000
- **Sparsity coefficient**: 0.02
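
This card does not state how R² or the dead-feature count were computed. The sketch below shows one common way to define both, as an assumption rather than the procedure used here: R² as the fraction of variance explained, pooled over all tokens and output dimensions, and a dead feature as one that never activates on the evaluation tokens. The helper names `r_squared` and `count_dead_features` are hypothetical.

```python
import torch


def r_squared(target: torch.Tensor, prediction: torch.Tensor) -> float:
    """Fraction of variance explained, pooled over tokens and dimensions."""
    target = target.reshape(-1, target.shape[-1])
    prediction = prediction.reshape(-1, prediction.shape[-1])
    residual = (target - prediction).pow(2).sum()
    # Total variance is measured around the per-dimension mean.
    total = (target - target.mean(dim=0)).pow(2).sum()
    return 1.0 - (residual / total).item()


def count_dead_features(features: torch.Tensor) -> int:
    """Number of features that are zero on every token in `features`."""
    flat = features.reshape(-1, features.shape[-1])
    ever_active = (flat > 0).any(dim=0)
    return int((~ever_active).sum().item())


# Toy data: 2 tokens, 2 output dimensions, 3 features.
target = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
perfect = r_squared(target, target)
baseline = r_squared(target, target.mean(dim=0).expand_as(target))
toy_features = torch.tensor([[0.0, 1.0, 0.0], [0.0, 2.0, 0.5]])
dead = count_dead_features(toy_features)  # only the first feature never fires
```

Under this definition a perfect reconstruction scores 1 and predicting the per-dimension mean scores 0.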
## Usage
```python
import json

import torch
from huggingface_hub import hf_hub_download

# Download the weights and config from the Hub (cached after the first call).
weights_path = hf_hub_download("pointbreak3000/gpt2-clt-layer0", "model.pt")
config_path = hf_hub_download("pointbreak3000/gpt2-clt-layer0", "config.json")

with open(config_path) as f:
    config = json.load(f)

# ProperCLT is not defined in this snippet; define or import the transcoder class first.
clt = ProperCLT(d_model=config["d_model"], n_features=config["n_features"])
clt.load_state_dict(torch.load(weights_path, map_location="cpu"))
clt.eval()
```
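
To run the transcoder on real activations, you need the residual stream entering layer 0 and, for comparison, the true MLP output of that layer. The sketch below shows one way to capture both with the `transformers` library, assuming the input is the embedding output (the first entry of `hidden_states`) and the target is whatever the `mlp` submodule of block 0 returns. The helper name `capture_layer0_io` is hypothetical, and the card does not confirm that training used exactly these activation sites.

```python
import torch
from transformers import GPT2Config, GPT2Model


def capture_layer0_io(model: GPT2Model, input_ids: torch.Tensor):
    """Return (residual stream entering block 0, MLP output of block 0)."""
    captured = {}

    def save_mlp_output(module, inputs, output):
        captured["mlp_out"] = output.detach()

    # The forward hook fires each time block 0's MLP runs.
    handle = model.h[0].mlp.register_forward_hook(save_mlp_output)
    try:
        with torch.no_grad():
            outputs = model(input_ids, output_hidden_states=True)
    finally:
        handle.remove()
    # hidden_states[0] is the embedding output, i.e. the input to block 0.
    return outputs.hidden_states[0], captured["mlp_out"]


# Tiny randomly initialised model so the sketch runs without a download;
# swap in GPT2Model.from_pretrained("gpt2") for real activations.
tiny_config = GPT2Config(
    n_layer=1, n_embd=32, n_head=2, vocab_size=100, n_positions=64
)
tiny = GPT2Model(tiny_config).eval()
resid_pre, mlp_out = capture_layer0_io(tiny, torch.randint(0, 100, (2, 16)))
```

With the pretrained model, `resid_pre` has a last dimension of 768 and can be passed to the loaded transcoder, whose output can then be compared against `mlp_out`.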