metadata

language: en
tags:
  - mechanistic-interpretability
  - sparse-autoencoder
  - transcoder
  - gpt2
  - circuit-tracing

GPT-2 Cross-Layer Transcoder — Layer 0

This transcoder is trained to reconstruct the MLP output at layer 0 of GPT-2 from the residual stream before that layer.

Architecture

Input: residual stream before layer 0 (d_model=768)
Output: MLP output at layer 0 (d_model=768)
Features: 8192 (sparse, JumpReLU activation)
Training data: WikiText-103 (5,000 documents, seq_len=64)
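
The shapes listed above can be illustrated with a minimal forward-pass sketch. The class below is a hypothetical stand-in, not the `ProperCLT` class that produced the checkpoint: its parameter names (`encoder`, `decoder`, `log_threshold`) are assumptions and will not match the keys in `model.pt`. It also covers only the forward pass; training a JumpReLU threshold normally needs a straight-through gradient estimator, which is omitted here.

```python
import torch
import torch.nn as nn


class JumpReLUTranscoderSketch(nn.Module):
    """Hypothetical illustration only; NOT the ProperCLT class from this repo."""

    def __init__(self, d_model: int = 768, n_features: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)
        # One threshold per feature, stored in log space so it stays positive.
        # Initialised to log(1) = 0, i.e. a threshold of 1.0 for every feature.
        self.log_threshold = nn.Parameter(torch.zeros(n_features))

    def encode(self, resid: torch.Tensor) -> torch.Tensor:
        pre = self.encoder(resid)
        # JumpReLU: keep the pre-activation where it exceeds the threshold, else 0.
        gate = (pre > self.log_threshold.exp()).to(pre.dtype)
        return pre * gate

    def forward(self, resid: torch.Tensor) -> torch.Tensor:
        # Map residual-stream input to a prediction of the MLP output.
        return self.decoder(self.encode(resid))


sketch = JumpReLUTranscoderSketch()
resid = torch.randn(4, 64, 768)  # (batch, seq_len, d_model), random stand-in data
with torch.no_grad():
    features = sketch.encode(resid)
    mlp_out_hat = sketch(resid)
print(features.shape, mlp_out_hat.shape)
```

Unlike a plain ReLU, the gate zeroes every pre-activation below the threshold, so any feature that is active has a value strictly above its threshold.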

Metrics

R²: 0.9409
Dead features: 38/8192 (about 0.46%)
Training steps: 5000
Sparsity coefficient: 0.02
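
This card does not say how R² and the dead-feature count were computed. The sketch below shows one common convention, as an assumption: R² as the fraction of variance explained, pooled over all tokens and output dimensions, and a dead feature as one that never activates on the evaluation tokens. The helper names `r_squared` and `count_dead_features` are hypothetical.

```python
import torch


def r_squared(target: torch.Tensor, prediction: torch.Tensor) -> float:
    """Fraction of variance explained; inputs have shape (n_tokens, d_model)."""
    residual = (target - prediction).pow(2).sum()
    # Variance is measured around the per-dimension mean of the target.
    total = (target - target.mean(dim=0, keepdim=True)).pow(2).sum()
    return 1.0 - (residual / total).item()


def count_dead_features(feature_acts: torch.Tensor) -> int:
    """Count features that are zero on every token; shape (n_tokens, n_features)."""
    fired = (feature_acts > 0).any(dim=0)
    return int((~fired).sum().item())


# Toy data: 2 tokens, 2 output dimensions, 3 features.
target = torch.tensor([[0.0, 0.0], [2.0, 4.0]])
mean_prediction = target.mean(dim=0, keepdim=True).expand_as(target)
feature_acts = torch.tensor([[0.0, 1.5, 0.0], [0.0, 0.0, 2.0]])

print(r_squared(target, target))           # perfect reconstruction
print(r_squared(target, mean_prediction))  # predicting the mean explains nothing
print(count_dead_features(feature_acts))   # feature 0 never fires
```

Under this convention, a perfect reconstruction scores 1 and always predicting the mean scores 0.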

Usage

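import json

import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint and its config from the Hub (cached after the first call).
weights_path = hf_hub_download("pointbreak3000/gpt2-clt-layer0", "model.pt")
config_path = hf_hub_download("pointbreak3000/gpt2-clt-layer0", "config.json")

with open(config_path) as f:
    config = json.load(f)

# ProperCLT is not defined in this snippet: define or import the model class
# used for training before running the lines below.
clt = ProperCLT(d_model=config["d_model"], n_features=config["n_features"])
clt.load_state_dict(torch.load(weights_path, map_location="cpu"))
clt.eval()  # switch to inference mode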