	---
	language: en
	tags:
	- mechanistic-interpretability
	- sparse-autoencoder
	- transcoder
	- gpt2
	- circuit-tracing
	---

	# GPT-2 Cross-Layer Transcoder — Layer 0

	This cross-layer transcoder (CLT) is trained to reconstruct the MLP output at layer 0 of GPT-2 from the residual stream before that layer.

	## Architecture
	- Input: residual stream before layer 0 (d_model=768)
	- Output: MLP output at layer 0 (d_model=768)
	- Features: 8192 sparse features (JumpReLU activation)
	- Training data: WikiText-103 (5,000 documents, seq_len=64)
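
The architecture above can be sketched as a small PyTorch module. This is a minimal, forward-only illustration under stated assumptions, not the class shipped with this checkpoint: the class name `JumpReLUTranscoderSketch`, the parameter names (`W_enc`, `b_enc`, `W_dec`, `b_dec`, `threshold`), the initial values, and the per-feature threshold are all hypothetical and may not match the keys in `model.pt`.

```python
import torch
import torch.nn as nn


class JumpReLUTranscoderSketch(nn.Module):
    """Hypothetical JumpReLU transcoder: residual stream in, predicted MLP output out."""

    def __init__(self, d_model: int = 768, n_features: int = 8192):
        super().__init__()
        # Encoder projects the residual stream into the feature basis.
        self.W_enc = nn.Parameter(torch.randn(d_model, n_features) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(n_features))
        # Decoder maps feature activations to the predicted MLP output.
        self.W_dec = nn.Parameter(torch.randn(n_features, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))
        # One threshold per feature (assumed parameterisation and initial value).
        self.threshold = nn.Parameter(torch.full((n_features,), 0.03))

    def encode(self, resid: torch.Tensor) -> torch.Tensor:
        pre = resid @ self.W_enc + self.b_enc
        # JumpReLU: keep the pre-activation where it exceeds the threshold, else output zero.
        return pre * (pre > self.threshold).to(pre.dtype)

    def forward(self, resid: torch.Tensor):
        feats = self.encode(resid)
        mlp_pred = feats @ self.W_dec + self.b_dec
        return mlp_pred, feats
```

Calling `JumpReLUTranscoderSketch()(torch.randn(2, 64, 768))` returns a prediction of shape `(2, 64, 768)` and feature activations of shape `(2, 64, 8192)`. The hard gate passes no gradient to `threshold`, so training a model like this would need something extra, such as a straight-through estimator, which the sketch leaves out.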

	## Metrics
	- R²: 0.9409
	- Dead features: 38/8192
	- Training steps: 5000
	- Sparsity coefficient: 0.02
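
The card does not say how R² or the dead-feature count were computed. The sketch below shows one common convention, assumed here rather than taken from the training code: R² as the fraction of variance explained, pooled over all vectors and dimensions, and a dead feature as one that is zero on every evaluated sample. The function names are hypothetical.

```python
def r_squared(targets, predictions):
    """Fraction of variance explained, pooled over all rows and dimensions."""
    n = len(targets)
    dim = len(targets[0])
    # Per-dimension mean of the target vectors.
    means = [sum(row[j] for row in targets) / n for j in range(dim)]
    # Residual sum of squares between targets and predictions.
    ss_res = sum(
        (t - p) ** 2
        for t_row, p_row in zip(targets, predictions)
        for t, p in zip(t_row, p_row)
    )
    # Total sum of squares around the per-dimension means.
    ss_tot = sum((t - means[j]) ** 2 for t_row in targets for j, t in enumerate(t_row))
    return 1.0 - ss_res / ss_tot


def count_dead_features(activations):
    """Count features whose activation is zero on every sample."""
    n_features = len(activations[0])
    return sum(
        1 for j in range(n_features) if all(row[j] == 0 for row in activations)
    )
```

Under this definition, 38 dead features out of 8192 is about 0.46% of the dictionary.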

	## Usage
	```python
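	import json

	import torch
	from huggingface_hub import hf_hub_download

	# Download the checkpoint and its config from the Hub (cached after the first call).
	weights_path = hf_hub_download("pointbreak3000/gpt2-clt-layer0", "model.pt")
	config_path = hf_hub_download("pointbreak3000/gpt2-clt-layer0", "config.json")

	with open(config_path) as f:
	    config = json.load(f)

	# ProperCLT is not defined in this snippet; import or define the transcoder
	# class used for training before this line.
	clt = ProperCLT(d_model=config["d_model"], n_features=config["n_features"])
	clt.load_state_dict(torch.load(weights_path, map_location="cpu"))
	clt.eval()  # switch to inference mode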
	```