Upload README.md with huggingface_hub
README.md
CHANGED
@@ -10,26 +10,25 @@ tags:
 
 # GPT-2 Cross-Layer Transcoder — Layer 0
 
-Trained to reconstruct MLP output at layer 0 of GPT-2 from the residual stream
+Trained to reconstruct MLP output at layer 0 of GPT-2 from the residual stream.
 
 ## Architecture
 - **Input**: residual stream before layer 0 (d_model=768)
 - **Output**: MLP output at layer 0 (d_model=768)
-- **Features**:
+- **Features**: 8192 (JumpReLU sparse)
 - **Training data**: WikiText-103 (5,000 documents, seq_len=64)
 
 ## Metrics
-- **R²**: 0.
-- **Dead features**:
-- **Training steps**:
+- **R²**: 0.9409
+- **Dead features**: 38/8192
+- **Training steps**: 5000
+- **Sparsity coef**: 0.02
 
 ## Usage
 ```python
-import torch
-import json
+import torch, json
 from huggingface_hub import hf_hub_download
 
-# Download
 weights_path = hf_hub_download("pointbreak3000/gpt2-clt-layer0", "model.pt")
 config_path = hf_hub_download("pointbreak3000/gpt2-clt-layer0", "config.json")
 
@@ -40,8 +39,3 @@ clt = ProperCLT(d_model=config["d_model"], n_features=config["n_features"])
 clt.load_state_dict(torch.load(weights_path, map_location="cpu"))
 clt.eval()
 ```
-
-## Purpose
-Part of an experiment extending the circuit-tracing paper
-(Ameisen et al. 2025) to include attention circuit attribution
-via feature-space Jacobians.
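The README in this diff reports a JumpReLU feature layer, an R² score, and a dead-feature count without showing how such quantities are computed. The sketch below is one plausible reading, not the repository's code: it assumes JumpReLU zeroes any pre-activation at or below a threshold, that R² is the pooled fraction of variance explained (1 - SS_res / SS_tot), and that a "dead" feature is one that never activates on the evaluation tokens. The function names `jump_relu`, `r_squared`, and `count_dead_features` are hypothetical, and plain Python lists stand in for tensors to keep the example dependency-free.

```python
# Illustrative sketch only: the names and metric definitions here are
# assumptions, not taken from the repository's training code.

def jump_relu(pre_acts, threshold):
    """Keep a pre-activation unchanged if it exceeds the threshold, else output 0."""
    return [x if x > threshold else 0.0 for x in pre_acts]


def r_squared(targets, predictions):
    """Fraction of variance explained, pooled over every value in the batch."""
    flat_t = [v for row in targets for v in row]
    flat_p = [v for row in predictions for v in row]
    mean_t = sum(flat_t) / len(flat_t)
    ss_res = sum((t - p) ** 2 for t, p in zip(flat_t, flat_p))
    ss_tot = sum((t - mean_t) ** 2 for t in flat_t)
    return 1.0 - ss_res / ss_tot


def count_dead_features(feature_acts):
    """Count features (columns) that are zero on every token (row)."""
    n_features = len(feature_acts[0])
    return sum(
        1
        for j in range(n_features)
        if all(row[j] == 0.0 for row in feature_acts)
    )


# Toy data: 3 tokens, 4 features (the README's transcoder has 8192 features).
pre_acts = [
    [0.5, -1.0, 0.05, 2.0],
    [1.5, -0.2, 0.01, 0.0],
    [0.0, -3.0, 0.09, 0.7],
]
acts = [jump_relu(row, threshold=0.1) for row in pre_acts]
dead = count_dead_features(acts)
```

Under these assumptions, the README's "38/8192" would be the analogue of `dead` divided by the feature count, and the reported R² would compare the transcoder's output against the true layer-0 MLP output. A per-dimension mean in `r_squared` is an equally common convention and would give a somewhat different number.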