Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,45 @@
---
tags:
- sae
- interpretability
- dag
---

# DAG Model for saebench SAE

This repository contains a trained Directed Acyclic Graph (DAG) model for measuring the effective L0 of a Sparse Autoencoder (SAE).

## Model Info

- **SAE Type**: saebench
- **SAE Release**: canrager/saebench_gemma-2-2b_width-2pow14_date-0107
- **SAE ID**: gemma-2-2b_matryoshka_batch_top_k_width-2pow14_date-0107/resid_post_layer_12/trainer_2
- **d_sae**: 16384
- **Tokens Used**: 100,000
- **Effective L0**: 55
- **Actual L0**: 77.9
- **Compression Ratio**: 1.42x
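
The compression ratio above appears to be the actual L0 divided by the effective L0. That definition is an assumption inferred from the reported numbers, not something this README states; a minimal check under that assumption:

```python
# Assumption: compression ratio = actual L0 / effective L0.
# This is inferred from the values in "Model Info", not confirmed by the source.
actual_l0 = 77.9
effective_l0 = 55

compression_ratio = actual_l0 / effective_l0
print(f"{compression_ratio:.2f}x")  # prints "1.42x"
```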

## Files

- `final_model.safetensors`: Trained DAG model (Lambda matrix, b_penalty, feature_order)
- `results.json`: Training metadata and metrics
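
Outside the dashboard, the two files can presumably be fetched and inspected directly. The sketch below assumes `huggingface_hub` and `safetensors` are installed and that the files sit at the repository root. The helper names `summarize_tensors` and `load_dag_files` are hypothetical, and the code lists whatever tensor names the file contains rather than assuming exact keys:

```python
import json


def summarize_tensors(tensors):
    """Map each tensor name to (shape, dtype) without assuming any key names."""
    return {name: (tuple(t.shape), str(t.dtype)) for name, t in tensors.items()}


def load_dag_files(repo_id="TheodoreEhrenborg/dag-saebench-layer12-dlfeidsn"):
    """Download both files from the Hub and return (tensors, results)."""
    # Imported lazily so summarize_tensors works without these packages.
    from huggingface_hub import hf_hub_download
    from safetensors.numpy import load_file

    # Assumption: both files live at the repo root (no subfolder).
    model_path = hf_hub_download(repo_id=repo_id, filename="final_model.safetensors")
    results_path = hf_hub_download(repo_id=repo_id, filename="results.json")

    tensors = load_file(model_path)  # dict of name -> numpy array
    with open(results_path) as f:
        results = json.load(f)
    return tensors, results
```

Calling `tensors, results = load_dag_files()` and then `summarize_tensors(tensors)` shows which tensors are actually stored. If the file uses a dtype NumPy cannot represent, `safetensors.torch.load_file` is an alternative loader.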

## Usage

Use with the Probabilistic SAE Streamlit dashboard:

1. Check "Load pre-trained DAG from HF"
2. DAG model HF repo: `TheodoreEhrenborg/dag-saebench-layer12-dlfeidsn`
3. DAG model subfolder: (leave empty)

The dashboard will automatically load the matching SAE and enable clustering.

## Training Details

Trained using `effective_l0_vanilla.py` with:
- Epochs: 10
- Learning rate: 0.0005
- Batch size: 6400

For more details, see `results.json`.