TheodoreEhrenborg committed 0ef523c (verified; parent: 73000e8): Upload README.md with huggingface_hub

Files changed (1): README.md (+46)
---
tags:
- sae
- interpretability
- dag
---

# DAG Model for saebench SAE

This repository contains a trained Directed Acyclic Graph (DAG) model for measuring the effective L0 of a Sparse Autoencoder.
## Model Info

- **SAE Type**: saebench
- **SAE Release**: canrager/saebench_gemma-2-2b_width-2pow14_date-0107
- **SAE ID**: gemma-2-2b_matryoshka_batch_top_k_width-2pow14_date-0107/resid_post_layer_12/trainer_0
- **d_sae**: 16384
- **Tokens Used**: 10,000,000
- **Effective L0**: 12
- **Actual L0**: 19.7
- **Compression Ratio**: 1.64x
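As a sanity check on the numbers above, the compression ratio is simply the actual L0 divided by the effective L0:

```python
# Compression ratio = actual L0 / effective L0 (values from the table above)
actual_l0 = 19.7
effective_l0 = 12
compression_ratio = actual_l0 / effective_l0
print(f"{compression_ratio:.2f}x")  # → 1.64x
```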
## Files

- `final_model.safetensors`: Trained DAG model (Lambda matrix, b_penalty, feature_order)
- `results.json`: Training metadata and metrics
- `training_curves.png`: Loss curves and training progress visualization
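If you want the weights outside the dashboard, here is a minimal sketch of downloading and opening `final_model.safetensors`. It assumes the tensors are stored under keys matching the names listed above (`Lambda`, `b_penalty`, `feature_order`); the exact key names are an assumption, so inspect the returned dict to confirm.

```python
REPO_ID = "TheodoreEhrenborg/dag-saebench-layer12-sgkrtsth"

def load_dag_model(repo_id=REPO_ID):
    """Download final_model.safetensors from the Hub and return its tensors.

    Imports are kept inside the function so the module can be loaded
    without huggingface_hub/safetensors installed.
    """
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file

    path = hf_hub_download(repo_id=repo_id, filename="final_model.safetensors")
    tensors = load_file(path)  # dict mapping tensor name -> torch.Tensor
    # Expected keys (assumed from the file description above):
    # "Lambda", "b_penalty", "feature_order"
    return tensors
```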
## Usage

Use with the Probabilistic SAE Streamlit dashboard:

1. Check "Load pre-trained DAG from HF"
2. DAG model HF repo: `TheodoreEhrenborg/dag-saebench-layer12-sgkrtsth`
3. DAG model subfolder: (leave empty)

The dashboard will automatically load the matching SAE and enable clustering.
## Training Details

Trained using `effective_l0_vanilla.py` with:

- Epochs: 1
- Learning rate: 0.0005
- Batch size: 6400

For more details, see `results.json`.