Commit c3151fa · verified · 1 Parent(s): 5a95a75
Committed by TheodoreEhrenborg

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +45 -0
README.md ADDED
@@ -0,0 +1,45 @@
+ ---
+ tags:
+ - sae
+ - interpretability
+ - dag
+ ---
+
+ # DAG Model for saebench SAE
+
+ This repository contains a trained Directed Acyclic Graph (DAG) model for measuring effective L0 of a Sparse Autoencoder.
+
+ ## Model Info
+
+ - **SAE Type**: saebench
+ - **SAE Release**: canrager/saebench_gemma-2-2b_width-2pow14_date-0107
+ - **SAE ID**: gemma-2-2b_matryoshka_batch_top_k_width-2pow14_date-0107/resid_post_layer_12/trainer_2
+ - **d_sae**: 16384
+ - **Tokens Used**: 100,000
+ - **Effective L0**: 55
+ - **Actual L0**: 77.9
+ - **Compression Ratio**: 1.42x
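
The README does not define the compression ratio. The reported figures are consistent with it being the actual L0 divided by the effective L0, and the short check below works through that arithmetic under that assumption. The variable names are illustrative only.

```python
# Figures copied from the Model Info list in the README.
actual_l0 = 77.9
effective_l0 = 55

# Assumption (not stated in the README): ratio = actual L0 / effective L0.
compression_ratio = actual_l0 / effective_l0

print(f"{compression_ratio:.2f}x")  # prints "1.42x"
```
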
22
+
23
+ ## Files
24
+
25
+ - `final_model.safetensors`: Trained DAG model (Lambda matrix, b_penalty, feature_order)
26
+ - `results.json`: Training metadata and metrics
27
+
28
+ ## Usage
29
+
30
+ Use with the Probabilistic SAE Streamlit dashboard:
31
+
32
+ 1. Check "Load pre-trained DAG from HF"
33
+ 2. DAG model HF repo: `TheodoreEhrenborg/dag-saebench-layer12-dlfeidsn`
34
+ 3. DAG model subfolder: (leave empty)
35
+
36
+ The dashboard will automatically load the matching SAE and enable clustering.
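
Outside the dashboard, the two files listed under Files can probably be fetched and inspected directly. The following is a minimal sketch, assuming `huggingface_hub`, `safetensors`, and `torch` are installed and that the checkpoint is a standard safetensors file. `download_dag` and `tensor_shapes` are hypothetical helper names, not part of the dashboard, and the sketch makes no assumption about the tensor key names or the layout of `results.json`.

```python
import json

REPO_ID = "TheodoreEhrenborg/dag-saebench-layer12-dlfeidsn"


def download_dag(repo_id=REPO_ID):
    """Fetch both files from the Hub and return (tensors, results).

    Imports are local so that the module can be imported without the
    optional dependencies installed.
    """
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file

    model_path = hf_hub_download(repo_id=repo_id, filename="final_model.safetensors")
    results_path = hf_hub_download(repo_id=repo_id, filename="results.json")

    tensors = load_file(model_path)  # dict mapping tensor name -> torch.Tensor
    with open(results_path) as f:
        results = json.load(f)
    return tensors, results


def tensor_shapes(tensors):
    """Map each stored tensor name to its shape as a plain tuple."""
    return {name: tuple(t.shape) for name, t in tensors.items()}
```

Calling `tensor_shapes(download_dag()[0])` would list whatever tensors the checkpoint actually stores, which is a quick way to confirm the names mentioned in the Files section before relying on them.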
+
+ ## Training Details
+
+ Trained using `effective_l0_vanilla.py` with:
+ - Epochs: 10
+ - Learning rate: 0.0005
+ - Batch size: 6400
+
+ For more details, see `results.json`.