TheodoreEhrenborg committed 0ef523c (verified; parent: 73000e8): Upload README.md with huggingface_hub

Files changed (1): README.md (+46)
---
tags:
- sae
- interpretability
- dag
---

# DAG Model for saebench SAE

This repository contains a trained Directed Acyclic Graph (DAG) model for measuring the effective L0 of a Sparse Autoencoder.
## Model Info

- **SAE Type**: saebench
- **SAE Release**: canrager/saebench_gemma-2-2b_width-2pow14_date-0107
- **SAE ID**: gemma-2-2b_matryoshka_batch_top_k_width-2pow14_date-0107/resid_post_layer_12/trainer_0
- **d_sae**: 16384
- **Tokens Used**: 10,000,000
- **Effective L0**: 12
- **Actual L0**: 19.7
- **Compression Ratio**: 1.64x
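As a sanity check on the numbers above, the compression ratio is simply the actual L0 divided by the effective L0:

```python
# Compression ratio = actual L0 / effective L0 (values from the table above)
actual_l0 = 19.7
effective_l0 = 12
compression_ratio = actual_l0 / effective_l0
print(f"{compression_ratio:.2f}x")  # → 1.64x
```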
## Files

- `final_model.safetensors`: Trained DAG model (Lambda matrix, b_penalty, feature_order)
- `results.json`: Training metadata and metrics
- `training_curves.png`: Loss curves and training progress visualization
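If you want the weights outside the dashboard, here is a minimal sketch of downloading and opening `final_model.safetensors`. It assumes the tensors are stored under keys matching the names listed above (`Lambda`, `b_penalty`, `feature_order`); the exact key names are an assumption, so inspect the returned dict to confirm.

```python
REPO_ID = "TheodoreEhrenborg/dag-saebench-layer12-sgkrtsth"

def load_dag_model(repo_id=REPO_ID):
    """Download final_model.safetensors from the Hub and return its tensors.

    Imports are kept inside the function so the module can be loaded
    without huggingface_hub/safetensors installed.
    """
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file

    path = hf_hub_download(repo_id=repo_id, filename="final_model.safetensors")
    tensors = load_file(path)  # dict mapping tensor name -> torch.Tensor
    # Expected keys (assumed from the file description above):
    # "Lambda", "b_penalty", "feature_order"
    return tensors
```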
## Usage

Use with the Probabilistic SAE Streamlit dashboard:

1. Check "Load pre-trained DAG from HF"
2. DAG model HF repo: `TheodoreEhrenborg/dag-saebench-layer12-sgkrtsth`
3. DAG model subfolder: (leave empty)

The dashboard will automatically load the matching SAE and enable clustering.
## Training Details

Trained using `effective_l0_vanilla.py` with:

- Epochs: 1
- Learning rate: 0.0005
- Batch size: 6400

For more details, see `results.json`.