DAG Model for saebench SAE
This repository contains a trained Directed Acyclic Graph (DAG) model for measuring the effective L0 of a Sparse Autoencoder (SAE).
Model Info
- SAE Type: saebench
- SAE Release: canrager/saebench_gemma-2-2b_width-2pow14_date-0107
- SAE ID: gemma-2-2b_matryoshka_batch_top_k_width-2pow14_date-0107/resid_post_layer_12/trainer_3
- d_sae: 16384
- Tokens Used: 10,000,000
- Effective L0: 64
- Actual L0: 159.6
- Compression Ratio: 2.49x
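The numbers above are consistent with the compression ratio being the actual L0 divided by the effective L0 (159.6 / 64 ≈ 2.49). The sketch below reproduces that arithmetic and shows the usual SAE definition of L0 as the mean number of nonzero latents per token. It assumes that ratio definition, and `mean_l0` and `compression_ratio` are hypothetical helpers, not functions from effective_l0_vanilla.py.

```python
from typing import Sequence


def mean_l0(activations: Sequence[Sequence[float]]) -> float:
    """Average number of nonzero SAE latents per token (the usual L0 metric)."""
    if not activations:
        raise ValueError("need at least one token")
    counts = [sum(1 for a in token if a != 0.0) for token in activations]
    return sum(counts) / len(counts)


def compression_ratio(actual_l0: float, effective_l0: float) -> float:
    """Raw active-latent count divided by the DAG's effective L0 (assumed definition)."""
    return actual_l0 / effective_l0


# Values taken from the Model Info list.
print(round(compression_ratio(159.6, 64), 2))  # 2.49

# Toy example: three tokens over a 4-latent SAE (2, 1 and 3 active latents).
toy = [[0.0, 1.2, 0.0, 0.3], [0.5, 0.0, 0.0, 0.0], [0.1, 0.2, 0.3, 0.0]]
print(mean_l0(toy))  # 2.0
```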
Files
- final_model.safetensors: Trained DAG model (Lambda matrix, b_penalty, feature_order)
- results.json: Training metadata and metrics
- training_curves.png: Loss curves and training progress visualization
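To check which tensors final_model.safetensors actually holds without installing torch, you can read just its header. This is a minimal standard-library sketch based on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). The exact tensor key names for the Lambda matrix, b_penalty and feature_order are not confirmed here, so the function simply lists whatever keys are present; both function names are hypothetical.

```python
import json
import struct
from typing import Dict, List, Tuple


def parse_safetensors_header(prefix: bytes) -> Dict[str, Tuple[str, List[int]]]:
    """Map each tensor name to (dtype, shape) from the start of a .safetensors file.

    `prefix` must contain the 8-byte length field plus the full JSON header.
    The optional "__metadata__" entry is skipped because it is not a tensor.
    """
    (header_len,) = struct.unpack("<Q", prefix[:8])
    header = json.loads(prefix[8 : 8 + header_len].decode("utf-8"))
    return {
        name: (info["dtype"], info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }


def read_safetensors_header(path: str) -> Dict[str, Tuple[str, List[int]]]:
    """Read only the header bytes from disk; tensor data is never loaded."""
    with open(path, "rb") as f:
        size_field = f.read(8)
        (header_len,) = struct.unpack("<Q", size_field)
        return parse_safetensors_header(size_field + f.read(header_len))
```

For example, `read_safetensors_header("final_model.safetensors")` lists the stored tensors with their dtypes and shapes. To load the tensors themselves, the `safetensors` package provides `safetensors.torch.load_file`.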
Usage
Use with the Probabilistic SAE Streamlit dashboard:
- Check "Load pre-trained DAG from HF"
- DAG model HF repo: TheodoreEhrenborg/dag-saebench-layer12-lcfftyps
- DAG model subfolder: (leave empty)
The dashboard will automatically load the matching SAE and enable clustering.
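Outside the dashboard, the same files can be fetched with the `huggingface_hub` client. This is a minimal sketch assuming `huggingface_hub` is installed; `download_dag_files` is a hypothetical helper, not part of the dashboard, and a `subfolder` of `None` corresponds to leaving the subfolder field empty.

```python
from typing import Dict, Optional

# Repo ID from the Usage steps; file names from the Files list.
DAG_REPO_ID = "TheodoreEhrenborg/dag-saebench-layer12-lcfftyps"
DAG_FILES = ("final_model.safetensors", "results.json", "training_curves.png")


def download_dag_files(
    repo_id: str = DAG_REPO_ID, subfolder: Optional[str] = None
) -> Dict[str, str]:
    """Download the DAG artifacts and return {file_name: local_cache_path}.

    huggingface_hub is imported lazily so this module can be imported
    (and the constants reused) even where the package is not installed.
    """
    from huggingface_hub import hf_hub_download

    return {
        name: hf_hub_download(repo_id=repo_id, filename=name, subfolder=subfolder)
        for name in DAG_FILES
    }
```

Calling `download_dag_files()["results.json"]` returns a local path that can be opened with the standard `json` module.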
Training Details
Trained using effective_l0_vanilla.py with:
- Epochs: 5
- Learning rate: 0.0005
- Batch size: 6400
For more details, see results.json.
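As a rough sanity check on these settings, the number of optimizer steps follows from the token count, batch size and epoch count. This sketch assumes each token's SAE activation is one training example and that all 10,000,000 tokens are seen once per epoch; neither assumption is confirmed by effective_l0_vanilla.py, and `optimizer_steps` is a hypothetical helper.

```python
def optimizer_steps(num_tokens: int, batch_size: int, epochs: int, drop_last: bool = False) -> int:
    """Total optimizer steps, assuming one training example per token.

    With drop_last=False the final partial batch of each epoch still counts
    as a step (ceiling division); with drop_last=True it is discarded.
    """
    if drop_last:
        per_epoch = num_tokens // batch_size
    else:
        per_epoch = (num_tokens + batch_size - 1) // batch_size
    return per_epoch * epochs


# 10,000,000 / 6400 = 1562.5 batches per epoch.
print(optimizer_steps(10_000_000, 6400, 5))                  # 7815
print(optimizer_steps(10_000_000, 6400, 5, drop_last=True))  # 7810
```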