DAG Model for a Gemma Scope SAE

This repository contains a trained Directed Acyclic Graph (DAG) model for measuring the effective L0 of a Sparse Autoencoder (SAE).

Model Info

  • SAE Type: gemmascope
  • SAE Release: gemma-scope-2b-pt-res
  • SAE ID: layer_12/width_16k/average_l0_41
  • d_sae: 16384
  • Tokens Used: 10,000,000
  • Effective L0: 25
  • Actual L0: 47.7
  • Compression Ratio: 1.91x
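The compression ratio above follows directly from the two L0 figures: 47.7 / 25 ≈ 1.91. As an illustrative sketch (not the repository's actual measurement code), actual L0 is the mean number of nonzero SAE latents per token:

```python
def average_l0(acts):
    """Mean number of nonzero latents per token.

    acts: list of per-token activation vectors (each of length d_sae).
    """
    return sum(sum(1 for a in row if a != 0.0) for row in acts) / len(acts)

# Toy activations: 3 tokens, 5 latents each.
acts = [
    [0.0, 1.2, 0.0, 0.0, 3.4],  # 2 active
    [0.5, 0.0, 0.0, 0.0, 0.0],  # 1 active
    [0.0, 0.0, 2.2, 0.1, 0.0],  # 2 active
]
print(average_l0(acts))  # 5 active latents over 3 tokens

# The reported compression ratio is actual L0 / effective L0:
print(round(47.7 / 25, 2))  # -> 1.91
```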

Files

  • final_model.safetensors: Trained DAG model (Lambda matrix, b_penalty, feature_order)
  • results.json: Training metadata and metrics
  • training_curves.png: Loss curves and training progress visualization
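final_model.safetensors stores the components listed above (Lambda matrix, b_penalty, feature_order). As a minimal stdlib-only sketch, you can inspect a .safetensors file's header to list tensor names and shapes without loading the weights; the exact tensor names inside this repo's file are an assumption:

```python
import json
import struct

def read_safetensors_header(path):
    """List tensor names, dtypes, and shapes from a .safetensors file.

    The format is an 8-byte little-endian header length, followed by a
    JSON header, followed by the raw tensor data.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional non-tensor entry; drop it.
    return {k: v for k, v in header.items() if k != "__metadata__"}

# Usage (after downloading the file from this repo):
# for name, info in read_safetensors_header("final_model.safetensors").items():
#     print(name, info["dtype"], info["shape"])
```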

Usage

Use with the Probabilistic SAE Streamlit dashboard:

  1. Check "Load pre-trained DAG from HF"
  2. DAG model HF repo: TheodoreEhrenborg/dag-gemmascope-layer12-vtxpgpsb
  3. DAG model subfolder: (leave empty)

The dashboard will automatically load the matching SAE and enable clustering.

Training Details

Trained using effective_l0_vanilla.py with:

  • Epochs: 1
  • Learning rate: 0.0005
  • Batch size: 6400
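With 10,000,000 tokens and a batch size of 6400, one epoch corresponds to roughly 1,562 full batches (assuming any final partial batch is dropped; the actual trainer's behavior isn't specified here). A quick sanity check:

```python
tokens = 10_000_000
batch_size = 6400
full_batches = tokens // batch_size  # floor division drops the partial batch
print(full_batches)  # -> 1562
```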

For more details, see results.json.
