DAG Model for saebench SAE

This repository contains a trained Directed Acyclic Graph (DAG) model for measuring the effective L0 of a Sparse Autoencoder (SAE).

Model Info

  • SAE Type: saebench
  • SAE Release: canrager/saebench_gemma-2-2b_width-2pow14_date-0107
  • SAE ID: gemma-2-2b_matryoshka_batch_top_k_width-2pow14_date-0107/resid_post_layer_12/trainer_3
  • d_sae: 16384
  • Tokens Used: 10,000,000
  • Effective L0: 64
  • Actual L0: 159.6
  • Compression Ratio: 2.49x
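The reported compression ratio is consistent with dividing the actual L0 by the effective L0. The sketch below checks that arithmetic; the formula is an assumption inferred from the numbers above, not something this card states, so treat results.json as the authoritative source.

```python
# Values copied from the Model Info list above.
effective_l0 = 64
actual_l0 = 159.6

# Assumed definition: compression ratio = actual L0 / effective L0.
compression_ratio = actual_l0 / effective_l0

print(f"Compression ratio: {compression_ratio:.2f}x")  # Compression ratio: 2.49x
```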

Files

  • final_model.safetensors: Trained DAG model (Lambda matrix, b_penalty, feature_order)
  • results.json: Training metadata and metrics
  • training_curves.png: Loss curves and training progress visualization
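To inspect these files outside the dashboard, something like the following may work. It is a minimal sketch, not the author's loading code: `fetch` and `read_safetensors_header` are hypothetical helper names, and the download step assumes the `huggingface_hub` package is installed. The header reader relies only on the public safetensors layout (an 8-byte little-endian header length followed by a JSON header), so it lists the stored tensor names, dtypes, and shapes without assuming what the keys are called.

```python
import json
import struct

# Repo id taken from the Usage section of this card.
REPO_ID = "TheodoreEhrenborg/dag-saebench-layer12-lcfftyps"


def fetch(filename: str) -> str:
    """Download one file from the repo and return its local cache path."""
    # Imported lazily so the header reader below has no third-party dependency.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=filename)


def read_safetensors_header(path) -> dict:
    """Return {tensor_name: {"dtype", "shape", "data_offsets"}} without loading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # Optional free-form metadata entry; it does not describe a tensor.
    header.pop("__metadata__", None)
    return header
```

For example, `read_safetensors_header(fetch("final_model.safetensors"))` should list the stored tensors, and `json.load(open(fetch("results.json")))` reads the training metadata. To load the tensor values themselves, `safetensors.torch.load_file` is one option if PyTorch is available.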

Usage

Use with the Probabilistic SAE Streamlit dashboard:

  1. Check "Load pre-trained DAG from HF".
  2. Set "DAG model HF repo" to TheodoreEhrenborg/dag-saebench-layer12-lcfftyps.
  3. Leave "DAG model subfolder" empty.

The dashboard will automatically load the matching SAE and enable clustering.

Training Details

Trained using effective_l0_vanilla.py with:

  • Epochs: 5
  • Learning rate: 0.0005
  • Batch size: 6400
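As a rough sense of scale, the settings above imply a few thousand optimizer steps. The sketch below works that out under two assumptions that this card does not confirm: the batch size counts tokens (one activation vector per token), and every epoch passes over all 10,000,000 tokens, keeping the final partial batch.

```python
import math

# Values copied from the Model Info and Training Details lists.
tokens = 10_000_000
batch_size = 6400
epochs = 5

# Assumption: batch size is measured in tokens and the last partial batch is kept.
full_batches, leftover = divmod(tokens, batch_size)
steps_per_epoch = math.ceil(tokens / batch_size)
total_steps = steps_per_epoch * epochs

print(full_batches, leftover)        # 1562 3200
print(steps_per_epoch, total_steps)  # 1563 7815
```

If the training script drops the final partial batch instead, the count would be 1562 steps per epoch.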

For more details, see results.json.
