Trained Bias Term for PSAE

This repository contains a trained bias vector (b) for a PSAE's logistic model.

The Lambda matrix from the original PSAE was kept frozen; only the bias term was optimized, to maximize the log-likelihood of the discrete activations.

Model Info

  • PSAE Release: aemack-org/bsr-sae-16k-sweep
  • SAE ID: d16384_dag_1_C0_01
  • d_sae: 16384
  • Layer: 12
  • Tokens Used: 10,000,000
  • Effective L0: 265
  • Actual L0: 553.4
  • Compression Ratio: 2.09x
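
The reported compression ratio is numerically consistent with dividing the actual L0 by the effective L0. The short check below assumes that definition; the model card does not state the formula, so treat it as an inference from the numbers rather than a documented fact.

```python
# Values copied from the Model Info list above.
effective_l0 = 265
actual_l0 = 553.4

# Assumption: compression ratio = actual L0 / effective L0.
# This formula is inferred from the reported numbers, not stated in the card.
compression_ratio = actual_l0 / effective_l0
print(f"{compression_ratio:.2f}x")
```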

Files

  • trained_b.safetensors: Trained bias vector (b) and feature_order
  • results.json: Training metadata and metrics
  • training_curves.png: Loss curves and training progress visualization

Usage

Load the trained bias vector and the accompanying feature order:

from safetensors.torch import load_file

state_dict = load_file("trained_b.safetensors")
b = state_dict["b"]  # Shape: (d_sae,)
feature_order = state_dict["feature_order"]  # Shape: (d_sae,)

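For inference, combine b with the lambda_matrix from the original PSAE release. The Lambda matrix is not among the files listed above and must be loaded from that release.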
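
As a rough illustration of how a bias vector and a frozen Lambda matrix can combine in a logistic model, the sketch below scores binary activations under a Bernoulli likelihood. The logit form `z @ lambda_matrix.T + b` and the function name `bernoulli_log_likelihood` are assumptions made for this example and are not taken from the PSAE code; check the original release for the actual parameterization and for how feature_order is applied.

```python
import numpy as np

def bernoulli_log_likelihood(z, lambda_matrix, b):
    """Mean per-sample log-likelihood of binary activations z.

    Hypothetical model (an assumption, not the PSAE's documented form):
    logits = z @ lambda_matrix.T + b, with one Bernoulli per feature.
    """
    logits = z @ lambda_matrix.T + b
    # Numerically stable log-sigmoid:
    #   log sigmoid(x)       = -logaddexp(0, -x)
    #   log (1 - sigmoid(x)) = -logaddexp(0,  x)
    log_p = -np.logaddexp(0.0, -logits)
    log_not_p = -np.logaddexp(0.0, logits)
    per_sample = np.sum(z * log_p + (1.0 - z) * log_not_p, axis=-1)
    return float(np.mean(per_sample))

# Toy example with 4 features; real tensors would have d_sae = 16384.
z = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
lambda_matrix = np.zeros((4, 4))  # stands in for the frozen Lambda matrix
b = np.zeros(4)                   # stands in for the trained bias
ll = bernoulli_log_likelihood(z, lambda_matrix, b)
```

With all-zero parameters every feature has probability 0.5, so the toy value is 4 · log(0.5) per sample. In a bias-only training setup, `lambda_matrix` would stay fixed while `b` is adjusted to raise this quantity.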

Training Details

Trained using train_psae_bias.py with:

  • Epochs trained: 28 (max: 50)
  • Early stopping: plateau_epochs=10
  • Learning rate: 0.0005
  • Batch size: 12800
  • Lambda matrix: FIXED (from PSAE)
  • Trainable parameters: b only
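
The early-stopping settings above can be read as a plateau rule: stop once the loss has not improved for plateau_epochs consecutive epochs. The sketch below is one common way to implement such a rule and is not taken from train_psae_bias.py; the function name and the synthetic loss sequence are hypothetical.

```python
def run_with_plateau_stopping(epoch_losses, max_epochs=50, plateau_epochs=10):
    """Return (epochs_run, best_epoch) for a sequence of per-epoch losses.

    Stops when the best loss has not improved for `plateau_epochs` epochs,
    or when `max_epochs` is reached. Epochs are counted from 1.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_run = 0
    for epoch, loss in enumerate(epoch_losses[:max_epochs], start=1):
        epochs_run = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= plateau_epochs:
            break  # no improvement for plateau_epochs epochs
    return epochs_run, best_epoch

# Synthetic losses (not the real training curve): improve for 18 epochs, then stall.
losses = [1.0 / e for e in range(1, 19)] + [1.0] * 32
epochs_run, best_epoch = run_with_plateau_stopping(losses)
print(epochs_run, best_epoch)
```

Under this particular rule, a run that stops at epoch 28 with plateau_epochs=10 would have seen its last improvement at epoch 18. The actual script may define the plateau differently, so results.json remains the authoritative record.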

For more details, see results.json.
