Trained Bias Term for PSAE

This repository contains a trained bias vector (b) for the logistic model of a PSAE.

The Lambda matrix of the original PSAE was kept frozen; only the bias term was optimized, to maximize the log-likelihood of the discrete activations.
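As a rough illustration of what "optimizing only the bias" means, the sketch below fits a per-feature bias by gradient ascent on a Bernoulli log-likelihood while the other logit term stays fixed. It is a toy under stated assumptions, not the code in train_psae_bias.py: the names `fixed_logits`, `targets`, and `fit_bias` are hypothetical, activations are assumed to be binary, and the fixed term stands in for whatever the frozen Lambda matrix contributes.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def log_likelihood(fixed_logits, targets, b):
    # Sum of Bernoulli log-likelihoods over samples and features.
    total = 0.0
    for row_s, row_y in zip(fixed_logits, targets):
        for s, y, bj in zip(row_s, row_y, b):
            p = min(max(sigmoid(s + bj), 1e-12), 1.0 - 1e-12)
            total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_bias(fixed_logits, targets, lr=0.5, steps=200):
    # Gradient ascent on the mean log-likelihood with respect to b only.
    # The fixed logits (assumed to come from the frozen part) never change.
    n, d = len(targets), len(targets[0])
    b = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for row_s, row_y in zip(fixed_logits, targets):
            for j in range(d):
                grad[j] += row_y[j] - sigmoid(row_s[j] + b[j])
        b = [bj + lr * g / n for bj, g in zip(b, grad)]
    return b


# Toy data: 4 samples, 2 features, fixed logits all zero.
# Feature 0 is active in 1 of 4 samples, feature 1 in 3 of 4.
fixed_logits = [[0.0, 0.0] for _ in range(4)]
targets = [[1, 1], [0, 1], [0, 1], [0, 0]]
b = fit_bias(fixed_logits, targets)
```

With zero fixed logits, the fitted bias approaches the log-odds of each feature's firing rate, which is the maximum-likelihood solution in this toy setting.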

Model Info

PSAE Release: aemack-org/bsr-sae-16k-sweep
SAE ID: d16384_dag_1_C0_01
d_sae: 16384
Layer: 12
Tokens Used: 10,000,000
Effective L0: 265
Actual L0: 553.4
Compression Ratio: 2.09x
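
The reported compression ratio is numerically consistent with Actual L0 divided by Effective L0. That relationship is inferred from the numbers above rather than stated by the card, so treat the check below as an assumption:

```python
# Values copied from the Model Info list above.
effective_l0 = 265
actual_l0 = 553.4

# Assumed definition: ratio of actual to effective L0.
compression_ratio = actual_l0 / effective_l0
print(f"{compression_ratio:.2f}x")  # prints "2.09x"
```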

Files

trained_b.safetensors: Trained bias vector (b) and feature_order
results.json: Training metadata and metrics
training_curves.png: Loss curves and training progress visualization

Usage

Load the trained bias vector:

from safetensors.torch import load_file

state_dict = load_file("trained_b.safetensors")
b = state_dict["b"]  # Shape: (d_sae,)
feature_order = state_dict["feature_order"]  # Shape: (d_sae,)

For inference, combine b with the lambda_matrix from the original PSAE release; the lambda_matrix is not among the files listed above.

Training Details

The bias was trained with train_psae_bias.py using the following settings:

Epochs trained: 28 (max: 50)
Early stopping: plateau_epochs=10
Learning rate: 0.0005
Batch size: 12800
Lambda matrix: FIXED (from PSAE)
Trainable parameters: b only
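
The settings above imply that training can end before the maximum epoch count once the loss stops improving. A minimal sketch of one common plateau rule follows, assuming "plateau_epochs" means the number of consecutive epochs without a new best loss. The function name `epochs_until_stop` and the loss values are hypothetical and synthetic; the actual criterion in train_psae_bias.py may differ.

```python
def epochs_until_stop(losses, plateau_epochs=10, max_epochs=50):
    """Return the epoch at which training stops under a simple plateau rule."""
    best = float("inf")
    since_best = 0
    for epoch, loss in enumerate(losses[:max_epochs], start=1):
        if loss < best:
            # New best loss: reset the plateau counter.
            best = loss
            since_best = 0
        else:
            since_best += 1
        if since_best >= plateau_epochs:
            return epoch
    # No plateau detected: stop at the last available epoch.
    return min(len(losses), max_epochs)


# Synthetic losses: improve for 5 epochs, then stay flat.
synthetic_losses = [1.0 - 0.1 * i for i in range(5)] + [0.7] * 45
stopped_at = epochs_until_stop(synthetic_losses)
```

In this synthetic run the best loss occurs at epoch 5, so the rule stops 10 epochs later, at epoch 15.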

For more details, see results.json.
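
Since the card does not document the schema of results.json, a cautious first step is to list its top-level keys and value types before relying on any particular field. The helper below is a hypothetical convenience, not part of the repository, and assumes the file holds a JSON object:

```python
import json


def top_level_summary(path):
    """Map each top-level key of a JSON object file to its value's type name."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    return {key: type(value).__name__ for key, value in data.items()}


# Example usage, assuming results.json is in the working directory:
# for key, type_name in top_level_summary("results.json").items():
#     print(f"{key}: {type_name}")
```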
