Trained Bias Term for PSAE
This repository contains a trained bias vector (b) for a PSAE's logistic model.
The Lambda matrix from the original PSAE was frozen, and only the bias term was optimized to maximize the log-likelihood of the discrete activations.
Model Info
- PSAE Release: aemack-org/bsr-sae-16k-sweep
- SAE ID: d16384_dag_1_C0_01
- d_sae: 16384
- Layer: 12
- Tokens Used: 10,000,000
- Effective L0: 265
- Actual L0: 553.4
- Compression Ratio: 2.09x
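The listed compression ratio matches Actual L0 divided by Effective L0. That reading is inferred from the numbers above rather than stated by the training script, so treat the short check below as an assumption about how the metric is defined.

```python
# Values copied from the Model Info list above.
effective_l0 = 265
actual_l0 = 553.4

# ASSUMPTION: compression ratio = Actual L0 / Effective L0.
ratio = actual_l0 / effective_l0
print(f"{ratio:.2f}x")  # prints 2.09x
```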
Files
- trained_b.safetensors: Trained bias vector (b) and feature_order
- results.json: Training metadata and metrics
- training_curves.png: Loss curves and training progress visualization
Usage
Load the trained bias vector:
from safetensors.torch import load_file
state_dict = load_file("trained_b.safetensors")
b = state_dict["b"] # Shape: (d_sae,)
feature_order = state_dict["feature_order"] # Shape: (d_sae,)
For inference, use b together with the original PSAE's lambda_matrix.
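Before combining the loaded tensors with the original PSAE, it may help to sanity-check them. The sketch below is not part of this repository: `check_bias_checkpoint` is a hypothetical helper, and the permutation check rests on the assumption that feature_order holds each feature index from 0 to d_sae - 1 exactly once, which this README does not confirm.

```python
def check_bias_checkpoint(b, feature_order, d_sae=16384):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if len(b) != d_sae:
        problems.append(f"b has {len(b)} entries, expected {d_sae}")
    if len(feature_order) != d_sae:
        problems.append(f"feature_order has {len(feature_order)} entries, expected {d_sae}")
    # ASSUMPTION: feature_order is a permutation of 0..d_sae-1.
    if sorted(int(i) for i in feature_order) != list(range(d_sae)):
        problems.append("feature_order is not a permutation of range(d_sae)")
    return problems


# Toy example with d_sae=4.
print(check_bias_checkpoint([0.1, -0.3, 0.0, 1.2], [2, 0, 3, 1], d_sae=4))  # prints []
```

With the real file, the same helper could be called as `check_bias_checkpoint(b.tolist(), feature_order.tolist())`.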
Training Details
Trained using train_psae_bias.py with:
- Epochs trained: 28 (max: 50)
- Early stopping: plateau_epochs=10
- Learning rate: 0.0005
- Batch size: 12800
- Lambda matrix: FIXED (from PSAE)
- Trainable parameters: b only
For more details, see results.json.
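The setup above (frozen Lambda, trainable b, plateau-based early stopping) can be illustrated with a toy, dependency-free sketch. This is not train_psae_bias.py: it fits a single scalar bias instead of a d_sae-sized vector, uses plain gradient descent because the README does not name the optimizer, and treats `fixed_logits` as a stand-in for whatever the frozen Lambda matrix contributes to each logit. The names `fit_bias`, `fixed_logits`, and `tol` are hypothetical, and `lr=2.0` is chosen for the toy problem rather than taken from the 0.0005 listed above. Only `max_epochs=50` and `plateau_epochs=10` mirror the listed settings.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mean_nll(fixed_logits, targets, b):
    """Mean Bernoulli negative log-likelihood for logits fixed_logit + b."""
    total = sum(softplus(z + b) - y * (z + b) for z, y in zip(fixed_logits, targets))
    return total / len(targets)


def fit_bias(fixed_logits, targets, lr=2.0, max_epochs=50, plateau_epochs=10, tol=1e-6):
    """Fit only the bias; the fixed logits (frozen part of the model) never change."""
    b = 0.0
    best_b, best_nll, stale = b, mean_nll(fixed_logits, targets, b), 0
    epochs_run = 0
    for epoch in range(1, max_epochs + 1):
        epochs_run = epoch
        # Gradient of the mean NLL with respect to b is mean(sigmoid(logit) - target).
        grad = sum(sigmoid(z + b) - y for z, y in zip(fixed_logits, targets)) / len(targets)
        b -= lr * grad
        nll = mean_nll(fixed_logits, targets, b)
        if nll < best_nll - tol:
            best_b, best_nll, stale = b, nll, 0
        else:
            stale += 1
            if stale >= plateau_epochs:  # no improvement for plateau_epochs epochs
                break
    return best_b, best_nll, epochs_run


# Toy data: the frozen part contributes a logit of 0, and 75% of activations are "on".
b_hat, nll, epochs_run = fit_bias([0.0] * 8, [1, 1, 1, 0, 1, 1, 1, 0])
print(f"b = {b_hat:.3f}, mean NLL = {nll:.4f}, epochs run = {epochs_run}")
```

On this toy data the fitted bias should approach log(3), the logit of 0.75, and the loop stops before reaching `max_epochs` once the loss plateaus, which is the same pattern as stopping at epoch 28 of a possible 50.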