# Audio-SAE
BatchTop-K Sparse Autoencoders trained on every transformer layer of `facebook/hubert-large-ll60k`, from the paper *AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders* (EACL 2026).
Each SAE decomposes the residual stream at one encoder layer into a sparse, largely interpretable dictionary of features.
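The BatchTop-K mechanism itself is simple to sketch. The snippet below is an illustrative reimplementation, not the repository's code: `batch_topk_encode` is a hypothetical helper, and the weight shapes follow the table below (1024-dim activations, 8192-dim dictionary, k = 50). Unlike plain Top-K, which keeps k features per token, BatchTop-K keeps the k × N largest activations across the whole batch of N frames.

```python
import torch

def batch_topk_encode(x, W_enc, b_enc, b_dec, k):
    # Sketch of a BatchTop-K SAE encoder (illustrative, not the repo's code).
    # x: (N, activation_dim); returns sparse features of shape (N, dict_size).
    pre = torch.relu((x - b_dec) @ W_enc.T + b_enc)  # (N, dict_size)
    n_keep = k * x.shape[0]                          # budget shared across the batch
    if n_keep < pre.numel():
        # Zero out everything below the n_keep-th largest activation.
        thresh = pre.flatten().topk(n_keep).values[-1]
        pre = torch.where(pre >= thresh, pre, torch.zeros_like(pre))
    return pre

# Toy dimensions mirroring the table: 1024 -> 8192, k = 50
torch.manual_seed(0)
x = torch.randn(4, 1024)
W_enc = torch.randn(8192, 1024) * 0.02
f = batch_topk_encode(x, W_enc, torch.zeros(8192), torch.zeros(1024), k=50)
print(f.shape)                # torch.Size([4, 8192])
print(int((f > 0).sum()))     # at most 4 * 50 = 200 nonzero features
```

Because the budget is shared across frames, quiet frames can use fewer features while information-dense frames use more, at the same average sparsity.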
| Backbone | Activation dim | Dict size | Expansion | k | Layers |
|---|---|---|---|---|---|
| HuBERT-large | 1024 | 8192 | 8× | 50 | 24 |
One SAE per encoder layer (`layer_1` … `layer_24`). Layer indices are 1-based: `layer_n` corresponds to the output of the n-th transformer block.
```
layer_1/
  ae.pt        # BatchTopKSAE state_dict
  config.json  # training config (activation_dim, dict_size, k, …)
layer_2/
…
layer_24/
```
Each `ae.pt` contains `encoder.{weight,bias}`, `decoder.weight`, `b_dec`, and `k`.
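A quick sanity check on a loaded checkpoint can catch layer/shape mix-ups early. The sketch below builds a dummy state dict with the documented keys; the `Linear`-style shapes (out_features first) and the `check_sae_state` helper are assumptions, not the repository's API.

```python
import torch

# Dummy stand-in for torch.load("layer_12/ae.pt", map_location="cpu").
# Key names come from the card above; shapes assume nn.Linear conventions.
state = {
    "encoder.weight": torch.zeros(8192, 1024),
    "encoder.bias":   torch.zeros(8192),
    "decoder.weight": torch.zeros(1024, 8192),
    "b_dec":          torch.zeros(1024),
    "k":              torch.tensor(50),
}

def check_sae_state(state, activation_dim=1024, dict_size=8192):
    # Verify the tensors match the documented activation/dictionary sizes.
    assert state["encoder.weight"].shape == (dict_size, activation_dim)
    assert state["encoder.bias"].shape == (dict_size,)
    assert state["decoder.weight"].shape == (activation_dim, dict_size)
    assert state["b_dec"].shape == (activation_dim,)
    return True

print(check_sae_state(state))  # True
```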
```python
import torch
import librosa
from huggingface_hub import hf_hub_download

from audio_sae import BatchTopKSAE
from audio_sae.models.hubert import MyHubert

device = "cuda" if torch.cuda.is_available() else "cpu"
layer = 12

# 1. HuBERT-large encoder, tapped after `layer`
hubert = MyHubert("facebook/hubert-large-ll60k", sae_after_layer=layer).to(device).eval()

# 2. Matching SAE
ckpt = hf_hub_download(
    repo_id="Egorgij21/Audio-SAE-hubert-large",
    filename=f"layer_{layer}/ae.pt",
)
sae = BatchTopKSAE.from_pretrained(ckpt, device=device)

# 3. Run on audio
wav, _ = librosa.load("example.wav", sr=16000, mono=True)
wav = torch.from_numpy(wav).unsqueeze(0).to(device)

with torch.no_grad():
    acts = hubert(wav)                               # (1, T, 1024)
    features = sae.encode(acts, use_threshold=True)  # (1, T, 8192), sparse
```
See the GitHub repo for a full inference and interpretability walkthrough.
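As a first interpretability step, a common thing to do with the `(1, T, 8192)` feature tensor is to rank dictionary features by how often they fire across a clip. A minimal sketch, using a dummy tensor in place of real `sae.encode(...)` output (feature indices 7 and 42 are arbitrary examples):

```python
import torch

# Dummy stand-in for the sparse SAE features of shape (1, T, dict_size).
features = torch.zeros(1, 100, 8192)
features[0, :, 7] = 1.0     # pretend feature 7 fires on every frame
features[0, ::2, 42] = 0.5  # pretend feature 42 fires on half the frames

# Fraction of frames on which each dictionary feature is active.
fire_rate = (features[0] > 0).float().mean(dim=0)   # (8192,)
top = fire_rate.topk(2)
print(top.indices.tolist())  # [7, 42]
print(top.values.tolist())   # [1.0, 0.5]
```

With real features, the top entries point at the dictionary directions worth visualizing against the audio.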
See the paper for full training details and evaluation metrics.
```bibtex
@inproceedings{aparin2026audiosae,
  title     = {AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders},
  author    = {Aparin, Georgii and Sadekova, Tasnima and Rukhovich, Alexey and Yermekova, Assel and Kushnareva, Laida and Popov, Vadim and Kuznetsov, Kristian and Piontkovskaya, Irina},
  booktitle = {Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)},
  year      = {2026},
  address   = {Rabat, Morocco},
}
```
License: MIT

Base model: `facebook/hubert-large-ll60k`