---
license: apache-2.0
tags: [hobbylm, sparse-autoencoder, interpretability, sae]
---

# HobbyLM-SAE

A **top-k Sparse Autoencoder** (SAE) for mechanistic interpretability of [HobbyLM-Base](https://huggingface.co/rootxhacker/HobbyLM-Base).
It decomposes the residual stream after **layer 8** into a sparse, overcomplete dictionary of
**12288 features**, of which 32 are active per token. Most features are human-interpretable;
12257 of the 12288 are auto-labeled by their top-activating tokens.
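
As a rough illustration of what "top-k" means here, the sketch below runs a generic top-k SAE forward pass in NumPy. It is not the reference implementation: the weight layout (`W_enc` as `(d_model, n_features)`, `W_dec` as `(n_features, d_model)`), the subtraction of `b_dec` before encoding, and the ReLU on the kept values are all assumptions, and the sizes are toy values. `hobbylm/sae.py` in the reference repository is the authoritative version.

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k=32):
    """Generic top-k SAE forward pass (assumed convention, see lead-in).

    x: (n_tokens, d_model) residual-stream activations.
    Returns (reconstruction, sparse latents).
    """
    # Assumption: inputs are centered by the decoder bias before encoding.
    pre = (x - b_dec) @ W_enc + b_enc                  # (n_tokens, n_features)
    # Indices of the k largest pre-activations for each token.
    top_idx = np.argpartition(pre, -k, axis=-1)[:, -k:]
    rows = np.arange(pre.shape[0])[:, None]
    latents = np.zeros_like(pre)
    # Keep only the top-k values; the ReLU is an assumption.
    latents[rows, top_idx] = np.maximum(pre[rows, top_idx], 0.0)
    recon = latents @ W_dec + b_dec                    # (n_tokens, d_model)
    return recon, latents

# Toy sizes for illustration only (the real SAE has 12288 features, k=32).
rng = np.random.default_rng(0)
d_model, n_features, k = 16, 64, 4
W_enc = rng.standard_normal((d_model, n_features)) * 0.1
W_dec = rng.standard_normal((n_features, d_model)) * 0.1
b_enc = np.zeros(n_features)
b_dec = np.zeros(d_model)
x = rng.standard_normal((5, d_model))
recon, latents = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k=k)
```

Each row of `latents` has at most `k` non-zero entries, which is the per-token sparsity the description refers to.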

## Files
- `sae.safetensors` — the SAE weights (`W_enc`, `W_dec`, `b_enc`, `b_dec`).
- `labels.json` — per-feature auto-derived label + example top-activating tokens.
- `meta.json` — layer, activation scale, base-model run, and SAE config.
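
A minimal sketch for inspecting these files once they have been downloaded to a local directory. It assumes the `safetensors` package is installed and that the tensors are stored in a dtype NumPy can represent. The JSON schemas are not documented here, so the code only prints what it finds instead of assuming field names. `describe` and its `directory` argument are hypothetical helpers for illustration.

```python
import json
from pathlib import Path

def load_json(path):
    """Read a JSON file such as labels.json or meta.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def load_sae_weights(path):
    """Load the SAE tensors as a dict of NumPy arrays."""
    # Imported here so the JSON helper works without safetensors installed.
    from safetensors.numpy import load_file
    return load_file(str(path))

def describe(directory):
    """Print tensor shapes and a summary of the JSON files in `directory`."""
    directory = Path(directory)
    weights = load_sae_weights(directory / "sae.safetensors")
    for name in ("W_enc", "W_dec", "b_enc", "b_dec"):
        print(name, weights[name].shape, weights[name].dtype)
    meta = load_json(directory / "meta.json")
    labels = load_json(directory / "labels.json")
    print("meta:", meta)                    # schema not assumed
    print("label entries:", len(labels))
    return weights, labels, meta
```

Calling `describe("path/to/download")` is a quick way to confirm the actual tensor shapes before wiring the weights into any forward pass.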

The SAE reconstructs ~97% of the activation variance at L0=32 (32 active features per token).

Reference code and training harness:
<https://github.com/harishsg993010/HobbyLM> (`hobbylm/sae.py`, `training/modal_sae.py`). Licensed under Apache-2.0.
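
For readers who want to check a reconstruction figure like the ~97% quoted above, the sketch below computes the fraction of variance explained under one common definition: one minus the squared reconstruction error divided by the total variance around the per-dimension mean. Whether the training harness uses exactly this definition is an assumption.

```python
import numpy as np

def fraction_variance_explained(x, recon):
    """1 - (squared reconstruction error / total variance around the mean).

    x, recon: (n_tokens, d_model) arrays of activations and reconstructions.
    """
    residual = ((x - recon) ** 2).sum()
    total = ((x - x.mean(axis=0)) ** 2).sum()
    return 1.0 - residual / total
```

A perfect reconstruction gives 1.0, and always predicting the mean activation gives 0.0.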