---
tags:
- ml-intern
---
# SAE Encoder Embeddings: End-to-End Sparse Autoencoder Bottleneck for Retrieval

> **Status**: Research & Architecture Design Phase
> **Goal**: Build the first encoder-only embedding model where the representation layer IS a Sparse Autoencoder, trained end-to-end with contrastive loss.

## 🎯 What This Is

A novel embedding architecture that combines:
- **ModernBERT** backbone (state-of-the-art encoder-only model that adopts LLM-era innovations such as RoPE and GeGLU)
- **TopK Sparse Autoencoder** as the embedding bottleneck layer
- **End-to-end contrastive training** (not post-hoc SAE on frozen embeddings)

This produces embeddings that are simultaneously:
1. **Interpretable** — each active dimension corresponds to a learned semantic concept
2. **Steerable** — suppress/amplify specific features to control retrieval
3. **Sparse-indexable** — native sparse vector search (inverted index, not ANN)
4. **Competitive** — trained with modern contrastive objectives + hard negatives
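
The "sparse-indexable" property can be illustrated with a toy in-memory inverted index. This is only a sketch under stated assumptions, not the project's retrieval code: the helpers `build_inverted_index` and `search` are hypothetical names, the feature ids and activations are made up, and each sparse embedding is represented as a `{feature_id: activation}` dict.

```python
from collections import defaultdict

def build_inverted_index(doc_embeddings):
    """Map each feature id to a posting list of (doc_id, activation) pairs."""
    index = defaultdict(list)
    for doc_id, features in doc_embeddings.items():
        for feature_id, activation in features.items():
            index[feature_id].append((doc_id, activation))
    return index

def search(index, query_features, top_n=10):
    """Score documents by sparse dot product, visiting only the query's active features."""
    scores = defaultdict(float)
    for feature_id, q_act in query_features.items():
        for doc_id, d_act in index.get(feature_id, ()):
            scores[doc_id] += q_act * d_act
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Toy sparse embeddings (hypothetical feature ids and values).
docs = {
    "d1": {3: 0.9, 17: 0.4},
    "d2": {17: 0.8, 42: 0.5},
    "d3": {99: 1.0},
}
index = build_inverted_index(docs)
results = search(index, {17: 1.0, 42: 2.0})
print(results)
```

Documents that share no active feature with the query (here `d3`) are never touched, which is where the potential speed-up over scanning dense vectors comes from.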

## 🔬 Why This Is Novel

| Approach | Training | Interpretable? | Sparse-native? | End-to-end? |
|----------|----------|---------------|----------------|-------------|
| Dense bi-encoder (e.g., E5, GTE) | Contrastive | ❌ | ❌ | ✅ |
| SPLADE | Distillation + regularizer | ⚠️ (vocab-tied) | ✅ | ✅ |
| Post-hoc SAE on embeddings | Reconstruction only | ✅ | ✅ | ❌ |
| CSR (Beyond Matryoshka) | Contrastive + recon (frozen backbone) | ✅ | ✅ | ❌ (backbone frozen) |
| SPLARE (Mar 2026) | Distillation (KL from cross-encoder) | ✅ | ✅ | ⚠️ (pretrained SAE, frozen LLM) |
| **Ours (this project)** | **Contrastive + recon + FLOPS reg** | ✅ | ✅ | ✅ **(backbone + SAE jointly)** |

**Key differentiator**: To our knowledge, all prior SAE-for-retrieval work freezes either the backbone or the SAE. We train both jointly, so the backbone learns to produce representations that are *optimally decomposable* into sparse interpretable features.
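
The "Contrastive + recon + FLOPS reg" objective in the table could be combined roughly as follows. This is a minimal NumPy sketch of the forward computation only (actual training would need an autodiff framework), and several details are assumptions rather than the project's recipe: the function names, the temperature, the loss weights `lambda_recon` and `lambda_flops`, and the use of in-batch negatives. The FLOPS term follows the SPLADE definition (squared mean absolute activation per feature, summed over features).

```python
import numpy as np

def info_nce(q, d, temperature=0.05):
    """In-batch contrastive loss: q[i] should match d[i]; other rows act as negatives."""
    logits = (q @ d.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def reconstruction_loss(v, v_hat):
    """Mean squared error between the pooled backbone output and the SAE reconstruction."""
    return np.mean((v - v_hat) ** 2)

def flops_regularizer(z):
    """SPLADE-style FLOPS term: sum over features of (mean |activation| over the batch)^2."""
    return np.sum(np.mean(np.abs(z), axis=0) ** 2)

def combined_loss(z_q, z_d, v, v_hat, lambda_recon=0.1, lambda_flops=1e-3):
    # The lambda_* defaults are placeholders, not tuned values from this project.
    return (info_nce(z_q, z_d)
            + lambda_recon * reconstruction_loss(v, v_hat)
            + lambda_flops * (flops_regularizer(z_q) + flops_regularizer(z_d)))

# Random stand-ins for sparse codes (z) and pooled vectors (v) of a batch of 4.
rng = np.random.default_rng(0)
z_q, z_d = rng.random((4, 32)), rng.random((4, 32))
v, v_hat = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
loss = combined_loss(z_q, z_d, v, v_hat)
print(float(loss))
```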

## 📂 Repository Structure

```
├── README.md                    # This file
├── ARCHITECTURE.md              # Detailed architecture design
├── PAPERS.md                    # Papers bibliography + key findings
├── TRAINING_RECIPE.md           # Full training recipe with hyperparameters
├── src/                         # (future) Implementation code
│   ├── model.py                 # SAE bottleneck + ModernBERT
│   ├── loss.py                  # Combined loss functions
│   └── train.py                 # Training script
└── experiments/                 # (future) Training logs and results
```

## 🏗️ Architecture Overview

```
Input text
    │
    ▼
┌─────────────────────────────────┐
│  ModernBERT-base (768-dim)      │  ← Backbone (trainable)
│  - RoPE positional embeddings   │
│  - FlashAttention 2             │
│  - GeGLU activations            │
│  - Alternating local/global attn│
│  - 8192 token context           │
└──────────────┬──────────────────┘
               │ mean-pool → v ∈ ℝ^768
               ▼
┌─────────────────────────────────┐
│  TopK Sparse Autoencoder        │  ← SAE Bottleneck (trainable)
│                                 │
│  Encoder:                       │
│   z = TopK(W_enc(v-b) + b_enc)  │
│   z ∈ ℝ^16384, ||z||_0 = k      │
│   (32-128 active)               │
│                                 │
│  Decoder: v̂ = W_dec·z + b       │  ← For reconstruction loss only
│          (not used at inference)│
└──────────────┬──────────────────┘
               │
               ▼
        z (sparse embedding)
        Used for retrieval via sparse dot product
```
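
The data flow in the diagram (mean-pool, TopK encode, decode) can be sketched in NumPy as below. This is an illustrative forward pass only, not the planned `src/model.py`: the class and function names, the initialisation scheme, and the tiny demo dimensions are assumptions, and the backbone output is replaced by random numbers. Following the diagram, the kept pre-activations are passed through unchanged; some implementations additionally apply a ReLU to them.

```python
import numpy as np

def mean_pool(hidden, mask):
    """Average token vectors over non-padding positions. hidden: (B, T, D), mask: (B, T)."""
    mask = mask[..., None].astype(hidden.dtype)
    return (hidden * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)

class TopKSAE:
    def __init__(self, d_model, n_latents, k, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        self.W_enc = rng.normal(0.0, d_model ** -0.5, size=(d_model, n_latents))
        self.b_enc = np.zeros(n_latents)
        self.W_dec = self.W_enc.T.copy()  # assumed init: decoder = encoder transpose
        self.b = np.zeros(d_model)        # bias subtracted before encoding, added after decoding

    def encode(self, v):
        """z = TopK(W_enc (v - b) + b_enc): keep the k largest pre-activations per row."""
        pre = (v - self.b) @ self.W_enc + self.b_enc
        idx = np.argpartition(pre, -self.k, axis=-1)[..., -self.k:]
        z = np.zeros_like(pre)
        np.put_along_axis(z, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
        return z

    def decode(self, z):
        """v_hat = W_dec z + b, needed only for the reconstruction loss."""
        return z @ self.W_dec + self.b

# Tiny demo sizes; the design above uses d_model=768 and n_latents=16384.
rng = np.random.default_rng(1)
hidden = rng.normal(size=(2, 5, 16))               # stand-in for backbone token outputs
mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
sae = TopKSAE(d_model=16, n_latents=64, k=4)
v = mean_pool(hidden, mask)
z = sae.encode(v)
v_hat = sae.decode(z)
print(z.shape, np.count_nonzero(z, axis=-1), v_hat.shape)
```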

## 📊 Key Design Decisions

### Why TopK (not L1)?
- **Exact control** of sparsity (k active features guaranteed)
- **No shrinkage bias** — L1 pushes all activations toward 0
- **Better reconstruction-sparsity Pareto frontier** at scale (OpenAI, arXiv:2406.04093)
- **Dead latent prevention** via AuxK loss
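
The shrinkage point can be seen on a toy vector. The sketch below uses soft-thresholding (the proximal operator of the L1 penalty) as a stand-in for the effect of L1 regularization; an L1-trained SAE does not literally apply this operator, so treat it as an illustration of the bias rather than a model of either training procedure. The function names and input values are made up.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the L1 penalty: zero small entries, shrink the rest by lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def top_k(x, k):
    """Keep the k largest entries unchanged and zero everything else."""
    out = np.zeros_like(x)
    idx = np.argpartition(x, -k)[-k:]
    out[idx] = x[idx]
    return out

pre_acts = np.array([3.0, 0.2, 1.5, 0.1, 2.0])
l1_out = soft_threshold(pre_acts, lam=1.0)  # surviving entries are each reduced by lam
topk_out = top_k(pre_acts, k=3)             # surviving entries keep their magnitude
print(l1_out)
print(topk_out)
```

Both outputs have three active entries, but only the TopK output preserves the original activation magnitudes, and its sparsity level is set directly by `k` rather than indirectly through a penalty weight.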

### Why End-to-End (not frozen backbone)?
- Backbone learns to produce **optimally decomposable** representations
- CSR and SPLARE suggest that a frozen backbone limits retrieval performance
- Joint training enables the SAE to develop features that are *useful for retrieval*, not just reconstructive

### Why ModernBERT?
- SOTA encoder-only architecture (surpasses BERT/RoBERTa/DeBERTa)
- LLM innovations: RoPE, FlashAttention, GeGLU, 8k context
- 768-dim base / 1024-dim large — good SAE input dimensions
- Hardware-aware design (efficient on T4/A10/A100)

## 🔗 Key References

| Paper | ArXiv | Relevance |
|-------|-------|-----------|
| ModernBERT | [2412.13663](https://arxiv.org/abs/2412.13663) | Backbone architecture |
| TopK SAE (OpenAI) | [2406.04093](https://arxiv.org/abs/2406.04093) | SAE architecture + dead latent prevention |
| CSR (Beyond Matryoshka) | [2503.01776](https://arxiv.org/abs/2503.01776) | Contrastive sparse coding framework |
| SPLARE | [2603.13277](https://arxiv.org/abs/2603.13277) | SAE for retrieval (closest prior work) |
| SPLADE v2 | [2109.10086](https://arxiv.org/abs/2109.10086) | FLOPS regularizer for sparse retrieval |
| EmbeddingGemma | [2509.20354](https://arxiv.org/abs/2509.20354) | GOR spread-out regularizer |
| Nomic Embed v2 MoE | [2502.07972](https://arxiv.org/abs/2502.07972) | MoE encoder embeddings |
| Ettin | [2507.11412](https://arxiv.org/abs/2507.11412) | Encoder vs Decoder comparison |
| Theoretical Limits | [2508.21038](https://arxiv.org/abs/2508.21038) | Why single-vector has capacity limits |
| Disentangling Embeddings (SAE) | [2408.00657](https://arxiv.org/abs/2408.00657) | SAE interpretability for embeddings |
| Interpretable Embed SAE | [2512.10092](https://arxiv.org/abs/2512.10092) | SAE data analysis toolkit |
| Hypencoder | [2502.05364](https://arxiv.org/abs/2502.05364) | Beyond dot-product retrieval |
| RouterRetriever | [2409.02685](https://arxiv.org/abs/2409.02685) | Router + expert models pattern |

## ⚡ Quick Links

- **Backbone model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **Training data**: [sentence-transformers/msmarco-bm25](https://huggingface.co/datasets/sentence-transformers/msmarco-bm25) + [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli)
- **Evaluation**: MTEB benchmark
- **SAE reference impl**: [OpenAI sparse_autoencoder](https://github.com/openai/sparse_autoencoder)
- **SPLARE (closest prior)**: [arxiv:2603.13277](https://arxiv.org/abs/2603.13277)
- **CSR code**: [github.com/Mhz1y/CSR](https://github.com/Mhz1y/CSR)

## 📈 Expected Outcomes

1. **Retrieval quality**: Competitive with dense ModernBERT embeddings on MTEB retrieval tasks
2. **Interpretability**: Each active SAE feature maps to a human-interpretable concept
3. **Steerability**: Users can boost/suppress features to control search results
4. **Efficiency**: Sparse dot product with inverted index — potentially faster than dense ANN
5. **Novel contribution**: First end-to-end jointly-trained SAE embedding encoder
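
Steerability (outcome 3) amounts to editing the sparse query vector before scoring. The following is a hypothetical sketch: `steer` and `sparse_dot` are illustrative helpers, and the feature ids and activations are invented, since no trained features exist yet.

```python
def steer(features, suppress=(), amplify=None):
    """Return a copy of a sparse embedding with some features dropped or rescaled."""
    amplify = amplify or {}
    steered = {}
    for feature_id, activation in features.items():
        if feature_id in suppress:
            continue  # suppressed features no longer contribute to any score
        steered[feature_id] = activation * amplify.get(feature_id, 1.0)
    return steered

def sparse_dot(a, b):
    """Dot product of two {feature_id: activation} dicts, iterating over the smaller one."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    return sum(val * large[fid] for fid, val in small.items() if fid in large)

query = {12: 0.7, 305: 1.1, 4096: 0.4}
doc_a = {12: 0.9, 77: 0.3}
doc_b = {305: 0.8, 4096: 0.5}

before = (sparse_dot(query, doc_a), sparse_dot(query, doc_b))
steered = steer(query, suppress={305}, amplify={12: 2.0})
after = (sparse_dot(steered, doc_a), sparse_dot(steered, doc_b))
print(before, after)
```

In this toy case, suppressing feature 305 and doubling feature 12 moves `doc_a` ahead of `doc_b`, without re-encoding the query or touching the index.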

<!-- ml-intern-provenance -->
## Generated by ML Intern

This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern

## Usage

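```python
from transformers import AutoModel, AutoTokenizer

# The architecture is encoder-only, so use AutoModel rather than AutoModelForCausalLM.
model_id = "pauvanbr/sae-encoder-embeddings-research"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
```

The project is still in the research and design phase, so this repository may not yet contain loadable weights. Until it does, the backbone alone can be loaded by setting `model_id` to `answerdotai/ModernBERT-base`.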