Update ML Intern artifact metadata

5e1b973 verified 18 days ago

7.97 kB

	---
	tags:
	- ml-intern
	---
	# SAE Encoder Embeddings: End-to-End Sparse Autoencoder Bottleneck for Retrieval

	> Status: Research & Architecture Design Phase
	> Goal: Build the first encoder-only embedding model where the representation layer IS a Sparse Autoencoder, trained end-to-end with contrastive loss.

	## 🎯 What This Is

	A novel embedding architecture that combines:
	- ModernBERT backbone (SOTA encoder-only with LLM innovations)
	- TopK Sparse Autoencoder as the embedding bottleneck layer
	- End-to-end contrastive training (not post-hoc SAE on frozen embeddings)

	This produces embeddings that are simultaneously:
	1. Interpretable — each active dimension corresponds to a learned semantic concept
	2. Steerable — suppress/amplify specific features to control retrieval
	3. Sparse-indexable — native sparse vector search (inverted index, not ANN)
	4. Competitive — trained with modern contrastive objectives + hard negatives

	## 🔬 Why This Is Novel

	\| Approach \| Training \| Interpretable? \| Sparse-native? \| End-to-end? \|
	\|----------\|----------\|---------------\|----------------\|-------------\|
	\| Dense bi-encoder (e.g., E5, GTE) \| Contrastive \| ❌ \| ❌ \| ✅ \|
	\| SPLADE \| Distillation + regularizer \| ⚠️ (vocab-tied) \| ✅ \| ✅ \|
	\| Post-hoc SAE on embeddings \| Reconstruction only \| ✅ \| ✅ \| ❌ \|
	\| CSR (Beyond Matryoshka) \| Contrastive + recon (frozen backbone) \| ✅ \| ✅ \| ❌ (backbone frozen) \|
	\| SPLARE (Mar 2026) \| Distillation (KL from cross-encoder) \| ✅ \| ✅ \| ⚠️ (pretrained SAE, frozen LLM) \|
	\| Ours (this project) \| Contrastive + recon + FLOPS reg \| ✅ \| ✅ \| ✅ (backbone + SAE jointly) \|

	Key differentiator: All prior SAE-for-retrieval work either freezes the backbone or freezes the SAE. We train both jointly, meaning the backbone learns to produce representations that are optimally decomposable into sparse interpretable features.

	## 📂 Repository Structure

	```
	├── README.md # This file
	├── ARCHITECTURE.md # Detailed architecture design
	├── PAPERS.md # Papers bibliography + key findings
	├── TRAINING_RECIPE.md # Full training recipe with hyperparameters
	├── src/ # (future) Implementation code
	│ ├── model.py # SAE bottleneck + ModernBERT
	│ ├── loss.py # Combined loss functions
	│ └── train.py # Training script
	└── experiments/ # (future) Training logs and results
	```

	## 🏗️ Architecture Overview

	```
	Input text
	│
	▼
	┌─────────────────────────────────┐
	│ ModernBERT-base (768-dim) │ ← Backbone (trainable)
	│ - RoPE positional embeddings │
	│ - FlashAttention 2 │
	│ - GeGLU activations │
	│ - Alternating local/global attn│
	│ - 8192 token context │
	└──────────────┬──────────────────┘
	│ mean-pool → v ∈ ℝ^768
	▼
	┌─────────────────────────────────┐
	│ TopK Sparse Autoencoder │ ← SAE Bottleneck (trainable)
	│ │
	│ Encoder: z = TopK(W_enc(v-b) + b_enc)
	│ z ∈ ℝ^16384, \|\|z\|\|_0 = k (32-128 active)
	│ │
	│ Decoder: v̂ = W_dec·z + b │ ← For reconstruction loss only
	│ (not used at inference)│
	└──────────────┬──────────────────┘
	│
	▼
	z (sparse embedding)
	Used for retrieval via sparse dot product
	```

	## 📊 Key Design Decisions

	### Why TopK (not L1)?
	- Exact control of sparsity (k active features guaranteed)
	- No shrinkage bias — L1 pushes all activations toward 0
	- Better Pareto frontier at scale (OpenAI, arxiv:2406.04093)
	- Dead latent prevention via AuxK loss

	### Why End-to-End (not frozen backbone)?
	- Backbone learns to produce optimally decomposable representations
	- CSR/SPLARE show frozen backbone limits retrieval performance
	- Joint training enables the SAE to develop features that are useful for retrieval, not just reconstructive

	### Why ModernBERT?
	- SOTA encoder-only architecture (surpasses BERT/RoBERTa/DeBERTa)
	- LLM innovations: RoPE, FlashAttn, GeGLU, 8k context
	- 768-dim base / 1024-dim large — good SAE input dimensions
	- Hardware-aware design (efficient on T4/A10/A100)

	## 🔗 Key References

	\| Paper \| ArXiv \| Relevance \|
	\|-------\|-------\|-----------\|
	\| ModernBERT \| [2412.13663](https://arxiv.org/abs/2412.13663) \| Backbone architecture \|
	\| TopK SAE (OpenAI) \| [2406.04093](https://arxiv.org/abs/2406.04093) \| SAE architecture + dead latent prevention \|
	\| CSR (Beyond Matryoshka) \| [2503.01776](https://arxiv.org/abs/2503.01776) \| Contrastive sparse coding framework \|
	\| SPLARE \| [2603.13277](https://arxiv.org/abs/2603.13277) \| SAE for retrieval (closest prior work) \|
	\| SPLADE v2 \| [2109.10086](https://arxiv.org/abs/2109.10086) \| FLOPS regularizer for sparse retrieval \|
	\| EmbeddingGemma \| [2509.20354](https://arxiv.org/abs/2509.20354) \| GOR spread-out regularizer \|
	\| Nomic Embed v2 MoE \| [2502.07972](https://arxiv.org/abs/2502.07972) \| MoE encoder embeddings \|
	\| Ettin \| [2507.11412](https://arxiv.org/abs/2507.11412) \| Encoder vs Decoder comparison \|
	\| Theoretical Limits \| [2508.21038](https://arxiv.org/abs/2508.21038) \| Why single-vector has capacity limits \|
	\| Disentangling Embeddings (SAE) \| [2408.00657](https://arxiv.org/abs/2408.00657) \| SAE interpretability for embeddings \|
	\| Interpretable Embed SAE \| [2512.10092](https://arxiv.org/abs/2512.10092) \| SAE data analysis toolkit \|
	\| Hypencoder \| [2502.05364](https://arxiv.org/abs/2502.05364) \| Beyond dot-product retrieval \|
	\| RouterRetriever \| [2409.02685](https://arxiv.org/abs/2409.02685) \| Router + expert models pattern \|

	## ⚡ Quick Links

	- Backbone model: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
	- Training data: [sentence-transformers/msmarco-bm25](https://huggingface.co/datasets/sentence-transformers/msmarco-bm25) + [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli)
	- Evaluation: MTEB benchmark
	- SAE reference impl: [OpenAI sparse_autoencoder](https://github.com/openai/sparse_autoencoder)
	- SPLARE (closest prior): [arxiv:2603.13277](https://arxiv.org/abs/2603.13277)
	- CSR code: [github.com/Mhz1y/CSR](https://github.com/Mhz1y/CSR)

	## 📈 Expected Outcomes

	1. Retrieval quality: Competitive with dense ModernBERT embeddings on MTEB retrieval tasks
	2. Interpretability: Each active SAE feature maps to a human-interpretable concept
	3. Steerability: Users can boost/suppress features to control search results
	4. Efficiency: Sparse dot product with inverted index — potentially faster than dense ANN
	5. Novel contribution: First end-to-end jointly-trained SAE embedding encoder

	<!-- ml-intern-provenance -->
	## Generated by ML Intern

	This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

	- Try ML Intern: https://smolagents-ml-intern.hf.space
	- Source code: https://github.com/huggingface/ml-intern

	## Usage

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "pauvanbr/sae-encoder-embeddings-research"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(model_id)
	```

	For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.