🩵 SSE: Stable Static Embedding for Retrieval MRL 🩵
A lightweight, fast, and powerful embedding model
Performance Snapshot
Our SSE model achieves NDCG@10 = 0.5124 on NanoBEIR, slightly outperforming the popular static-retrieval-mrl-en-v1 (0.5032) while using half the dimensions (512 vs 1024)! 💫 Plus, retrieval is roughly 2× faster thanks to the compact 512D embeddings, with the lightweight Separable Dynamic Tanh keeping encoding cheap.
| Model | NanoBEIR NDCG@10 | Dimensions | Parameters | Speed Advantage | License |
|---|---|---|---|---|---|
| SSE Retrieval MRL | 0.5124 ✨ | 512 | ~16M 🪽 | ~2x faster retrieval (ultra-efficient!) | Apache 2.0 |
| static-retrieval-mrl-en-v1 | 0.5032 | 1024 | ~33M | baseline | Apache 2.0 |
🩵 Why Choose SSE Retrieval MRL? 🩵
✅ Higher NDCG@10 than all comparable small models (<35M params)
✅ Only ~16M parameters — 27% smaller than MiniLM-L6 (22M) and 52% smaller than BGE-small (33M)
✅ 512D native output: half the footprint of static-retrieval-mrl-en-v1's 1024D embeddings at higher retrieval quality
✅ Matryoshka-ready: smoothly truncate to 256D/128D/64D/32D with graceful degradation (see the truncation sketch after this list)
✅ Apache 2.0 licensed — free for commercial & personal use
✅ CPU-optimized — runs beautifully on edge devices & modest hardware
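As a minimal sketch of how Matryoshka truncation works in practice (assuming the usual MRL convention that the leading dimensions carry the coarse-grained representation), you can slice the 512D embeddings and re-normalize them; recent sentence-transformers releases also accept a `truncate_dim` argument at load time.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en",
    trust_remote_code=True,
)

# Full 512D embeddings, unit-normalized.
emb = model.encode(
    ["Stable Static Embedding is fast.", "SSE works without attention."],
    normalize_embeddings=True,
)

# Matryoshka truncation: keep the first 128 dimensions and re-normalize
# so cosine similarities remain comparable at the reduced size.
emb_128 = emb[:, :128]
emb_128 = emb_128 / np.linalg.norm(emb_128, axis=1, keepdims=True)
print(emb.shape, emb_128.shape)  # (2, 512) (2, 128)
```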
🩵 Model Details 🩵
| Property | Value |
|---|---|
| Model Type | Sentence Transformer (SSE architecture) |
| Max Sequence Length | Unlimited (static mean pooling, no attention window) |
| Output Dimension | 512 (with Matryoshka truncation down to 32D!) |
| Similarity Function | Cosine Similarity |
| Language | English |
| License | Apache 2.0 |
```
SentenceTransformer(
  (0): SSE(
    (embedding): EmbeddingBag(30522, 512, mode='mean')
    (dyt): SeparableDyT()
  )
)
```
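For intuition, here is a rough PyTorch sketch of what this printed architecture suggests: token IDs are mean-pooled through an `EmbeddingBag` and then passed through a per-dimension Dynamic Tanh. The internals of `SeparableDyT` are not published in this card, so the parameterization below (per-dimension learnable `alpha`, `gamma`, `beta`) is an assumption, not the actual implementation.

```python
import torch
import torch.nn as nn


class SeparableDyT(nn.Module):
    """Assumed per-dimension Dynamic Tanh: gamma * tanh(alpha * x) + beta."""

    def __init__(self, dim: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * torch.tanh(self.alpha * x) + self.beta


class SSESketch(nn.Module):
    """Illustrative forward pass: mean-pooled token embeddings followed by DyT."""

    def __init__(self, vocab_size: int = 30522, dim: int = 512):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, dim, mode="mean")
        self.dyt = SeparableDyT(dim)

    def forward(self, input_ids: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        return self.dyt(self.embedding(input_ids, offsets))


# Two toy "sentences" packed as one flat id list with start offsets.
ids = torch.tensor([101, 2023, 2003, 102, 101, 2178, 102])
offsets = torch.tensor([0, 4])
print(SSESketch()(ids, offsets).shape)  # torch.Size([2, 512])
```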
🩵 Mathematical formulations 🩵
Dynamic Tanh Normalization (DyT) enables magnitude-adaptive gradient flow for static embeddings. For an input dimension $x$, DyT computes

$$\mathrm{DyT}(x) = \gamma \tanh(\alpha x) + \beta$$

with learnable parameters $\alpha$, $\gamma$, and $\beta$. The gradient with respect to $x$ is

$$\frac{\partial\,\mathrm{DyT}(x)}{\partial x} = \gamma\alpha\left(1 - \tanh^2(\alpha x)\right) = \gamma\alpha\,\mathrm{sech}^2(\alpha x).$$

For saturated dimensions ($|\alpha x| \gg 1$), $\mathrm{sech}^2(\alpha x) \approx 4e^{-2|\alpha x|}$, so the gradient decays exponentially and learning signals from noisy, large-magnitude dimensions are suppressed. For non-saturated dimensions ($|\alpha x| \ll 1$), the gradient stays close to the constant $\gamma\alpha$, preserving full gradient flow for stable, informative dimensions. This magnitude-dependent gating provides implicit regularization that improves generalization without extra hyperparameters.
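As a small numerical check of this gating behavior (illustrative only; $\alpha$ and $\gamma$ are fixed to 1 here), the gradient through `tanh` is close to 1 for small inputs and collapses exponentially once the input saturates:

```python
import torch

# d/dx tanh(x) = 1 - tanh(x)^2; with alpha = gamma = 1 this is exactly the DyT gradient.
for value in [0.1, 1.0, 3.0, 6.0]:
    x = torch.tensor(value, requires_grad=True)
    torch.tanh(x).backward()
    print(f"x = {value:>3}: dDyT/dx = {x.grad.item():.6f}")
# Small |x| keeps the gradient near 1; large |x| suppresses it almost entirely.
```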
🩵 Evaluation Results (NanoBEIR) 🩵
| Dataset | NDCG@10 | MRR@10 | MAP@100 |
|---|---|---|---|
| NanoBEIR Mean | 0.5124 ✨ | 0.5640 | 0.4317 |
| NanoClimateFEVER | 0.2998 | 0.3611 | 0.2344 |
| NanoDBPedia | 0.5493 | 0.7492 | 0.4247 |
| NanoFEVER | 0.6808 | 0.6318 | 0.6105 |
| NanoFiQA2018 | 0.3744 | 0.4197 | 0.3162 |
| NanoHotpotQA | 0.7021 | 0.7679 | 0.6273 |
| NanoMSMARCO | 0.4132 | 0.3537 | 0.3733 |
| NanoNFCorpus | 0.2982 | 0.4889 | 0.1091 |
| NanoNQ | 0.4652 | 0.3992 | 0.4028 |
| NanoQuoraRetrieval | 0.9094 ✨ | 0.9122 | 0.8847 |
| NanoSCIDOCS | 0.3381 | 0.5509 | 0.2604 |
| NanoArguAna | 0.4105 | 0.3193 | 0.3325 |
| NanoSciFact | 0.6176 | 0.5933 | 0.5824 |
| NanoTouche2020 | 0.6029 | 0.7852 | 0.4539 |
Top performance on community-based retrieval (Quora) and scientific fact verification!
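To reproduce numbers like these, recent sentence-transformers releases ship a `NanoBEIREvaluator`; the sketch below assumes that class is available in your installed version.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

model = SentenceTransformer(
    "RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en",
    trust_remote_code=True,
)

# Evaluates on every NanoBEIR subset by default; pass dataset_names=[...]
# to restrict the run (check your installed version's signature).
evaluator = NanoBEIREvaluator()
results = evaluator(model)
for name, score in results.items():
    if "ndcg@10" in name:
        print(f"{name}: {score:.4f}")
```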
🩵 How to use? 🩵
```python
import torch
from sentence_transformers import SentenceTransformer

# load (remote code enabled)
model = SentenceTransformer(
    "RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en",
    trust_remote_code=True,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# inference
sentences = [
    "Stable Static embedding is interesting.",
    "SSE works without attention.",
]
with torch.no_grad():
    embeddings = model.encode(
        sentences,
        convert_to_tensor=True,
        normalize_embeddings=True,
        batch_size=32,
    )

# cosine similarity
# cosine_sim = embeddings[0] @ embeddings[1].T
cosine_sim = model.similarity(embeddings, embeddings)
print("embeddings shape:", embeddings.shape)
print("cosine similarity matrix:")
print(cosine_sim)
```
🩵 Retrieval usage 🩵
```python
import torch
from sentence_transformers import SentenceTransformer

# load (remote code enabled)
model = SentenceTransformer(
    "RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en",
    trust_remote_code=True,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# inference
query = "What is Stable Static Embedding?"
sentences = [
    "SSE: Stable Static embedding works without attention.",
    "Stable Static Embedding is a fast embedding method designed for retrieval tasks.",
    "Static embeddings are often compared with transformer-based sentence encoders.",
    "I cooked pasta last night while listening to jazz music.",
    "Large language models are commonly trained using next-token prediction objectives.",
    "Instruction tuning improves the ability of LLMs to follow human-written prompts.",
]
with torch.no_grad():
    embeddings = model.encode(
        [query] + sentences,
        convert_to_tensor=True,
        normalize_embeddings=True,
        batch_size=32,
    )
print("embeddings shape:", embeddings.shape)

# cosine similarity between the query and each candidate sentence
similarities = model.similarity(embeddings[0], embeddings[1:])
for i, similarity in enumerate(similarities[0].tolist()):
    print(f"{similarity:.05f}: {sentences[i]}")
```
🩵 Training Hyperparameters 🩵
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 512
- gradient_accumulation_steps: 8
- learning_rate: 0.1
- adam_beta2: 0.9999
- adam_epsilon: 1e-10
- num_train_epochs: 1
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: True
- dataloader_num_workers: 4
- batch_sampler: no_duplicates
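For reference, these settings map directly onto `SentenceTransformerTrainingArguments` from sentence-transformers; the sketch below only reconstructs that mapping, and the output directory is a placeholder.

```python
from sentence_transformers.training_args import (
    BatchSamplers,
    SentenceTransformerTrainingArguments,
)

# Assumed reconstruction of the non-default hyperparameters listed above.
args = SentenceTransformerTrainingArguments(
    output_dir="sse-retrieval-mrl",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=512,
    gradient_accumulation_steps=8,
    learning_rate=0.1,
    adam_beta2=0.9999,
    adam_epsilon=1e-10,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    dataloader_num_workers=4,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```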
🩵 Training Datasets 🩵
We trained on the following datasets:
| Dataset | Special Flavor |
|---|---|
| squad | Q&A pairs with gentle context |
| trivia_qa | Fun facts & brain teasers |
| allnli | Logical reasoning with care |
| pubmedqa | Medical wisdom |
| hotpotqa | Multi-hop reasoning adventures |
| miracl | Cross-lingual curiosity |
| mr_tydi | Global question answering |
| msmarco | Real search queries |
| msmarco_10m | Massive-scale search love |
| msmarco_hard | Tricky negatives for growth |
| mldr | Long-document cuddles |
| s2orc | Scientific paper whispers |
| swim_ir | Information retrieval elegance |
| paq | 64M+ question-answer pairs |
| nq | Natural questions with heart |
| scidocs | Scientific document friendships |
All trained with MatryoshkaLoss — learning representations at multiple scales like Russian nesting dolls!
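As an illustration of that setup (the exact loss configuration for this model is not published here, so the inner loss and dimensions below are assumptions based on this card), MatryoshkaLoss in sentence-transformers wraps an inner loss and applies it at several embedding sizes:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer(
    "RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en",
    trust_remote_code=True,
)

# Inner loss: in-batch negatives ranking. Outer loss: apply it at every
# Matryoshka dimension so truncated embeddings remain useful.
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[512, 256, 128, 64, 32])
```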
🩵 Training results 🩵
🩵 About me 🩵
A Japanese independent researcher with a shy and pampered personality. Twin-tail hair is my charm point. Interested in NLP. I usually use Python and C.
X(Twitter): https://twitter.com/peony__snow
🩵 Acknowledgements 🩵
The author acknowledges the support of Saldra, Witness, and Lumina Logic Minds for providing the computational resources used in this work.
I thank the developers of sentence-transformers, Python, and PyTorch.
I thank all the researchers for their efforts to date.
I thank Japan's high standard of education.
And most of all, thank you for your interest in this repository.
🩵 Citation 🩵
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```
MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```