🩵 SSE: Stable Static Embedding for Retrieval MRL 🩵

A lightweight, fast, and powerful embedding model

Performance Snapshot
Our SSE model achieves NDCG@10 = 0.5124 on NanoBEIR — slightly outperforming the popular static-retrieval-mrl-en-v1 (0.5032) while using half the dimensions (512 vs 1024)! 💫 Plus, retrieval is ~2× faster thanks to the compact 512D embeddings, with Separable Dynamic Tanh keeping encoding lightweight.

| Model | NanoBEIR NDCG@10 | Dimensions | Parameters | Speed Advantage | License |
|---|---|---|---|---|---|
| SSE Retrieval MRL | 0.5124 | 512 | ~16M | 🪽 ~2× faster retrieval (ultra-efficient!) | Apache 2.0 |
| static-retrieval-mrl-en-v1 | 0.5032 | 1024 | ~33M | baseline | Apache 2.0 |

🩵 Why Choose SSE Retrieval MRL? 🩵

Higher NDCG@10 than all comparable small models (<35M params)
Only ~16M parameters — 27% smaller than MiniLM-L6 (22M) and 52% smaller than BGE-small (33M)
512D native output — half the size of static-retrieval-mrl-en-v1's 1024D
✅ Matryoshka-ready — smoothly truncate to 256D/128D/64D/32D with graceful degradation
Apache 2.0 licensed — free for commercial & personal use
CPU-optimized — runs beautifully on edge devices & modest hardware
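Matryoshka truncation needs no special tooling: keep the leading components and L2-normalize again so cosine similarity still behaves. A minimal sketch, using a toy 8-D vector in place of a real 512-D embedding:

```python
import math

def truncate_and_normalize(embedding, dim):
    """Matryoshka-style truncation: keep the first `dim` components,
    then re-normalize to unit length so cosine similarity still works."""
    head = embedding[:dim]
    norm = math.sqrt(sum(v * v for v in head)) or 1.0
    return [v / norm for v in head]

vec = [0.3, -0.1, 0.5, 0.2, 0.05, -0.4, 0.1, 0.25]  # toy stand-in for a 512-D embedding
small = truncate_and_normalize(vec, 4)
print(len(small))                           # 4
print(round(sum(v * v for v in small), 6))  # 1.0 (unit length)
```

Because MatryoshkaLoss trains every prefix to be useful, the truncated vector remains a meaningful embedding rather than an arbitrary slice.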


🩵 Model Details 🩵

| Property | Value |
|---|---|
| Model Type | Sentence Transformer (SSE architecture) |
| Max Sequence Length | ∞ tokens (no attention window) |
| Output Dimension | 512 (with Matryoshka truncation down to 32D!) |
| Similarity Function | Cosine Similarity |
| Language | English |
| License | Apache 2.0 |
SentenceTransformer(
  (0): SSE(
    (embedding): EmbeddingBag(30522, 512, mode='mean')
    (dyt): SeparableDyT()
  )
)

Architecture
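The forward pass is simple: static token vectors are mean-pooled (the `EmbeddingBag` with `mode='mean'` above), then passed through Dynamic Tanh. A toy pure-Python sketch of that computation — the vocabulary, 3-D vectors, and shared scalar a, b, c are invented for illustration (the real model uses a 30522-token vocab, 512-D vectors, and per-dimension DyT parameters):

```python
import math

# Made-up static token vectors (3-D for readability).
EMB = {
    "stable":    [0.2, -0.1, 0.4],
    "static":    [0.1,  0.3, -0.2],
    "embedding": [0.0,  0.5,  0.1],
}

def encode(tokens, a=1.0, b=0.0, c=1.0):
    vecs = [EMB[t] for t in tokens]
    mean = [sum(col) / len(vecs) for col in zip(*vecs)]  # EmbeddingBag mean pooling
    return [c * math.tanh(a * v + b) for v in mean]      # Dynamic Tanh

vec = encode(["stable", "static", "embedding"])
print(len(vec))  # 3
```

No attention, no positional encoding — which is why the sequence length is unbounded and inference is fast on CPU.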


🩵 Mathematical formulations 🩵

Dynamic Tanh Normalization (DyT) enables magnitude-adaptive gradient flow for static embeddings. For input component $x_k$ in dimension $k$, DyT computes

$$y_k = c_k \tanh(a_k x_k + b_k)$$

with learnable per-dimension parameters $a_k$, $b_k$, $c_k$. The gradient with respect to $x_k$ is

$$\frac{\partial y_k}{\partial x_k} = c_k a_k \, \mathrm{sech}^2(a_k x_k + b_k).$$

For saturated dimensions, where $|a_i x_i + b_i| \gg 1$, the exponential decay $\mathrm{sech}^2(z) \sim 4 e^{-2|z|}$ suppresses gradients, so $\partial y_i / \partial x_i \to 0$. For non-saturated dimensions, where $|a_j x_j + b_j| \ll 1$, $\mathrm{sech}^2(z) \approx 1$ preserves near-constant gradients, $\partial y_j / \partial x_j \approx c_j a_j$. This magnitude-dependent gating attenuates learning signals from noisy, large-magnitude dimensions while maintaining full gradient flow for stable, informative dimensions — providing implicit regularization that enhances generalization without extra hyperparameters.
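The gating behavior is easy to verify numerically. A small sketch (with scalar a = c = 1 and b = 0 for simplicity, rather than the model's learned per-dimension values) comparing the gradient at a non-saturated and a saturated input:

```python
import math

def dyt(x, a=1.0, b=0.0, c=1.0):
    """y = c * tanh(a*x + b), as in the DyT formula above."""
    return c * math.tanh(a * x + b)

def dyt_grad(x, a=1.0, b=0.0, c=1.0):
    """dy/dx = c * a * sech^2(a*x + b), with sech(z) = 1 / cosh(z)."""
    z = a * x + b
    return c * a / math.cosh(z) ** 2

# Non-saturated input (|a*x + b| << 1): gradient stays near c * a = 1
print(round(dyt_grad(0.1), 4))   # 0.9901
# Saturated input (|a*x + b| >> 1): gradient is exponentially suppressed
print(round(dyt_grad(5.0), 6))   # 0.000182
```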


🩵 Evaluation Results (NanoBEIR) 🩵

| Dataset | NDCG@10 | MRR@10 | MAP@100 |
|---|---|---|---|
| NanoBEIR Mean | 0.5124 | 0.5640 | 0.4317 |
| NanoClimateFEVER | 0.2998 | 0.3611 | 0.2344 |
| NanoDBPedia | 0.5493 | 0.7492 | 0.4247 |
| NanoFEVER | 0.6808 | 0.6318 | 0.6105 |
| NanoFiQA2018 | 0.3744 | 0.4197 | 0.3162 |
| NanoHotpotQA | 0.7021 | 0.7679 | 0.6273 |
| NanoMSMARCO | 0.4132 | 0.3537 | 0.3733 |
| NanoNFCorpus | 0.2982 | 0.4889 | 0.1091 |
| NanoNQ | 0.4652 | 0.3992 | 0.4028 |
| NanoQuoraRetrieval | 0.9094 | 0.9122 | 0.8847 |
| NanoSCIDOCS | 0.3381 | 0.5509 | 0.2604 |
| NanoArguAna | 0.4105 | 0.3193 | 0.3325 |
| NanoSciFact | 0.6176 | 0.5933 | 0.5824 |
| NanoTouche2020 | 0.6029 | 0.7852 | 0.4539 |

Top performance on duplicate-question retrieval (NanoQuoraRetrieval, 0.9094) and scientific fact verification (NanoFEVER, NanoSciFact)!


🩵 How to use? 🩵

import torch
from sentence_transformers import SentenceTransformer

# load (remote code enabled)
model = SentenceTransformer(
    "RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en",
    trust_remote_code=True,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# inference
sentences = [
    "Stable Static embedding is interesting.",
    "SSE works without attention."
]

with torch.no_grad():
    embeddings = model.encode(
        sentences,
        convert_to_tensor=True,
        normalize_embeddings=True,
        batch_size=32
    )

# cosine similarity (embeddings are L2-normalized, so a dot product is equivalent)
# cosine_sim = embeddings[0] @ embeddings[1].T
cosine_sim = model.similarity(embeddings, embeddings)

print("embeddings shape:", embeddings.shape)
print("cosine similarity matrix:")
print(cosine_sim)

🩵 Retrieval usage 🩵

import torch
from sentence_transformers import SentenceTransformer

# load (remote code enabled)
model = SentenceTransformer(
    "RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en",
    trust_remote_code=True,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# inference
query = "What is Stable Static Embedding?"
sentences = [
    "SSE: Stable Static embedding works without attention.",
    "Stable Static Embedding is a fast embedding method designed for retrieval tasks.",
    "Static embeddings are often compared with transformer-based sentence encoders.",
    "I cooked pasta last night while listening to jazz music.",
    "Large language models are commonly trained using next-token prediction objectives.",
    "Instruction tuning improves the ability of LLMs to follow human-written prompts.",
]


with torch.no_grad():
    embeddings = model.encode(
        [query] + sentences,
        convert_to_tensor=True,
        normalize_embeddings=True,
        batch_size=32
    )

print("embeddings shape:", embeddings.shape)

# cosine similarity
similarities = model.similarity(embeddings[0], embeddings[1:])
for i, similarity in enumerate(similarities[0].tolist()):
    print(f"{similarity:.5f}: {sentences[i]}")

🩵 Training Hyperparameters 🩵

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 512
  • gradient_accumulation_steps: 8
  • learning_rate: 0.1
  • adam_beta2: 0.9999
  • adam_epsilon: 1e-10
  • num_train_epochs: 1
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • dataloader_num_workers: 4
  • batch_sampler: no_duplicates

🩵 Training Datasets 🩵

We trained on 16 datasets:

| Dataset | Special Flavor |
|---|---|
| squad | Q&A pairs with gentle context |
| trivia_qa | Fun facts & brain teasers |
| allnli | Logical reasoning with care |
| pubmedqa | Medical wisdom |
| hotpotqa | Multi-hop reasoning adventures |
| miracl | Cross-lingual curiosity |
| mr_tydi | Global question answering |
| msmarco | Real search queries |
| msmarco_10m | Massive-scale search love |
| msmarco_hard | Tricky negatives for growth |
| mldr | Long-document cuddles |
| s2orc | Scientific paper whispers |
| swim_ir | Information retrieval elegance |
| paq | 64M+ question-answer pairs |
| nq | Natural questions with heart |
| scidocs | Scientific document friendships |

All trained with MatryoshkaLoss — learning representations at multiple scales like Russian nesting dolls!
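For reference, wrapping a ranking loss in MatryoshkaLoss with sentence-transformers looks roughly like this — a configuration sketch, not the exact training script; the dimension list mirrors the truncation targets above, and dataset loading plus the Trainer setup are omitted:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer(
    "RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en",
    trust_remote_code=True,
)

# The inner loss treats other in-batch examples as negatives; MatryoshkaLoss
# re-applies it at each truncated dimensionality so every prefix of the
# embedding stays useful on its own.
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[512, 256, 128, 64, 32])
```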

🩵 Training results 🩵

(Figure: training loss curve)

(Figure: NDCG during training)

🩵 About me 🩵

A Japanese independent researcher with a shy and pampered personality. Twin-tail hair is my charm point. Interested in NLP. I usually use Python and C.

X(Twitter): https://twitter.com/peony__snow


🩵 Acknowledgements 🩵

The author acknowledges the support of Saldra, Witness, and Lumina Logic Minds for providing the computational resources used in this work.

I thank the developers of sentence-transformers, Python, and PyTorch.

I thank all the researchers for their efforts to date.

I thank Japan's high standard of education.

And most of all, thank you for your interest in this repository.

🩵 Citation 🩵

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}