SSE Retrieval MRL v2: Regularization of Representation Space and Performance Improvement via Hyperparameter Optimization
Rikka Botan
Independent Researcher, Japan
https://rikka-botan.github.io
Abstract
We present SSE Retrieval MRL v2, a lightweight static-embedding model for information retrieval. Hyperparameter optimization strengthens the regularization of its representation space: with 512-dimensional embeddings and roughly 16M parameters, v2 reaches a NanoBEIR mean nDCG@10 of 0.5158, surpassing both v1 (0.5124) and the 1024-dimensional static-retrieval-mrl-en-v1 baseline (0.5032), while running approximately twice as fast as the baseline at inference time.
1 Introduction
In the field of Natural Language Processing (NLP), techniques for embedding sentence semantics into vector spaces are fundamental to information retrieval and semantic search. While Transformer-based models demonstrate high accuracy, their computational costs remain a significant challenge for real-time inference on edge devices. Conversely, approaches utilizing static embeddings offer speed but have been noted for limitations in expressiveness and difficulties in regularization.
This report presents SSE Retrieval MRL v2, developed to address these challenges. We focus on the improvements over version 1 (v1): stronger regularization of the representation space, achieved through hyperparameter optimization, which balances accuracy and efficiency.
SSE Retrieval MRL v2:
https://huggingface.co/RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en-v2
Version 1 technical report:
https://huggingface.co/blog/RikkaBotan/stable-static-embedding-technical-report
2 Methodology
2.1 Architecture
SSE couples static token embeddings with a Separable DyT layer, which provides the gradient control discussed in Section 3.4. The overall structure is shown in Figure 1; a sketch of the DyT building block follows the figure, and the v1 technical report gives further details.
Figure 1 | SSE Architecture
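For orientation, the sketch below shows a standard DyT (Dynamic Tanh) layer as introduced in the literature, y = γ · tanh(αx) + β. The Separable DyT used in SSE is a variant of this idea; its exact formulation differs and is not reproduced here.

```python
# Hedged sketch of a standard DyT (Dynamic Tanh) layer: y = w * tanh(a * x) + b.
# The Separable DyT used in SSE is a variant; this is only the base formulation.
import torch
import torch.nn as nn

class DyT(nn.Module):
    def __init__(self, dim: int, init_alpha: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), init_alpha))  # learnable scalar
        self.weight = nn.Parameter(torch.ones(dim))              # per-dim scale
        self.bias = nn.Parameter(torch.zeros(dim))               # per-dim shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * torch.tanh(self.alpha * x) + self.bias
```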
2.2 Training Configuration
The model was trained with the following hyperparameters (a configuration sketch follows the list):
- Batch Size: 2048 (per_device_train_batch_size)
- Gradient Accumulation Steps: 4 (gradient_accumulation_steps)
- Learning Rate: 0.1
- Optimizer: AdamW (beta2: 0.9999, epsilon: 1e-10)
- Scheduler: Cosine with Warmup (ratio: 0.1)
- Epochs: 1
- Compute: NVIDIA A100 SXM4 (80 GB), rented via Vast.ai
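As a rough sketch, these settings map onto sentence-transformers training arguments as follows. The output directory is a hypothetical name, and any argument not listed above is left at its library default.

```python
# Hedged sketch: the hyperparameters of Section 2.2 expressed as
# sentence-transformers training arguments. output_dir is hypothetical;
# unlisted arguments keep their defaults.
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="sse-retrieval-mrl-en-v2",  # hypothetical
    num_train_epochs=1,
    per_device_train_batch_size=2048,
    gradient_accumulation_steps=4,          # effective batch size: 8192
    learning_rate=0.1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta2=0.9999,
    adam_epsilon=1e-10,
)
```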
2.3 Datasets and Loss Functions
Training was conducted on 14 datasets, including SQuAD, TriviaQA, and AllNLI. We combined MatryoshkaLoss, which preserves performance across multiple embedding dimensions, with MultipleNegativesRankingLoss; a minimal sketch of this composition follows.
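In sentence-transformers, the two losses compose directly. The sketch below assumes the Matryoshka dimensions reported in Table 2 (32 through 512); it is illustrative, not the exact training script.

```python
# Hedged sketch: MultipleNegativesRankingLoss wrapped in MatryoshkaLoss.
# The dimension list follows Table 2; this is not the exact training script.
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en-v2")
base_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, base_loss, matryoshka_dims=[512, 256, 128, 64, 32])
```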
3 Results
3.1 Training Results
Figure 2 | Training Loss Across Training Steps
Figure 3 | NanoBEIR mean nDCG@10 Across Training Steps
3.2 Evaluation Results
Evaluation results on the NanoBEIR benchmark are presented below; a reproduction sketch follows Table 1.
Table 1 | Model Comparison (NanoBEIR Mean nDCG@10)
| Model | NanoBEIR nDCG@10 | Dimensions | Parameters | Inference Speed |
|---|---|---|---|---|
| SSE Retrieval MRL v2 | 0.5158 | 512 | ~16M | ~2x baseline |
| SSE Retrieval MRL (v1) | 0.5124 | 512 | ~16M | ~2x baseline |
| static-retrieval-mrl-en-v1 | 0.5032 | 1024 | ~33M | Baseline (1x) |
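These scores can be checked with the NanoBEIR evaluator built into recent sentence-transformers releases; the sketch below assumes v3.4 or later, where NanoBEIREvaluator is available.

```python
# Hedged sketch: evaluating the released model on NanoBEIR.
# Assumes sentence-transformers >= 3.4, where NanoBEIREvaluator is available.
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

model = SentenceTransformer("RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en-v2")
evaluator = NanoBEIREvaluator()  # defaults to the NanoBEIR English datasets
results = evaluator(model)
print(results[evaluator.primary_metric])  # mean nDCG@10
```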
3.3 Differences between v2 and v1
Version 2 improves on its predecessor (SSE Retrieval MRL v1) and the baseline (static-retrieval-mrl-en-v1) in the following areas:
- Hyperparameter Optimization: Adjusting the training hyperparameters strengthened the regularization of the representation space, achieving an nDCG@10 of 0.5158 and surpassing both v1 (0.5124) and the baseline (0.5032).
- Dimensionality Compression and Speed: Compared to the 1024-dimensional baseline, we halved the embedding dimension to 512 while maintaining or improving accuracy, yielding approximately a two-fold increase in inference speed.
3.4 Matryoshka Truncation and Spectral Analysis
The performance improvement of SSE Retrieval MRL v2 is attributed to the synergistic effect of gradient control via the DyT layers and hyperparameter tuning. Notably, thanks to its Matryoshka property, the model maintains a score of 0.503 even when truncated to 256 dimensions, suggesting flexible applicability under resource constraints; a truncation sketch follows.
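Truncated embeddings can be produced directly at load time in sentence-transformers; the sketch below assumes the public model id linked in Section 1.

```python
# Hedged sketch: exploiting the Matryoshka property by truncating
# SSE v2 embeddings to 256 dimensions (score 0.503 in Table 2).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en-v2",
    truncate_dim=256,
)
embeddings = model.encode(["matryoshka representation learning"])
print(embeddings.shape)  # (1, 256)
```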
Figure 4 | NanoBEIR English mean nDCG@10 vs Matryoshka Embedding Truncation.
Table 2 | NanoBEIR English mean nDCG@10 vs Matryoshka Embedding Truncation.
| Model | 32 | 64 | 128 | 256 | 512 | 1024 |
|---|---|---|---|---|---|---|
| SSE (Static Embedding + Separable DyT) v2 | 0.349 | 0.424 | 0.473 | 0.503 | 0.516 | - |
| SSE (Static Embedding + Separable DyT) v1 | 0.345 | 0.428 | 0.466 | 0.497 | 0.512 | - |
| Static Embedding + DyT | 0.334 | 0.413 | 0.462 | 0.492 | 0.503 | - |
| Static Embedding (no DyT) | 0.337 | 0.416 | 0.463 | 0.491 | 0.507 | - |
| static-retrieval-mrl-en-v1 (For reference) | 0.353 | 0.418 | 0.462 | 0.482 | 0.496 | 0.503 |
Figure 5 | PCA Spectrum on the 13 NanoBEIR English Datasets: Normalized Eigenvalue Decay (Logarithmic Scale).
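The decay curves in Figure 5 follow from a standard PCA eigenvalue computation on corpus embeddings. Below is a minimal NumPy sketch of one way to compute them; it is illustrative rather than the exact analysis script.

```python
# Minimal sketch: normalized PCA eigenvalue decay of an embedding matrix,
# as plotted (log scale) in Figure 5. One possible implementation only.
import numpy as np

def normalized_eigenvalue_decay(embeddings: np.ndarray) -> np.ndarray:
    """PCA eigenvalues of `embeddings` (n_samples x dim), normalized by the largest."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Singular values of the centered matrix give the PCA eigenvalues.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    eigenvalues = singular_values**2 / (len(embeddings) - 1)
    return eigenvalues / eigenvalues[0]
```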
4 Discussion
Compared to previous models, SSE Retrieval MRL v2 exhibits even stronger low-rank regularization. These results lend further support to the hypothesis that low-rank regularization of the representation space correlates with model performance when training with Matryoshka loss combined with a contrastive retrieval objective.
However, this trend has so far been observed only with SSE models; whether it holds for embedding models in general remains unconfirmed.
5 Conclusion
SSE Retrieval MRL v2 demonstrates that a lightweight model can deliver high retrieval performance by regularizing its representation space through hyperparameter optimization. The improvements over version 1 yield measurable gains in accuracy while preserving the speed advantage over the 1024-dimensional baseline.
Acknowledgements
Our interest in this topic originated from reading Tom Aarsen's seminal article, Train 400x faster Static Embedding Models with Sentence Transformers, which motivated us to investigate static embeddings.
I thank the developers of sentence-transformers, Python, and PyTorch.
I thank all the researchers for their efforts to date.
I thank Japan's high standard of education.
And most of all, thank you for your interest in this blog.
About us
A Japanese independent researcher with a shy and pampered personality. Twin-tail hair is a charm point. Interested in NLP. Usually works with Python and C.
Please contact us if you have any requests for joint research, writing, speaking engagements, or employment.