SSE Retrieval MRL v2: Regularization of Representation Space and Performance Improvement via Hyperparameter Optimization

Community Article Published May 13, 2026


Rikka Botan
Independent Researcher, Japan
https://rikka-botan.github.io

Abstract

This article details SSE Retrieval MRL v2 (Stable Static Embedding for Retrieval MRL v2), a lightweight, high-speed sentence embedding model. While traditional static embedding models have faced a trade-off between parameter count and inference speed, this model uses Separable Dynamic Tanh (DyT) to control gradient flow and regularize the representation space. In version 2, tuning the training hyperparameters raised the NanoBEIR mean NDCG@10 to 0.5158, outperforming both the previous version and similarly sized models. Moreover, SSE Retrieval MRL v2 attains a NanoBEIR mean NDCG@10 of 0.503 using only 256 dimensions, matching the score reported in previous work with 1024-dimensional embeddings. These results suggest the model is well suited to information retrieval tasks in resource-constrained environments.

1 Introduction

In the field of Natural Language Processing (NLP), techniques for embedding sentence semantics into vector spaces are fundamental to information retrieval and semantic search. While Transformer-based models demonstrate high accuracy, their computational costs remain a significant challenge for real-time inference on edge devices. Conversely, approaches utilizing static embeddings offer speed but have been noted for limitations in expressiveness and difficulties in regularization.

This study reports on the development of SSE Retrieval MRL v2 to address these challenges. We emphasize improvements over version 1 (v1), specifically focusing on enhanced representation space regularization through hyperparameter optimization to achieve a balance between accuracy and efficiency.

SSE Retrieval MRL v2:

https://huggingface.co/RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en-v2

version 1 technical report:

https://huggingface.co/blog/RikkaBotan/stable-static-embedding-technical-report

2 Methodology

2.1 Architecture

Figure 1 | SSE Architecture

2.2 Training Configuration

The model was trained with the following hyperparameters:

  • Batch Size: 2048 (`per_device_train_batch_size`)
  • Gradient Accumulation Steps: 4 (`gradient_accumulation_steps`)
  • Learning Rate: 0.1
  • Optimizer: AdamW (beta2: 0.9999, epsilon: 1e-10)
  • Scheduler: Cosine with warmup (warmup ratio: 0.1)
  • Epochs: 1
  • Hardware: NVIDIA A100 SXM4 (80 GB), rented via Vast.ai
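The configuration above can be collected into a single dictionary, as in the sketch below. The field names follow Hugging Face `TrainingArguments` conventions as an assumption; the exact training script is not published here. One point worth making explicit is the effective batch size that results from combining the per-device batch with gradient accumulation:

```python
# Hyperparameters from Section 2.2, gathered as a plain dict.
# Key names mirror Hugging Face TrainingArguments conventions
# (an assumption; the original training script is not shown here).
train_config = {
    "per_device_train_batch_size": 2048,
    "gradient_accumulation_steps": 4,
    "learning_rate": 0.1,
    "optimizer": "adamw",
    "adam_beta2": 0.9999,
    "adam_epsilon": 1e-10,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "num_train_epochs": 1,
}

# With gradient accumulation, each optimizer step effectively sees
# per_device_train_batch_size * gradient_accumulation_steps examples.
effective_batch_size = (
    train_config["per_device_train_batch_size"]
    * train_config["gradient_accumulation_steps"]
)
print(effective_batch_size)  # 8192 pairs per optimizer step
```

The large effective batch matters for the in-batch-negatives loss described in the next subsection, since every other example in the batch serves as a negative.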

2.3 Datasets and Loss Functions

Training was conducted using 14 datasets, including SQuAD, TriviaQA, and AllNLI. We employed a combination of MatryoshkaLoss—aimed at maintaining performance across diverse dimensions—and MultipleNegativesRankingLoss.
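The combined objective can be illustrated with a minimal numpy sketch. This is not the sentence-transformers implementation, only an illustrative re-derivation: MultipleNegativesRankingLoss is in-batch cross-entropy over cosine scores, and MatryoshkaLoss averages that loss over prefix-truncated embeddings so that every prefix remains usable on its own.

```python
import numpy as np

def mnr_loss(q, d, scale=20.0):
    """MultipleNegativesRankingLoss sketch: cross-entropy over in-batch
    cosine scores, where query i's positive is document i and every
    other document in the batch acts as a negative."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = scale * (q @ d.T)                       # (B, B) scaled cosines
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # labels lie on the diagonal

def matryoshka_loss(q, d, dims=(512, 256, 128, 64, 32)):
    """MatryoshkaLoss sketch: average the base loss over prefix-truncated
    embeddings so performance holds across dimensions."""
    return float(np.mean([mnr_loss(q[:, :k], d[:, :k]) for k in dims]))
```

As a sanity check, perfectly aligned query/document pairs should score a much lower loss than random pairings at every truncation level.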

3 Results

3.1 Training results


Figure 2 | Training Loss Across Training Steps


Figure 3 | NanoBEIR mean nDCG@10 Across Training Steps

3.2 Evaluation results

Evaluation results using the NanoBEIR benchmark are presented below.

Table 1 | Model Comparison (NanoBEIR Mean NDCG@10)

| Model | NanoBEIR NDCG@10 | Dimensions | Parameters | Inference Speed |
|---|---|---|---|---|
| SSE Retrieval MRL v2 | 0.5158 | 512 | ~16M | Fast |
| SSE Retrieval MRL (v1) | 0.5124 | 512 | ~16M | Fast |
| static-retrieval-mrl-en-v1 | 0.5032 | 1024 | ~33M | Baseline |

3.3 Differences between v2 and v1

Version 2 demonstrates superiority over previous versions (SSE Retrieval MRL and static-retrieval-mrl-en-v1) in the following areas:

  1. Hyperparameter Optimization: Adjusting training hyperparameters strengthened representation space regularization, achieving an NDCG@10 of 0.5158, surpassing v1 (0.5124) and the baseline (0.5032).
  2. Dimensionality Compression and Speed: Compared to a 1024-dimensional model, we halved the embedding dimension to 512 while maintaining or improving accuracy. This resulted in approximately a two-fold increase in inference speed.

3.4 Matryoshka Truncation and Spectral Analysis

The performance improvement of SSE Retrieval MRL v2 is attributed to the synergistic effect of gradient control via DyT layers and hyperparameter tuning. Notably, the model maintains a score of 0.503 even when truncated to 256 dimensions via its Matryoshka properties, suggesting flexible applicability under resource constraints.
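Matryoshka truncation itself is simple to apply at inference time: keep the first k dimensions of each embedding and re-normalize. The toy sketch below uses random vectors as stand-ins for real model output (the document IDs and corpus are hypothetical) to show that a noisy query still retrieves its source document after halving the dimensions:

```python
import numpy as np

def truncate(emb, k):
    """Keep the first k Matryoshka dimensions and re-normalize.
    This is all that is required to use a smaller embedding size."""
    e = emb[:, :k]
    return e / np.linalg.norm(e, axis=1, keepdims=True)

# Toy corpus of 512-d embeddings (random stand-ins for model output).
rng = np.random.default_rng(42)
docs = rng.normal(size=(100, 512))
query = docs[7] + 0.1 * rng.normal(size=512)   # noisy copy of document 7

# Cosine retrieval at full and half dimensionality.
full = (truncate(docs, 512) @ truncate(query[None, :], 512).T).ravel()
half = (truncate(docs, 256) @ truncate(query[None, :], 256).T).ravel()
print(int(full.argmax()), int(half.argmax()))  # both retrieve document 7
```

Because only a prefix slice and a re-normalization are involved, the 256-dimensional index also halves memory and dot-product cost with no change to the retrieval pipeline.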


Figure 4 | NanoBEIR English mean nDCG@10 vs Matryoshka Embedding Truncation.

Table 2 | NanoBEIR English mean nDCG@10 vs Matryoshka Embedding Truncation.

| Model | 32 | 64 | 128 | 256 | 512 | 1024 |
|---|---|---|---|---|---|---|
| SSE (Static Embedding + Separable DyT) v2 | 0.349 | 0.424 | 0.473 | 0.503 | 0.516 | - |
| SSE (Static Embedding + Separable DyT) v1 | 0.345 | 0.428 | 0.466 | 0.497 | 0.512 | - |
| Static Embedding + DyT | 0.334 | 0.413 | 0.462 | 0.492 | 0.503 | - |
| Static Embedding (no DyT) | 0.337 | 0.416 | 0.463 | 0.491 | 0.507 | - |
| static-retrieval-mrl-en-v1 (for reference) | 0.353 | 0.418 | 0.462 | 0.482 | 0.496 | 0.503 |


Figure 5 | PCA Spectrum on the 13 NanoBEIR English Datasets: Normalized Eigenvalue Decay (Logarithmic Scale).
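The kind of spectrum shown in Figure 5 can be reproduced with a short numpy sketch. Here random matrices stand in for real embedding sets; the point is only the computation: eigenvalues of the covariance, sorted descending and normalized so the largest is 1, where a faster decay indicates a lower-rank representation space.

```python
import numpy as np

def normalized_spectrum(emb):
    """Normalized PCA eigenvalue spectrum of an embedding matrix:
    covariance eigenvalues sorted descending, scaled so the largest
    is 1. A faster decay indicates stronger low-rank structure."""
    centered = emb - emb.mean(axis=0, keepdims=True)
    # Squared singular values of the centered matrix equal the
    # covariance eigenvalues up to a constant factor, which the
    # normalization cancels out.
    s = np.linalg.svd(centered, compute_uv=False)
    eig = s ** 2
    return eig / eig[0]

rng = np.random.default_rng(0)
isotropic = rng.normal(size=(1000, 64))                       # full-rank cloud
low_rank = rng.normal(size=(1000, 8)) @ rng.normal(size=(8, 64))  # rank-8 cloud

# The low-rank set collapses after 8 components; the isotropic one does not.
print(normalized_spectrum(low_rank)[10] < normalized_spectrum(isotropic)[10])
```

Plotting these curves on a logarithmic scale, as in Figure 5, makes the difference in decay rates between models directly visible.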

4 Discussion

Compared to previous models, SSE Retrieval MRL v2 shows even stronger low-rank regularization. These results provide further evidence for the hypothesis that low-rank regularization of the representation space correlates with model performance when training with a Matryoshka loss on retrieval objectives.

However, this is a trend observed with SSE models, and it is not confirmed whether it holds for general embedding models.

5 Conclusion

SSE Retrieval MRL v2 demonstrates that a lightweight, high-performance information retrieval model can be obtained by regularizing the representation space through hyperparameter optimization. The improvements over version 1 yield measurable progress in both accuracy and speed.

Acknowledgements

Our interest in this topic originated from reading Tom Aarsen's seminal article, Train 400x faster Static Embedding Models with Sentence Transformers, which motivated us to investigate static embeddings.

I thank the developers of sentence-transformers, Python, and PyTorch.

I thank all the researchers for their efforts to date.

I thank Japan's high standard of education.

And most of all, thank you for your interest in this blog.

About us

A Japanese independent researcher with a shy and pampered personality. Twin-tail hair is a charm point. Interested in NLP. Usually using Python and C.

Please contact us if you have any requests for joint research, writing, speaking engagements, or employment.


