SSE Retrieval MRL v2: Regularization of Representation Space and Performance Improvement via Hyperparameter Optimization
Rikka Botan
Independent Researcher, Japan
https://rikka-botan.github.io
Abstract
We present SSE Retrieval MRL v2, a lightweight static-embedding model for information retrieval. Hyperparameter optimization strengthens the regularization of its representation space: with 512-dimensional embeddings and roughly 16M parameters, v2 reaches a NanoBEIR mean nDCG@10 of 0.5158, surpassing both v1 (0.5124) and the 1024-dimensional static-retrieval-mrl-en-v1 baseline (0.5032), while running approximately twice as fast as the baseline at inference time.
1 Introduction
In the field of Natural Language Processing (NLP), techniques for embedding sentence semantics into vector spaces are fundamental to information retrieval and semantic search. While Transformer-based models demonstrate high accuracy, their computational costs remain a significant challenge for real-time inference on edge devices. Conversely, approaches utilizing static embeddings offer speed but have been noted for limitations in expressiveness and difficulties in regularization.
This report presents SSE Retrieval MRL v2, developed to address these challenges. We focus on the improvements over version 1 (v1): stronger regularization of the representation space, achieved through hyperparameter optimization, which balances accuracy and efficiency.
SSE Retrieval MRL v2:
https://huggingface.co/RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en-v2
Version 1 technical report:
https://huggingface.co/blog/RikkaBotan/stable-static-embedding-technical-report
2 Methodology
2.1 Architecture
SSE couples static token embeddings with a Separable DyT layer, which provides the gradient control discussed in Section 3.4. The overall structure is shown in Figure 1; a sketch of the DyT building block follows the figure, and the v1 technical report gives further details.
Figure 1 | SSE Architecture
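For orientation, the sketch below shows a standard DyT (Dynamic Tanh) layer as introduced in the literature, y = γ · tanh(αx) + β. The Separable DyT used in SSE is a variant of this idea; its exact formulation differs and is not reproduced here.

```python
# Hedged sketch of a standard DyT (Dynamic Tanh) layer: y = w * tanh(a * x) + b.
# The Separable DyT used in SSE is a variant; this is only the base formulation.
import torch
import torch.nn as nn

class DyT(nn.Module):
    def __init__(self, dim: int, init_alpha: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), init_alpha))  # learnable scalar
        self.weight = nn.Parameter(torch.ones(dim))              # per-dim scale
        self.bias = nn.Parameter(torch.zeros(dim))               # per-dim shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * torch.tanh(self.alpha * x) + self.bias
```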
2.2 Training Configuration
The model was trained with the following hyperparameters (a configuration sketch follows the list):
- Batch Size: 2048 (per_device_train_batch_size)
- Gradient Accumulation Steps: 4 (gradient_accumulation_steps)
- Learning Rate: 0.1
- Optimizer: AdamW (beta2: 0.9999, epsilon: 1e-10)
- Scheduler: Cosine with Warmup (ratio: 0.1)
- Epochs: 1
- Compute: NVIDIA A100 SXM4 (80 GB), rented via Vast.ai
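As a rough sketch, these settings map onto sentence-transformers training arguments as follows. The output directory is a hypothetical name, and any argument not listed above is left at its library default.

```python
# Hedged sketch: the hyperparameters of Section 2.2 expressed as
# sentence-transformers training arguments. output_dir is hypothetical;
# unlisted arguments keep their defaults.
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="sse-retrieval-mrl-en-v2",  # hypothetical
    num_train_epochs=1,
    per_device_train_batch_size=2048,
    gradient_accumulation_steps=4,          # effective batch size: 8192
    learning_rate=0.1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta2=0.9999,
    adam_epsilon=1e-10,
)
```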
2.3 Datasets and Loss Functions
Training was conducted on 14 datasets, including SQuAD, TriviaQA, and AllNLI. We combined MatryoshkaLoss, which preserves performance across multiple embedding dimensions, with MultipleNegativesRankingLoss; a minimal sketch of this composition follows.
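In sentence-transformers, the two losses compose directly. The sketch below assumes the Matryoshka dimensions reported in Table 2 (32 through 512); it is illustrative, not the exact training script.

```python
# Hedged sketch: MultipleNegativesRankingLoss wrapped in MatryoshkaLoss.
# The dimension list follows Table 2; this is not the exact training script.
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en-v2")
base_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, base_loss, matryoshka_dims=[512, 256, 128, 64, 32])
```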
3 Results
3.1 Training Results
Figure 2 | Training Loss Across Training Steps
Figure 3 | NanoBEIR mean nDCG@10 Across Training Steps
3.2 Evaluation Results
Evaluation results on the NanoBEIR benchmark are presented below; a reproduction sketch follows Table 1.
Table 1 | Model Comparison (NanoBEIR Mean nDCG@10)
| Model | NanoBEIR nDCG@10 | Dimensions | Parameters | Inference Speed |
|---|---|---|---|---|
| SSE Retrieval MRL v2 | 0.5158 | 512 | ~16M | ~2x baseline |
| SSE Retrieval MRL (v1) | 0.5124 | 512 | ~16M | ~2x baseline |
| static-retrieval-mrl-en-v1 | 0.5032 | 1024 | ~33M | Baseline (1x) |
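These scores can be checked with the NanoBEIR evaluator built into recent sentence-transformers releases; the sketch below assumes v3.4 or later, where NanoBEIREvaluator is available.

```python
# Hedged sketch: evaluating the released model on NanoBEIR.
# Assumes sentence-transformers >= 3.4, where NanoBEIREvaluator is available.
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

model = SentenceTransformer("RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en-v2")
evaluator = NanoBEIREvaluator()  # defaults to the NanoBEIR English datasets
results = evaluator(model)
print(results[evaluator.primary_metric])  # mean nDCG@10
```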
3.3 Differences between v2 and v1
Version 2 improves on its predecessor (SSE Retrieval MRL v1) and the baseline (static-retrieval-mrl-en-v1) in the following areas:
- Hyperparameter Optimization: Adjusting the training hyperparameters strengthened the regularization of the representation space, achieving an nDCG@10 of 0.5158 and surpassing both v1 (0.5124) and the baseline (0.5032).
- Dimensionality Compression and Speed: Compared to the 1024-dimensional baseline, we halved the embedding dimension to 512 while maintaining or improving accuracy, yielding approximately a two-fold increase in inference speed.
3.4 Matryoshka Truncation and Spectral Analysis
The performance improvement of SSE Retrieval MRL v2 is attributed to the synergistic effect of gradient control via the DyT layers and hyperparameter tuning. Notably, thanks to its Matryoshka property, the model maintains a score of 0.503 even when truncated to 256 dimensions, suggesting flexible applicability under resource constraints; a truncation sketch follows.
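Truncated embeddings can be produced directly at load time in sentence-transformers; the sketch below assumes the public model id linked in Section 1.

```python
# Hedged sketch: exploiting the Matryoshka property by truncating
# SSE v2 embeddings to 256 dimensions (score 0.503 in Table 2).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en-v2",
    truncate_dim=256,
)
embeddings = model.encode(["matryoshka representation learning"])
print(embeddings.shape)  # (1, 256)
```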
Figure 4 | NanoBEIR English mean nDCG@10 vs Matryoshka Embedding Truncation.
Table 2 | NanoBEIR English mean nDCG@10 vs Matryoshka Embedding Truncation.
| Model | 32 | 64 | 128 | 256 | 512 | 1024 |
|---|---|---|---|---|---|---|
| SSE (Static Embedding + Separable DyT) v2 | 0.349 | 0.424 | 0.473 | 0.503 | 0.516 | - |
| SSE (Static Embedding + Separable DyT) v1 | 0.345 | 0.428 | 0.466 | 0.497 | 0.512 | - |
| Static Embedding + DyT | 0.334 | 0.413 | 0.462 | 0.492 | 0.503 | - |
| Static Embedding (no DyT) | 0.337 | 0.416 | 0.463 | 0.491 | 0.507 | - |
| static-retrieval-mrl-en-v1 (For reference) | 0.353 | 0.418 | 0.462 | 0.482 | 0.496 | 0.503 |
Figure 5 | PCA Spectrum on the 13 NanoBEIR English Datasets: Normalized Eigenvalue Decay (Logarithmic Scale).
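The decay curves in Figure 5 follow from a standard PCA eigenvalue computation on corpus embeddings. Below is a minimal NumPy sketch of one way to compute them; it is illustrative rather than the exact analysis script.

```python
# Minimal sketch: normalized PCA eigenvalue decay of an embedding matrix,
# as plotted (log scale) in Figure 5. One possible implementation only.
import numpy as np

def normalized_eigenvalue_decay(embeddings: np.ndarray) -> np.ndarray:
    """PCA eigenvalues of `embeddings` (n_samples x dim), normalized by the largest."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Singular values of the centered matrix give the PCA eigenvalues.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    eigenvalues = singular_values**2 / (len(embeddings) - 1)
    return eigenvalues / eigenvalues[0]
```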
4 Discussion
Compared to previous models, SSE Retrieval MRL v2 exhibits even stronger low-rank regularization. These results lend further support to the hypothesis that low-rank regularization of the representation space correlates with model performance when training with Matryoshka loss combined with a contrastive retrieval objective.
However, this trend has so far been observed only with SSE models; whether it holds for embedding models in general remains unconfirmed.
5 Conclusion
SSE Retrieval MRL v2 demonstrates that a lightweight model can deliver high retrieval performance by regularizing its representation space through hyperparameter optimization. The improvements over version 1 yield measurable gains in accuracy while preserving the speed advantage over the 1024-dimensional baseline.
Acknowledgements
Our interest in this topic originated from reading Tom Aarsen's seminal article, Train 400x faster Static Embedding Models with Sentence Transformers, which motivated us to investigate static embeddings.
I thank the developers of sentence-transformers, Python, and PyTorch.
I thank all the researchers for their efforts to date.
I thank Japan's high standard of education.
And most of all, thank you for your interest in this blog.
About us
A Japanese independent researcher with a shy and pampered personality. Twin-tail hair is a charm point. Interested in NLP. Usually works with Python and C.
Please contact us if you have any requests for joint research, writing, speaking engagements, or employment.