StaRSE-512

StaRSE stands for Static Russian Sentence Embeddings. It is a compact Russian sentence embedding model implemented as a Sentence-Transformers StaticEmbedding module.

The model is intended for CPU-friendly semantic similarity, clustering, classification features, and first-stage retrieval in settings where a full Transformer encoder is too expensive to run at high throughput.
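
For intuition about why a static model is cheap to run, the toy sketch below mimics the general static-embedding idea: look up one fixed vector per token and average them, with no attention layers. The vocabulary, the 4-dimensional vectors, the whitespace tokenizer, and the function name embed are all made up for illustration; this is not the StaRSE implementation.

```python
# Toy illustration only: hypothetical vocabulary with 4-dimensional vectors.
TOY_TABLE = {
    "музыка": [1.0, 0.0, 0.0, 0.0],
    "балет": [0.0, 1.0, 0.0, 0.0],
    "футбол": [0.0, 0.0, 1.0, 0.0],
}


def embed(sentence, table=TOY_TABLE, dim=4):
    """Average the vectors of known tokens; return zeros if none are known."""
    # Whitespace tokenization is an assumption made to keep the sketch short.
    vectors = [table[tok] for tok in sentence.lower().split() if tok in table]
    if not vectors:
        return [0.0] * dim
    # Mean pooling: average each dimension across the token vectors.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


print(embed("музыка балет"))
```

Because encoding reduces to table lookups and an average, the cost per sentence grows only with the number of tokens, which is what makes this family of models attractive on CPU.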

[Figure: RuMTEB quality-latency trade-off]

Performance

Evaluation is reported on MTEB(rus, v1.1) across 23 tasks. The headline score is mean_task_main_score = 51.16, the mean of the per-task main scores.

| Task type | Tasks | Mean score |
|---|---|---|
| Classification | 9 | 56.81 |
| Clustering | 3 | 51.80 |
| MultilabelClassification | 2 | 35.01 |
| PairClassification | 1 | 52.50 |
| Reranking | 2 | 41.88 |
| Retrieval | 3 | 39.09 |
| STS | 3 | 62.18 |
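
As a consistency check on the numbers above, the per-type means can be recombined into the overall score. The sketch below assumes that mean_task_main_score is an unweighted mean over all 23 tasks, so each task type contributes in proportion to its task count. The variable names are illustrative, and because the per-type means are rounded to two decimals the result is only approximate.

```python
# (task type, number of tasks, mean score) copied from the table above.
rows = [
    ("Classification", 9, 56.81),
    ("Clustering", 3, 51.80),
    ("MultilabelClassification", 2, 35.01),
    ("PairClassification", 1, 52.50),
    ("Reranking", 2, 41.88),
    ("Retrieval", 3, 39.09),
    ("STS", 3, 62.18),
]

total_tasks = sum(n for _, n, _ in rows)

# Weight each type's mean by its task count to recover the per-task mean.
per_task_mean = sum(n * score for _, n, score in rows) / total_tasks

# Plain average of the seven type-level means, for comparison.
per_type_mean = sum(score for _, _, score in rows) / len(rows)

print(total_tasks)               # 23
print(round(per_task_mean, 2))   # 51.16
print(round(per_type_mean, 2))   # 48.47
```

The task-weighted value rounds to 51.16, matching the reported score, while the plain average over task types comes out lower, at about 48.47, because Classification accounts for 9 of the 23 tasks.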

Usage

Install Sentence Transformers:

pip install -U sentence-transformers

Load the model with trust_remote_code=True.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BorisTM/starse-512", trust_remote_code=True)

sentences = [
    "Партитуры Чайковского часто звучат в консерватории.",
    "Балетная сцена хранит музыку Щелкунчика.",
    "Футбольная команда выиграла матч.",
]

embeddings = model.encode(sentences, normalize_embeddings=True)
similarities = model.similarity(embeddings, embeddings)
print(embeddings.shape)           # (3, 512)
print(tuple(similarities.shape))  # (3, 3)
print(similarities)
# tensor([[1.0000, 0.3521, 0.0626],
#         [0.3521, 1.0000, 0.0420],
#         [0.0626, 0.0420, 1.0000]])
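
To illustrate how such a similarity matrix might be used, the sketch below picks the most similar other sentence for each row. It hard-codes the values printed above rather than calling the model, and the helper name nearest_neighbor is hypothetical, not part of Sentence Transformers.

```python
# Similarity values copied from the printed output above.
similarities = [
    [1.0000, 0.3521, 0.0626],
    [0.3521, 1.0000, 0.0420],
    [0.0626, 0.0420, 1.0000],
]


def nearest_neighbor(matrix, i):
    """Return the index of the row most similar to row i, excluding i itself."""
    candidates = [j for j in range(len(matrix)) if j != i]
    return max(candidates, key=lambda j: matrix[i][j])


neighbors = [nearest_neighbor(similarities, i) for i in range(len(similarities))]
print(neighbors)  # [1, 0, 0]
```

The two music-related sentences select each other, while the football sentence is only weakly similar to either of them (at most 0.0626).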

Citation

@misc{starse2026,
  title = {TBD},
  author = {TBD},
  year = {TBD},
  url = {https://huggingface.co/BorisTM/starse-512}
}
Model size: 7.81M params (Safetensors; tensor types F32 and U8)