StaRSE-512
StaRSE stands for Static Russian Sentence Embeddings. It is a compact Russian sentence embedding model implemented as a Sentence-Transformers StaticEmbedding endpoint.
The model is intended for CPU-friendly semantic similarity, clustering, classification features, and retrieval-style first-stage representations when a full Transformer encoder is too expensive to run at high throughput.
Performance
Evaluation is reported on MTEB(rus, v1.1) across 23 tasks. The main score is mean_task_main_score = 51.16.
| Task type | Tasks | Mean score |
|---|---|---|
| Classification | 9 | 56.81 |
| Clustering | 3 | 51.80 |
| MultilabelClassification | 2 | 35.01 |
| PairClassification | 1 | 52.50 |
| Reranking | 2 | 41.88 |
| Retrieval | 3 | 39.09 |
| STS | 3 | 62.18 |
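As a sanity check on the table, the headline score appears consistent with a mean taken over all 23 tasks rather than a mean of the seven task-type means. The sketch below uses only the rounded values from the table, so the result is approximate. The names `TASK_TYPES`, `per_task_mean`, and `per_type_mean` are illustrative and not part of MTEB or this model.

```python
# Per-type task counts and mean scores, copied from the table above.
TASK_TYPES = {
    "Classification": (9, 56.81),
    "Clustering": (3, 51.80),
    "MultilabelClassification": (2, 35.01),
    "PairClassification": (1, 52.50),
    "Reranking": (2, 41.88),
    "Retrieval": (3, 39.09),
    "STS": (3, 62.18),
}


def per_task_mean(table):
    """Mean over individual tasks: weight each type mean by its task count."""
    total_tasks = sum(count for count, _ in table.values())
    weighted_sum = sum(count * score for count, score in table.values())
    return weighted_sum / total_tasks


def per_type_mean(table):
    """Mean over task types: every type counts equally."""
    return sum(score for _, score in table.values()) / len(table)


print(round(per_task_mean(TASK_TYPES), 2))  # 51.16, matches the reported score
print(round(per_type_mean(TASK_TYPES), 2))  # 48.47, the per-type average
```

Because Classification contributes 9 of the 23 tasks, it carries the most weight in the per-task mean.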
Usage
Install Sentence Transformers:
pip install -U sentence-transformers
Load the model with trust_remote_code=True.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("BorisTM/starse-512", trust_remote_code=True)
sentences = [
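    # "Tchaikovsky's scores are often heard at the conservatory."
    "Партитуры Чайковского часто звучат в консерватории.",
    # "The ballet stage preserves the music of The Nutcracker."
    "Балетная сцена хранит музыку Щелкунчика.",
    # "The football team won the match."
    "Футбольная команда выиграла матч.",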
]
embeddings = model.encode(sentences, normalize_embeddings=True)
similarities = model.similarity(embeddings, embeddings)
print(embeddings.shape) # (3, 512)
print(tuple(similarities.shape)) # (3, 3)
print(similarities)
# tensor([[1.0000, 0.3521, 0.0626],
# [0.3521, 1.0000, 0.0420],
# [0.0626, 0.0420, 1.0000]])
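For the retrieval-style first-stage use mentioned earlier, the same normalized embeddings can be ranked with a plain dot product, since the dot product of L2-normalized vectors equals their cosine similarity. The following is a minimal sketch under that assumption, not a tuned retrieval pipeline. `top_k` and `demo` are hypothetical helper names, and the query sentence is made up for illustration.

```python
import numpy as np


def top_k(query_embedding, corpus_embeddings, k=2):
    """Return (index, score) pairs for the k corpus rows closest to the query.

    Assumes every embedding is L2-normalized, so the dot product is the
    cosine similarity.
    """
    scores = corpus_embeddings @ query_embedding
    k = min(k, len(scores))
    order = np.argsort(-scores)[:k]  # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]


def demo():
    """Encode a tiny corpus with the model and rank it against one query."""
    # Imported here so that top_k stays usable without the model installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("BorisTM/starse-512", trust_remote_code=True)
    corpus = [
        "Партитуры Чайковского часто звучат в консерватории.",
        "Балетная сцена хранит музыку Щелкунчика.",
        "Футбольная команда выиграла матч.",
    ]
    corpus_embeddings = model.encode(corpus, normalize_embeddings=True)
    # Hypothetical query: "A classical music concert"
    query_embedding = model.encode(
        ["Концерт классической музыки"], normalize_embeddings=True
    )[0]
    for index, score in top_k(query_embedding, corpus_embeddings, k=2):
        print(f"{score:.4f}  {corpus[index]}")
```

Calling `demo()` downloads the model and prints the two closest corpus sentences. For larger corpora, the corpus embeddings can be computed once and reused across queries.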
Citation
@misc{starse2026,
title = {TBD},
author = {TBD},
year = {TBD},
url = {https://huggingface.co/BorisTM/starse-512}
}
