---
language:
  - ru
library_name: sentence-transformers
pipeline_tag: sentence-similarity
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - static-embeddings
  - binary
  - russian
---

StaRSE-512

StaRSE stands for Static Russian Sentence Embeddings. StaRSE-512 is a compact Russian sentence embedding model that produces 512-dimensional vectors and is implemented as a Sentence Transformers model built on the `StaticEmbedding` module.

The model is intended for CPU-friendly semantic similarity, clustering, feature extraction for classification, and first-stage retrieval in settings where running a full Transformer encoder at high throughput is too expensive.
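
To illustrate why static models are cheap to run: static embedding models, including those built on `StaticEmbedding`, typically replace the Transformer forward pass with a per-token table lookup followed by averaging. The sketch below assumes mean pooling plus L2 normalization, and uses a hypothetical whitespace tokenizer and a random 8-dimensional table (`TABLE`, `tokenize`, `embed` are illustrative names) in place of StaRSE-512's real tokenizer and trained 512-dimensional weights.

```python
import math
import random

# Hypothetical toy setup: whitespace "tokenizer" and a random embedding table.
# StaRSE-512 uses its own tokenizer and trained 512-d weights; none are reproduced here.
DIM = 8
VOCAB = ["кот", "ест", "рыбу", "собака", "спит"]
rng = random.Random(0)
TABLE = {tok: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for tok in VOCAB}


def tokenize(text: str) -> list[str]:
    # Lowercase, split on whitespace, drop out-of-vocabulary tokens.
    return [t for t in text.lower().split() if t in TABLE]


def embed(text: str) -> list[float]:
    """Static sentence embedding: mean of token vectors, then L2-normalize."""
    tokens = tokenize(text)
    if not tokens:
        return [0.0] * DIM  # no known tokens: return a zero vector
    mean = [sum(TABLE[t][i] for t in tokens) / len(tokens) for i in range(DIM)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean]


def cosine(a: list[float], b: list[float]) -> float:
    # Inputs are unit-length, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


a = embed("Кот ест рыбу")   # "The cat eats fish"
b = embed("рыбу ест кот")   # same words, different order
c = embed("Собака спит")    # "The dog sleeps"
print(len(a))                   # 8
print(round(cosine(a, b), 6))   # 1.0: mean pooling ignores word order
```

Each token costs one lookup and one addition, with no attention layers, which is what makes this family of models fast on CPU. The trade-off is that a pure average of token vectors cannot tell apart sentences that differ only in word order.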

[Figure: RuMTEB quality-latency trade-off]

Performance

Results are reported on the MTEB(rus, v1.1) benchmark, which covers 23 tasks. The headline score, mean_task_main_score = 51.16, is the mean of the per-task main scores over all 23 tasks.

| Task type | Tasks | Mean score |
|---|---:|---:|
| Classification | 9 | 56.81 |
| Clustering | 3 | 51.80 |
| MultilabelClassification | 2 | 35.01 |
| PairClassification | 1 | 52.50 |
| Reranking | 2 | 41.88 |
| Retrieval | 3 | 39.09 |
| STS | 3 | 62.18 |
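
The headline number can be cross-checked against the per-type table: weighting each task type's mean by its number of tasks reproduces 51.16, whereas an unweighted average of the seven type means would give about 48.47. This check uses only the rounded figures from the table.

```python
# Per-task-type results copied from the table: (number of tasks, mean score).
results = {
    "Classification": (9, 56.81),
    "Clustering": (3, 51.80),
    "MultilabelClassification": (2, 35.01),
    "PairClassification": (1, 52.50),
    "Reranking": (2, 41.88),
    "Retrieval": (3, 39.09),
    "STS": (3, 62.18),
}

n_tasks = sum(n for n, _ in results.values())

# Mean over all tasks: weight each type's mean by its task count.
mean_over_tasks = sum(n * score for n, score in results.values()) / n_tasks

# For contrast only: unweighted mean over the seven task types.
mean_over_types = sum(score for _, score in results.values()) / len(results)

print(n_tasks)                    # 23
print(round(mean_over_tasks, 2))  # 51.16
print(round(mean_over_types, 2))  # 48.47
```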

Usage

Install Sentence Transformers:

```bash
pip install -U sentence-transformers
```

Load the model with `trust_remote_code=True`, which allows code shipped in the model repository to run locally:

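```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BorisTM/starse-512", trust_remote_code=True)

sentences = [
    "Партитуры Чайковского часто звучат в консерватории.",  # "Tchaikovsky's scores are often heard at the conservatory."
    "Балетная сцена хранит музыку Щелкунчика.",             # "The ballet stage preserves the music of The Nutcracker."
    "Футбольная команда выиграла матч.",                    # "The football team won the match."
]

# Encode to unit-length 512-dimensional vectors.
embeddings = model.encode(sentences, normalize_embeddings=True)
# Pairwise similarity matrix computed with the model's similarity function.
similarities = model.similarity(embeddings, embeddings)
print(embeddings.shape)           # (3, 512)
print(tuple(similarities.shape))  # (3, 3)
print(similarities)
# tensor([[1.0000, 0.3521, 0.0626],
#         [0.3521, 1.0000, 0.0420],
#         [0.0626, 0.0420, 1.0000]])
```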
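
The similarity matrix is read row by row: the two music-related sentences score 0.35 with each other, while the football sentence stays close to zero against both. A minimal sketch of turning such a matrix into nearest-neighbour lookups, using the printed values rather than a live model so that it runs without downloading anything (`nearest_neighbours` is a hypothetical helper, not part of Sentence Transformers):

```python
# Similarity matrix copied from the example output above.
sentences = [
    "Партитуры Чайковского часто звучат в консерватории.",
    "Балетная сцена хранит музыку Щелкунчика.",
    "Футбольная команда выиграла матч.",
]
similarities = [
    [1.0000, 0.3521, 0.0626],
    [0.3521, 1.0000, 0.0420],
    [0.0626, 0.0420, 1.0000],
]


def nearest_neighbours(sim: list[list[float]]) -> list[int]:
    """For each row, return the index of the most similar *other* item."""
    result = []
    for i, row in enumerate(sim):
        # Skip the diagonal (self-similarity is always 1.0).
        candidates = [(score, j) for j, score in enumerate(row) if j != i]
        result.append(max(candidates)[1])
    return result


for i, j in enumerate(nearest_neighbours(similarities)):
    print(f"{sentences[i]!r} -> {sentences[j]!r} ({similarities[i][j]:.4f})")
```

For larger collections, Sentence Transformers also provides `sentence_transformers.util.semantic_search`, which returns the top-k corpus hits for each query embedding instead of a full matrix.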

Citation

```bibtex
@misc{starse2026,
  title = {TBD},
  author = {TBD},
  year = {TBD},
  url = {https://huggingface.co/BorisTM/starse-512}
}
```