---
language:
  - ru
library_name: sentence-transformers
pipeline_tag: sentence-similarity
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - static-embeddings
  - binary
  - russian
---

StaRSE-512

StaRSE stands for Static Russian Sentence Embeddings. It is a compact Russian sentence embedding model built on the Sentence Transformers StaticEmbedding module.

The model is intended for CPU-friendly semantic similarity, clustering, classification features, and first-stage retrieval in settings where a full Transformer encoder is too expensive to run at high throughput.

Performance

Evaluation is reported on MTEB(rus, v1.1) across 23 tasks. The headline metric, mean_task_main_score (the main score averaged over all 23 tasks), is 51.16.

Task type	Tasks	Mean score
Classification	9	56.81
Clustering	3	51.80
MultilabelClassification	2	35.01
PairClassification	1	52.50
Reranking	2	41.88
Retrieval	3	39.09
STS	3	62.18
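
The headline score can be cross-checked against the table. The sketch below assumes that mean_task_main_score is a plain average over the 23 individual tasks, so each task-type mean is weighted by its task count. The card does not define the metric explicitly, and the per-type means are already rounded to two decimals, so this is a consistency check rather than the official computation. The unweighted average over the seven task types is not reported in the card and is computed here only for contrast.

```python
# Rows copied from the table above: (task type, number of tasks, mean score).
rows = [
    ("Classification", 9, 56.81),
    ("Clustering", 3, 51.80),
    ("MultilabelClassification", 2, 35.01),
    ("PairClassification", 1, 52.50),
    ("Reranking", 2, 41.88),
    ("Retrieval", 3, 39.09),
    ("STS", 3, 62.18),
]

total_tasks = sum(count for _, count, _ in rows)

# Mean over tasks: every task counts once, so larger groups weigh more.
task_mean = sum(count * score for _, count, score in rows) / total_tasks

# Mean over task types: every group counts once (shown for contrast only).
type_mean = sum(score for _, _, score in rows) / len(rows)

print(total_tasks, round(task_mean, 2), round(type_mean, 2))
# 23 51.16 48.47
```

The task-weighted mean reproduces 51.16. The per-type average comes out lower because the three small, lower-scoring groups (MultilabelClassification, Reranking, Retrieval) carry more weight when every type counts equally.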

Usage

Install Sentence Transformers:

pip install -U sentence-transformers

Load the model with trust_remote_code=True, which allows Sentence Transformers to execute the custom code shipped in the model repository:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BorisTM/starse-512", trust_remote_code=True)

sentences = [
    "Партитуры Чайковского часто звучат в консерватории.",
    "Балетная сцена хранит музыку Щелкунчика.",
    "Футбольная команда выиграла матч.",
]

embeddings = model.encode(sentences, normalize_embeddings=True)
similarities = model.similarity(embeddings, embeddings)
print(embeddings.shape)           # (3, 512)
print(tuple(similarities.shape))  # (3, 3)
print(similarities)
# tensor([[1.0000, 0.3521, 0.0626],
#         [0.3521, 1.0000, 0.0420],
#         [0.0626, 0.0420, 1.0000]])
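
Because the embeddings above are requested with normalize_embeddings=True, each vector has unit length, so cosine similarity reduces to a dot product. The sketch below shows how that property could be used for the first-stage retrieval use case mentioned earlier. It is a minimal pure-Python illustration, not part of the model's API: l2_normalize and top_k are hypothetical helper names, and the toy 3-dimensional vectors stand in for rows returned by model.encode.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k(query, documents, k=2):
    """Return (index, score) pairs for the k documents most similar to the query.

    All vectors are assumed to be unit length, so the dot product
    equals the cosine similarity.
    """
    scores = [
        (i, sum(q * d for q, d in zip(query, doc)))
        for i, doc in enumerate(documents)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors standing in for rows of model.encode(..., normalize_embeddings=True).
docs = [l2_normalize(v) for v in ([1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0])]
query = l2_normalize([1.0, 0.1, 0.0])

ranked = top_k(query, docs, k=2)
print([index for index, _ in ranked])
# [0, 1]
```

With real embeddings, a vectorized dot product over the full embedding matrix would be the usual choice. The candidates returned by this first stage can then be passed to a heavier model for reranking.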

Citation

@misc{starse2026,
  title = {TBD},
  author = {TBD},
  year = {TBD},
  url = {https://huggingface.co/BorisTM/starse-512}
}