---
language:
- ru
library_name: sentence-transformers
pipeline_tag: sentence-similarity
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- static-embeddings
- binary
- russian
---
# StaRSE-512
**StaRSE** stands for **Sta**tic **R**ussian **S**entence **E**mbeddings. It is a compact Russian sentence embedding model built on the
[Sentence-Transformers](https://www.sbert.net/) `StaticEmbedding`
module.
The model is intended for CPU-friendly semantic similarity, clustering,
classification features, and first-stage retrieval in settings where a
full Transformer encoder is too expensive to run at high throughput.
![RuMTEB quality-latency trade-off](assets/rumteb_cpu_latency.png)
## Performance
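Evaluation is reported on
[`MTEB(rus, v1.1)`](https://docs.mteb.org/overview/available_benchmarks/#mtebrus-v11)
across 23 tasks. The headline metric, `mean_task_main_score` (the main score averaged over all 23 tasks), is 51.16. The table below gives the mean score per task type.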
| Task type | Tasks | Mean score |
|---|---:|---:|
| Classification | 9 | 56.81 |
| Clustering | 3 | 51.80 |
| MultilabelClassification | 2 | 35.01 |
| PairClassification | 1 | 52.50 |
| Reranking | 2 | 41.88 |
| Retrieval | 3 | 39.09 |
| STS | 3 | 62.18 |
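As a sanity check, the overall score can be approximately recovered from the table as a task-count-weighted mean of the per-type means. The sketch below only redoes that arithmetic; the variable names are illustrative, and because the table values are rounded to two decimals the result is an approximation of the per-task average rather than the exact evaluation output.

```python
# (task type, number of tasks, mean score), copied from the table above
rows = [
    ("Classification", 9, 56.81),
    ("Clustering", 3, 51.80),
    ("MultilabelClassification", 2, 35.01),
    ("PairClassification", 1, 52.50),
    ("Reranking", 2, 41.88),
    ("Retrieval", 3, 39.09),
    ("STS", 3, 62.18),
]

total_tasks = sum(n for _, n, _ in rows)

# Weight each per-type mean by its task count, then divide by the total
weighted_mean = sum(n * score for _, n, score in rows) / total_tasks

print(total_tasks)              # 23
print(round(weighted_mean, 2))  # 51.16
```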
## Usage
Install [Sentence Transformers](https://www.sbert.net/docs/installation.html):
```bash
pip install -U sentence-transformers
```
Load the model with `trust_remote_code=True`.
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BorisTM/starse-512", trust_remote_code=True)

sentences = [
    # "Tchaikovsky's scores are often heard at the conservatory."
    "Партитуры Чайковского часто звучат в консерватории.",
    # "The ballet stage preserves the music of The Nutcracker."
    "Балетная сцена хранит музыку Щелкунчика.",
    # "The football team won the match."
    "Футбольная команда выиграла матч.",
]

# Encode to unit-length vectors, then compare every sentence with every other
embeddings = model.encode(sentences, normalize_embeddings=True)
similarities = model.similarity(embeddings, embeddings)

print(embeddings.shape)  # (3, 512)
print(tuple(similarities.shape))  # (3, 3)
print(similarities)
# tensor([[1.0000, 0.3521, 0.0626],
#         [0.3521, 1.0000, 0.0420],
#         [0.0626, 0.0420, 1.0000]])
```
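For the first-stage retrieval use case, normalized embeddings can be ranked with a plain dot product. The following is a minimal sketch, not part of the model's API: `top_k_by_cosine` is a hypothetical helper, and the 2-dimensional vectors are toy stand-ins rather than real model output. In practice the query and corpus vectors would presumably come from `model.encode(..., normalize_embeddings=True)`.

```python
import numpy as np


def top_k_by_cosine(query_vec, corpus_vecs, k=2):
    """Return (indices, scores) of the k corpus rows most similar to the query.

    Assumes every vector is L2-normalized, so the dot product equals
    cosine similarity.
    """
    scores = corpus_vecs @ query_vec
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order.tolist(), scores[order].tolist()


# Toy unit-length vectors standing in for embeddings (NOT real model output)
corpus = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
query = np.array([0.8, 0.6])

indices, scores = top_k_by_cosine(query, corpus, k=2)
print(indices)  # [1, 0]
```

A brute-force scan like this is linear in corpus size, which is usually acceptable for small collections; larger corpora would typically call for an approximate nearest-neighbor index.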
## Citation
```bibtex
@misc{starse2026,
title = {TBD},
author = {TBD},
year = {TBD},
url = {https://huggingface.co/BorisTM/starse-512}
}
```