	---
	language:
	- ru
	library_name: sentence-transformers
	pipeline_tag: sentence-similarity
	license: apache-2.0
	tags:
	- sentence-transformers
	- sentence-similarity
	- feature-extraction
	- static-embeddings
	- binary
	- russian
	---

	# StaRSE-512

	StaRSE stands for Static Russian Sentence Embeddings. It is a compact Russian
	sentence embedding model implemented as a
	[Sentence-Transformers](https://www.sbert.net/) `StaticEmbedding` model.
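
To illustrate the general idea behind static sentence embeddings, the toy sketch below looks up one fixed vector per token, averages them, and L2-normalizes the result. It is not this model's implementation: the vocabulary, the 4-dimensional vectors, the whitespace tokenizer, and the names `TOKEN_VECTORS`, `embed`, and `cosine` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 4-dimensional token table (illustration only). A real static
# embedding model stores one learned vector per vocabulary entry.
TOKEN_VECTORS = {
    "музыка": [0.9, 0.1, 0.0, 0.2],
    "балет": [0.8, 0.2, 0.1, 0.1],
    "футбол": [0.0, 0.9, 0.3, 0.1],
    "матч": [0.1, 0.8, 0.4, 0.0],
}


def embed(text):
    """Average the vectors of known tokens, then L2-normalize the mean."""
    tokens = [t for t in text.lower().split() if t in TOKEN_VECTORS]
    if not tokens:
        raise ValueError("no known tokens in input")
    dim = len(next(iter(TOKEN_VECTORS.values())))
    mean = [sum(TOKEN_VECTORS[t][i] for t in tokens) / len(tokens) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean]


def cosine(a, b):
    """Dot product; equals cosine similarity because inputs are unit-length."""
    return sum(x * y for x, y in zip(a, b))


music = embed("музыка балет")
sport = embed("футбол матч")
print(cosine(music, embed("балет")) > cosine(music, sport))  # True
```

Because no Transformer layers run at inference time, encoding in this style reduces to table lookups and an average, which is what makes the approach cheap on CPU.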

	The model is intended for CPU-friendly semantic similarity, clustering,
	classification features, and first-stage retrieval in settings where a full
	Transformer encoder is too expensive to run at high throughput.

	![RuMTEB quality-latency trade-off](assets/rumteb_cpu_latency.png)

	## Performance

	The model is evaluated on
	[`MTEB(rus, v1.1)`](https://docs.mteb.org/overview/available_benchmarks/#mtebrus-v11)
	across 23 tasks. The headline score, averaged over all tasks, is
	`mean_task_main_score = 51.16`.

	| Task type | Tasks | Mean score |
	|---|---:|---:|
	| Classification | 9 | 56.81 |
	| Clustering | 3 | 51.80 |
	| MultilabelClassification | 2 | 35.01 |
	| PairClassification | 1 | 52.50 |
	| Reranking | 2 | 41.88 |
	| Retrieval | 3 | 39.09 |
	| STS | 3 | 62.18 |
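
As a quick consistency check, the sketch below recombines the per-type rows of the table into an overall figure. It assumes that `mean_task_main_score` is the plain average over all 23 tasks, so each task type is weighted by its task count. Since the per-type means are already rounded, the result is approximate. The unweighted average over task types is shown only for contrast and is not a number reported by the model card.

```python
# (task type, number of tasks, mean score), copied from the table.
rows = [
    ("Classification", 9, 56.81),
    ("Clustering", 3, 51.80),
    ("MultilabelClassification", 2, 35.01),
    ("PairClassification", 1, 52.50),
    ("Reranking", 2, 41.88),
    ("Retrieval", 3, 39.09),
    ("STS", 3, 62.18),
]

total_tasks = sum(n for _, n, _ in rows)

# Weight each type mean by its task count to recover the per-task average.
task_weighted = sum(n * score for _, n, score in rows) / total_tasks

# Unweighted average over the seven task types, for comparison only.
type_macro = sum(score for _, _, score in rows) / len(rows)

print(total_tasks)              # 23
print(round(task_weighted, 2))  # 51.16
print(round(type_macro, 2))     # 48.47
```

The task-weighted value matches the reported 51.16, while the type-level average is lower because the two-task MultilabelClassification group counts as much as the nine-task Classification group.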

	## Usage

	Install [Sentence Transformers](https://www.sbert.net/docs/installation.html):

	```bash
	pip install -U sentence-transformers
	```

	Load the model with `trust_remote_code=True`, which allows the custom code
	shipped in the model repository to run:

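	```python
	from sentence_transformers import SentenceTransformer

	# trust_remote_code=True allows the repository's custom code to run.
	model = SentenceTransformer("BorisTM/starse-512", trust_remote_code=True)

	sentences = [
	    # "Tchaikovsky's scores are often heard at the conservatory."
	    "Партитуры Чайковского часто звучат в консерватории.",
	    # "The ballet stage preserves the music of The Nutcracker."
	    "Балетная сцена хранит музыку Щелкунчика.",
	    # "The football team won the match."
	    "Футбольная команда выиграла матч.",
	]

	# Unit-length embeddings, one 512-dimensional vector per sentence.
	embeddings = model.encode(sentences, normalize_embeddings=True)
	# Pairwise similarity matrix between all sentences.
	similarities = model.similarity(embeddings, embeddings)

	print(embeddings.shape)           # (3, 512)
	print(tuple(similarities.shape))  # (3, 3)
	print(similarities)
	# tensor([[1.0000, 0.3521, 0.0626],
	#         [0.3521, 1.0000, 0.0420],
	#         [0.0626, 0.0420, 1.0000]])
	```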
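
To show how such a similarity matrix might be used downstream, the sketch below takes the values printed in the example output and finds, for each sentence, its most similar other sentence. The helper name `nearest_neighbor` is hypothetical, and the matrix is hard-coded as plain Python lists so the snippet runs without downloading the model.

```python
# Similarity values copied from the example output (rows and columns follow
# the order of the three input sentences).
similarities = [
    [1.0000, 0.3521, 0.0626],
    [0.3521, 1.0000, 0.0420],
    [0.0626, 0.0420, 1.0000],
]


def nearest_neighbor(row_index, matrix):
    """Return (index, score) of the most similar item other than itself."""
    candidates = [
        (score, j) for j, score in enumerate(matrix[row_index]) if j != row_index
    ]
    score, j = max(candidates)
    return j, score


neighbors = {i: nearest_neighbor(i, similarities) for i in range(len(similarities))}
for i, (j, score) in neighbors.items():
    print(f"sentence {i} -> sentence {j} (similarity {score:.4f})")
# sentence 0 -> sentence 1 (similarity 0.3521)
# sentence 1 -> sentence 0 (similarity 0.3521)
# sentence 2 -> sentence 0 (similarity 0.0626)
```

The two music-related sentences pick each other, while the football sentence has only weak similarity to either, which is the behavior one would hope for in a first-stage retrieval or clustering setting.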


	## Citation

	```bibtex
	@misc{starse2026,
	  title  = {TBD},
	  author = {TBD},
	  year   = {TBD},
	  url    = {https://huggingface.co/BorisTM/starse-512}
	}
	```