---
language:
- ru
library_name: sentence-transformers
pipeline_tag: sentence-similarity
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- static-embeddings
- binary
- russian
---
# StaRSE-512

**StaRSE** stands for **Sta**tic **R**ussian **S**entence **E**mbeddings. It is a compact Russian sentence embedding model implemented as a [Sentence-Transformers](https://www.sbert.net/) `StaticEmbedding` module.

The model is intended for CPU-friendly semantic similarity, clustering, classification features, and first-stage retrieval representations in settings where a full Transformer encoder is too expensive to run at high throughput.

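Static embedding models skip the attention layers of a Transformer: each token has a fixed vector, and a sentence embedding is obtained by pooling those vectors, which is why inference reduces to a table lookup plus an average. The sketch below illustrates that idea with mean pooling, assuming StaRSE follows the standard `StaticEmbedding` design of averaging token vectors. The vocabulary, whitespace tokenizer, and 4-dimensional random table are hypothetical stand-ins, not the model's real tokenizer or 512-dimensional weights.

```python
import numpy as np

# Hypothetical toy vocabulary and embedding table (NOT the model's real weights).
vocab = {"это": 0, "счастливый": 1, "человек": 2, "собака": 3}
dim = 4
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), dim)).astype(np.float32)


def embed(sentence: str) -> np.ndarray:
    """Mean of the static vectors of known tokens; a zero vector if none are known."""
    # Naive lowercase + whitespace split stands in for the model's real tokenizer.
    ids = [vocab[tok] for tok in sentence.lower().split() if tok in vocab]
    if not ids:
        return np.zeros(dim, dtype=np.float32)
    # No attention or context: each token contributes the same vector everywhere.
    return embedding_table[ids].mean(axis=0)


vec = embed("Это счастливый человек")
print(vec.shape)  # (4,)
```

Because the pooled vector ignores word order, "собака это" and "это собака" get identical embeddings; this is the accuracy trade-off that buys the low CPU cost.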
## Performance

Evaluation is reported on [`MTEB(rus, v1.1)`](https://docs.mteb.org/overview/available_benchmarks/#mtebrus-v11) across 23 tasks. The main score is `mean_task_main_score = 51.16`, the mean of the main scores of the 23 individual tasks; the table below groups those tasks by type.
| Task type | Tasks | Mean score |
|---|---:|---:|
| Classification | 9 | 56.81 |
| Clustering | 3 | 51.80 |
| MultilabelClassification | 2 | 35.01 |
| PairClassification | 1 | 52.50 |
| Reranking | 2 | 41.88 |
| Retrieval | 3 | 39.09 |
| STS | 3 | 62.18 |
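The overall score is a per-task mean, so task types with more tasks weigh more. As a quick consistency check, weighting each row of the table above by its task count reproduces 51.16, while a plain average over the seven task types would give a lower number. The per-type means are themselves rounded, so this reconstruction is only accurate to about two decimals.

```python
# Task counts and per-type mean scores copied from the Performance table.
results = {
    "Classification": (9, 56.81),
    "Clustering": (3, 51.80),
    "MultilabelClassification": (2, 35.01),
    "PairClassification": (1, 52.50),
    "Reranking": (2, 41.88),
    "Retrieval": (3, 39.09),
    "STS": (3, 62.18),
}

total_tasks = sum(n for n, _ in results.values())
# Weight each type's mean by its number of tasks to recover the per-task mean.
per_task_mean = sum(n * score for n, score in results.values()) / total_tasks
# Unweighted average over the 7 task types, shown only for contrast.
per_type_mean = sum(score for _, score in results.values()) / len(results)

print(total_tasks)              # 23
print(round(per_task_mean, 2))  # 51.16
print(round(per_type_mean, 2))  # 48.47
```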
## Usage

Install [Sentence Transformers](https://www.sbert.net/docs/installation.html):

```bash
pip install -U sentence-transformers
```

Load the model with `trust_remote_code=True`:
```python
from sentence_transformers import SentenceTransformer

# trust_remote_code=True allows code shipped in the model repository to run on load
model = SentenceTransformer("BorisTM/starse-512", trust_remote_code=True)

sentences = [
    "Партитуры Чайковского часто звучат в консерватории.",  # "Tchaikovsky's scores are often heard at the conservatory."
    "Балетная сцена хранит музыку Щелкунчика.",  # "The ballet stage preserves the music of The Nutcracker."
    "Футбольная команда выиграла матч.",  # "The football team won the match."
]

# One L2-normalized 512-dimensional vector per sentence
embeddings = model.encode(sentences, normalize_embeddings=True)
# Pairwise similarity matrix between all sentences
similarities = model.similarity(embeddings, embeddings)

print(embeddings.shape)  # (3, 512)
print(tuple(similarities.shape))  # (3, 3)
print(similarities)
# tensor([[1.0000, 0.3521, 0.0626],
#         [0.3521, 1.0000, 0.0420],
#         [0.0626, 0.0420, 1.0000]])
```
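For first-stage retrieval, normalized embeddings make cosine similarity a plain dot product, so ranking a corpus is one matrix-vector product followed by a top-k selection. The sketch below assumes query and document vectors were produced with `normalize_embeddings=True`. The helpers `top_k` and `l2_normalize` are hypothetical, and random vectors stand in for real `model.encode` output so the example runs without downloading the model.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def top_k(query: np.ndarray, docs: np.ndarray, k: int) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs for the k highest-scoring documents.

    Assumes `query` (dim,) and `docs` (n_docs, dim) are already L2-normalized,
    so the dot product equals cosine similarity.
    """
    if k <= 0:
        return []
    scores = docs @ query
    k = min(k, len(scores))
    # argpartition selects the k largest scores without a full sort...
    idx = np.argpartition(-scores, k - 1)[:k]
    # ...then only those k candidates are sorted best-first.
    idx = idx[np.argsort(-scores[idx])]
    return [(int(i), float(scores[i])) for i in idx]


# Random stand-ins for encoded documents; replace with model.encode(...) output.
rng = np.random.default_rng(42)
docs = l2_normalize(rng.normal(size=(5, 512)))
# A query that is a slightly perturbed copy of document 3.
query = l2_normalize(docs[3] + 0.01 * rng.normal(size=512))

print(top_k(query, docs, k=2)[0][0])  # 3
```

With the real model, `docs` would come from `model.encode(corpus, normalize_embeddings=True)` and `query` from encoding a single query string the same way; the shortlisted documents can then be re-scored by a heavier model if needed.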
## Citation

```bibtex
@misc{starse2026,
  title = {TBD},
  author = {TBD},
  year = {TBD},
  url = {https://huggingface.co/BorisTM/starse-512}
}
```