RuBERT for Semantic Textual Similarity (STS)

This repository contains a RuBERT model fine-tuned for the Semantic Textual Similarity (STS) task on Russian text. It is a Cross-Encoder built on a sequence-classification head: both sentences of a pair pass through the network together, so every attention layer can attend across the two sentences.

Model Description

Base Model: DeepPavlov/rubert-base-cased
Language: Russian (ru)
Task: Semantic Textual Similarity / Regression (outputs a continuous semantic similarity score)
Framework: PyTorch & Hugging Face Transformers (v5+)

The model scores how close two sentences are in meaning. Because the Cross-Encoder sees both sentences at once, it can capture fine-grained differences (such as the negation particle “не”) and recognize paraphrases even when the two sentences share no words.
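To make the Cross-Encoder input concrete, here is a minimal sketch of how a sentence pair is packed into a single BERT-style sequence. It is illustrative only: whitespace splitting stands in for the real WordPiece tokenizer, and `pack_pair` is a hypothetical helper, not part of this repository.

```python
def pack_pair(sentence_a, sentence_b):
    """Pack two sentences into one BERT-style input sequence.

    Whitespace splitting is a stand-in for the real WordPiece tokenizer
    (an assumption that keeps this sketch dependency-free).
    """
    tokens_a = sentence_a.split()
    tokens_b = sentence_b.split()
    # Layout: [CLS] A [SEP] B [SEP]
    tokens = ["[CLS]", *tokens_a, "[SEP]", *tokens_b, "[SEP]"]
    # Segment ids: 0 covers [CLS], sentence A and its [SEP];
    # 1 covers sentence B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


tokens, segment_ids = pack_pair("Я люблю кофе", "Я не люблю кофе")
print(tokens)
print(segment_ids)
```

Because both sentences share one sequence, the token “не” in the second sentence can attend directly to every token of the first, which a bi-encoder that embeds each sentence separately cannot do.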

Evaluation Results (Test Split)

The model was evaluated on an unseen, out-of-domain test split:

Metric	Value
Pearson Correlation ($r$)	0.7954
Spearman Rank Correlation ($\rho$)	0.7751
Mean Absolute Error (MAE)	0.1525
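For reference, the three metrics above can be computed as in the following dependency-free sketch. The `gold` and `pred` lists are toy values chosen for illustration, not data from the actual test split, and this is not necessarily the evaluation script used for this model.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed values."""
    return pearson(ranks(x), ranks(y))


def mae(x, y):
    """Mean absolute error between gold scores and predictions."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Toy values for illustration only.
gold = [0.0, 0.25, 0.5, 0.75, 1.0]
pred = [0.1, 0.2, 0.6, 0.7, 0.9]
print(pearson(gold, pred), spearman(gold, pred), mae(gold, pred))
```

Pearson measures linear agreement with the gold scores, Spearman measures only whether the ordering of pairs is preserved, and MAE reports the average absolute deviation in score units.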

Training Hyperparameters

Effective Batch Size: 64 (32 per device × 2 Gradient Accumulation Steps)
Learning Rate: 3e-5 (Cosine decay scheduler)
Optimizer: AdamW (adamw_torch)
Precision: Mixed Precision (FP16 optimized for Tesla T4)
Techniques: Gradient Checkpointing enabled for VRAM optimization
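The hyperparameters listed above map naturally onto Hugging Face `TrainingArguments` keyword names, as in the sketch below. This is a reconstruction under assumptions, not the original training script: the single-GPU setting (`NUM_DEVICES = 1`) and the absence of warmup in `cosine_lr` are not stated in this card.

```python
import math

# Keyword names follow Hugging Face `TrainingArguments`;
# the values are taken from the hyperparameter list in this card.
training_kwargs = {
    "per_device_train_batch_size": 32,
    "gradient_accumulation_steps": 2,
    "learning_rate": 3e-5,
    "lr_scheduler_type": "cosine",
    "optim": "adamw_torch",
    "fp16": True,
    "gradient_checkpointing": True,
}

NUM_DEVICES = 1  # assumption: training ran on a single GPU

# Samples contributing to each optimizer step.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * NUM_DEVICES
)


def cosine_lr(step, total_steps, base_lr=training_kwargs["learning_rate"]):
    """Half-cosine decay from base_lr to 0, with no warmup (assumption)."""
    progress = min(step / total_steps, 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size)
print(cosine_lr(0, 1000), cosine_lr(500, 1000), cosine_lr(1000, 1000))
```

With `transformers` installed, a dictionary like this could be unpacked into `TrainingArguments(**training_kwargs)` together with an output directory of your choice.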

Installation

Ensure you have transformers (v5.0+) and torch installed:

pip install transformers torch
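Once the dependencies are installed, scoring sentence pairs might look like the sketch below. Several points are assumptions rather than documented facts: this card does not give the model's Hub id, so `model_id` must be supplied by the caller; `max_length=128` is an arbitrary choice; and the classification head is assumed to have a single regression output whose raw value is the similarity score. `unzip_pairs` and `score_pairs` are hypothetical helper names.

```python
def unzip_pairs(pairs):
    """Split [(a1, b1), (a2, b2), ...] into ([a1, a2, ...], [b1, b2, ...])."""
    first = [a for a, _ in pairs]
    second = [b for _, b in pairs]
    return first, second


def score_pairs(model_id, pairs, max_length=128):
    """Return one similarity score per sentence pair.

    model_id: Hub id or local path of this checkpoint (not given in this card).
    max_length: truncation limit; an assumption, not a documented setting.
    """
    # Imported here so that unzip_pairs stays usable without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    first, second = unzip_pairs(pairs)
    # Passing two lists makes the tokenizer encode each (first[i], second[i])
    # as one joint sequence, which is what a Cross-Encoder expects.
    inputs = tokenizer(
        first,
        second,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, num_labels)
    # Assumption: a single regression output, read as the raw score.
    return logits.squeeze(-1).tolist()


# Example call (replace the placeholder with the real checkpoint id or path):
# scores = score_pairs("<model-id-or-path>", [("Я люблю кофе", "Я не люблю кофе")])
```

Loading the tokenizer and model once and reusing them across calls would be preferable in a long-running service; they are loaded inside the function here only to keep the sketch short.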
