🧠 RuBERT Emotion MoE — Russian Emotion Classifier

Multi-label emotion classifier for Russian text, built on rubert-tiny2 with a Mixture-of-Experts (MoE) classification head.

Architecture

The model keeps a standard BERT encoder but replaces the usual single linear classifier with an MoE head:

BERT encoder (rubert-tiny2)
        ↓
[CLS] embedding  (312-dim)
        ↓
Router (linear)  →  softmax  →  top-2 of 4 experts
        ↓
Expert 1 │ Expert 2 │ Expert 3 │ Expert 4
(Linear → GELU → Dropout → Linear)
        ↓
Weighted sum of expert outputs
        ↓
Logits  →  sigmoid  →  multi-label output
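The routing step in the diagram can be sketched in plain Python. In the real model the router is a learned linear layer over the 312-dim [CLS] vector and each expert is Linear → GELU → Dropout → Linear. Here both are replaced by stand-in callables, and the helper names (`top_k_route`, `moe_forward`) are hypothetical. Renormalizing the two selected gate weights so they sum to 1 is an assumption, because the diagram only says "softmax → top-2".

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over router logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int = 2) -> List[Tuple[int, float]]:
    """Keep the k most probable experts and renormalize their gates (assumption)."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]


def moe_forward(
    cls_embedding: Sequence[float],
    router_logits: Sequence[float],
    experts: Sequence[Callable[[Sequence[float]], List[float]]],
    k: int = 2,
) -> List[float]:
    """Gate-weighted sum of the selected experts' logits; other experts are skipped."""
    out: List[float] = []
    for idx, gate in top_k_route(router_logits, k):
        expert_logits = experts[idx](cls_embedding)
        if not out:
            out = [0.0] * len(expert_logits)
        out = [o + gate * e for o, e in zip(out, expert_logits)]
    return out
```

Only the two selected experts are evaluated, which is where an MoE head saves compute relative to running all four.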

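For each input, the router scores all 4 experts and keeps the top 2, so only half of the expert parameters are used per example. An auxiliary load-balancing loss penalizes uneven expert utilization, encouraging every expert to receive training signal instead of letting the router collapse onto a few. A weighted binary cross-entropy (BCE) loss counters class imbalance: anger occurs roughly 4× less often than joy in the training set (411 vs 1,569 examples).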
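The card does not give the exact form of the auxiliary loss. The sketch below assumes the Switch Transformer formulation adapted to top-k routing: `n_experts * sum_i f_i * P_i`, where `f_i` is the share of routing slots sent to expert i and `P_i` is its mean router probability. The function name is hypothetical.

```python
from typing import Sequence


def load_balancing_loss(router_probs: Sequence[Sequence[float]], k: int = 2) -> float:
    """Switch-style auxiliary loss (assumed form); 1.0 when routing is balanced."""
    n_tokens = len(router_probs)
    n_experts = len(router_probs[0])
    counts = [0] * n_experts
    prob_sums = [0.0] * n_experts
    for probs in router_probs:
        # Count which experts each example is dispatched to under top-k routing.
        top = sorted(range(n_experts), key=lambda i: probs[i], reverse=True)[:k]
        for i in top:
            counts[i] += 1
        for i, p in enumerate(probs):
            prob_sums[i] += p
    f = [c / (n_tokens * k) for c in counts]   # fraction of routing slots per expert
    P = [s / n_tokens for s in prob_sums]      # mean router probability per expert
    return n_experts * sum(fi * pi for fi, pi in zip(f, P))
```

In a differentiable implementation the counts `f_i` carry no gradient. The penalty reaches the router through `P_i`, pushing probability mass away from overloaded experts.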

Model Parameters

| Parameter | Value |
|---|---|
| Base model | cointegrated/rubert-tiny2 |
| Total parameters | ~29.5M |
| Number of experts | 4 |
| Active experts (top-k) | 2 |
| Expert hidden dim | 256 |
| Dataset | sagteam/cedr_v1 |
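A quick parameter count shows how small the MoE head is next to the encoder. This sketch assumes every linear layer has a bias and that each expert maps 312 → 256 → 5 labels directly. GELU and Dropout have no parameters. The helper names are hypothetical.

```python
def linear_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Weights plus optional bias of a dense layer."""
    return n_in * n_out + (n_out if bias else 0)


def moe_head_params(hidden: int = 312, expert_hidden: int = 256,
                    n_labels: int = 5, n_experts: int = 4) -> int:
    """Router (hidden -> n_experts) plus n_experts two-layer experts."""
    router = linear_params(hidden, n_experts)
    expert = linear_params(hidden, expert_hidden) + linear_params(expert_hidden, n_labels)
    return router + n_experts * expert


print(moe_head_params())  # 326904
```

Under these assumptions the head holds about 0.33M parameters, roughly 1% of the ~29.5M total. With top-2 routing, only about half of those expert weights are used for any single input.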

Usage

import torch
from transformers import AutoTokenizer

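# BertMoEForMultiLabelClassification (and the MoEClassificationHead it uses)
# is defined in modeling.py in the model repository. Download that file next
# to this script, or paste both classes here, before running.
from modeling import BertMoEForMultiLabelClassification

LABELS    = ['joy', 'sadness', 'surprise', 'fear', 'anger']  # order of the output logits
THRESHOLD = 0.5  # per-label decision threshold on sigmoid probabilities

tokenizer = AutoTokenizer.from_pretrained("ilyali034/rubert-emotion-moe-ru")
model     = BertMoEForMultiLabelClassification.from_pretrained("ilyali034/rubert-emotion-moe-ru")
model.eval()  # inference mode: disables dropout

def predict(text):
    """Return (label, probability) pairs for every emotion above THRESHOLD."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits         # shape: (1, len(LABELS))
    probs = torch.sigmoid(logits)[0].tolist()   # independent probability per label
    return [
        (label, round(p, 3))
        for label, p in zip(LABELS, probs) if p > THRESHOLD
    ]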

print(predict("Я очень рад, но немного боюсь!"))
# [('joy', 0.821), ('fear', 0.743)]

Metrics

| Metric | Value |
|---|---|
| F1 micro | 0.7349 |
| F1 macro | 0.6903 |
| F1 weighted | 0.7442 |
| Precision micro | 0.6467 |
| Recall micro | 0.8510 |
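The three F1 averages differ only in how they pool results across labels. The sketch below computes them from multi-hot 0/1 matrices, following the usual definitions (scikit-learn's `average="micro" | "macro" | "weighted"` uses the same ones). The function names are hypothetical. It also checks that the reported micro F1 is the harmonic mean of the reported micro precision and recall.

```python
from typing import Dict, Sequence


def f1_from_pr(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def _f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Micro, macro and support-weighted F1 for 0/1 label matrices (one row per example)."""
    n = len(y_true[0])
    tp, fp, fn = [0] * n, [0] * n, [0] * n
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1
    per_label = [_f1(tp[j], fp[j], fn[j]) for j in range(n)]
    support = [tp[j] + fn[j] for j in range(n)]  # true instances per label
    total = sum(support)
    return {
        "micro": _f1(sum(tp), sum(fp), sum(fn)),  # pool counts over all labels
        "macro": sum(per_label) / n,               # every label counts equally
        "weighted": sum(f * s for f, s in zip(per_label, support)) / total if total else 0.0,
    }


print(round(f1_from_pr(0.6467, 0.8510), 4))  # 0.7349, the reported F1 micro
```

Because micro recall (0.851) exceeds micro precision (0.647), the model at threshold 0.5 produces more false positives than false negatives.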

Per-class F1

| Class | F1 |
|---|---|
| 😄 joy | 0.8373 |
| 😢 sadness | 0.8091 |
| 😮 surprise | 0.6780 |
| 😨 fear | 0.6796 |
| 😠 anger | 0.4478 |
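Macro F1 is the unweighted mean of these per-class scores, so anger's low score counts as much as joy's high one. Averaging the table values gives about 0.690, which agrees with the reported 0.6903 up to rounding of the per-class figures.

```python
PER_CLASS_F1 = {"joy": 0.8373, "sadness": 0.8091, "surprise": 0.6780,
                "fear": 0.6796, "anger": 0.4478}

# Unweighted mean over classes = macro F1.
macro_f1 = sum(PER_CLASS_F1.values()) / len(PER_CLASS_F1)
weakest = min(PER_CLASS_F1, key=PER_CLASS_F1.get)
print(f"macro F1 = {macro_f1:.3f}, weakest class = {weakest}")
```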

Dataset

CEDR (sagteam/cedr_v1): a Russian-language corpus of texts annotated with emotions, split into 7,528 training and 1,882 test examples.
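For multi-label training, each example's emotions become a multi-hot target vector. This sketch assumes the labels arrive as a list of integer ids in the same order as `LABELS` in the usage snippet. Check the dataset card for the actual field names and id order. `to_multi_hot` is a hypothetical helper.

```python
from typing import List, Sequence

LABELS = ["joy", "sadness", "surprise", "fear", "anger"]


def to_multi_hot(label_ids: Sequence[int], n_labels: int = len(LABELS)) -> List[float]:
    """Float 0/1 target for BCE; an empty id list means 'no emotion' (all zeros)."""
    target = [0.0] * n_labels
    for i in label_ids:
        target[i] = 1.0
    return target


print(to_multi_hot([0, 3]))  # joy + fear
```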

Classes

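| Class | Emotion | Train examples |
|---|---|---|
| joy | 😄 Joy | 1569 |
| sadness | 😢 Sadness | 1417 |
| surprise | 😮 Surprise | 607 |
| fear | 😨 Fear | 589 |
| anger | 😠 Anger | 411 |

The per-class counts sum to 4,593, fewer than the 7,528 training examples, so a large share of training texts carry no emotion label at all.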
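The card does not say how the BCE class weights are chosen. A common heuristic, and the one suggested in the PyTorch `BCEWithLogitsLoss` documentation, sets each class's positive weight to negatives / positives. Applied to the training counts above, this gives roughly 3.8 for joy and 17.3 for anger. The sketch below assumes that scheme and implements the weighted loss from scratch with hypothetical helper names.

```python
import math
from typing import Dict, Sequence

TRAIN_SIZE = 7528
TRAIN_COUNTS = {"joy": 1569, "sadness": 1417, "surprise": 607, "fear": 589, "anger": 411}


def pos_weights(counts: Dict[str, int], n: int) -> Dict[str, float]:
    """Per-class pos_weight = negatives / positives (assumed weighting scheme)."""
    return {label: (n - k) / k for label, k in counts.items()}


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def weighted_bce_with_logits(logits: Sequence[float], targets: Sequence[float],
                             pos_weight: Sequence[float]) -> float:
    """Mean over labels of -[w*y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]."""
    total = 0.0
    for x, y, w in zip(logits, targets, pos_weight):
        # -log(sigmoid(x)) = softplus(-x); -log(1 - sigmoid(x)) = softplus(x)
        total += w * y * softplus(-x) + (1.0 - y) * softplus(x)
    return total / len(logits)


pw = pos_weights(TRAIN_COUNTS, TRAIN_SIZE)
print({label: round(w, 2) for label, w in pw.items()})
```

In PyTorch the same loss is `torch.nn.BCEWithLogitsLoss(pos_weight=...)` with one weight per label. Missing a rare class such as anger then costs about 4.5× more than missing joy.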

Citation

If you use this model, please cite the CEDR dataset:

@inproceedings{cedr2021,
  title={CEDR: Corpus for Emotions Detection in Russian},
  author={Sboev, Alexander and Naumov, Artem and Rybka, Roman},
  year={2021}
}