Dataset: Djacon/ru-izard-emotions
Multi-label emotion classifier for Russian text based on Izard's 10 basic emotions, fine-tuned with QLoRA on RuIzardEmotions.
Base model: ai-forever/ruBert-large
Method: QLoRA (4-bit NF4 + LoRA r=8)
Labels: joy, sadness, anger, enthusiasm, surprise, disgust, fear, guilt, shame, neutral
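To give a feel for how little LoRA with r=8 adds on top of the frozen 4-bit base, the sketch below counts adapter parameters and computes the LoRA scaling factor. The encoder shape (24 layers, hidden size 1024) and the choice of adapting only the query and value projections are assumptions, since this card does not list the target modules; alpha=16 comes from the hyperparameter table.

```python
# Rough LoRA size estimate. HIDDEN, LAYERS and TARGETS_PER_LAYER are
# assumptions (BERT-large-style encoder, query + value projections adapted);
# the card itself only states r=8 and alpha=16.
R = 8
ALPHA = 16
HIDDEN = 1024
LAYERS = 24
TARGETS_PER_LAYER = 2


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


per_matrix = lora_params(HIDDEN, HIDDEN, R)
total = per_matrix * TARGETS_PER_LAYER * LAYERS
scaling = ALPHA / R  # the low-rank update is multiplied by alpha / r

print(per_matrix)  # 16384
print(total)       # 786432
print(scaling)     # 2.0
```

Under these assumptions the adapter holds well under a million trainable parameters, not counting the separate linear classifier head.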
| F1 Micro | F1 Macro | F1 Weighted |
|---|---|---|
| 0.6121 | 0.5878 | 0.6237 |
| Emotion | Precision | Recall | F1 | Threshold | Support |
|---|---|---|---|---|---|
| Joy | 0.67 | 0.69 | 0.68 | 0.514 | 697 |
| Sadness | 0.55 | 0.80 | 0.65 | 0.468 | 679 |
| Anger | 0.62 | 0.72 | 0.67 | 0.505 | 792 |
| Enthusiasm | 0.63 | 0.72 | 0.67 | 0.514 | 491 |
| Surprise | 0.53 | 0.49 | 0.51 | 0.605 | 257 |
| Disgust | 0.41 | 0.60 | 0.48 | 0.550 | 282 |
| Fear | 0.69 | 0.61 | 0.65 | 0.641 | 229 |
| Guilt | 0.63 | 0.51 | 0.56 | 0.623 | 161 |
| Shame | 0.23 | 0.48 | 0.31 | 0.559 | 153 |
| Neutral | 0.46 | 0.86 | 0.60 | 0.459 | 777 |
Shame is the hardest class (F1 0.31, held down by a precision of 0.23) due to low support (153 samples) and high overlap with guilt.
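As a cross-check on how the aggregate scores relate to the per-class rows, the sketch below recomputes macro F1 (unweighted mean) and weighted F1 (mean weighted by support) from the rounded per-class values. The results land near, but not exactly on, the reported 0.5878 and 0.6237: the per-class figures are rounded to two decimals, and the card does not say whether both tables come from the same evaluation run. Micro F1 is omitted because it needs raw true/false positive counts.

```python
# (F1, support) per class, copied from the per-class table.
per_class = {
    "joy": (0.68, 697),
    "sadness": (0.65, 679),
    "anger": (0.67, 792),
    "enthusiasm": (0.67, 491),
    "surprise": (0.51, 257),
    "disgust": (0.48, 282),
    "fear": (0.65, 229),
    "guilt": (0.56, 161),
    "shame": (0.31, 153),
    "neutral": (0.60, 777),
}

total_support = sum(n for _, n in per_class.values())

# Macro: every class counts equally, regardless of size.
macro_f1 = sum(f1 for f1, _ in per_class.values()) / len(per_class)

# Weighted: each class contributes in proportion to its support.
weighted_f1 = sum(f1 * n for f1, n in per_class.values()) / total_support

print(round(macro_f1, 3))     # 0.578
print(round(weighted_f1, 3))  # 0.618
```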
```python
import json

import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download
from peft import PeftModel
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig
from transformers.modeling_outputs import SequenceClassifierOutput


class BertWithClassifier(nn.Module):
    """Encoder with dropout and a linear head on the first ([CLS]) token."""

    def __init__(self, encoder, hidden_size, num_labels):
        super().__init__()
        self.encoder = encoder
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, input_ids=None, attention_mask=None,
                token_type_ids=None, **kwargs):
        out = self.encoder(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
        )
        # Cast to float32: the quantized encoder may emit half precision.
        pooled = self.dropout(out.last_hidden_state[:, 0, :].float())
        return SequenceClassifierOutput(logits=self.classifier(pooled))


REPO = "ilyali034/rubert-emotion-ru-large"

# Fetch the config from the Hub; a local copy of the file works as well.
with open(hf_hub_download(REPO, "emotion_config.json")) as f:
    cfg = json.load(f)

tokenizer = AutoTokenizer.from_pretrained("ai-forever/ruBert-large")
base = AutoModel.from_pretrained(
    "ai-forever/ruBert-large",
    # 4-bit NF4 with double quantization, as listed under Quantization.
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
    ),
    device_map="auto",
)
# The adapter lives in the lora_adapter/ subfolder of the repo.
base = PeftModel.from_pretrained(base, REPO, subfolder="lora_adapter")
device = next(base.parameters()).device

model = BertWithClassifier(base, base.config.hidden_size, len(cfg["labels"]))
model.classifier.load_state_dict(
    torch.load(hf_hub_download(REPO, "classifier.pt"), map_location="cpu")
)
model.classifier.to(device)  # keep the head on the same device as the encoder
model.eval()


def predict(text: str) -> dict:
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=128,
        padding=True,
    ).to(device)
    with torch.no_grad():
        probs = torch.sigmoid(model(**inputs).logits).cpu().numpy()[0]
    # Assumes cfg["thresholds"] is stored in the same order as cfg["labels"].
    thresholds = list(cfg["thresholds"].values())
    return {
        lbl: round(float(p), 4)
        for lbl, p, thr in zip(cfg["labels"], probs, thresholds)
        if p > thr
    }


# "I am very glad about this news!"
print(predict("Я очень рад этой новости!"))
# {'joy': 0.8231, 'enthusiasm': 0.6714}

# "I am ashamed of my behavior, I feel guilty"
print(predict("Мне стыдно за своё поведение, я чувствую себя виноватым"))
# {'guilt': 0.7102, 'shame': 0.5891}
```
| Parameter | Value |
|---|---|
| Learning rate | 2e-4 |
| Effective batch size | 32 |
| Best epoch | 4 / 8 |
| Max sequence length | 128 |
| LoRA rank | 8 |
| LoRA alpha | 16 |
| Focal loss γ | 2.5 |
| Quantization | 4-bit NF4 (double quant) |
| GPU | NVIDIA T4 16 GB |
| Early stopping patience | 3 |
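The card lists focal loss with γ = 2.5 but does not say which variant was used. As an illustration only, here is a minimal sketch of the standard binary focal loss applied independently to each label, with no alpha class weighting (an assumption). It shows the intended effect: confidently correct labels contribute almost nothing, so training concentrates on the hard ones.

```python
import math


def binary_focal_loss(logit: float, target: int, gamma: float = 2.5) -> float:
    """Focal loss for one label: -(1 - p_t)^gamma * log(p_t)."""
    p = 1.0 / (1.0 + math.exp(-logit))
    # p_t is the probability the model assigns to the true outcome.
    p_t = p if target == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def multilabel_focal_loss(logits, targets, gamma: float = 2.5) -> float:
    """Mean of the per-label focal losses for one example."""
    losses = [binary_focal_loss(z, y, gamma) for z, y in zip(logits, targets)]
    return sum(losses) / len(losses)


easy = binary_focal_loss(3.0, 1)                 # confident and correct
hard = binary_focal_loss(-3.0, 1)                # confident and wrong
bce_easy = binary_focal_loss(3.0, 1, gamma=0.0)  # gamma=0 is plain BCE

print(easy < bce_easy, hard > easy)  # True True
```

With γ = 0 the expression reduces to ordinary binary cross-entropy, which makes it easy to compare the two on the same inputs.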
| Emotion | Threshold |
|---|---|
| joy | 0.514 |
| sadness | 0.468 |
| anger | 0.505 |
| enthusiasm | 0.514 |
| surprise | 0.605 |
| disgust | 0.550 |
| fear | 0.641 |
| guilt | 0.623 |
| shame | 0.559 |
| neutral | 0.459 |
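Because each label has its own cutoff, the same probabilities can yield a different label set than a flat 0.5 threshold would. The sketch below applies the thresholds from the table to a made-up probability vector; the probabilities are hypothetical and not model output.

```python
# Per-label thresholds copied from the table.
THRESHOLDS = {
    "joy": 0.514, "sadness": 0.468, "anger": 0.505, "enthusiasm": 0.514,
    "surprise": 0.605, "disgust": 0.550, "fear": 0.641, "guilt": 0.623,
    "shame": 0.559, "neutral": 0.459,
}


def apply_thresholds(probs: dict) -> dict:
    """Keep labels whose probability exceeds that label's own threshold."""
    return {lbl: p for lbl, p in probs.items() if p > THRESHOLDS[lbl]}


# Hypothetical sigmoid outputs for one text (made up for illustration).
example = {"joy": 0.51, "sadness": 0.47, "fear": 0.63, "neutral": 0.46}

per_label = apply_thresholds(example)
flat = {lbl: p for lbl, p in example.items() if p > 0.5}

print(per_label)  # {'sadness': 0.47, 'neutral': 0.46}
print(flat)       # {'joy': 0.51, 'fear': 0.63}
```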
| File | Description |
|---|---|
| `lora_adapter/` | LoRA adapter weights (PEFT) |
| `classifier.pt` | Linear classifier head weights |
| `emotion_config.json` | Labels, thresholds, model config |
| `tokenizer.json` | Tokenizer |
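The usage code pairs `cfg["labels"]` with `list(cfg["thresholds"].values())`, which only works if both are stored in the same order. A more defensive option is to look each threshold up by label. The sketch below assumes the thresholds are keyed by label name; the JSON excerpt is hypothetical and the real `emotion_config.json` may contain more keys or a different layout.

```python
import json

# Hypothetical excerpt. Only "labels" (a list) and "thresholds" (a mapping)
# are implied by the usage code; keying by label name is an assumption.
raw = """
{
  "labels": ["joy", "sadness", "anger"],
  "thresholds": {"anger": 0.505, "joy": 0.514, "sadness": 0.468}
}
"""


def ordered_thresholds(cfg: dict) -> list:
    """Return thresholds in the order of cfg["labels"], or raise on a gap."""
    labels = cfg["labels"]
    thresholds = cfg["thresholds"]
    missing = [lbl for lbl in labels if lbl not in thresholds]
    if missing:
        raise ValueError(f"labels without a threshold: {missing}")
    return [thresholds[lbl] for lbl in labels]


cfg = json.loads(raw)
print(ordered_thresholds(cfg))  # [0.514, 0.468, 0.505]
```

Note that the mapping in the excerpt is deliberately stored in a different order than the labels, which is exactly the case where relying on `.values()` would misalign them.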