---
language: ru
library_name: transformers
pipeline_tag: text-classification
tags:
- toxicity
- safetensors
base_model:
- DeepPavlov/rubert-base-cased-conversational
---
A model for toxicity classification of Russian texts, fine-tuned from [DeepPavlov/rubert-base-cased-conversational](https://huggingface.co/DeepPavlov/rubert-base-cased-conversational).
It is a binary classifier designed to detect toxicity in text:
* **Label 0 (NEUTRAL):** Neutral text
* **Label 1 (TOXIC):** Toxic text (insults, threats)
**Dataset**
This model was trained on two datasets:
* [Toxic Russian Comments](https://www.kaggle.com/datasets/alexandersemiletov/toxic-russian-comments)
* [Russian Language Toxic Comments](https://www.kaggle.com/datasets/blackmoon/russian-language-toxic-comments)
**Usage**
```python
from transformers import pipeline
classifier = pipeline("text-classification", model="fasherr/toxicity_rubert")
text_1 = "Ты сегодня прекрасно выглядишь!"
text_2 = "Ты очень плохой человек"
print(classifier(text_1))
# [{'label': 'NEUTRAL', 'score': 0.99...}]
print(classifier(text_2))
# [{'label': 'TOXIC', 'score': 1.0}]
```
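Under the hood, the pipeline converts the model's two raw logits into a label and a probability with a softmax. A minimal sketch of that post-processing step, assuming the label mapping above (the logit values here are illustrative, not actual model outputs):

```python
import math

# Assumed id-to-label mapping, matching the labels described in this card
id2label = {0: "NEUTRAL", 1: "TOXIC"}

def postprocess(logits):
    """Turn the two raw logits into a pipeline-style output dict."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    pred = probs.index(max(probs))
    return {"label": id2label[pred], "score": probs[pred]}

# Illustrative logits for a strongly toxic input
print(postprocess([-2.1, 3.4]))
# {'label': 'TOXIC', 'score': 0.995...}
```

This is the same computation the `text-classification` pipeline performs after the forward pass, so it is useful when you run the model manually with a tokenizer and need the same output format.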
**Eval results**
| Class | Accuracy | Precision | Recall | F1-Score | AUC-ROC | Support |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| **Overall (Macro)** | 97.93% | 96.37% | 96.86% | 96.61% | 0.9962 | 26271 |
| **Neutral** | 97.93% | 98.88% | 98.57% | 98.72% | 0.9962 | 21347 |
| **Toxic** | 97.93% | 93.87% | 95.15% | 94.50% | 0.9962 | 4924 |