Update README.md

README.md CHANGED

@@ -14,4 +14,40 @@ widget:
example_title: NEUTRAL - 100.00%
---

A model for toxicity classification in Russian texts.

Fine-tuned from the [DeepPavlov/rubert-base-cased-conversational](https://huggingface.co/DeepPavlov/rubert-base-cased-conversational) model.

It's a binary classifier designed to detect toxicity in text:

* **Label 0 (NEUTRAL):** Neutral text
* **Label 1 (TOXIC):** Toxic text / Insults / Threats
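As a rough sketch of how a two-class head like this is decoded into the labels above (the `ID2LABEL` mapping and the example logits below are assumptions for illustration, not values taken from the model's config):

```python
import math

# Hypothetical id-to-label mapping, following the two classes listed above.
ID2LABEL = {0: "NEUTRAL", 1: "TOXIC"}

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    """Pick the higher-probability class and return (label, score)."""
    probs = softmax(logits)
    idx = probs.index(max(probs))
    return ID2LABEL[idx], probs[idx]

# Made-up logits where the second (toxic) class wins.
label, score = decode([-2.0, 3.5])
```

The `pipeline` shown in the Usage section performs this decoding internally.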
**Dataset**

This model was trained on two datasets:

* [Toxic Russian Comments](https://www.kaggle.com/datasets/alexandersemiletov/toxic-russian-comments)
* [Russian Language Toxic Comments](https://www.kaggle.com/datasets/blackmoon/russian-language-toxic-comments)
**Usage**

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="fasherr/toxicity_rubert")

text_1 = "Ты сегодня прекрасно выглядишь!"  # "You look great today!"
text_2 = "Ты очень плохой человек"          # "You are a very bad person"

print(classifier(text_1))
# [{'label': 'NEUTRAL', 'score': 0.99...}]
print(classifier(text_2))
# [{'label': 'TOXIC', 'score': 0.99...}]
```
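In practice you often score texts in batches and keep only confident toxic hits. A minimal post-processing sketch, assuming the same output shape as the `print` calls above (the example predictions and the 0.9 threshold are made up for illustration):

```python
# Hypothetical batch output in the shape the pipeline returns:
# one {'label', 'score'} dict per input text.
predictions = [
    {"label": "NEUTRAL", "score": 0.997},
    {"label": "TOXIC", "score": 0.982},
    {"label": "TOXIC", "score": 0.611},
]

def flag_toxic(preds, threshold=0.9):
    """Return indices of predictions labelled TOXIC with confidence >= threshold."""
    return [i for i, p in enumerate(preds)
            if p["label"] == "TOXIC" and p["score"] >= threshold]

flagged = flag_toxic(predictions)  # only the confidently toxic text is kept
```

The threshold trades precision for recall; a moderation pipeline might route low-confidence hits to human review instead of dropping them.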
**Eval results**

| Class | Accuracy | Precision | Recall | F1-Score | AUC-ROC | Support |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| **Overall (Macro)** | 97.93% | 96.37% | 96.86% | 96.61% | 0.9962 | 26271 |
| **Neutral** | 97.93% | 98.88% | 98.57% | 98.72% | 0.9962 | 21347 |
| **Toxic** | 97.93% | 93.87% | 95.15% | 94.50% | 0.9962 | 4924 |

Accuracy and AUC-ROC are computed over the whole evaluation set, which is why they are identical across rows.
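The Overall (Macro) row follows from the per-class rows, since a macro average is the unweighted mean over classes. A quick check against the table (macro precision agrees only to within rounding of the two-decimal values shown):

```python
# Per-class metrics copied from the table above (two-decimal percentages).
neutral = {"precision": 98.88, "recall": 98.57, "f1": 98.72}
toxic   = {"precision": 93.87, "recall": 95.15, "f1": 94.50}

def macro(metric):
    """Macro average: the unweighted mean over the two classes."""
    return round((neutral[metric] + toxic[metric]) / 2, 2)

macro_f1 = macro("f1")          # 96.61, matching the table
macro_recall = macro("recall")  # 96.86, matching the table
```

Unlike a support-weighted average, the macro average gives the minority Toxic class (4924 of 26271 examples) equal weight, so it is the stricter summary for imbalanced data like this.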