Create README.md
README.md
ADDED
@@ -0,0 +1,24 @@
---
language:
- ru

tags:
- toxic comments classification

licenses:
- cc-by-nc-sa
---
BERT-based classifier trained on a merge of the Russian Language Toxic Comments [dataset](https://www.kaggle.com/blackmoon/russian-language-toxic-comments/metadata) collected from 2ch.hk and the Toxic Russian Comments [dataset](https://www.kaggle.com/alexandersemiletov/toxic-russian-comments) collected from ok.ru.

The datasets were merged, shuffled, and split into train, dev, and test sets in an 80-10-10 proportion.
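
The merge, shuffle, and 80-10-10 split described above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `merge_shuffle_split`, the seed, and the toy `(text, label)` pairs are hypothetical, and the actual preprocessing used for this model is not documented here.

```python
import random


def merge_shuffle_split(first, second, seed=0):
    """Merge two lists of examples, shuffle them, and cut them 80/10/10.

    Hypothetical helper for illustration; not the author's actual code.
    """
    merged = list(first) + list(second)
    # A seeded generator keeps the shuffle (and so the split) reproducible.
    random.Random(seed).shuffle(merged)
    n_train = len(merged) * 8 // 10
    n_dev = len(merged) // 10
    train = merged[:n_train]
    dev = merged[n_train:n_train + n_dev]
    test = merged[n_train + n_dev:]  # whatever remains goes to test
    return train, dev, test


# Toy stand-ins for the two corpora (made-up data, not real comments).
corpus_a = [(f"a{i}", 0) for i in range(60)]
corpus_b = [(f"b{i}", 1) for i in range(40)]

train, dev, test = merge_shuffle_split(corpus_a, corpus_b)
print(len(train), len(dev), len(test))
```

Shuffling before cutting matters here because the two sources come from different sites; without it, one source could end up concentrated in a single split.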
The metrics obtained on the test set are as follows:

|              | precision | recall | f1-score | support |
|:------------:|:---------:|:------:|:--------:|:-------:|
|      0       |   0.98    |  0.99  |   0.98   |  21384  |
|      1       |   0.94    |  0.92  |   0.93   |  4886   |
|   accuracy   |   0.97    | 26270  |   0.94   |         |
|  macro avg   |   0.96    |  0.96  |   0.96   |  26270  |
| weighted avg |   0.97    |  0.97  |   0.97   |  26270  |
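
As a rough sanity check on the averaged rows, the sketch below recomputes them from the two per-class rows. It assumes the table follows the usual classification-report convention, where the macro average is the unweighted mean over classes and the weighted average weights each class by its support. The inputs are the rounded values from the table, so recomputed numbers can differ from the reported ones in the last digit.

```python
# Per-class rows copied from the table: (precision, recall, f1, support).
rows = {
    "0": (0.98, 0.99, 0.98, 21384),
    "1": (0.94, 0.92, 0.93, 4886),
}

total_support = sum(r[3] for r in rows.values())


def macro(col):
    """Unweighted mean of one metric column over the classes."""
    return sum(r[col] for r in rows.values()) / len(rows)


def weighted(col):
    """Mean of one metric column, weighted by each class's support."""
    return sum(r[col] * r[3] for r in rows.values()) / total_support


for name, col in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name}: macro={macro(col):.3f} weighted={weighted(col):.3f}")
print("total support:", total_support)
```

The class supports add up to the 26270 test examples shown in the table, and the weighted averages sit close to the class-0 scores because that class makes up most of the test set.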