Tags: Text Classification, Transformers, PyTorch, Safetensors, Russian, bert, classification, toxicity, multilabel, text-embeddings-inference
How to use cointegrated/rubert-tiny-toxicity with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="cointegrated/rubert-tiny-toxicity")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("cointegrated/rubert-tiny-toxicity")
model = AutoModelForSequenceClassification.from_pretrained("cointegrated/rubert-tiny-toxicity")
```
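Because the model is tagged `multilabel`, its raw logits are presumably turned into scores with an independent sigmoid per label rather than a softmax across labels. The following is a minimal, dependency-free sketch of that post-processing step only. The logit values are made up for illustration, and the `LABELS` list is an assumption that uses just the two label names visible on this page; the real label set comes from the model configuration.

```python
import math

# Assumption: only two of the model's labels are listed here, for illustration.
LABELS = ["non-toxic", "insult"]


def sigmoid(x: float) -> float:
    """Logistic function mapping a raw logit to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def multilabel_scores(logits):
    """Score each label independently; the scores need not sum to 1."""
    return {label: sigmoid(z) for label, z in zip(LABELS, logits)}


def predicted_labels(logits, threshold=0.5):
    """Return every label whose score reaches the threshold (possibly several)."""
    return [label for label, p in multilabel_scores(logits).items() if p >= threshold]


# Hypothetical logits for a single text.
example_logits = [-2.0, 3.0]
scores = multilabel_scores(example_logits)
active = predicted_labels(example_logits)
```

With real model output, `logits` would be one row of `model(**inputs).logits` converted to a Python list; the 0.5 cut-off is an arbitrary choice for the sketch, not a value from the model card.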
Commit 01d80d4
Parent(s): b31c78a
Update README.md
README.md CHANGED
````diff
@@ -60,7 +60,7 @@ print(text2toxicity(['я люблю нигеров', 'я люблю африка
 
 ## Training
 
-The model has been
+The model has been trained on the joint dataset of [OK ML Cup](https://cups.mail.ru/ru/tasks/1048) and [Babakov et al.](https://arxiv.org/abs/2103.05345) with `Adam` optimizer, learning rate of `1e-5`, and batch size of `128` for `5` epochs. The data was not filtered in any way. A text was considered inappropriate if its inappropriateness score was higher than 0.2. The per-label ROC AUC on the dev set is:
 ```
 non-toxic : 0.9909
 insult : 0.9882
````
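The added README line describes two steps that can be made concrete: a text is labeled inappropriate when its inappropriateness score is higher than 0.2, and quality is reported as ROC AUC per label. Below is a minimal pure-Python sketch of both. The function names and the toy numbers are hypothetical, and this is not the author's evaluation code; in practice a library routine such as scikit-learn's `roc_auc_score` would typically be used instead of the pairwise loop.

```python
def binarize(scores, threshold=0.2):
    """Label a text 1 (inappropriate) when its score is strictly above the threshold."""
    return [1 if s > threshold else 0 for s in scores]


def roc_auc(y_true, y_score):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def per_label_roc_auc(true_by_label, score_by_label):
    """Compute ROC AUC separately for each label name."""
    return {
        label: roc_auc(true_by_label[label], score_by_label[label])
        for label in true_by_label
    }


# Hypothetical toy data: dataset inappropriateness scores and model outputs.
dataset_scores = [0.05, 0.2, 0.35, 0.9]
inappropriate = binarize(dataset_scores)
model_scores = [0.1, 0.6, 0.4, 0.8]
auc = roc_auc(inappropriate, model_scores)
```

Note that a score of exactly 0.2 stays negative here, following the "higher than 0.2" wording. The pairwise loop is quadratic in the number of examples, which is fine for a sketch but slow on a full dev set.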