Text Classification
Transformers
PyTorch
TensorFlow
Safetensors
Russian
bert
toxic comments classification
A BERT-based classifier (fine-tuned from Conversational RuBERT) trained on a merge of the Russian Language Toxic Comments dataset, collected from 2ch.hk, and the Toxic Russian Comments dataset, collected from ok.ru.
The datasets were merged, shuffled, and split into train, dev, and test sets in an 80-10-10 proportion. The metrics obtained on the test set are as follows:
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.98 | 0.99 | 0.98 | 21384 |
| 1 | 0.94 | 0.92 | 0.93 | 4886 |
| accuracy | | | 0.97 | 26270 |
| macro avg | 0.96 | 0.96 | 0.96 | 26270 |
| weighted avg | 0.97 | 0.97 | 0.97 | 26270 |
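The merge, shuffle, and 80-10-10 split described above can be sketched roughly as follows. This is a minimal illustration, not the authors' preprocessing script: the function name `merge_shuffle_split`, the seed, and the toy `(text, label)` pairs are all hypothetical stand-ins for the two real corpora.

```python
import random


def merge_shuffle_split(datasets, seed=0, ratios=(0.8, 0.1, 0.1)):
    """Merge several lists of examples, shuffle them, and cut train/dev/test."""
    # Concatenate all source datasets into one list.
    merged = [example for dataset in datasets for example in dataset]
    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(merged)
    n_train = int(len(merged) * ratios[0])
    n_dev = int(len(merged) * ratios[1])
    train = merged[:n_train]
    dev = merged[n_train:n_train + n_dev]
    # The test set takes whatever remains after train and dev.
    test = merged[n_train + n_dev:]
    return train, dev, test


# Hypothetical toy data: (text, label) pairs standing in for the two corpora.
corpus_a = [(f"comment a{i}", i % 2) for i in range(60)]
corpus_b = [(f"comment b{i}", i % 2) for i in range(40)]

train, dev, test = merge_shuffle_split([corpus_a, corpus_b])
print(len(train), len(dev), len(test))
```

A plain random split like this does not preserve the class ratio in each part. Whether the original split was stratified is not stated here.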
How to use
import torch
from transformers import BertTokenizer, BertForSequenceClassification
# load tokenizer and model weights
tokenizer = BertTokenizer.from_pretrained('s-nlp/russian_toxicity_classifier')
model = BertForSequenceClassification.from_pretrained('s-nlp/russian_toxicity_classifier')
# prepare the input ('ты супер' means "you are great")
batch = tokenizer.encode('ты супер', return_tensors='pt')
# inference: no gradients are needed, and the output holds one raw logit per class
with torch.no_grad():
    logits = model(batch).logits
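The model call above returns raw logits rather than a label. A minimal sketch of turning a pair of logits into a probability and a decision is shown below. It assumes that index 0 is the non-toxic class and index 1 the toxic class, matching the `0` and `1` rows of the metrics table. The label names, the `predict` helper, the 0.5 threshold, and the example logits are illustrative assumptions, not part of the model.

```python
import math

# Assumption: class 0 = non-toxic, class 1 = toxic; the names are hypothetical.
LABELS = {0: "neutral", 1: "toxic"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, threshold=0.5):
    """Return (label, toxic probability) for one example's two logits."""
    toxic_prob = softmax(logits)[1]
    label = LABELS[1] if toxic_prob >= threshold else LABELS[0]
    return label, toxic_prob


# With the real model, the logits would come from something like:
#   logits = model(batch).logits.detach()[0].tolist()
example_logits = [2.0, -1.0]  # made-up values for illustration
label, toxic_prob = predict(example_logits)
print(label, round(toxic_prob, 3))
```

Raising `threshold` trades recall on the toxic class for precision, which may matter given that class 1 is the minority class in the test set.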
Citation
To acknowledge our work, please use the corresponding citation:
@article{dementieva2022russe,
title={RUSSE-2022: Findings of the First Russian Detoxification Shared Task Based on Parallel Corpora},
author={Dementieva, Daryna and Logacheva, Varvara and Nikishina, Irina and Fenogenova, Alena and Dale, David and Krotova, Irina and Semenov, Nikita and Shavrina, Tatiana and Panchenko, Alexander}
}
Licensing Information
This model is licensed under the OpenRAIL++ License, which supports the development of various technologies, both industrial and academic, that serve the public good.