How to use EIStakovskii/xlm_roberta_base_multilingual_toxicity_classifier_plus with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-classification", model="EIStakovskii/xlm_roberta_base_multilingual_toxicity_classifier_plus")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("EIStakovskii/xlm_roberta_base_multilingual_toxicity_classifier_plus")
    model = AutoModelForSequenceClassification.from_pretrained("EIStakovskii/xlm_roberta_base_multilingual_toxicity_classifier_plus")
This model was trained for multilingual toxicity labeling. Label_1 means TOXIC, Label_0 means NOT TOXIC.
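The label convention above can be wrapped in a small helper so that downstream code sees TOXIC / NOT TOXIC instead of raw label ids. This is a minimal sketch, not part of the model's own tooling: `readable_label` and `READABLE` are hypothetical names, and the helper assumes each prediction is a dict with `label` and `score` keys, as a Transformers text-classification pipeline typically returns. Because the exact casing of the label string (`Label_1` versus `LABEL_1`) may depend on the model config, the lookup is case-insensitive.

```python
from typing import Dict, Union

# Mapping taken from the model card: label 1 = toxic, label 0 = not toxic.
# Keys are lower-cased so "Label_1" and "LABEL_1" are both accepted (assumption:
# the exact casing emitted by the pipeline depends on the model config).
READABLE = {"label_1": "TOXIC", "label_0": "NOT TOXIC"}


def readable_label(prediction: Dict[str, Union[str, float]]) -> Dict[str, Union[str, float]]:
    """Translate one prediction dict, e.g. {"label": "LABEL_1", "score": 0.98}."""
    key = str(prediction["label"]).lower()
    if key not in READABLE:
        raise ValueError(f"unexpected label: {prediction['label']!r}")
    return {"label": READABLE[key], "score": float(prediction["score"])}


# Hypothetical usage with a text-classification pipeline (downloads the model):
# from transformers import pipeline
# clf = pipeline("text-classification",
#                model="EIStakovskii/xlm_roberta_base_multilingual_toxicity_classifier_plus")
# print(readable_label(clf("Have a nice day!")[0]))
```

Raising on an unknown label, rather than passing it through, makes a mismatch between this mapping and the model's actual config visible immediately.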
The model was fine-tuned from the xlm_roberta_base model for four languages: English (EN), Russian (RU), French (FR), and German (DE).
The validation accuracy is 92%.
The model was fine-tuned on a total of 100933k sentences. The training data for English and Russian came from https://github.com/s-nlp/multilingual_detox. The French data comprised a French translation of the data from https://github.com/s-nlp/multilingual_detox together with all the French data from the Jigsaw dataset. The German data was composed similarly, using translations and semi-manual data collection techniques; in particular, offensive words and phrases were crawled from the dict.cc dictionary (https://www.dict.cc/) and Reverso Context (https://context.reverso.net/translation/).