---
library_name: transformers
language:
- en
- fr
- it
- es
- ru
- uk
- tt
- ar
- hi
- ja
- zh
- he
- am
- de
license: openrail++
datasets:
- textdetox/multilingual_toxicity_dataset
metrics:
- f1
base_model:
- cis-lmu/glot500-base
pipeline_tag: text-classification
tags:
- toxic
---

## Multilingual Toxicity Classifier for 15 Languages (2025)

This is an instance of [Glot500](https://huggingface.co/cis-lmu/glot500-base) fine-tuned for binary toxicity classification on our updated (2025) dataset [textdetox/multilingual_toxicity_dataset](https://huggingface.co/datasets/textdetox/multilingual_toxicity_dataset).

The model now covers 15 languages from various language families:

| Language  | Code | F1 Score |
|-----------|------|----------|
| English   | en   | 0.9071   |
| Russian   | ru   | 0.9022   |
| Ukrainian | uk   | 0.9075   |
| German    | de   | 0.6528   |
| Spanish   | es   | 0.7430   |
| Arabic    | ar   | 0.6207   |
| Amharic   | am   | 0.6676   |
| Hindi     | hi   | 0.7171   |
| Chinese   | zh   | 0.6483   |
| Italian   | it   | 0.5975   |
| French    | fr   | 0.9125   |
| Hinglish  | hin  | 0.7051   |
| Hebrew    | he   | 0.8911   |
| Japanese  | ja   | 0.9058   |
| Tatar     | tt   | 0.5834   |

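For readers who want to reproduce a per-language score on their own data, here is a minimal sketch of a binary F1 computation. It assumes the toxic class (label 1) is treated as the positive class; the card does not state which F1 variant or averaging was used for the table above, and the labels below are made up for illustration.

```python
def binary_f1(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined)
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels and predictions (1 = toxic, 0 = neutral)
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
score = binary_f1(y_true, y_pred)
```

In practice, `y_pred` would hold the predicted class indices for one language's test split, and the computation would be repeated per language.
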
## How to use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = 'textdetox/glot500-toxicity-classifier'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tokenize the input text into a tensor of token IDs
batch = tokenizer.encode("You are amazing!", return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    output = model(batch)

# idx 0 for neutral, idx 1 for toxic
predicted_idx = output.logits.argmax(dim=-1).item()
```

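The model returns raw logits, so a small post-processing step is needed to get a label and a probability. The sketch below applies a softmax to one row of two logits and maps the result to the index order given in the snippet (0 for neutral, 1 for toxic). The helper name `classify_logits`, the `threshold` parameter, and the example logits are hypothetical and not part of the model or the `transformers` API.

```python
import math

# Label order taken from the usage snippet: idx 0 neutral, idx 1 toxic
LABELS = ("neutral", "toxic")


def classify_logits(logits, threshold=0.5):
    """Convert one row of two raw logits into (label, toxic_probability)."""
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    toxic_prob = probs[1]
    # threshold is an illustrative knob, not something the model card defines
    label = LABELS[1] if toxic_prob >= threshold else LABELS[0]
    return label, toxic_prob


# Hypothetical logits; with the real model, pass output.logits[0].tolist()
label, toxic_prob = classify_logits([2.0, -1.0])
```

Raising `threshold` above 0.5 trades recall for precision on the toxic class, which may be worth tuning for the lower-scoring languages.
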
## Citation
The model was prepared for the [TextDetox 2025 Shared Task](https://pan.webis.de/clef25/pan25-web/text-detoxification.html) evaluation.

Citation to be announced.