---
language:
- en
- de
license: mit
tags:
- text-classification
- offensive-language
- hate-speech
- xlm-roberta
- multilingual
- onnx
datasets:
- germeval2018
- germeval2019
- hasoc2019
- hasoc2020
- jigsaw
metrics:
- f1
- accuracy
- precision
- recall
base_model: FacebookAI/xlm-roberta-base
pipeline_tag: text-classification
---

# GTox: XLM-RoBERTa Multilingual Offensive Speech Classifier (v1)

A fine-tuned **XLM-RoBERTa base** model for detecting offensive speech in **English** and **German**. It is a binary classifier (`0` = safe, `1` = offensive) trained to catch a broad range of harmful language, from explicit hate speech to subtle microaggressions.

## Model Description

| Property         | Value                                 |
| ---------------- | ------------------------------------- |
| Base model       | `FacebookAI/xlm-roberta-base`         |
| Architecture     | `XLMRobertaForSequenceClassification` |
| Task             | Binary text classification            |
| Languages        | English (`en`), German (`de`)         |
| Max input length | 256 tokens                            |
| Labels           | `0` → Safe, `1` → Offensive           |

### Categories of Offensive Speech

| # | Category | Description |
| --- | --- | --- |
| 1 | **Hate Speech** (Targeted) | Attacks, threatens, or insults based on protected characteristics (race, religion, gender, etc.). Includes dehumanizing language and calls for violence. |
| 2 | **Targeted Insults** (Harassment) | Directed at a specific person based on behavior, appearance, or status, e.g. direct "you" statements or @-mentions. |
| 3 | **Profanity & Vulgarity** (Non-Targeted) | Swearing used for emphasis or emotion without a specific target. Often considered low-severity. |
| 4 | **Cyberbullying & Threats** | Intent to intimidate or cause fear, including threats of physical harm and encouragement of self-harm. |
| 5 | **Implicit Offensive Speech** | Microaggressions, sarcasm, and stereotyping conveyed through otherwise "clean" language. Hardest to detect. |

---

## Training Data

The model was trained on a combined multilingual dataset assembled from the following sources:

**German:**

- [GermEval 2018](https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/0B5VML): Offensive language detection in German tweets
- [GermEval 2019](https://fz.h-da.de/iggsa/data): Offensive language detection in German tweets
- [HASOC 2019 DE](https://hasocfire.github.io/hasoc/2019/index.html): Hate speech and offensive content in German
- [HASOC 2020 DE](https://hasocfire.github.io/hasoc/2020/index.html): Hate speech and offensive content in German

**English:**

- [HASOC 2019 EN](https://hasocfire.github.io/hasoc/2019/index.html): Hate speech and offensive content in English
- [HASOC 2020 EN](https://hasocfire.github.io/hasoc/2020/index.html): Hate speech and offensive content in English
- [Jigsaw Toxicity](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification): Toxic comment classification

---

## Evaluation Results

The tables below report accuracy, F1, precision, and recall for each evaluation set.

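As a reading aid for the tables below, here is a minimal sketch of how accuracy, precision, recall, and F1 for the offensive class can be derived from gold labels and predictions. The function name `binary_metrics` and the toy labels are illustrative assumptions, not the evaluation code used for this model. The F1 values in the tables (labelled "F1 (macro)") equal the harmonic mean of the listed precision and recall, for example 2 × 0.7445 × 0.5904 / (0.7445 + 0.5904) ≈ 0.6586 for GermEval 2018, and that is the quantity the sketch computes.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for the positive (offensive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example with made-up labels: 1 = offensive, 0 = safe
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With these definitions, a test set that contains only offensive examples cannot produce false positives, so precision is 1.0 and recall equals accuracy. That is one way a pattern like the one in the GermEval 2019 table can arise.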
### GermEval 2018 Test Set (DE)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.7928 |
| F1 (macro) | 0.6586 |
| Precision  | 0.7445 |
| Recall     | 0.5904 |

### GermEval 2019 Test Set (DE)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.8247 |
| F1 (macro) | 0.9039 |
| Precision  | 1.0000 |
| Recall     | 0.8247 |

### HASOC 2019 Test Set (DE)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.7976 |
| F1 (macro) | 0.4342 |
| Precision  | 0.3929 |
| Recall     | 0.4853 |

### HASOC 2020 Test Set (DE)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.8213 |
| F1 (macro) | 0.7134 |
| Precision  | 0.6031 |
| Recall     | 0.8731 |

### HASOC 2019 Test Set (EN)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.8335 |
| F1 (macro) | 0.6933 |
| Precision  | 0.6420 |
| Recall     | 0.7535 |

### HASOC 2020 Test Set (EN)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.8213 |
| F1 (macro) | 0.7134 |
| Precision  | 0.6031 |
| Recall     | 0.8731 |

### HateCheck (DE)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.6694 |
| F1 (macro) | 0.7571 |
| Precision  | 0.7789 |
| Recall     | 0.7365 |

### HateCheck (EN)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.6765 |
| F1 (macro) | 0.7752 |
| Precision  | 0.7422 |
| Recall     | 0.8112 |

---

## Usage

### With Hugging Face `pipeline`

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
    tokenizer="Horbee/xlm-roberta-base-offensive-comment-classifier",
)

# English
result = classifier("You are such an idiot!")
print(result)
# e.g. [{'label': 'LABEL_1', 'score': 0.99}] → Offensive (score is illustrative)

# German
result = classifier("Du bist ein Idiot!")
print(result)
# e.g. [{'label': 'LABEL_1', 'score': 0.99}] → Offensive (score is illustrative)

# Safe example
result = classifier("This is a completely normal and friendly comment.")
print(result)
# e.g. [{'label': 'LABEL_0', 'score': 0.99}] → Safe (score is illustrative)
```

The label mapping is:

- `LABEL_0` → Safe (not offensive)
- `LABEL_1` → Offensive

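When a single probability for the offensive class is more convenient than a label plus confidence, the pipeline output can be converted as sketched below. This assumes the pipeline applies a softmax over the two classes, so that the two probabilities sum to 1. `offensive_probability`, `is_offensive`, and the sample dictionaries are hypothetical helpers and hand-written data, not part of the model repository.

```python
def offensive_probability(result: dict) -> float:
    """Return P(offensive) from one pipeline result dict.

    Assumes a two-class softmax output, so P(safe) = 1 - P(offensive).
    """
    score = float(result["score"])
    return score if result["label"] == "LABEL_1" else 1.0 - score


def is_offensive(result: dict, threshold: float = 0.5) -> bool:
    """Apply a custom decision threshold to the offensive-class probability."""
    return offensive_probability(result) >= threshold


# Hand-written samples in the pipeline's result format (not real model scores)
samples = [
    {"label": "LABEL_1", "score": 0.92},
    {"label": "LABEL_0", "score": 0.70},
]
for sample in samples:
    print(round(offensive_probability(sample), 2), is_offensive(sample, threshold=0.8))
```

A stricter threshold such as 0.8 flags fewer comments, which may suit settings where false positives are costly.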
### Batch inference

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
    device=0,  # use GPU; set to -1 or omit for CPU
)

texts = [
    "I hate all of you!",
    "Have a nice day!",
    "Das ist wirklich schrecklich.",
    "Schönen guten Morgen!",
]

results = classifier(texts, batch_size=8)
for text, result in zip(texts, results):
    label = "Offensive" if result["label"] == "LABEL_1" else "Safe"
    print(f"{label} ({result['score']:.2f}): {text}")
```

---

## ONNX Usage

The model is also available as an INT8 quantized ONNX model (`onnx/model_quantized.onnx`) for fast CPU inference without a PyTorch dependency.

### Installation

```bash
pip install onnxruntime tokenizers numpy
```

### Inference

```python
from pathlib import Path

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

MODEL_DIR = Path("onnx")

# Load the tokenizer; truncate and pad every input to the 256-token maximum
tokenizer = Tokenizer.from_file(str(MODEL_DIR / "tokenizer.json"))
tokenizer.enable_truncation(max_length=256)
tokenizer.enable_padding(pad_id=1, pad_token="<pad>", length=256)

# Load the INT8 model; use "model.onnx" instead for the FP32 export
session = ort.InferenceSession(
    str(MODEL_DIR / "model_quantized.onnx"),
    providers=["CPUExecutionProvider"],
)


def classify(text: str, threshold: float = 0.5) -> tuple[int, list[float]]:
    """Return (label, [p_safe, p_offensive]) for a single text."""
    encoded = tokenizer.encode(text)
    inputs = {
        "input_ids": np.array([encoded.ids], dtype=np.int64),
        "attention_mask": np.array([encoded.attention_mask], dtype=np.int64),
    }
    logits = session.run(None, inputs)[0]
    # Numerically stable softmax over the two class logits
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)
    label = int(probs[0][1] >= threshold)
    return label, probs[0].tolist()


# Example
label, probs = classify("Du bist ein Idiot!")
print(f"Label: {label} → {'Offensive' if label == 1 else 'Safe'}")
print(f"Probabilities: safe={probs[0]:.3f}, offensive={probs[1]:.3f}")
```

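The `threshold` argument of `classify` trades precision against recall. The sketch below shows one way a threshold could be chosen on a labelled validation set by sweeping a few candidates and keeping the one with the highest F1. The probabilities and labels are made up for illustration, and `precision_recall_at` and `sweep_thresholds` are hypothetical helpers rather than part of this repository.

```python
def precision_recall_at(probs, labels, threshold):
    """Precision and recall of the offensive class at a given threshold."""
    preds = [int(p >= threshold) for p in probs]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def sweep_thresholds(probs, labels, thresholds):
    """Return (threshold, precision, recall, f1) rows for each candidate."""
    rows = []
    for t in thresholds:
        precision, recall = precision_recall_at(probs, labels, t)
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append((t, precision, recall, f1))
    return rows


# Made-up offensive-class probabilities and gold labels (1 = offensive)
val_probs = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
val_labels = [1, 1, 0, 1, 1, 0, 0, 0]

rows = sweep_thresholds(val_probs, val_labels, [0.3, 0.5, 0.7])
best = max(rows, key=lambda row: row[3])  # candidate with the highest F1
for t, precision, recall, f1 in rows:
    print(f"threshold={t:.1f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print("best threshold by F1:", best[0])
```

On this toy data, raising the threshold increases precision and lowers recall; which balance is appropriate depends on the application.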
### ONNX Model Files

| File                        | Description                                       |
| --------------------------- | ------------------------------------------------- |
| `onnx/model.onnx`           | Full-precision FP32 ONNX export                   |
| `onnx/model_quantized.onnx` | INT8 dynamic quantized ONNX (recommended for CPU) |

The quantized model uses **dynamic INT8 quantization** (`QInt8` weights, `QUInt8` activations) applied to `MatMul`, `Attention`, `Gather`, and embedding layers, resulting in significantly reduced model size and faster CPU throughput with minimal accuracy loss.

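For intuition about what INT8 weight quantization does, the sketch below quantizes a small weight vector symmetrically to the signed 8-bit range and maps it back to floats. It is a simplified, pure-Python illustration that assumes one scale per tensor and no zero point, and it covers weights only, not activations; ONNX Runtime's actual quantizer may differ in its details. `quantize_int8` and `dequantize` are hypothetical helper names and the weights are made up. The rounding error per weight is at most half the scale, which is why accuracy loss can stay small when weights are well distributed.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map INT8 values back to approximate float weights."""
    return [q * scale for q in quantized]


weights = [0.42, -1.27, 0.003, 0.88, -0.35]  # made-up FP32 weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("quantized:", quantized)
print("scale:", scale)
print("max abs error:", max_error)
```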

---

## Model Files

```
xlm-roberta-base-offensive-comment-classifier/
├── config.json
├── model.safetensors
├── tokenizer.json
├── tokenizer_config.json
├── onnx/
│   ├── model.onnx            # FP32 ONNX export
│   ├── model_quantized.onnx  # INT8 quantized ONNX (recommended for production)
│   └── tokenizer.json
```

---

## Limitations and Bias

- The model was primarily trained on social media text (tweets, forum comments). Performance may degrade on formal or domain-specific text.
- Implicit offensive speech (microaggressions, sarcasm) remains the hardest category to detect reliably.
- The model supports **English and German only**. Using it on other languages may produce unreliable results even though XLM-RoBERTa has multilingual pretraining.
- As with all classifiers trained on human-annotated data, the model reflects the biases present in the annotation guidelines and annotator demographics.

---

## Citation

If you use this model in your research, please cite this repository:

```bibtex
@misc{gtox2024,
  title = {GTox: Multilingual Offensive Speech Classifier},
  year = {2026},
  url = {https://github.com/Horbee/gtox-offensive-comment-classifier}
}
```