A fine-tuned XLM-RoBERTa base model for detecting offensive speech in English and German. This is a binary classifier (0 = safe, 1 = offensive) trained to catch a broad range of harmful language, from explicit hate speech to subtle microaggressions.
Model Description
| Property | Value |
| --- | --- |
| Base model | FacebookAI/xlm-roberta-base |
| Architecture | XLMRobertaForSequenceClassification |
| Task | Binary text classification |
| Languages | English (en), German (de) |
| Max input length | 256 tokens |
| Labels | 0 = Safe, 1 = Offensive |
Categories of Offensive Speech
| # | Category | Description |
| --- | --- | --- |
| 1 | Hate Speech (Targeted) | Attacks, threatens, or insults based on protected characteristics (race, religion, gender, etc.). Includes dehumanizing language and calls for violence. |
| 2 | Targeted Insults (Harassment) | Directed at a specific person based on behavior, appearance, or status, e.g. direct "you" statements or @-mentions. |
| 3 | Profanity & Vulgarity (Non-Targeted) | Swearing used for emphasis or emotion without a specific target. Often considered low-severity. |
| 4 | Cyberbullying & Threats | Intent to intimidate or cause fear. Includes threats of physical harm and encouragement of self-harm. |
| 5 | Implicit Offensive Speech | Microaggressions, sarcasm, and stereotyping conveyed through otherwise "clean" language. Hardest to detect. |
Training Data
The model was trained on a combined multilingual dataset assembled from the following sources:
German:
GermEval 2018: Offensive language detection in German tweets
GermEval 2019: Offensive language detection in German tweets
HASOC 2019 DE: Hate speech and offensive content in German
HASOC 2020 DE: Hate speech and offensive content in German
English:
HASOC 2019 EN: Hate speech and offensive content in English
HASOC 2020 EN: Hate speech and offensive content in English
Evaluation results for each test set are reported below.
GermEval 2018 Test Set (DE)
| Metric | Score |
| --- | --- |
| Accuracy | 0.7928 |
| F1 (macro) | 0.6586 |
| Precision | 0.7445 |
| Recall | 0.5904 |
GermEval 2019 Test Set (DE)
| Metric | Score |
| --- | --- |
| Accuracy | 0.8247 |
| F1 (macro) | 0.9039 |
| Precision | 1.0000 |
| Recall | 0.8247 |
HASOC 2019 Test Set (DE)
| Metric | Score |
| --- | --- |
| Accuracy | 0.7976 |
| F1 (macro) | 0.4342 |
| Precision | 0.3929 |
| Recall | 0.4853 |
HASOC 2020 Test Set (DE)
| Metric | Score |
| --- | --- |
| Accuracy | 0.8213 |
| F1 (macro) | 0.7134 |
| Precision | 0.6031 |
| Recall | 0.8731 |
HASOC 2019 Test Set (EN)
| Metric | Score |
| --- | --- |
| Accuracy | 0.8335 |
| F1 (macro) | 0.6933 |
| Precision | 0.6420 |
| Recall | 0.7535 |
HASOC 2020 Test Set (EN)
| Metric | Score |
| --- | --- |
| Accuracy | 0.8213 |
| F1 (macro) | 0.7134 |
| Precision | 0.6031 |
| Recall | 0.8731 |
HateCheck (DE)
| Metric | Score |
| --- | --- |
| Accuracy | 0.6694 |
| F1 (macro) | 0.7571 |
| Precision | 0.7789 |
| Recall | 0.7365 |
HateCheck (EN)
| Metric | Score |
| --- | --- |
| Accuracy | 0.6765 |
| F1 (macro) | 0.7752 |
| Precision | 0.7422 |
| Recall | 0.8112 |
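F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick sanity check, the sketch below recomputes F1 from the Precision and Recall rows of each table and compares it with the reported F1 row. The values agree to within rounding, which is what one would expect if all three rows are computed for the same class (the offensive class, label 1) rather than macro-averaged. That reading is an inference from the numbers, not something the model card states; the names `REPORTED`, `f1_from`, and `check` are illustrative.

```python
from typing import Dict, Tuple

# (precision, recall, reported F1) copied from the result tables above.
REPORTED: Dict[str, Tuple[float, float, float]] = {
    "GermEval 2018 (DE)": (0.7445, 0.5904, 0.6586),
    "GermEval 2019 (DE)": (1.0000, 0.8247, 0.9039),
    "HASOC 2019 (DE)": (0.3929, 0.4853, 0.4342),
    "HASOC 2020 (DE)": (0.6031, 0.8731, 0.7134),
    "HASOC 2019 (EN)": (0.6420, 0.7535, 0.6933),
    "HASOC 2020 (EN)": (0.6031, 0.8731, 0.7134),
    "HateCheck (DE)": (0.7789, 0.7365, 0.7571),
    "HateCheck (EN)": (0.7422, 0.8112, 0.7752),
}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def check(tolerance: float = 1e-4) -> Dict[str, Tuple[float, bool]]:
    """Return {test set: (recomputed F1, matches reported F1 within tolerance)}."""
    results = {}
    for name, (precision, recall, reported_f1) in REPORTED.items():
        computed = f1_from(precision, recall)
        results[name] = (computed, abs(computed - reported_f1) <= tolerance)
    return results


for name, (computed, ok) in check().items():
    print(f"{name}: recomputed F1 = {computed:.4f}, matches reported = {ok}")
```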
Usage
With Hugging Face pipeline
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
    tokenizer="Horbee/xlm-roberta-base-offensive-comment-classifier",
)

# English
result = classifier("You are such an idiot!")
print(result)
# [{'label': 'LABEL_1', 'score': 0.99}] -> Offensive

# German
result = classifier("Du bist ein Idiot!")
print(result)
# [{'label': 'LABEL_1', 'score': 0.99}] -> Offensive

# Safe example
result = classifier("This is a completely normal and friendly comment.")
print(result)
# [{'label': 'LABEL_0', 'score': 0.99}] -> Safe
The label mapping is:
LABEL_0: Safe (not offensive)
LABEL_1: Offensive
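The mapping above can be wrapped in a small helper that turns raw pipeline output into readable labels. This is a minimal sketch, not part of the published model: it assumes the pipeline returns only the top label with a softmax score over the two classes, so the probability of the other class is `1 - score`. The function names and the `threshold` parameter are hypothetical.

```python
from typing import Dict, List

# Mapping taken from the label list above.
LABEL_NAMES = {"LABEL_0": "Safe", "LABEL_1": "Offensive"}


def offensive_probability(prediction: Dict) -> float:
    """Return P(offensive) from one prediction holding the top label and its score.

    Assumes a two-class softmax, so the two class probabilities sum to 1.
    """
    score = float(prediction["score"])
    return score if prediction["label"] == "LABEL_1" else 1.0 - score


def to_readable(predictions: List[Dict], threshold: float = 0.5) -> List[Dict]:
    """Attach a readable label and P(offensive) to each prediction.

    `threshold` is a hypothetical knob: raising it trades recall for precision.
    """
    readable = []
    for prediction in predictions:
        p_offensive = offensive_probability(prediction)
        key = "LABEL_1" if p_offensive >= threshold else "LABEL_0"
        readable.append({"label": LABEL_NAMES[key], "p_offensive": p_offensive})
    return readable
```

With a `classifier` built as shown earlier, `to_readable(classifier(["Have a nice day!"]))` would return one dictionary per input text.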
Batch inference
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
    device=0,  # use GPU; set to -1 or omit for CPU
)

texts = [
    "I hate all of you!",
    "Have a nice day!",
    "Das ist wirklich schrecklich.",
    "Schönen guten Morgen!",
]

results = classifier(texts, batch_size=8)
for text, result in zip(texts, results):
    label = "Offensive" if result["label"] == "LABEL_1" else "Safe"
    print(f"{label} ({result['score']:.2f}): {text}")
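The model description lists a maximum input length of 256 tokens, so longer comments need to be truncated before classification. The sketch below is one way to do that, assuming the text-classification pipeline forwards `truncation` and `max_length` to the tokenizer (recent transformers releases do; check your installed version). `classify_batch` and `MAX_TOKENS` are illustrative names, not part of the model repository.

```python
from typing import Callable, Dict, List

MAX_TOKENS = 256  # "Max input length" from the model description


def classify_batch(
    classifier: Callable[..., List[Dict]],
    texts: List[str],
    batch_size: int = 8,
) -> List[Dict]:
    """Classify texts, truncating each input to MAX_TOKENS tokens.

    `classifier` is expected to be a transformers text-classification pipeline
    (or any callable accepting the same keyword arguments).
    """
    if not texts:
        return []
    return classifier(
        texts,
        batch_size=batch_size,
        truncation=True,  # cut inputs that exceed max_length
        max_length=MAX_TOKENS,
    )
```

Passing the pipeline in as an argument keeps the helper independent of how the model was loaded (CPU or GPU).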
ONNX Usage
The model is also available as an INT8 quantized ONNX model (onnx/model_quantized.onnx) for fast CPU inference without a PyTorch dependency.
The quantized model uses dynamic INT8 quantization (QInt8 weights, QUInt8 activations) applied to MatMul, Attention, Gather, and embedding layers, resulting in significantly reduced model size and faster CPU throughput with minimal accuracy loss.
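A minimal sketch of CPU inference with the quantized file, using `onnxruntime` for the model and the `transformers` tokenizer (which runs without PyTorch when asked for NumPy arrays). It rests on several assumptions not confirmed by the text above: that the tokenizer files sit in the root of the same repository, that the first graph output holds the logits, and that the logit columns follow the label ids (0 = safe, 1 = offensive). The helper names are illustrative.

```python
import math
from typing import Dict, List

REPO_ID = "Horbee/xlm-roberta-base-offensive-comment-classifier"
ONNX_FILE = "onnx/model_quantized.onnx"  # path given in the text above
MAX_TOKENS = 256  # "Max input length" from the model description


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def decode(logit_rows: List[List[float]]) -> List[Dict]:
    """Turn rows of [safe_logit, offensive_logit] into label/score dicts."""
    names = ["Safe", "Offensive"]  # assumed column order: label ids 0 and 1
    decoded = []
    for row in logit_rows:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        decoded.append({"label": names[best], "score": probs[best]})
    return decoded


def classify_onnx(texts: List[str]) -> List[Dict]:
    """Download the quantized model (cached after first use) and classify on CPU."""
    # Third-party imports stay local so the helpers above remain dependency-free.
    import onnxruntime as ort
    from huggingface_hub import hf_hub_download
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
    model_path = hf_hub_download(repo_id=REPO_ID, filename=ONNX_FILE)
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

    encoded = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=MAX_TOKENS,
        return_tensors="np",
    )
    # Feed only the inputs the exported graph declares, rather than assuming names.
    wanted = {graph_input.name for graph_input in session.get_inputs()}
    feeds = {k: v.astype("int64") for k, v in encoded.items() if k in wanted}
    logits = session.run(None, feeds)[0]  # assumption: output 0 is the logits
    return decode(logits.tolist())
```

Calling `classify_onnx(["Have a nice day!", "Du bist ein Idiot!"])` returns one label/score dictionary per input. For repeated calls, the tokenizer and session would normally be created once and reused.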
Limitations
The model was primarily trained on social media text (tweets, forum comments). Performance may degrade on formal or domain-specific text.
Implicit offensive speech (microaggressions, sarcasm) remains the hardest category to detect reliably.
The model supports English and German only. Using it on other languages may produce unreliable results even though XLM-RoBERTa has multilingual pretraining.
As with all classifiers trained on human-annotated data, the model reflects the biases present in the annotation guidelines and annotator demographics.
Citation
If you use this model in your research, please cite this repository:
@misc{gtox2024,
  title = {GTox: Multilingual Offensive Speech Classifier},
  year = {2026},
  url = {https://github.com/Horbee/gtox-offensive-comment-classifier}
}