# GTox: XLM-RoBERTa Multilingual Offensive Speech Classifier (v1)

A fine-tuned XLM-RoBERTa base model for detecting offensive speech in English and German. It is a binary classifier (0 = safe, 1 = offensive) trained to catch a broad range of harmful language, from explicit hate speech to subtle microaggressions.
## Model Description

| Property | Value |
|---|---|
| Base model | `FacebookAI/xlm-roberta-base` |
| Architecture | `XLMRobertaForSequenceClassification` |
| Task | Binary text classification |
| Languages | English (en), German (de) |
| Max input length | 256 tokens |
| Labels | 0 = Safe, 1 = Offensive |
## Categories of Offensive Speech

| # | Category | Description |
|---|---|---|
| 1 | Hate Speech (Targeted) | Attacks, threatens, or insults based on protected characteristics (race, religion, gender, etc.). Includes dehumanizing language and calls for violence. |
| 2 | Targeted Insults (Harassment) | Insults directed at a specific person based on behavior, appearance, or status, e.g. direct "you" statements or @-mentions. |
| 3 | Profanity & Vulgarity (Non-Targeted) | Swearing used for emphasis or emotion without a specific target. Often considered low-severity. |
| 4 | Cyberbullying & Threats | Intent to intimidate or cause fear; includes threats of physical harm and encouragement of self-harm. |
| 5 | Implicit Offensive Speech | Microaggressions, sarcasm, and stereotyping conveyed through otherwise "clean" language. Hardest to detect. |
## Training Data

The model was trained on a combined multilingual dataset assembled from the following sources:

**German:**

**English:**
## Evaluation Results
### GermEval 2018 Test Set (DE)

| Metric | Score |
|---|---|
| Accuracy | 0.7928 |
| F1 (macro) | 0.6586 |
| Precision | 0.7445 |
| Recall | 0.5904 |

### GermEval 2019 Test Set (DE)

| Metric | Score |
|---|---|
| Accuracy | 0.8247 |
| F1 (macro) | 0.9039 |
| Precision | 1.0000 |
| Recall | 0.8247 |

### HASOC 2019 Test Set (DE)

| Metric | Score |
|---|---|
| Accuracy | 0.7976 |
| F1 (macro) | 0.4342 |
| Precision | 0.3929 |
| Recall | 0.4853 |

### HASOC 2020 Test Set (DE)

| Metric | Score |
|---|---|
| Accuracy | 0.8213 |
| F1 (macro) | 0.7134 |
| Precision | 0.6031 |
| Recall | 0.8731 |

### HASOC 2019 Test Set (EN)

| Metric | Score |
|---|---|
| Accuracy | 0.8335 |
| F1 (macro) | 0.6933 |
| Precision | 0.6420 |
| Recall | 0.7535 |

### HASOC 2020 Test Set (EN)

| Metric | Score |
|---|---|
| Accuracy | 0.8213 |
| F1 (macro) | 0.7134 |
| Precision | 0.6031 |
| Recall | 0.8731 |

### HateCheck (DE)

| Metric | Score |
|---|---|
| Accuracy | 0.6694 |
| F1 (macro) | 0.7571 |
| Precision | 0.7789 |
| Recall | 0.7365 |

### HateCheck (EN)

| Metric | Score |
|---|---|
| Accuracy | 0.6765 |
| F1 (macro) | 0.7752 |
| Precision | 0.7422 |
| Recall | 0.8112 |
## Usage

### With Hugging Face `pipeline`

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
    tokenizer="Horbee/xlm-roberta-base-offensive-comment-classifier",
)

result = classifier("You are such an idiot!")
print(result)

result = classifier("Du bist ein Idiot!")  # German: "You are an idiot!"
print(result)

result = classifier("This is a completely normal and friendly comment.")
print(result)
```

The label mapping is:

- `LABEL_0` → Safe (not offensive)
- `LABEL_1` → Offensive
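If you need the raw 0/1 labels described in the model card, the pipeline's string labels can be mapped back with a small helper. This is a hypothetical post-processing sketch (`to_binary` is not part of the repo), and the dicts below mimic the pipeline's output format with made-up scores:

```python
# Hypothetical helper: map the pipeline's {"label": ..., "score": ...} dicts
# onto the 0/1 scheme (0 = safe, 1 = offensive) used by the model card.
def to_binary(result: dict, threshold: float = 0.5) -> int:
    """Return 1 (offensive) only when LABEL_1 wins with at least `threshold` confidence."""
    return int(result["label"] == "LABEL_1" and result["score"] >= threshold)

print(to_binary({"label": "LABEL_1", "score": 0.97}))  # 1
print(to_binary({"label": "LABEL_0", "score": 0.88}))  # 0
print(to_binary({"label": "LABEL_1", "score": 0.40}))  # 0 (below threshold)
```

Raising `threshold` trades recall for precision, which can be useful when false positives are costly in a moderation setting.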
### Batch inference

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
    device=0,  # first GPU; omit or use device=-1 for CPU
)

texts = [
    "I hate all of you!",
    "Have a nice day!",
    "Das ist wirklich schrecklich.",  # German: "That is really terrible."
    "Schönen guten Morgen!",          # German: "Good morning!"
]

results = classifier(texts, batch_size=8)
for text, result in zip(texts, results):
    label = "Offensive" if result["label"] == "LABEL_1" else "Safe"
    print(f"{label} ({result['score']:.2f}): {text}")
```
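In a moderation workflow the batch output is usually post-processed rather than just printed. A minimal sketch, assuming the pipeline's output format shown above (the result dicts and scores here are illustrative, not real model outputs):

```python
# Illustrative post-processing: collect the comments flagged as offensive,
# highest-confidence first. The dicts mimic the pipeline's output format;
# the scores are invented for the example.
texts = ["I hate all of you!", "Have a nice day!"]
results = [
    {"label": "LABEL_1", "score": 0.95},
    {"label": "LABEL_0", "score": 0.99},
]

flagged = sorted(
    ((r["score"], t) for t, r in zip(texts, results) if r["label"] == "LABEL_1"),
    reverse=True,  # highest confidence first
)
print(flagged)  # [(0.95, 'I hate all of you!')]
```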
## ONNX Usage

The model is also available as an INT8-quantized ONNX model (`onnx/model_quantized.onnx`) for fast CPU inference without a PyTorch dependency.

### Installation

```shell
pip install onnxruntime tokenizers numpy
```
### Inference

```python
from pathlib import Path

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

MODEL_DIR = Path("onnx")

# XLM-R uses <pad> with id 1; pad/truncate everything to the 256-token limit.
tokenizer = Tokenizer.from_file(str(MODEL_DIR / "tokenizer.json"))
tokenizer.enable_truncation(max_length=256)
tokenizer.enable_padding(pad_id=1, pad_token="<pad>", length=256)

session = ort.InferenceSession(
    str(MODEL_DIR / "model.onnx"),  # or "model_quantized.onnx" for INT8 CPU inference
    providers=["CPUExecutionProvider"],
)


def classify(text: str, threshold: float = 0.5) -> tuple[int, list[float]]:
    encoded = tokenizer.encode(text)
    inputs = {
        "input_ids": np.array([encoded.ids], dtype=np.int64),
        "attention_mask": np.array([encoded.attention_mask], dtype=np.int64),
    }
    logits = session.run(None, inputs)[0]
    # Softmax over the two logits; subtracting the max avoids overflow in exp().
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.sum(np.exp(shifted), axis=-1, keepdims=True)
    label = int(probs[0][1] >= threshold)
    return label, probs[0].tolist()


label, probs = classify("Du bist ein Idiot!")  # German: "You are an idiot!"
print(f"Label: {label} → {'Offensive' if label == 1 else 'Safe'}")
print(f"Probabilities: safe={probs[0]:.3f}, offensive={probs[1]:.3f}")
```
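The probability step in `classify` is an ordinary two-class softmax followed by a threshold. With fixed example logits (illustrative values, not real model outputs) it behaves like this:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtracting the row max keeps exp() from overflowing on large logits
    # without changing the result.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(exp_arg := shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([[-2.0, 2.0]])  # illustrative [safe, offensive] logits
probs = softmax(logits)
print(int(probs[0, 1] >= 0.5))  # 1 → would be classified as offensive
assert np.isclose(probs.sum(), 1.0)  # probabilities sum to one
```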
### ONNX Model Files

| File | Description |
|---|---|
| `onnx/model.onnx` | Full-precision FP32 ONNX export |
| `onnx/model_quantized.onnx` | INT8 dynamic quantized ONNX (recommended for CPU) |
The quantized model uses dynamic INT8 quantization (QInt8 weights, QUInt8 activations) applied to MatMul, Attention, Gather, and embedding layers, resulting in significantly reduced model size and faster CPU throughput with minimal accuracy loss.
## Model Files

```text
xlm-roberta-base-offensive-comment-classifier/
├── config.json
├── model.safetensors
├── tokenizer.json
├── tokenizer_config.json
└── onnx/
    ├── model.onnx               # FP32 ONNX export
    ├── model_quantized.onnx     # INT8 quantized ONNX (recommended for production)
    └── tokenizer.json
```
## Limitations and Bias
- The model was primarily trained on social media text (tweets, forum comments). Performance may degrade on formal or domain-specific text.
- Implicit offensive speech (microaggressions, sarcasm) remains the hardest category to detect reliably.
- The model supports English and German only. Using it on other languages may produce unreliable results even though XLM-RoBERTa has multilingual pretraining.
- As with all classifiers trained on human-annotated data, the model reflects the biases present in the annotation guidelines and annotator demographics.
## Citation

If you use this model in your research, please cite this repository:

```bibtex
@misc{gtox2024,
  title = {GTox: Multilingual Offensive Speech Classifier},
  year  = {2026},
  url   = {https://github.com/Horbee/gtox-offensive-comment-classifier}
}
```