GTox: XLM-RoBERTa Multilingual Offensive Speech Classifier (v1)

A fine-tuned XLM-RoBERTa base model for detecting offensive speech in English and German. This is a binary classifier (0 = safe, 1 = offensive) trained to catch a broad range of harmful language, from explicit hate speech to subtle microaggressions.

Model Description

| Property | Value |
|---|---|
| Base model | FacebookAI/xlm-roberta-base |
| Architecture | XLMRobertaForSequenceClassification |
| Task | Binary text classification |
| Languages | English (en), German (de) |
| Max input length | 256 tokens |
| Labels | 0 = Safe, 1 = Offensive |
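
For workflows that need direct access to logits rather than the pipeline, the model can be loaded with the standard Auto classes. The snippet below is a minimal sketch; it only assumes the configuration shown above (binary head, 256-token limit) and the repository id used in the Usage section.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "Horbee/xlm-roberta-base-offensive-comment-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Respect the 256-token limit the model was trained with
inputs = tokenizer("Du bist ein Idiot!", truncation=True, max_length=256, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]

print(f"safe={probs[0]:.3f}, offensive={probs[1]:.3f}")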

Categories of Offensive Speech

| # | Category | Description |
|---|---|---|
| 1 | Hate Speech (Targeted) | Attacks, threatens, or insults based on protected characteristics (race, religion, gender, etc.). Includes dehumanizing language and calls for violence. |
| 2 | Targeted Insults (Harassment) | Directed at a specific person based on behavior, appearance, or status, e.g. direct "you" statements or @-mentions. |
| 3 | Profanity & Vulgarity (Non-Targeted) | Swearing used for emphasis or emotion without a specific target. Often considered low-severity. |
| 4 | Cyberbullying & Threats | Intent to intimidate or cause fear, including threats of physical harm and encouragement of self-harm. |
| 5 | Implicit Offensive Speech | Microaggressions, sarcasm, and stereotyping conveyed through otherwise "clean" language. Hardest to detect. |

Training Data

The model was trained on a combined multilingual dataset assembled from the following sources:

German:

  • GermEval 2018 - Offensive language detection in German tweets
  • GermEval 2019 - Offensive language detection in German tweets
  • HASOC 2019 DE - Hate speech and offensive content in German
  • HASOC 2020 DE - Hate speech and offensive content in German

English:

  • HASOC 2019 EN - Hate speech and offensive content in English
  • HASOC 2020 EN - Hate speech and offensive content in English
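
The source corpora use different label schemes (OFFENSE/OTHER in GermEval, HOF/NOT in HASOC), so they have to be mapped onto the shared binary scheme before being concatenated. The snippet below is only an illustrative sketch of that step with the datasets library; the column names and the exact preprocessing used for this model may differ.

from datasets import Dataset, concatenate_datasets

# Toy stand-ins; the real corpora are loaded from the original GermEval/HASOC files
germeval = Dataset.from_dict({"text": ["Beispielkommentar"], "raw_label": ["OFFENSE"]})
hasoc_en = Dataset.from_dict({"text": ["example comment"], "raw_label": ["NOT"]})

# Map each corpus-specific label onto the shared binary scheme (0 = safe, 1 = offensive)
TO_BINARY = {"OTHER": 0, "NOT": 0, "OFFENSE": 1, "HOF": 1}

def binarize(example):
    example["label"] = TO_BINARY[example["raw_label"]]
    return example

combined = concatenate_datasets([germeval.map(binarize), hasoc_en.map(binarize)])
print(combined["label"])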
Evaluation Results

Results on the corresponding test sets and on the HateCheck diagnostic suite are reported below.

GermEval 2018 Test Set (DE)

| Metric | Score |
|---|---|
| Accuracy | 0.7928 |
| F1 (macro) | 0.6586 |
| Precision | 0.7445 |
| Recall | 0.5904 |

GermEval 2019 Test Set (DE)

| Metric | Score |
|---|---|
| Accuracy | 0.8247 |
| F1 (macro) | 0.9039 |
| Precision | 1.0000 |
| Recall | 0.8247 |

HASOC 2019 Test Set (DE)

| Metric | Score |
|---|---|
| Accuracy | 0.7976 |
| F1 (macro) | 0.4342 |
| Precision | 0.3929 |
| Recall | 0.4853 |

HASOC 2020 Test Set (DE)

| Metric | Score |
|---|---|
| Accuracy | 0.8213 |
| F1 (macro) | 0.7134 |
| Precision | 0.6031 |
| Recall | 0.8731 |

HASOC 2019 Test Set (EN)

| Metric | Score |
|---|---|
| Accuracy | 0.8335 |
| F1 (macro) | 0.6933 |
| Precision | 0.6420 |
| Recall | 0.7535 |

HASOC 2020 Test Set (EN)

| Metric | Score |
|---|---|
| Accuracy | 0.8213 |
| F1 (macro) | 0.7134 |
| Precision | 0.6031 |
| Recall | 0.8731 |

HateCheck (DE)

| Metric | Score |
|---|---|
| Accuracy | 0.6694 |
| F1 (macro) | 0.7571 |
| Precision | 0.7789 |
| Recall | 0.7365 |

HateCheck (EN)

| Metric | Score |
|---|---|
| Accuracy | 0.6765 |
| F1 (macro) | 0.7752 |
| Precision | 0.7422 |
| Recall | 0.8112 |
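
The figures above can be recomputed with standard scikit-learn metrics once gold labels and predictions are available as integer lists. The snippet below is a generic sketch with toy values, not tied to any of the test files; the averaging used for the precision and recall rows is not specified in this card and is an assumption here.

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy values; in practice y_true comes from a test set and y_pred from the classifier
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

print("Accuracy  :", accuracy_score(y_true, y_pred))
print("F1 (macro):", f1_score(y_true, y_pred, average="macro"))
print("Precision :", precision_score(y_true, y_pred))  # averaging for the card's numbers is assumed
print("Recall    :", recall_score(y_true, y_pred))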

Usage

With Hugging Face pipeline

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
    tokenizer="Horbee/xlm-roberta-base-offensive-comment-classifier",
)

# English
result = classifier("You are such an idiot!")
print(result)
# [{'label': 'LABEL_1', 'score': 0.99}]  ->  Offensive

# German
result = classifier("Du bist ein Idiot!")
print(result)
# [{'label': 'LABEL_1', 'score': 0.99}]  ->  Offensive

# Safe example
result = classifier("This is a completely normal and friendly comment.")
print(result)
# [{'label': 'LABEL_0', 'score': 0.99}]  ->  Safe

The label mapping is:

  • LABEL_0 → Safe (not offensive)
  • LABEL_1 → Offensive
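
When both class probabilities are needed instead of only the top label, the pipeline can return all scores. A short sketch, assuming a recent transformers version where the pipeline call accepts top_k:

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
)

# top_k=None returns a score for every label instead of only the highest one
scores = classifier("You are such an idiot!", top_k=None)
print(scores)
# e.g. [{'label': 'LABEL_1', 'score': 0.99}, {'label': 'LABEL_0', 'score': 0.01}]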

Batch inference

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
    device=0,  # use GPU; set to -1 or omit for CPU
)

texts = [
    "I hate all of you!",
    "Have a nice day!",
    "Das ist wirklich schrecklich.",
    "SchΓΆnen guten Morgen!",
]

results = classifier(texts, batch_size=8)
for text, result in zip(texts, results):
    label = "Offensive" if result["label"] == "LABEL_1" else "Safe"
    print(f"{label} ({result['score']:.2f}): {text}")

ONNX Usage

The model is also available as an INT8 quantized ONNX model (onnx/model_quantized.onnx) for fast CPU inference without a PyTorch dependency.

Installation

pip install onnxruntime tokenizers numpy

Inference

from pathlib import Path

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

MODEL_DIR = Path("onnx")

# Load tokenizer and ONNX session
tokenizer = Tokenizer.from_file(str(MODEL_DIR / "tokenizer.json"))
tokenizer.enable_truncation(max_length=256)
tokenizer.enable_padding(pad_id=1, pad_token="<pad>", length=256)

session = ort.InferenceSession(
    str(MODEL_DIR / "model.onnx"),
    providers=["CPUExecutionProvider"],
)

def classify(text: str, threshold: float = 0.5) -> tuple[int, list[float]]:
    encoded = tokenizer.encode(text)
    inputs = {
        "input_ids": np.array([encoded.ids], dtype=np.int64),
        "attention_mask": np.array([encoded.attention_mask], dtype=np.int64),
    }
    logits = session.run(None, inputs)[0]
    probs = np.exp(logits) / np.sum(np.exp(logits), axis=-1, keepdims=True)
    label = int(probs[0][1] >= threshold)
    return label, probs[0].tolist()

# Example
label, probs = classify("Du bist ein Idiot!")
print(f"Label: {label} β€” {'Offensive' if label == 1 else 'Safe'}")
print(f"Probabilities: safe={probs[0]:.3f}, offensive={probs[1]:.3f}")

ONNX Model Files

| File | Description |
|---|---|
| onnx/model.onnx | Full-precision FP32 ONNX export |
| onnx/model_quantized.onnx | INT8 dynamic quantized ONNX (recommended for CPU) |

The quantized model uses dynamic INT8 quantization (QInt8 weights, QUInt8 activations) applied to MatMul, Attention, Gather, and embedding layers, resulting in significantly reduced model size and faster CPU throughput with minimal accuracy loss.
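
For reference, a quantized model of this kind can be produced from the FP32 export with onnxruntime's dynamic quantization API. The snippet below is a sketch of that step using the file names from the table above; it is not necessarily the exact command used to create the shipped file.

from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic quantization: weights are stored as INT8, activations are quantized at runtime
quantize_dynamic(
    model_input="onnx/model.onnx",
    model_output="onnx/model_quantized.onnx",
    weight_type=QuantType.QInt8,
)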


Model Files

xlm-roberta-base-offensive-comment-classifier/
├── config.json
├── model.safetensors
├── tokenizer.json
├── tokenizer_config.json
└── onnx/
    ├── model.onnx              # FP32 ONNX export
    ├── model_quantized.onnx    # INT8 quantized ONNX (recommended for production)
    └── tokenizer.json
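
The ONNX example above assumes the onnx/ directory is available locally. The files can be fetched from the Hub with huggingface_hub; the sketch below only assumes the repository layout shown in the tree.

from huggingface_hub import hf_hub_download

repo_id = "Horbee/xlm-roberta-base-offensive-comment-classifier"

# Download the quantized ONNX model and its tokenizer into the local Hugging Face cache
model_path = hf_hub_download(repo_id, filename="onnx/model_quantized.onnx")
tokenizer_path = hf_hub_download(repo_id, filename="onnx/tokenizer.json")
print(model_path, tokenizer_path)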

Limitations and Bias

  • The model was primarily trained on social media text (tweets, forum comments). Performance may degrade on formal or domain-specific text.
  • Implicit offensive speech (microaggressions, sarcasm) remains the hardest category to detect reliably.
  • The model supports English and German only. Using it on other languages may produce unreliable results even though XLM-RoBERTa has multilingual pretraining.
  • As with all classifiers trained on human-annotated data, the model reflects the biases present in the annotation guidelines and annotator demographics.

Citation

If you use this model in your research, please cite this repository:

@misc{gtox2026,
  title  = {GTox: Multilingual Offensive Speech Classifier},
  year   = {2026},
  url    = {https://github.com/Horbee/gtox-offensive-comment-classifier}
}