language:
  - en
  - de
license: mit
tags:
  - text-classification
  - offensive-language
  - hate-speech
  - xlm-roberta
  - multilingual
  - onnx
datasets:
  - germeval2018
  - germeval2019
  - hasoc2019
  - hasoc2020
  - jigsaw
metrics:
  - f1
  - accuracy
  - precision
  - recall
base_model: FacebookAI/xlm-roberta-base
pipeline_tag: text-classification

GTox — XLM-RoBERTa Multilingual Offensive Speech Classifier (v1)

A fine-tuned XLM-RoBERTa base model for detecting offensive speech in English and German. This is a binary classifier (0 = safe, 1 = offensive) trained to catch a broad range of harmful language — from explicit hate speech to subtle microaggressions.

Model Description

Property	Value
Base model	`FacebookAI/xlm-roberta-base`
Architecture	`XLMRobertaForSequenceClassification`
Task	Binary text classification
Languages	English (`en`), German (`de`)
Max input length	256 tokens
Labels	`0` — Safe, `1` — Offensive

Categories of Offensive Speech

#	Category	Description
1	Hate Speech (Targeted)	Attacks, threatens, or insults based on protected characteristics (race, religion, gender, etc.). Includes dehumanizing language and calls for violence.
2	Targeted Insults (Harassment)	Directed at a specific person based on behavior, appearance, or status — e.g. direct "you" statements or @-mentions.
3	Profanity & Vulgarity (Non-Targeted)	Swearing used for emphasis or emotion without a specific target. Often considered low-severity.
4	Cyberbullying & Threats	Intent to intimidate or cause fear — includes threats of physical harm and encouragement of self-harm.
5	Implicit Offensive Speech	Microaggressions, sarcasm, and stereotyping conveyed through otherwise "clean" language. Hardest to detect.

Training Data

The model was trained on a combined multilingual dataset assembled from the following sources:

German:

GermEval 2018 — Offensive language detection in German tweets
GermEval 2019 — Offensive language detection in German tweets
HASOC 2019 DE — Hate speech and offensive content in German
HASOC 2020 DE — Hate speech and offensive content in German

English:

HASOC 2019 EN — Hate speech and offensive content in English
HASOC 2020 EN — Hate speech and offensive content in English
Jigsaw Toxicity — Toxic comment classification

Evaluation Results

The tables below report accuracy, F1, precision, and recall for each evaluation set.

GermEval 2018 Test Set (DE)

Metric	Score
Accuracy	0.7928
F1 (macro)	0.6586
Precision	0.7445
Recall	0.5904

GermEval 2019 Test Set (DE)

Metric	Score
Accuracy	0.8247
F1 (macro)	0.9039
Precision	1.0000
Recall	0.8247

HASOC 2019 Test Set (DE)

Metric	Score
Accuracy	0.7976
F1 (macro)	0.4342
Precision	0.3929
Recall	0.4853

HASOC 2020 Test Set (DE)

Metric	Score
Accuracy	0.8213
F1 (macro)	0.7134
Precision	0.6031
Recall	0.8731

HASOC 2019 Test Set (EN)

Metric	Score
Accuracy	0.8335
F1 (macro)	0.6933
Precision	0.6420
Recall	0.7535

HASOC 2020 Test Set (EN)

Metric	Score
Accuracy	0.8213
F1 (macro)	0.7134
Precision	0.6031
Recall	0.8731

HateCheck (DE)

Metric	Score
Accuracy	0.6694
F1 (macro)	0.7571
Precision	0.7789
Recall	0.7365

HateCheck (EN)

Metric	Score
Accuracy	0.6765
F1 (macro)	0.7752
Precision	0.7422
Recall	0.8112
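
In every table above, the reported F1 matches the harmonic mean of that table's precision and recall to four decimal places. The sketch below reproduces that arithmetic from the published numbers. The `REPORTED` dictionary and the `harmonic_f1` helper are names introduced here for illustration, and the check does not reveal which averaging mode produced the original scores.

```python
# Reported (precision, recall, F1) triples copied from the tables above.
REPORTED = {
    "GermEval 2018 (DE)": (0.7445, 0.5904, 0.6586),
    "GermEval 2019 (DE)": (1.0000, 0.8247, 0.9039),
    "HASOC 2019 (DE)": (0.3929, 0.4853, 0.4342),
    "HASOC 2020 (DE)": (0.6031, 0.8731, 0.7134),
    "HASOC 2019 (EN)": (0.6420, 0.7535, 0.6933),
    "HASOC 2020 (EN)": (0.6031, 0.8731, 0.7134),
    "HateCheck (DE)": (0.7789, 0.7365, 0.7571),
    "HateCheck (EN)": (0.7422, 0.8112, 0.7752),
}


def harmonic_f1(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for name, (precision, recall, reported_f1) in REPORTED.items():
    computed = harmonic_f1(precision, recall)
    print(f"{name}: computed F1 = {computed:.4f}, reported F1 = {reported_f1:.4f}")
```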

Usage

With Hugging Face `pipeline`

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
    tokenizer="Horbee/xlm-roberta-base-offensive-comment-classifier",
)

# English
result = classifier("You are such an idiot!")
print(result)
# [{'label': 'LABEL_1', 'score': 0.99}]  →  Offensive

# German
result = classifier("Du bist ein Idiot!")
print(result)
# [{'label': 'LABEL_1', 'score': 0.99}]  →  Offensive

# Safe example
result = classifier("This is a completely normal and friendly comment.")
print(result)
# [{'label': 'LABEL_0', 'score': 0.99}]  →  Safe

The label mapping is:

LABEL_0 → Safe (not offensive)
LABEL_1 → Offensive
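
A small helper can turn the raw pipeline output into readable labels. This is only a sketch: `LABEL_NAMES` and `readable` are hypothetical names, and the input is assumed to have the list-of-dicts shape shown in the examples above.

```python
# Mapping taken from the label list above.
LABEL_NAMES = {"LABEL_0": "Safe", "LABEL_1": "Offensive"}


def readable(results: list[dict]) -> list[tuple[str, float]]:
    """Convert pipeline-style dicts into (readable label, score) pairs."""
    return [(LABEL_NAMES[item["label"]], item["score"]) for item in results]


# Hand-written stand-in for pipeline output, so no model download is needed.
sample = [{"label": "LABEL_1", "score": 0.99}, {"label": "LABEL_0", "score": 0.97}]
print(readable(sample))
```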

Batch inference

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
    device=0,  # use GPU; set to -1 or omit for CPU
)

texts = [
    "I hate all of you!",
    "Have a nice day!",
    "Das ist wirklich schrecklich.",
    "Schönen guten Morgen!",
]

results = classifier(texts, batch_size=8)
for text, result in zip(texts, results):
    label = "Offensive" if result["label"] == "LABEL_1" else "Safe"
    print(f"{label} ({result['score']:.2f}): {text}")

ONNX Usage

The model is also available as an INT8-quantized ONNX model (`onnx/model_quantized.onnx`) for fast CPU inference without a PyTorch dependency.

Installation

pip install onnxruntime tokenizers numpy

Inference

from pathlib import Path

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

MODEL_DIR = Path("onnx")

# Load tokenizer and ONNX session
tokenizer = Tokenizer.from_file(str(MODEL_DIR / "tokenizer.json"))
tokenizer.enable_truncation(max_length=256)
tokenizer.enable_padding(pad_id=1, pad_token="<pad>", length=256)

# Loads the FP32 export; use "model_quantized.onnx" here for the INT8 model
session = ort.InferenceSession(
    str(MODEL_DIR / "model.onnx"),
    providers=["CPUExecutionProvider"],
)

def classify(text: str, threshold: float = 0.5) -> tuple[int, list[float]]:
    encoded = tokenizer.encode(text)
    inputs = {
        "input_ids": np.array([encoded.ids], dtype=np.int64),
        "attention_mask": np.array([encoded.attention_mask], dtype=np.int64),
    }
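    logits = session.run(None, inputs)[0]
    # Numerically stable softmax: subtract the row maximum before exponentiating
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    label = int(probs[0][1] >= threshold)
    return label, probs[0].tolist()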

# Example
label, probs = classify("Du bist ein Idiot!")
print(f"Label: {label} — {'Offensive' if label == 1 else 'Safe'}")
print(f"Probabilities: safe={probs[0]:.3f}, offensive={probs[1]:.3f}")
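
With only two classes, the softmax probability of the offensive class reduces to a sigmoid of the logit difference, which makes the effect of `threshold` easy to explore without loading a model. The sketch below uses only the standard library. `offensive_probability` and `decide` are hypothetical helpers, and the logits are made-up values rather than model output.

```python
import math


def offensive_probability(logits: list[float]) -> float:
    """Softmax probability of class 1 for a two-element logit vector."""
    diff = logits[1] - logits[0]
    # Branch on the sign so math.exp never receives a large positive argument.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    exp_diff = math.exp(diff)
    return exp_diff / (1.0 + exp_diff)


def decide(logits: list[float], threshold: float = 0.5) -> int:
    """Return 1 (offensive) when the class-1 probability reaches the threshold."""
    return int(offensive_probability(logits) >= threshold)


example_logits = [-2.0, 2.0]  # made-up values for illustration
for threshold in (0.5, 0.9, 0.99):
    print(threshold, decide(example_logits, threshold))
```

Raising the threshold generally trades recall for precision, so a stricter value may suit settings where false positives are costly.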

ONNX Model Files

File	Description
`onnx/model.onnx`	Full-precision FP32 ONNX export
`onnx/model_quantized.onnx`	INT8 dynamic quantized ONNX (recommended for CPU)

The quantized model uses dynamic INT8 quantization (QInt8 weights, QUInt8 activations) applied to MatMul, Attention, Gather, and embedding layers, resulting in significantly reduced model size and faster CPU throughput with minimal accuracy loss.
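
To give a feel for why INT8 weights can cost little accuracy, the sketch below round-trips a handful of floating-point weights through a simplified symmetric 8-bit scheme. It is an illustration only: ONNX Runtime's dynamic quantization chooses scales and zero points in its own way, and `quantize_int8`, `dequantize`, and the sample weights are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the 8-bit integers."""
    return [q * scale for q in quantized]


weights = [0.42, -1.27, 0.003, 0.88, -0.51]  # made-up values
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"scale={scale:.4f}, max round-trip error={max_error:.4f}")
```

In this simplified scheme the round-trip error of any weight is bounded by half the scale, which is why the largest weight in a tensor determines how coarse the approximation is.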

Model Files

xlm-roberta-base-offensive-comment-classifier/
├── config.json
├── model.safetensors
├── tokenizer.json
├── tokenizer_config.json
└── onnx/
    ├── model.onnx            # FP32 ONNX export
    ├── model_quantized.onnx  # INT8 quantized ONNX (recommended for production)
    └── tokenizer.json

Limitations and Bias

The model was primarily trained on social media text (tweets, forum comments). Performance may degrade on formal or domain-specific text.
Implicit offensive speech (microaggressions, sarcasm) remains the hardest category to detect reliably.
The model supports English and German only. Using it on other languages may produce unreliable results even though XLM-RoBERTa has multilingual pretraining.
As with all classifiers trained on human-annotated data, the model reflects the biases present in the annotation guidelines and annotator demographics.

Citation

If you use this model in your research, please cite this repository:

@misc{gtox2024,
  title  = {GTox: Multilingual Offensive Speech Classifier},
  year   = {2026},
  url    = {https://github.com/Horbee/gtox-offensive-comment-classifier}
}