---
language:
  - en
  - de
license: mit
tags:
  - text-classification
  - offensive-language
  - hate-speech
  - xlm-roberta
  - multilingual
  - onnx
datasets:
  - germeval2018
  - germeval2019
  - hasoc2019
  - hasoc2020
  - jigsaw
metrics:
  - f1
  - accuracy
  - precision
  - recall
base_model: FacebookAI/xlm-roberta-base
pipeline_tag: text-classification
---

# GTox — XLM-RoBERTa Multilingual Offensive Speech Classifier (v1)

A fine-tuned **XLM-RoBERTa base** model for detecting offensive speech in **English** and **German**. This is a binary classifier (`0` = safe, `1` = offensive) trained to catch a broad range of harmful language — from explicit hate speech to subtle microaggressions.

## Model Description

| Property         | Value                                 |
| ---------------- | ------------------------------------- |
| Base model       | `FacebookAI/xlm-roberta-base`         |
| Architecture     | `XLMRobertaForSequenceClassification` |
| Task             | Binary text classification            |
| Languages        | English (`en`), German (`de`)         |
| Max input length | 256 tokens                            |
| Labels           | `0` — Safe, `1` — Offensive           |

### Categories of Offensive Speech

| #   | Category                                 | Description                                                                                                                                              |
| --- | ---------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 1   | **Hate Speech** (Targeted)               | Attacks, threatens, or insults based on protected characteristics (race, religion, gender, etc.). Includes dehumanizing language and calls for violence. |
| 2   | **Targeted Insults** (Harassment)        | Directed at a specific person based on behavior, appearance, or status — e.g. direct "you" statements or @-mentions.                                     |
| 3   | **Profanity & Vulgarity** (Non-Targeted) | Swearing used for emphasis or emotion without a specific target. Often considered low-severity.                                                          |
| 4   | **Cyberbullying & Threats**              | Intent to intimidate or cause fear — includes threats of physical harm and encouragement of self-harm.                                                   |
| 5   | **Implicit Offensive Speech**            | Microaggressions, sarcasm, and stereotyping conveyed through otherwise "clean" language. Hardest to detect.                                              |

---

## Training Data

The model was trained on a combined multilingual dataset assembled from the following sources:

**German:**

- [GermEval 2018](https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/0B5VML) — Offensive language detection in German tweets
- [GermEval 2019](https://fz.h-da.de/iggsa/data) — Offensive language detection in German tweets
- [HASOC 2019 DE](https://hasocfire.github.io/hasoc/2019/index.html) — Hate speech and offensive content in German
- [HASOC 2020 DE](https://hasocfire.github.io/hasoc/2020/index.html) — Hate speech and offensive content in German

**English:**

- [HASOC 2019 EN](https://hasocfire.github.io/hasoc/2019/index.html) — Hate speech and offensive content in English
- [HASOC 2020 EN](https://hasocfire.github.io/hasoc/2020/index.html) — Hate speech and offensive content in English
- [Jigsaw Toxicity](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification) — Toxic comment classification

---

## Evaluation Results

> All scores below are computed on the test set or benchmark named in each heading.

### GermEval 2018 Test Set (DE)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.7928 |
| F1 (macro) | 0.6586 |
| Precision  | 0.7445 |
| Recall     | 0.5904 |

### GermEval 2019 Test Set (DE)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.8247 |
| F1 (macro) | 0.9039 |
| Precision  | 1.0000 |
| Recall     | 0.8247 |

### HASOC 2019 Test Set (DE)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.7976 |
| F1 (macro) | 0.4342 |
| Precision  | 0.3929 |
| Recall     | 0.4853 |

### HASOC 2020 Test Set (DE)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.8213 |
| F1 (macro) | 0.7134 |
| Precision  | 0.6031 |
| Recall     | 0.8731 |

### HASOC 2019 Test Set (EN)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.8335 |
| F1 (macro) | 0.6933 |
| Precision  | 0.6420 |
| Recall     | 0.7535 |

### HASOC 2020 Test Set (EN)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.8213 |
| F1 (macro) | 0.7134 |
| Precision  | 0.6031 |
| Recall     | 0.8731 |

### HateCheck (DE)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.6694 |
| F1 (macro) | 0.7571 |
| Precision  | 0.7789 |
| Recall     | 0.7365 |

### HateCheck (EN)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.6765 |
| F1 (macro) | 0.7752 |
| Precision  | 0.7422 |
| Recall     | 0.8112 |
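
As a reader-side sanity check (not the author's evaluation script), the sketch below recomputes F1 from the precision and recall listed in each table. In every table the reported F1 agrees with the harmonic mean of the listed precision and recall, which is how a binary, positive-class F1 is defined. Whether macro averaging was actually applied is therefore worth confirming against the evaluation code. The helper name `f1_from_pr` is hypothetical.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (binary F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from the tables above
reported = {
    "GermEval 2018 (DE)": (0.7445, 0.5904, 0.6586),
    "GermEval 2019 (DE)": (1.0000, 0.8247, 0.9039),
    "HASOC 2019 (DE)": (0.3929, 0.4853, 0.4342),
    "HASOC 2020 (DE)": (0.6031, 0.8731, 0.7134),
    "HASOC 2019 (EN)": (0.6420, 0.7535, 0.6933),
    "HASOC 2020 (EN)": (0.6031, 0.8731, 0.7134),
    "HateCheck (DE)": (0.7789, 0.7365, 0.7571),
    "HateCheck (EN)": (0.7422, 0.8112, 0.7752),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed={f1_from_pr(p, r):.4f} reported={f1:.4f}")
```

Because the inputs are rounded to four decimals, small differences in the last digit are expected.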

---

## Usage

### With Hugging Face `pipeline`

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
    tokenizer="Horbee/xlm-roberta-base-offensive-comment-classifier",
)

# English
result = classifier("You are such an idiot!")
print(result)
# [{'label': 'LABEL_1', 'score': 0.99}]  →  Offensive

# German
result = classifier("Du bist ein Idiot!")
print(result)
# [{'label': 'LABEL_1', 'score': 0.99}]  →  Offensive

# Safe example
result = classifier("This is a completely normal and friendly comment.")
print(result)
# [{'label': 'LABEL_0', 'score': 0.99}]  →  Safe
```

The label mapping is:

- `LABEL_0` → Safe (not offensive)
- `LABEL_1` → Offensive
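
The pipeline returns only the top label and its score, so applying a custom decision threshold first requires converting that result into a probability for the offensive class. The following is a minimal sketch of that conversion. It assumes the score is a softmax probability over the two labels (the pipeline default for single-label models). The helper names `offensive_probability` and `is_offensive` are hypothetical and not part of the model or the `transformers` API.

```python
def offensive_probability(result: dict) -> float:
    """Convert one pipeline result into P(offensive).

    Assumes a two-label softmax output, so P(safe) + P(offensive) = 1.
    """
    score = float(result["score"])
    return score if result["label"] == "LABEL_1" else 1.0 - score


def is_offensive(result: dict, threshold: float = 0.5) -> bool:
    """Flag a result as offensive when P(offensive) reaches the threshold."""
    return offensive_probability(result) >= threshold


# Hand-written example dicts in the pipeline's output shape (not real model output)
examples = [
    {"label": "LABEL_1", "score": 0.90},
    {"label": "LABEL_0", "score": 0.80},
]
for example in examples:
    print(example["label"], is_offensive(example, threshold=0.7))
```

Raising the threshold trades recall for precision, which can be useful when false positives are costly.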

### Batch inference

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
    device=0,  # use GPU; set to -1 or omit for CPU
)

texts = [
    "I hate all of you!",
    "Have a nice day!",
    "Das ist wirklich schrecklich.",
    "Schönen guten Morgen!",
]

results = classifier(texts, batch_size=8)
for text, result in zip(texts, results):
    label = "Offensive" if result["label"] == "LABEL_1" else "Safe"
    print(f"{label} ({result['score']:.2f}): {text}")
```

---

## ONNX Usage

The model is also available as an INT8 quantized ONNX model (`onnx/model_quantized.onnx`) for fast CPU inference without a PyTorch dependency.

### Installation

```bash
pip install onnxruntime tokenizers numpy
```

### Inference

```python
from pathlib import Path

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

MODEL_DIR = Path("onnx")

# Load tokenizer and ONNX session
tokenizer = Tokenizer.from_file(str(MODEL_DIR / "tokenizer.json"))
tokenizer.enable_truncation(max_length=256)
tokenizer.enable_padding(pad_id=1, pad_token="<pad>", length=256)

session = ort.InferenceSession(
    str(MODEL_DIR / "model.onnx"),
    providers=["CPUExecutionProvider"],
)

def classify(text: str, threshold: float = 0.5) -> tuple[int, list[float]]:
    encoded = tokenizer.encode(text)
    inputs = {
        "input_ids": np.array([encoded.ids], dtype=np.int64),
        "attention_mask": np.array([encoded.attention_mask], dtype=np.int64),
    }
    logits = session.run(None, inputs)[0]
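    # Numerically stable softmax: subtract the row-wise max before exponentiating
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)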
    label = int(probs[0][1] >= threshold)
    return label, probs[0].tolist()

# Example
label, probs = classify("Du bist ein Idiot!")
print(f"Label: {label} — {'Offensive' if label == 1 else 'Safe'}")
print(f"Probabilities: safe={probs[0]:.3f}, offensive={probs[1]:.3f}")
```

### ONNX Model Files

| File                        | Description                                       |
| --------------------------- | ------------------------------------------------- |
| `onnx/model.onnx`           | Full-precision FP32 ONNX export                   |
| `onnx/model_quantized.onnx` | INT8 dynamic quantized ONNX (recommended for CPU) |

The quantized model uses **dynamic INT8 quantization** (`QInt8` weights, `QUInt8` activations) applied to `MatMul`, `Attention`, `Gather`, and embedding layers, resulting in significantly reduced model size and faster CPU throughput with minimal accuracy loss.
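
Since both files live in the same directory, a loader can prefer the quantized file and fall back to the FP32 export when it is missing. The sketch below shows one way to do that with `pathlib` only. The helper `pick_onnx_model` is hypothetical; the file names come from the table above.

```python
from pathlib import Path


def pick_onnx_model(model_dir: Path, prefer_quantized: bool = True) -> Path:
    """Return the ONNX file to load from model_dir.

    Tries the preferred variant first and falls back to the other one.
    """
    quantized = model_dir / "model_quantized.onnx"
    full = model_dir / "model.onnx"
    candidates = [quantized, full] if prefer_quantized else [full, quantized]
    for path in candidates:
        if path.is_file():
            return path
    raise FileNotFoundError(f"No ONNX model found in {model_dir}")
```

The returned path can be passed as `str(pick_onnx_model(Path("onnx")))` to `ort.InferenceSession` in place of a hard-coded file name.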

---

## Model Files

```
xlm-roberta-base-offensive-comment-classifier/
├── config.json
├── model.safetensors
├── tokenizer.json
├── tokenizer_config.json
└── onnx/
    ├── model.onnx            # FP32 ONNX export
    ├── model_quantized.onnx  # INT8 quantized ONNX (recommended for production)
    └── tokenizer.json
```

---

## Limitations and Bias

- The model was primarily trained on social media text (tweets, forum comments). Performance may degrade on formal or domain-specific text.
- Implicit offensive speech (microaggressions, sarcasm) remains the hardest category to detect reliably.
- The model supports **English and German only**. Using it on other languages may produce unreliable results even though XLM-RoBERTa has multilingual pretraining.
- As with all classifiers trained on human-annotated data, the model reflects the biases present in the annotation guidelines and annotator demographics.

---

## Citation

If you use this model in your research, please cite this repository:

```bibtex
@misc{gtox2026,
  title  = {GTox: Multilingual Offensive Speech Classifier},
  year   = {2026},
  url    = {https://github.com/Horbee/gtox-offensive-comment-classifier}
}
```