---
language:
- en
- de
license: mit
tags:
- text-classification
- offensive-language
- hate-speech
- xlm-roberta
- multilingual
- onnx
datasets:
- germeval2018
- germeval2019
- hasoc2019
- hasoc2020
- jigsaw
metrics:
- f1
- accuracy
- precision
- recall
base_model: FacebookAI/xlm-roberta-base
pipeline_tag: text-classification
---
# GTox: XLM-RoBERTa Multilingual Offensive Speech Classifier (v1)
A fine-tuned **XLM-RoBERTa base** model for detecting offensive speech in **English** and **German**. This is a binary classifier (`0` = safe, `1` = offensive) trained to catch a broad range of harmful language, from explicit hate speech to subtle microaggressions.
## Model Description
| Property | Value |
| ---------------- | ------------------------------------- |
| Base model | `FacebookAI/xlm-roberta-base` |
| Architecture | `XLMRobertaForSequenceClassification` |
| Task | Binary text classification |
| Languages | English (`en`), German (`de`) |
| Max input length | 256 tokens |
| Labels | `0` = Safe, `1` = Offensive |
### Categories of Offensive Speech
| # | Category | Description |
| --- | ---------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 1 | **Hate Speech** (Targeted) | Attacks, threatens, or insults based on protected characteristics (race, religion, gender, etc.). Includes dehumanizing language and calls for violence. |
| 2 | **Targeted Insults** (Harassment) | Directed at a specific person based on behavior, appearance, or status, e.g. direct "you" statements or @-mentions. |
| 3 | **Profanity & Vulgarity** (Non-Targeted) | Swearing used for emphasis or emotion without a specific target. Often considered low-severity. |
| 4 | **Cyberbullying & Threats** | Intent to intimidate or cause fear, including threats of physical harm and encouragement of self-harm. |
| 5 | **Implicit Offensive Speech** | Microaggressions, sarcasm, and stereotyping conveyed through otherwise "clean" language. Hardest to detect. |
---
## Training Data
The model was trained on a combined multilingual dataset assembled from the following sources:
**German:**
- [GermEval 2018](https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/0B5VML): Offensive language detection in German tweets
- [GermEval 2019](https://fz.h-da.de/iggsa/data): Offensive language detection in German tweets
- [HASOC 2019 DE](https://hasocfire.github.io/hasoc/2019/index.html): Hate speech and offensive content in German
- [HASOC 2020 DE](https://hasocfire.github.io/hasoc/2020/index.html): Hate speech and offensive content in German
**English:**
- [HASOC 2019 EN](https://hasocfire.github.io/hasoc/2019/index.html): Hate speech and offensive content in English
- [HASOC 2020 EN](https://hasocfire.github.io/hasoc/2020/index.html): Hate speech and offensive content in English
- [Jigsaw Toxicity](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification): Toxic comment classification
---
## Evaluation Results
> Scores are reported separately for each evaluation set.
### GermEval 2018 Test Set (DE)
| Metric | Score |
| ---------- | ------ |
| Accuracy | 0.7928 |
| F1 (macro) | 0.6586 |
| Precision | 0.7445 |
| Recall | 0.5904 |
### GermEval 2019 Test Set (DE)
| Metric | Score |
| ---------- | ------ |
| Accuracy | 0.8247 |
| F1 (macro) | 0.9039 |
| Precision | 1.0000 |
| Recall | 0.8247 |
### HASOC 2019 Test Set (DE)
| Metric | Score |
| ---------- | ------ |
| Accuracy | 0.7976 |
| F1 (macro) | 0.4342 |
| Precision | 0.3929 |
| Recall | 0.4853 |
### HASOC 2020 Test Set (DE)
| Metric | Score |
| ---------- | ------ |
| Accuracy | 0.8213 |
| F1 (macro) | 0.7134 |
| Precision | 0.6031 |
| Recall | 0.8731 |
### HASOC 2019 Test Set (EN)
| Metric | Score |
| ---------- | ------ |
| Accuracy | 0.8335 |
| F1 (macro) | 0.6933 |
| Precision | 0.6420 |
| Recall | 0.7535 |
### HASOC 2020 Test Set (EN)
| Metric | Score |
| ---------- | ------ |
| Accuracy | 0.8213 |
| F1 (macro) | 0.7134 |
| Precision | 0.6031 |
| Recall | 0.8731 |
### HateCheck (DE)
| Metric | Score |
| ---------- | ------ |
| Accuracy | 0.6694 |
| F1 (macro) | 0.7571 |
| Precision | 0.7789 |
| Recall | 0.7365 |
### HateCheck (EN)
| Metric | Score |
| ---------- | ------ |
| Accuracy | 0.6765 |
| F1 (macro) | 0.7752 |
| Precision | 0.7422 |
| Recall | 0.8112 |
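
The card does not state which class is treated as positive for precision and recall, or how the scores were computed. As a rough illustration only, the sketch below shows how these four metrics are commonly derived from gold labels and predictions. It assumes label `1` (offensive) is the positive class; the function names and the sample labels are hypothetical and are not taken from the evaluation behind the tables above.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one class treated as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(precision_recall_f1(y_true, y_pred, c)[2] for c in labels) / len(labels)


# Hypothetical gold labels and predictions, for illustration only
y_true = [1, 0, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive=1)
print(f"accuracy={accuracy(y_true, y_pred):.4f}")
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
print(f"macro_f1={macro_f1(y_true, y_pred):.4f}")
```

Macro F1 weights both classes equally, so it can differ noticeably from the F1 of the offensive class alone when the test set is imbalanced.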
---
## Usage
### With Hugging Face `pipeline`
```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
    tokenizer="Horbee/xlm-roberta-base-offensive-comment-classifier",
)

# English
result = classifier("You are such an idiot!")
print(result)
# [{'label': 'LABEL_1', 'score': 0.99}] → Offensive

# German
result = classifier("Du bist ein Idiot!")
print(result)
# [{'label': 'LABEL_1', 'score': 0.99}] → Offensive

# Safe example
result = classifier("This is a completely normal and friendly comment.")
print(result)
# [{'label': 'LABEL_0', 'score': 0.99}] → Safe
```
The label mapping is:
- `LABEL_0` → Safe (not offensive)
- `LABEL_1` → Offensive
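
The pipeline returns only the top label and its score, so applying a custom decision threshold needs the offensive-class probability. The sketch below is one possible way to recover it. It assumes the two class scores sum to 1, which holds when the pipeline applies a softmax over the two logits; the helper names `offensive_probability` and `is_offensive` are hypothetical, and the sample results are hand-written in the pipeline's output format.

```python
def offensive_probability(result: dict) -> float:
    """Convert one pipeline result into P(offensive).

    Assumption: the LABEL_0 and LABEL_1 scores sum to 1 (softmax over two logits).
    """
    score = float(result["score"])
    return score if result["label"] == "LABEL_1" else 1.0 - score


def is_offensive(result: dict, threshold: float = 0.5) -> bool:
    """Flag a text when P(offensive) reaches the chosen threshold."""
    return offensive_probability(result) >= threshold


# Hand-written results with illustrative values, not real model output
examples = [
    {"label": "LABEL_1", "score": 0.99},
    {"label": "LABEL_0", "score": 0.70},
]

for result in examples:
    prob = offensive_probability(result)
    print(f"P(offensive)={prob:.2f} flagged_at_0.25={is_offensive(result, threshold=0.25)}")
```

Lowering the threshold flags more texts, which generally raises recall at the cost of precision.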
### Batch inference
```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
    device=0,  # use GPU; set to -1 or omit for CPU
)

texts = [
    "I hate all of you!",
    "Have a nice day!",
    "Das ist wirklich schrecklich.",
    "Schönen guten Morgen!",
]

results = classifier(texts, batch_size=8)
for text, result in zip(texts, results):
    label = "Offensive" if result["label"] == "LABEL_1" else "Safe"
    print(f"{label} ({result['score']:.2f}): {text}")
```
---
## ONNX Usage
The model is also available as an INT8 quantized ONNX model (`onnx/model_quantized.onnx`) for fast CPU inference without a PyTorch dependency.
### Installation
```bash
pip install onnxruntime tokenizers numpy
```
### Inference
```python
from pathlib import Path

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Local directory containing the downloaded ONNX files
MODEL_DIR = Path("onnx")

# Load tokenizer and ONNX session
tokenizer = Tokenizer.from_file(str(MODEL_DIR / "tokenizer.json"))
tokenizer.enable_truncation(max_length=256)
tokenizer.enable_padding(pad_id=1, pad_token="<pad>", length=256)

# Swap in "model_quantized.onnx" to use the INT8 variant
session = ort.InferenceSession(
    str(MODEL_DIR / "model.onnx"),
    providers=["CPUExecutionProvider"],
)


def classify(text: str, threshold: float = 0.5) -> tuple[int, list[float]]:
    """Return (label, [p_safe, p_offensive]) for a single text."""
    encoded = tokenizer.encode(text)
    inputs = {
        "input_ids": np.array([encoded.ids], dtype=np.int64),
        "attention_mask": np.array([encoded.attention_mask], dtype=np.int64),
    }
    logits = session.run(None, inputs)[0]
    # Subtract the row maximum before exponentiating to keep the softmax numerically stable
    shifted = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = shifted / shifted.sum(axis=-1, keepdims=True)
    label = int(probs[0][1] >= threshold)
    return label, probs[0].tolist()


# Example
label, probs = classify("Du bist ein Idiot!")
print(f"Label: {label} - {'Offensive' if label == 1 else 'Safe'}")
print(f"Probabilities: safe={probs[0]:.3f}, offensive={probs[1]:.3f}")
```
### ONNX Model Files
| File | Description |
| --------------------------- | ------------------------------------------------- |
| `onnx/model.onnx` | Full-precision FP32 ONNX export |
| `onnx/model_quantized.onnx` | INT8 dynamic quantized ONNX (recommended for CPU) |
The quantized model uses **dynamic INT8 quantization** (`QInt8` weights, `QUInt8` activations) applied to `MatMul`, `Attention`, `Gather`, and embedding layers, resulting in significantly reduced model size and faster CPU throughput with minimal accuracy loss.
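
To give a feel for what INT8 weight quantization does, the sketch below quantizes a handful of float weights with a simple symmetric per-tensor scheme and measures the round-trip error. This is only an illustration of the general idea in plain Python. It is not ONNX Runtime's implementation, which may choose scales and zero points differently; the function names and the weight values are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization to the signed range [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Map the 8-bit integers back to approximate float values."""
    return [q * scale for q in quantized]


# Hypothetical weights, for illustration only
weights = [0.42, -1.30, 0.07, 0.95, -0.58]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("quantized:", q)
print(f"scale={scale:.5f} max_error={max_error:.5f}")
```

Each weight is stored in one byte instead of four, and in this scheme the rounding error per weight is bounded by half the scale.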
---
## Model Files
```
xlm-roberta-base-offensive-comment-classifier/
├── config.json
├── model.safetensors
├── tokenizer.json
├── tokenizer_config.json
├── onnx/
│   ├── model.onnx              # FP32 ONNX export
│   ├── model_quantized.onnx    # INT8 quantized ONNX (recommended for production)
│   └── tokenizer.json
```
---
## Limitations and Bias
- The model was primarily trained on social media text (tweets, forum comments). Performance may degrade on formal or domain-specific text.
- Implicit offensive speech (microaggressions, sarcasm) remains the hardest category to detect reliably.
- The model supports **English and German only**. Using it on other languages may produce unreliable results even though XLM-RoBERTa has multilingual pretraining.
- As with all classifiers trained on human-annotated data, the model reflects the biases present in the annotation guidelines and annotator demographics.
---
## Citation
If you use this model in your research, please cite this repository:
```bibtex
@misc{gtox2024,
title = {GTox: Multilingual Offensive Speech Classifier},
year = {2026},
url = {https://github.com/Horbee/gtox-offensive-comment-classifier}
}
```