	---
	language:
	- en
	- de
	license: mit
	tags:
	- text-classification
	- offensive-language
	- hate-speech
	- xlm-roberta
	- multilingual
	- onnx
	datasets:
	- germeval2018
	- germeval2019
	- hasoc2019
	- hasoc2020
	- jigsaw
	metrics:
	- f1
	- accuracy
	- precision
	- recall
	base_model: FacebookAI/xlm-roberta-base
	pipeline_tag: text-classification
	---

	# GTox — XLM-RoBERTa Multilingual Offensive Speech Classifier (v1)

	A fine-tuned XLM-RoBERTa base model for detecting offensive speech in English and German. This is a binary classifier (`0` = safe, `1` = offensive) trained to catch a broad range of harmful language — from explicit hate speech to subtle microaggressions.

	## Model Description

	| Property         | Value                                 |
	| ---------------- | ------------------------------------- |
	| Base model       | `FacebookAI/xlm-roberta-base`         |
	| Architecture     | `XLMRobertaForSequenceClassification` |
	| Task             | Binary text classification            |
	| Languages        | English (`en`), German (`de`)         |
	| Max input length | 256 tokens                            |
	| Labels           | `0` = Safe, `1` = Offensive           |

	### Categories of Offensive Speech

	| # | Category | Description |
	| --- | --- | --- |
	| 1 | Hate Speech (Targeted) | Attacks, threatens, or insults based on protected characteristics (race, religion, gender, etc.). Includes dehumanizing language and calls for violence. |
	| 2 | Targeted Insults (Harassment) | Directed at a specific person based on behavior, appearance, or status, e.g. direct "you" statements or @-mentions. |
	| 3 | Profanity & Vulgarity (Non-Targeted) | Swearing used for emphasis or emotion without a specific target. Often considered low-severity. |
	| 4 | Cyberbullying & Threats | Intent to intimidate or cause fear, including threats of physical harm and encouragement of self-harm. |
	| 5 | Implicit Offensive Speech | Microaggressions, sarcasm, and stereotyping conveyed through otherwise "clean" language. Hardest to detect. |

	---

	## Training Data

	The model was trained on a combined multilingual dataset assembled from the following sources:

	German:

	- [GermEval 2018](https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/0B5VML) — Offensive language detection in German tweets
	- [GermEval 2019](https://fz.h-da.de/iggsa/data) — Offensive language detection in German tweets
	- [HASOC 2019 DE](https://hasocfire.github.io/hasoc/2019/index.html) — Hate speech and offensive content in German
	- [HASOC 2020 DE](https://hasocfire.github.io/hasoc/2020/index.html) — Hate speech and offensive content in German

	English:

	- [HASOC 2019 EN](https://hasocfire.github.io/hasoc/2019/index.html) — Hate speech and offensive content in English
	- [HASOC 2020 EN](https://hasocfire.github.io/hasoc/2020/index.html) — Hate speech and offensive content in English
	- [Jigsaw Toxicity](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification) — Toxic comment classification
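
Combining these sources requires mapping each dataset's own labels onto the binary `0`/`1` scheme. The card does not say how this was done, so the sketch below is only an assumption: it uses the publicly documented coarse labels (`OFFENSE`/`OTHER` for GermEval, `HOF`/`NOT` for HASOC) and a 0.5 cut-off on the Jigsaw toxicity score. The function name `to_binary` and the `source` strings are hypothetical.

```python
# Hypothetical label harmonization; NOT the author's confirmed preprocessing.
SAFE, OFFENSIVE = 0, 1


def to_binary(source: str, raw_label) -> int:
    """Map a dataset-specific label to 0 (safe) or 1 (offensive)."""
    if source.startswith("germeval"):
        # GermEval coarse-grained task: OFFENSE vs. OTHER
        return OFFENSIVE if raw_label == "OFFENSE" else SAFE
    if source.startswith("hasoc"):
        # HASOC sub-task A: HOF (hate/offensive) vs. NOT
        return OFFENSIVE if raw_label == "HOF" else SAFE
    if source == "jigsaw":
        # Jigsaw stores a toxicity fraction; 0.5 is an assumed cut-off
        return OFFENSIVE if float(raw_label) >= 0.5 else SAFE
    raise ValueError(f"unknown source: {source}")


rows = [("germeval2018", "OFFENSE"), ("hasoc2020", "NOT"), ("jigsaw", 0.73)]
print([to_binary(src, lab) for src, lab in rows])
# [1, 0, 1]
```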

	---

	## Evaluation Results

	> Scores for each evaluation benchmark are listed below.

	### GermEval 2018 Test Set (DE)

	| Metric     | Score  |
	| ---------- | ------ |
	| Accuracy   | 0.7928 |
	| F1 (macro) | 0.6586 |
	| Precision  | 0.7445 |
	| Recall     | 0.5904 |

	### GermEval 2019 Test Set (DE)

	| Metric     | Score  |
	| ---------- | ------ |
	| Accuracy   | 0.8247 |
	| F1 (macro) | 0.9039 |
	| Precision  | 1.0000 |
	| Recall     | 0.8247 |

	### HASOC 2019 Test Set (DE)

	| Metric     | Score  |
	| ---------- | ------ |
	| Accuracy   | 0.7976 |
	| F1 (macro) | 0.4342 |
	| Precision  | 0.3929 |
	| Recall     | 0.4853 |

	### HASOC 2020 Test Set (DE)

	| Metric     | Score  |
	| ---------- | ------ |
	| Accuracy   | 0.8213 |
	| F1 (macro) | 0.7134 |
	| Precision  | 0.6031 |
	| Recall     | 0.8731 |

	### HASOC 2019 Test Set (EN)

	| Metric     | Score  |
	| ---------- | ------ |
	| Accuracy   | 0.8335 |
	| F1 (macro) | 0.6933 |
	| Precision  | 0.6420 |
	| Recall     | 0.7535 |

	### HASOC 2020 Test Set (EN)

	| Metric     | Score  |
	| ---------- | ------ |
	| Accuracy   | 0.8213 |
	| F1 (macro) | 0.7134 |
	| Precision  | 0.6031 |
	| Recall     | 0.8731 |

	### HateCheck (DE)

	| Metric     | Score  |
	| ---------- | ------ |
	| Accuracy   | 0.6694 |
	| F1 (macro) | 0.7571 |
	| Precision  | 0.7789 |
	| Recall     | 0.7365 |

	### HateCheck (EN)

	| Metric     | Score  |
	| ---------- | ------ |
	| Accuracy   | 0.6765 |
	| F1 (macro) | 0.7752 |
	| Precision  | 0.7422 |
	| Recall     | 0.8112 |
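
For readers who want to reproduce metrics of this kind on their own data, the following is a minimal pure-Python sketch. It assumes precision and recall refer to the offensive class (`1`) and that macro F1 is the unweighted mean of the per-class F1 scores. The card does not state which averaging was used for the tables above, so treat `binary_metrics` as an illustration rather than the evaluation script behind these numbers.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision/recall for class 1, and macro-averaged F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def f1(hit, false_alarm, miss):
        denom = 2 * hit + false_alarm + miss
        return 2 * hit / denom if denom else 0.0

    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Class 0 swaps the roles: its hits are tn, false alarms fn, misses fp
        "f1_macro": (f1(tp, fp, fn) + f1(tn, fn, fp)) / 2,
    }


gold = [1, 1, 0, 0, 0, 1]
pred = [1, 0, 0, 0, 1, 1]
print({name: round(value, 4) for name, value in binary_metrics(gold, pred).items()})
```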

	---

	## Usage

	### With Hugging Face `pipeline`

	```python
	from transformers import pipeline

	classifier = pipeline(
	    "text-classification",
	    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
	    tokenizer="Horbee/xlm-roberta-base-offensive-comment-classifier",
	)

	# English
	result = classifier("You are such an idiot!")
	print(result)
	# [{'label': 'LABEL_1', 'score': 0.99}] → Offensive

	# German
	result = classifier("Du bist ein Idiot!")
	print(result)
	# [{'label': 'LABEL_1', 'score': 0.99}] → Offensive

	# Safe example
	result = classifier("This is a completely normal and friendly comment.")
	print(result)
	# [{'label': 'LABEL_0', 'score': 0.99}] → Safe
	```

	The label mapping is:

	- `LABEL_0` → Safe (not offensive)
	- `LABEL_1` → Offensive
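
A small helper can apply this mapping to pipeline output. The sketch below assumes the pipeline's default behavior for a two-label model, where `score` is the softmax probability of the predicted label, so the offensive probability is `1 - score` when the prediction is `LABEL_0`. The names `LABEL_NAMES` and `describe` are illustrative.

```python
LABEL_NAMES = {"LABEL_0": "Safe", "LABEL_1": "Offensive"}


def describe(prediction: dict) -> dict:
    """Turn one pipeline result into a readable label plus P(offensive)."""
    score = prediction["score"]
    # With two classes, the other class holds the remaining probability mass.
    p_offensive = score if prediction["label"] == "LABEL_1" else 1.0 - score
    return {
        "label": LABEL_NAMES[prediction["label"]],
        "p_offensive": round(p_offensive, 4),
    }


print(describe({"label": "LABEL_0", "score": 0.97}))
# {'label': 'Safe', 'p_offensive': 0.03}
```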

	### Batch inference

	```python
	from transformers import pipeline

	classifier = pipeline(
	    "text-classification",
	    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
	    device=0,  # use GPU; set to -1 or omit for CPU
	)

	texts = [
	    "I hate all of you!",
	    "Have a nice day!",
	    "Das ist wirklich schrecklich.",
	    "Schönen guten Morgen!",
	]

	results = classifier(texts, batch_size=8)
	for text, result in zip(texts, results):
	    label = "Offensive" if result["label"] == "LABEL_1" else "Safe"
	    print(f"{label} ({result['score']:.2f}): {text}")
	```
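
Because the model's input is limited to 256 tokens, long comments may lose their tail to truncation. One possible workaround, sketched here under stated assumptions, is to split the text into overlapping chunks and keep the highest offensive score. The sketch counts whitespace-separated words as a rough stand-in for tokens (real subword counts will be higher), and `score_fn` is a hypothetical callable that returns the offensive probability for one string.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 100, overlap: int = 20) -> List[str]:
    """Split text into overlapping word windows (a rough proxy for tokens)."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    if len(words) <= max_words:
        return [" ".join(words)] if words else []
    step = max_words - overlap
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words) - overlap, step)
    ]


def max_offensive_score(text: str, score_fn: Callable[[str], float]) -> float:
    """Score every chunk and return the highest offensive probability."""
    return max((score_fn(chunk) for chunk in chunk_words(text)), default=0.0)


# Stand-in scorer for demonstration only; replace with a real model call.
def fake_score(chunk: str) -> float:
    return 0.9 if "idiot" in chunk else 0.1


long_text = "hello " * 150 + "idiot"
print(len(chunk_words(long_text)), max_offensive_score(long_text, fake_score))
# 2 0.9
```

Taking the maximum flags a comment as soon as any chunk looks offensive; averaging would be more lenient and is an equally plausible design choice.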

	---

	## ONNX Usage

	The model is also available as an INT8 quantized ONNX model (`onnx/model_quantized.onnx`) for fast CPU inference without a PyTorch dependency.

	### Installation

	```bash
	pip install onnxruntime tokenizers numpy
	```

	### Inference

	```python
	from pathlib import Path

	import numpy as np
	import onnxruntime as ort
	from tokenizers import Tokenizer

	MODEL_DIR = Path("onnx")

	# Load tokenizer and ONNX session
	tokenizer = Tokenizer.from_file(str(MODEL_DIR / "tokenizer.json"))
	tokenizer.enable_truncation(max_length=256)
	tokenizer.enable_padding(pad_id=1, pad_token="<pad>", length=256)

	session = ort.InferenceSession(
	    str(MODEL_DIR / "model.onnx"),  # or "model_quantized.onnx" for the INT8 variant
	    providers=["CPUExecutionProvider"],
	)

	def classify(text: str, threshold: float = 0.5) -> tuple[int, list[float]]:
	    # Tokenize to fixed-length (256) ids and attention mask
	    encoded = tokenizer.encode(text)
	    inputs = {
	        "input_ids": np.array([encoded.ids], dtype=np.int64),
	        "attention_mask": np.array([encoded.attention_mask], dtype=np.int64),
	    }
	    logits = session.run(None, inputs)[0]
	    # Softmax over the two classes
	    probs = np.exp(logits) / np.sum(np.exp(logits), axis=-1, keepdims=True)
	    # Flag as offensive when P(offensive) reaches the threshold
	    label = int(probs[0][1] >= threshold)
	    return label, probs[0].tolist()

	# Example
	label, probs = classify("Du bist ein Idiot!")
	print(f"Label: {label} — {'Offensive' if label == 1 else 'Safe'}")
	print(f"Probabilities: safe={probs[0]:.3f}, offensive={probs[1]:.3f}")
	```
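
The `threshold` argument of `classify` controls how confident the model must be before a text is flagged: raising it generally reduces false positives at the cost of missed offensive texts. The dependency-free sketch below illustrates the effect on made-up logits. The helper names `softmax` and `decide` are illustrative, and subtracting the maximum logit is a standard trick to keep the exponentials from overflowing.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, threshold=0.5):
    """Return 1 (offensive) if P(offensive) >= threshold, else 0 (safe)."""
    return int(softmax(logits)[1] >= threshold)


example_logits = [0.2, 1.0]  # made-up values: [safe, offensive]
print(round(softmax(example_logits)[1], 3))   # 0.69
print(decide(example_logits, threshold=0.5))  # 1
print(decide(example_logits, threshold=0.8))  # 0
```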

	### ONNX Model Files

	| File                        | Description                                       |
	| --------------------------- | ------------------------------------------------- |
	| `onnx/model.onnx`           | Full-precision FP32 ONNX export                   |
	| `onnx/model_quantized.onnx` | INT8 dynamic quantized ONNX (recommended for CPU) |

	The quantized model uses dynamic INT8 quantization (`QInt8` weights, `QUInt8` activations) applied to `MatMul`, `Attention`, `Gather`, and embedding layers, resulting in significantly reduced model size and faster CPU throughput with minimal accuracy loss.
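
To give an intuition for why INT8 storage costs little accuracy, the sketch below quantizes a handful of made-up weights with a simple symmetric scheme and measures the round-trip error. This is a conceptual illustration only: it is not ONNX Runtime's implementation, which may use different scaling and zero points, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to the range [-127, 127]."""
    largest = max(abs(w) for w in weights)
    scale = largest / 127 if largest else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 codes back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.42, -1.27, 0.003, 0.9, -0.55]  # made-up example values
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
worst = max(abs(w - r) for w, r in zip(weights, restored))
print(codes)
print(f"scale={scale:.4f}, worst round-trip error={worst:.4f}")
```

Each weight is stored in one byte instead of four, and the round-trip error is bounded by half the scale.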

	---

	## Model Files

	```
	xlm-roberta-base-offensive-comment-classifier/
	├── config.json
	├── model.safetensors
	├── tokenizer.json
	├── tokenizer_config.json
	├── onnx/
	│ ├── model.onnx # FP32 ONNX export
	│ ├── model_quantized.onnx # INT8 quantized ONNX (recommended for production)
	│ └── tokenizer.json
	```

	---

	## Limitations and Bias

	- The model was primarily trained on social media text (tweets, forum comments). Performance may degrade on formal or domain-specific text.
	- Implicit offensive speech (microaggressions, sarcasm) remains the hardest category to detect reliably.
	- The model supports English and German only. Using it on other languages may produce unreliable results even though XLM-RoBERTa has multilingual pretraining.
	- As with all classifiers trained on human-annotated data, the model reflects the biases present in the annotation guidelines and annotator demographics.

	---

	## Citation

	If you use this model in your research, please cite this repository:

	```bibtex
	@misc{gtox2024,
	title = {GTox: Multilingual Offensive Speech Classifier},
	year = {2026},
	url = {https://github.com/Horbee/gtox-offensive-comment-classifier}
	}
	```