---
language:
- en
- de
license: mit
tags:
- text-classification
- offensive-language
- hate-speech
- xlm-roberta
- multilingual
- onnx
datasets:
- germeval2018
- germeval2019
- hasoc2019
- hasoc2020
- jigsaw
metrics:
- f1
- accuracy
- precision
- recall
base_model: FacebookAI/xlm-roberta-base
pipeline_tag: text-classification
---

# GTox: XLM-RoBERTa Multilingual Offensive Speech Classifier (v1)

A fine-tuned **XLM-RoBERTa base** model for detecting offensive speech in **English** and **German**. This is a binary classifier (`0` = safe, `1` = offensive) trained to catch a broad range of harmful language, from explicit hate speech to subtle microaggressions.

## Model Description

| Property         | Value                                 |
| ---------------- | ------------------------------------- |
| Base model       | `FacebookAI/xlm-roberta-base`         |
| Architecture     | `XLMRobertaForSequenceClassification` |
| Task             | Binary text classification            |
| Languages        | English (`en`), German (`de`)         |
| Max input length | 256 tokens                            |
| Labels           | `0` = Safe, `1` = Offensive           |

### Categories of Offensive Speech

| # | Category | Description |
| --- | --- | --- |
| 1 | **Hate Speech** (Targeted) | Attacks, threatens, or insults based on protected characteristics (race, religion, gender, etc.). Includes dehumanizing language and calls for violence. |
| 2 | **Targeted Insults** (Harassment) | Directed at a specific person based on behavior, appearance, or status, e.g. direct "you" statements or @-mentions. |
| 3 | **Profanity & Vulgarity** (Non-Targeted) | Swearing used for emphasis or emotion without a specific target. Often considered low-severity. |
| 4 | **Cyberbullying & Threats** | Intent to intimidate or cause fear, including threats of physical harm and encouragement of self-harm. |
| 5 | **Implicit Offensive Speech** | Microaggressions, sarcasm, and stereotyping conveyed through otherwise "clean" language. Hardest to detect. |

---

## Training Data

The model was trained on a combined multilingual dataset assembled from the following sources:

**German:**

- [GermEval 2018](https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/0B5VML): Offensive language detection in German tweets
- [GermEval 2019](https://fz.h-da.de/iggsa/data): Offensive language detection in German tweets
- [HASOC 2019 DE](https://hasocfire.github.io/hasoc/2019/index.html): Hate speech and offensive content in German
- [HASOC 2020 DE](https://hasocfire.github.io/hasoc/2020/index.html): Hate speech and offensive content in German

**English:**

- [HASOC 2019 EN](https://hasocfire.github.io/hasoc/2019/index.html): Hate speech and offensive content in English
- [HASOC 2020 EN](https://hasocfire.github.io/hasoc/2020/index.html): Hate speech and offensive content in English
- [Jigsaw Toxicity](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification): Toxic comment classification

---

## Evaluation Results
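
The tables in this section report accuracy, precision, recall, and F1. As a minimal sketch of how these quantities relate, the following pure-Python helpers compute them from gold labels and predictions, assuming class `1` (offensive) is the positive class for precision and recall. The function names and the example data are hypothetical, and this is not the evaluation script used to produce the reported scores.

```python
from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 with label 1 (offensive) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores for classes 0 and 1."""
    f1_offensive = binary_metrics(y_true, y_pred)["f1"]
    # Flip the labels so that class 0 (safe) becomes the positive class.
    f1_safe = binary_metrics([1 - t for t in y_true], [1 - p for p in y_pred])["f1"]
    return (f1_offensive + f1_safe) / 2


# Made-up gold labels and predictions, purely for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

scores = binary_metrics(y_true, y_pred)
print(scores)
print("macro F1:", macro_f1(y_true, y_pred))
```

With the made-up data above, all five values come out to 0.75. On imbalanced test sets, positive-class F1 and macro F1 can differ considerably, so it is worth checking which of the two a given table reports.
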

### GermEval 2018 Test Set (DE)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.7928 |
| F1 (macro) | 0.6586 |
| Precision  | 0.7445 |
| Recall     | 0.5904 |

### GermEval 2019 Test Set (DE)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.8247 |
| F1 (macro) | 0.9039 |
| Precision  | 1.0000 |
| Recall     | 0.8247 |

### HASOC 2019 Test Set (DE)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.7976 |
| F1 (macro) | 0.4342 |
| Precision  | 0.3929 |
| Recall     | 0.4853 |

### HASOC 2020 Test Set (DE)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.8213 |
| F1 (macro) | 0.7134 |
| Precision  | 0.6031 |
| Recall     | 0.8731 |

### HASOC 2019 Test Set (EN)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.8335 |
| F1 (macro) | 0.6933 |
| Precision  | 0.6420 |
| Recall     | 0.7535 |

### HASOC 2020 Test Set (EN)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.8213 |
| F1 (macro) | 0.7134 |
| Precision  | 0.6031 |
| Recall     | 0.8731 |

### HateCheck (DE)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.6694 |
| F1 (macro) | 0.7571 |
| Precision  | 0.7789 |
| Recall     | 0.7365 |

### HateCheck (EN)

| Metric     | Score  |
| ---------- | ------ |
| Accuracy   | 0.6765 |
| F1 (macro) | 0.7752 |
| Precision  | 0.7422 |
| Recall     | 0.8112 |

---

## Usage

### With Hugging Face `pipeline`

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
    tokenizer="Horbee/xlm-roberta-base-offensive-comment-classifier",
)

# English
result = classifier("You are such an idiot!")
print(result)
# e.g. [{'label': 'LABEL_1', 'score': 0.99}] → Offensive

# German
result = classifier("Du bist ein Idiot!")
print(result)
# e.g. [{'label': 'LABEL_1', 'score': 0.99}] → Offensive

# Safe example
result = classifier("This is a completely normal and friendly comment.")
print(result)
# e.g. [{'label': 'LABEL_0', 'score': 0.99}] → Safe
```

The label mapping is:

- `LABEL_0` → Safe (not offensive)
- `LABEL_1` → Offensive

### Batch inference

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Horbee/xlm-roberta-base-offensive-comment-classifier",
    device=0,  # use GPU; set to -1 or omit for CPU
)

texts = [
    "I hate all of you!",
    "Have a nice day!",
    "Das ist wirklich schrecklich.",
    "Schönen guten Morgen!",
]

results = classifier(texts, batch_size=8)

for text, result in zip(texts, results):
    label = "Offensive" if result["label"] == "LABEL_1" else "Safe"
    print(f"{label} ({result['score']:.2f}): {text}")
```

---

## ONNX Usage

The model is also available as an INT8 quantized ONNX model (`onnx/model_quantized.onnx`) for fast CPU inference without a PyTorch dependency.

### Installation

```bash
pip install onnxruntime tokenizers numpy
```

### Inference

```python
from pathlib import Path

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

MODEL_DIR = Path("onnx")

# Load the tokenizer; truncate and pad every input to the 256-token limit
tokenizer = Tokenizer.from_file(str(MODEL_DIR / "tokenizer.json"))
tokenizer.enable_truncation(max_length=256)
tokenizer.enable_padding(pad_id=1, pad_token="<pad>", length=256)

# INT8 model (recommended for CPU); use "model.onnx" for the FP32 export
session = ort.InferenceSession(
    str(MODEL_DIR / "model_quantized.onnx"),
    providers=["CPUExecutionProvider"],
)


def classify(text: str, threshold: float = 0.5) -> tuple[int, list[float]]:
    encoded = tokenizer.encode(text)
    inputs = {
        "input_ids": np.array([encoded.ids], dtype=np.int64),
        "attention_mask": np.array([encoded.attention_mask], dtype=np.int64),
    }
    logits = session.run(None, inputs)[0]
    # Numerically stable softmax over the two class logits
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)
    # Label 1 (offensive) if its probability reaches the threshold
    label = int(probs[0][1] >= threshold)
    return label, probs[0].tolist()


# Example
label, probs = classify("Du bist ein Idiot!")
print(f"Label: {label} ({'Offensive' if label == 1 else 'Safe'})")
print(f"Probabilities: safe={probs[0]:.3f}, offensive={probs[1]:.3f}")
```

### ONNX Model Files

| File                        | Description                                       |
| --------------------------- | ------------------------------------------------- |
| `onnx/model.onnx`           | Full-precision FP32 ONNX export                   |
| `onnx/model_quantized.onnx` | INT8 dynamic quantized ONNX (recommended for CPU) |

The quantized model uses **dynamic INT8 quantization** (`QInt8` weights, `QUInt8` activations) applied to `MatMul`, `Attention`, `Gather`, and embedding layers, resulting in significantly reduced model size and faster CPU throughput with minimal accuracy loss.

---

## Model Files

```
xlm-roberta-base-offensive-comment-classifier/
├── config.json
├── model.safetensors
├── tokenizer.json
├── tokenizer_config.json
└── onnx/
    ├── model.onnx            # FP32 ONNX export
    ├── model_quantized.onnx  # INT8 quantized ONNX (recommended for production)
    └── tokenizer.json
```

---

## Limitations and Bias

- The model was primarily trained on social media text (tweets, forum comments). Performance may degrade on formal or domain-specific text.
- Implicit offensive speech (microaggressions, sarcasm) remains the hardest category to detect reliably.
- The model supports **English and German only**. Using it on other languages may produce unreliable results, even though XLM-RoBERTa was pretrained on many languages.
- As with all classifiers trained on human-annotated data, the model reflects the biases present in the annotation guidelines and annotator demographics.

---

## Citation

If you use this model in your research, please cite this repository:

```bibtex
@misc{gtox2024,
  title = {GTox: Multilingual Offensive Speech Classifier},
  year  = {2026},
  url   = {https://github.com/Horbee/gtox-offensive-comment-classifier}
}
```
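
---

## Appendix: Post-processing pipeline output

The `pipeline` examples in the Usage section return the generic names `LABEL_0` and `LABEL_1` together with the score of the predicted label. The sketch below shows one way to turn such a result into a readable label with a custom decision threshold. `offensive_probability` and `to_decision` are hypothetical helpers, not part of this repository, and the sketch assumes the score is a two-class softmax probability, so that P(offensive) = 1 - P(safe).

```python
from typing import TypedDict


class PipelineResult(TypedDict):
    """Shape of one text-classification pipeline result."""

    label: str
    score: float


def offensive_probability(result: PipelineResult) -> float:
    """Probability of the offensive class, assuming a two-class softmax score."""
    if result["label"] == "LABEL_1":
        return result["score"]
    # The score belongs to LABEL_0 (safe), so take the complement.
    return 1.0 - result["score"]


def to_decision(result: PipelineResult, threshold: float = 0.5) -> str:
    """Return "Offensive" if P(offensive) reaches the threshold, else "Safe"."""
    return "Offensive" if offensive_probability(result) >= threshold else "Safe"


# Illustrative inputs shaped like pipeline output (scores are made up).
print(to_decision({"label": "LABEL_1", "score": 0.97}))
print(to_decision({"label": "LABEL_0", "score": 0.60}, threshold=0.3))
```

Lowering the threshold flags more borderline comments as offensive, which generally raises recall at the cost of precision. A suitable value depends on the application and is best chosen on held-out data.
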