---
language:
  - multilingual
license: mit
tags:
  - hate-speech
  - classification
  - transformer
  - xlmr
  - multilingual-hate-speech
datasets:
  - HateXplain
  - HateCheck
model-index:
  - name: Multilingual Hate Speech Detection (WhiterBB)
    results: []
---

# Multilingual Hate Speech Detection - XLM-RoBERTa

This is a fine-tuned version of **XLM-RoBERTa Base** trained for multilingual hate speech detection in **Spanish 🇪🇸, English 🇬🇧, and French 🇫🇷**.  
It is part of a master's thesis project focused on real-time detection of hate speech in videos and their transcripts.

## 🧠 Intended Use

This model is designed to work with short- to medium-length text snippets extracted from video subtitles or transcripts.  
It returns a binary classification (`hate` or `not hate`) with a probability score for further analysis.
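
Because the output includes a probability and not just a label, downstream code can choose its own decision threshold. The following is a minimal sketch of that idea, not part of the model itself: the `flag_segments` helper, the threshold values, and the example scores are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def flag_segments(
    scored: List[Tuple[str, float]], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return the (text, hate_probability) pairs at or above the threshold."""
    return [(text, p) for text, p in scored if p >= threshold]


# Made-up scores for illustration only; these are not real model outputs.
scored_segments = [
    ("segment one", 0.12),
    ("segment two", 0.91),
    ("segment three", 0.55),
]

# A higher threshold flags fewer segments (favouring precision),
# a lower one flags more (favouring recall).
strict = flag_segments(scored_segments, threshold=0.9)
lenient = flag_segments(scored_segments, threshold=0.5)
print(len(strict), len(lenient))
```

A suitable threshold depends on the application and would need to be tuned on held-out data.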

## 📊 Training Data

This model was fine-tuned on a **custom multilingual dataset** composed of selected and preprocessed samples from **multiple public corpora** and **custom-curated sets**. The training set was carefully constructed to achieve **language balance** and mitigate **demographic bias** in hate speech detection.

| Source Dataset | Language(s) | Description |
|----------------|-------------|-------------|
| [`manueltonneau/spanish-hate-speech-superset`](https://huggingface.co/datasets/manueltonneau/spanish-hate-speech-superset) | Spanish 🇪🇸 | Aggregated Spanish hate speech datasets. |
| [`manueltonneau/english-hate-speech-superset`](https://huggingface.co/datasets/manueltonneau/english-hate-speech-superset) | English 🇬🇧 | Extensive superset with over 300k samples from English corpora. |
| [`manueltonneau/french-hate-speech-superset`](https://huggingface.co/datasets/manueltonneau/french-hate-speech-superset) | French 🇫🇷 | Curated superset from multiple French datasets. |
| `HateCheck` | English (original) + Spanish + French 🌐 | English test cases plus Spanish and French translations, used to probe multilingual generalization and typical error cases. |
| `Custom Bias Correction Dataset` | Multilingual 🌍 | Designed to mitigate gender, racial, and cultural bias in predictions. |

> 🧩 The final dataset consists of **~60,000 balanced samples**, with **comparable representation across Spanish, English, and French**, ensuring no language dominates the training phase.

This balancing process involved **sampling**, **filtering**, and **label unification** from larger sources. The result is a compact and diverse dataset intended to generalize across cultures and languages while avoiding two common pitfalls in hate speech modeling: one language dominating the training data, and demographic bias in predictions.
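
The three steps named above can be sketched roughly as follows. This is an illustrative sketch only, not the actual preprocessing used for this model: the `LABEL_MAP` values, the row format (`text`, `lang`, `label`), the `unify_and_balance` helper, and the toy data are all assumptions.

```python
import random
from collections import Counter, defaultdict

# Hypothetical mapping from source-specific label names to a binary scheme.
LABEL_MAP = {"hateful": 1, "hate": 1, "non-hateful": 0, "not_hate": 0}


def unify_and_balance(rows, per_group, seed=0):
    """Unify labels, drop unusable rows, and downsample each (lang, label) group."""
    groups = defaultdict(list)
    for row in rows:
        label = LABEL_MAP.get(row["label"])  # label unification
        text = row["text"].strip()
        if label is None or not text:        # filtering: unknown label or empty text
            continue
        groups[(row["lang"], label)].append(
            {"text": text, "lang": row["lang"], "label": label}
        )

    rng = random.Random(seed)
    balanced = []
    for key in sorted(groups):               # sampling: at most per_group rows per group
        items = groups[key]
        rng.shuffle(items)
        balanced.extend(items[:per_group])
    return balanced


# Toy data: English is over-represented before balancing.
toy_rows = []
for lang, n in [("en", 10), ("es", 4), ("fr", 3)]:
    for i in range(n):
        toy_rows.append({"text": f"{lang} sample {i}", "lang": lang, "label": "hateful"})
        toy_rows.append({"text": f"{lang} other {i}", "lang": lang, "label": "not_hate"})
toy_rows.append({"text": "unlabelled row", "lang": "en", "label": "unknown"})
toy_rows.append({"text": "   ", "lang": "fr", "label": "hate"})

balanced = unify_and_balance(toy_rows, per_group=3)
counts = Counter((r["lang"], r["label"]) for r in balanced)
print(counts)
```

Capping every (language, label) group at the same size is one simple way to keep any single language or class from dominating.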


## 🔎 How to use

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_id = "WhiterBB/multilingual-hatespeech-detection"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # inference mode (dropout disabled)

text = "Je déteste cette personne"
# Truncate inputs that exceed the model's maximum sequence length
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_labels)

probs = F.softmax(logits, dim=-1)  # convert logits to class probabilities
predicted_class = torch.argmax(probs, dim=-1).item()
confidence = probs[0, predicted_class].item()

# Class index 1 corresponds to "Hate", any other index to "Not Hate"
label = "Hate" if predicted_class == 1 else "Not Hate"
print(f"{label} ({confidence:.2%})")
```
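
The snippet above scores a single string. A full video transcript can exceed the 512-token input limit of XLM-RoBERTa Base, so one option is to split it into short overlapping windows and score each window separately. The sketch below shows only the splitting step; the `split_into_windows` helper, the window size, and the overlap are hypothetical choices, and a word count is only a rough proxy for the number of tokens.

```python
from typing import List


def split_into_windows(
    transcript: str, max_words: int = 50, overlap: int = 10
) -> List[str]:
    """Split a transcript into overlapping windows of at most max_words words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")

    words = transcript.split()
    step = max_words - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):  # last window reached the end
            break
    return windows


# Toy transcript of 120 placeholder words: w0 w1 ... w119
toy_transcript = " ".join(f"w{i}" for i in range(120))
windows = split_into_windows(toy_transcript, max_words=50, overlap=10)
print(len(windows))
```

Each window can then be passed to the tokenizer and model in the same way as `text` in the example above.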

## 🧪 Metrics

The model was evaluated on an approximately class-balanced multilingual dataset of 56,961 examples (the sum of the Support column). The performance metrics are:

| Class     | Precision | Recall | F1-score | Support |
|-----------|-----------|--------|----------|---------|
| Not Hate  | 0.85      | 0.83   | 0.84     | 30,352  |
| Hate      | 0.81      | 0.83   | 0.82     | 26,609  |

**Overall Accuracy:** 0.83  
**Macro Average:** Precision: 0.83, Recall: 0.83, F1-score: 0.83  
**Weighted Average:** Precision: 0.83, Recall: 0.83, F1-score: 0.83
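
As a sanity check, the macro and weighted averages can be recomputed from the per-class rows of the table. The sketch below uses only the rounded figures reported above, so it reproduces the averages to two decimals rather than exactly; the variable and function names are illustrative.

```python
# Per-class figures copied from the table above (already rounded to two decimals).
rows = {
    "Not Hate": {"precision": 0.85, "recall": 0.83, "f1": 0.84, "support": 30352},
    "Hate": {"precision": 0.81, "recall": 0.83, "f1": 0.82, "support": 26609},
}

total_support = sum(r["support"] for r in rows.values())


def macro(metric: str) -> float:
    """Unweighted mean of a metric over the classes."""
    return sum(r[metric] for r in rows.values()) / len(rows)


def weighted(metric: str) -> float:
    """Mean of a metric over the classes, weighted by class support."""
    return sum(r[metric] * r["support"] for r in rows.values()) / total_support


for metric in ("precision", "recall", "f1"):
    print(f"{metric}: macro={macro(metric):.2f} weighted={weighted(metric):.2f}")
```

The support-weighted recall is by definition equal to overall accuracy, which is consistent with the reported value of 0.83.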

## 📄 License

Released under the MIT License. Feel free to use it in academic and other projects under the terms of that license.

## ✍️ Author

Made with ❤️ by [WhiterBB](https://github.com/WhiterBB) as part of a final master's thesis (TFM) in Artificial Intelligence.