---
language: en
tags:
- text-classification
- abusive-language
- hate-speech
- toxicity
- cyberviolence
- abusive-language-detection
- BERT
license: mit
---

# AbuseBERT

## Model Description

**AbuseBERT** is a **BERT-based classification model** fine-tuned for **abusive language detection** and optimized for **cross-dataset generalization**.

> Abusive language detection models often suffer from poor generalization due to **sampling and lexical biases** in individual datasets. Our approach addresses this by integrating **publicly available abusive language datasets**, harmonizing labels and preprocessing textual samples to create a **broader and more representative training distribution**.

**Key findings (10 datasets):**

- Models trained on individual datasets: average F1 = **0.60**
- Integrated model: F1 = **0.84**
- A dataset's contribution to performance improvement correlates with its **lexical diversity** (correlation 0.71)
- Integration exposes the model to diverse abuse patterns, enhancing **real-world generalization**

---

## Conclusion / Takeaways

- No single dataset captures the full spectrum of abusive language; each dataset reflects a **limited slice** of the problem space.
- Systematically integrating ten heterogeneous datasets significantly improves classification performance on a **held-out benchmark**.
- Lexically dissimilar datasets contribute more to **enhancing generalization**.
- The integrated model demonstrates superior **cross-dataset performance** compared to models trained on individual datasets.

---

## Paper Reference

Samaneh Hosseini Moghaddam, Kelly Lyons, Frank Rudzicz, Cheryl Regehr, Vivek Goel, Kaitlyn Regehr, "**Enhancing machine learning in abusive language detection with dataset aggregation**," in *Proc. 35th IEEE Int. Conf. Collaborative Advances in Software Computing (CASC)*, 2025.

---

## Intended Use

**Recommended:**

- Detecting abusive, offensive, or toxic language in text from social media, online forums, or messaging platforms.
- Supporting research on online harassment, cyberviolence, and hate speech analysis.
- Assisting human moderators in content review or flagging potentially harmful content.
- Evaluating trends, prevalence, or patterns of abusive language in large-scale textual datasets.

**Not Recommended:**

- Fully automated moderation without human oversight
- High-stakes legal or policy decisions

---

## Usage Example

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_name = "Samanehmoghaddam/AbuseBERT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Create a text-classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Example texts to classify
texts = [
    "@user You are amazing!",
    "@user You are stupid!",
]

# Run the classifier
results = classifier(texts)

# Print each text with its predicted label and confidence score
for text, result in zip(texts, results):
    print(f"Text: {text}")
    print(f"Prediction: {result}")
    print("-" * 40)
```
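Because fully automated moderation is not recommended, a common pattern is to act automatically only on high-confidence predictions and route the rest to human reviewers. Below is a minimal sketch of such a triage step operating on pipeline-style output dicts; the threshold value and the label names (`"abusive"`, `"not_abusive"`) are illustrative assumptions, not taken from this model card.

```python
# Split classifier outputs into an auto-handled queue and a human-review queue.
# Threshold and label names below are illustrative assumptions.

def triage(results, auto_threshold=0.95):
    """Route high-confidence predictions to auto-handling, the rest to review."""
    auto, review = [], []
    for r in results:
        (auto if r["score"] >= auto_threshold else review).append(r)
    return auto, review

# Example pipeline-style outputs (hypothetical scores)
sample = [
    {"label": "abusive", "score": 0.99},
    {"label": "not_abusive", "score": 0.62},
]

auto, review = triage(sample)
print(f"{len(auto)} auto-handled, {len(review)} for human review")
# prints "1 auto-handled, 1 for human review"
```

The appropriate threshold depends on the deployment context; a lower threshold increases automation at the cost of more unreviewed borderline cases.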