---
language: en
tags:
- text-classification
- abusive-language
- hate-speech
- toxicity
- cyberviolence
- abusive-language-detection
- BERT
license: mit
---

# AbuseBERT

## Model Description

**AbuseBERT** is a **BERT-based classification model** fine-tuned for **abusive language detection** and optimized for **cross-dataset generalization**.

> Abusive language detection models often generalize poorly because of **sampling and lexical biases** in individual datasets. Our approach addresses this by integrating **publicly available abusive language datasets**, harmonizing their labels, and preprocessing the text samples to create a **broader and more representative training distribution**.

**Key findings across 10 datasets:**
- Models trained on individual datasets: average F1 = **0.60**
- Integrated model: F1 = **0.84**
- A dataset's contribution to performance improvements correlates with its **lexical diversity** (correlation of **0.71**)
- Integration exposes the model to diverse abuse patterns, enhancing **real-world generalization**

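
This card does not spell out the label harmonization and preprocessing steps, so the following is only a minimal sketch of what mapping heterogeneous dataset labels onto one binary scheme could look like. The dataset names, label strings, `LABEL_MAPS`, the `@user` / `URL` masking, and the duplicate filter are all illustrative assumptions, not the pipeline used in the paper.

```python
import re

# Hypothetical per-dataset label vocabularies mapped onto one binary scheme
# (1 = abusive, 0 = not abusive). Names and labels are placeholders.
LABEL_MAPS = {
    "dataset_a": {"hate": 1, "offensive": 1, "neither": 0},
    "dataset_b": {"toxic": 1, "non-toxic": 0},
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def preprocess(text):
    """Mask URLs and user mentions, then collapse runs of whitespace."""
    text = URL_RE.sub("URL", text)
    text = MENTION_RE.sub("@user", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def harmonize(dataset, text, label):
    """Convert one raw example into the shared schema."""
    return {
        "text": preprocess(text),
        "label": LABEL_MAPS[dataset][label],
        "source": dataset,
    }


def integrate(raw):
    """Merge all datasets, keeping the first copy of each duplicate text."""
    seen = set()
    merged = []
    for dataset, rows in raw.items():
        for text, label in rows:
            example = harmonize(dataset, text, label)
            key = example["text"].lower()
            if key not in seen:
                seen.add(key)
                merged.append(example)
    return merged


# Toy input: dataset name -> list of (text, original label) pairs.
raw = {
    "dataset_a": [
        ("@JohnDoe you are   awful http://t.co/x", "offensive"),
        ("Nice day", "neither"),
    ],
    "dataset_b": [
        ("nice day", "non-toxic"),
        ("Go away @someone", "toxic"),
    ],
}

merged = integrate(raw)
for example in merged:
    print(example)
```
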
---

## Conclusion / Takeaways

- No single dataset captures the full spectrum of abusive language; each dataset reflects a **limited slice** of the problem space.
- Systematically integrating ten heterogeneous datasets significantly improves classification performance on a **held-out benchmark**.
- Lexically dissimilar datasets contribute more to **enhancing generalization**.
- The integrated model demonstrates superior **cross-dataset performance** compared to models trained on individual datasets.

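
The lexical measure behind these takeaways is not defined in this card. As one possible proxy, the sketch below computes the Jaccard distance between dataset vocabularies, where 0 means identical vocabularies and 1 means no shared tokens. The tokenizer regex, the function names, and the toy datasets are assumptions made for illustration only.

```python
import re
from itertools import combinations

# Simple lowercase word tokenizer; an assumption, not the paper's tokenizer.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def vocabulary(texts):
    """Collect the set of distinct lowercase tokens in a list of texts."""
    vocab = set()
    for text in texts:
        vocab.update(TOKEN_RE.findall(text.lower()))
    return vocab


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; returns 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_dissimilarity(datasets):
    """Jaccard distance for every unordered pair of datasets."""
    vocabs = {name: vocabulary(texts) for name, texts in datasets.items()}
    return {
        (x, y): jaccard_distance(vocabs[x], vocabs[y])
        for x, y in combinations(sorted(vocabs), 2)
    }


# Toy datasets, purely illustrative.
datasets = {
    "a": ["you are stupid"],
    "b": ["you are great"],
    "c": ["totally different words"],
}

for pair, distance in pairwise_dissimilarity(datasets).items():
    print(pair, round(distance, 2))
```
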
---

## Paper Reference

Samaneh Hosseini Moghaddam, Kelly Lyons, Frank Rudzicz, Cheryl Regehr, Vivek Goel, and Kaitlyn Regehr,
“**Enhancing machine learning in abusive language detection with dataset aggregation**,” in *Proc. 35th IEEE Int. Conf. Collaborative Advances in Software Computing (CASC)*, 2025.

---

## Intended Use

**Recommended:**
- Detecting abusive, offensive, or toxic language in text from social media, online forums, or messaging platforms.
- Supporting research on online harassment, cyberviolence, and hate speech.
- Assisting human moderators in content review by flagging potentially harmful content.
- Evaluating trends, prevalence, or patterns of abusive language in large-scale text datasets.

**Not Recommended:**
- Fully automated moderation without human oversight.
- High-stakes legal or policy decisions.

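
To keep a human in the loop, predictions can be routed to a review queue rather than triggering automatic action. The sketch below assumes the standard `text-classification` pipeline output (one dict with `label` and `score` per text). The `LABEL_1` name for the abusive class and the 0.5 threshold are placeholders: check `model.config.id2label` for the real label names and tune the threshold on your own data. The mock predictions stand in for real model output.

```python
# Assumption: "LABEL_1" marks the abusive class; verify via model.config.id2label.
ABUSIVE_LABEL = "LABEL_1"
# Hypothetical cut-off; tune it on validation data for your platform.
REVIEW_THRESHOLD = 0.5


def triage(texts, predictions, abusive_label=ABUSIVE_LABEL, threshold=REVIEW_THRESHOLD):
    """Return texts flagged for human review, highest score first."""
    queue = []
    for text, pred in zip(texts, predictions):
        if pred["label"] == abusive_label and pred["score"] >= threshold:
            queue.append({"text": text, "score": pred["score"]})
    return sorted(queue, key=lambda item: item["score"], reverse=True)


# Mock predictions in the pipeline's output format (not real model output).
texts = ["@user You are amazing!", "@user You are stupid!", "@user hmm"]
predictions = [
    {"label": "LABEL_0", "score": 0.98},
    {"label": "LABEL_1", "score": 0.91},
    {"label": "LABEL_1", "score": 0.40},
]

for item in triage(texts, predictions):
    print(f"Needs review ({item['score']:.2f}): {item['text']}")
```
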
---

## Usage Example

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the model
model_name = "Samanehmoghaddam/AbuseBERT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Create a pipeline for text classification
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Example texts to classify
texts = [
    "@user You are amazing!",
    "@user You are stupid!",
]

# Run the classifier
results = classifier(texts)

# Print results
for text, result in zip(texts, results):
    print(f"Text: {text}")
    print(f"Prediction: {result}")
    print("-" * 40)