---
language: en
tags:
- text-classification
- abusive-language
- hate-speech
- toxicity
- cyberviolence
- abusive-language-detection
- BERT
license: mit
---
# AbuseBERT
## Model Description
**AbuseBERT** is a **BERT-based classification model** fine-tuned for **abusive language detection**, optimized for **cross-dataset generalization**.
> Abusive language detection models often suffer from poor generalization due to **sampling and lexical biases** in individual datasets. Our approach addresses this by integrating **ten publicly available abusive language datasets**, harmonizing labels and preprocessing textual samples to create a **broader and more representative training distribution**.
**Key Findings:**
- Individual dataset models: average F1 = **0.60**
- Integrated model: F1 = **0.84**
- Each dataset's contribution to the performance improvement correlates with its **lexical diversity** (correlation of **0.71**)
- Integration exposes models to diverse abuse patterns, enhancing **real-world generalization**
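The label harmonization mentioned above can be pictured as mapping each dataset's native label scheme onto one shared binary scheme. The following is only a minimal sketch: the dataset names, their label sets, and the `harmonize_label` and `harmonize` helpers are hypothetical placeholders, not the mapping used in the paper.

```python
# Hypothetical per-dataset mappings onto a shared binary scheme
# (1 = abusive, 0 = not abusive). The dataset names and labels are
# illustrative assumptions, not the ten datasets used for AbuseBERT.
LABEL_MAPS = {
    "dataset_a": {"hate": 1, "offensive": 1, "neither": 0},
    "dataset_b": {"toxic": 1, "non-toxic": 0},
    "dataset_c": {"abusive": 1, "normal": 0, "spam": None},  # None = drop
}


def harmonize_label(dataset, label):
    """Return the shared binary label, or None if the sample should be dropped."""
    mapping = LABEL_MAPS[dataset]
    return mapping[label.strip().lower()]


def harmonize(samples):
    """Convert (dataset, text, label) triples into (text, binary_label) pairs."""
    merged = []
    for dataset, text, label in samples:
        binary = harmonize_label(dataset, label)
        if binary is not None:
            merged.append((text, binary))
    return merged


samples = [
    ("dataset_a", "example one", "Offensive"),
    ("dataset_b", "example two", "non-toxic"),
    ("dataset_c", "example three", "spam"),
]
print(harmonize(samples))
```

Keeping the mapping in a plain dictionary makes every harmonization decision explicit and easy to audit when a new dataset is added.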
---
## Conclusion / Takeaways
- No single dataset captures the full spectrum of abusive language; each dataset reflects a **limited slice** of the problem space.
- Systematically integrating ten heterogeneous datasets significantly improves classification performance on a **held-out benchmark**.
- Lexically dissimilar datasets contribute more to **enhancing generalization**.
- The integrated model demonstrates superior **cross-dataset performance** compared to models trained on individual datasets.
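One simple way to make "lexically dissimilar" concrete is the Jaccard distance between two datasets' vocabularies. This sketch is an assumption for illustration: the paper's actual lexical measure is not specified in this card, and the whitespace tokenizer and toy corpora below are placeholders.

```python
def vocabulary(texts):
    """Collect the set of lowercased whitespace tokens in a corpus."""
    return {token for text in texts for token in text.lower().split()}


def jaccard_distance(texts_a, texts_b):
    """Return 1 - |A & B| / |A | B| over the two vocabularies.

    0.0 means identical vocabularies, 1.0 means no shared tokens.
    """
    vocab_a, vocab_b = vocabulary(texts_a), vocabulary(texts_b)
    union = vocab_a | vocab_b
    if not union:
        return 0.0  # two empty corpora are treated as identical
    return 1.0 - len(vocab_a & vocab_b) / len(union)


# Toy corpora (hypothetical); real datasets would need proper tokenization
corpus_a = ["you are awful", "awful take"]
corpus_b = ["you are kind", "what a kind take"]
print(jaccard_distance(corpus_a, corpus_b))
```

A higher distance means the two corpora share less vocabulary, which is the sense in which a newly added dataset exposes the model to unfamiliar wording.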
---
## Paper Reference
Samaneh Hosseini Moghaddam, Kelly Lyons, Frank Rudzicz, Cheryl Regehr, Vivek Goel, Kaitlyn Regehr,
“**Enhancing machine learning in abusive language detection with dataset aggregation**,” in *Proc. 35th IEEE Int. Conf. Collaborative Advances in Software Computing (CASC)*, 2025.
---
## Intended Use
**Recommended:**
- Detecting abusive language in text from social media or online platforms
- Research on bias mitigation and cross-dataset generalization
- Supporting safe and inclusive online environments
**Not Recommended:**
- Fully automated moderation without human oversight
- High-stakes legal or policy decisions
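To keep a human in the loop, as recommended above, model scores can be used to separate confident predictions from uncertain ones and send the uncertain ones to a reviewer. The sketch below assumes a two-class output in which index 1 means "abusive". That index, the thresholds, and the `route` helper are illustrative assumptions and should be checked against the model's configuration.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, abusive_index=1, low=0.2, high=0.9):
    """Map one sample's logits to a triage decision.

    abusive_index, low, and high are hypothetical defaults, not values
    published with AbuseBERT.
    """
    p_abusive = softmax(logits)[abusive_index]
    if p_abusive >= high:
        return "likely_abusive"
    if p_abusive <= low:
        return "likely_benign"
    return "needs_human_review"


# In practice the logits would come from the model's output for one sample
print(route([3.0, -2.0]))
print(route([0.1, 0.3]))
print(route([-2.5, 3.0]))
```

Widening the band between `low` and `high` sends more samples to reviewers, trading moderation workload for fewer unreviewed mistakes.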
---
## Usage Example
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_id = "Samanehmoghaddam/AbuseBERT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # disable dropout for inference

# Sample input; truncate to the model's maximum sequence length
text = "Your example text here."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Predicted label: index of the highest logit
predicted_label = torch.argmax(outputs.logits, dim=1).item()
print(f"Predicted label: {predicted_label}")
```