language: en
tags:
  - text-classification
  - abusive-language
  - hate-speech
  - toxicity
  - cyberviolence
  - abusive-language-detection
  - BERT
license: mit

AbuseBERT

Model Description

AbuseBERT is a BERT-based classification model fine-tuned for abusive language detection, optimized for cross-dataset generalization.

Abusive language detection models often suffer from poor generalization due to sampling and lexical biases in individual datasets. Our approach addresses this by integrating ten publicly available abusive language datasets, harmonizing labels and preprocessing textual samples to create a broader and more representative training distribution.
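
The harmonization step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the source names (`dataset_a`, `dataset_b`), their label vocabularies, the binary target scheme, and the rules in `normalize_text` are all assumptions made for the example.

```python
import re

# Hypothetical label vocabularies for two source datasets (assumed for
# illustration; the real datasets and their label sets are not listed here).
LABEL_MAPS = {
    "dataset_a": {"hate": 1, "offensive": 1, "neither": 0},
    "dataset_b": {"toxic": 1, "non-toxic": 0},
}

def normalize_text(text: str) -> str:
    """Assumed preprocessing: mask URLs and user mentions, collapse whitespace."""
    text = re.sub(r"https?://\S+", "<url>", text)
    text = re.sub(r"@\w+", "<user>", text)
    return re.sub(r"\s+", " ", text).strip()

def harmonize(source: str, records: list[tuple[str, str]]) -> list[dict]:
    """Map (text, original_label) pairs from one source onto a shared binary scheme."""
    mapping = LABEL_MAPS[source]
    return [
        {"text": normalize_text(text), "label": mapping[label], "source": source}
        for text, label in records
    ]

# Merge the harmonized sources into one training pool.
combined = (
    harmonize("dataset_a", [("@bob  check https://example.com now", "neither")])
    + harmonize("dataset_b", [("some   toxic remark", "toxic")])
)
```

Keeping the `source` field on each record makes it possible to hold out one dataset at a time later, which is one way to measure cross-dataset generalization.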

Key Findings:

Individual dataset models: average F1 = 0.60
Integrated model: F1 = 0.84
A dataset's contribution to the performance gain correlates with its lexical diversity (correlation coefficient of 0.71)
Integration exposes models to diverse abuse patterns, enhancing real-world generalization

Conclusion / Takeaways

No single dataset captures the full spectrum of abusive language; each dataset reflects a limited slice of the problem space.
Systematically integrating ten heterogeneous datasets significantly improves classification performance on a held-out benchmark.
Lexically dissimilar datasets contribute more to enhancing generalization.
The integrated model demonstrates superior cross-dataset performance compared to models trained on individual datasets.
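
To make "lexically dissimilar" concrete, one simple stand-in is the Jaccard distance between the vocabularies of two datasets. The measure used in the paper is not specified in this card, so the function below is only an assumed proxy, and the sample texts are invented for the example.

```python
def vocabulary(texts):
    """Return the set of lowercased, whitespace-separated tokens in a list of texts."""
    return {token for text in texts for token in text.lower().split()}

def jaccard_distance(texts_a, texts_b):
    """Return 1 - |A & B| / |A | B| over the two vocabularies (0 = identical, 1 = disjoint)."""
    vocab_a, vocab_b = vocabulary(texts_a), vocabulary(texts_b)
    union = vocab_a | vocab_b
    if not union:
        return 0.0  # two empty vocabularies are treated as identical
    return 1.0 - len(vocab_a & vocab_b) / len(union)

# Toy samples, invented for illustration only.
forum_posts = ["you are a clown", "what a clown"]
game_chat = ["uninstall the game noob", "you noob"]
print(jaccard_distance(forum_posts, game_chat))
```

A higher distance means the two sources share less vocabulary, which under the finding above would suggest that adding one to the other is more likely to help.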

Paper Reference

Samaneh Hosseini Moghaddam, Kelly Lyons, Frank Rudzicz, Cheryl Regehr, Vivek Goel, Kaitlyn Regehr,
“Enhancing machine learning in abusive language detection with dataset aggregation,” in Proc. 35th IEEE Int. Conf. Collaborative Advances in Software Computing (CASC), 2025.

Intended Use

Recommended:

Detecting abusive language in text from social media or online platforms
Research on bias mitigation and cross-dataset generalization
Supporting safe and inclusive online environments

Not Recommended:

Fully automated moderation without human oversight
High-stakes legal or policy decisions

Usage Example

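from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer from the Hugging Face Hub
model_id = "Samanehmoghaddam/AbuseBERT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Sample input; truncate anything longer than the model's maximum length
text = "Your example text here."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Predicted label: index of the highest-scoring class
# (model.config.id2label maps this index to a label name)
predicted_label = torch.argmax(outputs.logits, dim=-1).item()
print(f"Predicted label: {predicted_label}")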
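
Because fully automated moderation is not recommended, the raw logits can be turned into a probability and uncertain cases sent to a human reviewer. The sketch below uses only the standard library. The class index for "abusive" (`abusive_index=1`) and both thresholds are assumptions for illustration: check `model.config.id2label` for the actual label order and tune the thresholds on your own data.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(logits, abusive_index=1, flag_at=0.90, clear_at=0.10):
    """Return 'flag', 'clear', or 'review' based on the abusive-class probability.

    abusive_index, flag_at, and clear_at are assumed placeholder values.
    """
    p_abusive = softmax(logits)[abusive_index]
    if p_abusive >= flag_at:
        return "flag"
    if p_abusive <= clear_at:
        return "clear"
    return "review"  # uncertain cases go to a human moderator

# With the usage example: route(outputs.logits[0].tolist())
decision = route([-2.5, 3.0])
print(decision)
```

Widening the gap between `clear_at` and `flag_at` sends more items to human review, trading moderator workload for fewer automated mistakes.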