	---
	language: en
	tags:
	- text-classification
	- abusive-language
	- hate-speech
	- toxicity
	- cyberviolence
	- abusive-language-detection
	- BERT
	license: mit
	---

	# AbuseBERT

	## Model Description

	AbuseBERT is a BERT-based classification model fine-tuned for abusive language detection, optimized for cross-dataset generalization.

	> Abusive language detection models often suffer from poor generalization due to sampling and lexical biases in individual datasets. Our approach addresses this by integrating publicly available abusive language datasets, harmonizing labels and preprocessing textual samples to create a broader and more representative training distribution.
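
The integration step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the dataset names, the label vocabularies in `LABEL_MAPS`, the binary target scheme, the whitespace normalization, and the duplicate removal are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical per-dataset label vocabularies; the real datasets and their
# label names are not listed in this card.
LABEL_MAPS: Dict[str, Dict[str, int]] = {
    "dataset_a": {"hate": 1, "offensive": 1, "neither": 0},
    "dataset_b": {"toxic": 1, "non-toxic": 0},
}


def normalize_text(text: str) -> str:
    """Collapse whitespace; a stand-in for whatever preprocessing was used."""
    return " ".join(text.split())


def harmonize(
    dataset: str, rows: Iterable[Tuple[str, str]]
) -> List[Tuple[str, int]]:
    """Map (text, source_label) rows onto a shared binary scheme (1 = abusive)."""
    mapping = LABEL_MAPS[dataset]
    return [(normalize_text(text), mapping[label]) for text, label in rows]


def integrate(sources: Dict[str, List[Tuple[str, str]]]) -> List[Tuple[str, int]]:
    """Concatenate harmonized rows, keeping the first copy of each exact text."""
    seen = set()
    merged = []
    for name, rows in sources.items():
        for text, label in harmonize(name, rows):
            if text not in seen:
                seen.add(text)
                merged.append((text, label))
    return merged
```

Calling `integrate({"dataset_a": [...], "dataset_b": [...]})` with `(text, source_label)` pairs would return one list of `(text, 0 or 1)` pairs that can be split for training and evaluation.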

	Key findings across 10 datasets:
	- Models trained on individual datasets: average F1 = 0.60
	- Integrated model: F1 = 0.84
	- A dataset's contribution to the performance gain correlates with its lexical diversity (correlation of 0.71)
	- Integration exposes models to diverse abuse patterns, enhancing real-world generalization

	---

	## Conclusion / Takeaways

	- No single dataset captures the full spectrum of abusive language; each dataset reflects a limited slice of the problem space.
	- Systematically integrating ten heterogeneous datasets significantly improves classification performance on a held-out benchmark.
	- Lexically dissimilar datasets contribute more to enhancing generalization.
	- The integrated model demonstrates superior cross-dataset performance compared to models trained on individual datasets.
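
One simple way to quantify how lexically dissimilar two datasets are is the Jaccard distance between their vocabularies. The sketch below is only an illustration of that idea: this card does not state which lexical measure the paper uses, and the tokenization rule and function names are assumptions.

```python
import re
from typing import Iterable, Set


def vocabulary(texts: Iterable[str]) -> Set[str]:
    """Lowercased word types appearing in a collection of texts."""
    vocab: Set[str] = set()
    for text in texts:
        vocab.update(re.findall(r"[a-z']+", text.lower()))
    return vocab


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """One minus intersection size over union size.

    Returns 0.0 for identical vocabularies and 1.0 for disjoint ones.
    """
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)
```

Under this measure, a candidate dataset whose vocabulary is far from the vocabulary of the data already collected would score close to 1.0.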

	---

	## Paper Reference

	Samaneh Hosseini Moghaddam, Kelly Lyons, Frank Rudzicz, Cheryl Regehr, Vivek Goel, Kaitlyn Regehr,
	“Enhancing machine learning in abusive language detection with dataset aggregation,” in Proc. 35th IEEE Int. Conf. Collaborative Advances in Software Computing (CASC), 2025.

	---

	## Intended Use

	Recommended:
	- Detecting abusive, offensive, or toxic language in text from social media, online forums, or messaging platforms.
	- Supporting research on online harassment, cyber violence, and hate speech analysis.
	- Assisting human moderators in content review or flagging potentially harmful content.
	- Evaluating trends, prevalence, or patterns of abusive language in large-scale textual datasets.

	Not Recommended:
	- Fully automated moderation without human oversight.
	- High-stakes legal or policy decisions.
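
To keep a human in the loop, model scores can be used to build a review queue rather than to remove content directly. The following is a minimal sketch under stated assumptions: `predictions` is a list of `{"label": ..., "score": ...}` dictionaries as returned by a `transformers` text-classification pipeline, while the label name `"LABEL_1"` and the threshold are hypothetical and should be checked against `model.config.id2label` and tuned on your own data.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def triage(
    texts: List[str],
    predictions: List[Prediction],
    abusive_label: str = "LABEL_1",  # assumption: check model.config.id2label
    review_threshold: float = 0.5,   # hypothetical value, tune on your own data
) -> List[Dict[str, object]]:
    """Queue texts for human review; nothing is removed automatically."""
    queue = []
    for text, pred in zip(texts, predictions):
        if pred["label"] == abusive_label and pred["score"] >= review_threshold:
            queue.append({"text": text, "score": pred["score"]})
    # Highest-confidence items first so moderators see them sooner.
    return sorted(queue, key=lambda item: item["score"], reverse=True)
```

Items below the threshold are left untouched, and every queued item still requires a moderator's decision.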

	---

	## Usage Example

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

	# Load the model
	model_name = "Samanehmoghaddam/AbuseBERT"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)

	# Create a pipeline for text classification
	classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

	# Example texts to classify
	texts = [
	    "@user You are amazing!",
	    "@user You are stupid!",
	]

	# Run the classifier
	results = classifier(texts)

	# Print results
	for text, result in zip(texts, results):
	    print(f"Text: {text}")
	    print(f"Prediction: {result}")
	    print("-" * 40)