---
language: en
tags:
- text-classification
- abusive-language
- hate-speech
- toxicity
- cyberviolence
- abusive-language-detection
- BERT
license: mit
---

# AbuseBERT

## Model Description

**AbuseBERT** is a **BERT-based classification model** fine-tuned for **abusive language detection** and optimized for **cross-dataset generalization**.

> Abusive language detection models often generalize poorly because of **sampling and lexical biases** in individual datasets. Our approach addresses this by integrating **ten publicly available abusive language datasets**, harmonizing their labels and preprocessing the text to create a **broader and more representative training distribution**.
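
The card describes the integration step only at a high level; as a minimal sketch (dataset names, label vocabularies, and the `harmonize` helper below are all hypothetical, not the paper's actual pipeline), harmonizing heterogeneous label sets onto a shared binary scheme might look like this:

```python
import pandas as pd

# Hypothetical per-dataset label mappings onto a shared binary scheme
# (dataset names and label sets are illustrative, not the paper's ten sources).
LABEL_MAPS = {
    "dataset_a": {"hate_speech": 1, "offensive": 1, "neither": 0},
    "dataset_b": {"abusive": 1, "normal": 0, "spam": 0},
}

def harmonize(df: pd.DataFrame, name: str) -> pd.DataFrame:
    """Map one dataset's native labels onto the shared binary label."""
    out = df[["text", "label"]].copy()
    out["label"] = out["label"].map(LABEL_MAPS[name])
    out["source"] = name  # keep provenance for per-dataset analysis
    return out.dropna(subset=["label"])

# Concatenating every harmonized dataset yields the broader training
# distribution, e.g.:
# combined = pd.concat([harmonize(df, name) for name, df in raw_frames.items()])
```

Keeping a `source` column makes per-dataset ablations, such as the contribution analysis summarized below, straightforward.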

**Key Findings:**

- Models trained on individual datasets: average F1 = **0.60**
- Integrated model: F1 = **0.84**
- A dataset's contribution to the performance gain correlates with its **lexical diversity** (correlation = **0.71**; see the sketch below)
- Integration exposes the model to diverse abuse patterns, improving **real-world generalization**
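
The card does not state which lexical-diversity measure was used; as a rough, hypothetical proxy, a corpus-level type-token ratio could be computed as follows:

```python
import re

def type_token_ratio(texts: list[str]) -> float:
    """Rough lexical-diversity proxy: unique tokens / total tokens.

    Illustrative only; the paper's actual diversity metric may differ.
    """
    tokens = [tok for text in texts for tok in re.findall(r"\w+", text.lower())]
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# 6 tokens, 4 unique -> 0.667
print(type_token_ratio(["you are awful", "you are kind"]))
```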

---

## Conclusion / Takeaways

- No single dataset captures the full spectrum of abusive language; each dataset reflects a **limited slice** of the problem space.
- Systematically integrating ten heterogeneous datasets significantly improves classification performance on a **held-out benchmark**.
- Lexically dissimilar datasets contribute more to **enhancing generalization**.
- The integrated model demonstrates superior **cross-dataset performance** compared to models trained on individual datasets.

---

## Paper Reference

Samaneh Hosseini Moghaddam, Kelly Lyons, Frank Rudzicz, Cheryl Regehr, Vivek Goel, and Kaitlyn Regehr, “**Enhancing machine learning in abusive language detection with dataset aggregation**,” in *Proc. 35th IEEE Int. Conf. Collaborative Advances in Software Computing (CASC)*, 2025.

---

## Intended Use

**Recommended:**

- Detecting abusive language in text from social media or online platforms
- Research on bias mitigation and cross-dataset generalization
- Supporting safe and inclusive online environments

**Not Recommended:**

- Fully automated moderation without human oversight
- High-stakes legal or policy decisions

---

## Usage Example

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned model and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("Samanehmoghaddam/AbuseBERT")
model = AutoModelForSequenceClassification.from_pretrained("Samanehmoghaddam/AbuseBERT")
model.eval()

# Tokenize a sample input
text = "Your example text here."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Predicted class index (argmax over the logits)
predicted_label = torch.argmax(outputs.logits, dim=1).item()
print(f"Predicted label: {predicted_label}")
```
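
Continuing the example above: for downstream use, class probabilities are often more informative than a bare argmax. Which index corresponds to the abusive class is not stated on this card, so confirm it via the model's `id2label` config:

```python
# Softmax turns logits into class probabilities; id2label (from the model
# config) names each index. Verify which index means "abusive".
probs = torch.softmax(outputs.logits, dim=1).squeeze()
for idx, p in enumerate(probs.tolist()):
    print(f"{model.config.id2label.get(idx, str(idx))}: {p:.3f}")
```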