---
language: en
tags:
- text-classification
- abusive-language
- hate-speech
- toxicity
- cyberviolence
- abusive-language-detection
- BERT
license: mit
---

# AbuseBERT

## Model Description

**AbuseBERT** is a **BERT-based classification model** fine-tuned for **abusive language detection** and optimized for **cross-dataset generalization**.

> Abusive language detection models often suffer from poor generalization due to **sampling and lexical biases** in individual datasets. Our approach addresses this by integrating **publicly available abusive language datasets**, harmonizing labels and preprocessing textual samples to create a **broader and more representative training distribution**.

**Key findings (10 datasets):**

- Models trained on individual datasets: average F1 = **0.60**
- Integrated model: F1 = **0.84**
- A dataset's contribution to performance improvement correlates with its **lexical diversity** (correlation 0.71)
- Integration exposes the model to diverse abuse patterns, enhancing **real-world generalization**

---

## Conclusion / Takeaways

- No single dataset captures the full spectrum of abusive language; each dataset reflects a **limited slice** of the problem space.
- Systematically integrating ten heterogeneous datasets significantly improves classification performance on a **held-out benchmark**.
- Lexically dissimilar datasets contribute more to **enhancing generalization**.
- The integrated model demonstrates superior **cross-dataset performance** compared to models trained on individual datasets.

---

## Paper Reference

Samaneh Hosseini Moghaddam, Kelly Lyons, Frank Rudzicz, Cheryl Regehr, Vivek Goel, Kaitlyn Regehr, "**Enhancing machine learning in abusive language detection with dataset aggregation**," in *Proc. 35th IEEE Int. Conf. Collaborative Advances in Software Computing (CASC)*, 2025.

---

## Intended Use

**Recommended:**

- Detecting abusive, offensive, or toxic language in text from social media, online forums, or messaging platforms.
- Supporting research on online harassment, cyberviolence, and hate speech analysis.
- Assisting human moderators in content review or flagging potentially harmful content.
- Evaluating trends, prevalence, or patterns of abusive language in large-scale textual datasets.

**Not Recommended:**

- Fully automated moderation without human oversight
- High-stakes legal or policy decisions

---

## Usage Example

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_name = "Samanehmoghaddam/AbuseBERT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Create a text-classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Example texts to classify
texts = [
    "@user You are amazing!",
    "@user You are stupid!",
]

# Run the classifier
results = classifier(texts)

# Print each text with its predicted label and confidence score
for text, result in zip(texts, results):
    print(f"Text: {text}")
    print(f"Prediction: {result}")
    print("-" * 40)
```
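Because fully automated moderation is not recommended, a common pattern is to act automatically only on high-confidence predictions and route the rest to human reviewers. Below is a minimal sketch of such a triage step operating on pipeline-style output dicts; the threshold value and the label names (`"abusive"`, `"not_abusive"`) are illustrative assumptions, not taken from this model card.

```python
# Split classifier outputs into an auto-handled queue and a human-review queue.
# Threshold and label names below are illustrative assumptions.

def triage(results, auto_threshold=0.95):
    """Route high-confidence predictions to auto-handling, the rest to review."""
    auto, review = [], []
    for r in results:
        (auto if r["score"] >= auto_threshold else review).append(r)
    return auto, review

# Example pipeline-style outputs (hypothetical scores)
sample = [
    {"label": "abusive", "score": 0.99},
    {"label": "not_abusive", "score": 0.62},
]

auto, review = triage(sample)
print(f"{len(auto)} auto-handled, {len(review)} for human review")
# prints "1 auto-handled, 1 for human review"
```

The appropriate threshold depends on the deployment context; a lower threshold increases automation at the cost of more unreviewed borderline cases.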