---
license: apache-2.0
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- FacebookAI/roberta-base
pipeline_tag: text-classification
library_name: transformers
tags:
- text-classification
- roberta
- transformers
- pytorch
- hate-speech-and-offensive-message-detection
---

# Hate Speech & Offensive Message Classifier

A hate speech and offensive message classifier built on the **RoBERTa transformer model** and fine-tuned on the **Davidson et al. (2017) Twitter dataset**. On the held-out test set it reaches an F1-score of 0.9774 for the hate/offensive class and 96.23% overall accuracy. It is intended for **social media moderation, community platforms, and chat applications**.

## Key Features

* 🤖 **Transformer-based Architecture**: Built on `roberta-base` for advanced natural language understanding
* ⚡ **High Performance**: 0.9774 F1-score for hate/offensive message detection, 96.23% overall accuracy
* 🔧 **Hyperparameter Optimization**: Automated tuning using the Optuna framework
* ⚖️ **Class Imbalance Handling**: Weighted cross-entropy loss so the minority class is not ignored
* 📊 **Comprehensive Evaluation**: Precision, recall, F1-score, confusion matrix
* 🚀 **Production Ready**: Model and tokenizer saved in Hugging Face format for direct deployment

## Model Performance

### Final Results on Test Set:

* **Overall Accuracy**: 96.23%
* **Weighted F1-Score**: 0.9621
* **Offensive/Hate F1-Score**: 0.9774 ✅ (exceeds the 0.90 acceptance threshold)
* **Offensive/Hate Precision**: 97.49%
* **Offensive/Hate Recall**: 98.00% (high hate/offensive message detection rate)
* **Neither Precision**: 89.82%
* **Neither Recall**: 87.52%

### Generalizability

📊 All performance metrics are measured on a held-out test set (15% of the data, 3,718 messages) that was never used during training or hyperparameter tuning, so the reported figures are not inflated by tuning against the evaluation data.

---

## Dataset

**Source**: [Hate Speech and Offensive Language Dataset (Davidson et al., 2017)](https://www.kaggle.com/datasets/mrmorj/hate-speech-and-offensive-language-dataset)

### Dataset Statistics:

* **Total Tweets**: 24,783
* **Hate Speech / Offensive**: 20,620
* **Neither (neutral)**: 4,163
* **Average Tweet Length**: ~86 characters
* **Language**: English

### Dataset Split:

* Training Set: 70% (17,348 tweets) – model training
* Validation Set: 15% (3,717 tweets) – hyperparameter tuning
* Test Set: 15% (3,718 tweets) – final evaluation on unseen data
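
A 70/15/15 split along these lines can be sketched with the standard library alone. Stratifying by label and the fixed seed are assumptions (the card does not say how the split was drawn), and `stratified_split` is a hypothetical helper, so the resulting sizes can differ from the reported ones by a tweet or two.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_frac=0.15, test_frac=0.15, seed=42):
    """Return (train_idx, val_idx, test_idx), keeping class proportions in each part."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # assumed seed, for reproducibility only
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_frac)
        n_val = round(len(indices) * val_frac)
        test.extend(indices[:n_test])
        val.extend(indices[n_test:n_test + n_val])
        train.extend(indices[n_test + n_val:])
    return train, val, test


# Label counts taken from the dataset statistics (1 = Hate/Offensive, 0 = Neither).
labels = [1] * 20620 + [0] * 4163
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

The index lists can then be used to select rows from whatever container holds the tweets.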

### Preprocessing Steps:

* Label mapping: 0 = Neither, 1 = Hate/Offensive.
* Text cleaning.
* Train/validation/test split.
* Tokenization with the RoBERTa tokenizer.
* Dynamic padding and truncation.
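
To make the last step concrete, here is a minimal pure-Python sketch of dynamic padding: each batch is padded only to its own longest sequence instead of a global maximum. `pad_batch` is a hypothetical helper and the token ids other than the special tokens are made up; in practice `DataCollatorWithPadding` from `transformers` does this work. Real tokenizers also keep the closing special token when truncating, which this sketch skips.

```python
PAD_TOKEN_ID = 1  # <pad> id in the roberta-base vocabulary (<s> is 0, </s> is 2)


def pad_batch(batch_input_ids, max_length=128, pad_token_id=PAD_TOKEN_ID):
    """Truncate each sequence to max_length, then pad to the longest one in the batch."""
    truncated = [ids[:max_length] for ids in batch_input_ids]
    longest = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_token_id] * (longest - len(ids)) for ids in truncated]
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two already-tokenized messages of different lengths (illustrative ids).
batch = pad_batch([[0, 100, 2], [0, 100, 200, 300, 2]])
print(batch["input_ids"])       # [[0, 100, 2, 1, 1], [0, 100, 200, 300, 2]]
print(batch["attention_mask"])  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```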

## Architecture & Methodology

### Model Architecture

* **Base Model**: `FacebookAI/roberta-base` (Hugging Face Transformers)
* **Task**: Binary sequence classification (2 labels)
* **Fine-tuning**: Custom classification head with 2 outputs
* **Tokenization**: RoBERTa tokenizer with truncation (the usage example caps inputs at 128 tokens)

### Training Strategy

1. Data Preprocessing: Hate/offensive message cleaning and label encoding
2. Tokenization: Dynamic padding with a capped maximum length
3. Class Balancing: Weighted loss function to handle the imbalanced dataset
4. Hyperparameter Optimization: Optuna-based automated tuning
5. Evaluation: Comprehensive metrics on the held-out test set
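
The class-balancing step can be illustrated with a small pure-Python sketch. The inverse-frequency ("balanced") weighting formula is an assumption, since the card only says the loss is weighted, and the counts used here are the full-dataset figures rather than the training split. `balanced_class_weights` and `weighted_cross_entropy` are hypothetical names. The averaging mirrors what PyTorch's `torch.nn.CrossEntropyLoss(weight=...)` does with its default mean reduction, which divides by the sum of the target weights.

```python
import math

# 0 = Neither, 1 = Hate/Offensive (counts from the dataset statistics).
CLASS_COUNTS = {0: 4163, 1: 20620}


def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    total = sum(counts.values())
    return {label: total / (len(counts) * count) for label, count in counts.items()}


def weighted_cross_entropy(logits, targets, weights):
    """Weighted mean of per-example negative log-likelihoods."""
    weighted_loss, weight_sum = 0.0, 0.0
    for row, target in zip(logits, targets):
        # Plain log-sum-exp; fine for small logits, not numerically hardened.
        log_norm = math.log(sum(math.exp(z) for z in row))
        nll = log_norm - row[target]
        weighted_loss += weights[target] * nll
        weight_sum += weights[target]
    return weighted_loss / weight_sum


weights = balanced_class_weights(CLASS_COUNTS)
print(weights)  # the minority class (Neither) gets the larger weight

loss = weighted_cross_entropy([[2.0, 0.5], [0.1, 1.2]], [0, 1], weights)
print(loss)
```

With these counts a mistake on a "Neither" tweet costs roughly five times as much as a mistake on a hate/offensive tweet, which counteracts the roughly 5:1 class ratio.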

## Hyperparameter Optimization

Hyperparameters were tuned with **Optuna (15 trials)** over the following ranges:

* Dropout rates: hidden dropout (0.1-0.3), attention dropout (0.1-0.2)
* Learning rate: 1e-5 to 5e-5
* Weight decay: 0.0 to 0.1
* Batch size: 8, 16, or 32 samples
* Gradient accumulation steps: 1 to 4
* Training epochs: 2 to 5
* Warmup ratio: 0.05 to 0.1 (learning rate scheduling)
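
The ranges listed above could be expressed as an Optuna search space roughly as follows. This is a sketch, not the author's tuning script: the parameter names follow Hugging Face conventions and the log-scale sampling of the learning rate is an assumption. `suggest_hyperparameters` and `train_and_evaluate` are hypothetical names.

```python
def suggest_hyperparameters(trial):
    """Sample one configuration from the documented ranges using an Optuna trial."""
    return {
        "hidden_dropout_prob": trial.suggest_float("hidden_dropout_prob", 0.1, 0.3),
        "attention_probs_dropout_prob": trial.suggest_float(
            "attention_probs_dropout_prob", 0.1, 0.2
        ),
        # log=True is an assumption; the card only gives the range.
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [8, 16, 32]
        ),
        "gradient_accumulation_steps": trial.suggest_int("gradient_accumulation_steps", 1, 4),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 5),
        "warmup_ratio": trial.suggest_float("warmup_ratio", 0.05, 0.1),
    }


# Wiring it into a study (train_and_evaluate is a hypothetical function that
# fine-tunes with the sampled settings and returns the validation F1-score):
#
#   import optuna
#   study = optuna.create_study(direction="maximize")
#   study.optimize(lambda t: train_and_evaluate(suggest_hyperparameters(t)), n_trials=15)
#   print(study.best_params)
```

Keeping the search space in its own function makes it easy to reuse the same ranges for a later, longer study.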

### Best Parameters Found:

* Hidden Dropout: `0.13034059066330464`
* Attention Dropout: `0.1935379847495239`
* Learning Rate: `1.031409901695853e-05`
* Weight Decay: `0.03606621145317628`
* Batch Size: `16`
* Gradient Accumulation: `1`
* Epochs: `2`
* Warmup Ratio: `0.0718442228846798`
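
As a rough worked example, these values imply the following training schedule. The arithmetic assumes a single device, that the last partial batch of each epoch is kept, and that warmup steps are rounded up; none of this is stated in the card, so treat the step counts as estimates.

```python
import math

BEST_PARAMS = {
    "learning_rate": 1.031409901695853e-05,
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 1,
    "num_train_epochs": 2,
    "warmup_ratio": 0.0718442228846798,
}
TRAIN_EXAMPLES = 17348  # training-set size from the dataset split

# Examples consumed per optimizer update.
effective_batch = (
    BEST_PARAMS["per_device_train_batch_size"] * BEST_PARAMS["gradient_accumulation_steps"]
)
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * BEST_PARAMS["num_train_epochs"]
warmup_steps = math.ceil(total_steps * BEST_PARAMS["warmup_ratio"])

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)  # 16 1085 2170 156
```

Under these assumptions the learning rate ramps up over about the first 156 of 2,170 optimizer steps.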

## 📊 Detailed Results

### Confusion Matrix:

|                           | Predicted Neither | Predicted Offensive/Hate |
|---------------------------|-------------------|--------------------------|
| **Actual Neither**        | 547               | 78                       |
| **Actual Offensive/Hate** | 62                | 3031                     |

### Performance Breakdown

* **True Positives (hate/offensive correctly identified)**: 3031
* **True Negatives (neither correctly identified)**: 547
* **False Positives (neither incorrectly flagged)**: 78
* **False Negatives (hate/offensive missed)**: 62
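
The headline metrics can be re-derived from these four counts. The short check below uses only the numbers in the confusion matrix; the variable names are illustrative.

```python
# Counts from the confusion matrix (positive class = Hate/Offensive).
TP, TN, FP, FN = 3031, 547, 78, 62


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


accuracy = (TP + TN) / (TP + TN + FP + FN)

pos_precision = TP / (TP + FP)
pos_recall = TP / (TP + FN)
pos_f1 = f1_score(pos_precision, pos_recall)

neg_precision = TN / (TN + FN)
neg_recall = TN / (TN + FP)
neg_f1 = f1_score(neg_precision, neg_recall)

# Weighted F1 averages the per-class F1 scores by class support.
pos_support, neg_support = TP + FN, TN + FP
weighted_f1 = (pos_support * pos_f1 + neg_support * neg_f1) / (pos_support + neg_support)

print(f"accuracy={accuracy:.4f}")
print(f"hate/offensive: precision={pos_precision:.4f} recall={pos_recall:.4f} f1={pos_f1:.4f}")
print(f"neither: precision={neg_precision:.4f} recall={neg_recall:.4f} f1={neg_f1:.4f}")
print(f"weighted f1={weighted_f1:.4f}")
```

The printed values agree with the figures reported under Model Performance.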

## Usage

```python
import re
import html

import contractions  # third-party package: pip install contractions
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

# Load the trained model + tokenizer
model = RobertaForSequenceClassification.from_pretrained("AshiniR/hate-speech-and-offensive-message-classifier")
tokenizer = RobertaTokenizer.from_pretrained("AshiniR/hate-speech-and-offensive-message-classifier")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()  # disable dropout for inference


def preprocess_text(text: str) -> str:
    """
    Preprocess raw text for transformer-based models like RoBERTa.

    This function is tailored for toxicity, sentiment, and social media classification.
    It removes noise (URLs, mentions, HTML codes) but keeps important signals
    such as casing, punctuation, and emojis.

    Steps:
    1. Decode HTML entities (e.g., '&gt;' → '>')
    2. Replace URLs with a placeholder (an empty string here, so URLs are dropped)
    3. Replace mentions with a placeholder (an empty string here, so mentions are dropped)
    4. Remove '#' from hashtags but keep the word (e.g., "#love" → "love")
    5. Expand contractions (e.g., "you're" → "you are")
    6. Mildly normalize repeated characters (3+ → 2)
    7. Remove "RT" only if at start of tweet
    8. Normalize whitespace

    Args:
        text (str): Raw tweet text.

    Returns:
        str: Cleaned text suitable for RoBERTa tokenization.
    """
    if not isinstance(text, str):
        return ""

    # 1. Decode HTML entities
    text = html.unescape(text)

    # 2. Replace URLs with placeholder
    text = re.sub(r"(https?://\S+|www\.\S+)", "", text)

    # 3. Replace user mentions with placeholder
    text = re.sub(r"@\w+", "", text)

    # 4. Simplify hashtags
    text = re.sub(r"#(\w+)", r"\1", text)

    # 5. Expand contractions
    text = contractions.fix(text)

    # 6. Mild normalization of character elongations (3+ → 2)
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)

    # 7. Remove RT only if it starts the tweet
    text = re.sub(
        r"^[\s\W]*rt\s*@?\w*:?[\s-]*",
        "",
        text,
        flags=re.IGNORECASE
    )

    # 8. Normalize whitespace
    text = re.sub(r"\s+", " ", text).strip()

    return text


def get_inference(text: str) -> list:
    """Returns prediction results in [{'label': str, 'score': float}, ...] format."""
    # Preprocess the text
    text = preprocess_text(text)

    # Tokenize input text (single example, so no padding is needed)
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=False,
        max_length=128
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Get model predictions
    with torch.no_grad():
        outputs = model(**inputs)
        probabilities = torch.softmax(outputs.logits, dim=-1)

    # Convert to label format (index 0 = neither, index 1 = hate/offensive)
    labels = ["neither", "hate/offensive"]
    results = []
    for i, prob in enumerate(probabilities[0]):
        results.append({
            "label": labels[i],
            "score": prob.item()
        })

    # Highest-scoring label first
    return sorted(results, key=lambda x: x["score"], reverse=True)


# Example usage
text = "your example message"
predictions = get_inference(text)
print(f"Text: '{text}'")
print(f"Predictions: {predictions}")
```
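
For moderation it is often useful to turn the scores into a yes/no decision. The helper below is a hypothetical sketch that consumes the list format returned by `get_inference`; the name `should_flag` and the 0.8 default threshold are illustrative choices, not values from the model card, and the threshold would normally be tuned on validation data.

```python
def should_flag(predictions, threshold=0.8, target_label="hate/offensive"):
    """Return True when the target label's score reaches the threshold."""
    scores = {item["label"]: item["score"] for item in predictions}
    return scores.get(target_label, 0.0) >= threshold


# A hand-written prediction list in the same shape as get_inference's output.
example = [
    {"label": "hate/offensive", "score": 0.91},
    {"label": "neither", "score": 0.09},
]
print(should_flag(example))                  # True
print(should_flag(example, threshold=0.95))  # False
```

A higher threshold trades missed detections for fewer false flags, which may suit communities where wrongly removing messages is costly.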

## Use Cases

This hate/offensive message classifier is intended for:

### Messaging Platforms

* Discord bot moderation (primary use case)
* SMS filtering systems
* Chat application content filtering

### Content Moderation

* Social media platforms
* Comment section filtering
* User-generated content screening

## Citation

If you use this model in your research or application, please cite:

```bibtex
@misc{AshiniR_Hate_Offensive_Message_Classifier_2025,
  author = {Ashini Dhananjana},
  title = {Hate/Offensive Message Classifier: RoBERTa-based Hate/Offensive Message Detection},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AshiniR/hate-speech-and-offensive-message-classifier}},
}
```

## Model Card Contact

AshiniR - [Hugging Face Profile](https://huggingface.co/AshiniR)