---
license: apache-2.0
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- FacebookAI/roberta-base
pipeline_tag: text-classification
library_name: transformers
tags:
- text-classification
- roberta
- transformers
- pytorch
- hate-speech-and-offensive-message-detection
---
# Hate Speech & Offensive Message Classifier
A hate speech and offensive message classifier built on the **RoBERTa transformer model** and fine-tuned on the **Davidson et al. (2017) Twitter dataset**. On the held-out test set it reaches an F1-score of 0.9774 for hate/offensive message detection and 96.23% overall accuracy, which makes it a candidate for **social media moderation, community platforms, and chat applications**.
## Key Features
* **Transformer-based Architecture**: Built on `roberta-base` for advanced natural language understanding
* **High Performance**: 0.9774 F1-score for hate/offensive message detection, 96.23% overall accuracy
* **Hyperparameter Optimization**: Automated tuning using the Optuna framework
* **Class Imbalance Handling**: Weighted cross-entropy loss for fairness across labels
* **Comprehensive Evaluation**: Precision, Recall, F1-score, confusion matrix
* **Production Ready**: Model + tokenizer saved in Hugging Face format for direct deployment
## Model Performance
### Final Results on Test Set:
* **Overall Accuracy**: *96.23%*
* **Weighted F1-Score**: *0.9621*
* **Offensive/Hate** F1-Score: 0.9774 (exceeds the 0.90 acceptance threshold)
* **Offensive/Hate** Precision: 97.49%
* **Offensive/Hate** Recall: 98% (High hate/offensive message detection rate)
* **Neither** Precision: 89.82%
* **Neither** Recall: 87.52%
### Generalizability
All metrics above are computed on a held-out test set (15% of the data, 3,718 messages) that was never used during training or hyperparameter tuning, so the reported scores are not inflated by tuning on the evaluation data.
---
## Dataset
**Source**: [Hate Speech and Offensive Language Dataset (Davidson et al., 2017)](https://www.kaggle.com/datasets/mrmorj/hate-speech-and-offensive-language-dataset)
### Dataset Statistics:
* **Total Tweets**: 24,783
* **Hate Speech / Offensive**: 20,620
* **Neutral**: 4,163
* **Average Tweet Length**: ~86 characters
* **Language**: English
### Dataset Split:
* Training Set: 70% (17,348 tweets), used for model training
* Validation Set: 15% (3,717 tweets), used for hyperparameter tuning
* Test Set: 15% (3,718 tweets), used for final evaluation on unseen data
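
The split sizes above can be reproduced with a simple shuffled split. The following is a minimal sketch, not the author's actual splitting code: the function name `split_indices`, the seed, and the use of a plain (non-stratified) shuffle are all assumptions made for illustration.

```python
import random

def split_indices(n_items, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle item indices and cut them into train/validation/test parts.

    The seed and the non-stratified shuffle are illustrative assumptions.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = int(n_items * train_frac)   # floor of 70%
    n_val = int(n_items * val_frac)       # floor of 15%
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # remainder goes to the test set
    return train, val, test

train_idx, val_idx, test_idx = split_indices(24_783)
print(len(train_idx), len(val_idx), len(test_idx))  # 17348 3717 3718
```

With an imbalanced dataset like this one, a stratified split (keeping the label ratio equal in each part) is a common alternative; the card does not say which variant was used.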
### Preprocessing Steps:
* Label mapping: 0 = Neither, 1 = Hate/Offensive.
* Text cleaning.
* Train/validation/test split.
* Tokenization with RoBERTa tokenizer.
* Dynamic padding and truncation.
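
Dynamic padding means each batch is padded only to the length of its longest sequence rather than to a global maximum. The sketch below illustrates the idea on already-tokenized inputs. The helper `pad_batch` and the example token IDs are hypothetical, the 128-token cap is taken from the usage example in this card, and the truncation here is a simplification (a real tokenizer keeps the closing special token when it truncates).

```python
PAD_TOKEN_ID = 1  # ID of the <pad> token in the roberta-base vocabulary

def pad_batch(token_id_lists, max_length=128, pad_id=PAD_TOKEN_ID):
    """Truncate each sequence to max_length, then pad to the batch's longest one."""
    truncated = [ids[:max_length] for ids in token_id_lists]
    longest = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in truncated]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated]
    return input_ids, attention_mask

# Two made-up token ID sequences of different lengths.
batch_ids, batch_mask = pad_batch([[0, 100, 2], [0, 100, 200, 300, 2]])
print(batch_ids)
print(batch_mask)
```

In a Transformers training loop the same effect is usually obtained by passing `DataCollatorWithPadding(tokenizer)` as the data collator instead of writing the padding by hand.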
## Architecture & Methodology
### Model Architecture
* **Base Model**: `FacebookAI/roberta-base` (Hugging Face Transformers)
* **Task**: Binary sequence classification (2 labels)
* **Fine-tuning**: Classification head with 2 outputs
* **Tokenization**: RoBERTa tokenizer with a capped maximum sequence length (the usage example below uses 128 tokens)
### Training Strategy
1. Data Preprocessing: Hate/offensive message cleaning and label encoding
2. Tokenization: Dynamic padding with optimal max length
3. Class Balancing: Weighted loss function to handle imbalanced dataset
4. Hyperparameter Optimization: Optuna-based automated tuning
5. Evaluation: Comprehensive metrics on held-out test set
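
To make the class-balancing step concrete, here is a minimal sketch of a weighted cross-entropy loss. The card does not state how the class weights were chosen, so the inverse-frequency ("balanced") formula below is an assumption, as are the function names. The counts are the full-dataset totals from the Dataset section; in practice weights would normally be derived from the training split only.

```python
import math

# Label counts from the Dataset section: 0 = Neither, 1 = Hate/Offensive.
class_counts = {0: 4_163, 1: 20_620}

def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}

def weighted_cross_entropy(logits, target, weights):
    """Weighted cross-entropy for one example, computed from raw logits."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    log_prob_target = logits[target] - log_sum_exp
    return -weights[target] * log_prob_target

weights = balanced_class_weights(class_counts)
print({label: round(w, 4) for label, w in weights.items()})

# The same mistake costs more on the minority class ("Neither").
print(weighted_cross_entropy([0.0, 2.0], target=0, weights=weights))
print(weighted_cross_entropy([2.0, 0.0], target=1, weights=weights))
```

In PyTorch the equivalent is `torch.nn.CrossEntropyLoss(weight=...)` with the weights passed as a tensor ordered by label ID.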
## Hyperparameter Optimization
Optimized with **Optuna (15 trials)** across the following ranges:
* Dropout rates: Hidden dropout (0.1-0.3), Attention dropout (0.1-0.2)
* Learning rate: 1e-5 to 5e-5 range
* Weight decay: 0.0 to 0.1 regularization
* Batch size: 8, 16, or 32 samples
* Gradient accumulation steps: 1 to 4
* Training epochs: 2 to 5 epochs
* Warmup ratio: 0.05 to 0.1 for learning rate scheduling
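
The ranges above could be expressed as an Optuna search space roughly as follows. This is a sketch rather than the author's tuning script: the parameter names, the log-scale sampling of the learning rate, the `maximize` direction, and the `objective_fn` callback (which would train a model and return a validation score) are assumptions.

```python
def suggest_hyperparameters(trial):
    """Sample one configuration from the ranges listed above."""
    return {
        "hidden_dropout_prob": trial.suggest_float("hidden_dropout_prob", 0.1, 0.3),
        "attention_probs_dropout_prob": trial.suggest_float(
            "attention_probs_dropout_prob", 0.1, 0.2
        ),
        # Log-scale sampling is an assumption; the card only gives the range.
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [8, 16, 32]
        ),
        "gradient_accumulation_steps": trial.suggest_int(
            "gradient_accumulation_steps", 1, 4
        ),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 5),
        "warmup_ratio": trial.suggest_float("warmup_ratio", 0.05, 0.1),
    }

def run_study(objective_fn, n_trials=15):
    """Run the search; objective_fn(params) must return the score to maximize."""
    import optuna  # imported lazily so the search space can be inspected without it

    study = optuna.create_study(direction="maximize")
    study.optimize(lambda trial: objective_fn(suggest_hyperparameters(trial)),
                   n_trials=n_trials)
    return study.best_params
```

A typical `objective_fn` would fine-tune the model with the sampled values and return a validation metric such as the weighted F1-score.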
### Best Parameters Found:
* Hidden Dropout: `0.13034059066330464`
* Attention Dropout: `0.1935379847495239`
* Learning Rate: `1.031409901695853e-05`
* Weight Decay: `0.03606621145317628`
* Batch Size: `16`
* Gradient Accumulation: `1`
* Epochs: `2`
* Warmup Ratio: `0.0718442228846798`
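
As a rough illustration of what these values imply for the training schedule, the arithmetic below estimates the number of optimizer updates and warmup steps. It assumes a single device, that the last partial batch is kept, and that warmup steps are rounded up; none of these details are stated in the card.

```python
import math

train_examples = 17_348          # training set size from the Dataset Split section
batch_size = 16
grad_accum = 1
epochs = 2
warmup_ratio = 0.0718442228846798

# Batches per epoch, keeping the final partial batch (assumption).
batches_per_epoch = math.ceil(train_examples / batch_size)
# One optimizer update per `grad_accum` batches.
updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
total_updates = updates_per_epoch * epochs
# Warmup length as a fraction of all updates, rounded up (assumption).
warmup_updates = math.ceil(total_updates * warmup_ratio)

print(batches_per_epoch, total_updates, warmup_updates)  # 1085 2170 156
```

Under these assumptions the learning rate ramps up over roughly the first 7% of training before the scheduler takes over.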
## Detailed Results
### Confusion Matrix:
| | Predicted Neither | Predicted Offensive/Hate |
|---------------------|-------------------|--------------------------|
| **Actual Neither** | 547 | 78 |
| **Actual Offensive**| 62 | 3031 |
### Performance Breakdown
* **True Positives (Hate/Offensive correctly identified)**: 3031
* **True Negatives (Neutral correctly identified)**: 547
* **False Positives (Neutral incorrectly flagged)**: 78
* **False Negatives (Hate/offensive missed)**: 62
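
The headline metrics can be recomputed directly from these four counts. The sketch below is a consistency check written for this card, not part of the original evaluation code; the helper name `prf` is an assumption.

```python
# Counts copied from the confusion matrix above.
tn, fp = 547, 78     # actual Neither: predicted Neither / predicted Offensive
fn, tp = 62, 3031    # actual Offensive/Hate: predicted Neither / predicted Offensive

def prf(true_pos, false_pos, false_neg):
    """Return (precision, recall, f1) for one class."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

total = tn + fp + fn + tp
accuracy = (tp + tn) / total

off_p, off_r, off_f1 = prf(tp, fp, fn)
# For the "Neither" class the two error types swap roles.
nei_p, nei_r, nei_f1 = prf(tn, fn, fp)

# Support-weighted average of the per-class F1 scores.
weighted_f1 = ((tn + fp) * nei_f1 + (fn + tp) * off_f1) / total

print(f"accuracy={accuracy:.4f}  weighted_f1={weighted_f1:.4f}")
print(f"offensive: P={off_p:.4f} R={off_r:.4f} F1={off_f1:.4f}")
print(f"neither:   P={nei_p:.4f} R={nei_r:.4f} F1={nei_f1:.4f}")
```

The results agree with the figures in the Model Performance section to within rounding.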
## Usage
```python
import re
import html

import contractions
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

# Load the trained model + tokenizer
model = RobertaForSequenceClassification.from_pretrained("AshiniR/hate-speech-and-offensive-message-classifier")
tokenizer = RobertaTokenizer.from_pretrained("AshiniR/hate-speech-and-offensive-message-classifier")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()


def preprocess_text(text: str) -> str:
    """
    Preprocess raw text for transformer-based models like RoBERTa.

    This function is tailored for toxicity, sentiment, and social media classification.
    It removes noise (URLs, mentions, HTML codes) but keeps important signals
    such as casing, punctuation, and emojis.

    Steps:
    1. Decode HTML entities (e.g., '&gt;' → '>')
    2. Replace URLs with placeholders ("")
    3. Replace mentions with placeholders ("")
    4. Remove '#' from hashtags but keep the word (e.g., "#love" → "love")
    5. Expand contractions (e.g., "you're" → "you are")
    6. Mildly normalize repeated characters (3+ → 2)
    7. Remove "RT" only if at start of tweet
    8. Normalize whitespace

    Args:
        text (str): Raw tweet text.

    Returns:
        str: Cleaned text suitable for RoBERTa tokenization.
    """
    if not isinstance(text, str):
        return ""
    # 1. Decode HTML entities
    text = html.unescape(text)
    # 2. Replace URLs with placeholder
    text = re.sub(r"(https?://\S+|www\.\S+)", "", text)
    # 3. Replace user mentions with placeholder
    text = re.sub(r"@\w+", "", text)
    # 4. Simplify hashtags
    text = re.sub(r"#(\w+)", r"\1", text)
    # 5. Expand contractions
    text = contractions.fix(text)
    # 6. Mild normalization of character elongations (3+ → 2)
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # 7. Remove RT only if it starts the tweet (for tweets)
    text = re.sub(
        r"^[\s\W]*rt\s*@?\w*:?[\s-]*",
        "",
        text,
        flags=re.IGNORECASE
    )
    # 8. Normalize whitespace
    text = re.sub(r"\s+", " ", text).strip()
    return text


def get_inference(text: str) -> list:
    """Returns prediction results in [{'label': str, 'score': float}, ...] format."""
    # Preprocess the text
    text = preprocess_text(text)
    # Tokenize input text
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=False,
        max_length=128
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}
    # Get model predictions
    with torch.no_grad():
        outputs = model(**inputs)
        probabilities = torch.softmax(outputs.logits, dim=-1)
    # Convert to label format (index 0 = neither, index 1 = hate/offensive)
    labels = ["neither", "hate/offensive"]
    results = []
    for i, prob in enumerate(probabilities[0]):
        results.append({
            "label": labels[i],
            "score": prob.item()
        })
    # Highest-scoring label first
    return sorted(results, key=lambda x: x["score"], reverse=True)


# Example usage
text = "your example message"
predictions = get_inference(text)
print(f"Text: '{text}'")
print(f"Predictions: {predictions}")
```
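
For moderation, the sorted list returned by `get_inference` usually has to be turned into a yes/no decision. The helper below is a hypothetical sketch of that step: `should_flag`, the default threshold of 0.5, and the example scores are illustrative assumptions, not values recommended by the model author.

```python
def should_flag(predictions, threshold=0.5):
    """Return True if the hate/offensive score reaches the threshold.

    `predictions` is a list of {'label': str, 'score': float} dicts,
    in the format produced by get_inference.
    """
    scores = {p["label"]: p["score"] for p in predictions}
    return scores.get("hate/offensive", 0.0) >= threshold

# Made-up scores in the get_inference output format.
example = [
    {"label": "hate/offensive", "score": 0.91},
    {"label": "neither", "score": 0.09},
]
print(should_flag(example))                  # True
print(should_flag(example, threshold=0.95))  # False
```

Raising the threshold trades missed offensive messages for fewer false flags on neutral ones, so it is worth tuning on data from the target platform.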
## Use Cases
This hate/offensive message classifier is intended for:
### Messaging Platforms
* Discord bot moderation (Primary use case)
* SMS filtering systems
* Chat application content filtering
### Content Moderation
* Social media platforms
* Comment section filtering
* User-generated content screening
## Citation
If you use this model in your research or application, please cite:
```bibtex
@misc{AshiniR_Hate/Offensive_Message_Classifier_2025,
author = {Ashini Dhananjana},
title = {Hate/Offensive Message Classifier: RoBERTa-based Hate/Offensive Message Detection},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/AshiniR/hate-speech-and-offensive-message-classifier}},
}
```
## Model Card Contact
AshiniR - [Hugging Face Profile](https://huggingface.co/AshiniR)