---
license: apache-2.0
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- FacebookAI/xlm-roberta-base
pipeline_tag: text-classification
library_name: transformers
tags:
- text-classification
- roberta
- transformers
- pytorch
- hate-speech-and-offensive-message-detection
---

# Hate Speech & Offensive Message Classifier

A hate speech and offensive message classifier built on the **RoBERTa transformer model** and fine-tuned on the **Davidson et al. (2017) Twitter dataset**. The model reaches a 0.9774 F1-score on the hate/offensive class and 96.23% overall accuracy, making it suitable for **social media moderation, community platforms, and chat applications**.

## Key Features

* 🤖 **Transformer-based Architecture**: Built on `roberta-base` for advanced natural language understanding
* ⚡ **High Performance**: 0.9774 F1-score for hate/offensive message detection, 96.23% overall accuracy
* 🔧 **Hyperparameter Optimization**: Automated tuning using the Optuna framework
* ⚖️ **Class Imbalance Handling**: Weighted cross-entropy loss for fairness across labels
* 📊 **Comprehensive Evaluation**: Precision, recall, F1-score, and confusion matrix
* 🚀 **Production Ready**: Model and tokenizer saved in Hugging Face format for direct deployment

## Model Performance

### Final Results on Test Set

* **Overall Accuracy**: 96.23%
* **Weighted F1-Score**: 0.9621
* **Offensive/Hate F1-Score**: 0.9774 ✅ (exceeds the 0.90 acceptance threshold)
* **Offensive/Hate Precision**: 97.49%
* **Offensive/Hate Recall**: 98.00% (high hate/offensive message detection rate)
* **Neither Precision**: 89.82%
* **Neither Recall**: 87.52%

### Generalizability

📊 All metrics are computed on a completely unseen test set (15% of the data, 3,718 messages) that was never used during training or hyperparameter tuning, which guards against overfitting and gives a realistic estimate of real-world performance.

---
## Dataset

**Source**: [Hate Speech and Offensive Language Dataset (Davidson et al., 2017)](https://www.kaggle.com/datasets/mrmorj/hate-speech-and-offensive-language-dataset)

### Dataset Statistics

* **Total Tweets**: 24,783
* **Hate Speech / Offensive**: 20,620
* **Neither**: 4,163
* **Average Tweet Length**: ~86 characters
* **Language**: English

### Dataset Split

* Training Set: 70% (17,348 tweets) – model training
* Validation Set: 15% (3,717 tweets) – hyperparameter tuning
* Test Set: 15% (3,718 tweets) – final evaluation on unseen data

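A split like this can be reproduced with scikit-learn's `train_test_split`, applied twice with stratification so both classes keep their proportions in every split (a toy dataset stands in for the real tweets here):

```python
from sklearn.model_selection import train_test_split

# Toy stand-in for the real dataset: 100 texts with binary labels
texts = [f"message {i}" for i in range(100)]
labels = [i % 2 for i in range(100)]

# First carve off 30% for validation + test, then split that portion in half,
# stratifying on the labels each time to preserve class proportions.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    texts, labels, test_size=0.30, stratify=labels, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 70 15 15
```

Stratifying matters here because the Neither class is only ~17% of the data; an unstratified split could leave it underrepresented in the validation or test set.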
### Preprocessing Steps

* Label mapping: 0 = Neither, 1 = Hate/Offensive
* Text cleaning
* Train/validation/test split
* Tokenization with the RoBERTa tokenizer
* Dynamic padding and truncation

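The first two steps can be sketched as follows. The Davidson dataset's original three classes (0 = hate speech, 1 = offensive, 2 = neither) collapse to the binary scheme used here; the specific cleaning rules shown are an assumption, not necessarily the exact ones used for training:

```python
import re

# Davidson et al. labels: 0 = hate speech, 1 = offensive, 2 = neither.
# Collapse to the binary scheme: 1 = Hate/Offensive, 0 = Neither.
def map_label(davidson_class: int) -> int:
    return 0 if davidson_class == 2 else 1

# Illustrative cleaning: strip URLs, @mentions, and extra whitespace.
def clean_text(text: str) -> str:
    text = re.sub(r"https?://\S+", "", text)  # URLs
    text = re.sub(r"@\w+", "", text)          # @mentions
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace

print(map_label(2), map_label(0), map_label(1))            # 0 1 1
print(clean_text("@user check https://t.co/x   this out"))  # check this out
```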

## Architecture & Methodology

### Model Architecture

* **Base Model**: `FacebookAI/roberta-base` (Hugging Face Transformers)
* **Task**: Binary sequence classification (2 labels)
* **Fine-tuning**: Custom classification head with 2 outputs
* **Tokenization**: RoBERTa tokenizer with an optimized maximum sequence length

### Training Strategy

1. Data preprocessing: message cleaning and label encoding
2. Tokenization: dynamic padding with an optimized maximum length
3. Class balancing: weighted loss function to handle the imbalanced dataset
4. Hyperparameter optimization: Optuna-based automated tuning
5. Evaluation: comprehensive metrics on a held-out test set

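The class-balancing step can be sketched with a weighted cross-entropy loss. Inverse-frequency weighting, shown below, is one common scheme and an assumption here; the exact weights used in training may differ:

```python
import torch
import torch.nn as nn

# Class counts from the dataset: Neither vs. Hate/Offensive
counts = torch.tensor([4163.0, 20620.0])

# Inverse-frequency weights: the rarer class gets a larger weight,
# so mistakes on it cost proportionally more.
weights = counts.sum() / (len(counts) * counts)

loss_fn = nn.CrossEntropyLoss(weight=weights)

logits = torch.tensor([[2.0, -1.0], [0.5, 1.5]])  # dummy model outputs
targets = torch.tensor([0, 1])                    # true labels
print(loss_fn(logits, targets))
```

With the Hugging Face `Trainer`, this is typically wired in by overriding `compute_loss` to use the weighted criterion instead of the default unweighted one.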
## Hyperparameter Optimization

Hyperparameters were optimized with **Optuna (15 trials)** over the following ranges:

* Dropout rates: hidden dropout (0.1-0.3), attention dropout (0.1-0.2)
* Learning rate: 1e-5 to 5e-5
* Weight decay: 0.0 to 0.1
* Batch size: 8, 16, or 32 samples
* Gradient accumulation steps: 1 to 4
* Training epochs: 2 to 5
* Warmup ratio: 0.05 to 0.1 for learning rate scheduling

### Best Parameters Found

* Hidden Dropout: `0.13034059066330464`
* Attention Dropout: `0.1935379847495239`
* Learning Rate: `1.031409901695853e-05`
* Weight Decay: `0.03606621145317628`
* Batch Size: `16`
* Gradient Accumulation: `1`
* Epochs: `2`
* Warmup Ratio: `0.0718442228846798`


## 📊 Detailed Results

### Confusion Matrix

|                      | Predicted Neither | Predicted Offensive/Hate |
|----------------------|-------------------|--------------------------|
| **Actual Neither**   | 547               | 78                       |
| **Actual Offensive** | 62                | 3031                     |

### Performance Breakdown

* **True Positives (hate/offensive correctly identified)**: 3031
* **True Negatives (Neither correctly identified)**: 547
* **False Positives (Neither incorrectly flagged)**: 78
* **False Negatives (hate/offensive missed)**: 62

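The headline metrics can be recomputed directly from this confusion matrix, confirming they are internally consistent:

```python
# Counts from the confusion matrix above (positive class = hate/offensive)
tp, tn, fp, fn = 3031, 547, 78, 62

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
# accuracy=0.9623 precision=0.9749 recall=0.9800 f1=0.9774
```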
## Usage

```python
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch

# Load the trained model and tokenizer
model = RobertaForSequenceClassification.from_pretrained("AshiniR/hate-speech-and-offensive-message-classifier")
tokenizer = RobertaTokenizer.from_pretrained("AshiniR/hate-speech-and-offensive-message-classifier")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def get_inference(text: str) -> list:
    """Return predictions as [{'label': str, 'score': float}, ...], highest score first."""
    # Tokenize the input text
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=False,
        max_length=128
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Get model predictions
    with torch.no_grad():
        outputs = model(**inputs)
        probabilities = torch.softmax(outputs.logits, dim=-1)

    # Convert probabilities to labeled scores
    labels = ["neither", "hate/offensive"]
    results = []
    for i, prob in enumerate(probabilities[0]):
        results.append({
            "label": labels[i],
            "score": prob.item()
        })

    return sorted(results, key=lambda x: x["score"], reverse=True)

# Example usage
text = "I hate you!"
predictions = get_inference(text)
print(f"Text: '{text}'")
print(f"Predictions: {predictions}")
```


## Use Cases

This hate/offensive message classifier is ideal for:

### Messaging Platforms

* Discord bot moderation (primary use case)
* SMS filtering systems
* Chat application content filtering

### Content Moderation

* Social media platforms
* Comment section filtering
* User-generated content screening

## Citation

If you use this model in your research or application, please cite:

```bibtex
@misc{ashinir_hate_offensive_message_classifier_2025,
  author       = {Ashini Dhananjana},
  title        = {Hate/Offensive Message Classifier: RoBERTa-based Hate/Offensive Message Detection},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AshiniR/hate-speech-and-offensive-message-classifier}},
}
```

## Model Card Contact

AshiniR - [Hugging Face Profile](https://huggingface.co/AshiniR)