---
license: apache-2.0
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- FacebookAI/xlm-roberta-base
pipeline_tag: text-classification
library_name: transformers
tags:
- text-classification
- roberta
- transformers
- pytorch
- hate-speech-and-offensive-message-detection
---

# Hate Speech & Offensive Message Classifier

A hate speech and offensive message classifier built on the **RoBERTa transformer model** and fine-tuned on the **Davidson et al. (2017) Twitter dataset**. The model reaches a 0.9774 F1-score on the hate/offensive class and 96.23% overall accuracy, making it suitable for **social media moderation, community platforms, and chat applications**.

## Key Features

* 🤖 **Transformer-based Architecture**: Built on `roberta-base` for advanced natural language understanding
* ⚡ **High Performance**: 0.9774 F1-score for hate/offensive message detection, 96.23% overall accuracy
* 🔧 **Hyperparameter Optimization**: Automated tuning using the Optuna framework
* ⚖️ **Class Imbalance Handling**: Weighted cross-entropy loss for fairness across labels
* 📊 **Comprehensive Evaluation**: Precision, recall, F1-score, and confusion matrix
* 🚀 **Production Ready**: Model and tokenizer saved in Hugging Face format for direct deployment

## Model Performance

### Final Results on Test Set

* **Overall Accuracy**: 96.23%
* **Weighted F1-Score**: 0.9621
* **Offensive/Hate F1-Score**: 0.9774 ✅ (exceeds the 0.90 acceptance threshold)
* **Offensive/Hate Precision**: 97.49%
* **Offensive/Hate Recall**: 98.00% (high hate/offensive message detection rate)
* **Neither Precision**: 89.82%
* **Neither Recall**: 87.52%

### Generalizability

📊 All metrics are computed on a completely unseen test set (15% of the data, 3,718 messages) that was never used during training or hyperparameter tuning, which guards against overfitting and gives a realistic estimate of real-world performance.

---
## Dataset

**Source**: [Hate Speech and Offensive Language Dataset (Davidson et al., 2017)](https://www.kaggle.com/datasets/mrmorj/hate-speech-and-offensive-language-dataset)

### Dataset Statistics

* **Total Tweets**: 24,783
* **Hate Speech / Offensive**: 20,620
* **Neither**: 4,163
* **Average Tweet Length**: ~86 characters
* **Language**: English

### Dataset Split

* Training Set: 70% (17,348 tweets) – model training
* Validation Set: 15% (3,717 tweets) – hyperparameter tuning
* Test Set: 15% (3,718 tweets) – final evaluation on unseen data

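A split like this can be reproduced with scikit-learn's `train_test_split`, applied twice with stratification so both classes keep their proportions in every split (a toy dataset stands in for the real tweets here):

```python
from sklearn.model_selection import train_test_split

# Toy stand-in for the real dataset: 100 texts with binary labels
texts = [f"message {i}" for i in range(100)]
labels = [i % 2 for i in range(100)]

# First carve off 30% for validation + test, then split that portion in half,
# stratifying on the labels each time to preserve class proportions.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    texts, labels, test_size=0.30, stratify=labels, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 70 15 15
```

Stratifying matters here because the Neither class is only ~17% of the data; an unstratified split could leave it underrepresented in the validation or test set.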
### Preprocessing Steps

* Label mapping: 0 = Neither, 1 = Hate/Offensive
* Text cleaning
* Train/validation/test split
* Tokenization with the RoBERTa tokenizer
* Dynamic padding and truncation

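The first two steps can be sketched as follows. The Davidson dataset's original three classes (0 = hate speech, 1 = offensive, 2 = neither) collapse to the binary scheme used here; the specific cleaning rules shown are an assumption, not necessarily the exact ones used for training:

```python
import re

# Davidson et al. labels: 0 = hate speech, 1 = offensive, 2 = neither.
# Collapse to the binary scheme: 1 = Hate/Offensive, 0 = Neither.
def map_label(davidson_class: int) -> int:
    return 0 if davidson_class == 2 else 1

# Illustrative cleaning: strip URLs, @mentions, and extra whitespace.
def clean_text(text: str) -> str:
    text = re.sub(r"https?://\S+", "", text)  # URLs
    text = re.sub(r"@\w+", "", text)          # @mentions
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace

print(map_label(2), map_label(0), map_label(1))            # 0 1 1
print(clean_text("@user check https://t.co/x   this out"))  # check this out
```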

## Architecture & Methodology

### Model Architecture

* **Base Model**: `FacebookAI/roberta-base` (Hugging Face Transformers)
* **Task**: Binary sequence classification (2 labels)
* **Fine-tuning**: Custom classification head with 2 outputs
* **Tokenization**: RoBERTa tokenizer with an optimized maximum sequence length

### Training Strategy

1. Data preprocessing: message cleaning and label encoding
2. Tokenization: dynamic padding with an optimized maximum length
3. Class balancing: weighted loss function to handle the imbalanced dataset
4. Hyperparameter optimization: Optuna-based automated tuning
5. Evaluation: comprehensive metrics on a held-out test set

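The class-balancing step can be sketched with a weighted cross-entropy loss. Inverse-frequency weighting, shown below, is one common scheme and an assumption here; the exact weights used in training may differ:

```python
import torch
import torch.nn as nn

# Class counts from the dataset: Neither vs. Hate/Offensive
counts = torch.tensor([4163.0, 20620.0])

# Inverse-frequency weights: the rarer class gets a larger weight,
# so mistakes on it cost proportionally more.
weights = counts.sum() / (len(counts) * counts)

loss_fn = nn.CrossEntropyLoss(weight=weights)

logits = torch.tensor([[2.0, -1.0], [0.5, 1.5]])  # dummy model outputs
targets = torch.tensor([0, 1])                    # true labels
print(loss_fn(logits, targets))
```

With the Hugging Face `Trainer`, this is typically wired in by overriding `compute_loss` to use the weighted criterion instead of the default unweighted one.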
## Hyperparameter Optimization

Hyperparameters were optimized with **Optuna (15 trials)** over the following ranges:

* Dropout rates: hidden dropout (0.1-0.3), attention dropout (0.1-0.2)
* Learning rate: 1e-5 to 5e-5
* Weight decay: 0.0 to 0.1
* Batch size: 8, 16, or 32 samples
* Gradient accumulation steps: 1 to 4
* Training epochs: 2 to 5
* Warmup ratio: 0.05 to 0.1 for learning rate scheduling

### Best Parameters Found

* Hidden Dropout: `0.13034059066330464`
* Attention Dropout: `0.1935379847495239`
* Learning Rate: `1.031409901695853e-05`
* Weight Decay: `0.03606621145317628`
* Batch Size: `16`
* Gradient Accumulation: `1`
* Epochs: `2`
* Warmup Ratio: `0.0718442228846798`


## 📊 Detailed Results

### Confusion Matrix

|                      | Predicted Neither | Predicted Offensive/Hate |
|----------------------|-------------------|--------------------------|
| **Actual Neither**   | 547               | 78                       |
| **Actual Offensive** | 62                | 3031                     |

### Performance Breakdown

* **True Positives (hate/offensive correctly identified)**: 3031
* **True Negatives (Neither correctly identified)**: 547
* **False Positives (Neither incorrectly flagged)**: 78
* **False Negatives (hate/offensive missed)**: 62

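The headline metrics can be recomputed directly from this confusion matrix, confirming they are internally consistent:

```python
# Counts from the confusion matrix above (positive class = hate/offensive)
tp, tn, fp, fn = 3031, 547, 78, 62

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
# accuracy=0.9623 precision=0.9749 recall=0.9800 f1=0.9774
```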
## Usage

```python
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch

# Load the trained model and tokenizer
model = RobertaForSequenceClassification.from_pretrained("AshiniR/hate-speech-and-offensive-message-classifier")
tokenizer = RobertaTokenizer.from_pretrained("AshiniR/hate-speech-and-offensive-message-classifier")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def get_inference(text: str) -> list:
    """Return predictions as [{'label': str, 'score': float}, ...], highest score first."""
    # Tokenize the input text
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=False,
        max_length=128
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Get model predictions
    with torch.no_grad():
        outputs = model(**inputs)
        probabilities = torch.softmax(outputs.logits, dim=-1)

    # Convert probabilities to labeled scores
    labels = ["neither", "hate/offensive"]
    results = []
    for i, prob in enumerate(probabilities[0]):
        results.append({
            "label": labels[i],
            "score": prob.item()
        })

    return sorted(results, key=lambda x: x["score"], reverse=True)

# Example usage
text = "I hate you!"
predictions = get_inference(text)
print(f"Text: '{text}'")
print(f"Predictions: {predictions}")
```


## Use Cases

This hate/offensive message classifier is ideal for:

### Messaging Platforms

* Discord bot moderation (primary use case)
* SMS filtering systems
* Chat application content filtering

### Content Moderation

* Social media platforms
* Comment section filtering
* User-generated content screening

## Citation

If you use this model in your research or application, please cite:

```bibtex
@misc{ashinir_hate_offensive_message_classifier_2025,
  author       = {Ashini Dhananjana},
  title        = {Hate/Offensive Message Classifier: RoBERTa-based Hate/Offensive Message Detection},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AshiniR/hate-speech-and-offensive-message-classifier}},
}
```

## Model Card Contact

AshiniR - [Hugging Face Profile](https://huggingface.co/AshiniR)