tuklu committed on
Commit befd860 · verified · 1 Parent(s): 09d272f

Fix YAML metadata in model card

Files changed (1):
  1. README.md +316 -172

README.md CHANGED
@@ -1,102 +1,172 @@
- # Hate Speech Detection — Multilingual Sequential Transfer Learning
- ### GloVe Embeddings + Bidirectional LSTM (BiLSTM)

  ---

- ## What is this project about?

- This project builds a system that can automatically detect **hate speech** in text written in three languages:
- - **English** — standard English text
- - **Hindi** — Hindi text (transliterated or native script)
- - **Hinglish** — a mix of Hindi and English (very common in Indian social media)

- The core question we are trying to answer is:

- > **Does the order in which you teach a model different languages matter for how well it performs?**

- For example — is a model that learns English first, then Hindi, then Hinglish better or worse than one that learns Hinglish first?

  ---

- ## The Dataset

- | Property | Value |
  |---|---|
- | Total samples | 29,505 |
- | English samples | 14,994 (50.8%) |
- | Hindi samples | 9,738 (33.0%) |
- | Hinglish samples | 4,774 (16.2%) |
- | Hate speech (label=1) | 13,707 (46.5%) |
- | Non-hate speech (label=0) | 15,799 (53.5%) |

  ![Language Distribution](output/figures/language_distribution.png)

- The dataset was split into three parts:
- - **Training set** — 17,704 samples (used to teach the model)
- - **Validation set** — 2,950 samples (used to monitor learning during training)
- - **Test set** — 8,852 samples (used only at the end to measure real performance)

  ---

- ## The Model — What is GloVe + BiLSTM?
-
- Think of the model like a two-part reading machine:

- ### Part 1: GloVe Embeddings (the dictionary)
- Before the model can understand words, it needs to know what words *mean* relative to each other. GloVe (Global Vectors) is a pre-trained lookup table of **300,000+ English words**, where each word is represented as a list of 300 numbers that capture its meaning. Words with similar meanings end up with similar numbers.

- - We used `glove.6B.300d.txt` — trained on a 6-billion-token corpus, 300 dimensions
- - The embedding layer is **frozen** (not updated during training) — we keep GloVe's knowledge as-is and only train the layers on top

- ### Part 2: Bidirectional LSTM (the reader)
- An LSTM (Long Short-Term Memory) is a type of neural network designed to read sequences — like sentences — and remember what it read. **Bidirectional** means it reads the sentence both forwards and backwards, so it understands context from both directions.

- ```
- Input sentence
- ↓
- GloVe Embeddings (300d, frozen)
- ↓
- BiLSTM (128 units, reads left-to-right AND right-to-left)
- ↓
- Dropout (50% — randomly switches off neurons to prevent overfitting)
- ↓
- Dense layer (64 neurons, ReLU activation)
- ↓
- Output (1 neuron, Sigmoid — gives a probability 0 to 1)
- ↓
- > 0.5 = Hate Speech, ≤ 0.5 = Not Hate Speech
- ```

  ---

- ## The Training Strategy — What is Transfer Learning?

- **Transfer learning** means the model carries what it learned from one task into the next. Like a student who already knows French — learning Spanish is easier because both share Latin roots.

- In our case, we train the model on one language, and instead of starting fresh for the next language, we **keep all the weights (knowledge)** from the previous training. The model continues learning from where it left off.

- ### The Bug We Fixed
- The original code created a **brand new model** for every language — resetting all the weights each time. That is not transfer learning; it is just training three separate models. We fixed this by building the model **once** and sequentially fine-tuning it.

  ```python
- # WRONG — model reset every loop iteration
  for lang in languages:
-     model = Sequential()   # ← new model = no transfer learning
-     model.fit(...)

  # CORRECT — model built once, weights carry forward
- model = build_model()      # ← built once
  for lang in languages:
-     model.fit(...)         # ← continues learning from previous language
  ```

  ---

- ## Plan B — The Experiment

- We ran all **6 possible orderings** of the three languages, each followed by a final training round on the complete shuffled dataset:

- | # | Strategy |
  |---|---|
  | 1 | English → Hindi → Hinglish → Full |
  | 2 | English → Hinglish → Hindi → Full |
@@ -105,73 +175,124 @@ We ran all **6 possible orderings** of the three languages, each followed by a f
  | 5 | Hinglish → English → Hindi → Full |
  | 6 | Hinglish → Hindi → English → Full |

- For each strategy, training happens in 4 phases. **After each phase**, we immediately evaluate the model on that specific language's test data and record all metrics. This tells us how well the model performs at each stage of the learning journey.

  ```
- Phase 1: Train on Language A → Test on Language A test set → Record metrics + plots
- Phase 2: Train on Language B → Test on Language B test set → Record metrics + plots
- Phase 3: Train on Language C → Test on Language C test set → Record metrics + plots
- Phase 4: Train on Full data  → Test on Full test set       → Record metrics + plots
  ```

- Each phase used **8 epochs** with batch size 32 (64 for the full phase).

  ---

- ## Metrics — What do we measure?

- | Metric | What it means in plain English |
- |---|---|
- | **Accuracy** | Out of all predictions, how many were correct? |
- | **Balanced Accuracy** | Accuracy adjusted for class imbalance (fairer when classes are unequal) |
- | **Precision** | Of everything the model flagged as hate speech, how much actually was? |
- | **Recall** | Of all actual hate speech, how much did the model catch? |
- | **Specificity** | Of all non-hate speech, how much did the model correctly ignore? |
- | **F1 Score** | Balance between Precision and Recall (harmonic mean) |
- | **ROC-AUC** | Overall ability to distinguish hate from non-hate (1.0 = perfect) |
 
133
- ---
 
 
 
 
 
 
 
134
 
135
- ## Results Summary
136
 
137
- Full results are in `output/results_tables/all_strategies_results.csv`. Key highlights:
 
 
138
 
139
- ### English phase performance across strategies (best language)
140
 
141
- | Strategy | Accuracy | F1 | ROC-AUC |
142
- |---|---|---|---|
143
- | English β†’ Hindi β†’ Hinglish β†’ Full | 0.7701 | 0.7696 | 0.8504 |
144
- | English β†’ Hinglish β†’ Hindi β†’ Full | 0.7721 | 0.7743 | 0.8525 |
145
- | Hindi β†’ English β†’ Hinglish β†’ Full | 0.7780 | 0.7830 | 0.8549 |
146
- | Hindi β†’ Hinglish β†’ English β†’ Full | 0.7780 | 0.7816 | 0.8563 |
147
- | Hinglish β†’ English β†’ Hindi β†’ Full | 0.7716 | 0.7829 | 0.8484 |
148
- | Hinglish β†’ Hindi β†’ English β†’ Full | 0.7765 | 0.7811 | 0.8534 |
149
 
150
- ### Full dataset phase (final performance)
 
 
 
151
 
152
- | Strategy | Accuracy | F1 | ROC-AUC |
153
- |---|---|---|---|
154
- | English β†’ Hindi β†’ Hinglish β†’ Full | 0.6796 | 0.5923 | 0.7599 |
155
- | English β†’ Hinglish β†’ Hindi β†’ Full | 0.6813 | 0.6244 | 0.7535 |
156
- | Hindi β†’ English β†’ Hinglish β†’ Full | 0.6854 | 0.6419 | 0.7528 |
157
- | Hindi β†’ Hinglish β†’ English β†’ Full | 0.6865 | 0.6364 | 0.7507 |
158
- | Hinglish β†’ English β†’ Hindi β†’ Full | 0.6778 | 0.6285 | 0.7521 |
159
- | Hinglish β†’ Hindi β†’ English β†’ Full | 0.6845 | 0.6301 | 0.7548 |
160
-
161
- ### Key observations
162
- - **English** consistently achieves the highest accuracy (~77%) regardless of when it is trained β€” likely because GloVe embeddings are English-centric
163
- - **Hindi** is the hardest language β€” accuracy hovers around 55–59% across all strategies
164
- - **Hinglish** sits in the middle (~66–70%) which makes sense as it borrows heavily from English
165
- - Strategies that train **Hindi first** (`Hindi β†’ English β†’ Hinglish`) tend to recover better in later phases, suggesting the model benefits from tackling the hardest language early
166
- - The **Full phase** shows consistent ~68% accuracy across all strategies, suggesting the final shuffled training normalises the differences introduced by ordering
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
167
 
168
  ---

- ## Plots by Strategy

  ### Strategy 1: English → Hindi → Hinglish → Full

- | Phase | Training Curves | Confusion Matrix | ROC Curve | PR Curve | F1 Curve |
  |---|---|---|---|---|---|
  | English | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_curves.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_cm.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_roc.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_pr.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_f1.png) |
  | Hindi | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_curves.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_cm.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_roc.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_pr.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_f1.png) |
@@ -182,7 +303,14 @@ Full results are in `output/results_tables/all_strategies_results.csv`. Key high

  ### Strategy 2: English → Hinglish → Hindi → Full

- | Phase | Training Curves | Confusion Matrix | ROC Curve | PR Curve | F1 Curve |
  |---|---|---|---|---|---|
  | English | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_curves.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_cm.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_roc.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_pr.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_f1.png) |
  | Hinglish | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_curves.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_cm.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_roc.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_pr.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_f1.png) |
@@ -191,9 +319,18 @@ Full results are in `output/results_tables/all_strategies_results.csv`. Key high

  ---

- ### Strategy 3: Hindi → English → Hinglish → Full

- | Phase | Training Curves | Confusion Matrix | ROC Curve | PR Curve | F1 Curve |
  |---|---|---|---|---|---|
  | Hindi | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_curves.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_cm.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_roc.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_pr.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_f1.png) |
  | English | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_curves.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_cm.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_roc.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_pr.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_f1.png) |
@@ -204,7 +341,14 @@ Full results are in `output/results_tables/all_strategies_results.csv`. Key high

  ### Strategy 4: Hindi → Hinglish → English → Full

- | Phase | Training Curves | Confusion Matrix | ROC Curve | PR Curve | F1 Curve |
  |---|---|---|---|---|---|
  | Hindi | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_curves.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_cm.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_roc.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_pr.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_f1.png) |
  | Hinglish | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_curves.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_cm.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_roc.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_pr.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_f1.png) |
@@ -215,7 +359,14 @@ Full results are in `output/results_tables/all_strategies_results.csv`. Key high

  ### Strategy 5: Hinglish → English → Hindi → Full

- | Phase | Training Curves | Confusion Matrix | ROC Curve | PR Curve | F1 Curve |
  |---|---|---|---|---|---|
  | Hinglish | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_curves.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_cm.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_roc.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_pr.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_f1.png) |
  | English | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_curves.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_cm.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_roc.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_pr.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_f1.png) |
@@ -226,7 +377,14 @@ Full results are in `output/results_tables/all_strategies_results.csv`. Key high

  ### Strategy 6: Hinglish → Hindi → English → Full

- | Phase | Training Curves | Confusion Matrix | ROC Curve | PR Curve | F1 Curve |
  |---|---|---|---|---|---|
  | Hinglish | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_curves.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_cm.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_roc.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_pr.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_f1.png) |
  | Hindi | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_curves.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_cm.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_roc.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_pr.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_f1.png) |
@@ -235,76 +393,62 @@ Full results are in `output/results_tables/all_strategies_results.csv`. Key high

  ---

- ## Output Files
-
- ```
- output/
- ├── dataset_splits/
- │   ├── train.csv                      # 17,704 training samples
- │   ├── val.csv                        # 2,950 validation samples
- │   └── test.csv                       # 8,852 test samples
- │
- ├── results_tables/
- │   ├── all_strategies_results.csv     # All 24 rows (6 strategies × 4 phases)
- │   ├── english_to_hindi_to_hinglish_results.csv
- │   ├── english_to_hinglish_to_hindi_results.csv
- │   ├── hindi_to_english_to_hinglish_results.csv
- │   ├── hindi_to_hinglish_to_english_results.csv
- │   ├── hinglish_to_english_to_hindi_results.csv
- │   └── hinglish_to_hindi_to_english_results.csv
- │
- └── figures/
-     ├── language_distribution.png      # Pie chart of dataset languages
-     │
-     ├── english_to_hindi_to_hinglish/  # One folder per strategy
-     │   ├── *_[english]_curves.png     # Train/Val accuracy + loss
-     │   ├── *_[english]_cm.png         # Confusion matrix
-     │   ├── *_[english]_roc.png        # ROC curve
-     │   ├── *_[english]_pr.png         # Precision-Recall curve
-     │   ├── *_[english]_f1.png         # F1 vs Threshold curve
-     │   ├── *_[hindi]_curves.png
-     │   ├── *_[hindi]_cm.png ...
-     │   ├── *_[hinglish]_curves.png
-     │   ├── *_[hinglish]_cm.png ...
-     │   ├── *_[Full]_curves.png
-     │   └── *_[Full]_cm.png ...
-     │
-     ├── english_to_hinglish_to_hindi/
-     ├── hindi_to_english_to_hinglish/
-     ├── hindi_to_hinglish_to_english/
-     ├── hinglish_to_english_to_hindi/
-     └── hinglish_to_hindi_to_english/
- ```

- ---

- ## How to Run

- ### Requirements
- ```bash
- pip install tensorflow scikit-learn pandas seaborn matplotlib
- ```

- You also need GloVe embeddings (`glove.6B.300d.txt`) placed at `/root/glove.6B.300d.txt`:
- ```bash
- wget http://nlp.stanford.edu/data/glove.6B.zip && unzip glove.6B.zip
- ```

- ### Run
- ```bash
- python main.py
  ```

- Training was performed on an NVIDIA H200 GPU (Vast.ai) — total runtime approximately 15–20 minutes for all 6 strategies.
-
  ---

- ## Project Structure

  ```
- SASC/
- ├── main.py          # Full training + evaluation pipeline
- ├── dataset.csv      # Raw dataset (29,505 samples)
- ├── README.md        # This file
- └── output/          # All results, figures, and model checkpoints
  ```
 
+ ---
+ language:
+ - en
+ - hi
+ tags:
+ - hate-speech
+ - text-classification
+ - bilstm
+ - glove
+ - multilingual
+ - transfer-learning
+ - hinglish
+ - sequential-learning
+ datasets:
+ - tuklu/nprism
+ license: mit
+ model-index:
+ - name: hate-speech-multilingual-bilstm
+   results:
+   - task:
+       type: text-classification
+       name: Hate Speech Detection
+     dataset:
+       name: nprism
+       type: tuklu/nprism
+     metrics:
+     - type: f1
+       value: 0.6419
+       name: F1 Score (Best Strategy - Full Phase)
+     - type: accuracy
+       value: 0.6854
+       name: Accuracy (Best Strategy - Full Phase)
+     - type: roc_auc
+       value: 0.7528
+       name: ROC-AUC (Best Strategy - Full Phase)
+ ---
+
+ # Multilingual Hate Speech Detection — GloVe + BiLSTM
+
+ **Task:** Binary text classification (Hate / Non-Hate)
+ **Languages:** English, Hindi, Hinglish (Hindi-English code-mixed)
+ **Architecture:** Bidirectional LSTM with frozen GloVe embeddings
+ **Best Strategy:** Hindi → English → Hinglish → Full (F1: 0.6419, AUC: 0.7528)

  ---

+ ## Table of Contents
+ 1. [What This Project Does](#1-what-this-project-does)
+ 2. [The Dataset](#2-the-dataset)
+ 3. [Model Architecture](#3-model-architecture)
+ 4. [The Core Idea — Transfer Learning](#4-the-core-idea--transfer-learning)
+ 5. [The Experiment — Plan B](#5-the-experiment--plan-b)
+ 6. [Results & Best Model Selection](#6-results--best-model-selection)
+ 7. [Full Results by Strategy](#7-full-results-by-strategy)
+ 8. [All Model Checkpoints](#8-all-model-checkpoints)
+ 9. [How to Use](#9-how-to-use)

+ ---

+ ## 1. What This Project Does

+ This project investigates whether the **order of language exposure** during sequential transfer learning affects a model's ability to detect hate speech across three languages: English, Hindi, and Hinglish.

+ The key question:
+
+ > If you train a model on English first, then Hindi, then Hinglish — does it perform better or worse than training on Hinglish first?
+
+ We ran all **6 possible orderings**, each followed by a final training pass on the complete shuffled dataset, and measured performance after every phase.

  ---

+ ## 2. The Dataset

+ Dataset: [tuklu/nprism](https://huggingface.co/datasets/tuklu/nprism)
+
+ | Split | Samples |
  |---|---|
+ | Train | 17,704 |
+ | Validation | 2,950 |
+ | Test | 8,852 |
+ | **Total** | **29,505** |
+
+ | Language | Count | % |
+ |---|---|---|
+ | English | 14,994 | 50.8% |
+ | Hindi | 9,738 | 33.0% |
+ | Hinglish | 4,774 | 16.2% |
+
+ | Label | Count | % |
+ |---|---|---|
+ | Non-Hate (0) | 15,799 | 53.5% |
+ | Hate (1) | 13,707 | 46.5% |

  ![Language Distribution](output/figures/language_distribution.png)

+ The pie chart above shows the dataset is dominated by English (50.8%), with Hindi and Hinglish making up the rest. This imbalance matters: the model sees more English examples, and GloVe embeddings are English-centric, which helps explain why the English phase consistently achieves the highest accuracy.

  ---

+ ## 3. Model Architecture

+ ```
+ Input: Text sequence (max 100 tokens)
+        ↓
+ GloVe Embedding Layer (vocab: 50,000 × 300d) — FROZEN
+        ↓
+ Bidirectional LSTM (128 units)
+    → reads sentence left-to-right AND right-to-left
+    → captures context from both directions
+        ↓
+ Dropout (0.5) — randomly disables 50% of neurons during training
+    → prevents memorising training data (overfitting)
+        ↓
+ Dense Layer (64 neurons, ReLU activation)
+        ↓
+ Output Layer (1 neuron, Sigmoid)
+    → outputs probability 0.0 to 1.0
+    → > 0.5 = Hate Speech
+    → ≤ 0.5 = Not Hate Speech
+ ```

+ **Why GloVe?**
+ GloVe (Global Vectors) is a pre-trained word embedding trained on 6 billion tokens. Each word becomes a 300-dimensional vector that captures semantic meaning — "hate" and "violence" end up close together in this space. We freeze the embedding layer (it is not updated during training) to preserve this general knowledge and train only the layers on top.
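
To make the lookup concrete, here is a minimal sketch of turning a GloVe text file into an embedding matrix. The three-dimensional vectors and the `word_index` mapping are invented for the example; the real file is `glove.6B.300d.txt` with 300 dimensions per word:

```python
import io
import numpy as np

# Toy stand-in for glove.6B.300d.txt (real vectors have 300 dimensions).
glove_file = io.StringIO(
    "hate 0.1 0.2 0.3\n"
    "violence 0.1 0.2 0.4\n"
    "peace -0.5 0.0 0.2\n"
)

# Hypothetical tokenizer vocabulary: word -> integer index (0 reserved for padding).
word_index = {"hate": 1, "violence": 2, "peace": 3, "kuchbhi": 4}

embedding_dim = 3
# Rows stay all-zero for words missing from GloVe, so out-of-vocabulary
# tokens carry no pre-trained signal.
embedding_matrix = np.zeros((len(word_index) + 1, embedding_dim))

for line in glove_file:
    word, *values = line.split()
    if word in word_index:
        embedding_matrix[word_index[word]] = np.asarray(values, dtype="float32")

print(embedding_matrix[1])   # pre-trained vector for "hate"
print(embedding_matrix[4])   # all zeros: word not found in GloVe
```

Words absent from GloVe (which includes most Hindi and Hinglish tokens) keep those all-zero rows — one concrete reason the Hindi phases are hard for this model.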

+ **Why BiLSTM?**
+ A regular LSTM reads text left to right. A BiLSTM reads it both ways and combines the results. The sentence *"I don't hate you"* needs both directions to understand the negation — the word "don't" only makes sense in the context of what comes after it.

+ **Training config:**
+ - Optimizer: Adam
+ - Loss: Binary Cross-Entropy
+ - Epochs per phase: 8
+ - Batch size: 32 (64 for the full phase)
+ - Max sequence length: 100 tokens
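
Putting the architecture and config together, a sketch of the model in Keras (the sizes follow the card; `embedding_matrix` is a zero-filled stand-in for the real GloVe weights, and the exact code in `main.py` may differ):

```python
import numpy as np
from tensorflow.keras import layers, models

# Sizes from the card: 50,000-word vocab, 300-d embeddings, 100-token inputs.
VOCAB_SIZE, EMBED_DIM, MAX_LEN = 50_000, 300, 100
embedding_matrix = np.zeros((VOCAB_SIZE, EMBED_DIM), dtype="float32")  # stand-in for GloVe vectors

def build_model():
    model = models.Sequential([
        layers.Embedding(VOCAB_SIZE, EMBED_DIM, trainable=False),  # frozen embedding layer
        layers.Bidirectional(layers.LSTM(128)),                    # reads both directions
        layers.Dropout(0.5),                                       # fights overfitting
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),                     # probability of hate speech
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_model()
# A dummy batch both builds the model and confirms shapes: (batch, MAX_LEN) -> (batch, 1).
probs = model.predict(np.zeros((2, MAX_LEN), dtype="int32"), verbose=0)
# Once built, the frozen layer can be filled with the pre-trained vectors.
model.layers[0].set_weights([embedding_matrix])
```

Because the embedding layer is `trainable=False`, `model.fit` only updates the BiLSTM and dense layers, which is exactly the "keep GloVe's knowledge as-is" behaviour described above.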
 
 
 
 
 
 
 
 
 

  ---

+ ## 4. The Core Idea — Transfer Learning
+
+ **Transfer learning** = the model keeps what it learned from one task when starting the next one.

+ Think of it like a student who already knows French — learning Spanish is faster because both share Latin roots. The vocabulary, grammar intuitions, and reading skills transfer.

+ In our case: train on English → the model learns what hate-speech patterns look like in a language GloVe understands well → then fine-tune on Hindi → the model adapts those patterns to Hindi → then Hinglish → the model adapts again, using everything it knows.

+ ### The Bug That Was Fixed
+
+ The original code was reinitialising the model inside the loop — meaning **every language got a brand new, untrained model**. That is not transfer learning at all.

  ```python
+ # WRONG — model reset every iteration, no knowledge transfer
  for lang in languages:
+     model = Sequential()   # ← destroys all previous learning
+     model.fit(X_lang, ...)

  # CORRECT — model built once, weights carry forward
+ model = build_model()      # ← built once, before the loop
  for lang in languages:
+     model.fit(X_lang, ...) # ← each fit continues from where the previous one left off
  ```

+ This single fix is the entire point of the experiment.
+
  ---

+ ## 5. The Experiment — Plan B

+ We tested all 6 permutations of [English, Hindi, Hinglish], each ending with a full shuffled-dataset phase:

+ | # | Training Order |
  |---|---|
  | 1 | English → Hindi → Hinglish → Full |
  | 2 | English → Hinglish → Hindi → Full |
 
  | 5 | Hinglish → English → Hindi → Full |
  | 6 | Hinglish → Hindi → English → Full |

+ **After each phase**, the model is immediately evaluated on **that specific language's test subset**. So for strategy `English → Hindi → Hinglish → Full`:

  ```
+ Train on English   → evaluate English test set  → save metrics + plots
+ Train on Hindi     → evaluate Hindi test set    → save metrics + plots
+ Train on Hinglish  → evaluate Hinglish test set → save metrics + plots
+ Train on Full data → evaluate full test set     → save metrics + plots
  ```

+ This gives us 4 snapshots per strategy — letting us see exactly how the model evolves as it learns each new language.
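
A data-side sketch of this schedule, assuming language-tagged `(text, label, language)` rows; the sample rows and the `subset` helper are invented for illustration, with the actual fit/evaluate calls left as comments:

```python
# Toy language-tagged samples standing in for the real train/test splits.
train = [("text a", 1, "english"), ("text b", 0, "hindi"), ("text c", 1, "hinglish")]
test  = [("text d", 0, "english"), ("text e", 1, "hindi"), ("text f", 0, "hinglish")]

def subset(rows, lang):
    """All rows belonging to one language."""
    return [r for r in rows if r[2] == lang]

strategy = ["hindi", "english", "hinglish"]   # best ordering from the card

# Phases 1-3: one language at a time; phase 4: the full shuffled set.
phases = [(lang, subset(train, lang), subset(test, lang)) for lang in strategy]
phases.append(("full", train, test))

for name, train_rows, test_rows in phases:
    # model.fit(train_rows); metrics = model.evaluate(test_rows)  # per-phase snapshot
    print(name, len(train_rows), "train /", len(test_rows), "test")
```

The same `model` object runs through all four phases, which is what makes each snapshot a measurement of accumulated (not fresh) learning.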

  ---

+ ## 6. Results & Best Model Selection

+ ### Full Phase Results (Final Model Performance)

+ | Strategy | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1 | ROC-AUC |
+ |---|---|---|---|---|---|---|---|
+ | **Hindi → English → Hinglish → Full** | 0.6854 | **0.6802** | 0.6810 | 0.6070 | 0.7534 | **0.6419** | 0.7528 |
+ | Hindi → Hinglish → English → Full | **0.6865** | 0.6801 | 0.6900 | 0.5905 | 0.7698 | 0.6364 | 0.7507 |
+ | Hinglish → Hindi → English → Full | 0.6845 | 0.6775 | 0.6918 | 0.5786 | 0.7764 | 0.6301 | **0.7548** |
+ | English → Hinglish → Hindi → Full | 0.6813 | 0.6740 | 0.6899 | 0.5703 | 0.7776 | 0.6244 | 0.7535 |
+ | Hinglish → English → Hindi → Full | 0.6778 | 0.6718 | 0.6768 | 0.5866 | 0.7570 | 0.6285 | 0.7521 |
+ | English → Hindi → Hinglish → Full | 0.6796 | 0.6678 | 0.7243 | 0.5010 | 0.8346 | 0.5923 | 0.7599 |

+ ### Why Hindi → English → Hinglish → Full is the Best Model

+ **F1 Score is the most important metric here.** For hate speech detection, we need to balance two things:
+ - **Precision** — don't falsely flag innocent content as hate
+ - **Recall** — don't miss actual hate speech

+ F1 is the harmonic mean of both. A model that misses half the hate speech (low recall) or flags everything as hate (low precision) is useless in practice.

+ Look at `English → Hindi → Hinglish → Full` — it has the highest ROC-AUC (0.7599) but an F1 of only 0.5923. Why? Its Recall is only 0.5010 — it misses **half of all hate speech**. A high ROC-AUC can be misleading when the decision threshold is poorly calibrated.
 
 
 
 
 
 
 

+ `Hindi → English → Hinglish → Full` has:
+ - Best F1 (0.6419) — the best balance of precision and recall
+ - Best Balanced Accuracy (0.6802) — the fairest across both classes
+ - Recall of 0.607 — catches significantly more hate speech than the alternatives

+ **Why does Hindi-first work better?**
+
+ Hindi is the hardest language for this model (GloVe has limited Hindi coverage). Training on Hindi *first* forces the model to develop general hate-speech-detection features that do not depend on GloVe's English-centric embeddings. It learns to detect patterns from context and sequence rather than relying on word meanings alone. When English comes next, the model improves dramatically and carries these robust features forward. English-first strategies give the model an easy start, but it never develops the robustness needed for low-resource languages.
+
+ ### Best Model Training Curves (Hindi → English → Hinglish → Full)
+
+ **Phase 1: Train on Hindi**
+
+ ![Hindi Training Curves](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_curves.png)
+
+ The model starts cold on Hindi. Accuracy is low (~55–57%) and validation loss is unstable — this is expected. GloVe does not cover Hindi well, so the model is learning purely from sequential patterns. The struggle here is valuable: it forces the model to build language-agnostic features.
+
+ **Phase 2: Train on English**
+
+ ![English Training Curves](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_curves.png)
+
+ Dramatic improvement: the model jumps to ~77–78% accuracy. GloVe embeddings now align well with the input language. Notice that it does not start from scratch — the Hindi training gave it a base of sequential hate-speech patterns, and with English vocabulary the model improves rapidly.
+
+ **Phase 3: Train on Hinglish**
+
+ ![Hinglish Training Curves](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hinglish]_curves.png)
+
+ Hinglish is code-mixed — it borrows from both languages the model already knows. Training accuracy climbs to ~68–69%. The model adapts its existing knowledge to handle the mixed vocabulary.
+
+ **Phase 4: Train on Full Dataset**
+
+ ![Full Training Curves](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[Full]_curves.png)
+
+ Final fine-tuning on all 17,704 shuffled training samples. Training and validation accuracy converge and the loss stabilises. This phase consolidates all language knowledge into the final model.
+
+ ### Best Model Evaluation Charts
+
+ **Confusion Matrix:**
+
+ ![Confusion Matrix](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[Full]_cm.png)
+
+ Shows actual vs. predicted counts. A well-balanced confusion matrix means the model is not biased toward one class: True Positives (hate correctly identified) and True Negatives (non-hate correctly identified) should both be high.
+
+ **ROC Curve (AUC = 0.7528):**
+
+ ![ROC Curve](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[Full]_roc.png)
+
+ The ROC curve shows the trade-off between True Positive Rate (catching hate speech) and False Positive Rate (wrongly flagging non-hate). An AUC of 0.7528 means the model has a 75.3% chance of ranking a randomly chosen hate speech example higher than a randomly chosen non-hate example — significantly better than random (0.5).
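
That ranking interpretation follows directly from the definition: over all (hate, non-hate) pairs, AUC equals the fraction of pairs where the hate example gets the higher score, with ties counting half. A toy check with made-up scores:

```python
# Made-up model scores: higher should mean "more likely hate speech".
pos_scores = [0.9, 0.7, 0.4]   # scores given to actual hate examples
neg_scores = [0.8, 0.3, 0.2]   # scores given to actual non-hate examples

def auc_by_ranking(pos, neg):
    """AUC as the win rate of positives over negatives (ties count 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

print(auc_by_ranking(pos_scores, neg_scores))
```

Note that this quantity never looks at a decision threshold — which is exactly why a decent AUC can coexist with a poor F1, as in the English-first strategy above.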

**Precision-Recall Curve:**

![PR Curve](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[Full]_pr.png)

Shows the trade-off between precision and recall at different thresholds. A curve that stays high across recall values means the model maintains good precision even as it catches more hate speech, which is useful for choosing an operating threshold based on deployment requirements.

**F1 vs Threshold Curve:**

![F1 Curve](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[Full]_f1.png)

Shows the F1 score at every possible decision threshold. The peak sits near 0.5, confirming that the default threshold is well calibrated. For a high-recall deployment (catch all hate speech, even at the cost of false positives), lower the threshold; for high precision (flag only near-certain hate speech), raise it.
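
Picking that operating point can be automated with a simple sweep. The probabilities and labels below are made up for illustration; with the real model you would sweep over validation-set probabilities:

```python
# Sweep decision thresholds and report the F1-maximising one.
def f1_at(threshold, probs, y_true):
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, t in zip(preds, y_true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, y_true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, y_true) if p == 0 and t == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

probs = [0.92, 0.81, 0.55, 0.48, 0.30, 0.12]  # hypothetical model outputs
y_true = [1, 1, 1, 0, 1, 0]                   # hypothetical labels

best_f1, best_threshold = max(
    (f1_at(t / 100, probs, y_true), t / 100) for t in range(1, 100)
)
print(best_f1, best_threshold)
```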

---

## 7. Full Results by Strategy

### Strategy 1: English → Hindi → Hinglish → Full

| Phase | Accuracy | F1 | ROC-AUC |
|---|---|---|---|
| English | 0.7701 | 0.7696 | 0.8504 |
| Hindi | 0.5507 | 0.0000 | 0.5689 |
| Hinglish | 0.6780 | 0.5155 | 0.6691 |
| Full | 0.6796 | 0.5923 | 0.7599 |

**Note on the Hindi phase row** (full metrics: Precision = 0, Recall = 0, F1 = 0, Specificity = 1.0): this is not a data error. After training only on English, the model predicted **zero hate speech** for every Hindi test sample; it classified everything as non-hate. This means:
- Specificity = 1.0 ✓ (no false positives, because it never predicts hate at all)
- Recall = 0.0 (catches zero actual hate speech)
- F1 = 0.0 (completely useless for Hindi at this stage)

This is the strongest evidence that English-first is the wrong order: the model becomes so tuned to English patterns that it cannot generalise to Hindi at all.
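
Those degenerate values follow mechanically from the metric definitions. The sketch below scores a toy all-negative classifier (hypothetical labels, not the real Hindi test set) and reproduces the same pattern:

```python
# Standard binary metrics; an all-negative predictor yields
# precision = recall = F1 = 0 while specificity = 1.
def binary_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return precision, recall, f1, specificity

y_true = [1, 0, 1, 1, 0]      # toy labels: three hate, two non-hate
y_pred = [0] * len(y_true)    # the model never predicts hate
print(binary_metrics(y_true, y_pred))  # (0.0, 0.0, 0.0, 1.0)
```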

| Phase | Training Curves | Confusion Matrix | ROC | PR | F1 Curve |
|---|---|---|---|---|---|
| English | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_curves.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_cm.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_roc.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_pr.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_f1.png) |
| Hindi | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_curves.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_cm.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_roc.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_pr.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_f1.png) |
| Hinglish | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hinglish]_curves.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hinglish]_cm.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hinglish]_roc.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hinglish]_pr.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hinglish]_f1.png) |
| Full | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[Full]_curves.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[Full]_cm.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[Full]_roc.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[Full]_pr.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[Full]_f1.png) |
 
### Strategy 2: English → Hinglish → Hindi → Full

| Phase | Accuracy | F1 | ROC-AUC |
|---|---|---|---|
| English | 0.7721 | 0.7743 | 0.8525 |
| Hinglish | 0.6631 | 0.5460 | 0.6899 |
| Hindi | 0.5810 | 0.4444 | 0.5975 |
| Full | 0.6813 | 0.6244 | 0.7535 |

| Phase | Training Curves | Confusion Matrix | ROC | PR | F1 Curve |
|---|---|---|---|---|---|
| English | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_curves.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_cm.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_roc.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_pr.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_f1.png) |
| Hinglish | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_curves.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_cm.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_roc.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_pr.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_f1.png) |
| Hindi | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hindi]_curves.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hindi]_cm.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hindi]_roc.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hindi]_pr.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hindi]_f1.png) |
| Full | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[Full]_curves.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[Full]_cm.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[Full]_roc.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[Full]_pr.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[Full]_f1.png) |
319
 
320
  ---
321
 
322
+ ### Strategy 3: Hindi β†’ English β†’ Hinglish β†’ Full ⭐ BEST MODEL
323
+
324
+ | Phase | Accuracy | F1 | ROC-AUC |
325
+ |---|---|---|---|
326
+ | Hindi | 0.5662 | 0.2860 | 0.5748 |
327
+ | English | 0.7780 | 0.7830 | 0.8549 |
328
+ | Hinglish | 0.6880 | 0.5641 | 0.7172 |
329
+ | **Full** | **0.6854** | **0.6419** | **0.7528** |
330
+
331
+ Starting with the hardest language (Hindi) builds robustness. Despite the rough start, the model recovers strongly and achieves the best final F1.
332
 
333
+ | Phase | Training Curves | Confusion Matrix | ROC | PR | F1 Curve |
334
  |---|---|---|---|---|---|
335
  | Hindi | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_curves.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_cm.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_roc.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_pr.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_f1.png) |
336
  | English | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_curves.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_cm.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_roc.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_pr.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_f1.png) |
 
### Strategy 4: Hindi → Hinglish → English → Full

| Phase | Accuracy | F1 | ROC-AUC |
|---|---|---|---|
| Hindi | 0.5779 | 0.3898 | 0.5972 |
| Hinglish | 0.6986 | 0.5289 | 0.7109 |
| English | 0.7780 | 0.7816 | 0.8563 |
| Full | 0.6865 | 0.6364 | 0.7507 |

| Phase | Training Curves | Confusion Matrix | ROC | PR | F1 Curve |
|---|---|---|---|---|---|
| Hindi | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_curves.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_cm.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_roc.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_pr.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_f1.png) |
| Hinglish | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_curves.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_cm.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_roc.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_pr.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_f1.png) |
| English | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[english]_curves.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[english]_cm.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[english]_roc.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[english]_pr.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[english]_f1.png) |
| Full | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[Full]_curves.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[Full]_cm.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[Full]_roc.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[Full]_pr.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[Full]_f1.png) |
 
### Strategy 5: Hinglish → English → Hindi → Full

| Phase | Accuracy | F1 | ROC-AUC |
|---|---|---|---|
| Hinglish | 0.6652 | 0.5119 | 0.6692 |
| English | 0.7716 | 0.7829 | 0.8484 |
| Hindi | 0.5638 | 0.2466 | 0.5982 |
| Full | 0.6778 | 0.6285 | 0.7521 |

| Phase | Training Curves | Confusion Matrix | ROC | PR | F1 Curve |
|---|---|---|---|---|---|
| Hinglish | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_curves.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_cm.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_roc.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_pr.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_f1.png) |
| English | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_curves.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_cm.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_roc.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_pr.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_f1.png) |
| Hindi | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hindi]_curves.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hindi]_cm.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hindi]_roc.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hindi]_pr.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hindi]_f1.png) |
| Full | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[Full]_curves.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[Full]_cm.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[Full]_roc.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[Full]_pr.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[Full]_f1.png) |
 
### Strategy 6: Hinglish → Hindi → English → Full

| Phase | Accuracy | F1 | ROC-AUC |
|---|---|---|---|
| Hinglish | 0.6837 | 0.5369 | 0.6929 |
| Hindi | 0.5924 | 0.4656 | 0.5964 |
| English | 0.7765 | 0.7811 | 0.8534 |
| Full | 0.6845 | 0.6301 | 0.7548 |

| Phase | Training Curves | Confusion Matrix | ROC | PR | F1 Curve |
|---|---|---|---|---|---|
| Hinglish | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_curves.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_cm.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_roc.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_pr.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_f1.png) |
| Hindi | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_curves.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_cm.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_roc.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_pr.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_f1.png) |
| English | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[english]_curves.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[english]_cm.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[english]_roc.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[english]_pr.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[english]_f1.png) |
| Full | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[Full]_curves.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[Full]_cm.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[Full]_roc.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[Full]_pr.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[Full]_f1.png) |
 
---

## 8. All Model Checkpoints
 
All six trained models are available as archives in the `models/` folder of this repo, and `model.h5` at the repo root is a copy of the best one. Each filename encodes the training order. Sorted by final F1:

| File | Strategy | Final F1 | Final AUC |
|---|---|---|---|
| `model.h5` | Hindi → English → Hinglish → Full ⭐ | 0.6419 | 0.7528 |
| `models/planB_hindi_to_english_to_hinglish_Full.h5` | Hindi → English → Hinglish → Full | 0.6419 | 0.7528 |
| `models/planB_hindi_to_hinglish_to_english_Full.h5` | Hindi → Hinglish → English → Full | 0.6364 | 0.7507 |
| `models/planB_hinglish_to_hindi_to_english_Full.h5` | Hinglish → Hindi → English → Full | 0.6301 | 0.7548 |
| `models/planB_hinglish_to_english_to_hindi_Full.h5` | Hinglish → English → Hindi → Full | 0.6285 | 0.7521 |
| `models/planB_english_to_hinglish_to_hindi_Full.h5` | English → Hinglish → Hindi → Full | 0.6244 | 0.7535 |
| `models/planB_english_to_hindi_to_hinglish_Full.h5` | English → Hindi → Hinglish → Full | 0.5923 | 0.7599 |

---

## 9. How to Use

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.text import tokenizer_from_json
from tensorflow.keras.preprocessing.sequence import pad_sequences
from huggingface_hub import hf_hub_download

# Load the fitted tokenizer
tokenizer_path = hf_hub_download(repo_id="tuklu/SASC", filename="tokenizer.json")
with open(tokenizer_path) as f:
    tokenizer = tokenizer_from_json(f.read())

# Load the best model
model_path = hf_hub_download(repo_id="tuklu/SASC", filename="model.h5")
model = tf.keras.models.load_model(model_path)

# Predict
texts = ["I hate all of them", "Have a great day!"]
sequences = tokenizer.texts_to_sequences(texts)
padded = pad_sequences(sequences, maxlen=100)
probs = model.predict(padded).flatten()

for text, prob in zip(texts, probs):
    label = "Hate Speech" if prob > 0.5 else "Non-Hate"
    print(f"{label} ({prob:.3f}): {text}")
```

---

## Citation

```bibtex
@misc{sasc2026,
  title={Multilingual Hate Speech Detection via Sequential Transfer Learning},
  author={tuklu},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/tuklu/SASC}
}
```