tuklu committed
Commit 9989809 · verified · 1 Parent(s): 0ead2d7

Update README with inline figures and correct paths

Files changed (1): README.md +185 -70

README.md CHANGED
@@ -49,9 +49,12 @@ model-index:
  2. [The Dataset](#2-the-dataset)
  3. [Model Architecture](#3-model-architecture)
  4. [Training Strategy](#4-training-strategy)
- 5. [Results](#5-results)
- 6. [Figures](#6-figures)
- 7. [How to Use](#7-how-to-use)

 ---

@@ -59,9 +62,7 @@ model-index:

 This is **v2** of the SASC sequential transfer learning experiment.

- While v1 tested all 6 possible language orderings with 8 epochs per phase, **v2 focuses on a single fixed strategy** — `Hinglish → Hindi → English → Full` — but trains for **50 epochs per phase (200 total)**. This deeper training reveals how well knowledge accumulates across languages when starting from the hardest (most data-scarce, code-mixed) language first.
-
- After every phase the model is evaluated on **all three individual language test sets as well as the full test set**, giving a 4×4 cross-evaluation matrix.

 ---

@@ -87,120 +88,234 @@ Dataset: [tuklu/nprism](https://huggingface.co/datasets/tuklu/nprism)
 | Non-Hate (0) | 15,799 | 53.5% |
 | Hate (1) | 13,707 | 46.5% |

- ![Language Distribution](figures/language_distribution.png)

 ---

 ## 3. Model Architecture

 ```
- Embedding (GloVe 300d, frozen, vocab=50k, maxlen=100)
-
 Bidirectional LSTM (128 units)
-
 Dropout (0.5)
-
- Dense (64, ReLU)
-
- Dense (1, Sigmoid)
 ```

 - **Optimizer:** Adam
- - **Loss:** Binary Crossentropy
- - **Batch size:** 32 (language phases), 64 (full phase)

 ---

 ## 4. Training Strategy

- | Phase | Data | Epochs | Batch Size |
- |---|---|---|---|
- | 1 — Hinglish | Hinglish train subset | 50 | 32 |
- | 2 — Hindi | Hindi train subset | 50 | 32 |
- | 3 — English | English train subset | 50 | 32 |
- | 4 — Full | Full shuffled train | 50 | 64 |

- The same model weights carry forward through all 4 phases — no reset between languages.

 ---

- ## 5. Results

- Full cross-evaluation table (Phase × Eval Language):

 | Phase | Eval On | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1 | ROC-AUC |
 |---|---|---|---|---|---|---|---|---|
- | hinglish | english | 0.5171 | 0.5125 | 0.5738 | 0.0916 | 0.9334 | 0.1580 | 0.5620 |
- | hinglish | hindi | 0.4493 | 0.5000 | 0.4493 | 1.0000 | 0.0000 | 0.6200 | 0.5234 |
 | hinglish | hinglish | 0.6688 | 0.6378 | 0.6058 | 0.4848 | 0.7908 | 0.5386 | 0.6579 |
 | hinglish | full | 0.5190 | 0.5133 | 0.4803 | 0.4331 | 0.5935 | 0.4555 | 0.5243 |
- | hindi | english | 0.4711 | 0.4744 | 0.4789 | 0.7878 | 0.1611 | 0.5957 | 0.4292 |
- | hindi | hindi | 0.5834 | 0.5730 | 0.5420 | 0.4705 | 0.6756 | 0.5037 | 0.5949 |
 | hindi | hinglish | 0.5409 | 0.4885 | 0.3761 | 0.2299 | 0.7470 | 0.2854 | 0.4771 |
 | hindi | full | 0.5190 | 0.5251 | 0.4859 | 0.6111 | 0.4390 | 0.5414 | 0.5255 |
- | english | english | 0.7721 | 0.7726 | 0.7453 | 0.8190 | 0.7262 | 0.7804 | 0.8458 |
- | english | hindi | 0.5424 | 0.5399 | 0.4912 | 0.5150 | 0.5648 | 0.5028 | 0.5377 |
 | english | hinglish | 0.4115 | 0.4938 | 0.3955 | 0.9002 | 0.0875 | 0.5495 | 0.4572 |
 | english | full | 0.6395 | 0.6458 | 0.5901 | 0.7337 | 0.5578 | 0.6541 | 0.6913 |
- | **Full** | **english** | **0.7747** | **0.7746** | **0.7747** | **0.7678** | **0.7815** | **0.7712** | **0.8476** |
- | **Full** | **hindi** | **0.5748** | **0.5676** | **0.5286** | **0.4958** | **0.6393** | **0.5117** | **0.5941** |
 | **Full** | **hinglish** | **0.6326** | **0.6101** | **0.5426** | **0.4991** | **0.7210** | **0.5200** | **0.6161** |
 | **Full** | **full** | **0.6866** | **0.6839** | **0.6687** | **0.6449** | **0.7228** | **0.6566** | **0.7556** |

 ### Key Observations

- - **English phase is the turning point**: F1 on full test jumps from 0.541 → 0.654 after seeing English data, reflecting GloVe's English-centric embeddings.
- - **Starting from Hinglish** forces the model to generalise from noisy code-mixed text first — the model reaches Hinglish F1=0.539 on the Hinglish test after just the Hinglish phase.
- - **Final Full phase** improves balanced accuracy and specificity across all languages, reaching AUC=0.756 on the full test set.
- - Hindi remains the hardest language to generalise to (F1=0.512 after Full phase), consistent with GloVe having limited Hindi coverage.


 ---

- ## 6. Figures
-
- Training curves and evaluation plots for every phase × language combination are in the `figures/hinglish_to_hindi_to_english/` directory.
-
- **Training curves (Accuracy & Loss):**
- - `Phase_hinglish_curves.png`
- - `Phase_hindi_curves.png`
- - `Phase_english_curves.png`
- - `Phase_Full_curves.png`
-
- **Per-phase evaluation (CM / ROC / PR / F1 curve) for each language + full:**
- - `Phase_{phase}_eval_{lang}_cm.png`
- - `Phase_{phase}_eval_{lang}_roc.png`
- - `Phase_{phase}_eval_{lang}_pr.png`
- - `Phase_{phase}_eval_{lang}_f1.png`
-
- ---
-
- ## 7. How to Use
 
179
  ```python
180
- import numpy as np
181
  import json
182
- from tensorflow.keras.models import load_model
 
 
183
  from tensorflow.keras.preprocessing.sequence import pad_sequences
 
184
 
185
- # Load model
186
- model = load_model("hinglish_hindi_english_full.h5")
 
 
187
 
188
- # Load tokenizer
189
- with open("tokenizer.json") as f:
190
- from tensorflow.keras.preprocessing.text import tokenizer_from_json
191
- tokenizer = tokenizer_from_json(json.load(f) if isinstance(json.load(open("tokenizer.json")), str) else open("tokenizer.json").read())
192
 
193
  # Predict
194
- texts = ["your text here"]
195
- seqs = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=100)
196
- prob = model.predict(seqs)[0][0]
197
- label = "Hate" if prob > 0.5 else "Non-Hate"
198
- print(f"{label} ({prob:.4f})")
 
 
 
199
  ```
200
 
201
  ---
202
 
203
  ## Related
204
 
205
- - **v1 (all 6 strategies, 8 epochs):** [tuklu/SASC](https://huggingface.co/tuklu/SASC)
206
  - **Dataset:** [tuklu/nprism](https://huggingface.co/datasets/tuklu/nprism)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
  2. [The Dataset](#2-the-dataset)
  3. [Model Architecture](#3-model-architecture)
  4. [Training Strategy](#4-training-strategy)
+ 5. [Phase 1 — Hinglish](#5-phase-1--hinglish)
+ 6. [Phase 2 — Hindi](#6-phase-2--hindi)
+ 7. [Phase 3 — English](#7-phase-3--english)
+ 8. [Phase 4 — Full Dataset](#8-phase-4--full-dataset)
+ 9. [Full Results Table](#9-full-results-table)
+ 10. [How to Use](#10-how-to-use)

 ---

 This is **v2** of the SASC sequential transfer learning experiment.

+ v1 ran all 6 permutations of [English, Hindi, Hinglish] with **8 epochs** per phase. v2 focuses on a single strategy — `Hinglish → Hindi → English → Full` — but trains for **50 epochs per phase (200 total)**. The key new addition: after every phase the model is evaluated on **all three individual language test sets AND the full test set**, giving a complete 4×4 cross-evaluation matrix showing how knowledge transfers across languages.

 ---

 | Non-Hate (0) | 15,799 | 53.5% |
 | Hate (1) | 13,707 | 46.5% |

+ ![Language Distribution](output_v2/figures/language_distribution.png)
+
+ The dataset is dominated by English (50.8%). GloVe embeddings are also English-centric, which directly explains why the English phase produces the sharpest accuracy jump regardless of training order.

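To make the GloVe coverage point concrete, here is a minimal sketch (tiny made-up vectors and vocabulary, not the repo's code) of how a frozen pretrained embedding matrix is typically assembled. Words GloVe does not cover, which includes most Hindi and Hinglish tokens, keep an all-zero row:

```python
import numpy as np

# Hypothetical miniature of the frozen GloVe setup: 4-d vectors and a
# 6-word vocab stand in for the real 300-d vectors and 50,000-word vocab.
EMBED_DIM = 4      # 300 in the real model
VOCAB_SIZE = 6     # 50,000 in the real model

glove = {          # stand-in for the real GloVe file
    "hate": np.array([0.9, 0.1, 0.4, 0.2]),
    "day":  np.array([0.2, 0.8, 0.3, 0.1]),
}
word_index = {"hate": 1, "day": 2, "nafrat": 3}   # tokenizer-style index

embedding_matrix = np.zeros((VOCAB_SIZE, EMBED_DIM))   # row 0 = padding
for word, idx in word_index.items():
    if idx < VOCAB_SIZE and word in glove:
        embedding_matrix[idx] = glove[word]            # OOV rows stay zero

print(embedding_matrix[3])   # "nafrat" (Hindi) is OOV -> all zeros
```

In Keras such a matrix would typically be passed as `Embedding(VOCAB_SIZE, EMBED_DIM, weights=[embedding_matrix], trainable=False)`; the zero rows are what leaves the model relying on sequence patterns for non-English text.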
 ---

 ## 3. Model Architecture

 ```
+ Input: Text sequence (max 100 tokens)
+
+ GloVe Embedding Layer (vocab: 50,000 × 300d) — FROZEN
+
 Bidirectional LSTM (128 units)
+ → reads sentence left-to-right AND right-to-left
+
 Dropout (0.5)
+
+ Dense Layer (64 neurons, ReLU)
+
+ Output Layer (1 neuron, Sigmoid)
+ → > 0.5 = Hate Speech | ≤ 0.5 = Non-Hate
 ```

 - **Optimizer:** Adam
+ - **Loss:** Binary Cross-Entropy
+ - **Max sequence length:** 100 tokens
+ - **Vocab size:** 50,000

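For scale, the layer sizes above imply the following parameter counts, derived from the standard Keras formulas (a back-of-envelope check, not numbers reported by the repo):

```python
# Back-of-envelope parameter counts for the stack above, using standard
# formulas (LSTM: 4*(in*units + units^2 + units), doubled by the
# bidirectional wrapper). Derived, not read from the repo.
vocab, embed_dim, lstm_units, dense_units = 50_000, 300, 128, 64

embedding_params = vocab * embed_dim                      # frozen GloVe
lstm_params = 2 * 4 * (embed_dim * lstm_units + lstm_units**2 + lstm_units)
dense_params = (2 * lstm_units) * dense_units + dense_units  # BiLSTM out = 256
output_params = dense_units * 1 + 1

print(embedding_params)   # 15,000,000 (all frozen)
print(lstm_params)        # 439,296
print(dense_params)       # 16,448
print(output_params)      # 65
```

The point: almost all weights sit in the frozen embedding table, so only ~456k parameters actually train across the four phases.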
 ---

 ## 4. Training Strategy

+ | Phase | Training Data | Epochs | Batch Size | Samples |
+ |---|---|---|---|---|
+ | 1 — Hinglish | Hinglish subset | 50 | 32 | ~2,908 |
+ | 2 — Hindi | Hindi subset | 50 | 32 | ~5,940 |
+ | 3 — English | English subset | 50 | 32 | ~8,856 |
+ | 4 — Full | All shuffled | 50 | 64 | 17,704 |
+
+ The **same model** carries its weights through all 4 phases — no resets between languages. After each phase the model is evaluated against all three language-specific test sets and the full test set.
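The loop just described can be sketched in a few lines of plain Python; `train_on` and `score` are hypothetical stand-ins for the repo's training and evaluation routines:

```python
# Sketch of the curriculum above: one model object persists across all
# four phases (weights carry forward), and after each phase it is scored
# on every test split, filling the 4x4 cross-evaluation matrix.
# `train_on` and `score` are stand-ins, not the repo's actual API.
PHASES = ["hinglish", "hindi", "english", "full"]
EVAL_SETS = ["hinglish", "hindi", "english", "full"]

def run_curriculum(train_on, score):
    matrix = {}
    for phase in PHASES:                # no weight reset between phases
        train_on(phase)                 # fit 50 epochs on this subset
        for eval_set in EVAL_SETS:
            matrix[(phase, eval_set)] = score(eval_set)
    return matrix

# Dummy run, just to show the shape of the result:
history = []
matrix = run_curriculum(history.append, lambda s: 0.0)
print(len(matrix), history)   # 16 cells; phases ran in curriculum order
```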
+
+ ---
+
+ ## 5. Phase 1 — Hinglish
+
+ **Training on Hinglish only** (2,908 samples, 50 epochs). The model starts cold. Hinglish is code-mixed and GloVe has limited coverage — the model learns from sequential patterns rather than word semantics.
+
+ ![Hinglish Training Curves](output_v2/figures/hinglish_to_hindi_to_english/Phase_hinglish_curves.png)
+
+ ### Evaluation after Phase 1
+
+ | Eval On | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1 | ROC-AUC |
+ |---|---|---|---|---|---|---|---|
+ | Hinglish | 0.6688 | 0.6378 | 0.6058 | 0.4848 | 0.7908 | 0.5386 | 0.6579 |
+ | Hindi | 0.4493 | 0.5000 | 0.4493 | 1.0000 | 0.0000 | 0.6200 | 0.5234 |
+ | English | 0.5171 | 0.5125 | 0.5738 | 0.0916 | 0.9334 | 0.1580 | 0.5620 |
+ | Full | 0.5190 | 0.5133 | 0.4803 | 0.4331 | 0.5935 | 0.4555 | 0.5243 |
+
+ The Hindi result (Recall=1.0, Specificity=0.0) shows the model predicts **everything as hate** on Hindi — it has no Hindi-specific knowledge yet. English performance is near-random. Hinglish F1=0.539 shows the model has learned something useful from its own language.
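That degenerate Hindi row follows from the class balance alone; a quick arithmetic check using the standard metric definitions and the prevalence implied by the table:

```python
# Arithmetic check of the Hindi row above: a classifier that predicts
# "hate" for every input scores precision = accuracy = hate prevalence,
# recall = 1, specificity = 0. Prevalence 0.4493 is read off the table.
p = 0.4493                      # hate share of the Hindi test set

precision, recall, specificity = p, 1.0, 0.0
accuracy = p
balanced_accuracy = (recall + specificity) / 2
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 2))             # 0.62 -- matches the table's F1 of 0.6200
print(balanced_accuracy)        # 0.5  -- matches Balanced Acc
```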
+
+ | Eval On | Confusion Matrix | ROC | Precision-Recall | F1 vs Threshold |
+ |---|---|---|---|---|
+ | Hinglish | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hinglish_eval_hinglish_cm.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hinglish_eval_hinglish_roc.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hinglish_eval_hinglish_pr.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hinglish_eval_hinglish_f1.png) |
+ | Hindi | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hinglish_eval_hindi_cm.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hinglish_eval_hindi_roc.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hinglish_eval_hindi_pr.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hinglish_eval_hindi_f1.png) |
+ | English | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hinglish_eval_english_cm.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hinglish_eval_english_roc.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hinglish_eval_english_pr.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hinglish_eval_english_f1.png) |
+ | Full | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hinglish_eval_full_cm.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hinglish_eval_full_roc.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hinglish_eval_full_pr.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hinglish_eval_full_f1.png) |
+
+ ---
+
+ ## 6. Phase 2 — Hindi
+
+ **Training on Hindi** (5,940 samples, 50 epochs). GloVe has limited Hindi coverage so the model must rely on contextual patterns. The struggle here is deliberate — it builds language-agnostic hate detection features.
+
+ ![Hindi Training Curves](output_v2/figures/hinglish_to_hindi_to_english/Phase_hindi_curves.png)
+
+ ### Evaluation after Phase 2
+
+ | Eval On | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1 | ROC-AUC |
+ |---|---|---|---|---|---|---|---|
+ | Hinglish | 0.5409 | 0.4885 | 0.3761 | 0.2299 | 0.7470 | 0.2854 | 0.4771 |
+ | Hindi | 0.5834 | 0.5730 | 0.5420 | 0.4705 | 0.6756 | 0.5037 | 0.5949 |
+ | English | 0.4711 | 0.4744 | 0.4789 | 0.7878 | 0.1611 | 0.5957 | 0.4292 |
+ | Full | 0.5190 | 0.5251 | 0.4859 | 0.6111 | 0.4390 | 0.5414 | 0.5255 |
+
+ Hindi F1 improves to 0.504. Hinglish drops — the model has partially overwritten Hinglish-specific patterns. English recall spikes (high false positives), showing the model is now biased toward predicting hate. This is the expected "catastrophic interference" that the Full phase resolves.
+
+ | Eval On | Confusion Matrix | ROC | Precision-Recall | F1 vs Threshold |
+ |---|---|---|---|---|
+ | Hinglish | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hindi_eval_hinglish_cm.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hindi_eval_hinglish_roc.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hindi_eval_hinglish_pr.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hindi_eval_hinglish_f1.png) |
+ | Hindi | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hindi_eval_hindi_cm.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hindi_eval_hindi_roc.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hindi_eval_hindi_pr.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hindi_eval_hindi_f1.png) |
+ | English | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hindi_eval_english_cm.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hindi_eval_english_roc.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hindi_eval_english_pr.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hindi_eval_english_f1.png) |
+ | Full | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hindi_eval_full_cm.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hindi_eval_full_roc.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hindi_eval_full_pr.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_hindi_eval_full_f1.png) |
+
+ ---
+
187
+ ## 7. Phase 3 — English
188
+
189
+ **Training on English** (8,856 samples, 50 epochs). This is the turning point. GloVe embeddings align well with English — the model jumps sharply and the English-phase knowledge partially generalises back to the other languages.
190
+
191
+ ![English Training Curves](output_v2/figures/hinglish_to_hindi_to_english/Phase_english_curves.png)
192
+
193
+ ### Evaluation after Phase 3
194
+
195
+ | Eval On | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1 | ROC-AUC |
196
+ |---|---|---|---|---|---|---|---|
197
+ | Hinglish | 0.4115 | 0.4938 | 0.3955 | 0.9002 | 0.0875 | 0.5495 | 0.4572 |
198
+ | Hindi | 0.5424 | 0.5399 | 0.4912 | 0.5150 | 0.5648 | 0.5028 | 0.5377 |
199
+ | **English** | **0.7721** | **0.7726** | **0.7453** | **0.8190** | **0.7262** | **0.7804** | **0.8458** |
200
+ | Full | 0.6395 | 0.6458 | 0.5901 | 0.7337 | 0.5578 | 0.6541 | 0.6913 |
201
+
202
+ English F1 leaps to 0.780 — the model now performs strongly on its native language. Full AUC reaches 0.691. Hinglish specificity collapses again (high recall, low precision) — the model over-predicts hate on unseen languages after English fine-tuning.
203
+
204
+ | Eval On | Confusion Matrix | ROC | Precision-Recall | F1 vs Threshold |
205
+ |---|---|---|---|---|
206
+ | Hinglish | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_english_eval_hinglish_cm.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_english_eval_hinglish_roc.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_english_eval_hinglish_pr.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_english_eval_hinglish_f1.png) |
207
+ | Hindi | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_english_eval_hindi_cm.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_english_eval_hindi_roc.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_english_eval_hindi_pr.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_english_eval_hindi_f1.png) |
208
+ | English | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_english_eval_english_cm.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_english_eval_english_roc.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_english_eval_english_pr.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_english_eval_english_f1.png) |
209
+ | Full | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_english_eval_full_cm.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_english_eval_full_roc.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_english_eval_full_pr.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_english_eval_full_f1.png) |
210
+
211
+ ---
212
+
213
+ ## 8. Phase 4 — Full Dataset
214
+
215
+ **Training on the full shuffled dataset** (17,704 samples, 50 epochs). This consolidation phase exposes the model to all three languages simultaneously, balancing out the per-language biases accumulated during sequential training.
216
+
217
+ ![Full Training Curves](output_v2/figures/hinglish_to_hindi_to_english/Phase_Full_curves.png)
218
 
219
+ ### Evaluation after Phase 4 (Final Model)
220
+
221
+ | Eval On | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1 | ROC-AUC |
222
+ |---|---|---|---|---|---|---|---|
223
+ | Hinglish | 0.6326 | 0.6101 | 0.5426 | 0.4991 | 0.7210 | 0.5200 | 0.6161 |
224
+ | Hindi | 0.5748 | 0.5676 | 0.5286 | 0.4958 | 0.6393 | 0.5117 | 0.5941 |
225
+ | **English** | **0.7747** | **0.7746** | **0.7747** | **0.7678** | **0.7815** | **0.7712** | **0.8476** |
226
+ | **Full** | **0.6866** | **0.6839** | **0.6687** | **0.6449** | **0.7228** | **0.6566** | **0.7556** |
227
+
228
+ The Full phase restores balance across all languages. Hinglish specificity recovers to 0.721 (from 0.088 after English phase). Full-dataset AUC reaches **0.756** — the best of all phases. English performance is preserved at F1=0.771 while Hinglish and Hindi both improve substantially from their post-English-phase collapse.
229
+
230
+ | Eval On | Confusion Matrix | ROC | Precision-Recall | F1 vs Threshold |
231
+ |---|---|---|---|---|
232
+ | Hinglish | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_Full_eval_hinglish_cm.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_Full_eval_hinglish_roc.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_Full_eval_hinglish_pr.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_Full_eval_hinglish_f1.png) |
233
+ | Hindi | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_Full_eval_hindi_cm.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_Full_eval_hindi_roc.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_Full_eval_hindi_pr.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_Full_eval_hindi_f1.png) |
234
+ | English | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_Full_eval_english_cm.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_Full_eval_english_roc.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_Full_eval_english_pr.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_Full_eval_english_f1.png) |
235
+ | Full | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_Full_eval_full_cm.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_Full_eval_full_roc.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_Full_eval_full_pr.png) | ![](output_v2/figures/hinglish_to_hindi_to_english/Phase_Full_eval_full_f1.png) |
236
 
 ---

+ ## 9. Full Results Table

+ Complete 16-row cross-evaluation (Phase × Eval Language):

 | Phase | Eval On | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1 | ROC-AUC |
 |---|---|---|---|---|---|---|---|---|
 | hinglish | hinglish | 0.6688 | 0.6378 | 0.6058 | 0.4848 | 0.7908 | 0.5386 | 0.6579 |
+ | hinglish | hindi | 0.4493 | 0.5000 | 0.4493 | 1.0000 | 0.0000 | 0.6200 | 0.5234 |
+ | hinglish | english | 0.5171 | 0.5125 | 0.5738 | 0.0916 | 0.9334 | 0.1580 | 0.5620 |
 | hinglish | full | 0.5190 | 0.5133 | 0.4803 | 0.4331 | 0.5935 | 0.4555 | 0.5243 |
 | hindi | hinglish | 0.5409 | 0.4885 | 0.3761 | 0.2299 | 0.7470 | 0.2854 | 0.4771 |
+ | hindi | hindi | 0.5834 | 0.5730 | 0.5420 | 0.4705 | 0.6756 | 0.5037 | 0.5949 |
+ | hindi | english | 0.4711 | 0.4744 | 0.4789 | 0.7878 | 0.1611 | 0.5957 | 0.4292 |
 | hindi | full | 0.5190 | 0.5251 | 0.4859 | 0.6111 | 0.4390 | 0.5414 | 0.5255 |
 | english | hinglish | 0.4115 | 0.4938 | 0.3955 | 0.9002 | 0.0875 | 0.5495 | 0.4572 |
+ | english | hindi | 0.5424 | 0.5399 | 0.4912 | 0.5150 | 0.5648 | 0.5028 | 0.5377 |
+ | english | english | 0.7721 | 0.7726 | 0.7453 | 0.8190 | 0.7262 | 0.7804 | 0.8458 |
 | english | full | 0.6395 | 0.6458 | 0.5901 | 0.7337 | 0.5578 | 0.6541 | 0.6913 |
 | **Full** | **hinglish** | **0.6326** | **0.6101** | **0.5426** | **0.4991** | **0.7210** | **0.5200** | **0.6161** |
+ | **Full** | **hindi** | **0.5748** | **0.5676** | **0.5286** | **0.4958** | **0.6393** | **0.5117** | **0.5941** |
+ | **Full** | **english** | **0.7747** | **0.7746** | **0.7747** | **0.7678** | **0.7815** | **0.7712** | **0.8476** |
 | **Full** | **full** | **0.6866** | **0.6839** | **0.6687** | **0.6449** | **0.7228** | **0.6566** | **0.7556** |
 
 ### Key Observations

+ - **English phase is the sharpest turning point:** English F1 jumps from 0.596 (after Hindi) to 0.780 in one phase, driven by GloVe's English-centric embeddings.
+ - **Starting from Hinglish** forces generalisation from noise — the model reaches Hinglish F1=0.539 after only its own phase, a stronger start than Hinglish gets in most v1 orderings.
+ - **Catastrophic interference is visible:** Hinglish specificity drops from 0.791 → 0.747 → 0.088 as the model progressively shifts language bias. The Full phase restores it to 0.721.
+ - **Final Full phase AUC = 0.756** matches the best v1 strategies despite a harder starting language, confirming the robustness of the Hinglish-first approach with deeper training.
+ - **Hindi remains the hardest** (F1=0.512 at final) — consistent with GloVe's limited Hindi vocabulary coverage.

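Every column in these tables follows the standard binary-classification definitions; the helper below (a sketch with illustrative counts, not the repo's evaluation code) shows how any row derives from raw confusion-matrix counts:

```python
# Standard binary-classification metrics from confusion-matrix counts.
# Illustrative only; the repo's actual evaluation code is not shown here.
def metrics(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "balanced_acc": (recall + specificity) / 2,      # mean of the two rates
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

m = metrics(tp=80, fp=20, tn=70, fn=30)   # made-up counts for illustration
print(m["precision"], m["recall"])        # 0.8 and 80/110
```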
  ---

+ ## 10. How to Use

 ```python
 import json
+ import numpy as np
+ import tensorflow as tf
+ from tensorflow.keras.preprocessing.text import tokenizer_from_json
 from tensorflow.keras.preprocessing.sequence import pad_sequences
+ from huggingface_hub import hf_hub_download

+ # Load tokenizer (from v1 repo — same dataset/split)
+ tokenizer_path = hf_hub_download(repo_id="tuklu/SASC", filename="tokenizer.json")
+ with open(tokenizer_path) as f:
+     tokenizer = tokenizer_from_json(f.read())

+ # Load model
+ model_path = hf_hub_download(repo_id="tuklu/SASCv2", filename="model.h5")
+ model = tf.keras.models.load_model(model_path)

 # Predict
+ texts = ["I hate all of them", "Have a great day!"]
+ sequences = tokenizer.texts_to_sequences(texts)
+ padded = pad_sequences(sequences, maxlen=100)
+ probs = model.predict(padded).flatten()
+
+ for text, prob in zip(texts, probs):
+     label = "Hate Speech" if prob > 0.5 else "Non-Hate"
+     print(f"{label} ({prob:.3f}): {text}")
 ```

 ---

 ## Related

+ - **v1 (all 6 strategies, 8 epochs each):** [tuklu/SASC](https://huggingface.co/tuklu/SASC)
 - **Dataset:** [tuklu/nprism](https://huggingface.co/datasets/tuklu/nprism)
+
+ ---
+
+ ## Citation
+
+ ```
+ @misc{sasc2026,
+   title={Multilingual Hate Speech Detection via Sequential Transfer Learning (v2)},
+   author={tuklu},
+   year={2026},
+   publisher={HuggingFace},
+   url={https://huggingface.co/tuklu/SASCv2}
+ }
+ ```