tuklu committed on
Commit befd860 · verified · 1 Parent(s): 09d272f

Fix YAML metadata in model card

Files changed (1):
  1. README.md +316 -172

README.md CHANGED
@@ -1,102 +1,172 @@
- # Hate Speech Detection — Multilingual Sequential Transfer Learning
- ### GloVe Embeddings + Bidirectional LSTM (BiLSTM)

  ---

- ## What is this project about?

- This project builds a system that can automatically detect **hate speech** in text written in three languages:
- - **English** — standard English text
- - **Hindi** — Hindi text (transliterated or native script)
- - **Hinglish** — a mix of Hindi and English (very common in Indian social media)

- The core question we are trying to answer is:

- > **Does the order in which you teach a model different languages matter for how well it performs?**

- For example — is a model that learns English first, then Hindi, then Hinglish better or worse than one that learns Hinglish first?

  ---

- ## The Dataset

- | Property | Value |
  |---|---|
- | Total samples | 29,505 |
- | English samples | 14,994 (50.8%) |
- | Hindi samples | 9,738 (33.0%) |
- | Hinglish samples | 4,774 (16.2%) |
- | Hate speech (label=1) | 13,707 (46.5%) |
- | Non-hate speech (label=0) | 15,799 (53.5%) |

  ![Language Distribution](output/figures/language_distribution.png)

- The dataset was split into three parts:
- - **Training set** — 17,704 samples (used to teach the model)
- - **Validation set** — 2,950 samples (used to monitor learning during training)
- - **Test set** — 8,852 samples (used only at the end to measure real performance)

  ---

- ## The Model — What is GloVe + BiLSTM?
-
- Think of the model like a two-part reading machine:

- ### Part 1: GloVe Embeddings (the dictionary)
- Before the model can understand words, it needs to know what words *mean* relative to each other. GloVe (Global Vectors) is a pre-trained lookup table of **300,000+ English words**, where each word is represented as a list of 300 numbers that capture its meaning. Words with similar meanings end up with similar numbers.

- - We used `glove.6B.300d.txt` — trained on a 6-billion-token corpus, 300 dimensions
- - The embedding layer is **frozen** (not updated during training) — we keep GloVe's knowledge as-is and only train the layers on top

- ### Part 2: Bidirectional LSTM (the reader)
- An LSTM (Long Short-Term Memory) is a type of neural network designed to read sequences — like sentences — and remember what it read. **Bidirectional** means it reads the sentence both forwards and backwards, so it understands context from both directions.

- ```
- Input sentence
- ↓
- GloVe Embeddings (300d, frozen)
- ↓
- BiLSTM (128 units, reads left-to-right AND right-to-left)
- ↓
- Dropout (50% — randomly switches off neurons to prevent overfitting)
- ↓
- Dense layer (64 neurons, ReLU activation)
- ↓
- Output (1 neuron, Sigmoid — gives a probability 0 to 1)
- ↓
- > 0.5 = Hate Speech, ≤ 0.5 = Not Hate Speech
- ```

  ---

- ## The Training Strategy — What is Transfer Learning?

- **Transfer learning** means the model carries what it learned from one task into the next. Like a student who already knows French — learning Spanish is easier because both share Latin roots.

- In our case, we train the model on one language, and instead of starting fresh for the next language, we **keep all the weights (knowledge)** from the previous training. The model continues learning from where it left off.

- ### The Bug We Fixed
- The original code created a **brand new model** for every language — resetting all the weights each time. That is not transfer learning; it is just training three separate models. We fixed this by building the model **once** and sequentially fine-tuning it.

  ```python
- # WRONG — model reset every loop iteration
  for lang in languages:
-     model = Sequential()   # ← new model = no transfer learning
-     model.fit(...)

  # CORRECT — model built once, weights carry forward
- model = build_model()      # ← built once
  for lang in languages:
-     model.fit(...)         # ← continues learning from previous language
  ```

  ---

- ## Plan B — The Experiment

- We ran all **6 possible orderings** of the three languages, each followed by a final training round on the complete shuffled dataset:

- | # | Strategy |
  |---|---|
  | 1 | English → Hindi → Hinglish → Full |
  | 2 | English → Hinglish → Hindi → Full |
@@ -105,73 +175,124 @@ We ran all **6 possible orderings** of the three languages, each followed by a f
  | 5 | Hinglish → English → Hindi → Full |
  | 6 | Hinglish → Hindi → English → Full |

- For each strategy, training happens in 4 phases. **After each phase**, we immediately evaluate the model on that specific language's test data and record all metrics. This tells us how well the model performs at each stage of the learning journey.

  ```
- Phase 1: Train on Language A → Test on Language A test set → Record metrics + plots
- Phase 2: Train on Language B → Test on Language B test set → Record metrics + plots
- Phase 3: Train on Language C → Test on Language C test set → Record metrics + plots
- Phase 4: Train on Full data  → Test on Full test set       → Record metrics + plots
  ```

- Each phase used **8 epochs** with batch size 32 (64 for the full phase).

  ---

- ## Metrics — What do we measure?

- | Metric | What it means in plain English |
- |---|---|
- | **Accuracy** | Out of all predictions, how many were correct? |
- | **Balanced Accuracy** | Accuracy adjusted for class imbalance (fairer when classes are unequal) |
- | **Precision** | Of everything the model flagged as hate speech, how much actually was? |
- | **Recall** | Of all actual hate speech, how much did the model catch? |
- | **Specificity** | Of all non-hate speech, how much did the model correctly ignore? |
- | **F1 Score** | Balance between Precision and Recall (harmonic mean) |
- | **ROC-AUC** | Overall ability to distinguish hate from non-hate (1.0 = perfect) |
 
133
- ---
 
 
 
 
 
 
 
134
 
135
- ## Results Summary
136
 
137
- Full results are in `output/results_tables/all_strategies_results.csv`. Key highlights:
 
 
138
 
139
- ### English phase performance across strategies (best language)
140
 
141
- | Strategy | Accuracy | F1 | ROC-AUC |
142
- |---|---|---|---|
143
- | English β†’ Hindi β†’ Hinglish β†’ Full | 0.7701 | 0.7696 | 0.8504 |
144
- | English β†’ Hinglish β†’ Hindi β†’ Full | 0.7721 | 0.7743 | 0.8525 |
145
- | Hindi β†’ English β†’ Hinglish β†’ Full | 0.7780 | 0.7830 | 0.8549 |
146
- | Hindi β†’ Hinglish β†’ English β†’ Full | 0.7780 | 0.7816 | 0.8563 |
147
- | Hinglish β†’ English β†’ Hindi β†’ Full | 0.7716 | 0.7829 | 0.8484 |
148
- | Hinglish β†’ Hindi β†’ English β†’ Full | 0.7765 | 0.7811 | 0.8534 |
149
 
150
- ### Full dataset phase (final performance)
 
 
 
151
 
152
- | Strategy | Accuracy | F1 | ROC-AUC |
153
- |---|---|---|---|
154
- | English β†’ Hindi β†’ Hinglish β†’ Full | 0.6796 | 0.5923 | 0.7599 |
155
- | English β†’ Hinglish β†’ Hindi β†’ Full | 0.6813 | 0.6244 | 0.7535 |
156
- | Hindi β†’ English β†’ Hinglish β†’ Full | 0.6854 | 0.6419 | 0.7528 |
157
- | Hindi β†’ Hinglish β†’ English β†’ Full | 0.6865 | 0.6364 | 0.7507 |
158
- | Hinglish β†’ English β†’ Hindi β†’ Full | 0.6778 | 0.6285 | 0.7521 |
159
- | Hinglish β†’ Hindi β†’ English β†’ Full | 0.6845 | 0.6301 | 0.7548 |
160
-
161
- ### Key observations
162
- - **English** consistently achieves the highest accuracy (~77%) regardless of when it is trained β€” likely because GloVe embeddings are English-centric
163
- - **Hindi** is the hardest language β€” accuracy hovers around 55–59% across all strategies
164
- - **Hinglish** sits in the middle (~66–70%) which makes sense as it borrows heavily from English
165
- - Strategies that train **Hindi first** (`Hindi β†’ English β†’ Hinglish`) tend to recover better in later phases, suggesting the model benefits from tackling the hardest language early
166
- - The **Full phase** shows consistent ~68% accuracy across all strategies, suggesting the final shuffled training normalises the differences introduced by ordering
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
167
 
168
  ---

- ## Plots by Strategy

  ### Strategy 1: English → Hindi → Hinglish → Full

- | Phase | Training Curves | Confusion Matrix | ROC Curve | PR Curve | F1 Curve |
  |---|---|---|---|---|---|
  | English | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_curves.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_cm.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_roc.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_pr.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_f1.png) |
  | Hindi | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_curves.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_cm.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_roc.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_pr.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_f1.png) |
@@ -182,7 +303,14 @@ Full results are in `output/results_tables/all_strategies_results.csv`. Key high

  ### Strategy 2: English → Hinglish → Hindi → Full

- | Phase | Training Curves | Confusion Matrix | ROC Curve | PR Curve | F1 Curve |
  |---|---|---|---|---|---|
  | English | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_curves.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_cm.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_roc.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_pr.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_f1.png) |
  | Hinglish | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_curves.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_cm.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_roc.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_pr.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_f1.png) |
@@ -191,9 +319,18 @@ Full results are in `output/results_tables/all_strategies_results.csv`. Key high

  ---

- ### Strategy 3: Hindi → English → Hinglish → Full

- | Phase | Training Curves | Confusion Matrix | ROC Curve | PR Curve | F1 Curve |
  |---|---|---|---|---|---|
  | Hindi | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_curves.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_cm.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_roc.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_pr.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_f1.png) |
  | English | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_curves.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_cm.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_roc.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_pr.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_f1.png) |
@@ -204,7 +341,14 @@ Full results are in `output/results_tables/all_strategies_results.csv`. Key high

  ### Strategy 4: Hindi → Hinglish → English → Full

- | Phase | Training Curves | Confusion Matrix | ROC Curve | PR Curve | F1 Curve |
  |---|---|---|---|---|---|
  | Hindi | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_curves.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_cm.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_roc.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_pr.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_f1.png) |
  | Hinglish | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_curves.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_cm.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_roc.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_pr.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_f1.png) |
@@ -215,7 +359,14 @@ Full results are in `output/results_tables/all_strategies_results.csv`. Key high

  ### Strategy 5: Hinglish → English → Hindi → Full

- | Phase | Training Curves | Confusion Matrix | ROC Curve | PR Curve | F1 Curve |
  |---|---|---|---|---|---|
  | Hinglish | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_curves.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_cm.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_roc.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_pr.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_f1.png) |
  | English | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_curves.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_cm.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_roc.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_pr.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_f1.png) |
@@ -226,7 +377,14 @@ Full results are in `output/results_tables/all_strategies_results.csv`. Key high

  ### Strategy 6: Hinglish → Hindi → English → Full

- | Phase | Training Curves | Confusion Matrix | ROC Curve | PR Curve | F1 Curve |
  |---|---|---|---|---|---|
  | Hinglish | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_curves.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_cm.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_roc.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_pr.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_f1.png) |
  | Hindi | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_curves.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_cm.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_roc.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_pr.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_f1.png) |
@@ -235,76 +393,62 @@ Full results are in `output/results_tables/all_strategies_results.csv`. Key high

  ---

- ## Output Files
-
- ```
- output/
- ├── dataset_splits/
- │   ├── train.csv                      # 17,704 training samples
- │   ├── val.csv                        # 2,950 validation samples
- │   └── test.csv                       # 8,852 test samples
- │
- ├── results_tables/
- │   ├── all_strategies_results.csv     # All 24 rows (6 strategies × 4 phases)
- │   ├── english_to_hindi_to_hinglish_results.csv
- │   ├── english_to_hinglish_to_hindi_results.csv
- │   ├── hindi_to_english_to_hinglish_results.csv
- │   ├── hindi_to_hinglish_to_english_results.csv
- │   ├── hinglish_to_english_to_hindi_results.csv
- │   └── hinglish_to_hindi_to_english_results.csv
- │
- └── figures/
-     ├── language_distribution.png      # Pie chart of dataset languages
-     │
-     ├── english_to_hindi_to_hinglish/  # One folder per strategy
-     │   ├── *_[english]_curves.png     # Train/Val accuracy + loss
-     │   ├── *_[english]_cm.png         # Confusion matrix
-     │   ├── *_[english]_roc.png        # ROC curve
-     │   ├── *_[english]_pr.png         # Precision-Recall curve
-     │   ├── *_[english]_f1.png         # F1 vs Threshold curve
-     │   ├── *_[hindi]_curves.png
-     │   ├── *_[hindi]_cm.png ...
-     │   ├── *_[hinglish]_curves.png
-     │   ├── *_[hinglish]_cm.png ...
-     │   ├── *_[Full]_curves.png
-     │   └── *_[Full]_cm.png ...
-     │
-     ├── english_to_hinglish_to_hindi/
-     ├── hindi_to_english_to_hinglish/
-     ├── hindi_to_hinglish_to_english/
-     ├── hinglish_to_english_to_hindi/
-     └── hinglish_to_hindi_to_english/
- ```

- ---

- ## How to Run

- ### Requirements
- ```bash
- pip install tensorflow scikit-learn pandas seaborn matplotlib
- ```

- You also need GloVe embeddings (`glove.6B.300d.txt`) placed at `/root/glove.6B.300d.txt`:
- ```bash
- wget http://nlp.stanford.edu/data/glove.6B.zip && unzip glove.6B.zip
- ```

- ### Run
- ```bash
- python main.py
  ```

- Training was performed on an NVIDIA H200 GPU (Vast.ai) — total runtime approximately 15–20 minutes for all 6 strategies.
-
  ---

- ## Project Structure

  ```
- SASC/
- ├── main.py          # Full training + evaluation pipeline
- ├── dataset.csv      # Raw dataset (29,505 samples)
- ├── README.md        # This file
- └── output/          # All results, figures, and model checkpoints
  ```
 
+ ---
+ language:
+ - en
+ - hi
+ tags:
+ - hate-speech
+ - text-classification
+ - bilstm
+ - glove
+ - multilingual
+ - transfer-learning
+ - hinglish
+ - sequential-learning
+ datasets:
+ - tuklu/nprism
+ license: mit
+ model-index:
+ - name: hate-speech-multilingual-bilstm
+   results:
+   - task:
+       type: text-classification
+       name: Hate Speech Detection
+     dataset:
+       name: nprism
+       type: tuklu/nprism
+     metrics:
+     - type: f1
+       value: 0.6419
+       name: F1 Score (Best Strategy - Full Phase)
+     - type: accuracy
+       value: 0.6854
+       name: Accuracy (Best Strategy - Full Phase)
+     - type: roc_auc
+       value: 0.7528
+       name: ROC-AUC (Best Strategy - Full Phase)
+ ---
+
+ # Multilingual Hate Speech Detection — GloVe + BiLSTM
+
+ **Task:** Binary text classification (Hate / Non-Hate)
+ **Languages:** English, Hindi, Hinglish (Hindi-English code-mixed)
+ **Architecture:** Bidirectional LSTM with frozen GloVe embeddings
+ **Best Strategy:** Hindi → English → Hinglish → Full (F1: 0.6419, AUC: 0.7528)

  ---

+ ## Table of Contents
+ 1. [What This Project Does](#1-what-this-project-does)
+ 2. [The Dataset](#2-the-dataset)
+ 3. [Model Architecture](#3-model-architecture)
+ 4. [The Core Idea — Transfer Learning](#4-the-core-idea--transfer-learning)
+ 5. [The Experiment — Plan B](#5-the-experiment--plan-b)
+ 6. [Results & Best Model Selection](#6-results--best-model-selection)
+ 7. [Full Results by Strategy](#7-full-results-by-strategy)
+ 8. [All Model Checkpoints](#8-all-model-checkpoints)
+ 9. [How to Use](#9-how-to-use)

+ ---

+ ## 1. What This Project Does

+ This project investigates whether the **order of language exposure** during sequential transfer learning affects a model's ability to detect hate speech across three languages: English, Hindi, and Hinglish.

+ The key question:
+
+ > If you train a model on English first, then Hindi, then Hinglish — does it perform better or worse than training on Hinglish first?
+
+ We ran all **6 possible orderings**, each followed by a final training pass on the complete shuffled dataset, and measured performance after every phase.

  ---

+ ## 2. The Dataset

+ Dataset: [tuklu/nprism](https://huggingface.co/datasets/tuklu/nprism)
+
+ | Split | Samples |
  |---|---|
+ | Train | 17,704 |
+ | Validation | 2,950 |
+ | Test | 8,852 |
+ | **Total** | **29,505** |
+
+ | Language | Count | % |
+ |---|---|---|
+ | English | 14,994 | 50.8% |
+ | Hindi | 9,738 | 33.0% |
+ | Hinglish | 4,774 | 16.2% |
+
+ | Label | Count | % |
+ |---|---|---|
+ | Non-Hate (0) | 15,799 | 53.5% |
+ | Hate (1) | 13,707 | 46.5% |

  ![Language Distribution](output/figures/language_distribution.png)

+ The pie chart above shows the dataset is dominated by English (50.8%), with Hindi and Hinglish making up the rest. This imbalance matters: the model sees more English examples, and GloVe embeddings are English-centric, which helps explain why the English phase consistently achieves the highest accuracy.

  ---

+ ## 3. Model Architecture

+ ```
+ Input: Text sequence (max 100 tokens)
+        ↓
+ GloVe Embedding Layer (vocab: 50,000 × 300d) — FROZEN
+        ↓
+ Bidirectional LSTM (128 units)
+    → reads sentence left-to-right AND right-to-left
+    → captures context from both directions
+        ↓
+ Dropout (0.5) — randomly disables 50% of neurons during training
+    → prevents memorising training data (overfitting)
+        ↓
+ Dense Layer (64 neurons, ReLU activation)
+        ↓
+ Output Layer (1 neuron, Sigmoid)
+    → outputs probability 0.0 to 1.0
+    → > 0.5 = Hate Speech
+    → ≤ 0.5 = Not Hate Speech
+ ```

+ **Why GloVe?**
+ GloVe (Global Vectors) is a pre-trained word embedding trained on 6 billion tokens. Each word becomes a 300-dimensional vector that captures semantic meaning — "hate" and "violence" end up close together in this space. We freeze the embedding layer (it is not updated during training) to preserve this general knowledge and train only the layers on top.
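
To make the lookup concrete, here is a minimal sketch of turning a GloVe text file into an embedding matrix. The three-dimensional vectors and the `word_index` mapping are invented for the example; the real file is `glove.6B.300d.txt` with 300 dimensions per word:

```python
import io
import numpy as np

# Toy stand-in for glove.6B.300d.txt (real vectors have 300 dimensions).
glove_file = io.StringIO(
    "hate 0.1 0.2 0.3\n"
    "violence 0.1 0.2 0.4\n"
    "peace -0.5 0.0 0.2\n"
)

# Hypothetical tokenizer vocabulary: word -> integer index (0 reserved for padding).
word_index = {"hate": 1, "violence": 2, "peace": 3, "kuchbhi": 4}

embedding_dim = 3
# Rows stay all-zero for words missing from GloVe, so out-of-vocabulary
# tokens carry no pre-trained signal.
embedding_matrix = np.zeros((len(word_index) + 1, embedding_dim))

for line in glove_file:
    word, *values = line.split()
    if word in word_index:
        embedding_matrix[word_index[word]] = np.asarray(values, dtype="float32")

print(embedding_matrix[1])   # pre-trained vector for "hate"
print(embedding_matrix[4])   # all zeros: word not found in GloVe
```

Words absent from GloVe (which includes most Hindi and Hinglish tokens) keep those all-zero rows — one concrete reason the Hindi phases are hard for this model.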

+ **Why BiLSTM?**
+ A regular LSTM reads text left to right. A BiLSTM reads it both ways and combines the results. The sentence *"I don't hate you"* needs both directions to understand the negation — the word "don't" only makes sense in the context of what comes after it.

+ **Training config:**
+ - Optimizer: Adam
+ - Loss: Binary Cross-Entropy
+ - Epochs per phase: 8
+ - Batch size: 32 (64 for the full phase)
+ - Max sequence length: 100 tokens
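
Putting the architecture and config together, a sketch of the model in Keras (the sizes follow the card; `embedding_matrix` is a zero-filled stand-in for the real GloVe weights, and the exact code in `main.py` may differ):

```python
import numpy as np
from tensorflow.keras import layers, models

# Sizes from the card: 50,000-word vocab, 300-d embeddings, 100-token inputs.
VOCAB_SIZE, EMBED_DIM, MAX_LEN = 50_000, 300, 100
embedding_matrix = np.zeros((VOCAB_SIZE, EMBED_DIM), dtype="float32")  # stand-in for GloVe vectors

def build_model():
    model = models.Sequential([
        layers.Embedding(VOCAB_SIZE, EMBED_DIM, trainable=False),  # frozen embedding layer
        layers.Bidirectional(layers.LSTM(128)),                    # reads both directions
        layers.Dropout(0.5),                                       # fights overfitting
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),                     # probability of hate speech
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_model()
# A dummy batch both builds the model and confirms shapes: (batch, MAX_LEN) -> (batch, 1).
probs = model.predict(np.zeros((2, MAX_LEN), dtype="int32"), verbose=0)
# Once built, the frozen layer can be filled with the pre-trained vectors.
model.layers[0].set_weights([embedding_matrix])
```

Because the embedding layer is `trainable=False`, `model.fit` only updates the BiLSTM and dense layers, which is exactly the "keep GloVe's knowledge as-is" behaviour described above.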
 
 
 
 
 
 
 
 
 

  ---

+ ## 4. The Core Idea — Transfer Learning
+
+ **Transfer learning** = the model keeps what it learned from one task when starting the next one.

+ Think of it like a student who already knows French — learning Spanish is faster because both share Latin roots. The vocabulary, grammar intuitions, and reading skills transfer.

+ In our case: train on English → the model learns what hate-speech patterns look like in a language GloVe understands well → then fine-tune on Hindi → the model adapts those patterns to Hindi → then Hinglish → the model adapts again, using everything it knows.

+ ### The Bug That Was Fixed
+
+ The original code was reinitialising the model inside the loop — meaning **every language got a brand new, untrained model**. That is not transfer learning at all.

  ```python
+ # WRONG — model reset every iteration, no knowledge transfer
  for lang in languages:
+     model = Sequential()   # ← destroys all previous learning
+     model.fit(X_lang, ...)

  # CORRECT — model built once, weights carry forward
+ model = build_model()      # ← built once, before the loop
  for lang in languages:
+     model.fit(X_lang, ...) # ← each fit continues from where the previous one left off
  ```

+ This single fix is the entire point of the experiment.
+
  ---

+ ## 5. The Experiment — Plan B

+ We tested all 6 permutations of [English, Hindi, Hinglish], each ending with a full shuffled-dataset phase:

+ | # | Training Order |
  |---|---|
  | 1 | English → Hindi → Hinglish → Full |
  | 2 | English → Hinglish → Hindi → Full |
 
  | 5 | Hinglish → English → Hindi → Full |
  | 6 | Hinglish → Hindi → English → Full |

+ **After each phase**, the model is immediately evaluated on **that specific language's test subset**. So for strategy `English → Hindi → Hinglish → Full`:

  ```
+ Train on English   → evaluate English test set  → save metrics + plots
+ Train on Hindi     → evaluate Hindi test set    → save metrics + plots
+ Train on Hinglish  → evaluate Hinglish test set → save metrics + plots
+ Train on Full data → evaluate full test set     → save metrics + plots
  ```

+ This gives us 4 snapshots per strategy — letting us see exactly how the model evolves as it learns each new language.
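
A data-side sketch of this schedule, assuming language-tagged `(text, label, language)` rows; the sample rows and the `subset` helper are invented for illustration, with the actual fit/evaluate calls left as comments:

```python
# Toy language-tagged samples standing in for the real train/test splits.
train = [("text a", 1, "english"), ("text b", 0, "hindi"), ("text c", 1, "hinglish")]
test  = [("text d", 0, "english"), ("text e", 1, "hindi"), ("text f", 0, "hinglish")]

def subset(rows, lang):
    """All rows belonging to one language."""
    return [r for r in rows if r[2] == lang]

strategy = ["hindi", "english", "hinglish"]   # best ordering from the card

# Phases 1-3: one language at a time; phase 4: the full shuffled set.
phases = [(lang, subset(train, lang), subset(test, lang)) for lang in strategy]
phases.append(("full", train, test))

for name, train_rows, test_rows in phases:
    # model.fit(train_rows); metrics = model.evaluate(test_rows)  # per-phase snapshot
    print(name, len(train_rows), "train /", len(test_rows), "test")
```

The same `model` object runs through all four phases, which is what makes each snapshot a measurement of accumulated (not fresh) learning.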

  ---

+ ## 6. Results & Best Model Selection

+ ### Full Phase Results (Final Model Performance)

+ | Strategy | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1 | ROC-AUC |
+ |---|---|---|---|---|---|---|---|
+ | **Hindi → English → Hinglish → Full** | 0.6854 | **0.6802** | 0.6810 | 0.6070 | 0.7534 | **0.6419** | 0.7528 |
+ | Hindi → Hinglish → English → Full | **0.6865** | 0.6801 | 0.6900 | 0.5905 | 0.7698 | 0.6364 | 0.7507 |
+ | Hinglish → Hindi → English → Full | 0.6845 | 0.6775 | 0.6918 | 0.5786 | 0.7764 | 0.6301 | **0.7548** |
+ | English → Hinglish → Hindi → Full | 0.6813 | 0.6740 | 0.6899 | 0.5703 | 0.7776 | 0.6244 | 0.7535 |
+ | Hinglish → English → Hindi → Full | 0.6778 | 0.6718 | 0.6768 | 0.5866 | 0.7570 | 0.6285 | 0.7521 |
+ | English → Hindi → Hinglish → Full | 0.6796 | 0.6678 | 0.7243 | 0.5010 | 0.8346 | 0.5923 | 0.7599 |

+ ### Why Hindi → English → Hinglish → Full is the Best Model

+ **F1 Score is the most important metric here.** For hate speech detection, we need to balance two things:
+ - **Precision** — don't falsely flag innocent content as hate
+ - **Recall** — don't miss actual hate speech

+ F1 is the harmonic mean of both. A model that misses half the hate speech (low recall) or flags everything as hate (low precision) is useless in practice.

+ Look at `English → Hindi → Hinglish → Full` — it has the highest ROC-AUC (0.7599) but an F1 of only 0.5923. Why? Its Recall is only 0.5010 — it misses **half of all hate speech**. A high ROC-AUC can be misleading when the decision threshold is poorly calibrated.
 
 
 
 
 
 
 

+ `Hindi → English → Hinglish → Full` has:
+ - Best F1 (0.6419) — the best balance of precision and recall
+ - Best Balanced Accuracy (0.6802) — the fairest across both classes
+ - Recall of 0.607 — catches significantly more hate speech than the alternatives

+ **Why does Hindi-first work better?**
+
+ Hindi is the hardest language for this model (GloVe has limited Hindi coverage). Training on Hindi *first* forces the model to develop general hate-speech-detection features that do not depend on GloVe's English-centric embeddings. It learns to detect patterns from context and sequence rather than relying on word meanings alone. When English comes next, the model improves dramatically and carries these robust features forward. English-first strategies give the model an easy start, but it never develops the robustness needed for low-resource languages.
+
+ ### Best Model Training Curves (Hindi → English → Hinglish → Full)
+
+ **Phase 1: Train on Hindi**
+
+ ![Hindi Training Curves](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_curves.png)
+
+ The model starts cold on Hindi. Accuracy is low (~55–57%) and validation loss is unstable — this is expected. GloVe does not cover Hindi well, so the model is learning purely from sequential patterns. The struggle here is valuable: it forces the model to build language-agnostic features.
+
+ **Phase 2: Train on English**
+
+ ![English Training Curves](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_curves.png)
+
+ Dramatic improvement: the model jumps to ~77–78% accuracy. GloVe embeddings now align well with the input language. Notice that it does not start from scratch — the Hindi training gave it a base of sequential hate-speech patterns, and with English vocabulary the model improves rapidly.
+
+ **Phase 3: Train on Hinglish**
+
+ ![Hinglish Training Curves](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hinglish]_curves.png)
+
+ Hinglish is code-mixed — it borrows from both languages the model already knows. Training accuracy climbs to ~68–69%. The model adapts its existing knowledge to handle the mixed vocabulary.
+
+ **Phase 4: Train on Full Dataset**
+
+ ![Full Training Curves](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[Full]_curves.png)
+
+ Final fine-tuning on all 17,704 shuffled training samples. Training and validation accuracy converge and the loss stabilises. This phase consolidates all language knowledge into the final model.
+
+ ### Best Model Evaluation Charts
+
+ **Confusion Matrix:**
+
+ ![Confusion Matrix](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[Full]_cm.png)
+
+ Shows actual vs. predicted counts. A well-balanced confusion matrix means the model is not biased toward one class: True Positives (hate correctly identified) and True Negatives (non-hate correctly identified) should both be high.
+
+ **ROC Curve (AUC = 0.7528):**
+
+ ![ROC Curve](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[Full]_roc.png)
+
+ The ROC curve shows the trade-off between True Positive Rate (catching hate speech) and False Positive Rate (wrongly flagging non-hate). An AUC of 0.7528 means the model has a 75.3% chance of ranking a randomly chosen hate speech example higher than a randomly chosen non-hate example — significantly better than random (0.5).
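
That ranking interpretation follows directly from the definition: over all (hate, non-hate) pairs, AUC equals the fraction of pairs where the hate example gets the higher score, with ties counting half. A toy check with made-up scores:

```python
# Made-up model scores: higher should mean "more likely hate speech".
pos_scores = [0.9, 0.7, 0.4]   # scores given to actual hate examples
neg_scores = [0.8, 0.3, 0.2]   # scores given to actual non-hate examples

def auc_by_ranking(pos, neg):
    """AUC as the win rate of positives over negatives (ties count 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

print(auc_by_ranking(pos_scores, neg_scores))
```

Note that this quantity never looks at a decision threshold — which is exactly why a decent AUC can coexist with a poor F1, as in the English-first strategy above.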

**Precision-Recall Curve:**

![PR Curve](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[Full]_pr.png)

Shows the trade-off between precision and recall at different thresholds. A curve that stays high across recall values means the model maintains good precision even as it catches more hate speech, which is useful for choosing an operating threshold based on deployment requirements.

**F1 vs Threshold Curve:**

![F1 Curve](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[Full]_f1.png)

Shows the F1 score at every possible decision threshold. The peak sits near 0.5, confirming that the default threshold is well calibrated. For a high-recall deployment (catch all hate speech, even at the cost of false positives), lower the threshold; for high precision (flag only near-certain hate speech), raise it.
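
Picking that operating point can be automated with a simple sweep. The probabilities and labels below are made up for illustration; with the real model you would sweep over validation-set probabilities:

```python
# Sweep decision thresholds and report the F1-maximising one.
def f1_at(threshold, probs, y_true):
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, t in zip(preds, y_true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, y_true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, y_true) if p == 0 and t == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

probs = [0.92, 0.81, 0.55, 0.48, 0.30, 0.12]  # hypothetical model outputs
y_true = [1, 1, 1, 0, 1, 0]                   # hypothetical labels

best_f1, best_threshold = max(
    (f1_at(t / 100, probs, y_true), t / 100) for t in range(1, 100)
)
print(best_f1, best_threshold)
```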

---

## 7. Full Results by Strategy

### Strategy 1: English → Hindi → Hinglish → Full

| Phase | Accuracy | F1 | ROC-AUC |
|---|---|---|---|
| English | 0.7701 | 0.7696 | 0.8504 |
| Hindi | 0.5507 | 0.0000 | 0.5689 |
| Hinglish | 0.6780 | 0.5155 | 0.6691 |
| Full | 0.6796 | 0.5923 | 0.7599 |

**Note on the Hindi phase row** (full metrics: Precision = 0, Recall = 0, F1 = 0, Specificity = 1.0): this is not a data error. After training only on English, the model predicted **zero hate speech** for every Hindi test sample; it classified everything as non-hate. This means:
- Specificity = 1.0 ✓ (no false positives, because it never predicts hate at all)
- Recall = 0.0 (catches zero actual hate speech)
- F1 = 0.0 (completely useless for Hindi at this stage)

This is the strongest evidence that English-first is the wrong order: the model becomes so tuned to English patterns that it cannot generalise to Hindi at all.
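
Those degenerate values follow mechanically from the metric definitions. The sketch below scores a toy all-negative classifier (hypothetical labels, not the real Hindi test set) and reproduces the same pattern:

```python
# Standard binary metrics; an all-negative predictor yields
# precision = recall = F1 = 0 while specificity = 1.
def binary_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return precision, recall, f1, specificity

y_true = [1, 0, 1, 1, 0]      # toy labels: three hate, two non-hate
y_pred = [0] * len(y_true)    # the model never predicts hate
print(binary_metrics(y_true, y_pred))  # (0.0, 0.0, 0.0, 1.0)
```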

| Phase | Training Curves | Confusion Matrix | ROC | PR | F1 Curve |
|---|---|---|---|---|---|
| English | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_curves.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_cm.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_roc.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_pr.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[english]_f1.png) |
| Hindi | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_curves.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_cm.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_roc.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_pr.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hindi]_f1.png) |
| Hinglish | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hinglish]_curves.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hinglish]_cm.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hinglish]_roc.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hinglish]_pr.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[hinglish]_f1.png) |
| Full | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[Full]_curves.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[Full]_cm.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[Full]_roc.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[Full]_pr.png) | ![](output/figures/english_to_hindi_to_hinglish/english_to_hindi_to_hinglish_[Full]_f1.png) |
 
### Strategy 2: English → Hinglish → Hindi → Full

| Phase | Accuracy | F1 | ROC-AUC |
|---|---|---|---|
| English | 0.7721 | 0.7743 | 0.8525 |
| Hinglish | 0.6631 | 0.5460 | 0.6899 |
| Hindi | 0.5810 | 0.4444 | 0.5975 |
| Full | 0.6813 | 0.6244 | 0.7535 |

| Phase | Training Curves | Confusion Matrix | ROC | PR | F1 Curve |
|---|---|---|---|---|---|
| English | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_curves.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_cm.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_roc.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_pr.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[english]_f1.png) |
| Hinglish | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_curves.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_cm.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_roc.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_pr.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hinglish]_f1.png) |
| Hindi | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hindi]_curves.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hindi]_cm.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hindi]_roc.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hindi]_pr.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[hindi]_f1.png) |
| Full | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[Full]_curves.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[Full]_cm.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[Full]_roc.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[Full]_pr.png) | ![](output/figures/english_to_hinglish_to_hindi/english_to_hinglish_to_hindi_[Full]_f1.png) |
319
 
320
  ---
321
 
322
+ ### Strategy 3: Hindi β†’ English β†’ Hinglish β†’ Full ⭐ BEST MODEL
323
+
324
+ | Phase | Accuracy | F1 | ROC-AUC |
325
+ |---|---|---|---|
326
+ | Hindi | 0.5662 | 0.2860 | 0.5748 |
327
+ | English | 0.7780 | 0.7830 | 0.8549 |
328
+ | Hinglish | 0.6880 | 0.5641 | 0.7172 |
329
+ | **Full** | **0.6854** | **0.6419** | **0.7528** |
330
+
331
+ Starting with the hardest language (Hindi) builds robustness. Despite the rough start, the model recovers strongly and achieves the best final F1.
332
 
333
+ | Phase | Training Curves | Confusion Matrix | ROC | PR | F1 Curve |
334
  |---|---|---|---|---|---|
335
  | Hindi | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_curves.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_cm.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_roc.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_pr.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[hindi]_f1.png) |
336
  | English | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_curves.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_cm.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_roc.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_pr.png) | ![](output/figures/hindi_to_english_to_hinglish/hindi_to_english_to_hinglish_[english]_f1.png) |
 
### Strategy 4: Hindi → Hinglish → English → Full

| Phase | Accuracy | F1 | ROC-AUC |
|---|---|---|---|
| Hindi | 0.5779 | 0.3898 | 0.5972 |
| Hinglish | 0.6986 | 0.5289 | 0.7109 |
| English | 0.7780 | 0.7816 | 0.8563 |
| Full | 0.6865 | 0.6364 | 0.7507 |

| Phase | Training Curves | Confusion Matrix | ROC | PR | F1 Curve |
|---|---|---|---|---|---|
| Hindi | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_curves.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_cm.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_roc.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_pr.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hindi]_f1.png) |
| Hinglish | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_curves.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_cm.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_roc.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_pr.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[hinglish]_f1.png) |
| English | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[english]_curves.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[english]_cm.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[english]_roc.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[english]_pr.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[english]_f1.png) |
| Full | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[Full]_curves.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[Full]_cm.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[Full]_roc.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[Full]_pr.png) | ![](output/figures/hindi_to_hinglish_to_english/hindi_to_hinglish_to_english_[Full]_f1.png) |
 
### Strategy 5: Hinglish → English → Hindi → Full

| Phase | Accuracy | F1 | ROC-AUC |
|---|---|---|---|
| Hinglish | 0.6652 | 0.5119 | 0.6692 |
| English | 0.7716 | 0.7829 | 0.8484 |
| Hindi | 0.5638 | 0.2466 | 0.5982 |
| Full | 0.6778 | 0.6285 | 0.7521 |

| Phase | Training Curves | Confusion Matrix | ROC | PR | F1 Curve |
|---|---|---|---|---|---|
| Hinglish | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_curves.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_cm.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_roc.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_pr.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hinglish]_f1.png) |
| English | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_curves.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_cm.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_roc.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_pr.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[english]_f1.png) |
| Hindi | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hindi]_curves.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hindi]_cm.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hindi]_roc.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hindi]_pr.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[hindi]_f1.png) |
| Full | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[Full]_curves.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[Full]_cm.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[Full]_roc.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[Full]_pr.png) | ![](output/figures/hinglish_to_english_to_hindi/hinglish_to_english_to_hindi_[Full]_f1.png) |
 
### Strategy 6: Hinglish → Hindi → English → Full

| Phase | Accuracy | F1 | ROC-AUC |
|---|---|---|---|
| Hinglish | 0.6837 | 0.5369 | 0.6929 |
| Hindi | 0.5924 | 0.4656 | 0.5964 |
| English | 0.7765 | 0.7811 | 0.8534 |
| Full | 0.6845 | 0.6301 | 0.7548 |

| Phase | Training Curves | Confusion Matrix | ROC | PR | F1 Curve |
|---|---|---|---|---|---|
| Hinglish | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_curves.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_cm.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_roc.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_pr.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hinglish]_f1.png) |
| Hindi | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_curves.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_cm.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_roc.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_pr.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[hindi]_f1.png) |
| English | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[english]_curves.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[english]_cm.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[english]_roc.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[english]_pr.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[english]_f1.png) |
| Full | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[Full]_curves.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[Full]_cm.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[Full]_roc.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[Full]_pr.png) | ![](output/figures/hinglish_to_hindi_to_english/hinglish_to_hindi_to_english_[Full]_f1.png) |
 
---

## 8. All Model Checkpoints
 
All six trained models are available as archives in the `models/` folder of this repo, and `model.h5` at the repo root is a copy of the best one. Each filename encodes the training order. Sorted by final F1:

| File | Strategy | Final F1 | Final AUC |
|---|---|---|---|
| `model.h5` | Hindi → English → Hinglish → Full ⭐ | 0.6419 | 0.7528 |
| `models/planB_hindi_to_english_to_hinglish_Full.h5` | Hindi → English → Hinglish → Full | 0.6419 | 0.7528 |
| `models/planB_hindi_to_hinglish_to_english_Full.h5` | Hindi → Hinglish → English → Full | 0.6364 | 0.7507 |
| `models/planB_hinglish_to_hindi_to_english_Full.h5` | Hinglish → Hindi → English → Full | 0.6301 | 0.7548 |
| `models/planB_hinglish_to_english_to_hindi_Full.h5` | Hinglish → English → Hindi → Full | 0.6285 | 0.7521 |
| `models/planB_english_to_hinglish_to_hindi_Full.h5` | English → Hinglish → Hindi → Full | 0.6244 | 0.7535 |
| `models/planB_english_to_hindi_to_hinglish_Full.h5` | English → Hindi → Hinglish → Full | 0.5923 | 0.7599 |

---

## 9. How to Use

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.text import tokenizer_from_json
from tensorflow.keras.preprocessing.sequence import pad_sequences
from huggingface_hub import hf_hub_download

# Load the fitted tokenizer
tokenizer_path = hf_hub_download(repo_id="tuklu/SASC", filename="tokenizer.json")
with open(tokenizer_path) as f:
    tokenizer = tokenizer_from_json(f.read())

# Load the best model
model_path = hf_hub_download(repo_id="tuklu/SASC", filename="model.h5")
model = tf.keras.models.load_model(model_path)

# Predict
texts = ["I hate all of them", "Have a great day!"]
sequences = tokenizer.texts_to_sequences(texts)
padded = pad_sequences(sequences, maxlen=100)
probs = model.predict(padded).flatten()

for text, prob in zip(texts, probs):
    label = "Hate Speech" if prob > 0.5 else "Non-Hate"
    print(f"{label} ({prob:.3f}): {text}")
```

---

## Citation

```bibtex
@misc{sasc2026,
  title={Multilingual Hate Speech Detection via Sequential Transfer Learning},
  author={tuklu},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/tuklu/SASC}
}
```