---
language:
- en
- hi
tags:
- hate-speech
- text-classification
- bilstm
- glove
- multilingual
- transfer-learning
- hinglish
- sequential-learning
datasets:
- tuklu/nprism
license: mit
model-index:
- name: hate-speech-multilingual-bilstm-v2
  results:
  - task:
      type: text-classification
      name: Hate Speech Detection
    dataset:
      name: nprism
      type: tuklu/nprism
    metrics:
    - type: f1
      value: 0.6566
      name: F1 Score (Full Phase — Full Test)
    - type: accuracy
      value: 0.6866
      name: Accuracy (Full Phase — Full Test)
    - type: roc_auc
      value: 0.7556
      name: ROC-AUC (Full Phase — Full Test)
---

# Multilingual Hate Speech Detection — GloVe + BiLSTM (v2)

**Task:** Binary text classification (Hate / Non-Hate)
**Languages:** English, Hindi, Hinglish (Hindi-English code-mixed)
**Architecture:** Bidirectional LSTM with frozen GloVe embeddings
**Strategy:** Hinglish → Hindi → English → Full (50 epochs per phase, 200 total)

---

## Table of Contents
1. [What This Experiment Does](#1-what-this-experiment-does)
2. [The Dataset](#2-the-dataset)
3. [Model Architecture](#3-model-architecture)
4. [Training Strategy](#4-training-strategy)
5. [Phase 1 — Hinglish](#5-phase-1--hinglish)
6. [Phase 2 — Hindi](#6-phase-2--hindi)
7. [Phase 3 — English](#7-phase-3--english)
8. [Phase 4 — Full Dataset](#8-phase-4--full-dataset)
9. [Full Results Table](#9-full-results-table)
10. [How to Use](#10-how-to-use)

---
## 1. What This Experiment Does

This is **v2** of the SASC sequential transfer learning experiment.

v1 ran all 6 permutations of [English, Hindi, Hinglish] with **8 epochs** per phase. v2 focuses on a single strategy — `Hinglish → Hindi → English → Full` — but trains for **50 epochs per phase (200 total)**. The key new addition: after every phase the model is evaluated on **all three individual language test sets AND the full test set**, giving a complete 4×4 cross-evaluation matrix showing how knowledge transfers across languages.

---
## 2. The Dataset

Dataset: [tuklu/nprism](https://huggingface.co/datasets/tuklu/nprism)

| Split | Samples |
|---|---|
| Train | 17,704 |
| Validation | 2,950 |
| Test | 8,852 |
| **Total** | **29,506** |

| Language | Count | % |
|---|---|---|
| English | 14,994 | 50.8% |
| Hindi | 9,738 | 33.0% |
| Hinglish | 4,774 | 16.2% |

| Label | Count | % |
|---|---|---|
| Non-Hate (0) | 15,799 | 53.5% |
| Hate (1) | 13,707 | 46.5% |

*(Figure: Dataset Distribution)*

The dataset is dominated by English (50.8%). GloVe embeddings are also English-centric, which largely explains why the English phase produces the sharpest accuracy jump regardless of training order.
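
The language percentages above follow directly from the raw counts; a quick sanity check (counts copied from the table):

```python
from collections import Counter

# Per-language sample counts, copied from the table above.
lang_counts = Counter({"english": 14994, "hindi": 9738, "hinglish": 4774})
total = sum(lang_counts.values())  # 29,506 = train + validation + test

for lang, n in lang_counts.most_common():
    print(f"{lang}: {n} ({100 * n / total:.1f}%)")
# english: 14994 (50.8%)
# hindi: 9738 (33.0%)
# hinglish: 4774 (16.2%)
```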

---

## 3. Model Architecture

```
Input: Text sequence (max 100 tokens)
  ↓
GloVe Embedding Layer (vocab: 50,000 × 300d) — FROZEN
  ↓
Bidirectional LSTM (128 units)
  → reads sentence left-to-right AND right-to-left
  ↓
Dropout (0.5)
  ↓
Dense Layer (64 neurons, ReLU)
  ↓
Output Layer (1 neuron, Sigmoid)
  → > 0.5 = Hate Speech | ≤ 0.5 = Non-Hate
```

- **Optimizer:** Adam
- **Loss:** Binary Cross-Entropy
- **Max sequence length:** 100 tokens
- **Vocab size:** 50,000
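
A minimal Keras sketch of this architecture, assuming a GloVe `embedding_matrix` of shape 50,000 × 300 has already been built from the tokenizer's vocabulary (the zero matrix below is a placeholder, not the real weights):

```python
import numpy as np
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, MAX_LEN = 50_000, 300, 100

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM, trainable=False),  # frozen GloVe
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(hate)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Load the (stand-in) GloVe matrix into the frozen embedding layer.
embedding_matrix = np.zeros((VOCAB_SIZE, EMBED_DIM), dtype="float32")  # placeholder
model.layers[0].set_weights([embedding_matrix])
```

With the embeddings frozen, only the BiLSTM and dense layers (roughly 456K parameters) are updated during training.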

---

## 4. Training Strategy

| Phase | Training Data | Epochs | Batch Size | Samples |
|---|---|---|---|---|
| 1 — Hinglish | Hinglish subset | 50 | 32 | ~2,908 |
| 2 — Hindi | Hindi subset | 50 | 32 | ~5,940 |
| 3 — English | English subset | 50 | 32 | ~8,856 |
| 4 — Full | All shuffled | 50 | 64 | 17,704 |

The **same model** carries its weights through all 4 phases — no resets between languages. After each phase the model is evaluated against all three language-specific test sets and the full test set.
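
The schedule reduces to a plain loop: fit on each phase's subset without resetting weights, then score all four test sets. A shrunken, self-contained sketch (toy model, random stand-in data, and 1 epoch instead of 50, just to show the shape of the loop):

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for the Section 3 model (dims shrunk so the sketch runs fast).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(100,)),
    tf.keras.layers.Embedding(1000, 16),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(8)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

rng = np.random.default_rng(0)

def fake_split(n):
    """Hypothetical (X, y) arrays standing in for a tokenised language subset."""
    return rng.integers(1, 1000, (n, 100)), rng.integers(0, 2, n).astype("float32")

train_phases = [("hinglish", 32), ("hindi", 32), ("english", 32), ("full", 64)]
test_sets = {name: fake_split(20) for name, _ in train_phases}

results = []
for phase_name, batch_size in train_phases:
    X, y = fake_split(64)
    # Real run: epochs=50 per phase; the same weights carry over, never reset.
    model.fit(X, y, epochs=1, batch_size=batch_size, verbose=0)
    for eval_name, (X_t, y_t) in test_sets.items():
        loss, acc = model.evaluate(X_t, y_t, verbose=0)
        results.append((phase_name, eval_name, acc))

print(len(results))  # 16 rows: 4 phases x 4 eval sets
```

In the real experiment, these per-phase evaluation rows populate the 16-row table in Section 9.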

---

## 5. Phase 1 — Hinglish

**Training on Hinglish only** (2,908 samples, 50 epochs). The model starts cold. Hinglish is code-mixed and GloVe has limited coverage — the model learns from sequential patterns rather than word semantics.

*(Figure: Training Curves — Hinglish Phase)*

### Evaluation after Phase 1

| Eval On | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1 | ROC-AUC |
|---|---|---|---|---|---|---|---|
| Hinglish | 0.6688 | 0.6378 | 0.6058 | 0.4848 | 0.7908 | 0.5386 | 0.6579 |
| Hindi | 0.4493 | 0.5000 | 0.4493 | 1.0000 | 0.0000 | 0.6200 | 0.5234 |
| English | 0.5171 | 0.5125 | 0.5738 | 0.0916 | 0.9334 | 0.1580 | 0.5620 |
| Full | 0.5190 | 0.5133 | 0.4803 | 0.4331 | 0.5935 | 0.4555 | 0.5243 |

The Hindi result (Recall=1.0, Specificity=0.0) shows the model predicts **everything as hate** on Hindi — it has no Hindi-specific knowledge yet. English performance is near-random. Hinglish F1=0.539 shows the model has learned something useful from its own language.
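
The degenerate Hindi numbers follow directly from the confusion-matrix definitions. With illustrative counts (hypothetical, chosen only to match the reported rates):

```python
# All samples predicted "hate": no true negatives, no false negatives.
tp, fn = 1325, 0      # every hate sample caught          -> recall = 1.0
tn, fp = 0, 1624      # every non-hate sample also flagged -> specificity = 0.0

recall = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + fn + tn + fp)   # collapses to the hate-class share
balanced_accuracy = (recall + specificity) / 2

print(recall, specificity, round(accuracy, 4), balanced_accuracy)
# 1.0 0.0 0.4493 0.5
```

Balanced accuracy pinned at exactly 0.5 is the tell-tale of a constant-prediction classifier.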

*(Per-language confusion matrix, ROC, precision-recall, and F1-vs-threshold plots for this phase are not shown.)*

---
|
|
| ## 6. Phase 2 — Hindi |
|
|
| **Training on Hindi** (5,940 samples, 50 epochs). GloVe has limited Hindi coverage so the model must rely on contextual patterns. The struggle here is deliberate — it builds language-agnostic hate detection features. |
|
|
|  |
|
|
| ### Evaluation after Phase 2 |
|
|
| | Eval On | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1 | ROC-AUC | |
| |---|---|---|---|---|---|---|---| |
| | Hinglish | 0.5409 | 0.4885 | 0.3761 | 0.2299 | 0.7470 | 0.2854 | 0.4771 | |
| | Hindi | 0.5834 | 0.5730 | 0.5420 | 0.4705 | 0.6756 | 0.5037 | 0.5949 | |
| | English | 0.4711 | 0.4744 | 0.4789 | 0.7878 | 0.1611 | 0.5957 | 0.4292 | |
| | Full | 0.5190 | 0.5251 | 0.4859 | 0.6111 | 0.4390 | 0.5414 | 0.5255 | |
|
|
| Hindi F1 improves to 0.504. Hinglish drops — the model has partially overwritten Hinglish-specific patterns. English recall spikes (high false positives) showing the model is now biased toward predicting hate. This is the expected "catastrophic interference" that the Full phase resolves. |

*(Per-language confusion matrix, ROC, precision-recall, and F1-vs-threshold plots for this phase are not shown.)*

---
|
|
| ## 7. Phase 3 — English |
|
|
| **Training on English** (8,856 samples, 50 epochs). This is the turning point. GloVe embeddings align well with English — the model jumps sharply and the English-phase knowledge partially generalises back to the other languages. |
|
|
|  |
|
|
| ### Evaluation after Phase 3 |
|
|
| | Eval On | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1 | ROC-AUC | |
| |---|---|---|---|---|---|---|---| |
| | Hinglish | 0.4115 | 0.4938 | 0.3955 | 0.9002 | 0.0875 | 0.5495 | 0.4572 | |
| | Hindi | 0.5424 | 0.5399 | 0.4912 | 0.5150 | 0.5648 | 0.5028 | 0.5377 | |
| | **English** | **0.7721** | **0.7726** | **0.7453** | **0.8190** | **0.7262** | **0.7804** | **0.8458** | |
| | Full | 0.6395 | 0.6458 | 0.5901 | 0.7337 | 0.5578 | 0.6541 | 0.6913 | |
|
|
| English F1 leaps to 0.780 — the model now performs strongly on its native language. Full AUC reaches 0.691. Hinglish specificity collapses again (high recall, low precision) — the model over-predicts hate on unseen languages after English fine-tuning. |

*(Per-language confusion matrix, ROC, precision-recall, and F1-vs-threshold plots for this phase are not shown.)*

---
|
|
| ## 8. Phase 4 — Full Dataset |
|
|
| **Training on the full shuffled dataset** (17,704 samples, 50 epochs). This consolidation phase exposes the model to all three languages simultaneously, balancing out the per-language biases accumulated during sequential training. |
|
|
|  |
|
|
| ### Evaluation after Phase 4 (Final Model) |
|
|
| | Eval On | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1 | ROC-AUC | |
| |---|---|---|---|---|---|---|---| |
| | Hinglish | 0.6326 | 0.6101 | 0.5426 | 0.4991 | 0.7210 | 0.5200 | 0.6161 | |
| | Hindi | 0.5748 | 0.5676 | 0.5286 | 0.4958 | 0.6393 | 0.5117 | 0.5941 | |
| | **English** | **0.7747** | **0.7746** | **0.7747** | **0.7678** | **0.7815** | **0.7712** | **0.8476** | |
| | **Full** | **0.6866** | **0.6839** | **0.6687** | **0.6449** | **0.7228** | **0.6566** | **0.7556** | |
|
|
| The Full phase restores balance across all languages. Hinglish specificity recovers to 0.721 (from 0.088 after English phase). Full-dataset AUC reaches **0.756** — the best of all phases. English performance is preserved at F1=0.771 while Hinglish and Hindi both improve substantially from their post-English-phase collapse. |

*(Per-language confusion matrix, ROC, precision-recall, and F1-vs-threshold plots for this phase are not shown.)*

---

## 9. Full Results Table

Complete 16-row cross-evaluation (Phase × Eval Language):

| Phase | Eval On | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1 | ROC-AUC |
|---|---|---|---|---|---|---|---|---|
| hinglish | hinglish | 0.6688 | 0.6378 | 0.6058 | 0.4848 | 0.7908 | 0.5386 | 0.6579 |
| hinglish | hindi | 0.4493 | 0.5000 | 0.4493 | 1.0000 | 0.0000 | 0.6200 | 0.5234 |
| hinglish | english | 0.5171 | 0.5125 | 0.5738 | 0.0916 | 0.9334 | 0.1580 | 0.5620 |
| hinglish | full | 0.5190 | 0.5133 | 0.4803 | 0.4331 | 0.5935 | 0.4555 | 0.5243 |
| hindi | hinglish | 0.5409 | 0.4885 | 0.3761 | 0.2299 | 0.7470 | 0.2854 | 0.4771 |
| hindi | hindi | 0.5834 | 0.5730 | 0.5420 | 0.4705 | 0.6756 | 0.5037 | 0.5949 |
| hindi | english | 0.4711 | 0.4744 | 0.4789 | 0.7878 | 0.1611 | 0.5957 | 0.4292 |
| hindi | full | 0.5190 | 0.5251 | 0.4859 | 0.6111 | 0.4390 | 0.5414 | 0.5255 |
| english | hinglish | 0.4115 | 0.4938 | 0.3955 | 0.9002 | 0.0875 | 0.5495 | 0.4572 |
| english | hindi | 0.5424 | 0.5399 | 0.4912 | 0.5150 | 0.5648 | 0.5028 | 0.5377 |
| english | english | 0.7721 | 0.7726 | 0.7453 | 0.8190 | 0.7262 | 0.7804 | 0.8458 |
| english | full | 0.6395 | 0.6458 | 0.5901 | 0.7337 | 0.5578 | 0.6541 | 0.6913 |
| **Full** | **hinglish** | **0.6326** | **0.6101** | **0.5426** | **0.4991** | **0.7210** | **0.5200** | **0.6161** |
| **Full** | **hindi** | **0.5748** | **0.5676** | **0.5286** | **0.4958** | **0.6393** | **0.5117** | **0.5941** |
| **Full** | **english** | **0.7747** | **0.7746** | **0.7747** | **0.7678** | **0.7815** | **0.7712** | **0.8476** |
| **Full** | **full** | **0.6866** | **0.6839** | **0.6687** | **0.6449** | **0.7228** | **0.6566** | **0.7556** |

### Key Observations

- **English phase is the sharpest turning point** — English F1 jumps from 0.596 (after Hindi) to 0.780 in one phase, driven by GloVe's English-centric embeddings.
- **Starting from Hinglish** forces generalisation from noise — the model reaches Hinglish F1=0.539 after only its own phase, a stronger start than Hinglish gets in most v1 orderings.
- **Catastrophic interference is visible** — Hinglish specificity drops from 0.791 → 0.747 → 0.088 as the model progressively shifts language bias. The Full phase restores it to 0.721.
- **Final Full phase AUC = 0.756** matches the best v1 strategies despite a harder starting language, confirming the robustness of the Hinglish-first approach with deeper training.
- **Hindi remains the hardest** (F1=0.512 at final) — consistent with GloVe's limited Hindi vocabulary coverage.

---

## 10. How to Use
```python
import tensorflow as tf
from tensorflow.keras.preprocessing.text import tokenizer_from_json
from tensorflow.keras.preprocessing.sequence import pad_sequences
from huggingface_hub import hf_hub_download

# Load tokenizer (from v1 repo — same dataset/split)
tokenizer_path = hf_hub_download(repo_id="tuklu/SASC", filename="tokenizer.json")
with open(tokenizer_path) as f:
    tokenizer = tokenizer_from_json(f.read())

# Load model
model_path = hf_hub_download(repo_id="tuklu/SASCv2", filename="model.h5")
model = tf.keras.models.load_model(model_path)

# Predict
texts = ["I hate all of them", "Have a great day!"]
sequences = tokenizer.texts_to_sequences(texts)
padded = pad_sequences(sequences, maxlen=100)
probs = model.predict(padded).flatten()

for text, prob in zip(texts, probs):
    label = "Hate Speech" if prob > 0.5 else "Non-Hate"
    print(f"{label} ({prob:.3f}): {text}")
```
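
The default 0.5 cut-off is not necessarily F1-optimal; the per-phase "F1 vs Threshold" plots track exactly this trade-off. A sweep is easy to run on top of `model.predict` output — synthetic scores stand in for real predictions in this sketch:

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 500)                              # stand-in ground truth
probs = np.clip(labels * 0.3 + rng.random(500) * 0.7, 0, 1)   # stand-in model scores

thresholds = np.linspace(0.05, 0.95, 19)
f1s = [f1_score(labels, (probs >= t).astype(int), zero_division=0) for t in thresholds]
best = thresholds[int(np.argmax(f1s))]
print(f"best threshold {best:.2f}, F1 {max(f1s):.3f}")
```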

---

## Explainability — SHAP Analysis

We applied **SHAP (SHapley Additive exPlanations)** to the final trained model to understand which words drive its hate speech predictions. A `GradientExplainer` runs on the BiLSTM sub-model: the embedding layer is bypassed, with embeddings pre-computed as float inputs. The explainer uses 200 background training samples and is evaluated on all four test sets.
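
A sketch of the embedding-bypass wiring described above, with dimensions shrunk for brevity (the real model uses the 50,000 × 300 GloVe layer). `GradientExplainer` needs float inputs, so token ids are pushed through the embedding first and the explainer only sees the downstream network; the `shap` calls are shown commented and reflect the library's standard usage pattern, not the experiment's exact code:

```python
import numpy as np
import tensorflow as tf

MAX_LEN, VOCAB, DIM = 100, 1000, 16  # shrunk; the real model is 50,000 x 300

# Embedding lookup, run separately so the explainer sees float inputs only.
embed = tf.keras.layers.Embedding(VOCAB, DIM)

# Downstream sub-model: BiLSTM head over pre-embedded sequences.
sub_model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN, DIM)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(8)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

token_ids = np.random.default_rng(0).integers(1, VOCAB, (200, MAX_LEN))
background = embed(token_ids).numpy()   # 200 pre-computed embedding samples
preds = sub_model(background[:2])       # sub-model consumes floats directly

# With shap installed, the setup is then roughly:
# import shap
# explainer = shap.GradientExplainer(sub_model, background)
# shap_values = explainer.shap_values(background[:10])
```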

> Full methodology, all strategy comparisons, and detailed word tables: **[SHAP_REPORT.md](SHAP_REPORT.md)**

### Top SHAP Words — Final Model

| Eval | Top Hate Words | Top Non-Hate Words |
|---|---|---|
| English | nas, fags, sicko, sabotage, advocating | grow, barrel, homosexual, pak, join |
| Hindi | वादा, वैज्ञानिकों, ऐ, उतारा, गला | जीतेगा, घोंटने, जिहादी, आपत्तिजनक |
| Hinglish | arey, bahir, punish, papa, interior | online, member, mam, messages, asha |
| Full | blamed, criticized, syntax, grown, sine | underneath, smack, online, hole, clue |

*(SHAP summary plots for the English, Hindi, Hinglish, and Full test sets are not shown.)*

### Key Takeaways

- **Hindi SHAP values are 10× smaller** than English/Hinglish — GloVe has near-zero Hindi coverage, so the model relies on positional patterns, not word semantics
- **Accusatory framing dominates full-dataset hate markers** (`blamed`, `criticized`, `advocating`) — the 50-epoch Full phase learns that hate speech in this corpus often targets victims through blame and accusation rather than direct slurs
- **"online"** is the most consistent non-hate signal — informational/conversational context across all three languages
- **Hinglish markers are semantically coherent** (`arey` = hey/exclamation in abusive context, `punish`, `interior`) despite code-mixing — v2's 50 Hinglish-first epochs produced stronger Hinglish feature learning than v1
- **Spurious correlations remain** (`syntax`, `sine`) — an inherent limitation of non-contextual GloVe; a BERT-based model would likely resolve these

---

## Related

- **v1 (all 6 strategies, 8 epochs each):** [tuklu/SASC](https://huggingface.co/tuklu/SASC)
- **Dataset:** [tuklu/nprism](https://huggingface.co/datasets/tuklu/nprism)

---

## Citation

```
@misc{sasc2026,
  title={Multilingual Hate Speech Detection via Sequential Transfer Learning (v2)},
  author={tuklu},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/tuklu/SASCv2}
}
```