---
language:
  - en
  - hi
tags:
  - hate-speech
  - text-classification
  - bilstm
  - glove
  - multilingual
  - transfer-learning
  - hinglish
  - sequential-learning
datasets:
  - tuklu/nprism
license: mit
model-index:
  - name: hate-speech-multilingual-bilstm-v2
    results:
      - task:
          type: text-classification
          name: Hate Speech Detection
        dataset:
          name: nprism
          type: tuklu/nprism
        metrics:
          - type: f1
            value: 0.6566
            name: F1 Score (Full Phase, Full Test)
          - type: accuracy
            value: 0.6866
            name: Accuracy (Full Phase, Full Test)
          - type: roc_auc
            value: 0.7556
            name: ROC-AUC (Full Phase, Full Test)
---

Multilingual Hate Speech Detection — GloVe + BiLSTM (v2)

Task: Binary text classification (Hate / Non-Hate)
Languages: English, Hindi, Hinglish (Hindi-English code-mixed)
Architecture: Bidirectional LSTM with frozen GloVe embeddings
Strategy: Hinglish → Hindi → English → Full (50 epochs per phase, 200 total)


Table of Contents

  1. What This Experiment Does
  2. The Dataset
  3. Model Architecture
  4. Training Strategy
  5. Phase 1 — Hinglish
  6. Phase 2 — Hindi
  7. Phase 3 — English
  8. Phase 4 — Full Dataset
  9. Full Results Table
  10. How to Use

1. What This Experiment Does

This is v2 of the SASC sequential transfer learning experiment.

v1 ran all 6 permutations of [English, Hindi, Hinglish] with 8 epochs per phase. v2 focuses on a single strategy — Hinglish → Hindi → English → Full — but trains for 50 epochs per phase (200 total). The key new addition: after every phase the model is evaluated on all three individual language test sets AND the full test set, giving a complete 4×4 cross-evaluation matrix showing how knowledge transfers across languages.


2. The Dataset

Dataset: tuklu/nprism

| Split      | Samples |
|------------|---------|
| Train      | 17,704  |
| Validation | 2,950   |
| Test       | 8,852   |
| Total      | 29,506  |

| Language | Count  | %     |
|----------|--------|-------|
| English  | 14,994 | 50.8% |
| Hindi    | 9,738  | 33.0% |
| Hinglish | 4,774  | 16.2% |

| Label        | Count  | %     |
|--------------|--------|-------|
| Non-Hate (0) | 15,799 | 53.5% |
| Hate (1)     | 13,707 | 46.5% |

Language Distribution

The dataset is dominated by English (50.8%). GloVe embeddings are also English-centric, which directly explains why the English phase produces the sharpest accuracy jump regardless of training order.
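The counts above can be reproduced directly from the dataset. A minimal sketch, assuming the `datasets` library and that tuklu/nprism exposes `language` and `label` columns (the column names are an assumption, not confirmed by this card):

from collections import Counter
from datasets import load_dataset

ds = load_dataset("tuklu/nprism")

# Split sizes (train / validation / test)
for split in ds:
    print(split, len(ds[split]))

train = ds["train"]
print(Counter(train["language"]))  # expected: English-heavy (~50%)
print(Counter(train["label"]))     # expected: roughly balanced (~53/47)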


3. Model Architecture

Input: Text sequence (max 100 tokens)
       ↓
GloVe Embedding Layer (vocab: 50,000 × 300d) — FROZEN
       ↓
Bidirectional LSTM (128 units)
   → reads sentence left-to-right AND right-to-left
       ↓
Dropout (0.5)
       ↓
Dense Layer (64 neurons, ReLU)
       ↓
Output Layer (1 neuron, Sigmoid)
   → > 0.5 = Hate Speech  |  ≤ 0.5 = Non-Hate
  • Optimizer: Adam
  • Loss: Binary Cross-Entropy
  • Max sequence length: 100 tokens
  • Vocab size: 50,000
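A minimal Keras sketch of this architecture. Only the layer types, sizes, and hyperparameters come from this card; the GloVe matrix construction and any other arguments are assumptions:

import numpy as np
import tensorflow as tf

VOCAB_SIZE, EMB_DIM, MAX_LEN = 50_000, 300, 100  # inputs padded/truncated to 100 tokens

# 50,000 x 300 GloVe matrix, built separately from pre-trained vectors; zeros here as a placeholder.
embedding_matrix = np.zeros((VOCAB_SIZE, EMB_DIM), dtype="float32")

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        VOCAB_SIZE, EMB_DIM,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
        trainable=False),                                   # frozen GloVe embeddings
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),  # reads both directions
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),          # > 0.5 = Hate Speech
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])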

4. Training Strategy

| Phase        | Training Data   | Epochs | Batch Size | Samples |
|--------------|-----------------|--------|------------|---------|
| 1 — Hinglish | Hinglish subset | 50     | 32         | ~2,908  |
| 2 — Hindi    | Hindi subset    | 50     | 32         | ~5,940  |
| 3 — English  | English subset  | 50     | 32         | ~8,856  |
| 4 — Full     | All, shuffled   | 50     | 64         | 17,704  |

The same model carries its weights through all 4 phases — no resets between languages. After each phase the model is evaluated against all three language-specific test sets and the full test set.
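A minimal sketch of this schedule (hypothetical helper, not the original training script). It assumes the model from section 3 and padded arrays prepared with the tokenizer shown in section 10:

def train_sequentially(model, phases, test_sets, x_val, y_val):
    """Carry the same weights through every phase and cross-evaluate after each.

    phases:    list of (phase_name, x_train, y_train, batch_size) tuples
    test_sets: dict mapping eval name -> (x_test, y_test)
    Returns a dict keyed by (phase_name, eval_name) -> accuracy.
    """
    results = {}
    for phase_name, x_train, y_train, batch_size in phases:
        # No weight reset between phases: the same model object keeps training.
        model.fit(x_train, y_train, epochs=50, batch_size=batch_size,
                  validation_data=(x_val, y_val))
        # Cross-evaluate on all four test sets to build the 4x4 matrix.
        for eval_name, (x_test, y_test) in test_sets.items():
            _, acc = model.evaluate(x_test, y_test, verbose=0)
            results[(phase_name, eval_name)] = acc
    return results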


5. Phase 1 — Hinglish

Training on Hinglish only (2,908 samples, 50 epochs). The model starts cold. Hinglish is code-mixed and GloVe has limited coverage — the model learns from sequential patterns rather than word semantics.

Hinglish Training Curves

Evaluation after Phase 1

| Eval On  | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1     | ROC-AUC |
|----------|----------|--------------|-----------|--------|-------------|--------|---------|
| Hinglish | 0.6688   | 0.6378       | 0.6058    | 0.4848 | 0.7908      | 0.5386 | 0.6579  |
| Hindi    | 0.4493   | 0.5000       | 0.4493    | 1.0000 | 0.0000      | 0.6200 | 0.5234  |
| English  | 0.5171   | 0.5125       | 0.5738    | 0.0916 | 0.9334      | 0.1580 | 0.5620  |
| Full     | 0.5190   | 0.5133       | 0.4803    | 0.4331 | 0.5935      | 0.4555 | 0.5243  |

The Hindi result (Recall=1.0, Specificity=0.0) shows the model predicts everything as hate on Hindi — it has no Hindi-specific knowledge yet. English performance is near-random. Hinglish F1=0.539 shows the model has learned something useful from its own language.
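The original evaluation script is not included in this card; a scikit-learn sketch that produces the columns reported in these tables from a test set's labels and predicted probabilities (threshold 0.5, matching the output layer above):

import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             precision_score, recall_score, f1_score,
                             roc_auc_score, confusion_matrix)

def evaluate_split(y_true, y_prob, threshold=0.5):
    """Compute the per-test-set metrics reported in the tables (illustrative sketch)."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "balanced_acc": balanced_accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred),               # 1.0 means every sample was called hate
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,  # 0.0 means no non-hate predictions at all
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_prob),
    }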

(Diagnostic plots for this phase: confusion matrix, ROC, precision-recall, and F1 vs threshold for the Hinglish, Hindi, English, and Full test sets.)
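The F1 vs threshold panels referenced above can be reproduced by sweeping the decision threshold over the predicted probabilities; a small sketch, again assuming labels and probabilities are at hand:

import numpy as np
from sklearn.metrics import f1_score

def f1_threshold_curve(y_true, y_prob, thresholds=np.linspace(0.05, 0.95, 19)):
    """Return (threshold, F1) pairs, as plotted in the 'F1 vs Threshold' panels."""
    y_prob = np.asarray(y_prob)
    return [(t, f1_score(y_true, (y_prob > t).astype(int))) for t in thresholds]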

6. Phase 2 — Hindi

Training on Hindi (5,940 samples, 50 epochs). GloVe has limited Hindi coverage so the model must rely on contextual patterns. The struggle here is deliberate — it builds language-agnostic hate detection features.

Hindi Training Curves

Evaluation after Phase 2

| Eval On  | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1     | ROC-AUC |
|----------|----------|--------------|-----------|--------|-------------|--------|---------|
| Hinglish | 0.5409   | 0.4885       | 0.3761    | 0.2299 | 0.7470      | 0.2854 | 0.4771  |
| Hindi    | 0.5834   | 0.5730       | 0.5420    | 0.4705 | 0.6756      | 0.5037 | 0.5949  |
| English  | 0.4711   | 0.4744       | 0.4789    | 0.7878 | 0.1611      | 0.5957 | 0.4292  |
| Full     | 0.5190   | 0.5251       | 0.4859    | 0.6111 | 0.4390      | 0.5414 | 0.5255  |

Hindi F1 improves to 0.504. Hinglish drops — the model has partially overwritten Hinglish-specific patterns. English recall spikes (high false positives) showing the model is now biased toward predicting hate. This is the expected "catastrophic interference" that the Full phase resolves.

(Diagnostic plots for this phase: confusion matrix, ROC, precision-recall, and F1 vs threshold for the Hinglish, Hindi, English, and Full test sets.)

7. Phase 3 — English

Training on English (8,856 samples, 50 epochs). This is the turning point. GloVe embeddings align well with English — the model jumps sharply and the English-phase knowledge partially generalises back to the other languages.

English Training Curves

Evaluation after Phase 3

| Eval On  | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1     | ROC-AUC |
|----------|----------|--------------|-----------|--------|-------------|--------|---------|
| Hinglish | 0.4115   | 0.4938       | 0.3955    | 0.9002 | 0.0875      | 0.5495 | 0.4572  |
| Hindi    | 0.5424   | 0.5399       | 0.4912    | 0.5150 | 0.5648      | 0.5028 | 0.5377  |
| English  | 0.7721   | 0.7726       | 0.7453    | 0.8190 | 0.7262      | 0.7804 | 0.8458  |
| Full     | 0.6395   | 0.6458       | 0.5901    | 0.7337 | 0.5578      | 0.6541 | 0.6913  |

English F1 leaps to 0.780 — the model now performs strongly on its native language. Full AUC reaches 0.691. Hinglish specificity collapses again (high recall, low precision) — the model over-predicts hate on unseen languages after English fine-tuning.

(Diagnostic plots for this phase: confusion matrix, ROC, precision-recall, and F1 vs threshold for the Hinglish, Hindi, English, and Full test sets.)

8. Phase 4 — Full Dataset

Training on the full shuffled dataset (17,704 samples, 50 epochs). This consolidation phase exposes the model to all three languages simultaneously, balancing out the per-language biases accumulated during sequential training.

Full Training Curves

Evaluation after Phase 4 (Final Model)

| Eval On  | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1     | ROC-AUC |
|----------|----------|--------------|-----------|--------|-------------|--------|---------|
| Hinglish | 0.6326   | 0.6101       | 0.5426    | 0.4991 | 0.7210      | 0.5200 | 0.6161  |
| Hindi    | 0.5748   | 0.5676       | 0.5286    | 0.4958 | 0.6393      | 0.5117 | 0.5941  |
| English  | 0.7747   | 0.7746       | 0.7747    | 0.7678 | 0.7815      | 0.7712 | 0.8476  |
| Full     | 0.6866   | 0.6839       | 0.6687    | 0.6449 | 0.7228      | 0.6566 | 0.7556  |

The Full phase restores balance across all languages. Hinglish specificity recovers to 0.721 (from 0.088 after English phase). Full-dataset AUC reaches 0.756 — the best of all phases. English performance is preserved at F1=0.771 while Hinglish and Hindi both improve substantially from their post-English-phase collapse.

(Diagnostic plots for this phase: confusion matrix, ROC, precision-recall, and F1 vs threshold for the Hinglish, Hindi, English, and Full test sets.)

9. Full Results Table

Complete 16-row cross-evaluation (Phase × Eval Language):

| Phase    | Eval On  | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1     | ROC-AUC |
|----------|----------|----------|--------------|-----------|--------|-------------|--------|---------|
| Hinglish | Hinglish | 0.6688   | 0.6378       | 0.6058    | 0.4848 | 0.7908      | 0.5386 | 0.6579  |
| Hinglish | Hindi    | 0.4493   | 0.5000       | 0.4493    | 1.0000 | 0.0000      | 0.6200 | 0.5234  |
| Hinglish | English  | 0.5171   | 0.5125       | 0.5738    | 0.0916 | 0.9334      | 0.1580 | 0.5620  |
| Hinglish | Full     | 0.5190   | 0.5133       | 0.4803    | 0.4331 | 0.5935      | 0.4555 | 0.5243  |
| Hindi    | Hinglish | 0.5409   | 0.4885       | 0.3761    | 0.2299 | 0.7470      | 0.2854 | 0.4771  |
| Hindi    | Hindi    | 0.5834   | 0.5730       | 0.5420    | 0.4705 | 0.6756      | 0.5037 | 0.5949  |
| Hindi    | English  | 0.4711   | 0.4744       | 0.4789    | 0.7878 | 0.1611      | 0.5957 | 0.4292  |
| Hindi    | Full     | 0.5190   | 0.5251       | 0.4859    | 0.6111 | 0.4390      | 0.5414 | 0.5255  |
| English  | Hinglish | 0.4115   | 0.4938       | 0.3955    | 0.9002 | 0.0875      | 0.5495 | 0.4572  |
| English  | Hindi    | 0.5424   | 0.5399       | 0.4912    | 0.5150 | 0.5648      | 0.5028 | 0.5377  |
| English  | English  | 0.7721   | 0.7726       | 0.7453    | 0.8190 | 0.7262      | 0.7804 | 0.8458  |
| English  | Full     | 0.6395   | 0.6458       | 0.5901    | 0.7337 | 0.5578      | 0.6541 | 0.6913  |
| Full     | Hinglish | 0.6326   | 0.6101       | 0.5426    | 0.4991 | 0.7210      | 0.5200 | 0.6161  |
| Full     | Hindi    | 0.5748   | 0.5676       | 0.5286    | 0.4958 | 0.6393      | 0.5117 | 0.5941  |
| Full     | English  | 0.7747   | 0.7746       | 0.7747    | 0.7678 | 0.7815      | 0.7712 | 0.8476  |
| Full     | Full     | 0.6866   | 0.6839       | 0.6687    | 0.6449 | 0.7228      | 0.6566 | 0.7556  |
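For a quick phase-by-language view, any single metric from this table can be pivoted into a 4×4 matrix. A pandas sketch, with the F1 values copied from the table above purely for illustration:

import pandas as pd

# (training phase, eval set, F1) triples copied from the full results table.
rows = [
    ("Hinglish", "Hinglish", 0.5386), ("Hinglish", "Hindi", 0.6200),
    ("Hinglish", "English", 0.1580), ("Hinglish", "Full", 0.4555),
    ("Hindi", "Hinglish", 0.2854), ("Hindi", "Hindi", 0.5037),
    ("Hindi", "English", 0.5957), ("Hindi", "Full", 0.5414),
    ("English", "Hinglish", 0.5495), ("English", "Hindi", 0.5028),
    ("English", "English", 0.7804), ("English", "Full", 0.6541),
    ("Full", "Hinglish", 0.5200), ("Full", "Hindi", 0.5117),
    ("Full", "English", 0.7712), ("Full", "Full", 0.6566),
]
df = pd.DataFrame(rows, columns=["phase", "eval_on", "f1"])
order = ["Hinglish", "Hindi", "English", "Full"]
matrix = (df.pivot(index="phase", columns="eval_on", values="f1")
            .reindex(index=order, columns=order))
print(matrix.round(3))  # rows = training phase, columns = eval set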

Key Observations

  • English phase is the sharpest turning point — English F1 jumps from 0.596 (after Hindi) to 0.780 in one phase, driven by GloVe's English-centric embeddings.
  • Starting from Hinglish forces generalisation from noise — the model reaches Hinglish F1=0.539 after only its own phase, a stronger start than Hinglish gets in most v1 orderings.
  • Catastrophic interference is visible — Hinglish specificity drops from 0.791 → 0.747 → 0.088 as the model progressively shifts language bias. The Full phase restores it to 0.721.
  • Final Full phase AUC = 0.756 matches the best v1 strategies despite a harder starting language, confirming the robustness of the Hinglish-first approach with deeper training.
  • Hindi remains the hardest (F1=0.512 at final) — consistent with GloVe's limited Hindi vocabulary coverage.

10. How to Use

import tensorflow as tf
from tensorflow.keras.preprocessing.text import tokenizer_from_json
from tensorflow.keras.preprocessing.sequence import pad_sequences
from huggingface_hub import hf_hub_download

# Load tokenizer (from v1 repo — same dataset/split)
tokenizer_path = hf_hub_download(repo_id="tuklu/SASC", filename="tokenizer.json")
with open(tokenizer_path) as f:
    tokenizer = tokenizer_from_json(f.read())

# Load model
model_path = hf_hub_download(repo_id="tuklu/SASCv2", filename="model.h5")
model = tf.keras.models.load_model(model_path)

# Predict
texts = ["I hate all of them", "Have a great day!"]
sequences = tokenizer.texts_to_sequences(texts)
padded = pad_sequences(sequences, maxlen=100)
probs = model.predict(padded).flatten()

for text, prob in zip(texts, probs):
    label = "Hate Speech" if prob > 0.5 else "Non-Hate"
    print(f"{label} ({prob:.3f}): {text}")

Explainability — SHAP Analysis

We applied SHAP (SHapley Additive exPlanations) to the final trained model to understand which words drive hate speech predictions. A GradientExplainer is run on the BiLSTM sub-model with the embedding layer bypassed (embeddings are pre-computed as floats), using 200 training samples as the background set; explanations are computed on all four test sets.

Full methodology, all strategy comparisons, and detailed word tables: SHAP_REPORT.md
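A sketch of the setup described above, reusing `model` from the snippet in section 10. `padded_train` and `padded_test` are hypothetical padded token-id arrays, the first layer is assumed to be the embedding layer, and the exact return shape of `shap_values` varies across SHAP versions:

import numpy as np
import shap
import tensorflow as tf

embedding_layer = model.layers[0]                    # frozen GloVe lookup
bilstm_head = tf.keras.Sequential(model.layers[1:])  # everything after the embedding

# Bypass the integer embedding lookup: pre-compute float embeddings so that
# gradients can flow through the sub-model for GradientExplainer.
background = embedding_layer(padded_train[:200]).numpy()  # 200 background training samples
samples = embedding_layer(padded_test[:100]).numpy()

explainer = shap.GradientExplainer(bilstm_head, background)
shap_values = np.array(explainer.shap_values(samples))

# Collapse the 300 embedding dimensions to one attribution per token position;
# positions can then be mapped back to words via the tokenizer's index_word.
token_scores = shap_values.squeeze().sum(axis=-1)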

Top SHAP Words — Final Model

| Eval     | Top Hate Words                          | Top Non-Hate Words                    |
|----------|-----------------------------------------|---------------------------------------|
| English  | nas, fags, sicko, sabotage, advocating  | grow, barrel, homosexual, pak, join   |
| Hindi    | वादा, वैज्ञानिकों, ऐ, उतारा, गला        | जीतेगा, घोंटने, जिहादी, आपत्तिजनक      |
| Hinglish | arey, bahir, punish, papa, interior     | online, member, mam, messages, asha   |
| Full     | blamed, criticized, syntax, grown, sine | underneath, smack, online, hole, clue |

(SHAP summary plots for the English, Hinglish, Hindi, and Full test sets.)

Key Takeaways

  • Hindi SHAP values are 10× smaller than English/Hinglish — GloVe has near-zero Hindi coverage; model relies on positional patterns, not word semantics
  • Accusatory framing dominates full-dataset hate markers (blamed, criticized, advocating) — the 50-epoch Full phase learns that hate speech in this corpus often targets victims through blame/accusation rather than direct slurs
  • "online" is the most consistent non-hate signal — informational/conversational context across all three languages
  • Hinglish markers are semantically coherent (arey = hey/exclamation in abusive context, punish, interior) despite code-mixing — v2's 50 epochs on Hinglish-first produced stronger Hinglish feature learning than v1
  • Spurious correlations remain (syntax, sine) — inherent limitation of non-contextual GloVe; a BERT-based model would resolve these

Citation

@misc{sasc2026,
  title={Multilingual Hate Speech Detection via Sequential Transfer Learning (v2)},
  author={tuklu},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/tuklu/SASCv2}
}