---
language:
  - en
  - hi
tags:
  - hate-speech
  - text-classification
  - bilstm
  - glove
  - multilingual
  - transfer-learning
  - hinglish
  - sequential-learning
datasets:
  - tuklu/nprism
license: mit
model-index:
  - name: hate-speech-multilingual-bilstm-v2
    results:
      - task:
          type: text-classification
          name: Hate Speech Detection
        dataset:
          name: nprism
          type: tuklu/nprism
        metrics:
          - type: f1
            value: 0.6566
            name: F1 Score (Full Phase, Full Test)
          - type: accuracy
            value: 0.6866
            name: Accuracy (Full Phase, Full Test)
          - type: roc_auc
            value: 0.7556
            name: ROC-AUC (Full Phase, Full Test)
---

Multilingual Hate Speech Detection — GloVe + BiLSTM (v2)

Task: Binary text classification (Hate / Non-Hate)
Languages: English, Hindi, Hinglish (Hindi-English code-mixed)
Architecture: Bidirectional LSTM with frozen GloVe embeddings
Strategy: Hinglish → Hindi → English → Full (50 epochs per phase, 200 total)


Table of Contents

  1. What This Experiment Does
  2. The Dataset
  3. Model Architecture
  4. Training Strategy
  5. Phase 1 — Hinglish
  6. Phase 2 — Hindi
  7. Phase 3 — English
  8. Phase 4 — Full Dataset
  9. Full Results Table
  10. How to Use

1. What This Experiment Does

This is v2 of the SASC sequential transfer learning experiment.

v1 ran all 6 permutations of [English, Hindi, Hinglish] with 8 epochs per phase. v2 focuses on a single strategy — Hinglish → Hindi → English → Full — but trains for 50 epochs per phase (200 total). The key new addition: after every phase the model is evaluated on all three individual language test sets AND the full test set, giving a complete 4×4 cross-evaluation matrix showing how knowledge transfers across languages.


2. The Dataset

Dataset: tuklu/nprism

| Split      | Samples |
|------------|---------|
| Train      | 17,704  |
| Validation | 2,950   |
| Test       | 8,852   |
| Total      | 29,506  |

| Language | Count  | %     |
|----------|--------|-------|
| English  | 14,994 | 50.8% |
| Hindi    | 9,738  | 33.0% |
| Hinglish | 4,774  | 16.2% |

| Label        | Count  | %     |
|--------------|--------|-------|
| Non-Hate (0) | 15,799 | 53.5% |
| Hate (1)     | 13,707 | 46.5% |

Language Distribution

The dataset is dominated by English (50.8%). GloVe embeddings are also English-centric, which directly explains why the English phase produces the sharpest accuracy jump regardless of training order.
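The counts above can be reproduced directly from the dataset. A minimal sketch, assuming the `datasets` library and that tuklu/nprism exposes `language` and `label` columns (the column names are an assumption, not confirmed by this card):

from collections import Counter
from datasets import load_dataset

ds = load_dataset("tuklu/nprism")

# Split sizes (train / validation / test)
for split in ds:
    print(split, len(ds[split]))

train = ds["train"]
print(Counter(train["language"]))  # expected: English-heavy (~50%)
print(Counter(train["label"]))     # expected: roughly balanced (~53/47)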


3. Model Architecture

Input: Text sequence (max 100 tokens)
       ↓
GloVe Embedding Layer (vocab: 50,000 × 300d) — FROZEN
       ↓
Bidirectional LSTM (128 units)
   → reads sentence left-to-right AND right-to-left
       ↓
Dropout (0.5)
       ↓
Dense Layer (64 neurons, ReLU)
       ↓
Output Layer (1 neuron, Sigmoid)
   → > 0.5 = Hate Speech  |  ≤ 0.5 = Non-Hate
  • Optimizer: Adam
  • Loss: Binary Cross-Entropy
  • Max sequence length: 100 tokens
  • Vocab size: 50,000
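A minimal Keras sketch of this architecture. Only the layer types, sizes, and hyperparameters come from this card; the GloVe matrix construction and any other arguments are assumptions:

import numpy as np
import tensorflow as tf

VOCAB_SIZE, EMB_DIM, MAX_LEN = 50_000, 300, 100  # inputs padded/truncated to 100 tokens

# 50,000 x 300 GloVe matrix, built separately from pre-trained vectors; zeros here as a placeholder.
embedding_matrix = np.zeros((VOCAB_SIZE, EMB_DIM), dtype="float32")

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        VOCAB_SIZE, EMB_DIM,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
        trainable=False),                                   # frozen GloVe embeddings
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),  # reads both directions
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),          # > 0.5 = Hate Speech
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])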

4. Training Strategy

| Phase        | Training Data   | Epochs | Batch Size | Samples |
|--------------|-----------------|--------|------------|---------|
| 1 — Hinglish | Hinglish subset | 50     | 32         | ~2,908  |
| 2 — Hindi    | Hindi subset    | 50     | 32         | ~5,940  |
| 3 — English  | English subset  | 50     | 32         | ~8,856  |
| 4 — Full     | All, shuffled   | 50     | 64         | 17,704  |

The same model carries its weights through all 4 phases — no resets between languages. After each phase the model is evaluated against all three language-specific test sets and the full test set.
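A minimal sketch of this schedule (hypothetical helper, not the original training script). It assumes the model from section 3 and padded arrays prepared with the tokenizer shown in section 10:

def train_sequentially(model, phases, test_sets, x_val, y_val):
    """Carry the same weights through every phase and cross-evaluate after each.

    phases:    list of (phase_name, x_train, y_train, batch_size) tuples
    test_sets: dict mapping eval name -> (x_test, y_test)
    Returns a dict keyed by (phase_name, eval_name) -> accuracy.
    """
    results = {}
    for phase_name, x_train, y_train, batch_size in phases:
        # No weight reset between phases: the same model object keeps training.
        model.fit(x_train, y_train, epochs=50, batch_size=batch_size,
                  validation_data=(x_val, y_val))
        # Cross-evaluate on all four test sets to build the 4x4 matrix.
        for eval_name, (x_test, y_test) in test_sets.items():
            _, acc = model.evaluate(x_test, y_test, verbose=0)
            results[(phase_name, eval_name)] = acc
    return results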


5. Phase 1 — Hinglish

Training on Hinglish only (2,908 samples, 50 epochs). The model starts cold. Hinglish is code-mixed and GloVe has limited coverage — the model learns from sequential patterns rather than word semantics.

Hinglish Training Curves

Evaluation after Phase 1

| Eval On  | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1     | ROC-AUC |
|----------|----------|--------------|-----------|--------|-------------|--------|---------|
| Hinglish | 0.6688   | 0.6378       | 0.6058    | 0.4848 | 0.7908      | 0.5386 | 0.6579  |
| Hindi    | 0.4493   | 0.5000       | 0.4493    | 1.0000 | 0.0000      | 0.6200 | 0.5234  |
| English  | 0.5171   | 0.5125       | 0.5738    | 0.0916 | 0.9334      | 0.1580 | 0.5620  |
| Full     | 0.5190   | 0.5133       | 0.4803    | 0.4331 | 0.5935      | 0.4555 | 0.5243  |

The Hindi result (Recall=1.0, Specificity=0.0) shows the model predicts everything as hate on Hindi — it has no Hindi-specific knowledge yet. English performance is near-random. Hinglish F1=0.539 shows the model has learned something useful from its own language.
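The original evaluation script is not included in this card; a scikit-learn sketch that produces the columns reported in these tables from a test set's labels and predicted probabilities (threshold 0.5, matching the output layer above):

import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             precision_score, recall_score, f1_score,
                             roc_auc_score, confusion_matrix)

def evaluate_split(y_true, y_prob, threshold=0.5):
    """Compute the per-test-set metrics reported in the tables (illustrative sketch)."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "balanced_acc": balanced_accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred),               # 1.0 means every sample was called hate
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,  # 0.0 means no non-hate predictions at all
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_prob),
    }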

(Diagnostic plots for this phase: confusion matrix, ROC, precision-recall, and F1 vs threshold for the Hinglish, Hindi, English, and Full test sets.)
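The F1 vs threshold panels referenced above can be reproduced by sweeping the decision threshold over the predicted probabilities; a small sketch, again assuming labels and probabilities are at hand:

import numpy as np
from sklearn.metrics import f1_score

def f1_threshold_curve(y_true, y_prob, thresholds=np.linspace(0.05, 0.95, 19)):
    """Return (threshold, F1) pairs, as plotted in the 'F1 vs Threshold' panels."""
    y_prob = np.asarray(y_prob)
    return [(t, f1_score(y_true, (y_prob > t).astype(int))) for t in thresholds]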

6. Phase 2 — Hindi

Training on Hindi (5,940 samples, 50 epochs). GloVe has limited Hindi coverage so the model must rely on contextual patterns. The struggle here is deliberate — it builds language-agnostic hate detection features.

Hindi Training Curves

Evaluation after Phase 2

| Eval On  | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1     | ROC-AUC |
|----------|----------|--------------|-----------|--------|-------------|--------|---------|
| Hinglish | 0.5409   | 0.4885       | 0.3761    | 0.2299 | 0.7470      | 0.2854 | 0.4771  |
| Hindi    | 0.5834   | 0.5730       | 0.5420    | 0.4705 | 0.6756      | 0.5037 | 0.5949  |
| English  | 0.4711   | 0.4744       | 0.4789    | 0.7878 | 0.1611      | 0.5957 | 0.4292  |
| Full     | 0.5190   | 0.5251       | 0.4859    | 0.6111 | 0.4390      | 0.5414 | 0.5255  |

Hindi F1 improves to 0.504. Hinglish drops — the model has partially overwritten Hinglish-specific patterns. English recall spikes (high false positives) showing the model is now biased toward predicting hate. This is the expected "catastrophic interference" that the Full phase resolves.

(Diagnostic plots for this phase: confusion matrix, ROC, precision-recall, and F1 vs threshold for the Hinglish, Hindi, English, and Full test sets.)

7. Phase 3 — English

Training on English (8,856 samples, 50 epochs). This is the turning point. GloVe embeddings align well with English — the model jumps sharply and the English-phase knowledge partially generalises back to the other languages.

English Training Curves

Evaluation after Phase 3

| Eval On  | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1     | ROC-AUC |
|----------|----------|--------------|-----------|--------|-------------|--------|---------|
| Hinglish | 0.4115   | 0.4938       | 0.3955    | 0.9002 | 0.0875      | 0.5495 | 0.4572  |
| Hindi    | 0.5424   | 0.5399       | 0.4912    | 0.5150 | 0.5648      | 0.5028 | 0.5377  |
| English  | 0.7721   | 0.7726       | 0.7453    | 0.8190 | 0.7262      | 0.7804 | 0.8458  |
| Full     | 0.6395   | 0.6458       | 0.5901    | 0.7337 | 0.5578      | 0.6541 | 0.6913  |

English F1 leaps to 0.780 — the model now performs strongly on its native language. Full AUC reaches 0.691. Hinglish specificity collapses again (high recall, low precision) — the model over-predicts hate on unseen languages after English fine-tuning.

(Diagnostic plots for this phase: confusion matrix, ROC, precision-recall, and F1 vs threshold for the Hinglish, Hindi, English, and Full test sets.)

8. Phase 4 — Full Dataset

Training on the full shuffled dataset (17,704 samples, 50 epochs). This consolidation phase exposes the model to all three languages simultaneously, balancing out the per-language biases accumulated during sequential training.

Full Training Curves

Evaluation after Phase 4 (Final Model)

| Eval On  | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1     | ROC-AUC |
|----------|----------|--------------|-----------|--------|-------------|--------|---------|
| Hinglish | 0.6326   | 0.6101       | 0.5426    | 0.4991 | 0.7210      | 0.5200 | 0.6161  |
| Hindi    | 0.5748   | 0.5676       | 0.5286    | 0.4958 | 0.6393      | 0.5117 | 0.5941  |
| English  | 0.7747   | 0.7746       | 0.7747    | 0.7678 | 0.7815      | 0.7712 | 0.8476  |
| Full     | 0.6866   | 0.6839       | 0.6687    | 0.6449 | 0.7228      | 0.6566 | 0.7556  |

The Full phase restores balance across all languages. Hinglish specificity recovers to 0.721 (from 0.088 after English phase). Full-dataset AUC reaches 0.756 — the best of all phases. English performance is preserved at F1=0.771 while Hinglish and Hindi both improve substantially from their post-English-phase collapse.

(Diagnostic plots for this phase: confusion matrix, ROC, precision-recall, and F1 vs threshold for the Hinglish, Hindi, English, and Full test sets.)

9. Full Results Table

Complete 16-row cross-evaluation (Phase × Eval Language):

| Phase    | Eval On  | Accuracy | Balanced Acc | Precision | Recall | Specificity | F1     | ROC-AUC |
|----------|----------|----------|--------------|-----------|--------|-------------|--------|---------|
| Hinglish | Hinglish | 0.6688   | 0.6378       | 0.6058    | 0.4848 | 0.7908      | 0.5386 | 0.6579  |
| Hinglish | Hindi    | 0.4493   | 0.5000       | 0.4493    | 1.0000 | 0.0000      | 0.6200 | 0.5234  |
| Hinglish | English  | 0.5171   | 0.5125       | 0.5738    | 0.0916 | 0.9334      | 0.1580 | 0.5620  |
| Hinglish | Full     | 0.5190   | 0.5133       | 0.4803    | 0.4331 | 0.5935      | 0.4555 | 0.5243  |
| Hindi    | Hinglish | 0.5409   | 0.4885       | 0.3761    | 0.2299 | 0.7470      | 0.2854 | 0.4771  |
| Hindi    | Hindi    | 0.5834   | 0.5730       | 0.5420    | 0.4705 | 0.6756      | 0.5037 | 0.5949  |
| Hindi    | English  | 0.4711   | 0.4744       | 0.4789    | 0.7878 | 0.1611      | 0.5957 | 0.4292  |
| Hindi    | Full     | 0.5190   | 0.5251       | 0.4859    | 0.6111 | 0.4390      | 0.5414 | 0.5255  |
| English  | Hinglish | 0.4115   | 0.4938       | 0.3955    | 0.9002 | 0.0875      | 0.5495 | 0.4572  |
| English  | Hindi    | 0.5424   | 0.5399       | 0.4912    | 0.5150 | 0.5648      | 0.5028 | 0.5377  |
| English  | English  | 0.7721   | 0.7726       | 0.7453    | 0.8190 | 0.7262      | 0.7804 | 0.8458  |
| English  | Full     | 0.6395   | 0.6458       | 0.5901    | 0.7337 | 0.5578      | 0.6541 | 0.6913  |
| Full     | Hinglish | 0.6326   | 0.6101       | 0.5426    | 0.4991 | 0.7210      | 0.5200 | 0.6161  |
| Full     | Hindi    | 0.5748   | 0.5676       | 0.5286    | 0.4958 | 0.6393      | 0.5117 | 0.5941  |
| Full     | English  | 0.7747   | 0.7746       | 0.7747    | 0.7678 | 0.7815      | 0.7712 | 0.8476  |
| Full     | Full     | 0.6866   | 0.6839       | 0.6687    | 0.6449 | 0.7228      | 0.6566 | 0.7556  |
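For a quick phase-by-language view, any single metric from this table can be pivoted into a 4×4 matrix. A pandas sketch, with the F1 values copied from the table above purely for illustration:

import pandas as pd

# (training phase, eval set, F1) triples copied from the full results table.
rows = [
    ("Hinglish", "Hinglish", 0.5386), ("Hinglish", "Hindi", 0.6200),
    ("Hinglish", "English", 0.1580), ("Hinglish", "Full", 0.4555),
    ("Hindi", "Hinglish", 0.2854), ("Hindi", "Hindi", 0.5037),
    ("Hindi", "English", 0.5957), ("Hindi", "Full", 0.5414),
    ("English", "Hinglish", 0.5495), ("English", "Hindi", 0.5028),
    ("English", "English", 0.7804), ("English", "Full", 0.6541),
    ("Full", "Hinglish", 0.5200), ("Full", "Hindi", 0.5117),
    ("Full", "English", 0.7712), ("Full", "Full", 0.6566),
]
df = pd.DataFrame(rows, columns=["phase", "eval_on", "f1"])
order = ["Hinglish", "Hindi", "English", "Full"]
matrix = (df.pivot(index="phase", columns="eval_on", values="f1")
            .reindex(index=order, columns=order))
print(matrix.round(3))  # rows = training phase, columns = eval set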

Key Observations

  • English phase is the sharpest turning point — English F1 jumps from 0.596 (after Hindi) to 0.780 in one phase, driven by GloVe's English-centric embeddings.
  • Starting from Hinglish forces generalisation from noise — the model reaches Hinglish F1=0.539 after only its own phase, a stronger start than Hinglish gets in most v1 orderings.
  • Catastrophic interference is visible — Hinglish specificity drops from 0.791 → 0.747 → 0.088 as the model progressively shifts language bias. The Full phase restores it to 0.721.
  • Final Full phase AUC = 0.756 matches the best v1 strategies despite a harder starting language, confirming the robustness of the Hinglish-first approach with deeper training.
  • Hindi remains the hardest (F1=0.512 at final) — consistent with GloVe's limited Hindi vocabulary coverage.

10. How to Use

import tensorflow as tf
from tensorflow.keras.preprocessing.text import tokenizer_from_json
from tensorflow.keras.preprocessing.sequence import pad_sequences
from huggingface_hub import hf_hub_download

# Load tokenizer (from v1 repo — same dataset/split)
tokenizer_path = hf_hub_download(repo_id="tuklu/SASC", filename="tokenizer.json")
with open(tokenizer_path) as f:
    tokenizer = tokenizer_from_json(f.read())

# Load model
model_path = hf_hub_download(repo_id="tuklu/SASCv2", filename="model.h5")
model = tf.keras.models.load_model(model_path)

# Predict
texts = ["I hate all of them", "Have a great day!"]
sequences = tokenizer.texts_to_sequences(texts)
padded = pad_sequences(sequences, maxlen=100)
probs = model.predict(padded).flatten()

for text, prob in zip(texts, probs):
    label = "Hate Speech" if prob > 0.5 else "Non-Hate"
    print(f"{label} ({prob:.3f}): {text}")

Explainability — SHAP Analysis

We applied SHAP (SHapley Additive exPlanations) to the final trained model to understand which words drive hate speech predictions. A GradientExplainer is run on the BiLSTM sub-model with the embedding layer bypassed (embeddings are pre-computed as floats), using 200 training samples as the background set; explanations are computed on all four test sets.

Full methodology, all strategy comparisons, and detailed word tables: SHAP_REPORT.md
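A sketch of the setup described above, reusing `model` from the snippet in section 10. `padded_train` and `padded_test` are hypothetical padded token-id arrays, the first layer is assumed to be the embedding layer, and the exact return shape of `shap_values` varies across SHAP versions:

import numpy as np
import shap
import tensorflow as tf

embedding_layer = model.layers[0]                    # frozen GloVe lookup
bilstm_head = tf.keras.Sequential(model.layers[1:])  # everything after the embedding

# Bypass the integer embedding lookup: pre-compute float embeddings so that
# gradients can flow through the sub-model for GradientExplainer.
background = embedding_layer(padded_train[:200]).numpy()  # 200 background training samples
samples = embedding_layer(padded_test[:100]).numpy()

explainer = shap.GradientExplainer(bilstm_head, background)
shap_values = np.array(explainer.shap_values(samples))

# Collapse the 300 embedding dimensions to one attribution per token position;
# positions can then be mapped back to words via the tokenizer's index_word.
token_scores = shap_values.squeeze().sum(axis=-1)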

Top SHAP Words — Final Model

| Eval     | Top Hate Words                          | Top Non-Hate Words                    |
|----------|-----------------------------------------|---------------------------------------|
| English  | nas, fags, sicko, sabotage, advocating  | grow, barrel, homosexual, pak, join   |
| Hindi    | वादा, वैज्ञानिकों, ऐ, उतारा, गला        | जीतेगा, घोंटने, जिहादी, आपत्तिजनक      |
| Hinglish | arey, bahir, punish, papa, interior     | online, member, mam, messages, asha   |
| Full     | blamed, criticized, syntax, grown, sine | underneath, smack, online, hole, clue |

(SHAP summary plots for the English, Hinglish, Hindi, and Full test sets.)

Key Takeaways

  • Hindi SHAP values are 10× smaller than English/Hinglish — GloVe has near-zero Hindi coverage; model relies on positional patterns, not word semantics
  • Accusatory framing dominates full-dataset hate markers (blamed, criticized, advocating) — the 50-epoch Full phase learns that hate speech in this corpus often targets victims through blame/accusation rather than direct slurs
  • "online" is the most consistent non-hate signal — informational/conversational context across all three languages
  • Hinglish markers are semantically coherent (arey = hey/exclamation in abusive context, punish, interior) despite code-mixing — v2's 50 epochs on Hinglish-first produced stronger Hinglish feature learning than v1
  • Spurious correlations remain (syntax, sine) — inherent limitation of non-contextual GloVe; a BERT-based model would resolve these

Citation

@misc{sasc2026,
  title={Multilingual Hate Speech Detection via Sequential Transfer Learning (v2)},
  author={tuklu},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/tuklu/SASCv2}
}