Commit 061c06e by yaekobB · 1 Parent(s): 9710b79

Add README and ensure LFS-tracked binaries

  1. README.md +136 -0
README.md ADDED
@@ -0,0 +1,136 @@
+ ---
+ title: Toxic Comment Classification
+ emoji: 🧪
+ colorFrom: indigo
+ colorTo: gray
+ sdk: gradio
+ sdk_version: 4.20.0
+ app_file: app.py
+ pinned: true
+ license: mit
+ ---
+
+ # 🧠 Toxic Comment Classification: Explainable Multi-Label NLP Model
+
+ <p align="center">
+ <img src="banner.png" alt="Toxic Comment Classification Banner" width="100%">
+ </p>
+
+ <p align="center">
+ <b>DistilBERT-based multi-label classifier for detecting toxic online comments with explainability powered by Captum Integrated Gradients (IG).</b>
+ </p>
+
+ ---
+
+ ## 🚀 Overview
+
+ This project presents an **explainable AI system** for identifying toxic comments in text, built using a fine-tuned Transformer model (DistilBERT).
+ It performs **multi-label classification** across six toxicity categories while offering **token-level explanations** for each prediction.
+
+ ### 🧩 Labels
+ - toxic
+ - severe_toxic
+ - obscene
+ - threat
+ - insult
+ - identity_hate
+
+ ### 🎯 Objectives
+ - Fine-tune DistilBERT for robust multi-label toxicity detection
+ - Enhance interpretability using **Captum Integrated Gradients**
+ - Deploy a real-time, user-friendly **Gradio interface**
+
+ ---
+
+ ## 🧪 How to Use the Demo
+
+ 1. Type or paste any comment in the text box
+ 2. Click **“Classify”** to view per-label probabilities and predictions
+ 3. Open the **“Explain”** tab → select a target label
+ 4. Generate a heatmap showing which words **support (red)** or **oppose (blue)** the decision
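
The red/blue heatmap in step 4 comes from signed attribution scores. The following is a minimal, dependency-free sketch of the Integrated Gradients idea on a toy logistic model, not the Captum-based code in `app.py`. The tokens, weights, bias, and all-zeros baseline are hypothetical values chosen only for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy 'toxicity' score: sigmoid of a weighted sum of token features."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate IG with a midpoint Riemann sum along the straight path
    from `baseline` to `x`."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        p = model(point, w, b)
        # Analytic gradient of sigmoid(w.x + b) with respect to x_i.
        for i, wi in enumerate(w):
            avg_grad[i] += p * (1.0 - p) * wi / steps
    # Scale the path-averaged gradient by the input difference.
    return [(xi - bi) * g for xi, bi, g in zip(x, baseline, avg_grad)]


# Hypothetical tokens and weights (assumptions, not values from the model).
tokens = ["thanks", "you", "idiot"]
weights = [-1.5, 0.2, 3.0]
bias = -1.0
x = [1.0, 1.0, 1.0]          # every token present
baseline = [0.0, 0.0, 0.0]   # "empty input" baseline

attrs = integrated_gradients(x, baseline, weights, bias)
for tok, a in zip(tokens, attrs):
    color = "red (supports)" if a > 0 else "blue (opposes)"
    print(f"{tok:>8s}  {a:+.3f}  {color}")
```

A useful sanity check is the completeness property: the attributions should sum to approximately `model(x) - model(baseline)`.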
+
+ ---
+
+ ## 🧠 Example Inputs
+
+ | Example | Expected Labels |
+ |----------|------------------|
+ | “You are a complete idiot.” | toxic / insult |
+ | “I will kill you tomorrow.” | threat / toxic |
+ | “Thanks for your help today!” | non-toxic |
+ | “Go away, you people don’t belong here.” | identity_hate / insult |
+
+ ---
+
+ ## ⚙️ Technical Stack
+
+ | Component | Technology |
+ |------------|-------------|
+ | **Language Model** | DistilBERT (`distilbert-base-uncased`) |
+ | **Frameworks** | PyTorch • Transformers • Gradio |
+ | **Explainability** | Captum (Integrated Gradients) |
+ | **Training** | Stratified splits • Early Stopping • Regularization |
+ | **Visualization** | Gradio UI + Captum HTML heatmaps |
+ | **Deployment** | Hugging Face Spaces |
+
+ ---
+
+ ## 📂 Project Structure
+
+ ```
+ .
+ ├── app.py                # Gradio app entry point
+ ├── requirements.txt      # Runtime dependencies
+ ├── artifacts/
+ │   ├── best/             # Fine-tuned model weights + tokenizer
+ │   └── thresholds.json   # Tuned thresholds for each label
+ └── README.md             # (this file)
+ ```
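
The tree lists `thresholds.json` as holding a tuned threshold for each label. As a rough sketch of how per-label thresholds turn probabilities into a multi-label prediction, the example below assumes the file is a flat JSON object mapping label name to threshold. That layout, the threshold values, and the logits are all hypothetical, since the README does not show the file's contents.

```python
import json
import math

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_labels(logits, thresholds):
    """Apply an independent sigmoid per label, then compare each
    probability against that label's own threshold."""
    probs = {label: sigmoid(z) for label, z in zip(LABELS, logits)}
    # Fall back to 0.5 when a label has no tuned threshold (an assumption).
    return [label for label in LABELS if probs[label] >= thresholds.get(label, 0.5)]


# Hypothetical stand-in for the contents of artifacts/thresholds.json.
thresholds = json.loads(
    '{"toxic": 0.5, "severe_toxic": 0.3, "obscene": 0.45,'
    ' "threat": 0.25, "insult": 0.4, "identity_hate": 0.3}'
)

# Hypothetical raw model outputs (one logit per label).
example_logits = [2.0, -3.0, -0.5, -4.0, 0.1, -2.0]
predicted = predict_labels(example_logits, thresholds)
print(predicted)
```

Because each label is thresholded independently, a comment can receive several labels at once or none at all.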
+
+ ---
+
+ ## 📊 Model Training Summary
+
+ - Dataset: [Jigsaw Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge)
+ - Tokenization: DistilBERT (max length = 256)
+ - Loss: Binary Cross-Entropy with Logits (BCEWithLogitsLoss)
+ - Optimizer: AdamW (learning rate = 2e-5, weight decay = 0.02)
+ - Regularization: Dropout (head=0.5, encoder=0.2)
+ - Evaluation Metrics: Macro F1 • Precision • Recall • AUC
+ - Explainability: Captum Layer Integrated Gradients (LIG)
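
To illustrate the loss named in the list above, here is a small pure-Python sketch of binary cross-entropy on logits for one multi-label example. It uses a standard numerically stable form and checks it against the textbook definition. It is illustrative only and is not the project's training code, which per the summary relies on PyTorch's `BCEWithLogitsLoss`. The logits and targets are made-up values.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits.

    Uses the stable form max(z, 0) - z*y + log(1 + exp(-|z|)), which
    avoids overflow for large positive or negative logits.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def naive_bce(logits, targets):
    """Textbook definition; only safe for moderate logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(logits)


# One hypothetical example: six logits and a multi-hot target vector
# (here "toxic" and "insult" are the positive labels).
logits = [2.0, -3.0, -0.5, -4.0, 0.1, -2.0]
targets = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]

loss = bce_with_logits(logits, targets)
print(f"stable loss: {loss:.6f}")
print(f"naive loss:  {naive_bce(logits, targets):.6f}")
```

Each label contributes its own binary term, so the six categories are treated as independent yes/no decisions rather than as mutually exclusive classes.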
+
+ ---
+
+ ## 🖥️ Live Demo
+
+ > 🚀 Try the interactive demo on Hugging Face Spaces:
+ > 🔗 **[yaekobB / Toxic-Comment-Classification](https://huggingface.co/spaces/yaekobB/Toxic-Comment-Classification)**
+
+ ---
+
+ ## 🧰 Dependencies
+
+ ```txt
+ transformers>=4.41.0
+ torch>=2.2.0
+ safetensors>=0.4.2
+ gradio>=4.20.0
+ captum>=0.7.0
+ pandas>=2.0.0
+ numpy>=1.24.0
+ ```
+
+ ---
+
+ ---
+
+ ## 🪪 License
+
+ This project is licensed under the **MIT License**.
+ You are free to use, modify, and distribute this work, provided the copyright and license notice are retained.
+
+ ---
+
+ <p align="center">
+ <i>“Building safer and explainable AI for online interactions.”</i>
+ </p>