---
title: Toxic Comment Classifier & Explainer
emoji: 🧪
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 4.44.1
python_version: "3.10"
app_file: app.py
pinned: true
license: mit
# Additional metadata fields
description: >
  A multi-label, transformer-based toxic comment classifier trained on the Jigsaw
  dataset. It includes an explainability module (Captum Integrated Gradients) that
  visualizes which words contribute most to each toxic label, served through a
  Gradio UI.
tags:
- text-classification
- multi-label
- explainable-ai
- transformers
- gradio
- distilbert
- nlp
- toxicity-detection
- huggingface-space
---
# 🧠 Toxic Comment Classification: Explainable Multi-Label NLP Model
<p align="center">
<img src="banner.png" alt="Toxic Comment Classification Banner" width="100%">
</p>
<p align="center">
<b>DistilBERT-based multi-label classifier for detecting toxic online comments with explainability powered by Captum Integrated Gradients (IG).</b>
</p>
---
## 📖 Overview
This project presents an **explainable AI system** for identifying toxic comments in text, built using a fine-tuned Transformer model (DistilBERT).
It performs **multi-label classification** across six toxicity categories while offering **token-level explanations** for each prediction.
### 🧩 Labels
- toxic
- severe_toxic
- obscene
- threat
- insult
- identity_hate
### 🎯 Objectives
- Fine-tune DistilBERT for robust multi-label toxicity detection
- Enhance interpretability using **Captum Integrated Gradients**
- Deploy a real-time, user-friendly **Gradio interface**
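The multi-label setup described above can be sketched with the Transformers library. This minimal example builds a *randomly initialized* DistilBERT classification head (no downloaded weights, dummy token ids) purely to show how `problem_type="multi_label_classification"` wires up six independent sigmoid outputs; the deployed Space loads its fine-tuned checkpoint instead:

```python
import torch
from transformers import DistilBertConfig, DistilBertForSequenceClassification

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# problem_type switches the model's training loss to BCEWithLogitsLoss
# (one sigmoid per label, rather than a softmax over labels)
config = DistilBertConfig(
    num_labels=len(LABELS),
    problem_type="multi_label_classification",
    id2label=dict(enumerate(LABELS)),
    label2id={label: i for i, label in enumerate(LABELS)},
)
model = DistilBertForSequenceClassification(config).eval()  # random weights here

# Dummy token ids stand in for real tokenizer output in this sketch
input_ids = torch.randint(0, config.vocab_size, (1, 16))
with torch.no_grad():
    logits = model(input_ids=input_ids).logits  # shape: (1, 6)
probs = torch.sigmoid(logits)                   # independent per-label probabilities
```

Because the labels are not mutually exclusive (a comment can be both `toxic` and `insult`), each label gets its own sigmoid rather than sharing a softmax.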
---
## 🧪 How to Use the Demo
1. Type or paste any comment in the text box
2. Click **"Classify"** to view per-label probabilities and predictions
3. Open the **"Explain"** tab, then select a target label
4. Generate a heatmap showing which words **support (red)** or **oppose (blue)** the decision
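Under the hood, step 2 amounts to thresholding each label's sigmoid probability. A minimal sketch, with *illustrative* threshold values (the real app loads its tuned values from `artifacts/thresholds.json`):

```python
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Illustrative values only -- the deployed app reads its tuned per-label
# thresholds from artifacts/thresholds.json
THRESHOLDS = {"toxic": 0.50, "severe_toxic": 0.40, "obscene": 0.50,
              "threat": 0.35, "insult": 0.50, "identity_hate": 0.40}

def predict_labels(probs):
    """Return the labels whose sigmoid probability meets its per-label threshold."""
    return [label for label, p in zip(LABELS, probs) if p >= THRESHOLDS[label]]

# e.g. hypothetical probabilities for "You are a complete idiot."
print(predict_labels([0.91, 0.12, 0.08, 0.05, 0.77, 0.10]))  # ['toxic', 'insult']
```

Per-label thresholds (rather than a single global 0.5) let rare classes like `threat` trade precision for recall independently.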
---
## 🧠 Example Inputs
| Example | Expected Labels |
|----------|------------------|
| "You are a complete idiot." | toxic / insult |
| "I will kill you tomorrow." | threat / toxic |
| "Thanks for your help today!" | non-toxic |
| "Go away, you people don't belong here." | identity_hate / insult |
---
## ⚙️ Technical Stack
| Component | Technology |
|------------|-------------|
| **Language Model** | DistilBERT (`distilbert-base-uncased`) |
| **Frameworks** | PyTorch • Transformers • Gradio |
| **Explainability** | Captum (Integrated Gradients) |
| **Training** | Stratified splits • Early Stopping • Regularization |
| **Visualization** | Gradio UI + Captum HTML heatmaps |
| **Deployment** | Hugging Face Spaces |
---
## 📁 Project Structure
```
.
├── app.py               # Gradio app entry point
├── requirements.txt     # Runtime dependencies
├── artifacts/
│   ├── best/            # Fine-tuned model weights + tokenizer
│   └── thresholds.json  # Tuned thresholds for each label
└── README.md            # (this file)
```
---
## 📊 Model Training Summary
- Dataset: [Jigsaw Toxic Comment Classification Challenge](https://www.kaggle.com/datasets/julian3833/jigsaw-toxic-comment-classification-challenge)
- Tokenization: DistilBERT (max length = 256)
- Loss: Binary Cross-Entropy with Logits (BCEWithLogitsLoss)
- Optimizer: AdamW (learning rate = 2e-5, weight decay = 0.02)
- Regularization: Dropout (head=0.5, encoder=0.2)
- Evaluation Metrics: Macro F1 • Precision • Recall • AUC
- Explainability: Captum Layer Integrated Gradients (LIG)
---
## 🖥️ Live Demo
> 👉 Try the interactive demo on Hugging Face Spaces:
> 🔗 **[yaekobB/Toxic-Comment-Classification](https://huggingface.co/spaces/yaekobB/Toxic-Comment-Classification)**
---
## 🧰 Dependencies
```txt
transformers>=4.41.0
torch>=2.2.0
safetensors>=0.4.2
gradio>=4.20.0
captum>=0.7.0
pandas>=2.0.0
numpy>=1.24.0
```
---
## 🪪 License
This project is licensed under the **MIT License**.
You are free to use, modify, and distribute this work with attribution.
---
<p align="center">
  <i>"Building safer, more explainable AI for online interactions."</i>
</p>