Spaces: YaekobB/Toxic-Comment-Classification


yaekobB committed on Oct 15, 2025

Commit 061c06e (1 parent: 9710b79)

Add README and ensure LFS-tracked binaries

Files changed (1)

README.md ADDED +136 -0

	@@ -0,0 +1,136 @@

+---
+title: Toxic Comment Classification
+emoji: 🧪
+colorFrom: indigo
+colorTo: gray
+sdk: gradio
+sdk_version: 4.20.0
+app_file: app.py
+pinned: true
+license: mit
+---
+# 🧠 Toxic Comment Classification — Explainable Multi-Label NLP Model
+<p align="center">
+  <img src="banner.png" alt="Toxic Comment Classification Banner" width="100%">
+</p>
+<p align="center">
+  <b>DistilBERT-based multi-label classifier for detecting toxic online comments with explainability powered by Captum Integrated Gradients (IG).</b>
+</p>
+---
+## 🚀 Overview
+This project presents an **explainable AI system** for identifying toxic comments in text, built using a fine-tuned Transformer model (DistilBERT).
+It performs **multi-label classification** across six toxicity categories while offering **token-level explanations** for each prediction.
+### 🧩 Labels
+- toxic
+- severe_toxic
+- obscene
+- threat
+- insult
+- identity_hate
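
Because the task is multi-label, each of the six labels gets its own independent probability rather than a share of a single softmax distribution. The following is a minimal sketch of that idea using only the standard library. The logit values are hypothetical, and `label_probabilities` is an illustrative helper, not a function from `app.py`.

```python
import math

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def label_probabilities(logits):
    """Map one raw logit per label to an independent probability."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return {label: sigmoid(z) for label, z in zip(LABELS, logits)}


# Hypothetical logits for a single comment (not real model output).
example_logits = [2.0, -3.0, 0.5, -4.0, 1.5, -2.5]
probs = label_probabilities(example_logits)

for label, p in probs.items():
    print(f"{label:>14}: {p:.3f}")

# Unlike softmax outputs, these probabilities are not constrained to sum to 1,
# so several labels can be active for the same comment.
print("sum of probabilities:", round(sum(probs.values()), 3))
```
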
+### 🎯 Objectives
+- Fine-tune DistilBERT for robust multi-label toxicity detection
+- Enhance interpretability using **Captum Integrated Gradients**
+- Deploy a real-time, user-friendly **Gradio interface**
+---
+## 🧪 How to Use the Demo
+1. Type or paste any comment in the text box
+2. Click **“Classify”** to view per-label probabilities and predictions
+3. Open the **“Explain”** tab → select a target label
+4. Generate a heatmap showing which words **support (red)** or **oppose (blue)** the decision
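
To give a feel for what the heatmap in step 4 encodes, here is a toy sketch of Integrated Gradients on a tiny logistic model, written with the standard library only. It is not the Space's code: the demo applies Captum to DistilBERT, whereas the tokens, weights, bias, and the all-zeros baseline below are hypothetical stand-ins chosen for illustration. Positive attributions correspond to words that support the label (red in the demo) and negative ones to words that oppose it (blue).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, weights, bias):
    """Toy classifier: probability of the target label for feature vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def gradient(x, weights, bias):
    """Analytic gradient of the toy model's output with respect to x."""
    s = model(x, weights, bias)
    return [s * (1.0 - s) * w for w in weights]


def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Midpoint Riemann approximation of the Integrated Gradients path integral."""
    sums = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [x0 + alpha * (xi - x0) for xi, x0 in zip(x, baseline)]
        for i, g in enumerate(gradient(point, weights, bias)):
            sums[i] += g
    return [(xi - x0) * s / steps for xi, x0, s in zip(x, baseline, sums)]


# Hypothetical tokens and parameters (illustration only).
tokens = ["thanks", "you", "idiot"]
weights = [-1.5, 0.3, 3.0]
bias = -1.0
x = [1.0, 1.0, 1.0]         # every token present
baseline = [0.0, 0.0, 0.0]  # "empty" reference input

attributions = integrated_gradients(x, baseline, weights, bias)
for tok, a in zip(tokens, attributions):
    direction = "supports" if a > 0 else "opposes"
    print(f"{tok:>7}: {a:+.4f} ({direction})")
```

A useful sanity check is the completeness property: the attributions should sum, up to approximation error, to the difference between the model output on the input and on the baseline.
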
+---
+## 🧠 Example Inputs
+| Example | Expected Labels |
+|----------|------------------|
+| “You are a complete idiot.” | toxic / insult |
+| “I will kill you tomorrow.” | threat / toxic |
+| “Thanks for your help today!” | non-toxic |
+| “Go away, you people don’t belong here.” | identity_hate / insult |
+---
+## ⚙️ Technical Stack
+| Component | Technology |
+|------------|-------------|
+| **Language Model** | DistilBERT (`distilbert-base-uncased`) |
+| **Frameworks** | PyTorch • Transformers • Gradio |
+| **Explainability** | Captum (Integrated Gradients) |
+| **Training** | Stratified splits • Early Stopping • Regularization |
+| **Visualization** | Gradio UI + Captum HTML heatmaps |
+| **Deployment** | Hugging Face Spaces |
+---
+## 📂 Project Structure
+```
+.
+├── app.py                # Gradio app entry point
+├── requirements.txt      # Runtime dependencies
+├── artifacts/
+│   ├── best/             # Fine-tuned model weights + tokenizer
+│   └── thresholds.json   # Tuned thresholds for each label
+└── README.md             # (this file)
+```
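
As a rough sketch of how per-label thresholds like those in `artifacts/thresholds.json` might be applied, the example below loads a JSON file and turns probabilities into predicted labels. The file schema is an assumption: the README does not document it, so a flat `{"label": threshold}` mapping is used here. The helper names, the 0.5 fallback, and all numeric values are hypothetical.

```python
import json
import os
import tempfile


def load_thresholds(path):
    """Read a flat {label: threshold} mapping (assumed schema) from JSON."""
    with open(path, encoding="utf-8") as fh:
        return {label: float(value) for label, value in json.load(fh).items()}


def apply_thresholds(probs, thresholds, default=0.5):
    """Return labels whose probability meets its threshold (or the default)."""
    return [
        label
        for label, p in probs.items()
        if p >= thresholds.get(label, default)
    ]


# Write a small stand-in file so the sketch runs without the real artifacts.
example = {"toxic": 0.5, "threat": 0.3, "insult": 0.45}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "thresholds.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(example, fh)
    thresholds = load_thresholds(path)

# Hypothetical per-label probabilities for one comment.
probs = {"toxic": 0.62, "threat": 0.35, "insult": 0.40, "obscene": 0.55}
predicted = apply_thresholds(probs, thresholds)
print(predicted)
```

Keeping a separate threshold per label lets rare classes such as `threat` use a lower cutoff than common ones, which is the usual motivation for tuning them individually.
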
+---
+## 📊 Model Training Summary
+- Dataset: [Jigsaw Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge)
+- Tokenization: DistilBERT (max length = 256)
+- Loss: Binary Cross-Entropy with Logits (BCEWithLogitsLoss)
+- Optimizer: AdamW (learning rate = 2e-5, weight decay = 0.02)
+- Regularization: Dropout (head=0.5, encoder=0.2)
+- Evaluation Metrics: Macro F1 • Precision • Recall • AUC
+- Explainability: Captum Layer Integrated Gradients (LIG)
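
For reference, the loss listed above can be written out by hand. The sketch below is a standard-library version of the element-wise formula behind binary cross-entropy with logits, averaged over the labels, in the numerically stable form `max(z, 0) - z*y + log(1 + exp(-|z|))`. It is an illustration, not the training code; the logits and targets are hypothetical.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly from raw logits."""
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable form: avoids overflow in exp() for large |z|.
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Hypothetical logits and 0/1 targets for the six labels of one comment.
logits = [2.0, -3.0, 0.5, -4.0, 1.5, -2.5]
targets = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]

loss = bce_with_logits(logits, targets)
print(f"mean BCE-with-logits loss: {loss:.4f}")
```

Working from logits rather than from sigmoid outputs is what keeps the computation stable when the model is very confident.
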
+---
+## 🖥️ Live Demo
+> 🚀 Try the interactive demo on Hugging Face Spaces:
+> 🔗 **[yaekobB / Toxic-Comment-Classification](https://huggingface.co/spaces/yaekobB/Toxic-Comment-Classification)**
+---
+## 🧰 Dependencies
+```txt
+transformers>=4.41.0
+torch>=2.2.0
+safetensors>=0.4.2
+gradio>=4.20.0
+captum>=0.7.0
+pandas>=2.0.0
+numpy>=1.24.0
+```
+---
+## 🪪 License
+This project is licensed under the **MIT License**.
+You are free to use, modify, and distribute this work, provided the copyright and license notice are retained.
+
+---
+<p align="center">
+  <i>“Building safer and explainable AI for online interactions.”</i>
+</p>