---
title: Toxic Comment Classifier & Explainer
emoji: 🧪
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 4.44.1
python_version: "3.10"
app_file: app.py
pinned: true
license: mit
description: >
  A multi-label transformer-based toxic comment classifier trained on the Jigsaw dataset.
  It includes an explainability module (Captum Integrated Gradients) that visualizes
  which words contribute most to each toxicity label, served through a Gradio UI.
tags:
  - text-classification
  - multi-label
  - explainable-ai
  - transformers
  - gradio
  - distilbert
  - nlp
  - toxicity-detection
  - huggingface-space
---

# 🧠 Toxic Comment Classification: Explainable Multi-Label NLP Model

<p align="center">
  <img src="banner.png" alt="Toxic Comment Classification Banner" width="100%">
</p>

<p align="center">
  <b>DistilBERT-based multi-label classifier for detecting toxic online comments, with explainability powered by Captum Integrated Gradients (IG).</b>
</p>

---

## Overview

This project presents an **explainable AI system** for identifying toxic comments in text, built using a fine-tuned Transformer model (DistilBERT).
It performs **multi-label classification** across six toxicity categories while offering **token-level explanations** for each prediction.

### 🧩 Labels
- toxic
- severe_toxic
- obscene
- threat
- insult
- identity_hate

### 🎯 Objectives
- Fine-tune DistilBERT for robust multi-label toxicity detection
- Enhance interpretability using **Captum Integrated Gradients**
- Deploy a real-time, user-friendly **Gradio interface**
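Because the six labels are not mutually exclusive, each output head is scored independently with a sigmoid rather than a softmax over classes. Below is a minimal sketch of that decision step, with made-up logits and a uniform 0.5 cutoff standing in for the tuned per-label thresholds the app loads from `artifacts/thresholds.json`:

```python
import numpy as np

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def predict_labels(logits, thresholds):
    """Score each label independently: sigmoid per logit, then a per-label cutoff."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return {label: (float(p), bool(p >= thresholds[label]))
            for label, p in zip(LABELS, probs)}

# Illustrative logits for a hostile comment; real values come from the model.
thresholds = {label: 0.5 for label in LABELS}
preds = predict_labels([2.0, -3.0, -0.2, -4.0, 1.5, -2.5], thresholds)
active = [label for label, (_, hit) in preds.items() if hit]
print(active)  # ['toxic', 'insult']
```

In the deployed app the logits come from the fine-tuned DistilBERT head and each label gets its own tuned cutoff rather than a flat 0.5.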
---

## 🧪 How to Use the Demo

1. Type or paste any comment into the text box.
2. Click **“Classify”** to view per-label probabilities and predictions.
3. Open the **“Explain”** tab and select a target label.
4. Generate a heatmap showing which words **support (red)** or **oppose (blue)** the decision.

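The red/blue heatmap is driven by Integrated Gradients: each token's attribution is its input-minus-baseline difference scaled by the average gradient along a straight-line path between baseline and input. A self-contained numerical sketch of that idea, with NumPy standing in for Captum and a toy linear scorer standing in for the model:

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Riemann-sum approximation of Integrated Gradients:
    (x - baseline) * average gradient along the straight-line path."""
    alphas = (np.arange(steps) + 0.5) / steps           # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)  # interpolated inputs
    grads = np.stack([grad_fn(p) for p in path])
    return (x - baseline) * grads.mean(axis=0)

# Toy linear scorer f(x) = w . x, whose gradient is the constant w,
# so IG recovers each feature's exact contribution: [2.0, -1.0, 0.5].
w = np.array([2.0, -1.0, 0.5])
attr = integrated_gradients(lambda p: w, x=np.ones(3), baseline=np.zeros(3))
# Completeness axiom: attributions sum to f(x) - f(baseline) = 1.5
```

Captum's `LayerIntegratedGradients` computes the same integral over the embedding layer's activations, which is how the per-token scores for a chosen label are obtained.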
---

## 🧠 Example Inputs

| Example | Expected Labels |
|---------|-----------------|
| “You are a complete idiot.” | toxic / insult |
| “I will kill you tomorrow.” | threat / toxic |
| “Thanks for your help today!” | non-toxic |
| “Go away, you people don’t belong here.” | identity_hate / insult |

---

## ⚙️ Technical Stack

| Component | Technology |
|-----------|------------|
| **Language Model** | DistilBERT (`distilbert-base-uncased`) |
| **Frameworks** | PyTorch • Transformers • Gradio |
| **Explainability** | Captum (Integrated Gradients) |
| **Training** | Stratified splits • Early stopping • Regularization |
| **Visualization** | Gradio UI + Captum HTML heatmaps |
| **Deployment** | Hugging Face Spaces |

---

## Project Structure

```
.
├── app.py               # Gradio app entry point
├── requirements.txt     # Runtime dependencies
├── artifacts/
│   ├── best/            # Fine-tuned model weights + tokenizer
│   └── thresholds.json  # Tuned thresholds for each label
└── README.md            # (this file)
```

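`thresholds.json` is plain JSON mapping each label to its tuned cutoff, so the app can read it once at startup. A small sketch with illustrative values (the real numbers come from threshold tuning, not from this README):

```python
import json
import os
import tempfile

# Illustrative cutoffs only; the shipped file holds one tuned value per label.
data = {"toxic": 0.42, "severe_toxic": 0.35, "obscene": 0.50,
        "threat": 0.30, "insult": 0.48, "identity_hate": 0.33}

# Write a stand-in thresholds.json to a temp dir, then load it back
# the way an app would at startup.
path = os.path.join(tempfile.mkdtemp(), "thresholds.json")
with open(path, "w") as f:
    json.dump(data, f, indent=2)

with open(path) as f:
    thresholds = json.load(f)
```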
---

## Model Training Summary

- Dataset: [Jigsaw Toxic Comment Classification Challenge](https://www.kaggle.com/datasets/julian3833/jigsaw-toxic-comment-classification-challenge)
- Tokenization: DistilBERT tokenizer (max length = 256)
- Loss: binary cross-entropy with logits (`BCEWithLogitsLoss`)
- Optimizer: AdamW (learning rate = 2e-5, weight decay = 0.02)
- Regularization: dropout (head = 0.5, encoder = 0.2)
- Evaluation metrics: macro F1 • precision • recall • AUC
- Explainability: Captum Layer Integrated Gradients (LIG)

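For reference, `BCEWithLogitsLoss` treats each of the six labels as an independent binary problem and operates on raw logits for numerical stability. A NumPy sketch of the same computation, using the standard stable formulation and PyTorch's default mean reduction (the logits and targets here are illustrative):

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Numerically stable binary cross-entropy on raw logits,
    averaged over all label positions (PyTorch's default 'mean' reduction)."""
    x = np.asarray(logits, dtype=float)
    t = np.asarray(targets, dtype=float)
    # max(x, 0) - x*t + log(1 + exp(-|x|)) avoids overflow for large |x|
    return float(np.mean(np.maximum(x, 0.0) - x * t + np.log1p(np.exp(-np.abs(x)))))

# One comment, six labels: gold labels are toxic + insult.
loss = bce_with_logits([2.0, -3.0, -0.2, -4.0, 1.5, -2.5],
                       [1.0,  0.0,  0.0,  0.0, 1.0,  0.0])
```

Applying a sigmoid per logit at inference time mirrors this per-label training objective.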
---

## 🖥️ Live Demo

> Try the interactive demo on Hugging Face Spaces:
> **[yaekobB / Toxic-Comment-Classification](https://huggingface.co/spaces/yaekobB/Toxic-Comment-Classification)**

---

## 🧰 Dependencies

```txt
transformers>=4.41.0
torch>=2.2.0
safetensors>=0.4.2
gradio>=4.20.0
captum>=0.7.0
pandas>=2.0.0
numpy>=1.24.0
```

---

## 🪪 License

This project is licensed under the **MIT License**.
You are free to use, modify, and distribute this work with attribution.

---

<p align="center">
  <i>“Building safer and explainable AI for online interactions.”</i>
</p>