---
title: Toxic Comment Classifier & Explainer
emoji: 🧪
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 4.44.1
python_version: '3.10'
app_file: app.py
pinned: true
license: mit
description: >
  A multi-label transformer-based Toxic Comment Classifier trained on the Jigsaw
  dataset. It includes an explainability module (Captum Integrated Gradients)
  that visualizes which words contribute most to each toxic label, powered by a
  Gradio UI.
tags:
  - text-classification
  - multi-label
  - explainable-ai
  - transformers
  - gradio
  - distilbert
  - nlp
  - toxicity-detection
  - huggingface-space
---
# Toxic Comment Classification: Explainable Multi-Label NLP Model

A DistilBERT-based multi-label classifier for detecting toxic online comments, with explainability powered by Captum Integrated Gradients (IG).

## Overview

This project presents an explainable AI system for identifying toxic comments in text, built on a fine-tuned Transformer model (DistilBERT). It performs multi-label classification across six toxicity categories while offering token-level explanations for each prediction.
## Labels
- toxic
- severe_toxic
- obscene
- threat
- insult
- identity_hate
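Because the task is multi-label, each of these labels gets its own sigmoid probability rather than competing in a softmax, and a per-label threshold turns probabilities into binary predictions. A minimal NumPy sketch of that post-processing step (the label order and the uniform 0.5 thresholds here are illustrative; the deployed app uses per-label tuned thresholds):

```python
import numpy as np

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def predict_labels(logits, thresholds):
    """Turn raw per-label logits into independent probabilities and
    binary predictions (labels are not mutually exclusive)."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    preds = {label: bool(p >= thresholds[label]) for label, p in zip(LABELS, probs)}
    return probs, preds

# Example: strong "toxic" and "insult" logits, weak everything else.
probs, preds = predict_labels(
    [2.0, -3.0, -1.0, -4.0, 1.5, -3.5],
    {label: 0.5 for label in LABELS},
)
```

Note that a single comment can fire several labels at once (e.g. both `toxic` and `insult`), which is exactly why sigmoids are used instead of a softmax.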
## Objectives
- Fine-tune DistilBERT for robust multi-label toxicity detection
- Enhance interpretability using Captum Integrated Gradients
- Deploy a real-time, user-friendly Gradio interface
## How to Use the Demo

1. Type or paste any comment into the text box.
2. Click "Classify" to view per-label probabilities and predictions.
3. Open the "Explain" tab and select a target label.
4. Generate a heatmap showing which words support (red) or oppose (blue) the decision.
## Example Inputs

| Example | Expected Labels |
|---|---|
| "You are a complete idiot." | toxic / insult |
| "I will kill you tomorrow." | threat / toxic |
| "Thanks for your help today!" | non-toxic |
| "Go away, you people don't belong here." | identity_hate / insult |
## Technical Stack

| Component | Technology |
|---|---|
| Language Model | DistilBERT (distilbert-base-uncased) |
| Frameworks | PyTorch • Transformers • Gradio |
| Explainability | Captum (Integrated Gradients) |
| Training | Stratified splits • Early stopping • Regularization |
| Visualization | Gradio UI + Captum HTML heatmaps |
| Deployment | Hugging Face Spaces |
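Integrated Gradients attributes a prediction to its inputs by averaging gradients along a straight path from a baseline to the input and scaling by the input-to-baseline difference. The app applies Captum's layer variant to DistilBERT embeddings; the underlying math can be sketched self-containedly on a toy differentiable function:

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=200):
    """Riemann-sum (midpoint rule) approximation of Integrated Gradients:
    attribution_i = (x_i - b_i) * average of df/dx_i along the straight path."""
    alphas = (np.arange(steps) + 0.5) / steps            # interpolation points in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)   # shape (steps, dims)
    avg_grad = grad_f(path).mean(axis=0)                 # average gradient per coordinate
    return (x - baseline) * avg_grad

# Toy example: f(x) = sum(x**2), gradient 2x; the exact IG from a zero
# baseline is x**2 per coordinate.
x = np.array([1.0, -2.0, 0.5])
attr = integrated_gradients(lambda p: 2 * p, x, np.zeros_like(x))
```

A useful sanity check is the completeness property: the attributions sum to f(x) - f(baseline), which is what makes per-token heatmaps interpretable as a decomposition of the model's score.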
## Project Structure

```
.
├── app.py               # Gradio app entry point
├── requirements.txt     # Runtime dependencies
├── artifacts/
│   ├── best/            # Fine-tuned model weights + tokenizer
│   └── thresholds.json  # Tuned thresholds for each label
└── README.md            # (this file)
```
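`artifacts/thresholds.json` stores one tuned decision threshold per label. A hedged sketch of the expected format and a loader (the threshold values below are illustrative, not the actual tuned ones; unknown labels fall back to 0.5):

```python
import json
import os
import tempfile

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Hypothetical contents of artifacts/thresholds.json (illustrative values).
example = {
    "toxic": 0.42, "severe_toxic": 0.35, "obscene": 0.48,
    "threat": 0.30, "insult": 0.45, "identity_hate": 0.33,
}

def load_thresholds(path):
    """Load per-label thresholds, defaulting to 0.5 for any missing label."""
    with open(path) as f:
        tuned = json.load(f)
    return {label: float(tuned.get(label, 0.5)) for label in LABELS}

# Round-trip through a temporary file to demonstrate the format.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(example, f)
    path = f.name
thresholds = load_thresholds(path)
os.remove(path)
```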
## Model Training Summary

- Dataset: Jigsaw Toxic Comment Classification Challenge
- Tokenization: DistilBERT (max length = 256)
- Loss: Binary Cross-Entropy with Logits (BCEWithLogitsLoss)
- Optimizer: AdamW (learning rate = 2e-5, weight decay = 0.02)
- Regularization: Dropout (head = 0.5, encoder = 0.2)
- Evaluation Metrics: Macro F1 • Precision • Recall • AUC
- Explainability: Captum Layer Integrated Gradients (LIG)
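`BCEWithLogitsLoss` treats each of the six labels as an independent binary problem and fuses the sigmoid with the cross-entropy for numerical stability. A NumPy re-derivation of the per-element formula PyTorch uses (mean reduction, no class weights):

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Numerically stable binary cross-entropy on raw logits, matching
    torch.nn.BCEWithLogitsLoss with reduction="mean"."""
    x = np.asarray(logits, dtype=float)
    y = np.asarray(targets, dtype=float)
    # max(x, 0) - x*y + log(1 + exp(-|x|)) avoids overflow for large |x|,
    # unlike computing sigmoid(x) first and taking its log.
    per_element = np.maximum(x, 0) - x * y + np.log1p(np.exp(-np.abs(x)))
    return per_element.mean()

# Example: one uncertain negative (logit 0) and one confident positive (logit 4).
loss = bce_with_logits([0.0, 4.0], [0.0, 1.0])
```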
## Live Demo

Try the interactive demo on Hugging Face Spaces:
**yaekobB / Toxic-Comment-Classification**
## Dependencies

```text
transformers>=4.41.0
torch>=2.2.0
safetensors>=0.4.2
gradio>=4.20.0
captum>=0.7.0
pandas>=2.0.0
numpy>=1.24.0
```
## License

This project is licensed under the MIT License. You are free to use, modify, and distribute this work with attribution.

"Building safer, more explainable AI for online interactions."