---
title: Toxic Comment Classifier & Explainer
emoji: 🧪
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 4.44.1
python_version: '3.10'
app_file: app.py
pinned: true
license: mit
description: >
  A multi-label transformer-based Toxic Comment Classifier trained on the Jigsaw
  dataset. It includes an explainability module (Captum Integrated Gradients)
  that visualizes which words contribute most to each toxic label, served
  through a Gradio UI.
tags:
  - text-classification
  - multi-label
  - explainable-ai
  - transformers
  - gradio
  - distilbert
  - nlp
  - toxicity-detection
  - huggingface-space
---

# 🧠 Toxic Comment Classification — Explainable Multi-Label NLP Model

*(Banner: Toxic Comment Classification)*

> DistilBERT-based multi-label classifier for detecting toxic online comments, with explainability powered by Captum Integrated Gradients (IG).


## 🚀 Overview

This project presents an explainable AI system for identifying toxic comments in text, built using a fine-tuned Transformer model (DistilBERT).
It performs multi-label classification across six toxicity categories while offering token-level explanations for each prediction.
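Because a single comment can carry several labels at once (e.g. both toxic and insult), the classifier head scores each label independently with a sigmoid rather than a softmax, which would force the labels to compete. A minimal sketch with made-up logits:

```python
import math

# Three hypothetical logits for one comment: toxic, threat, insult.
logits = [2.1, -1.0, 1.5]

# Softmax (single-label): probabilities compete and must sum to 1.
exps = [math.exp(z) for z in logits]
softmax = [e / sum(exps) for e in exps]

# Independent sigmoids (multi-label): each label is a separate yes/no.
sigmoids = [1 / (1 + math.exp(-z)) for z in logits]

print(softmax)   # sums to 1.0, so only one label can dominate
print(sigmoids)  # here both the 1st and 3rd label clear 0.5 at once
```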

## 🧩 Labels

- `toxic`
- `severe_toxic`
- `obscene`
- `threat`
- `insult`
- `identity_hate`

## 🎯 Objectives

- Fine-tune DistilBERT for robust multi-label toxicity detection
- Enhance interpretability using Captum Integrated Gradients
- Deploy a real-time, user-friendly Gradio interface

## 🧪 How to Use the Demo

1. Type or paste any comment in the text box
2. Click "Classify" to view per-label probabilities and predictions
3. Open the "Explain" tab → select a target label
4. Generate a heatmap showing which words support (red) or oppose (blue) the decision
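Under the hood, the heatmap in step 4 comes from Integrated Gradients: the model's gradient is averaged along a straight path from an all-zero baseline to the input, and each dimension's average gradient times its displacement becomes its attribution. A toy numeric sketch of the idea (the scoring function below is a stand-in, not the actual DistilBERT model):

```python
def score(x):
    # Hypothetical "toxicity score" over 3 scalar inputs:
    # dims 0 and 1 push the score up, dim 2 pushes it down.
    return 2.0 * x[0] + 3.0 * x[1] * x[1] - 1.0 * x[2]

def grad(x, eps=1e-6):
    # Central finite-difference gradient of `score` at x.
    g = []
    for i in range(len(x)):
        hi = list(x); hi[i] += eps
        lo = list(x); lo[i] -= eps
        g.append((score(hi) - score(lo)) / (2 * eps))
    return g

def integrated_gradients(x, baseline, steps=200):
    # IG_i = (x_i - b_i) * average of dF/dx_i along the path b -> x.
    avg_grad = [0.0] * len(x)
    for k in range(1, steps + 1):
        alpha = k / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad(point)
        for i in range(len(x)):
            avg_grad[i] += g[i] / steps
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg_grad)]

x = [1.0, 0.5, 2.0]          # "input" values
baseline = [0.0, 0.0, 0.0]   # all-zero baseline, as Captum uses by default
attr = integrated_gradients(x, baseline)

print(attr)  # positive = supports the score, negative = opposes it
# Completeness axiom: attributions sum to score(x) - score(baseline).
print(sum(attr), score(x) - score(baseline))
```

The completeness check is a useful sanity test in practice: if the attributions do not approximately sum to the score difference, the path approximation needs more steps.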

## 🧠 Example Inputs

| Example | Expected Labels |
| --- | --- |
| "You are a complete idiot." | toxic / insult |
| "I will kill you tomorrow." | threat / toxic |
| "Thanks for your help today!" | non-toxic |
| "Go away, you people don't belong here." | identity_hate / insult |

## ⚙️ Technical Stack

| Component | Technology |
| --- | --- |
| Language Model | DistilBERT (`distilbert-base-uncased`) |
| Frameworks | PyTorch • Transformers • Gradio |
| Explainability | Captum (Integrated Gradients) |
| Training | Stratified splits • Early Stopping • Regularization |
| Visualization | Gradio UI + Captum HTML heatmaps |
| Deployment | Hugging Face Spaces |

## 📂 Project Structure

```text
.
├── app.py                # Gradio app entry point
├── requirements.txt      # Runtime dependencies
├── artifacts/
│   ├── best/             # Fine-tuned model weights + tokenizer
│   └── thresholds.json   # Tuned thresholds for each label
└── README.md             # (this file)
```
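`thresholds.json` holds one tuned decision threshold per label, so at inference each sigmoid probability is compared against its own cutoff rather than a fixed 0.5. A sketch of that mapping in plain Python (the threshold and logit values here are illustrative, not the tuned ones in the file):

```python
import math

# Hypothetical per-label thresholds, mimicking artifacts/thresholds.json.
THRESHOLDS = {
    "toxic": 0.40, "severe_toxic": 0.30, "obscene": 0.45,
    "threat": 0.25, "insult": 0.40, "identity_hate": 0.30,
}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_labels(logits):
    """Return the labels whose sigmoid probability clears that
    label's tuned threshold, with the probabilities attached."""
    probs = {name: sigmoid(z) for name, z in logits.items()}
    return {name: p for name, p in probs.items() if p >= THRESHOLDS[name]}

# One raw logit per label, as the classifier head would emit them.
logits = {"toxic": 1.2, "severe_toxic": -2.0, "obscene": -0.5,
          "threat": -3.0, "insult": 0.8, "identity_hate": -2.5}
print(predict_labels(logits))  # only "toxic" and "insult" fire here
```

Tuning a lower threshold for rare labels like `threat` is a common way to trade precision for recall on imbalanced classes.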

## 📊 Model Training Summary

- Dataset: Jigsaw Toxic Comment Classification Challenge
- Tokenization: DistilBERT (max length = 256)
- Loss: Binary Cross-Entropy with Logits (`BCEWithLogitsLoss`)
- Optimizer: AdamW (learning rate = 2e-5, weight decay = 0.02)
- Regularization: Dropout (head = 0.5, encoder = 0.2)
- Evaluation Metrics: Macro F1 • Precision • Recall • AUC
- Explainability: Captum Layer Integrated Gradients (LIG)
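The `BCEWithLogitsLoss` used above fuses the sigmoid and the binary cross-entropy into one numerically stable expression, `max(x, 0) - x*z + log(1 + exp(-|x|))`, instead of computing `log(sigmoid(x))` directly. A plain-Python sketch of the per-element computation (PyTorch applies the same formula vectorized, then averages over all label/example pairs):

```python
import math

def bce_with_logits(logit, target):
    """Stable binary cross-entropy on a raw logit: equivalent to
    -[z*log(sigmoid(x)) + (1-z)*log(1-sigmoid(x))] but without
    ever exponentiating a large positive number."""
    x, z = logit, target
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))

def multi_label_loss(logits, targets):
    # Mean over all elements, like reduction="mean" in PyTorch.
    terms = [bce_with_logits(x, z) for x, z in zip(logits, targets)]
    return sum(terms) / len(terms)

# Six logits (one per toxicity label) for a single comment.
logits  = [1.2, -2.0, -0.5, -3.0, 0.8, -2.5]
targets = [1.0,  0.0,  0.0,  0.0, 1.0,  0.0]  # ground truth: toxic + insult
print(multi_label_loss(logits, targets))
```

Treating each label as its own binary problem is what makes the setup multi-label: no normalization ties the six outputs together.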

## 🖥️ Live Demo

🚀 Try the interactive demo on Hugging Face Spaces: 🔗 **yaekobB/Toxic-Comment-Classification**


## 🧰 Dependencies

```text
transformers>=4.41.0
torch>=2.2.0
safetensors>=0.4.2
gradio>=4.20.0
captum>=0.7.0
pandas>=2.0.0
numpy>=1.24.0
```


## 🪪 License

This project is licensed under the MIT License. You are free to use, modify, and distribute this work with attribution.


> *"Building safer and explainable AI for online interactions."*