---
title: Toxic Comment Classifier & Explainer
emoji: 🧪
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 4.44.1
python_version: "3.10"
app_file: app.py
pinned: true
license: mit
description: >
  A multi-label transformer-based toxic comment classifier trained on the Jigsaw dataset.
  It includes an explainability module (Captum Integrated Gradients) that visualizes
  which words contribute most to each toxicity label, served through a Gradio UI.
tags:
  - text-classification
  - multi-label
  - explainable-ai
  - transformers
  - gradio
  - distilbert
  - nlp
  - toxicity-detection
  - huggingface-space
---

# 🧠 Toxic Comment Classification: Explainable Multi-Label NLP Model

<p align="center">
  <img src="banner.png" alt="Toxic Comment Classification Banner" width="100%">
</p>

<p align="center">
  <b>DistilBERT-based multi-label classifier for detecting toxic online comments, with explainability powered by Captum Integrated Gradients (IG).</b>
</p>

---

## Overview

This project presents an **explainable AI system** for identifying toxic comments in text, built using a fine-tuned Transformer model (DistilBERT).
It performs **multi-label classification** across six toxicity categories while offering **token-level explanations** for each prediction.

### 🧩 Labels
- toxic
- severe_toxic
- obscene
- threat
- insult
- identity_hate

### 🎯 Objectives
- Fine-tune DistilBERT for robust multi-label toxicity detection
- Enhance interpretability using **Captum Integrated Gradients**
- Deploy a real-time, user-friendly **Gradio interface**
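Because the six labels are not mutually exclusive, each output head is scored independently with a sigmoid rather than a softmax over classes. Below is a minimal sketch of that decision step, with made-up logits and a uniform 0.5 cutoff standing in for the tuned per-label thresholds the app loads from `artifacts/thresholds.json`:

```python
import numpy as np

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def predict_labels(logits, thresholds):
    """Score each label independently: sigmoid per logit, then a per-label cutoff."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return {label: (float(p), bool(p >= thresholds[label]))
            for label, p in zip(LABELS, probs)}

# Illustrative logits for a hostile comment; real values come from the model.
thresholds = {label: 0.5 for label in LABELS}
preds = predict_labels([2.0, -3.0, -0.2, -4.0, 1.5, -2.5], thresholds)
active = [label for label, (_, hit) in preds.items() if hit]
print(active)  # ['toxic', 'insult']
```

In the deployed app the logits come from the fine-tuned DistilBERT head and each label gets its own tuned cutoff rather than a flat 0.5.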
---

## 🧪 How to Use the Demo

1. Type or paste any comment into the text box.
2. Click **“Classify”** to view per-label probabilities and predictions.
3. Open the **“Explain”** tab and select a target label.
4. Generate a heatmap showing which words **support (red)** or **oppose (blue)** the decision.

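The red/blue heatmap is driven by Integrated Gradients: each token's attribution is its input-minus-baseline difference scaled by the average gradient along a straight-line path between baseline and input. A self-contained numerical sketch of that idea, with NumPy standing in for Captum and a toy linear scorer standing in for the model:

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Riemann-sum approximation of Integrated Gradients:
    (x - baseline) * average gradient along the straight-line path."""
    alphas = (np.arange(steps) + 0.5) / steps           # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)  # interpolated inputs
    grads = np.stack([grad_fn(p) for p in path])
    return (x - baseline) * grads.mean(axis=0)

# Toy linear scorer f(x) = w . x, whose gradient is the constant w,
# so IG recovers each feature's exact contribution: [2.0, -1.0, 0.5].
w = np.array([2.0, -1.0, 0.5])
attr = integrated_gradients(lambda p: w, x=np.ones(3), baseline=np.zeros(3))
# Completeness axiom: attributions sum to f(x) - f(baseline) = 1.5
```

Captum's `LayerIntegratedGradients` computes the same integral over the embedding layer's activations, which is how the per-token scores for a chosen label are obtained.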
---

## 🧠 Example Inputs

| Example | Expected Labels |
|---------|-----------------|
| “You are a complete idiot.” | toxic / insult |
| “I will kill you tomorrow.” | threat / toxic |
| “Thanks for your help today!” | non-toxic |
| “Go away, you people don’t belong here.” | identity_hate / insult |

---

## ⚙️ Technical Stack

| Component | Technology |
|-----------|------------|
| **Language Model** | DistilBERT (`distilbert-base-uncased`) |
| **Frameworks** | PyTorch • Transformers • Gradio |
| **Explainability** | Captum (Integrated Gradients) |
| **Training** | Stratified splits • Early stopping • Regularization |
| **Visualization** | Gradio UI + Captum HTML heatmaps |
| **Deployment** | Hugging Face Spaces |

---

## Project Structure

```
.
├── app.py               # Gradio app entry point
├── requirements.txt     # Runtime dependencies
├── artifacts/
│   ├── best/            # Fine-tuned model weights + tokenizer
│   └── thresholds.json  # Tuned thresholds for each label
└── README.md            # (this file)
```

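`thresholds.json` is plain JSON mapping each label to its tuned cutoff, so the app can read it once at startup. A small sketch with illustrative values (the real numbers come from threshold tuning, not from this README):

```python
import json
import os
import tempfile

# Illustrative cutoffs only; the shipped file holds one tuned value per label.
data = {"toxic": 0.42, "severe_toxic": 0.35, "obscene": 0.50,
        "threat": 0.30, "insult": 0.48, "identity_hate": 0.33}

# Write a stand-in thresholds.json to a temp dir, then load it back
# the way an app would at startup.
path = os.path.join(tempfile.mkdtemp(), "thresholds.json")
with open(path, "w") as f:
    json.dump(data, f, indent=2)

with open(path) as f:
    thresholds = json.load(f)
```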
---

## Model Training Summary

- Dataset: [Jigsaw Toxic Comment Classification Challenge](https://www.kaggle.com/datasets/julian3833/jigsaw-toxic-comment-classification-challenge)
- Tokenization: DistilBERT tokenizer (max length = 256)
- Loss: binary cross-entropy with logits (`BCEWithLogitsLoss`)
- Optimizer: AdamW (learning rate = 2e-5, weight decay = 0.02)
- Regularization: dropout (head = 0.5, encoder = 0.2)
- Evaluation metrics: macro F1 • precision • recall • AUC
- Explainability: Captum Layer Integrated Gradients (LIG)

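For reference, `BCEWithLogitsLoss` treats each of the six labels as an independent binary problem and operates on raw logits for numerical stability. A NumPy sketch of the same computation, using the standard stable formulation and PyTorch's default mean reduction (the logits and targets here are illustrative):

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Numerically stable binary cross-entropy on raw logits,
    averaged over all label positions (PyTorch's default 'mean' reduction)."""
    x = np.asarray(logits, dtype=float)
    t = np.asarray(targets, dtype=float)
    # max(x, 0) - x*t + log(1 + exp(-|x|)) avoids overflow for large |x|
    return float(np.mean(np.maximum(x, 0.0) - x * t + np.log1p(np.exp(-np.abs(x)))))

# One comment, six labels: gold labels are toxic + insult.
loss = bce_with_logits([2.0, -3.0, -0.2, -4.0, 1.5, -2.5],
                       [1.0,  0.0,  0.0,  0.0, 1.0,  0.0])
```

Applying a sigmoid per logit at inference time mirrors this per-label training objective.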
---

## 🖥️ Live Demo

> Try the interactive demo on Hugging Face Spaces:
> **[yaekobB / Toxic-Comment-Classification](https://huggingface.co/spaces/yaekobB/Toxic-Comment-Classification)**

---

## 🧰 Dependencies

```txt
transformers>=4.41.0
torch>=2.2.0
safetensors>=0.4.2
gradio>=4.20.0
captum>=0.7.0
pandas>=2.0.0
numpy>=1.24.0
```

---

## 🪪 License

This project is licensed under the **MIT License**.
You are free to use, modify, and distribute this work with attribution.

---

<p align="center">
  <i>“Building safer and explainable AI for online interactions.”</i>
</p>