Toxic Comment Classifier
Fine-tuned DistilBERT for multi-label toxic comment classification,
wrapped in a live FastAPI endpoint with Docker deployment.
Model Details
- Base model: distilbert-base-uncased
- Task: Multi-label text classification (6 labels)
- Labels: toxic, severe_toxic, obscene, threat, insult, identity_hate
- Training data: Jigsaw Toxic Comment Classification (159,571 comments)
- Framework: HuggingFace Transformers + Trainer API
- Training time: ~28 minutes on T4 GPU
Performance
| Metric | Score |
|---|---|
| ROC-AUC (macro) | 0.990 |
| F1 (macro) | 0.652 |
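Macro F1 is lower because of severe class imbalance: the threat label has only 478
positive examples out of 159,571 comments. ROC-AUC does not depend on a decision
threshold, so the 0.990 score is the better indicator of how well the model ranks
toxic comments above clean ones.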
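To illustrate why one rare label can pull macro F1 down, here is a minimal sketch that computes per-label F1 from confusion counts and averages the labels with equal weight. The `(tp, fp, fn)` counts and the helper names are hypothetical and chosen only for illustration; they are not results from this model. Only the 478 and 159,571 figures come from the text above.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from confusion counts; returns 0.0 when the score is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: dict[str, tuple[int, int, int]]) -> float:
    """Unweighted mean of per-label F1, so a rare label counts as much as a common one."""
    return sum(f1_score(*c) for c in counts.values()) / len(counts)


# Prevalence of the threat label, from the figures quoted above.
threat_rate = 478 / 159_571

# Hypothetical (tp, fp, fn) counts, for illustration only.
hypothetical_counts = {
    "toxic": (900, 100, 100),
    "threat": (10, 10, 30),
}

per_label = {name: f1_score(*c) for name, c in hypothetical_counts.items()}
print(f"threat prevalence: {threat_rate:.4%}")
print(per_label)
print(f"macro F1: {macro_f1(hypothetical_counts):.3f}")
```

In this made-up case the common label scores 0.9 and the rare label about 0.33, so the macro average lands near 0.62 even though most predictions are correct.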
Latency Benchmark (GPU, single request)
| Percentile | Latency |
|---|---|
| p50 | 7.3ms |
| p95 | 9.9ms |
| p99 | 19.4ms |
| min | 4.1ms |
| max | 26.2ms |
| mean | 7.3ms |
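Latency statistics of this kind can be gathered with a short timing loop. The sketch below is one possible approach, not necessarily the script used for the numbers above: `benchmark` and `fake_request` are hypothetical names, and the stand-in workload should be swapped for a real call to the model or endpoint.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], n_requests: int = 200, warmup: int = 10) -> dict[str, float]:
    """Call fn() repeatedly and summarise per-call latency in milliseconds."""
    for _ in range(warmup):  # warm-up calls are not timed
        fn()
    samples = []
    for _ in range(n_requests):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.fmean(samples),
    }


def fake_request() -> int:
    # Stand-in workload (assumption); replace with a real inference call.
    return sum(range(1000))


stats = benchmark(fake_request)
print({name: round(value, 3) for name, value in stats.items()})
```

When timing GPU inference, the timed call should include any device synchronisation and the transfer of results back to the host, otherwise the measured latency can be too low.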
Live API
Example Predictions
| Comment | Flagged |
|---|---|
| I love this community! | clean |
| You are the most stupid idiot | toxic, insult, obscene |
| I will find you and hurt you | toxic, threat |
| I will beat you in the hotel | toxic, threat |
Limitations
- Trained on Wikipedia comments, may not generalise to all domains
- Severe class imbalance: threat label has only 478 training examples
- A label is flagged when its predicted probability exceeds the 0.5 threshold (adjustable)
- Not suitable for production content moderation without human review
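The thresholding step mentioned above can be sketched as follows. This assumes the model emits one logit per label and that each logit is passed through an independent sigmoid, which is the usual setup for multi-label classification but is not confirmed here. `flag_labels`, the per-label `thresholds` override, and the example logits are hypothetical.

```python
import math

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def flag_labels(logits, thresholds=None, default_threshold=0.5):
    """Return labels whose probability exceeds its threshold, or ["clean"] if none do."""
    thresholds = thresholds or {}
    flagged = [
        label
        for label, logit in zip(LABELS, logits)
        if sigmoid(logit) > thresholds.get(label, default_threshold)
    ]
    return flagged or ["clean"]


# Made-up logits for a single comment, for illustration only.
example_logits = [2.0, -3.0, -1.0, -0.5, 1.2, -4.0]

print(flag_labels(example_logits))                             # ['toxic', 'insult']
print(flag_labels(example_logits, thresholds={"threat": 0.3}))  # ['toxic', 'threat', 'insult']
```

Lowering the threshold for a rare label such as threat trades precision for recall, which may be preferable when flagged comments go to a human reviewer anyway.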