Toxic Comment Classifier

Fine-tuned DistilBERT for multi-label toxic comment classification, wrapped in a live FastAPI endpoint with Docker deployment.

Model Details

  • Base model: distilbert-base-uncased
  • Task: Multi-label text classification (6 labels)
  • Labels: toxic, severe_toxic, obscene, threat, insult, identity_hate
  • Training data: Jigsaw Toxic Comment Classification (159,571 comments)
  • Framework: HuggingFace Transformers + Trainer API
  • Training time: ~28 minutes on a T4 GPU

Performance

Metric            Score
ROC-AUC (macro)   0.990
F1 (macro)        0.652

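Macro F1 is lower because of severe class imbalance: the threat label has only 478 positive examples out of roughly 159k comments. ROC-AUC does not depend on a decision threshold, so the 0.990 score is a better indication of how well the model ranks toxic comments above non-toxic ones.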
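To make the effect of a rare label on macro F1 concrete, here is a minimal sketch. The confusion counts are invented for illustration and are not this model's actual results; `f1_score` and `macro_f1` are hypothetical helper names.

```python
from typing import Dict, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for one label from confusion counts; 0.0 when it is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-label F1: a rare label weighs as much as a common one."""
    return sum(f1_score(*c) for c in counts.values()) / len(counts)


# Hypothetical (tp, fp, fn) counts, for illustration only.
example_counts = {
    "toxic": (800, 200, 200),  # common label, F1 = 0.8
    "threat": (2, 3, 5),       # rare label, a handful of errors gives F1 = 1/3
}
macro = macro_f1(example_counts)
```

Because the mean is unweighted, a few misclassified examples on a label with very few positives can pull the macro score down sharply even when the common labels score well.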

Latency Benchmark (GPU, single request)

Statistic   Latency
p50         7.3 ms
p95         9.9 ms
p99         19.4 ms
min         4.1 ms
max         26.2 ms
mean        7.3 ms
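Statistics like these can be collected with a small timing loop. The sketch below is an assumption about how such a benchmark might be written, not the script used for the numbers above; `benchmark`, `percentile`, and the `fn` callable (standing in for one model request) are hypothetical names. It uses the nearest-rank percentile definition.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def percentile(sorted_values: List[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted list (pct in 0..100)."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def benchmark(fn: Callable[[], object], n_requests: int = 200,
              warmup: int = 10) -> Dict[str, float]:
    """Time repeated single calls to fn and summarise latency in milliseconds."""
    for _ in range(warmup):  # warm-up calls are not measured
        fn()
    samples_ms: List[float] = []
    for _ in range(n_requests):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    samples_ms.sort()
    return {
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
        "min": samples_ms[0],
        "max": samples_ms[-1],
        "mean": statistics.fmean(samples_ms),
    }
```

When timing a GPU model, `fn` would need to wait for the device to finish (for example by calling `torch.cuda.synchronize()` before returning); otherwise the loop measures only kernel launch time.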

Live API

Example Predictions

Comment                          Flagged
I love this community!           clean
You are the most stupid idiot    toxic, insult, obscene
I will find you and hurt you     toxic, threat
I will beat you in the hotel     toxic, threat

Limitations

  • Trained on Wikipedia talk-page comments; may not generalise to other domains
  • Severe class imbalance: threat label has only 478 training examples
  • A label is flagged when its predicted probability exceeds 0.5 (the threshold is adjustable)
  • Not suitable for production content moderation without human review
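
The thresholding rule in the list above can be sketched as follows. This is a minimal illustration, assuming the model emits one raw logit per label in the order listed under Model Details. `flag_labels`, the `per_label` override, and the example logits are hypothetical and are not taken from the deployed API.

```python
import math
from typing import Dict, List, Optional, Sequence

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def flag_labels(logits: Sequence[float], threshold: float = 0.5,
                per_label: Optional[Dict[str, float]] = None) -> List[str]:
    """Return labels whose independent sigmoid probability exceeds the cutoff."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    overrides = per_label or {}
    flagged = []
    for label, logit in zip(LABELS, logits):
        cutoff = overrides.get(label, threshold)  # per-label override, if any
        if sigmoid(logit) > cutoff:
            flagged.append(label)
    return flagged  # an empty list corresponds to "clean"


# Made-up logits for illustration only.
example_logits = [2.0, -3.0, -1.0, -0.5, 1.5, -4.0]
default_flags = flag_labels(example_logits)
sensitive_flags = flag_labels(example_logits, per_label={"threat": 0.3})
```

With the default cutoff the example yields toxic and insult; lowering the cutoff for threat to 0.3 also flags threat. A lower cutoff on a rare label is one way to trade precision for recall, which is one reason to keep human review in the loop.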