metadata
license: mit
datasets:
  - thesofakillers/jigsaw-toxic-comment-classification-challenge
language:
  - en
metrics:
  - accuracy
  - f1
tags:
  - text-classification
  - toxic_comment
  - nlp
  - transformers
  - distilbert
pipeline_tag: text-classification

Toxic Comment Classifier (DistilBERT, uncased)

This model is a fine-tuned uncased DistilBERT model for binary toxic comment classification.
It labels each input comment as either Toxic or Non-Toxic.

Training

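The model was trained with the Hugging Face Trainer API on a labeled toxic comment dataset (the card metadata lists thesofakillers/jigsaw-toxic-comment-classification-challenge).
Reported evaluation metrics: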

  • Accuracy: ~97%
  • F1 score: ~83%
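The gap between accuracy and F1 is typical when toxic comments are a small share of the data: most correct predictions come from the majority Non-Toxic class, while F1 only rewards getting the Toxic class right. The sketch below computes both metrics from scratch. The label lists are synthetic counts chosen for illustration, not this model's actual predictions, and the function name `accuracy_and_f1` is hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 of the positive class) for two equal-length label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives at all
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Synthetic, imbalanced example (assumption): 10 toxic and 90 non-toxic comments
y_true = [1] * 10 + [0] * 90
# 8 toxic caught, 2 toxic missed, 1 false alarm, 89 non-toxic correct
y_pred = [1] * 8 + [0] * 2 + [1] * 1 + [0] * 89

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} f1={f1:.3f}")
```

With only three mistakes out of 100, accuracy stays high, but each mistake weighs heavily on F1 because there are only ten toxic examples.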

Intended Use

  • Detecting toxic or harmful language in text.
  • Supporting content moderation in forums, social media, and chat systems.
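For moderation, the predicted label is usually combined with a confidence cutoff so that borderline cases can go to a human reviewer. Here is a minimal sketch, assuming predictions arrive as dicts with `label` and `score` keys (the shape a text-classification pipeline returns) and using the `Toxic` label name from the usage section. The function name `should_flag` and the 0.8 threshold are hypothetical and would need tuning on real data.

```python
def should_flag(prediction, threshold=0.8, toxic_label="Toxic"):
    """Return True when a prediction is toxic with at least `threshold` confidence.

    `prediction` is a dict such as {"label": "Toxic", "score": 0.97}.
    """
    return prediction["label"] == toxic_label and prediction["score"] >= threshold


# Hand-written example predictions (assumption), not real model output
examples = [
    {"label": "Toxic", "score": 0.97},      # confident toxic: flag
    {"label": "Toxic", "score": 0.55},      # uncertain toxic: leave for review
    {"label": "Non-Toxic", "score": 0.99},  # confident non-toxic: do not flag
]

for prediction in examples:
    print(prediction, "->", should_flag(prediction))
```

A higher threshold reduces false alarms at the cost of missing more toxic comments, so the right value depends on how the flagged items are handled downstream.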

Limitations

  • May not capture sarcasm or subtle toxicity.
  • Biases in the training dataset may affect predictions.

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_id = "Youssef-El-SaYed/toxic-comment-classifier"

# Map class indices to readable label names; passing these to from_pretrained
# sets them on the loaded model's config.
id2label = {0: "Non-Toxic", 1: "Toxic"}
label2id = {"Non-Toxic": 0, "Toxic": 1}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    id2label=id2label,
    label2id=label2id
)

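# Build a text-classification pipeline from the loaded model and tokenizer
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Each call returns a list with one dict holding the predicted label and its score
print(nlp("You are so stupid and annoying!"))
print(nlp("I really like your work, keep it up!"))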
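When scoring many comments at once, it can be handy to get the toxic-class probability for every input rather than only the top label. The helper below is a sketch under a few assumptions: `classifier` is a text-classification pipeline such as `nlp` above, the toxic class is named `Toxic` as in the mapping, and the installed transformers version accepts `top_k=None` (return scores for all labels) and forwards `truncation=True` to the tokenizer. The function name `toxic_scores` is hypothetical.

```python
def toxic_scores(classifier, texts, toxic_label="Toxic"):
    """Return the toxic-class score for each text, in input order.

    `classifier` is any callable that behaves like a text-classification
    pipeline called with a list of strings and top_k=None, i.e. it returns
    one list of {"label", "score"} dicts per input text.
    """
    # top_k=None requests scores for every label instead of only the best one;
    # truncation=True cuts over-long comments to the model's maximum length.
    results = classifier(list(texts), top_k=None, truncation=True)

    scores = []
    for per_label in results:
        by_label = {item["label"]: item["score"] for item in per_label}
        scores.append(by_label[toxic_label])
    return scores


# Example call with the pipeline from the usage snippet:
# toxic_scores(nlp, ["You are so stupid and annoying!",
#                    "I really like your work, keep it up!"])
```

Passing a list lets the pipeline handle all comments in one call, and the returned scores can be compared against whatever threshold suits the application.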