toxic-comment-classifier / README.md

Youssef-El-SaYed

metadata

license: mit
datasets:
  - thesofakillers/jigsaw-toxic-comment-classification-challenge
language:
  - en
metrics:
  - accuracy
  - f1
tags:
  - text-classification
  - toxic_comment
  - nlp
  - transformers
  - distilbert
pipeline_tag: text-classification

Toxic Comment Classifier (DistilBERT, uncased)

This model is a fine-tuned uncased DistilBERT model for toxic comment classification.
It classifies comments as either toxic or non-toxic.

Training

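The model was trained with the Hugging Face Trainer on the labeled Jigsaw Toxic Comment Classification Challenge dataset (thesofakillers/jigsaw-toxic-comment-classification-challenge).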
Evaluation metrics:

Accuracy: ~97%
F1 score: ~83%
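
Accuracy and F1 can differ noticeably when toxic comments are much rarer than non-toxic ones, because accuracy rewards every correct prediction while F1 only looks at the positive class. The sketch below shows how the two numbers are computed for binary labels. It assumes F1 is reported for the toxic class (label 1); the helper `binary_metrics` and the toy labels are hypothetical and are not the model's actual evaluation data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, f1) for binary labels; F1 is computed for `positive`."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy data (hypothetical): 20 comments, 3 of them toxic (label 1).
y_true = [1, 1, 1] + [0] * 17
# The classifier misses one toxic comment and raises one false alarm.
y_pred = [1, 1, 0] + [1] + [0] * 16

accuracy, f1 = binary_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, f1={f1:.2f}")  # accuracy=0.90, f1=0.67
```

With only two mistakes out of twenty, accuracy stays at 90% while F1 drops to about 67%, which is why both metrics are worth reporting for this task.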

Intended Use

Detecting toxic or harmful language in text.
Useful for moderation in forums, social media, and chat systems.
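
For moderation, a predicted label is usually combined with a confidence threshold so that borderline cases can go to a human reviewer. A minimal sketch, assuming predictions arrive as the `{"label": ..., "score": ...}` dicts that the transformers text-classification pipeline returns and that the label names are "Toxic" and "Non-Toxic" as in the Usage section. The helper `should_flag` and the 0.8 threshold are hypothetical choices, not part of this model:

```python
def should_flag(prediction, threshold=0.8):
    """Flag a comment when the top label is "Toxic" with enough confidence.

    `prediction` is one result dict in the shape the text-classification
    pipeline returns: {"label": str, "score": float}.
    """
    return prediction["label"] == "Toxic" and prediction["score"] >= threshold


# Hand-written stand-ins for pipeline output (not real model scores).
examples = [
    {"label": "Toxic", "score": 0.97},
    {"label": "Toxic", "score": 0.55},
    {"label": "Non-Toxic", "score": 0.99},
]

flags = [should_flag(p) for p in examples]
print(flags)  # [True, False, False]
```

The threshold trades missed toxic comments against false alarms, so it should be tuned on data from the target platform.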

Limitations

May not capture sarcasm or subtle toxicity.
Biases in the training dataset may affect predictions.

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
 
model_id = "Youssef-El-SaYed/toxic-comment-classifier"

# Define mapping
id2label = {0: "Non-Toxic", 1: "Toxic"}
label2id = {"Non-Toxic": 0, "Toxic": 1}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    id2label=id2label,
    label2id=label2id
)

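# Build an inference pipeline from the loaded model and tokenizer
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Each call returns a list with one dict holding the predicted label and its score
print(nlp("You are so stupid and annoying!"))
print(nlp("I really like your work, keep it up!"))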