---
license: mit
datasets:
  - thesofakillers/jigsaw-toxic-comment-classification-challenge
language:
  - en
metrics:
  - accuracy
  - f1
tags:
  - text-classification
  - toxic_comment
  - nlp
  - transformers
  - distilbert
pipeline_tag: text-classification
---

# Toxic Comment Classifier (DistilBERT uncased)

This model is a fine-tuned **DistilBERT** (uncased) model for **toxic comment classification**. It classifies comments as either **toxic** or **non-toxic**.

## Training

The model was fine-tuned with the Hugging Face `Trainer` API on a labeled toxic comment dataset (a hedged fine-tuning sketch is included at the end of this card).

Evaluation metrics:

- **Accuracy:** ~97%
- **F1 score:** ~83%

## Intended Use

- Detecting toxic or harmful language in text.
- Moderation in forums, social media, and chat systems.

## Limitations

- May not capture sarcasm or subtle toxicity.
- Biases in the training dataset may affect predictions.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_id = "Youssef-El-SaYed/toxic-comment-classifier"

# Map numeric class ids to human-readable labels
id2label = {0: "Non-Toxic", 1: "Toxic"}
label2id = {"Non-Toxic": 0, "Toxic": 1}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    id2label=id2label,
    label2id=label2id,
)

# Build a text-classification pipeline around the fine-tuned model
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)

print(nlp("You are so stupid and annoying!"))
print(nlp("I really like your work, keep it up!"))
```
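The pipeline returns a list with one dict per input, each containing a `label` and a confidence `score`. The output looks roughly like the following (the scores shown are illustrative, not actual model outputs):

```text
[{'label': 'Toxic', 'score': 0.99}]
[{'label': 'Non-Toxic', 'score': 0.98}]
```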
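## Fine-tuning Sketch

Below is a minimal sketch of how a model like this can be fine-tuned with `Trainer`, with accuracy and F1 reported during evaluation. The base checkpoint, column names, split names, and hyperparameters are assumptions for illustration, not a record of the actual training run.

```python
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base_model = "distilbert-base-uncased"  # assumed base checkpoint

# Assumed column names ("comment_text", binary "toxic"); adjust to
# match your copy of the Jigsaw dataset.
dataset = load_dataset(
    "thesofakillers/jigsaw-toxic-comment-classification-challenge"
)

tokenizer = AutoTokenizer.from_pretrained(base_model)

def tokenize(batch):
    return tokenizer(batch["comment_text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)
dataset = dataset.rename_column("toxic", "labels")

model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=2,
    id2label={0: "Non-Toxic", 1: "Toxic"},
    label2id={"Non-Toxic": 0, "Toxic": 1},
)

def compute_metrics(eval_pred):
    # Report the two metrics quoted above: accuracy and binary F1
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds),
    }

args = TrainingArguments(
    output_dir="toxic-comment-classifier",
    num_train_epochs=3,              # illustrative hyperparameters
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    eval_strategy="epoch",           # `evaluation_strategy` on older transformers
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],    # assumes a labeled held-out split
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,             # enables dynamic padding via the default collator
)

trainer.train()
```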