---
license: mit
datasets:
- thesofakillers/jigsaw-toxic-comment-classification-challenge
language:
- en
metrics:
- accuracy
- f1
tags:
- text-classification
- toxic_comment
- nlp
- transformers
- distilbert
pipeline_tag: text-classification
---
# Toxic Comment Classifier (distilbert-base-uncased)
This model is a fine-tuned **distilbert-base-uncased** model for **toxic comment classification**.
It classifies comments as either **toxic** or **non-toxic**.
## Training
The model was fine-tuned using the Hugging Face `Trainer` API on the labeled Jigsaw Toxic Comment Classification Challenge dataset; a minimal sketch of the setup appears after the metrics below.
Evaluation metrics:
- **Accuracy:** ~97%
- **F1 score:** ~83%
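For reference, here is a minimal fine-tuning sketch. The column names (`comment_text`, `toxic`), split name, and hyperparameters are assumptions based on the original Jigsaw schema, not the exact training script:
```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

dataset = load_dataset("thesofakillers/jigsaw-toxic-comment-classification-challenge")

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    id2label={0: "Non-Toxic", 1: "Toxic"},
    label2id={"Non-Toxic": 0, "Toxic": 1},
)

def preprocess(batch):
    # Column names are assumptions from the Jigsaw schema
    encoded = tokenizer(batch["comment_text"], truncation=True, max_length=512)
    encoded["labels"] = batch["toxic"]  # binary label: 1 = toxic, 0 = non-toxic
    return encoded

tokenized = dataset.map(preprocess, batched=True)

training_args = TrainingArguments(
    output_dir="toxic-comment-classifier",
    per_device_train_batch_size=16,  # illustrative hyperparameters
    num_train_epochs=2,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    tokenizer=tokenizer,
)
trainer.train()
```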
## Intended Use
- Detecting toxic or harmful language in text.
- Useful for moderation in forums, social media, and chat systems (see the thresholding sketch below).
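For moderation, a common pattern is to act only when the model is confident. A minimal sketch, reusing the label mapping from the Usage section below; the 0.9 threshold is an illustrative placeholder, not a tuned operating point:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_id = "Youssef-El-SaYed/toxic-comment-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    id2label={0: "Non-Toxic", 1: "Toxic"},
    label2id={"Non-Toxic": 0, "Toxic": 1},
)
clf = pipeline("text-classification", model=model, tokenizer=tokenizer)

def should_flag(comment: str, threshold: float = 0.9) -> bool:
    # Flag only confidently toxic comments; 0.9 is an untuned example value.
    result = clf(comment)[0]  # {"label": ..., "score": ...}
    return result["label"] == "Toxic" and result["score"] >= threshold
```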
## Limitations
- May not capture sarcasm or subtle toxicity.
- Biases in the training dataset may affect predictions.
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_id = "Youssef-El-SaYed/toxic-comment-classifier"

# Map numeric class ids to human-readable labels
id2label = {0: "Non-Toxic", 1: "Toxic"}
label2id = {"Non-Toxic": 0, "Toxic": 1}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    id2label=id2label,
    label2id=label2id,
)

# Wrap the model and tokenizer in a text-classification pipeline
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)

print(nlp("You are so stupid and annoying!"))       # expected: Toxic
print(nlp("I really like your work, keep it up!"))  # expected: Non-Toxic
```
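Each call returns a list with one dictionary per input, containing a `label` and a `score` (the model's confidence for that label).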