---
license: mit
datasets:
- thesofakillers/jigsaw-toxic-comment-classification-challenge
language:
- en
metrics:
- accuracy
- f1
tags:
- text-classification
- toxic_comment
- nlp
- transformers
- distilbert
pipeline_tag: text-classification
---
# Toxic Comment Classifier (DistilBERT, uncased)

This model is a fine-tuned **DistilBERT (uncased)** model for **toxic comment classification**.  
It classifies comments as either **toxic** or **non-toxic**.


## Training

The model was fine-tuned with the Hugging Face `Trainer` API on the Jigsaw Toxic Comment Classification Challenge dataset (listed in the metadata above); a sketch of the setup follows the metrics below.  
Evaluation metrics:

- **Accuracy:** ~97%  
- **F1 score:** ~83%  
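
A minimal fine-tuning sketch along these lines is shown below. It is illustrative, not the exact training recipe: the base checkpoint id, the dataset column names, the collapse into a single binary label, the hyperparameters, and the `output_dir` are all assumptions.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Dataset from the model card metadata; collapsing Jigsaw's fine-grained
# toxicity columns into one binary label is an assumption here.
dataset = load_dataset("thesofakillers/jigsaw-toxic-comment-classification-challenge")

# Base checkpoint assumed from the model card title.
base_model = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(base_model)

def preprocess(batch):
    # "comment_text" and "toxic" are the assumed Jigsaw column names.
    encoded = tokenizer(batch["comment_text"], truncation=True)
    encoded["labels"] = batch["toxic"]
    return encoded

tokenized = dataset["train"].map(preprocess, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=2,
    id2label={0: "Non-Toxic", 1: "Toxic"},
    label2id={"Non-Toxic": 0, "Toxic": 1},
)

args = TrainingArguments(
    output_dir="toxic-comment-classifier",  # assumed output path
    num_train_epochs=3,                     # assumed hyperparameters
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```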

## Intended Use

- Detecting toxic or harmful language in text.  
- Useful for moderation in forums, social media, and chat systems.  

## Limitations

- May not capture sarcasm or subtle toxicity.  
- Biases in the training dataset may affect predictions.  

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_id = "Youssef-El-SaYed/toxic-comment-classifier"

# Map class indices to human-readable labels
id2label = {0: "Non-Toxic", 1: "Toxic"}
label2id = {"Non-Toxic": 0, "Toxic": 1}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    id2label=id2label,
    label2id=label2id,
)

# Build an inference pipeline from the fine-tuned model
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)

print(nlp("You are so stupid and annoying!"))
print(nlp("I really like your work, keep it up!"))
```