Mridul2003 committed · commit c543a62 · verified · 1 parent: 0282f29

Update README.md

Files changed (1): README.md (+62 −1)

README.md CHANGED
@@ -10,4 +10,65 @@

```python
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
identity_model = AutoModelForSequenceClassification.from_pretrained("Mridul2003/identity-hate-detector").to(device)
identity_tokenizer = AutoTokenizer.from_pretrained("Mridul2003/identity-hate-detector")
```

# Offensive Language Classifier (Fine-Tuned on a Custom Dataset)

This repository contains a fine-tuned version of the [`unitary/toxic-bert`](https://huggingface.co/unitary/toxic-bert) model for binary classification of offensive language (labels: `Offensive` vs. `Not Offensive`). The model was fine-tuned on a custom dataset to address limitations observed in the base model's performance, particularly on `identity_hate`-related content.

---
## 🔍 Problem with the Base Model (`unitary/toxic-bert`)

The original `unitary/toxic-bert` model is trained for multi-label toxicity detection across six categories:
- toxic
- severe toxic
- obscene
- threat
- insult
- identity_hate

While it performs reasonably well on generic toxicity, **it struggles with edge cases involving identity-based hate speech**, often:
- misclassifying subtle or sarcastic identity attacks
- underestimating offensive content that contains identity-specific slurs

---
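This failure mode is partly structural: a multi-label model emits an independent per-category score, so a subtle identity attack can fall just below the decision threshold on every one of the six labels and be passed as clean. A minimal sketch of that effect, using a plain logistic function and made-up logits (not real model outputs):

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical per-label logits for a subtle, sarcastic identity attack
# (illustrative values only, not produced by unitary/toxic-bert).
labels = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
logits = [-0.4, -3.0, -2.5, -3.5, -0.9, -0.2]

scores = {label: sigmoid(z) for label, z in zip(labels, logits)}

# With a conventional 0.5 threshold, every label falls below the bar,
# so the comment is flagged as clean despite two borderline scores.
flagged = [label for label, s in scores.items() if s >= 0.5]
print(flagged)  # → []
```

A dedicated binary head trained on exactly these borderline cases avoids spreading the evidence across six independent thresholds.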
## ✅ Why Fine-Tune?

We fine-tuned the model on a custom annotated dataset with two clear labels:
- `0`: Not Identity Hate
- `1`: Identity Hate

The new model recasts the task as a **binary classification problem**, allowing more focused training for real-world moderation scenarios.

---
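With two mutually exclusive labels, inference reduces to a softmax over the two class logits followed by an argmax. A minimal post-processing sketch (the logits below are illustrative, not real model outputs; the label mapping follows the `0`/`1` scheme above):

```python
import math

ID2LABEL = {0: "Not Identity Hate", 1: "Identity Hate"}

def classify(logits: list[float]) -> tuple[str, float]:
    """Softmax over the two class logits, then pick the argmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = probs.index(max(probs))
    return ID2LABEL[idx], probs[idx]

# Example: logits leaning toward the positive class (illustrative values).
label, confidence = classify([-1.2, 2.3])
print(label, round(confidence, 3))  # → Identity Hate 0.971
```

In practice the logits would come from `identity_model(**identity_tokenizer(text, return_tensors="pt").to(device)).logits`.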
## 📊 Dataset Overview

- Total examples: ~4,000
- Balanced between offensive and non-offensive labels
- High proportion of `identity_hate`, `obscene`, and `insult` content, along with more nuanced samples

---
## 🧠 Model Details

- **Base model**: [`unitary/toxic-bert`](https://huggingface.co/unitary/toxic-bert)
- **Fine-tuned using**: Hugging Face 🤗 `Trainer` API
- **Loss function**: CrossEntropyLoss (via `num_labels=2`)
- **Batch size**: 8
- **Epochs**: 3
- **Learning rate**: 2e-5

---
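These hyperparameters also pin down the length of training: with roughly 4,000 examples (per the dataset overview above), batch size 8, and 3 epochs, the run makes about 1,500 optimizer steps. A quick back-of-the-envelope check:

```python
import math

# Figures from the sections above; the dataset size is approximate (~4,000).
num_examples = 4000
batch_size = 8
epochs = 3

steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # → 500 1500
```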
## 🔬 Performance (Binary Classification)

| Metric             | Value    |
|--------------------|----------|
| Accuracy           | ~92%     |
| Precision / Recall | Balanced |

---
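For reference, all three reported quantities follow from standard confusion-matrix counts. A small helper, evaluated on made-up counts chosen only to illustrate the "balanced" case (these are not the model's actual evaluation numbers):

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# Illustrative counts only: symmetric errors give balanced precision/recall.
metrics = binary_metrics(tp=460, fp=40, fn=40, tn=460)
print(metrics)  # accuracy 0.92, precision 0.92, recall 0.92
```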