Update README.md
README.md CHANGED

@@ -10,4 +10,65 @@ from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
identity_model = AutoModelForSequenceClassification.from_pretrained("Mridul2003/identity-hate-detector").to(device)
identity_tokenizer = AutoTokenizer.from_pretrained("Mridul2003/identity-hate-detector")
```
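
With the model and tokenizer loaded as above, a minimal inference sketch might look like this (the 0/1 label meaning is an assumption here; confirm it via `identity_model.config.id2label`):

```python
# Classify one piece of text with the fine-tuned binary model.
text = "example input text"
inputs = identity_tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)

with torch.no_grad():
    logits = identity_model(**inputs).logits

probs = torch.softmax(logits, dim=-1)[0]
pred = int(probs.argmax())  # assumed mapping: 0 = Not Identity Hate, 1 = Identity Hate
print(pred, probs.tolist())
```
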
# Offensive Language Classifier (Fine-Tuned on Custom Dataset)

This repository contains a fine-tuned version of the [`unitary/toxic-bert`](https://huggingface.co/unitary/toxic-bert) model for binary classification of offensive language (labels: `Offensive` vs `Not Offensive`). The model was fine-tuned on a custom dataset because of limitations observed in the base model's performance, particularly on `identity_hate`-related content.

---

## 🔍 Problem with Base Model (`unitary/toxic-bert`)

The original `unitary/toxic-bert` model is trained for multi-label toxicity detection with six categories:

- toxic
- severe_toxic
- obscene
- threat
- insult
- identity_hate

While it performs reasonably well on generic toxicity, **it struggles with edge cases involving identity-based hate speech**, often:

- misclassifying subtle or sarcastic identity attacks
- underestimating offensive content that relies on identity-specific slurs
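
To see this concretely, the base model's six per-category scores can be inspected directly. Here is a minimal sketch (the model is multi-label, so each category receives an independent sigmoid score rather than a softmax):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the base multi-label model and score one input text.
base_model = AutoModelForSequenceClassification.from_pretrained("unitary/toxic-bert")
base_tokenizer = AutoTokenizer.from_pretrained("unitary/toxic-bert")

inputs = base_tokenizer("example input text", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = base_model(**inputs).logits

# One independent sigmoid score per category (toxic, severe_toxic, ...).
scores = torch.sigmoid(logits)[0]
for idx, score in enumerate(scores):
    print(base_model.config.id2label[idx], round(float(score), 3))
```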

---

## ✅ Why Fine-Tune?

We fine-tuned the model on a custom annotated dataset with two clear labels:

- `0`: Not Identity Hate
- `1`: Identity Hate

The new model simplifies the task into a **binary classification problem**, allowing more focused training for real-world moderation scenarios.
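
Because the head now has only two classes, one optional pattern (an assumption on our part, not necessarily what was used here) is to bake the label mapping into the config at load time so predictions are self-describing:

```python
from transformers import AutoModelForSequenceClassification

# Attach a human-readable label mapping to the loaded config
# (the 0/1 meaning follows the dataset labels described above).
model = AutoModelForSequenceClassification.from_pretrained(
    "Mridul2003/identity-hate-detector",
    id2label={0: "Not Identity Hate", 1: "Identity Hate"},
    label2id={"Not Identity Hate": 0, "Identity Hate": 1},
)
```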

---

## 📊 Dataset Overview

- Total examples: ~4,000+
- Balanced between offensive and non-offensive labels
- Contains a high proportion of `identity_hate`, `obscene`, and `insult` content, along with more nuanced samples

---

## 🧠 Model Details

- **Base model**: [`unitary/toxic-bert`](https://huggingface.co/unitary/toxic-bert)
- **Fine-tuned using**: Hugging Face 🤗 `Trainer` API
- **Loss function**: CrossEntropyLoss (via `num_labels=2`)
- **Batch size**: 8
- **Epochs**: 3
- **Learning rate**: 2e-5
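
A minimal fine-tuning sketch with these hyperparameters (the toy in-memory dataset is a stand-in; the actual custom dataset and training script are not shown in this repository):

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# The base checkpoint has six output labels, so the classification head
# is re-initialized for the new two-label task; CrossEntropyLoss is applied
# automatically when integer labels are provided with num_labels=2.
model = AutoModelForSequenceClassification.from_pretrained(
    "unitary/toxic-bert", num_labels=2, ignore_mismatched_sizes=True
)
tokenizer = AutoTokenizer.from_pretrained("unitary/toxic-bert")

# Toy stand-in data: two texts with binary labels.
raw = Dataset.from_dict({"text": ["example a", "example b"], "label": [0, 1]})
dataset = raw.map(lambda b: tokenizer(b["text"], truncation=True, padding=True), batched=True)

args = TrainingArguments(
    output_dir="identity-hate-detector",
    per_device_train_batch_size=8,  # batch size 8
    num_train_epochs=3,             # 3 epochs
    learning_rate=2e-5,             # learning rate 2e-5
)

trainer = Trainer(model=model, args=args, train_dataset=dataset)
trainer.train()
```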

---

## 🔬 Performance (Binary Classification)

| Metric | Value |
|----------|---------|
| Accuracy | ~92% |
| Precision / Recall | Balanced |
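
A sketch of how such metrics can be computed on a held-out set (the evaluation texts and gold labels below are hypothetical placeholders), reusing `identity_model`, `identity_tokenizer`, and `device` from the loading snippet at the top:

```python
import torch
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

texts = ["example a", "example b"]  # hypothetical held-out texts
gold = [0, 1]                       # hypothetical gold labels

inputs = identity_tokenizer(texts, return_tensors="pt", truncation=True, padding=True).to(device)
with torch.no_grad():
    preds = identity_model(**inputs).logits.argmax(dim=-1).cpu().tolist()

precision, recall, f1, _ = precision_recall_fscore_support(gold, preds, average="binary")
print("accuracy:", accuracy_score(gold, preds))
print("precision:", precision, "recall:", recall, "f1:", f1)
```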

---