---
library_name: transformers
pipeline_tag: text-classification
tags:
- hate-speech
- arabic
- classification
- bert
- social-media
- moderation
language:
- ar
license: mit
datasets:
- IbrahimAmin/egyptian-arabic-hate-speech
metrics:
- accuracy
- f1
widget:
- text: هذا نص عربي للاختبار
base_model:
- CAMeL-Lab/bert-base-arabic-camelbert-da-sentiment
---
# Model Card for hossam87/bert-base-arabic-hate-speech
A fine-tuned BERT model that classifies Arabic text into one of five categories: Neutral, Offensive, Sexism, Religious Discrimination, or Racism.

---
## Model Details
### Model Description
This model is based on `bert-base-multilingual-cased` and fine-tuned on an Arabic social media dataset for hate speech detection.
It classifies Arabic text into one of five categories: Neutral, Offensive, Sexism, Religious Discrimination, or Racism.
Intended uses include moderation, analytics, and academic research.
- **Developed by:** [hossam87](https://huggingface.co/hossam87)
- **Model type:** Sequence classification (BERT)
- **Language(s):** Arabic (ar)
- **License:** MIT
- **Finetuned from model:** [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased)
### Model Sources
- **Repository:** [https://huggingface.co/hossam87/bert-base-arabic-hate-speech](https://huggingface.co/hossam87/bert-base-arabic-hate-speech)
- **Demo:** [https://huggingface.co/spaces/hossam87/arabic-hate-speech-detector](https://huggingface.co/spaces/hossam87/arabic-hate-speech-detector)
## Training Details
### Training Data
The model was fine-tuned on a labeled dataset of Arabic social media posts, manually annotated for the five target categories.
### Training Procedure
- **Precision:** Mixed precision (`fp16`)
- **Epochs:** 4 (best model at epoch 3)
- **Batch size:** 32
- **Learning rate:** 3e-5
- **Optimizer:** AdamW
- **Hardware:** 2 × NVIDIA T4 GPUs (Kaggle)
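
As a reproducibility aid, the hyperparameters above can be collected as keyword arguments for the Hugging Face `transformers.TrainingArguments` class. This is a minimal sketch, not the author's training script. The helper name `training_kwargs` and the output directory are hypothetical. It also assumes that 32 is the per-device batch size and that the best epoch was restored with `load_best_model_at_end`, and the card states neither. On a GPU machine the result could be passed as `TrainingArguments(**training_kwargs())`. Note that `fp16=True` generally requires a GPU, and older `transformers` releases name `eval_strategy` as `evaluation_strategy`.

```python
# Hypothetical helper: gathers the hyperparameters listed in this card as
# keyword arguments for transformers.TrainingArguments. Nothing is imported
# from transformers here, so the sketch runs without it installed.


def training_kwargs(output_dir: str = "bert-arabic-hate-speech") -> dict:
    """Return TrainingArguments-style kwargs mirroring the card's settings."""
    return {
        "output_dir": output_dir,   # hypothetical directory name
        "num_train_epochs": 4,      # card: 4 epochs
        # Assumption: 32 is the per-GPU batch size; the card does not say
        # whether it is per device or the total across both T4s.
        "per_device_train_batch_size": 32,
        "learning_rate": 3e-5,      # card: 3e-5
        "optim": "adamw_torch",     # card: AdamW
        "fp16": True,               # card: mixed precision (fp16)
        # Evaluate and checkpoint every epoch so the best epoch (3, per the
        # card) can be restored at the end. The selection metric is not
        # stated, so the Trainer default (lowest eval loss) applies unless
        # metric_for_best_model is set explicitly.
        "eval_strategy": "epoch",   # "evaluation_strategy" in older releases
        "save_strategy": "epoch",
        "load_best_model_at_end": True,
    }


if __name__ == "__main__":
    for key, value in training_kwargs().items():
        print(f"{key} = {value}")
```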
---
## Evaluation
### Metrics
| Metric | Score |
|----------|:------:|
| Accuracy | 0.944 |
| F1 Macro | 0.946 |
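
The two scores in the table can be computed from a list of gold labels and predictions. The sketch below is plain Python written for illustration, and the function names are hypothetical. It follows the standard definition that scikit-learn's `f1_score(..., average="macro")` also uses: per-class F1 = 2TP / (2TP + FP + FN), averaged with equal weight per class over the labels seen in either list. Rare classes such as Racism therefore count as much as Neutral, unlike in plain accuracy. The card does not say which split produced the reported numbers.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 scores."""
    if labels is None:
        # Same default as scikit-learn: every label seen in gold or predictions.
        labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gold = ["Neutral", "Neutral", "Offensive", "Racism"]
    pred = ["Neutral", "Offensive", "Offensive", "Racism"]
    print(accuracy(gold, pred))  # 0.75
    print(macro_f1(gold, pred))  # 7/9, about 0.778
```

The toy lists are made up. They show how one error on a frequent class lowers macro F1 (about 0.778) more than accuracy (0.75) would suggest for the classes involved.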
## Uses
### Direct Use
- Content moderation for Arabic social media, forums, and chats.
- Analytics and research into hate speech patterns in Arabic.
- Educational and academic projects.
### Out-of-Scope Use
- Automated moderation without human oversight in sensitive or legal contexts.
- Use on languages other than Arabic.
- General text classification tasks outside hate speech detection.
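
One way to respect the language restriction above is a cheap script check before calling the model. The sketch below is a hypothetical pre-filter and is not part of the model or of `transformers`. It measures the share of letters in the basic Arabic Unicode block (U+0600 to U+06FF) and treats text as in scope above an assumed 0.5 threshold. It ignores the supplementary Arabic blocks and will reject Arabizi (Arabic written in Latin script), which the card does not address.

```python
def arabic_ratio(text: str) -> float:
    """Share of alphabetic characters that fall in the basic Arabic block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0  # no letters at all (empty, digits, punctuation only)
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06ff")
    return arabic / len(letters)


def in_scope(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical gate: only send mostly-Arabic text to the classifier."""
    return arabic_ratio(text) >= threshold


if __name__ == "__main__":
    for sample in ["هذا نص عربي للاختبار", "This is English text"]:
        print(sample, in_scope(sample))
```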
## Bias, Risks, and Limitations
The model may misclassify:
- Sarcasm, slang, or context-dependent expressions.
- Formal written Arabic, since the model was trained on social media content.
- Domain-specific or emerging hate speech not represented in the training data.
### Recommendations
Always keep a human in the loop for sensitive moderation tasks. Use the model responsibly, and be transparent with users about where automated classification is applied.
## How to Get Started with the Model
```python
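from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_id = "hossam87/bert-base-arabic-hate-speech"

# Download the fine-tuned tokenizer and classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Wrap both in a text-classification pipeline that returns the top label
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "هذا نص عربي للاختبار"  # "This is an Arabic text for testing."
result = classifier(text)  # list with one {"label": ..., "score": ...} dict
print(result)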
```
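
The recommendation to keep a human in the loop can be applied directly to the pipeline output, which for each input is a dict with a `label` string and a `score` float. The rule below is a hypothetical policy, not something the model provides. The 0.9 threshold and the outcome names are assumptions, and `NEUTRAL_LABEL` must match the label strings in the checkpoint's `model.config.id2label`, which this card does not list.

```python
# Hypothetical moderation policy layered on the text-classification pipeline.
# Each pipeline prediction looks like {"label": "<class name>", "score": 0.97}.

NEUTRAL_LABEL = "Neutral"  # assumption: verify against model.config.id2label


def route(prediction: dict, threshold: float = 0.9) -> str:
    """Map one prediction to 'allow', 'flag', or 'human_review'."""
    if prediction["score"] < threshold:
        # Low confidence: a person decides, whatever the predicted class.
        return "human_review"
    if prediction["label"] == NEUTRAL_LABEL:
        return "allow"
    # Confident non-neutral prediction (Offensive, Sexism, ...): queue it for
    # a moderator with the predicted category attached; never auto-delete.
    return "flag"


if __name__ == "__main__":
    # Made-up predictions for illustration, not real model outputs.
    samples = [
        {"label": "Neutral", "score": 0.97},
        {"label": "Racism", "score": 0.95},
        {"label": "Offensive", "score": 0.55},
    ]
    for p in samples:
        print(p["label"], p["score"], "->", route(p))
```

For a batch, `[route(p) for p in classifier(texts)]` (with `texts` a list of strings) yields one decision per input. Both `flag` and `human_review` still end in a moderator queue, so nothing is removed without a person's review.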
## Citation
```bibtex
@misc{hossam87_2025_arabichate,
title = {BERT-base Arabic Hate Speech Detector},
author = {Hossam87},
year = {2025},
howpublished = {\url{https://huggingface.co/hossam87/bert-base-arabic-hate-speech}},
}
```