---
library_name: transformers
pipeline_tag: text-classification
tags:
- hate-speech
- arabic
- classification
- bert
- social-media
- moderation
language:
- ar
license: mit
datasets:
- IbrahimAmin/egyptian-arabic-hate-speech
metrics:
- accuracy
- f1
widget:
- text: هذا نص عربي للاختبار
base_model:
- CAMeL-Lab/bert-base-arabic-camelbert-da-sentiment
---

# Model Card for hossam87/bert-base-arabic-hate-speech

A fine-tuned BERT model that classifies Arabic text into one of five categories: Neutral, Offensive, Sexism, Religious Discrimination, or Racism.

---

## Model Details

### Model Description

This model is based on `bert-base-multilingual-cased` and was fine-tuned on an Arabic social media dataset for hate speech detection. It classifies Arabic text into one of five categories: Neutral, Offensive, Sexism, Religious Discrimination, or Racism. Intended uses include content moderation, analytics, and academic research.

- **Developed by:** [hossam87](https://huggingface.co/hossam87)
- **Model type:** Sequence classification (BERT)
- **Language(s):** Arabic (ar)
- **License:** MIT
- **Finetuned from model:** [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased)

### Model Sources

- **Repository:** [https://huggingface.co/hossam87/bert-base-arabic-hate-speech](https://huggingface.co/hossam87/bert-base-arabic-hate-speech)
- **Demo:** [https://huggingface.co/spaces/hossam87/arabic-hate-speech-detector](https://huggingface.co/spaces/hossam87/arabic-hate-speech-detector)

## Training Details

### Training Data

The model was fine-tuned on a labeled dataset of Arabic social media posts ([IbrahimAmin/egyptian-arabic-hate-speech](https://huggingface.co/datasets/IbrahimAmin/egyptian-arabic-hate-speech)), manually annotated for the five target categories.

### Training Procedure

- **Precision:** Mixed precision (`fp16`)
- **Epochs:** 4 (best checkpoint at epoch 3)
- **Batch size:** 32
- **Learning rate:** 3e-5
- **Optimizer:** AdamW
- **Hardware:** 2 × NVIDIA T4 GPUs (Kaggle)

---

## Evaluation

### Metrics

| Metric   | Score |
|----------|:-----:|
| Accuracy | 0.944 |
| Macro F1 | 0.946 |
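
Macro F1 is the unweighted mean of the per-class F1 scores, so each of the five labels counts equally no matter how often it occurs. Accuracy, by contrast, can look high even when a rare class is mostly missed. The sketch below computes both metrics in plain Python. The labels are hypothetical toy values for illustration only, not data from this model's evaluation. For real use, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same macro average over the labels that appear in either list.

```python
from collections import Counter


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that exactly match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # the true class t was missed
    # Per-class F1 = 2TP / (2TP + FP + FN); the denominator is > 0 for every seen label
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(scores) / len(scores)


# Hypothetical toy labels, for illustration only
y_true = ["Neutral", "Neutral", "Neutral", "Offensive", "Racism"]
y_pred = ["Neutral", "Neutral", "Neutral", "Offensive", "Neutral"]
print(accuracy(y_true, y_pred))  # 0.8
print(macro_f1(y_true, y_pred))  # ≈ 0.619
```

In this example the single missed Racism post costs only one point of accuracy, but it contributes an F1 of 0 for that class. That drags the macro average well below the accuracy.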

## Uses

### Direct Use

- Content moderation for Arabic social media, forums, and chats.
- Analytics and research into hate speech patterns in Arabic.
- Educational and academic projects.

### Out-of-Scope Use

- Automated moderation without human oversight in sensitive or legal contexts.
- Use on languages other than Arabic.
- General text classification tasks outside hate speech detection.
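
Non-Arabic input is out of scope, so a deployment may want to filter it out before it reaches the classifier. The following is a minimal heuristic sketch that treats letters in the main Arabic Unicode block (U+0600 to U+06FF) as a proxy for Arabic text. The function names and the 0.5 threshold are hypothetical choices and are not part of this model. Because the check looks only at script, it cannot tell Arabic apart from other languages written in Arabic script, such as Persian or Urdu.

```python
def arabic_letter_ratio(text: str) -> float:
    """Share of alphabetic characters that fall in the Arabic Unicode block (U+0600-U+06FF)."""
    letters = [ch for ch in text if ch.isalpha()]  # skips spaces, digits, punctuation, diacritics
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return arabic / len(letters)


def is_in_scope(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical gate: pass only text that is mostly Arabic script to the classifier."""
    return arabic_letter_ratio(text) >= threshold


print(is_in_scope("هذا نص عربي للاختبار"))  # True
print(is_in_scope("This is an English sentence"))  # False
```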

## Bias, Risks, and Limitations

The model may misclassify:

- Sarcasm, slang, or context-dependent expressions.
- Formal written Arabic, because the model was trained on social media content.
- Domain-specific or emerging forms of hate speech that are not represented in the training data.

### Recommendations

Always keep a human in the loop for sensitive moderation tasks. Use the model responsibly and be transparent with users about automated decisions.
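
One way to keep a human in the loop is to let the system act automatically only on confident Neutral predictions and send everything else to a moderator. The sketch below assumes pipeline-style output, meaning a dict with `label` and `score` keys as returned by the `transformers` text-classification pipeline. It also assumes the label strings match the five category names used in this card, so check `model.config.id2label` for the exact strings. The `route` function, the queue names, and the 0.9 threshold are hypothetical.

```python
from typing import TypedDict


class Prediction(TypedDict):
    label: str
    score: float


# Assumption: label strings mirror the category names used in this model card
HARMFUL_LABELS = {"Offensive", "Sexism", "Religious Discrimination", "Racism"}


def route(prediction: Prediction, auto_threshold: float = 0.9) -> str:
    """Decide what to do with one classifier output.

    Returns "publish" only for confident Neutral predictions. Every other case
    goes to a human moderator, so the model never removes content on its own.
    """
    if prediction["label"] == "Neutral" and prediction["score"] >= auto_threshold:
        return "publish"
    if prediction["label"] in HARMFUL_LABELS:
        return "human_review_priority"  # likely harmful: review first
    return "human_review"  # low-confidence Neutral or unexpected label


print(route({"label": "Neutral", "score": 0.97}))  # publish
print(route({"label": "Racism", "score": 0.99}))  # human_review_priority
print(route({"label": "Neutral", "score": 0.55}))  # human_review
```

With the `classifier` pipeline from "How to Get Started with the Model", `route(classifier(text)[0])` applies this policy to a single input.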

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the fine-tuned tokenizer and classification model from the Hugging Face Hub
model_id = "hossam87/bert-base-arabic-hate-speech"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Wrap the model and tokenizer in a text-classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "هذا نص عربي للاختبار"  # "This is an Arabic text for testing."
result = classifier(text)

# Prints a list containing one {"label": ..., "score": ...} dict for the top-scoring class
print(result)
```

## Citation

**BibTeX:**

```bibtex
@misc{hossam87_2025_arabichate,
  title        = {BERT-base Arabic Hate Speech Detector},
  author       = {Hossam87},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/hossam87/bert-base-arabic-hate-speech}},
}
```