---
library_name: transformers
pipeline_tag: text-classification
tags:
- hate-speech
- arabic
- classification
- bert
- social-media
- moderation
language:
- ar
license: mit
datasets:
- IbrahimAmin/egyptian-arabic-hate-speech
metrics:
- accuracy
- f1
widget:
- text: هذا نص عربي للاختبار
base_model:
- CAMeL-Lab/bert-base-arabic-camelbert-da-sentiment
---

# Model Card for hossam87/bert-base-arabic-hate-speech

A fine-tuned BERT model that classifies Arabic text into one of five categories: Neutral, Offensive, Sexism, Religious Discrimination, or Racism.

---

## Model Details

### Model Description

This model is based on `bert-base-multilingual-cased` and fine-tuned on an Arabic social media dataset for hate speech detection.  
It classifies Arabic text into one of five categories: Neutral, Offensive, Sexism, Religious Discrimination, or Racism.  
Intended uses include moderation, analytics, and academic research.

- **Developed by:** [hossam87](https://huggingface.co/hossam87)
- **Model type:** Sequence classification (BERT)
- **Language(s):** Arabic (ar)
- **License:** MIT
- **Finetuned from model:** [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased)

### Model Sources

- **Repository:** [https://huggingface.co/hossam87/bert-base-arabic-hate-speech](https://huggingface.co/hossam87/bert-base-arabic-hate-speech)
- **Demo:** [https://huggingface.co/spaces/hossam87/arabic-hate-speech-detector](https://huggingface.co/spaces/hossam87/arabic-hate-speech-detector)

## Training Details

### Training Data

The model was fine-tuned on a labeled dataset of Arabic social media posts (listed in the card metadata as `IbrahimAmin/egyptian-arabic-hate-speech`), manually annotated for the five target categories.

### Training Procedure

- **Precision:** Mixed precision (`fp16`)
- **Epochs:** 4 (best model at epoch 3)
- **Batch size:** 32
- **Learning rate:** 3e-5
- **Optimizer:** AdamW
- **Hardware:** 2 x NVIDIA T4 GPUs (Kaggle)
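
As a rough sketch, the hyperparameters above can be written as keyword arguments for `transformers.TrainingArguments`. The card does not say whether the batch size of 32 is per device or in total, so treating it as per-device is an assumption, as is the example dataset size; the original training script may differ.

```python
# Hyperparameters from the list above, keyed by the matching
# transformers.TrainingArguments parameter names.
training_kwargs = {
    "num_train_epochs": 4,
    "per_device_train_batch_size": 32,  # assumption: 32 is per device, not the total
    "learning_rate": 3e-5,
    "fp16": True,            # mixed precision; requires a CUDA GPU
    "optim": "adamw_torch",  # AdamW
}


def optimizer_steps(num_examples: int, num_devices: int = 2) -> int:
    """Total optimizer steps over all epochs, assuming no gradient accumulation."""
    global_batch = training_kwargs["per_device_train_batch_size"] * num_devices
    steps_per_epoch = -(-num_examples // global_batch)  # ceiling division
    return steps_per_epoch * training_kwargs["num_train_epochs"]


# Hypothetical dataset size, for illustration only.
print(optimizer_steps(num_examples=10_000))
```

On a GPU machine, these could be passed as `TrainingArguments(output_dir="out", **training_kwargs)`, where `output_dir="out"` is a placeholder path.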

---

## Evaluation

### Metrics

| Metric   | Score  |
|----------|:------:|
| Accuracy | 0.944  |
| F1 Macro | 0.946  |
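
For reference, the two metrics can be computed as in the minimal sketch below. It uses plain Python and a toy set of labels invented for illustration; it is not the evaluation script or data behind the scores in the table.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, for illustration only.
y_true = ["Neutral", "Neutral", "Offensive", "Racism"]
y_pred = ["Neutral", "Offensive", "Offensive", "Racism"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```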


## Uses

### Direct Use

- Content moderation for Arabic social media, forums, and chats.
- Analytics and research into hate speech patterns in Arabic.
- Educational and academic projects.

### Out-of-Scope Use

- Automated moderation without human oversight in sensitive or legal contexts.
- Use on languages other than Arabic.
- General text classification tasks outside hate speech detection.
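
One way to guard against non-Arabic input is a simple script check before calling the model. The sketch below is a heuristic, not part of the model: the helper names and the 0.5 threshold are hypothetical, and it only looks at the basic Arabic Unicode block (U+0600 to U+06FF), so it cannot tell Arabic apart from other languages written in the same script.

```python
def arabic_ratio(text: str) -> float:
    """Share of alphabetic characters that fall in the basic Arabic Unicode block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return arabic / len(letters)


def looks_arabic(text: str, min_ratio: float = 0.5) -> bool:
    """Hypothetical pre-filter: True when at least min_ratio of the letters are Arabic."""
    return arabic_ratio(text) >= min_ratio


print(looks_arabic("هذا نص عربي للاختبار"))
print(looks_arabic("This is English text"))
```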

## Bias, Risks, and Limitations

The model may misclassify:
- Sarcasm, slang, or context-dependent expressions.
- Formal written Arabic, since the model was trained on social media content.
- Domain-specific or emerging hate speech not represented in the training data.

### Recommendations

Always keep a human in the loop for sensitive moderation tasks. Use the model responsibly, and be transparent with users about where decisions are automated.
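
A minimal sketch of one human-in-the-loop setup follows. It assumes predictions arrive as `{"label": ..., "score": ...}` dictionaries, the format returned by the `transformers` text-classification pipeline. The `route` function, the 0.9 threshold, and the label string `"Neutral"` are assumptions for illustration; check the model's `config.id2label` for the actual label names.

```python
def route(prediction: dict, review_threshold: float = 0.9) -> str:
    """Decide what happens to a post given one classifier prediction.

    Returns "review" for low-confidence predictions, "allow" for confident
    neutral ones, and "flag" for confident non-neutral ones. Flagged posts
    are meant to be queued for a moderator, not removed automatically.
    """
    label, score = prediction["label"], prediction["score"]
    if score < review_threshold:
        return "review"
    if label == "Neutral":  # assumed label string
        return "allow"
    return "flag"


# Hand-written example predictions, not real model output.
print(route({"label": "Neutral", "score": 0.98}))
print(route({"label": "Racism", "score": 0.95}))
print(route({"label": "Offensive", "score": 0.55}))
```

The threshold trades moderator workload against missed errors and would need tuning on held-out data.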

## How to Get Started with the Model

```python
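from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the tokenizer and fine-tuned classifier from the Hugging Face Hub.
model_id = "hossam87/bert-base-arabic-hate-speech"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Wrap both in a text-classification pipeline.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Classify one sentence; the result is a list with one {"label", "score"} dict per input.
text = "هذا نص عربي للاختبار"
result = classifier(text)
print(result)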
```

## Citation


```bibtex
@misc{hossam87_2025_arabichate,
  title = {BERT-base Arabic Hate Speech Detector},
  author = {Hossam87},
  year = {2025},
  howpublished = {\url{https://huggingface.co/hossam87/bert-base-arabic-hate-speech}},
}
```