---
language:
- ar
license: apache-2.0
base_model: thejosango/nuha-mlm
tags:
- bert
- text-classification
- hate-speech
- gender-based-violence
- arabic
- binary-classification
- pilot
datasets:
- thejosango/nuha-dataset
metrics:
- f1
- precision
- recall
model-index:
- name: nuha-binary
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Jordanian NUHA Dataset
      type: thejosango/nuha-dataset
      config: binary
      split: validation
    metrics:
    - type: f1
      value: 0.6879
      name: F1
    - type: precision
      value: 0.6464
      name: Precision
    - type: recall
      value: 0.7351
      name: Recall
---

# nuha-binary

## Model Summary

`nuha-binary` is a binary Arabic text classifier that detects hate speech in Jordanian social media comments. It fine-tunes [`nuha-mlm`](https://huggingface.co/thejosango/nuha-mlm) — a domain-adapted Arabic BERT — and outputs one of two labels:

| Label | Meaning |
|---|---|
| `non-hate-speech` | Not Online Violence |
| `hate-speech` | Offensive Language or Online Gender Based Violence |

This model was developed as part of a **pilot proof-of-concept** for the NUHA project by the [Jordan Open Source Association (JOSA)](https://josa.ngo). Performance metrics reflect the complexity of hate speech detection in colloquial Arabic and the exploratory nature of this initial effort.

For a more granular three-class classifier, see [`nuha-multiclass`](https://huggingface.co/thejosango/nuha-multiclass).

## Uses

### Direct Use

Classifying Arabic social media comments as hate speech or non-hate speech, particularly for Jordanian Arabic content from Facebook and X (Twitter).

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="thejosango/nuha-binary",
    tokenizer="thejosango/nuha-binary",
)

result = classifier("أنتِ امرأة رائعة")
print(result)
# [{'label': 'non-hate-speech', 'score': ...}]
```

For batch inference:

```python
comments = ["أنتِ امرأة رائعة", "اخرسي يا غبية"]
results = classifier(comments)
for comment, result in zip(comments, results):
    print(f"{result['label']} ({result['score']:.2f}): {comment}")
```

### Out-of-Scope Use

- **Other Arabic dialects**: The model was trained primarily on Jordanian Arabic. Performance on Egyptian, Gulf, or Modern Standard Arabic is not validated.
- **General hate speech**: The model is specifically calibrated for online gender-based violence in a Jordanian context. It may not generalise to other forms of hate speech or other demographic targets.
- **High-stakes automated decisions**: Given the moderate performance (F1 ≈ 0.69) and pilot nature of this work, the model should not be used as a sole decision-maker in content moderation systems without human review.
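
One way to keep a human in the loop is to route predictions by confidence instead of acting on the label directly. The sketch below is illustrative only: the `route` function, the action names, and the 0.90 threshold are hypothetical and not part of the NUHA project. It assumes each prediction is a dict shaped like the pipeline output shown under Direct Use.

```python
from typing import Dict

# Hypothetical threshold for illustration; tune it on your own labelled data.
AUTO_THRESHOLD = 0.90


def route(prediction: Dict[str, object], auto_threshold: float = AUTO_THRESHOLD) -> str:
    """Map one prediction, e.g. {"label": "hate-speech", "score": 0.83}, to an action."""
    label = prediction["label"]
    score = float(prediction["score"])
    if label == "hate-speech":
        # Never act automatically on a hate-speech prediction: always send it to a person.
        return "human-review-priority" if score >= auto_threshold else "human-review"
    # Low-confidence non-hate predictions also get a second look.
    return "allow" if score >= auto_threshold else "human-review"


print(route({"label": "hate-speech", "score": 0.95}))      # human-review-priority
print(route({"label": "non-hate-speech", "score": 0.55}))  # human-review
```

Because precision is lower than recall, a design like this treats a `hate-speech` prediction as a flag for review, not as a verdict.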

## Bias, Risks, and Limitations

- **Pilot annotation quality**: Training labels were produced in an exploratory annotation effort with variable inter-annotator agreement. The model inherits noise from that process.
- **Class imbalance**: The training data contains approximately 59% non-hate-speech and 41% hate-speech examples. Weighted loss was used during training to partially compensate.
- **Colloquial Arabic only**: Because of the aggressive text cleaning (Arabic-only filtering), the model has never seen raw URLs, numbers, punctuation, or Latin-script text. The preprocessing silently removes such content from inputs or replaces it with placeholder tokens (see Preprocessing).
- **Moderate performance**: An F1 of 0.69 on the validation set reflects the genuine difficulty of the task and the limitations of the pilot training data. Precision (0.65) is lower than recall (0.74), meaning the model tends toward false positives.

## Training Details

### Training Data

Fine-tuned on the `binary` configuration of [`thejosango/nuha-dataset`](https://huggingface.co/datasets/thejosango/nuha-dataset), which maps:
- **Not Online Violence** → `non-hate-speech`
- **Offensive Language** → `hate-speech`
- **Gender Based Violence** → `hate-speech`
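
The mapping above can be expressed as a small lookup. This is a sketch, not the dataset's own code: the names `BINARY_LABEL_MAP` and `to_binary` are hypothetical, and it assumes the source label strings appear exactly as written in the list above.

```python
# Source annotation label -> binary label, as listed above.
BINARY_LABEL_MAP = {
    "Not Online Violence": "non-hate-speech",
    "Offensive Language": "hate-speech",
    "Gender Based Violence": "hate-speech",
}


def to_binary(label: str) -> str:
    """Collapse a three-class annotation label into the binary scheme."""
    try:
        return BINARY_LABEL_MAP[label.strip()]
    except KeyError:
        # Fail loudly on unexpected labels instead of guessing a class.
        raise ValueError(f"Unknown source label: {label!r}") from None


print(to_binary("Offensive Language"))  # hate-speech
```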

### Preprocessing

At training and inference time, the following normalisation is applied to input text (in addition to the dataset-level Arabic-only filtering):

1. URLs replaced with `[رابط]` token
2. @mentions replaced with `[مستخدم]` token
3. Email addresses replaced with `[بريد]` token
4. Numbers removed
5. Punctuation removed
6. Arabic diacritics (harakat) removed
7. Whitespace normalised
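
The steps above could be approximated as follows. This is a minimal sketch under stated assumptions, not the project's actual preprocessing code: the regular expressions, the `normalise` function name, the private-use sentinel characters, and the choice to replace numbers and punctuation with a space (so adjacent words do not fuse) are all illustrative. It does not implement the dataset-level Arabic-only filtering.

```python
import re
import unicodedata

# Placeholder tokens named in the list above.
URL_TOKEN, MENTION_TOKEN, EMAIL_TOKEN = "[رابط]", "[مستخدم]", "[بريد]"

# Private-use sentinels protect the bracketed tokens from punctuation removal.
_SENTINELS = {"\uE000": URL_TOKEN, "\uE001": MENTION_TOKEN, "\uE002": EMAIL_TOKEN}

URL_RE = re.compile(r"(?:https?://|www\.)\S+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")  # lookbehind skips the '@' inside emails
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DIGIT_RE = re.compile(r"\d+")  # also matches Arabic-Indic digits
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0670]")  # harakat and dagger alef


def normalise(text: str) -> str:
    text = URL_RE.sub(" \uE000 ", text)      # 1. URLs
    text = MENTION_RE.sub(" \uE001 ", text)  # 2. @mentions
    text = EMAIL_RE.sub(" \uE002 ", text)    # 3. email addresses
    text = DIGIT_RE.sub(" ", text)           # 4. numbers
    text = "".join(                          # 5. punctuation (Unicode categories P*)
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    text = DIACRITICS_RE.sub("", text)       # 6. diacritics
    for sentinel, token in _SENTINELS.items():
        text = text.replace(sentinel, token)  # restore the placeholder tokens
    return " ".join(text.split())            # 7. whitespace


print(normalise("hi @user 123"))
```

The sentinels are needed because the placeholder tokens themselves contain square brackets, which a naive punctuation pass would strip.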

### Hyperparameters

| Parameter | Value |
|---|---|
| Base model | thejosango/nuha-mlm |
| Hidden layers | 4 (reduced from the base model's 12) |
| Classifier dropout | 0.50 |
| Learning rate | 5e-5 |
| LR schedule | Linear |
| Batch size | 64 |
| Epochs | 5 |
| Weight decay | 1e-3 |
| Label smoothing | 0.1 |
| Weighted loss | Yes (balanced class weights) |
| Data augmentation | Yes (contextual word substitution, ratio 0.75) |
| Framework | Transformers 4.32.1, PyTorch 2.0.1 |
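
To make "balanced class weights" concrete, the sketch below assumes the common heuristic `n_samples / (n_classes * class_count)`, as used by scikit-learn's `class_weight="balanced"`; the card does not state the exact formula used in training. The counts are illustrative values chosen to match the roughly 59% / 41% split, not the real dataset sizes, and `balanced_class_weights` is a hypothetical helper.

```python
from typing import Dict


def balanced_class_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Illustrative counts only (59% / 41%), not the actual dataset sizes.
weights = balanced_class_weights({"non-hate-speech": 590, "hate-speech": 410})
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

Under this assumption the minority `hate-speech` class receives a weight above 1 and the majority class a weight below 1, so misclassified hate-speech examples contribute more to the loss.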

### Evaluation Results

Evaluated on the validation split of `thejosango/nuha-dataset` (binary configuration):

| Metric | Value |
|---|---|
| F1 | 0.6879 |
| Precision | 0.6464 |
| Recall | 0.7351 |
| Loss | 0.5743 |
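
As a quick consistency check, the reported F1 can be recomputed from the reported precision and recall, since F1 is their harmonic mean:

```python
precision, recall = 0.6464, 0.7351

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.6879
```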

---

*This model was developed as part of an initial pilot study. Results should be interpreted accordingly.*