---
language:
- ar
license: apache-2.0
base_model: thejosango/nuha-mlm
tags:
- bert
- text-classification
- hate-speech
- gender-based-violence
- arabic
- multiclass-classification
- onnx
- pilot
datasets:
- thejosango/nuha-dataset
metrics:
- f1
- precision
- recall
model-index:
- name: nuha
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Jordanian NUHA Dataset
      type: thejosango/nuha-dataset
      config: methodology
      split: validation
    metrics:
    - type: f1
      value: 0.5363
      name: F1
    - type: precision
      value: 0.666
      name: Precision
    - type: recall
      value: 0.5188
      name: Recall
---
nuha
Model Summary
nuha is a lightweight, ONNX-optimised Arabic text classifier that categorises Jordanian social media comments into three classes based on the NUHA methodology for online gender-based violence (OGBV). It fine-tunes nuha-mlm — a domain-adapted Arabic BERT — with a reduced 4-layer architecture for efficient CPU inference, and is exported to ONNX. It shares the same classification task and labels as nuha-multiclass but is optimised for production deployment. This is the model powering the NUHA analysis platform.
| Label | Meaning |
|---|---|
| Not Online Violence | Comments that are not hate speech |
| Offensive Language | Hate speech characterised by irony or sarcasm |
| Gender Based Violence | Direct hate speech targeting gender, the primary focus of NUHA |
This model was developed as part of a pilot proof-of-concept for the NUHA project by the Jordan Open Source Association (JOSA).
For the full-depth (12-layer) version of this classifier, see nuha-multiclass.
Uses
Direct Use
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline
model = ORTModelForSequenceClassification.from_pretrained("thejosango/nuha")
tokenizer = AutoTokenizer.from_pretrained("thejosango/nuha")
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
result = classifier("اخرسي يا غبية")
print(result)
# [{'label': 'Gender Based Violence', 'score': ...}]
For batch inference:
comments = ["يعطيكم العافية", "أنتِ ساحرة", "اخرسي يا غبية"]
results = classifier(comments)
for comment, result in zip(comments, results):
    print(f"{result['label']} ({result['score']:.2f}): {comment}")
Using the PyTorch Version
If you need the full PyTorch model (for fine-tuning or non-ONNX inference), use nuha-multiclass directly.
Out-of-Scope Use
- Other Arabic dialects: The model was trained primarily on Jordanian Arabic. Performance on Egyptian, Gulf, or Modern Standard Arabic is not validated.
- Other hate speech targets: NUHA is calibrated for online gender-based violence. It is not designed to detect hate speech targeting race, religion, or other demographics.
- High-stakes automated decisions: Given the moderate performance (F1 ≈ 0.54) and pilot nature of this work, the model should not be used as the sole decision-maker in content moderation systems without human review.
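One way to keep a human in the loop is to route predictions to a review queue instead of acting on them automatically. The following is a minimal sketch, not part of the NUHA platform: the `triage` helper, the `needs_review` field, and the `0.80` threshold are all hypothetical and would need tuning on held-out data. It assumes predictions arrive in the `{'label': ..., 'score': ...}` shape returned by the pipeline above.

```python
from typing import Dict, List

# Hypothetical cut-off; not a value recommended by the model authors.
REVIEW_THRESHOLD = 0.80

BENIGN_LABEL = "Not Online Violence"


def triage(predictions: List[Dict], threshold: float = REVIEW_THRESHOLD) -> List[Dict]:
    """Mark which predictions a human moderator should look at.

    Assumed policy: anything the model flags as harmful is always reviewed,
    and benign predictions are reviewed too when the score is low.
    """
    decisions = []
    for pred in predictions:
        flagged = pred["label"] != BENIGN_LABEL
        uncertain = pred["score"] < threshold
        decisions.append({**pred, "needs_review": flagged or uncertain})
    return decisions


# Example with hand-written predictions (not real model output).
example = [
    {"label": "Not Online Violence", "score": 0.95},
    {"label": "Not Online Violence", "score": 0.55},
    {"label": "Gender Based Violence", "score": 0.99},
]
for decision in triage(example):
    print(decision["label"], decision["needs_review"])
```

Under this policy the model only ever auto-clears confident benign comments; every harmful label still passes through a person.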
Preprocessing
At inference time, apply the following normalisation to input text before passing it to the model:
- URLs replaced with the `[رابط]` token
- @mentions replaced with the `[مستخدم]` token
- Email addresses replaced with the `[بريد]` token
- Numbers removed
- Punctuation removed
- Arabic diacritics (harakat) removed
- Whitespace normalised
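The normalisation steps above could be sketched as follows. This is an approximation under stated assumptions, not the project's actual preprocessing code: the regular expressions, the diacritic range, the `normalise` function name, and the order of operations are all guesses. In particular, emails are replaced before @mentions so that the `@` inside an address is not mistaken for a mention, and temporary marker characters protect the bracketed tokens from the punctuation step.

```python
import re
import unicodedata

# Assumed patterns; the original pipeline may match these differently.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
MENTION_RE = re.compile(r"@\w+")
# Harakat (fathatan through sukun) plus superscript alef; assumed range.
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0670]")

# Private-use markers stand in for the tokens until punctuation is stripped,
# otherwise the square brackets in the tokens would be removed as well.
URL_MARK, MENTION_MARK, EMAIL_MARK = "\uE000", "\uE001", "\uE002"
TOKENS = {URL_MARK: "[رابط]", MENTION_MARK: "[مستخدم]", EMAIL_MARK: "[بريد]"}


def _is_digit_or_punct(ch: str) -> bool:
    category = unicodedata.category(ch)
    return category == "Nd" or category.startswith("P")


def normalise(text: str) -> str:
    text = URL_RE.sub(f" {URL_MARK} ", text)
    text = EMAIL_RE.sub(f" {EMAIL_MARK} ", text)
    text = MENTION_RE.sub(f" {MENTION_MARK} ", text)
    text = DIACRITICS_RE.sub("", text)
    # Drop decimal digits (ASCII and Arabic-Indic) and all punctuation.
    text = "".join(ch for ch in text if not _is_digit_or_punct(ch))
    for mark, token in TOKENS.items():
        text = text.replace(mark, token)
    # Collapse runs of whitespace and trim the ends.
    return " ".join(text.split())


print(normalise("see https://example.com now"))
```

The result of `normalise` can then be passed to the classifier in place of the raw comment.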
Evaluation Results
Evaluated on the validation split of thejosango/nuha-dataset (methodology configuration):
| Metric | Value |
|---|---|
| F1 (macro) | 0.5363 |
| Precision | 0.6660 |
| Recall | 0.5188 |
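For readers unfamiliar with macro averaging, the sketch below shows how the three figures relate, using a small hand-made example rather than NUHA data. The labels, predictions, and the `macro_scores` helper are hypothetical. Macro F1 is the mean of the per-class F1 scores, so it need not equal the harmonic mean of the averaged precision and recall. That would be consistent with the table, where the harmonic mean of 0.6660 and 0.5188 is about 0.583 while the reported F1 is 0.5363, assuming precision and recall are also macro-averaged, which the card does not state.

```python
from typing import List, Sequence, Tuple


def macro_scores(
    y_true: Sequence[int], y_pred: Sequence[int], labels: List[int]
) -> Tuple[float, float, float]:
    """Return macro-averaged (precision, recall, F1) over the given labels."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy three-class example (0, 1, 2 stand in for the three NUHA labels).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2, 0]

p, r, f1 = macro_scores(y_true, y_pred, labels=[0, 1, 2])
harmonic = 2 * p * r / (p + r)
print(f"precision={p:.4f} recall={r:.4f} macro_f1={f1:.4f} harmonic={harmonic:.4f}")
# precision=0.7000 recall=0.5833 macro_f1=0.6111 harmonic=0.6364
```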
See nuha-multiclass for full training details and evaluation discussion.
This model was developed as part of an initial pilot study. Performance metrics reflect the complexity of the task and the proof-of-concept nature of this system.