Sawb — Primary Detection Model (AraBERT-Large + Glossary Augmentation)

Part of the Sawb Arabic Cultural Hallucination Detection Collection for ICAIRE 2026 Track 3.

Overview

Sawb is the primary detection model of the Sawb pipeline. It is a binary classifier that determines whether an Arabic LLM response contains a cultural hallucination — a factually or culturally incorrect output within Arab and Islamic contexts.

A cultural hallucination occurs when an LLM:

  • Applies Western legal frameworks (EU AI Act, GDPR) to Islamic jurisprudence contexts
  • Fabricates or misattributes hadith or Islamic scholarly rulings
  • Ignores Arab/Gulf institutional contributions to AI (KACST, SDAIA, MBZUAI, Vision 2030)
  • Applies Western social norms instead of Gulf or Islamic customs
  • Responds in the wrong Arabic dialect when a specific one was requested
  • Uses Western examples (e.g., US hiring law) in a Saudi or Gulf-specific context

This model is fine-tuned from aubmindlab/bert-large-arabertv2 (355M parameters) on the Sawb dataset augmented with 1,076 examples synthesized from the ICAIRE AI Glossary (1,188 AI/ML terms with official Arabic definitions). The glossary augmentation teaches the model to distinguish culturally-aligned ICAIRE definitions from generic, Western-centric AI definitions.

Role in the Sawb Pipeline

The full Sawb detect-then-explain pipeline:

  1. Detection (this model): Classifies each (Arabic question, LLM answer) pair as hallucination or not, with probability score and optimal threshold θ = 0.50
  2. Explanation: For detected hallucinations, the DeepSeek API (exp16 prompt with dialectal few-shot examples) generates a precise Arabic explanation, citing specific phrases from the LLM's answer
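The two stages above compose naturally as detect-then-explain. A minimal sketch, in which `detect` and `explain` are hypothetical stand-ins for the classifier and the DeepSeek explanation call (neither name comes from the Sawb codebase):

```python
from typing import Callable, Optional

def run_pipeline(
    question: str,
    answer: str,
    detect: Callable[[str, str], float],   # stage 1: returns hallucination probability
    explain: Callable[[str, str], str],    # stage 2: returns an Arabic explanation
    threshold: float = 0.50,               # optimal threshold from the model card
) -> dict:
    """Classify the (question, answer) pair; explain only flagged hallucinations."""
    prob = detect(question, answer)
    result = {
        "probability": prob,
        "is_hallucination": prob > threshold,
        "explanation": None,
    }
    if result["is_hallucination"]:
        # The explanation stage is only invoked for detected hallucinations,
        # which keeps API calls (and cost) proportional to the positive rate.
        result["explanation"] = explain(question, answer)
    return result
```

Gating the explanation call on the detector's decision is the key design choice: the expensive API step runs only on the subset of pairs flagged at stage 1.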

Model Architecture

Property              Value
--------------------  ----------------------------------------------------
Base model            aubmindlab/bert-large-arabertv2
Architecture          BertForSequenceClassification
Parameters            355M
Labels                LABEL_1 = hallucination, LABEL_0 = not hallucination
Max sequence length   512 tokens
Input format          السؤال: {question}\n\nإجابة النموذج: {answer[:500]}
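The input-format row can be reproduced with a small helper; `build_input` is a hypothetical name, but the template string and the 500-character answer truncation follow the table above:

```python
def build_input(question: str, answer: str) -> str:
    # Template from the model card: question, blank line, answer truncated
    # to 500 characters (the tokenizer later truncates to 512 tokens).
    return f"السؤال: {question}\n\nإجابة النموذج: {answer[:500]}"
```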

Training

Hyperparameter          Value
----------------------  ---------------------------------------------------
Training examples       2,904 (1,828 original + 1,076 glossary-synthesized)
Epochs                  3
Learning rate           1×10⁻⁵
Batch size              8 per device
Gradient accumulation   4 steps (effective batch size: 32)
LR schedule             Cosine
Optimizer               AdamW
Model selection         Best macro F1 on validation set
Framework               Hugging Face Transformers
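As a sanity check, the hyperparameters above can be captured in a plain config dict; the key names are illustrative, not the exact `TrainingArguments` field names:

```python
# Hyperparameters transcribed from the training table (key names are illustrative).
config = {
    "base_model": "aubmindlab/bert-large-arabertv2",
    "num_train_examples": 2904,          # 1,828 original + 1,076 glossary-synthesized
    "epochs": 3,
    "learning_rate": 1e-5,
    "per_device_batch_size": 8,
    "gradient_accumulation_steps": 4,
    "lr_schedule": "cosine",
    "optimizer": "adamw",
}

# Effective batch size = per-device batch × accumulation steps.
effective_batch = config["per_device_batch_size"] * config["gradient_accumulation_steps"]
```

Gradient accumulation here trades memory for throughput: each optimizer step sees 32 examples while only 8 fit on the device at once.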

Glossary augmentation: For each of the 1,188 ICAIRE glossary terms, DeepSeek was asked to define the term without the ICAIRE cultural framing. A second judge call scored how well the answer matched the official ICAIRE definition (0–3). Definitions scoring ≤ 1 were labeled as hallucinations. This process generated 1,346 total examples (1,188 correct definitions + 158 wrong), of which 1,076 were used for training and 270 for validation.
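The judge-based labeling rule reduces to a threshold on the 0–3 match score; `label_from_score` is a hypothetical helper expressing the rule stated above (score ≤ 1 → hallucination):

```python
def label_from_score(match_score: int) -> int:
    """Map a judge match score (0-3) to a binary label: 1 = hallucination, 0 = not."""
    if not 0 <= match_score <= 3:
        raise ValueError("judge score must be in 0..3")
    # Definitions that poorly match the official ICAIRE definition (score <= 1)
    # are treated as culturally misaligned, i.e. hallucinations.
    return 1 if match_score <= 1 else 0
```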

Evaluation Results

Metric                         Value
-----------------------------  -------------------------------------------
Macro F1 (validation, θ=0.50)  0.9246
Task                           Binary classification (hallucination / not)
Evaluation set                 457 Arabic (question, LLM answer) pairs
Optimal threshold              0.50
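An "optimal threshold" like θ = 0.50 is typically found by sweeping candidate thresholds and keeping the one that maximizes macro F1 on the validation set. A minimal sketch of such a sweep in pure Python (the function names and candidate grid are illustrative, not from the Sawb codebase):

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """Per-class F1 from true-positive, false-positive, false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(labels, probs, threshold):
    """Unweighted mean of per-class F1 over the two classes."""
    preds = [1 if p > threshold else 0 for p in probs]
    scores = []
    for cls in (0, 1):
        tp = sum(1 for y, yh in zip(labels, preds) if y == cls and yh == cls)
        fp = sum(1 for y, yh in zip(labels, preds) if y != cls and yh == cls)
        fn = sum(1 for y, yh in zip(labels, preds) if y == cls and yh != cls)
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)

def best_threshold(labels, probs, candidates=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Pick the candidate threshold with the highest macro F1."""
    return max(candidates, key=lambda t: macro_f1(labels, probs, t))
```

Macro F1 (rather than accuracy) is the sensible selection metric here because it weights the hallucination and non-hallucination classes equally regardless of class imbalance.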

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("HassanB4/sawb")
model = AutoModelForSequenceClassification.from_pretrained("HassanB4/sawb")
model.eval()

# Example pair: the answer applies the EU AI Act to Islamic courts
# (an ethical_framework_mismatch hallucination).
question = "كيف تُطبَّق مبادئ أخلاقيات الذكاء الاصطناعي في القضاء الإسلامي؟"
answer = "يجب تطبيق AI Act الأوروبي على المحاكم الإسلامية لضمان الشفافية والمساءلة..."

# Build the input in the training format: question, blank line, answer
# truncated to 500 characters.
text = f"السؤال: {question}\n\nإجابة النموذج: {answer[:500]}"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

# LABEL_1 (index 1) is the hallucination class.
prob_hallucination = torch.softmax(logits, dim=-1)[0, 1].item()
is_hallucination = prob_hallucination > 0.50  # optimal threshold

print(f"Hallucination probability: {prob_hallucination:.3f}")
print(f"Is hallucination: {is_hallucination}")

Hallucination Categories

Category                     Description
---------------------------  ---------------------------------------------------------------
ethical_framework_mismatch   Applies EU AI Act / GDPR instead of Maqasid al-Shariah
religious_misrepresentation  Fabricated or unverifiable hadith, inaccurate Islamic rulings
historical_inaccuracy        Omits Arab AI contributions (KACST, SDAIA, MBZUAI, Vision 2030)
social_norms_violation       Applies Western social standards, ignoring Gulf/Islamic norms
dialectal_confusion          Responds in the wrong dialect or refuses the requested dialect
regional_context_errors      Uses Western examples in a Saudi/Gulf-specific context

Dataset

Trained on HassanB4/sawb-arabic-hallucination-dataset, augmented with ICAIRE Glossary synthesis.

Collection

See the full Sawb collection for all models and datasets: Sawb Arabic Cultural Hallucination Detection
