Sawb — Primary Detection Model (AraBERT-Large + Glossary Augmentation)

Part of the Sawb Arabic Cultural Hallucination Detection Collection for ICAIRE 2026 Track 3.

Overview

Sawb is the primary detection model of the Sawb pipeline. It is a binary classifier that determines whether an Arabic LLM response contains a cultural hallucination — a factually or culturally incorrect output within Arab and Islamic contexts.

A cultural hallucination occurs when an LLM:

  • Applies Western legal frameworks (EU AI Act, GDPR) to Islamic jurisprudence contexts
  • Fabricates or misattributes hadith or Islamic scholarly rulings
  • Ignores Arab/Gulf institutional contributions to AI (KACST, SDAIA, MBZUAI, Vision 2030)
  • Applies Western social norms instead of Gulf or Islamic customs
  • Responds in the wrong Arabic dialect when a specific one was requested
  • Uses Western examples (e.g., US hiring law) in a Saudi or Gulf-specific context

This model is fine-tuned from aubmindlab/bert-large-arabertv2 (355M parameters) on the Sawb dataset augmented with 1,076 examples synthesized from the ICAIRE AI Glossary (1,188 AI/ML terms with official Arabic definitions). The glossary augmentation teaches the model to distinguish culturally-aligned ICAIRE definitions from generic, Western-centric AI definitions.

Role in the Sawb Pipeline

The full Sawb detect-then-explain pipeline:

  1. Detection (this model): Classifies each (Arabic question, LLM answer) pair as hallucination or not, with probability score and optimal threshold θ = 0.50
  2. Explanation: For detected hallucinations, the DeepSeek API (exp16 prompt with dialectal few-shot examples) generates a precise Arabic explanation, citing specific phrases from the LLM's answer
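The two stages above compose naturally as detect-then-explain. A minimal sketch, in which `detect` and `explain` are hypothetical stand-ins for the classifier and the DeepSeek explanation call (neither name comes from the Sawb codebase):

```python
from typing import Callable, Optional

def run_pipeline(
    question: str,
    answer: str,
    detect: Callable[[str, str], float],   # stage 1: returns hallucination probability
    explain: Callable[[str, str], str],    # stage 2: returns an Arabic explanation
    threshold: float = 0.50,               # optimal threshold from the model card
) -> dict:
    """Classify the (question, answer) pair; explain only flagged hallucinations."""
    prob = detect(question, answer)
    result = {
        "probability": prob,
        "is_hallucination": prob > threshold,
        "explanation": None,
    }
    if result["is_hallucination"]:
        # The explanation stage is only invoked for detected hallucinations,
        # which keeps API calls (and cost) proportional to the positive rate.
        result["explanation"] = explain(question, answer)
    return result
```

Gating the explanation call on the detector's decision is the key design choice: the expensive API step runs only on the subset of pairs flagged at stage 1.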

Model Architecture

Property              Value
--------------------  ----------------------------------------------------
Base model            aubmindlab/bert-large-arabertv2
Architecture          BertForSequenceClassification
Parameters            355M
Labels                LABEL_1 = hallucination, LABEL_0 = not hallucination
Max sequence length   512 tokens
Input format          السؤال: {question}\n\nإجابة النموذج: {answer[:500]}
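The input-format row can be reproduced with a small helper; `build_input` is a hypothetical name, but the template string and the 500-character answer truncation follow the table above:

```python
def build_input(question: str, answer: str) -> str:
    # Template from the model card: question, blank line, answer truncated
    # to 500 characters (the tokenizer later truncates to 512 tokens).
    return f"السؤال: {question}\n\nإجابة النموذج: {answer[:500]}"
```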

Training

Hyperparameter          Value
----------------------  ---------------------------------------------------
Training examples       2,904 (1,828 original + 1,076 glossary-synthesized)
Epochs                  3
Learning rate           1×10⁻⁵
Batch size              8 per device
Gradient accumulation   4 steps (effective batch size: 32)
LR schedule             Cosine
Optimizer               AdamW
Model selection         Best macro F1 on validation set
Framework               Hugging Face Transformers
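As a sanity check, the hyperparameters above can be captured in a plain config dict; the key names are illustrative, not the exact `TrainingArguments` field names:

```python
# Hyperparameters transcribed from the training table (key names are illustrative).
config = {
    "base_model": "aubmindlab/bert-large-arabertv2",
    "num_train_examples": 2904,          # 1,828 original + 1,076 glossary-synthesized
    "epochs": 3,
    "learning_rate": 1e-5,
    "per_device_batch_size": 8,
    "gradient_accumulation_steps": 4,
    "lr_schedule": "cosine",
    "optimizer": "adamw",
}

# Effective batch size = per-device batch × accumulation steps.
effective_batch = config["per_device_batch_size"] * config["gradient_accumulation_steps"]
```

Gradient accumulation here trades memory for throughput: each optimizer step sees 32 examples while only 8 fit on the device at once.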

Glossary augmentation: For each of the 1,188 ICAIRE glossary terms, DeepSeek was asked to define the term without the ICAIRE cultural framing. A second judge call scored how well the answer matched the official ICAIRE definition (0–3). Definitions scoring ≤ 1 were labeled as hallucinations. This process generated 1,346 total examples (1,188 correct definitions + 158 wrong), of which 1,076 were used for training and 270 for validation.
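The judge-based labeling rule reduces to a threshold on the 0–3 match score; `label_from_score` is a hypothetical helper expressing the rule stated above (score ≤ 1 → hallucination):

```python
def label_from_score(match_score: int) -> int:
    """Map a judge match score (0-3) to a binary label: 1 = hallucination, 0 = not."""
    if not 0 <= match_score <= 3:
        raise ValueError("judge score must be in 0..3")
    # Definitions that poorly match the official ICAIRE definition (score <= 1)
    # are treated as culturally misaligned, i.e. hallucinations.
    return 1 if match_score <= 1 else 0
```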

Evaluation Results

Metric                         Value
-----------------------------  -------------------------------------------
Macro F1 (validation, θ=0.50)  0.9246
Task                           Binary classification (hallucination / not)
Evaluation set                 457 Arabic (question, LLM answer) pairs
Optimal threshold              0.50
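An "optimal threshold" like θ = 0.50 is typically found by sweeping candidate thresholds and keeping the one that maximizes macro F1 on the validation set. A minimal sketch of such a sweep in pure Python (the function names and candidate grid are illustrative, not from the Sawb codebase):

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """Per-class F1 from true-positive, false-positive, false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(labels, probs, threshold):
    """Unweighted mean of per-class F1 over the two classes."""
    preds = [1 if p > threshold else 0 for p in probs]
    scores = []
    for cls in (0, 1):
        tp = sum(1 for y, yh in zip(labels, preds) if y == cls and yh == cls)
        fp = sum(1 for y, yh in zip(labels, preds) if y != cls and yh == cls)
        fn = sum(1 for y, yh in zip(labels, preds) if y == cls and yh != cls)
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)

def best_threshold(labels, probs, candidates=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Pick the candidate threshold with the highest macro F1."""
    return max(candidates, key=lambda t: macro_f1(labels, probs, t))
```

Macro F1 (rather than accuracy) is the sensible selection metric here because it weights the hallucination and non-hallucination classes equally regardless of class imbalance.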

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("HassanB4/sawb")
model = AutoModelForSequenceClassification.from_pretrained("HassanB4/sawb")
model.eval()

# Example pair: the answer applies the EU AI Act to Islamic courts
# (an ethical_framework_mismatch hallucination).
question = "كيف تُطبَّق مبادئ أخلاقيات الذكاء الاصطناعي في القضاء الإسلامي؟"
answer = "يجب تطبيق AI Act الأوروبي على المحاكم الإسلامية لضمان الشفافية والمساءلة..."

# Build the input in the training format: question, blank line, answer
# truncated to 500 characters.
text = f"السؤال: {question}\n\nإجابة النموذج: {answer[:500]}"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

# LABEL_1 (index 1) is the hallucination class.
prob_hallucination = torch.softmax(logits, dim=-1)[0, 1].item()
is_hallucination = prob_hallucination > 0.50  # optimal threshold

print(f"Hallucination probability: {prob_hallucination:.3f}")
print(f"Is hallucination: {is_hallucination}")

Hallucination Categories

Category                     Description
---------------------------  ---------------------------------------------------------------
ethical_framework_mismatch   Applies EU AI Act / GDPR instead of Maqasid al-Shariah
religious_misrepresentation  Fabricated or unverifiable hadith, inaccurate Islamic rulings
historical_inaccuracy        Omits Arab AI contributions (KACST, SDAIA, MBZUAI, Vision 2030)
social_norms_violation       Applies Western social standards, ignoring Gulf/Islamic norms
dialectal_confusion          Responds in the wrong dialect or refuses the requested dialect
regional_context_errors      Uses Western examples in a Saudi/Gulf-specific context

Dataset

Trained on HassanB4/sawb-arabic-hallucination-dataset, augmented with ICAIRE Glossary synthesis.

Collection

See the full Sawb collection for all models and datasets: Sawb Arabic Cultural Hallucination Detection
