Sawb — Primary Detection Model (AraBERT-Large + Glossary Augmentation)
Part of the Sawb Arabic Cultural Hallucination Detection Collection for ICAIRE 2026 Track 3.
Overview
Sawb is the primary detection model of the Sawb pipeline. It is a binary classifier that determines whether an Arabic LLM response contains a cultural hallucination — a factually or culturally incorrect output within Arab and Islamic contexts.
A cultural hallucination occurs when an LLM:
- Applies Western legal frameworks (EU AI Act, GDPR) to Islamic jurisprudence contexts
- Fabricates or misattributes hadith or Islamic scholarly rulings
- Ignores Arab/Gulf institutional contributions to AI (KACST, SDAIA, MBZUAI, Vision 2030)
- Applies Western social norms instead of Gulf or Islamic customs
- Responds in the wrong Arabic dialect when a specific one was requested
- Uses Western examples (e.g., US hiring law) in a Saudi or Gulf-specific context
This model is fine-tuned from aubmindlab/bert-large-arabertv2 (355M parameters) on the Sawb dataset augmented with 1,076 examples synthesized from the ICAIRE AI Glossary (1,188 AI/ML terms with official Arabic definitions). The glossary augmentation teaches the model to distinguish culturally-aligned ICAIRE definitions from generic, Western-centric AI definitions.
Role in the Sawb Pipeline
The full Sawb detect-then-explain pipeline:
- Detection (this model): classifies each (Arabic question, LLM answer) pair as hallucination or not, returning a probability score with decision threshold θ = 0.50
- Explanation: for detected hallucinations, the DeepSeek API (exp16 prompt with dialectal few-shot examples) generates a precise Arabic explanation, citing specific phrases from the LLM's answer (see the sketch below)
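A minimal sketch of how the two stages fit together. The function names are illustrative, and `explain_with_deepseek` is a hypothetical placeholder: the exp16 prompt and DeepSeek client are not part of this repository.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("HassanB4/sawb")
model = AutoModelForSequenceClassification.from_pretrained("HassanB4/sawb")
model.eval()

def detect(question: str, answer: str) -> float:
    """Stage 1: return P(hallucination) for one (question, answer) pair."""
    text = f"السؤال: {question}\n\nإجابة النموذج: {answer[:500]}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()  # index 1 = LABEL_1

def explain_with_deepseek(question: str, answer: str) -> str:
    """Stage 2 placeholder: the exp16 DeepSeek prompt lives outside this repo."""
    raise NotImplementedError("plug in your DeepSeek API client here")

def sawb_pipeline(question: str, answer: str) -> dict:
    prob = detect(question, answer)
    result = {"probability": prob, "is_hallucination": prob > 0.50}  # θ = 0.50
    if result["is_hallucination"]:
        result["explanation"] = explain_with_deepseek(question, answer)
    return result
```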
Model Architecture
| Property | Value |
|---|---|
| Base model | aubmindlab/bert-large-arabertv2 |
| Architecture | BertForSequenceClassification |
| Parameters | 355M |
| Labels | LABEL_1 = hallucination, LABEL_0 = not hallucination |
| Max sequence length | 512 tokens |
| Input format | السؤال: {question}\n\nإجابة النموذج: {answer[:500]} |
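The label mapping in the table can be confirmed directly from the checkpoint's config (`id2label` is a standard field in Transformers configs):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("HassanB4/sawb")
print(config.id2label)  # index 1 = LABEL_1 = hallucination
```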
Training
| Hyperparameter | Value |
|---|---|
| Training examples | 2,904 (1,828 original + 1,076 glossary-synthesized) |
| Epochs | 3 |
| Learning rate | 1×10⁻⁵ |
| Batch size | 8 per device |
| Gradient accumulation | 4 steps (effective batch: 32) |
| LR schedule | Cosine |
| Optimizer | AdamW |
| Model selection | Best macro F1 on validation set |
| Framework | Hugging Face Transformers |
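A hedged sketch of the fine-tuning setup implied by this table. The dataset column and split names (`text`, `label`, `train`, `validation`) are assumptions, not the project's actual training script; check the dataset card for the real schema.

```python
import numpy as np
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

base = "aubmindlab/bert-large-arabertv2"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# Column and split names below are assumptions; see the dataset card.
ds = load_dataset("HassanB4/sawb-arabic-hallucination-dataset")
ds = ds.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
            batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"macro_f1": f1_score(labels, preds, average="macro")}

args = TrainingArguments(
    output_dir="sawb-arabert-large",
    num_train_epochs=3,
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,   # effective batch size 32
    lr_scheduler_type="cosine",
    eval_strategy="epoch",           # "evaluation_strategy" on older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,     # keep the best macro-F1 checkpoint
    metric_for_best_model="macro_f1",
)

trainer = Trainer(model=model, args=args,
                  train_dataset=ds["train"], eval_dataset=ds["validation"],
                  data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
                  compute_metrics=compute_metrics)
trainer.train()
```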
Glossary augmentation: For each of the 1,188 ICAIRE glossary terms, DeepSeek was asked to define the term without the ICAIRE cultural framing. A second judge call scored how well the answer matched the official ICAIRE definition (0–3), and definitions scoring ≤ 1 were labeled as hallucinations. This process generated 1,346 examples in total (1,188 correct definitions + 158 incorrect ones), of which 1,076 were used for training and 270 for validation.
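The labeling rule is simple enough to state in code. A sketch under the assumption that `judge_score` is the 0–3 match score returned by the second DeepSeek call (the API calls themselves are omitted, and the function name is illustrative):

```python
def label_glossary_example(judge_score: int) -> int:
    """Map the 0-3 judge score to a binary label.

    Definitions scoring <= 1 against the official ICAIRE
    definition are treated as hallucinations (LABEL_1).
    """
    return 1 if judge_score <= 1 else 0
```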
Evaluation Results
| Metric | Value |
|---|---|
| Macro F1 (validation, θ=0.50) | 0.9246 |
| Task | Binary classification (hallucination / not) |
| Evaluation set | 457 Arabic (question, LLM answer) pairs |
| Optimal threshold | 0.50 |
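For context, a threshold like θ = 0.50 can be selected by sweeping candidate values on the validation set and keeping the one with the highest macro F1. A generic sketch (not the project's evaluation script), where `probs` are the model's hallucination probabilities and `labels` the gold labels:

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(probs, labels):
    """Return (threshold, macro_f1) maximizing macro F1 on validation data."""
    probs, labels = np.asarray(probs), np.asarray(labels)
    candidates = np.linspace(0.05, 0.95, 19)
    scores = [f1_score(labels, (probs >= t).astype(int), average="macro")
              for t in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]
```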
Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("HassanB4/sawb")
model = AutoModelForSequenceClassification.from_pretrained("HassanB4/sawb")
model.eval()

# Question: "How are AI ethics principles applied in the Islamic judiciary?"
question = "كيف تُطبَّق مبادئ أخلاقيات الذكاء الاصطناعي في القضاء الإسلامي؟"
# Answer (a hallucination): "The European AI Act must be applied to Islamic
# courts to ensure transparency and accountability..."
answer = "يجب تطبيق AI Act الأوروبي على المحاكم الإسلامية لضمان الشفافية والمساءلة..."

# Build the input in the training format (answer truncated to 500 characters).
text = f"السؤال: {question}\n\nإجابة النموذج: {answer[:500]}"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

# Index 1 corresponds to LABEL_1 = hallucination.
prob_hallucination = torch.softmax(logits, dim=-1)[0, 1].item()
is_hallucination = prob_hallucination > 0.50  # optimal threshold θ = 0.50

print(f"Hallucination probability: {prob_hallucination:.3f}")
print(f"Is hallucination: {is_hallucination}")
```
Hallucination Categories
| Category | Description |
|---|---|
| `ethical_framework_mismatch` | Applies EU AI Act / GDPR instead of Maqasid al-Shariah |
| `religious_misrepresentation` | Fabricated or unverifiable hadith, inaccurate Islamic rulings |
| `historical_inaccuracy` | Omits Arab AI contributions (KACST, SDAIA, MBZUAI, Vision 2030) |
| `social_norms_violation` | Applies Western social standards, ignoring Gulf/Islamic norms |
| `dialectal_confusion` | Responds in the wrong dialect or refuses the requested dialect |
| `regional_context_errors` | Uses Western examples in a Saudi/Gulf-specific context |
Dataset
Trained on HassanB4/sawb-arabic-hallucination-dataset, augmented with ICAIRE Glossary synthesis.
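It can be loaded from the Hub for inspection (split and column names may differ; check the dataset card):

```python
from datasets import load_dataset

ds = load_dataset("HassanB4/sawb-arabic-hallucination-dataset")
print(ds)  # inspect available splits and columns
```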
Collection
See the full Sawb collection for all models and datasets: Sawb Arabic Cultural Hallucination Detection