Sawb — Multilingual-E5-Large

Part of the Sawb Arabic Cultural Hallucination Detection Collection for ICAIRE 2026 Track 3.

Overview

Sawb — Multilingual-E5-Large is a binary classifier for detecting cultural hallucinations in Arabic LLM outputs. It is fine-tuned from intfloat/multilingual-e5-large (560M parameters), a cross-lingual embedding model pre-trained on multilingual text pairs, and serves as the collection's non-Arabic-specific baseline for testing cross-lingual generalization.

A cultural hallucination occurs when an LLM produces a factually or culturally incorrect response within Arab/Islamic contexts — misapplying Western legal frameworks, fabricating religious references, or responding in the wrong Arabic dialect.

Model Architecture

| Property | Value |
|---|---|
| Base model | `intfloat/multilingual-e5-large` |
| Architecture | `BertForSequenceClassification` |
| Parameters | 560M |
| Labels | `LABEL_1` = hallucination, `LABEL_0` = not hallucination |
| Max sequence length | 512 tokens |
| Input format | `السؤال: {question}\n\nإجابة النموذج: {answer[:500]}` |
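The input format above can be sketched as a small helper (illustrative only; the function name `format_input` is not part of the released code):

```python
def format_input(question: str, answer: str) -> str:
    """Build the classifier input: the question, then the LLM answer
    truncated to 500 characters, in the training-time template."""
    # Template: "السؤال: {question}\n\nإجابة النموذج: {answer[:500]}"
    # ("Question: ... / Model answer: ...")
    return f"السؤال: {question}\n\nإجابة النموذج: {answer[:500]}"
```

The 500-character truncation happens before tokenization; the tokenizer then enforces the 512-token limit on the full string.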

Training

| Hyperparameter | Value |
|---|---|
| Training examples | 1,828 |
| Epochs | 5 |
| Learning rate | 2×10⁻⁵ |
| Batch size | 8 per device |
| Gradient accumulation | 4 steps (effective batch: 32) |
| LR schedule | Cosine |
| Optimizer | AdamW |
| Model selection | Best macro F1 on validation set |
| Framework | Hugging Face Transformers |
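With a per-device batch of 8 and 4 gradient-accumulation steps, each optimizer step sees 8 × 4 = 32 examples. A minimal sketch of that arithmetic and of cosine decay from the peak learning rate (illustrative; assumes decay to zero over training with no warmup, which the card does not specify):

```python
import math

PER_DEVICE_BATCH = 8   # batch size per device (from the table above)
GRAD_ACCUM_STEPS = 4   # gradient-accumulation steps
PEAK_LR = 2e-5         # peak learning rate

def effective_batch_size(per_device: int, accum: int) -> int:
    # Gradients are accumulated over `accum` forward passes before each optimizer step.
    return per_device * accum

def cosine_lr(step: int, total_steps: int, peak: float = PEAK_LR) -> float:
    # Cosine decay from `peak` at step 0 down to 0 at `total_steps`.
    return peak * 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
```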

Evaluation Results

| Metric | Value |
|---|---|
| Macro F1 (validation, θ=0.30) | 0.9467 |
| Task | Binary classification (hallucination / not) |
| Evaluation set | 457 Arabic (question, LLM answer) pairs |

Despite its cross-lingual (non-Arabic-specific) pre-training, mE5-Large achieves strong performance (F1=0.9467). This is comparable to the Arabic-specific ARBERTv2 (0.9457) and MARBERTv2 (0.9264), but below AraBERT base (0.9599) and AraBERT-Large (0.9788 at optimal threshold).
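Macro F1, the selection metric above, averages the per-class F1 scores so both classes weigh equally regardless of class balance. A minimal sketch for the binary case (illustrative, not the evaluation script):

```python
def f1(tp: int, fp: int, fn: int) -> float:
    # Harmonic mean of precision and recall; 0 when the class is never predicted or present.
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(preds, labels) -> float:
    # Average F1 over label 0 (not hallucination) and label 1 (hallucination).
    scores = []
    for cls in (0, 1):
        tp = sum(p == cls and y == cls for p, y in zip(preds, labels))
        fp = sum(p == cls and y != cls for p, y in zip(preds, labels))
        fn = sum(p != cls and y == cls for p, y in zip(preds, labels))
        scores.append(f1(tp, fp, fn))
    return sum(scores) / 2
```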

Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("HassanB4/sawb-multilingual-e5")
model = AutoModelForSequenceClassification.from_pretrained("HassanB4/sawb-multilingual-e5")
model.eval()

# Example: the answer misapplies the EU AI Act to Islamic courts.
question = "كيف تُطبَّق مبادئ أخلاقيات الذكاء الاصطناعي في القضاء الإسلامي؟"  # "How are AI ethics principles applied in the Islamic judiciary?"
answer = "يجب تطبيق AI Act الأوروبي على المحاكم الإسلامية..."  # "The European AI Act must be applied to Islamic courts..."

# Build the input in the training-time format: question + answer truncated to 500 characters.
text = f"السؤال: {question}\n\nإجابة النموذج: {answer[:500]}"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

# LABEL_1 = hallucination; 0.30 is the decision threshold tuned on the validation set.
prob_hallucination = torch.softmax(logits, dim=-1)[0, 1].item()
is_hallucination = prob_hallucination > 0.30

print(f"Hallucination probability: {prob_hallucination:.3f}")
print(f"Is hallucination: {is_hallucination}")
```

Dataset

Trained on HassanB4/sawb-arabic-hallucination-dataset.

Collection

Sawb Arabic Cultural Hallucination Detection
