Arabic NLI Binary Classifier: s02-camelbert-nli

Binary classifier for Arabic text, fine-tuned with an NLI framing. It labels a model answer as faithful (0) or unfaithful (1) with respect to the gold answer.

Base model

CAMeL-Lab/bert-base-arabic-camelbert-mix

Input format

nli: [CLS] gold_answer [SEP] model_answer [SEP]. This is an NLI framing with gold_answer as the premise and model_answer as the hypothesis. It closes the gap left by the official qa baseline format, which ignores gold_answer entirely.
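
To make the segment order concrete, here is a minimal sketch that writes the pair layout out as a string. The helper name `format_nli_pair` is hypothetical and not part of this model's code, and the Arabic strings are made-up examples. In practice the tokenizer adds the special tokens itself when it is given two text arguments, so this is only a readable view of the layout, not a preprocessing step to apply before tokenization.

```python
def format_nli_pair(gold_answer: str, model_answer: str) -> str:
    """Return a human-readable view of the NLI input layout.

    Hypothetical helper for illustration only: the real special tokens are
    inserted by the tokenizer, not by string concatenation.
    """
    premise = gold_answer.strip()      # gold_answer plays the premise role
    hypothesis = model_answer.strip()  # model_answer plays the hypothesis role
    return f"[CLS] {premise} [SEP] {hypothesis} [SEP]"


# Made-up example pair, not taken from the training data.
example = format_nli_pair("القاهرة", "عاصمة مصر هي القاهرة")
print(example)
```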

Dev results

  • AUC-ROC (official, full dev n=1300): 0.9574
  • AUC-ROC (clean dev, n=800, excludes the ~100 questions that also appear in train, about 500 rows): 0.9272
  • Macro F1 (official, threshold=0.50): 0.9081
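
The evaluation script is not shown here, so the following is only a sketch of how a macro F1 at a fixed threshold is commonly computed, assuming the standard definition (the unweighted mean of the per-class F1 scores). The function name `macro_f1` and the toy labels and scores are illustrative assumptions; the `score > threshold` rule mirrors the one in the Usage snippet.

```python
from typing import Sequence


def macro_f1(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Macro-averaged F1 over classes 0 and 1 after thresholding the scores."""
    preds = [int(s > threshold) for s in scores]
    f1_per_class = []
    for cls in (0, 1):
        tp = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
        fp = sum(1 for y, p in zip(labels, preds) if y != cls and p == cls)
        fn = sum(1 for y, p in zip(labels, preds) if y == cls and p != cls)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_per_class.append(2 * tp / denom if denom else 0.0)
    return sum(f1_per_class) / len(f1_per_class)


# Toy data: one faithful row (score 0.6) is wrongly flagged as unfaithful.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.6, 0.7, 0.9]
print(round(macro_f1(toy_labels, toy_scores), 4))
```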

Note: the official-dev number is inflated by the ~500 dev rows (1300 - 800) whose questions also appear in the training set, which the model can answer by near-memorization. The clean-dev AUC-ROC (0.9272) is the honest generalization estimate and the number to use when ranking against other runs. It improves on s01-camelbert-qa's clean-dev AUC-ROC of 0.8713 by 5.6 points (0.9272 - 0.8713 = 0.0559), which indicates that the NLI framing, with gold_answer as the premise, recovers much of what the qa format loses. Both runs far exceed the published CAMeLBERT baseline (0.7093 dev AUC-ROC).
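
The clean-dev filtering can be sketched as follows. This is an illustration under stated assumptions, not the script used for the numbers above: the field names `question`, `label` and `score`, the helper names, and the exact-string match on questions are all hypothetical, and the toy rows are invented. AUC-ROC is computed directly from its pairwise definition (the probability that an unfaithful row outscores a faithful one, ties counting one half), which is quadratic in the number of rows but fine at this dev-set size.

```python
from typing import Dict, List, Sequence, Set


def drop_train_overlap(dev_rows: List[Dict], train_questions: Set[str]) -> List[Dict]:
    """Keep only dev rows whose question never appears in the training set."""
    return [row for row in dev_rows if row["question"] not in train_questions]


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC-ROC: P(positive score > negative score), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy rows: q1 also occurs in train and is scored confidently.
train_questions = {"q1"}
dev_rows = [
    {"question": "q1", "label": 1, "score": 0.9},
    {"question": "q2", "label": 0, "score": 0.2},
    {"question": "q3", "label": 1, "score": 0.7},
    {"question": "q4", "label": 0, "score": 0.8},
]

clean_rows = drop_train_overlap(dev_rows, train_questions)
full_auc = auc_roc([r["label"] for r in dev_rows], [r["score"] for r in dev_rows])
clean_auc = auc_roc([r["label"] for r in clean_rows], [r["score"] for r in clean_rows])
print(full_auc, clean_auc)
```

On this toy data the full-set value is higher than the clean-set value only because the overlapping row is easy, which is the same effect the note describes for the official dev split.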

Training data

Arabic training set: 4,705 Arabic (question, gold_answer, model_answer) triples from 5 source LLMs across 13 knowledge domains.

Usage

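from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# The repository is gated: accept the conditions on the model page and
# authenticate (for example with `huggingface-cli login`) before loading.
tokenizer = AutoTokenizer.from_pretrained("HassanB4/s02-camelbert-nli")
model = AutoModelForSequenceClassification.from_pretrained("HassanB4/s02-camelbert-nli")

# Example inputs (made up); replace with your own Arabic strings.
gold_answer = "القاهرة"                 # reference answer, used as the premise
model_answer = "عاصمة مصر هي القاهرة"   # answer under evaluation, used as the hypothesis

# Passing two texts yields the [CLS] gold_answer [SEP] model_answer [SEP] layout.
inputs = tokenizer(gold_answer, model_answer, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
score = torch.softmax(logits, dim=-1)[0][1].item()  # unfaithfulness score, P(label = 1)
predicted_label = int(score > 0.5)  # 1 = unfaithful, 0 = faithful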
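
For readers who want to see the last two lines of the snippet without loading the model, the sketch below reproduces the same arithmetic in plain Python: a softmax over the two logits, then a threshold on the class-1 probability. The function name `unfaithfulness_from_logits` and the example logits are hypothetical; 0.5 is the threshold reported in the dev results.

```python
import math
from typing import Sequence, Tuple


def unfaithfulness_from_logits(logits: Sequence[float], threshold: float = 0.5) -> Tuple[float, int]:
    """Softmax over [faithful, unfaithful] logits, then threshold class 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    score = exps[1] / sum(exps)  # probability assigned to label 1 (unfaithful)
    return score, int(score > threshold)


# Invented logits in the order [faithful, unfaithful].
score, label = unfaithfulness_from_logits([-1.0, 2.0])
print(round(score, 4), label)
```

Because the comparison is strict, a score of exactly 0.5 maps to label 0, matching `int(score > 0.5)` in the snippet above.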

Model size

0.1B params (Safetensors, tensor type F32)