aabolfadl/balash-faty-2

aabolfadl committed on Jan 26 · Commit 3f68a6b · verified · 1 Parent(s): fcc2512

Upload README.md with huggingface_hub

Files changed (1)

README.md ADDED +168 -0

	@@ -0,0 +1,168 @@

+---
+license: apache-2.0
+language:
+- en
+- ar
+pipeline_tag: text-generation
+tags:
+- rag
+- hallucination-detection
+- evaluation
+- qwen
+- peft
+- lora
+- classification
+---
+# 🧠 Balash Faty — RAG Hallucination Judge (EN/AR)
+This is **Qwen2.5-3B-Instruct, fine-tuned** to detect **hallucinations in Retrieval-Augmented Generation (RAG)** answers in English and Arabic.
+It acts as an **LLM judge** that decides whether an answer is **fully supported by the retrieved context**.
+
+---
+## 🎯 Task
+Given:
+- **Context** (retrieved documents)
+- **Question**
+- **Answer** (generated by an LLM)
+
+The model outputs one of two labels:
+```
+PASS  → Answer is grounded in the context
+FAIL  → Answer contains hallucinations or unsupported claims
+```
+---
+## 🏗 Base Model
+- **Model:** Qwen/Qwen2.5-3B-Instruct
+- **Fine-tuning:** LoRA → merged into base weights
+- **Languages:** English + Arabic
+- **Training Objective:** Hallucination classification for RAG systems
+---
+## ⚙️ Inference Format
+**Prompt Template:**
+```
+You are a system that detects hallucinations in RAG answers.
+Decide whether the answer is fully supported by the context.
+Reply with only one word: PASS or FAIL.
+[CONTEXT]
+{context}
+[QUESTION]
+{question}
+[ANSWER]
+{answer}
+Judgment:
+```
+---
+## 💻 Example (Python)
+```python
+import requests
+API_URL = "YOUR_HF_ENDPOINT_URL"
+HF_TOKEN = "hf_xxx"
+headers = {
+    "Authorization": f"Bearer {HF_TOKEN}",
+    "Content-Type": "application/json"
+}
+def judge(context, question, answer):
+    prompt = f"""You are a system that detects hallucinations in RAG answers.
+Decide whether the answer is fully supported by the context.
+Reply with only one word: PASS or FAIL.
+[CONTEXT]
+{context}
+[QUESTION]
+{question}
+[ANSWER]
+{answer}
+Judgment:"""
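+    payload = {
+        "inputs": prompt,
+        "parameters": {
+            "max_new_tokens": 5,
+            # Greedy decoding. No temperature is sent because sampling is off
+            # (TGI rejects non-positive temperature values).
+            "do_sample": False,
+            # Ask for the completion only, not the prompt echoed back.
+            "return_full_text": False
+        }
+    }
+    response = requests.post(API_URL, headers=headers, json=payload, timeout=30)
+    response.raise_for_status()  # surface HTTP errors before parsing the body
+    return response.json()[0]["generated_text"].strip()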
+```
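The raw `generated_text` still has to be mapped to a verdict before it can be used downstream. Below is a minimal sketch of that step, assuming the endpoint may echo the prompt back; the prompt itself contains the words PASS and FAIL, so it has to be dropped first. `parse_verdict` is a hypothetical helper, not something the model card provides.

```python
import re
from typing import Optional

# Matches the two labels as whole words, in any letter case.
_VERDICT_RE = re.compile(r"\b(PASS|FAIL)\b", re.IGNORECASE)


def parse_verdict(generated_text: str, prompt: str = "") -> Optional[str]:
    """Return "PASS" or "FAIL" from raw model output, or None if neither appears."""
    # Drop an echoed prompt so its "PASS or FAIL" instruction is not matched.
    if prompt and generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    match = _VERDICT_RE.search(generated_text)
    return match.group(1).upper() if match else None
```

Returning `None` rather than guessing lets the caller decide how to treat unparseable output, for example by counting it separately from PASS and FAIL.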
+---
+## 📊 Training Data
+The model was trained on a labeled dataset of RAG examples from HaluBench:
+| Field    | Description          |
+| -------- | -------------------- |
+| Context  | Retrieved passages   |
+| Question | User query           |
+| Answer   | LLM-generated answer |
+| Label    | PASS / FAIL          |
+
+The dataset is balanced between grounded and hallucinated answers.
+
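The card does not say how records were turned into training text. As a rough sketch only, the four fields in the table could be mapped onto the inference prompt like this. The dictionary keys (`Context`, `Question`, `Answer`, `Label`) are taken from the table and may not match the real column names in HaluBench, and the prompt/completion layout is an assumption.

```python
from collections import Counter

# Prompt wording copied from the model card's inference template.
PROMPT_TEMPLATE = (
    "You are a system that detects hallucinations in RAG answers.\n"
    "Decide whether the answer is fully supported by the context.\n"
    "Reply with only one word: PASS or FAIL.\n"
    "[CONTEXT]\n{context}\n"
    "[QUESTION]\n{question}\n"
    "[ANSWER]\n{answer}\n"
    "Judgment:"
)

VALID_LABELS = {"PASS", "FAIL"}


def to_training_example(record: dict) -> dict:
    """Turn one labeled record into a prompt/completion pair (assumed format)."""
    label = record["Label"].strip().upper()
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected label: {record['Label']!r}")
    prompt = PROMPT_TEMPLATE.format(
        context=record["Context"],
        question=record["Question"],
        answer=record["Answer"],
    )
    return {"prompt": prompt, "completion": label}


def label_balance(records: list) -> Counter:
    """Count PASS and FAIL labels to check how balanced a dataset is."""
    return Counter(r["Label"].strip().upper() for r in records)
```

Calling `label_balance` on a sample of records is a quick way to check the balance claim on whatever split is actually used.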
+---
+## 🚀 Intended Use
+- ✅ Evaluating RAG pipelines
+- ✅ LLM-as-a-judge research
+- ✅ Automatic hallucination detection
+- ✅ Benchmarking grounding quality
+- ❌ Not for open-ended chat
+- ❌ Not a knowledge source
+
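For the evaluation and benchmarking uses listed above, the judge's verdicts are usually compared against gold labels. A small sketch of that comparison follows, treating FAIL (hallucination) as the positive class. `score_judge` and the metric names are illustrative choices, not part of the model card.

```python
def score_judge(predictions, gold):
    """Compare judge verdicts with gold labels; FAIL is the positive class."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    pairs = list(zip(predictions, gold))
    # True positives: hallucinations the judge caught.
    tp = sum(p == "FAIL" and g == "FAIL" for p, g in pairs)
    # False positives: grounded answers flagged as hallucinated.
    fp = sum(p == "FAIL" and g == "PASS" for p, g in pairs)
    # False negatives: hallucinations the judge let through.
    fn = sum(p == "PASS" and g == "FAIL" for p, g in pairs)
    correct = sum(p == g for p, g in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision_fail": tp / (tp + fp) if tp + fp else 0.0,
        "recall_fail": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Recall on FAIL tends to matter most when the judge gates answers before they reach users, since a missed hallucination is the costlier error there.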
+---
+## 🧩 Deployment
+Optimized for **low-latency inference** using Hugging Face **Text Generation Inference (TGI)** endpoints.
+
+---
+## 👤 Author
+Ahmed Abolfadl
+
+- B.Sc. Computer Science & Engineering, German University in Cairo
+- Research focus: ML, AI, Data Science
+
+---
+## 📅 Model Version
+Uploaded on: 2026-01-26