aabolfadl committed on
Commit 3f68a6b · verified · 1 Parent(s): fcc2512

Upload README.md with huggingface_hub

  1. README.md +168 -0
README.md ADDED
@@ -0,0 +1,168 @@
+
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - ar
+ pipeline_tag: text-generation
+ tags:
+ - rag
+ - hallucination-detection
+ - evaluation
+ - qwen
+ - peft
+ - lora
+ - classification
+ ---
+
+ # 🧠 Balash Faty: RAG Hallucination Judge (EN/AR)
+
+ This is a fine-tuned **Qwen2.5-3B-Instruct** model specialized in detecting **hallucinations in Retrieval-Augmented Generation (RAG)** answers in both English and Arabic.
+
+ It acts as an **LLM judge** that determines whether an answer is **fully supported by the retrieved context**.
+
+ ---
+
+ ## 🎯 Task
+
+ Given:
+
+ - **Context** (retrieved documents)
+ - **Question**
+ - **Answer** (generated by an LLM)
+
+ The model outputs:
+
+ ```
+ PASS → Answer is grounded in the context
+ FAIL → Answer contains hallucinations or unsupported claims
+ ```
+
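Because the verdict is free-form generated text, downstream code usually has to normalize it before using it. The following is a minimal sketch of that step, assuming the output may carry extra whitespace, different casing, or trailing punctuation. `parse_verdict` is a hypothetical helper and is not part of this repository or any library.

```python
import re
from typing import Optional

# Hypothetical helper, not shipped with the model.
# Matches PASS or FAIL as a whole word, in any casing.
_VERDICT_RE = re.compile(r"\b(PASS|FAIL)\b", re.IGNORECASE)


def parse_verdict(raw_output: str) -> Optional[str]:
    """Return "PASS" or "FAIL" for the first label found, or None if absent."""
    match = _VERDICT_RE.search(raw_output)
    return match.group(1).upper() if match else None
```

Returning `None` for unparseable output, rather than defaulting to one label, lets the caller decide how to count malformed generations.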
+ ---
+
+ ## 🏗 Base Model
+
+ - **Model:** Qwen/Qwen2.5-3B-Instruct
+ - **Fine-tuning:** LoRA, with the adapter merged into the base weights
+ - **Languages:** English + Arabic
+ - **Training Objective:** Hallucination classification for RAG systems
+
+ ---
+
+ ## ⚙️ Inference Format
+
+ **Prompt Template:**
+
+ ```
+ You are a system that detects hallucinations in RAG answers.
+
+ Decide whether the answer is fully supported by the context.
+ Reply with only one word: PASS or FAIL.
+
+ [CONTEXT]
+ {context}
+
+ [QUESTION]
+ {question}
+
+ [ANSWER]
+ {answer}
+
+ Judgment:
+ ```
+
+ ---
+
+ ## 💻 Example (Python)
+
+ ```python
+ import requests
+
+ # Placeholders: replace with your own endpoint URL and access token.
+ API_URL = "YOUR_HF_ENDPOINT_URL"
+ HF_TOKEN = "hf_xxx"
+
+ headers = {
+     "Authorization": f"Bearer {HF_TOKEN}",
+     "Content-Type": "application/json",
+ }
+
+ def judge(context, question, answer):
+     """Send one (context, question, answer) triple and return the raw verdict text."""
+     prompt = f"""You are a system that detects hallucinations in RAG answers.
+
+ Decide whether the answer is fully supported by the context.
+ Reply with only one word: PASS or FAIL.
+
+ [CONTEXT]
+ {context}
+
+ [QUESTION]
+ {question}
+
+ [ANSWER]
+ {answer}
+
+ Judgment:"""
+
+     payload = {
+         "inputs": prompt,
+         "parameters": {
+             "max_new_tokens": 5,
+             # Greedy decoding. Temperature is omitted: it has no effect when
+             # sampling is off, and some backends reject a value of 0.
+             "do_sample": False,
+         },
+     }
+
+     response = requests.post(API_URL, headers=headers, json=payload, timeout=30)
+     response.raise_for_status()  # surface HTTP errors before parsing the body
+     return response.json()[0]["generated_text"].strip()
+ ```
+
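Some text-generation backends return the prompt followed by the completion in `generated_text` (for example, when a `return_full_text` option is enabled). Whether a given endpoint does this depends on the deployment, and the model card does not say. As a hedged sketch, the hypothetical helper below removes an echoed prompt when it is present and otherwise leaves the text alone.

```python
def strip_echoed_prompt(generated_text: str, prompt: str) -> str:
    """Return only the completion, dropping the prompt if the backend echoed it.

    Hypothetical helper: assumes an echoed prompt appears verbatim at the
    start of the generated text.
    """
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    return generated_text.strip()
```

Inside `judge`, this would be applied to the `generated_text` value before returning it, so callers receive just the verdict in either case.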
+ ---
+
+ ## 📊 Training Data
+
+ The model was trained on labeled RAG examples from HaluBench. Each example has four fields:
+
+ | Field    | Description          |
+ | -------- | -------------------- |
+ | Context  | Retrieved passages   |
+ | Question | User query           |
+ | Answer   | LLM-generated answer |
+ | Label    | PASS / FAIL          |
+
+ The dataset is balanced between grounded and hallucinated answers.
+
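To illustrate how a record with these four fields could map onto the prompt format, here is a rough sketch. It is not the author's preprocessing code: the lowercase keys (`context`, `question`, `answer`, `label`), the `to_training_example` helper, and the prompt/completion output shape are all assumptions made for illustration, and the actual HaluBench column names may differ.

```python
# Same wording as the prompt template in this README.
PROMPT_TEMPLATE = """You are a system that detects hallucinations in RAG answers.

Decide whether the answer is fully supported by the context.
Reply with only one word: PASS or FAIL.

[CONTEXT]
{context}

[QUESTION]
{question}

[ANSWER]
{answer}

Judgment:"""


def to_training_example(record: dict) -> dict:
    """Turn one labeled record into a prompt/completion pair (hypothetical format)."""
    label = record["label"].strip().upper()
    if label not in {"PASS", "FAIL"}:
        raise ValueError(f"unexpected label: {record['label']!r}")
    prompt = PROMPT_TEMPLATE.format(
        context=record["context"],
        question=record["question"],
        answer=record["answer"],
    )
    # The leading space separates the label from "Judgment:".
    return {"prompt": prompt, "completion": " " + label}
```

Rejecting unknown labels up front keeps mislabeled rows from silently entering a balanced PASS/FAIL split.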
+ ---
+
+ ## 🚀 Intended Use
+
+ ✅ Evaluating RAG pipelines
+ ✅ LLM-as-a-judge research
+ ✅ Automatic hallucination detection
+ ✅ Benchmarking grounding quality
+
+ ❌ Not for open-ended chat
+ ❌ Not a knowledge source
+
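When the judge is used to benchmark grounding quality, its verdicts are typically compared against human or reference labels. The sketch below shows one way to do that, treating FAIL (hallucination) as the positive class. `score_judge` and its metric names are hypothetical and were chosen for illustration; no evaluation results are implied.

```python
from typing import Dict, Sequence


def score_judge(predicted: Sequence[str], gold: Sequence[str]) -> Dict[str, float]:
    """Compare judge verdicts with reference labels ("PASS" / "FAIL").

    FAIL is treated as the positive class, so precision and recall describe
    how well hallucinations are caught.
    """
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal in length")

    pairs = list(zip(predicted, gold))
    correct = sum(p == g for p, g in pairs)
    true_fail = sum(p == "FAIL" and g == "FAIL" for p, g in pairs)
    predicted_fail = sum(p == "FAIL" for p in predicted)
    gold_fail = sum(g == "FAIL" for g in gold)

    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when a class never occurs.
        "fail_precision": true_fail / predicted_fail if predicted_fail else 0.0,
        "fail_recall": true_fail / gold_fail if gold_fail else 0.0,
    }
```

Reporting recall on the FAIL class separately from accuracy matters for this task, since a judge that misses hallucinations can still score well overall on a balanced set.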
+ ---
+
+ ## 🧩 Deployment
+
+ Designed for **low-latency inference** on Hugging Face **Text Generation Inference (TGI)** endpoints. Because the verdict is a single word, a few new tokens per request are enough (the Python example uses `max_new_tokens: 5`).
+
+ ---
+
+ ## 👤 Author
+
+ Ahmed Abolfadl
+ B.Sc. Computer Science & Engineering, German University in Cairo
+ Research focus: ML, AI, Data Science
+
+ ---
+
+ ## 📅 Model Version
+
+ Uploaded on: 2026-01-26