Commit 952b3b9 (verified) by abdoelsayed · 1 parent: a0a7dfd

Create README.md

  1. README.md +416 -0
README.md ADDED
@@ -0,0 +1,416 @@
---
language:
- en
license: mit
library_name: transformers
tags:
- reranking
- information-retrieval
- listwise
- generative
- llama
- chain-of-thought
base_model: meta-llama/Llama-3.1-8B
datasets:
- abdoelsayed/DeAR-COT
pipeline_tag: text-generation
---

# DeAR-8B-Reranker-Listwise-v1

## Model Description

**DeAR-8B-Reranker-Listwise-v1** is an 8B-parameter listwise neural reranker that produces document rankings through text generation. Unlike pointwise models, which score each document independently, this model reads multiple documents at once and is trained with Chain-of-Thought reasoning to rank them against each other.

## Model Details

- **Model Type:** Listwise Reranker (Causal Language Model)
- **Base Model:** LLaMA-3.1-8B
- **Parameters:** 8 billion
- **Training Method:** Supervised Fine-tuning with Chain-of-Thought
- **Training Data:** [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)
- **Training Framework:** LLaMA-Factory
- **Precision:** BFloat16

## Key Features

- ✅ **Listwise Ranking:** Considers inter-document dependencies
- ✅ **Chain-of-Thought:** Generates reasoning for ranking decisions
- ✅ **State-of-the-Art:** Best performance on NovelEval (90.97 averaged over NDCG@1/5/10; 92.01 NDCG@10)
- ✅ **Flexible:** Handles variable numbers of documents
- ✅ **Interpretable:** Provides explanations for rankings

## Performance

| Benchmark | NDCG@10 | vs. RankGPT-4 |
|-----------|---------|---------------|
| TREC DL19 | 77.91 | +2.32 |
| TREC DL20 | 75.63 | +5.07 |
| NovelEval (mean of NDCG@1/5/10) | **90.97** | **+3.09** |
| BEIR (Avg) | 46.8 | +2.3 |

**Key Achievement:** Outperforms RankGPT-4 on NovelEval by 3.09 points (mean of NDCG@1, @5, and @10).

## Usage

### Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model
model_path = "abdoelsayed/dear-8b-reranker-listwise-v1"
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Llama tokenizers may ship without a pad token; reuse EOS for padding
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Prepare input
query = "When did Thomas Edison invent the light bulb?"
documents = [
    "Lightning strike at Seoul National University",
    "Thomas Edison tried to invent a device for car but failed",
    "Coffee is good for diet",
    "KEPCO fixes light problems",
    "Thomas Edison invented the light bulb in 1879",
]

# Create listwise prompt (passages are numbered from 0)
doc_list = "\n".join([f"[{i}] {doc}" for i, doc in enumerate(documents)])
prompt = f"""I will provide you with {len(documents)} passages, each indicated by a number identifier [].
Rank the passages based on their relevance to the search query: {query}.

{doc_list}

Search Query: {query}.
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers."""

# Generate ranking
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Greedy decoding: no temperature is passed because it is ignored when do_sample=False
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id
    )

# Decode only the newly generated tokens, not the prompt
ranking_text = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(f"Ranking: {ranking_text}")
# Example output: [4] > [1] > [0] > [3] > [2]
```

### Complete Reranking Pipeline

```python
import re
from typing import List

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


class ListwiseReranker:
    def __init__(self, model_path: str, device: str = "auto"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
        self.model = AutoModelForCausalLM.from_pretrained(
            model_path,
            torch_dtype=torch.bfloat16,
            device_map=device,
            low_cpu_mem_usage=True
        )

        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token

    def create_prompt(self, query: str, documents: List[str], max_doc_len: int = 300) -> str:
        """Create listwise ranking prompt (documents are cut to max_doc_len characters)."""
        doc_list = "\n".join([f"[{i}] {doc[:max_doc_len]}" for i, doc in enumerate(documents)])

        prompt = f"""I will provide you with {len(documents)} passages, each indicated by a number identifier [].
Rank the passages based on their relevance to the search query: {query}.

{doc_list}

Search Query: {query}.
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers."""

        return prompt

    def parse_ranking(self, output_text: str, num_docs: int) -> List[int]:
        """Parse model output into a permutation of range(num_docs)."""
        ranked: List[int] = []
        for n in re.findall(r'\[(\d+)\]', output_text):
            idx = int(n)
            # Keep in-range identifiers only; the first occurrence of a repeat wins
            if idx < num_docs and idx not in ranked:
                ranked.append(idx)

        # Add documents the model did not mention at the end, in input order
        for i in range(num_docs):
            if i not in ranked:
                ranked.append(i)

        return ranked

    def rerank(
        self,
        query: str,
        documents: List[str],
        max_new_tokens: int = 50,
        temperature: float = 0.0
    ) -> List[int]:
        """
        Rerank documents for a query.

        Args:
            query: Search query
            documents: List of document texts
            max_new_tokens: Max tokens to generate
            temperature: Sampling temperature; 0 (default) selects greedy decoding

        Returns:
            List of document indices ranked by relevance
        """
        prompt = self.create_prompt(query, documents)

        inputs = self.tokenizer(
            prompt,
            return_tensors="pt",
            truncation=True,
            max_length=2048
        )
        inputs = {k: v.to(self.model.device) for k, v in inputs.items()}

        # Sample only when a positive temperature is requested
        gen_kwargs = {
            "max_new_tokens": max_new_tokens,
            "do_sample": temperature > 0,
            "pad_token_id": self.tokenizer.pad_token_id,
        }
        if temperature > 0:
            gen_kwargs["temperature"] = temperature

        with torch.no_grad():
            outputs = self.model.generate(**inputs, **gen_kwargs)

        # Decode only the newly generated tokens, not the prompt
        output_text = self.tokenizer.decode(
            outputs[0][inputs['input_ids'].shape[1]:],
            skip_special_tokens=True
        )

        return self.parse_ranking(output_text, len(documents))


# Example usage
reranker = ListwiseReranker("abdoelsayed/dear-8b-reranker-listwise-v1")

query = "What are the health benefits of green tea?"
documents = [
    "Green tea is a popular beverage in Asian countries.",
    "Studies show green tea contains antioxidants that may reduce inflammation.",
    "Coffee is another caffeinated drink consumed worldwide.",
    "Green tea has been linked to improved brain function and fat loss.",
    "The weather today is sunny and warm.",
]

ranking = reranker.rerank(query, documents)
print(f"Ranked indices: {ranking}")
# Example output: [1, 3, 0, 2, 4]

# Display ranked documents
for rank, idx in enumerate(ranking, 1):
    print(f"{rank}. {documents[idx]}")
```

## Training Details

### Training Data
- **Dataset:** [DeAR-COT](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)
- **Format:** Instruction-following with ranking outputs

### Training Configuration
```yaml
model_name: meta-llama/Llama-3.1-8B
task_type: sft
training_method: listwise_ranking
framework: LLaMA-Factory

hyperparameters:
  learning_rate: 1e-5
  batch_size: 4
  gradient_accumulation: 4
  epochs: 2
  max_length: 2048
  warmup_ratio: 0.1
  weight_decay: 0.01
  optimizer: adamw_torch
  lr_scheduler: cosine

distributed:
  method: torch.distributed.run
  num_gpus: 4
  deepspeed: zero2
```

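As a quick sanity check on the configuration above, the effective global batch size can be worked out from three of its values. This is a minimal sketch that assumes `batch_size` is the per-GPU batch size, which the config does not state explicitly; if it is already the global value, the product below does not apply.

```python
# Values copied from the training configuration above.
per_device_batch_size = 4   # assumption: `batch_size` is per GPU
gradient_accumulation = 4
num_gpus = 4

# Number of sequences contributing to each optimizer step across all GPUs.
effective_batch_size = per_device_batch_size * gradient_accumulation * num_gpus
print(effective_batch_size)  # 64
```
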
### Hardware
- **GPUs:** 4x NVIDIA A100 (80GB)
- **Training Time:** ~30 hours
- **Framework:** LLaMA-Factory with DeepSpeed
- **Memory Usage:** ~70GB per GPU

### Prompt Format

**Training Format:**
```
I will provide you with {N} passages, each indicated by a number identifier [].
Rank the passages based on their relevance to the search query: {query}.

[0] {doc_0}
[1] {doc_1}
...
[N-1] {doc_N-1}

Search Query: {query}.
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers.

Answer: [most_relevant] > [second] > ... > [least_relevant]
```

## Evaluation Results

### TREC Deep Learning

| Method | DL19 (NDCG@10) | DL20 (NDCG@10) | Average |
|--------|----------------|----------------|---------|
| BM25 | 50.58 | 47.96 | 49.27 |
| RankGPT-4 | 75.59 | 70.56 | 73.08 |
| **DeAR-L-8B** | **77.91** | **75.63** | **76.77** |

### NovelEval-2306 (Novel Query Generalization)

| Method | NDCG@1 | NDCG@5 | NDCG@10 | Average |
|--------|--------|--------|---------|---------|
| BM25 | 33.33 | 45.96 | 55.77 | 45.02 |
| RankGPT-4 | 85.71 | 87.49 | 90.45 | 87.88 |
| **DeAR-L-8B** | **92.86** | **88.04** | **92.01** | **90.97** |

🏆 **+3.09 points better than RankGPT-4 on NovelEval (average of NDCG@1, @5, and @10)!**

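For readers unfamiliar with the metric, the sketch below shows one common way to compute NDCG@k and then re-derives the "Average" column of the NovelEval table from its three NDCG columns. The exponential-gain variant and the graded labels are illustrative assumptions; the evaluation tooling behind the tables is not stated here and may use a different gain function. The ideal ordering is also taken from the ranked list itself, which is a simplification when not every judged document was retrieved.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG over the first k graded relevance labels, given in ranked order."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded labels of five passages, in the order a reranker returned them.
labels_in_ranked_order = [3, 2, 0, 1, 0]
print(round(ndcg_at_k(labels_in_ranked_order, k=10), 4))

# The "Average" column is the mean of NDCG@1, NDCG@5 and NDCG@10.
dear_avg = (92.86 + 88.04 + 92.01) / 3
gpt4_avg = (85.71 + 87.49 + 90.45) / 3
print(round(dear_avg, 2), round(gpt4_avg, 2), round(dear_avg - gpt4_avg, 2))
# 90.97 87.88 3.09
```
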
### BEIR Benchmark

| Dataset | NDCG@10 |
|---------|---------|
| MS MARCO | 70.2 |
| NQ | 54.1 |
| HotpotQA | 64.5 |
| FiQA | 49.3 |
| ArguAna | 62.1 |
| SciFact | 76.2 |
| TREC-COVID | 88.4 |
| NFCorpus | 40.6 |
| **Average (as reported)** | **46.8** |

Note: the reported average (46.8) is not the arithmetic mean of the eight rows above (63.2), so it likely covers a different set of BEIR datasets than the ones listed.

### Efficiency Analysis

| Metric | Value |
|--------|-------|
| Inference Time (20 docs) | 11.16s |
| Throughput | ~1.8 docs/sec |
| GPU Memory (inference) | 22GB |
| Model Size (BF16) | 16GB |

**Comparison with Other Methods:**
- **2.2x faster** than RankGPT-4 (24.5s)
- **1.9x faster** than RankZephyr (21.6s)
- Comparable ranking quality at lower latency

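The throughput and speedup figures above follow directly from the reported latencies. A small worked calculation, using only the numbers in this section:

```python
docs = 20
dear_seconds = 11.16
baseline_seconds = {"RankGPT-4": 24.5, "RankZephyr": 21.6}

# Documents reranked per second of generation time.
throughput = docs / dear_seconds
print(f"{throughput:.1f} docs/sec")  # 1.8 docs/sec

# Speedup = baseline latency divided by this model's latency.
speedups = {name: secs / dear_seconds for name, secs in baseline_seconds.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
# RankGPT-4: 2.2x
# RankZephyr: 1.9x
```
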
## Advantages over Pointwise Models

| Aspect | Pointwise | Listwise (This Model) |
|--------|-----------|----------------------|
| Document Interaction | ❌ Independent | ✅ Considers relationships |
| Reasoning | ❌ None | ✅ Chain-of-Thought |
| Novel Queries | Good | ✅ **Excellent** (+3-5 NDCG@10) |
| Interpretability | ❌ Score only | ✅ Reasoning provided |
| Speed | ✅ Very Fast (2.2s) | Moderate (11.2s) |

## Model Architecture

```
Input: Listwise Prompt with Query + Multiple Documents
        ↓
LLaMA-3.1-8B Decoder
        ↓
Auto-regressive Generation
        ↓
Output: "[4] > [1] > [0] > [3] > [2]"
        ↓
Parse to Ranking: [4, 1, 0, 3, 2]
```

## When to Use This Model

**Best for:**
- ✅ Novel/complex queries requiring reasoning
- ✅ Tasks where interpretability matters
- ✅ Small candidate sets (<100 documents)
- ✅ Research and analysis applications

**Consider pointwise models for:**
- ❌ Large-scale reranking (1000s of docs)
- ❌ Real-time, low-latency applications
- ❌ When reasoning is not needed

## Limitations

1. **Inference Speed:** About 5x slower than pointwise models (11.2s vs. 2.2s)
2. **Document Count:** Limited by context length (~20-50 docs optimal)
3. **Parsing Errors:** May occasionally generate malformed rankings
4. **Cost:** Higher computational cost for generation
5. **Language:** English only

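One common workaround for the document-count limit is a sliding window in the style popularized by RankGPT: rerank overlapping windows from the bottom of the candidate list to the top, so strong documents can move upward across windows. This document does not say whether DeAR was evaluated this way, so the sketch below is an illustration under that assumption. `sliding_window_rerank`, `rank_window`, and `toy_rank` are hypothetical names; `rank_window` must return a permutation of the local indices of the documents it receives.

```python
from typing import Callable, List

RankFn = Callable[[str, List[str]], List[int]]


def sliding_window_rerank(query: str, documents: List[str], rank_window: RankFn,
                          window: int = 20, stride: int = 10) -> List[int]:
    """Return document indices, best first, reranking `window` docs at a time."""
    if not 0 < stride <= window:
        raise ValueError("stride must be in (0, window]")
    order = list(range(len(documents)))  # current ranking as original indices
    end = len(order)
    while end > 0:
        start = max(0, end - window)
        chunk = order[start:end]
        # Rank the window, then write the reordered indices back in place.
        local = rank_window(query, [documents[i] for i in chunk])
        order[start:end] = [chunk[j] for j in local]
        if start == 0:
            break
        end -= stride  # move the window toward the top of the list
    return order


def toy_rank(query: str, docs: List[str]) -> List[int]:
    """Stand-in for a model call: more words shared with the query ranks higher."""
    q = set(query.lower().split())
    return sorted(range(len(docs)),
                  key=lambda i: -len(q & set(docs[i].lower().split())))


docs = [f"filler text {i}" for i in range(30)] + ["green tea health benefits"]
ranking = sliding_window_rerank("green tea benefits", docs, toy_rank,
                                window=10, stride=5)
print(ranking[0])  # 30: the relevant passage climbs from last place to first
```

In practice `rank_window` could wrap the `rerank` method of the `ListwiseReranker` class shown earlier; each window costs one generation call, so latency grows with the number of windows.
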
## Bias and Ethical Considerations

- **Position Bias:** May favor documents in certain positions
- **Training Data Bias:** Inherits biases from CoT annotations
- **Reasoning Artifacts:** Generated explanations may contain hallucinations
- **Fairness:** Should be evaluated for fairness in your domain

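Position bias can be probed empirically by shuffling the input order and checking whether the top-ranked document stays the same. The following is a minimal sketch, not a method from the DeAR paper: `rank_with_shuffle`, `top1_agreement`, and `overlap_rank` are hypothetical helpers, and `rank_fn` stands in for any function that returns a permutation of the input positions.

```python
import random
from typing import Callable, List

RankFn = Callable[[str, List[str]], List[int]]


def rank_with_shuffle(query: str, documents: List[str],
                      rank_fn: RankFn, seed: int) -> List[int]:
    """Rerank a shuffled copy and map the result back to original indices."""
    perm = list(range(len(documents)))
    random.Random(seed).shuffle(perm)  # perm[j] = original index shown at position j
    shuffled = [documents[i] for i in perm]
    local = rank_fn(query, shuffled)
    return [perm[j] for j in local]


def top1_agreement(query: str, documents: List[str],
                   rank_fn: RankFn, trials: int = 5) -> float:
    """Fraction of shuffled runs whose top document matches the unshuffled run."""
    baseline = rank_fn(query, documents)[0]
    hits = sum(rank_with_shuffle(query, documents, rank_fn, seed)[0] == baseline
               for seed in range(trials))
    return hits / trials


def overlap_rank(query: str, docs: List[str]) -> List[int]:
    """Order-insensitive stand-in for a model: rank by words shared with the query."""
    q = set(query.lower().split())
    return sorted(range(len(docs)),
                  key=lambda i: -len(q & set(docs[i].lower().split())))


docs = ["weather is sunny", "green tea has health benefits", "coffee is popular"]
print(top1_agreement("green tea benefits", docs, overlap_rank))  # 1.0
```

An agreement well below 1.0 for a real model would suggest that its top choice depends on where a document appears in the prompt.
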
## Related Models

**DeAR Listwise:**
- [DeAR-8B-Listwise-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-listwise-lora-v1) - LoRA adapter version

**DeAR Pointwise (8B):**
- [DeAR-8B-RankNet](https://huggingface.co/abdoelsayed/dear-8b-reranker-ranknet-v1)
- [DeAR-8B-CE](https://huggingface.co/abdoelsayed/dear-8b-reranker-ce-v1)

**Resources:**
- [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)
- [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)

## Citation

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```

## License

MIT License

## More Information

- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking)
- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998)
- **Collection:** [DeAR Models](https://huggingface.co/collections/abdoelsayed/dear-reranking)