+---
+language:
+- en
+license: mit
+library_name: transformers
+tags:
+- reranking
+- information-retrieval
+- listwise
+- generative
+- llama
+- chain-of-thought
+base_model: meta-llama/Llama-3.1-8B
+datasets:
+- abdoelsayed/DeAR-COT
+pipeline_tag: text-generation
+---
+# DeAR-8B-Reranker-Listwise-v1
+## Model Description
+**DeAR-8B-Reranker-Listwise-v1** is an 8B-parameter listwise neural reranker that produces document rankings through text generation. Unlike pointwise models, which score each document independently, it reads multiple documents in a single prompt and outputs a ranking with Chain-of-Thought reasoning.
+## Model Details
+- **Model Type:** Listwise Reranker (Causal Language Model)
+- **Base Model:** LLaMA-3.1-8B
+- **Parameters:** 8 billion
+- **Training Method:** Supervised Fine-tuning with Chain-of-Thought
+- **Training Data:** [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)
+- **Training Framework:** LLaMA-Factory
+- **Precision:** BFloat16
+## Key Features
+✅ **Listwise Ranking:** Considers inter-document dependencies
+✅ **Chain-of-Thought:** Generates reasoning for ranking decisions
+✅ **State-of-the-Art:** Best reported performance on NovelEval (90.97, averaged over NDCG@1, NDCG@5, and NDCG@10)
+✅ **Flexible:** Handles variable numbers of documents
+✅ **Interpretable:** Provides explanations for rankings
+## Performance
+| Benchmark | NDCG@10 | vs. GPT-4 |
+|-----------|---------|-----------|
+| TREC DL19 | 77.91 | +2.32 |
+| TREC DL20 | 75.63 | +5.07 |
+| NovelEval (avg. of NDCG@1/5/10) | **90.97** | **+3.09** |
+| BEIR (Avg) | 46.8 | +2.3 |
+**Key Achievement:** Outperforms RankGPT-4 on NovelEval by 3.09 points (average of NDCG@1, NDCG@5, and NDCG@10).
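For readers unfamiliar with the metric, NDCG@10 compares the discounted gain of the produced ranking with that of an ideal ordering of the same documents. The sketch below assumes linear gain with a log2 discount (some toolkits use an exponential gain instead) and reports values on the 0 to 100 scale used in the table. The relevance labels are invented for illustration, and the numbers in this card come from the authors' evaluation setup, not from this function.

```python
import math
from typing import List

def dcg_at_k(relevances: List[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances: List[float], k: int = 10) -> float:
    """NDCG@k on a 0-100 scale; returns 0.0 when no document is relevant.

    Assumption: the ideal ordering is built from the same list, i.e. every
    judged document is among the reranked candidates.
    """
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0
    return 100.0 * dcg_at_k(ranked_relevances, k) / ideal

# Hypothetical graded labels, listed in the order the reranker returned the documents.
labels_in_ranked_order = [3, 2, 0, 1, 0]
print(round(ndcg_at_k(labels_in_ranked_order), 2))
```

A ranking that already places documents in descending order of relevance scores 100; swapping a relevant document below an irrelevant one lowers the score.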
+## Usage
+### Quick Start
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+# Load model
+model_path = "abdoelsayed/dear-8b-reranker-listwise-v1"
+tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_path,
+    torch_dtype=torch.bfloat16,
+    device_map="auto"
+)
+if tokenizer.pad_token is None:
+    tokenizer.pad_token = tokenizer.eos_token
+# Prepare input
+query = "When did Thomas Edison invent the light bulb?"
+documents = [
+    "Lightning strike at Seoul National University",
+    "Thomas Edison tried to invent a device for car but failed",
+    "Coffee is good for diet",
+    "KEPCO fixes light problems",
+    "Thomas Edison invented the light bulb in 1879",
+]
+# Create listwise prompt
+doc_list = "\n".join([f"[{i}] {doc}" for i, doc in enumerate(documents)])
+prompt = f"""I will provide you with {len(documents)} passages, each indicated by a number identifier [].
+Rank the passages based on their relevance to the search query: {query}.
+{doc_list}
+Search Query: {query}.
+Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers."""
+# Generate ranking (greedy decoding, so no sampling temperature is set)
+inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)
+inputs = {k: v.to(model.device) for k, v in inputs.items()}
+with torch.no_grad():
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=50,
+        do_sample=False,
+        pad_token_id=tokenizer.pad_token_id
+    )
+# Decode only the newly generated tokens, skipping the prompt
+ranking_text = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
+print(f"Ranking: {ranking_text}")
+# Example output: [4] > [1] > [0] > [3] > [2]
+```
+### Complete Reranking Pipeline
+```python
+import torch
+from typing import List
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import re
+class ListwiseReranker:
+    def __init__(self, model_path: str, device: str = "auto"):
+        self.tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
+        self.model = AutoModelForCausalLM.from_pretrained(
+            model_path,
+            torch_dtype=torch.bfloat16,
+            device_map=device,
+            low_cpu_mem_usage=True
+        )
+        if self.tokenizer.pad_token is None:
+            self.tokenizer.pad_token = self.tokenizer.eos_token
+    def create_prompt(self, query: str, documents: List[str], max_doc_len: int = 300) -> str:
+        """Create listwise ranking prompt."""
+        doc_list = "\n".join([f"[{i}] {doc[:max_doc_len]}" for i, doc in enumerate(documents)])
+        prompt = f"""I will provide you with {len(documents)} passages, each indicated by a number identifier [].
+Rank the passages based on their relevance to the search query: {query}.
+{doc_list}
+Search Query: {query}.
+Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers."""
+        return prompt
+    def parse_ranking(self, output_text: str, num_docs: int) -> List[int]:
+        """Parse model output to extract ranking."""
+        # Extract bracketed identifiers, dropping out-of-range and repeated ones
+        ranked = []
+        for n in re.findall(r'\[(\d+)\]', output_text):
+            idx = int(n)
+            if idx < num_docs and idx not in ranked:
+                ranked.append(idx)
+        # Add missing documents at the end, in their original order
+        for i in range(num_docs):
+            if i not in ranked:
+                ranked.append(i)
+        return ranked
+    def rerank(
+        self,
+        query: str,
+        documents: List[str],
+        max_new_tokens: int = 50,
+        temperature: float = 0.0
+    ) -> List[int]:
+        """
+        Rerank documents for a query.
+        Args:
+            query: Search query
+            documents: List of document texts
+            max_new_tokens: Max tokens to generate
+            temperature: Sampling temperature; 0 (default) selects greedy decoding
+        Returns:
+            List of document indices ranked by relevance
+        """
+        prompt = self.create_prompt(query, documents)
+        inputs = self.tokenizer(
+            prompt,
+            return_tensors="pt",
+            truncation=True,
+            max_length=2048
+        )
+        inputs = {k: v.to(self.model.device) for k, v in inputs.items()}
+        # temperature only applies when sampling, so pass it only in that case
+        if temperature > 0:
+            gen_kwargs = {"do_sample": True, "temperature": temperature}
+        else:
+            gen_kwargs = {"do_sample": False}
+        with torch.no_grad():
+            outputs = self.model.generate(
+                **inputs,
+                max_new_tokens=max_new_tokens,
+                pad_token_id=self.tokenizer.pad_token_id,
+                **gen_kwargs
+            )
+        output_text = self.tokenizer.decode(
+            outputs[0][inputs['input_ids'].shape[1]:],
+            skip_special_tokens=True
+        )
+        ranking = self.parse_ranking(output_text, len(documents))
+        return ranking
+# Example usage
+reranker = ListwiseReranker("abdoelsayed/dear-8b-reranker-listwise-v1")
+query = "What are the health benefits of green tea?"
+documents = [
+    "Green tea is a popular beverage in Asian countries.",
+    "Studies show green tea contains antioxidants that may reduce inflammation.",
+    "Coffee is another caffeinated drink consumed worldwide.",
+    "Green tea has been linked to improved brain function and fat loss.",
+    "The weather today is sunny and warm.",
+]
+ranking = reranker.rerank(query, documents)
+print(f"Ranked indices: {ranking}")
+# Example output: [1, 3, 0, 2, 4]
+# Display ranked documents
+for rank, idx in enumerate(ranking, 1):
+    print(f"{rank}. {documents[idx]}")
+```
+## Training Details
+### Training Data
+- **Dataset:** [DeAR-COT](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)
+- **Format:** Instruction-following with ranking outputs
+### Training Configuration
+```yaml
+model_name: meta-llama/Llama-3.1-8B
+task_type: sft
+training_method: listwise_ranking
+framework: LLaMA-Factory
+hyperparameters:
+  learning_rate: 1e-5
+  batch_size: 4
+  gradient_accumulation: 4
+  epochs: 2
+  max_length: 2048
+  warmup_ratio: 0.1
+  weight_decay: 0.01
+  optimizer: adamw_torch
+  lr_scheduler: cosine
+distributed:
+  method: torch.distributed.run
+  num_gpus: 4
+  deepspeed: zero2
+```
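Assuming `batch_size` is the per-GPU micro-batch (the card does not say so explicitly), the configuration above implies the following effective batch size per optimizer step:

```python
# Values copied from the training configuration above.
per_device_batch_size = 4   # assumed to be per GPU; not stated in the card
gradient_accumulation = 4
num_gpus = 4

# Sequences contributing to one optimizer step under data parallelism.
effective_batch_size = per_device_batch_size * gradient_accumulation * num_gpus
print(effective_batch_size)  # 64
```

If `batch_size` is instead a global value, the effective batch size would be 16.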
+### Hardware
+- **GPUs:** 4x NVIDIA A100 (80GB)
+- **Training Time:** ~30 hours
+- **Framework:** LLaMA-Factory with DeepSpeed
+- **Memory Usage:** ~70GB per GPU
+### Prompt Format
+**Training Format:**
+```
+I will provide you with {N} passages, each indicated by a number identifier [].
+Rank the passages based on their relevance to the search query: {query}.
+[0] {doc_0}
+[1] {doc_1}
+...
+[N-1] {doc_N-1}
+Search Query: {query}.
+Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers.
+Answer: [most_relevant] > [second] > ... > [least_relevant]
+```
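The training target ends with an `Answer:` line. Assuming the model reproduces that marker at inference time (the card does not show raw outputs), the final ranking can be isolated from any preceding reasoning before parsing. The function name below is hypothetical, and the sketch falls back to the full text when the marker is absent.

```python
import re
from typing import List

def extract_answer_ranking(generated: str, num_docs: int) -> List[int]:
    """Parse identifiers after the last 'Answer:' marker; use the full text if absent."""
    marker = "Answer:"
    pos = generated.rfind(marker)
    answer = generated[pos + len(marker):] if pos != -1 else generated
    ranked: List[int] = []
    for token in re.findall(r"\[(\d+)\]", answer):
        idx = int(token)
        # Skip identifiers that are out of range or already placed.
        if idx < num_docs and idx not in ranked:
            ranked.append(idx)
    # Documents the model did not mention keep their original relative order.
    missing = [i for i in range(num_docs) if i not in ranked]
    return ranked + missing

text = "Passage [1] is off-topic. Answer: [4] > [1] > [0] > [3] > [2]"
print(extract_answer_ranking(text, 5))  # [4, 1, 0, 3, 2]
```

Restricting the parse to the text after the marker keeps identifiers mentioned in the reasoning from being read as part of the ranking.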
+## Evaluation Results
+### TREC Deep Learning
+| Method | DL19 (NDCG@10) | DL20 (NDCG@10) | Average |
+|--------|----------------|----------------|---------|
+| BM25 | 50.58 | 47.96 | 49.27 |
+| RankGPT-4 | 75.59 | 70.56 | 73.08 |
+| **DeAR-L-8B** | **77.91** | **75.63** | **76.77** |
+### NovelEval-2306 (Novel Query Generalization)
+| Method | NDCG@1 | NDCG@5 | NDCG@10 | Average |
+|--------|--------|--------|---------|---------|
+| BM25 | 33.33 | 45.96 | 55.77 | 45.02 |
+| RankGPT-4 | 85.71 | 87.49 | 90.45 | 87.88 |
+| **DeAR-L-8B** | **92.86** | **88.04** | **92.01** | **90.97** |
+🏆 **3.09 points better than RankGPT-4 on the NovelEval average (NDCG@1/5/10)!**
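The Average column is not defined in the card. The check below suggests it is the arithmetic mean of NDCG@1, NDCG@5, and NDCG@10, and that the 3.09-point gap is the difference between the two averages rather than between the NDCG@10 values.

```python
# NDCG values copied from the NovelEval-2306 table above.
rows = {
    "BM25": (33.33, 45.96, 55.77),
    "RankGPT-4": (85.71, 87.49, 90.45),
    "DeAR-L-8B": (92.86, 88.04, 92.01),
}

# Mean of the three cut-offs, rounded to two decimals as in the table.
averages = {name: round(sum(vals) / len(vals), 2) for name, vals in rows.items()}
gap = round(averages["DeAR-L-8B"] - averages["RankGPT-4"], 2)

print(averages)
print(gap)
```

On NDCG@10 alone, the difference between the two systems is 92.01 versus 90.45.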
+### BEIR Benchmark
+| Dataset | NDCG@10 |
+|---------|---------|
+| MS MARCO | 70.2 |
+| NQ | 54.1 |
+| HotpotQA | 64.5 |
+| FiQA | 49.3 |
+| ArguAna | 62.1 |
+| SciFact | 76.2 |
+| TREC-COVID | 88.4 |
+| NFCorpus | 40.6 |
+| **Average** | **46.8** |
+### Efficiency Analysis
+| Metric | Value |
+|--------|-------|
+| Inference Time (20 docs) | 11.16s |
+| Throughput | ~1.8 docs/sec |
+| GPU Memory (inference) | 22GB |
+| Model Size (BF16) | 16GB |
+**Comparison with Other Methods:**
+- **2.2x faster** than RankGPT-4 (24.5s)
+- **1.9x faster** than RankZephyr (21.6s)
+- Similar ranking quality at roughly half the inference time
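The derived figures in this section follow from the measured times by simple division. The check below reproduces them; the 8e9 parameter count is the nominal "8 billion" from Model Details, so the weight size is approximate.

```python
# Figures copied from the efficiency table and comparison list above.
docs = 20
dear_seconds = 11.16
rankgpt4_seconds = 24.5
rankzephyr_seconds = 21.6

throughput = docs / dear_seconds                        # documents per second
speedup_vs_rankgpt4 = rankgpt4_seconds / dear_seconds
speedup_vs_rankzephyr = rankzephyr_seconds / dear_seconds

# BF16 stores each parameter in 2 bytes; 8e9 is the nominal parameter count.
weights_gb = 8e9 * 2 / 1e9

print(f"{throughput:.1f} docs/sec")
print(f"{speedup_vs_rankgpt4:.1f}x, {speedup_vs_rankzephyr:.1f}x")
print(f"{weights_gb:.0f} GB of weights")
```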
+## Advantages over Pointwise Models
+| Aspect | Pointwise | Listwise (This Model) |
+|--------|-----------|----------------------|
+| Document Interaction | ❌ Independent | ✅ Considers relationships |
+| Reasoning | ❌ None | ✅ Chain-of-Thought |
+| Novel Queries | Good | ✅ **Excellent** (+3-5 NDCG@10) |
+| Interpretability | ❌ Score only | ✅ Reasoning provided |
+| Speed | ✅ Very Fast (2.2s) | Moderate (11.2s) |
+## Model Architecture
+```
+Input: Listwise Prompt with Query + Multiple Documents
+    ↓
+LLaMA-3.1-8B Decoder
+    ↓
+Auto-regressive Generation
+    ↓
+Output: "[4] > [1] > [0] > [3] > [2]"
+    ↓
+Parse to Ranking: [4, 1, 0, 3, 2]
+```
+## When to Use This Model
+**Best for:**
+- ✅ Novel/complex queries requiring reasoning
+- ✅ Tasks where interpretability matters
+- ✅ Small candidate sets (<100 documents)
+- ✅ Research and analysis applications
+**Consider pointwise models for:**
+- ❌ Large-scale reranking (1000s of docs)
+- ❌ Real-time, low-latency applications
+- ❌ When reasoning is not needed
+## Limitations
+1. **Inference Speed:** Slower than pointwise models (~5x)
+2. **Document Count:** Limited by context length (~20-50 docs optimal)
+3. **Parsing Errors:** May occasionally generate malformed rankings
+4. **Cost:** Higher computational cost for generation
+5. **Language:** English only
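For candidate lists that exceed the document-count limit in item 2, one common workaround in listwise reranking is a sliding window that moves from the end of the list to the front, reranking one overlapping window at a time. This card does not say whether the authors use this strategy, so the sketch below is an assumption: `sliding_window_rerank`, `window`, and `step` are hypothetical names, and `rank_fn` stands for any function that returns local indices ordered by relevance.

```python
from typing import Callable, List

RankFn = Callable[[str, List[str]], List[int]]

def sliding_window_rerank(
    query: str,
    documents: List[str],
    rank_fn: RankFn,
    window: int = 20,
    step: int = 10,
) -> List[int]:
    """Rerank a long list back to front with overlapping windows."""
    if not 0 < step <= window:
        raise ValueError("step must satisfy 0 < step <= window")
    order = list(range(len(documents)))
    if not order:
        return order
    end = len(order)
    while True:
        start = max(0, end - window)
        window_ids = order[start:end]
        # rank_fn returns positions within the window, best first.
        local = rank_fn(query, [documents[i] for i in window_ids])
        order[start:end] = [window_ids[j] for j in local]
        if start == 0:
            break
        end -= step
    return order
```

With the `ListwiseReranker` class from the pipeline above, a call could look like `sliding_window_rerank(query, documents, reranker.rerank)`. Each window costs one generation, so latency grows with the number of windows, and only the top `window - step` positions are refined across the whole list.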
+## Bias and Ethical Considerations
+- **Position Bias:** May favor documents in certain positions
+- **Training Data Bias:** Inherits biases from CoT annotations
+- **Reasoning Artifacts:** Generated explanations may contain hallucinations
+- **Fairness:** Should be evaluated for fairness in your domain
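One simple way to probe the position bias mentioned above is to rerank several shuffled copies of the same candidate list and check how often the top result stays the same. The helper names below are hypothetical, `rank_fn` is any function returning local indices ordered by relevance, and top-1 agreement is only one of several possible stability measures.

```python
import random
from typing import Callable, List

RankFn = Callable[[str, List[str]], List[int]]

def shuffled_rankings(
    query: str,
    documents: List[str],
    rank_fn: RankFn,
    trials: int = 3,
    seed: int = 0,
) -> List[List[int]]:
    """Rerank shuffled copies of the list; rankings are returned in original indices."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        perm = list(range(len(documents)))
        rng.shuffle(perm)
        local = rank_fn(query, [documents[i] for i in perm])
        # Map positions in the shuffled list back to original document indices.
        results.append([perm[j] for j in local])
    return results

def top1_agreement(rankings: List[List[int]]) -> float:
    """Share of trials whose top document matches the most frequent top document."""
    tops = [r[0] for r in rankings]
    return max(tops.count(t) for t in set(tops)) / len(tops)
```

An agreement of 1.0 means the top document did not depend on input order in these trials; lower values indicate sensitivity to position.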
+## Related Models
+**DeAR Listwise:**
+- [DeAR-8B-Listwise-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-listwise-lora-v1) - LoRA adapter version
+**DeAR Pointwise (8B):**
+- [DeAR-8B-RankNet](https://huggingface.co/abdoelsayed/dear-8b-reranker-ranknet-v1)
+- [DeAR-8B-CE](https://huggingface.co/abdoelsayed/dear-8b-reranker-ce-v1)
+**Resources:**
+- [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)
+- [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)
+## Citation
+```bibtex
+@article{abdallah2025dear,
+  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
+  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
+  journal={arXiv preprint arXiv:2508.16998},
+  year={2025}
+}
+```
+## License
+MIT License
+## More Information
+- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking)
+- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998)
+- **Collection:** [DeAR Models](https://huggingface.co/collections/abdoelsayed/dear-reranking)