Spaces · Sleeping
yash bhaskar committed
Commit · d72cf5d
Parent(s): 5fa6e3c
updating README.md
README.md
CHANGED
@@ -1,132 +1,10 @@

# Multi-Agent Open-Domain QnA with Cross-Source Reranking
---

## **Introduction**
The objective of this project is to develop a multi-agent open-domain question-answering (ODQA) system capable of retrieving and synthesizing information from diverse sources. These sources include web searches, large language models (LLMs) such as **Llama 3**, and vision models for multi-modal retrieval. Leveraging datasets like **KILT**, **Natural Questions**, **HotpotQA**, **TriviaQA**, and **ELI5**, the system incorporates a **cross-source reranking model** to improve the selection of the most accurate answers. The project emphasizes scalability and reliability by addressing both context-free and context-based scenarios, even when confronted with an increasing volume of irrelevant documents.
---

## **Project Overview**
- **Pipeline Development**: Created a multi-agent ODQA pipeline integrating specialized retrieval agents.
- **Source Diversity**: Utilized web searches, LLMs, and vision models for retrieving information.
- **Cross-Source Reranking**: Applied methods such as Reciprocal Rank Fusion (RRF) to enhance answer accuracy.
- **Scalability Evaluation**: Tested the system on datasets with varying ratios of relevant and irrelevant documents.
---

## **Dataset Preparation**

A condensed version of the Wikipedia dump was created by selecting a subset of documents relevant to the validation sets of **Natural Questions**, **HotpotQA**, **TriviaQA**, and **ELI5**.
- **Document Ratio Variants**:

  To evaluate retrieval scalability, multiple datasets with different relevant-to-irrelevant document ratios were constructed:

  - **1:0**: Contains only 1000 relevant documents for 1000 queries.
  - **1:1**: Contains 1000 relevant documents and 1000 irrelevant documents.
  - **1:2**: Contains 1000 relevant documents and 2000 irrelevant documents.
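Building such a ratio variant amounts to sampling distractors from the rest of the dump and mixing them with the relevant set. A minimal sketch (function and field names here are illustrative, not taken from the project code):

```python
import random

def build_ratio_split(relevant_docs, distractor_pool, ratio, seed=0):
    """Mix `ratio` irrelevant documents per relevant one into a single corpus.

    `relevant_docs` and `distractor_pool` are hypothetical lists of
    document dicts; a fixed seed keeps the split reproducible.
    """
    rng = random.Random(seed)
    n_irrelevant = len(relevant_docs) * ratio
    distractors = rng.sample(distractor_pool, n_irrelevant)
    corpus = relevant_docs + distractors
    rng.shuffle(corpus)  # avoid relevant docs clustering at the front
    return corpus
```

With `ratio=0`, `1`, and `2` this reproduces the 1:0, 1:1, and 1:2 variants above.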
---

## **Retrieval Models**
To ensure robust and efficient retrieval, the project combined sparse and dense methods:

### **Sparse Retrieval Models**
1. **TF-IDF**:
   - Measures the importance of terms in a document relative to the entire corpus.
   - Effective for small datasets.
   - Serves as a lightweight and interpretable baseline.

2. **BM25**:
   - Extends TF-IDF with term-frequency saturation and document-length normalization.
   - Handles query-document term overlap better than TF-IDF.

3. **Bag of Words (BOW)**:
   - A simple vector-space model using term-frequency vectors.
   - Acts as a baseline for comparison with more advanced methods.
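To make the BM25 scoring concrete, here is a compact from-scratch sketch of the Okapi BM25 formula (the project itself may use a library implementation; this is illustrative only):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score tokenized documents against a query with Okapi BM25.

    `docs` is a list of token lists; returns one score per document.
    k1 controls term-frequency saturation, b controls length normalization.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)  # smoothed IDF
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

Documents that repeat a query term score higher, but with diminishing returns as the term frequency saturates.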
---

### **Dense Retrieval Models**
1. **Text Embeddings (all-MiniLM-L6-v2)**:
   - A pre-trained sentence-transformer for generating compact, high-quality embeddings.
   - Captures semantic relationships between queries and documents.
   - Lightweight and suitable for large-scale datasets.

2. **Vision Embeddings (ViT)**:
   - Generates embeddings for image-based data, enabling multi-modal information retrieval.
   - Complements text-based retrieval for answering questions requiring visual context.
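Dense retrieval in both cases reduces to nearest-neighbor search over embeddings by cosine similarity. A minimal NumPy sketch — in the real pipeline the vectors would come from all-MiniLM-L6-v2 (384-dim) or ViT, but any embedding matrix works the same way:

```python
import numpy as np

def top_k_by_cosine(query_vec, doc_matrix, k=3):
    """Return indices of the k documents most cosine-similar to the query.

    `doc_matrix` has one embedding per row; both inputs are plain
    NumPy arrays standing in for model-produced embeddings.
    """
    q = query_vec / np.linalg.norm(query_vec)
    D = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    sims = D @ q  # cosine similarity of every document to the query
    return np.argsort(-sims)[:k], sims
```

Normalizing both sides first means the dot product is exactly the cosine similarity, so ranking is independent of embedding magnitudes.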
---

## **Agents**

### **Query Modification Agent**
Refines user queries to optimize them for retrieval, ensuring that they are better suited for identifying relevant documents.

### **Keyword Extraction Agent**

Extracts key terms from the query and passes them to a **Wiki Agent**, which uses n-grams to retrieve relevant Wikipedia pages.
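One way the n-gram step could work is to turn the extracted keywords into candidate page titles, trying longer (more specific) n-grams first. A sketch under that assumption — the function name and ordering heuristic are illustrative, not from the project code:

```python
def candidate_titles(keywords, max_n=3):
    """Generate contiguous n-grams (n = max_n down to 1) from extracted
    keywords, to be tried as Wikipedia page-title lookups.

    Longer n-grams come first, since a more specific title is preferred
    when it exists (e.g. "barack obama" before "obama").
    """
    seen = []
    for n in range(min(max_n, len(keywords)), 0, -1):
        for i in range(len(keywords) - n + 1):
            gram = " ".join(keywords[i:i + n])
            if gram not in seen:
                seen.append(gram)
    return seen
```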
### **Llama 3 Agent**

Synthesizes context directly related to the user query, enriching the system’s ability to answer complex questions.
---

## **Post-Retrieval Process**
1. **Top-Ranked Document as Context**

   The highest-ranked document was used directly as context for QnA tasks.

2. **Iterative Use of Ranked Documents**

   Answers were explored using documents in descending order of relevance.

3. **Rank Fusion (RRF)**

   Rankings from multiple retrieval methods (e.g., BM25, TF-IDF, MiniLM) were combined to improve robustness and accuracy.
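Reciprocal Rank Fusion needs only the rank positions from each retriever, which is why it can fuse BM25, TF-IDF, and MiniLM scores despite their incomparable scales. A minimal sketch of the standard formula (k = 60 is the constant from Cormack et al.'s original RRF paper; this is not the project's exact code):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids with RRF.

    Each ranking is a list of doc ids, best first. A document's fused
    score is the sum of 1 / (k + rank) over every list it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents ranked highly by several retrievers accumulate the largest fused score, even if no single retriever put them first.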
---

## **Results and Evaluation**

### **Retrieval Model Scores**
| **Method**        | **Query Type** | **Ranking Scores** |
|-------------------|----------------|--------------------|
| **BOW**           | Modified       | 13.82 - 33.39      |
| **BM25**          | Modified       | 736.74 - 785.09    |
| **TF-IDF**        | Modified       | 730.61 - 788.87    |
| **Vision**        | Modified       | 0.03 - 5.08        |
| **MiniLM (Open)** | Modified       | 827.92 - 849.79    |
### **Question Answering Model Scores**

- **ROUGE Score**: Demonstrated improvements with RRF across most datasets.
- **Cosine Similarity Score**: Highlighted semantic alignment in dense methods.
- **BERT F1 Score**: Dense embeddings outperformed sparse methods.
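For intuition about the lexical-overlap metric, here is a simplified unigram ROUGE-1 F1 — a stand-in for a full ROUGE package, not the evaluation code the project used:

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """Unigram-overlap ROUGE-1 F1 between two whitespace-tokenized strings.

    Overlap counts each shared token at most min(count_cand, count_ref)
    times, then combines precision and recall into an F1 score.
    """
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    p = overlap / sum(cand.values())  # precision over candidate tokens
    r = overlap / sum(ref.values())   # recall over reference tokens
    return 2 * p * r / (p + r)
```

Cosine similarity and BERTScore complement this by crediting semantically equivalent answers that share few exact tokens.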
---

## **Analysis**
1. **Sparse Models**:
   - Sparse methods like **BM25** and **TF-IDF** performed well on context-free datasets but struggled with context-based tasks.
   - **BOW** and **Vision** models were ineffective, worsening LLM performance compared to zero-shot baselines.

2. **Dense Models**:
   - Dense retrieval methods showed significant improvements in relevance and answer accuracy, especially when combining results with RRF.

3. **Cross-Source Reranking**:
   - RRF combining **BM25**, **TF-IDF**, and **MiniLM** yielded the best results.
   - Using LLMs as rerankers was less reliable, with a bias toward zero-shot outputs.
---

## **Conclusion**

The multi-agent ODQA system successfully integrates sparse and dense retrieval methods, leveraging RRF for cross-source reranking. Dense methods and generative agents like **Llama 3** significantly enhance the system’s capability in open-domain settings. Future work can focus on improving multi-modal integration and reducing biases in LLM-based reranking.
---
title: MultiAgent-QnA-ChatBot
emoji: 🏢
colorFrom: indigo
colorTo: green
sdk: gradio
pinned: false
---

Edit this `README.md` markdown file to author your organization card.