---
title: GAIA Agent with Deep Research
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.25.2"
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480
license: mit
---
# GAIA Agent with Deep Research
An intelligent agent built with LangGraph for the GAIA Benchmark, featuring multi-source research capabilities.
## Features
- **Multi-Source Research**: Combines Wikipedia, Web Search, and Academic papers
- **RAG Enhancement**: Uses vector database for few-shot learning
- **Tool Orchestration**: Smart tool selection based on question type
- **Optimized System Prompt**: Guides the agent to choose appropriate tools
## Quick Start
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Configure API Keys
Create a `.env` file:
```bash
HUGGINGFACEHUB_API_TOKEN=your_hf_token
TAVILY_API_KEY=your_tavily_key
```
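Both variables must be in the process environment before the agent starts. If you are not relying on a helper such as `python-dotenv`, the loading step can be sketched as follows (a minimal stdlib-only loader; the function name is illustrative):

```python
import os

def load_env(path=".env"):
    """Parse KEY=VALUE lines from a .env file into os.environ.
    Minimal sketch, equivalent in spirit to python-dotenv's load_dotenv()."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and malformed lines
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Don't clobber variables already set in the environment
            os.environ.setdefault(key.strip(), value.strip())
```
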
### 3. Setup Vector Database
```bash
python setup_chromadb.py
```
### 4. Run the Agent
```bash
python app.py
```
## Project Structure
```
hf_submission/
├── agent.py              # Main agent implementation
├── deep_research_tool.py # Multi-source research tool
├── app.py                # Gradio UI
├── system_prompt.txt     # Optimized system prompt
├── setup_chromadb.py     # Vector DB initialization
├── requirements.txt      # Dependencies
└── metadata.jsonl        # Training data
```
## Configuration
### Choosing LLM Provider
Edit `agent.py` line 147:
```python
graph = build_graph(provider="huggingface") # or "groq", "google"
```
### Adjusting Deep Research
Edit `deep_research_tool.py`:
```python
# Change search sources
WikipediaLoader(query=query, load_max_docs=2) # Adjust max_docs
TavilySearchResults(max_results=10) # Adjust max_results
ArxivLoader(query=query, load_max_docs=5) # Adjust max_docs
# Change content length
"content": doc.page_content[:2000] # Adjust truncation
```
## How It Works
### Agent Architecture
```
START → Retriever → Assistant → Tools → Assistant → END
            ↓           ↓          ↓
        Vector DB  LLM Decision  Execute
```
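The control flow above can be sketched in plain Python (illustrative only; the actual agent wires these nodes together with LangGraph's `StateGraph`, and every name below is a placeholder):

```python
def run_agent(question, retrieve, assistant, tools):
    """Plain-Python sketch of the agent loop: retrieve few-shot examples,
    then alternate Assistant <-> Tools until the LLM produces an answer."""
    messages = [("user", question)]
    messages += retrieve(question)        # Retriever: examples from the vector DB
    while True:
        reply = assistant(messages)       # Assistant: answer or request a tool
        messages.append(reply)
        if reply[0] != "tool_call":
            return reply[1]               # Final answer -> END
        name, arg = reply[1]
        messages.append(("tool", tools[name](arg)))  # Execute, loop back
```
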
### Tool Selection Strategy
The agent automatically selects tools based on question type:
- **Complex topics** → `deep_research` (multi-source verification)
- **Simple facts** → `wiki_search` (quick lookup)
- **Recent events** → `web_search` (latest info)
- **Academic topics** → `arxiv_search` (papers)
- **Calculations** → math tools (multiply, add, etc.)
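In the agent itself this choice is made by the LLM, guided by the system prompt rather than hard-coded rules. A toy keyword router makes the mapping concrete (purely illustrative, not the project's actual logic):

```python
def pick_tool(question):
    """Heuristic stand-in for the LLM's tool choice, mirroring the
    strategy listed above. Thresholds and keywords are illustrative."""
    q = question.lower()
    if any(w in q for w in ("calculate", "sum", "multiply", "+", "*")):
        return "math"
    if any(w in q for w in ("latest", "today", "recent", "news")):
        return "web_search"
    if any(w in q for w in ("paper", "arxiv", "study")):
        return "arxiv_search"
    # Long or multi-part questions warrant multi-source verification
    if len(q.split()) > 12 or " and " in q:
        return "deep_research"
    return "wiki_search"
```
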
### Deep Research Process
1. **Multi-Source Search**: Queries Wikipedia, Web, and Arxiv in parallel
2. **Deduplication**: Removes redundant information using content hashing
3. **Formatting**: Creates structured report with source attribution
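The deduplication step (2) can be sketched like this (a minimal version; the `content` field name is an assumption for illustration):

```python
import hashlib

def dedupe(docs):
    """Drop documents whose normalized content hashes to a value already
    seen, keeping the first occurrence of each unique document."""
    seen, unique = set(), []
    for doc in docs:
        # Normalize before hashing so trivial whitespace/case differences
        # don't defeat deduplication
        digest = hashlib.sha256(doc["content"].strip().lower().encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique
```
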
## Example Usage
```python
from agent import build_graph
from langchain_core.messages import HumanMessage
# Build agent
graph = build_graph(provider="huggingface")
# Ask question
question = "What is quantum computing and its applications?"
result = graph.invoke({"messages": [HumanMessage(content=question)]})
# Get answer
print(result["messages"][-1].content)
```
## Advanced Features
### System Prompt
The optimized system prompt (`system_prompt.txt`) guides the agent to:
- Choose appropriate tools for different question types
- Prioritize `deep_research` for complex topics
- Cross-validate important information
- Cite sources when possible
### Vector Database (RAG)
Uses ChromaDB to store past questions and answers:
- Provides few-shot examples to the agent
- Improves accuracy on similar questions
- Automatically populated from `metadata.jsonl`
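The few-shot retrieval idea can be illustrated without an embedding model (the Space itself uses ChromaDB with embeddings; `difflib` stands in here just to show the lookup shape, and all names are illustrative):

```python
from difflib import SequenceMatcher

def top_k_examples(question, qa_pairs, k=2):
    """Return the k past Q&A pairs most similar to the new question,
    to be injected into the prompt as few-shot examples."""
    scored = sorted(
        qa_pairs,
        key=lambda qa: SequenceMatcher(None, question, qa["question"]).ratio(),
        reverse=True,  # Most similar first
    )
    return scored[:k]
```
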
## Educational Resources
This project includes comprehensive documentation:
- Architecture deep dive
- Tool calling mechanism explained
- Deep research tutorial
- Integration guide
(See full project repository for detailed docs)
## License
MIT
## Contributing
This is an educational project for the GAIA Benchmark. Feel free to:
- Report issues
- Suggest improvements
- Share your results
## Contact
For questions or feedback, please open an issue on the repository.
---
**Built with**: LangGraph, LangChain, HuggingFace, ChromaDB