---
title: GAIA Agent with Deep Research
emoji: ๐
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480
license: mit
---
# GAIA Agent with Deep Research
An intelligent agent built with LangGraph for the GAIA Benchmark, featuring multi-source research capabilities.
## Features

- **Multi-Source Research**: combines Wikipedia, web search, and academic papers
- **RAG Enhancement**: uses a vector database for few-shot learning
- **Tool Orchestration**: smart tool selection based on question type
- **Optimized System Prompt**: guides the agent to choose appropriate tools
## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Configure API Keys

Create a `.env` file:

```env
HUGGINGFACEHUB_API_TOKEN=your_hf_token
TAVILY_API_KEY=your_tavily_key
```

### 3. Set Up the Vector Database

```bash
python setup_chromadb.py
```

### 4. Run the Agent

```bash
python app.py
```
## Project Structure

```
hf_submission/
├── agent.py                 # Main agent implementation
├── deep_research_tool.py    # Multi-source research tool
├── app.py                   # Gradio UI
├── system_prompt.txt        # Optimized system prompt
├── setup_chromadb.py        # Vector DB initialization
├── requirements.txt         # Dependencies
└── metadata.jsonl           # Training data
```
## Configuration

### Choosing an LLM Provider

Edit `agent.py` line 147:

```python
graph = build_graph(provider="huggingface")  # or "groq", "google"
```

### Adjusting Deep Research

Edit `deep_research_tool.py`:

```python
# Change search sources
WikipediaLoader(query=query, load_max_docs=2)  # Adjust max_docs
TavilySearchResults(max_results=10)            # Adjust max_results
ArxivLoader(query=query, load_max_docs=5)      # Adjust max_docs

# Change content length
"content": doc.page_content[:2000]  # Adjust truncation
```
## How It Works

### Agent Architecture

```
START → Retriever → Assistant → Tools → Assistant → END
            │           │          │
        Vector DB  LLM Decision  Execute
```
### Tool Selection Strategy

The agent automatically selects tools based on question type:

- Complex topics → `deep_research` (multi-source verification)
- Simple facts → `wiki_search` (quick lookup)
- Recent events → `web_search` (latest info)
- Academic topics → `arxiv_search` (papers)
- Calculations → math tools (multiply, add, etc.)
### Deep Research Process

1. **Multi-Source Search**: queries Wikipedia, the web, and Arxiv in parallel
2. **Deduplication**: removes redundant information using content hashing
3. **Formatting**: creates a structured report with source attribution
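The deduplication step above can be sketched as follows. This is an illustrative example, assuming documents arrive as plain strings from the three sources; `dedupe_results` and the whitespace normalization are assumptions, not the tool's actual code.

```python
import hashlib


def dedupe_results(docs: list[str]) -> list[str]:
    """Drop documents whose content hash has already been seen."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        # Hash normalized content so whitespace-only differences collapse.
        digest = hashlib.sha256(" ".join(doc.split()).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


docs = [
    "Quantum computing uses qubits.",
    "Quantum  computing uses qubits.",  # whitespace-only duplicate
    "Qubits can be entangled.",
]
print(dedupe_results(docs))  # two unique documents remain
```

Hashing normalized content is cheap and catches exact repeats across sources; it will not merge paraphrased duplicates, which is an acceptable trade-off for a fast pipeline.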
## Example Usage

```python
from agent import build_graph
from langchain_core.messages import HumanMessage

# Build agent
graph = build_graph(provider="huggingface")

# Ask question
question = "What is quantum computing and its applications?"
result = graph.invoke({"messages": [HumanMessage(content=question)]})

# Get answer
print(result["messages"][-1].content)
```
## Advanced Features

### System Prompt

The optimized system prompt (`system_prompt.txt`) guides the agent to:

- Choose appropriate tools for different question types
- Prioritize `deep_research` for complex topics
- Cross-validate important information
- Cite sources when possible
### Vector Database (RAG)

Uses ChromaDB to store past questions and answers:

- Provides few-shot examples to the agent
- Improves accuracy on similar questions
- Automatically populated from `metadata.jsonl`
## Educational Resources
This project includes comprehensive documentation:
- Architecture deep dive
- Tool calling mechanism explained
- Deep research tutorial
- Integration guide
(See full project repository for detailed docs)
## License
MIT
## Contributing
This is an educational project for the GAIA Benchmark. Feel free to:
- Report issues
- Suggest improvements
- Share your results
## Contact
For questions or feedback, please open an issue on the repository.
**Built with:** LangGraph, LangChain, HuggingFace, ChromaDB