Updated agent
Files changed:
- README.md: +264 −13
- app.py: +632 −143
- requirements.txt: +157 −2
- retriever.py: +526 −0
- tools.py: +656 −0
README.md
# 🎯 GAIA Benchmark Agent - Course Final Project

A comprehensive AI agent that demonstrates course learning while achieving a 30%+ score on the GAIA benchmark to earn your course certificate.

## 🌟 What This Agent Demonstrates

This project combines all major concepts from the course:

### 📚 **Course Learning Applied**
- **🔧 Tools Integration**: Multiple tool types working together
- **📖 RAG Implementation**: Persona database with 5K diverse individuals using vector embeddings
- **🤖 Agent Workflows**: LlamaIndex agent orchestration
- **🧠 LLM Integration**: Fallback options for accessibility
- **📁 Modular Architecture**: Clean separation of concerns

### 🎯 **GAIA Benchmark Optimized**
- **🔍 Web Search**: For current information and facts
- **🧮 Calculator**: For mathematical accuracy (critical for GAIA)
- **📊 File Analysis**: For data-processing questions
- **💬 Conversational**: Natural language interaction

## 🗂️ Project Structure

```
your-space/
├── app.py            # Main application with Gradio interface
├── tools.py          # All agent tools (web search, calculator, etc.)
├── retriever.py      # RAG implementation with guest database
├── requirements.txt  # Python dependencies
└── README.md         # This file
```

### 📁 **File Explanations**

**`app.py`** - Main Application
- Gradio interface for GAIA evaluation
- Agent initialization with error handling
- Question processing and answer submission
- Results display and certificate status

**`tools.py`** - Agent Tools
- **Web Search Tool**: DuckDuckGo integration for current info
- **Calculator Tool**: Safe mathematical expression evaluation
- **File Analysis Tool**: Process CSV, text, and data files
- All tools have detailed documentation and error handling

**`retriever.py`** - Advanced RAG System
- Persona database with 5K diverse individuals from HuggingFace
- Vector embeddings with ChromaDB for semantic search
- IngestionPipeline for document processing
- Demonstrates state-of-the-art RAG concepts

## 🚀 Quick Setup Guide

### 1. **Clone or Duplicate This Space**
```bash
# If cloning locally
git clone https://huggingface.co/spaces/your-username/your-space
cd your-space

# Or duplicate this space to your HF account
```

### 2. **Set API Keys** ⚡ **CRITICAL STEP**

In your HuggingFace Space:
1. Go to **Settings** → **Repository secrets**
2. Add **at least one** of these:

**Option A: OpenAI (Recommended)**
- Name: `OPENAI_API_KEY`
- Value: `sk-...` (your OpenAI API key)
- **Why**: Better performance on the GAIA benchmark

**Option B: HuggingFace (Free Alternative)**
- Name: `HF_TOKEN`
- Value: `hf_...` (your HF token)
- **Why**: Free alternative that works without OpenAI credits

**Get API keys:**
- **OpenAI**: https://platform.openai.com/api-keys
- **HuggingFace**: https://huggingface.co/settings/tokens

### 3. **Ensure Public Space**
- Your space must be **public** for leaderboard verification
- Go to Settings → change from Private to Public

### 4. **Run Evaluation**
1. Click the HuggingFace login button
2. Click "Run GAIA Evaluation & Submit Results"
3. Wait 5-10 minutes for completion
4. Check your score - you need 30%+ to pass! 🏆
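The priority between the two options can be pictured as a tiny fallback check. This is a hypothetical sketch (the function name `pick_provider` is illustrative only; the real selection happens inside `app.py`):

```python
import os

def pick_provider() -> str:
    """Return which LLM backend the agent will use, following the
    priority order described above (illustrative sketch only)."""
    if os.getenv("OPENAI_API_KEY"):
        return "openai"        # Option A: preferred, better GAIA accuracy
    if os.getenv("HF_TOKEN"):
        return "huggingface"   # Option B: free fallback
    raise RuntimeError("Set OPENAI_API_KEY or HF_TOKEN in your Space secrets")
```

With both secrets set, OpenAI wins; with neither, the app fails fast with a clear setup message.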

## 🔧 Why Each Tool Matters for GAIA

### 🌐 **Web Search Tool**
```python
# Example GAIA questions this helps with:
"Who is the current president of France?"
"What was Tesla's stock price yesterday?"
"Recent developments in AI research"
```
**Why needed**: GAIA questions often require current information beyond the LLM's training data.

### 🧮 **Calculator Tool**
```python
# Example GAIA questions this helps with:
"What is 15% of 847?"
"Calculate the area of a circle with radius 23.7m"
"If I invest $5000 at 3.2% annual interest for 7 years..."
```
**Why needed**: LLMs can make arithmetic errors, and GAIA requires exact numerical accuracy.
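"Safe mathematical expression evaluation" here means not calling `eval()` on untrusted model output. A hypothetical sketch of that idea using Python's `ast` module (the actual tool in `tools.py` may be implemented differently):

```python
import ast
import operator

# Map AST operator nodes to their arithmetic functions; anything not listed
# (function calls, attribute access, etc.) is rejected.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_calculate(expression: str) -> float:
    """Evaluate a plain arithmetic expression by walking its AST."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return _eval(ast.parse(expression, mode="eval"))

print(safe_calculate("847 * 0.15"))  # 15% of 847
```

Anything outside plain arithmetic (e.g. `__import__('os')`) fails with `ValueError` instead of executing.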

### 📊 **File Analysis Tool**
```python
# Example GAIA questions this helps with:
"Analyze this CSV file and tell me the average..."
"What is the most common value in column 3?"
"Process this data file and extract..."
```
**Why needed**: Some GAIA questions include file attachments requiring analysis.
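As a toy illustration of the kind of processing involved (column average, most common value) with pandas — the in-memory CSV below is a stand-in for a real GAIA attachment, and the column names are made up:

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for a GAIA file attachment.
csv_text = "amount,category\n10,x\n20,x\n30,y\n"
df = pd.read_csv(io.StringIO(csv_text))

average = df["amount"].mean()            # mean of a numeric column
most_common = df["category"].mode()[0]   # most frequent value in a column
print(average, most_common)  # → 20.0 x
```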

### 📚 **Persona RAG Tool**
```python
# Example questions this demonstrates:
"Find writers and authors"
"Who are the scientists?"
"People interested in travel"
"Creative professionals at the event"
```
**Why included**: Demonstrates advanced RAG with 5K real personas, vector embeddings, and semantic search.
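Under the hood, this kind of semantic search is a nearest-neighbour lookup over embedding vectors. A toy sketch with hand-made 3-d vectors (the real retriever embeds text with BAAI/bge-small-en-v1.5 and stores the vectors in ChromaDB; these numbers are invented for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy "embeddings" for three personas (hand-made, illustrative only).
personas = {
    "novelist who writes mysteries": [0.9, 0.1, 0.0],
    "marine biologist": [0.1, 0.9, 0.2],
    "travel blogger": [0.2, 0.3, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "find writers and authors"

# Retrieval = pick the stored vector most similar to the query vector.
best = max(personas, key=lambda name: cosine(query, personas[name]))
print(best)  # → novelist who writes mysteries
```

A real vector store does the same comparison, just over thousands of high-dimensional vectors with an index instead of a linear scan.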

## 📖 Course Concepts Demonstrated

### 🔧 **Components** (From Course Unit 2)
- **LLM Integration**: OpenAI + HuggingFace fallback
- **Document Processing**: Text chunking and metadata
- **Response Synthesis**: Clean answer formatting

### 🛠️ **Tools** (From Course Unit 3)
- **FunctionTool Creation**: Multiple tool types
- **Tool Descriptions**: Proper LLM guidance
- **Error Handling**: Graceful tool failures

### 🤖 **Agents** (From Course Unit 4)
- **AgentWorkflow**: Multi-tool orchestration
- **System Prompts**: GAIA-optimized instructions
- **Async Processing**: Efficient question handling

### 📖 **RAG Implementation** (From Course Unit 5)
- **Dataset Integration**: 5K personas from HuggingFace
- **Vector Embeddings**: Semantic search with BAAI/bge-small-en-v1.5
- **ChromaDB Storage**: Persistent vector database
- **Ingestion Pipeline**: Document processing and chunking

### 🏗️ **Workflows** (From Course Unit 6)
- **Event-Driven**: Tool selection and execution
- **State Management**: Context preservation
- **Error Recovery**: Robust failure handling

## 🎓 Why This Approach Works for GAIA

### ✅ **Accuracy First**
- Calculator prevents math errors
- Web search provides current facts
- Low-temperature LLM settings for consistency

### ✅ **Comprehensive Coverage**
- Factual questions → Web search
- Mathematical questions → Calculator
- Data questions → File analysis
- Knowledge questions → RAG system

### ✅ **Robust Error Handling**
- Graceful API failures
- Tool availability checking
- Fallback responses

### ✅ **GAIA-Specific Optimizations**
- Direct, concise answers
- Exact-match optimization
- Minimal extra text

## 🔧 Troubleshooting

### ❌ **"No LLM available" Error**
**Problem**: No API keys set
**Solution**: Add `OPENAI_API_KEY` or `HF_TOKEN` to the Space secrets

### ❌ **Import Errors**
**Problem**: Dependencies not installed
**Solution**: Check that requirements.txt is in the root directory, then restart the Space

### ❌ **Low GAIA Score**
**Problem**: Agent giving wrong answers
**Solutions**:
- Check the API key is working (OpenAI generally performs better)
- Review agent logs for tool usage
- Ensure web search and the calculator are working

### ❌ **"Could not submit" Error**
**Problem**: Network or authentication issue
**Solution**:
- Ensure you are logged in to HuggingFace
- Check the space is public
- Try again (temporary network issues)

### ❌ **Tools Not Working**
**Problem**: Missing dependencies or API issues
**Solution**: Check the Space logs and verify all packages installed

## 📊 Expected Performance

### 🎯 **Target Scores**
- **Minimum for Certificate**: 30%
- **Good Performance**: 40-50%
- **Excellent Performance**: 60%+

### 📈 **Performance Factors**
- **API Choice**: OpenAI typically scores higher than HuggingFace
- **Tool Usage**: Questions requiring tools score better when tools work
- **Answer Format**: Direct answers score better than verbose responses
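Because grading is exact-match, trimming conversational boilerplate from a reply can rescue an otherwise-correct answer. A hypothetical normaliser (not part of the shipped code, and the prefix list is illustrative) shows the idea:

```python
def normalize_answer(text: str) -> str:
    """Strip common verbose prefixes so an exact-match grader sees only the answer."""
    text = text.strip()
    for prefix in ("The answer is", "Answer:", "FINAL ANSWER:"):
        if text.lower().startswith(prefix.lower()):
            text = text[len(prefix):].strip(" :.")
    return text

print(normalize_answer("FINAL ANSWER: Paris"))  # → Paris
```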

## 🚀 Getting Better Scores

### 💡 **Optimization Tips**
1. **Use OpenAI**: Generally more accurate than HuggingFace for GAIA
2. **Check Tool Functionality**: Test that web search and the calculator work
3. **Review Failed Questions**: Look at specific errors in the results table
4. **Adjust System Prompt**: Fine-tune for your specific weak areas

### 🔄 **Iterative Improvement**
1. Run evaluation and check results
2. Identify patterns in failed questions
3. Adjust tools or prompts accordingly
4. Re-run evaluation

## 🏆 Certificate Achievement

**To earn your course certificate:**
1. ✅ Score 30% or higher on the GAIA evaluation
2. ✅ Keep your space public for verification
3. ✅ Submit through the official interface

**When you pass:**
- You'll see "✅ PASSED - Certificate Earned!" in the results
- Your score will appear on the student leaderboard
- You can download your official certificate

## 🤝 Getting Help

**If you're stuck:**
1. Check the troubleshooting section above
2. Review the Space logs for specific errors
3. Test individual components (tools.py, retriever.py)
4. Ask in the course Discord for community help

## 🎉 Good Luck!

This agent represents everything you've learned in the course. The modular design makes it easy to understand, debug, and improve. Focus on getting those API keys set up correctly, and you'll be well on your way to earning your certificate!

**Remember**: The goal isn't just to pass the benchmark, but to demonstrate your understanding of modern AI agent development. This codebase serves as a portfolio piece showing your skills in RAG, tool integration, and agent orchestration.

---

*Built with ❤️ using LlamaIndex and course concepts*
app.py
CHANGED

(The previous `app.py` was the course's basic template: a placeholder agent that returned a fixed answer, a `run_and_submit_all` function that fetched questions from the scoring API, ran the agent, and submitted the answers, and a minimal Gradio `Blocks` interface with a login button and startup checks for `SPACE_HOST`/`SPACE_ID`. It was replaced wholesale by the rewritten file below.)
| 1 |
+
"""
|
| 2 |
+
app.py - GAIA Benchmark Agent Application
|
| 3 |
+
|
| 4 |
+
This is the main application file that brings together:
|
| 5 |
+
1. Tools from tools.py (web search, calculator, file analysis)
|
| 6 |
+
2. RAG system from retriever.py (guest database)
|
| 7 |
+
3. LLM integration with fallback options
|
| 8 |
+
4. Agent workflow for handling GAIA questions
|
| 9 |
+
5. Gradio interface for submission to the GAIA benchmark
|
| 10 |
+
|
| 11 |
+
The goal is to achieve 30%+ score on GAIA benchmark questions to earn the course certificate.
|
| 12 |
+
|
| 13 |
+
How it works:
|
| 14 |
+
1. User logs in with HuggingFace account
|
| 15 |
+
2. System fetches GAIA questions from the evaluation API
|
| 16 |
+
3. Our agent processes each question using its tools
|
| 17 |
+
4. Answers are submitted and scored
|
| 18 |
+
5. Results are displayed with pass/fail status
|
| 19 |
+
|
| 20 |
+
Key design decisions:
|
| 21 |
+
- Modular architecture: tools and retriever in separate files
|
| 22 |
+
- Robust error handling: graceful failures with logging
|
| 23 |
+
- API key flexibility: OpenAI (best) or HuggingFace (fallback)
|
| 24 |
+
- GAIA-optimized: focused on accuracy over speed
|
| 25 |
+
"""
|
| 26 |
+
|
| 27 |
import os
|
| 28 |
import gradio as gr
|
| 29 |
import requests
|
|
|
|
| 30 |
import pandas as pd
|
| 31 |
+
import asyncio
|
| 32 |
+
import logging
|
| 33 |
+
from typing import List, Dict, Any, Optional
|
| 34 |
|
| 35 |
+
# Setup comprehensive logging
|
| 36 |
+
logging.basicConfig(
|
| 37 |
+
level=logging.INFO,
|
| 38 |
+
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
|
| 39 |
+
)
|
| 40 |
+
logger = logging.getLogger(__name__)
|
| 41 |
+
|
| 42 |
+
# ============================================================================
|
| 43 |
+
# CONSTANTS AND CONFIGURATION
|
| 44 |
+
# ============================================================================
|
| 45 |
+
|
| 46 |
+
# GAIA evaluation API endpoint
|
| 47 |
DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
|
| 48 |
|
| 49 |
+
# Required score to pass the course
|
| 50 |
+
PASSING_SCORE = 30 # 30% minimum to earn certificate
|
| 51 |
+
|
| 52 |
+
# ============================================================================
|
| 53 |
+
# LLM SETUP WITH FALLBACK OPTIONS
|
| 54 |
+
# ============================================================================
|
|
|
|
|
|
|
|
|
|
|
|
|
| 55 |
|
| 56 |
+
def create_llm():
|
| 57 |
"""
|
| 58 |
+
Create an LLM (Large Language Model) with fallback options.
|
| 59 |
+
|
| 60 |
+
Priority order:
|
| 61 |
+
1. OpenAI GPT-4 (best performance for GAIA)
|
| 62 |
+
2. HuggingFace Qwen model (free alternative)
|
| 63 |
+
|
| 64 |
+
Why this order:
|
| 65 |
+
- OpenAI models generally perform better on GAIA benchmark
|
| 66 |
+
- HuggingFace provides free alternative for those without OpenAI credits
|
| 67 |
+
- Fallback ensures the agent works regardless of available keys
|
| 68 |
+
|
| 69 |
+
API Keys Setup:
|
| 70 |
+
- Go to your HuggingFace Space settings
|
| 71 |
+
- Add "Repository secrets"
|
| 72 |
+
- Set OPENAI_API_KEY (recommended) and/or HF_TOKEN
|
| 73 |
+
|
| 74 |
+
Returns:
|
| 75 |
+
LLM: Configured language model ready for use
|
| 76 |
+
|
| 77 |
+
Raises:
|
| 78 |
+
RuntimeError: If no API keys are available
|
| 79 |
"""
|
| 80 |
+
logger.info("Initializing LLM with fallback options...")
|
| 81 |
+
|
| 82 |
+
# Try OpenAI first (recommended for GAIA performance)
|
| 83 |
+
openai_key = os.getenv("OPENAI_API_KEY")
|
| 84 |
+
if openai_key:
|
| 85 |
+
try:
|
| 86 |
+
from llama_index.llms.openai import OpenAI
|
| 87 |
+
|
| 88 |
+
llm = OpenAI(
|
| 89 |
+
api_key=openai_key,
|
| 90 |
+
model="gpt-4o-mini", # Good balance of cost and performance
|
| 91 |
+
max_tokens=1024, # Reasonable limit for GAIA answers
|
| 92 |
+
temperature=0.1 # Low temperature for more consistent, factual responses
|
| 93 |
+
)
|
| 94 |
+
|
| 95 |
+
logger.info("✅ Successfully initialized OpenAI LLM")
|
| 96 |
+
return llm
|
| 97 |
+
|
| 98 |
+
except ImportError:
|
| 99 |
+
logger.warning("❌ OpenAI library not available, trying HuggingFace...")
|
| 100 |
+
except Exception as e:
|
| 101 |
+
logger.warning(f"❌ OpenAI initialization failed: {e}, trying HuggingFace...")
|
| 102 |
else:
|
| 103 |
+
logger.info("ℹ️ No OPENAI_API_KEY found, trying HuggingFace...")
|
| 104 |
+
|
| 105 |
+
# Fallback to HuggingFace
|
| 106 |
+
hf_token = os.getenv("HF_TOKEN")
|
| 107 |
+
if hf_token:
|
| 108 |
+
try:
|
| 109 |
+
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
|
| 110 |
+
|
| 111 |
+
llm = HuggingFaceInferenceAPI(
|
| 112 |
+
model_name="Qwen/Qwen2.5-Coder-32B-Instruct", # Good open-source model
|
| 113 |
+
token=hf_token,
|
| 114 |
+
max_new_tokens=512, # Limit for response length
|
| 115 |
+
temperature=0.1, # Low temperature for consistency
|
| 116 |
+
context_window=8192 # Context window size
|
| 117 |
+
)
|
| 118 |
+
|
| 119 |
+
logger.info("✅ Successfully initialized HuggingFace LLM")
|
| 120 |
+
return llm
|
| 121 |
+
|
| 122 |
+
except ImportError:
|
| 123 |
+
logger.error("❌ HuggingFace library not available")
|
| 124 |
+
except Exception as e:
|
| 125 |
+
logger.error(f"❌ HuggingFace initialization failed: {e}")
|
| 126 |
+
else:
|
| 127 |
+
logger.info("ℹ️ No HF_TOKEN found")
|
| 128 |
+
|
| 129 |
+
# If we get here, no LLM could be initialized
|
| 130 |
+
error_msg = (
|
| 131 |
+
"No LLM could be initialized. Please set either:\n"
|
| 132 |
+
"- OPENAI_API_KEY (recommended for better GAIA performance)\n"
|
| 133 |
+
"- HF_TOKEN (free alternative)\n"
|
| 134 |
+
"In your HuggingFace Space settings → Repository secrets"
|
| 135 |
+
)
|
| 136 |
+
logger.error(error_msg)
|
| 137 |
+
raise RuntimeError(error_msg)
|
| 138 |
+
|
| 139 |
+
|
| 140 |
+
# ============================================================================
|
| 141 |
+
# GAIA AGENT CLASS - Main Agent Implementation
|
| 142 |
+
# ============================================================================
|
| 143 |
+
|
| 144 |
+
class GAIAAgent:
|
| 145 |
+
"""
|
| 146 |
+
GAIA Benchmark Agent that combines course learning with benchmark capabilities.
|
| 147 |
+
|
| 148 |
+
This agent demonstrates:
|
| 149 |
+
1. Multi-tool usage (web search, calculator, file analysis)
|
| 150 |
+
2. RAG implementation (guest database from course)
|
| 151 |
+
3. LLM integration with robust error handling
|
| 152 |
+
4. GAIA-optimized prompting for accurate answers
|
| 153 |
+
|
| 154 |
+
The agent is designed to handle various types of GAIA questions:
|
| 155 |
+
- Factual questions requiring web search
|
| 156 |
+
- Mathematical problems requiring calculations
|
| 157 |
+
- Data analysis questions requiring file processing
|
| 158 |
+
        - Questions about the guest database (demonstrating RAG)
    """

    def __init__(self):
        """
        Initialize the GAIA agent with LLM and tools.

        This sets up:
        1. The language model (with fallback options)
        2. All available tools (web search, calculator, etc.)
        3. The agent workflow that orchestrates everything
        """
        logger.info("🚀 Initializing GAIA Agent...")

        # Step 1: Initialize the LLM
        try:
            self.llm = create_llm()
            logger.info("✅ LLM initialized successfully")
        except Exception as e:
            logger.error(f"❌ Failed to initialize LLM: {e}")
            raise

        # Step 2: Import and create tools
        tools = []

        # Import tools from our tools.py file
        try:
            from tools import create_all_tools
            tool_list = create_all_tools()
            tools.extend(tool_list)
            logger.info(f"✅ Loaded {len(tool_list)} tools from tools.py")
        except ImportError as e:
            logger.error(f"❌ Could not import tools.py: {e}")
        except Exception as e:
            logger.warning(f"⚠️ Error loading tools from tools.py: {e}")

        # Import the RAG tool from our retriever.py file
        try:
            from retriever import create_persona_tool
            persona_tool = create_persona_tool()
            if persona_tool:
                tools.append(persona_tool)
                logger.info("✅ Loaded persona RAG tool from retriever.py")
        except ImportError as e:
            logger.error(f"❌ Could not import retriever.py: {e}")
        except Exception as e:
            logger.warning(f"⚠️ Error loading RAG tool from retriever.py: {e}")

        # Check if we have any tools
        if not tools:
            error_msg = "❌ No tools available! Check tools.py and retriever.py"
            logger.error(error_msg)
            raise RuntimeError(error_msg)

        logger.info(f"✅ Total tools available: {len(tools)}")
        for tool in tools:
            logger.info(f"  - {tool.metadata.name}: {tool.metadata.description[:50]}...")

        # Step 3: Create the agent workflow
        try:
            from llama_index.core.agent.workflow import AgentWorkflow

            # Create the agent with a GAIA-optimized system prompt
            self.agent = AgentWorkflow.from_tools_or_functions(
                tools_or_functions=tools,
                llm=self.llm,
                system_prompt=self._create_system_prompt()
            )

            logger.info("✅ Agent workflow created successfully")

        except ImportError as e:
            error_msg = f"❌ Could not import AgentWorkflow: {e}"
            logger.error(error_msg)
            raise RuntimeError(error_msg)
        except Exception as e:
            error_msg = f"❌ Failed to create agent workflow: {e}"
            logger.error(error_msg)
            raise RuntimeError(error_msg)

        logger.info("🎉 GAIA Agent initialization complete!")
    def _create_system_prompt(self) -> str:
        """
        Create a system prompt optimized for GAIA benchmark performance.

        The prompt is designed to:
        1. Encourage accuracy over creativity
        2. Guide proper tool usage
        3. Ensure concise, direct answers
        4. Handle various question types

        Returns:
            str: Optimized system prompt for GAIA questions
        """
        return """You are a helpful AI assistant specialized in answering questions accurately and concisely.

IMPORTANT - GAIA BENCHMARK GUIDELINES:
- Provide direct, factual answers without extra explanations
- Use your tools when you need specific information or calculations
- Be precise and accurate - exact matches are required for scoring
- If you're not certain about an answer, use available tools to verify

AVAILABLE TOOLS AND WHEN TO USE THEM:
1. web_search: Use for current information, recent events, facts not in your training data
2. calculator: Use for ANY mathematical calculations to ensure accuracy
3. file_analyzer: Use when questions involve analyzing data files or documents
4. persona_database: Use for questions about people, characteristics, interests, professions
   (Database contains 5000 diverse personas with various backgrounds and interests)

RESPONSE GUIDELINES:
- Give direct answers without phrases like "Based on my search..." or "According to..."
- For numerical answers, provide just the number or value
- For factual questions, provide just the fact
- For yes/no questions, answer yes or no clearly
- Always use tools for calculations rather than doing math in your head

EXAMPLES:
Question: "What is 15% of 847?"
Good: Use the calculator tool, then respond with just the number
Bad: Try to calculate mentally and risk errors

Question: "Who is the current president of France?"
Good: Use web search to get current information
Bad: Guess based on training data that might be outdated

Remember: Accuracy is more important than speed. Use your tools to ensure correct answers."""
    def __call__(self, question: str) -> str:
        """
        Process a GAIA question and return an answer.

        This is the main method that the evaluation system calls.
        It handles the entire question-answering pipeline:
        1. Logs the incoming question
        2. Runs the agent workflow asynchronously
        3. Extracts and cleans the response
        4. Returns a properly formatted answer

        Args:
            question (str): The GAIA question to answer

        Returns:
            str: The agent's answer to the question
        """
        logger.info(f"📝 Processing GAIA question: {question[:100]}...")

        try:
            # Run the agent asynchronously.
            # GAIA questions can be complex and may require multiple tool calls.
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)

            try:
                # Execute the agent workflow
                result = loop.run_until_complete(
                    self.agent.run(user_msg=question)
                )

                # Extract the response from the result object
                answer = self._extract_response(result)

                # Clean and format the answer for GAIA submission
                cleaned_answer = self._clean_answer(answer)

                logger.info(f"✅ Generated answer: {cleaned_answer[:100]}...")
                return cleaned_answer

            finally:
                # Always close the event loop to prevent memory leaks
                loop.close()

        except Exception as e:
            # If anything goes wrong, return a helpful error message
            error_msg = f"I encountered an error processing this question: {str(e)}"
            logger.error(f"❌ Error processing question: {e}")
            return error_msg
    def _extract_response(self, result: Any) -> str:
        """
        Extract the text response from the agent workflow result.

        Agent workflows can return different types of objects.
        This method handles various result formats robustly.

        Args:
            result: The result object from the agent workflow

        Returns:
            str: Extracted response text
        """
        # Try different ways to extract the response
        if hasattr(result, 'response'):
            return str(result.response)
        elif hasattr(result, 'content'):
            return str(result.content)
        elif hasattr(result, 'message'):
            if hasattr(result.message, 'content'):
                return str(result.message.content)
            else:
                return str(result.message)
        else:
            # Fallback: convert whatever we got to a string
            return str(result)
    def _clean_answer(self, answer: str) -> str:
        """
        Clean and format the answer for GAIA submission.

        GAIA requires exact matches, so we need to:
        1. Remove common prefixes that agents add
        2. Strip whitespace
        3. Ensure clean, direct responses

        Args:
            answer (str): Raw answer from the agent

        Returns:
            str: Cleaned answer ready for submission
        """
        # Remove common agent response prefixes
        prefixes_to_remove = [
            "assistant:",
            "Assistant:",
            "Based on my search,",
            "According to the search results,",
            "The answer is:",
            "Answer:"
        ]

        cleaned = answer.strip()

        for prefix in prefixes_to_remove:
            if cleaned.startswith(prefix):
                cleaned = cleaned[len(prefix):].strip()

        return cleaned


# ============================================================================
# EVALUATION AND SUBMISSION LOGIC
# ============================================================================
def run_and_submit_all(profile: gr.OAuthProfile | None) -> tuple[str, pd.DataFrame]:
    """
    Main function that handles the entire GAIA evaluation process.

    This function:
    1. Validates user authentication
    2. Fetches questions from the GAIA API
    3. Runs the agent on all questions
    4. Submits answers for scoring
    5. Returns results and status

    Args:
        profile: Gradio OAuth profile (None if not logged in)

    Returns:
        tuple: (status_message, results_dataframe)
    """
    # Step 1: Check authentication
    if not profile:
        logger.warning("❌ User not logged in")
        return "Please log in to HuggingFace using the button above.", None

    username = profile.username
    logger.info(f"👤 User logged in: {username}")

    # Step 2: Get space information for the code link
    space_id = os.getenv("SPACE_ID")
    agent_code = f"https://huggingface.co/spaces/{space_id}/tree/main" if space_id else "No space ID available"

    # Step 3: Set up API endpoints
    api_url = DEFAULT_API_URL
    questions_url = f"{api_url}/questions"
    submit_url = f"{api_url}/submit"

    # Step 4: Initialize the agent
    logger.info("🤖 Initializing GAIA Agent...")
    try:
        agent = GAIAAgent()
        logger.info("✅ GAIA Agent ready for evaluation")
    except Exception as e:
        error_msg = f"❌ Failed to initialize agent: {str(e)}"
        logger.error(error_msg)
        return error_msg, None

    # Step 5: Fetch GAIA questions
    logger.info(f"📥 Fetching questions from: {questions_url}")
    try:
        response = requests.get(questions_url, timeout=15)
        response.raise_for_status()
        questions_data = response.json()

        if not questions_data:
            return "❌ No questions received from GAIA API", None

        logger.info(f"✅ Fetched {len(questions_data)} GAIA questions")

    except requests.exceptions.RequestException as e:
        error_msg = f"❌ Network error fetching questions: {str(e)}"
        logger.error(error_msg)
        return error_msg, None
    except Exception as e:
        error_msg = f"❌ Error processing questions: {str(e)}"
        logger.error(error_msg)
        return error_msg, None

    # Step 6: Process all questions
    logger.info(f"🧠 Running agent on {len(questions_data)} questions...")
    results_log = []
    answers_payload = []

    for i, item in enumerate(questions_data, 1):
        task_id = item.get("task_id")
        question_text = item.get("question")

        if not task_id or question_text is None:
            logger.warning(f"⚠️ Skipping invalid question item: {item}")
            continue

        logger.info(f"📝 Processing question {i}/{len(questions_data)}: {task_id}")

        try:
            # Run the agent on this question
            submitted_answer = agent(question_text)

            # Store for submission
            answers_payload.append({
                "task_id": task_id,
                "submitted_answer": submitted_answer
            })

            # Store for display (truncated for readability)
            results_log.append({
                "Task ID": task_id,
                "Question": question_text[:100] + "..." if len(question_text) > 100 else question_text,
                "Answer": submitted_answer[:150] + "..." if len(submitted_answer) > 150 else submitted_answer
            })

            logger.info(f"✅ Question {i} completed")

        except Exception as e:
            error_answer = f"ERROR: {str(e)}"
            logger.error(f"❌ Error on question {i}: {e}")

            answers_payload.append({
                "task_id": task_id,
                "submitted_answer": error_answer
            })

            results_log.append({
                "Task ID": task_id,
                "Question": question_text[:100] + "..." if len(question_text) > 100 else question_text,
                "Answer": error_answer
            })

    if not answers_payload:
        return "❌ No answers generated for submission", pd.DataFrame(results_log)

    # Step 7: Submit answers to the GAIA API
    logger.info(f"📤 Submitting {len(answers_payload)} answers...")
    submission_data = {
        "username": username.strip(),
        "agent_code": agent_code,
        "answers": answers_payload
    }

    try:
        response = requests.post(submit_url, json=submission_data, timeout=60)
        response.raise_for_status()
        result_data = response.json()

        # Extract results
        score = result_data.get('score', 0)
        correct_count = result_data.get('correct_count', 0)
        total_attempted = result_data.get('total_attempted', len(answers_payload))

        # Determine pass/fail status
        passed = score >= PASSING_SCORE
        status_emoji = "🎉" if passed else "📊"

        # Create the status message
        final_status = (
            f"{status_emoji} GAIA Evaluation Results\n"
            f"User: {username}\n"
            f"Score: {score}% ({correct_count}/{total_attempted} correct)\n"
            f"Required: {PASSING_SCORE}% to pass\n"
            f"Status: {'✅ PASSED - Certificate Earned!' if passed else '❌ Not passed - Try again!'}\n"
            f"Message: {result_data.get('message', 'Evaluation completed')}"
        )

        logger.info(f"✅ Submission successful - Score: {score}%")
        return final_status, pd.DataFrame(results_log)
    except requests.exceptions.RequestException as e:
        error_msg = f"❌ Submission failed: {str(e)}"
        logger.error(error_msg)
        return error_msg, pd.DataFrame(results_log)
    except Exception as e:
        error_msg = f"❌ Unexpected error during submission: {str(e)}"
        logger.error(error_msg)
        return error_msg, pd.DataFrame(results_log)

# ============================================================================
# GRADIO INTERFACE
# ============================================================================

# Create the Gradio interface
with gr.Blocks(title="GAIA Benchmark Agent") as demo:
    # Header and instructions
    gr.Markdown("# 🎯 GAIA Benchmark Agent - Course Final Project")

    gr.Markdown("""
    ## 🚀 Welcome to Your Final Challenge!

    This agent combines everything you've learned in the course:
    - **🔧 Multi-Tool Integration**: Web search, calculator, file analysis
    - **📚 RAG Implementation**: Persona database with 5K diverse individuals
    - **🤖 Agent Workflows**: LlamaIndex agent orchestration
    - **🎯 GAIA Optimization**: Designed for benchmark performance

    ### 📋 Setup Checklist:
    1. **🔑 API Keys**: Set `OPENAI_API_KEY` or `HF_TOKEN` in Space secrets
    2. **🔓 Public Space**: Keep your space public for verification
    3. **👤 Login**: Use the HuggingFace login button below
    4. **▶️ Run**: Click the evaluation button and wait for results

    ### 🏆 Goal: Score 30%+ to earn your certificate!

    ---
    """)

    # Login section
    gr.Markdown("### Step 1: Login to HuggingFace")
    gr.LoginButton()

    # Evaluation section
    gr.Markdown("### Step 2: Run GAIA Evaluation")
    gr.Markdown("⚠️ **Note**: This may take 5-10 minutes to complete all questions. Please be patient!")

    run_button = gr.Button(
        "🚀 Run GAIA Evaluation & Submit Results",
        variant="primary",
        size="lg"
    )

    # Results section
    gr.Markdown("### Step 3: View Results")

    status_output = gr.Textbox(
        label="📊 Evaluation Status & Results",
        lines=8,
        interactive=False,
        placeholder="Results will appear here after evaluation..."
    )

    results_table = gr.DataFrame(
        label="📝 Question-by-Question Results",
        wrap=True
    )

    # Wire up the interface
    run_button.click(
        fn=run_and_submit_all,
        outputs=[status_output, results_table]
    )

    # Footer
    gr.Markdown("""
    ---
    ### 🔧 Troubleshooting:
    - **No API Key Error**: Add `OPENAI_API_KEY` or `HF_TOKEN` to your Space secrets
    - **Import Errors**: Check that all dependencies are installed
    - **Low Score**: GAIA requires exact answers - the agent uses tools for accuracy

    ### 🏅 Good luck earning your certificate!
    """)
# ============================================================================
# MAIN EXECUTION
# ============================================================================

if __name__ == "__main__":
    print("\n" + "="*60)
    print("🎯 GAIA BENCHMARK AGENT - Course Final Project")
    print("="*60)

    # Check environment setup
    print("\n🔍 Environment Check:")

    space_host = os.getenv("SPACE_HOST")
    space_id = os.getenv("SPACE_ID")
    openai_key = os.getenv("OPENAI_API_KEY")
    hf_token = os.getenv("HF_TOKEN")

    if space_host:
        print(f"✅ SPACE_HOST: {space_host}")
    if space_id:
        print(f"✅ SPACE_ID: {space_id}")
    if openai_key:
        print("✅ OPENAI_API_KEY: Set")
    if hf_token:
        print("✅ HF_TOKEN: Set")

    if not openai_key and not hf_token:
        print("⚠️ WARNING: No API keys found!")
        print("   Please set OPENAI_API_KEY or HF_TOKEN in Space secrets")

    print(f"\n🎯 Target Score: {PASSING_SCORE}% (to earn certificate)")
    print("🚀 Agent Features:")
    print("  - Web Search (DuckDuckGo)")
    print("  - Calculator (Math operations)")
    print("  - Guest Database RAG (Course demo)")
    print("  - File Analysis (Data processing)")

    print("\n" + "="*60)
    print("🌐 Launching Gradio Interface...")
    print("="*60 + "\n")

    # Launch the Gradio app
    demo.launch(
        debug=True,
        share=False,
        show_error=True
    )
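Because GAIA scoring is exact-match, the answer-cleaning step in `_clean_answer` above does real work. A minimal standalone sketch of the same prefix-stripping idea (re-implemented here for illustration, not imported from app.py):

```python
# Standalone sketch of the prefix-stripping in _clean_answer.
# GAIA scoring is exact-match, so boilerplate like "Answer:" must go.

PREFIXES = [
    "assistant:", "Assistant:",
    "Based on my search,", "According to the search results,",
    "The answer is:", "Answer:",
]

def clean_answer(answer: str) -> str:
    # Strip surrounding whitespace, then any known boilerplate prefix.
    cleaned = answer.strip()
    for prefix in PREFIXES:
        if cleaned.startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
    return cleaned

print(clean_answer("Answer: 127.05"))  # → 127.05
print(clean_answer("  Paris  "))       # → Paris
```

Note that the loop checks each prefix once, in order, so stacked prefixes like `"Assistant: Answer: 42"` are also handled.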
requirements.txt
CHANGED
# ============================================================================
# GAIA Benchmark Agent - Requirements
# ============================================================================
# This file lists all the Python packages needed for the GAIA agent to work.
# Each section explains what the packages are used for.

# ============================================================================
# CORE INTERFACE AND API DEPENDENCIES
# ============================================================================
# These are essential for the app to run and communicate with the GAIA API

gradio>=4.0.0
# Web interface for the agent - provides the UI where users interact
# Includes login functionality and result display

requests>=2.28.0
# For HTTP requests to the GAIA evaluation API
# Used to fetch questions and submit answers

pandas>=1.5.0
# Data manipulation and display of results in tables
# Used to show question-answer pairs in a nice format

# ============================================================================
# LLAMAINDEX CORE - The Foundation
# ============================================================================
# LlamaIndex is the main framework from the course

llama-index-core>=0.10.0
# Core LlamaIndex functionality - documents, nodes, retrievers, etc.
# This is the foundation that everything else builds on

# ============================================================================
# LLM (Language Model) INTEGRATIONS
# ============================================================================
# These allow us to use different LLMs with fallback options

llama-index-llms-openai
# OpenAI integration (GPT-4, GPT-3.5) - recommended for best GAIA performance
# Requires OPENAI_API_KEY in your Space secrets

llama-index-llms-huggingface-api
# HuggingFace Inference API integration - free alternative
# Uses models like Qwen/Qwen2.5-Coder-32B-Instruct
# Requires HF_TOKEN in your Space secrets

# ============================================================================
# AGENT WORKFLOW SYSTEM
# ============================================================================
# This enables the agent functionality from the course

llama-index-agent-workflow
# Agent workflow system - allows creating agents that can use multiple tools
# This is what orchestrates the web search, calculator, and RAG tools

# ============================================================================
# RETRIEVAL SYSTEMS (RAG) - Enhanced with Vector Embeddings
# ============================================================================
# These are for the advanced RAG (Retrieval-Augmented Generation) functionality

llama-index-retrievers-bm25
# BM25 retriever for keyword-based search (still useful as a fallback)
# Great for finding exact matches and proper nouns

llama-index-embeddings-huggingface
# HuggingFace embedding models for semantic search
# Converts text to vectors that capture meaning and context
# Used with the BAAI/bge-small-en-v1.5 model

llama-index-vector-stores-chroma
# ChromaDB vector store integration
# Provides persistent storage for vector embeddings
# Fast similarity search for semantic retrieval

chromadb>=0.4.0
# ChromaDB database for vector storage
# Self-contained vector database with no external dependencies
# Stores embeddings locally for fast retrieval

datasets>=2.0.0
# HuggingFace datasets library
# Used to load the finepersonas dataset
# Provides easy access to thousands of datasets

# ============================================================================
# TOOLS AND EXTERNAL SERVICES
# ============================================================================
# These packages enable the agent's tools

duckduckgo-search>=6.0.0
# Web search functionality using DuckDuckGo
# Essential for GAIA questions requiring current information
# Free alternative to the Google Search API

# ============================================================================
# UTILITIES AND ENVIRONMENT
# ============================================================================
# Supporting packages for configuration and development

python-dotenv
# For loading environment variables from .env files
# Useful for local development and testing

nest-asyncio
# Allows running async code in environments that already have an event loop
# Required for running LlamaIndex query engines in Jupyter/Gradio
# Fixes "RuntimeError: This event loop is already running" errors

# ============================================================================
# OPTIONAL: ADDITIONAL USEFUL PACKAGES
# ============================================================================
# These might be helpful for specific GAIA questions but aren't required

# numpy
# For numerical computations if needed for advanced math questions

# matplotlib
# For creating charts/graphs if GAIA questions require visualizations

# beautifulsoup4
# For parsing HTML if web search results need detailed extraction

# ============================================================================
# DEVELOPMENT AND DEBUGGING (Optional)
# ============================================================================
# Uncomment these if you want enhanced debugging capabilities

# jupyter
# For interactive development and testing

# ipywidgets
# For enhanced Jupyter notebook widgets

# rich
# For beautiful terminal output and better logging

# ============================================================================
# INSTALLATION NOTES
# ============================================================================
#
# To install all dependencies:
#   pip install -r requirements.txt
#
# For HuggingFace Spaces:
# - This file should be in your Space root directory
# - Dependencies are automatically installed when you deploy
# - No manual installation needed
#
# API Keys Setup:
# - Go to your Space Settings → Repository secrets
# - Add OPENAI_API_KEY (for OpenAI) or HF_TOKEN (for HuggingFace)
# - At least one is required for the agent to work
#
# Troubleshooting:
# - If imports fail, check that all packages installed correctly
# - Some packages may require specific versions for compatibility
# - Check the Space logs for detailed error messages
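Specifiers like `gradio>=4.0.0` follow pip's requirements syntax (a name plus an optional version constraint), while bare names like `python-dotenv` accept any version. A small stdlib-only sketch (hypothetical helper, not part of this repo) showing how such lines split into name/constraint pairs:

```python
import re

# Hypothetical helper: extract (package, specifier) pairs from requirements
# text. Comment lines and blank lines are skipped, as pip does.
REQS = """\
gradio>=4.0.0
# a comment line
requests>=2.28.0

llama-index-core>=0.10.0
python-dotenv
"""

def parse_requirements(text: str) -> list[tuple[str, str]]:
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = re.match(r"([A-Za-z0-9._-]+)\s*(.*)", line)
        if m:
            out.append((m.group(1), m.group(2)))
    return out

for name, spec in parse_requirements(REQS):
    print(name, spec or "(unpinned)")
```

This only illustrates the line format; for real parsing, pip's own resolver (or the `packaging` library) handles extras, markers, and URLs that this sketch ignores.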
retriever.py
ADDED
retriever.py ADDED

@@ -0,0 +1,526 @@

```python
"""
retriever.py - Advanced RAG Implementation with Personas Database

This file implements an advanced RAG system using:
1. Real dataset from HuggingFace (dvilasuero/finepersonas-v0.1-tiny)
2. Vector embeddings for semantic search
3. ChromaDB for persistent vector storage
4. LlamaIndex IngestionPipeline for processing

This demonstrates advanced course concepts:
- Dataset integration from HuggingFace
- Vector embeddings vs keyword search
- Persistent storage with ChromaDB
- Ingestion pipelines for data processing

Why this approach:
- 5K personas provide rich, diverse data
- Vector embeddings capture semantic meaning
- ChromaDB provides fast, persistent storage
- More realistic than a simple guest database

download_and_prepare_personas()  # Download 5K personas
load_persona_documents()         # Load into documents
create_persona_index()           # Create vector index
get_persona_query_engine()       # For tools.py to use
"""

import logging
import os
from typing import List, Dict, Any
from pathlib import Path

# LlamaIndex core components
from llama_index.core.schema import Document
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline

# Embeddings and vector store
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# External libraries
from datasets import load_dataset
import chromadb

# Setup logging
logger = logging.getLogger(__name__)

# ============================================================================
# CONFIGURATION AND CONSTANTS
# ============================================================================

# Dataset configuration
DATASET_NAME = "dvilasuero/finepersonas-v0.1-tiny"
DATA_DIR = Path("data")
CHROMA_DB_PATH = "./alfred_chroma_db"
COLLECTION_NAME = "alfred"

# Embedding model - good balance of performance and speed
EMBEDDING_MODEL = "BAAI/bge-small-en-v1.5"

# Chunk size for text splitting - optimal for personas
CHUNK_SIZE = 1024
CHUNK_OVERLAP = 20

# ============================================================================
# DATA PREPARATION - Loading Personas from HuggingFace
# ============================================================================

def download_and_prepare_personas() -> int:
    """
    Download personas from HuggingFace and save as individual text files.

    This approach demonstrates:
    1. Dataset integration from HuggingFace Hub
    2. Local file preparation for SimpleDirectoryReader
    3. Data persistence for repeated runs

    Why save as files:
    - SimpleDirectoryReader expects file-based input
    - Allows for easy inspection and debugging
    - Caches data locally to avoid repeated downloads
    - Mimics a real-world scenario where you have document files

    Returns:
        int: Number of persona files created
    """
    logger.info("Starting persona data preparation...")

    # Create data directory if it doesn't exist
    DATA_DIR.mkdir(parents=True, exist_ok=True)

    # Check if we already have data (avoid re-downloading)
    existing_files = list(DATA_DIR.glob("persona_*.txt"))
    if existing_files:
        logger.info(f"Found {len(existing_files)} existing persona files, skipping download")
        return len(existing_files)

    try:
        # Load the dataset from HuggingFace
        logger.info(f"Loading dataset: {DATASET_NAME}")
        dataset = load_dataset(path=DATASET_NAME, split="train")
        logger.info(f"Dataset loaded successfully with {len(dataset)} personas")

        # Save each persona as a separate text file
        personas_created = 0
        for i, persona_data in enumerate(dataset):
            persona_file = DATA_DIR / f"persona_{i}.txt"

            # Extract the persona text
            persona_text = persona_data["persona"]

            # Add some metadata to make the persona more searchable
            enhanced_text = f"Persona {i}:\n{persona_text}"

            # Write to file
            with open(persona_file, "w", encoding="utf-8") as f:
                f.write(enhanced_text)

            personas_created += 1

            # Log progress for large datasets
            if personas_created % 1000 == 0:
                logger.info(f"Created {personas_created} persona files...")

        logger.info(f"✅ Successfully created {personas_created} persona files")
        return personas_created

    except Exception as e:
        logger.error(f"❌ Error downloading personas: {e}")
        raise RuntimeError(f"Failed to download personas: {e}")
```
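The download step above hinges on a simple cache check: if `persona_*.txt` files already exist, the dataset download is skipped entirely. A minimal, standalone sketch of that pattern (using only the standard library; `prepare_personas` is a hypothetical stand-in for the real function, which also pulls from HuggingFace):

```python
from pathlib import Path
import tempfile

def prepare_personas(data_dir: Path, personas) -> int:
    """Write one file per persona, skipping all work if files already exist."""
    data_dir.mkdir(parents=True, exist_ok=True)
    existing = list(data_dir.glob("persona_*.txt"))
    if existing:  # cache hit: nothing to download or write
        return len(existing)
    for i, text in enumerate(personas):
        (data_dir / f"persona_{i}.txt").write_text(f"Persona {i}:\n{text}", encoding="utf-8")
    return len(personas)

with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp) / "data"
    print(prepare_personas(d, ["a travel writer", "a chemist"]))  # 2 (files written)
    print(prepare_personas(d, ["ignored on cache hit"]))          # 2 (existing files reused)
```

The trade-off is that a partially completed download also counts as a cache hit; deleting the `data/` directory forces a fresh run.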
```python
# retriever.py (continued)

# ============================================================================
# DOCUMENT LOADING - Converting Files to LlamaIndex Documents
# ============================================================================

def load_persona_documents() -> List[Document]:
    """
    Load persona files into LlamaIndex Document objects.

    This demonstrates:
    1. SimpleDirectoryReader usage for file loading
    2. Document object creation and metadata handling
    3. Error handling for file operations

    Why SimpleDirectoryReader:
    - Handles multiple file formats automatically
    - Preserves file metadata (filename, path, etc.)
    - Integrates seamlessly with the LlamaIndex pipeline
    - Scales well for large document collections

    Returns:
        List[Document]: List of loaded persona documents
    """
    logger.info("Loading persona documents...")

    # Ensure we have persona data
    if not DATA_DIR.exists() or not list(DATA_DIR.glob("persona_*.txt")):
        logger.info("No persona files found, downloading...")
        download_and_prepare_personas()

    try:
        # Use SimpleDirectoryReader to load all text files
        reader = SimpleDirectoryReader(input_dir=str(DATA_DIR))
        documents = reader.load_data()

        logger.info(f"✅ Loaded {len(documents)} persona documents")

        # Log some statistics about the documents
        if documents:
            total_chars = sum(len(doc.text) for doc in documents)
            avg_chars = total_chars / len(documents)
            logger.info(f"Average document length: {avg_chars:.0f} characters")

        return documents

    except Exception as e:
        logger.error(f"❌ Error loading documents: {e}")
        raise RuntimeError(f"Failed to load persona documents: {e}")


# ============================================================================
# VECTOR STORE SETUP - ChromaDB Configuration
# ============================================================================

def setup_chroma_vector_store():
    """
    Set up ChromaDB vector store for persistent storage.

    This demonstrates:
    1. Persistent vector database configuration
    2. Collection management
    3. Integration with LlamaIndex vector stores

    Why ChromaDB:
    - Persistent storage (survives application restarts)
    - Fast vector similarity search
    - Easy integration with LlamaIndex
    - Good for development and production
    - No external dependencies (self-contained)

    Returns:
        ChromaVectorStore: Configured vector store ready for use
    """
    logger.info("Setting up ChromaDB vector store...")

    try:
        # Create persistent ChromaDB client
        # This creates a local database that persists between runs
        db = chromadb.PersistentClient(path=CHROMA_DB_PATH)
        logger.info(f"ChromaDB client created at: {CHROMA_DB_PATH}")

        # Get or create the collection for our personas
        # Collections are like tables in a traditional database
        chroma_collection = db.get_or_create_collection(name=COLLECTION_NAME)
        logger.info(f"Using collection: {COLLECTION_NAME}")

        # Wrap the ChromaDB collection in a LlamaIndex vector store
        vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

        logger.info("✅ ChromaDB vector store configured successfully")
        return vector_store

    except Exception as e:
        logger.error(f"❌ Error setting up ChromaDB: {e}")
        raise RuntimeError(f"Failed to setup ChromaDB: {e}")


# ============================================================================
# INGESTION PIPELINE - Document Processing with Embeddings
# ============================================================================

def create_ingestion_pipeline(vector_store) -> IngestionPipeline:
    """
    Create an ingestion pipeline for processing persona documents.

    This demonstrates:
    1. Text chunking with SentenceSplitter
    2. Embedding generation with HuggingFace models
    3. Pipeline composition for complex processing

    The pipeline does:
    1. Split documents into smaller chunks (better for retrieval)
    2. Generate vector embeddings for each chunk
    3. Store embeddings in the vector database

    Why this approach:
    - Chunking improves retrieval precision
    - Embeddings capture semantic meaning
    - The pipeline caches results for efficiency
    - Modular design allows easy modification

    Args:
        vector_store: ChromaDB vector store for persistence

    Returns:
        IngestionPipeline: Configured pipeline ready for document processing
    """
    logger.info("Creating ingestion pipeline...")

    try:
        # Create text splitter
        # SentenceSplitter respects sentence boundaries for better coherence
        text_splitter = SentenceSplitter(
            chunk_size=CHUNK_SIZE,       # Max characters per chunk
            chunk_overlap=CHUNK_OVERLAP  # Overlap to maintain context
        )
        logger.info(f"Text splitter configured: {CHUNK_SIZE} chars, {CHUNK_OVERLAP} overlap")

        # Create embedding model
        # This model converts text to numerical vectors that capture meaning
        embed_model = HuggingFaceEmbedding(model_name=EMBEDDING_MODEL)
        logger.info(f"Embedding model configured: {EMBEDDING_MODEL}")

        # Create the ingestion pipeline
        # This processes documents through the transformations in order
        pipeline = IngestionPipeline(
            transformations=[
                text_splitter,  # First: split into chunks
                embed_model,    # Second: create embeddings
            ],
            vector_store=vector_store  # Third: store in database
        )

        logger.info("✅ Ingestion pipeline created successfully")
        return pipeline

    except Exception as e:
        logger.error(f"❌ Error creating ingestion pipeline: {e}")
        raise RuntimeError(f"Failed to create ingestion pipeline: {e}")


# ============================================================================
# INDEX CREATION - Vector Search Index
# ============================================================================

def create_persona_index():
    """
    Create or load the persona vector index.

    This is the main function that orchestrates the entire RAG setup:
    1. Load documents from files
    2. Set up vector storage
    3. Process documents through the pipeline
    4. Create the searchable index

    The index enables semantic search where:
    - Similar meanings are found even with different words
    - Context and relationships are preserved
    - Retrieval is fast across thousands of personas

    Returns:
        VectorStoreIndex: Ready-to-use search index
    """
    logger.info("Creating persona search index...")

    try:
        # Step 1: Load persona documents
        documents = load_persona_documents()
        if not documents:
            raise RuntimeError("No documents loaded")

        # Step 2: Set up vector store
        vector_store = setup_chroma_vector_store()

        # Step 3: Check if we already have processed data
        # This saves time on repeated runs
        try:
            # Try to create index from the existing vector store
            embed_model = HuggingFaceEmbedding(model_name=EMBEDDING_MODEL)
            existing_index = VectorStoreIndex.from_vector_store(
                vector_store=vector_store,
                embed_model=embed_model
            )

            # Test if the index has data
            test_retriever = existing_index.as_retriever(similarity_top_k=1)
            test_results = test_retriever.retrieve("test query")

            if test_results:
                logger.info("✅ Found existing persona index with data")
                return existing_index
            else:
                logger.info("Existing index is empty, rebuilding...")

        except Exception:
            logger.info("No existing index found, creating new one...")

        # Step 4: Process documents through the ingestion pipeline
        pipeline = create_ingestion_pipeline(vector_store)

        logger.info(f"Processing {len(documents)} documents through pipeline...")
        # This may take a while for large datasets as it generates embeddings
        nodes = pipeline.run(documents=documents)
        logger.info(f"✅ Processed {len(nodes)} document chunks")

        # Step 5: Create the final index
        embed_model = HuggingFaceEmbedding(model_name=EMBEDDING_MODEL)
        index = VectorStoreIndex.from_vector_store(
            vector_store=vector_store,
            embed_model=embed_model
        )

        logger.info("✅ Persona index created successfully")
        return index

    except Exception as e:
        logger.error(f"❌ Error creating persona index: {e}")
        raise RuntimeError(f"Failed to create persona index: {e}")


# ============================================================================
# MAIN FUNCTIONS USED BY TOOLS.PY
# ============================================================================
# These are the core functions that tools.py uses to access the persona database.
# Tool creation is handled in tools.py following the course structure.

def get_persona_index():
    """
    Get the persona index for use by tools.py.

    This is a simple wrapper function that tools.py can import and use.
    It ensures the index is created and ready for use.

    Returns:
        VectorStoreIndex: The persona database index
    """
    return create_persona_index()


def get_persona_query_engine():
    """
    Get a configured query engine for the persona database.

    This creates a query engine ready for use in a QueryEngineTool.
    tools.py can import this to create the persona database tool.

    Returns:
        QueryEngine: Configured query engine for the persona database
    """
    try:
        # Get the index
        index = create_persona_index()

        # Configure embedding model (same as indexing)
        embed_model = HuggingFaceEmbedding(model_name=EMBEDDING_MODEL)

        # Create query engine with optimal settings
        query_engine = index.as_query_engine(
            response_mode="tree_summarize",  # Good for combining multiple sources
            similarity_top_k=5,              # Retrieve top 5 most relevant personas
            streaming=False                  # Disable streaming for stability
        )

        logger.info("✅ Persona query engine ready for tools.py")
        return query_engine

    except Exception as e:
        logger.error(f"❌ Error creating query engine for tools.py: {e}")
        raise


# ============================================================================
# TESTING AND DEBUGGING FUNCTIONS
# ============================================================================

def test_persona_system():
    """
    Test the persona system components available in retriever.py.
    This helps verify that the database setup is working correctly.

    Note: Tool creation testing is now in tools.py since that's where tools are created.
    """
    print("\n=== Testing Persona Database System ===")

    # Test data preparation
    print("\n--- Testing Data Preparation ---")
    try:
        count = download_and_prepare_personas()
        print(f"✅ Data preparation successful: {count} personas")
    except Exception as e:
        print(f"❌ Data preparation failed: {e}")
        return

    # Test document loading
    print("\n--- Testing Document Loading ---")
    try:
        docs = load_persona_documents()
        print(f"✅ Document loading successful: {len(docs)} documents")
    except Exception as e:
        print(f"❌ Document loading failed: {e}")
        return

    # Test index creation
    print("\n--- Testing Index Creation ---")
    try:
        index = create_persona_index()
        print("✅ Index creation successful")
    except Exception as e:
        print(f"❌ Index creation failed: {e}")
        return

    # Test basic retrieval (without tool wrapper)
    print("\n--- Testing Basic Retrieval ---")
    test_queries = [
        "writers and authors",
        "people interested in travel",
        "scientists and researchers"
    ]

    try:
        retriever = index.as_retriever(similarity_top_k=2)

        for query in test_queries:
            print(f"\nQuery: {query}")
            try:
                results = retriever.retrieve(query)
                if results:
                    print(f"✅ Found {len(results)} results")
                    print(f"Sample: {results[0].text[:100]}...")
                else:
                    print("No results found")
            except Exception as e:
                print(f"❌ Query failed: {e}")

    except Exception as e:
        print(f"❌ Retriever creation failed: {e}")

    # Test query engine creation (for tools.py)
    print("\n--- Testing Query Engine Creation ---")
    try:
        query_engine = get_persona_query_engine()
        print("✅ Query engine creation successful")
        print("   (This query engine can be used by tools.py)")
    except Exception as e:
        print(f"❌ Query engine creation failed: {e}")

    print("\n=== Database System Testing Complete ===")
    print("\nNote: For tool testing, run tools.py or usage_example.py")


# ============================================================================
# MAIN EXECUTION
# ============================================================================

if __name__ == "__main__":
    # If this file is run directly, run tests
    print("Persona Database System Testing")
    print("=" * 50)

    # Set up logging for testing
    logging.basicConfig(level=logging.INFO)

    # Run database system tests
    test_persona_system()

    print("\n" + "=" * 50)
    print("Database testing complete!")
    print("\nFor tool testing, run:")
    print("  python tools.py")
    print("  python usage_example.py")
    print("\nFor full agent testing, run:")
    print("  python app.py")
```
tools.py ADDED

@@ -0,0 +1,656 @@
```python
"""
tools.py - Agent Tools for GAIA Benchmark (Course Didactic Structure)

This file follows the course approach of separating:
1. Raw functions (the actual functionality)
2. Tool wrappers (FunctionTool and QueryEngineTool creation)

This makes it easier to understand and debug each component separately.
Each tool addresses specific GAIA benchmark needs while demonstrating course concepts.

create_persona_database_tool()  # QueryEngineTool creation
get_all_tools()                 # All tools collection
"""

import logging
import math
import os
import random
from typing import List
import chromadb

# LlamaIndex imports
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI

# Setup logging
logger = logging.getLogger(__name__)

# ============================================================================
# PART 1: RAW FUNCTIONS (The actual functionality)
# ============================================================================
# These are the core functions that do the actual work.
# They can be tested independently and are easy to understand.

def web_search(query: str) -> str:
    """
    Search the web for information using DuckDuckGo.

    This function handles the actual web searching logic.
    Critical for GAIA questions requiring current information.

    Args:
        query (str): The search query/question

    Returns:
        str: Formatted search results with titles, content, and URLs

    Why this is essential for GAIA:
    - Many GAIA questions need current information (news, prices, events)
    - LLMs have knowledge cutoffs and may not know recent facts
    - Web search provides access to the latest information
    """
    logger.info(f"🔍 Web search requested: {query}")

    try:
        # Import DuckDuckGo search - free search API
        from duckduckgo_search import DDGS

        # Perform the search with a reasonable limit
        with DDGS() as ddgs:
            # Get top 3 results to avoid overwhelming the LLM
            results = list(ddgs.text(query, max_results=3))

        if not results:
            logger.warning("No search results found")
            return "No search results found for this query."

        # Format results in a clean, readable way
        formatted_results = []
        for i, result in enumerate(results, 1):
            formatted_result = (
                f"Result {i}:\n"
                f"Title: {result['title']}\n"
                f"Content: {result['body']}\n"
                f"URL: {result['href']}\n"
            )
            formatted_results.append(formatted_result)

        final_result = "\n".join(formatted_results)
        logger.info(f"✅ Web search completed: {len(results)} results found")
        return final_result

    except ImportError:
        error_msg = "DuckDuckGo search library not available. Please install duckduckgo-search."
        logger.error(error_msg)
        return error_msg
    except Exception as e:
        error_msg = f"Search error: {str(e)}"
        logger.error(f"Web search failed: {e}")
        return error_msg
```
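The formatting step above is pure string assembly over the result dicts, so it can be exercised without any network access. A small sketch, assuming (as `web_search` does) that each hit carries `title`, `body`, and `href` keys; `format_results` and the stub data are illustrative, not part of the file:

```python
def format_results(results) -> str:
    """Render search hits (dicts with 'title', 'body', 'href') for the LLM."""
    return "\n".join(
        f"Result {i}:\nTitle: {r['title']}\nContent: {r['body']}\nURL: {r['href']}\n"
        for i, r in enumerate(results, 1)
    )

# Stubbed hit in the shape web_search expects from DDGS.text()
stub = [{"title": "GAIA", "body": "Benchmark for assistants.", "href": "https://example.org"}]
print(format_results(stub))
```

Keeping the formatter separate like this makes the network-free part of the tool unit-testable.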
```python
# tools.py (continued)

def calculate(expression: str) -> str:
    """
    Safely evaluate mathematical expressions.

    This function handles mathematical calculations with safety measures.
    CRITICAL for GAIA because many questions involve precise calculations.

    Args:
        expression (str): Mathematical expression (e.g., "2 + 2", "sqrt(16)", "sin(pi/2)")

    Returns:
        str: The result of the calculation or an error message

    Why this is essential for GAIA:
    - GAIA has many mathematical questions (percentages, conversions, etc.)
    - LLMs can make arithmetic errors, especially with complex math
    - Exact numerical accuracy is required (GAIA uses exact match scoring)

    Examples:
        calculate("2 + 2") → "4"
        calculate("15% of 847") → calculate("0.15 * 847") → "127.05"
        calculate("sqrt(16)") → "4.0"
    """
    logger.info(f"🧮 Calculation requested: {expression}")

    try:
        # Create a safe environment for evaluation
        # Only allow mathematical functions, no dangerous operations
        allowed_names = {
            # Include all math module functions (sin, cos, sqrt, log, etc.)
            k: v for k, v in math.__dict__.items() if not k.startswith("__")
        }

        # Add safe Python functions
        allowed_names.update({
            "abs": abs,      # Absolute value
            "round": round,  # Rounding
            "min": min,      # Minimum
            "max": max,      # Maximum
            "sum": sum,      # Sum of iterables
            "pow": pow,      # Power function
        })

        # Add mathematical constants
        allowed_names.update({
            "pi": math.pi,   # π
            "e": math.e,     # Euler's number
        })

        # Evaluate the expression safely
        # __builtins__ = {} prevents dangerous functions like open(), exec()
        result = eval(expression, {"__builtins__": {}}, allowed_names)

        result_str = str(result)
        logger.info(f"✅ Calculation result: {expression} = {result_str}")
        return result_str

    except ZeroDivisionError:
        error_msg = "Error: Division by zero"
        logger.error(error_msg)
        return error_msg
    except ValueError as e:
        error_msg = f"Error: Invalid mathematical operation - {str(e)}"
        logger.error(error_msg)
        return error_msg
    except SyntaxError:
        error_msg = "Error: Invalid mathematical expression syntax"
        logger.error(error_msg)
        return error_msg
    except Exception as e:
        error_msg = f"Calculation error: {str(e)}"
        logger.error(f"Unexpected calculation error: {e}")
        return error_msg
```
| 170 |
+
|
| 171 |
+
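The allowlisted-`eval` pattern used in `calculate` can be exercised on its own. A minimal stdlib sketch (`safe_eval` is my own name for the illustration; it is independent of this module's logger and error formatting):

```python
import math

def safe_eval(expression: str):
    # Allowlist: math functions/constants only. The empty __builtins__
    # mapping blocks open(), exec(), __import__, and friends.
    allowed = {k: v for k, v in math.__dict__.items() if not k.startswith("__")}
    return eval(expression, {"__builtins__": {}}, allowed)

print(safe_eval("sqrt(16) + pi"))   # math names resolve normally
try:
    safe_eval("open('somefile')")   # builtins do not
except NameError:
    print("blocked: open is not defined in the sandbox")
```

Note that name lookup falls back to the supplied globals' `__builtins__` entry; because it is an empty dict, anything outside the allowlist raises `NameError` instead of executing.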
def analyze_file(file_content: str, file_type: str = "text") -> str:
    """
    Analyze file content and extract relevant information.

    This function processes different file types for analysis.
    Useful for GAIA questions that include file attachments.

    Args:
        file_content (str): The content of the file
        file_type (str): Type of file ("text", "csv", "json", etc.)

    Returns:
        str: Analysis results or extracted information

    Why this helps with GAIA:
    - Some GAIA questions include data files to analyze
    - Questions might ask for statistics, summaries, or specific data extraction
    - File processing shows practical data analysis skills
    """
    logger.info(f"📊 File analysis requested for {file_type} file")

    try:
        if file_type.lower() == "csv":
            # For CSV files, provide basic statistics
            if not file_content.strip():
                return "Empty file"
            lines = file_content.strip().split('\n')

            # Count rows and columns (assuming the first row is a header)
            num_rows = len(lines) - 1  # Subtract header
            num_cols = len(lines[0].split(','))
            analysis = (
                f"CSV Analysis:\n"
                f"- Rows: {num_rows}\n"
                f"- Columns: {num_cols}\n"
                f"- Headers: {lines[0]}"
            )
            if num_rows > 0:
                analysis += f"\n- First data row: {lines[1] if len(lines) > 1 else 'None'}"
            return analysis

        elif file_type.lower() in ["txt", "text"]:
            # For text files, provide basic statistics
            lines = file_content.split('\n')
            words = file_content.split()
            chars = len(file_content)

            return (
                f"Text Analysis:\n"
                f"- Lines: {len(lines)}\n"
                f"- Words: {len(words)}\n"
                f"- Characters: {chars}"
            )

        else:
            # For other file types, return the content with basic info
            preview = file_content[:1000] + '...' if len(file_content) > 1000 else file_content
            return f"File content ({file_type}):\n{preview}"

    except Exception as e:
        error_msg = f"File analysis error: {str(e)}"
        logger.error(error_msg)
        return error_msg

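One caveat worth knowing: the CSV branch above counts columns with a plain `split(',')`, which over-counts whenever a field contains a quoted comma. If that matters for your data, the stdlib `csv` module parses quoting correctly. A small comparison on a made-up line:

```python
import csv
import io

line = 'Ada,"Hello, world",1815'

# Naive split breaks inside the quoted field -> one extra "column"
naive_cols = len(line.split(','))

# csv.reader honors RFC-style quoting and keeps the field intact
parsed_cols = len(next(csv.reader(io.StringIO(line))))

print(naive_cols, parsed_cols)  # naive over-counts by one here
```

Swapping `csv.reader` into `analyze_file` would be a small change, at the cost of slightly more code.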
def get_weather(location: str) -> str:
    """
    Get dummy weather information for a location.

    This is a simplified weather function for demonstration.
    In a real implementation, you'd connect to a weather API like OpenWeatherMap.

    Args:
        location (str): City or location name

    Returns:
        str: Weather description with temperature

    Note: This is a dummy implementation for course purposes.
    Real weather data would require an API key and an actual weather service.
    """
    logger.info(f"🌤️ Weather requested for: {location}")

    # Dummy weather data for demonstration
    weather_conditions = [
        {"condition": "Sunny", "temp_c": 25, "humidity": 60},
        {"condition": "Cloudy", "temp_c": 20, "humidity": 70},
        {"condition": "Rainy", "temp_c": 15, "humidity": 85},
        {"condition": "Windy", "temp_c": 22, "humidity": 55},
        {"condition": "Clear", "temp_c": 28, "humidity": 45}
    ]

    # Randomly select weather (in a real implementation, this would be an API call)
    weather = random.choice(weather_conditions)

    result = (
        f"Weather in {location.title()}:\n"
        f"Condition: {weather['condition']}\n"
        f"Temperature: {weather['temp_c']}°C\n"
        f"Humidity: {weather['humidity']}%"
    )

    logger.info(f"✅ Weather result: {weather['condition']}, {weather['temp_c']}°C")
    return result

# ============================================================================
# PART 2: PERSONA DATABASE SETUP (QueryEngine creation)
# ============================================================================
# This sets up the persona database query engine following the course pattern.

def create_persona_query_engine():
    """
    Create a query engine for the persona database following the course pattern.

    This demonstrates the exact approach from the course:
    1. Connect to the existing ChromaDB database
    2. Create a VectorStoreIndex from the stored vectors
    3. Configure an LLM for response generation
    4. Create a QueryEngine with specific settings

    Returns:
        QueryEngine: Ready-to-use query engine for the persona database

    Why QueryEngine vs simple retrieval:
    - QueryEngine combines retrieval + LLM generation
    - Provides natural, conversational responses
    - Can synthesize information from multiple personas
    - Better for complex questions requiring reasoning
    """
    logger.info("🏗️ Creating persona database query engine...")

    try:
        # Step 1: Connect to existing ChromaDB (created by retriever.py)
        db = chromadb.PersistentClient(path="./alfred_chroma_db")
        chroma_collection = db.get_or_create_collection("alfred")
        vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
        logger.info("✅ Connected to ChromaDB")

        # Step 2: Set up the embedding model (same as used during indexing)
        embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
        logger.info("✅ Embedding model configured")

        # Step 3: Create a VectorStoreIndex from existing data
        index = VectorStoreIndex.from_vector_store(
            vector_store=vector_store,
            embed_model=embed_model
        )
        logger.info("✅ Vector index created")

        # Step 4: Configure the LLM for response generation
        # Try to get the LLM from settings first, then fall back
        try:
            from llama_index.core import Settings
            llm = Settings.llm

            if llm is None:
                # Fall back to a HuggingFace LLM
                hf_token = os.getenv("HF_TOKEN")
                if hf_token:
                    llm = HuggingFaceInferenceAPI(
                        model_name="Qwen/Qwen2.5-Coder-32B-Instruct",
                        token=hf_token,
                        max_new_tokens=512,
                        temperature=0.1
                    )
                    logger.info("✅ Using HuggingFace LLM")
                else:
                    logger.warning("⚠️ No LLM available, query engine will use default")
                    llm = None
        except Exception:
            logger.warning("⚠️ Could not configure LLM, using default")
            llm = None

        # Step 5: Create the QueryEngine with optimized settings
        query_engine = index.as_query_engine(
            llm=llm,
            response_mode="tree_summarize",  # Good for combining multiple sources
            similarity_top_k=5,              # Retrieve the top 5 most relevant personas
            streaming=False                  # Disable streaming for stability
        )

        logger.info("✅ Persona query engine created successfully")
        return query_engine

    except Exception as e:
        logger.error(f"❌ Error creating persona query engine: {e}")
        raise RuntimeError(f"Failed to create persona query engine: {e}")

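For intuition, the retrieve-then-generate flow that a QueryEngine wraps can be sketched in plain Python. Everything here is a stub: word overlap stands in for embedding similarity and string concatenation stands in for the LLM. This is an illustration of the concept only, not llama_index code:

```python
# Toy stand-in for what a QueryEngine does: retrieve the top-k documents,
# then synthesize a response from them.
DOCS = [
    "a travel writer who blogs about Asia",
    "a chemist studying polymers",
    "a pastry chef in Lyon",
]

def retrieve(query: str, k: int = 2) -> list:
    # Stub scoring: shared-word count instead of vector similarity
    overlap = lambda d: len(set(query.lower().split()) & set(d.split()))
    return sorted(DOCS, key=overlap, reverse=True)[:k]

def toy_query_engine(query: str) -> str:
    hits = retrieve(query)
    # Stub "LLM": concatenate the retrieved context into a response
    return f"Based on {len(hits)} retrieved personas: " + "; ".join(hits)

print(toy_query_engine("find a travel writer"))
```

The real engine replaces both stubs: `similarity_top_k` controls the retrieval step, and `response_mode="tree_summarize"` controls how the LLM combines the retrieved chunks.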
# ============================================================================
# PART 3: TOOL WRAPPERS (Converting functions to tools)
# ============================================================================
# This section creates the actual tools that the agent can use.
# Each tool wraps a function with metadata for the LLM to understand.

# Web Search Tool
web_search_tool = FunctionTool.from_defaults(
    fn=web_search,
    name="web_search",
    description=(
        "Search the web for current information, recent events, statistics, "
        "facts, or any information not in the LLM's training data. "
        "Use this when you need up-to-date or specific factual information. "
        "Essential for GAIA questions about current events, prices, or recent developments."
    )
)

# Calculator Tool
calculator_tool = FunctionTool.from_defaults(
    fn=calculate,
    name="calculator",
    description=(
        "Perform mathematical calculations and evaluate mathematical expressions. "
        "Supports basic arithmetic (+, -, *, /), advanced math functions (sqrt, sin, cos, log), "
        "and mathematical constants (pi, e). Use this for any numerical computations, "
        "percentage calculations, unit conversions, or statistical operations. "
        "CRITICAL for GAIA mathematical questions to ensure accuracy."
    )
)

# File Analysis Tool
file_analysis_tool = FunctionTool.from_defaults(
    fn=analyze_file,
    name="file_analyzer",
    description=(
        "Analyze file contents including CSV files, text files, and other data files. "
        "Can extract statistics, summarize content, and process structured data. "
        "Use this when GAIA questions involve analyzing attached files or datasets."
    )
)

# Weather Tool (demonstration)
weather_tool = FunctionTool.from_defaults(
    fn=get_weather,
    name="weather_tool",
    description=(
        "Get weather information for a specific location. "
        "Note: This is a demo implementation with dummy data. "
        "Use when questions ask about weather conditions."
    )
)

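Conceptually, `FunctionTool.from_defaults` bundles a callable with the name and description metadata the LLM reads when choosing a tool. A stripped-down stdlib analogue (illustrative only; `MiniTool` is not the real llama_index class):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class MiniTool:
    fn: Callable[..., str]  # the function the agent invokes
    name: str               # identifier the LLM emits when selecting a tool
    description: str        # natural-language hint used for tool selection

calc = MiniTool(
    fn=lambda expr: str(eval(expr, {"__builtins__": {}}, {})),
    name="calculator",
    description="Evaluate arithmetic expressions and return the result as text",
)
print(calc.name, "->", calc.fn("5 * 8"))
```

This is why the descriptions above are written as instructions to the model ("Use this when..."): they are the only signal the agent has for routing a question to the right tool.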
# Persona Database Query Engine Tool
def create_persona_database_tool():
    """
    Create the persona database tool using QueryEngineTool.

    This follows the exact course pattern for creating a QueryEngineTool.
    The tool combines retrieval with LLM generation for natural responses.

    Returns:
        QueryEngineTool: Tool for querying the persona database
    """
    logger.info("🛠️ Creating persona database tool...")

    try:
        # First ensure we have the persona data (this will create it if needed)
        try:
            from retriever import create_persona_index
            # This creates the index if it doesn't exist
            create_persona_index()
            logger.info("✅ Persona index ready")
        except Exception as e:
            logger.warning(f"⚠️ Could not ensure persona index: {e}")

        # Create the query engine
        query_engine = create_persona_query_engine()

        # Create the QueryEngineTool following the course pattern
        persona_tool = QueryEngineTool.from_defaults(
            query_engine=query_engine,
            name="persona_database",
            description=(
                "Search and query a database of 5000 diverse personas with various backgrounds, "
                "interests, and professions. Use this to find people with specific characteristics, "
                "skills, or interests. Can answer questions like 'find writers', 'who likes travel', "
                "'scientists in the group', 'creative professionals', or 'people interested in technology'. "
                "Returns detailed information about matching personas with their backgrounds and interests."
            )
        )

        logger.info("✅ Persona database tool created successfully")
        return persona_tool

    except Exception as e:
        logger.error(f"❌ Error creating persona database tool: {e}")
        # Return None so the agent can still work without this tool
        return None

# ============================================================================
# PART 4: TOOL COLLECTION (Getting all tools together)
# ============================================================================

def get_all_tools() -> List:
    """
    Get all available tools for the GAIA agent.

    This function collects all tools and handles any creation errors gracefully.
    The agent will work with whatever tools are successfully created.

    Returns:
        List: All successfully created tools
    """
    logger.info("🔧 Collecting all tools...")

    tools = []

    # Add function-based tools (these should always work)
    try:
        tools.extend([
            web_search_tool,
            calculator_tool,
            file_analysis_tool,
            weather_tool
        ])
        logger.info(f"✅ Added {len(tools)} function-based tools")
    except Exception as e:
        logger.error(f"❌ Error adding function tools: {e}")

    # Add the persona database tool (this might fail if the database isn't ready)
    try:
        persona_tool = create_persona_database_tool()
        if persona_tool:
            tools.append(persona_tool)
            logger.info("✅ Added persona database tool")
        else:
            logger.warning("⚠️ Persona database tool not available")
    except Exception as e:
        logger.warning(f"⚠️ Could not create persona database tool: {e}")

    logger.info(f"🎯 Total tools available: {len(tools)}")
    for tool in tools:
        tool_name = getattr(tool.metadata, 'name', 'Unknown')
        logger.info(f"  - {tool_name}")

    return tools

# ============================================================================
# PART 5: TESTING FUNCTIONS (For development and debugging)
# ============================================================================

def test_individual_functions():
    """
    Test each function individually to make sure they work.
    This helps with debugging and understanding what each function does.
    """
    print("\n=== Testing Individual Functions ===")

    # Test web search
    print("\n--- Testing Web Search Function ---")
    try:
        result = web_search("current year")
        print(f"Web search result: {result[:150]}...")
        print("✅ Web search function works")
    except Exception as e:
        print(f"❌ Web search failed: {e}")

    # Test calculator
    print("\n--- Testing Calculator Function ---")
    try:
        result = calculate("2 + 2 * 3")
        print(f"Calculator result (2 + 2 * 3): {result}")
        result = calculate("sqrt(16)")
        print(f"Calculator result (sqrt(16)): {result}")
        print("✅ Calculator function works")
    except Exception as e:
        print(f"❌ Calculator failed: {e}")

    # Test file analyzer
    print("\n--- Testing File Analysis Function ---")
    try:
        sample_csv = "name,age,city\nJohn,25,NYC\nJane,30,LA\nBob,35,SF"
        result = analyze_file(sample_csv, "csv")
        print(f"File analysis result: {result}")
        print("✅ File analysis function works")
    except Exception as e:
        print(f"❌ File analysis failed: {e}")

    # Test weather
    print("\n--- Testing Weather Function ---")
    try:
        result = get_weather("Paris")
        print(f"Weather result: {result}")
        print("✅ Weather function works")
    except Exception as e:
        print(f"❌ Weather failed: {e}")


def test_tool_creation():
    """
    Test that all tools can be created successfully.
    """
    print("\n=== Testing Tool Creation ===")

    try:
        tools = get_all_tools()
        print(f"✅ Successfully created {len(tools)} tools")

        for tool in tools:
            tool_name = getattr(tool.metadata, 'name', 'Unknown')
            tool_desc = getattr(tool.metadata, 'description', 'No description')[:100]
            print(f"  - {tool_name}: {tool_desc}...")

    except Exception as e:
        print(f"❌ Tool creation failed: {e}")


def test_tool_functionality():
    """
    Test that tools can actually be called and return results.
    """
    print("\n=== Testing Tool Functionality ===")

    tools = get_all_tools()

    for tool in tools:
        tool_name = getattr(tool.metadata, 'name', 'Unknown')
        print(f"\n--- Testing {tool_name} ---")

        try:
            if tool_name == "calculator":
                # Test the calculator tool (FunctionTool exposes the wrapped
                # callable via its .fn property)
                result = tool.fn("5 * 8")
                print(f"Calculator test (5 * 8): {result}")

            elif tool_name == "web_search":
                # Test web search (might be slow)
                print("Testing web search (this might take a moment)...")
                result = tool.fn("Python programming")
                print(f"Web search test: {result[:100]}...")

            elif tool_name == "file_analyzer":
                # Test the file analyzer
                test_data = "col1,col2\nval1,val2\nval3,val4"
                result = tool.fn(test_data, "csv")
                print(f"File analyzer test: {result}")

            elif tool_name == "weather_tool":
                # Test the weather tool
                result = tool.fn("London")
                print(f"Weather test: {result}")

            elif tool_name == "persona_database":
                # Test the persona database (might be slow on first run)
                print("Testing persona database (this might take a moment)...")
                # This would be an async call in real usage
                print("Persona database test skipped (requires async)")

            print(f"✅ {tool_name} test completed")

        except Exception as e:
            print(f"❌ {tool_name} test failed: {e}")


# ============================================================================
# MAIN EXECUTION (For testing when file is run directly)
# ============================================================================

if __name__ == "__main__":
    print("GAIA Agent Tools Testing")
    print("=" * 50)

    # Set up logging for testing
    logging.basicConfig(level=logging.INFO)

    # Test individual functions first
    test_individual_functions()

    # Test tool creation
    test_tool_creation()

    # Test tool functionality (optional - can be slow)
    response = input("\nRun tool functionality tests? (y/n): ")
    if response.lower() == 'y':
        test_tool_functionality()
    else:
        print("Skipping functionality tests")

    print("\n=== Tools Testing Complete ===")
    print("\nTo use these tools in your agent:")
    print("from tools import get_all_tools")
    print("tools = get_all_tools()")