Isateles committed on
Commit 8a7b3d1 · 1 Parent(s): 81917a3

Updated agent

Files changed (5):
  1. README.md +264 -13
  2. app.py +632 -143
  3. requirements.txt +157 -2
  4. retriever.py +526 -0
  5. tools.py +656 -0
README.md CHANGED
@@ -1,15 +1,266 @@
- ---
- title: Template Final Assignment
- emoji: 🕵🏻‍♂️
- colorFrom: indigo
- colorTo: indigo
- sdk: gradio
- sdk_version: 5.25.2
- app_file: app.py
- pinned: false
- hf_oauth: true
- # optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
- hf_oauth_expiration_minutes: 480
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🎯 GAIA Benchmark Agent - Course Final Project
+
+ A comprehensive AI agent that applies the course material while targeting a 30%+ score on the GAIA benchmark to earn your course certificate.
+
+ ## 🌟 What This Agent Demonstrates
+
+ This project combines all major concepts from the course:
+
+ ### 📚 **Course Learning Applied**
+ - **🔧 Tools Integration**: Multiple tool types working together
+ - **📖 RAG Implementation**: Persona database of 5K diverse individuals using vector embeddings
+ - **🤖 Agent Workflows**: LlamaIndex agent orchestration
+ - **🧠 LLM Integration**: Fallback options for accessibility
+ - **📁 Modular Architecture**: Clean separation of concerns
+
+ ### 🎯 **GAIA Benchmark Optimized**
+ - **🔍 Web Search**: For current information and facts
+ - **🧮 Calculator**: For mathematical accuracy (critical for GAIA)
+ - **📊 File Analysis**: For data-processing questions
+ - **💬 Conversational**: Natural language interaction
+
+ ## 🗂️ Project Structure
+
+ ```
+ your-space/
+ ├── app.py            # Main application with Gradio interface
+ ├── tools.py          # All agent tools (web search, calculator, etc.)
+ ├── retriever.py      # RAG implementation with the persona database
+ ├── requirements.txt  # Python dependencies
+ └── README.md         # This file
+ ```
+
+ ### 📁 **File Explanations**
+
+ **`app.py`** - Main Application
+ - Gradio interface for GAIA evaluation
+ - Agent initialization with error handling
+ - Question processing and answer submission
+ - Results display and certificate status
+
+ **`tools.py`** - Agent Tools
+ - **Web Search Tool**: DuckDuckGo integration for current info
+ - **Calculator Tool**: Safe mathematical expression evaluation
+ - **File Analysis Tool**: Process CSV, text, and data files
+ - All tools have detailed documentation and error handling
+
+ **`retriever.py`** - Advanced RAG System
+ - Persona database of 5K diverse individuals from HuggingFace
+ - Vector embeddings with ChromaDB for semantic search
+ - IngestionPipeline for document processing
+ - Demonstrates state-of-the-art RAG concepts
+
+ ## 🚀 Quick Setup Guide
+
+ ### 1. **Clone or Duplicate This Space**
+ ```bash
+ # If cloning locally
+ git clone https://huggingface.co/spaces/your-username/your-space
+ cd your-space
+
+ # Or duplicate this space to your HF account
+ ```
+
+ ### 2. **Set API Keys** ⚡ **CRITICAL STEP**
+
+ In your HuggingFace Space:
+ 1. Go to **Settings** → **Repository secrets**
+ 2. Add **at least one** of these:
+
+ **Option A: OpenAI (Recommended)**
+ - Name: `OPENAI_API_KEY`
+ - Value: `sk-...` (your OpenAI API key)
+ - **Why**: Better performance on the GAIA benchmark
+
+ **Option B: HuggingFace (Free Alternative)**
+ - Name: `HF_TOKEN`
+ - Value: `hf_...` (your HF token)
+ - **Why**: Free alternative, works without OpenAI credits
+
+ **Get API Keys:**
+ - **OpenAI**: https://platform.openai.com/api-keys
+ - **HuggingFace**: https://huggingface.co/settings/tokens
+
+ ### 3. **Ensure Public Space**
+ - Your Space must be **public** for leaderboard verification
+ - Go to Settings → Change from Private to Public
+
+ ### 4. **Run Evaluation**
+ 1. Click the HuggingFace login button
+ 2. Click "Run GAIA Evaluation & Submit Results"
+ 3. Wait 5-10 minutes for completion
+ 4. Check your score - you need 30%+ to pass! 🏆
+
+ ## 🔧 Why Each Tool Matters for GAIA
+
+ ### 🌐 **Web Search Tool**
+ ```python
+ # Example GAIA questions this helps with:
+ "Who is the current president of France?"
+ "What was Tesla's stock price yesterday?"
+ "Recent developments in AI research"
+ ```
+ **Why needed**: GAIA questions often require current information beyond an LLM's training data.
+
+ ### 🧮 **Calculator Tool**
+ ```python
+ # Example GAIA questions this helps with:
+ "What is 15% of 847?"
+ "Calculate the area of a circle with radius 23.7m"
+ "If I invest $5000 at 3.2% annual interest for 7 years..."
+ ```
+ **Why needed**: LLMs can make arithmetic errors. GAIA requires exact numerical accuracy.
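The calculator idea above can be grounded with a short sketch. This is illustrative only, not the actual `tools.py` implementation, and the helper name `safe_calculate` is hypothetical: a safe evaluator walks the parsed AST and allows only whitelisted arithmetic operators, rejecting names, calls, and attributes outright.

```python
import ast
import operator

# Whitelisted operators; anything else in the AST is rejected,
# which is what makes this safer than a bare eval().
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_calculate(expression: str) -> float:
    """Evaluate a plain arithmetic expression without eval/exec."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return _eval(ast.parse(expression, mode="eval"))

# "What is 15% of 847?" becomes a plain expression:
print(round(safe_calculate("0.15 * 847"), 2))  # → 127.05
```

Because only arithmetic nodes are accepted, an LLM-generated string like `__import__('os')` raises `ValueError` instead of executing.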
+
+ ### 📊 **File Analysis Tool**
+ ```python
+ # Example GAIA questions this helps with:
+ "Analyze this CSV file and tell me the average..."
+ "What is the most common value in column 3?"
+ "Process this data file and extract..."
+ ```
+ **Why needed**: Some GAIA questions include file attachments requiring analysis.
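The file-analysis questions above can likewise be sketched with the standard library alone. This is a stand-in for illustration, not the actual `tools.py` tool; `analyze_csv` is a hypothetical helper computing the two statistics the example questions ask for.

```python
import csv
import io
from collections import Counter
from statistics import mean

def analyze_csv(text: str, column: str) -> dict:
    """Return the average and the most common value of one CSV column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    values = [row[column] for row in rows]
    return {
        "average": mean(float(v) for v in values),
        "most_common": Counter(values).most_common(1)[0][0],
    }

sample = "score\n10\n20\n20\n30\n"
print(analyze_csv(sample, "score"))  # → {'average': 20.0, 'most_common': '20'}
```

A real tool would read the attachment from disk (e.g. with pandas) and guard against non-numeric columns; the logic is the same.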
+
+ ### 📚 **Persona RAG Tool**
+ ```python
+ # Example questions this demonstrates:
+ "Find writers and authors"
+ "Who are the scientists?"
+ "People interested in travel"
+ "Creative professionals at the event"
+ ```
+ **Why included**: Demonstrates advanced RAG with 5K real personas, vector embeddings, and semantic search.
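The real retriever embeds personas with BAAI/bge-small-en-v1.5 and stores them in ChromaDB. As a dependency-free illustration of the same retrieve-by-similarity idea, here is a toy bag-of-words cosine retriever; the persona snippets are invented, and real embeddings would capture semantics far beyond word overlap.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(query.lower().split())
    return sorted(docs,
                  key=lambda d: cosine(q, Counter(d.lower().split())),
                  reverse=True)[:k]

# Hypothetical persona snippets standing in for the 5K-persona dataset:
personas = [
    "a science fiction writer who loves travel",
    "a marine biologist studying coral reefs",
    "a jazz pianist and music teacher",
]
print(retrieve("writers and authors who travel", personas, k=1))
```

Swapping `Counter` vectors for embedding vectors and the list scan for a ChromaDB query gives the production version of this pattern.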
+
+ ## 📖 Course Concepts Demonstrated
+
+ ### 🔧 **Components** (From Course Unit 2)
+ - **LLM Integration**: OpenAI + HuggingFace fallback
+ - **Document Processing**: Text chunking and metadata
+ - **Response Synthesis**: Clean answer formatting
+
+ ### 🛠️ **Tools** (From Course Unit 3)
+ - **FunctionTool Creation**: Multiple tool types
+ - **Tool Descriptions**: Proper LLM guidance
+ - **Error Handling**: Graceful tool failures
+
+ ### 🤖 **Agents** (From Course Unit 4)
+ - **AgentWorkflow**: Multi-tool orchestration
+ - **System Prompts**: GAIA-optimized instructions
+ - **Async Processing**: Efficient question handling
+
+ ### 📖 **RAG Implementation** (From Course Unit 5)
+ - **Dataset Integration**: 5K personas from HuggingFace
+ - **Vector Embeddings**: Semantic search with BAAI/bge-small-en-v1.5
+ - **ChromaDB Storage**: Persistent vector database
+ - **Ingestion Pipeline**: Document processing and chunking
+
+ ### 🏗️ **Workflows** (From Course Unit 6)
+ - **Event-Driven**: Tool selection and execution
+ - **State Management**: Context preservation
+ - **Error Recovery**: Robust failure handling
+
+ ## 🎓 Why This Approach Works for GAIA
+
+ ### ✅ **Accuracy First**
+ - Calculator prevents math errors
+ - Web search provides current facts
+ - Low-temperature LLM settings for consistency
+
+ ### ✅ **Comprehensive Coverage**
+ - Factual questions → Web search
+ - Mathematical questions → Calculator
+ - Data questions → File analysis
+ - Knowledge questions → RAG system
+
+ ### ✅ **Robust Error Handling**
+ - Graceful API failures
+ - Tool availability checking
+ - Fallback responses
+
+ ### ✅ **GAIA-Specific Optimizations**
+ - Direct, concise answers
+ - Exact-match optimization
+ - Minimal extra text
+
+ ## 🔧 Troubleshooting
+
+ ### ❌ **"No LLM available" Error**
+ **Problem**: No API keys set
+ **Solution**: Add `OPENAI_API_KEY` or `HF_TOKEN` to the Space secrets
+
+ ### ❌ **Import Errors**
+ **Problem**: Dependencies not installed
+ **Solution**: Check that requirements.txt is in the root directory, then restart the Space
+
+ ### ❌ **Low GAIA Score**
+ **Problem**: Agent giving wrong answers
+ **Solutions**:
+ - Check that the API key is working (OpenAI generally performs better)
+ - Review agent logs for tool usage
+ - Ensure web search and the calculator are working
+
+ ### ❌ **"Could not submit" Error**
+ **Problem**: Network or authentication issue
+ **Solution**:
+ - Ensure you are logged in to HuggingFace
+ - Check that the Space is public
+ - Try again (temporary network issues)
+
+ ### ❌ **Tools Not Working**
+ **Problem**: Missing dependencies or API issues
+ **Solution**: Check the Space logs and verify all packages are installed
+
+ ## 📊 Expected Performance
+
+ ### 🎯 **Target Scores**
+ - **Minimum for Certificate**: 30%
+ - **Good Performance**: 40-50%
+ - **Excellent Performance**: 60%+
+
+ ### 📈 **Performance Factors**
+ - **API Choice**: OpenAI typically scores higher than HuggingFace
+ - **Tool Usage**: Questions requiring tools score better when the tools work
+ - **Answer Format**: Direct answers score better than verbose responses
+
+ ## 🚀 Getting Better Scores
+
+ ### 💡 **Optimization Tips**
+ 1. **Use OpenAI**: Generally more accurate than HuggingFace for GAIA
+ 2. **Check Tool Functionality**: Test that web search and the calculator work
+ 3. **Review Failed Questions**: Look at specific errors in the results table
+ 4. **Adjust the System Prompt**: Fine-tune for your specific weak areas
+
+ ### 🔄 **Iterative Improvement**
+ 1. Run the evaluation and check the results
+ 2. Identify patterns in failed questions
+ 3. Adjust tools or prompts accordingly
+ 4. Re-run the evaluation
+
+ ## 🏆 Certificate Achievement
+
+ **To earn your course certificate:**
+ 1. ✅ Score 30% or higher on the GAIA evaluation
+ 2. ✅ Keep your Space public for verification
+ 3. ✅ Submit through the official interface
+
+ **When you pass:**
+ - You'll see "✅ PASSED - Certificate Earned!" in the results
+ - Your score will appear on the student leaderboard
+ - You can download your official certificate
+
+ ## 🤝 Getting Help
+
+ **If you're stuck:**
+ 1. Check the troubleshooting section above
+ 2. Review the Space logs for specific errors
+ 3. Test individual components (tools.py, retriever.py)
+ 4. Ask in the course Discord for community help
+
+ ## 🎉 Good Luck!
+
+ This agent represents everything you've learned in the course. The modular design makes it easy to understand, debug, and improve. Focus on getting those API keys set up correctly, and you'll be well on your way to earning your certificate!
+
+ **Remember**: The goal isn't just to pass the benchmark, but to demonstrate your understanding of modern AI agent development. This codebase serves as a portfolio piece showing your skills in RAG, tool integration, and agent orchestration.
+
  ---

+ *Built with ❤️ using LlamaIndex and course concepts*
app.py CHANGED
@@ -1,196 +1,685 @@
  import os
  import gradio as gr
  import requests
- import inspect
  import pandas as pd

- # (Keep Constants as is)
- # --- Constants ---
  DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"

- # --- Basic Agent Definition ---
- # ----- THIS IS WHERE YOU CAN BUILD WHAT YOU WANT ------
- class BasicAgent:
-     def __init__(self):
-         print("BasicAgent initialized.")
-     def __call__(self, question: str) -> str:
-         print(f"Agent received question (first 50 chars): {question[:50]}...")
-         fixed_answer = "This is a default answer."
-         print(f"Agent returning fixed answer: {fixed_answer}")
-         return fixed_answer

- def run_and_submit_all(profile: gr.OAuthProfile | None):
      """
-     Fetches all questions, runs the BasicAgent on them, submits all answers,
-     and displays the results.
      """
-     # --- Determine HF Space Runtime URL and Repo URL ---
-     space_id = os.getenv("SPACE_ID")  # Get the SPACE_ID for linking to the code
-
-     if profile:
-         username = f"{profile.username}"
-         print(f"User logged in: {username}")
      else:
-         print("User not logged in.")
-         return "Please log in to Hugging Face with the button.", None

      api_url = DEFAULT_API_URL
      questions_url = f"{api_url}/questions"
      submit_url = f"{api_url}/submit"
-
-     # 1. Instantiate Agent (modify this part to create your agent)
      try:
-         agent = BasicAgent()
      except Exception as e:
-         print(f"Error instantiating agent: {e}")
-         return f"Error initializing agent: {e}", None
-     # When the app runs as a Hugging Face Space, this link points to your codebase (useful for others, so please keep it public)
-     agent_code = f"https://huggingface.co/spaces/{space_id}/tree/main"
-     print(agent_code)
-
-     # 2. Fetch Questions
-     print(f"Fetching questions from: {questions_url}")
      try:
          response = requests.get(questions_url, timeout=15)
          response.raise_for_status()
          questions_data = response.json()
          if not questions_data:
-             print("Fetched questions list is empty.")
-             return "Fetched questions list is empty or invalid format.", None
-         print(f"Fetched {len(questions_data)} questions.")
      except requests.exceptions.RequestException as e:
-         print(f"Error fetching questions: {e}")
-         return f"Error fetching questions: {e}", None
-     except requests.exceptions.JSONDecodeError as e:
-         print(f"Error decoding JSON response from questions endpoint: {e}")
-         print(f"Response text: {response.text[:500]}")
-         return f"Error decoding server response for questions: {e}", None
      except Exception as e:
-         print(f"An unexpected error occurred fetching questions: {e}")
-         return f"An unexpected error occurred fetching questions: {e}", None
-
-     # 3. Run your Agent
      results_log = []
      answers_payload = []
-     print(f"Running agent on {len(questions_data)} questions...")
-     for item in questions_data:
          task_id = item.get("task_id")
          question_text = item.get("question")
          if not task_id or question_text is None:
-             print(f"Skipping item with missing task_id or question: {item}")
              continue
          try:
              submitted_answer = agent(question_text)
-             answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
-             results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": submitted_answer})
          except Exception as e:
-             print(f"Error running agent on task {task_id}: {e}")
-             results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": f"AGENT ERROR: {e}"})
-
      if not answers_payload:
-         print("Agent did not produce any answers to submit.")
-         return "Agent did not produce any answers to submit.", pd.DataFrame(results_log)
-
-     # 4. Prepare Submission
-     submission_data = {"username": username.strip(), "agent_code": agent_code, "answers": answers_payload}
-     status_update = f"Agent finished. Submitting {len(answers_payload)} answers for user '{username}'..."
-     print(status_update)
-
-     # 5. Submit
-     print(f"Submitting {len(answers_payload)} answers to: {submit_url}")
      try:
          response = requests.post(submit_url, json=submission_data, timeout=60)
          response.raise_for_status()
          result_data = response.json()
          final_status = (
-             f"Submission Successful!\n"
-             f"User: {result_data.get('username')}\n"
-             f"Overall Score: {result_data.get('score', 'N/A')}% "
-             f"({result_data.get('correct_count', '?')}/{result_data.get('total_attempted', '?')} correct)\n"
-             f"Message: {result_data.get('message', 'No message received.')}"
          )
-         print("Submission successful.")
-         results_df = pd.DataFrame(results_log)
-         return final_status, results_df
-     except requests.exceptions.HTTPError as e:
-         error_detail = f"Server responded with status {e.response.status_code}."
-         try:
-             error_json = e.response.json()
-             error_detail += f" Detail: {error_json.get('detail', e.response.text)}"
-         except requests.exceptions.JSONDecodeError:
-             error_detail += f" Response: {e.response.text[:500]}"
-         status_message = f"Submission Failed: {error_detail}"
-         print(status_message)
-         results_df = pd.DataFrame(results_log)
-         return status_message, results_df
-     except requests.exceptions.Timeout:
-         status_message = "Submission Failed: The request timed out."
-         print(status_message)
-         results_df = pd.DataFrame(results_log)
-         return status_message, results_df
      except requests.exceptions.RequestException as e:
-         status_message = f"Submission Failed: Network error - {e}"
-         print(status_message)
-         results_df = pd.DataFrame(results_log)
-         return status_message, results_df
      except Exception as e:
-         status_message = f"An unexpected error occurred during submission: {e}"
-         print(status_message)
-         results_df = pd.DataFrame(results_log)
-         return status_message, results_df
-
-
- # --- Build Gradio Interface using Blocks ---
- with gr.Blocks() as demo:
-     gr.Markdown("# Basic Agent Evaluation Runner")
-     gr.Markdown(
-         """
-         **Instructions:**

-         1. Please clone this space, then modify the code to define your agent's logic, the tools, the necessary packages, etc.
-         2. Log in to your Hugging Face account using the button below. This uses your HF username for submission.
-         3. Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.

-         ---
-         **Disclaimers:**
-         Once you click the submit button, it can take quite some time (this is the time the agent needs to go through all the questions).
-         This space provides a basic setup and is intentionally sub-optimal to encourage you to develop your own, more robust solution. For instance, to address the slow submit step, you could cache the answers and submit them in a separate action, or answer the questions asynchronously.
-         """
-     )

      gr.LoginButton()
-
-     run_button = gr.Button("Run Evaluation & Submit All Answers")
-
-     status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
-     # Removed max_rows=10 from the DataFrame constructor
-     results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
-
      run_button.click(
          fn=run_and_submit_all,
          outputs=[status_output, results_table]
      )

- if __name__ == "__main__":
-     print("\n" + "-"*30 + " App Starting " + "-"*30)
-     # Check for SPACE_HOST and SPACE_ID at startup for information
-     space_host_startup = os.getenv("SPACE_HOST")
-     space_id_startup = os.getenv("SPACE_ID")  # Get SPACE_ID at startup
-
-     if space_host_startup:
-         print(f"✅ SPACE_HOST found: {space_host_startup}")
-         print(f"   Runtime URL should be: https://{space_host_startup}.hf.space")
-     else:
-         print("ℹ️ SPACE_HOST environment variable not found (running locally?).")
-
-     if space_id_startup:  # Print repo URLs if SPACE_ID is found
-         print(f"✅ SPACE_ID found: {space_id_startup}")
-         print(f"   Repo URL: https://huggingface.co/spaces/{space_id_startup}")
-         print(f"   Repo Tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main")
-     else:
-         print("ℹ️ SPACE_ID environment variable not found (running locally?). Repo URL cannot be determined.")

-     print("-"*(60 + len(" App Starting ")) + "\n")

-     print("Launching Gradio Interface for Basic Agent Evaluation...")
-     demo.launch(debug=True, share=False)
+ """
+ app.py - GAIA Benchmark Agent Application
+
+ This is the main application file that brings together:
+ 1. Tools from tools.py (web search, calculator, file analysis)
+ 2. The RAG system from retriever.py (persona database)
+ 3. LLM integration with fallback options
+ 4. An agent workflow for handling GAIA questions
+ 5. A Gradio interface for submission to the GAIA benchmark
+
+ The goal is to achieve a 30%+ score on GAIA benchmark questions to earn the course certificate.
+
+ How it works:
+ 1. The user logs in with a HuggingFace account
+ 2. The system fetches GAIA questions from the evaluation API
+ 3. Our agent processes each question using its tools
+ 4. Answers are submitted and scored
+ 5. Results are displayed with pass/fail status
+
+ Key design decisions:
+ - Modular architecture: tools and retriever live in separate files
+ - Robust error handling: graceful failures with logging
+ - API key flexibility: OpenAI (best) or HuggingFace (fallback)
+ - GAIA-optimized: focused on accuracy over speed
+ """
+
  import os
  import gradio as gr
  import requests
  import pandas as pd
+ import asyncio
+ import logging
+ from typing import List, Dict, Any, Optional
+
+ # Set up logging
+ logging.basicConfig(
+     level=logging.INFO,
+     format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+ logger = logging.getLogger(__name__)
+
+ # ============================================================================
+ # CONSTANTS AND CONFIGURATION
+ # ============================================================================
+
+ # GAIA evaluation API endpoint
  DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"

+ # Required score to pass the course
+ PASSING_SCORE = 30  # 30% minimum to earn the certificate
+
+ # ============================================================================
+ # LLM SETUP WITH FALLBACK OPTIONS
+ # ============================================================================
+
+ def create_llm():
+     """
+     Create an LLM (Large Language Model) with fallback options.
+
+     Priority order:
+     1. OpenAI gpt-4o-mini (best performance for GAIA)
+     2. HuggingFace Qwen model (free alternative)
+
+     Why this order:
+     - OpenAI models generally perform better on the GAIA benchmark
+     - HuggingFace provides a free alternative for those without OpenAI credits
+     - The fallback ensures the agent works regardless of which keys are available
+
+     API keys setup:
+     - Go to your HuggingFace Space settings
+     - Add "Repository secrets"
+     - Set OPENAI_API_KEY (recommended) and/or HF_TOKEN
+
+     Returns:
+         LLM: Configured language model ready for use
+
+     Raises:
+         RuntimeError: If no API keys are available
+     """
+     logger.info("Initializing LLM with fallback options...")
+
+     # Try OpenAI first (recommended for GAIA performance)
+     openai_key = os.getenv("OPENAI_API_KEY")
+     if openai_key:
+         try:
+             from llama_index.llms.openai import OpenAI
+
+             llm = OpenAI(
+                 api_key=openai_key,
+                 model="gpt-4o-mini",  # Good balance of cost and performance
+                 max_tokens=1024,      # Reasonable limit for GAIA answers
+                 temperature=0.1       # Low temperature for consistent, factual responses
+             )
+
+             logger.info("✅ Successfully initialized OpenAI LLM")
+             return llm
+
+         except ImportError:
+             logger.warning("❌ OpenAI library not available, trying HuggingFace...")
+         except Exception as e:
+             logger.warning(f"❌ OpenAI initialization failed: {e}, trying HuggingFace...")
+     else:
+         logger.info("ℹ️ No OPENAI_API_KEY found, trying HuggingFace...")
+
+     # Fall back to HuggingFace
+     hf_token = os.getenv("HF_TOKEN")
+     if hf_token:
+         try:
+             from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
+
+             llm = HuggingFaceInferenceAPI(
+                 model_name="Qwen/Qwen2.5-Coder-32B-Instruct",  # Capable open-source model
+                 token=hf_token,
+                 max_new_tokens=512,   # Limit on response length
+                 temperature=0.1,      # Low temperature for consistency
+                 context_window=8192   # Context window size
+             )
+
+             logger.info("✅ Successfully initialized HuggingFace LLM")
+             return llm
+
+         except ImportError:
+             logger.error("❌ HuggingFace library not available")
+         except Exception as e:
+             logger.error(f"❌ HuggingFace initialization failed: {e}")
+     else:
+         logger.info("ℹ️ No HF_TOKEN found")
+
+     # If we get here, no LLM could be initialized
+     error_msg = (
+         "No LLM could be initialized. Please set either:\n"
+         "- OPENAI_API_KEY (recommended for better GAIA performance)\n"
+         "- HF_TOKEN (free alternative)\n"
+         "In your HuggingFace Space settings → Repository secrets"
+     )
+     logger.error(error_msg)
+     raise RuntimeError(error_msg)
+
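The try-OpenAI-then-HuggingFace logic above is an instance of a generic first-available-provider pattern. As a minimal stdlib sketch (the provider names, env vars, and factories here are toy stand-ins, not the real LlamaIndex clients):

```python
import os

def first_available(providers):
    """Try (name, env_var, factory) triples in order; return the first that works."""
    errors = []
    for name, env_var, factory in providers:
        key = os.environ.get(env_var)
        if not key:
            errors.append(f"{name}: {env_var} not set")
            continue
        try:
            return factory(key)
        except Exception as e:  # keep trying the remaining providers
            errors.append(f"{name}: {e}")
    raise RuntimeError("No provider available: " + "; ".join(errors))

# Toy factories standing in for the OpenAI / HuggingFace constructors:
os.environ["TOY_KEY_B"] = "hf_demo"
providers = [
    ("openai", "TOY_KEY_A", lambda k: f"openai-client({k})"),
    ("huggingface", "TOY_KEY_B", lambda k: f"hf-client({k})"),
]
print(first_available(providers))  # → hf-client(hf_demo)
```

Collecting per-provider errors and raising them together, as `create_llm` effectively does with its final message, makes the "no keys set" failure mode self-explanatory.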
+
+ # ============================================================================
+ # GAIA AGENT CLASS - Main Agent Implementation
+ # ============================================================================
+
+ class GAIAAgent:
+     """
+     GAIA benchmark agent that combines the course material with benchmark capabilities.
+
+     This agent demonstrates:
+     1. Multi-tool usage (web search, calculator, file analysis)
+     2. RAG implementation (the persona database from the course)
+     3. LLM integration with robust error handling
+     4. GAIA-optimized prompting for accurate answers
+
+     The agent is designed to handle various types of GAIA questions:
+     - Factual questions requiring web search
+     - Mathematical problems requiring calculations
+     - Data analysis questions requiring file processing
+     - Questions about the persona database (demonstrating RAG)
+     """
+
+     def __init__(self):
+         """
+         Initialize the GAIA agent with the LLM and tools.
+
+         This sets up:
+         1. The language model (with fallback options)
+         2. All available tools (web search, calculator, etc.)
+         3. The agent workflow that orchestrates everything
+         """
+         logger.info("🚀 Initializing GAIA Agent...")
+
+         # Step 1: Initialize the LLM
+         try:
+             self.llm = create_llm()
+             logger.info("✅ LLM initialized successfully")
+         except Exception as e:
+             logger.error(f"❌ Failed to initialize LLM: {e}")
+             raise
+
+         # Step 2: Import and create tools
+         tools = []
+
+         # Import tools from our tools.py file
+         try:
+             from tools import create_all_tools
+             tool_list = create_all_tools()
+             tools.extend(tool_list)
+             logger.info(f"✅ Loaded {len(tool_list)} tools from tools.py")
+         except ImportError as e:
+             logger.error(f"❌ Could not import tools.py: {e}")
+         except Exception as e:
+             logger.warning(f"⚠️ Error loading tools from tools.py: {e}")
+
+         # Import the RAG tool from our retriever.py file
+         try:
+             from retriever import create_persona_tool
+             persona_tool = create_persona_tool()
+             if persona_tool:
+                 tools.append(persona_tool)
+                 logger.info("✅ Loaded persona RAG tool from retriever.py")
+         except ImportError as e:
+             logger.error(f"❌ Could not import retriever.py: {e}")
+         except Exception as e:
+             logger.warning(f"⚠️ Error loading RAG tool from retriever.py: {e}")
+
+         # Check that we have at least one tool
+         if not tools:
+             error_msg = "❌ No tools available! Check tools.py and retriever.py"
+             logger.error(error_msg)
+             raise RuntimeError(error_msg)
+
+         logger.info(f"✅ Total tools available: {len(tools)}")
+         for tool in tools:
+             logger.info(f"  - {tool.metadata.name}: {tool.metadata.description[:50]}...")
+
+         # Step 3: Create the agent workflow
+         try:
+             from llama_index.core.agent.workflow import AgentWorkflow
+
+             # Create the agent with a GAIA-optimized system prompt
+             self.agent = AgentWorkflow.from_tools_or_functions(
+                 tools_or_functions=tools,
+                 llm=self.llm,
+                 system_prompt=self._create_system_prompt()
+             )
+
+             logger.info("✅ Agent workflow created successfully")
+
+         except ImportError as e:
+             error_msg = f"❌ Could not import AgentWorkflow: {e}"
+             logger.error(error_msg)
+             raise RuntimeError(error_msg)
+         except Exception as e:
+             error_msg = f"❌ Failed to create agent workflow: {e}"
+             logger.error(error_msg)
+             raise RuntimeError(error_msg)
+
+         logger.info("🎉 GAIA Agent initialization complete!")
+
+     def _create_system_prompt(self) -> str:
+         """
+         Create a system prompt optimized for GAIA benchmark performance.
+
+         The prompt is designed to:
+         1. Encourage accuracy over creativity
+         2. Guide proper tool usage
+         3. Ensure concise, direct answers
+         4. Handle various question types
+
+         Returns:
+             str: Optimized system prompt for GAIA questions
+         """
+         return """You are a helpful AI assistant specialized in answering questions accurately and concisely.
+
+ IMPORTANT - GAIA BENCHMARK GUIDELINES:
+ - Provide direct, factual answers without extra explanations
+ - Use your tools when you need specific information or calculations
+ - Be precise and accurate - exact matches are required for scoring
+ - If you're not certain about an answer, use available tools to verify
+
+ AVAILABLE TOOLS AND WHEN TO USE THEM:
+ 1. web_search: Use for current information, recent events, and facts not in your training data
+ 2. calculator: Use for ANY mathematical calculation to ensure accuracy
+ 3. file_analyzer: Use when questions involve analyzing data files or documents
+ 4. persona_database: Use for questions about people, characteristics, interests, professions
+    (The database contains 5000 diverse personas with various backgrounds and interests)
+
+ RESPONSE GUIDELINES:
+ - Give direct answers without phrases like "Based on my search..." or "According to..."
+ - For numerical answers, provide just the number or value
+ - For factual questions, provide just the fact
+ - For yes/no questions, answer yes or no clearly
+ - Always use tools for calculations rather than doing math in your head
+
+ EXAMPLES:
+ Question: "What is 15% of 847?"
+ Good: Use the calculator tool, then respond with just the number
+ Bad: Try to calculate mentally and risk errors
+
+ Question: "Who is the current president of France?"
+ Good: Use web search to get current information
+ Bad: Guess based on training data that might be outdated
+
+ Remember: Accuracy is more important than speed. Use your tools to ensure correct answers."""
285
+
286
+ def __call__(self, question: str) -> str:
287
+ """
288
+ Process a GAIA question and return an answer.
289
+
290
+ This is the main method that the evaluation system calls.
291
+ It handles the entire question-answering pipeline:
292
+ 1. Logs the incoming question
293
+ 2. Runs the agent workflow asynchronously
294
+ 3. Extracts and cleans the response
295
+ 4. Returns a properly formatted answer
296
+
297
+ Args:
298
+ question (str): The GAIA question to answer
299
+
300
+ Returns:
301
+ str: The agent's answer to the question
302
+ """
303
+ logger.info(f"📝 Processing GAIA question: {question[:100]}...")
304
+
305
+ try:
306
+ # Run the agent asynchronously
307
+ # GAIA questions can be complex and may require multiple tool calls
308
+ loop = asyncio.new_event_loop()
309
+ asyncio.set_event_loop(loop)
310
+
311
+ try:
312
+ # Execute the agent workflow
313
+ result = loop.run_until_complete(
314
+ self.agent.run(user_msg=question)
315
+ )
316
+
317
+ # Extract the response from the result object
318
+ answer = self._extract_response(result)
319
+
320
+ # Clean and format the answer for GAIA submission
321
+ cleaned_answer = self._clean_answer(answer)
322
+
323
+ logger.info(f"✅ Generated answer: {cleaned_answer[:100]}...")
324
+ return cleaned_answer
325
+
326
+ finally:
327
+ # Always close the event loop to prevent memory leaks
328
+ loop.close()
329
+
330
+ except Exception as e:
331
+ # If anything goes wrong, return a helpful error message
332
+ error_msg = f"I encountered an error processing this question: {str(e)}"
333
+ logger.error(f"❌ Error processing question: {e}")
334
+ return error_msg
335
+
336
+ def _extract_response(self, result: Any) -> str:
337
+ """
338
+ Extract the text response from the agent workflow result.
339
+
340
+ Agent workflows can return different types of objects.
341
+ This method handles various result formats robustly.
342
+
343
+ Args:
344
+ result: The result object from the agent workflow
345
+
346
+ Returns:
347
+ str: Extracted response text
348
+ """
349
+ # Try different ways to extract the response
350
+ if hasattr(result, 'response'):
351
+ return str(result.response)
352
+ elif hasattr(result, 'content'):
353
+ return str(result.content)
354
+ elif hasattr(result, 'message'):
355
+ if hasattr(result.message, 'content'):
356
+ return str(result.message.content)
357
+ else:
358
+ return str(result.message)
359
+ else:
360
+ # Fallback: convert whatever we got to string
361
+ return str(result)
362
+
363
+ def _clean_answer(self, answer: str) -> str:
364
+ """
365
+ Clean and format the answer for GAIA submission.
366
+
367
+ GAIA requires exact matches, so we need to:
368
+ 1. Remove common prefixes that agents add
369
+ 2. Strip whitespace
370
+ 3. Ensure clean, direct responses
371
+
372
+ Args:
373
+ answer (str): Raw answer from the agent
374
+
375
+ Returns:
376
+ str: Cleaned answer ready for submission
377
+ """
378
+ # Remove common agent response prefixes
379
+ prefixes_to_remove = [
380
+ "assistant:",
381
+ "Assistant:",
382
+ "Based on my search,",
383
+ "According to the search results,",
384
+ "The answer is:",
385
+ "Answer:"
386
+ ]
387
+
388
+ cleaned = answer.strip()
389
+
390
+ for prefix in prefixes_to_remove:
391
+ if cleaned.startswith(prefix):
392
+ cleaned = cleaned[len(prefix):].strip()
393
+
394
+ return cleaned
395
+
396
+
397
+ # ============================================================================
398
+ # EVALUATION AND SUBMISSION LOGIC
399
+ # ============================================================================
400
+
401
+ def run_and_submit_all(profile: gr.OAuthProfile | None) -> tuple[str, pd.DataFrame | None]:
402
+ """
403
+ Main function that handles the entire GAIA evaluation process.
404
+
405
+ This function:
406
+ 1. Validates user authentication
407
+ 2. Fetches questions from GAIA API
408
+ 3. Runs the agent on all questions
409
+ 4. Submits answers for scoring
410
+ 5. Returns results and status
411
+
412
+ Args:
413
+ profile: Gradio OAuth profile (None if not logged in)
414
+
415
+ Returns:
416
+ tuple: (status_message, results_dataframe)
417
+ """
418
+ # Step 1: Check authentication
419
+ if not profile:
420
+ logger.warning("❌ User not logged in")
421
+ return "Please log in to HuggingFace using the button above.", None
422
+
423
+ username = profile.username
424
+ logger.info(f"👤 User logged in: {username}")
425
+
426
+ # Step 2: Get space information for code link
427
+ space_id = os.getenv("SPACE_ID")
428
+ agent_code = f"https://huggingface.co/spaces/{space_id}/tree/main" if space_id else "No space ID available"
429
+
430
+ # Step 3: Set up API endpoints
431
  api_url = DEFAULT_API_URL
432
  questions_url = f"{api_url}/questions"
433
  submit_url = f"{api_url}/submit"
434
+
435
+ # Step 4: Initialize the agent
436
+ logger.info("🤖 Initializing GAIA Agent...")
437
  try:
438
+ agent = GAIAAgent()
439
+ logger.info("✅ GAIA Agent ready for evaluation")
440
  except Exception as e:
441
+ error_msg = f"❌ Failed to initialize agent: {str(e)}"
442
+ logger.error(error_msg)
443
+ return error_msg, None
444
+
445
+ # Step 5: Fetch GAIA questions
446
+ logger.info(f"📥 Fetching questions from: {questions_url}")
 
 
447
  try:
448
  response = requests.get(questions_url, timeout=15)
449
  response.raise_for_status()
450
  questions_data = response.json()
451
+
452
  if not questions_data:
453
+ return "❌ No questions received from GAIA API", None
454
+
455
+ logger.info(f"✅ Fetched {len(questions_data)} GAIA questions")
456
+
457
  except requests.exceptions.RequestException as e:
458
+ error_msg = f"❌ Network error fetching questions: {str(e)}"
459
+ logger.error(error_msg)
460
+ return error_msg, None
 
 
 
461
  except Exception as e:
462
+ error_msg = f"❌ Error processing questions: {str(e)}"
463
+ logger.error(error_msg)
464
+ return error_msg, None
465
+
466
+ # Step 6: Process all questions
467
+ logger.info(f"🧠 Running agent on {len(questions_data)} questions...")
468
  results_log = []
469
  answers_payload = []
470
+
471
+ for i, item in enumerate(questions_data, 1):
472
  task_id = item.get("task_id")
473
  question_text = item.get("question")
474
+
475
  if not task_id or question_text is None:
476
+ logger.warning(f"⚠️ Skipping invalid question item: {item}")
477
  continue
478
+
479
+ logger.info(f"📝 Processing question {i}/{len(questions_data)}: {task_id}")
480
+
481
  try:
482
+ # Run the agent on this question
483
  submitted_answer = agent(question_text)
484
+
485
+ # Store for submission
486
+ answers_payload.append({
487
+ "task_id": task_id,
488
+ "submitted_answer": submitted_answer
489
+ })
490
+
491
+ # Store for display (truncated for readability)
492
+ results_log.append({
493
+ "Task ID": task_id,
494
+ "Question": question_text[:100] + "..." if len(question_text) > 100 else question_text,
495
+ "Answer": submitted_answer[:150] + "..." if len(submitted_answer) > 150 else submitted_answer
496
+ })
497
+
498
+ logger.info(f"✅ Question {i} completed")
499
+
500
  except Exception as e:
501
+ error_answer = f"ERROR: {str(e)}"
502
+ logger.error(f"❌ Error on question {i}: {e}")
503
+
504
+ answers_payload.append({
505
+ "task_id": task_id,
506
+ "submitted_answer": error_answer
507
+ })
508
+
509
+ results_log.append({
510
+ "Task ID": task_id,
511
+ "Question": question_text[:100] + "..." if len(question_text) > 100 else question_text,
512
+ "Answer": error_answer
513
+ })
514
+
515
  if not answers_payload:
516
+ return "❌ No answers generated for submission", pd.DataFrame(results_log)
517
+
518
+ # Step 7: Submit answers to GAIA API
519
+ logger.info(f"📤 Submitting {len(answers_payload)} answers...")
520
+ submission_data = {
521
+ "username": username.strip(),
522
+ "agent_code": agent_code,
523
+ "answers": answers_payload
524
+ }
525
+
526
  try:
527
  response = requests.post(submit_url, json=submission_data, timeout=60)
528
  response.raise_for_status()
529
  result_data = response.json()
530
+
531
+ # Extract results
532
+ score = result_data.get('score', 0)
533
+ correct_count = result_data.get('correct_count', 0)
534
+ total_attempted = result_data.get('total_attempted', len(answers_payload))
535
+
536
+ # Determine pass/fail status
537
+ passed = score >= PASSING_SCORE
538
+ status_emoji = "🎉" if passed else "📊"
539
+
540
+ # Create status message
541
  final_status = (
542
+ f"{status_emoji} GAIA Evaluation Results\n"
543
+ f"User: {username}\n"
544
+ f"Score: {score}% ({correct_count}/{total_attempted} correct)\n"
545
+ f"Required: {PASSING_SCORE}% to pass\n"
546
+ f"Status: {'✅ PASSED - Certificate Earned!' if passed else '❌ Not passed - Try again!'}\n"
547
+ f"Message: {result_data.get('message', 'Evaluation completed')}"
548
  )
549
+
550
+ logger.info(f"✅ Submission successful - Score: {score}%")
551
+ return final_status, pd.DataFrame(results_log)
552
+
553
  except requests.exceptions.RequestException as e:
554
+ error_msg = f"❌ Submission failed: {str(e)}"
555
+ logger.error(error_msg)
556
+ return error_msg, pd.DataFrame(results_log)
 
557
  except Exception as e:
558
+ error_msg = f"❌ Unexpected error during submission: {str(e)}"
559
+ logger.error(error_msg)
560
+ return error_msg, pd.DataFrame(results_log)
561
562
 
563
+ # ============================================================================
564
+ # GRADIO INTERFACE
565
+ # ============================================================================
566
 
567
+ # Create the Gradio interface
568
+ with gr.Blocks(title="GAIA Benchmark Agent") as demo:
569
+ # Header and instructions
570
+ gr.Markdown("# 🎯 GAIA Benchmark Agent - Course Final Project")
571
+
572
+ gr.Markdown("""
573
+ ## 🚀 Welcome to Your Final Challenge!
574
+
575
+ This agent combines everything you've learned in the course:
576
+ - **🔧 Multi-Tool Integration**: Web search, calculator, file analysis
577
+ - **📚 RAG Implementation**: Persona database with 5K diverse individuals
578
+ - **🤖 Agent Workflows**: LlamaIndex agent orchestration
579
+ - **🎯 GAIA Optimization**: Designed for benchmark performance
580
+
581
+ ### 📋 Setup Checklist:
582
+ 1. **🔑 API Keys**: Set `OPENAI_API_KEY` or `HF_TOKEN` in Space secrets
583
+ 2. **🔓 Public Space**: Keep your space public for verification
584
+ 3. **👤 Login**: Use the HuggingFace login button below
585
+ 4. **▶️ Run**: Click the evaluation button and wait for results
586
+
587
+ ### 🏆 Goal: Score 30%+ to earn your certificate!
588
+
589
+ ---
590
+ """)
591
+
592
+ # Login section
593
+ gr.Markdown("### Step 1: Login to HuggingFace")
594
  gr.LoginButton()
595
+
596
+ # Evaluation section
597
+ gr.Markdown("### Step 2: Run GAIA Evaluation")
598
+ gr.Markdown("⚠️ **Note**: This may take 5-10 minutes to complete all questions. Please be patient!")
599
+
600
+ run_button = gr.Button(
601
+ "🚀 Run GAIA Evaluation & Submit Results",
602
+ variant="primary",
603
+ size="lg"
604
+ )
605
+
606
+ # Results section
607
+ gr.Markdown("### Step 3: View Results")
608
+
609
+ status_output = gr.Textbox(
610
+ label="📊 Evaluation Status & Results",
611
+ lines=8,
612
+ interactive=False,
613
+ placeholder="Results will appear here after evaluation..."
614
+ )
615
+
616
+ results_table = gr.DataFrame(
617
+ label="📝 Question-by-Question Results",
618
+ wrap=True
619
+ )
620
+
621
+ # Wire up the interface
622
  run_button.click(
623
  fn=run_and_submit_all,
624
  outputs=[status_output, results_table]
625
  )
626
+
627
+ # Footer
628
+ gr.Markdown("""
629
+ ---
630
+ ### 🔧 Troubleshooting:
631
+ - **No API Key Error**: Add `OPENAI_API_KEY` or `HF_TOKEN` to your Space secrets
632
+ - **Import Errors**: Check that all dependencies are installed
633
+ - **Low Score**: GAIA requires exact answers - the agent uses tools for accuracy
634
+
635
+ ### 🏅 Good luck earning your certificate!
636
+ """)
637
638
 
639
+ # ============================================================================
640
+ # MAIN EXECUTION
641
+ # ============================================================================
642
 
643
+ if __name__ == "__main__":
644
+ print("\n" + "="*60)
645
+ print("🎯 GAIA BENCHMARK AGENT - Course Final Project")
646
+ print("="*60)
647
+
648
+ # Check environment setup
649
+ print("\n🔍 Environment Check:")
650
+
651
+ space_host = os.getenv("SPACE_HOST")
652
+ space_id = os.getenv("SPACE_ID")
653
+ openai_key = os.getenv("OPENAI_API_KEY")
654
+ hf_token = os.getenv("HF_TOKEN")
655
+
656
+ if space_host:
657
+ print(f"✅ SPACE_HOST: {space_host}")
658
+ if space_id:
659
+ print(f"✅ SPACE_ID: {space_id}")
660
+ if openai_key:
661
+ print("✅ OPENAI_API_KEY: Set")
662
+ if hf_token:
663
+ print("✅ HF_TOKEN: Set")
664
+
665
+ if not openai_key and not hf_token:
666
+ print("⚠️ WARNING: No API keys found!")
667
+ print(" Please set OPENAI_API_KEY or HF_TOKEN in Space secrets")
668
+
669
+ print(f"\n🎯 Target Score: {PASSING_SCORE}% (to earn certificate)")
670
+ print("🚀 Agent Features:")
671
+ print(" - Web Search (DuckDuckGo)")
672
+ print(" - Calculator (Math operations)")
673
+ print(" - Guest Database RAG (Course demo)")
674
+ print(" - File Analysis (Data processing)")
675
+
676
+ print("\n" + "="*60)
677
+ print("🌐 Launching Gradio Interface...")
678
+ print("="*60 + "\n")
679
+
680
+ # Launch the Gradio app
681
+ demo.launch(
682
+ debug=True,
683
+ share=False,
684
+ show_error=True
685
+ )
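The agent's `__call__` method above bridges synchronous Gradio callbacks and the async LlamaIndex workflow by spinning up a fresh event loop per question. A minimal standalone sketch of that pattern, using a hypothetical `run_agent` coroutine in place of `self.agent.run(...)`:

```python
import asyncio

async def run_agent(question: str) -> str:
    # Hypothetical stand-in for `self.agent.run(user_msg=question)`
    await asyncio.sleep(0)
    return f"answer to: {question}"

def answer_sync(question: str) -> str:
    # Create a dedicated event loop, run the coroutine to completion,
    # and always close the loop in `finally` to avoid resource leaks.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        return loop.run_until_complete(run_agent(question))
    finally:
        loop.close()

print(answer_sync("What is 15% of 847?"))
```

Creating a new loop per call avoids "event loop is already running" errors inside Gradio handlers; `nest-asyncio` (listed in requirements.txt) covers the cases where a loop is already active.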
requirements.txt CHANGED
@@ -1,2 +1,157 @@
1
- gradio
2
- requests
1
+ # ============================================================================
2
+ # GAIA Benchmark Agent - Requirements
3
+ # ============================================================================
4
+ # This file lists all the Python packages needed for the GAIA agent to work.
5
+ # Each section explains what the packages are used for.
6
+
7
+ # ============================================================================
8
+ # CORE INTERFACE AND API DEPENDENCIES
9
+ # ============================================================================
10
+ # These are essential for the app to run and communicate with GAIA API
11
+
12
+ gradio>=4.0.0
13
+ # Web interface for the agent - provides the UI where users interact
14
+ # Includes login functionality and result display
15
+
16
+ requests>=2.28.0
17
+ # For HTTP requests to the GAIA evaluation API
18
+ # Used to fetch questions and submit answers
19
+
20
+ pandas>=1.5.0
21
+ # Data manipulation and display of results in tables
22
+ # Used to show question-answer pairs in a nice format
23
+
24
+ # ============================================================================
25
+ # LLAMAINDEX CORE - The Foundation
26
+ # ============================================================================
27
+ # LlamaIndex is the main framework from the course
28
+
29
+ llama-index-core>=0.10.0
30
+ # Core LlamaIndex functionality - documents, nodes, retrievers, etc.
31
+ # This is the foundation that everything else builds on
32
+
33
+ # ============================================================================
34
+ # LLM (Language Model) INTEGRATIONS
35
+ # ============================================================================
36
+ # These allow us to use different LLMs with fallback options
37
+
38
+ llama-index-llms-openai
39
+ # OpenAI integration (GPT-4, GPT-3.5) - recommended for best GAIA performance
40
+ # Requires OPENAI_API_KEY in your Space secrets
41
+
42
+ llama-index-llms-huggingface-api
43
+ # HuggingFace Inference API integration - free alternative
44
+ # Uses models like Qwen/Qwen2.5-Coder-32B-Instruct
45
+ # Requires HF_TOKEN in your Space secrets
46
+
47
+ # ============================================================================
48
+ # AGENT WORKFLOW SYSTEM
49
+ # ============================================================================
50
+ # This enables the agent functionality from the course
51
+
52
+ llama-index-agent-workflow
53
+ # Agent workflow system - allows creating agents that can use multiple tools
54
+ # This is what orchestrates the web search, calculator, and RAG tools
55
+
56
+ # ============================================================================
57
+ # RETRIEVAL SYSTEMS (RAG) - Enhanced with Vector Embeddings
58
+ # ============================================================================
59
+ # These are for the advanced RAG (Retrieval-Augmented Generation) functionality
60
+
61
+ llama-index-retrievers-bm25
62
+ # BM25 retriever for keyword-based search (still useful as fallback)
63
+ # Great for finding exact matches and proper nouns
64
+
65
+ llama-index-embeddings-huggingface
66
+ # HuggingFace embedding models for semantic search
67
+ # Converts text to vectors that capture meaning and context
68
+ # Used with BAAI/bge-small-en-v1.5 model
69
+
70
+ llama-index-vector-stores-chroma
71
+ # ChromaDB vector store integration
72
+ # Provides persistent storage for vector embeddings
73
+ # Fast similarity search for semantic retrieval
74
+
75
+ chromadb>=0.4.0
76
+ # ChromaDB database for vector storage
77
+ # Self-contained vector database with no external dependencies
78
+ # Stores embeddings locally for fast retrieval
79
+
80
+ datasets>=2.0.0
81
+ # HuggingFace datasets library
82
+ # Used to load the finepersonas dataset
83
+ # Provides easy access to thousands of datasets
84
+
85
+ # ============================================================================
86
+ # TOOLS AND EXTERNAL SERVICES
87
+ # ============================================================================
88
+ # These packages enable the agent's tools
89
+
90
+ duckduckgo-search>=6.0.0
91
+ # Web search functionality using DuckDuckGo
92
+ # Essential for GAIA questions requiring current information
93
+ # Free alternative to Google Search API
94
+
95
+ # ============================================================================
96
+ # UTILITIES AND ENVIRONMENT
97
+ # ============================================================================
98
+ # Supporting packages for configuration and development
99
+
100
+ python-dotenv
101
+ # For loading environment variables from .env files
102
+ # Useful for local development and testing
103
+
104
+ nest-asyncio
105
+ # Allows running async code in environments that already have an event loop
106
+ # Required for running LlamaIndex query engines in Jupyter/Gradio
107
+ # Fixes "RuntimeError: This event loop is already running" errors
108
+
109
+ # ============================================================================
110
+ # OPTIONAL: ADDITIONAL USEFUL PACKAGES
111
+ # ============================================================================
112
+ # These might be helpful for specific GAIA questions but aren't required
113
+
114
+ # numpy
115
+ # For numerical computations if needed for advanced math questions
116
+
117
+ # matplotlib
118
+ # For creating charts/graphs if GAIA questions require visualizations
119
+
120
+ # beautifulsoup4
121
+ # For parsing HTML if web search results need detailed extraction
122
+
123
+ # ============================================================================
124
+ # DEVELOPMENT AND DEBUGGING (Optional)
125
+ # ============================================================================
126
+ # Uncomment these if you want enhanced debugging capabilities
127
+
128
+ # jupyter
129
+ # For interactive development and testing
130
+
131
+ # ipywidgets
132
+ # For enhanced Jupyter notebook widgets
133
+
134
+ # rich
135
+ # For beautiful terminal output and better logging
136
+
137
+ # ============================================================================
138
+ # INSTALLATION NOTES
139
+ # ============================================================================
140
+ #
141
+ # To install all dependencies:
142
+ # pip install -r requirements.txt
143
+ #
144
+ # For HuggingFace Spaces:
145
+ # - This file should be in your Space root directory
146
+ # - Dependencies are automatically installed when you deploy
147
+ # - No manual installation needed
148
+ #
149
+ # API Keys Setup:
150
+ # - Go to your Space Settings → Repository secrets
151
+ # - Add OPENAI_API_KEY (for OpenAI) or HF_TOKEN (for HuggingFace)
152
+ # - At least one is required for the agent to work
153
+ #
154
+ # Troubleshooting:
155
+ # - If imports fail, check that all packages installed correctly
156
+ # - Some packages may require specific versions for compatibility
157
+ # - Check the Space logs for detailed error messages
retriever.py ADDED
@@ -0,0 +1,526 @@
1
+ """
2
+ retriever.py - Advanced RAG Implementation with Personas Database
3
+
4
+ This file implements an advanced RAG system using:
5
+ 1. Real dataset from HuggingFace (dvilasuero/finepersonas-v0.1-tiny)
6
+ 2. Vector embeddings for semantic search
7
+ 3. ChromaDB for persistent vector storage
8
+ 4. LlamaIndex IngestionPipeline for processing
9
+
10
+ This demonstrates advanced course concepts:
11
+ - Dataset integration from HuggingFace
12
+ - Vector embeddings vs keyword search
13
+ - Persistent storage with ChromaDB
14
+ - Ingestion pipelines for data processing
15
+
16
+ Why this approach:
17
+ - 5K personas provide rich, diverse data
18
+ - Vector embeddings capture semantic meaning
19
+ - ChromaDB provides fast, persistent storage
20
+ - More realistic than simple guest database
21
+
22
+ download_and_prepare_personas() # Download 5K personas
23
+ load_persona_documents() # Load into documents
24
+ create_persona_index() # Create vector index
25
+ get_persona_query_engine() # For tools.py to use
26
+ """
27
+
28
+ import logging
29
+ import os
30
+ from typing import List, Dict, Any
31
+ from pathlib import Path
32
+
33
+ # LlamaIndex core components
34
+ from llama_index.core.schema import Document
35
+ from llama_index.core.tools import FunctionTool, QueryEngineTool
36
+ from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
37
+ from llama_index.core.node_parser import SentenceSplitter
38
+ from llama_index.core.ingestion import IngestionPipeline
39
+
40
+ # Embeddings and vector store
41
+ from llama_index.embeddings.huggingface import HuggingFaceEmbedding
42
+ from llama_index.vector_stores.chroma import ChromaVectorStore
43
+
44
+ # External libraries
45
+ from datasets import load_dataset
46
+ import chromadb
47
+
48
+ # Setup logging
49
+ logger = logging.getLogger(__name__)
50
+
51
+ # ============================================================================
52
+ # CONFIGURATION AND CONSTANTS
53
+ # ============================================================================
54
+
55
+ # Dataset configuration
56
+ DATASET_NAME = "dvilasuero/finepersonas-v0.1-tiny"
57
+ DATA_DIR = Path("data")
58
+ CHROMA_DB_PATH = "./alfred_chroma_db"
59
+ COLLECTION_NAME = "alfred"
60
+
61
+ # Embedding model - good balance of performance and speed
62
+ EMBEDDING_MODEL = "BAAI/bge-small-en-v1.5"
63
+
64
+ # Chunk size for text splitting - optimal for personas
65
+ CHUNK_SIZE = 1024
66
+ CHUNK_OVERLAP = 20
67
+
68
+ # ============================================================================
69
+ # DATA PREPARATION - Loading Personas from HuggingFace
70
+ # ============================================================================
71
+
72
+ def download_and_prepare_personas() -> int:
73
+ """
74
+ Download personas from HuggingFace and save as individual text files.
75
+
76
+ This approach demonstrates:
77
+ 1. Dataset integration from HuggingFace Hub
78
+ 2. Local file preparation for SimpleDirectoryReader
79
+ 3. Data persistence for repeated runs
80
+
81
+ Why save as files:
82
+ - SimpleDirectoryReader expects file-based input
83
+ - Allows for easy inspection and debugging
84
+ - Caches data locally to avoid repeated downloads
85
+ - Mimics real-world scenario where you have document files
86
+
87
+ Returns:
88
+ int: Number of persona files created
89
+ """
90
+ logger.info(f"Starting persona data preparation...")
91
+
92
+ # Create data directory if it doesn't exist
93
+ DATA_DIR.mkdir(parents=True, exist_ok=True)
94
+
95
+ # Check if we already have data (avoid re-downloading)
96
+ existing_files = list(DATA_DIR.glob("persona_*.txt"))
97
+ if existing_files:
98
+ logger.info(f"Found {len(existing_files)} existing persona files, skipping download")
99
+ return len(existing_files)
100
+
101
+ try:
102
+ # Load the dataset from HuggingFace
103
+ logger.info(f"Loading dataset: {DATASET_NAME}")
104
+ dataset = load_dataset(path=DATASET_NAME, split="train")
105
+ logger.info(f"Dataset loaded successfully with {len(dataset)} personas")
106
+
107
+ # Save each persona as a separate text file
108
+ personas_created = 0
109
+ for i, persona_data in enumerate(dataset):
110
+ persona_file = DATA_DIR / f"persona_{i}.txt"
111
+
112
+ # Extract the persona text
113
+ persona_text = persona_data["persona"]
114
+
115
+ # Add some metadata to make the persona more searchable
116
+ enhanced_text = f"Persona {i}:\n{persona_text}"
117
+
118
+ # Write to file
119
+ with open(persona_file, "w", encoding="utf-8") as f:
120
+ f.write(enhanced_text)
121
+
122
+ personas_created += 1
123
+
124
+ # Log progress for large datasets
125
+ if personas_created % 1000 == 0:
126
+ logger.info(f"Created {personas_created} persona files...")
127
+
128
+ logger.info(f"✅ Successfully created {personas_created} persona files")
129
+ return personas_created
130
+
131
+ except Exception as e:
132
+ logger.error(f"❌ Error downloading personas: {e}")
133
+ raise RuntimeError(f"Failed to download personas: {e}")
134
+
135
+
136
+ # ============================================================================
137
+ # DOCUMENT LOADING - Converting Files to LlamaIndex Documents
138
+ # ============================================================================
139
+
140
+ def load_persona_documents() -> List[Document]:
141
+ """
142
+ Load persona files into LlamaIndex Document objects.
143
+
144
+ This demonstrates:
145
+ 1. SimpleDirectoryReader usage for file loading
146
+ 2. Document object creation and metadata handling
147
+ 3. Error handling for file operations
148
+
149
+ Why SimpleDirectoryReader:
150
+ - Handles multiple file formats automatically
151
+ - Preserves file metadata (filename, path, etc.)
152
+ - Integrates seamlessly with LlamaIndex pipeline
153
+ - Scales well for large document collections
154
+
155
+ Returns:
156
+ List[Document]: List of loaded persona documents
157
+ """
158
+ logger.info("Loading persona documents...")
159
+
160
+ # Ensure we have persona data
161
+ if not DATA_DIR.exists() or not list(DATA_DIR.glob("persona_*.txt")):
162
+ logger.info("No persona files found, downloading...")
163
+ download_and_prepare_personas()
164
+
165
+ try:
166
+ # Use SimpleDirectoryReader to load all text files
167
+ reader = SimpleDirectoryReader(input_dir=str(DATA_DIR))
168
+ documents = reader.load_data()
169
+
170
+ logger.info(f"✅ Loaded {len(documents)} persona documents")
171
+
172
+ # Log some statistics about the documents
173
+ if documents:
174
+ total_chars = sum(len(doc.text) for doc in documents)
175
+ avg_chars = total_chars / len(documents)
176
+ logger.info(f"Average document length: {avg_chars:.0f} characters")
177
+
178
+ return documents
179
+
180
+ except Exception as e:
181
+ logger.error(f"❌ Error loading documents: {e}")
182
+ raise RuntimeError(f"Failed to load persona documents: {e}")
183
+
184
+
185
+ # ============================================================================
186
+ # VECTOR STORE SETUP - ChromaDB Configuration
187
+ # ============================================================================
188
+
189
+ def setup_chroma_vector_store():
190
+ """
191
+ Set up ChromaDB vector store for persistent storage.
192
+
193
+ This demonstrates:
194
+ 1. Persistent vector database configuration
195
+ 2. Collection management
196
+ 3. Integration with LlamaIndex vector stores
197
+
198
+ Why ChromaDB:
199
+ - Persistent storage (survives application restarts)
200
+ - Fast vector similarity search
201
+ - Easy integration with LlamaIndex
202
+ - Good for development and production
203
+ - No external dependencies (self-contained)
204
+
205
+ Returns:
206
+ ChromaVectorStore: Configured vector store ready for use
207
+ """
208
+ logger.info("Setting up ChromaDB vector store...")
209
+
210
+ try:
211
+ # Create persistent ChromaDB client
212
+ # This creates a local database that persists between runs
213
+        db = chromadb.PersistentClient(path=CHROMA_DB_PATH)
+        logger.info(f"ChromaDB client created at: {CHROMA_DB_PATH}")
+
+        # Get or create the collection for our personas.
+        # Collections are like tables in a traditional database.
+        chroma_collection = db.get_or_create_collection(name=COLLECTION_NAME)
+        logger.info(f"Using collection: {COLLECTION_NAME}")
+
+        # Wrap the ChromaDB collection in a LlamaIndex vector store
+        vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
+
+        logger.info("✅ ChromaDB vector store configured successfully")
+        return vector_store
+
+    except Exception as e:
+        logger.error(f"❌ Error setting up ChromaDB: {e}")
+        raise RuntimeError(f"Failed to set up ChromaDB: {e}")
+
+
+# ============================================================================
+# INGESTION PIPELINE - Document Processing with Embeddings
+# ============================================================================
+
+def create_ingestion_pipeline(vector_store) -> IngestionPipeline:
+    """
+    Create an ingestion pipeline for processing persona documents.
+
+    This demonstrates:
+    1. Text chunking with SentenceSplitter
+    2. Embedding generation with HuggingFace models
+    3. Pipeline composition for complex processing
+
+    The pipeline:
+    1. Splits documents into smaller chunks (better for retrieval)
+    2. Generates a vector embedding for each chunk
+    3. Stores the embeddings in the vector database
+
+    Why this approach:
+    - Chunking improves retrieval precision
+    - Embeddings capture semantic meaning
+    - The pipeline caches results for efficiency
+    - Modular design allows easy modification
+
+    Args:
+        vector_store: ChromaDB vector store for persistence
+
+    Returns:
+        IngestionPipeline: Configured pipeline ready for document processing
+    """
+    logger.info("Creating ingestion pipeline...")
+
+    try:
+        # Create the text splitter.
+        # SentenceSplitter respects sentence boundaries for better coherence.
+        text_splitter = SentenceSplitter(
+            chunk_size=CHUNK_SIZE,       # Target chunk size (in tokens)
+            chunk_overlap=CHUNK_OVERLAP  # Overlap to maintain context across chunks
+        )
+        logger.info(f"Text splitter configured: chunk_size={CHUNK_SIZE}, overlap={CHUNK_OVERLAP}")
+
+        # Create the embedding model.
+        # It converts text into numerical vectors that capture meaning.
+        embed_model = HuggingFaceEmbedding(model_name=EMBEDDING_MODEL)
+        logger.info(f"Embedding model configured: {EMBEDDING_MODEL}")
+
+        # Create the ingestion pipeline.
+        # Documents flow through the transformations in order.
+        pipeline = IngestionPipeline(
+            transformations=[
+                text_splitter,  # First: split into chunks
+                embed_model,    # Second: create embeddings
+            ],
+            vector_store=vector_store  # Finally: store in the database
+        )
+
+        logger.info("✅ Ingestion pipeline created successfully")
+        return pipeline
+
+    except Exception as e:
+        logger.error(f"❌ Error creating ingestion pipeline: {e}")
+        raise RuntimeError(f"Failed to create ingestion pipeline: {e}")
+
+
+# ============================================================================
+# INDEX CREATION - Vector Search Index
+# ============================================================================
+
+def create_persona_index():
+    """
+    Create or load the persona vector index.
+
+    This is the main function that orchestrates the entire RAG setup:
+    1. Load documents from files
+    2. Set up vector storage
+    3. Process documents through the pipeline
+    4. Create the searchable index
+
+    The index enables semantic search where:
+    - Similar meanings are found even when the wording differs
+    - Context and relationships are preserved
+    - Retrieval stays fast across thousands of personas
+
+    Returns:
+        VectorStoreIndex: Ready-to-use search index
+    """
+    logger.info("Creating persona search index...")
+
+    try:
+        # Step 1: Load persona documents
+        documents = load_persona_documents()
+        if not documents:
+            raise RuntimeError("No documents loaded")
+
+        # Step 2: Set up the vector store
+        vector_store = setup_chroma_vector_store()
+
+        # Step 3: Check whether we already have processed data.
+        # This saves time on repeated runs.
+        try:
+            # Try to create an index from the existing vector store
+            embed_model = HuggingFaceEmbedding(model_name=EMBEDDING_MODEL)
+            existing_index = VectorStoreIndex.from_vector_store(
+                vector_store=vector_store,
+                embed_model=embed_model
+            )
+
+            # Test whether the index has data
+            test_retriever = existing_index.as_retriever(similarity_top_k=1)
+            test_results = test_retriever.retrieve("test query")
+
+            if test_results:
+                logger.info("✅ Found existing persona index with data")
+                return existing_index
+            else:
+                logger.info("Existing index is empty, rebuilding...")
+
+        except Exception:
+            logger.info("No existing index found, creating a new one...")
+
+        # Step 4: Process documents through the ingestion pipeline
+        pipeline = create_ingestion_pipeline(vector_store)
+
+        logger.info(f"Processing {len(documents)} documents through the pipeline...")
+        # This may take a while for large datasets, as it generates embeddings
+        nodes = pipeline.run(documents=documents)
+        logger.info(f"✅ Processed {len(nodes)} document chunks")
+
+        # Step 5: Create the final index
+        embed_model = HuggingFaceEmbedding(model_name=EMBEDDING_MODEL)
+        index = VectorStoreIndex.from_vector_store(
+            vector_store=vector_store,
+            embed_model=embed_model
+        )
+
+        logger.info("✅ Persona index created successfully")
+        return index
+
+    except Exception as e:
+        logger.error(f"❌ Error creating persona index: {e}")
+        raise RuntimeError(f"Failed to create persona index: {e}")
+
+
+# ============================================================================
+# MAIN FUNCTIONS USED BY TOOLS.PY
+# ============================================================================
+# These are the core functions that tools.py uses to access the persona database.
+# Tool creation is handled in tools.py, following the course structure.
+
+def get_persona_index():
+    """
+    Get the persona index for use by tools.py.
+
+    This is a simple wrapper function that tools.py can import and use.
+    It ensures the index is created and ready for use.
+
+    Returns:
+        VectorStoreIndex: The persona database index
+    """
+    return create_persona_index()
+
+
+def get_persona_query_engine():
+    """
+    Get a configured query engine for the persona database.
+
+    This creates a query engine ready for use in a QueryEngineTool.
+    tools.py can import this to create the persona database tool.
+
+    Returns:
+        QueryEngine: Configured query engine for the persona database
+    """
+    try:
+        # Get the index (it already carries the embedding model used for indexing)
+        index = create_persona_index()
+
+        # Create the query engine with settings that work well here
+        query_engine = index.as_query_engine(
+            response_mode="tree_summarize",  # Good for combining multiple sources
+            similarity_top_k=5,              # Retrieve the 5 most relevant personas
+            streaming=False                  # Disable streaming for stability
+        )
+
+        logger.info("✅ Persona query engine ready for tools.py")
+        return query_engine
+
+    except Exception as e:
+        logger.error(f"❌ Error creating query engine for tools.py: {e}")
+        raise
+
+
+# ============================================================================
+# TESTING AND DEBUGGING FUNCTIONS
+# ============================================================================
+
+def test_persona_system():
+    """
+    Test the persona system components available in retriever.py.
+    This helps verify that the database setup is working correctly.
+
+    Note: Tool creation testing lives in tools.py, since that is where tools are created.
+    """
+    print("\n=== Testing Persona Database System ===")
+
+    # Test data preparation
+    print("\n--- Testing Data Preparation ---")
+    try:
+        count = download_and_prepare_personas()
+        print(f"✅ Data preparation successful: {count} personas")
+    except Exception as e:
+        print(f"❌ Data preparation failed: {e}")
+        return
+
+    # Test document loading
+    print("\n--- Testing Document Loading ---")
+    try:
+        docs = load_persona_documents()
+        print(f"✅ Document loading successful: {len(docs)} documents")
+    except Exception as e:
+        print(f"❌ Document loading failed: {e}")
+        return
+
+    # Test index creation
+    print("\n--- Testing Index Creation ---")
+    try:
+        index = create_persona_index()
+        print("✅ Index creation successful")
+    except Exception as e:
+        print(f"❌ Index creation failed: {e}")
+        return
+
+    # Test basic retrieval (without the tool wrapper)
+    print("\n--- Testing Basic Retrieval ---")
+    test_queries = [
+        "writers and authors",
+        "people interested in travel",
+        "scientists and researchers"
+    ]
+
+    try:
+        retriever = index.as_retriever(similarity_top_k=2)
+
+        for query in test_queries:
+            print(f"\nQuery: {query}")
+            try:
+                results = retriever.retrieve(query)
+                if results:
+                    print(f"✅ Found {len(results)} results")
+                    print(f"Sample: {results[0].text[:100]}...")
+                else:
+                    print("No results found")
+            except Exception as e:
+                print(f"❌ Query failed: {e}")
+
+    except Exception as e:
+        print(f"❌ Retriever creation failed: {e}")
+
+    # Test query engine creation (for tools.py)
+    print("\n--- Testing Query Engine Creation ---")
+    try:
+        query_engine = get_persona_query_engine()
+        print("✅ Query engine creation successful")
+        print("   (This query engine can be used by tools.py)")
+    except Exception as e:
+        print(f"❌ Query engine creation failed: {e}")
+
+    print("\n=== Database System Testing Complete ===")
+    print("\nNote: For tool testing, run tools.py or usage_example.py")
+
+
+# ============================================================================
+# MAIN EXECUTION
+# ============================================================================
+
+if __name__ == "__main__":
+    # When this file is run directly, run the tests
+    print("Persona Database System Testing")
+    print("=" * 50)
+
+    # Set up logging for testing
+    logging.basicConfig(level=logging.INFO)
+
+    # Run the database system tests
+    test_persona_system()
+
+    print("\n" + "=" * 50)
+    print("Database testing complete!")
+    print("\nFor tool testing, run:")
+    print("  python tools.py")
+    print("  python usage_example.py")
+    print("\nFor full agent testing, run:")
+    print("  python app.py")
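The retrieval step above rests on one idea: persona chunks and queries both become vectors, and relevance is the cosine similarity between them. A minimal, dependency-free sketch of that ranking step (toy 3-dimensional vectors stand in for real embeddings, and `rank` is a hypothetical helper, not part of retriever.py — the actual index delegates this to ChromaDB):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def rank(query_vec, chunk_vecs, top_k=2):
    # The same "similarity_top_k" idea the retriever uses: score every
    # stored chunk against the query vector and keep the best top_k.
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    return [i for _, i in sorted(scored, reverse=True)[:top_k]]

chunks = [
    [0.9, 0.1, 0.0],  # chunk 0: e.g. "travel blogger"
    [0.1, 0.9, 0.1],  # chunk 1: e.g. "research scientist"
    [0.8, 0.2, 0.1],  # chunk 2: e.g. "flight attendant"
]
query = [1.0, 0.0, 0.0]     # query: "people interested in travel"
print(rank(query, chunks))  # → [0, 2]: the two travel-adjacent chunks
```

Real embedding vectors have hundreds of dimensions and come from the `HuggingFaceEmbedding` model, but the scoring logic is the same.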
tools.py ADDED
@@ -0,0 +1,656 @@
+"""
+tools.py - Agent Tools for GAIA Benchmark (Course Didactic Structure)
+
+This file follows the course approach of separating:
+1. Raw functions (the actual functionality)
+2. Tool wrappers (FunctionTool and QueryEngineTool creation)
+
+This makes each component easier to understand and debug in isolation.
+Each tool addresses a specific GAIA benchmark need while demonstrating course concepts.
+
+Key entry points:
+    create_persona_database_tool()  # QueryEngineTool creation
+    get_all_tools()                 # Collection of all tools
+"""
+
+import logging
+import math
+import os
+import random
+from typing import List
+
+import chromadb
+
+# LlamaIndex imports
+from llama_index.core.tools import FunctionTool, QueryEngineTool
+from llama_index.core import VectorStoreIndex
+from llama_index.embeddings.huggingface import HuggingFaceEmbedding
+from llama_index.vector_stores.chroma import ChromaVectorStore
+from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
+
+# Set up logging
+logger = logging.getLogger(__name__)
+
+# ============================================================================
+# PART 1: RAW FUNCTIONS (The actual functionality)
+# ============================================================================
+# These are the core functions that do the actual work.
+# They can be tested independently and are easy to understand.
+
+def web_search(query: str) -> str:
+    """
+    Search the web for information using DuckDuckGo.
+
+    This function handles the actual web searching logic.
+    Critical for GAIA questions requiring current information.
+
+    Args:
+        query (str): The search query/question
+
+    Returns:
+        str: Formatted search results with titles, content, and URLs
+
+    Why this is essential for GAIA:
+    - Many GAIA questions need current information (news, prices, events)
+    - LLMs have knowledge cutoffs and may not know recent facts
+    - Web search provides access to the latest information
+    """
+    logger.info(f"🔍 Web search requested: {query}")
+
+    try:
+        # Import DuckDuckGo search - a free search API
+        from duckduckgo_search import DDGS
+
+        # Perform the search with a reasonable limit
+        with DDGS() as ddgs:
+            # Keep the top 3 results to avoid overwhelming the LLM
+            results = list(ddgs.text(query, max_results=3))
+
+        if not results:
+            logger.warning("No search results found")
+            return "No search results found for this query."
+
+        # Format the results in a clean, readable way
+        formatted_results = []
+        for i, result in enumerate(results, 1):
+            formatted_result = (
+                f"Result {i}:\n"
+                f"Title: {result['title']}\n"
+                f"Content: {result['body']}\n"
+                f"URL: {result['href']}\n"
+            )
+            formatted_results.append(formatted_result)
+
+        final_result = "\n".join(formatted_results)
+        logger.info(f"✅ Web search completed: {len(results)} results found")
+        return final_result
+
+    except ImportError:
+        error_msg = "DuckDuckGo search library not available. Please install duckduckgo-search."
+        logger.error(error_msg)
+        return error_msg
+    except Exception as e:
+        error_msg = f"Search error: {str(e)}"
+        logger.error(f"Web search failed: {e}")
+        return error_msg
+
+
+def calculate(expression: str) -> str:
+    """
+    Safely evaluate mathematical expressions.
+
+    This function handles mathematical calculations with safety measures.
+    CRITICAL for GAIA because many questions involve precise calculations.
+
+    Args:
+        expression (str): Mathematical expression (e.g., "2 + 2", "sqrt(16)", "sin(pi/2)")
+
+    Returns:
+        str: The result of the calculation, or an error message
+
+    Why this is essential for GAIA:
+    - GAIA has many mathematical questions (percentages, conversions, etc.)
+    - LLMs can make arithmetic errors, especially with complex math
+    - Exact numerical accuracy is required (GAIA uses exact-match scoring)
+
+    Examples:
+        calculate("2 + 2") → "4"
+        calculate("0.15 * 847") → "127.05" (i.e., 15% of 847)
+        calculate("sqrt(16)") → "4.0"
+    """
+    logger.info(f"🧮 Calculation requested: {expression}")
+
+    try:
+        # Create a safe environment for evaluation:
+        # only mathematical names are allowed, no dangerous operations.
+        allowed_names = {
+            # All math module functions (sin, cos, sqrt, log, etc.)
+            k: v for k, v in math.__dict__.items() if not k.startswith("__")
+        }
+
+        # Add safe Python built-ins
+        allowed_names.update({
+            "abs": abs,      # Absolute value
+            "round": round,  # Rounding
+            "min": min,      # Minimum
+            "max": max,      # Maximum
+            "sum": sum,      # Sum of iterables
+            "pow": pow,      # Power function
+        })
+
+        # Add mathematical constants
+        allowed_names.update({
+            "pi": math.pi,  # π
+            "e": math.e,    # Euler's number
+        })
+
+        # Evaluate the expression safely.
+        # "__builtins__": {} blocks dangerous functions like open() and exec().
+        result = eval(expression, {"__builtins__": {}}, allowed_names)
+
+        result_str = str(result)
+        logger.info(f"✅ Calculation result: {expression} = {result_str}")
+        return result_str
+
+    except ZeroDivisionError:
+        error_msg = "Error: Division by zero"
+        logger.error(error_msg)
+        return error_msg
+    except ValueError as e:
+        error_msg = f"Error: Invalid mathematical operation - {str(e)}"
+        logger.error(error_msg)
+        return error_msg
+    except SyntaxError:
+        error_msg = "Error: Invalid mathematical expression syntax"
+        logger.error(error_msg)
+        return error_msg
+    except Exception as e:
+        error_msg = f"Calculation error: {str(e)}"
+        logger.error(f"Unexpected calculation error: {e}")
+        return error_msg
+
+
+def analyze_file(file_content: str, file_type: str = "text") -> str:
+    """
+    Analyze file content and extract relevant information.
+
+    This function processes different file types for analysis.
+    Useful for GAIA questions that include file attachments.
+
+    Args:
+        file_content (str): The content of the file
+        file_type (str): Type of file ("text", "csv", "json", etc.)
+
+    Returns:
+        str: Analysis results or extracted information
+
+    Why this helps with GAIA:
+    - Some GAIA questions include data files to analyze
+    - Questions might ask for statistics, summaries, or specific data extraction
+    - File processing demonstrates practical data-analysis skills
+    """
+    logger.info(f"📊 File analysis requested for {file_type} file")
+
+    try:
+        if file_type.lower() == "csv":
+            # For CSV files, provide basic statistics
+            lines = file_content.strip().split('\n')
+            if not lines or not lines[0]:
+                return "Empty file"
+
+            # Count rows and columns (assuming the first row is a header)
+            num_rows = len(lines) - 1  # Subtract the header row
+            num_cols = len(lines[0].split(','))
+            analysis = (
+                f"CSV Analysis:\n"
+                f"- Rows: {num_rows}\n"
+                f"- Columns: {num_cols}\n"
+                f"- Headers: {lines[0]}"
+            )
+            if num_rows > 0:
+                analysis += f"\n- First data row: {lines[1]}"
+            return analysis
+
+        elif file_type.lower() in ["txt", "text"]:
+            # For text files, provide basic statistics
+            lines = file_content.split('\n')
+            words = file_content.split()
+            chars = len(file_content)
+
+            return (
+                f"Text Analysis:\n"
+                f"- Lines: {len(lines)}\n"
+                f"- Words: {len(words)}\n"
+                f"- Characters: {chars}"
+            )
+
+        else:
+            # For other file types, return the content with basic info
+            preview = file_content[:1000] + '...' if len(file_content) > 1000 else file_content
+            return f"File content ({file_type}):\n{preview}"
+
+    except Exception as e:
+        error_msg = f"File analysis error: {str(e)}"
+        logger.error(error_msg)
+        return error_msg
+
+
+def get_weather(location: str) -> str:
+    """
+    Get dummy weather information for a location.
+
+    This is a simplified weather function for demonstration.
+    A real implementation would call a weather API such as OpenWeatherMap.
+
+    Args:
+        location (str): City or location name
+
+    Returns:
+        str: Weather description with temperature
+
+    Note: This is a dummy implementation for course purposes.
+    Real weather data would require an API key and an actual weather service.
+    """
+    logger.info(f"🌤️ Weather requested for: {location}")
+
+    # Dummy weather data for demonstration
+    weather_conditions = [
+        {"condition": "Sunny", "temp_c": 25, "humidity": 60},
+        {"condition": "Cloudy", "temp_c": 20, "humidity": 70},
+        {"condition": "Rainy", "temp_c": 15, "humidity": 85},
+        {"condition": "Windy", "temp_c": 22, "humidity": 55},
+        {"condition": "Clear", "temp_c": 28, "humidity": 45}
+    ]
+
+    # Randomly select weather (a real implementation would make an API call here)
+    weather = random.choice(weather_conditions)
+
+    result = (
+        f"Weather in {location.title()}:\n"
+        f"Condition: {weather['condition']}\n"
+        f"Temperature: {weather['temp_c']}°C\n"
+        f"Humidity: {weather['humidity']}%"
+    )
+
+    logger.info(f"✅ Weather result: {weather['condition']}, {weather['temp_c']}°C")
+    return result
+
+
+# ============================================================================
+# PART 2: PERSONA DATABASE SETUP (QueryEngine creation)
+# ============================================================================
+# This sets up the persona database query engine following the course pattern.
+
+def create_persona_query_engine():
+    """
+    Create a query engine for the persona database, following the course pattern:
+    1. Connect to the existing ChromaDB database
+    2. Create a VectorStoreIndex from the stored vectors
+    3. Configure an LLM for response generation
+    4. Create a QueryEngine with specific settings
+
+    Returns:
+        QueryEngine: Ready-to-use query engine for the persona database
+
+    Why a QueryEngine instead of simple retrieval:
+    - A QueryEngine combines retrieval with LLM generation
+    - It provides natural, conversational responses
+    - It can synthesize information from multiple personas
+    - It handles complex questions that require reasoning
+    """
+    logger.info("🏗️ Creating persona database query engine...")
+
+    try:
+        # Step 1: Connect to the existing ChromaDB (created by retriever.py).
+        # Reuse retriever.py's constants so the path, collection name, and
+        # embedding model always match the ones used during indexing.
+        from retriever import CHROMA_DB_PATH, COLLECTION_NAME, EMBEDDING_MODEL
+
+        db = chromadb.PersistentClient(path=CHROMA_DB_PATH)
+        chroma_collection = db.get_or_create_collection(COLLECTION_NAME)
+        vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
+        logger.info("✅ Connected to ChromaDB")
+
+        # Step 2: Set up the embedding model (same one used during indexing)
+        embed_model = HuggingFaceEmbedding(model_name=EMBEDDING_MODEL)
+        logger.info("✅ Embedding model configured")
+
+        # Step 3: Create a VectorStoreIndex from the existing data
+        index = VectorStoreIndex.from_vector_store(
+            vector_store=vector_store,
+            embed_model=embed_model
+        )
+        logger.info("✅ Vector index created")
+
+        # Step 4: Configure the LLM for response generation.
+        # Try the globally configured LLM first, then fall back.
+        try:
+            from llama_index.core import Settings
+            llm = Settings.llm
+
+            if llm is None:
+                # Fall back to a HuggingFace Inference API LLM
+                hf_token = os.getenv("HF_TOKEN")
+                if hf_token:
+                    llm = HuggingFaceInferenceAPI(
+                        model_name="Qwen/Qwen2.5-Coder-32B-Instruct",
+                        token=hf_token,
+                        max_new_tokens=512,
+                        temperature=0.1
+                    )
+                    logger.info("✅ Using HuggingFace LLM")
+                else:
+                    logger.warning("⚠️ No LLM available, query engine will use the default")
+                    llm = None
+        except Exception:
+            logger.warning("⚠️ Could not configure LLM, using the default")
+            llm = None
+
+        # Step 5: Create the QueryEngine with optimized settings
+        query_engine = index.as_query_engine(
+            llm=llm,
+            response_mode="tree_summarize",  # Good for combining multiple sources
+            similarity_top_k=5,              # Retrieve the 5 most relevant personas
+            streaming=False                  # Disable streaming for stability
+        )
+
+        logger.info("✅ Persona query engine created successfully")
+        return query_engine
+
+    except Exception as e:
+        logger.error(f"❌ Error creating persona query engine: {e}")
+        raise RuntimeError(f"Failed to create persona query engine: {e}")
+
+
+# ============================================================================
+# PART 3: TOOL WRAPPERS (Converting functions to tools)
+# ============================================================================
+# This section creates the actual tools that the agent can use.
+# Each tool wraps a function with metadata the LLM uses to decide when to call it.
+
+# Web Search Tool
+web_search_tool = FunctionTool.from_defaults(
+    fn=web_search,
+    name="web_search",
+    description=(
+        "Search the web for current information, recent events, statistics, "
+        "facts, or any information not in the LLM's training data. "
+        "Use this when you need up-to-date or specific factual information. "
+        "Essential for GAIA questions about current events, prices, or recent developments."
+    )
+)
+
+# Calculator Tool
+calculator_tool = FunctionTool.from_defaults(
+    fn=calculate,
+    name="calculator",
+    description=(
+        "Perform mathematical calculations and evaluate mathematical expressions. "
+        "Supports basic arithmetic (+, -, *, /), advanced math functions (sqrt, sin, cos, log), "
+        "and mathematical constants (pi, e). Use this for any numerical computations, "
+        "percentage calculations, unit conversions, or statistical operations. "
+        "CRITICAL for GAIA mathematical questions to ensure accuracy."
+    )
+)
+
+# File Analysis Tool
+file_analysis_tool = FunctionTool.from_defaults(
+    fn=analyze_file,
+    name="file_analyzer",
+    description=(
+        "Analyze file contents including CSV files, text files, and other data files. "
+        "Can extract statistics, summarize content, and process structured data. "
+        "Use this when GAIA questions involve analyzing attached files or datasets."
+    )
+)
+
+# Weather Tool (demonstration)
+weather_tool = FunctionTool.from_defaults(
+    fn=get_weather,
+    name="weather_tool",
+    description=(
+        "Get weather information for a specific location. "
+        "Note: This is a demo implementation with dummy data. "
+        "Use when questions ask about weather conditions."
+    )
+)
+
+# Persona Database Query Engine Tool
+def create_persona_database_tool():
+    """
+    Create the persona database tool using QueryEngineTool.
+
+    This follows the course pattern for creating a QueryEngineTool:
+    the tool combines retrieval with LLM generation for natural responses.
+
+    Returns:
+        QueryEngineTool: Tool for querying the persona database, or None on failure
+    """
+    logger.info("🛠️ Creating persona database tool...")
+
+    try:
+        # First make sure the persona data exists (this creates it if needed)
+        try:
+            from retriever import create_persona_index
+            # This creates the index if it doesn't exist yet
+            create_persona_index()
+            logger.info("✅ Persona index ready")
+        except Exception as e:
+            logger.warning(f"⚠️ Could not ensure persona index: {e}")
+
+        # Create the query engine
+        query_engine = create_persona_query_engine()
+
+        # Create the QueryEngineTool following the course pattern
+        persona_tool = QueryEngineTool.from_defaults(
+            query_engine=query_engine,
+            name="persona_database",
+            description=(
+                "Search and query a database of 5000 diverse personas with various backgrounds, "
+                "interests, and professions. Use this to find people with specific characteristics, "
+                "skills, or interests. Can answer questions like 'find writers', 'who likes travel', "
+                "'scientists in the group', 'creative professionals', or 'people interested in technology'. "
+                "Returns detailed information about matching personas with their backgrounds and interests."
+            )
+        )
+
+        logger.info("✅ Persona database tool created successfully")
+        return persona_tool
+
+    except Exception as e:
+        logger.error(f"❌ Error creating persona database tool: {e}")
+        # Return None so the agent can still work without this tool
+        return None
+
+
+# ============================================================================
+# PART 4: TOOL COLLECTION (Getting all tools together)
+# ============================================================================
+
+def get_all_tools() -> List:
+    """
+    Get all available tools for the GAIA agent.
+
+    This function collects all tools and handles creation errors gracefully,
+    so the agent works with whatever tools are successfully created.
+
+    Returns:
+        List: All successfully created tools
+    """
+    logger.info("🔧 Collecting all tools...")
+
+    tools = []
+
+    # Add the function-based tools (these should always work)
+    try:
+        tools.extend([
+            web_search_tool,
+            calculator_tool,
+            file_analysis_tool,
+            weather_tool
+        ])
+        logger.info(f"✅ Added {len(tools)} function-based tools")
+    except Exception as e:
+        logger.error(f"❌ Error adding function tools: {e}")
+
+    # Add the persona database tool (this may fail if the database isn't ready)
+    try:
+        persona_tool = create_persona_database_tool()
+        if persona_tool:
+            tools.append(persona_tool)
+            logger.info("✅ Added persona database tool")
+        else:
+            logger.warning("⚠️ Persona database tool not available")
+    except Exception as e:
+        logger.warning(f"⚠️ Could not create persona database tool: {e}")
+
+    logger.info(f"🎯 Total tools available: {len(tools)}")
+    for tool in tools:
+        tool_name = getattr(tool.metadata, 'name', 'Unknown')
+        logger.info(f"  - {tool_name}")
+
+    return tools
+
+
+# ============================================================================
+# PART 5: TESTING FUNCTIONS (For development and debugging)
+# ============================================================================
+
+def test_individual_functions():
+    """
+    Test each function individually to make sure it works.
+    This helps with debugging and understanding what each function does.
+    """
+    print("\n=== Testing Individual Functions ===")
+
+    # Test web search
+    print("\n--- Testing Web Search Function ---")
+    try:
+        result = web_search("current year")
+        print(f"Web search result: {result[:150]}...")
+        print("✅ Web search function works")
+    except Exception as e:
+        print(f"❌ Web search failed: {e}")
+
+    # Test calculator
+    print("\n--- Testing Calculator Function ---")
+    try:
+        result = calculate("2 + 2 * 3")
+        print(f"Calculator result (2 + 2 * 3): {result}")
+        result = calculate("sqrt(16)")
+        print(f"Calculator result (sqrt(16)): {result}")
+        print("✅ Calculator function works")
+    except Exception as e:
+        print(f"❌ Calculator failed: {e}")
+
+    # Test file analyzer
+    print("\n--- Testing File Analysis Function ---")
+    try:
+        sample_csv = "name,age,city\nJohn,25,NYC\nJane,30,LA\nBob,35,SF"
+        result = analyze_file(sample_csv, "csv")
+        print(f"File analysis result: {result}")
+        print("✅ File analysis function works")
+    except Exception as e:
+        print(f"❌ File analysis failed: {e}")
+
+    # Test weather
+    print("\n--- Testing Weather Function ---")
+    try:
+        result = get_weather("Paris")
+        print(f"Weather result: {result}")
+        print("✅ Weather function works")
+    except Exception as e:
+        print(f"❌ Weather failed: {e}")
+
+
+def test_tool_creation():
+    """
+    Test that all tools can be created successfully.
+    """
+    print("\n=== Testing Tool Creation ===")
+
+    try:
+        tools = get_all_tools()
+        print(f"✅ Successfully created {len(tools)} tools")
+
+        for tool in tools:
+            tool_name = getattr(tool.metadata, 'name', 'Unknown')
+            tool_desc = getattr(tool.metadata, 'description', 'No description')[:100]
+            print(f"  - {tool_name}: {tool_desc}...")
+
+    except Exception as e:
+        print(f"❌ Tool creation failed: {e}")
+
+
+def test_tool_functionality():
+    """
+    Test that tools can actually be called and return results.
+    """
+    print("\n=== Testing Tool Functionality ===")
+
+    tools = get_all_tools()
+
+    for tool in tools:
+        tool_name = getattr(tool.metadata, 'name', 'Unknown')
+        print(f"\n--- Testing {tool_name} ---")
+
+        try:
+            if tool_name == "calculator":
+                # Test the calculator tool (FunctionTool exposes its callable as .fn)
+                result = tool.fn("5 * 8")
+                print(f"Calculator test (5 * 8): {result}")
+
+            elif tool_name == "web_search":
+                # Test web search (might be slow)
+                print("Testing web search (this might take a moment)...")
+                result = tool.fn("Python programming")
+                print(f"Web search test: {result[:100]}...")
+
+            elif tool_name == "file_analyzer":
+                # Test the file analyzer
+                test_data = "col1,col2\nval1,val2\nval3,val4"
+                result = tool.fn(test_data, "csv")
+                print(f"File analyzer test: {result}")
+
+            elif tool_name == "weather_tool":
+                # Test the weather tool
+                result = tool.fn("London")
+                print(f"Weather test: {result}")
+
+            elif tool_name == "persona_database":
+                # Test the persona database (might be slow on first run)
+                print("Testing persona database (this might take a moment)...")
+                # This would be an async call in real usage
+                print("Persona database test skipped (requires async)")
+
+            print(f"✅ {tool_name} test completed")
+
+        except Exception as e:
+            print(f"❌ {tool_name} test failed: {e}")
+
+
+# ============================================================================
+# MAIN EXECUTION (For testing when this file is run directly)
+# ============================================================================
+
+if __name__ == "__main__":
+    print("GAIA Agent Tools Testing")
+    print("=" * 50)
+
+    # Set up logging for testing
+    logging.basicConfig(level=logging.INFO)
+
+    # Test individual functions first
+    test_individual_functions()
+
+    # Test tool creation
+    test_tool_creation()
+
+    # Test tool functionality (optional - can be slow)
+    response = input("\nRun tool functionality tests? (y/n): ")
648
+ if response.lower() == 'y':
649
+ test_tool_functionality()
650
+ else:
651
+ print("Skipping functionality tests")
652
+
653
+ print("\n=== Tools Testing Complete ===")
654
+ print("\nTo use these tools in your agent:")
655
+ print("from tools import get_all_tools")
656
+ print("tools = get_all_tools()")
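The smoke-test loop above (look up `tool.metadata.name`, call the tool, catch failures per tool) is a reusable pattern. A minimal, self-contained sketch of that pattern follows; note that `SimpleTool` and `ToolMetadata` here are hypothetical stand-ins for LlamaIndex's `FunctionTool` wrapper, used only to keep the example runnable without the library:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolMetadata:
    name: str
    description: str

@dataclass
class SimpleTool:
    # Stand-in for llama_index's FunctionTool: exposes .metadata and a callable.
    metadata: ToolMetadata
    func: Callable[[str], str]

def run_smoke_tests(tools):
    """Call each tool once with a trivial input; collect results or errors per tool."""
    results = {}
    for tool in tools:
        name = getattr(tool.metadata, "name", "Unknown")
        try:
            results[name] = tool.func("2 + 2")
        except Exception as e:
            # One failing tool should not abort the whole test run.
            results[name] = f"failed: {e}"
    return results

calc = SimpleTool(ToolMetadata("calculator", "Evaluates math expressions"),
                  lambda expr: str(eval(expr)))
print(run_smoke_tests([calc]))  # {'calculator': '4'}
```

Isolating each call in its own `try`/`except` mirrors `test_tool_functionality`: a slow or broken tool (e.g. web search without network access) is reported but does not stop the remaining tools from being exercised.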