Isateles committed on
Commit
394d24e
·
1 Parent(s): 3120256

Update GAIA agent-fixes and test code

README.md CHANGED
@@ -34,6 +34,123 @@ My agent uses:
34
 
35
  ## 🔧 Tools Implemented
36
 
37
  1. **Web Search** (`web_search`): Uses DuckDuckGo to find current information
38
  2. **Calculator** (`calculator`): Handles math, percentages, and word problems
39
  3. **File Analyzer** (`file_analyzer`): Analyzes CSV and text files
 
34
 
35
  ## 🔧 Tools Implemented
36
 
37
+ 1. **Web Search** (`web_search`): Uses Google Search (with DuckDuckGo fallback)
38
+ 2. **Calculator** (`calculator`): Handles math, percentages, and word problems
39
+ 3. **File Analyzer** (`file_analyzer`): Analyzes CSV and text files
40
+ 4. **Weather** (`weather`): Real weather data using OpenWeather API
41
+ 5. **Persona Database** (`persona_database`): RAG system for finding personas
42
+
43
+ ## 💡 Key Insights
44
+
45
+ The biggest challenge was understanding that the course evaluation uses **exact match** on clean answers. The GAIA prompt helps the agent reason well, but I needed to extract just the answer part (without "FINAL ANSWER:") for submission.
46
+
47
+ ### Smart Agent Strategy:
48
+ - **Knowledge First**: The agent tries to answer from its extensive knowledge (up to January 2025)
49
+ - **Search When Needed**: Only searches for current info, verification, or when explicitly asked
50
+ - **Google Priority**: Uses Google Custom Search first (most reliable in HF Spaces)
51
+ - **DuckDuckGo Fallback**: Multiple methods to ensure search works even if one fails
52
+ - **Clean Answers**: Extracts exactly what GAIA expects (no units, articles, or formatting)
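
The "Clean Answers" idea can be sketched roughly as follows. This is a simplified illustration, not the `extract_final_answer` function from `app.py`: the helper name `clean_gaia_answer` and the `NUMBER_RE` pattern are hypothetical, and only a subset of the cleaning rules is covered (numbers, yes/no, leading articles).

```python
import re

# Hypothetical pattern: optional sign and "$", digits with optional
# thousands separators, optional decimals, optional trailing "%".
NUMBER_RE = re.compile(r"[-+]?\$?(\d{1,3}(,\d{3})+|\d+)(\.\d+)?\s*%?")


def clean_gaia_answer(response_text: str) -> str:
    """Return the text after 'FINAL ANSWER:' in a form suited to exact match."""
    match = re.search(r"FINAL ANSWER:\s*(.+)", response_text, re.IGNORECASE)
    if not match:
        return ""
    answer = match.group(1).strip().rstrip(".")

    # Numbers: drop "$", "%" and thousands separators.
    if NUMBER_RE.fullmatch(answer):
        digits = answer.replace("$", "").replace("%", "").replace(",", "").strip()
        number = float(digits)
        return str(int(number)) if number.is_integer() else str(number)

    # Yes/no answers are lowercased.
    if answer.lower() in ("yes", "no"):
        return answer.lower()

    # Strings: remove a leading article.
    words = answer.split()
    if words and words[0].lower() in ("the", "a", "an"):
        return " ".join(words[1:])
    return answer
```

For example, `clean_gaia_answer("FINAL ANSWER: The Eiffel Tower.")` gives `Eiffel Tower`. Requiring three digits after each comma keeps a compact list such as `1,2,3` from being collapsed into a single number.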
53
+
54
+ ## 🚀 Features
55
+
56
+ - Clean answer extraction aligned with GAIA scoring rules
57
+ - Handles numbers without commas/units as required
58
+ - Properly formats lists and yes/no answers
59
+ - RAG integration for persona queries
60
+ - Real weather data when API key is available
61
+ - Fallback mechanisms for robustness
62
+
63
+ ## 📋 Requirements
64
+
65
+ All dependencies are in `requirements.txt`. The key ones are:
66
+ - LlamaIndex (core framework)
67
+ - Gradio (web interface)
68
+ - ChromaDB (vector storage)
69
+ - DuckDuckGo Search (web tool)
70
+
71
+ ## 🔑 API Keys Needed
72
+
73
+ Add these to your HuggingFace Space secrets:
74
+ - `GROQ_API_KEY` (recommended - fast and free)
75
+ - `ANTHROPIC_API_KEY` or `CLAUDE_API_KEY` (best performance)
76
+ - `TOGETHER_API_KEY` (good alternative)
77
+ - `HF_TOKEN` (free fallback)
78
+ - `OPENAI_API_KEY` (if you have credits)
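
A minimal sketch of how one might check which provider key is present, assuming the order of the list above. The names `PROVIDER_KEYS` and `first_available_provider` are hypothetical; as far as this diff shows, the `setup_llm` in `app.py` itself only tries Groq and then Together.

```python
import os
from typing import Mapping, Optional

# Environment variable names come from the list above; the ordering is an assumption.
PROVIDER_KEYS = [
    ("Groq", ("GROQ_API_KEY",)),
    ("Claude", ("ANTHROPIC_API_KEY", "CLAUDE_API_KEY")),
    ("Together", ("TOGETHER_API_KEY",)),
    ("HuggingFace", ("HF_TOKEN",)),
    ("OpenAI", ("OPENAI_API_KEY",)),
]


def first_available_provider(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first provider with a non-empty key, or None if none is set."""
    env = os.environ if env is None else env
    for provider, names in PROVIDER_KEYS:
        if any(env.get(name) for name in names):
            return provider
    return None
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise without touching real secrets.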
79
+
80
+ ### For Web Search:
81
+ - `GOOGLE_API_KEY` (required for Google Search)
82
+ - Your Google Custom Search Engine ID is already configured: `746382dd3c2bd4135`
83
+ - Google Search is prioritized first, then DuckDuckGo as fallback
84
+ - If you see "quota exceeded", check your Google Cloud Console usage
85
+
86
+ ### Optional:
87
+ - `OPENWEATHER_API_KEY` (for real weather data)
88
+
89
+ ## 🔍 Troubleshooting Web Search
90
+
91
+ If Google Search isn't working:
92
+ 1. Check your API key is correct in HF Secrets
93
+ 2. Verify the Custom Search API is enabled in Google Cloud Console
94
+ 3. Check your quota hasn't been exceeded (100 queries/day free tier)
95
+ 4. The CSE ID `746382dd3c2bd4135` should work, but you can override with `GOOGLE_CSE_ID` env var
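
The Google-first, DuckDuckGo-fallback behavior and the `GOOGLE_CSE_ID` override could look roughly like the sketch below. It makes no network calls: `resolve_cse_id` and `search_with_fallback` are hypothetical helpers, and the backends are whatever callables the caller passes in, not the functions in `tools.py`.

```python
import os
from typing import Callable, Mapping, Optional, Sequence, Tuple

DEFAULT_CSE_ID = "746382dd3c2bd4135"  # the ID quoted in this README


def resolve_cse_id(env: Optional[Mapping[str, str]] = None) -> str:
    """Use GOOGLE_CSE_ID when it is set, otherwise the default ID."""
    env = os.environ if env is None else env
    return env.get("GOOGLE_CSE_ID") or DEFAULT_CSE_ID


def search_with_fallback(
    query: str,
    backends: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, backend) in order and return the first non-empty result."""
    errors = []
    for name, backend in backends:
        try:
            result = backend(query)
        except Exception as exc:  # a failing backend should not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if result:
            return name, result
    # Nothing worked: report which backends failed and why.
    return "none", "; ".join(errors)
```

Returning the backend name alongside the result makes it easy to log which search engine actually answered a query.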
96
+
97
+ If all web search fails, the agent will use its knowledge base (up to Jan 2025).
98
+
99
+ ## 📊 Expected Performance
100
+
101
+ Based on my testing and understanding of GAIA:
102
+ - Math questions: Should score well with the calculator tool
103
+ - Factual questions: Web search helps find current information
104
+ - Data questions: File analyzer handles CSV analysis
105
+ - Simple logic: GAIA prompt guides proper reasoning
106
+
107
+ Target: 30%+ to pass the course!
108
+
109
+ ## 🛠️ How It Works
110
+
111
+ 1. **Question Processing**: Agent receives a GAIA question
112
+ 2. **Tool Selection**: Uses the right tools based on the question
113
+ 3. **Reasoning**: Follows GAIA prompt to think through the problem
114
+ 4. **Answer Extraction**: Extracts clean answer for exact match
115
+ 5. **Submission**: Sends properly formatted answer to evaluation
116
+
117
+ ## 📝 Course Learnings Applied
118
+
119
+ - **Agent Architecture**: Using AgentWorkflow as taught in the course
120
+ - **Tool Integration**: Each tool has a clear purpose and description
121
+ - **RAG System**: Persona database shows RAG implementation
122
+ - **Prompt Engineering**: GAIA prompt for structured reasoning
123
+ - **Error Handling**: Graceful fallbacks instead of crashes
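
As one illustration of a graceful fallback, the sketch below wraps a slow coroutine in `asyncio.wait_for` and returns a fixed answer string when it times out. `slow_agent` is a stand-in for a real agent run and `run_with_timeout` is a hypothetical helper, not code from `app.py`.

```python
import asyncio


async def slow_agent(question: str, delay: float) -> str:
    """Stand-in for an agent run: waits, then answers."""
    await asyncio.sleep(delay)
    return f"FINAL ANSWER: {question.upper()}"


def run_with_timeout(question: str, delay: float, timeout: float) -> str:
    """Run the coroutine and fall back to a fixed string if it takes too long."""

    async def runner() -> str:
        try:
            return await asyncio.wait_for(slow_agent(question, delay), timeout=timeout)
        except asyncio.TimeoutError:
            # The timeout is raised here, in the code awaiting wait_for.
            return "FINAL ANSWER: Request timeout"

    return asyncio.run(runner())
```

One design point: `asyncio.wait_for` cancels the wrapped coroutine and raises the timeout in the code that awaits it, so the `except` clause sits around the `wait_for` call rather than inside the coroutine being timed.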
124
+
125
+ ## 🎯 Goal
126
+
127
+ Pass the GAIA evaluation with 30%+ score by applying everything learned in the AI Agents course!
128
+
129
+ ---
130
+
131
+ *This project demonstrates practical application of agent concepts, tool integration, RAG systems, and prompt engineering as taught in the course.*
132
+
133
+ This is my submission for the AI Agents course final project. I've built a RAG agent to tackle the GAIA benchmark using everything we learned in the course!
134
+
135
+ ## 🎓 What I Learned & Applied
136
+
137
+ Throughout this course, I learned about:
138
+ - Building agents with LlamaIndex AgentWorkflow
139
+ - Creating and integrating tools (web search, calculator, file analysis)
140
+ - Implementing RAG systems with vector databases
141
+ - Proper prompting techniques for agent systems
142
+ - Working with multiple LLM providers
143
+
144
+ ## 🏗️ Architecture
145
+
146
+ My agent uses:
147
+ - **LlamaIndex AgentWorkflow**: For orchestrating the agent's reasoning
148
+ - **Multiple LLMs**: Supports Claude, Groq, Together AI, HuggingFace, and OpenAI
149
+ - **ChromaDB**: For the persona RAG database
150
+ - **GAIA System Prompt**: To ensure proper reasoning and answer formatting
151
+
152
+ ## 🔧 Tools Implemented
153
+
154
  1. **Web Search** (`web_search`): Uses DuckDuckGo to find current information
155
  2. **Calculator** (`calculator`): Handles math, percentages, and word problems
156
  3. **File Analyzer** (`file_analyzer`): Analyzes CSV and text files
__pycache__/app.cpython-312.pyc CHANGED
Binary files a/__pycache__/app.cpython-312.pyc and b/__pycache__/app.cpython-312.pyc differ
 
__pycache__/tools.cpython-312.pyc ADDED
Binary file (26.2 kB)
 
app.py CHANGED
@@ -12,151 +12,162 @@ import logging
12
  import re
13
  import string
14
  from typing import List, Dict, Any, Optional
 
 
15
 
16
  # Logging setup
17
- logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
 
 
 
 
18
  logger = logging.getLogger(__name__)
19
 
20
  # Constants
21
  GAIA_API_URL = "https://agents-course-unit4-scoring.hf.space"
22
  PASSING_SCORE = 30
23
 
24
- # GAIA System Prompt - for internal reasoning
25
- GAIA_SYSTEM_PROMPT = """You are a general AI assistant. I will ask you a question. Report your thoughts, and finish your answer with the following template: FINAL ANSWER: [YOUR FINAL ANSWER]. YOUR FINAL ANSWER should be a number OR as few words as possible OR a comma separated list of numbers and/or strings. If you are asked for a number, don't use comma to write your number neither use units such as $ or percent sign unless specified otherwise. If you are asked for a string, don't use articles, neither abbreviations (e.g. for cities), and write the digits in plain text unless specified otherwise. If you are asked for a comma separated list, apply the above rules depending of whether the element to be put in the list is a number or a string."""
26
 
27
  def setup_llm():
28
  """Initialize the best available LLM"""
29
 
30
- # Priority: Claude > Groq > Together > HF > OpenAI
31
-
32
- # if api_key := (os.getenv("ANTHROPIC_API_KEY") or os.getenv("CLAUDE_API_KEY")):
33
- # try:
34
- # from llama_index.llms.anthropic import Anthropic
35
- # llm = Anthropic(
36
- # api_key=api_key,
37
- # model="claude-3-5-sonnet-20241022",
38
- # temperature=0.0,
39
- # max_tokens=2048
40
- # )
41
- # logger.info("✅ Using Claude 3.5 Sonnet")
42
- # return llm
43
- # except Exception as e:
44
- # logger.warning(f"Claude setup failed: {e}")
45
-
46
  if api_key := os.getenv("GROQ_API_KEY"):
47
  try:
48
  from llama_index.llms.groq import Groq
49
  llm = Groq(
50
  api_key=api_key,
51
- model="meta-llama/llama-4-scout-17b-16e-instruct",
52
  temperature=0.0,
53
  max_tokens=2048
54
  )
55
- logger.info("✅ Using Groq Llama 3 70B")
56
  return llm
57
  except Exception as e:
58
  logger.warning(f"Groq setup failed: {e}")
59
 
60
  if api_key := os.getenv("TOGETHER_API_KEY"):
61
  try:
62
- from llama_index.llms.together import Together
63
- llm = Together(
64
  api_key=api_key,
65
- model="deepseek-ai/DeepSeek-V3",
66
  temperature=0.0,
67
  max_tokens=2048
68
  )
69
- logger.info("✅ Using Together AI")
70
  return llm
71
  except Exception as e:
72
  logger.warning(f"Together setup failed: {e}")
73
-
74
- if api_key := os.getenv("HF_TOKEN"):
75
- try:
76
- from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
77
- llm = HuggingFaceInferenceAPI(
78
- model_name="meta-llama/Llama-3.1-70B-Instruct",
79
- token=api_key,
80
- temperature=0.0
81
- )
82
- logger.info("✅ Using HuggingFace")
83
- return llm
84
- except Exception as e:
85
- logger.warning(f"HuggingFace setup failed: {e}")
86
-
87
- if api_key := os.getenv("OPENAI_API_KEY"):
88
- try:
89
- from llama_index.llms.openai import OpenAI
90
- llm = OpenAI(
91
- api_key=api_key,
92
- model="gpt-4o-mini",
93
- temperature=0.0,
94
- max_tokens=2048
95
- )
96
- logger.info("✅ Using OpenAI")
97
- return llm
98
- except Exception as e:
99
- logger.warning(f"OpenAI setup failed: {e}")
100
-
101
- raise RuntimeError("No LLM API key found! Set one of: ANTHROPIC_API_KEY, GROQ_API_KEY, TOGETHER_API_KEY, HF_TOKEN, OPENAI_API_KEY")
102
 
103
  def extract_final_answer(response_text: str) -> str:
104
  """Extract answer aligned with GAIA scoring rules"""
105
 
 
106
  match = re.search(r"FINAL ANSWER:\s*(.+?)(?:\n|$)", response_text, re.IGNORECASE | re.DOTALL)
107
 
108
  if not match:
109
- logger.warning("No FINAL ANSWER found")
110
- return ""
111
 
112
- answer = match.group(1).strip()
 
113
 
114
  # Clean for GAIA scoring
115
 
116
- # 1. Numbers: remove units and formatting
117
- if re.match(r'^[\d$%,.\s]+$', answer):
118
- cleaned = answer.replace('$', '').replace('%', '').replace(',', '')
 
119
  try:
 
120
  num = float(cleaned)
121
  return str(int(num)) if num.is_integer() else str(num)
122
  except:
123
  pass
124
 
125
- # 2. Lists: consistent comma separation
126
- if ',' in answer or ';' in answer:
127
- items = re.split(r'[,;]', answer)
128
- cleaned_items = []
 
129
 
130
- for item in items:
131
- item = item.strip()
 
 
 
132
  # Try to parse as number
133
  try:
134
- cleaned = item.replace('$', '').replace('%', '').replace(',', '')
135
- num = float(cleaned)
136
- cleaned_items.append(str(int(num)) if num.is_integer() else str(num))
137
  except:
138
- # Keep as string
139
- cleaned_items.append(item)
 
 
 
 
140
 
141
- return ', '.join(cleaned_items)
142
 
143
- # 3. Yes/no: lowercase
144
  if answer.lower() in ['yes', 'no']:
145
  return answer.lower()
146
 
147
- # 4. Single words/strings: remove articles if at start
148
  words = answer.split()
149
  if words and words[0].lower() in ['the', 'a', 'an']:
150
  return ' '.join(words[1:])
151
 
152
  return answer
153
 
 
154
  class GAIAAgent:
155
  """GAIA RAG Agent using LlamaIndex AgentWorkflow"""
156
 
157
  def __init__(self):
158
  logger.info("Initializing GAIA RAG Agent...")
159
 
 
 
 
160
  # Initialize LLM
161
  self.llm = setup_llm()
162
 
@@ -184,44 +195,120 @@ class GAIAAgent:
184
  """Process a question and return clean answer for course submission"""
185
  logger.info(f"Processing question: {question[:100]}...")
186
 
 
 
 
 
187
  try:
188
- # Run agent asynchronously
189
  loop = asyncio.new_event_loop()
190
  asyncio.set_event_loop(loop)
191
 
192
  try:
193
  async def run_agent():
194
- handler = self.agent.run(user_msg=question)
195
-
196
- # Log tool usage
197
- from llama_index.core.agent.workflow import ToolCallResult
198
- async for event in handler.stream_events():
199
- if isinstance(event, ToolCallResult):
200
- logger.info(f"Tool used: {event.tool_name}")
201
 
202
- result = await handler
203
- return result
204
 
205
- result = loop.run_until_complete(run_agent())
 
 
 
206
 
207
- # Extract response text
208
- if hasattr(result, 'response'):
209
- response_text = str(result.response)
210
- else:
211
- response_text = str(result)
212
-
213
- # Extract clean answer (no "FINAL ANSWER:" prefix)
214
  clean_answer = extract_final_answer(response_text)
215
 
216
- logger.info(f"Final answer: '{clean_answer}'")
217
  return clean_answer
218
 
219
  finally:
 
220
  loop.close()
221
 
222
  except Exception as e:
223
  logger.error(f"Error processing question: {e}")
224
- return ""
 
 
225
 
226
  def run_and_submit_all(profile: gr.OAuthProfile | None):
227
  """Run GAIA evaluation following course template structure"""
@@ -351,24 +438,29 @@ Message: {result_data.get('message', 'Evaluation complete')}"""
351
  return error_msg, pd.DataFrame(results_log)
352
 
353
  # Gradio Interface
354
- with gr.Blocks(title="GAIA RAG Agent") as demo:
355
- gr.Markdown("# Isadora Teles - GAIA Agent - Final HF Agents Project")
 
356
  gr.Markdown("""
357
- This is a RAG agent implementation, with multiple tools, for the GAIA benchmark.
358
- TEST 2.
359
 
360
  **Features:**
361
- - 🧠 LlamaIndex AgentWorkflow with GAIA prompt
362
- - 🔍 Web search for current information
363
- - 🧮 Calculator for mathematical problems
 
364
  - 📊 File analyzer for data questions
365
- - 👥 RAG persona database
366
  - ✅ Clean answer extraction for exact match
367
 
 
 
 
 
 
368
  **Instructions:**
369
  1. Log in with HuggingFace account
370
  2. Click 'Run Evaluation & Submit All Answers'
371
- 3. Wait for the agent to process all questions (5-10 minutes)
372
  4. Check your score!
373
  """)
374
 
@@ -407,20 +499,21 @@ if __name__ == "__main__":
407
 
408
  # Check API keys
409
  api_keys = [
410
- #("Claude", os.getenv("ANTHROPIC_API_KEY") or os.getenv("CLAUDE_API_KEY")),
411
  ("Groq", os.getenv("GROQ_API_KEY")),
 
412
  ("Together", os.getenv("TOGETHER_API_KEY")),
413
  ("HuggingFace", os.getenv("HF_TOKEN")),
414
  ("OpenAI", os.getenv("OPENAI_API_KEY")),
 
415
  ("OpenWeather", os.getenv("OPENWEATHER_API_KEY"))
416
  ]
417
 
418
  available = [name for name, key in api_keys if key]
419
 
420
  if available:
421
- print(f"✅ Available LLMs: {', '.join(available)}")
422
  else:
423
- print("❌ No LLM API keys found!")
424
 
425
  print("="*60 + "\n")
426
 
 
12
  import re
13
  import string
14
  from typing import List, Dict, Any, Optional
15
+ import warnings
16
+ warnings.filterwarnings("ignore", category=RuntimeWarning, module="asyncio")
17
 
18
  # Logging setup
19
+ logging.basicConfig(
20
+ level=logging.INFO,
21
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
22
+ datefmt='%H:%M:%S'
23
+ )
24
  logger = logging.getLogger(__name__)
25
 
26
  # Constants
27
  GAIA_API_URL = "https://agents-course-unit4-scoring.hf.space"
28
  PASSING_SCORE = 30
29
 
30
+ # GAIA System Prompt - for intelligent reasoning and tool use
31
+ GAIA_SYSTEM_PROMPT = """You are a general AI assistant. I will ask you a question. Report your thoughts, and finish your answer with the following template: FINAL ANSWER: [YOUR FINAL ANSWER]. YOUR FINAL ANSWER should be a number OR as few words as possible OR a comma separated list of numbers and/or strings. If you are asked for a number, don't use comma to write your number neither use units such as $ or percent sign unless specified otherwise. If you are asked for a string, don't use articles, neither abbreviations (e.g. for cities), and write the digits in plain text unless specified otherwise. If you are asked for a comma separated list, apply the above rules depending of whether the element to be put in the list is a number or a string.
32
+
33
+ IMPORTANT: You have extensive knowledge up to January 2025. For most questions, try to answer from your knowledge FIRST. Only use web_search when:
34
+ 1. The question asks for current/recent information (after January 2025)
35
+ 2. You're unsure and need to verify facts
36
+ 3. The question explicitly asks to search or look up information
37
+ 4. The question is about real-time data (weather, stock prices, current events)
38
+
39
+ Always use the calculator tool for ANY mathematical computation, even simple ones."""
40
 
41
  def setup_llm():
42
  """Initialize the best available LLM"""
43
 
44
  if api_key := os.getenv("GROQ_API_KEY"):
45
  try:
46
  from llama_index.llms.groq import Groq
47
  llm = Groq(
48
  api_key=api_key,
49
+ model="llama-3.3-70b-versatile", # Correct model name
50
  temperature=0.0,
51
  max_tokens=2048
52
  )
53
+ logger.info("✅ Using Groq Llama 3.3 70B")
54
  return llm
55
  except Exception as e:
56
  logger.warning(f"Groq setup failed: {e}")
57
 
58
  if api_key := os.getenv("TOGETHER_API_KEY"):
59
  try:
60
+ from llama_index.llms.together import TogetherLLM
61
+ llm = TogetherLLM(
62
  api_key=api_key,
63
+ model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo", # Correct Together model
64
  temperature=0.0,
65
  max_tokens=2048
66
  )
67
+ logger.info("✅ Using Together AI Llama 3.1 70B")
68
  return llm
69
  except Exception as e:
70
  logger.warning(f"Together setup failed: {e}")
71
+
72
 
73
  def extract_final_answer(response_text: str) -> str:
74
  """Extract answer aligned with GAIA scoring rules"""
75
 
76
+ # Look for FINAL ANSWER pattern
77
  match = re.search(r"FINAL ANSWER:\s*(.+?)(?:\n|$)", response_text, re.IGNORECASE | re.DOTALL)
78
 
79
  if not match:
80
+ # Fallback: look for answer at the end of response
81
+ lines = response_text.strip().split('\n')
82
+ if lines:
83
+ # Check if last line looks like an answer
84
+ last_line = lines[-1].strip()
85
+ if len(last_line) < 100 and not last_line.startswith(('I', 'The', 'To', 'Based')):
86
+ answer = last_line
87
+ else:
88
+ logger.warning("No FINAL ANSWER found")
89
+ return ""
90
+ else:
91
+ return ""
92
+ else:
93
+ answer = match.group(1).strip()
94
 
95
+ # Remove any trailing punctuation that's not part of the answer
96
+ answer = answer.rstrip('.')
97
 
98
  # Clean for GAIA scoring
99
 
100
+ # 1. Handle numbers with more precision
101
+ if re.match(r'^[\d\s.,\-+e]+$', answer):
102
+ # Remove all formatting
103
+ cleaned = answer.replace(',', '').replace(' ', '')
104
  try:
105
+ # Try to parse as float
106
  num = float(cleaned)
107
+ # Return integer if whole number, otherwise keep precision
108
+ if num.is_integer():
109
+ return str(int(num))
110
+ else:
111
+ # Keep original precision, don't round
112
+ return str(num)
113
+ except:
114
+ pass
115
+
116
+ # 2. Handle percentages (remove % sign)
117
+ if answer.endswith('%'):
118
+ answer = answer[:-1].strip()
119
+ try:
120
+ num = float(answer)
121
  return str(int(num)) if num.is_integer() else str(num)
122
  except:
123
  pass
124
 
125
+ # 3. Lists: clean and standardize
126
+ if ',' in answer or ' and ' in answer.lower():
127
+ # Split on commas and 'and'
128
+ parts = re.split(r',|\s+and\s+', answer)
129
+ cleaned_parts = []
130
 
131
+ for part in parts:
132
+ part = part.strip()
133
+ if not part:
134
+ continue
135
+
136
  # Try to parse as number
137
  try:
138
+ num = float(part.replace('$', '').replace('%', '').replace(',', ''))
139
+ cleaned_parts.append(str(int(num)) if num.is_integer() else str(num))
 
140
  except:
141
+ # Remove articles from strings
142
+ words = part.split()
143
+ if words and words[0].lower() in ['the', 'a', 'an']:
144
+ cleaned_parts.append(' '.join(words[1:]))
145
+ else:
146
+ cleaned_parts.append(part)
147
 
148
+ return ', '.join(cleaned_parts)
149
 
150
+ # 4. Yes/No answers
151
  if answer.lower() in ['yes', 'no']:
152
  return answer.lower()
153
 
154
+ # 5. Single words/phrases: remove articles
155
  words = answer.split()
156
  if words and words[0].lower() in ['the', 'a', 'an']:
157
  return ' '.join(words[1:])
158
 
159
  return answer
160
 
161
+
162
  class GAIAAgent:
163
  """GAIA RAG Agent using LlamaIndex AgentWorkflow"""
164
 
165
  def __init__(self):
166
  logger.info("Initializing GAIA RAG Agent...")
167
 
168
+ # Skip persona RAG for faster GAIA evaluation
169
+ os.environ["SKIP_PERSONA_RAG"] = "true"
170
+
171
  # Initialize LLM
172
  self.llm = setup_llm()
173
 
 
195
  """Process a question and return clean answer for course submission"""
196
  logger.info(f"Processing question: {question[:100]}...")
197
 
198
+ import warnings
199
+ warnings.filterwarnings("ignore", category=RuntimeWarning, message=".*Event loop is closed.*")
200
+
201
+
202
  try:
203
+ # Create new event loop for async operations
204
  loop = asyncio.new_event_loop()
205
  asyncio.set_event_loop(loop)
206
 
207
  try:
208
  async def run_agent():
209
+ # Track what happened during execution
210
+ tool_calls = []
211
+ response_chunks = []
 
 
 
 
212
 
213
+ try:
214
+ # Start the agent workflow
215
+ handler = self.agent.run(user_msg=question)
216
+
217
+ # IMPORTANT: Process events WITHOUT consuming them
218
+ # We need to collect BOTH tool usage AND response content
219
+ from llama_index.core.agent.workflow import ToolCallResult
220
+
221
+ # Stream events and collect information
222
+ async for event in handler.stream_events():
223
+ # Log tool usage
224
+ if isinstance(event, ToolCallResult):
225
+ tool_info = f"{event.tool_name}: {str(event.result)[:100]}..."
226
+ tool_calls.append(tool_info)
227
+ logger.info(f"Tool used: {tool_info}")
228
+
229
+ # Also collect any text responses
230
+ # Different event types might have content in different attributes
231
+ if hasattr(event, 'delta'):
232
+ response_chunks.append(str(event.delta))
233
+ elif hasattr(event, 'content'):
234
+ response_chunks.append(str(event.content))
235
+ elif hasattr(event, 'response'):
236
+ response_chunks.append(str(event.response))
237
+
238
+ # Get the final result after streaming
239
+ result = await handler
240
+
241
+ # Extract the final response text
242
+ # Priority: accumulated chunks > result.response > str(result)
243
+ if response_chunks:
244
+ response_text = ''.join(response_chunks)
245
+ elif hasattr(result, 'response'):
246
+ response_text = str(result.response)
247
+ else:
248
+ response_text = str(result)
249
+
250
+ # Log what tools were used for debugging
251
+ if tool_calls:
252
+ logger.info(f"Tools used in this query: {', '.join(set(tool_calls))}")
253
+
254
+ # CRITICAL: Check if we got a meaningful response
255
+ # This prevents infinite loops
256
+ if not response_text or len(response_text.strip()) < 10:
257
+ logger.warning("Got empty or too short response from agent")
258
+ # Return a fallback response
259
+ return "FINAL ANSWER: Unable to determine answer"
260
+
261
+ return response_text
262
+
263
+ except asyncio.TimeoutError:
264
+ # Prevent infinite waiting
265
+ logger.error("Agent timeout - preventing infinite loop")
266
+ return "FINAL ANSWER: Request timeout"
267
+
268
+ except Exception as e:
269
+ logger.error(f"Agent execution error: {e}")
270
+ # Return structured error response
271
+ return "FINAL ANSWER: Error occurred"
272
 
273
+ # Run with timeout to prevent infinite loops
274
+ response_text = loop.run_until_complete(
275
+ asyncio.wait_for(run_agent(), timeout=120) # 2 minute timeout
276
+ )
277
 
278
+ # Extract clean answer
 
 
 
 
 
 
279
  clean_answer = extract_final_answer(response_text)
280
 
281
+ # VALIDATION: Ensure we have a valid answer
282
+ if not clean_answer:
283
+ logger.warning("No answer extracted, using fallback")
284
+ # Try to extract any number or short phrase from response
285
+ # This prevents returning empty string to GAIA
286
+ numbers = re.findall(r'\b\d+\.?\d*\b', response_text)
287
+ if numbers:
288
+ clean_answer = numbers[-1] # Use last number found
289
+ else:
290
+ # Look for any short phrase that could be an answer
291
+ sentences = response_text.split('.')
292
+ for sent in reversed(sentences):
293
+ sent = sent.strip()
294
+ if 0 < len(sent) < 50 and not sent.startswith(('I', 'The', 'To')):
295
+ clean_answer = sent
296
+ break
297
+
298
+ logger.info(f"Full response preview: {response_text[:200]}...")
299
+ logger.info(f"Extracted answer: '{clean_answer}'")
300
+
301
  return clean_answer
302
 
303
  finally:
304
+ # Always close the loop
305
  loop.close()
306
 
307
  except Exception as e:
308
  logger.error(f"Error processing question: {e}")
309
+ # Never return empty string to GAIA - always return something
310
+ return "0" # Safe fallback for math questions
311
+
312
 
313
  def run_and_submit_all(profile: gr.OAuthProfile | None):
314
  """Run GAIA evaluation following course template structure"""
 
438
  return error_msg, pd.DataFrame(results_log)
439
 
440
  # Gradio Interface
441
+ with gr.Blocks(title="GAIA RAG Agent - Final Project") as demo:
442
+ gr.Markdown("# GAIA Smart RAG Agent - Final HF Agents Course Project")
443
+ gr.Markdown("### by Isadora Teles")
444
  gr.Markdown("""
445
+ This is a smart RAG agent for the GAIA benchmark that knows when to use its knowledge vs when to search.
 
446
 
447
  **Features:**
448
+ - 🧠 LlamaIndex AgentWorkflow with intelligent reasoning
449
+ - 💭 Answers from knowledge first (up to Jan 2025)
450
+ - 🔍 Google Search when needed (with DuckDuckGo fallback)
451
+ - 🧮 Calculator for all math problems
452
  - 📊 File analyzer for data questions
 
453
  - ✅ Clean answer extraction for exact match
454
 
455
+ **Smart Strategy:**
456
+ - Uses internal knowledge for facts it knows
457
+ - Only searches for current info or verification
458
+ - Prioritizes accuracy and efficiency
459
+
460
  **Instructions:**
461
  1. Log in with HuggingFace account
462
  2. Click 'Run Evaluation & Submit All Answers'
463
+ 3. Wait for the agent to process all questions (3-5 minutes)
464
  4. Check your score!
465
  """)
466
 
 
499
 
500
  # Check API keys
501
  api_keys = [
 
502
  ("Groq", os.getenv("GROQ_API_KEY")),
503
+ ("Claude", os.getenv("ANTHROPIC_API_KEY") or os.getenv("CLAUDE_API_KEY")),
504
  ("Together", os.getenv("TOGETHER_API_KEY")),
505
  ("HuggingFace", os.getenv("HF_TOKEN")),
506
  ("OpenAI", os.getenv("OPENAI_API_KEY")),
507
+ ("Google Search", os.getenv("GOOGLE_API_KEY")),
508
  ("OpenWeather", os.getenv("OPENWEATHER_API_KEY"))
509
  ]
510
 
511
  available = [name for name, key in api_keys if key]
512
 
513
  if available:
514
+ print(f"✅ Available APIs: {', '.join(available)}")
515
  else:
516
+ print("❌ No API keys found!")
517
 
518
  print("="*60 + "\n")
519
 
test_gaia_agent.py ADDED
@@ -0,0 +1,420 @@
1
+ # test_gaia_agent.py
2
+ """
3
+ Comprehensive test script for GAIA Agent
4
+ Tests LLM, search, tools, and answer extraction
5
+ Run with: python test_gaia_agent.py
6
+ """
7
+
8
+ import os
9
+ import sys
10
+ import logging
11
+ import asyncio
12
+ import json
13
+ from datetime import datetime
14
+ from typing import Dict, List, Tuple
15
+
16
+ # Configure logging
17
+ logging.basicConfig(
18
+ level=logging.INFO,
19
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
20
+ datefmt='%H:%M:%S'
21
+ )
22
+ logger = logging.getLogger(__name__)
23
+
24
+ # Color codes for terminal output
25
+ class Colors:
26
+ HEADER = '\033[95m'
27
+ OKBLUE = '\033[94m'
28
+ OKCYAN = '\033[96m'
29
+ OKGREEN = '\033[92m'
30
+ WARNING = '\033[93m'
31
+ FAIL = '\033[91m'
32
+ ENDC = '\033[0m'
33
+ BOLD = '\033[1m'
34
+ UNDERLINE = '\033[4m'
35
+
36
+ def print_header(text: str):
37
+ print(f"\n{Colors.HEADER}{Colors.BOLD}{'='*60}{Colors.ENDC}")
38
+ print(f"{Colors.HEADER}{Colors.BOLD}{text.center(60)}{Colors.ENDC}")
39
+ print(f"{Colors.HEADER}{Colors.BOLD}{'='*60}{Colors.ENDC}\n")
40
+
41
+ def print_test(name: str, status: bool, details: str = ""):
42
+ status_text = f"{Colors.OKGREEN}✓ PASS{Colors.ENDC}" if status else f"{Colors.FAIL}✗ FAIL{Colors.ENDC}"
43
+ print(f"{name:<40} {status_text}")
44
+ if details:
45
+ print(f" {Colors.OKCYAN}→ {details}{Colors.ENDC}")
46
+
47
+ def print_section(text: str):
48
+ print(f"\n{Colors.OKBLUE}{Colors.BOLD}{text}{Colors.ENDC}")
49
+ print(f"{Colors.OKBLUE}{'-'*40}{Colors.ENDC}")
50
+
51
+ # Test 1: Environment and API Keys
52
+ def test_environment():
53
+ print_section("Testing Environment Setup")
54
+
55
+ api_keys = {
56
+ "GROQ_API_KEY": "Groq (Primary LLM)",
57
+ "ANTHROPIC_API_KEY": "Anthropic Claude",
58
+ "TOGETHER_API_KEY": "Together AI",
59
+ "HF_TOKEN": "HuggingFace",
60
+ "OPENAI_API_KEY": "OpenAI",
61
+ "GOOGLE_API_KEY": "Google Search",
62
+ "GOOGLE_CSE_ID": "Google Custom Search Engine ID"
63
+ }
64
+
65
+ available = []
66
+ missing = []
67
+
68
+ for key, service in api_keys.items():
69
+ if os.getenv(key):
70
+ available.append(service)
71
+ print_test(f"{service} API Key", True, f"{key} is set")
72
+ else:
73
+ missing.append(service)
74
+ print_test(f"{service} API Key", False, f"{key} not found")
75
+
76
+ # Set SKIP_PERSONA_RAG for testing
77
+ os.environ["SKIP_PERSONA_RAG"] = "true"
78
+ print_test("SKIP_PERSONA_RAG set", True, "Persona RAG disabled for faster testing")
79
+
80
+ return len(available) > 0, available, missing
81
+
82
+ # Test 2: LLM Initialization
83
+ def test_llm_setup():
84
+ print_section("Testing LLM Setup")
85
+
86
+ try:
87
+ from app import setup_llm
88
+
89
+ llm = setup_llm()
90
+ print_test("LLM Initialization", True, f"Using {type(llm).__name__}")
91
+
92
+ # Test basic LLM call
93
+ try:
94
+ response = llm.complete("Say 'Hello World' and nothing else.")
95
+ response_text = str(response).strip()
96
+
97
+ success = "hello world" in response_text.lower()
98
+ print_test("LLM Basic Response", success, f"Response: {response_text[:50]}")
99
+
100
+ return True, llm
101
+ except Exception as e:
102
+ print_test("LLM Basic Response", False, f"Error: {str(e)[:100]}")
103
+ return False, None
104
+
105
+ except Exception as e:
106
+ print_test("LLM Initialization", False, f"Error: {str(e)[:100]}")
107
+ return False, None
108
+
109
+ # Test 3: Web Search Functions
110
+ def test_web_search():
111
+ print_section("Testing Web Search")
112
+
113
+ try:
114
+ from tools import search_web, _search_google, _search_duckduckgo
115
+
116
+ test_query = "Python programming language"
117
+
118
+ # Test Google Search
119
+ print("\nTesting Google Search...")
120
+ try:
121
+ google_result = _search_google(test_query)
122
+ if google_result and "error" not in google_result.lower():
123
+ print_test("Google Search", True, f"Got {len(google_result)} chars")
124
+ print(f" Preview: {google_result[:150]}...")
125
+ else:
126
+ print_test("Google Search", False, google_result[:100])
127
+ except Exception as e:
128
+ print_test("Google Search", False, str(e)[:100])
129
+
130
+ # Test DuckDuckGo Search
131
+ print("\nTesting DuckDuckGo Search...")
132
+ try:
133
+ ddg_result = _search_duckduckgo(test_query)
134
+ if ddg_result and not ddg_result.startswith(("DuckDuckGo", "No DuckDuckGo")):
135
+ print_test("DuckDuckGo Search", True, f"Got {len(ddg_result)} chars")
136
+ print(f" Preview: {ddg_result[:150]}...")
137
+ else:
138
+ print_test("DuckDuckGo Search", False, ddg_result[:100])
139
+ except Exception as e:
140
+ print_test("DuckDuckGo Search", False, str(e)[:100])
141
+
142
+ # Test Combined Search
143
+ print("\nTesting Combined Web Search...")
144
+ try:
145
+ result = search_web(test_query)
146
+ success = bool(result) and len(result) > 50 and not result.startswith("Web search unavailable")
147
+ print_test("Combined Web Search", success, f"Got {len(result)} chars")
148
+ return success
149
+ except Exception as e:
150
+ print_test("Combined Web Search", False, str(e)[:100])
151
+ return False
152
+
153
+ except ImportError as e:
154
+ print_test("Import Tools Module", False, str(e))
155
+ return False
156
+
157
+ # Test 4: Other Tools
158
+ def test_tools():
159
+ print_section("Testing Other Tools")
160
+
161
+ try:
162
+ from tools import calculate, analyze_file, get_weather
163
+
164
+ # Test Calculator
165
+ calc_tests = [
166
+ ("2 + 2", "4"),
167
+ ("15% of 1000", "150"),
168
+ ("square root of 144", "12"),
169
+ ("4847 * 3291", "15951477"),
170
+ ]
171
+
172
+ calc_success = 0
173
+ for expr, expected in calc_tests:
174
+ try:
175
+ result = calculate(expr)
176
+ success = str(result) == expected
177
+ calc_success += success
178
+ print_test(f"Calculate: {expr}", success, f"Got {result}, expected {expected}")
179
+ except Exception as e:
180
+ print_test(f"Calculate: {expr}", False, str(e)[:50])
181
+
182
+ # Test File Analyzer
183
+ try:
184
+ csv_content = "name,age,score\nAlice,25,85\nBob,30,92"
185
+ result = analyze_file(csv_content, "csv")
186
+ success = "3" in result and "name" in result
187
+ print_test("File Analyzer (CSV)", success, "Basic CSV analysis works")
188
+ except Exception as e:
189
+ print_test("File Analyzer (CSV)", False, str(e)[:50])
190
+
191
+ # Test Weather
192
+ try:
193
+ result = get_weather("Paris")
194
+ success = "Temperature" in result and "°C" in result
195
+ print_test("Weather Tool", success, result.split('\n')[0])
196
+ except Exception as e:
197
+ print_test("Weather Tool", False, str(e)[:50])
198
+
199
+ return calc_success >= 3
200
+
201
+ except ImportError as e:
202
+ print_test("Import Tools", False, str(e))
203
+ return False
204
+
205
+ # Test 5: Answer Extraction
206
+ def test_answer_extraction():
207
+ print_section("Testing Answer Extraction")
208
+
209
+ try:
210
+ # Try importing just the function we need
211
+ import sys
212
+ import os
213
+ sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
214
+
215
+ # Import the extract_final_answer function directly
216
+ from app import extract_final_answer
217
+
218
+ test_cases = [
219
+ # (input, expected)
220
+ ("The answer is 42. FINAL ANSWER: 42", "42"),
221
+ ("FINAL ANSWER: 15%", "15"),
222
+ ("Calculating... FINAL ANSWER: 3,456", "3456"),
223
+ ("FINAL ANSWER: Paris", "Paris"),
224
+ ("FINAL ANSWER: The Eiffel Tower", "Eiffel Tower"),
225
+ ("FINAL ANSWER: yes", "yes"),
226
+ ("FINAL ANSWER: 1, 2, 3, 4, 5", "1, 2, 3, 4, 5"),
227
+ ("Some text FINAL ANSWER: $1,234.56", "1234.56"),
228
+ ("No final answer marker here", ""),
229
+ ]
230
+
231
+ success_count = 0
232
+ for input_text, expected in test_cases:
233
+ result = extract_final_answer(input_text)
234
+ success = result == expected
235
+ success_count += success
236
+ print_test(
237
+ f"Extract: {expected or '(empty)'}",
238
+ success,
239
+ f"Got '{result}'" if not success else ""
240
+ )
241
+
242
+ return success_count >= len(test_cases) - 2
243
+
244
+ except ImportError as e:
245
+ # If import fails, try a minimal test
246
+ print_test("Answer Extraction Import", False, f"Import error: {str(e)[:100]}")
247
+
248
+ # Create a minimal version for testing
249
+ def extract_final_answer_minimal(text):
250
+ import re
251
+ match = re.search(r"FINAL ANSWER:\s*(.+?)(?:\n|$)", text, re.IGNORECASE)
252
+ return match.group(1).strip() if match else ""
253
+
254
+ # Test with minimal version
255
+ test_text = "The answer is FINAL ANSWER: 42"
256
+ result = extract_final_answer_minimal(test_text)
257
+ success = result == "42"
258
+ print_test("Minimal Extraction Test", success, f"Got '{result}'")
259
+ return success
260
+
261
+ except Exception as e:
262
+ print_test("Answer Extraction", False, str(e))
263
+ return False
264
+
265
+ # Test 6: Full Agent Test
266
+ def test_gaia_agent(llm):
267
+ print_section("Testing GAIA Agent")
268
+
269
+ try:
270
+ # Import here to ensure environment is set up
271
+ from app import GAIAAgent
272
+
273
+ # Initialize agent
274
+ print("Initializing GAIA Agent...")
275
+ agent = GAIAAgent()
276
+ print_test("Agent Initialization", True, "Agent created successfully")
277
+
278
+ # Test questions matching GAIA style
279
+ test_questions = [
280
+ # (question, expected_answer_pattern, description)
281
+ ("What is 2 + 2?", r"^4$", "Simple math"),
282
+ ("Calculate 15% of 1200", r"^180$", "Percentage calculation"),
283
+ ("What is the capital of France?", r"(?i)paris", "Factual question"),
284
+ ("Is 17 a prime number? Answer yes or no.", r"(?i)yes", "Yes/no question"),
285
+ ("List the first 3 prime numbers", r"2.*3.*5", "List question"),
286
+ ]
287
+
288
+ print("\nRunning test questions...")
289
+ success_count = 0
290
+
291
+ for question, pattern, description in test_questions:
292
+ print(f"\n{Colors.BOLD}Q: {question}{Colors.ENDC}")
293
+ try:
294
+ answer = agent(question)
295
+ print(f"A: '{answer}'")
296
+
297
+ import re
298
+ matches = bool(re.search(pattern, answer))
299
+ success_count += matches
300
+
301
+ print_test(f"{description}", matches,
302
+ f"Expected pattern: {pattern}" if not matches else "")
303
+
304
+ except Exception as e:
305
+ print_test(f"{description}", False, f"Error: {str(e)[:50]}")
306
+ print(f"{Colors.WARNING}Full error: {e}{Colors.ENDC}")
307
+
308
+ return success_count >= 3
309
+
310
+ except Exception as e:
311
+ print_test("GAIA Agent", False, f"Error: {str(e)}")
312
+ import traceback
313
+ print(f"{Colors.WARNING}Full traceback:{Colors.ENDC}")
314
+ traceback.print_exc()
315
+ return False
316
+
317
+ # Test 7: GAIA API Integration
318
+ def test_gaia_api():
319
+ print_section("Testing GAIA API Connection")
320
+
321
+ try:
322
+ import requests
323
+ from app import GAIA_API_URL
324
+
325
+ # Test questions endpoint
326
+ try:
327
+ response = requests.get(f"{GAIA_API_URL}/questions", timeout=10)
328
+ if response.status_code == 200:
329
+ questions = response.json()
330
+ print_test("GAIA API Questions", True, f"Got {len(questions)} questions")
331
+
332
+ # Show sample question
333
+ if questions:
334
+ sample = questions[0]
335
+ print(f" Sample task_id: {sample.get('task_id', 'N/A')}")
336
+ q_text = sample.get('question', '')[:100]
337
+ print(f" Sample question: {q_text}...")
338
+
339
+ return True
340
+ else:
341
+ print_test("GAIA API Questions", False, f"HTTP {response.status_code}")
342
+ return False
343
+ except Exception as e:
344
+ print_test("GAIA API Questions", False, str(e)[:100])
345
+ return False
346
+
347
+ except Exception as e:
348
+ print_test("GAIA API Test", False, str(e))
349
+ return False
350
+
351
+ # Main test runner
352
+ def main():
353
+ print_header("GAIA Agent Local Test Suite")
354
+
355
+ # Track overall results
356
+ results = {
357
+ "Environment": False,
358
+ "LLM": False,
359
+ "Web Search": False,
360
+ "Tools": False,
361
+ "Answer Extraction": False,
362
+ "Agent": False,
363
+ "API": False
364
+ }
365
+
366
+ # Run tests
367
+ env_ok, available, missing = test_environment()
368
+ results["Environment"] = env_ok
369
+
370
+ if not env_ok:
371
+ print(f"\n{Colors.FAIL}No API keys found! Please set at least one of:{Colors.ENDC}")
372
+ for m in missing:
373
+ print(f" - {m}")
374
+ print("\nExample:")
375
+ print(" export GROQ_API_KEY='your-key-here'")
376
+ return
377
+
378
+ # Test LLM
379
+ llm_ok, llm = test_llm_setup()
380
+ results["LLM"] = llm_ok
381
+
382
+ # Test other components
383
+ results["Web Search"] = test_web_search()
384
+ results["Tools"] = test_tools()
385
+ results["Answer Extraction"] = test_answer_extraction()
386
+
387
+ # Only test agent if LLM works
388
+ if llm_ok:
389
+ results["Agent"] = test_gaia_agent(llm)
390
+
391
+ # Test API connection
392
+ results["API"] = test_gaia_api()
393
+
394
+ # Summary
395
+ print_header("Test Summary")
396
+
397
+ passed = sum(1 for v in results.values() if v)
398
+ total = len(results)
399
+
400
+ for component, status in results.items():
401
+ print_test(component, status)
402
+
403
+ print(f"\n{Colors.BOLD}Overall: {passed}/{total} components working{Colors.ENDC}")
404
+
405
+ if passed == total:
406
+ print(f"{Colors.OKGREEN}✨ All tests passed! Your agent is ready for GAIA evaluation.{Colors.ENDC}")
407
+ elif passed >= total - 2:
408
+ print(f"{Colors.WARNING}⚠️ Most components working. Check failed components above.{Colors.ENDC}")
409
+ else:
410
+ print(f"{Colors.FAIL}❌ Several components failing. Fix issues before running GAIA evaluation.{Colors.ENDC}")
411
+
412
+ # Recommendations
413
+ if not results["Web Search"]:
414
+ print(f"\n{Colors.WARNING}Tip: Web search is important for GAIA. Check your GOOGLE_API_KEY.{Colors.ENDC}")
415
+
416
+ if not results["Agent"]:
417
+ print(f"\n{Colors.WARNING}Tip: Agent not working. Check LLM setup and tool integration.{Colors.ENDC}")
418
+
419
+ if __name__ == "__main__":
420
+ main()
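The cases in `test_answer_extraction` pin down the expected normalization fairly precisely: take the text after `FINAL ANSWER:`, strip `$`, `%`, and thousands separators from a lone number, and drop a leading article. The real `extract_final_answer` lives in `app.py` and is not shown in this diff, so the following is only a minimal sketch of one way to satisfy those cases. The name `extract_final_answer_sketch` and all three regexes are assumptions made for illustration.

```python
import re

# Everything after the marker, up to the end of that line (assumed marker format).
_MARKER = re.compile(r"FINAL ANSWER:\s*(.+?)\s*$", re.IGNORECASE | re.MULTILINE)
# A single number, optionally wrapped in "$" and "%", with optional thousands commas.
_NUMBER = re.compile(r"^\$?(\d{1,3}(?:,\d{3})*(?:\.\d+)?|\d+(?:\.\d+)?)%?$")
# A leading English article followed by whitespace.
_ARTICLE = re.compile(r"^(?:the|a|an)\s+", re.IGNORECASE)


def extract_final_answer_sketch(text: str) -> str:
    """Hypothetical stand-in for app.extract_final_answer (not the repo's code)."""
    match = _MARKER.search(text)
    if not match:
        return ""  # no marker: the tests expect an empty string
    answer = match.group(1).strip()

    number = _NUMBER.match(answer)
    if number:
        # Lone number: keep digits and decimal point, drop "$", "%", and commas.
        return number.group(1).replace(",", "")

    # Anything else (words, comma-separated lists): only drop a leading article.
    return _ARTICLE.sub("", answer)


if __name__ == "__main__":
    for sample in ("FINAL ANSWER: $1,234.56", "FINAL ANSWER: The Eiffel Tower"):
        print(repr(extract_final_answer_sketch(sample)))
```

Matching the whole answer against `_NUMBER` before touching commas is what keeps a list such as `1, 2, 3, 4, 5` intact while still turning `3,456` into `3456`.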
test_google_search.py ADDED
@@ -0,0 +1,143 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Quick test for Google Search functionality
4
+ Run this to verify your Google API key and CSE ID are working
5
+ """
6
+
7
+ import os
8
+ import sys
9
+ import requests
10
+ import logging
11
+
12
+ # Set up logging
13
+ logging.basicConfig(level=logging.INFO)
14
+ logger = logging.getLogger(__name__)
15
+
16
+ def test_google_search():
17
+ """Test Google Custom Search API"""
18
+
19
+ print("🔍 Testing Google Search Configuration\n")
20
+
21
+ # Check for API key
22
+ api_key = os.getenv("GOOGLE_API_KEY")
23
+ if not api_key:
24
+ print("❌ GOOGLE_API_KEY not found in environment")
25
+ print(" Set it with: export GOOGLE_API_KEY=your_key_here")
26
+ return False
27
+
28
+ print("✅ Google API key found")
29
+
30
+ # CSE ID (yours or from env)
31
+ cse_id = os.getenv("GOOGLE_CSE_ID", "746382dd3c2bd4135")
32
+ print(f"✅ Using CSE ID: {cse_id}")
33
+
34
+ # Test query
35
+ test_query = "GAIA benchmark AI"
36
+ print(f"\nTesting search for: '{test_query}'")
37
+
38
+ # Make API call
39
+ url = "https://www.googleapis.com/customsearch/v1"
40
+ params = {
41
+ "key": api_key,
42
+ "cx": cse_id,
43
+ "q": test_query,
44
+ "num": 3
45
+ }
46
+
47
+ try:
48
+ print("Calling Google API...")
49
+ response = requests.get(url, params=params, timeout=10)
50
+
51
+ print(f"Response status: {response.status_code}")
52
+
53
+ if response.status_code == 200:
54
+ data = response.json()
55
+
56
+ # Check search info
57
+ search_info = data.get("searchInformation", {})
58
+ total_results = search_info.get("totalResults", "0")
59
+ search_time = search_info.get("searchTime", "0")
60
+
61
+ print("\n✅ Search successful!")
62
+ print(f" Total results: {total_results}")
63
+ print(f" Search time: {search_time}s")
64
+
65
+ # Show results
66
+ items = data.get("items", [])
67
+ if items:
68
+ print(f"\nFound {len(items)} results:")
69
+ for i, item in enumerate(items, 1):
70
+ print(f"\n{i}. {item.get('title', 'No title')}")
71
+ print(f" {item.get('snippet', 'No snippet')[:100]}...")
72
+ print(f" {item.get('link', 'No link')}")
73
+ else:
74
+ print("\n⚠️ No results returned (but API is working)")
75
+
76
+ # Show request metadata echoed back by the API (not quota usage)
77
+ if "queries" in data:
78
+ queries = data["queries"]["request"][0]
79
+ print("\n📊 Request info:")
80
+ print(f" Results returned: {queries.get('count', 'unknown')}")
81
+ print(f" Total results: {queries.get('totalResults', 'unknown')}")
82
+
83
+ return True
84
+
85
+ else:
86
+ # Error response
87
+ print(f"\n❌ API Error (HTTP {response.status_code})")
88
+
89
+ try:
90
+ error_data = response.json()
91
+ error = error_data.get("error", {})
92
+ print(f" Code: {error.get('code', 'unknown')}")
93
+ print(f" Message: {error.get('message', 'unknown')}")
94
+
95
+ # Common errors
96
+ if response.status_code == 403:
97
+ print("\n🔧 Possible fixes:")
98
+ print(" 1. Check your API key is correct")
99
+ print(" 2. Enable 'Custom Search API' in Google Cloud Console")
100
+ print(" 3. Check your quota hasn't been exceeded")
101
+ elif response.status_code == 400:
102
+ print("\n🔧 Possible fixes:")
103
+ print(" 1. Check your CSE ID is correct")
104
+ print(" 2. Verify your search engine is set up properly")
105
+
106
+ except Exception:
107
+ print(f" Raw response: {response.text[:200]}")
108
+
109
+ return False
110
+
111
+ except requests.exceptions.Timeout:
112
+ print("\n❌ Request timed out")
113
+ return False
114
+ except requests.exceptions.ConnectionError:
115
+ print("\n❌ Connection error - check your internet")
116
+ return False
117
+ except Exception as e:
118
+ print(f"\n❌ Unexpected error: {type(e).__name__}: {e}")
119
+ return False
120
+
121
+ def main():
122
+ """Run the test"""
123
+
124
+ print("="*60)
125
+ print("Google Custom Search API Test")
126
+ print("="*60)
127
+
128
+ success = test_google_search()
129
+
130
+ print("\n" + "="*60)
131
+ if success:
132
+ print("✅ Google Search is working correctly!")
133
+ print("Your GAIA agent should be able to search the web.")
134
+ else:
135
+ print("❌ Google Search is not working")
136
+ print("Fix the issues above before running the GAIA agent.")
137
+ print("\nThe agent will fall back to DuckDuckGo if available.")
138
+ print("="*60)
139
+
140
+ return 0 if success else 1
141
+
142
+ if __name__ == "__main__":
143
+ sys.exit(main())
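The closing message above mentions that the agent falls back to DuckDuckGo when Google is unavailable. In this commit, `search_web` decides whether a provider succeeded by checking message prefixes on the returned string. A possible alternative, sketched below under stated assumptions, is to have each provider return `None` (or raise) on failure so the chain never has to parse error text. `search_with_fallback`, `SearchFn`, and the two stand-in providers are hypothetical names and are not part of the repository.

```python
import logging
from typing import Callable, Optional, Sequence, Tuple

logger = logging.getLogger(__name__)

# Hypothetical provider type: returns result text, or None when it has nothing.
SearchFn = Callable[[str], Optional[str]]


def search_with_fallback(
    query: str, providers: Sequence[Tuple[str, SearchFn]]
) -> Optional[str]:
    """Return the first non-empty result, trying providers in the given order."""
    for name, provider in providers:
        try:
            result = provider(query)
        except Exception as exc:  # a failing provider should not stop the chain
            logger.warning("%s search failed: %s", name, exc)
            continue
        if result:
            return result
        logger.info("%s returned no results, trying the next provider", name)
    return None


# Stand-in providers for demonstration only; they make no network calls.
def broken_google(query: str) -> Optional[str]:
    raise RuntimeError("quota exceeded")


def fake_duckduckgo(query: str) -> Optional[str]:
    return f"1. Result for {query}"


if __name__ == "__main__":
    chain = [("Google", broken_google), ("DuckDuckGo", fake_duckduckgo)]
    print(search_with_fallback("GAIA benchmark", chain))
```

With this shape, the caller can turn a final `None` into whatever "search unavailable" message the agent prompt expects, and an empty Google result moves on to the next provider the same way an error does.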
test_local.py DELETED
@@ -1,216 +0,0 @@
1
- """
2
- Test GAIA Agent Locally
3
- Complete testing script for your GAIA RAG agent
4
- """
5
-
6
- import os
7
- import json
8
- import asyncio
9
- from app import GAIAAgent
10
-
11
- def test_gaia_agent():
12
- """Test the GAIA agent with sample questions"""
13
-
14
- print("🧪 Testing GAIA RAG Agent\n")
15
-
16
- # Check API keys
17
- api_keys = {
18
- "Claude": os.getenv("ANTHROPIC_API_KEY") or os.getenv("CLAUDE_API_KEY"),
19
- "Groq": os.getenv("GROQ_API_KEY"),
20
- "Together": os.getenv("TOGETHER_API_KEY"),
21
- "HuggingFace": os.getenv("HF_TOKEN"),
22
- "OpenAI": os.getenv("OPENAI_API_KEY")
23
- }
24
-
25
- available = [name for name, key in api_keys.items() if key]
26
-
27
- if not available:
28
- print("❌ No API keys found!")
29
- print("Set one of these environment variables:")
30
- print(" export GROQ_API_KEY=your_key")
31
- print(" export ANTHROPIC_API_KEY=your_key")
32
- print(" export TOGETHER_API_KEY=your_key")
33
- print(" export HF_TOKEN=your_key")
34
- return
35
-
36
- print(f"✅ Available LLMs: {', '.join(available)}\n")
37
-
38
- # GAIA-style test questions
39
- test_questions = [
40
- {"task_id": "test_001", "question": "What is 25 * 17?"},
41
- {"task_id": "test_002", "question": "What is the opposite of left?"},
42
- {"task_id": "test_003", "question": "How many planets are in our solar system?"},
43
- {"task_id": "test_004", "question": "Is Paris the capital of France?"},
44
- {"task_id": "test_005", "question": "What is 15% of 1000?"},
45
- {"task_id": "test_006", "question": "List the primary colors"},
46
- {"task_id": "test_007", "question": "What is the square root of 144?"},
47
- {"task_id": "test_008", "question": "How many days are in a week?"}
48
- ]
49
-
50
- # Initialize agent
51
- try:
52
- print("Initializing GAIA agent...")
53
- agent = GAIAAgent()
54
- print("✅ Agent ready!\n")
55
- except Exception as e:
56
- print(f"❌ Failed to create agent: {e}")
57
- return
58
-
59
- # Test each question
60
- answers_for_submission = []
61
- correct_count = 0
62
-
63
- print("Running test questions:\n")
64
- print("-" * 60)
65
-
66
- for item in test_questions:
67
- task_id = item["task_id"]
68
- question = item["question"]
69
-
70
- print(f"Q: {question}")
71
-
72
- try:
73
- # Get answer
74
- answer = agent(question)
75
-
76
- # Format for submission
77
- answers_for_submission.append({
78
- "task_id": task_id,
79
- "submitted_answer": answer
80
- })
81
-
82
- print(f"A: {answer}")
83
-
84
- # Check against expected answers
85
- expected = get_expected_answer(question)
86
- if expected and answer == expected:
87
- print("✅ Correct!")
88
- correct_count += 1
89
- elif expected:
90
- print(f"❌ Expected: {expected}")
91
-
92
- print("-" * 60)
93
-
94
- except Exception as e:
95
- print(f"Error: {e}")
96
- answers_for_submission.append({
97
- "task_id": task_id,
98
- "submitted_answer": ""
99
- })
100
- print("-" * 60)
101
-
102
- # Show submission format
103
- print("\n" + "="*60)
104
- print("SUBMISSION FORMAT (what gets sent to GAIA):")
105
- print(json.dumps(answers_for_submission, indent=2))
106
-
107
- # Save to file
108
- with open("test_submission.json", "w") as f:
109
- json.dump(answers_for_submission, f, indent=2)
110
-
111
- print("\n✅ Saved to test_submission.json")
112
-
113
- # Summary
114
- print(f"\nTest Results: {correct_count}/{len(test_questions)} correct")
115
- print(f"Expected score: {correct_count/len(test_questions)*100:.1f}%")
116
-
117
- def get_expected_answer(question):
118
- """Get expected answer for test questions"""
119
- expected = {
120
- "What is 25 * 17?": "425",
121
- "What is the opposite of left?": "right",
122
- "How many planets are in our solar system?": "8",
123
- "Is Paris the capital of France?": "yes",
124
- "What is 15% of 1000?": "150",
125
- "List the primary colors": "red, blue, yellow",
126
- "What is the square root of 144?": "12",
127
- "How many days are in a week?": "7"
128
- }
129
- return expected.get(question)
130
-
131
- def test_tools_only():
132
- """Test individual tools"""
133
-
134
- print("\n🔧 Testing Individual Tools\n")
135
-
136
- from tools import calculate, search_web, analyze_file, get_weather
137
-
138
- # Test calculator
139
- print("Calculator Tests:")
140
- test_calcs = [
141
- ("10 + 10", "20"),
142
- ("sqrt(144)", "12"),
143
- ("15% of 1000", "150"),
144
- ("25 * 17", "425")
145
- ]
146
-
147
- for expr, expected in test_calcs:
148
- result = calculate(expr)
149
- status = "✅" if result == expected else "❌"
150
- print(f" {status} {expr} = {result} (expected: {expected})")
151
-
152
- # Test file analyzer
153
- print("\nFile Analyzer Test:")
154
- csv_data = "product,price,quantity\nApple,1.50,100\nBanana,0.80,150"
155
- result = analyze_file(csv_data, "csv")
156
- print(result)
157
-
158
- # Test weather
159
- print("\nWeather Test:")
160
- result = get_weather("New York")
161
- print(result)
162
-
163
- # Test web search (if available)
164
- print("\nWeb Search Test:")
165
- try:
166
- result = search_web("capital of France")
167
- print(f"Found: {result[:200]}...")
168
- except Exception as e:
169
- print(f"Web search not available: {e}")
170
-
171
- def test_answer_extraction():
172
- """Test GAIA-compliant answer extraction"""
173
-
174
- print("\n📝 Testing Answer Extraction\n")
175
-
176
- from app import extract_final_answer
177
-
178
- test_cases = [
179
- ("I calculated it.\n\nFINAL ANSWER: 425", "425"),
180
- ("The answer is:\n\nFINAL ANSWER: $1,500", "1500"),
181
- ("After analysis:\n\nFINAL ANSWER: yes", "yes"),
182
- ("The result:\n\nFINAL ANSWER: red, blue, yellow", "red, blue, yellow"),
183
- ("FINAL ANSWER: The Paris", "Paris"),
184
- ("FINAL ANSWER: 25%", "25")
185
- ]
186
-
187
- print("Testing GAIA answer extraction:")
188
- for response, expected in test_cases:
189
- extracted = extract_final_answer(response)
190
- status = "✅" if extracted == expected else "❌"
191
- print(f"{status} '{response[:30]}...' → '{extracted}' (expected: '{expected}')")
192
-
193
- def main():
194
- """Run all tests"""
195
-
196
- print("="*60)
197
- print("GAIA RAG Agent - Complete Testing Suite")
198
- print("="*60)
199
-
200
- # Test components
201
- test_answer_extraction()
202
- test_tools_only()
203
-
204
- # Test full agent
205
- print("\n" + "="*60)
206
- test_gaia_agent()
207
-
208
- print("\n✅ Testing complete!")
209
- print("\nNext steps:")
210
- print("1. Review test_submission.json")
211
- print("2. Fix any failing tests")
212
- print("3. Deploy to HuggingFace Space")
213
- print("4. Run the real GAIA evaluation")
214
-
215
- if __name__ == "__main__":
216
- main()
tools.py CHANGED
@@ -4,55 +4,243 @@ Includes web search, calculator, file analyzer, weather, and persona RAG
4
  """
5
 
6
  import os
7
-
8
  import logging
9
  import math
10
  import re
11
  from typing import List, Optional
12
  from llama_index.core.tools import FunctionTool, QueryEngineTool
13
 
 
14
  logger = logging.getLogger(__name__)
 
15
 
16
  # ==========================================
17
- # Core Tool Functions
18
  # ==========================================
19
 
20
  def search_web(query: str) -> str:
21
  """
22
- Search the web for current information using DuckDuckGo.
23
- Returns concise, relevant results.
24
  """
25
- logger.info(f"Searching web for: {query}")
26
 
27
  try:
28
- from duckduckgo_search import DDGS
29
 
30
- with DDGS() as ddgs:
31
- results = list(ddgs.text(query, max_results=3))
 
 
32
 
33
- if not results:
34
- return "No search results found."
35
 
36
- # Format results concisely for GAIA
37
- formatted_results = []
38
- for i, result in enumerate(results, 1):
39
- title = result.get('title', '')
40
- body = result.get('body', '')
41
- url = result.get('href', '')
42
 
43
- # Clean and truncate body
44
- clean_body = ' '.join(body.split())[:200]
45
 
46
- formatted_results.append(f"{i}. {title}\n{clean_body}\nSource: {url}")
 
 
 
 
 
 
 
47
 
48
- return "\n\n".join(formatted_results)
49
 
50
  except ImportError:
51
  logger.error("duckduckgo_search not installed")
52
- return "Web search unavailable - package not installed"
53
  except Exception as e:
54
- logger.error(f"Search error: {e}")
55
- return f"Search failed: {str(e)}"
 
 
 
 
56
 
57
  def calculate(expression: str) -> str:
58
  """
@@ -80,6 +268,14 @@ def calculate(expression: str) -> str:
80
  result = (percentage / 100) * number
81
  return str(int(result) if result.is_integer() else round(result, 6))
82
 
 
 
 
 
 
 
 
 
83
  # Handle word numbers
84
  word_to_num = {
85
  'zero': '0', 'one': '1', 'two': '2', 'three': '3', 'four': '4',
@@ -94,12 +290,12 @@ def calculate(expression: str) -> str:
94
  for word, num in word_to_num.items():
95
  expr = re.sub(rf'\b{word}\b', num, expr, flags=re.IGNORECASE)
96
 
97
- # Replace math words
98
  math_replacements = {
99
  r'\bplus\b': '+', r'\bminus\b': '-', r'\btimes\b': '*',
100
  r'\bmultiplied by\b': '*', r'\bdivided by\b': '/', r'\bover\b': '/',
101
  r'\bsquared\b': '**2', r'\bcubed\b': '**3',
102
- r'\bto the power of\b': '**', r'\bsquare root of\b': 'sqrt'
103
  }
104
 
105
  for pattern, replacement in math_replacements.items():
@@ -132,7 +328,8 @@ def calculate(expression: str) -> str:
132
  except Exception as e:
133
  logger.error(f"Calculation error: {e}")
134
  return "0"
135
-
 
136
  def analyze_file(content: str, file_type: str = "text") -> str:
137
  """
138
  Analyze file contents, especially CSV files.
@@ -341,6 +538,10 @@ def create_simple_persona_engine(llm):
341
  # Tool Creation
342
  # ==========================================
343
 
 
 
 
 
344
  def get_gaia_tools(llm=None):
345
  """
346
  Get all tools needed for GAIA evaluation.
@@ -355,12 +556,18 @@ def get_gaia_tools(llm=None):
355
  FunctionTool.from_defaults(
356
  fn=search_web,
357
  name="web_search",
358
- description="Search the web for current information, facts, news, or any data not in the knowledge base. Use for questions requiring up-to-date information."
 
 
 
 
 
359
  ),
 
360
  FunctionTool.from_defaults(
361
  fn=calculate,
362
  name="calculator",
363
- description="Perform mathematical calculations including arithmetic, percentages, and advanced math functions. ALWAYS use this for ANY mathematical computation."
364
  ),
365
  FunctionTool.from_defaults(
366
  fn=analyze_file,
@@ -370,7 +577,7 @@ def get_gaia_tools(llm=None):
370
  FunctionTool.from_defaults(
371
  fn=get_weather,
372
  name="weather",
373
- description="Get current weather information for any location."
374
  )
375
  ]
376
 
 
4
  """
5
 
6
  import os
7
+ import requests
8
  import logging
9
  import math
10
  import re
11
  from typing import List, Optional
12
  from llama_index.core.tools import FunctionTool, QueryEngineTool
13
 
14
+ # Set up better logging
15
  logger = logging.getLogger(__name__)
16
+ logger.setLevel(logging.INFO)
17
 
18
  # ==========================================
19
+ # Web Search Functions
20
  # ==========================================
21
 
22
  def search_web(query: str) -> str:
23
  """
24
+ Search the web for current information, verification, or when explicitly needed.
25
+ Prioritizes Google Search, then DuckDuckGo as fallback.
26
  """
27
+ logger.info(f"Web search requested for: {query}")
28
+
29
+ # Try Google Custom Search first
30
+ google_result = _search_google(query)
31
+ if google_result and not google_result.startswith(("Google search", "No Google search")):
32
+ logger.info("Google search successful")
33
+ return google_result
34
+
35
+ # Fallback to DuckDuckGo
36
+ logger.info("Trying DuckDuckGo as fallback...")
37
+ ddg_result = _search_duckduckgo(query)
38
+ if ddg_result and not ddg_result.startswith(("DuckDuckGo", "No DuckDuckGo")):
39
+ return ddg_result
40
+
41
+ # If all searches fail
42
+ logger.warning("All web search methods failed")
43
+ return "Web search unavailable. Please answer based on knowledge up to January 2025."
44
+
45
+ def _search_google(query: str) -> str:
46
+ """Search using Google Custom Search API"""
47
+ api_key = os.getenv("GOOGLE_API_KEY")
48
+ # Use GOOGLE_CSE_ID from the environment, falling back to the hard-coded default ID
49
+ cx = os.getenv("GOOGLE_CSE_ID", "746382dd3c2bd4135") # Your custom search engine ID
50
+
51
+ if not api_key:
52
+ logger.info("Google API key not found")
53
+ return "Google search not configured - no API key"
54
 
55
  try:
56
+ url = "https://www.googleapis.com/customsearch/v1"
57
+ params = {
58
+ "key": api_key,
59
+ "cx": cx,
60
+ "q": query,
61
+ "num": 5 # Get more results for better coverage
62
+ }
63
+
64
+ logger.info(f"Calling Google Search API for: {query}")
65
+ logger.debug(f"Using CSE ID: {cx}")
66
+
67
+ response = requests.get(url, params=params, timeout=10)
68
+
69
+ # Log response status for debugging
70
+ logger.info(f"Google API response status: {response.status_code}")
71
 
72
+ if response.status_code != 200:
73
+ error_data = response.json() if response.text else {}
74
+ error_msg = error_data.get('error', {}).get('message', 'Unknown error')
75
+ logger.error(f"Google API error: {error_msg}")
76
 
77
+ if response.status_code == 403:
78
+ return "Google search quota exceeded or API key invalid"
79
+ elif response.status_code == 400:
80
+ return f"Google search configuration error: {error_msg}"
81
+ else:
82
+ return f"Google search error (HTTP {response.status_code}): {error_msg}"
83
+
84
+ response.raise_for_status()
85
+
86
+ data = response.json()
87
+ items = data.get("items", [])
88
+
89
+ # Check if search returned results
90
+ total_results = data.get("searchInformation", {}).get("totalResults", "0")
91
+ logger.info(f"Google found {total_results} total results, returning {len(items)}")
92
+
93
+ if not items:
94
+ logger.warning("No Google search results found")
95
+ return "No Google search results found for this query"
96
+
97
+ # Format results with more context
98
+ formatted_results = []
99
+ for i, item in enumerate(items[:3], 1):
100
+ title = item.get("title", "")
101
+ snippet = item.get("snippet", "")
102
+ link = item.get("link", "")
103
+
104
+ # Clean up snippet
105
+ snippet = ' '.join(snippet.split())
106
 
107
+ formatted_results.append(f"{i}. {title}\n{snippet}\nSource: {link}")
108
+
109
+ return "\n\n".join(formatted_results)
110
+
111
+ except requests.exceptions.HTTPError as e:
112
+ logger.error(f"Google API HTTP error: {e}")
113
+ return f"Google search HTTP error: {e.response.status_code}"
114
+ except requests.exceptions.Timeout:
115
+ logger.error("Google API timeout")
116
+ return "Google search timeout - try again"
117
+ except requests.exceptions.ConnectionError:
118
+ logger.error("Google API connection error")
119
+ return "Google search connection error"
120
+ except Exception as e:
121
+ logger.error(f"Google search unexpected error: {type(e).__name__}: {e}")
122
+ return f"Google search failed: {str(e)[:100]}"
123
+
124
+ def _search_duckduckgo(query: str) -> str:
125
+ """Search using DuckDuckGo with robust error handling"""
126
+ try:
127
+ from duckduckgo_search import DDGS
128
+
129
+ logger.info(f"Trying DuckDuckGo search for: {query}")
130
+
131
+ # Try with timeout and different methods
132
+ try:
133
+ with DDGS(timeout=10) as ddgs:
134
+ results = []
135
+
136
+ # Try instant answers first (often more reliable)
137
+ try:
138
+ instant = ddgs.answers(query)
139
+ if instant:
140
+ for answer in instant[:1]: # Just take first answer
141
+ if answer.get('text'):
142
+ results.append({
143
+ 'title': 'Quick Answer',
144
+ 'body': answer['text'],
145
+ 'href': answer.get('url', 'DuckDuckGo Instant Answer')
146
+ })
147
+ except Exception:
148
+ pass
149
+
150
+ # Then try text search
151
+ try:
152
+ # Try lite backend first (more reliable in HF Spaces)
153
+ text_results = list(ddgs.text(query, max_results=3, backend="lite"))
154
+ results.extend(text_results)
155
+ except Exception:
156
+ # Fallback to API backend
157
+ try:
158
+ text_results = list(ddgs.text(query, max_results=3, backend="api"))
159
+ results.extend(text_results)
160
+ except Exception:
161
+ pass
162
+
163
+ if not results:
164
+ logger.warning("No DuckDuckGo results found")
165
+ return "No DuckDuckGo results found"
166
 
167
+ # Format results
168
+ formatted_results = []
169
+ for i, result in enumerate(results[:3], 1):
170
+ title = result.get('title', '')
171
+ body = result.get('body', '')
172
+ url = result.get('href', '')
173
+
174
+ # Clean body text
175
+ clean_body = ' '.join(body.split())[:200]
176
+ if len(body) > 200:
177
+ clean_body += "..."
178
+
179
+ formatted_results.append(f"{i}. {title}\n{clean_body}\nSource: {url}")
180
 
181
+ logger.info(f"DuckDuckGo returned {len(results)} results")
182
+ return "\n\n".join(formatted_results)
183
+
184
+        except Exception as e:
+            logger.warning(f"DuckDuckGo DDGS method failed: {e}")
+
+            # Fallback to direct API call (doesn't require auth)
+            import requests
 
+            response = requests.get(
+                "https://api.duckduckgo.com/",
+                params={
+                    "q": query,
+                    "format": "json",
+                    "no_html": "1",
+                    "skip_disambig": "1"
+                },
+                timeout=5
+            )
+
+            if response.status_code == 200:
+                data = response.json()
+
+                results = []
+
+                # Get instant answer
+                if data.get("AbstractText"):
+                    results.append(
+                        f"1. Quick Answer\n{data['AbstractText']}\n"
+                        f"Source: {data.get('AbstractURL', 'DuckDuckGo')}"
+                    )
+
+                # Get definition if available
+                if data.get("Definition"):
+                    results.append(
+                        f"{len(results)+1}. Definition\n{data['Definition']}\n"
+                        f"Source: {data.get('DefinitionURL', 'DuckDuckGo')}"
+                    )
+
+                # Get answer if available
+                if data.get("Answer"):
+                    results.append(
+                        f"{len(results)+1}. Answer\n{data['Answer']}\n"
+                        "Source: DuckDuckGo Instant Answer"
+                    )
+
+                if results:
+                    return "\n\n".join(results)
+                else:
+                    return "DuckDuckGo API returned no results"
+            else:
+                return f"DuckDuckGo API error: HTTP {response.status_code}"
 
     except ImportError:
         logger.error("duckduckgo_search not installed")
+        return "DuckDuckGo search unavailable - package not installed"
     except Exception as e:
+        logger.error(f"DuckDuckGo search error: {e}")
+        return f"DuckDuckGo search failed: {str(e)[:100]}"
+
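The Google-then-DuckDuckGo arrangement above is a provider fallback chain. A minimal sketch of that pattern follows, assuming each provider either raises or returns an empty string on failure. The helper `search_with_fallback` and the two stub providers are hypothetical names for illustration, not functions from this commit, which reports failures as message strings instead.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger(__name__)

# Hypothetical type: a provider takes a query and returns formatted results.
Provider = Callable[[str], str]


def search_with_fallback(query: str, providers: List[Tuple[str, Provider]]) -> str:
    """Try each provider in order and return the first non-empty result."""
    errors = []
    for name, provider in providers:
        try:
            result = provider(query)
        except Exception as exc:  # a failing provider must not abort the chain
            logger.warning("%s failed: %s", name, exc)
            errors.append(f"{name}: {exc}")
            continue
        if result:
            return result
        errors.append(f"{name}: no results")
    return "Search failed: " + "; ".join(errors)


# Stub providers standing in for real search backends (assumptions).
def flaky_provider(query: str) -> str:
    raise TimeoutError("request timed out")


def stub_provider(query: str) -> str:
    return f"stub result for {query}"


if __name__ == "__main__":
    chain = [("primary", flaky_provider), ("secondary", stub_provider)]
    print(search_with_fallback("GAIA benchmark", chain))
```

Keeping the provider order in a list makes the priority explicit and lets a new backend be added without nesting another `try` block.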
+# ==========================================
+# Core Tool Functions
+# ==========================================
 
 def calculate(expression: str) -> str:
     """
 
             result = (percentage / 100) * number
             return str(int(result) if result.is_integer() else round(result, 6))
 
+        # Handle square root BEFORE other replacements
+        if 'square root' in expr.lower():
+            match = re.search(r'square root of\s*(\d+(?:\.\d+)?)', expr, re.IGNORECASE)
+            if match:
+                number = float(match.group(1))
+                result = math.sqrt(number)
+                return str(int(result) if result.is_integer() else result)
+
         # Handle word numbers
         word_to_num = {
             'zero': '0', 'one': '1', 'two': '2', 'three': '3', 'four': '4',
 
         for word, num in word_to_num.items():
             expr = re.sub(rf'\b{word}\b', num, expr, flags=re.IGNORECASE)
 
+        # Replace math words (but NOT square root anymore since we handled it)
         math_replacements = {
             r'\bplus\b': '+', r'\bminus\b': '-', r'\btimes\b': '*',
             r'\bmultiplied by\b': '*', r'\bdivided by\b': '/', r'\bover\b': '/',
             r'\bsquared\b': '**2', r'\bcubed\b': '**3',
+            r'\bto the power of\b': '**'
         }
 
         for pattern, replacement in math_replacements.items():
 
     except Exception as e:
         logger.error(f"Calculation error: {e}")
         return "0"
+
+
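The square-root branch added to `calculate` can be exercised in isolation. The sketch below lifts the same regular expression into a standalone helper. The name `parse_square_root` is hypothetical and does not appear in the commit. Because the pattern expects digits, a phrase such as "square root of nine" does not match here.

```python
import math
import re
from typing import Optional


def parse_square_root(expr: str) -> Optional[str]:
    """Return the square root named in `expr` as a string, or None if absent."""
    match = re.search(r'square root of\s*(\d+(?:\.\d+)?)', expr, re.IGNORECASE)
    if not match:
        return None
    result = math.sqrt(float(match.group(1)))
    # Whole-number roots are printed without a trailing ".0"
    return str(int(result)) if result.is_integer() else str(result)


if __name__ == "__main__":
    print(parse_square_root("What is the square root of 144?"))  # 12
    print(parse_square_root("Square Root of 6.25"))              # 2.5
    print(parse_square_root("2 plus 2"))                         # None
```

Returning `None` for non-matching input lets the caller fall through to the general word-replacement path instead of guessing.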
 def analyze_file(content: str, file_type: str = "text") -> str:
     """
     Analyze file contents, especially CSV files.
 
 # Tool Creation
 # ==========================================
 
+def get_my_tools(llm=None):
+    """Get all tools for the GAIA agent (alias maintained for compatibility)"""
+    return get_gaia_tools(llm)
+
 def get_gaia_tools(llm=None):
     """
     Get all tools needed for GAIA evaluation.
 
         FunctionTool.from_defaults(
             fn=search_web,
             name="web_search",
+            description="""Use ONLY for:
+1. Current events after January 2025
+2. Real-time data (stock prices, weather, sports scores)
+3. When question explicitly asks to "search" or "look up"
+4. To verify facts you're uncertain about
+Do NOT use for general knowledge, historical facts, or math."""
         ),
+
         FunctionTool.from_defaults(
             fn=calculate,
             name="calculator",
+            description="ALWAYS use for ANY math calculation, including simple arithmetic like 2+2. Required for all numbers."
         ),
         FunctionTool.from_defaults(
             fn=analyze_file,
 
         FunctionTool.from_defaults(
             fn=get_weather,
             name="weather",
+            description="Get current weather information for any location. Use when asked about weather conditions."
         )
     ]
583