---
title: Isadora Teles - GAIA Agent - Final HF Agents Project
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480
---

# 🎓 My GAIA RAG Agent - AI Agents Course Final Project

**Author:** Isadora Teles  
**Course:** AI Agents with LlamaIndex  
**Goal:** Build an agent that achieves 30%+ on the GAIA benchmark

## 📚 Project Overview

This is my final project for the AI Agents course. I've built a RAG (Retrieval-Augmented Generation) agent to tackle the challenging GAIA benchmark, which tests AI agents on diverse real-world questions.

### What I Built
- **Multi-LLM Agent**: Supports 5+ different LLMs with automatic fallback
- **Custom Tools**: Web search, calculator, file analyzer, and more
- **Smart Answer Extraction**: Handles GAIA's exact-match requirements
- **Robust Error Handling**: Manages rate limits and API failures gracefully

## 🚀 My Learning Journey

### Week 1: Initial Struggles
- Started with `AgentWorkflow` - too complex!
- Couldn't get past 0% due to answer formatting issues
- Learned that GAIA uses **exact string matching**
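
To show why formatting alone can hold a score at 0%, here is a minimal sketch of exact-match scoring. The `exact_match` helper is hypothetical and only approximates the idea; the real GAIA scorer may normalize answers differently.

```python
def exact_match(predicted: str, expected: str) -> bool:
    """Hypothetical scorer: compare after trimming surrounding whitespace only."""
    return predicted.strip() == expected.strip()


# A correct fact in the wrong format still scores zero.
print(exact_match("4 albums", "4"))  # False
print(exact_match("4", "4"))         # True
```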

### Week 2: Architecture Switch
- Switched to `ReActAgent` - much simpler and more reliable
- Fixed LLM compatibility issues (especially with Groq)
- Discovered the importance of good system prompts

### Week 3: Fine-tuning
- Implemented comprehensive answer extraction
- Added special handling for:
  - Missing files → "No file provided"
  - Botanical fruits vs vegetables
  - Reversed text questions
  - Name extraction from verbose responses
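
The kind of answer extraction described above can be sketched roughly as follows. This is an illustration, not the code in `app.py`: the `extract_answer` name, the `expects_number` flag, and the assumption that the model marks its answer with `FINAL ANSWER:` are all mine.

```python
import re


def extract_answer(response: str, expects_number: bool = False) -> str:
    """Reduce a verbose response to a short answer string (illustrative only)."""
    text = response.strip()
    # Keep only what follows an explicit answer marker, if one is present.
    marker = re.search(r"final answer\s*:\s*(.+)", text, flags=re.IGNORECASE | re.DOTALL)
    if marker:
        text = marker.group(1).strip()
    # Use the first line only, so trailing explanations are dropped.
    text = text.splitlines()[0].strip() if text else ""
    if expects_number:
        # Ignore thousands separators, then take the first number found.
        number = re.search(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
        if number:
            return number.group(0)
    # Strip wrapping quotes and a trailing period.
    return text.strip(" \"'").rstrip(".").rstrip(" \"'")


print(extract_answer("Let me count.\nFINAL ANSWER: 4 albums", expects_number=True))  # 4
print(extract_answer("Final answer: Marie Curie.\nShe won two Nobel Prizes."))       # Marie Curie
```

Returning the first number only when the caller asks for one keeps answers like "Apollo 11" intact for non-numeric questions.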

### Week 4: Optimization
- Added multi-LLM fallback for rate limits
- Reduced token usage to conserve API limits
- Achieved **25%** and pushing for **30%+**!

## 🔧 Technical Architecture

```
┌─────────────────┐     ┌──────────────┐     ┌─────────────┐
│   Multi-LLM     │────▶│ ReAct Agent  │────▶│    Tools    │
│   Manager       │     │              │     │             │
└─────────────────┘     └──────────────┘     └─────────────┘
         │                      │                     │
         ▼                      ▼                     ▼
   [Gemini, Groq,         [Reasoning &          [Web Search,
    Claude, etc.]          Planning]            Calculator,
                                               File Analyzer]
```

## 💡 Key Learnings

1. **Exact Match is Unforgiving**
   - "4 albums" ≠ "4" in GAIA's evaluation
   - Every character matters!

2. **Simple > Complex**
   - ReActAgent outperformed AgentWorkflow
   - Clear prompts beat clever engineering

3. **Tool Design Matters**
   - Good descriptions guide the agent
   - Error messages should be actionable

4. **LLM Diversity is Key**
   - Different LLMs have different strengths
   - Rate limits require fallback strategies
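
As an illustration of the tool-design point, here is a sketch of a calculator tool whose docstring tells the agent what it accepts and whose error messages say how to fix the input. The `calculator` function and its restricted `ast`-based evaluator are assumptions for this example, not the tool shipped in this project.

```python
import ast
import operator

# Binary operators the hypothetical calculator tool accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}


def _eval(node: ast.AST):
    """Evaluate a parsed arithmetic expression, rejecting anything else."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def calculator(expression: str) -> str:
    """Evaluate an arithmetic expression such as '(3 + 4) ** 2'.

    Supports + - * / % ** and parentheses. Returns the result as a string,
    or an error message that says how to fix the input.
    """
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_eval(tree.body))
    except ZeroDivisionError:
        return "Error: division by zero. Check the denominator and retry."
    except (SyntaxError, ValueError, OverflowError) as exc:
        return (
            f"Error: could not evaluate {expression!r} ({exc}). "
            "Use only numbers and + - * / % ** ( )."
        )


print(calculator("(3 + 4) ** 2"))  # 49
```

Returning errors as strings rather than raising lets the agent read the message and retry with corrected input.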

## 🛠️ Setup Instructions

### 1. Clone and Install
```bash
git clone [your-repo]
pip install -r requirements.txt
```

### 2. Set API Keys
Create a `.env` file, or set these as secrets in your Hugging Face Space:
```
# Choose at least one LLM
GEMINI_API_KEY=your_key      # Recommended
GROQ_API_KEY=your_key        # Fast but limited
ANTHROPIC_API_KEY=your_key   # High quality

# For web search
GOOGLE_API_KEY=your_key
GOOGLE_CSE_ID=your_cse_id
```
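
Since only one LLM key is required, it helps to check at startup which providers are actually configured. The sketch below assumes the variable names listed above; the `available_llms` and `web_search_configured` helpers and the provider labels are hypothetical.

```python
import os

# Environment variable names from the list above; the labels are illustrative.
LLM_KEYS = {
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def available_llms(env=None) -> list[str]:
    """Return the provider labels whose API key is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in LLM_KEYS.items() if env.get(var, "").strip()]


def web_search_configured(env=None) -> bool:
    """Web search needs both the API key and the search engine ID."""
    env = os.environ if env is None else env
    return bool(env.get("GOOGLE_API_KEY")) and bool(env.get("GOOGLE_CSE_ID"))


if not available_llms():
    print("No LLM API key found: set at least one of", ", ".join(LLM_KEYS.values()))
```

If the keys live in a `.env` file, they have to be loaded into the environment before this check runs (for example with python-dotenv's `load_dotenv`).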

### 3. Run Locally
```bash
python app.py
```

## 📊 Performance Metrics

| Metric | Value | Notes |
|--------|-------|-------|
| Target Score | 30% | Course requirement |
| Current Best | 25% | Close to target! |
| Avg Response Time | 8-15s | Depends on LLM |
| Questions Handled | 20/20 | All question types |

## 🎯 GAIA Question Types I Handle

1. **Web Search Questions**
   - Current events
   - Wikipedia lookups
   - Fact verification

2. **Math & Calculations**
   - Arithmetic operations
   - Python code execution
   - Percentage calculations

3. **File Analysis**
   - CSV/Excel processing
   - Python code analysis
   - Missing file detection

4. **Special Cases**
   - Reversed text puzzles
   - Botanical classification
   - Name extraction
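
For the reversed-text puzzles, one simple heuristic is to reverse the question and check whether the result looks more like English. The `reverse_if_backwards` helper and its short word list are assumptions made for this sketch; a real detector would likely need a larger vocabulary.

```python
def reverse_if_backwards(question: str, common_words=("the", "and", "of", "is", "you")) -> str:
    """Return the reversed question when the reversed form looks more like English."""

    def score(text: str) -> int:
        # Count how many whitespace-separated words are common English words.
        words = text.lower().split()
        return sum(words.count(w) for w in common_words)

    flipped = question[::-1]
    return flipped if score(flipped) > score(question) else question


print(reverse_if_backwards("?rewsna eht si tahW"))  # What is the answer?
```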

## 🐛 Known Issues & Solutions

### Issue 1: Rate Limits
**Problem:** Groq limits usage to 100k tokens/day  
**Solution:** Automatic LLM switching
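
Automatic switching can be sketched as a loop over providers that moves on whenever one reports a rate limit. Everything named here is hypothetical: `RateLimitError` stands in for whatever exception a real client raises, and each provider is just a callable that takes a prompt.

```python
from typing import Callable, Sequence


class RateLimitError(Exception):
    """Stand-in for the rate-limit exception a real LLM client would raise."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (name, response) from the first success."""
    failures = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimitError as exc:
            # Remember why this provider was skipped, then try the next one.
            failures.append(f"{name}: {exc}")
    raise RuntimeError("All providers are rate limited: " + "; ".join(failures))
```

Only rate-limit errors trigger a switch in this sketch; other exceptions propagate, so genuine bugs are not hidden behind a silent fallback.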

### Issue 2: File Not Found
**Problem:** Questions mention files that aren't provided  
**Solution:** Return "No file provided" instead of an error
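
A minimal sketch of that behavior, assuming a hypothetical `analyze_file` tool that receives an optional path. The line-count summary is only a placeholder for real analysis.

```python
from pathlib import Path
from typing import Optional

NO_FILE = "No file provided"


def analyze_file(path: Optional[str]) -> str:
    """Hypothetical file tool: report a missing file as a plain answer, not an exception."""
    if not path or not Path(path).is_file():
        return NO_FILE
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    # Placeholder summary; real analysis would depend on the file type.
    return f"{Path(path).name}: {len(text.splitlines())} lines"


print(analyze_file(None))  # No file provided
```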

### Issue 3: Long Answers
**Problem:** The agent gives explanations when only a name is needed  
**Solution:** Enhanced answer extraction with patterns

## 🔮 Future Improvements

If I had more time, I would:
1. Add vision capabilities for image questions
2. Implement caching to reduce API calls
3. Create a custom fine-tuned model
4. Add more sophisticated web scraping

## 🙏 Acknowledgments

- **Course Instructors** - For the excellent LlamaIndex tutorials
- **GAIA Team** - For creating such a challenging benchmark
- **Open Source Community** - For all the amazing tools

## 📝 Lessons for Fellow Students

1. **Start Simple** - Don't overcomplicate your first version
2. **Log Everything** - Debugging is easier with good logs
3. **Test Incrementally** - Fix one question type at a time
4. **Read the Docs** - GAIA's exact requirements are crucial
5. **Ask for Help** - The community is super helpful!

## 🎉 Final Thoughts

This project taught me that building AI agents is as much about handling edge cases as it is about the core logic. Every percentage point on GAIA represents hours of debugging and learning. 

Even if I don't hit 30%, I've learned invaluable lessons about:
- Production-ready agent development
- Multi-LLM orchestration
- Tool design and integration
- The importance of precise specifications