Final_Assignment_AGENT_GAIA

metadata

title: Isadora Teles - GAIA Agent - Final HF Agents Project
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480

🎓 My GAIA RAG Agent - AI Agents Course Final Project

Author: Isadora Teles
Course: AI Agents with LlamaIndex
Goal: Build an agent that achieves 30%+ on the GAIA benchmark

📚 Project Overview

This is my final project for the AI Agents course. I've built a RAG (Retrieval-Augmented Generation) agent to tackle the challenging GAIA benchmark, which tests AI agents on diverse real-world questions.

What I Built

Multi-LLM Agent: Supports 5+ different LLMs with automatic fallback
Custom Tools: Web search, calculator, file analyzer, and more
Smart Answer Extraction: Handles GAIA's exact-match requirements
Robust Error Handling: Manages rate limits and API failures gracefully

🚀 My Learning Journey

Week 1: Initial Struggles

Started with AgentWorkflow - too complex!
Couldn't get past 0% due to answer formatting issues
Learned that GAIA uses exact string matching
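
To make the exact-match point concrete, here is a minimal sketch of a strict string comparison. It assumes the scorer forgives only surrounding whitespace; the real GAIA scorer may normalize answers differently, and `exact_match` is a hypothetical helper, not part of the project code.

```python
def exact_match(submitted: str, expected: str) -> bool:
    """Strict comparison; only leading/trailing whitespace is ignored (assumption)."""
    return submitted.strip() == expected.strip()


# A correct value with extra words still counts as a miss.
print(exact_match("4", "4"))         # True
print(exact_match("4 albums", "4"))  # False
print(exact_match(" 4\n", "4"))      # True
```

This is why the agent's final output has to be reduced to the bare answer before submission.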

Week 2: Architecture Switch

Switched to ReActAgent - much simpler and more reliable
Fixed LLM compatibility issues (especially with Groq)
Discovered the importance of good system prompts

Week 3: Fine-tuning

Implemented comprehensive answer extraction
Added special handling for:
- Missing files → "No file provided"
- Botanical fruits vs vegetables
- Reversed text questions
- Name extraction from verbose responses
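
One way the reversed-text case could be detected is sketched below. It is a heuristic under stated assumptions: the word list `COMMON_WORDS` and the function names are illustrative and not taken from the project code.

```python
# Small, arbitrary set of frequent English words used as a signal (assumption).
COMMON_WORDS = {"the", "and", "of", "to", "is", "in", "you", "this", "answer", "write", "word"}


def count_common(text: str) -> int:
    """Count how many whitespace-separated tokens are common English words."""
    cleaned = text.lower().replace(".", " ").replace(",", " ").replace('"', " ")
    return sum(1 for word in cleaned.split() if word in COMMON_WORDS)


def unreverse_if_needed(question: str) -> str:
    """Return the reversed string when it looks more like English than the original."""
    flipped = question[::-1]
    return flipped if count_common(flipped) > count_common(question) else question


puzzle = "?tfel fo etisoppo eht si tahW"
print(unreverse_if_needed(puzzle))
```

Normal questions pass through unchanged because reversing them removes the common words instead of adding them.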

Week 4: Optimization

Added multi-LLM fallback for rate limits
Reduced token usage to conserve API limits
Achieved 25% and pushing for 30%+!

🔧 Technical Architecture

┌─────────────────┐     ┌──────────────┐     ┌─────────────┐
│   Multi-LLM     │────▶│ ReAct Agent  │────▶│    Tools    │
│   Manager       │     │              │     │             │
└─────────────────┘     └──────────────┘     └─────────────┘
         │                      │                     │
         ▼                      ▼                     ▼
   [Gemini, Groq,         [Reasoning &          [Web Search,
    Claude, etc.]          Planning]            Calculator,
                                               File Analyzer]

💡 Key Learnings

Exact Match is Unforgiving
- "4 albums" ≠ "4" in GAIA's evaluation
- Every character matters!
Simple > Complex
- ReActAgent outperformed AgentWorkflow
- Clear prompts beat clever engineering
Tool Design Matters
- Good descriptions guide the agent
- Error messages should be actionable
LLM Diversity is Key
- Different LLMs have different strengths
- Rate limits require fallback strategies
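
The tool-design points above can be illustrated with a small calculator tool. This is a sketch under assumptions, not the project's actual tool: it uses only the standard library, the function names are hypothetical, and the docstring stands in for the description an agent framework would show to the LLM.

```python
import ast
import operator

# Only these binary operators are allowed; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}


def _eval(node):
    """Recursively evaluate a parsed arithmetic expression."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def calculator(expression: str) -> str:
    """Evaluate an arithmetic expression such as '15 * 0.2' and return the result as text.

    Supports + - * / % and parentheses. Does not support variables or function calls.
    """
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_eval(tree.body))
    except ZeroDivisionError:
        return "Error: division by zero. Check the denominator and try again."
    except (ValueError, SyntaxError) as exc:
        return f"Error: could not evaluate {expression!r} ({exc}). Use only numbers and + - * / % ( )."


print(calculator("2 + 3 * 4"))  # 14
print(calculator("1 / 0"))
```

Returning the error as text, with a hint about what input is accepted, gives the agent something it can act on in its next reasoning step.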

🛠️ Setup Instructions

1. Clone and Install

git clone [your-repo]
pip install -r requirements.txt

2. Set API Keys

Create a .env file, or set these as secrets in your Hugging Face Space:

# Choose at least one LLM
GEMINI_API_KEY=your_key      # Recommended
GROQ_API_KEY=your_key        # Fast but limited
ANTHROPIC_API_KEY=your_key   # High quality

# For web search
GOOGLE_API_KEY=your_key
GOOGLE_CSE_ID=your_cse_id
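
A quick way to check which of these keys are actually set before starting the app is sketched below. The variable names come from the list above; the helper names and the `PROVIDER_KEYS` mapping are hypothetical.

```python
import os

# Maps a short provider label to the environment variable listed above.
PROVIDER_KEYS = {
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def configured_providers(env=None) -> list:
    """Return the providers whose API key is present and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]


def web_search_ready(env=None) -> bool:
    """Web search needs both the Google API key and the search engine ID."""
    env = os.environ if env is None else env
    return bool(env.get("GOOGLE_API_KEY")) and bool(env.get("GOOGLE_CSE_ID"))


print("LLM providers configured:", configured_providers())
print("Web search ready:", web_search_ready())
```

If the first list is empty, no LLM key has been picked up and the agent has nothing to call.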

3. Run Locally

python app.py

📊 Performance Metrics

Metric	Value	Notes
Target Score	30%	Course requirement
Current Best	25%	Close to target!
Avg Response Time	8-15s	Depends on LLM
Questions Handled	20/20	All question types
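
For a sense of scale, the percentages can be turned into question counts. This sketch assumes the score is computed over the 20 questions in the "Questions Handled" row, which the table does not state explicitly.

```python
import math

TOTAL_QUESTIONS = 20  # assumption: taken from the "Questions Handled" row


def correct_needed(target_pct: int, total: int = TOTAL_QUESTIONS) -> int:
    """Smallest number of correct answers that reaches target_pct percent."""
    return math.ceil(target_pct * total / 100)


print(correct_needed(25))  # 5
print(correct_needed(30))  # 6
```

Under that assumption, the gap between the current best and the target is a single additional correct answer.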

🎯 GAIA Question Types I Handle

Web Search Questions
- Current events
- Wikipedia lookups
- Fact verification
Math & Calculations
- Arithmetic operations
- Python code execution
- Percentage calculations
File Analysis
- CSV/Excel processing
- Python code analysis
- Missing file detection
Special Cases
- Reversed text puzzles
- Botanical classification
- Name extraction

🐛 Known Issues & Solutions

Issue 1: Rate Limits

Problem: Groq limits usage to 100k tokens/day
Solution: Automatic LLM switching
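
The switching logic could look roughly like the sketch below. It is illustrative only: `RateLimitError` is a stand-in for whatever exception a real provider client raises, and the two fake providers exist just to make the example runnable.

```python
from typing import Callable, List, Tuple


class RateLimitError(Exception):
    """Stand-in for a provider's rate-limit exception (hypothetical)."""


Provider = Tuple[str, Callable[[str], str]]


def ask_with_fallback(prompt: str, providers: List[Provider]) -> Tuple[str, str]:
    """Try each provider in order; move on only when one is rate limited."""
    failures = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimitError as exc:
            failures.append(f"{name}: {exc}")
    raise RuntimeError("All providers are rate limited: " + "; ".join(failures))


def fake_groq(prompt: str) -> str:
    raise RateLimitError("daily token limit reached")


def fake_gemini(prompt: str) -> str:
    return f"echo: {prompt}"


used, answer = ask_with_fallback("2 + 2?", [("groq", fake_groq), ("gemini", fake_gemini)])
print(used, answer)
```

Only rate-limit errors trigger the switch here; other exceptions propagate so that real bugs are not hidden by the fallback.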

Issue 2: File Not Found

Problem: Questions mention files that aren't provided
Solution: Return "No file provided" instead of raising an error
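
A minimal sketch of such a check, assuming the question text hints at a file through a few keywords. The keyword pattern and the `check_file` helper are hypothetical and would need tuning against real questions.

```python
import re
from pathlib import Path
from typing import Optional

# Words that suggest the question depends on an attachment (assumption).
FILE_HINT = re.compile(r"\b(attached|attachment|file|spreadsheet|csv|excel)\b", re.IGNORECASE)


def check_file(question: str, file_path: Optional[str]) -> Optional[str]:
    """Return the fallback answer if a file is referenced but unavailable, else None."""
    mentions_file = bool(FILE_HINT.search(question))
    file_available = file_path is not None and Path(file_path).is_file()
    if mentions_file and not file_available:
        return "No file provided"
    return None


print(check_file("What is the total in the attached Excel file?", None))
print(check_file("What is 2 + 2?", None))
```

Returning `None` when nothing is wrong lets the caller continue with the normal agent run.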

Issue 3: Long Answers

Problem: The agent gives explanations when only a name is needed
Solution: Enhanced answer extraction with patterns
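
Pattern-based extraction might look like the sketch below. It assumes the agent is prompted to end its response with a "Final answer:" line; that convention, the regex, and `extract_answer` are illustrative and not confirmed by the project code.

```python
import re

# Matches "Final answer: ..." up to the end of that line, case-insensitively.
FINAL_ANSWER = re.compile(r"final answer\s*[:\-]\s*(.+)", re.IGNORECASE)


def extract_answer(response: str) -> str:
    """Reduce a verbose response to the bare answer string."""
    match = FINAL_ANSWER.search(response)
    if match:
        text = match.group(1)
    else:
        # Fall back to the last non-empty line of the response.
        lines = response.strip().splitlines()
        text = lines[-1] if lines else ""
    text = text.strip().strip("\"'`")
    return text.rstrip(".").strip()


verbose = "After checking the source, I found it.\nFinal answer: Claude Shannon."
print(extract_answer(verbose))  # Claude Shannon
```

Stripping the trailing period is a trade-off: it helps with names and numbers but would damage an answer that legitimately ends in a period, such as an abbreviation.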

🔮 Future Improvements

If I had more time, I would:

Add vision capabilities for image questions
Implement caching to reduce API calls
Create a custom fine-tuned model
Add more sophisticated web scraping

🙏 Acknowledgments

Course Instructors - For the excellent LlamaIndex tutorials
GAIA Team - For creating such a challenging benchmark
Open Source Community - For all the amazing tools

📝 Lessons for Fellow Students

Start Simple - Don't overcomplicate your first version
Log Everything - Debugging is easier with good logs
Test Incrementally - Fix one question type at a time
Read the Docs - GAIA's exact requirements are crucial
Ask for Help - The community is super helpful!

🎉 Final Thoughts

This project taught me that building AI agents is as much about handling edge cases as it is about the core logic. Every percentage point on GAIA represents hours of debugging and learning.

Even if I don't hit 30%, I've learned invaluable lessons about:

Production-ready agent development
Multi-LLM orchestration
Tool design and integration
The importance of precise specifications