---
title: GAIA Agent - Certification
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.25.2
app_file: evaluation_app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480
---

# GAIA Agent - Hugging Face Agents Course Certification

This is a LangGraph-based AI agent built to answer questions from the GAIA benchmark for the Hugging Face Agents Course Unit 4 certification.

## Goal

Reach at least **30% accuracy** on the GAIA benchmark questions to earn the certification.

## Agent Architecture

The agent is built with:

- **LLM**: Groq's Llama 3.3 70B (fast inference, free tier available)
- **Framework**: LangGraph for agent orchestration
- **Tools**: 5 tools covering web search, encyclopedic lookup, math, code execution, and file reading
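
LangGraph's job here is to run a tool-calling loop. The LLM either answers or requests a tool, the tool runs, and the tool's output is fed back to the LLM. The dependency-free sketch below illustrates that loop in plain Python. It is not LangGraph's API and not the code in `agent.py`. `Step`, `TOOLS`, `scripted_llm`, and `run_agent` are hypothetical names, and `scripted_llm` is a scripted stand-in for the Groq model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One LLM decision: a tool request or a final answer (hypothetical shape)."""
    tool: Optional[str] = None
    tool_input: str = ""
    answer: Optional[str] = None


# Hypothetical tool registry; the real agent's tools wrap Tavily, Wikipedia, etc.
TOOLS: Dict[str, Callable[[str], str]] = {
    "reverse_text": lambda text: text[::-1],
}


def scripted_llm(history: List[str]) -> Step:
    """Stand-in for the LLM: requests one tool call, then answers with its result."""
    last = history[-1]
    if last.startswith("TOOL RESULT: "):
        return Step(answer=last[len("TOOL RESULT: "):])
    return Step(tool="reverse_text", tool_input=last)


def run_agent(question: str, llm: Callable[[List[str]], Step], max_steps: int = 5) -> str:
    """Alternate LLM decisions and tool calls until the LLM returns a final answer."""
    history = [question]
    for _ in range(max_steps):
        step = llm(history)
        if step.answer is not None:
            return step.answer
        # Act: run the requested tool and feed its output back to the LLM.
        result = TOOLS[step.tool](step.tool_input)
        history.append(f"TOOL RESULT: {result}")
    return "No answer within step limit"
```

In LangGraph, the same loop is usually wired as an LLM node and a tool node, joined by a conditional edge that routes back to the LLM after each tool call. Here, `max_steps` plays a role similar to a graph recursion limit: it prevents an agent that never settles on an answer from looping forever.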

### Tools Implemented

1. **Web Search** (Tavily) - Search the internet for current information
2. **Wikipedia Search** - Access encyclopedic knowledge (Wikipedia API)
3. **Calculator** - Perform mathematical calculations
4. **Python Executor** - Execute Python code for complex computations
5. **File Reader** - Read CSV, JSON, and text files
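
A calculator tool should evaluate arithmetic without handing model-generated text to `eval()`. The following is a minimal sketch that walks Python's `ast` and allows only numeric literals and basic operators. It assumes the tool receives a plain expression string, and `calculate` is a hypothetical name rather than necessarily what `agent.py` defines.

```python
import ast
import operator

# Operators the hypothetical calculator accepts; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,  # note: no cap on exponent size in this sketch
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculate(expression: str) -> float:
    """Evaluate a plain arithmetic expression by walking its syntax tree."""
    def _eval(node: ast.AST):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Only int/float literals; bools are excluded even though they subclass int.
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {ast.dump(node)}")

    return _eval(ast.parse(expression, mode="eval"))
```

Function calls, names, and attribute access all raise `ValueError`, so an input like `__import__('os')` is rejected instead of executed. Anything beyond plain arithmetic is better routed to the Python Executor tool.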

## Answer Format Rules

The agent follows GAIA's strict formatting requirements:

- **Numbers**: No thousands separators (commas) and no units such as `$` or `%` unless requested
- **Text**: No articles (a, an, the) and no abbreviations
- **Lists**: Comma-separated, with one space after each comma
- **Dates**: ISO format (YYYY-MM-DD) unless specified otherwise
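
These rules can be enforced with a post-processing pass over the model's raw answer. The following is a minimal sketch under several assumptions. List items are separated by commas, and a comma followed by exactly three digits is treated as a thousands separator. `normalize_answer` is a hypothetical helper. Abbreviations and date formats are left to the prompt, since they cannot be fixed reliably with string rules.

```python
import re

# Split on a comma that is NOT followed by a 3-digit group, so "1,234" stays one number.
_LIST_SPLIT = re.compile(r",(?!\d{3}(?:\D|$))\s*")
# Optional $, optional minus, digits possibly containing commas, optional decimals and %.
_NUMBER = re.compile(r"\$?-?\d[\d,]*(?:\.\d+)?%?")
_LEADING_ARTICLE = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)


def _format_item(item: str) -> str:
    item = item.strip()
    if _NUMBER.fullmatch(item):
        # Numbers: drop thousands separators and the $ / % units.
        return item.replace(",", "").replace("$", "").replace("%", "")
    # Text: drop a leading article only (the regex requires whitespace after it).
    return _LEADING_ARTICLE.sub("", item)


def normalize_answer(raw: str) -> str:
    """Apply the number/text rules to each list item and rejoin with ', '."""
    items = [part for part in _LIST_SPLIT.split(raw.strip()) if part.strip()]
    return ", ".join(_format_item(part) for part in items)
```

For example, `normalize_answer("$1,500")` gives `1500`, and `normalize_answer("apple,banana,  cherry")` gives `apple, banana, cherry`. An ambiguous input such as `1,234` inside a list is always read as one number.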

## Usage

### Local Testing

```bash
# Install dependencies
pip install -r requirements.txt

# Set up environment variables in .env
cat > .env << 'EOF'
GROQ_API_KEY=your_key_here
TAVILY_API_KEY=your_key_here
EOF

# Test the agent
python test_agent.py
```

### Running Evaluation

1. Open the Space URL
2. Log in with your Hugging Face account
3. Click "Run Evaluation & Submit All Answers"
4. Wait for the results (a full run takes roughly 1-2 hours because of API rate limiting)
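
Most of that run time is spent waiting out API rate limits. A common way to handle a rate-limit error is exponential backoff with jitter. The following sketch assumes a hypothetical `RateLimitError` raised by the wrapped call; real clients such as Groq's and Tavily's raise their own exception types. `with_backoff` is also hypothetical and not necessarily how `evaluation_app.py` paces its requests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the error a provider raises on HTTP 429."""


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimitError, doubling the wait each time plus up to 1s of jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the retry schedule can be tested without actually waiting. With the defaults, the waits are roughly 2, 4, 8, 16, and 32 seconds before the sixth attempt's error is re-raised.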

## Project Structure

```
.
├── agent.py            # Main agent implementation
├── evaluation_app.py   # Gradio app for evaluation
├── test_agent.py       # Local testing script
├── requirements.txt    # Python dependencies
├── .env                # API keys (not committed)
└── README.md           # This file
```

## Required API Keys

- **GROQ_API_KEY**: Get from [console.groq.com](https://console.groq.com)
- **TAVILY_API_KEY**: Get from [tavily.com](https://tavily.com)
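
A missing key usually surfaces only when the first tool call fails, so it can help to check both keys at startup. The following is a minimal, stdlib-only sketch. `load_env_file` and `missing_keys` are hypothetical helpers that handle only simple `KEY=value` lines. The project may instead use `python-dotenv`'s `load_dotenv`, which covers more of the `.env` syntax.

```python
import os
from pathlib import Path
from typing import Iterable, List, Union

REQUIRED_KEYS = ("GROQ_API_KEY", "TAVILY_API_KEY")


def load_env_file(path: Union[str, Path] = ".env") -> None:
    """Load KEY=value lines into os.environ; skips comments, existing variables win."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Strip optional surrounding quotes; never overwrite an already-set variable.
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def missing_keys(required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the names of required keys that are unset or empty."""
    return [key for key in required if not os.environ.get(key)]
```

At startup, call `load_env_file()` and exit with a clear message if `missing_keys()` returns a non-empty list. This is faster to diagnose than an authentication error partway through an evaluation run.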

## Expected Performance

Estimated accuracy as each capability is added to the tool set:

- **Web Search + Wikipedia + Calculator**: ~25-30%
- **+ File Processing**: ~35-40%
- **+ Python Execution**: ~40-45%

## Course Information

This project is part of the [Hugging Face Agents Course](https://huggingface.co/learn/agents-course) Unit 4 certification.

## License

MIT License - Feel free to use and modify for your own certification!