---
title: GAIA Agent - Certification
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.25.2
app_file: evaluation_app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480
---
# GAIA Agent - Hugging Face Agents Course Certification
This is a LangGraph-based AI agent built to answer questions from the GAIA benchmark for the Hugging Face Agents Course Unit 4 certification.
## Goal
Achieve **30%+ accuracy** on the GAIA benchmark to earn the certification.
## Agent Architecture
The agent is built using:
- **LLM**: Llama 3.3 70B served by Groq (fast inference, available on the free tier)
- **Framework**: LangGraph for agent orchestration
- **Tools**: Five tools covering web search, encyclopedic lookup, arithmetic, code execution, and file reading
### Tools Implemented
1. **Web Search** (Tavily) - Search the internet for current information
2. **Wikipedia Search** - Access encyclopedic knowledge (Wikipedia API)
3. **Calculator** - Perform mathematical calculations
4. **Python Executor** - Execute Python code for complex computations
5. **File Reader** - Read CSV, JSON, and text files
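As an illustration of what the Calculator tool could look like, here is a minimal sketch that evaluates arithmetic without calling `eval()`. The function name `safe_calculate` and the operator whitelist are assumptions made for this example; the actual implementation in `agent.py` may differ.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Whitelisted operators (an assumption for this sketch); anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> Number:
    """Evaluate a plain arithmetic expression by walking its syntax tree."""

    def _eval(node: ast.AST) -> Number:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept numeric literals only (bool is a subclass of int, so exclude it).
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attributes, etc. are not arithmetic and are refused.
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval"))
```

Walking the syntax tree keeps function calls and attribute access out of the calculator, so an input such as `__import__('os')` raises `ValueError` instead of running. Anything beyond plain arithmetic would go to the Python Executor tool.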
## Answer Format Rules
The agent follows GAIA's strict formatting requirements:
- **Numbers**: No commas, no units (unless requested)
- **Text**: No articles (a, an, the), no abbreviations
- **Lists**: Comma-separated with one space after commas
- **Dates**: ISO format (YYYY-MM-DD) unless specified
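The rules above can be sketched as small post-processing helpers. This is only an illustration: the function names are hypothetical, the agent may enforce these rules through its prompt instead, and unit and abbreviation handling is left out.

```python
import re
from datetime import datetime

# Matches the English articles as whole words, plus any trailing whitespace.
_ARTICLES = re.compile(r"\b(?:a|an|the)\b\s*", flags=re.IGNORECASE)


def format_number(raw: str) -> str:
    """Remove thousands separators, e.g. '1,234,567' becomes '1234567'."""
    return raw.replace(",", "").strip()


def format_text(raw: str) -> str:
    """Drop articles (a, an, the) and collapse runs of whitespace."""
    return " ".join(_ARTICLES.sub("", raw).split())


def format_list(items: list[str]) -> str:
    """Join items with a comma followed by exactly one space."""
    return ", ".join(item.strip() for item in items)


def format_date(raw: str, input_format: str) -> str:
    """Convert a date in a known strptime format to ISO YYYY-MM-DD."""
    return datetime.strptime(raw, input_format).strftime("%Y-%m-%d")
```

For example, `format_text("The Eiffel Tower")` yields `Eiffel Tower`, and `format_date("05/03/2021", "%d/%m/%Y")` yields `2021-03-05`.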
## Usage
### Local Testing
```bash
# Install dependencies
pip install -r requirements.txt
# Create a .env file with your API keys (this overwrites any existing .env)
cat > .env <<'EOF'
GROQ_API_KEY=your_key_here
TAVILY_API_KEY=your_key_here
EOF
# Test the agent
python test_agent.py
```
### Running Evaluation
1. Open the Space URL
2. Log in with your Hugging Face account
3. Click "Run Evaluation & Submit All Answers"
4. Wait for results (takes ~1-2 hours due to rate limiting)
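The long runtime comes from spacing out requests to stay under API rate limits. One way to do that is sketched below, assuming a fixed minimum interval between questions. The helper `paced` and its parameters are hypothetical and not taken from `evaluation_app.py`.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def paced(
    items: Iterable[T],
    min_interval_s: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items no faster than one per `min_interval_s` seconds."""
    last_yield: Optional[float] = None
    for item in items:
        if last_yield is not None:
            # Sleep only for the part of the interval that has not elapsed yet.
            remaining = min_interval_s - (clock() - last_yield)
            if remaining > 0:
                sleep(remaining)
        last_yield = clock()
        yield item
```

A loop such as `for question in paced(questions, 60.0): ...` would then start at most one question per minute. The `clock` and `sleep` arguments are injectable so the pacing can be tested without waiting.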
## Project Structure
```
.
├── agent.py # Main agent implementation
├── evaluation_app.py # Gradio app for evaluation
├── test_agent.py # Local testing script
├── requirements.txt # Python dependencies
├── .env # API keys (not committed)
└── README.md # This file
```
## Required API Keys
- **GROQ_API_KEY**: Get from [console.groq.com](https://console.groq.com)
- **TAVILY_API_KEY**: Get from [tavily.com](https://tavily.com)
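A quick startup check can catch a missing key before the agent makes any requests. The sketch below reads only the process environment; the helper name `missing_api_keys` is hypothetical, and loading `.env` into the environment (for example with `python-dotenv`) is assumed to happen beforehand.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_KEYS = ("GROQ_API_KEY", "TAVILY_API_KEY")


def missing_api_keys(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required keys that are unset or empty."""
    env = os.environ if environ is None else environ
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        raise SystemExit(f"Missing API keys: {', '.join(missing)}")
    print("All required API keys are set.")
```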
## Expected Performance
Expected accuracy as tools are added cumulatively:
- **Web Search + Wikipedia + Calculator**: ~25-30%
- **+ File Processing**: ~35-40%
- **+ Python Execution**: ~40-45%
## Course Information
This project is part of the [Hugging Face Agents Course](https://huggingface.co/learn/agents-course) Unit 4 certification.
## License
MY License - Feel free to use and modify for your own certification!