metadata
title: GAIA Agent - Certification
emoji: 🚀
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.25.2
app_file: evaluation_app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480
GAIA Agent - Hugging Face Agents Course Certification
This is a LangGraph-based AI agent built to answer questions from the GAIA benchmark for the Hugging Face Agents Course Unit 4 certification.
Goal
Achieve 30%+ accuracy on the GAIA benchmark to earn the certification.
Agent Architecture
The agent is built using:
- LLM: Llama 3.3 70B served through Groq (fast, with a free tier)
- Framework: LangGraph for agent orchestration
- Tools: 5 essential tools for maximum coverage
Tools Implemented
- Web Search (Tavily) - Search the internet for current information
- Wikipedia Search - Access encyclopedic knowledge (Wikipedia API)
- Calculator - Perform mathematical calculations
- Python Executor - Execute Python code for complex computations
- File Reader - Read CSV, JSON, and text files
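As an illustration of how a tool like the Calculator might be kept safe, here is a minimal sketch that evaluates arithmetic by walking the parsed expression instead of calling `eval()`. The function name `safe_calculate` and the set of allowed operators are assumptions made for this example; the actual tool in `agent.py` may work differently.

```python
import ast
import operator

# Operators this sketch allows; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression):
    """Evaluate a plain arithmetic expression (hypothetical helper)."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept int/float literals only; bool is excluded explicitly.
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. all end up here.
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval"))
```

Rejecting everything except numeric literals and a fixed operator table means an input such as `__import__('os')` raises `ValueError` rather than running.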
Answer Format Rules
The agent follows GAIA's strict formatting requirements:
- Numbers: No commas, no units (unless requested)
- Text: No articles (a, an, the), no abbreviations
- Lists: Comma-separated with one space after commas
- Dates: ISO format (YYYY-MM-DD) unless specified
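The rules above could be applied as a post-processing step on the model's raw answer. The following is a rough sketch under that assumption; the helper names are hypothetical and are not taken from `agent.py`, and unit stripping and abbreviation expansion are left out because they depend on the question.

```python
import re
from datetime import date


def normalize_number(text):
    """Drop thousands separators and surrounding whitespace."""
    return text.replace(",", "").strip()


def normalize_text(text):
    """Remove the articles a/an/the and collapse repeated whitespace."""
    without_articles = re.sub(r"\b(a|an|the)\b", " ", text, flags=re.IGNORECASE)
    return " ".join(without_articles.split())


def normalize_list(items):
    """Join items with a comma followed by exactly one space."""
    return ", ".join(str(item).strip() for item in items)


def normalize_date(value):
    """Format a datetime.date as YYYY-MM-DD."""
    return value.isoformat()
```

For example, `normalize_text("The capital of a country")` yields `capital of country`, and `normalize_date(date(2024, 1, 5))` yields `2024-01-05`.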
Usage
Local Testing
# Install dependencies
pip install -r requirements.txt
# Set up environment variables in .env
GROQ_API_KEY=your_key_here
TAVILY_API_KEY=your_key_here
# Test the agent
python test_agent.py
Running Evaluation
- Open the Space URL
- Log in with your Hugging Face account
- Click "Run Evaluation & Submit All Answers"
- Wait for results (takes ~1-2 hours due to rate limiting)
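The long runtime comes from spacing out requests to stay under provider rate limits. One simple way to do that is to enforce a minimum interval between calls, as in the sketch below. The function name `run_with_pacing`, the `answer_fn` callback, and the 30-second default are assumptions for illustration, not values taken from `evaluation_app.py`.

```python
import time


def run_with_pacing(questions, answer_fn, min_interval_s=30.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call answer_fn on each question, waiting between consecutive calls.

    sleep and clock are injectable so the pacing can be tested without
    actually waiting.
    """
    answers = []
    last_call = None
    for question in questions:
        if last_call is not None:
            # Wait only for the part of the interval that has not elapsed yet.
            wait = min_interval_s - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        answers.append(answer_fn(question))
    return answers
```

Measuring the interval from the start of the previous call means slow answers do not add extra waiting on top of the time they already took.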
Project Structure
.
├── agent.py # Main agent implementation
├── evaluation_app.py # Gradio app for evaluation
├── test_agent.py # Local testing script
├── requirements.txt # Python dependencies
├── .env # API keys (not committed)
└── README.md # This file
Required API Keys
- GROQ_API_KEY: Get from console.groq.com
- TAVILY_API_KEY: Get from tavily.com
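Before starting a run, it can help to confirm that both keys are actually set. This is a minimal sketch that assumes the keys have already been loaded into the process environment (for example from `.env`). The helper name `missing_api_keys` is hypothetical.

```python
import os

REQUIRED_KEYS = ("GROQ_API_KEY", "TAVILY_API_KEY")


def missing_api_keys(env=None):
    """Return the required key names that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        raise SystemExit(f"Missing API keys: {', '.join(missing)}")
    print("All required API keys are set.")
```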
Expected Performance
Estimated accuracy, cumulative as tools are added:
- Web Search + Wikipedia + Calculator: ~25-30%
- Adding File Processing: ~35-40%
- Adding Python Execution: ~40-45%
Course Information
This project is part of the Hugging Face Agents Course Unit 4 certification.
License
MIT License - Feel free to use and modify for your own certification!