metadata
title: GAIA Agent - Certification
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.25.2
app_file: evaluation_app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480

GAIA Agent - Hugging Face Agents Course Certification

This is a LangGraph-based AI agent built to answer questions from the GAIA benchmark for the Hugging Face Agents Course Unit 4 certification.

Goal

Achieve 30%+ accuracy on the GAIA benchmark to earn the certification.

Agent Architecture

The agent is built using:

  • LLM: Groq's Llama 3.3 70B (fast and free)
  • Framework: LangGraph for agent orchestration
  • Tools: 5 essential tools for maximum coverage
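The way these three pieces fit together can be illustrated with a plain-Python sketch of the decide/act loop that an agent framework orchestrates. This is not LangGraph code and not the implementation in agent.py: `run_agent`, `scripted_decide`, and `demo_tools` are hypothetical names, and the "LLM" is a scripted stub so the example runs without API keys.

```python
from typing import Callable, Dict, List

# Hypothetical tool registry: tool name -> function from string to string.
Tools = Dict[str, Callable[[str], str]]


def run_agent(question: str,
              decide: Callable[[List[str]], Dict[str, str]],
              tools: Tools,
              max_steps: int = 5) -> str:
    """Alternate between a model decision and a tool call until an answer appears."""
    history = [f"question: {question}"]
    for _ in range(max_steps):
        decision = decide(history)  # stands in for the LLM call
        if "answer" in decision:
            return decision["answer"]
        name = decision["tool"]
        tool = tools.get(name)
        if tool is None:
            # Record the failure so the next decision can react to it.
            history.append(f"error: unknown tool {name}")
            continue
        history.append(f"{name}: {tool(decision['input'])}")
    return "no answer within step limit"


def scripted_decide(history: List[str]) -> Dict[str, str]:
    """Stub policy: call the calculator once, then return its observation."""
    if len(history) == 1:
        return {"tool": "calculator", "input": "40+2"}
    return {"answer": history[-1].split(": ", 1)[1]}


# Toy calculator that only sums integers separated by '+'.
demo_tools: Tools = {
    "calculator": lambda expr: str(sum(int(part) for part in expr.split("+"))),
}
```

Calling `run_agent("What is 40+2?", scripted_decide, demo_tools)` goes through one tool call and then returns the observation as the answer. The step limit keeps a confused model from looping forever.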

Tools Implemented

  1. Web Search (Tavily) - Search the internet for current information
  2. Wikipedia Search - Access encyclopedic knowledge (Wikipedia API)
  3. Calculator - Perform mathematical calculations
  4. Python Executor - Execute Python code for complex computations
  5. File Reader - Read CSV, JSON, and text files
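As an illustration of what a tool such as the Calculator might look like, here is a minimal sketch that evaluates arithmetic by walking the parsed syntax tree instead of calling `eval()`. The function name `calculate` and the set of supported operators are assumptions, not the code in agent.py.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Whitelisted operators; anything else is rejected.
_BINARY = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod, ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculate(expression: str) -> Number:
    """Evaluate a plain arithmetic expression without calling eval()."""

    def _eval(node: ast.AST) -> Number:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. all end up here.
        raise ValueError(f"unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval"))
```

Because only numeric literals and the listed operators are accepted, input such as `__import__('os')` raises `ValueError` rather than executing. A production tool would probably also cap exponent sizes, since `**` with large operands can be very slow.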

Answer Format Rules

The agent follows GAIA's strict formatting requirements:

  • Numbers: No commas, no units (unless requested)
  • Text: No articles (a, an, the), no abbreviations
  • Lists: Comma-separated with one space after commas
  • Dates: ISO format (YYYY-MM-DD) unless specified
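The rules above can be sketched as small post-processing helpers. The function names are hypothetical and the agent may instead enforce these rules through its prompt. Unit stripping and abbreviation expansion are left out because they depend on the question.

```python
import re
from datetime import datetime
from typing import Iterable

# Matches the standalone words a, an, the (any case) plus trailing spaces.
_ARTICLES = re.compile(r"\b(?:a|an|the)\b\s*", re.IGNORECASE)


def format_number(text: str) -> str:
    """Remove thousands separators, e.g. '1,234,567' becomes '1234567'."""
    return text.replace(",", "").strip()


def format_text(text: str) -> str:
    """Drop articles and collapse repeated whitespace."""
    return " ".join(_ARTICLES.sub("", text).split())


def format_list(items: Iterable[str]) -> str:
    """Join items with a comma followed by exactly one space."""
    return ", ".join(item.strip() for item in items)


def format_date(text: str, input_format: str) -> str:
    """Convert a date in the given strptime format to ISO YYYY-MM-DD."""
    return datetime.strptime(text, input_format).date().isoformat()
```

Note that `format_text` removes every standalone "a", so an answer like "Vitamin A" would be damaged. A real implementation would need to decide when the rule applies.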

Usage

Local Testing

# Install dependencies
pip install -r requirements.txt

# Set up environment variables in .env
GROQ_API_KEY=your_key_here
TAVILY_API_KEY=your_key_here

# Test the agent
python test_agent.py
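Before running the test script, it can help to confirm that both keys are actually visible to Python. The helper below is a hypothetical sketch that inspects only the process environment. It assumes the `.env` file has already been loaded into it by whatever mechanism the project uses (for example the python-dotenv package).

```python
import os
from typing import List, Mapping

# The two keys named in the .env example above.
REQUIRED_KEYS = ("GROQ_API_KEY", "TAVILY_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required keys that are unset or blank in the given environment."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

An empty list from `missing_keys()` means both keys are set. Otherwise the returned names show which entries to add to `.env`.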

Running Evaluation

  1. Open the Space URL
  2. Log in with your Hugging Face account
  3. Click "Run Evaluation & Submit All Answers"
  4. Wait for results (takes ~1-2 hours due to rate limiting)
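The long runtime comes from spacing out requests to stay under API rate limits. One simple way to do that is a fixed minimum interval between calls, sketched below. `run_throttled` is a hypothetical helper, not the logic in evaluation_app.py, and the interval is a value the caller has to choose for their own quota.

```python
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_throttled(items: Iterable[T],
                  handler: Callable[[T], R],
                  min_interval: float,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> List[R]:
    """Call handler on each item, keeping at least min_interval seconds between call starts."""
    results: List[R] = []
    last_start: Optional[float] = None
    for item in items:
        if last_start is not None:
            # Sleep only for the part of the interval not already used up.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(handler(item))
    return results
```

The `clock` and `sleep` parameters exist so the pacing can be tested with fakes. In normal use they can be left at their defaults, as in `run_throttled(questions, answer_question, min_interval=30.0)` where `questions` and `answer_question` are placeholders.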

Project Structure

.
├── agent.py              # Main agent implementation
├── evaluation_app.py     # Gradio app for evaluation
├── test_agent.py         # Local testing script
├── requirements.txt      # Python dependencies
├── .env                  # API keys (not committed)
└── README.md             # This file

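Required API Keys

  • GROQ_API_KEY: Groq API access for the Llama 3.3 70B model
  • TAVILY_API_KEY: Tavily API access for the Web Search tool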

Expected Performance

Estimated accuracy as each group of tools is added:

  • Web Search + Wikipedia + Calculator: ~25-30%
  • + File Processing: ~35-40%
  • + Python Execution: ~40-45%

Course Information

This project is part of the Hugging Face Agents Course Unit 4 certification.

License

MIT License - Feel free to use and modify for your own certification!