Final_Assignment_Template / README.md

AheedTahir

Final Working Implementation

223e45d 3 months ago

metadata

title: GAIA Agent - Certification
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.25.2
app_file: evaluation_app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480

GAIA Agent - Hugging Face Agents Course Certification

This is a LangGraph-based AI agent built to answer questions from the GAIA benchmark for the Hugging Face Agents Course Unit 4 certification.

Goal

Achieve at least 30% accuracy on the GAIA benchmark, the score required to earn the certification.

Agent Architecture

The agent is built using:

LLM: Llama 3.3 70B served through Groq (fast inference, free tier)
Framework: LangGraph for agent orchestration
Tools: five tools covering search, computation, and file access
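
The control flow behind this kind of agent can be sketched without any framework. The block below is a minimal stand-in written in plain Python, not the LangGraph graph in agent.py: `fake_llm`, `TOOLS`, and `run_agent` are hypothetical names, and the scripted model replaces the real Groq call so that the example runs offline.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: tool name -> function taking and returning a string.
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(sum(int(p) for p in expr.split("+"))),
}

Message = Tuple[str, str]  # (role, content)


def fake_llm(history: List[Message]) -> Tuple[str, str]:
    """Scripted stand-in for the model call (assumption: no network access).

    Returns ("tool", "<name>:<input>") to request a tool, or ("final", answer).
    """
    tool_results = [content for role, content in history if role == "tool"]
    if not tool_results:
        return "tool", "calculator:2+3"
    return "final", tool_results[-1]


def run_agent(question: str, llm=fake_llm, max_steps: int = 5) -> str:
    """Alternate between the model and the tools until a final answer appears."""
    history: List[Message] = [("user", question)]
    for _ in range(max_steps):
        kind, payload = llm(history)
        if kind == "final":
            return payload
        name, _, tool_input = payload.partition(":")
        tool = TOOLS.get(name)
        result = tool(tool_input) if tool else f"unknown tool: {name}"
        history.append(("tool", result))
    return "no answer within step limit"


if __name__ == "__main__":
    print(run_agent("What is 2 + 3?"))
```

The step limit keeps a confused model from looping forever. A graph framework expresses the same model-then-tool cycle as nodes and edges instead of a `for` loop.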

Tools Implemented

Web Search (Tavily) - Search the internet for current information
Wikipedia Search - Access encyclopedic knowledge (Wikipedia API)
Calculator - Perform mathematical calculations
Python Executor - Execute Python code for complex computations
File Reader - Read CSV, JSON, and text files
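
As an illustration of what one of these tools might look like, here is a sketch of a calculator that evaluates arithmetic without calling `eval()`. This README does not describe how agent.py implements the tool, so `safe_calculate` and its operator whitelist are assumptions made for the example.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _is_number(value) -> bool:
    """True for int and float, but not for bool (a subclass of int)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def safe_calculate(expression: str):
    """Evaluate an arithmetic expression by walking its syntax tree."""

    def _eval(node):
        if isinstance(node, ast.Constant) and _is_number(node.value):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)
```

Function calls, names, and attribute access all raise `ValueError`, so a model-generated string such as `__import__('os')` is rejected instead of executed. Very large exponents can still be slow, so a real tool would probably cap them.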

Answer Format Rules

The agent follows GAIA's strict formatting requirements:

Numbers: No commas, no units (unless requested)
Text: No articles (a, an, the), no abbreviations
Lists: Comma-separated with one space after commas
Dates: ISO format (YYYY-MM-DD) unless specified
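
The rules above can be sketched as small post-processing helpers. This is only an illustration under stated assumptions: the function names are hypothetical, and in the actual agent the rules may be enforced through the prompt instead of through code.

```python
import re
from datetime import date

# Matches the standalone words "a", "an", "the" plus any trailing whitespace.
_ARTICLES = re.compile(r"\b(?:a|an|the)\b\s*", flags=re.IGNORECASE)


def format_number(text: str) -> str:
    """Drop thousands separators: '1,234,567' becomes '1234567'."""
    return text.replace(",", "").strip()


def format_text(text: str) -> str:
    """Remove English articles and collapse extra whitespace."""
    return " ".join(_ARTICLES.sub("", text).split())


def format_list(items) -> str:
    """Join items with a comma followed by exactly one space."""
    return ", ".join(str(item).strip() for item in items)


def format_date(value: date) -> str:
    """Render a date as YYYY-MM-DD."""
    return value.isoformat()
```

The article filter is deliberately naive: it would also strip a standalone "A" from an answer such as "Vitamin A", and unit removal is not handled at all, so a real implementation needs more care in both places.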

Usage

Local Testing

# Install dependencies
pip install -r requirements.txt

# Set up environment variables in .env
GROQ_API_KEY=your_key_here
TAVILY_API_KEY=your_key_here

# Test the agent
python test_agent.py

Running Evaluation

Open the Space URL
Log in with your Hugging Face account
Click "Run Evaluation & Submit All Answers"
Wait for the results (a full run takes roughly 1-2 hours because of API rate limiting)
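
The long runtime comes from staying under API rate limits. One common way to do that is to retry with exponential backoff when the API reports a rate-limit error. The sketch below shows the idea; this README does not say how evaluation_app.py handles limits, and `RateLimitError` and `call_with_backoff` are hypothetical names, not part of any client library.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the error an API client raises on HTTP 429."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, waiting base_delay * 2**attempt seconds after each rate-limit error."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # Out of retries: surface the error to the caller.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("max_retries must be at least 1")
```

The `sleep` parameter is injectable so the function can be tested without actually waiting.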

Project Structure

.
├── agent.py              # Main agent implementation
├── evaluation_app.py     # Gradio app for evaluation
├── test_agent.py         # Local testing script
├── requirements.txt      # Python dependencies
├── .env                  # API keys (not committed)
└── README.md             # This file

Required API Keys

GROQ_API_KEY: Get from console.groq.com
TAVILY_API_KEY: Get from tavily.com
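
Because a missing key usually surfaces only as a failed API call midway through a run, it can help to check both keys up front. The sketch below assumes the variables are already present in the process environment (for example, exported in the shell or loaded from .env by a helper such as python-dotenv); `missing_api_keys` is a hypothetical helper, not part of this repository.

```python
import os

REQUIRED_KEYS = ("GROQ_API_KEY", "TAVILY_API_KEY")


def missing_api_keys(env=None) -> list:
    """Return the required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Calling `missing_api_keys()` at startup and stopping when the returned list is non-empty gives a clear error message before any question is processed.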

Expected Performance

Rough accuracy estimates as tools are added cumulatively:

Web Search + Wikipedia + Calculator: ~25-30%
+ File Processing: ~35-40%
+ Python Execution: ~40-45%

Course Information

This project is part of the Hugging Face Agents Course Unit 4 certification.

License

MIT License - Feel free to use and modify for your own certification!