---
title: GAIA Agent - Certification
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.25.2
app_file: evaluation_app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480
---

# GAIA Agent - Hugging Face Agents Course Certification

This is a LangGraph-based AI agent that answers questions from the GAIA benchmark. It was built for the Unit 4 certification of the Hugging Face Agents Course.

## Goal

Score **30% or higher** on the GAIA evaluation questions to earn the certification.

## Agent Architecture

The agent is built with:

- **LLM**: Llama 3.3 70B served through Groq (fast inference, free tier available)
- **Framework**: LangGraph for agent orchestration
- **Tools**: five tools selected for broad question coverage

### Tools Implemented

1. **Web Search** (Tavily): search the internet for current information
2. **Wikipedia Search**: look up encyclopedic knowledge through the Wikipedia API
3. **Calculator**: perform mathematical calculations
4. **Python Executor**: run Python code for complex computations
5. **File Reader**: read CSV, JSON, and text files

## Answer Format Rules

The agent follows GAIA's strict answer-formatting requirements:

- **Numbers**: no commas and no units, unless the question asks for them
- **Text**: no articles (a, an, the) and no abbreviations
- **Lists**: comma-separated, with one space after each comma
- **Dates**: ISO format (YYYY-MM-DD) unless the question specifies otherwise

## Usage

### Local Testing

```bash
# Install dependencies
pip install -r requirements.txt

# Write the API keys to a .env file
cat > .env <<'EOF'
GROQ_API_KEY=your_key_here
TAVILY_API_KEY=your_key_here
EOF

# Test the agent
python test_agent.py
```

### Running Evaluation

1. Open the Space.
2. Log in with your Hugging Face account.
3. Click **Run Evaluation & Submit All Answers**.
4. Wait for the results; a full run takes about 1 to 2 hours because of API rate limiting.

## Project Structure

```
.
├── agent.py            # Main agent implementation
├── evaluation_app.py   # Gradio app for evaluation
├── test_agent.py       # Local testing script
├── requirements.txt    # Python dependencies
├── .env                # API keys (not committed)
└── README.md           # This file
```

## Required API Keys

- **GROQ_API_KEY**: get one at [console.groq.com](https://console.groq.com)
- **TAVILY_API_KEY**: get one at [tavily.com](https://tavily.com)

## Expected Performance

Estimated accuracy by tool set, where each line adds to the one before it:

- **Web Search + Wikipedia + Calculator**: ~25-30%
- **+ File Processing**: ~35-40%
- **+ Python Execution**: ~40-45%

## Course Information

This project is part of the Unit 4 certification of the [Hugging Face Agents Course](https://huggingface.co/learn/agents-course).

## License

MIT License: feel free to use and modify this project for your own certification!
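## Example: Answer Post-Processing

The answer format rules above can be enforced with a small post-processing step applied to the model's final answer before submission. The sketch below is a hypothetical helper: `normalize_answer`, `normalize_number`, `strip_articles`, and `normalize_item` are illustrative names, not functions from `agent.py`. It assumes the model returns a plain string and, unlike the real rule, always drops units because it cannot see the question. Dates are not handled.

```python
import re
from typing import Optional

# Hypothetical post-processing helper, not taken from agent.py.
ARTICLES = {"a", "an", "the"}

# Optional currency symbol, a number that may contain comma separators,
# then an optional "%" or a single unit word (e.g. "km").
NUMBER_RE = re.compile(r"^[$€£]?\s*(-?[\d,]*\.?\d+)\s*(%|[A-Za-z]+)?$")

# Split list items on commas, except a comma followed by exactly three
# digits (a thousands separator such as "1,000").
LIST_SPLIT_RE = re.compile(r"\s*,(?!\d{3}\b)\s*")


def normalize_number(text: str) -> Optional[str]:
    """Return the bare number with commas, currency, and unit removed, or None."""
    match = NUMBER_RE.match(text.strip())
    if match is None:
        return None
    return match.group(1).replace(",", "")


def strip_articles(text: str) -> str:
    """Drop every standalone article ("a", "an", "the"), case-insensitively."""
    return " ".join(w for w in text.split() if w.lower() not in ARTICLES)


def normalize_item(text: str) -> str:
    """Normalize one answer element as a number if possible, else as text."""
    number = normalize_number(text)
    return number if number is not None else strip_articles(text)


def normalize_answer(raw: str) -> str:
    """Apply the number, text, and list rules to a raw model answer."""
    text = raw.strip()
    number = normalize_number(text)
    if number is not None:
        return number
    parts = [p.strip() for p in LIST_SPLIT_RE.split(text) if p.strip()]
    if len(parts) > 1:
        # Lists: comma-separated with exactly one space after each comma.
        return ", ".join(normalize_item(p) for p in parts)
    return normalize_item(text)


if __name__ == "__main__":
    print(normalize_answer("$1,234"))  # 1234
    print(normalize_answer("The Eiffel Tower"))  # Eiffel Tower
    print(normalize_answer("the cat,  a dog ,1,000 km"))  # cat, dog, 1000
```

Splitting lists only on commas that are not thousands separators keeps answers like `1,000` intact, at the cost of misreading an unspaced list of three-digit numbers (for example `100,200`) as a single number.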