# 🤖 GAIA Benchmark Agent (LangGraph)

This is a LangGraph-powered agent for the [HuggingFace Agents Course](https://huggingface.co/learn/agents-course) Final Assignment. The agent is designed to solve GAIA benchmark questions and achieve a 30%+ score on Level 1 questions to earn the course certificate.

## 🎯 Goal

**Score 30% or higher** (6+ correct out of 20 questions) on the GAIA Level 1 benchmark to earn your Certificate of Completion.

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                   LangGraph Workflow                    │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌─────────┐     ┌─────────┐     ┌──────────────┐       │
│  │  START  │────▶│  Agent  │────▶│    Should    │       │
│  └─────────┘     │  Node   │     │  Continue?   │       │
│                  └─────────┘     └──────────────┘       │
│                       ▲               │    │            │
│                       │           Yes │    │ No         │
│                       │               ▼    │            │
│                  ┌─────────┐     ┌─────────▼────┐       │
│                  │  Tool   │◀────│   Extract    │       │
│                  │  Node   │     │    Answer    │       │
│                  └─────────┘     └──────────────┘       │
│                                         │               │
│                                         ▼               │
│                                    ┌─────────┐          │
│                                    │   END   │          │
│                                    └─────────┘          │
└─────────────────────────────────────────────────────────┘
```

## Available Tools

| Tool | Description | Use Case |
|------|-------------|----------|
| 🔍 `web_search` | DuckDuckGo web search | Current information, recent events, facts |
| 📚 `wikipedia_search` | Wikipedia API | Historical facts, biographies, definitions |
| 🐍 `python_executor` | Python REPL | Calculations, data processing, analysis |
| 📄 `read_file` | File reader | PDFs, text files, Excel spreadsheets |
| 🔢 `calculator` | Math evaluator | Quick mathematical calculations |

## 🚀 Setup

### Option 1: HuggingFace Spaces (Recommended for Certification)

1. **Fork/Duplicate this Space** to your HuggingFace account
   - Go to the Space and click "Duplicate this Space"
   - Choose a name and make it **Public** (required for certification)

2. **Add API Key**
   - Go to Space Settings > Secrets
   - Add a new secret: `OPENAI_API_KEY` with your OpenAI API key value
   - Click "Save secrets"

3. **Deploy**
   - The Space will automatically build and deploy
   - Wait for the build to complete (usually 2-5 minutes)

4. **Test and Submit**
   - Open the Space and test with a single question
   - Run the full benchmark
   - Submit to the leaderboard

### Option 2: Local Development

```bash
# Clone the repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
cd YOUR_SPACE

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set environment variable
export OPENAI_API_KEY="sk-..."  # On Windows: set OPENAI_API_KEY=sk-...

# Run the app
python app.py
```

The app will be available at `http://localhost:7860`

## 📖 Usage

### 1. Test Single Question

- Click "Fetch & Solve Random Question" to test the agent on one question
- Review the answer and validation status
- This helps verify the agent is working correctly before running the full benchmark

### 2. Run Full Benchmark

- Click "Run Agent on All Questions"
- The process takes approximately 10-15 minutes
- Progress is shown in real-time
- Results are displayed in a table
- Answers are automatically formatted for submission

### 3. Submit to Leaderboard

- After running the benchmark, go to the "Submit to Leaderboard" tab
- Enter your HuggingFace username
- Enter your Space URL (must be public and end with `/tree/main`)
- Answers JSON is auto-filled
- Click "Submit to Leaderboard"
- View your score and ranking

## 🎓 Tips for Better Scores

### Answer Formatting (Critical!)

The GAIA benchmark uses **exact string matching**. Your answers must match the ground truth character-for-character.

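To illustrate why formatting matters, here is a minimal sketch of a character-for-character comparison plus a cleanup step. It is not the scoring service's actual code: `clean_answer`, `is_exact_match`, and the prefix pattern are hypothetical names introduced for illustration, and the real scorer may normalize answers differently.

```python
import re

# Hypothetical pattern: matches a leading "FINAL ANSWER:" or "The answer is:" prefix.
# This is an assumption about common model output, not part of the course API.
_PREFIX = re.compile(r"^\s*(final answer|the answer is)\s*:?\s*", re.IGNORECASE)


def clean_answer(raw: str) -> str:
    """Remove a leading answer prefix and surrounding whitespace."""
    return _PREFIX.sub("", raw).strip()


def is_exact_match(prediction: str, ground_truth: str) -> bool:
    """Compare two strings character-for-character, with no normalization."""
    return prediction == ground_truth


if __name__ == "__main__":
    raw = "FINAL ANSWER: 42"
    print(is_exact_match(raw, "42"))                # prefix breaks the match
    print(is_exact_match(clean_answer(raw), "42"))  # cleaned answer matches
    print(is_exact_match("apple, banana", "apple,banana"))  # extra space breaks it
```

Running the cleanup on the agent's final message before submission is one way to avoid losing points to prefixes; it does not fix spelling, capitalization, or list spacing.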
**✅ DO:**
- Give just the number: `"42"`
- Use exact spelling: `"John Smith"`
- Comma-separated lists with NO spaces: `"apple,banana,cherry"`
- Just "Yes" or "No" (capitalized)
- Follow the date format specified in the question

**❌ DON'T:**
- Include prefixes like "FINAL ANSWER:" or "The answer is:"
- Add explanations or context
- Use different capitalization or spelling
- Add spaces in comma-separated lists
- Include units unless specifically requested

### Agent Strategy

1. **File Priority**: If a file is available, the agent reads it first - answers are often in the file
2. **Tool Selection**: The agent automatically chooses the best tool for each task
3. **Iteration Limit**: The agent has up to 15 iterations to solve each question
4. **Error Handling**: The agent gracefully handles errors and tries alternative approaches

### Best Practices

1. **Test First**: Always test with a single question before running the full benchmark
2. **Review Answers**: Check the validation status for each answer
3. **Verify Format**: Ensure answers don't contain prefixes or explanations
4. **Public Space**: Keep your Space public so the code link works for verification
5. **API Key**: Ensure your OpenAI API key has sufficient credits

## ⚙️ Configuration

### Modifying the Agent

The agent can be customized in `agent_enhanced.py`:

- **Model**: Change `model_name` in `GAIAAgent.__init__()` (default: "gpt-4o")
- **Temperature**: Adjust `temperature` (default: 0 for deterministic)
- **Max Iterations**: Change `max_iterations` (default: 15)
- **System Prompt**: Modify `SYSTEM_PROMPT` for different instructions
- **Tools**: Add or remove tools from the `TOOLS` list

### Environment Variables

- `OPENAI_API_KEY`: Required - Your OpenAI API key

## 🐛 Troubleshooting

### Common Issues

**"Please provide your OpenAI API key"**
- Ensure `OPENAI_API_KEY` is set in Space Secrets (for HF Spaces) or environment variables (for local)

**"Failed to fetch questions from API"**
- Check your internet connection
- Verify the API URL is accessible: `https://agents-course-unit4-scoring.hf.space`
- The API may be temporarily unavailable - try again later

**"Agent error: ..."**
- Check that your OpenAI API key is valid and has credits
- Verify the model name is correct (e.g., "gpt-4o")
- Review the error message for specific issues

**"Submission error: ..."**
- Ensure your Space URL is correct and public
- Verify the URL ends with `/tree/main` (auto-added if missing)
- Check that answers JSON is properly formatted
- Ensure your HuggingFace username is correct

**Low Scores (< 30%)**
- Review answer formatting - exact matching is critical
- Check that answers don't contain prefixes or explanations
- Verify file reading is working (some questions require file analysis)
- Consider increasing `max_iterations` for complex questions
- Test with single questions to identify patterns

### Getting Help

- Check the [Course Materials](https://huggingface.co/learn/agents-course/en/unit4/hands-on)
- Review the [API Documentation](https://agents-course-unit4-scoring.hf.space/docs)
- Check the [Student Leaderboard](https://huggingface.co/spaces/agents-course/Students_leaderboard) for examples
- Review the [GAIA Benchmark Paper](https://huggingface.co/papers/2311.12983)

## 📁 Project Structure

```
certification/
├── app.py              # Gradio interface and main entry point
├── agent_enhanced.py   # LangGraph agent implementation
├── requirements.txt    # Python dependencies
├── README.md           # This file
└── .gitignore          # Git ignore rules
```

## 🔗 Important Links

- [GAIA Benchmark Leaderboard](https://huggingface.co/spaces/gaia-benchmark/leaderboard)
- [Student Leaderboard](https://huggingface.co/spaces/agents-course/Students_leaderboard)
- [Course Unit 4 - Hands-On](https://huggingface.co/learn/agents-course/en/unit4/hands-on)
- [API Documentation](https://agents-course-unit4-scoring.hf.space/docs)
- [GAIA Paper](https://huggingface.co/papers/2311.12983)
- [HuggingFace Agents Course](https://huggingface.co/learn/agents-course)

## 📊 Scoring

- **Target**: 30%+ (6+ correct out of 20 questions)
- **Evaluation**: Exact string matching
- **Questions**: 20 Level 1 questions from GAIA validation set
- **Submission**: Via the API endpoint `/submit`

## 🏆 Certification

Once you achieve 30% or higher:

1. Your score will appear on the Student Leaderboard
2. You'll earn the Certificate of Completion
3. Share your achievement!

## 📝 License

MIT License

## 🙏 Acknowledgments

- Built for the [HuggingFace Agents Course](https://huggingface.co/learn/agents-course)
- Uses [LangGraph](https://langchain-ai.github.io/langgraph/) for agent orchestration
- Based on the [GAIA Benchmark](https://huggingface.co/papers/2311.12983)