| # 🤖 GAIA Benchmark Agent (LangGraph) | |
| This is a LangGraph-powered agent for the [HuggingFace Agents Course](https://huggingface.co/learn/agents-course) Final Assignment. The agent is designed to solve GAIA benchmark questions and achieve a 30%+ score on Level 1 questions to earn the course certificate. | |
| ## 🎯 Goal | |
| **Score 30% or higher** (6+ correct out of 20 questions) on the GAIA Level 1 benchmark to earn your Certificate of Completion. | |
| ## Architecture | |
| ``` | |
| ┌──────────────────────────────────────────────────────────┐ | |
| │                    LangGraph Workflow                    │ | |
| ├──────────────────────────────────────────────────────────┤ | |
| │                                                          │ | |
| │  ┌─────────┐     ┌─────────┐     ┌──────────────┐        │ | |
| │  │  START  │────▶│  Agent  │────▶│    Should    │        │ | |
| │  └─────────┘     │  Node   │     │  Continue?   │        │ | |
| │                  └─────────┘     └──────────────┘        │ | |
| │                       ▲              │       │           │ | |
| │                       │          Yes │       │ No        │ | |
| │                       │              │       ▼           │ | |
| │                  ┌─────────┐         │ ┌────────────┐    │ | |
| │                  │  Tool   │◀────────┘ │  Extract   │    │ | |
| │                  │  Node   │           │   Answer   │    │ | |
| │                  └─────────┘           └────────────┘    │ | |
| │                                              │           │ | |
| │                                              ▼           │ | |
| │                                         ┌─────────┐      │ | |
| │                                         │   END   │      │ | |
| │                                         └─────────┘      │ | |
| └──────────────────────────────────────────────────────────┘ | |
| ``` | |
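The loop in the diagram can be sketched in plain Python. This is not the LangGraph implementation from `agent_enhanced.py`: `run_workflow`, `agent_step`, and `extract_answer` are hypothetical names, the tuple protocol is invented for illustration, and stripping a `FINAL ANSWER:` marker is an assumption about what the Extract Answer step does.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical protocol: the agent returns ("tool", name, arg) or ("final", text).
Step = Tuple[str, ...]


def extract_answer(text: str) -> str:
    """Keep only the text after a 'FINAL ANSWER:' marker, if one is present."""
    _, found, tail = text.rpartition("FINAL ANSWER:")
    return (tail if found else text).strip()


def run_workflow(
    question: str,
    agent_step: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 15,
) -> str:
    """Loop Agent -> Should Continue? -> Tool until the agent stops or the limit is hit."""
    history: List[str] = [question]
    for _ in range(max_iterations):
        step = agent_step(history)            # Agent Node
        if step[0] != "tool":                 # Should Continue? -> No
            return extract_answer(step[1])    # Extract Answer -> END
        _, name, arg = step                   # Should Continue? -> Yes
        history.append(tools[name](arg))      # Tool Node, then back to the Agent Node
    return ""  # iteration limit reached without a final answer
```

Passing the agent and tools in as callables keeps the control flow testable without an API key; a scripted `agent_step` that first requests a tool and then returns a final answer exercises every edge in the diagram.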
| ## Available Tools | |
| | Tool | Description | Use Case | | |
| |------|-------------|----------| | |
| | `web_search` | DuckDuckGo web search | Current information, recent events, facts | | |
| | `wikipedia_search` | Wikipedia API | Historical facts, biographies, definitions | | |
| | `python_executor` | Python REPL | Calculations, data processing, analysis | | |
| | `read_file` | File reader | PDFs, text files, Excel spreadsheets | | |
| | `calculator` | Math evaluator | Quick mathematical calculations | | |
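The README does not show how `calculator` is implemented. One common way to build a math evaluator that avoids `eval()` is to walk the parsed expression tree and allow only arithmetic nodes. The sketch below assumes that approach; `safe_calculate` is a hypothetical name, not a function from this project.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str):
    """Evaluate an arithmetic expression without calling eval()."""

    def visit(node):
        # Plain numbers only (bool is excluded even though it subclasses int).
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](visit(node.operand))
        raise ValueError("unsupported expression")

    return visit(ast.parse(expression, mode="eval").body)
```

Function calls, names, and attribute access all fall through to the `ValueError`, so input such as `__import__('os')` is refused rather than executed.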
| ## Setup | |
| ### Option 1: HuggingFace Spaces (Recommended for Certification) | |
| 1. **Fork/Duplicate this Space** to your HuggingFace account | |
| - Go to the Space and click "Duplicate this Space" | |
| - Choose a name and make it **Public** (required for certification) | |
| 2. **Add API Key** | |
| - Go to Space Settings > Secrets | |
| - Add a new secret: `OPENAI_API_KEY` with your OpenAI API key value | |
| - Click "Save secrets" | |
| 3. **Deploy** | |
| - The Space will automatically build and deploy | |
| - Wait for the build to complete (usually 2-5 minutes) | |
| 4. **Test and Submit** | |
| - Open the Space and test with a single question | |
| - Run the full benchmark | |
| - Submit to the leaderboard | |
| ### Option 2: Local Development | |
| ```bash | |
| # Clone the repository | |
| git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE | |
| cd YOUR_SPACE | |
| # Create virtual environment (recommended) | |
| python -m venv venv | |
| source venv/bin/activate # On Windows: venv\Scripts\activate | |
| # Install dependencies | |
| pip install -r requirements.txt | |
| # Set environment variable | |
| export OPENAI_API_KEY="sk-..." # On Windows: set OPENAI_API_KEY=sk-... | |
| # Run the app | |
| python app.py | |
| ``` | |
| The app will be available at `http://localhost:7860` | |
| ## Usage | |
| ### 1. Test Single Question | |
| - Click "Fetch & Solve Random Question" to test the agent on one question | |
| - Review the answer and validation status | |
| - This helps verify the agent is working correctly before running the full benchmark | |
| ### 2. Run Full Benchmark | |
| - Click "Run Agent on All Questions" | |
| - The process takes approximately 10-15 minutes | |
| - Progress is shown in real-time | |
| - Results are displayed in a table | |
| - Answers are automatically formatted for submission | |
| ### 3. Submit to Leaderboard | |
| - After running the benchmark, go to the "Submit to Leaderboard" tab | |
| - Enter your HuggingFace username | |
| - Enter your Space URL (the Space must be public; the URL should end with `/tree/main`, which the app appends if it is missing) | |
| - Answers JSON is auto-filled | |
| - Click "Submit to Leaderboard" | |
| - View your score and ranking | |
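The submission steps above boil down to assembling one JSON body. The sketch below shows a possible shape: the field names `username`, `agent_code`, `task_id`, and `submitted_answer` are assumptions based on the course's starter template and should be confirmed against the API documentation, and `build_submission` is a hypothetical helper, not part of `app.py`.

```python
from typing import Dict


def build_submission(username: str, space_url: str, answers: Dict[str, str]) -> dict:
    """Assemble a submission body (field names are assumptions, see the API docs)."""
    code_url = space_url.rstrip("/")
    if not code_url.endswith("/tree/main"):
        code_url += "/tree/main"  # same auto-append behaviour the app describes
    return {
        "username": username.strip(),
        "agent_code": code_url,
        "answers": [
            {"task_id": task_id, "submitted_answer": answer}
            for task_id, answer in answers.items()
        ],
    }
```

Serialising the result with `json.dumps` gives the kind of "Answers JSON" text the submission tab pre-fills.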
| ## Tips for Better Scores | |
| ### Answer Formatting (Critical!) | |
| The GAIA benchmark uses **exact string matching**. Your answers must match the ground truth character-for-character. | |
| **DO:** | |
| - Give just the number: `"42"` | |
| - Use exact spelling: `"John Smith"` | |
| - Comma-separated lists with NO spaces: `"apple,banana,cherry"` | |
| - Just "Yes" or "No" (capitalized) | |
| - Follow the date format specified in the question | |
| **DON'T:** | |
| - Include prefixes like "FINAL ANSWER:" or "The answer is:" | |
| - Add explanations or context | |
| - Use different capitalization or spelling | |
| - Add spaces in comma-separated lists | |
| - Include units unless specifically requested | |
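A small post-processing step can enforce several of these rules before submission. The following is a minimal sketch under stated assumptions: `normalize_answer` is a hypothetical helper, the prefix list covers only the examples named above, and removing spaces after commas will also alter free-text answers that happen to contain commas.

```python
import re

# Prefixes named in the DON'T list; a bare "answer" only counts when followed by a colon.
_PREFIX = re.compile(
    r"^\s*(?:final answer\s*:|the answer is\s*:?|answer\s*:)\s*",
    re.IGNORECASE,
)


def normalize_answer(raw: str) -> str:
    """Strip common prefixes and surrounding quotes, and tighten comma-separated lists."""
    text = _PREFIX.sub("", raw.strip())
    text = text.strip().strip('"').strip()
    if "," in text:
        text = ",".join(part.strip() for part in text.split(","))
    return text
```

Capitalization and spelling are left untouched on purpose: the helper cannot know the ground truth, so it only removes formatting that exact matching would always reject.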
| ### Agent Strategy | |
| 1. **File Priority**: If a file is available, the agent reads it first - answers are often in the file | |
| 2. **Tool Selection**: The agent automatically chooses the best tool for each task | |
| 3. **Iteration Limit**: The agent has up to 15 iterations to solve each question | |
| 4. **Error Handling**: The agent gracefully handles errors and tries alternative approaches | |
| ### Best Practices | |
| 1. **Test First**: Always test with a single question before running the full benchmark | |
| 2. **Review Answers**: Check the validation status for each answer | |
| 3. **Verify Format**: Ensure answers don't contain prefixes or explanations | |
| 4. **Public Space**: Keep your Space public so the code link works for verification | |
| 5. **API Key**: Ensure your OpenAI API key has sufficient credits | |
| ## ⚙️ Configuration | |
| ### Modifying the Agent | |
| The agent can be customized in `agent_enhanced.py`: | |
| - **Model**: Change `model_name` in `GAIAAgent.__init__()` (default: "gpt-4o") | |
| - **Temperature**: Adjust `temperature` (default: 0 for deterministic) | |
| - **Max Iterations**: Change `max_iterations` (default: 15) | |
| - **System Prompt**: Modify `SYSTEM_PROMPT` for different instructions | |
| - **Tools**: Add or remove tools from the `TOOLS` list | |
| ### Environment Variables | |
| - `OPENAI_API_KEY`: Required - Your OpenAI API key | |
| ## Troubleshooting | |
| ### Common Issues | |
| **"Please provide your OpenAI API key"** | |
| - Ensure `OPENAI_API_KEY` is set in Space Secrets (for HF Spaces) or environment variables (for local) | |
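A startup check makes this failure easier to diagnose. The sketch below is an illustration only; `get_api_key` is a hypothetical helper and the error text is not the message the app actually prints.

```python
import os
from typing import Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI key, or raise with a hint on where to set it."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Add it under Space Settings > Secrets, "
            "or export it in your shell for local runs."
        )
    return key
```

Accepting an optional mapping lets the check be tested without touching the real environment.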
| **"Failed to fetch questions from API"** | |
| - Check your internet connection | |
| - Verify the API URL is accessible: `https://agents-course-unit4-scoring.hf.space` | |
| - The API may be temporarily unavailable - try again later | |
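When the API is only briefly unavailable, retrying with a growing delay is often enough. This is a generic sketch, not code from `app.py`: `with_retries` is a hypothetical helper, and the `/questions` path in the usage comment is an assumption to check against the API documentation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:  # urllib.error.URLError and socket timeouts derive from OSError
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Possible usage (endpoint path is an assumption):
# import urllib.request
# url = "https://agents-course-unit4-scoring.hf.space/questions"
# body = with_retries(lambda: urllib.request.urlopen(url, timeout=15).read())
```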
| **"Agent error: ..."** | |
| - Check that your OpenAI API key is valid and has credits | |
| - Verify the model name is correct (e.g., "gpt-4o") | |
| - Review the error message for specific issues | |
| **"Submission error: ..."** | |
| - Ensure your Space URL is correct and public | |
| - Verify the URL ends with `/tree/main` (auto-added if missing) | |
| - Check that answers JSON is properly formatted | |
| - Ensure your HuggingFace username is correct | |
| **Low Scores (< 30%)** | |
| - Review answer formatting - exact matching is critical | |
| - Check that answers don't contain prefixes or explanations | |
| - Verify file reading is working (some questions require file analysis) | |
| - Consider increasing `max_iterations` for complex questions | |
| - Test with single questions to identify patterns | |
| ### Getting Help | |
| - Check the [Course Materials](https://huggingface.co/learn/agents-course/en/unit4/hands-on) | |
| - Review the [API Documentation](https://agents-course-unit4-scoring.hf.space/docs) | |
| - Check the [Student Leaderboard](https://huggingface.co/spaces/agents-course/Students_leaderboard) for examples | |
| - Review the [GAIA Benchmark Paper](https://huggingface.co/papers/2311.12983) | |
| ## Project Structure | |
| ``` | |
| certification/ | |
| ├── app.py             # Gradio interface and main entry point | |
| ├── agent_enhanced.py  # LangGraph agent implementation | |
| ├── requirements.txt   # Python dependencies | |
| ├── README.md          # This file | |
| └── .gitignore         # Git ignore rules | |
| ``` | |
| ## Important Links | |
| - [GAIA Benchmark Leaderboard](https://huggingface.co/spaces/gaia-benchmark/leaderboard) | |
| - [Student Leaderboard](https://huggingface.co/spaces/agents-course/Students_leaderboard) | |
| - [Course Unit 4 - Hands-On](https://huggingface.co/learn/agents-course/en/unit4/hands-on) | |
| - [API Documentation](https://agents-course-unit4-scoring.hf.space/docs) | |
| - [GAIA Paper](https://huggingface.co/papers/2311.12983) | |
| - [HuggingFace Agents Course](https://huggingface.co/learn/agents-course) | |
| ## Scoring | |
| - **Target**: 30%+ (6+ correct out of 20 questions) | |
| - **Evaluation**: Exact string matching | |
| - **Questions**: 20 Level 1 questions from GAIA validation set | |
| - **Submission**: Via the API endpoint `/submit` | |
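The arithmetic behind the target is simple: 6 exact matches out of 20 questions is 30%. The sketch below only illustrates that calculation; the real scoring happens server-side, the ground truth is not available to participants, and `score` is a hypothetical function.

```python
from typing import Dict


def score(submitted: Dict[str, str], truth: Dict[str, str]) -> float:
    """Percentage of tasks whose submitted answer equals the ground truth exactly."""
    if not truth:
        return 0.0
    correct = sum(
        1 for task_id, expected in truth.items() if submitted.get(task_id) == expected
    )
    return 100.0 * correct / len(truth)
```

Because the comparison is a plain `==`, an answer of `yes` does not match a ground truth of `Yes`, which is why the formatting rules matter so much.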
| ## Certification | |
| Once you achieve 30% or higher: | |
| 1. Your score will appear on the Student Leaderboard | |
| 2. You'll earn the Certificate of Completion | |
| 3. Share your achievement! | |
| ## License | |
| MIT License | |
| ## Acknowledgments | |
| - Built for the [HuggingFace Agents Course](https://huggingface.co/learn/agents-course) | |
| - Uses [LangGraph](https://langchain-ai.github.io/langgraph/) for agent orchestration | |
| - Based on the [GAIA Benchmark](https://huggingface.co/papers/2311.12983) | |