# GAIA Benchmark Agent (LangGraph)

This is a LangGraph-powered agent for the HuggingFace Agents Course Final Assignment. The agent is designed to solve GAIA benchmark questions and achieve a 30%+ score on Level 1 questions to earn the course certificate.
## Goal

Score 30% or higher (6+ correct out of 20 questions) on the GAIA Level 1 benchmark to earn your Certificate of Completion.
## Architecture

```
                     LangGraph Workflow

┌───────┐     ┌────────────┐     ┌──────────────────┐
│ START │────▶│ Agent Node │────▶│ Should Continue? │
└───────┘     └────────────┘     └──────────────────┘
                    ▲               │           │
                    │ Yes           │           │ No
              ┌───────────┐         │           ▼
              │ Tool Node │◀────────┘   ┌────────────────┐
              └───────────┘             │ Extract Answer │
                                        └────────────────┘
                                                 │
                                                 ▼
                                             ┌─────┐
                                             │ END │
                                             └─────┘
```
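The loop above can be sketched in plain Python. This is a minimal, library-free sketch of the control flow only; the actual implementation in `agent_enhanced.py` wires these steps as LangGraph nodes and edges, and `run_llm`/`run_tool` here are hypothetical stand-ins for the model and tool calls:

```python
def solve(question, run_llm, run_tool, max_iterations=15):
    """Sketch of the agent loop: call the model, run requested tools,
    and stop once the model produces a final answer."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_iterations):
        reply = run_llm(messages)               # Agent Node
        messages.append(reply)
        if not reply.get("tool_call"):          # Should Continue? -> No
            return reply["content"].strip()     # Extract Answer
        result = run_tool(reply["tool_call"])   # Should Continue? -> Yes
        messages.append({"role": "tool", "content": result})  # Tool Node output
    return ""  # iteration limit reached without a final answer
```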
## Available Tools

| Tool | Description | Use Case |
|---|---|---|
| `web_search` | DuckDuckGo web search | Current information, recent events, facts |
| `wikipedia_search` | Wikipedia API | Historical facts, biographies, definitions |
| `python_executor` | Python REPL | Calculations, data processing, analysis |
| `read_file` | File reader | PDFs, text files, Excel spreadsheets |
| `calculator` | Math evaluator | Quick mathematical calculations |
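Internally, tool use comes down to a name-to-function mapping the model can call into. The sketch below is an illustrative assumption, not the exact code in `agent_enhanced.py`; in particular, the restricted-`eval` calculator is just one plausible implementation:

```python
import math

def calculator(expression: str) -> str:
    """Evaluate a math expression, exposing only math-module names
    (illustrative sketch; the real tool may differ)."""
    allowed = {name: getattr(math, name)
               for name in dir(math) if not name.startswith("_")}
    return str(eval(expression, {"__builtins__": {}}, allowed))

# Hypothetical registry: the agent selects a tool by name at each step.
TOOLS = {"calculator": calculator}

def dispatch(tool_name: str, argument: str) -> str:
    tool = TOOLS.get(tool_name)
    return tool(argument) if tool else f"Unknown tool: {tool_name}"
```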
## Setup

### Option 1: HuggingFace Spaces (Recommended for Certification)

1. **Fork/Duplicate this Space** to your HuggingFace account
   - Go to the Space and click "Duplicate this Space"
   - Choose a name and make it **Public** (required for certification)
2. **Add API Key**
   - Go to Space Settings > Secrets
   - Add a new secret: `OPENAI_API_KEY` with your OpenAI API key as the value
   - Click "Save secrets"
3. **Deploy**
   - The Space will build and deploy automatically
   - Wait for the build to complete (usually 2-5 minutes)
4. **Test and Submit**
   - Open the Space and test with a single question
   - Run the full benchmark
   - Submit to the leaderboard
### Option 2: Local Development

```bash
# Clone the repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
cd YOUR_SPACE

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set environment variable
export OPENAI_API_KEY="sk-..."  # On Windows: set OPENAI_API_KEY=sk-...

# Run the app
python app.py
```

The app will be available at http://localhost:7860.
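A missing key is the most common local-run failure, so it can help to fail fast with a clear message. A small helper sketch (`app.py` may already perform its own check):

```python
import os
import sys

def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Exit with a clear message if the required API key is missing."""
    key = os.environ.get(name, "").strip()
    if not key:
        sys.exit(f"{name} is not set; export it before running app.py")
    return key
```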
## Usage

### 1. Test a Single Question
- Click "Fetch & Solve Random Question" to test the agent on one question
- Review the answer and validation status
- This verifies the agent is working correctly before running the full benchmark

### 2. Run the Full Benchmark
- Click "Run Agent on All Questions"
- The process takes approximately 10-15 minutes
- Progress is shown in real time
- Results are displayed in a table
- Answers are automatically formatted for submission

### 3. Submit to the Leaderboard
- After running the benchmark, go to the "Submit to Leaderboard" tab
- Enter your HuggingFace username
- Enter your Space URL (must be public and end with `/tree/main`)
- The answers JSON is auto-filled
- Click "Submit to Leaderboard"
- View your score and ranking
## Tips for Better Scores

### Answer Formatting (Critical!)

The GAIA benchmark uses exact string matching. Your answers must match the ground truth character for character.

**DO:**
- Give just the number: `42`
- Use exact spelling: `John Smith`
- Use comma-separated lists with NO spaces: `apple,banana,cherry`
- Answer just "Yes" or "No" (capitalized)
- Follow the date format specified in the question

**DON'T:**
- Include prefixes like "FINAL ANSWER:" or "The answer is:"
- Add explanations or context
- Use different capitalization or spelling
- Add spaces in comma-separated lists
- Include units unless specifically requested
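Because matching is character for character, it can pay to normalize the model's raw output before submission. A hedged sketch of the rules above (the agent's actual formatting logic may differ, and the prefix list here is an assumption):

```python
import re

# Assumed prefixes to strip; extend as needed.
PREFIXES = ("final answer:", "the answer is:", "answer:")

def normalize_answer(raw: str) -> str:
    """Strip known prefixes, trim whitespace, and remove spaces
    after commas so lists match the expected format."""
    text = raw.strip()
    lowered = text.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            text = text[len(prefix):].strip()
            break
    text = re.sub(r",\s+", ",", text)  # "a, b" -> "a,b"
    return text
```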
### Agent Strategy
- **File Priority**: If a file is available, the agent reads it first; answers are often in the file
- **Tool Selection**: The agent automatically chooses the best tool for each task
- **Iteration Limit**: The agent has up to 15 iterations to solve each question
- **Error Handling**: The agent handles errors gracefully and tries alternative approaches
### Best Practices
- **Test First**: Always test with a single question before running the full benchmark
- **Review Answers**: Check the validation status for each answer
- **Verify Format**: Ensure answers don't contain prefixes or explanations
- **Public Space**: Keep your Space public so the code link works for verification
- **API Key**: Ensure your OpenAI API key has sufficient credits
## Configuration

### Modifying the Agent

The agent can be customized in `agent_enhanced.py`:

- **Model**: Change `model_name` in `GAIAAgent.__init__()` (default: `"gpt-4o"`)
- **Temperature**: Adjust `temperature` (default: 0 for deterministic output)
- **Max Iterations**: Change `max_iterations` (default: 15)
- **System Prompt**: Modify `SYSTEM_PROMPT` for different instructions
- **Tools**: Add or remove tools from the `TOOLS` list

### Environment Variables

- `OPENAI_API_KEY`: Required - your OpenAI API key
## Troubleshooting

### Common Issues

**"Please provide your OpenAI API key"**
- Ensure `OPENAI_API_KEY` is set in Space Secrets (for HF Spaces) or as an environment variable (for local runs)

**"Failed to fetch questions from API"**
- Check your internet connection
- Verify the API URL is accessible: https://agents-course-unit4-scoring.hf.space
- The API may be temporarily unavailable; try again later

**"Agent error: ..."**
- Check that your OpenAI API key is valid and has credits
- Verify the model name is correct (e.g., "gpt-4o")
- Review the error message for specific issues

**"Submission error: ..."**
- Ensure your Space URL is correct and public
- Verify the URL ends with `/tree/main` (auto-added if missing)
- Check that the answers JSON is properly formatted
- Ensure your HuggingFace username is correct

### Low Scores (< 30%)
- Review answer formatting; exact matching is critical
- Check that answers don't contain prefixes or explanations
- Verify file reading is working (some questions require file analysis)
- Consider increasing `max_iterations` for complex questions
- Test with single questions to identify failure patterns
### Getting Help
- Check the Course Materials
- Review the API Documentation
- Check the Student Leaderboard for examples
- Review the GAIA Benchmark Paper
## Project Structure

```
certification/
├── app.py              # Gradio interface and main entry point
├── agent_enhanced.py   # LangGraph agent implementation
├── requirements.txt    # Python dependencies
├── README.md           # This file
└── .gitignore          # Git ignore rules
```
## Important Links
- GAIA Benchmark Leaderboard
- Student Leaderboard
- Course Unit 4 - Hands-On
- API Documentation
- GAIA Paper
- HuggingFace Agents Course
## Scoring
- **Target**: 30%+ (6+ correct out of 20 questions)
- **Evaluation**: Exact string matching
- **Questions**: 20 Level 1 questions from the GAIA validation set
- **Submission**: Via the `/submit` API endpoint
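Exact-match scoring is simple enough to reproduce locally as a sanity check before submitting. A sketch (the official scorer runs behind the `/submit` endpoint and may apply its own normalization):

```python
def score(answers: dict, ground_truth: dict) -> float:
    """Fraction of answers that exactly match the ground truth."""
    correct = sum(1 for qid, truth in ground_truth.items()
                  if answers.get(qid) == truth)
    return correct / len(ground_truth)
```

At 20 questions, the 30% target means 6 or more exact matches.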
## Certification

Once you achieve 30% or higher:
- Your score will appear on the Student Leaderboard
- You'll earn the Certificate of Completion
- Share your achievement!
## License

MIT License
## Acknowledgments
- Built for the HuggingFace Agents Course
- Uses LangGraph for agent orchestration
- Based on the GAIA Benchmark