# GAIA Benchmark Agent (LangGraph)
This is a LangGraph-powered agent for the [HuggingFace Agents Course](https://huggingface.co/learn/agents-course) Final Assignment. The agent is designed to solve GAIA benchmark questions and achieve a 30%+ score on Level 1 questions to earn the course certificate.
## Goal
**Score 30% or higher** (6+ correct out of 20 questions) on the GAIA Level 1 benchmark to earn your Certificate of Completion.
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│                   LangGraph Workflow                    │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌─────────┐     ┌─────────┐     ┌──────────────┐       │
│  │  START  │────▶│  Agent  │────▶│    Should    │       │
│  └─────────┘     │  Node   │     │  Continue?   │       │
│                  └─────────┘     └──┬────────┬──┘       │
│                       ▲             │        │          │
│                       │         Yes │        │ No       │
│                  ┌────┴────┐        │        │          │
│                  │  Tool   │◀───────┘        │          │
│                  │  Node   │                 ▼          │
│                  └─────────┘          ┌──────────────┐  │
│                                       │   Extract    │  │
│                                       │    Answer    │  │
│                                       └──────┬───────┘  │
│                                              ▼          │
│                                         ┌─────────┐     │
│                                         │   END   │     │
│                                         └─────────┘     │
└─────────────────────────────────────────────────────────┘
```
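The loop in the diagram can be sketched in plain Python to make the control flow concrete. This is a minimal illustration without LangGraph, not the code in `agent_enhanced.py`: `run_workflow`, `demo_agent`, and `demo_tool` are hypothetical names, and the stand-in agent simply requests one tool call before answering.

```python
from typing import Callable, List, Tuple

# Hypothetical action format: ("tool", tool_input) or ("final", answer_text).
Action = Tuple[str, str]


def run_workflow(
    question: str,
    agent_step: Callable[[List[str]], Action],
    run_tool: Callable[[str], str],
    max_iterations: int = 15,
) -> str:
    """Agent -> (Tool -> Agent)* -> Extract Answer, capped at max_iterations."""
    messages: List[str] = [question]
    for _ in range(max_iterations):
        kind, payload = agent_step(messages)   # Agent node
        messages.append(payload)
        if kind == "final":                    # Should continue? -> No
            return payload.strip()             # Extract answer
        messages.append(run_tool(payload))     # Tool node, then back to the agent
    return ""  # iteration limit reached without a final answer


def demo_agent(messages: List[str]) -> Action:
    # Stand-in for the LLM: ask for one calculation, then answer with its result.
    if len(messages) == 1:
        return ("tool", "6 * 7")
    return ("final", f" {messages[-1]} ")


def demo_tool(tool_input: str) -> str:
    # Stand-in for a tool: multiply two integers written as "a * b".
    left, right = tool_input.split("*")
    return str(int(left) * int(right))


answer = run_workflow("What is 6 * 7?", demo_agent, demo_tool)
print(answer)
```

The iteration cap is what stops an agent that keeps requesting tools; in this sketch such a run returns an empty string instead of a guess.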
## Available Tools
| Tool | Description | Use Case |
|------|-------------|----------|
| `web_search` | DuckDuckGo web search | Current information, recent events, facts |
| `wikipedia_search` | Wikipedia API | Historical facts, biographies, definitions |
| `python_executor` | Python REPL | Calculations, data processing, analysis |
| `read_file` | File reader | PDFs, text files, Excel spreadsheets |
| `calculator` | Math evaluator | Quick mathematical calculations |
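To illustrate what a "math evaluator" tool might look like, here is a minimal sketch that evaluates arithmetic by walking the parsed syntax tree instead of calling `eval()`. It is an assumption for illustration only: the function name `safe_calculate` and the set of supported operators are hypothetical and may differ from the actual `calculator` tool.

```python
import ast
import operator

# Operators this sketch chooses to allow; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_calculate(expression: str):
    """Evaluate a plain arithmetic expression; raise ValueError otherwise."""

    def _eval(node):
        if isinstance(node, ast.Constant):
            # Accept only int/float literals (bool is a subclass of int, so exclude it).
            if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
                return node.value
            raise ValueError("unsupported literal")
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


print(safe_calculate("2 + 3 * 4"))
```

Function calls, attribute access, and names are rejected because they never match one of the allowed node types.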
## Setup
### Option 1: HuggingFace Spaces (Recommended for Certification)
1. **Fork/Duplicate this Space** to your HuggingFace account
   - Go to the Space and click "Duplicate this Space"
   - Choose a name and make it **Public** (required for certification)
2. **Add API Key**
   - Go to Space Settings > Secrets
   - Add a new secret: `OPENAI_API_KEY` with your OpenAI API key value
   - Click "Save secrets"
3. **Deploy**
   - The Space will automatically build and deploy
   - Wait for the build to complete (usually 2-5 minutes)
4. **Test and Submit**
   - Open the Space and test with a single question
   - Run the full benchmark
   - Submit to the leaderboard
### Option 2: Local Development
```bash
# Clone the repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
cd YOUR_SPACE
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Set environment variable
export OPENAI_API_KEY="sk-..." # On Windows: set OPENAI_API_KEY=sk-...
# Run the app
python app.py
```
The app will be available at `http://localhost:7860`
## Usage
### 1. Test Single Question
- Click "Fetch & Solve Random Question" to test the agent on one question
- Review the answer and validation status
- This helps verify the agent is working correctly before running the full benchmark
### 2. Run Full Benchmark
- Click "Run Agent on All Questions"
- The process takes approximately 10-15 minutes
- Progress is shown in real-time
- Results are displayed in a table
- Answers are automatically formatted for submission
### 3. Submit to Leaderboard
- After running the benchmark, go to the "Submit to Leaderboard" tab
- Enter your HuggingFace username
- Enter your Space URL (the Space must be public; `/tree/main` is appended automatically if missing)
- Answers JSON is auto-filled
- Click "Submit to Leaderboard"
- View your score and ranking
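As a rough sketch of what the submission step assembles, the helper below builds a request body from a username, a Space URL, and a mapping of answers. The field names `username`, `agent_code`, `answers`, `task_id`, and `submitted_answer` are assumptions about the scoring API and should be checked against the API documentation; `build_submission` itself is a hypothetical helper, not a function from `app.py`.

```python
import json
from typing import Dict


def build_submission(username: str, space_url: str, answers: Dict[str, str]) -> dict:
    """Assemble a submission body (field names are assumed, not confirmed)."""
    code_url = space_url.strip().rstrip("/")
    # Append /tree/main when it is missing so the link points at the code.
    if not code_url.endswith("/tree/main"):
        code_url += "/tree/main"
    return {
        "username": username.strip(),
        "agent_code": code_url,
        "answers": [
            {"task_id": task_id, "submitted_answer": answer}
            for task_id, answer in answers.items()
        ],
    }


payload = build_submission(
    "your-username",
    "https://huggingface.co/spaces/your-username/your-space/",
    {"task-1": "42"},
)
print(json.dumps(payload, indent=2))
```

Normalizing the URL in one place keeps the result the same whether or not the suffix was typed by hand.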
## Tips for Better Scores
### Answer Formatting (Critical!)
The GAIA benchmark uses **exact string matching**. Your answers must match the ground truth character-for-character.
**DO:**
- Give just the number: `"42"`
- Use exact spelling: `"John Smith"`
- Comma-separated lists with NO spaces: `"apple,banana,cherry"`
- Just "Yes" or "No" (capitalized)
- Follow the date format specified in the question
**DON'T:**
- Include prefixes like "FINAL ANSWER:" or "The answer is:"
- Add explanations or context
- Use different capitalization or spelling
- Add spaces in comma-separated lists
- Include units unless specifically requested
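The rules above can be applied mechanically before submission. The following is a minimal sketch, assuming the only prefixes to strip are "FINAL ANSWER" and "The answer is"; `clean_answer` is a hypothetical helper, and a real agent may need more cases (units, quotes, trailing punctuation).

```python
import re

# Leading phrases this sketch removes; extend the alternation as needed.
_PREFIX = re.compile(r"^\s*(final answer|the answer is)\s*:?\s*", re.IGNORECASE)


def clean_answer(raw: str) -> str:
    """Strip known prefixes and spaces around commas from a model reply."""
    text = _PREFIX.sub("", raw).strip()
    # Comma-separated lists are submitted with no spaces around the commas.
    return re.sub(r"\s*,\s*", ",", text)


print(clean_answer("FINAL ANSWER: apple, banana, cherry"))
```

Note that removing spaces after commas also changes answers such as a "City, Country" pair, so whether to apply that step depends on the format the question asks for.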
### Agent Strategy
1. **File Priority**: If a file is available, the agent reads it first - answers are often in the file
2. **Tool Selection**: The agent automatically chooses the best tool for each task
3. **Iteration Limit**: The agent has up to 15 iterations to solve each question
4. **Error Handling**: The agent gracefully handles errors and tries alternative approaches
### Best Practices
1. **Test First**: Always test with a single question before running the full benchmark
2. **Review Answers**: Check the validation status for each answer
3. **Verify Format**: Ensure answers don't contain prefixes or explanations
4. **Public Space**: Keep your Space public so the code link works for verification
5. **API Key**: Ensure your OpenAI API key has sufficient credits
## Configuration
### Modifying the Agent
The agent can be customized in `agent_enhanced.py`:
- **Model**: Change `model_name` in `GAIAAgent.__init__()` (default: "gpt-4o")
- **Temperature**: Adjust `temperature` (default: 0 for deterministic)
- **Max Iterations**: Change `max_iterations` (default: 15)
- **System Prompt**: Modify `SYSTEM_PROMPT` for different instructions
- **Tools**: Add or remove tools from the `TOOLS` list
### Environment Variables
- `OPENAI_API_KEY`: Required - Your OpenAI API key
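A startup check along the following lines fails early with a clear message when the key is missing. This is a sketch only: `get_openai_api_key` is a hypothetical helper, and the `env` parameter exists just so the function can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def get_openai_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key or raise if it is missing or blank."""
    source = os.environ if env is None else env
    key = source.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Add it under Space Settings > Secrets "
            "or export it in your shell before running the app."
        )
    return key


print(get_openai_api_key({"OPENAI_API_KEY": "sk-example"}))
```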
## Troubleshooting
### Common Issues
**"Please provide your OpenAI API key"**
- Ensure `OPENAI_API_KEY` is set in Space Secrets (for HF Spaces) or environment variables (for local)
**"Failed to fetch questions from API"**
- Check your internet connection
- Verify the API URL is accessible: `https://agents-course-unit4-scoring.hf.space`
- The API may be temporarily unavailable - try again later
**"Agent error: ..."**
- Check that your OpenAI API key is valid and has credits
- Verify the model name is correct (e.g., "gpt-4o")
- Review the error message for specific issues
**"Submission error: ..."**
- Ensure your Space URL is correct and public
- Verify the URL ends with `/tree/main` (auto-added if missing)
- Check that answers JSON is properly formatted
- Ensure your HuggingFace username is correct
**Low Scores (< 30%)**
- Review answer formatting - exact matching is critical
- Check that answers don't contain prefixes or explanations
- Verify file reading is working (some questions require file analysis)
- Consider increasing `max_iterations` for complex questions
- Test with single questions to identify patterns
### Getting Help
- Check the [Course Materials](https://huggingface.co/learn/agents-course/en/unit4/hands-on)
- Review the [API Documentation](https://agents-course-unit4-scoring.hf.space/docs)
- Check the [Student Leaderboard](https://huggingface.co/spaces/agents-course/Students_leaderboard) for examples
- Review the [GAIA Benchmark Paper](https://huggingface.co/papers/2311.12983)
## Project Structure
```
certification/
├── app.py              # Gradio interface and main entry point
├── agent_enhanced.py   # LangGraph agent implementation
├── requirements.txt    # Python dependencies
├── README.md           # This file
└── .gitignore          # Git ignore rules
```
## Important Links
- [GAIA Benchmark Leaderboard](https://huggingface.co/spaces/gaia-benchmark/leaderboard)
- [Student Leaderboard](https://huggingface.co/spaces/agents-course/Students_leaderboard)
- [Course Unit 4 - Hands-On](https://huggingface.co/learn/agents-course/en/unit4/hands-on)
- [API Documentation](https://agents-course-unit4-scoring.hf.space/docs)
- [GAIA Paper](https://huggingface.co/papers/2311.12983)
- [HuggingFace Agents Course](https://huggingface.co/learn/agents-course)
## Scoring
- **Target**: 30%+ (6+ correct out of 20 questions)
- **Evaluation**: Exact string matching
- **Questions**: 20 Level 1 questions from GAIA validation set
- **Submission**: Via the API endpoint `/submit`
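The arithmetic behind the target is simple: 6 correct answers out of 20 is exactly 30%. The sketch below mimics exact string matching locally; `count_correct` and `passes` are hypothetical helpers for self-checking, and the real scoring happens on the server.

```python
from typing import Dict


def count_correct(submitted: Dict[str, str], truth: Dict[str, str]) -> int:
    """Count answers that match the ground truth character for character."""
    return sum(1 for task_id, expected in truth.items() if submitted.get(task_id) == expected)


def score_percent(correct: int, total: int = 20) -> float:
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def passes(correct: int, total: int = 20, threshold: float = 30.0) -> bool:
    return score_percent(correct, total) >= threshold


# "yes" does not match "Yes": exact matching is case-sensitive.
print(count_correct({"a": "42", "b": "yes"}, {"a": "42", "b": "Yes"}))
print(passes(6), passes(5))
```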
## Certification
Once you achieve 30% or higher:
1. Your score will appear on the Student Leaderboard
2. You'll earn the Certificate of Completion
3. Share your achievement!
## License
MIT License
## Acknowledgments
- Built for the [HuggingFace Agents Course](https://huggingface.co/learn/agents-course)
- Uses [LangGraph](https://langchain-ai.github.io/langgraph/) for agent orchestration
- Based on the [GAIA Benchmark](https://huggingface.co/papers/2311.12983)