# 🤖 GAIA Benchmark Agent (LangGraph)

This is a LangGraph-powered agent for the [HuggingFace Agents Course](https://huggingface.co/learn/agents-course) Final Assignment. The agent is designed to solve GAIA benchmark questions and achieve a 30%+ score on Level 1 questions to earn the course certificate.

## 🎯 Goal

**Score 30% or higher** (6+ correct out of 20 questions) on the GAIA Level 1 benchmark to earn your Certificate of Completion.

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                    LangGraph Workflow                   │
├─────────────────────────────────────────────────────────┤
│                                                         │
│   ┌─────────┐     ┌─────────┐     ┌──────────────┐      │
│   │  START  │────▶│  Agent  │────▶│   Should     │      │
│   └─────────┘     │  Node   │     │  Continue?   │      │
│                   └─────────┘     └──────────────┘      │
│                        ▲               │    │           │
│                        │          Yes  │    │ No        │
│                        │               ▼    │           │
│                   ┌─────────┐     ┌─────────▼────┐      │
│                   │  Tool   │◀────│   Extract    │      │
│                   │  Node   │     │   Answer     │      │
│                   └─────────┘     └──────────────┘      │
│                                          │              │
│                                          ▼              │
│                                     ┌─────────┐         │
│                                     │   END   │         │
│                                     └─────────┘         │
└─────────────────────────────────────────────────────────┘
```
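
As a rough illustration of the control flow in the diagram, the loop can be sketched in plain Python without LangGraph. Everything here is hypothetical: `run_loop`, `agent_step`, the `(role, content)` message tuples, the `"tool_call"` role, and the `name:argument` convention are assumptions made for the sketch and are not taken from `agent_enhanced.py`.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical message type: (role, content). Not the project's state schema.
Message = Tuple[str, str]


def extract_answer(messages: List[Message]) -> str:
    """Return the most recent assistant message, stripped of whitespace."""
    for role, content in reversed(messages):
        if role == "assistant":
            return content.strip()
    return ""


def run_loop(
    question: str,
    agent_step: Callable[[List[Message]], Message],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 15,
) -> str:
    """Alternate between agent and tools until the agent stops requesting tools."""
    messages: List[Message] = [("user", question)]
    for _ in range(max_iterations):
        role, content = agent_step(messages)      # Agent Node
        messages.append((role, content))
        if role != "tool_call":                   # Should Continue? -> No
            break
        name, _, arg = content.partition(":")     # Should Continue? -> Yes
        messages.append(("tool", tools[name](arg)))  # Tool Node, back to Agent
    return extract_answer(messages)               # Extract Answer -> END
```

With a real LLM, `agent_step` would wrap the model call; the iteration cap keeps a model that keeps requesting tools from looping forever.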

## Available Tools

| Tool | Description | Use Case |
|------|-------------|----------|
| 🔍 `web_search` | DuckDuckGo web search | Current information, recent events, facts |
| 📚 `wikipedia_search` | Wikipedia API | Historical facts, biographies, definitions |
| 🐍 `python_executor` | Python REPL | Calculations, data processing, analysis |
| 📄 `read_file` | File reader | PDFs, text files, Excel spreadsheets |
| 🔢 `calculator` | Math evaluator | Quick mathematical calculations |
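
To give a sense of what a math-evaluator tool such as `calculator` might look like, here is a minimal sketch that evaluates arithmetic by walking the parsed syntax tree instead of calling `eval()`. The function name `calculate` and the set of supported operators are assumptions for illustration; the project's actual tool may work differently.

```python
import ast
import operator

# Operators this sketch allows; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculate(expression: str) -> float:
    """Evaluate a basic arithmetic expression without using eval()."""

    def _eval(node: ast.AST) -> float:
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval").body)
```

Function calls, names, and attribute access all fall through to the `ValueError`, which is the point of walking the tree rather than evaluating the string directly.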

## 🚀 Setup

### Option 1: HuggingFace Spaces (Recommended for Certification)

1. **Fork/Duplicate this Space** to your HuggingFace account
   - Go to the Space and click "Duplicate this Space"
   - Choose a name and make it **Public** (required for certification)

2. **Add API Key**
   - Go to Space Settings > Secrets
   - Add a new secret: `OPENAI_API_KEY` with your OpenAI API key value
   - Click "Save secrets"

3. **Deploy**
   - The Space will automatically build and deploy
   - Wait for the build to complete (usually 2-5 minutes)

4. **Test and Submit**
   - Open the Space and test with a single question
   - Run the full benchmark
   - Submit to the leaderboard

### Option 2: Local Development

```bash
# Clone the repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
cd YOUR_SPACE

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set environment variable
export OPENAI_API_KEY="sk-..."  # On Windows: set OPENAI_API_KEY=sk-...

# Run the app
python app.py
```

The app will be available at `http://localhost:7860`

## 📖 Usage

### 1. Test Single Question
- Click "Fetch & Solve Random Question" to test the agent on one question
- Review the answer and validation status
- This helps verify the agent is working correctly before running the full benchmark

### 2. Run Full Benchmark
- Click "Run Agent on All Questions"
- The process takes approximately 10-15 minutes
- Progress is shown in real-time
- Results are displayed in a table
- Answers are automatically formatted for submission

### 3. Submit to Leaderboard
- After running the benchmark, go to the "Submit to Leaderboard" tab
- Enter your HuggingFace username
- Enter your Space URL (the Space must be public; `/tree/main` is appended automatically if missing)
- Answers JSON is auto-filled
- Click "Submit to Leaderboard"
- View your score and ranking
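
For reference, the submission body might be assembled along these lines. This is a sketch only: the field names `username`, `agent_code`, `answers`, `task_id`, and `submitted_answer` are assumptions based on the course template and should be checked against the API documentation, and the helper names are hypothetical.

```python
import json
from typing import Dict


def normalize_space_url(url: str) -> str:
    """Append /tree/main to a Space URL unless it is already present."""
    url = url.strip().rstrip("/")
    return url if url.endswith("/tree/main") else url + "/tree/main"


def build_submission(username: str, space_url: str, answers: Dict[str, str]) -> str:
    """Serialize answers (task_id -> answer text) into a JSON request body."""
    payload = {
        "username": username.strip(),                  # assumed field name
        "agent_code": normalize_space_url(space_url),  # assumed field name
        "answers": [
            {"task_id": task_id, "submitted_answer": answer}
            for task_id, answer in answers.items()
        ],
    }
    return json.dumps(payload, indent=2)
```

Calling `build_submission("alice", "https://huggingface.co/spaces/alice/demo", {"t1": "42"})` would produce a body whose `agent_code` ends in `/tree/main`.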

## 🎓 Tips for Better Scores

### Answer Formatting (Critical!)

The GAIA benchmark uses **exact string matching**. Your answers must match the ground truth character-for-character.

**✅ DO:**
- Give just the number: `"42"`
- Use exact spelling: `"John Smith"`
- Comma-separated lists with NO spaces: `"apple,banana,cherry"`
- Just "Yes" or "No" (capitalized)
- Follow the date format specified in the question

**❌ DON'T:**
- Include prefixes like "FINAL ANSWER:" or "The answer is:"
- Add explanations or context
- Use different capitalization or spelling
- Add spaces in comma-separated lists
- Include units unless specifically requested
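
The rules above could be enforced with a small post-processing step before submission. The sketch below is one possible approach, not the project's implementation: `clean_answer` is a hypothetical helper, and the list of prefixes it strips is an assumption.

```python
import re

# Prefix patterns the model might emit. This list is an assumption for
# illustration; it is not taken from the project's code.
_PREFIX = re.compile(
    r"^\s*(?:final answer\s*:|the answer is\b\s*:?|answer\s*:)\s*",
    re.IGNORECASE,
)


def clean_answer(raw: str) -> str:
    """Strip a leading prefix, wrapping quotes, and spaces around commas."""
    text = _PREFIX.sub("", raw.strip())
    text = text.strip().strip("\"'").strip()
    # "apple, banana" -> "apple,banana", following the no-spaces rule
    return re.sub(r"\s*,\s*", ",", text)
```

Because removing spaces around commas also rewrites values such as `Smith, John`, a step like this is best applied only when the question asks for a list.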

### Agent Strategy

1. **File Priority**: If a file is available, the agent reads it first - answers are often in the file
2. **Tool Selection**: The agent automatically chooses the best tool for each task
3. **Iteration Limit**: The agent has up to 15 iterations to solve each question
4. **Error Handling**: The agent gracefully handles errors and tries alternative approaches

### Best Practices

1. **Test First**: Always test with a single question before running the full benchmark
2. **Review Answers**: Check the validation status for each answer
3. **Verify Format**: Ensure answers don't contain prefixes or explanations
4. **Public Space**: Keep your Space public so the code link works for verification
5. **API Key**: Ensure your OpenAI API key has sufficient credits

## ⚙️ Configuration

### Modifying the Agent

The agent can be customized in `agent_enhanced.py`:

- **Model**: Change `model_name` in `GAIAAgent.__init__()` (default: "gpt-4o")
- **Temperature**: Adjust `temperature` (default: 0 for deterministic)
- **Max Iterations**: Change `max_iterations` (default: 15)
- **System Prompt**: Modify `SYSTEM_PROMPT` for different instructions
- **Tools**: Add or remove tools from the `TOOLS` list

### Environment Variables

- `OPENAI_API_KEY`: Required - Your OpenAI API key
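
A startup check along these lines can surface a missing key early rather than on the first model call. `require_api_key` is a hypothetical helper, not a function from the project.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return OPENAI_API_KEY from the environment or raise a clear error."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Add it under Space Settings > Secrets "
            "or export it in your shell before running app.py."
        )
    return key
```

Passing the environment as a parameter keeps the check easy to exercise without touching the real process environment.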

## 🐛 Troubleshooting

### Common Issues

**"Please provide your OpenAI API key"**
- Ensure `OPENAI_API_KEY` is set in Space Secrets (for HF Spaces) or environment variables (for local)

**"Failed to fetch questions from API"**
- Check your internet connection
- Verify the API URL is accessible: `https://agents-course-unit4-scoring.hf.space`
- The API may be temporarily unavailable - try again later
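
To check the API outside the app, a short standard-library script can help separate network problems from agent problems. This is a sketch under assumptions: the `/questions` path is assumed from the course template and should be confirmed in the API documentation, and `endpoint` and `fetch_questions` are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import urljoin

API_URL = "https://agents-course-unit4-scoring.hf.space"


def endpoint(path: str, base: str = API_URL) -> str:
    """Join a base URL and a path, tolerating stray slashes on either side."""
    return urljoin(base.rstrip("/") + "/", path.lstrip("/"))


def fetch_questions(base: str = API_URL, timeout: float = 15.0) -> list:
    """GET the question list; raises urllib.error.URLError on network failure."""
    # "questions" is an assumed endpoint name; verify it in the API docs.
    with urllib.request.urlopen(endpoint("questions", base), timeout=timeout) as resp:
        return json.load(resp)
```

If `fetch_questions()` raises a `URLError` here as well, the problem is likely connectivity or API availability rather than the agent.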

**"Agent error: ..."**
- Check that your OpenAI API key is valid and has credits
- Verify the model name is correct (e.g., "gpt-4o")
- Review the error message for specific issues

**"Submission error: ..."**
- Ensure your Space URL is correct and public
- Verify the URL ends with `/tree/main` (auto-added if missing)
- Check that answers JSON is properly formatted
- Ensure your HuggingFace username is correct

**Low Scores (< 30%)**
- Review answer formatting - exact matching is critical
- Check that answers don't contain prefixes or explanations
- Verify file reading is working (some questions require file analysis)
- Consider increasing `max_iterations` for complex questions
- Test with single questions to identify patterns

### Getting Help

- Check the [Course Materials](https://huggingface.co/learn/agents-course/en/unit4/hands-on)
- Review the [API Documentation](https://agents-course-unit4-scoring.hf.space/docs)
- Check the [Student Leaderboard](https://huggingface.co/spaces/agents-course/Students_leaderboard) for examples
- Review the [GAIA Benchmark Paper](https://huggingface.co/papers/2311.12983)

## 📁 Project Structure

```
certification/
├── app.py                 # Gradio interface and main entry point
├── agent_enhanced.py      # LangGraph agent implementation
├── requirements.txt       # Python dependencies
├── README.md              # This file
└── .gitignore             # Git ignore rules
```

## 🔗 Important Links

- [GAIA Benchmark Leaderboard](https://huggingface.co/spaces/gaia-benchmark/leaderboard)
- [Student Leaderboard](https://huggingface.co/spaces/agents-course/Students_leaderboard)
- [Course Unit 4 - Hands-On](https://huggingface.co/learn/agents-course/en/unit4/hands-on)
- [API Documentation](https://agents-course-unit4-scoring.hf.space/docs)
- [GAIA Paper](https://huggingface.co/papers/2311.12983)
- [HuggingFace Agents Course](https://huggingface.co/learn/agents-course)

## 📊 Scoring

- **Target**: 30%+ (6+ correct out of 20 questions)
- **Evaluation**: Exact string matching
- **Questions**: 20 Level 1 questions from GAIA validation set
- **Submission**: Via the API endpoint `/submit`
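
The arithmetic behind the target is simple: 6 correct out of 20 is exactly 30%. A minimal sketch of exact-match scoring, assuming answers and ground truth are both keyed by task ID (the function name and data shape are hypothetical, and the real scorer runs server-side):

```python
from typing import Dict


def score_percent(submitted: Dict[str, str], truth: Dict[str, str]) -> float:
    """Percentage of tasks whose submitted answer equals the ground truth exactly."""
    if not truth:
        raise ValueError("truth must contain at least one task")
    correct = sum(
        1 for task_id, expected in truth.items()
        if submitted.get(task_id) == expected
    )
    return 100.0 * correct / len(truth)
```

Note that `"42"` and `"42 "` count as different answers under exact matching, which is why formatting matters so much.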

## 🏆 Certification

Once you achieve 30% or higher:
1. Your score will appear on the Student Leaderboard
2. You'll earn the Certificate of Completion
3. Share your achievement!

## 📝 License

MIT License

## 🙏 Acknowledgments

- Built for the [HuggingFace Agents Course](https://huggingface.co/learn/agents-course)
- Uses [LangGraph](https://langchain-ai.github.io/langgraph/) for agent orchestration
- Based on the [GAIA Benchmark](https://huggingface.co/papers/2311.12983)