# 🤖 GAIA Benchmark Agent (LangGraph)
This is a LangGraph-powered agent for the [HuggingFace Agents Course](https://huggingface.co/learn/agents-course) Final Assignment. The agent is designed to solve GAIA benchmark questions and achieve a 30%+ score on Level 1 questions to earn the course certificate.
## 🎯 Goal
**Score 30% or higher** (6+ correct out of 20 questions) on the GAIA Level 1 benchmark to earn your Certificate of Completion.
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│                   LangGraph Workflow                    │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌─────────┐     ┌─────────┐     ┌──────────────┐       │
│  │  START  │────▶│  Agent  │────▶│   Should     │       │
│  └─────────┘     │  Node   │     │  Continue?   │       │
│                  └─────────┘     └──────────────┘       │
│                       ▲               │    │            │
│                       │          Yes  │    │ No         │
│                       │               ▼    │            │
│                  ┌─────────┐     ┌─────────▼────┐       │
│                  │  Tool   │◀────│   Extract    │       │
│                  │  Node   │     │   Answer     │       │
│                  └─────────┘     └──────────────┘       │
│                                         │               │
│                                         ▼               │
│                                    ┌─────────┐          │
│                                    │   END   │          │
│                                    └─────────┘          │
└─────────────────────────────────────────────────────────┘
```
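Reading the diagram as the usual tool-calling loop (the agent node runs, a router decides whether to call a tool or finish, tool output feeds back into the agent, and a final node extracts the answer), the control flow can be sketched in plain Python without LangGraph. Everything in this block is hypothetical: `run_workflow`, the state dictionary, the stub `agent_node`, and the toy `multiply_tool` are illustrations only and are not taken from `agent_enhanced.py`.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]  # hypothetical state: messages, pending tool call, step count


def multiply_tool(expr: str) -> str:
    """Toy tool: multiplies two integers written as 'a*b'."""
    left, right = expr.split("*")
    return str(int(left) * int(right))


def agent_node(state: State) -> State:
    """Stub standing in for the LLM: ask for one tool call, then answer."""
    tool_results = [m for m in state["messages"] if m.startswith("tool:")]
    if not tool_results:
        state["tool_call"] = ("multiply", "6*7")
    else:
        state["tool_call"] = None
        state["messages"].append("agent:" + tool_results[-1][len("tool:"):])
    return state


def should_continue(state: State, max_iterations: int) -> bool:
    """Router: keep looping only while a tool call is pending and budget remains."""
    return state["tool_call"] is not None and state["steps"] < max_iterations


def tool_node(state: State, tools: Dict[str, Callable[[str], str]]) -> State:
    """Run the requested tool and append its output for the agent to read."""
    name, argument = state["tool_call"]
    state["messages"].append("tool:" + tools[name](argument))
    return state


def extract_answer(state: State) -> str:
    """Return the agent's last message, or an empty string if it never answered."""
    last = state["messages"][-1]
    return last[len("agent:"):].strip() if last.startswith("agent:") else ""


def run_workflow(question: str, tools: Dict[str, Callable[[str], str]],
                 max_iterations: int = 15) -> str:
    state: State = {"messages": ["user:" + question], "tool_call": None, "steps": 0}
    while True:
        state = agent_node(state)
        state["steps"] += 1
        if not should_continue(state, max_iterations):
            break
        state = tool_node(state, tools)
    return extract_answer(state)


print(run_workflow("What is 6*7?", {"multiply": multiply_tool}))  # prints 42
```

In a real LangGraph graph the loop and router would be expressed as nodes and conditional edges rather than a `while` loop; the sketch only shows the order in which the boxes in the diagram are visited.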
## Available Tools
| Tool | Description | Use Case |
|------|-------------|----------|
| 🔍 `web_search` | DuckDuckGo web search | Current information, recent events, facts |
| 📚 `wikipedia_search` | Wikipedia API | Historical facts, biographies, definitions |
| 🐍 `python_executor` | Python REPL | Calculations, data processing, analysis |
| 📄 `read_file` | File reader | PDFs, text files, Excel spreadsheets |
| 🔢 `calculator` | Math evaluator | Quick mathematical calculations |
## 🚀 Setup
### Option 1: HuggingFace Spaces (Recommended for Certification)
1. **Fork/Duplicate this Space** to your HuggingFace account
- Go to the Space and click "Duplicate this Space"
- Choose a name and make it **Public** (required for certification)
2. **Add API Key**
- Go to Space Settings > Secrets
- Add a new secret: `OPENAI_API_KEY` with your OpenAI API key value
- Click "Save secrets"
3. **Deploy**
- The Space will automatically build and deploy
- Wait for the build to complete (usually 2-5 minutes)
4. **Test and Submit**
- Open the Space and test with a single question
- Run the full benchmark
- Submit to the leaderboard
### Option 2: Local Development
```bash
# Clone the repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
cd YOUR_SPACE
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Set environment variable
export OPENAI_API_KEY="sk-..." # On Windows: set OPENAI_API_KEY=sk-...
# Run the app
python app.py
```
The app will be available at `http://localhost:7860`
## 📖 Usage
### 1. Test Single Question
- Click "Fetch & Solve Random Question" to test the agent on one question
- Review the answer and validation status
- This helps verify the agent is working correctly before running the full benchmark
### 2. Run Full Benchmark
- Click "Run Agent on All Questions"
- The process takes approximately 10-15 minutes
- Progress is shown in real-time
- Results are displayed in a table
- Answers are automatically formatted for submission
### 3. Submit to Leaderboard
- After running the benchmark, go to the "Submit to Leaderboard" tab
- Enter your HuggingFace username
- Enter your Space URL (must be public and end with `/tree/main`)
- Answers JSON is auto-filled
- Click "Submit to Leaderboard"
- View your score and ranking
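For readers who want to see what a submission looks like outside the UI, here is a minimal sketch that builds the request body and applies the `/tree/main` rule from the steps above. The field names (`username`, `agent_code`, `answers`, `task_id`, `submitted_answer`) are assumptions based on the course template, and `normalize_space_url` and `build_submission` are hypothetical helpers; check the scoring API documentation before relying on them. The block makes no network request.

```python
import json
from typing import Dict


def normalize_space_url(url: str) -> str:
    """Ensure the Space URL ends with /tree/main (hypothetical helper)."""
    url = url.strip().rstrip("/")
    return url if url.endswith("/tree/main") else url + "/tree/main"


def build_submission(username: str, space_url: str, answers: Dict[str, str]) -> dict:
    """Assemble a submission body; the field names are assumed, not confirmed."""
    return {
        "username": username.strip(),
        "agent_code": normalize_space_url(space_url),
        "answers": [
            {"task_id": task_id, "submitted_answer": answer}
            for task_id, answer in answers.items()
        ],
    }


payload = build_submission(
    "your-username",
    "https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE/",
    {"task-1": "42"},
)
print(json.dumps(payload, indent=2))
```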
## 🎓 Tips for Better Scores
### Answer Formatting (Critical!)
The GAIA benchmark uses **exact string matching**. Your answers must match the ground truth character-for-character.
**✅ DO:**
- Give just the number: `"42"`
- Use exact spelling: `"John Smith"`
- Comma-separated lists with NO spaces: `"apple,banana,cherry"`
- Just "Yes" or "No" (capitalized)
- Follow the date format specified in the question
**❌ DON'T:**
- Include prefixes like "FINAL ANSWER:" or "The answer is:"
- Add explanations or context
- Use different capitalization or spelling
- Add spaces in comma-separated lists
- Include units unless specifically requested
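The rules above can be partly enforced with a small post-processing step before an answer is submitted. The sketch below is one possible cleanup, not the agent's actual logic: `clean_answer` and its prefix list are hypothetical, and it only covers prefixes, surrounding quotes, comma spacing, and Yes/No capitalization.

```python
import re

# Hypothetical prefixes to strip; extend the list to match your model's habits.
_PREFIX = re.compile(r"^\s*(?:final answer|the answer is)\s*:?\s*", re.IGNORECASE)


def clean_answer(raw: str) -> str:
    """Apply the formatting rules listed above to a raw model answer."""
    text = _PREFIX.sub("", raw.strip())   # drop "FINAL ANSWER:" style prefixes
    text = text.strip().strip('"')        # drop surrounding whitespace and quotes
    text = re.sub(r"\s*,\s*", ",", text)  # no spaces in comma-separated lists
    if text.lower() in {"yes", "no"}:     # capitalize bare Yes/No answers
        text = text.capitalize()
    return text


print(clean_answer("FINAL ANSWER: apple, banana, cherry"))  # prints apple,banana,cherry
```

A step like this cannot fix wrong spelling, units, or date formats, so the question's own formatting instructions still need to reach the model through the system prompt.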
### Agent Strategy
1. **File Priority**: If a file is available, the agent reads it first - answers are often in the file
2. **Tool Selection**: The agent automatically chooses the best tool for each task
3. **Iteration Limit**: The agent has up to 15 iterations to solve each question
4. **Error Handling**: The agent gracefully handles errors and tries alternative approaches
### Best Practices
1. **Test First**: Always test with a single question before running the full benchmark
2. **Review Answers**: Check the validation status for each answer
3. **Verify Format**: Ensure answers don't contain prefixes or explanations
4. **Public Space**: Keep your Space public so the code link works for verification
5. **API Key**: Ensure your OpenAI API key has sufficient credits
## ⚙️ Configuration
### Modifying the Agent
The agent can be customized in `agent_enhanced.py`:
- **Model**: Change `model_name` in `GAIAAgent.__init__()` (default: "gpt-4o")
- **Temperature**: Adjust `temperature` (default: 0 for deterministic)
- **Max Iterations**: Change `max_iterations` (default: 15)
- **System Prompt**: Modify `SYSTEM_PROMPT` for different instructions
- **Tools**: Add or remove tools from the `TOOLS` list
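To keep these settings in one place while experimenting, the defaults listed above can be grouped in a small dataclass. This is a sketch only: `AgentSettings` is a hypothetical name, the actual signature of `GAIAAgent.__init__()` is not shown in this README, and the 0 to 2 temperature range is an assumption about the OpenAI API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentSettings:
    """Hypothetical container mirroring the defaults documented above."""
    model_name: str = "gpt-4o"
    temperature: float = 0.0   # 0 keeps answers as deterministic as possible
    max_iterations: int = 15   # upper bound on agent/tool round trips

    def __post_init__(self) -> None:
        if self.max_iterations < 1:
            raise ValueError("max_iterations must be at least 1")
        if not 0.0 <= self.temperature <= 2.0:  # assumed valid range
            raise ValueError("temperature must be between 0 and 2")


defaults = AgentSettings()
# Derive a variant for harder questions without mutating the defaults.
patient = replace(defaults, max_iterations=25)
print(defaults, patient)
```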
### Environment Variables
- `OPENAI_API_KEY`: Required - Your OpenAI API key
## 🐛 Troubleshooting
### Common Issues
**"Please provide your OpenAI API key"**
- Ensure `OPENAI_API_KEY` is set in Space Secrets (for HF Spaces) or environment variables (for local)
**"Failed to fetch questions from API"**
- Check your internet connection
- Verify the API URL is accessible: `https://agents-course-unit4-scoring.hf.space`
- The API may be temporarily unavailable - try again later
**"Agent error: ..."**
- Check that your OpenAI API key is valid and has credits
- Verify the model name is correct (e.g., "gpt-4o")
- Review the error message for specific issues
**"Submission error: ..."**
- Ensure your Space URL is correct and public
- Verify the URL ends with `/tree/main` (auto-added if missing)
- Check that answers JSON is properly formatted
- Ensure your HuggingFace username is correct
**Low Scores (< 30%)**
- Review answer formatting - exact matching is critical
- Check that answers don't contain prefixes or explanations
- Verify file reading is working (some questions require file analysis)
- Consider increasing `max_iterations` for complex questions
- Test with single questions to identify patterns
### Getting Help
- Check the [Course Materials](https://huggingface.co/learn/agents-course/en/unit4/hands-on)
- Review the [API Documentation](https://agents-course-unit4-scoring.hf.space/docs)
- Check the [Student Leaderboard](https://huggingface.co/spaces/agents-course/Students_leaderboard) for examples
- Review the [GAIA Benchmark Paper](https://huggingface.co/papers/2311.12983)
## 📁 Project Structure
```
certification/
├── app.py              # Gradio interface and main entry point
├── agent_enhanced.py   # LangGraph agent implementation
├── requirements.txt    # Python dependencies
├── README.md           # This file
└── .gitignore          # Git ignore rules
```
## 🔗 Important Links
- [GAIA Benchmark Leaderboard](https://huggingface.co/spaces/gaia-benchmark/leaderboard)
- [Student Leaderboard](https://huggingface.co/spaces/agents-course/Students_leaderboard)
- [Course Unit 4 - Hands-On](https://huggingface.co/learn/agents-course/en/unit4/hands-on)
- [API Documentation](https://agents-course-unit4-scoring.hf.space/docs)
- [GAIA Paper](https://huggingface.co/papers/2311.12983)
- [HuggingFace Agents Course](https://huggingface.co/learn/agents-course)
## 📊 Scoring
- **Target**: 30%+ (6+ correct out of 20 questions)
- **Evaluation**: Exact string matching
- **Questions**: 20 Level 1 questions from GAIA validation set
- **Submission**: Via the API endpoint `/submit`
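The arithmetic behind the target is simple: 6 exact matches out of 20 questions is 30%. The sketch below illustrates exact-match scoring locally with made-up data; the real scoring happens on the server, and `exact_match_score` is a hypothetical helper, not part of the API.

```python
from typing import Dict


def exact_match_score(submitted: Dict[str, str], truth: Dict[str, str]) -> float:
    """Percentage of tasks whose submitted answer equals the expected string exactly."""
    correct = sum(
        1 for task_id, expected in truth.items() if submitted.get(task_id) == expected
    )
    return 100.0 * correct / len(truth)


# Made-up data: 20 tasks, the first 6 answered correctly.
truth = {f"q{i}": str(i) for i in range(20)}
submitted = {f"q{i}": (str(i) if i < 6 else "wrong") for i in range(20)}

score = exact_match_score(submitted, truth)
print(score, score >= 30.0)  # prints 30.0 True
```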
## 🏆 Certification
Once you achieve 30% or higher:
1. Your score will appear on the Student Leaderboard
2. You'll earn the Certificate of Completion
3. Share your achievement!
## 📝 License
MIT License
## 🙏 Acknowledgments
- Built for the [HuggingFace Agents Course](https://huggingface.co/learn/agents-course)
- Uses [LangGraph](https://langchain-ai.github.io/langgraph/) for agent orchestration
- Based on the [GAIA Benchmark](https://huggingface.co/papers/2311.12983)