Spaces:

jebaselvasingh
/

jarvis

No application file

File size: 10,026 Bytes

0b90c85

# 🤖 GAIA Benchmark Agent (LangGraph)

This is a LangGraph-powered agent for the [HuggingFace Agents Course](https://huggingface.co/learn/agents-course) Final Assignment. The agent is designed to solve GAIA benchmark questions and achieve a 30%+ score on Level 1 questions to earn the course certificate.

## 🎯 Goal

**Score 30% or higher** (6+ correct out of 20 questions) on the GAIA Level 1 benchmark to earn your Certificate of Completion.

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                    LangGraph Workflow                    │
├─────────────────────────────────────────────────────────┤
│                                                         │
│   ┌─────────┐     ┌─────────┐     ┌──────────────┐     │
│   │  START  │────▶│  Agent  │────▶│   Should     │     │
│   └─────────┘     │  Node   │     │  Continue?   │     │
│                   └─────────┘     └──────────────┘     │
│                        ▲               │    │          │
│                        │          Yes  │    │ No       │
│                        │               ▼    │          │
│                   ┌─────────┐     ┌─────────▼────┐     │
│                   │  Tool   │◀────│   Extract    │     │
│                   │  Node   │     │   Answer     │     │
│                   └─────────┘     └──────────────┘     │
│                                         │              │
│                                         ▼              │
│                                    ┌─────────┐        │
│                                    │   END   │        │
│                                    └─────────┘        │
└─────────────────────────────────────────────────────────┘
```

## Available Tools

| Tool | Description | Use Case |
|------|-------------|----------|
| 🔍 `web_search` | DuckDuckGo web search | Current information, recent events, facts |
| 📚 `wikipedia_search` | Wikipedia API | Historical facts, biographies, definitions |
| 🐍 `python_executor` | Python REPL | Calculations, data processing, analysis |
| 📄 `read_file` | File reader | PDFs, text files, Excel spreadsheets |
| 🔢 `calculator` | Math evaluator | Quick mathematical calculations |

## 🚀 Setup

### Option 1: HuggingFace Spaces (Recommended for Certification)

1. **Fork/Duplicate this Space** to your HuggingFace account
   - Go to the Space and click "Duplicate this Space"
   - Choose a name and make it **Public** (required for certification)

2. **Add API Key**
   - Go to Space Settings > Secrets
   - Add a new secret: `OPENAI_API_KEY` with your OpenAI API key value
   - Click "Save secrets"

3. **Deploy**
   - The Space will automatically build and deploy
   - Wait for the build to complete (usually 2-5 minutes)

4. **Test and Submit**
   - Open the Space and test with a single question
   - Run the full benchmark
   - Submit to the leaderboard

### Option 2: Local Development

```bash
# Clone the repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
cd YOUR_SPACE

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set environment variable
export OPENAI_API_KEY="sk-..."  # On Windows: set OPENAI_API_KEY=sk-...

# Run the app
python app.py
```

The app will be available at `http://localhost:7860`

## 📖 Usage

### 1. Test Single Question
- Click "Fetch & Solve Random Question" to test the agent on one question
- Review the answer and validation status
- This helps verify the agent is working correctly before running the full benchmark

### 2. Run Full Benchmark
- Click "Run Agent on All Questions"
- The process takes approximately 10-15 minutes
- Progress is shown in real-time
- Results are displayed in a table
- Answers are automatically formatted for submission

### 3. Submit to Leaderboard
- After running the benchmark, go to the "Submit to Leaderboard" tab
- Enter your HuggingFace username
- Enter your Space URL (must be public and end with `/tree/main`)
- Answers JSON is auto-filled
- Click "Submit to Leaderboard"
- View your score and ranking

## 🎓 Tips for Better Scores

### Answer Formatting (Critical!)

The GAIA benchmark uses **exact string matching**. Your answers must match the ground truth character-for-character.

**✅ DO:**
- Give just the number: `"42"`
- Use exact spelling: `"John Smith"`
- Comma-separated lists with NO spaces: `"apple,banana,cherry"`
- Just "Yes" or "No" (capitalized)
- Follow the date format specified in the question

**❌ DON'T:**
- Include prefixes like "FINAL ANSWER:" or "The answer is:"
- Add explanations or context
- Use different capitalization or spelling
- Add spaces in comma-separated lists
- Include units unless specifically requested

### Agent Strategy

1. **File Priority**: If a file is available, the agent reads it first - answers are often in the file
2. **Tool Selection**: The agent automatically chooses the best tool for each task
3. **Iteration Limit**: The agent has up to 15 iterations to solve each question
4. **Error Handling**: The agent gracefully handles errors and tries alternative approaches

### Best Practices

1. **Test First**: Always test with a single question before running the full benchmark
2. **Review Answers**: Check the validation status for each answer
3. **Verify Format**: Ensure answers don't contain prefixes or explanations
4. **Public Space**: Keep your Space public so the code link works for verification
5. **API Key**: Ensure your OpenAI API key has sufficient credits

## ⚙️ Configuration

### Modifying the Agent

The agent can be customized in `agent_enhanced.py`:

- **Model**: Change `model_name` in `GAIAAgent.__init__()` (default: "gpt-4o")
- **Temperature**: Adjust `temperature` (default: 0 for deterministic)
- **Max Iterations**: Change `max_iterations` (default: 15)
- **System Prompt**: Modify `SYSTEM_PROMPT` for different instructions
- **Tools**: Add or remove tools from the `TOOLS` list

### Environment Variables

- `OPENAI_API_KEY`: Required - Your OpenAI API key

## 🐛 Troubleshooting

### Common Issues

**"Please provide your OpenAI API key"**
- Ensure `OPENAI_API_KEY` is set in Space Secrets (for HF Spaces) or environment variables (for local)

**"Failed to fetch questions from API"**
- Check your internet connection
- Verify the API URL is accessible: `https://agents-course-unit4-scoring.hf.space`
- The API may be temporarily unavailable - try again later

**"Agent error: ..."**
- Check that your OpenAI API key is valid and has credits
- Verify the model name is correct (e.g., "gpt-4o")
- Review the error message for specific issues

**"Submission error: ..."**
- Ensure your Space URL is correct and public
- Verify the URL ends with `/tree/main` (auto-added if missing)
- Check that answers JSON is properly formatted
- Ensure your HuggingFace username is correct

**Low Scores (< 30%)**
- Review answer formatting - exact matching is critical
- Check that answers don't contain prefixes or explanations
- Verify file reading is working (some questions require file analysis)
- Consider increasing `max_iterations` for complex questions
- Test with single questions to identify patterns

### Getting Help

- Check the [Course Materials](https://huggingface.co/learn/agents-course/en/unit4/hands-on)
- Review the [API Documentation](https://agents-course-unit4-scoring.hf.space/docs)
- Check the [Student Leaderboard](https://huggingface.co/spaces/agents-course/Students_leaderboard) for examples
- Review the [GAIA Benchmark Paper](https://huggingface.co/papers/2311.12983)

## 📁 Project Structure

```
certification/
├── app.py                 # Gradio interface and main entry point
├── agent_enhanced.py      # LangGraph agent implementation
├── requirements.txt       # Python dependencies
├── README.md             # This file
└── .gitignore            # Git ignore rules
```

## 🔗 Important Links

- [GAIA Benchmark Leaderboard](https://huggingface.co/spaces/gaia-benchmark/leaderboard)
- [Student Leaderboard](https://huggingface.co/spaces/agents-course/Students_leaderboard)
- [Course Unit 4 - Hands-On](https://huggingface.co/learn/agents-course/en/unit4/hands-on)
- [API Documentation](https://agents-course-unit4-scoring.hf.space/docs)
- [GAIA Paper](https://huggingface.co/papers/2311.12983)
- [HuggingFace Agents Course](https://huggingface.co/learn/agents-course)

## 📊 Scoring

- **Target**: 30%+ (6+ correct out of 20 questions)
- **Evaluation**: Exact string matching
- **Questions**: 20 Level 1 questions from GAIA validation set
- **Submission**: Via the API endpoint `/submit`

## 🏆 Certification

Once you achieve 30% or higher:
1. Your score will appear on the Student Leaderboard
2. You'll earn the Certificate of Completion
3. Share your achievement!

## 📝 License

MIT License

## 🙏 Acknowledgments

- Built for the [HuggingFace Agents Course](https://huggingface.co/learn/agents-course)
- Uses [LangGraph](https://langchain-ai.github.io/langgraph/) for agent orchestration
- Based on the [GAIA Benchmark](https://huggingface.co/papers/2311.12983)