	# 🤖 GAIA Benchmark Agent (LangGraph)

	This is a LangGraph-powered agent for the [HuggingFace Agents Course](https://huggingface.co/learn/agents-course) Final Assignment. The agent is designed to solve GAIA benchmark questions and achieve a 30%+ score on Level 1 questions to earn the course certificate.

	## 🎯 Goal

	Score 30% or higher (6+ correct out of 20 questions) on the GAIA Level 1 benchmark to earn your Certificate of Completion.

	## Architecture

	```
	┌─────────────────────────────────────────────────────────┐
	│                   LangGraph Workflow                    │
	├─────────────────────────────────────────────────────────┤
	│                                                         │
	│  ┌─────────┐     ┌─────────┐     ┌──────────────┐       │
	│  │  START  │────▶│  Agent  │────▶│    Should    │       │
	│  └─────────┘     │  Node   │     │  Continue?   │       │
	│                  └─────────┘     └──────────────┘       │
	│                       ▲            │          │         │
	│                       │        Yes │          │ No      │
	│                  ┌─────────┐       │          │         │
	│                  │  Tool   │◀──────┘          ▼         │
	│                  │  Node   │          ┌──────────────┐  │
	│                  └─────────┘          │   Extract    │  │
	│                                       │    Answer    │  │
	│                                       └──────────────┘  │
	│                                               │         │
	│                                               ▼         │
	│                                          ┌─────────┐    │
	│                                          │   END   │    │
	│                                          └─────────┘    │
	└─────────────────────────────────────────────────────────┘
	```
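
The loop in the diagram can be sketched in plain Python, without LangGraph itself. This is only an illustration of the control flow: the message dictionaries, `run_workflow`, `tool_node`, `should_continue`, `extract_answer`, and the scripted fake model are all hypothetical names and do not come from `agent_enhanced.py`.

```python
from typing import Callable, Dict, List

# Hypothetical message shape: a dict with a "type" plus type-specific keys.
Message = Dict[str, str]
Model = Callable[[List[Message]], Message]
Tool = Callable[[str], str]


def should_continue(last: Message) -> bool:
    """Take the "Yes" branch when the agent asked for a tool."""
    return last.get("type") == "tool_call"


def tool_node(call: Message, tools: Dict[str, Tool]) -> Message:
    """Run the requested tool and wrap its output as a message."""
    return {"type": "tool_result", "content": tools[call["name"]](call["input"])}


def extract_answer(last: Message) -> str:
    """Return only the text of the last message."""
    return last.get("content", "").strip()


def run_workflow(question: str, model: Model, tools: Dict[str, Tool],
                 max_iterations: int = 15) -> str:
    messages: List[Message] = [{"type": "question", "content": question}]
    for _ in range(max_iterations):
        reply = model(messages)                   # Agent node
        messages.append(reply)
        if not should_continue(reply):            # "No": go to Extract Answer
            break
        messages.append(tool_node(reply, tools))  # "Yes": Tool node, then loop
    # If the iteration limit is hit, the last tool result is returned as-is.
    return extract_answer(messages[-1])


def scripted_model(messages: List[Message]) -> Message:
    """Fake model: request the calculator once, then repeat its result."""
    if messages[-1]["type"] == "question":
        return {"type": "tool_call", "name": "calculator", "input": "6*7"}
    return {"type": "final", "content": messages[-1]["content"]}


fake_tools = {"calculator": lambda expression: str(6 * 7)}
print(run_workflow("What is 6*7?", scripted_model, fake_tools))
```

In the real agent, the model decides when to stop calling tools; the `max_iterations` cap only guards against loops that never reach an answer.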

	## Available Tools

	| Tool | Description | Use Case |
	|------|-------------|----------|
	| 🔍 `web_search` | DuckDuckGo web search | Current information, recent events, facts |
	| 📚 `wikipedia_search` | Wikipedia API | Historical facts, biographies, definitions |
	| 🐍 `python_executor` | Python REPL | Calculations, data processing, analysis |
	| 📄 `read_file` | File reader | PDFs, text files, Excel spreadsheets |
	| 🔢 `calculator` | Math evaluator | Quick mathematical calculations |
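
To give a feel for what a "math evaluator" tool can look like, here is a minimal sketch that evaluates basic arithmetic by walking the parsed syntax tree instead of calling `eval()`. It is an assumption about one reasonable design, not the actual `calculator` tool in `agent_enhanced.py`, which may work differently.

```python
import ast
import operator

# Only these operators are allowed; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.Mod: operator.mod,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression and return the result as text."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    try:
        return str(_eval(ast.parse(expression, mode="eval")))
    except (SyntaxError, ValueError, ZeroDivisionError) as exc:
        # Return the error as text so the agent can read it and try again.
        return f"Error: {exc}"


print(calculator("2 + 3 * 4"))
print(calculator("__import__('os')"))
```

Returning errors as strings rather than raising lets the agent see what went wrong and pick another approach. Note that very large exponents can still be slow, so a production tool would likely add limits.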

	## 🚀 Setup

	### Option 1: HuggingFace Spaces (Recommended for Certification)

	1. Fork/Duplicate this Space to your HuggingFace account
	   - Go to the Space and click "Duplicate this Space"
	   - Choose a name and make it Public (required for certification)

	2. Add API Key
	   - Go to Space Settings > Secrets
	   - Add a new secret: `OPENAI_API_KEY` with your OpenAI API key value
	   - Click "Save secrets"

	3. Deploy
	   - The Space will automatically build and deploy
	   - Wait for the build to complete (usually 2-5 minutes)

	4. Test and Submit
	   - Open the Space and test with a single question
	   - Run the full benchmark
	   - Submit to the leaderboard

	### Option 2: Local Development

	```bash
	# Clone the repository
	git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
	cd YOUR_SPACE

	# Create virtual environment (recommended)
	python -m venv venv
	source venv/bin/activate # On Windows: venv\Scripts\activate

	# Install dependencies
	pip install -r requirements.txt

	# Set environment variable
	export OPENAI_API_KEY="sk-..." # On Windows: set OPENAI_API_KEY=sk-...

	# Run the app
	python app.py
	```

	The app will be available at `http://localhost:7860`

	## 📖 Usage

	### 1. Test Single Question
	- Click "Fetch & Solve Random Question" to test the agent on one question
	- Review the answer and validation status
	- This helps verify the agent is working correctly before running the full benchmark

	### 2. Run Full Benchmark
	- Click "Run Agent on All Questions"
	- The process takes approximately 10-15 minutes
	- Progress is shown in real-time
	- Results are displayed in a table
	- Answers are automatically formatted for submission

	### 3. Submit to Leaderboard
	- After running the benchmark, go to the "Submit to Leaderboard" tab
	- Enter your HuggingFace username
	- Enter your Space URL (must be public and end with `/tree/main`)
	- Answers JSON is auto-filled
	- Click "Submit to Leaderboard"
	- View your score and ranking
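
The submission step can be pictured as building one JSON payload from the username, the Space URL, and the collected answers. The sketch below is hedged: the field names (`username`, `agent_code`, `answers`, `task_id`, `submitted_answer`) are assumptions that you should confirm against the API documentation, and `normalize_space_url` and `build_submission` are hypothetical helpers, not functions from `app.py`. The username and Space name are example values.

```python
import json
from typing import Dict


def normalize_space_url(url: str) -> str:
    """Make sure the code link ends with /tree/main exactly once."""
    url = url.strip().rstrip("/")
    if not url.endswith("/tree/main"):
        url += "/tree/main"
    return url


def build_submission(username: str, space_url: str, answers: Dict[str, str]) -> dict:
    """Assemble the payload; field names are assumed, see the note above."""
    return {
        "username": username.strip(),
        "agent_code": normalize_space_url(space_url),
        "answers": [
            {"task_id": task_id, "submitted_answer": answer}
            for task_id, answer in answers.items()
        ],
    }


payload = build_submission(
    " alice ",
    "https://huggingface.co/spaces/alice/jarvis",
    {"task-1": "42"},
)
print(json.dumps(payload, indent=2))
```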

	## 🎓 Tips for Better Scores

	### Answer Formatting (Critical!)

	The GAIA benchmark uses exact string matching. Your answers must match the ground truth character-for-character.

	✅ DO:
	- Give just the number: `"42"`
	- Use exact spelling: `"John Smith"`
	- Write comma-separated lists with no spaces: `"apple,banana,cherry"`
	- Answer yes/no questions with just `"Yes"` or `"No"` (capitalized)
	- Follow the date format specified in the question

	❌ DON'T:
	- Include prefixes like "FINAL ANSWER:" or "The answer is:"
	- Add explanations or context
	- Use different capitalization or spelling
	- Add spaces in comma-separated lists
	- Include units unless specifically requested
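
A small post-processing step can enforce several of these rules before submission. The following is a minimal sketch, assuming a hypothetical helper named `clean_answer`; it only handles the two prefixes named above, surrounding quotes, and spaces around commas, and it cannot fix a wrong spelling or a wrong date format.

```python
import re

# Matches the prefixes named in the DON'T list, with an optional colon.
_PREFIX = re.compile(r"^\s*(?:final answer|the answer is)\s*:?\s*", re.IGNORECASE)


def clean_answer(raw: str) -> str:
    """Strip common wrappers so only the bare answer remains."""
    text = _PREFIX.sub("", raw.strip())
    text = text.strip().strip('"').strip()
    # Remove spaces around commas: "a, b , c" becomes "a,b,c".
    return re.sub(r"\s*,\s*", ",", text)


print(clean_answer("FINAL ANSWER: 42"))
print(clean_answer("The answer is: apple, banana , cherry"))
```

Because the comma rule removes every space next to a comma, check questions whose expected answer contains a comma inside a single value before relying on it.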

	### Agent Strategy

	1. File Priority: If a file is attached to the question, the agent reads it first, because the answer is often in the file
	2. Tool Selection: The model chooses which tool to call at each step
	3. Iteration Limit: The agent has up to 15 iterations to solve each question
	4. Error Handling: When a tool fails, the agent reads the error and tries an alternative approach
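
The "file first, then fall back" idea can be sketched as trying tools in priority order and recording failures along the way. This is an illustration only: in the real agent the model picks the tools, and `first_successful` plus the fake tools below are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Tool = Callable[[str], str]


def first_successful(query: str,
                     tools: List[Tuple[str, Tool]]) -> Tuple[Optional[str], List[str]]:
    """Try each tool in order; return the first non-empty result and the errors seen."""
    errors: List[str] = []
    for name, tool in tools:
        try:
            result = tool(query)
        except Exception as exc:  # a failing tool should not stop the run
            errors.append(f"{name}: {exc}")
            continue
        if result:
            return result, errors
        errors.append(f"{name}: empty result")
    return None, errors


def fake_read_file(query: str) -> str:
    raise FileNotFoundError("no file attached")


def fake_wikipedia_search(query: str) -> str:
    return "Ada Lovelace"


result, errors = first_successful(
    "Who wrote the first published algorithm?",
    [("read_file", fake_read_file), ("wikipedia_search", fake_wikipedia_search)],
)
print(result, errors)
```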

	### Best Practices

	1. Test First: Always test with a single question before running the full benchmark
	2. Review Answers: Check the validation status for each answer
	3. Verify Format: Ensure answers don't contain prefixes or explanations
	4. Public Space: Keep your Space public so the code link works for verification
	5. API Key: Ensure your OpenAI API key has sufficient credits

	## ⚙️ Configuration

	### Modifying the Agent

	The agent can be customized in `agent_enhanced.py`:

	- Model: Change `model_name` in `GAIAAgent.__init__()` (default: "gpt-4o")
	- Temperature: Adjust `temperature` (default: 0, for the most repeatable output)
	- Max Iterations: Change `max_iterations` (default: 15)
	- System Prompt: Modify `SYSTEM_PROMPT` for different instructions
	- Tools: Add or remove tools from the `TOOLS` list
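
One way to keep these settings in a single place is a small settings object. The sketch below is an assumption about how you might organize them, not how `GAIAAgent` is written: `AgentSettings` is a hypothetical class, and only its default values come from the list above. The 0 to 2 temperature range reflects the OpenAI chat API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentSettings:
    """Hypothetical container mirroring the defaults listed above."""
    model_name: str = "gpt-4o"
    temperature: float = 0.0
    max_iterations: int = 15

    def __post_init__(self):
        # Fail early on values the agent could not use.
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0 and 2")
        if self.max_iterations < 1:
            raise ValueError("max_iterations must be at least 1")


defaults = AgentSettings()
patient = replace(defaults, max_iterations=25)  # copy with one field changed
print(patient)
```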

	### Environment Variables

	- `OPENAI_API_KEY`: Required - Your OpenAI API key
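
If you want the app to fail fast when the key is missing, a check along these lines can run at startup. `require_api_key` is a hypothetical helper, not part of `app.py`; it accepts an optional mapping so it can be tested without touching the real environment.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI key or raise with a hint on where to set it."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Add it under Space Settings > Secrets "
            "or export it in your shell before running app.py."
        )
    return key


print(require_api_key({"OPENAI_API_KEY": "sk-test"}))
```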

	## 🐛 Troubleshooting

	### Common Issues

	"Please provide your OpenAI API key"
	- Ensure `OPENAI_API_KEY` is set in Space Secrets (for HF Spaces) or environment variables (for local)

	"Failed to fetch questions from API"
	- Check your internet connection
	- Verify the API URL is accessible: `https://agents-course-unit4-scoring.hf.space`
	- The API may be temporarily unavailable - try again later
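
Since the API may be briefly unavailable, retrying with a short pause often helps. The sketch below assumes the questions are served from a `/questions` path on the API URL above (an assumption to confirm in the API documentation); `fetch_questions` and `http_get_json` are hypothetical helpers. The fetch function is injectable so the retry logic can be exercised offline.

```python
import json
import time
import urllib.request

API_URL = "https://agents-course-unit4-scoring.hf.space"


def http_get_json(url: str, timeout: float = 15.0):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def fetch_questions(get=http_get_json, retries: int = 3, delay: float = 2.0):
    """Try the request up to `retries` times, waiting longer after each failure."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return get(f"{API_URL}/questions")  # path is an assumption
        except OSError as exc:  # URLError and timeouts are OSError subclasses
            last_error = exc
            if attempt < retries:
                time.sleep(delay * attempt)
    raise RuntimeError(f"Failed to fetch questions from API: {last_error}")
```

Calling `fetch_questions()` with no arguments performs the real request; passing a fake `get` function lets you test the retry behavior without network access.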

	"Agent error: ..."
	- Check that your OpenAI API key is valid and has credits
	- Verify the model name is correct (e.g., "gpt-4o")
	- Review the error message for specific issues

	"Submission error: ..."
	- Ensure your Space URL is correct and public
	- Verify the URL ends with `/tree/main` (auto-added if missing)
	- Check that answers JSON is properly formatted
	- Ensure your HuggingFace username is correct

	Low Scores (< 30%)
	- Review answer formatting - exact matching is critical
	- Check that answers don't contain prefixes or explanations
	- Verify file reading is working (some questions require file analysis)
	- Consider increasing `max_iterations` for complex questions
	- Test with single questions to identify patterns

	### Getting Help

	- Check the [Course Materials](https://huggingface.co/learn/agents-course/en/unit4/hands-on)
	- Review the [API Documentation](https://agents-course-unit4-scoring.hf.space/docs)
	- Check the [Student Leaderboard](https://huggingface.co/spaces/agents-course/Students_leaderboard) for examples
	- Review the [GAIA Benchmark Paper](https://huggingface.co/papers/2311.12983)

	## 📁 Project Structure

	```
	certification/
	├── app.py             # Gradio interface and main entry point
	├── agent_enhanced.py  # LangGraph agent implementation
	├── requirements.txt   # Python dependencies
	├── README.md          # This file
	└── .gitignore         # Git ignore rules
	```

	## 🔗 Important Links

	- [GAIA Benchmark Leaderboard](https://huggingface.co/spaces/gaia-benchmark/leaderboard)
	- [Student Leaderboard](https://huggingface.co/spaces/agents-course/Students_leaderboard)
	- [Course Unit 4 - Hands-On](https://huggingface.co/learn/agents-course/en/unit4/hands-on)
	- [API Documentation](https://agents-course-unit4-scoring.hf.space/docs)
	- [GAIA Paper](https://huggingface.co/papers/2311.12983)
	- [HuggingFace Agents Course](https://huggingface.co/learn/agents-course)

	## 📊 Scoring

	- Target: 30%+ (6+ correct out of 20 questions)
	- Evaluation: Exact string matching
	- Questions: 20 Level 1 questions from GAIA validation set
	- Submission: Via the API endpoint `/submit`
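
The arithmetic behind the target is simple: 6 exact matches out of 20 questions is 30%. Actual scoring happens on the server; the sketch below only illustrates exact-match counting locally, using made-up task IDs and answers and a hypothetical `score_answers` helper.

```python
from typing import Dict, Tuple


def score_answers(submitted: Dict[str, str],
                  ground_truth: Dict[str, str]) -> Tuple[int, float]:
    """Count exact string matches and return (correct, percent)."""
    correct = sum(
        1 for task_id, truth in ground_truth.items()
        if submitted.get(task_id) == truth
    )
    return correct, 100.0 * correct / len(ground_truth)


# Made-up data: 20 tasks, of which the first 6 are answered correctly.
truth = {f"task-{i}": str(i) for i in range(20)}
answers = {f"task-{i}": (str(i) if i < 6 else "wrong") for i in range(20)}

correct, percent = score_answers(answers, truth)
print(correct, percent, percent >= 30.0)
```

Because the comparison is an exact string match, an answer such as `"yes"` does not match a ground truth of `"Yes"`.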

	## 🏆 Certification

	Once you achieve 30% or higher:
	1. Your score will appear on the Student Leaderboard
	2. You'll earn the Certificate of Completion
	3. Share your achievement!

	## 📝 License

	MIT License

	## 🙏 Acknowledgments

	- Built for the [HuggingFace Agents Course](https://huggingface.co/learn/agents-course)
	- Uses [LangGraph](https://langchain-ai.github.io/langgraph/) for agent orchestration
	- Based on the [GAIA Benchmark](https://huggingface.co/papers/2311.12983)