
🤖 GAIA Benchmark Agent (LangGraph)

This is a LangGraph-powered agent for the HuggingFace Agents Course Final Assignment. It is designed to solve GAIA benchmark questions and score 30% or higher on Level 1 to earn the course certificate.

🎯 Goal

Score 30% or higher (6+ correct out of 20 questions) on the GAIA Level 1 benchmark to earn your Certificate of Completion.

Architecture

┌──────────────────────────────────────────────────────────┐
│                    LangGraph Workflow                    │
├──────────────────────────────────────────────────────────┤
│                                                          │
│   ┌─────────┐     ┌─────────┐     ┌──────────────┐       │
│   │  START  │────▶│  Agent  │────▶│   Should     │       │
│   └─────────┘     │  Node   │     │  Continue?   │       │
│                   └─────────┘     └──┬────────┬──┘       │
│                        ▲         Yes │        │ No       │
│                        │             │        ▼          │
│                   ┌────┴────┐        │ ┌──────────────┐  │
│                   │  Tool   │◀───────┘ │   Extract    │  │
│                   │  Node   │          │   Answer     │  │
│                   └─────────┘          └──────┬───────┘  │
│                                               │          │
│                                               ▼          │
│                                          ┌─────────┐     │
│                                          │   END   │     │
│                                          └─────────┘     │
└──────────────────────────────────────────────────────────┘
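The control flow above can be sketched in plain Python. This is a minimal stand-in for the compiled LangGraph graph, not the actual `agent_enhanced.py` code; `llm` and `tools` are hypothetical stubs:

```python
def extract_answer(text):
    # Strip any "FINAL ANSWER:" prefix so only the bare answer remains.
    return text.split("FINAL ANSWER:")[-1].strip()

def run_agent(question, llm, tools, max_iterations=15):
    """Agent node proposes a tool call or a final answer; 'Should
    Continue?' routes back through the Tool node until no tool call
    remains or the iteration cap is hit."""
    messages = [("user", question)]
    for _ in range(max_iterations):
        action = llm(messages)               # Agent node
        if action.get("tool") is None:       # Should Continue? -> No
            return extract_answer(action["content"])
        tool_fn = tools[action["tool"]]      # Should Continue? -> Yes
        result = tool_fn(action["args"])     # Tool node
        messages.append(("tool", result))
    return extract_answer(messages[-1][1])   # fall back after the cap
```

The real agent wires the same loop as a `StateGraph` with a conditional edge between the agent node and the tool node.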

Available Tools

| Tool | Description | Use Case |
|------|-------------|----------|
| 🔍 `web_search` | DuckDuckGo web search | Current information, recent events, facts |
| 📚 `wikipedia_search` | Wikipedia API | Historical facts, biographies, definitions |
| 🐍 `python_executor` | Python REPL | Calculations, data processing, analysis |
| 📄 `read_file` | File reader | PDFs, text files, Excel spreadsheets |
| 🔒 `calculator` | Math evaluator | Quick mathematical calculations |
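The calculator tool, for instance, might be implemented as a restricted arithmetic evaluator along these lines (a sketch, not the shipped implementation):

```python
import ast
import operator

# Whitelist of arithmetic operators; anything outside it is rejected,
# so arbitrary Python can't be executed through this tool.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculator(expression: str) -> float:
    """Safely evaluate a basic arithmetic expression."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return _eval(ast.parse(expression, mode="eval").body)
```

Walking the parsed AST instead of calling `eval()` keeps the tool safe even when the LLM produces unexpected input.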

🚀 Setup

Option 1: HuggingFace Spaces (Recommended for Certification)

  1. Fork/Duplicate this Space to your HuggingFace account

    • Go to the Space and click "Duplicate this Space"
    • Choose a name and make it Public (required for certification)
  2. Add API Key

    • Go to Space Settings > Secrets
    • Add a new secret: OPENAI_API_KEY with your OpenAI API key value
    • Click "Save secrets"
  3. Deploy

    • The Space will automatically build and deploy
    • Wait for the build to complete (usually 2-5 minutes)
  4. Test and Submit

    • Open the Space and test with a single question
    • Run the full benchmark
    • Submit to the leaderboard

Option 2: Local Development

# Clone the repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
cd YOUR_SPACE

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set environment variable
export OPENAI_API_KEY="sk-..."  # On Windows: set OPENAI_API_KEY=sk-...

# Run the app
python app.py

The app will be available at http://localhost:7860
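Before launching locally, it can help to confirm the key is actually visible to the process. A tiny sanity-check helper (hypothetical, not part of app.py):

```python
import os

def check_api_key(env=os.environ) -> bool:
    """Return True if OPENAI_API_KEY is set and non-empty in the
    given environment mapping (defaults to the real environment)."""
    return bool(env.get("OPENAI_API_KEY"))

if __name__ == "__main__":
    if not check_api_key():
        raise SystemExit("OPENAI_API_KEY is not set; export it before running app.py")
```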

📖 Usage

1. Test Single Question

  • Click "Fetch & Solve Random Question" to test the agent on one question
  • Review the answer and validation status
  • This helps verify the agent is working correctly before running the full benchmark

2. Run Full Benchmark

  • Click "Run Agent on All Questions"
  • The process takes approximately 10-15 minutes
  • Progress is shown in real-time
  • Results are displayed in a table
  • Answers are automatically formatted for submission

3. Submit to Leaderboard

  • After running the benchmark, go to the "Submit to Leaderboard" tab
  • Enter your HuggingFace username
  • Enter your Space URL (must be public and end with /tree/main)
  • Answers JSON is auto-filled
  • Click "Submit to Leaderboard"
  • View your score and ranking
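The `/tree/main` requirement can be handled with a small helper like this (illustrative; the actual app may normalize the URL differently):

```python
def normalize_space_url(url: str) -> str:
    """Ensure a Space URL ends with /tree/main, as the leaderboard expects."""
    url = url.rstrip("/")
    if not url.endswith("/tree/main"):
        url += "/tree/main"
    return url
```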

🎓 Tips for Better Scores

Answer Formatting (Critical!)

The GAIA benchmark uses exact string matching. Your answers must match the ground truth character-for-character.

✅ DO:

  • Give just the number: "42"
  • Use exact spelling: "John Smith"
  • Comma-separated lists with NO spaces: "apple,banana,cherry"
  • Just "Yes" or "No" (capitalized)
  • Follow the date format specified in the question

❌ DON'T:

  • Include prefixes like "FINAL ANSWER:" or "The answer is:"
  • Add explanations or context
  • Use different capitalization or spelling
  • Add spaces in comma-separated lists
  • Include units unless specifically requested
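These rules can also be enforced programmatically before submission. A sketch of such a post-processing step (a hypothetical helper, not part of the shipped code):

```python
import re

def clean_answer(raw: str) -> str:
    """Strip common prefixes and normalize list spacing so the answer
    survives GAIA's exact string matching."""
    answer = raw.strip()
    # Drop "FINAL ANSWER:" / "The answer is:" style prefixes.
    answer = re.sub(r"^(final answer|the answer is)\s*:\s*", "", answer,
                    flags=re.IGNORECASE)
    # Remove spaces after commas in comma-separated lists.
    answer = re.sub(r",\s+", ",", answer)
    return answer.strip()
```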

Agent Strategy

  1. File Priority: If a file is available, the agent reads it first - answers are often in the file
  2. Tool Selection: The agent automatically chooses the best tool for each task
  3. Iteration Limit: The agent has up to 15 iterations to solve each question
  4. Error Handling: The agent gracefully handles errors and tries alternative approaches

Best Practices

  1. Test First: Always test with a single question before running the full benchmark
  2. Review Answers: Check the validation status for each answer
  3. Verify Format: Ensure answers don't contain prefixes or explanations
  4. Public Space: Keep your Space public so the code link works for verification
  5. API Key: Ensure your OpenAI API key has sufficient credits

βš™οΈ Configuration

Modifying the Agent

The agent can be customized in agent_enhanced.py:

  • Model: Change model_name in GAIAAgent.__init__() (default: "gpt-4o")
  • Temperature: Adjust temperature (default: 0 for deterministic)
  • Max Iterations: Change max_iterations (default: 15)
  • System Prompt: Modify SYSTEM_PROMPT for different instructions
  • Tools: Add or remove tools from the TOOLS list
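The defaults above could be captured in a single config object, for example (illustrative only; the real `GAIAAgent.__init__()` signature may differ):

```python
from dataclasses import dataclass

@dataclass
class AgentConfig:
    """Defaults mirroring the documented knobs in agent_enhanced.py."""
    model_name: str = "gpt-4o"   # OpenAI model to use
    temperature: float = 0.0     # 0 for deterministic output
    max_iterations: int = 15     # per-question tool-call budget
```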

Environment Variables

  • OPENAI_API_KEY: Required - Your OpenAI API key

πŸ› Troubleshooting

Common Issues

"Please provide your OpenAI API key"

  • Ensure OPENAI_API_KEY is set in Space Secrets (for HF Spaces) or environment variables (for local)

"Failed to fetch questions from API"

  • Check your internet connection
  • Verify the API URL is accessible: https://agents-course-unit4-scoring.hf.space
  • The API may be temporarily unavailable - try again later

"Agent error: ..."

  • Check that your OpenAI API key is valid and has credits
  • Verify the model name is correct (e.g., "gpt-4o")
  • Review the error message for specific issues

"Submission error: ..."

  • Ensure your Space URL is correct and public
  • Verify the URL ends with /tree/main (auto-added if missing)
  • Check that answers JSON is properly formatted
  • Ensure your HuggingFace username is correct

Low Scores (< 30%)

  • Review answer formatting - exact matching is critical
  • Check that answers don't contain prefixes or explanations
  • Verify file reading is working (some questions require file analysis)
  • Consider increasing max_iterations for complex questions
  • Test with single questions to identify patterns

Getting Help

πŸ“ Project Structure

certification/
├── app.py                 # Gradio interface and main entry point
├── agent_enhanced.py      # LangGraph agent implementation
├── requirements.txt       # Python dependencies
├── README.md              # This file
└── .gitignore             # Git ignore rules

🔗 Important Links

📊 Scoring

  • Target: 30%+ (6+ correct out of 20 questions)
  • Evaluation: Exact string matching
  • Questions: 20 Level 1 questions from GAIA validation set
  • Submission: Via the API endpoint /submit
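Exact matching and the 30% threshold boil down to a simple check (a sketch; the actual scoring happens server-side):

```python
def score(predictions: dict, ground_truth: dict) -> float:
    """Fraction of exact string matches across all questions."""
    correct = sum(predictions.get(task_id, "") == answer
                  for task_id, answer in ground_truth.items())
    return correct / len(ground_truth)

def passed(predictions: dict, ground_truth: dict) -> bool:
    """30% of 20 questions means 6 or more exact matches."""
    return score(predictions, ground_truth) >= 0.30
```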

πŸ† Certification

Once you achieve 30% or higher:

  1. Your score will appear on the Student Leaderboard
  2. You'll earn the Certificate of Completion
  3. Share your achievement!

πŸ“ License

MIT License

πŸ™ Acknowledgments