Update README.md
README.md
CHANGED
@@ -1,5 +1,6 @@
-
-
+---
+title: Isadora Final Assignment
+emoji: 🕵🏻‍♂️
 colorFrom: indigo
 colorTo: indigo
 sdk: gradio
@@ -7,273 +8,6 @@ sdk_version: 5.25.2
 app_file: app.py
 pinned: false
 hf_oauth: true
+# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
 hf_oauth_expiration_minutes: 480
-
-
-
-# 🎯 GAIA Benchmark Agent - Course Final Project
-
-A comprehensive AI agent that demonstrates course learning while achieving 30%+ score on GAIA benchmark to earn your course certificate.
-
-## What This Agent Demonstrates
-
-This project combines all major concepts from the course:
-
-### **Course Learning Applied**
-- **🔧 Tools Integration**: Multiple tool types working together
-- **RAG Implementation**: Persona database with 5K diverse individuals using vector embeddings
-- **🤖 Agent Workflows**: LlamaIndex agent orchestration
-- **LLM Integration**: Fallback options for accessibility
-- **Modular Architecture**: Clean separation of concerns
-
-### 🎯 **GAIA Benchmark Optimized**
-- **Web Search**: For current information and facts
-- **🧮 Calculator**: For mathematical accuracy (critical for GAIA)
-- **File Analysis**: For data processing questions
-- **💬 Conversational**: Natural language interaction
-
-## Project Structure
-
-```
-your-space/
-├── app.py # Main application with Gradio interface
-├── tools.py # All agent tools (web search, calculator, etc.)
-├── retriever.py # RAG implementation with guest database
-├── requirements.txt # Python dependencies
-└── README.md # This file
-```
-
-### **File Explanations**
-
-**`app.py`** - Main Application
-- Gradio interface for GAIA evaluation
-- Agent initialization with error handling
-- Question processing and answer submission
-- Results display and certificate status
-
-**`tools.py`** - Agent Tools
-- **Web Search Tool**: DuckDuckGo integration for current info
-- **Calculator Tool**: Safe mathematical expression evaluation
-- **File Analysis Tool**: Process CSV, text, and data files
-- All tools have detailed documentation and error handling
-
-**`retriever.py`** - Advanced RAG System
-- Persona database with 5K diverse individuals from HuggingFace
-- Vector embeddings with ChromaDB for semantic search
-- IngestionPipeline for document processing
-- Demonstrates state-of-the-art RAG concepts
-
-## Quick Setup Guide
-
-### 1. **Clone or Duplicate This Space**
-```bash
-# If cloning locally
-git clone https://huggingface.co/spaces/your-username/your-space
-cd your-space
-
-# Or duplicate this space to your HF account
-```
-
-### 2. **Set API Keys** ⚡ **CRITICAL STEP**
-
-In your HuggingFace Space:
-1. Go to **Settings** → **Repository secrets**
-2. Add **at least one** of these:
-
-**Option A: OpenAI (Recommended)**
-- Name: `OPENAI_API_KEY`
-- Value: `sk-...` (your OpenAI API key)
-- **Why**: Better performance on GAIA benchmark
-
-**Option B: HuggingFace (Free Alternative)**
-- Name: `HF_TOKEN`
-- Value: `hf_...` (your HF token)
-- **Why**: Free alternative, works without OpenAI credits
-
-**Get API Keys:**
-- **OpenAI**: https://platform.openai.com/api-keys
-- **HuggingFace**: https://huggingface.co/settings/tokens
-
-### 3. **Ensure Public Space**
-- Your space must be **public** for leaderboard verification
-- Go to Settings → Change from Private to Public
-
-### 4. **Run Evaluation**
-1. Click the HuggingFace login button
-2. Click "Run GAIA Evaluation & Submit Results"
-3. Wait 5-10 minutes for completion
-4. Check your score - need 30%+ to pass!
-
-## 🔧 Why Each Tool Matters for GAIA
-
-### **Web Search Tool**
-```python
-# Example GAIA questions this helps with:
-"Who is the current president of France?"
-"What was Tesla's stock price yesterday?"
-"Recent developments in AI research"
-```
-**Why needed**: GAIA questions often require current information beyond LLM training data.
-
-### 🧮 **Calculator Tool**
-```python
-# Example GAIA questions this helps with:
-"What is 15% of 847?"
-"Calculate the area of a circle with radius 23.7m"
-"If I invest $5000 at 3.2% annual interest for 7 years..."
-```
-**Why needed**: LLMs can make arithmetic errors. GAIA requires exact numerical accuracy.
-
-### **File Analysis Tool**
-```python
-# Example GAIA questions this helps with:
-"Analyze this CSV file and tell me the average..."
-"What is the most common value in column 3?"
-"Process this data file and extract..."
-```
-**Why needed**: Some GAIA questions include file attachments requiring analysis.
-
-### **Persona RAG Tool**
-```python
-# Example questions this demonstrates:
-"Find writers and authors"
-"Who are the scientists?"
-"People interested in travel"
-"Creative professionals at the event"
-```
-**Why included**: Demonstrates advanced RAG with 5K real personas, vector embeddings, and semantic search.
-
-## Course Concepts Demonstrated
-
-### **Components** (From Course Unit 2)
-- **LLM Integration**: OpenAI + HuggingFace fallback
-- **Document Processing**: Text chunking and metadata
-- **Response Synthesis**: Clean answer formatting
-
-### 🛠️ **Tools** (From Course Unit 3)
-- **FunctionTool Creation**: Multiple tool types
-- **Tool Descriptions**: Proper LLM guidance
-- **Error Handling**: Graceful tool failures
-
-### 🤖 **Agents** (From Course Unit 4)
-- **AgentWorkflow**: Multi-tool orchestration
-- **System Prompts**: GAIA-optimized instructions
-- **Async Processing**: Efficient question handling
-
-### **RAG Implementation** (From Course Unit 5)
-- **Dataset Integration**: 5K personas from HuggingFace
-- **Vector Embeddings**: Semantic search with BAAI/bge-small-en-v1.5
-- **ChromaDB Storage**: Persistent vector database
-- **Ingestion Pipeline**: Document processing and chunking
-
-### **Workflows** (From Course Unit 6)
-- **Event-Driven**: Tool selection and execution
-- **State Management**: Context preservation
-- **Error Recovery**: Robust failure handling
-
-## Why This Approach Works for GAIA
-
-### ✅ **Accuracy First**
-- Calculator prevents math errors
-- Web search provides current facts
-- Low temperature LLM settings for consistency
-
-### ✅ **Comprehensive Coverage**
-- Factual questions → Web search
-- Mathematical questions → Calculator
-- Data questions → File analysis
-- Knowledge questions → RAG system
-
-### ✅ **Robust Error Handling**
-- Graceful API failures
-- Tool availability checking
-- Fallback responses
-
-### ✅ **GAIA-Specific Optimizations**
-- Direct, concise answers
-- Exact match optimization
-- Minimal extra text
-
-## 🔧 Troubleshooting
-
-### ❌ **"No LLM available" Error**
-**Problem**: No API keys set
-**Solution**: Add `OPENAI_API_KEY` or `HF_TOKEN` to Space secrets
-
-### ❌ **Import Errors**
-**Problem**: Dependencies not installed
-**Solution**: Check requirements.txt is in root directory, restart Space
-
-### ❌ **Low GAIA Score**
-**Problem**: Agent giving wrong answers
-**Solutions**:
-- Check API key is working (OpenAI generally performs better)
-- Review agent logs for tool usage
-- Ensure web search and calculator are working
-
-### ❌ **"Could not submit" Error**
-**Problem**: Network or authentication issue
-**Solution**:
-- Ensure logged in to HuggingFace
-- Check space is public
-- Try again (temporary network issues)
-
-### ❌ **Tools Not Working**
-**Problem**: Missing dependencies or API issues
-**Solution**: Check Space logs, verify all packages installed
-
-## Expected Performance
-
-### 🎯 **Target Scores**
-- **Minimum for Certificate**: 30%
-- **Good Performance**: 40-50%
-- **Excellent Performance**: 60%+
-
-### **Performance Factors**
-- **API Choice**: OpenAI typically scores higher than HuggingFace
-- **Tool Usage**: Questions requiring tools score better when tools work
-- **Answer Format**: Direct answers score better than verbose responses
-
-## Getting Better Scores
-
-### 💡 **Optimization Tips**
-1. **Use OpenAI**: Generally more accurate than HuggingFace for GAIA
-2. **Check Tool Functionality**: Test web search and calculator work
-3. **Review Failed Questions**: Look at specific errors in results table
-4. **Adjust System Prompt**: Fine-tune for your specific weak areas
-
-### **Iterative Improvement**
-1. Run evaluation and check results
-2. Identify patterns in failed questions
-3. Adjust tools or prompts accordingly
-4. Re-run evaluation
-
-## Certificate Achievement
-
-**To earn your course certificate:**
-1. ✅ Score 30% or higher on GAIA evaluation
-2. ✅ Keep your space public for verification
-3. ✅ Submit through the official interface
-
-**When you pass:**
-- You'll see "✅ PASSED - Certificate Earned!" in results
-- Your score will appear on the student leaderboard
-- You can download your official certificate
-
-## Getting Help
-
-**If you're stuck:**
-1. Check the troubleshooting section above
-2. Review Space logs for specific errors
-3. Test individual components (tools.py, retriever.py)
-4. Ask in the course Discord for community help
-
-## Good Luck!
-
-This agent represents everything you've learned in the course. The modular design makes it easy to understand, debug, and improve. Focus on getting those API keys set up correctly, and you'll be well on your way to earning your certificate!
-
-**Remember**: The goal isn't just to pass the benchmark, but to demonstrate your understanding of modern AI agent development. This codebase serves as a portfolio piece showing your skills in RAG, tool integration, and agent orchestration.
-
----
-
-*Built with ❤️ using LlamaIndex and course concepts*
+---