# 🚀 GAIA Agent Production Deployment Guide
## System Architecture: Qwen Models + LangGraph Workflow
### **🎯 Updated System Requirements**
**GAIA Agent now uses ONLY:**
- ✅ **Qwen 2.5 Models**: 7B/32B/72B via HuggingFace Inference API
- ✅ **LangGraph Workflow**: Multi-agent orchestration with synthesis
- ✅ **Specialized Agents**: Router, web research, file processing, reasoning
- ✅ **Professional Tools**: Wikipedia, web search, calculator, file processor
- ❌ **No Fallbacks**: Requires proper authentication - no simplified responses
### **🚨 Authentication Requirements - CRITICAL**
**The system now REQUIRES proper authentication:**
```python
# REQUIRED: HuggingFace token with inference permissions
HF_TOKEN=hf_your_token_here
# The system will FAIL without proper authentication
# No SimpleClient fallback available
```
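The fail-fast policy can be made explicit with a startup check along these lines (a sketch; `load_hf_token` is a hypothetical helper, not part of the shipped code):

```python
import os

def load_hf_token() -> str:
    """Return the HuggingFace token or fail loudly -- no fallback mode.

    Hypothetical helper illustrating the fail-fast requirement above.
    """
    token = os.getenv("HF_TOKEN", "").strip()
    if not token:
        raise ValueError(
            "HF_TOKEN is not set. A HuggingFace token with inference "
            "permissions is required; there is no SimpleClient fallback."
        )
    if not token.startswith("hf_"):
        # HuggingFace tokens conventionally start with the 'hf_' prefix
        raise ValueError("HF_TOKEN does not look like a HuggingFace token.")
    return token
```

Calling this once at startup surfaces a misconfigured Space immediately, instead of failing later inside the workflow.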
### **🎯 Expected Results**
With proper authentication and Qwen model access:
- ✅ **GAIA Benchmark Score**: 30%+ (full LangGraph workflow with Qwen models)
- ✅ **Multi-Agent Processing**: Router → Specialized Agents → Tools → Synthesis
- ✅ **Intelligent Model Selection**: 7B (fast) → 32B (balanced) → 72B (complex)
- ✅ **Professional Tools**: Wikipedia API, DuckDuckGo search, calculator, file processor
- ✅ **Detailed Analysis**: Processing details, confidence scores, cost tracking
**Without proper authentication:**
- ❌ **System Initialization Fails**: No fallback options available
- ❌ **Clear Error Messages**: Guide users to proper authentication setup
## 🔧 Technical Implementation
### OAuth Authentication (Production)
```python
class GAIAAgentApp:
    def __init__(self, hf_token: Optional[str] = None):
        if not hf_token:
            raise ValueError("HuggingFace token with inference permissions is required")
        # Initialize QwenClient with token
        self.llm_client = QwenClient(hf_token=hf_token)
        # Initialize LangGraph workflow with tools
        self.workflow = SimpleGAIAWorkflow(self.llm_client)

# OAuth token extraction in production
def run_and_submit_all(profile: gr.OAuthProfile | None):
    oauth_token = getattr(profile, 'oauth_token', None)
    agent = GAIAAgentApp.create_with_oauth_token(oauth_token)
```
### Qwen Model Configuration
```python
# QwenClient now uses ONLY Qwen models
self.models = {
    ModelTier.ROUTER: ModelConfig(
        name="Qwen/Qwen2.5-7B-Instruct",   # Fast classification
        cost_per_token=0.0003,
    ),
    ModelTier.MAIN: ModelConfig(
        name="Qwen/Qwen2.5-32B-Instruct",  # Balanced performance
        cost_per_token=0.0008,
    ),
    ModelTier.COMPLEX: ModelConfig(
        name="Qwen/Qwen2.5-72B-Instruct",  # Best performance
        cost_per_token=0.0015,
    ),
}
```
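The three tiers suggest a simple routing heuristic. The sketch below is illustrative only (the real router delegates classification to the 7B model); the thresholds and the `needs_multi_step` signal are assumptions:

```python
from enum import Enum, auto

class ModelTier(Enum):
    ROUTER = auto()   # Qwen2.5-7B  -- fast classification
    MAIN = auto()     # Qwen2.5-32B -- balanced performance
    COMPLEX = auto()  # Qwen2.5-72B -- best quality

def select_tier(question: str, needs_multi_step: bool = False) -> ModelTier:
    """Hypothetical heuristic: short lookups -> 7B, multi-step -> 72B."""
    if needs_multi_step:
        return ModelTier.COMPLEX
    if len(question.split()) <= 15:
        return ModelTier.ROUTER
    return ModelTier.MAIN
```

Routing cheap questions to the 7B tier is what keeps the per-question cost low while reserving the 72B tier for genuinely hard reasoning.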
### Error Handling
```python
# Clear error messages guide users to proper authentication
if not oauth_token:
    return "Authentication Required: Valid token with inference permissions needed for Qwen model access."
try:
    agent = GAIAAgentApp.create_with_oauth_token(oauth_token)
except ValueError as exc:
    return f"Authentication Error: {exc}"
except RuntimeError as exc:
    return f"System Error: {exc}"
```
## 🎯 Deployment Steps
### 1. Pre-Deployment Checklist
- [ ] **Code Ready**: All Qwen-only changes committed
- [ ] **Dependencies**: `requirements.txt` updated with all packages
- [ ] **Testing**: QwenClient initialization test passes locally
- [ ] **Environment**: No hardcoded tokens in code
- [ ] **Authentication**: HF_TOKEN available with inference permissions
### 2. HuggingFace Space Configuration
Create a new HuggingFace Space with these settings:
```yaml
# Space Configuration
title: "GAIA Agent System"
emoji: "🤖"
colorFrom: "blue"
colorTo: "green"
sdk: gradio
sdk_version: "4.44.0"
app_file: "src/app.py"
pinned: false
license: "mit"
suggested_hardware: "cpu-basic"
suggested_storage: "small"
```
### 3. Required Files Structure
```
/
├── src/
│   ├── app.py               # Main application (Qwen + LangGraph)
│   ├── models/
│   │   └── qwen_client.py   # Qwen-only client
│   ├── agents/              # All agent files
│   ├── tools/               # All tool files
│   ├── workflow/            # LangGraph workflow
│   └── requirements.txt     # All dependencies
├── README.md                # Space documentation
└── .gitignore               # Exclude sensitive files
```
### 4. Environment Variables (Space Secrets)
**🎯 CRITICAL: Set HF_TOKEN for Qwen Model Access**
To get **real GAIA Agent performance** with Qwen models and LangGraph workflow:
```bash
# REQUIRED for Qwen model access and LangGraph functionality
HF_TOKEN=hf_your_token_here # REQUIRED: Your HuggingFace token
```
**How to set HF_TOKEN:**
1. Go to your Space settings in HuggingFace
2. Navigate to "Repository secrets"
3. Add new secret:
- **Name**: `HF_TOKEN`
- **Value**: Your HuggingFace token (from [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens))
⚠️ **IMPORTANT**: Do NOT set `HF_TOKEN` as a regular environment variable - use Space secrets for security.
**Token Requirements:**
- Token must have **`read`** and **`inference`** scopes
- Generate token at: https://huggingface.co/settings/tokens
- Select "Fine-grained" token type
- Enable both scopes for Qwen model functionality
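The scope requirement can be expressed as a small helper (a sketch; the scope names follow the fine-grained token UI, and the function itself is hypothetical):

```python
# Scopes the Qwen + LangGraph setup requires, per the list above
REQUIRED_SCOPES = {"read", "inference"}

def missing_scopes(granted: set) -> set:
    """Return the required scopes a token lacks (empty set means OK)."""
    return REQUIRED_SCOPES - set(granted)
```

A non-empty result is exactly the case where the system should refuse to start and point the user back to the token settings page.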
**Optional environment variables:**
```bash
# Optional: LangSmith tracing (if you want observability)
LANGCHAIN_TRACING_V2=true # Optional: LangSmith tracing
LANGCHAIN_API_KEY=your_key_here # Optional: LangSmith API key
LANGCHAIN_PROJECT=gaia-agent # Optional: LangSmith project
```
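Reading these optional variables with sensible defaults might look like the following sketch (`langsmith_config` is a hypothetical helper; only the variable names come from the list above):

```python
import os

def langsmith_config() -> dict:
    """Read the optional LangSmith settings, with tracing off by default."""
    return {
        "tracing": os.getenv("LANGCHAIN_TRACING_V2", "false").lower() == "true",
        "api_key": os.getenv("LANGCHAIN_API_KEY"),        # None if unset
        "project": os.getenv("LANGCHAIN_PROJECT", "gaia-agent"),
    }
```

Because every key has a default, a Space with none of these secrets set still starts cleanly with tracing disabled.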
### 5. Authentication Flow in Production
**Production OAuth flow:**
1. User clicks "Login with HuggingFace" button
2. OAuth flow provides profile with token
3. System validates OAuth token for Qwen model access
4. If sufficient scopes: initialize QwenClient with LangGraph workflow
5. If insufficient scopes: show clear error message with guidance
6. System either works fully or fails clearly - no degraded modes
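The token-resolution step of this flow could be sketched as follows (`resolve_token` is hypothetical; the `oauth_token` attribute mirrors the `getattr` call shown earlier):

```python
import os
from typing import Optional

def resolve_token(profile: Optional[object]) -> str:
    """Prefer the OAuth token from the Gradio profile, else HF_TOKEN.

    Raises ValueError when neither is available -- no degraded mode.
    """
    oauth_token = getattr(profile, "oauth_token", None) if profile else None
    token = oauth_token or os.getenv("HF_TOKEN")
    if not token:
        raise ValueError(
            "Authentication Required: log in with HuggingFace or set HF_TOKEN."
        )
    return token
```

The `or` chain encodes the precedence: a logged-in user's OAuth token wins, the Space secret is the fallback, and anything else is a hard failure.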
#### OAuth Requirements ⚠️
**CRITICAL**: Gradio OAuth tokens often have **limited scopes** by default:
- ✅ **"read" scope**: Can access user profile, model info
- ❌ **"inference" scope**: Often missing - REQUIRED for Qwen models
- ❌ **"write" scope**: Not needed for this application
**System Behavior**:
- **Full-scope token**: Uses Qwen models with LangGraph → 30%+ GAIA performance
- **Limited-scope token**: Clear error message → User guided to proper authentication
- **No token**: Clear error message → User guided to login
**Clear Error Handling**:
```python
# No more fallback confusion - clear requirements
if test_response.status_code == 401:
return "Authentication Error: Your OAuth token lacks inference permissions. Please logout and login again with full access."
```
### 6. Deployment Process
1. **Create Space**:
```bash
# Visit https://huggingface.co/new-space
# Choose Gradio SDK
# Upload all files from src/ directory
```
2. **Upload Files**:
- Copy entire `src/` directory to Space
- Ensure `app.py` is the main entry point
- Include all dependencies in `requirements.txt`
3. **Test Authentication**:
- Space automatically enables OAuth for Gradio apps
- Test login/logout functionality
- Verify Qwen model access works
- Test GAIA evaluation with LangGraph workflow
### 7. Verification Steps
After deployment, verify these work:
- [ ] **Interface Loads**: Gradio interface appears correctly
- [ ] **OAuth Login**: Login button works and shows user profile
- [ ] **Authentication Check**: Clear error messages when insufficient permissions
- [ ] **Qwen Model Access**: Models initialize and respond correctly
- [ ] **LangGraph Workflow**: Multi-agent system processes questions
- [ ] **Manual Testing**: Individual questions work with full workflow
- [ ] **GAIA Evaluation**: Full evaluation runs and submits to Unit 4 API
- [ ] **Results Display**: Scores and detailed results show correctly
### 8. Troubleshooting
#### Common Issues
**Issue**: "HuggingFace token with inference permissions is required"
**Solution**: Set HF_TOKEN in Space secrets or login with full OAuth permissions
**Issue**: "Failed to initialize any Qwen models"
**Solution**: Verify HF_TOKEN has inference scope and Qwen model access
**Issue**: "Authentication Error: Your OAuth token lacks inference permissions"
**Solution**: Logout and login again, or set HF_TOKEN as Space secret
#### Debug Commands
```python
# In Space, add debug logging to check authentication:
logger.info(f"HF_TOKEN available: {os.getenv('HF_TOKEN') is not None}")
logger.info(f"OAuth token available: {oauth_token is not None}")
logger.info(f"Qwen models initialized: {client.get_model_status()}")
```
### 9. Performance Optimization
For production efficiency with Qwen models:
**Intelligent model selection strategy:**
- Simple questions: Qwen 2.5-7B (fast, cost-effective)
- Medium complexity: Qwen 2.5-32B (balanced performance)
- Complex reasoning: Qwen 2.5-72B (best quality)
- Budget management: auto-downgrade when budget exceeded
- LangGraph workflow: optimal agent routing and synthesis
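The auto-downgrade rule can be sketched like this (the per-token prices mirror the `ModelConfig` values earlier; the function and its token estimate are illustrative assumptions):

```python
# Cost per token by tier, mirroring the ModelConfig table above
TIER_COST = {"7B": 0.0003, "32B": 0.0008, "72B": 0.0015}
TIER_ORDER = ["72B", "32B", "7B"]  # best quality -> cheapest

def downgrade_for_budget(tier: str, spent: float, budget: float,
                         est_tokens: int = 1000) -> str:
    """Step down to the best tier whose estimated cost still fits the budget."""
    start = TIER_ORDER.index(tier)
    for candidate in TIER_ORDER[start:]:
        if spent + TIER_COST[candidate] * est_tokens <= budget:
            return candidate
    return "7B"  # cheapest tier as a last resort
```

For example, with $1.00 already spent of a $2.00 budget, an estimated 1k-token 72B call (~$1.50) would be downgraded to 32B (~$0.80), which still fits.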
### 10. Monitoring and Maintenance
**Key Metrics to Monitor**:
- GAIA benchmark success rate (target: 30%+)
- Average response time per question
- Cost per question processed
- LangGraph workflow success rate
- Qwen model availability and performance
**Regular Maintenance**:
- Monitor HuggingFace Inference API status
- Update dependencies for security
- Review and optimize LangGraph workflow performance
- Check Unit 4 API compatibility
- Monitor Qwen model performance and costs
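Even a minimal in-process tracker covers the key metrics listed above (a sketch; a production Space would more likely ship these to LangSmith or a dashboard):

```python
from dataclasses import dataclass

@dataclass
class RunMetrics:
    """Accumulates success rate, cost, and latency per question."""
    questions: int = 0
    correct: int = 0
    total_cost: float = 0.0
    total_seconds: float = 0.0

    def record(self, correct: bool, cost: float, seconds: float) -> None:
        self.questions += 1
        self.correct += int(correct)
        self.total_cost += cost
        self.total_seconds += seconds

    def summary(self) -> dict:
        n = max(self.questions, 1)  # avoid division by zero before any runs
        return {
            "success_rate": self.correct / n,   # target: 0.30+
            "avg_cost": self.total_cost / n,
            "avg_seconds": self.total_seconds / n,
        }
```

Logging `summary()` after each evaluation run gives a running view of whether the 30%+ target and the cost budget are holding.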
## 🎯 Success Metrics
### Expected Production Results 🚀
With proper deployment and authentication:
- **GAIA Benchmark**: 30%+ success rate
- **LangGraph Workflow**: Multi-agent orchestration working
- **Qwen Model Performance**: Intelligent tier selection (7B→32B→72B)
- **User Experience**: Professional interface with clear authentication
- **System Reliability**: Clear success/failure modes (no degraded performance)
### Final Status:
- **Architecture**: Qwen 2.5 models + LangGraph multi-agent workflow
- **Requirements**: Clear authentication requirements (HF_TOKEN or OAuth with inference)
- **Performance**: 30%+ GAIA benchmark with full functionality
- **Reliability**: Robust error handling with clear user guidance
- **Deployment**: Ready for immediate HuggingFace Space deployment
**The GAIA Agent is now a focused, high-performance system using proper AI models and multi-agent orchestration!** 🚀