# 🚀 GAIA Agent Production Deployment Guide

## System Architecture: Qwen Models + LangGraph Workflow

### **🎯 Updated System Requirements**

**GAIA Agent now uses ONLY:**

- ✅ **Qwen 2.5 Models**: 7B/32B/72B via HuggingFace Inference API
- ✅ **LangGraph Workflow**: Multi-agent orchestration with synthesis
- ✅ **Specialized Agents**: Router, web research, file processing, reasoning
- ✅ **Professional Tools**: Wikipedia, web search, calculator, file processor
- ❌ **No Fallbacks**: Requires proper authentication - no simplified responses

### **🚨 Authentication Requirements - CRITICAL**

**The system now REQUIRES proper authentication:**

```bash
# REQUIRED: HuggingFace token with inference permissions
HF_TOKEN=hf_your_token_here

# The system will FAIL without proper authentication
# No SimpleClient fallback available
```

### **🎯 Expected Results**

With proper authentication and Qwen model access:

- **✅ GAIA Benchmark Score**: 30%+ (full LangGraph workflow with Qwen models)
- **✅ Multi-Agent Processing**: Router → Specialized Agents → Tools → Synthesis
- **✅ Intelligent Model Selection**: 7B (fast) → 32B (balanced) → 72B (complex)
- **✅ Professional Tools**: Wikipedia API, DuckDuckGo search, calculator, file processor
- **✅ Detailed Analysis**: Processing details, confidence scores, cost tracking

**Without proper authentication:**

- **❌ System Initialization Fails**: No fallback options available
- **❌ Clear Error Messages**: Guides users to proper authentication setup

## 🔧 Technical Implementation

### OAuth Authentication (Production)

```python
from typing import Optional

import gradio as gr

# QwenClient and SimpleGAIAWorkflow come from the project's models/ and workflow/ packages


class GAIAAgentApp:
    def __init__(self, hf_token: Optional[str] = None):
        if not hf_token:
            raise ValueError("HuggingFace token with inference permissions is required")

        # Initialize QwenClient with token
        self.llm_client = QwenClient(hf_token=hf_token)

        # Initialize LangGraph workflow with tools
        self.workflow = SimpleGAIAWorkflow(self.llm_client)


# OAuth token extraction in production
def run_and_submit_all(profile: gr.OAuthProfile | None):
    oauth_token = getattr(profile, 'oauth_token', None)
    agent = GAIAAgentApp.create_with_oauth_token(oauth_token)
```

### Qwen Model Configuration

```python
# Inside QwenClient: the client now uses ONLY Qwen models
self.models = {
    ModelTier.ROUTER: ModelConfig(
        name="Qwen/Qwen2.5-7B-Instruct",   # Fast classification
        cost_per_token=0.0003
    ),
    ModelTier.MAIN: ModelConfig(
        name="Qwen/Qwen2.5-32B-Instruct",  # Balanced performance
        cost_per_token=0.0008
    ),
    ModelTier.COMPLEX: ModelConfig(
        name="Qwen/Qwen2.5-72B-Instruct",  # Best performance
        cost_per_token=0.0015
    )
}
```

### Error Handling

```python
# Inside run_and_submit_all: clear error messages guide users to proper authentication
if not oauth_token:
    return "Authentication Required: Valid token with inference permissions needed for Qwen model access."

try:
    agent = GAIAAgentApp.create_with_oauth_token(oauth_token)
except ValueError as ve:
    # Raised when the token is missing or rejected
    return f"Authentication Error: {ve}"
except RuntimeError as exc:
    # Raised when the client or workflow fails to initialize
    return f"System Error: {exc}"
```

## 🎯 Deployment Steps

### 1. Pre-Deployment Checklist

- [ ] **Code Ready**: All Qwen-only changes committed
- [ ] **Dependencies**: `requirements.txt` updated with all packages
- [ ] **Testing**: QwenClient initialization test passes locally
- [ ] **Environment**: No hardcoded tokens in code
- [ ] **Authentication**: HF_TOKEN available with inference permissions

### 2. HuggingFace Space Configuration

Create a new HuggingFace Space with these settings:

```yaml
# Space Configuration
title: "GAIA Agent System"
emoji: "🤖"
colorFrom: "blue"
colorTo: "green"
sdk: gradio
sdk_version: "4.44.0"
app_file: "src/app.py"
pinned: false
license: "mit"
suggested_hardware: "cpu-basic"
suggested_storage: "small"
```

### 3. Required Files Structure

```
/
├── src/
│   ├── app.py              # Main application (Qwen + LangGraph)
│   ├── models/
│   │   └── qwen_client.py  # Qwen-only client
│   ├── agents/             # All agent files
│   ├── tools/              # All tool files
│   ├── workflow/           # LangGraph workflow
│   └── requirements.txt    # All dependencies
├── README.md               # Space documentation
└── .gitignore              # Exclude sensitive files
```

### 4. Environment Variables (Space Secrets)

**🎯 CRITICAL: Set HF_TOKEN for Qwen Model Access**

To get **real GAIA Agent performance** with Qwen models and the LangGraph workflow:

```bash
# REQUIRED for Qwen model access and LangGraph functionality
HF_TOKEN=hf_your_token_here  # REQUIRED: Your HuggingFace token
```

**How to set HF_TOKEN:**

1. Go to your Space settings in HuggingFace
2. Navigate to "Repository secrets"
3. Add new secret:
   - **Name**: `HF_TOKEN`
   - **Value**: Your HuggingFace token (from [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens))

⚠️ **IMPORTANT**: Do NOT set `HF_TOKEN` as a regular environment variable - use Space secrets for security.

**Token Requirements:**

- Token must have **`read`** and **`inference`** scopes
- Generate token at: https://huggingface.co/settings/tokens
- Select "Fine-grained" token type
- Enable both scopes for Qwen model functionality

**Optional environment variables:**

```bash
# Optional: LangSmith tracing (if you want observability)
LANGCHAIN_TRACING_V2=true        # Optional: LangSmith tracing
LANGCHAIN_API_KEY=your_key_here  # Optional: LangSmith API key
LANGCHAIN_PROJECT=gaia-agent     # Optional: LangSmith project
```

### 5. Authentication Flow in Production

```text
Production OAuth Flow:
1. User clicks "Login with HuggingFace" button
2. OAuth flow provides profile with token
3. System validates OAuth token for Qwen model access
4. If sufficient scopes: Initialize QwenClient with LangGraph workflow
5. If insufficient scopes: Show clear error message with guidance
6. System either works fully or fails clearly - no degraded modes
```

#### OAuth Requirements

⚠️ **CRITICAL**: Gradio OAuth tokens often have **limited scopes** by default:

- ✅ **"read" scope**: Can access user profile, model info
- ❌ **"inference" scope**: Often missing - REQUIRED for Qwen models
- ❌ **"write" scope**: Not needed for this application

**System Behavior**:

- **Full-scope token**: Uses Qwen models with LangGraph → 30%+ GAIA performance
- **Limited-scope token**: Clear error message → User guided to proper authentication
- **No token**: Clear error message → User guided to login

**Clear Error Handling**:

```python
# No more fallback confusion - clear requirements
if test_response.status_code == 401:
    return "Authentication Error: Your OAuth token lacks inference permissions. Please logout and login again with full access."
```

### 6. Deployment Process

1. **Create Space**:

   ```bash
   # Visit https://huggingface.co/new-space
   # Choose Gradio SDK
   # Upload all files from src/ directory
   ```

2. **Upload Files**:
   - Copy entire `src/` directory to Space
   - Ensure `app.py` is the main entry point
   - Include all dependencies in `requirements.txt`

3. **Test Authentication**:
   - Confirm OAuth is enabled for the Space (Gradio Spaces need `hf_oauth: true` in the README metadata; it is not turned on automatically)
   - Test login/logout functionality
   - Verify Qwen model access works
   - Test GAIA evaluation with LangGraph workflow

### 7. Verification Steps

After deployment, verify that each of these works:

- [ ] **Interface Loads**: Gradio interface appears correctly
- [ ] **OAuth Login**: Login button works and shows user profile
- [ ] **Authentication Check**: Clear error messages when permissions are insufficient
- [ ] **Qwen Model Access**: Models initialize and respond correctly
- [ ] **LangGraph Workflow**: Multi-agent system processes questions
- [ ] **Manual Testing**: Individual questions work with full workflow
- [ ] **GAIA Evaluation**: Full evaluation runs and submits to Unit 4 API
- [ ] **Results Display**: Scores and detailed results show correctly

### 8. Troubleshooting

#### Common Issues

**Issue**: "HuggingFace token with inference permissions is required"
**Solution**: Set HF_TOKEN in Space secrets or login with full OAuth permissions

**Issue**: "Failed to initialize any Qwen models"
**Solution**: Verify HF_TOKEN has inference scope and Qwen model access

**Issue**: "Authentication Error: Your OAuth token lacks inference permissions"
**Solution**: Logout and login again, or set HF_TOKEN as Space secret

#### Debug Commands

```python
import logging
import os

logger = logging.getLogger(__name__)

# In the Space, add debug logging to check authentication
# (oauth_token and client are the variables from the surrounding app code):
logger.info(f"HF_TOKEN available: {os.getenv('HF_TOKEN') is not None}")
logger.info(f"OAuth token available: {oauth_token is not None}")
logger.info(f"Qwen models initialized: {client.get_model_status()}")
```

### 9. Performance Optimization

For production efficiency with Qwen models:

```text
Intelligent Model Selection Strategy
- Simple questions: Qwen 2.5-7B (fast, cost-effective)
- Medium complexity: Qwen 2.5-32B (balanced performance)
- Complex reasoning: Qwen 2.5-72B (best quality)
- Budget management: Auto-downgrade when budget exceeded
- LangGraph workflow: Optimal agent routing and synthesis
```

### 10. Monitoring and Maintenance

**Key Metrics to Monitor**:

- GAIA benchmark success rate (target: 30%+)
- Average response time per question
- Cost per question processed
- LangGraph workflow success rate
- Qwen model availability and performance

**Regular Maintenance**:

- Monitor HuggingFace Inference API status
- Update dependencies for security
- Review and optimize LangGraph workflow performance
- Check Unit 4 API compatibility
- Monitor Qwen model performance and costs

## 🎯 Success Metrics

### Expected Production Results 🚀

With proper deployment and authentication:

- **GAIA Benchmark**: 30%+ success rate
- **LangGraph Workflow**: Multi-agent orchestration working
- **Qwen Model Performance**: Intelligent tier selection (7B→32B→72B)
- **User Experience**: Professional interface with clear authentication
- **System Reliability**: Clear success/failure modes (no degraded performance)

### Final Status:

- **Architecture**: Qwen 2.5 models + LangGraph multi-agent workflow
- **Requirements**: Clear authentication requirements (HF_TOKEN or OAuth with inference)
- **Performance**: 30%+ GAIA benchmark with full functionality
- **Reliability**: Robust error handling with clear user guidance
- **Deployment**: Ready for immediate HuggingFace Space deployment

**The GAIA Agent is now a focused, high-performance system using proper AI models and multi-agent orchestration!** 🎉
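
### Appendix: Model Selection Sketch

The model selection strategy from section 9 (choose a tier by question complexity, downgrade when the budget is exceeded) could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual code: `ModelTier` and `ModelConfig` are redefined here as minimal stand-ins, and `select_tier`, the complexity labels, `estimated_tokens`, and `remaining_budget` are hypothetical names. Model names and per-token costs are taken from the Qwen Model Configuration section.

```python
from dataclasses import dataclass
from enum import Enum


class ModelTier(Enum):
    # Stand-in for the project's ModelTier (assumption)
    ROUTER = "router"
    MAIN = "main"
    COMPLEX = "complex"


@dataclass(frozen=True)
class ModelConfig:
    # Stand-in for the project's ModelConfig (assumption)
    name: str
    cost_per_token: float


MODELS = {
    ModelTier.ROUTER: ModelConfig("Qwen/Qwen2.5-7B-Instruct", 0.0003),
    ModelTier.MAIN: ModelConfig("Qwen/Qwen2.5-32B-Instruct", 0.0008),
    ModelTier.COMPLEX: ModelConfig("Qwen/Qwen2.5-72B-Instruct", 0.0015),
}

# Tiers ordered from cheapest to most capable
TIER_ORDER = [ModelTier.ROUTER, ModelTier.MAIN, ModelTier.COMPLEX]

# Hypothetical mapping from a complexity label to the preferred tier
PREFERRED_TIER = {
    "simple": ModelTier.ROUTER,
    "medium": ModelTier.MAIN,
    "complex": ModelTier.COMPLEX,
}


def select_tier(complexity: str, estimated_tokens: int, remaining_budget: float) -> ModelTier:
    """Pick the preferred tier, stepping down while the estimated cost exceeds the budget."""
    start = TIER_ORDER.index(PREFERRED_TIER[complexity])
    # Walk from the preferred tier down towards the cheapest one
    for tier in reversed(TIER_ORDER[: start + 1]):
        estimated_cost = MODELS[tier].cost_per_token * estimated_tokens
        if estimated_cost <= remaining_budget:
            return tier
    # Nothing fits the budget: fall back to the cheapest tier
    return ModelTier.ROUTER


if __name__ == "__main__":
    tier = select_tier("complex", estimated_tokens=1000, remaining_budget=1.0)
    print(tier, MODELS[tier].name)
```

With these numbers, a complex question estimated at 1000 tokens costs about 1.5 on the 72B tier, so a remaining budget of 1.0 steps it down to the 32B tier. Walking the tiers in order keeps the downgrade rule in one place. Whether the real workflow estimates cost this way is not stated in this guide.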