# Deployment Guide - Runpod Cloud
After local testing is complete, follow this guide to deploy your Feedback Analysis Agent to Runpod.
---
## ✅ Pre-Deployment Checklist
Before deploying to Runpod, ensure:
- [ ] All local tests pass: `python3 scripts/validate_local.py` shows 7/7 ✅
- [ ] API server runs locally: `python3 run.py` starts without errors
- [ ] Endpoints tested: use TESTING_CHECKLIST.md or the curl commands
- [ ] Git repository clean: `git status` shows no uncommitted changes
- [ ] All code committed: `git log --oneline | head -5` shows your commits
- [ ] Docker image builds: `docker build -t feedback-analysis:latest .` succeeds
- [ ] `requirements.txt` updated: all dependencies listed
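The checklist can also be run as a script. A minimal sketch (the helper and the specific commands wired in under `__main__` are assumptions; adapt them to your project layout):

```python
import subprocess
import sys

def run_checks(commands):
    """Run each (name, argv) pair; return a list of (name, passed) results."""
    results = []
    for name, argv in commands:
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append((name, proc.returncode == 0))
    return results

if __name__ == "__main__":
    checks = [
        ("local tests", [sys.executable, "scripts/validate_local.py"]),
        ("git clean", ["git", "diff", "--quiet"]),
        ("docker build", ["docker", "build", "-t", "feedback-analysis:latest", "."]),
    ]
    for name, ok in run_checks(checks):
        print(f"{'PASS' if ok else 'FAIL'}: {name}")
```

Any `FAIL` line means the corresponding checklist item above needs attention before deploying.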
---
## 📦 Step 1: Prepare Docker Image
### 1.1 Build Docker Image Locally
```bash
cd /Users/galbd/Desktop/personal/software/ai_agent_gov/Feedback_Analysis_RAG_Agent_runpod
# Build the image
docker build -t feedback-analysis:latest .
# Verify it built
docker images | grep feedback-analysis
```
**Expected output:**
```
REPOSITORY          TAG       IMAGE ID       CREATED         SIZE
feedback-analysis   latest    abc123def456   2 minutes ago   2.5GB
```
### 1.2 Test Docker Image Locally (Optional)
```bash
# Run the container (maps host port 8001 to container port 8000)
docker run -p 8001:8000 feedback-analysis:latest
# In another terminal, test
curl -X POST http://localhost:8001/health
```
**Expected:** `{"status":"ok"}`
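Because the container may still be loading models when it first starts, the health check can fail for a short while. A small poll helper covers this (a sketch; the URL and timeout values are assumptions):

```python
import json
import time
import urllib.request
import urllib.error

def wait_for_health(url, timeout=120.0, interval=2.0):
    """Poll a health endpoint until it returns {"status": "ok"} or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                body = json.loads(resp.read().decode())
                if body.get("status") == "ok":
                    return True
        except (urllib.error.URLError, ValueError):
            pass  # server not up yet, or body not JSON; retry
        time.sleep(interval)
    return False

if __name__ == "__main__":
    print("healthy" if wait_for_health("http://localhost:8001/health") else "timed out")
```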
---
## 🔑 Step 2: Set Up Docker Registry
### Option A: Docker Hub (Easiest)
**2A.1 Create Docker Hub Account**
- Go to https://hub.docker.com
- Sign up for a free account
- Note your username (e.g., `galbendavids`)
**2A.2 Login to Docker**
```bash
docker login
# Enter your Docker Hub username and password
```
**2A.3 Tag and Push Image**
```bash
# Tag with your Docker Hub username
docker tag feedback-analysis:latest galbendavids/feedback-analysis:latest
# Push to Docker Hub
docker push galbendavids/feedback-analysis:latest
# Verify it's uploaded:
# visit https://hub.docker.com/r/YOUR_USERNAME/feedback-analysis
```
### Option B: Private Registry (Advanced)
- Use AWS ECR, Google Container Registry, or Azure Container Registry
- Follow their documentation for authentication and push
---
## 🚀 Step 3: Create Runpod Template
### 3.1 Access Runpod Console
1. Go to https://www.runpod.io
2. Sign in to your account (create one if needed)
3. Click **"Console"** in the top menu
4. Go to the **"Serverless"** or **"Pods"** section
### 3.2 Create New Template
**For Serverless Endpoints (Recommended):**
1. Click **"Create New"** → **"API Endpoint Template"**
2. Fill in:
   - **Template Name:** `feedback-analysis-sql`
   - **Docker Image:** `galbendavids/feedback-analysis:latest`
   - **Ports:** `8000`
   - **GPU:** None (CPU-only is fine)
   - **Memory:** 4GB minimum
   - **Environment Variables:**
     ```
     GEMINI_API_KEY=your_key_here (optional)
     OPENAI_API_KEY=sk-... (optional)
     ```
3. Click **"Save Template"**
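Since both keys are optional, the application should degrade gracefully when they are absent. A sketch of how it might read them at startup (the helper and fallback behavior are assumptions; the variable names match the template above):

```python
import os

def load_llm_keys(env=os.environ):
    """Return whichever LLM API keys are configured; an empty dict means no LLM summaries."""
    keys = {}
    for name in ("GEMINI_API_KEY", "OPENAI_API_KEY"):
        value = env.get(name)
        if value:  # skip unset or empty values
            keys[name] = value
    return keys
```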
**For Pods (Traditional VM):**
1. Click **"Create"** → **"New Pod"**
2. Select your template
3. Choose a GPU type (optional; not needed for this workload)
4. Set min/max auto-scale settings
5. Click **"Run Pod"**
### 3.3 Configure Networking
- **Expose Port:** 8000
- **HTTPS:** Enabled automatically
- **Public URL:** Runpod generates this automatically
---
## 🧪 Step 4: Test Deployed Endpoint
### 4.1 Get Endpoint URL
After deployment, Runpod provides a URL like:
```
https://your-endpoint-id.runpod-pods.net/
```
Or, for Serverless:
```
https://api.runpod.io/v1/YOUR_ENDPOINT_ID/run
```
### 4.2 Test Basic Connectivity
```bash
# For Pods (direct connection)
curl -X POST https://your-endpoint-id.runpod-pods.net/health
# For Serverless (requires a different request format)
# See the Runpod API documentation
```
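Serverless endpoints do not receive raw HTTP requests directly; calls go through Runpod's API with the payload wrapped in an `input` field and a Bearer API key. A sketch of building such a request (the payload shape follows Runpod's serverless convention as an assumption; verify the exact schema against their docs):

```python
import json

def build_serverless_request(api_key, query, top_k=5):
    """Build (headers, body) for a Runpod serverless call wrapping our query payload."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"input": {"query": query, "top_k": top_k}}).encode("utf-8")
    return headers, body

# POST headers/body to the serverless URL shown above, e.g.
# https://api.runpod.io/v1/YOUR_ENDPOINT_ID/run
```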
**Expected response:**
```json
{"status":"ok"}
```
### 4.3 Test Query Endpoint
```bash
# Hebrew query meaning "how many users wrote thank you"
curl -X POST https://your-endpoint-id.runpod-pods.net/query \
  -H "Content-Type: application/json" \
  -d '{"query":"כמה משתמשים כתבו תודה","top_k":5}'
```
**Expected response:**
```json
{
  "query": "כמה משתמשים כתבו תודה",
  "summary": "1168 משובים מכילים ביטויי תודה.",
  "results": [...]
}
```
(The summary reads: "1168 feedback items contain expressions of thanks.")
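The same request can be scripted instead of typed. A minimal client sketch (the base URL is a placeholder; the payload shape mirrors the curl example above):

```python
import json
import urllib.request

def build_query_payload(query, top_k=5):
    """Serialize the /query request body exactly as the curl example sends it."""
    return json.dumps({"query": query, "top_k": top_k}, ensure_ascii=False).encode("utf-8")

def post_query(base_url, query, top_k=5):
    """POST a query to the deployed endpoint and return the parsed JSON response."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/query",
        data=build_query_payload(query, top_k),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```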
### 4.4 Test All Endpoints
Use the same curl commands from TESTING_CHECKLIST.md, but replace:
- `http://localhost:8000` → `https://your-endpoint-id.runpod-pods.net`
Or use the Swagger UI:
- `https://your-endpoint-id.runpod-pods.net/docs`
---
## 💰 Step 5: Configure Auto-Scaling (Optional)
In the Runpod Pod settings:
1. **Minimum GPUs:** 0 (not needed)
2. **Maximum GPUs:** 1 (if you add GPU support)
3. **Idle timeout:** 5 minutes
4. **Auto-pause:** Enabled (to save costs)
---
## 🔐 Step 6: Add API Keys (Optional)
If you want LLM-generated summaries (not required; the system works without them):
### 6.1 In Runpod Dashboard
1. Go to the Pod settings
2. Add Environment Variables:
   ```
   GEMINI_API_KEY=your_actual_key
   OPENAI_API_KEY=sk-your_actual_key
   ```
3. Restart the pod
### 6.2 Get API Keys
**For Google Gemini:**
1. Go to https://makersuite.google.com/app/apikeys
2. Click "Create API Key"
3. Copy the key
**For OpenAI:**
1. Go to https://platform.openai.com/api-keys
2. Create a new secret key
3. Copy the key
---
## 📊 Step 7: Monitor & Manage
### 7.1 Check Logs
In the Runpod dashboard:
1. Click on your pod/endpoint
2. View the **Logs** tab
3. Look for errors or warnings
### 7.2 Performance Metrics
Monitor:
- **CPU usage:** should stay below 50% at rest
- **Memory:** should stay below 80% of the allocation
- **Response times:** the query endpoint should answer in 1-3 seconds
- **Uptime:** should be 99%+
### 7.3 Scale & Pricing
- **Auto-scaling:** Runpod manages this based on demand
- **Costs:** typically $0.25-$0.50/hour for a 4GB CPU-only pod
- **Savings:** the pod auto-pauses when idle (no charge)
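At those hourly rates, monthly spend depends mostly on how many hours the pod is actually awake. A quick estimate (the hours-awake figures below are illustrative assumptions, not measurements):

```python
def monthly_cost(rate_per_hour, hours_awake_per_day, days=30):
    """Estimate monthly spend for a pod that auto-pauses when idle."""
    return rate_per_hour * hours_awake_per_day * days

# A $0.25/hr pod awake 8 hours/day costs about $60/month;
# the same pod always-on would cost about $180/month.
```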
---
## 🔄 Step 8: Update Deployment
### When You Update Code
1. **Make changes locally**
   ```bash
   # Edit code, test locally
   git add .
   git commit -m "feat: new feature"
   git push origin main
   ```
2. **Rebuild the Docker image**
   ```bash
   docker build -t feedback-analysis:v2 .
   docker tag feedback-analysis:v2 galbendavids/feedback-analysis:v2
   docker push galbendavids/feedback-analysis:v2
   ```
3. **Update the Runpod template**
   - Edit the template image: `galbendavids/feedback-analysis:v2`
   - Save
   - Restart the pod with the new image
4. **Or redeploy**
   - Delete the old pod
   - Create a new pod from the updated template
---
## ✨ Advanced: Optimization for Cloud
### A. Pre-download Models in Dockerfile
To avoid long first-request delays in the cloud, add to the Dockerfile:
```dockerfile
# Place these after: RUN pip install -r requirements.txt
# Pre-download the embedding model
RUN python3 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')"
# Pre-download the sentiment model
RUN python3 -c "from transformers import pipeline; pipeline('sentiment-analysis', model='nlptown/bert-base-multilingual-uncased-sentiment')"
```
This adds ~2GB to the image but eliminates the download on the first request.
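If you would rather keep the image small, the complementary approach is to load each model once per process and reuse it, so only the very first request pays the cost. A sketch of a cached-loader wrapper (the helper name and wiring are hypothetical, not part of the project):

```python
from functools import lru_cache

def make_cached_loader(load_fn):
    """Wrap a heavy loader so each model name is loaded once, then reused."""
    @lru_cache(maxsize=None)
    def get(model_name):
        return load_fn(model_name)
    return get

# In the app this would wrap the real loader, e.g.:
# get_embedder = make_cached_loader(lambda name: SentenceTransformer(name))
```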
### B. Use GPU for Faster Embeddings
```dockerfile
# Install GPU-enabled PyTorch and FAISS
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
RUN pip install faiss-gpu
```
Then in Runpod, select a GPU pod (more expensive but faster). Note that GPU builds of FAISS are officially distributed via conda, so verify that the `faiss-gpu` pip package matches your CUDA version.
### C. Enable Caching
Add to `app/config.py`:
```python
EMBEDDING_CACHE_SIZE = 10000  # Cache more embeddings
INDEX_RELOAD_INTERVAL = 3600  # Reload index hourly
```
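One way `EMBEDDING_CACHE_SIZE` might be honored is a bounded LRU cache keyed by the input text, so repeated queries skip re-embedding. A sketch (the wrapper is an assumption about how the config value is wired up; the constant matches the snippet above):

```python
from functools import lru_cache

EMBEDDING_CACHE_SIZE = 10000  # mirrors app/config.py

def make_embedding_cache(embed_fn, maxsize=EMBEDDING_CACHE_SIZE):
    """Wrap an embedding function with a bounded LRU cache keyed by the input text."""
    @lru_cache(maxsize=maxsize)
    def embed(text):
        return embed_fn(text)
    return embed
```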
---
## 🐛 Troubleshooting
### Problem: Pod won't start
```
Error: Container failed to start
```
**Fix:** Check the Dockerfile syntax and make sure the image builds locally first.
### Problem: Out of memory
```
OOMKilled or similar
```
**Fix:** Increase the allocated memory in the pod settings (e.g., from 4GB to 8GB).
### Problem: Slow responses
```
Queries taking >10 seconds
```
**Fix:**
- Add GPU support
- Pre-download models (see the optimization section)
- Increase the allocated CPU cores
### Problem: Model not found
```
Error: Model 'xyz' not found
```
**Fix:** Add the model download to the Dockerfile (see the optimization section).
### Problem: HTTPS certificate error
```
SSL Certificate verification failed
```
**Fix:** Runpod provisions certificates automatically, so this should not occur; if it does, confirm you are using the HTTPS URL shown in the dashboard.
---
## 📈 Monitoring & Alerts
### Set Up Alerts (Optional)
1. Go to the Runpod **Billing** tab
2. Set a max spend limit
3. Enable email alerts
### Check Status
```bash
# Query your endpoint
curl -X POST https://your-endpoint-id.runpod-pods.net/health
# If it fails, the pod may be down:
# check the Runpod dashboard for its status
```
---
## 🔄 Rollback Plan
If the deployment has issues:
1. **Keep the previous image tagged**
   ```bash
   docker tag galbendavids/feedback-analysis:v1 galbendavids/feedback-analysis:latest-stable
   docker push galbendavids/feedback-analysis:latest-stable
   ```
2. **If the new deployment fails, revert**
   - Update the Runpod template back to `latest-stable`
   - Restart the pod
   - Investigate the issue locally
3. **Don't delete old pods immediately**
   - Keep them for at least a day
   - Delete them once the new version is stable
---
## 🎯 Testing Checklist Before Going Live
Before sharing the endpoint with users:
- [ ] `/health` endpoint responds
- [ ] `/query` endpoint returns results
- [ ] Hebrew queries work correctly
- [ ] Response times acceptable (<5s for most queries)
- [ ] Error handling works (try sending invalid JSON)
- [ ] Swagger UI accessible at `/docs`
- [ ] SSL/HTTPS working (URL is secure)
- [ ] Logs show no errors
- [ ] Auto-scaling responds to load
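Several of these checks can be automated into a smoke test. A sketch where the check list mirrors the items above and the `fetch` function is injected so it can be swapped between a real HTTP client and a stub (all names here are hypothetical):

```python
def run_smoke_tests(base_url, fetch):
    """Run basic go-live checks; fetch(method, url, body) must return (status, text)."""
    checks = [
        ("health responds", "GET", "/health", None),
        ("query returns results", "POST", "/query", '{"query":"test","top_k":1}'),
        ("docs accessible", "GET", "/docs", None),
    ]
    results = {}
    for name, method, path, body in checks:
        try:
            status, _ = fetch(method, base_url.rstrip("/") + path, body)
            results[name] = status == 200
        except Exception:
            results[name] = False  # connection errors count as failures
    return results
```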
---
## 📋 Production Deployment Checklist
Before announcing to users:
- [ ] Load tested with 100+ concurrent requests
- [ ] Backup plan documented
- [ ] Monitoring alerts set up
- [ ] Support procedure documented
- [ ] SLA defined (99.9% uptime target, etc.)
- [ ] Rate limiting configured (optional)
- [ ] API key authentication enforced (optional)
- [ ] CORS settings reviewed
- [ ] Backup of the deployment config saved
- [ ] Runpod support ticket submitted for any open questions
---
## 📞 Support & Resources
- **Runpod Docs:** https://docs.runpod.io
- **Runpod Community:** https://forums.runpod.io
- **FastAPI Docs:** https://fastapi.tiangolo.com
- **Docker Docs:** https://docs.docker.com
---
## 🎓 What's Next
After a successful deployment:
1. **Monitor the endpoint** - check logs daily
2. **Gather feedback** - what works well, what needs improvement
3. **Iterate** - make improvements, redeploy
4. **Scale** - add more features, more data
5. **Secure** - add authentication and rate limiting as needed
---
## ✅ Congratulations!
Your SQL-based feedback analysis agent is now live in the cloud! 🎉
**Summary:**
- ✅ Local validation complete
- ✅ Docker image built
- ✅ Deployed to Runpod
- ✅ Cloud endpoint tested
- ✅ Ready for production
**Next:** Share the endpoint URL with users or integrate it into your application.
---
*Last Updated: Today*
*Version: 1.0*
*Status: Production Ready* ✨