# Deployment Guide - Runpod Cloud
After local testing is complete, follow this guide to deploy your Feedback Analysis Agent to Runpod.
---
## ✅ Pre-Deployment Checklist
Before deploying to Runpod, ensure:
- [ ] All local tests pass: `python3 scripts/validate_local.py` shows 7/7 ✅
- [ ] API server runs locally: `python3 run.py` starts without errors
- [ ] Endpoints tested: Use TESTING_CHECKLIST.md or curl commands
- [ ] Git repository clean: `git status` shows no uncommitted changes
- [ ] All code committed: `git log --oneline | head -5` shows your commits
- [ ] Docker image builds: `docker build -t feedback-analysis:latest .` succeeds
- [ ] `requirements.txt` updated: All dependencies listed
---
## 📦 Step 1: Prepare Docker Image
### 1.1 Build Docker Image Locally
```bash
# Change to your local checkout (the path below is the author's; adjust it for your machine)
cd /Users/galbd/Desktop/personal/software/ai_agent_gov/Feedback_Analysis_RAG_Agent_runpod
# Build the image
docker build -t feedback-analysis:latest .
# Verify it built
docker images | grep feedback-analysis
```
**Expected output:**
```
REPOSITORY TAG IMAGE ID CREATED SIZE
feedback-analysis latest abc123def456 2 minutes ago 2.5GB
```
### 1.2 Test Docker Image Locally (Optional)
```bash
# Run container
docker run -p 8001:8000 feedback-analysis:latest
# In another terminal, test
curl -X POST http://localhost:8001/health
```
**Expected:** `{"status":"ok"}`
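The same check can be scripted. The following is a minimal sketch, assuming the `/health` route accepts POST and returns the JSON body shown above. The names `is_ok_body` and `check_health` are illustrative and not part of the project.

```python
import json
import urllib.error
import urllib.request


def is_ok_body(body: bytes) -> bool:
    """Return True when the body is JSON of the form {"status": "ok"}."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(base_url: str, timeout: float = 5.0) -> bool:
    """POST to <base_url>/health and report whether the service looks healthy."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/health", data=b"", method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200 and is_ok_body(response.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as unhealthy.
        return False
```

With the container from 1.2 running, `check_health("http://localhost:8001")` should return `True`; port 8001 matches the `-p 8001:8000` mapping used there.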
---
## 🔑 Step 2: Set Up Docker Registry
### Option A: Docker Hub (Easiest)
**2A.1 Create Docker Hub Account**
- Go to https://hub.docker.com
- Sign up for a free account
- Note your username (e.g., `galbendavids`)
**2A.2 Login to Docker**
```bash
docker login
# Enter your Docker Hub username and password
```
**2A.3 Tag and Push Image**
```bash
# Tag with your Docker Hub username
docker tag feedback-analysis:latest galbendavids/feedback-analysis:latest
# Push to Docker Hub
docker push galbendavids/feedback-analysis:latest
# Verify it's uploaded
# Visit https://hub.docker.com/r/YOUR_USERNAME/feedback-analysis
```
### Option B: Private Registry (Advanced)
- Use AWS ECR, Google Container Registry, or Azure Container Registry
- Follow their documentation for authentication and push
---
## 🚀 Step 3: Create Runpod Template
### 3.1 Access Runpod Console
1. Go to https://www.runpod.io
2. Sign in to your account (create one if needed)
3. Click **"Console"** in top menu
4. Go to **"Serverless"** or **"Pods"** section
### 3.2 Create New Template
**For Serverless Endpoints (Recommended):**
1. Click **"Create New"** → **"API Endpoint Template"**
2. Fill in:
- **Template Name:** `feedback-analysis-sql`
- **Docker Image:** `galbendavids/feedback-analysis:latest`
- **Ports:** `8000`
- **GPU:** None (CPU-only is fine)
- **Memory:** 4GB minimum
- **Environment Variables:**
```
GEMINI_API_KEY=your_key_here (optional)
OPENAI_API_KEY=sk-... (optional)
```
3. Click **"Save Template"**
**For Pods (Traditional VM):**
1. Click **"Create"** → **"New Pod"**
2. Select template
3. Choose GPU type (optional, not needed for this workload)
4. Set min/max auto-scale settings
5. Click **"Run Pod"**
### 3.3 Configure Networking
- **Expose Port:** 8000
- **HTTPS:** Enabled automatically
- **Public URL:** Runpod generates automatically
---
## 🧪 Step 4: Test Deployed Endpoint
### 4.1 Get Endpoint URL
After deployment, Runpod provides a URL like:
```
https://your-endpoint-id.runpod-pods.net/
```
Or for Serverless:
```
https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/run
```
### 4.2 Test Basic Connectivity
```bash
# For Pods (direct connection)
curl -X POST https://your-endpoint-id.runpod-pods.net/health
# For Serverless (requires different format)
# See Runpod API documentation
```
**Expected response:**
```json
{"status":"ok"}
```
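For Serverless endpoints, requests usually go through Runpod's API gateway rather than straight to the container. The sketch below builds such a request without sending it. It assumes the `https://api.runpod.ai/v2/<endpoint_id>/runsync` route, a Bearer API key, and a worker that reads its arguments from Runpod's `input` envelope. The field names inside `input` (`query`, `top_k`) are hypothetical and depend on how your handler is written, so check the Runpod API documentation before relying on this.

```python
import json
import urllib.request


def build_serverless_request(
    endpoint_id: str, api_key: str, query: str, top_k: int = 5
) -> urllib.request.Request:
    """Build (but do not send) a synchronous Runpod Serverless request."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    # Assumption: the worker expects {"input": {"query": ..., "top_k": ...}}.
    body = json.dumps(
        {"input": {"query": query, "top_k": top_k}}, ensure_ascii=False
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
```

To send it, pass the result to `urllib.request.urlopen(request, timeout=60)` and decode the JSON response.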
### 4.3 Test Query Endpoint
```bash
curl -X POST https://your-endpoint-id.runpod-pods.net/query \
-H "Content-Type: application/json" \
-d '{"query":"כמה משתמשים כתבו תודה","top_k":5}'
```
**Expected response:**
```json
{
"query": "כמה משתמשים כתבו תודה",
"summary": "1168 משובים מכילים ביטויי תודה.",
"results": [...]
}
```
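When calling `/query` from code instead of curl, the Hebrew text must be sent as UTF-8 JSON. A minimal client sketch follows, assuming the request and response fields shown above (`query`, `top_k`, `summary`, `results`). The helper names are illustrative.

```python
import json
import urllib.request

# Top-level keys seen in the expected response above.
EXPECTED_KEYS = ("query", "summary", "results")


def build_query_request(base_url: str, query: str, top_k: int = 5) -> urllib.request.Request:
    """Build a POST request for <base_url>/query with a UTF-8 JSON body."""
    body = json.dumps({"query": query, "top_k": top_k}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/query",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json; charset=utf-8"},
    )


def missing_keys(payload: dict) -> list[str]:
    """List the expected top-level keys that are absent from a response."""
    return [key for key in EXPECTED_KEYS if key not in payload]


def run_query(base_url: str, query: str, top_k: int = 5, timeout: float = 30.0) -> dict:
    """Send the query and return the decoded JSON response."""
    request = build_query_request(base_url, query, top_k)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A quick smoke test is then `assert not missing_keys(run_query(base_url, "כמה משתמשים כתבו תודה"))`, with `base_url` set to your endpoint.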
### 4.4 Test All Endpoints
Use the same curl commands from TESTING_CHECKLIST.md, but replace:
- `http://localhost:8000` → `https://your-endpoint-id.runpod-pods.net`
Or use Swagger UI:
- `https://your-endpoint-id.runpod-pods.net/docs`
---
## 💰 Step 5: Configure Auto-Scaling (Optional)
In Runpod Pod settings:
1. **Minimum GPUs:** 0 (not needed)
2. **Maximum GPUs:** 1 (if you add GPU support)
3. **Idle timeout:** 5 minutes
4. **Auto-pause:** Enabled (to save costs)
---
## 🔐 Step 6: Add API Keys (Optional)
If you want LLM summaries (optional; the system works without them):
### 6.1 In Runpod Dashboard
1. Go to Pod settings
2. Add Environment Variables:
```
GEMINI_API_KEY=your_actual_key
OPENAI_API_KEY=sk-your_actual_key
```
3. Restart pod
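To confirm that the variables actually reached the container, a small check like the one below can be run inside the pod. It is only a sketch: the variable names come from this guide, but how the application itself reads them is not shown here, and `configured_providers` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Environment variable names used in this guide, keyed by provider.
PROVIDER_ENV_VARS = {"gemini": "GEMINI_API_KEY", "openai": "OPENAI_API_KEY"}


def configured_providers(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the providers whose API key is set to a non-blank value."""
    env = os.environ if env is None else env
    return [
        name
        for name, variable in PROVIDER_ENV_VARS.items()
        if env.get(variable, "").strip()
    ]
```

An empty list means neither key is set, which matches the "works without LLM summaries" mode described above.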
### 6.2 Get API Keys
**For Google Gemini:**
1. Go to https://makersuite.google.com/app/apikeys
2. Click "Create API Key"
3. Copy the key
**For OpenAI:**
1. Go to https://platform.openai.com/api-keys
2. Create new secret key
3. Copy the key
---
## 📊 Step 7: Monitor & Manage
### 7.1 Check Logs
In Runpod dashboard:
1. Click on your pod/endpoint
2. View **Logs** tab
3. Look for errors or warnings
### 7.2 Performance Metrics
Monitor:
- **CPU usage:** Should be <50% at rest
- **Memory:** Should be <80% usage
- **Response times:** Query endpoint 1-3 seconds
- **Uptime:** Should be 99%+
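Response times are easier to judge from a handful of samples than from a single request. The sketch below times repeated calls and reports the median, a nearest-rank 95th percentile, and the maximum. `call` is any zero-argument function you supply (for example, one that posts a query to your endpoint); both function names are illustrative.

```python
import statistics
import time
from typing import Callable, Sequence


def time_calls(call: Callable[[], object], repeats: int = 20) -> list[float]:
    """Run `call` repeatedly and return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return durations


def summarize_latencies(samples: Sequence[float]) -> dict:
    """Summarize latency samples as median, nearest-rank p95, and max."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Nearest-rank percentile: index of ceil(0.95 * n), using integer math.
    p95_index = (95 * len(ordered) + 99) // 100 - 1
    return {
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }
```

Compare the `median` and `p95` values against the 1-3 second target listed above.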
### 7.3 Scale & Pricing
- **Auto-scaling:** Runpod manages based on demand
- **Costs:** Typically $0.25-$0.50/hour for 4GB CPU-only pod
- **Savings:** Pod auto-pauses when idle (no charge)
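A rough monthly estimate follows directly from the hourly rate and the hours the pod is actually running. The sketch below uses the rate range quoted above purely as example inputs; these are not current Runpod prices, and the function name is illustrative.

```python
def estimate_monthly_cost(
    rate_per_hour: float, active_hours_per_day: float, days: int = 30
) -> float:
    """Estimate cost as rate x active hours per day x days (idle time is free)."""
    if not 0 <= active_hours_per_day <= 24:
        raise ValueError("active_hours_per_day must be between 0 and 24")
    return round(rate_per_hour * active_hours_per_day * days, 2)


# Example inputs: low rate with 8 active hours/day, high rate running 24/7.
low = estimate_monthly_cost(0.25, 8)
high = estimate_monthly_cost(0.50, 24)
print(f"Estimated range: ${low:.2f} to ${high:.2f} per month")
```

The gap between the two figures shows how much the auto-pause setting matters for a lightly used endpoint.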
---
## 🔄 Step 8: Update Deployment
### When You Update Code
1. **Make changes locally**
```bash
# Edit code, test locally
git add .
git commit -m "feat: new feature"
git push origin main
```
2. **Rebuild Docker image**
```bash
docker build -t feedback-analysis:v2 .
docker tag feedback-analysis:v2 galbendavids/feedback-analysis:v2
docker push galbendavids/feedback-analysis:v2
```
3. **Update Runpod template**
- Edit template image: `galbendavids/feedback-analysis:v2`
- Save
- Restart pod with new image
4. **Or redeploy**
- Delete old pod
- Create new pod from updated template
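Because the build, tag, and push commands differ only in the version tag, they can be generated rather than retyped for each release. A minimal sketch follows; `release_commands` is a hypothetical helper that only produces the command strings and does not run Docker.

```python
import shlex


def release_commands(username: str, repo: str, tag: str) -> list[str]:
    """Return the docker build/tag/push commands for one versioned release."""
    local_ref = f"{repo}:{tag}"
    remote_ref = f"{username}/{repo}:{tag}"
    return [
        f"docker build -t {shlex.quote(local_ref)} .",
        f"docker tag {shlex.quote(local_ref)} {shlex.quote(remote_ref)}",
        f"docker push {shlex.quote(remote_ref)}",
    ]


for command in release_commands("galbendavids", "feedback-analysis", "v2"):
    print(command)
```

Using an explicit tag such as `v2` instead of `latest` keeps each deployed version identifiable in the Runpod template.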
---
## ✨ Advanced: Optimization for Cloud
### A. Pre-download Models in Dockerfile
To avoid long first-request delays in cloud, add to Dockerfile:
```dockerfile
# After RUN pip install -r requirements.txt
# Pre-download embedding model
RUN python3 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')"
# Pre-download sentiment model
RUN python3 -c "from transformers import pipeline; pipeline('sentiment-analysis', model='nlptown/bert-base-multilingual-uncased-sentiment')"
```
This adds ~2GB to the image but eliminates the model download on the first request.
### B. Use GPU for Faster Embeddings
```dockerfile
# Install GPU support
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
RUN pip install faiss-gpu
```
Then in Runpod, select a GPU pod (more expensive but faster).
### C. Enable Caching
Add to `app/config.py`:
```python
EMBEDDING_CACHE_SIZE = 10000 # Cache more embeddings
INDEX_RELOAD_INTERVAL = 3600 # Reload index hourly
```
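How the project consumes these settings is not shown in this guide. As one possible illustration, a cache-size setting like `EMBEDDING_CACHE_SIZE` could bound an in-memory LRU cache around the embedding call. In the sketch below, `_embed_uncached` is a stand-in, not the real model call.

```python
from functools import lru_cache

EMBEDDING_CACHE_SIZE = 10000  # Same value as in the config snippet above


def _embed_uncached(text: str) -> tuple:
    """Placeholder for the real embedding call (hypothetical)."""
    return tuple(float(ord(ch)) for ch in text[:8])


@lru_cache(maxsize=EMBEDDING_CACHE_SIZE)
def embed(text: str) -> tuple:
    """Return the embedding for `text`, reusing cached results for repeats."""
    return _embed_uncached(text)
```

`embed.cache_info()` reports hits and misses, which helps decide whether a larger cache is worth the memory.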
---
## 🐛 Troubleshooting
### Problem: Pod won't start
```
Error: Container failed to start
```
**Fix:** Check Dockerfile syntax and ensure image builds locally first.
### Problem: Out of memory
```
OOMKilled or similar
```
**Fix:** Increase allocated memory in pod settings (go from 4GB to 8GB).
### Problem: Slow responses
```
Queries taking >10 seconds
```
**Fix:**
- Add GPU support
- Pre-download models (see optimization section)
- Increase allocated CPU cores
### Problem: Model not found
```
Error: Model 'xyz' not found
```
**Fix:** Add model download to Dockerfile (see optimization section).
### Problem: HTTPS certificate error
```
SSL Certificate verification failed
```
**Fix:** Runpod manages HTTPS certificates automatically, so this should not occur.
---
## 📈 Monitoring & Alerts
### Set Up Alerts (Optional)
1. Go to Runpod **Billing** tab
2. Set max spend limit
3. Enable email alerts
### Check Status
```bash
# Query your endpoint
curl -X POST https://your-endpoint-id.runpod-pods.net/health
# If it fails, pod may be down
# Check Runpod dashboard for status
```
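A single failed request does not always mean the pod is down, since a paused pod can take a while to wake. The sketch below retries a health check with exponential backoff before giving up. `check` is any zero-argument function returning `True` when healthy (for example, a wrapper around the curl call above); the function name and defaults are assumptions.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            # Waits of delay, 2*delay, 4*delay, ... between attempts.
            sleep(delay_seconds * (2 ** attempt))
    return False
```

If this returns `False`, check the Runpod dashboard for the pod's status and logs.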
---
## 🔄 Rollback Plan
If deployment has issues:
1. **Keep previous image tagged**
```bash
docker tag galbendavids/feedback-analysis:v1 galbendavids/feedback-analysis:latest-stable
docker push galbendavids/feedback-analysis:latest-stable
```
2. **If new deployment fails, revert**
- Update Runpod template back to `latest-stable`
- Restart pod
- Investigate issue locally
3. **Don't delete old pods immediately**
- Keep for at least 1 day
- Then delete if new version stable
---
## 🎯 Testing Checklist Before Going Live
Before sharing endpoint with users:
- [ ] `/health` endpoint responds
- [ ] `/query` endpoint returns results
- [ ] Hebrew queries work correctly
- [ ] Response times acceptable (<5s for most queries)
- [ ] Error handling working (try invalid JSON)
- [ ] Swagger UI accessible at `/docs`
- [ ] SSL/HTTPS working (URL is secure)
- [ ] Logs show no errors
- [ ] Auto-scaling responding to load
---
## 📋 Production Deployment Checklist
Before announcing to users:
- [ ] Load tested with 100+ concurrent requests
- [ ] Backup plan documented
- [ ] Monitoring alerts set up
- [ ] Support procedure documented
- [ ] SLA defined (99.9% uptime target, etc.)
- [ ] Rate limiting configured (optional)
- [ ] API key authentication enforced (optional)
- [ ] CORS settings reviewed
- [ ] Backup of deployment config saved
- [ ] Runpod support ticket submitted for any questions
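For the load-test item, a small thread-pool driver is often enough for a first pass. The sketch below is a minimal example, not a replacement for a dedicated load-testing tool. `send` is a hypothetical function you provide that performs one request and returns `True` on success. Note that `workers` caps how many requests are in flight at once, so set it to 100 to reach 100 truly concurrent requests.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_load_test(
    send: Callable[[int], bool], total_requests: int = 100, workers: int = 20
) -> dict:
    """Issue `total_requests` calls to `send` using a pool of `workers` threads."""

    def safe_send(index: int) -> bool:
        # Count exceptions (timeouts, connection errors) as failures.
        try:
            return bool(send(index))
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(safe_send, range(total_requests)))

    succeeded = sum(outcomes)
    return {
        "total": total_requests,
        "succeeded": succeeded,
        "failed": total_requests - succeeded,
    }
```

A non-zero `failed` count under load is a signal to revisit memory, CPU, or scaling settings before announcing the endpoint.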
---
## 📞 Support & Resources
- **Runpod Docs:** https://docs.runpod.io
- **Runpod Community:** https://forums.runpod.io
- **FastAPI Docs:** https://fastapi.tiangolo.com
- **Docker Docs:** https://docs.docker.com
---
## 🎓 What's Next
After successful deployment:
1. **Monitor the endpoint** - Check logs daily
2. **Gather feedback** - What works well, what needs improvement
3. **Iterate** - Make improvements, redeploy
4. **Scale** - Add more features, more data
5. **Secure** - Add authentication, rate limiting as needed
---
## ✅ Congratulations!
Your SQL-based feedback analysis agent is now live in the cloud! 🎉
**Summary:**
- ✅ Local validation complete
- ✅ Docker image built
- ✅ Deployed to Runpod
- ✅ Cloud endpoint tested
- ✅ Ready for production
**Next:** Share the endpoint URL with users or integrate into your application.
---
*Last Updated: Today*
*Version: 1.0*
*Status: Production Ready* ✨