Deployment Guide - Runpod Cloud
After local testing is complete, follow this guide to deploy your Feedback Analysis Agent to Runpod.
✅ Pre-Deployment Checklist
Before deploying to Runpod, ensure:
- All local tests pass: `python3 scripts/validate_local.py` shows 7/7 ✅
- API server runs locally: `python3 run.py` starts without errors
- Endpoints tested: use TESTING_CHECKLIST.md or curl commands
- Git repository clean: `git status` shows no uncommitted changes
- All code committed: `git log --oneline | head -5` shows your commits
- Docker image builds: `docker build -t feedback-analysis:latest .` succeeds
- requirements.txt updated: all dependencies listed
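Parts of this checklist can be scripted so you run it the same way before every deploy. Below is a minimal sketch; the script name `predeploy_check.sh` and the exact set of checks are illustrative, not part of the project:

```shell
# predeploy_check.sh - runs a few of the pre-deployment checks, reports PASS/FAIL.
# Always exits 0 so you can read every result in one pass.

check() {
  # $1 = description, remaining args = command to run
  desc="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $desc"
  else
    echo "FAIL: $desc"
  fi
}

check "docker CLI available"     command -v docker
check "git CLI available"        command -v git
check "working tree clean"       sh -c '[ -z "$(git status --porcelain 2>/dev/null)" ]'
check "requirements.txt present" test -f requirements.txt
check "Dockerfile present"       test -f Dockerfile

echo "Checklist complete."
```

Extend `check` calls with project-specific steps (e.g. running `scripts/validate_local.py`) as needed.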
📦 Step 1: Prepare Docker Image
1.1 Build Docker Image Locally
```bash
cd /Users/galbd/Desktop/personal/software/ai_agent_gov/Feedback_Analysis_RAG_Agent_runpod

# Build the image
docker build -t feedback-analysis:latest .

# Verify it built
docker images | grep feedback-analysis
```
Expected output:
```
REPOSITORY          TAG      IMAGE ID       CREATED         SIZE
feedback-analysis   latest   abc123def456   2 minutes ago   2.5GB
```
1.2 Test Docker Image Locally (Optional)
```bash
# Run the container
docker run -p 8001:8000 feedback-analysis:latest

# In another terminal, test (health checks are plain GET requests)
curl http://localhost:8001/health
```
Expected: `{"status":"ok"}`
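To script this smoke test, a small polling helper avoids racing the container's startup. This is a sketch; the port matches the `docker run` example above, and the retry count is an arbitrary choice:

```shell
# smoke_test.sh - polls the locally running container until /health answers.
BASE_URL="${BASE_URL:-http://localhost:8001}"

# wait_for URL TRIES: prints "up" once the endpoint responds, "down" if it
# never does within TRIES one-second attempts.
wait_for() {
  url="$1"; tries="${2:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      echo "up"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "down"
  return 1
}

wait_for "$BASE_URL/health" 3 || echo "container did not become healthy on $BASE_URL"
```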
🔑 Step 2: Set Up Docker Registry
Option A: Docker Hub (Easiest)
2A.1 Create Docker Hub Account
- Go to https://hub.docker.com
- Sign up for free account
- Note your username (e.g., `galbendavids`)
2A.2 Login to Docker
```bash
docker login
# Enter your Docker Hub username and password
```
2A.3 Tag and Push Image
```bash
# Tag with your Docker Hub username
docker tag feedback-analysis:latest galbendavids/feedback-analysis:latest

# Push to Docker Hub
docker push galbendavids/feedback-analysis:latest

# Verify the upload at https://hub.docker.com/r/YOUR_USERNAME/feedback-analysis
```
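You can also confirm the push from the command line without pulling the image back: `docker manifest inspect` queries the registry directly. A minimal sketch, using the image name from the example above:

```shell
# verify_push.sh - checks that the pushed tag is visible in the registry.
# "docker manifest inspect" talks to the registry without downloading layers.

tag_exists() {
  docker manifest inspect "$1" >/dev/null 2>&1
}

IMAGE="galbendavids/feedback-analysis:latest"
if tag_exists "$IMAGE"; then
  echo "found: $IMAGE"
else
  echo "missing: $IMAGE"
fi
```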
Option B: Private Registry (Advanced)
- Use AWS ECR, Google Container Registry, or Azure Container Registry
- Follow their documentation for authentication and push
🚀 Step 3: Create Runpod Template
3.1 Access Runpod Console
- Go to https://www.runpod.io
- Sign in to your account (create if needed)
- Click "Console" in top menu
- Go to "Serverless" or "Pods" section
3.2 Create New Template
For Serverless Endpoints (Recommended):
1. Click "Create New" → "API Endpoint Template"
2. Fill in:
   - Template Name: `feedback-analysis-sql`
   - Docker Image: `galbendavids/feedback-analysis:latest`
   - Ports: `8000`
   - GPU: None (CPU-only is fine)
   - Memory: 4GB minimum
   - Environment Variables: `GEMINI_API_KEY=your_key_here` (optional), `OPENAI_API_KEY=sk-...` (optional)
3. Click "Save Template"
For Pods (Traditional VM):
- Click "Create" → "New Pod"
- Select template
- Choose GPU type (optional, not needed for this workload)
- Set min/max auto-scale settings
- Click "Run Pod"
3.3 Configure Networking
- Expose Port: 8000
- HTTPS: Enabled automatically
- Public URL: Runpod generates automatically
🧪 Step 4: Test Deployed Endpoint
4.1 Get Endpoint URL
After deployment, Runpod provides a URL like:
https://your-endpoint-id.runpod-pods.net/
Or for Serverless:
https://api.runpod.io/v1/YOUR_ENDPOINT_ID/run
4.2 Test Basic Connectivity
```bash
# For Pods (direct connection; health checks are plain GET requests)
curl https://your-endpoint-id.runpod-pods.net/health

# For Serverless, requests use a different format;
# see the Runpod API documentation
```
Expected response:
```json
{"status":"ok"}
```
4.3 Test Query Endpoint
```bash
# Hebrew query: "How many users wrote thank you?"
curl -X POST https://your-endpoint-id.runpod-pods.net/query \
  -H "Content-Type: application/json" \
  -d '{"query":"כמה משתמשים כתבו תודה","top_k":5}'
```
Expected response (the Hebrew summary reads "1168 feedback entries contain expressions of thanks"):
```json
{
  "query": "כמה משתמשים כתבו תודה",
  "summary": "1168 משובים מכילים ביטויי תודה.",
  "results": [...]
}
```
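If you query the endpoint repeatedly, a small helper that builds the JSON payload avoids quoting mistakes in hand-written curl commands. A sketch, assuming the `/query` request shape shown above (`query` and `top_k` fields); the script name and helpers are illustrative:

```shell
# query.sh - reusable helper around the /query endpoint shown above.
BASE_URL="${BASE_URL:-https://your-endpoint-id.runpod-pods.net}"

# make_payload TEXT [TOP_K]: builds the JSON body, escaping double quotes.
make_payload() {
  q="$(printf '%s' "$1" | sed 's/"/\\"/g')"
  printf '{"query":"%s","top_k":%d}' "$q" "${2:-5}"
}

# run_query TEXT [TOP_K]: POSTs the payload to the endpoint.
run_query() {
  curl -sS -X POST "$BASE_URL/query" \
    -H "Content-Type: application/json" \
    -d "$(make_payload "$1" "${2:-5}")"
}

# Example (commented out so the script is safe to source):
# run_query "How many users wrote thank you?" 5
```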
4.4 Test All Endpoints
Use the same curl commands from TESTING_CHECKLIST.md, replacing `http://localhost:8000` with `https://your-endpoint-id.runpod-pods.net`.
Or use the Swagger UI at:
`https://your-endpoint-id.runpod-pods.net/docs`
💰 Step 5: Configure Auto-Scaling (Optional)
In Runpod Pod settings:
- Minimum GPUs: 0 (not needed)
- Maximum GPUs: 1 (if you add GPU support)
- Idle timeout: 5 minutes
- Auto-pause: Enabled (to save costs)
🔐 Step 6: Add API Keys (Optional)
If you want LLM summaries (not required, system works without):
6.1 In Runpod Dashboard
- Go to Pod settings
- Add Environment Variables:
  - `GEMINI_API_KEY=your_actual_key`
  - `OPENAI_API_KEY=sk-your_actual_key`
- Restart the pod
6.2 Get API Keys
For Google Gemini:
- Go to https://makersuite.google.com/app/apikeys
- Click "Create API Key"
- Copy the key
For OpenAI:
- Go to https://platform.openai.com/api-keys
- Create new secret key
- Copy the key
📊 Step 7: Monitor & Manage
7.1 Check Logs
In Runpod dashboard:
- Click on your pod/endpoint
- View Logs tab
- Look for errors or warnings
7.2 Performance Metrics
Monitor:
- CPU usage: Should be <50% at rest
- Memory: Should be <80% usage
- Response times: Query endpoint 1-3 seconds
- Uptime: Should be 99%+
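The response-time target above can be spot-checked with curl's built-in timing write-out variables. A sketch; replace the placeholder URL with your endpoint:

```shell
# latency_check.sh - measures one request with curl's %{time_total} write-out.

# measure URL: prints total request time in seconds, or "error" on failure.
measure() {
  t="$(curl -sS -o /dev/null --max-time 10 -w '%{time_total}' "$1" 2>/dev/null)" || {
    echo "error"
    return 1
  }
  echo "$t"
}

URL="${1:-https://your-endpoint-id.runpod-pods.net/health}"
t="$(measure "$URL")" || true
echo "time_total: $t"
```

Run it a few times in a row to see warm-cache vs. cold-start latency.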
7.3 Scale & Pricing
- Auto-scaling: Runpod manages based on demand
- Costs: Typically $0.25-$0.50/hour for 4GB CPU-only pod
- Savings: Pod auto-pauses when idle (no charge)
🔄 Step 8: Update Deployment
When You Update Code
1. Make changes locally:

```bash
# Edit code, test locally
git add .
git commit -m "feat: new feature"
git push origin main
```

2. Rebuild the Docker image:

```bash
docker build -t feedback-analysis:v2 .
docker tag feedback-analysis:v2 galbendavids/feedback-analysis:v2
docker push galbendavids/feedback-analysis:v2
```

3. Update the Runpod template:
   - Edit the template image: `galbendavids/feedback-analysis:v2`
   - Save
   - Restart the pod with the new image

Or redeploy:
- Delete the old pod
- Create a new pod from the updated template
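The rebuild/tag/push steps above can be wrapped in one hypothetical helper so a release is a single command. Repository name as in the examples; adjust to yours:

```shell
# release.sh - rebuilds, tags, and pushes one version in a single step.
REPO="galbendavids/feedback-analysis"

usage() { echo "usage: release <version> (e.g. release v2)"; }

release() {
  docker build -t "$REPO:$1" . &&
  docker tag "$REPO:$1" "$REPO:latest" &&
  docker push "$REPO:$1" &&
  docker push "$REPO:latest" &&
  echo "Released $REPO:$1 - remember to update the Runpod template image"
}

if [ "$#" -ge 1 ]; then release "$1"; else usage; fi
```

Using explicit version tags (v2, v3, ...) instead of only `latest` is what makes the rollback plan below possible.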
✨ Advanced: Optimization for Cloud
A. Pre-download Models in Dockerfile
To avoid long first-request delays in the cloud, add to the Dockerfile:

```dockerfile
# After: RUN pip install -r requirements.txt

# Pre-download the embedding model
RUN python3 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')"

# Pre-download the sentiment model
RUN python3 -c "from transformers import pipeline; pipeline('sentiment-analysis', model='nlptown/bert-base-multilingual-uncased-sentiment')"
```

This adds ~2GB to the image but eliminates the download on the first request.
B. Use GPU for Faster Embeddings
```dockerfile
# Install GPU support
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
RUN pip install faiss-gpu
```
Then in Runpod, select a GPU pod (more expensive but faster).
C. Enable Caching
Add to app/config.py:
```python
EMBEDDING_CACHE_SIZE = 10000   # Cache more embeddings
INDEX_RELOAD_INTERVAL = 3600   # Reload index hourly (seconds)
```
🐛 Troubleshooting
Problem: Pod won't start
Error: Container failed to start
Fix: Check Dockerfile syntax and ensure image builds locally first.
Problem: Out of memory
OOMKilled or similar
Fix: Increase allocated memory in pod settings (go from 4GB to 8GB).
Problem: Slow responses
Queries taking >10 seconds
Fix:
- Add GPU support
- Pre-download models (see optimization section)
- Increase allocated CPU cores
Problem: Model not found
Error: Model 'xyz' not found
Fix: Add model download to Dockerfile (see optimization section).
Problem: HTTPS certificate error
SSL Certificate verification failed
Fix: Runpod terminates TLS automatically, so this should not occur; if it does, confirm you are using the HTTPS URL Runpod generated.
📈 Monitoring & Alerts
Set Up Alerts (Optional)
- Go to Runpod Billing tab
- Set max spend limit
- Enable email alerts
Check Status
```bash
# Query your endpoint (health checks are plain GET requests)
curl https://your-endpoint-id.runpod-pods.net/health

# If it fails, the pod may be down;
# check the Runpod dashboard for status
```
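The manual check above can be turned into a tiny watchdog that prints one status line per probe, suitable for cron. A sketch with a placeholder URL; the script name and log path are illustrative:

```shell
# watchdog.sh - one status line per probe; run it from cron for monitoring.

probe() {
  if curl -fsS --max-time 10 "$1" >/dev/null 2>&1; then
    echo "$(date -u +%H:%M:%S) OK"
  else
    echo "$(date -u +%H:%M:%S) DOWN"
  fi
}

URL="${1:-https://your-endpoint-id.runpod-pods.net/health}"
probe "$URL"

# Example cron entry, every 5 minutes:
# */5 * * * * /path/to/watchdog.sh >> /var/log/feedback-watchdog.log
```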
🔄 Rollback Plan
If deployment has issues:
1. Keep the previous image tagged:

```bash
docker tag galbendavids/feedback-analysis:v1 galbendavids/feedback-analysis:latest-stable
docker push galbendavids/feedback-analysis:latest-stable
```

2. If the new deployment fails, revert:
   - Update the Runpod template back to `latest-stable`
   - Restart the pod
   - Investigate the issue locally
3. Don't delete old pods immediately:
   - Keep them for at least 1 day
   - Delete once the new version is stable
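The "keep previous image tagged" step can be captured in a small hypothetical helper, so pinning a rollback target is one command. Tag names follow the example above:

```shell
# rollback_prep.sh - pins a known-good version as latest-stable.
REPO="galbendavids/feedback-analysis"

# pin_stable VERSION: re-tags that version as latest-stable and pushes it.
pin_stable() {
  docker tag "$REPO:$1" "$REPO:latest-stable" &&
  docker push "$REPO:latest-stable"
}

# Example: pin v1 as the rollback target before shipping v2
# pin_stable v1
echo "rollback target: $REPO:latest-stable"
```

Run it before each release so `latest-stable` always points at the last version you trusted in production.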
🎯 Testing Checklist Before Going Live
Before sharing endpoint with users:
- `/health` endpoint responds
- `/query` endpoint returns results
- Hebrew queries work correctly
- Response times acceptable (<5s for most queries)
- Error handling works (try sending invalid JSON)
- Swagger UI accessible at `/docs`
- SSL/HTTPS working (URL is secure)
- Logs show no errors
- Auto-scaling responds to load
📋 Production Deployment Checklist
Before announcing to users:
- Load tested with 100+ concurrent requests
- Backup plan documented
- Monitoring alerts set up
- Support procedure documented
- SLA defined (99.9% uptime target, etc.)
- Rate limiting configured (optional)
- API key authentication enforced (optional)
- CORS settings reviewed
- Backup of deployment config saved
- Runpod support ticket submitted for any questions
📞 Support & Resources
- Runpod Docs: https://docs.runpod.io
- Runpod Community: https://forums.runpod.io
- FastAPI Docs: https://fastapi.tiangolo.com
- Docker Docs: https://docs.docker.com
🎓 What's Next
After successful deployment:
- Monitor the endpoint - Check logs daily
- Gather feedback - What works well, what needs improvement
- Iterate - Make improvements, redeploy
- Scale - Add more features, more data
- Secure - Add authentication, rate limiting as needed
✅ Congratulations!
Your SQL-based feedback analysis agent is now live in the cloud! 🎉
Summary:
- ✅ Local validation complete
- ✅ Docker image built
- ✅ Deployed to Runpod
- ✅ Cloud endpoint tested
- ✅ Ready for production
Next: Share the endpoint URL with users or integrate into your application.
Last Updated: Today
Version: 1.0
Status: Production Ready ✨