# Deployment Guide - Runpod Cloud
After local testing is complete, follow this guide to deploy your Feedback Analysis Agent to Runpod.
---
## ✅ Pre-Deployment Checklist
Before deploying to Runpod, ensure:
- [ ] All local tests pass: `python3 scripts/validate_local.py` shows 7/7 ✅
- [ ] API server runs locally: `python3 run.py` starts without errors
- [ ] Endpoints tested: use TESTING_CHECKLIST.md or the curl commands
- [ ] Git repository clean: `git status` shows no uncommitted changes
- [ ] All code committed: `git log --oneline | head -5` shows your commits
- [ ] Docker image builds: `docker build -t feedback-analysis:latest .` succeeds
- [ ] `requirements.txt` updated: all dependencies listed
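The checklist can also be run as a script. A minimal sketch (the helper and the specific commands wired in under `__main__` are assumptions; adapt them to your project layout):

```python
import subprocess
import sys

def run_checks(commands):
    """Run each (name, argv) pair; return a list of (name, passed) results."""
    results = []
    for name, argv in commands:
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append((name, proc.returncode == 0))
    return results

if __name__ == "__main__":
    checks = [
        ("local tests", [sys.executable, "scripts/validate_local.py"]),
        ("git clean", ["git", "diff", "--quiet"]),
        ("docker build", ["docker", "build", "-t", "feedback-analysis:latest", "."]),
    ]
    for name, ok in run_checks(checks):
        print(f"{'PASS' if ok else 'FAIL'}: {name}")
```

Any `FAIL` line means the corresponding checklist item above needs attention before deploying.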
---
## 📦 Step 1: Prepare Docker Image
### 1.1 Build Docker Image Locally
```bash
cd /Users/galbd/Desktop/personal/software/ai_agent_gov/Feedback_Analysis_RAG_Agent_runpod
# Build the image
docker build -t feedback-analysis:latest .
# Verify it built
docker images | grep feedback-analysis
```
**Expected output:**
```
REPOSITORY          TAG       IMAGE ID       CREATED         SIZE
feedback-analysis   latest    abc123def456   2 minutes ago   2.5GB
```
### 1.2 Test Docker Image Locally (Optional)
```bash
# Run the container (maps host port 8001 to container port 8000)
docker run -p 8001:8000 feedback-analysis:latest
# In another terminal, test
curl -X POST http://localhost:8001/health
```
**Expected:** `{"status":"ok"}`
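Because the container may still be loading models when it first starts, the health check can fail for a short while. A small poll helper covers this (a sketch; the URL and timeout values are assumptions):

```python
import json
import time
import urllib.request
import urllib.error

def wait_for_health(url, timeout=120.0, interval=2.0):
    """Poll a health endpoint until it returns {"status": "ok"} or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                body = json.loads(resp.read().decode())
                if body.get("status") == "ok":
                    return True
        except (urllib.error.URLError, ValueError):
            pass  # server not up yet, or body not JSON; retry
        time.sleep(interval)
    return False

if __name__ == "__main__":
    print("healthy" if wait_for_health("http://localhost:8001/health") else "timed out")
```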
---
## 🔑 Step 2: Set Up Docker Registry
### Option A: Docker Hub (Easiest)
**2A.1 Create Docker Hub Account**
- Go to https://hub.docker.com
- Sign up for a free account
- Note your username (e.g., `galbendavids`)
**2A.2 Login to Docker**
```bash
docker login
# Enter your Docker Hub username and password
```
**2A.3 Tag and Push Image**
```bash
# Tag with your Docker Hub username
docker tag feedback-analysis:latest galbendavids/feedback-analysis:latest
# Push to Docker Hub
docker push galbendavids/feedback-analysis:latest
# Verify it's uploaded:
# visit https://hub.docker.com/r/YOUR_USERNAME/feedback-analysis
```
### Option B: Private Registry (Advanced)
- Use AWS ECR, Google Container Registry, or Azure Container Registry
- Follow their documentation for authentication and push
---
## 🚀 Step 3: Create Runpod Template
### 3.1 Access Runpod Console
1. Go to https://www.runpod.io
2. Sign in to your account (create one if needed)
3. Click **"Console"** in the top menu
4. Go to the **"Serverless"** or **"Pods"** section
### 3.2 Create New Template
**For Serverless Endpoints (Recommended):**
1. Click **"Create New"** → **"API Endpoint Template"**
2. Fill in:
   - **Template Name:** `feedback-analysis-sql`
   - **Docker Image:** `galbendavids/feedback-analysis:latest`
   - **Ports:** `8000`
   - **GPU:** None (CPU-only is fine)
   - **Memory:** 4GB minimum
   - **Environment Variables:**
     ```
     GEMINI_API_KEY=your_key_here (optional)
     OPENAI_API_KEY=sk-... (optional)
     ```
3. Click **"Save Template"**
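Since both keys are optional, the application should degrade gracefully when they are absent. A sketch of how it might read them at startup (the helper and fallback behavior are assumptions; the variable names match the template above):

```python
import os

def load_llm_keys(env=os.environ):
    """Return whichever LLM API keys are configured; an empty dict means no LLM summaries."""
    keys = {}
    for name in ("GEMINI_API_KEY", "OPENAI_API_KEY"):
        value = env.get(name)
        if value:  # skip unset or empty values
            keys[name] = value
    return keys
```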
**For Pods (Traditional VM):**
1. Click **"Create"** → **"New Pod"**
2. Select your template
3. Choose a GPU type (optional; not needed for this workload)
4. Set min/max auto-scale settings
5. Click **"Run Pod"**
### 3.3 Configure Networking
- **Expose Port:** 8000
- **HTTPS:** Enabled automatically
- **Public URL:** Runpod generates this automatically
---
## 🧪 Step 4: Test Deployed Endpoint
### 4.1 Get Endpoint URL
After deployment, Runpod provides a URL like:
```
https://your-endpoint-id.runpod-pods.net/
```
Or, for Serverless:
```
https://api.runpod.io/v1/YOUR_ENDPOINT_ID/run
```
### 4.2 Test Basic Connectivity
```bash
# For Pods (direct connection)
curl -X POST https://your-endpoint-id.runpod-pods.net/health
# For Serverless (requires a different request format)
# See the Runpod API documentation
```
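Serverless endpoints do not receive raw HTTP requests directly; calls go through Runpod's API with the payload wrapped in an `input` field and a Bearer API key. A sketch of building such a request (the payload shape follows Runpod's serverless convention as an assumption; verify the exact schema against their docs):

```python
import json

def build_serverless_request(api_key, query, top_k=5):
    """Build (headers, body) for a Runpod serverless call wrapping our query payload."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"input": {"query": query, "top_k": top_k}}).encode("utf-8")
    return headers, body

# POST headers/body to the serverless URL shown above, e.g.
# https://api.runpod.io/v1/YOUR_ENDPOINT_ID/run
```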
**Expected response:**
```json
{"status":"ok"}
```
### 4.3 Test Query Endpoint
```bash
# Hebrew query meaning "how many users wrote thank you"
curl -X POST https://your-endpoint-id.runpod-pods.net/query \
  -H "Content-Type: application/json" \
  -d '{"query":"כמה משתמשים כתבו תודה","top_k":5}'
```
**Expected response:**
```json
{
  "query": "כמה משתמשים כתבו תודה",
  "summary": "1168 משובים מכילים ביטויי תודה.",
  "results": [...]
}
```
(The summary reads: "1168 feedback items contain expressions of thanks.")
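The same request can be scripted instead of typed. A minimal client sketch (the base URL is a placeholder; the payload shape mirrors the curl example above):

```python
import json
import urllib.request

def build_query_payload(query, top_k=5):
    """Serialize the /query request body exactly as the curl example sends it."""
    return json.dumps({"query": query, "top_k": top_k}, ensure_ascii=False).encode("utf-8")

def post_query(base_url, query, top_k=5):
    """POST a query to the deployed endpoint and return the parsed JSON response."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/query",
        data=build_query_payload(query, top_k),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```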
### 4.4 Test All Endpoints
Use the same curl commands from TESTING_CHECKLIST.md, but replace:
- `http://localhost:8000` → `https://your-endpoint-id.runpod-pods.net`
Or use the Swagger UI:
- `https://your-endpoint-id.runpod-pods.net/docs`
---
## 💰 Step 5: Configure Auto-Scaling (Optional)
In the Runpod Pod settings:
1. **Minimum GPUs:** 0 (not needed)
2. **Maximum GPUs:** 1 (if you add GPU support)
3. **Idle timeout:** 5 minutes
4. **Auto-pause:** Enabled (to save costs)
---
## 🔐 Step 6: Add API Keys (Optional)
If you want LLM-generated summaries (not required; the system works without them):
### 6.1 In Runpod Dashboard
1. Go to the Pod settings
2. Add Environment Variables:
   ```
   GEMINI_API_KEY=your_actual_key
   OPENAI_API_KEY=sk-your_actual_key
   ```
3. Restart the pod
### 6.2 Get API Keys
**For Google Gemini:**
1. Go to https://makersuite.google.com/app/apikeys
2. Click "Create API Key"
3. Copy the key
**For OpenAI:**
1. Go to https://platform.openai.com/api-keys
2. Create a new secret key
3. Copy the key
---
## 📊 Step 7: Monitor & Manage
### 7.1 Check Logs
In the Runpod dashboard:
1. Click on your pod/endpoint
2. View the **Logs** tab
3. Look for errors or warnings
### 7.2 Performance Metrics
Monitor:
- **CPU usage:** should stay below 50% at rest
- **Memory:** should stay below 80% of the allocation
- **Response times:** the query endpoint should answer in 1-3 seconds
- **Uptime:** should be 99%+
### 7.3 Scale & Pricing
- **Auto-scaling:** Runpod manages this based on demand
- **Costs:** typically $0.25-$0.50/hour for a 4GB CPU-only pod
- **Savings:** the pod auto-pauses when idle (no charge)
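At those hourly rates, monthly spend depends mostly on how many hours the pod is actually awake. A quick estimate (the hours-awake figures below are illustrative assumptions, not measurements):

```python
def monthly_cost(rate_per_hour, hours_awake_per_day, days=30):
    """Estimate monthly spend for a pod that auto-pauses when idle."""
    return rate_per_hour * hours_awake_per_day * days

# A $0.25/hr pod awake 8 hours/day costs about $60/month;
# the same pod always-on would cost about $180/month.
```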
---
## 🔄 Step 8: Update Deployment
### When You Update Code
1. **Make changes locally**
   ```bash
   # Edit code, test locally
   git add .
   git commit -m "feat: new feature"
   git push origin main
   ```
2. **Rebuild the Docker image**
   ```bash
   docker build -t feedback-analysis:v2 .
   docker tag feedback-analysis:v2 galbendavids/feedback-analysis:v2
   docker push galbendavids/feedback-analysis:v2
   ```
3. **Update the Runpod template**
   - Edit the template image: `galbendavids/feedback-analysis:v2`
   - Save
   - Restart the pod with the new image
4. **Or redeploy**
   - Delete the old pod
   - Create a new pod from the updated template
---
## ✨ Advanced: Optimization for Cloud
### A. Pre-download Models in Dockerfile
To avoid long first-request delays in the cloud, add to the Dockerfile:
```dockerfile
# Place these after: RUN pip install -r requirements.txt
# Pre-download the embedding model
RUN python3 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')"
# Pre-download the sentiment model
RUN python3 -c "from transformers import pipeline; pipeline('sentiment-analysis', model='nlptown/bert-base-multilingual-uncased-sentiment')"
```
This adds ~2GB to the image but eliminates the download on the first request.
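If you would rather keep the image small, the complementary approach is to load each model once per process and reuse it, so only the very first request pays the cost. A sketch of a cached-loader wrapper (the helper name and wiring are hypothetical, not part of the project):

```python
from functools import lru_cache

def make_cached_loader(load_fn):
    """Wrap a heavy loader so each model name is loaded once, then reused."""
    @lru_cache(maxsize=None)
    def get(model_name):
        return load_fn(model_name)
    return get

# In the app this would wrap the real loader, e.g.:
# get_embedder = make_cached_loader(lambda name: SentenceTransformer(name))
```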
### B. Use GPU for Faster Embeddings
```dockerfile
# Install GPU-enabled PyTorch and FAISS
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
RUN pip install faiss-gpu
```
Then in Runpod, select a GPU pod (more expensive but faster). Note that GPU builds of FAISS are officially distributed via conda, so verify that the `faiss-gpu` pip package matches your CUDA version.
### C. Enable Caching
Add to `app/config.py`:
```python
EMBEDDING_CACHE_SIZE = 10000  # Cache more embeddings
INDEX_RELOAD_INTERVAL = 3600  # Reload index hourly
```
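One way `EMBEDDING_CACHE_SIZE` might be honored is a bounded LRU cache keyed by the input text, so repeated queries skip re-embedding. A sketch (the wrapper is an assumption about how the config value is wired up; the constant matches the snippet above):

```python
from functools import lru_cache

EMBEDDING_CACHE_SIZE = 10000  # mirrors app/config.py

def make_embedding_cache(embed_fn, maxsize=EMBEDDING_CACHE_SIZE):
    """Wrap an embedding function with a bounded LRU cache keyed by the input text."""
    @lru_cache(maxsize=maxsize)
    def embed(text):
        return embed_fn(text)
    return embed
```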
---
## 🐛 Troubleshooting
### Problem: Pod won't start
```
Error: Container failed to start
```
**Fix:** Check the Dockerfile syntax and make sure the image builds locally first.
### Problem: Out of memory
```
OOMKilled or similar
```
**Fix:** Increase the allocated memory in the pod settings (e.g., from 4GB to 8GB).
### Problem: Slow responses
```
Queries taking >10 seconds
```
**Fix:**
- Add GPU support
- Pre-download models (see the optimization section)
- Increase the allocated CPU cores
### Problem: Model not found
```
Error: Model 'xyz' not found
```
**Fix:** Add the model download to the Dockerfile (see the optimization section).
### Problem: HTTPS certificate error
```
SSL Certificate verification failed
```
**Fix:** Runpod provisions certificates automatically, so this should not occur; if it does, confirm you are using the HTTPS URL shown in the dashboard.
---
## 📈 Monitoring & Alerts
### Set Up Alerts (Optional)
1. Go to the Runpod **Billing** tab
2. Set a max spend limit
3. Enable email alerts
### Check Status
```bash
# Query your endpoint
curl -X POST https://your-endpoint-id.runpod-pods.net/health
# If it fails, the pod may be down:
# check the Runpod dashboard for its status
```
---
## 🔄 Rollback Plan
If the deployment has issues:
1. **Keep the previous image tagged**
   ```bash
   docker tag galbendavids/feedback-analysis:v1 galbendavids/feedback-analysis:latest-stable
   docker push galbendavids/feedback-analysis:latest-stable
   ```
2. **If the new deployment fails, revert**
   - Update the Runpod template back to `latest-stable`
   - Restart the pod
   - Investigate the issue locally
3. **Don't delete old pods immediately**
   - Keep them for at least a day
   - Delete them once the new version is stable
---
## 🎯 Testing Checklist Before Going Live
Before sharing the endpoint with users:
- [ ] `/health` endpoint responds
- [ ] `/query` endpoint returns results
- [ ] Hebrew queries work correctly
- [ ] Response times acceptable (<5s for most queries)
- [ ] Error handling works (try sending invalid JSON)
- [ ] Swagger UI accessible at `/docs`
- [ ] SSL/HTTPS working (URL is secure)
- [ ] Logs show no errors
- [ ] Auto-scaling responds to load
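Several of these checks can be automated into a smoke test. A sketch where the check list mirrors the items above and the `fetch` function is injected so it can be swapped between a real HTTP client and a stub (all names here are hypothetical):

```python
def run_smoke_tests(base_url, fetch):
    """Run basic go-live checks; fetch(method, url, body) must return (status, text)."""
    checks = [
        ("health responds", "GET", "/health", None),
        ("query returns results", "POST", "/query", '{"query":"test","top_k":1}'),
        ("docs accessible", "GET", "/docs", None),
    ]
    results = {}
    for name, method, path, body in checks:
        try:
            status, _ = fetch(method, base_url.rstrip("/") + path, body)
            results[name] = status == 200
        except Exception:
            results[name] = False  # connection errors count as failures
    return results
```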
---
## 📋 Production Deployment Checklist
Before announcing to users:
- [ ] Load tested with 100+ concurrent requests
- [ ] Backup plan documented
- [ ] Monitoring alerts set up
- [ ] Support procedure documented
- [ ] SLA defined (99.9% uptime target, etc.)
- [ ] Rate limiting configured (optional)
- [ ] API key authentication enforced (optional)
- [ ] CORS settings reviewed
- [ ] Backup of the deployment config saved
- [ ] Runpod support ticket submitted for any open questions
---
## 📞 Support & Resources
- **Runpod Docs:** https://docs.runpod.io
- **Runpod Community:** https://forums.runpod.io
- **FastAPI Docs:** https://fastapi.tiangolo.com
- **Docker Docs:** https://docs.docker.com
---
## 🎓 What's Next
After a successful deployment:
1. **Monitor the endpoint** - check logs daily
2. **Gather feedback** - what works well, what needs improvement
3. **Iterate** - make improvements, redeploy
4. **Scale** - add more features, more data
5. **Secure** - add authentication and rate limiting as needed
---
## ✅ Congratulations!
Your SQL-based feedback analysis agent is now live in the cloud! 🎉
**Summary:**
- ✅ Local validation complete
- ✅ Docker image built
- ✅ Deployed to Runpod
- ✅ Cloud endpoint tested
- ✅ Ready for production
**Next:** Share the endpoint URL with users or integrate it into your application.
---
*Last Updated: Today*
*Version: 1.0*
*Status: Production Ready* ✨