# Deployment Guide - Runpod Cloud

After local testing is complete, follow this guide to deploy your Feedback Analysis Agent to Runpod.

---

## ✅ Pre-Deployment Checklist

Before deploying to Runpod, ensure:

- [ ] All local tests pass: `python3 scripts/validate_local.py` shows 7/7 ✅
- [ ] API server runs locally: `python3 run.py` starts without errors
- [ ] Endpoints tested: Use TESTING_CHECKLIST.md or curl commands
- [ ] Git repository clean: `git status` shows no uncommitted changes
- [ ] All code committed: `git log --oneline | head -5` shows your commits
- [ ] Docker image builds: `docker build -t feedback-analysis:latest .` succeeds
- [ ] `requirements.txt` updated: All dependencies listed
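
The Git items in this checklist can also be checked from a script. The following is a minimal sketch, assuming it runs from the repository root; `working_tree_is_clean` and `git_porcelain_output` are hypothetical helper names, not part of this project.

```python
import subprocess


def working_tree_is_clean(porcelain_output: str) -> bool:
    """Return True when `git status --porcelain` printed nothing."""
    # Porcelain mode prints one line per changed or untracked file,
    # so empty output means there is nothing left to commit.
    return porcelain_output.strip() == ""


def git_porcelain_output() -> str:
    """Run `git status --porcelain` in the current directory."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `working_tree_is_clean(git_porcelain_output())` before building the image gives a quick yes/no answer for the "Git repository clean" item.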

---

## 📦 Step 1: Prepare Docker Image

### 1.1 Build Docker Image Locally

```bash
# Run from the repository root (adjust this path for your machine)
cd /Users/galbd/Desktop/personal/software/ai_agent_gov/Feedback_Analysis_RAG_Agent_runpod

# Build the image
docker build -t feedback-analysis:latest .

# Verify it built
docker images | grep feedback-analysis
```

**Expected output:**
```
REPOSITORY           TAG      IMAGE ID      CREATED        SIZE
feedback-analysis    latest   abc123def456  2 minutes ago   2.5GB
```

### 1.2 Test Docker Image Locally (Optional)

```bash
# Run container
docker run -p 8001:8000 feedback-analysis:latest

# In another terminal, test
curl -X POST http://localhost:8001/health
```

**Expected:** `{"status":"ok"}`
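
The container may need some time to start before `/health` answers, so a single curl can fail even when the image is fine. Below is a rough sketch of a polling check, assuming the endpoint accepts POST (as in the curl command) and returns a JSON object; `fetch_health` and `wait_until_healthy` are hypothetical names, and the attempt count and delay are arbitrary.

```python
import json
import time
import urllib.request


def fetch_health(url: str, timeout: float = 5.0) -> dict:
    """POST to the health URL, as the curl command does, and parse the JSON body."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_healthy(url, attempts=10, delay=3.0, fetch=fetch_health, sleep=time.sleep):
    """Poll until the body reports {"status": "ok"}; return True on success."""
    for attempt in range(attempts):
        try:
            if fetch(url).get("status") == "ok":
                return True
        except (OSError, ValueError):
            # OSError covers refused connections, timeouts, and HTTP errors;
            # ValueError covers a body that is not valid JSON yet.
            pass
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For the local container above, `wait_until_healthy("http://localhost:8001/health")` would return `True` once the server is up. The `fetch` and `sleep` parameters exist only so the function can be exercised without a running server.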

---

## 🔑 Step 2: Set Up Docker Registry

### Option A: Docker Hub (Easiest)

**2A.1 Create Docker Hub Account**
- Go to https://hub.docker.com
- Sign up for free account
- Note your username (e.g., `galbendavids`)

**2A.2 Login to Docker**
```bash
docker login
# Enter your Docker Hub username and password
```

**2A.3 Tag and Push Image**
```bash
# Tag with your Docker Hub username
docker tag feedback-analysis:latest galbendavids/feedback-analysis:latest

# Push to Docker Hub
docker push galbendavids/feedback-analysis:latest

# Verify it's uploaded
# Visit https://hub.docker.com/r/YOUR_USERNAME/feedback-analysis
```

### Option B: Private Registry (Advanced)
- Use AWS ECR, Google Artifact Registry (the successor to Google Container Registry), or Azure Container Registry
- Follow their documentation for authentication and push

---

## 🚀 Step 3: Create Runpod Template

### 3.1 Access Runpod Console

1. Go to https://www.runpod.io
2. Sign in to your account (create one if needed)
3. Click **"Console"** in the top menu
4. Go to the **"Serverless"** or **"Pods"** section

### 3.2 Create New Template

**For Serverless Endpoints (Recommended):**

1. Click **"Create New"** → **"API Endpoint Template"**
2. Fill in:
   - **Template Name:** `feedback-analysis-sql`
   - **Docker Image:** `galbendavids/feedback-analysis:latest`
   - **Ports:** `8000`
   - **GPU:** None (CPU-only is fine)
   - **Memory:** 4GB minimum
   - **Environment Variables:**
     ```
     # Both keys are optional
     GEMINI_API_KEY=your_key_here
     OPENAI_API_KEY=sk-...
     ```

3. Click **"Save Template"**

**For Pods (Traditional VM):**

1. Click **"Create"** → **"New Pod"**
2. Select template
3. Choose GPU type (optional, not needed for this workload)
4. Set min/max auto-scale settings
5. Click **"Run Pod"**

### 3.3 Configure Networking

- **Expose Port:** 8000
- **HTTPS:** Enabled automatically
- **Public URL:** Runpod generates automatically

---

## 🧪 Step 4: Test Deployed Endpoint

### 4.1 Get Endpoint URL

After deployment, Runpod provides a URL. The hostname below is a placeholder; copy the real one from the console:
```
https://your-endpoint-id.runpod-pods.net/
```

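Or, for Serverless (current Runpod documentation uses the `api.runpod.ai/v2` form; confirm the exact URL in your console):
```
https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/run
```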

### 4.2 Test Basic Connectivity

```bash
# For Pods (direct connection)
curl -X POST https://your-endpoint-id.runpod-pods.net/health

# For Serverless (requires different format)
# See Runpod API documentation
```

**Expected response:**
```json
{"status":"ok"}
```
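
For Serverless, requests are usually wrapped rather than sent straight to the app. The sketch below assumes the format described in Runpod's serverless documentation: a Bearer API key and a JSON body with a top-level `input` object. Verify both before relying on it. `build_serverless_request` is a hypothetical helper, `run_url` is whatever URL your console shows, and the handler inside the container must be written to read `input`.

```python
import json
import urllib.request


def build_serverless_request(run_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Wrap `payload` in {"input": ...} and attach a Bearer token (assumed format)."""
    body = json.dumps({"input": payload}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        run_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. Building it separately makes it easy to inspect the body and headers first.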

### 4.3 Test Query Endpoint

```bash
# The query is Hebrew for "How many users wrote thank you"
curl -X POST https://your-endpoint-id.runpod-pods.net/query \
  -H "Content-Type: application/json" \
  -d '{"query":"כמה משתמשים כתבו תודה","top_k":5}'
```

**Expected response** (the summary is Hebrew for "1168 feedback items contain expressions of thanks"):
```json
{
  "query": "כמה משתמשים כתבו תודה",
  "summary": "1168 משובים מכילים ביטויי תודה.",
  "results": [...]
}
```
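
The same request can be sent from Python, which avoids shell quoting problems with Hebrew text. This is a sketch using only the standard library, assuming the `/query` route and the `query` / `top_k` fields shown in the curl command; `build_query_request` and `send_query` are hypothetical names.

```python
import json
import urllib.request


def build_query_request(base_url: str, query: str, top_k: int = 5) -> urllib.request.Request:
    """Build the POST /query request with a UTF-8 encoded JSON body."""
    # ensure_ascii=False keeps Hebrew characters readable instead of \u escapes.
    body = json.dumps({"query": query, "top_k": top_k}, ensure_ascii=False)
    return urllib.request.Request(
        base_url.rstrip("/") + "/query",
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send_query(base_url: str, query: str, top_k: int = 5, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response."""
    request = build_query_request(base_url, query, top_k)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send_query("https://your-endpoint-id.runpod-pods.net", "כמה משתמשים כתבו תודה")` should return a dictionary with the `query`, `summary`, and `results` keys shown above.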

### 4.4 Test All Endpoints

Use the same curl commands from TESTING_CHECKLIST.md, but replace:
- `http://localhost:8000` → `https://your-endpoint-id.runpod-pods.net`

Or use Swagger UI:
- `https://your-endpoint-id.runpod-pods.net/docs`

---

## 💰 Step 5: Configure Auto-Scaling (Optional)

In Runpod Pod settings:

1. **Minimum GPUs:** 0 (not needed)
2. **Maximum GPUs:** 1 (if you add GPU support)
3. **Idle timeout:** 5 minutes
4. **Auto-pause:** Enabled (to save costs)

---

## 🔐 Step 6: Add API Keys (Optional)

If you want LLM summaries (not required; the system works without them):

### 6.1 In Runpod Dashboard

1. Go to Pod settings
2. Add Environment Variables:
   ```
   GEMINI_API_KEY=your_actual_key
   OPENAI_API_KEY=sk-your_actual_key
   ```
3. Restart pod
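
Because both keys are optional, the application has to cope with either or both being absent. The sketch below shows one way that selection could look. It is not taken from this project's code: `select_llm_provider` is a hypothetical function, and preferring Gemini over OpenAI when both are set is an arbitrary choice.

```python
import os


def select_llm_provider(env=None):
    """Return "gemini", "openai", or None when no key is configured."""
    env = os.environ if env is None else env
    # Treat empty or whitespace-only values as "not set".
    if env.get("GEMINI_API_KEY", "").strip():
        return "gemini"
    if env.get("OPENAI_API_KEY", "").strip():
        return "openai"
    return None
```

A `None` result corresponds to the no-LLM mode described above, where summaries are produced without an external model.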

### 6.2 Get API Keys

**For Google Gemini:**
1. Go to https://aistudio.google.com/app/apikey (MakerSuite is now Google AI Studio)
2. Click "Create API Key"
3. Copy the key

**For OpenAI:**
1. Go to https://platform.openai.com/api-keys
2. Create new secret key
3. Copy the key

---

## 📊 Step 7: Monitor & Manage

### 7.1 Check Logs

In Runpod dashboard:
1. Click on your pod/endpoint
2. View **Logs** tab
3. Look for errors or warnings

### 7.2 Performance Metrics

Monitor:
- **CPU usage:** Should be <50% at rest
- **Memory:** Should be <80% usage
- **Response times:** Query endpoint 1-3 seconds
- **Uptime:** Should be 99%+
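
To compare measured response times against the 1-3 second target, it helps to look at the median and the tail rather than a single request. Here is a small sketch using the standard library; `summarize_latencies` is a hypothetical helper, and the 95th percentile is just one reasonable choice of tail metric.

```python
import statistics


def summarize_latencies(seconds):
    """Return the median, 95th percentile, and maximum of a list of durations."""
    if len(seconds) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    # method="inclusive" keeps the result within the observed range.
    p95 = statistics.quantiles(seconds, n=20, method="inclusive")[-1]
    return {
        "median": statistics.median(seconds),
        "p95": p95,
        "max": max(seconds),
    }
```

Feed it the durations of a few dozen `/query` calls, each timed with `time.perf_counter()`. A median inside the target range with a much larger `p95` usually points to cold starts or model loading rather than steady-state slowness.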

### 7.3 Scale & Pricing

- **Auto-scaling:** Runpod manages based on demand
- **Costs:** Typically $0.25-$0.50/hour for 4GB CPU-only pod
- **Savings:** Pod auto-pauses when idle (no charge)
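
The hourly range above translates into a monthly figure with simple arithmetic. The sketch below assumes that paused hours cost nothing and ignores storage or network charges, so treat the result as a lower bound. `estimate_monthly_cost` is a hypothetical helper.

```python
def estimate_monthly_cost(hourly_rate: float, active_hours_per_day: float, days: int = 30) -> float:
    """Cost of the hours the pod is running; paused hours are counted as free."""
    if not 0 <= active_hours_per_day <= 24:
        raise ValueError("active_hours_per_day must be between 0 and 24")
    return round(hourly_rate * active_hours_per_day * days, 2)
```

At the quoted rates, a pod running around the clock comes to `estimate_monthly_cost(0.25, 24)` = 180.0 up to `estimate_monthly_cost(0.50, 24)` = 360.0 dollars per 30 days. With 8 active hours per day the range drops to 60.0-120.0.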

---

## 🔄 Step 8: Update Deployment

### When You Update Code

1. **Make changes locally**
   ```bash
   # Edit code, test locally
   git add .
   git commit -m "feat: new feature"
   git push origin main
   ```

2. **Rebuild Docker image**
   ```bash
   docker build -t feedback-analysis:v2 .
   docker tag feedback-analysis:v2 galbendavids/feedback-analysis:v2
   docker push galbendavids/feedback-analysis:v2
   ```

3. **Update Runpod template**
   - Edit template image: `galbendavids/feedback-analysis:v2`
   - Save
   - Restart pod with new image

4. **Or redeploy**
   - Delete old pod
   - Create new pod from updated template
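
The rebuild step repeats the same three Docker commands with a new tag each time, so it is easy to script. Below is a sketch that only generates the commands; `release_commands` is a hypothetical helper, and the user and image names are passed in rather than hard-coded.

```python
def release_commands(user: str, image: str, tag: str) -> list:
    """Return the build, tag, and push commands as argument lists."""
    local_ref = f"{image}:{tag}"
    remote_ref = f"{user}/{image}:{tag}"
    return [
        ["docker", "build", "-t", local_ref, "."],
        ["docker", "tag", local_ref, remote_ref],
        ["docker", "push", remote_ref],
    ]
```

Each entry can be passed to `subprocess.run(cmd, check=True)` from the repository root. Returning argument lists instead of shell strings avoids quoting issues and lets you print the commands for review before running them.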

---

## ✨ Advanced: Optimization for Cloud

### A. Pre-download Models in Dockerfile

To avoid long first-request delays in the cloud, add the following to the Dockerfile:

```dockerfile
# After RUN pip install -r requirements.txt

# Pre-download embedding model
RUN python3 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')"

# Pre-download sentiment model
RUN python3 -c "from transformers import pipeline; pipeline('sentiment-analysis', model='nlptown/bert-base-multilingual-uncased-sentiment')"
```

This adds ~2GB to the image but eliminates the model download on the first request.

### B. Use GPU for Faster Embeddings

```dockerfile
# Install GPU support
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
RUN pip install faiss-gpu
```

Then in Runpod, select a GPU pod (more expensive but faster).

### C. Enable Caching

Add to `app/config.py`:
```python
EMBEDDING_CACHE_SIZE = 10000  # Cache more embeddings
INDEX_RELOAD_INTERVAL = 3600  # Reload index hourly
```
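
This guide does not show how the application consumes these settings. As an illustration only, a value like `EMBEDDING_CACHE_SIZE` could bound an in-memory cache so that repeated queries skip the model call. In the sketch below, `compute_embedding` is a stand-in for the real model and `cached_embedding` is a hypothetical wrapper.

```python
from functools import lru_cache

EMBEDDING_CACHE_SIZE = 10000  # same value as in the config snippet above


def compute_embedding(text: str) -> tuple:
    """Placeholder for the real model call (hypothetical)."""
    return (float(len(text)),)


@lru_cache(maxsize=EMBEDDING_CACHE_SIZE)
def cached_embedding(text: str) -> tuple:
    """Return the embedding for `text`, computing it only on a cache miss."""
    return compute_embedding(text)
```

`lru_cache` requires hashable arguments and works best with immutable return values, which is why the placeholder returns a tuple. `cached_embedding.cache_info()` reports hits and misses, which helps when judging whether a larger cache would pay off.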

---

## 🐛 Troubleshooting

### Problem: Pod won't start
```
Error: Container failed to start
```
**Fix:** Check Dockerfile syntax and ensure image builds locally first.

### Problem: Out of memory
```
OOMKilled or similar
```
**Fix:** Increase allocated memory in pod settings (go from 4GB to 8GB).

### Problem: Slow responses
```
Queries taking >10 seconds
```
**Fix:** 
- Add GPU support
- Pre-download models (see optimization section)
- Increase allocated CPU cores

### Problem: Model not found
```
Error: Model 'xyz' not found
```
**Fix:** Add model download to Dockerfile (see optimization section).

### Problem: HTTPS certificate error
```
SSL Certificate verification failed
```
**Fix:** Runpod handles HTTPS certificates automatically, so this should not occur.

---

## 📈 Monitoring & Alerts

### Set Up Alerts (Optional)

1. Go to Runpod **Billing** tab
2. Set max spend limit
3. Enable email alerts

### Check Status

```bash
# Query your endpoint
curl -X POST https://your-endpoint-id.runpod-pods.net/health

# If it fails, pod may be down
# Check Runpod dashboard for status
```

---

## 🔄 Rollback Plan

If deployment has issues:

1. **Keep previous image tagged**
   ```bash
   docker tag galbendavids/feedback-analysis:v1 galbendavids/feedback-analysis:latest-stable
   docker push galbendavids/feedback-analysis:latest-stable
   ```

2. **If new deployment fails, revert**
   - Update Runpod template back to `latest-stable`
   - Restart pod
   - Investigate issue locally

3. **Don't delete old pods immediately**
   - Keep them for at least 1 day
   - Then delete them once the new version is stable

---

## 🎯 Testing Checklist Before Going Live

Before sharing endpoint with users:

- [ ] `/health` endpoint responds
- [ ] `/query` endpoint returns results
- [ ] Hebrew queries work correctly
- [ ] Response times acceptable (<5s for most queries)
- [ ] Error handling working (try invalid JSON)
- [ ] Swagger UI accessible at `/docs`
- [ ] SSL/HTTPS working (URL is secure)
- [ ] Logs show no errors
- [ ] Auto-scaling responding to load

---

## 📋 Production Deployment Checklist

Before announcing to users:

- [ ] Load tested with 100+ concurrent requests
- [ ] Backup plan documented
- [ ] Monitoring alerts set up
- [ ] Support procedure documented
- [ ] SLA defined (99.9% uptime target, etc.)
- [ ] Rate limiting configured (optional)
- [ ] API key authentication enforced (optional)
- [ ] CORS settings reviewed
- [ ] Backup of deployment config saved
- [ ] Runpod support ticket submitted for any questions
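
The load-testing item can be approximated with a thread pool before reaching for a dedicated tool. This is a minimal sketch in which `send` is any zero-argument callable that performs one request and raises on failure; `run_load_test` is a hypothetical helper, and both defaults of 100 come from the checklist item, not from measured capacity.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(send, total_requests=100, concurrency=100):
    """Call `send()` from many threads; return success and failure counts plus durations."""
    def timed_call(_):
        start = time.perf_counter()
        try:
            send()
            ok = True
        except Exception:
            # Any exception from `send` counts as one failed request.
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    durations = [elapsed for _, elapsed in results]
    succeeded = sum(1 for ok, _ in results if ok)
    return {"succeeded": succeeded, "failed": total_requests - succeeded, "durations": durations}
```

Threads suit this job because each request spends most of its time waiting on the network. For sustained or distributed load, a purpose-built load-testing tool is a better fit.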

---

## 📞 Support & Resources

- **Runpod Docs:** https://docs.runpod.io
- **Runpod Community:** https://forums.runpod.io
- **FastAPI Docs:** https://fastapi.tiangolo.com
- **Docker Docs:** https://docs.docker.com

---

## 🎓 What's Next

After successful deployment:

1. **Monitor the endpoint** - Check logs daily
2. **Gather feedback** - What works well, what needs improvement
3. **Iterate** - Make improvements, redeploy
4. **Scale** - Add more features, more data
5. **Secure** - Add authentication, rate limiting as needed

---

## ✅ Congratulations!

Your SQL-based feedback analysis agent is now live in the cloud! 🎉

**Summary:**
- ✅ Local validation complete
- ✅ Docker image built
- ✅ Deployed to Runpod
- ✅ Cloud endpoint tested
- ✅ Ready for production

**Next:** Share the endpoint URL with users or integrate into your application.

---

*Last Updated: Today*  
*Version: 1.0*  
*Status: Production Ready* ✨