
Deployment Guide - Runpod Cloud

After local testing is complete, follow this guide to deploy your Feedback Analysis Agent to Runpod.

✅ Pre-Deployment Checklist

Before deploying to Runpod, ensure:

All local tests pass: python3 scripts/validate_local.py shows 7/7 ✅
API server runs locally: python3 run.py starts without errors
Endpoints tested: Use TESTING_CHECKLIST.md or curl commands
Git repository clean: git status shows no uncommitted changes
All code committed: git log --oneline -5 shows your latest commits
Docker image builds: docker build -t feedback-analysis:latest . succeeds
requirements.txt updated: all dependencies listed
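You can automate the checklist with a short script. The sketch below does not ship with the project. It assumes you run it from the repository root and that `scripts/validate_local.py` exits with a non-zero code when a check fails. The guide only says that script prints 7/7, so the exit-code behavior is an assumption. The script covers the validation, clean-tree, and Docker build items.

```python
"""Hypothetical preflight script that automates the pre-deployment checklist."""
import subprocess
import sys


def run(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run a command and capture its output; a missing executable is reported as exit code 127."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError as exc:
        return subprocess.CompletedProcess(cmd, 127, stdout="", stderr=str(exc))


def git_tree_is_clean(porcelain_output: str) -> bool:
    # `git status --porcelain` prints nothing when there are no uncommitted changes.
    return porcelain_output.strip() == ""


def preflight() -> bool:
    ok = True
    # Uses the current interpreter in place of `python3`.
    # Assumption: validate_local.py exits non-zero when any of its checks fail.
    if run([sys.executable, "scripts/validate_local.py"]).returncode != 0:
        print("FAIL: local validation (scripts/validate_local.py)")
        ok = False
    status = run(["git", "status", "--porcelain"])
    if status.returncode != 0 or not git_tree_is_clean(status.stdout):
        print("FAIL: uncommitted changes, or not inside a git repository")
        ok = False
    if run(["docker", "build", "-t", "feedback-analysis:latest", "."]).returncode != 0:
        print("FAIL: docker build")
        ok = False
    return ok


if __name__ == "__main__":
    sys.exit(0 if preflight() else 1)
```

The script exits with status 1 if any check fails, so you can also use it to gate a CI job.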

📦 Step 1: Prepare Docker Image

1.1 Build Docker Image Locally

cd /path/to/your/project   # repository root (the directory containing the Dockerfile)

# Build the image
docker build -t feedback-analysis:latest .

# Verify it built
docker images | grep feedback-analysis

Expected output (image ID, creation time, and size will differ):

REPOSITORY           TAG      IMAGE ID      CREATED        SIZE
feedback-analysis    latest   abc123def456  2 minutes ago   2.5GB

1.2 Test Docker Image Locally (Optional)

# Run container
docker run -p 8001:8000 feedback-analysis:latest

# In another terminal, test
curl -X POST http://localhost:8001/health

Expected: {"status":"ok"}
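A freshly started container may still be loading models, so a single curl can fail even when the image is fine. The sketch below polls /health until it returns {"status":"ok"}. It sends POST to match the curl examples in this guide. The retry count and delay are arbitrary defaults.

```python
"""Poll /health until the container reports {"status": "ok"} or we give up."""
import json
import time
import urllib.request
from typing import Callable


def fetch_health(base_url: str, timeout: float = 5.0) -> str:
    # The guide's curl examples call /health with POST; use method="GET"
    # instead if that is how your route is registered.
    req = urllib.request.Request(base_url.rstrip("/") + "/health", data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def is_healthy(body: str) -> bool:
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON at all, or JSON that is not an object.
        return False


def wait_until_healthy(base_url: str, attempts: int = 30, delay: float = 2.0,
                       fetch: Callable[[str], str] = fetch_health) -> bool:
    for _ in range(attempts):
        try:
            if is_healthy(fetch(base_url)):
                return True
        except OSError:
            pass  # connection refused, HTTP error, or timeout: the app may still be starting
        time.sleep(delay)
    return False


if __name__ == "__main__":
    # Port 8001 matches the `docker run -p 8001:8000` mapping above.
    print("healthy" if wait_until_healthy("http://localhost:8001") else "not healthy")
```

After deployment, pass your cloud URL instead of localhost to reuse the same check.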

🔑 Step 2: Set Up Docker Registry

Option A: Docker Hub (Easiest)

2A.1 Create Docker Hub Account

Go to https://hub.docker.com
Sign up for a free account
Note your username (e.g., galbendavids)

2A.2 Login to Docker

docker login
# Enter your Docker Hub username and password (or a personal access token)

2A.3 Tag and Push Image

# Tag with your Docker Hub username
docker tag feedback-analysis:latest galbendavids/feedback-analysis:latest

# Push to Docker Hub
docker push galbendavids/feedback-analysis:latest

# Verify it's uploaded
# Visit https://hub.docker.com/r/YOUR_USERNAME/feedback-analysis

Option B: Private Registry (Advanced)

Use AWS ECR, Google Artifact Registry, or Azure Container Registry
Follow their documentation for authentication and push

🚀 Step 3: Create Runpod Template

3.1 Access Runpod Console

Go to https://www.runpod.io
Sign in to your account (create one if needed)
Click "Console" in top menu
Go to "Serverless" or "Pods" section

3.2 Create New Template

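For Serverless Endpoints (Recommended):

Note: on Runpod's standard (queue-based) Serverless endpoints, requests do not arrive as HTTP calls to port 8000. They come through the /run API and are passed to a handler function registered with the Runpod Python SDK, so the image needs such a handler. The direct curl examples in Step 4 apply to Pods.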

Click "Create New" → "API Endpoint Template"
Fill in:
- Template Name: feedback-analysis-sql
- Docker Image: galbendavids/feedback-analysis:latest
- Ports: 8000
- GPU: None (CPU-only is fine)
- Memory: 4GB minimum
- Environment Variables (both optional; needed only for LLM summaries):
```
GEMINI_API_KEY=your_key_here
OPENAI_API_KEY=sk-...
```
Click "Save Template"
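If you go the Serverless route, the image needs an entry point that hands jobs to your query logic. Runpod's Python SDK registers one with `runpod.serverless.start`. A sketch follows. `answer_query` is a hypothetical stand-in for whatever function the app's /query route calls. The `{"query", "top_k"}` input shape is an assumption that mirrors the /query request body used later in this guide.

```python
"""Hypothetical Runpod Serverless entry point (e.g. handler.py) for the feedback agent."""


def answer_query(query: str, top_k: int) -> dict:
    # Placeholder: replace with a call into the app's real query logic.
    return {"query": query, "summary": "", "results": []}


def handler(job: dict) -> dict:
    # Runpod delivers the request's "input" object under job["input"].
    job_input = job.get("input")
    if not isinstance(job_input, dict):
        return {"error": "missing 'input' object"}
    query = job_input.get("query")
    if not isinstance(query, str) or not query.strip():
        return {"error": "input.query must be a non-empty string"}
    top_k = job_input.get("top_k", 5)
    if not isinstance(top_k, int) or top_k < 1:
        return {"error": "input.top_k must be a positive integer"}
    return answer_query(query, top_k)


if __name__ == "__main__":
    # Imported here so the handler can be unit-tested without the SDK installed (pip install runpod).
    import runpod

    runpod.serverless.start({"handler": handler})
```

Keeping the validation inside `handler` means you can test the function directly with plain dicts before building the image.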

For Pods (Traditional VM):

Click "Create" → "New Pod"
Select your template
Choose the instance type: a CPU pod is enough (a GPU is not needed for this workload)
Note: auto-scaling settings apply to Serverless endpoints; a Pod is a single instance
Click "Run Pod"

3.3 Configure Networking

Expose Port: 8000
HTTPS: Enabled automatically
Public URL: Runpod generates automatically

🧪 Step 4: Test Deployed Endpoint

4.1 Get Endpoint URL

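After deployment, the Runpod console shows your endpoint's URL. This guide uses the following placeholder for a Pod; replace it with your own URL everywhere below:

https://your-endpoint-id.runpod-pods.net/

(Pods exposed through Runpod's HTTP proxy typically get a URL of the form https://POD_ID-8000.proxy.runpod.net/, where 8000 is the exposed port.)

Or for Serverless:

https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/run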

4.2 Test Basic Connectivity

# For Pods (direct connection)
curl -X POST https://your-endpoint-id.runpod-pods.net/health

# For Serverless: POST a JSON body of the form {"input": {...}} with your
# Runpod API key in the Authorization header (see the Runpod API documentation)
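For Serverless, you send requests to Runpod's API rather than to your container. The body wraps your parameters in an "input" object, and your Runpod API key goes in the Authorization header. Below is a standard-library sketch. It assumes the worker's handler accepts the same {"query", "top_k"} fields as the /query route; the guide does not specify the handler's input format. RUNPOD_ENDPOINT_ID and RUNPOD_API_KEY are hypothetical environment variable names.

```python
"""Build and send a synchronous request to a Runpod Serverless endpoint."""
import json
import os
import urllib.request


def build_runsync_request(endpoint_id: str, api_key: str, payload: dict) -> urllib.request.Request:
    # /runsync waits for the result; /run returns a job ID you poll via /status/<job_id>.
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    body = json.dumps({"input": payload}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_runsync_request(
        os.environ["RUNPOD_ENDPOINT_ID"],  # hypothetical variable names
        os.environ["RUNPOD_API_KEY"],
        {"query": "כמה משתמשים כתבו תודה", "top_k": 5},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        print(json.loads(resp.read().decode("utf-8")))
```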

Expected response:

{"status":"ok"}

4.3 Test Query Endpoint

curl -X POST https://your-endpoint-id.runpod-pods.net/query \
  -H "Content-Type: application/json" \
  -d '{"query":"כמה משתמשים כתבו תודה","top_k":5}'

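The query is Hebrew for "How many users wrote thank you?". Expected response (the summary reads "1168 feedback entries contain expressions of thanks."; the count depends on your data):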

{
  "query": "כמה משתמשים כתבו תודה",
  "summary": "1168 משובים מכילים ביטויי תודה.",
  "results": [...]
}
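The same request from Python is useful when you integrate the endpoint into another service. Below is a minimal standard-library sketch. The base URL is the guide's placeholder, and the response fields are the ones shown above.

```python
"""Call the deployed /query endpoint from Python."""
import json
import urllib.request

BASE_URL = "https://your-endpoint-id.runpod-pods.net"  # placeholder: use your own URL


def build_query_request(base_url: str, query: str, top_k: int = 5) -> urllib.request.Request:
    # ensure_ascii=False keeps Hebrew as raw UTF-8 instead of \u escapes.
    body = json.dumps({"query": query, "top_k": top_k}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, query: str, top_k: int = 5) -> dict:
    with urllib.request.urlopen(build_query_request(base_url, query, top_k), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    result = ask(BASE_URL, "כמה משתמשים כתבו תודה")
    print(result["summary"])
```

Raw UTF-8 and \u-escaped Hebrew are both valid JSON, so the server decodes either. The choice only affects how readable logged payloads are.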

4.4 Test All Endpoints

Use the same curl commands from TESTING_CHECKLIST.md, but replace:

http://localhost:8000 → https://your-endpoint-id.runpod-pods.net

Or use Swagger UI:

https://your-endpoint-id.runpod-pods.net/docs
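Instead of editing each command by hand, you can generate a cloud copy of the checklist with a single substitution. In the sketch below, the output file name is arbitrary and the cloud URL is the guide's placeholder.

```python
"""Write a copy of TESTING_CHECKLIST.md whose commands target the cloud endpoint."""
from pathlib import Path

LOCAL = "http://localhost:8000"
CLOUD = "https://your-endpoint-id.runpod-pods.net"  # placeholder: use your own URL


def to_cloud(text: str, local: str = LOCAL, cloud: str = CLOUD) -> str:
    return text.replace(local, cloud)


if __name__ == "__main__":
    src = Path("TESTING_CHECKLIST.md")
    # Write to a new file so the local checklist stays untouched.
    Path("TESTING_CHECKLIST.cloud.md").write_text(
        to_cloud(src.read_text(encoding="utf-8")), encoding="utf-8"
    )
```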

💰 Step 5: Configure Auto-Scaling (Optional)

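Auto-scaling applies to Serverless endpoints (a Pod is a single instance that runs until you stop it). In the endpoint settings:

Minimum (active) workers: 0, so no worker runs while the endpoint is idle
Maximum workers: 1 (raise it if you need more concurrency)
Idle timeout: 300 seconds (5 minutes) before an idle worker shuts down
GPU: none needed (CPU workers are enough for this workload)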

🔐 Step 6: Add API Keys (Optional)

If you want LLM-generated summaries (optional; the system works without them):

6.1 In Runpod Dashboard

Go to Pod settings

Add Environment Variables:

GEMINI_API_KEY=your_actual_key
OPENAI_API_KEY=sk-your_actual_key

Restart pod
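The guide does not show how the app decides whether to call an LLM. A plausible pattern is to check which key is set at startup and fall back to non-LLM summaries when neither is. This is purely an assumption. `summary_provider` is a hypothetical name, and preferring Gemini over OpenAI is an arbitrary choice.

```python
"""Hypothetical startup check: which LLM provider, if any, is configured."""
import os
from typing import Mapping, Optional


def summary_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    # Treat empty or whitespace-only values as "not set".
    if env.get("GEMINI_API_KEY", "").strip():
        return "gemini"
    if env.get("OPENAI_API_KEY", "").strip():
        return "openai"
    return None  # no key: fall back to non-LLM summaries


if __name__ == "__main__":
    print(summary_provider() or "LLM summaries disabled")
```

Environment variables are read when the process starts, which is why the pod must be restarted after you add them.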

6.2 Get API Keys

For Google Gemini:

Go to Google AI Studio: https://aistudio.google.com/app/apikey
Click "Create API Key"
Copy the key

For OpenAI:

Go to https://platform.openai.com/api-keys
Create new secret key
Copy the key

📊 Step 7: Monitor & Manage

7.1 Check Logs

In Runpod dashboard:

Click on your pod/endpoint
View Logs tab
Look for errors or warnings

7.2 Performance Metrics

Monitor:

CPU usage: Should be <50% at rest
Memory: Should be <80% usage
Response times: 1-3 seconds for the /query endpoint
Uptime: Should be 99%+
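To check the 1-3 second target, time a batch of /query calls and look at percentiles rather than a single request. In this minimal sketch, the sample query, request count, and placeholder URL are assumptions. The percentile uses the nearest-rank method.

```python
"""Time repeated /query calls and report nearest-rank percentiles."""
import math
import time
from typing import Callable, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    # Nearest rank: the smallest sample with at least p% of samples at or below it.
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def time_calls(call: Callable[[], object], n: int = 20) -> list[float]:
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    import json
    import urllib.request

    URL = "https://your-endpoint-id.runpod-pods.net/query"  # placeholder
    BODY = json.dumps({"query": "כמה משתמשים כתבו תודה", "top_k": 5}).encode("utf-8")

    def one_query() -> None:
        req = urllib.request.Request(URL, data=BODY, method="POST",
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            resp.read()

    latencies = time_calls(one_query, n=20)
    print(f"p50={percentile(latencies, 50):.2f}s p95={percentile(latencies, 95):.2f}s")
```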

7.3 Scale & Pricing

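Auto-scaling: On Serverless endpoints, Runpod adds and removes workers based on demand
Costs: Typically $0.25-$0.50/hour for a 4GB CPU-only pod (check current Runpod pricing; rates change)
Savings: A Serverless endpoint with zero minimum workers incurs no compute charge while idle. A Pod is billed while it runs, so stop it when not in use (stopped pods may still incur storage charges)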
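To turn an hourly rate into a monthly budget, multiply it by the hours the instance actually runs. The rates below are the guide's $0.25-$0.50/hour range. The 8 hours/day, 22 days/month schedule is an assumed example.

```python
"""Rough monthly compute cost from an hourly rate and a usage schedule."""


def monthly_cost(rate_per_hour: float, hours_per_day: float, days_per_month: int) -> float:
    # Compute time only; storage may be billed separately.
    return rate_per_hour * hours_per_day * days_per_month


if __name__ == "__main__":
    for rate in (0.25, 0.50):
        print(f"${rate:.2f}/h x 8 h/day x 22 days = ${monthly_cost(rate, 8, 22):.2f}/month")
```

This prints $44.00 and $88.00 per month. At the same rates, an instance left running 24/7 for 30 days costs $180-$360. That gap is why stopping idle pods, or using an endpoint that scales to zero, matters.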

🔄 Step 8: Update Deployment

When You Update Code

Make changes locally

# Edit code, test locally
git add .
git commit -m "feat: new feature"
git push origin main

Rebuild Docker image

docker build -t feedback-analysis:v2 .
docker tag feedback-analysis:v2 galbendavids/feedback-analysis:v2
docker push galbendavids/feedback-analysis:v2

Update the Runpod template
- Edit the template image: galbendavids/feedback-analysis:v2
- Save
- Restart the pod with the new image
Or redeploy
- Delete the old pod (see the Rollback Plan below before deleting)
- Create a new pod from the updated template
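You can wrap the rebuild-tag-push sequence in a small script so every release runs the same commands. This sketch shells out to the docker CLI with exactly the commands shown above. The default user and image name come from this guide.

```python
"""Hypothetical release helper: build, tag, and push a versioned image."""
import subprocess
import sys


def release_commands(version: str, user: str = "galbendavids",
                     name: str = "feedback-analysis") -> list[list[str]]:
    local = f"{name}:{version}"
    remote = f"{user}/{name}:{version}"
    return [
        ["docker", "build", "-t", local, "."],
        ["docker", "tag", local, remote],
        ["docker", "push", remote],
    ]


def release(version: str) -> None:
    for cmd in release_commands(version):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError, stopping at the first failing step


if __name__ == "__main__":
    release(sys.argv[1] if len(sys.argv) > 1 else "v2")
```

An explicit version tag, rather than reusing `latest`, makes it unambiguous which image the template points at. It also keeps older tags available for a rollback.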

✨ Advanced: Optimization for Cloud

A. Pre-download Models in Dockerfile

To avoid long first-request delays in the cloud, add these lines to the Dockerfile:

# Place after: RUN pip install -r requirements.txt

# Pre-download embedding model
RUN python3 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')"

# Pre-download sentiment model
RUN python3 -c "from transformers import pipeline; pipeline('sentiment-analysis', model='nlptown/bert-base-multilingual-uncased-sentiment')"

This adds ~2GB to the image but eliminates the download on the first request. The models are saved to the Hugging Face cache of the user who runs the build, so run the container as the same user (or set the same cache directory) for the cached copies to be found.

B. Use GPU for Faster Embeddings

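# Install GPU support
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
RUN pip install faiss-gpu

Then in Runpod, select a GPU pod (more expensive but faster). Note that the FAISS project distributes its official GPU builds through conda. If you install faiss-gpu from PyPI, check that a wheel exists for your Python and CUDA versions.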
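sentence-transformers uses a CUDA device automatically when one is available. Passing the device explicitly still helps, because it makes the choice visible and lets the same code run on both CPU and GPU pods. The sketch below assumes the app loads the same SentenceTransformer model named in the pre-download step. The guide does not show where the app constructs it.

```python
"""Pick the GPU when PyTorch can see one, otherwise fall back to CPU."""


def pick_device() -> str:
    try:
        import torch
    except ImportError:
        return "cpu"  # torch not installed: treat as a CPU-only environment
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2", device=pick_device())
    print(model.device)
```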

C. Enable Caching

Add to app/config.py:

EMBEDDING_CACHE_SIZE = 10000  # Cache more embeddings
INDEX_RELOAD_INTERVAL = 3600  # Reload index hourly
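Constants in app/config.py only change behavior if the code reads them. One possible wiring uses an LRU cache sized by EMBEDDING_CACHE_SIZE plus a timestamp check for INDEX_RELOAD_INTERVAL. It assumes a hypothetical `embed_text` function and index loader, neither of which is shown in this guide. `ReloadingIndex` is likewise a hypothetical name.

```python
"""Sketch of how EMBEDDING_CACHE_SIZE and INDEX_RELOAD_INTERVAL could be used."""
import time
from functools import lru_cache
from typing import Callable

EMBEDDING_CACHE_SIZE = 10000
INDEX_RELOAD_INTERVAL = 3600  # seconds


def embed_text(text: str) -> tuple[float, ...]:
    # Placeholder for the real model call; a tuple keeps cached results immutable.
    return (float(len(text)),)


@lru_cache(maxsize=EMBEDDING_CACHE_SIZE)
def cached_embedding(text: str) -> tuple[float, ...]:
    return embed_text(text)


class ReloadingIndex:
    """Reloads the index on access once it is older than `interval` seconds."""

    def __init__(self, load: Callable[[], object], interval: float = INDEX_RELOAD_INTERVAL,
                 clock: Callable[[], float] = time.monotonic):
        self._load, self._interval, self._clock = load, interval, clock
        self._index = load()
        self._loaded_at = clock()

    def get(self) -> object:
        if self._clock() - self._loaded_at >= self._interval:
            self._index = self._load()
            self._loaded_at = self._clock()
        return self._index
```

Reloading lazily on access avoids a background thread. The cost is that the first request after the interval pays the reload time.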

🐛 Troubleshooting

Problem: Pod won't start

Error: Container failed to start

Fix: Check Dockerfile syntax and ensure image builds locally first.

Problem: Out of memory

OOMKilled or similar

Fix: Increase allocated memory in pod settings (go from 4GB to 8GB).

Problem: Slow responses

Queries taking >10 seconds

Fix:

Add GPU support
Pre-download models (see optimization section)
Increase allocated CPU cores

Problem: Model not found

Error: Model 'xyz' not found

Fix: Add model download to Dockerfile (see optimization section).

Problem: HTTPS certificate error

SSL Certificate verification failed

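Fix: Runpod provisions the certificate automatically, so this should not normally occur. If it does, make sure you are calling the exact https:// URL shown in the Runpod console.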

📈 Monitoring & Alerts

Set Up Alerts (Optional)

Go to Runpod Billing tab
Set max spend limit
Enable email alerts

Check Status

# Query your endpoint
curl -X POST https://your-endpoint-id.runpod-pods.net/health

# If it fails, pod may be down
# Check Runpod dashboard for status

🔄 Rollback Plan

If deployment has issues:

Keep previous image tagged

docker tag galbendavids/feedback-analysis:v1 galbendavids/feedback-analysis:latest-stable
docker push galbendavids/feedback-analysis:latest-stable

If new deployment fails, revert
- Update Runpod template back to latest-stable
- Restart pod
- Investigate issue locally
Don't delete old pods immediately
- Keep them for at least 1 day
- Then delete them once the new version is stable

🎯 Testing Checklist Before Going Live

Before sharing endpoint with users:

/health endpoint responds
/query endpoint returns results
Hebrew queries work correctly
Response times acceptable (<5s for most queries)
Error handling working (try invalid JSON)
Swagger UI accessible at /docs
SSL/HTTPS working (URL is secure)
Logs show no errors
Auto-scaling responding to load
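You can script several items in this checklist. The sketch below runs them against a base URL. The expected status codes are assumptions: FastAPI serves /docs by default and answers malformed JSON with a 422, so adjust the codes to what your app actually returns. The transport is injectable, so you can exercise the checks without a network.

```python
"""Run the scriptable go-live checks against a deployed endpoint."""
import json
import urllib.error
import urllib.request
from typing import Callable, Optional

Send = Callable[[str, str, Optional[bytes]], int]  # (method, url, body) -> HTTP status


def http_send(method: str, url: str, body: Optional[bytes]) -> int:
    req = urllib.request.Request(url, data=body, method=method,
                                 headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code


def smoke_test(base_url: str, send: Send = http_send) -> dict[str, bool]:
    base = base_url.rstrip("/")
    hebrew = json.dumps({"query": "כמה משתמשים כתבו תודה", "top_k": 5}).encode("utf-8")
    return {
        "health": send("POST", f"{base}/health", b"") == 200,
        "hebrew query": send("POST", f"{base}/query", hebrew) == 200,
        "invalid JSON rejected": 400 <= send("POST", f"{base}/query", b"{not json") < 500,
        "swagger docs": send("GET", f"{base}/docs", None) == 200,
    }


if __name__ == "__main__":
    for check, passed in smoke_test("https://your-endpoint-id.runpod-pods.net").items():
        print("PASS" if passed else "FAIL", check)
```

This script does not cover response quality, logs, or auto-scaling; check those by hand.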

📋 Production Deployment Checklist

Before announcing to users:

Load tested with 100+ concurrent requests
Backup plan documented
Monitoring alerts set up
Support procedure documented
SLA defined (99.9% uptime target, etc.)
Rate limiting configured (optional)
API key authentication enforced (optional)
CORS settings reviewed
Backup of deployment config saved
Open questions raised with Runpod support (via a support ticket)
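For the "100+ concurrent requests" item, a thread pool gives a rough check; a dedicated load-testing tool gives better numbers. The sketch below takes a pluggable request function. The 100-request total comes from the checklist, and the worker count and placeholder URL are arbitrary. The test generates real, billed traffic, so run it against a staging deployment if you can.

```python
"""Rough concurrency check: fire N requests across a thread pool."""
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def load_test(request: Callable[[], int], total: int = 100, workers: int = 20) -> dict:
    """`request` performs one call and returns its HTTP status code."""

    def timed() -> tuple[int, float]:
        start = time.perf_counter()
        try:
            status = request()
        except Exception:
            status = 0  # any exception (network error, timeout, HTTP error) counts as a failure
        return status, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: timed(), range(total)))
    succeeded = sum(1 for status, _ in results if 200 <= status < 300)
    return {
        "requests": total,
        "succeeded": succeeded,
        "failed": total - succeeded,
        "max_latency_s": max((t for _, t in results), default=0.0),
    }


if __name__ == "__main__":
    import urllib.request

    URL = "https://your-endpoint-id.runpod-pods.net/health"  # placeholder

    def hit() -> int:
        req = urllib.request.Request(URL, data=b"", method="POST")
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status

    print(load_test(hit, total=100, workers=20))
```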

📞 Support & Resources

Runpod Docs: https://docs.runpod.io
Runpod Community: https://forums.runpod.io
FastAPI Docs: https://fastapi.tiangolo.com
Docker Docs: https://docs.docker.com

🎓 What's Next

After successful deployment:

Monitor the endpoint - Check logs daily
Gather feedback - What works well, what needs improvement
Iterate - Make improvements, redeploy
Scale - Add more features, more data
Secure - Add authentication, rate limiting as needed

✅ Congratulations!

Your SQL-based feedback analysis agent is now live in the cloud! 🎉

Summary:

✅ Local validation complete
✅ Docker image built
✅ Deployed to Runpod
✅ Cloud endpoint tested
✅ Ready for production

Next: Share the endpoint URL with users or integrate into your application.

Last Updated: Today
Version: 1.0
Status: Production Ready ✨