
Deployment Guide - Runpod Cloud

After local testing is complete, follow this guide to deploy your Feedback Analysis Agent to Runpod.


✅ Pre-Deployment Checklist

Before deploying to Runpod, ensure:

  • All local tests pass: python3 scripts/validate_local.py shows 7/7 ✅
  • API server runs locally: python3 run.py starts without errors
  • Endpoints tested: Use TESTING_CHECKLIST.md or curl commands
  • Git repository clean: git status shows no uncommitted changes
  • All code committed: git log --oneline | head -5 shows your commits
  • Docker image builds: docker build -t feedback-analysis:latest . succeeds
  • requirements.txt updated: all dependencies are listed
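The "Git repository clean" item can also be checked from a script. The following is a minimal sketch, assuming git is on the PATH; the helper names are hypothetical and not part of the project. It relies on `git status --porcelain`, which prints nothing when the working tree is clean.

```python
import subprocess


def is_worktree_clean(porcelain_output: str) -> bool:
    """Return True when `git status --porcelain` printed nothing."""
    return porcelain_output.strip() == ""


def uncommitted_paths(porcelain_output: str) -> list[str]:
    """Extract paths from porcelain v1 lines such as ' M run.py'."""
    # Each line is two status characters, one space, then the path.
    return [line[3:] for line in porcelain_output.splitlines() if line.strip()]


def git_status_porcelain(repo_dir: str = ".") -> str:
    """Run git in repo_dir and return its machine-readable status output."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `is_worktree_clean(git_status_porcelain())` from the project root gives a yes/no answer that can gate the Docker build.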

📦 Step 1: Prepare Docker Image

1.1 Build Docker Image Locally

cd /Users/galbd/Desktop/personal/software/ai_agent_gov/Feedback_Analysis_RAG_Agent_runpod

# Build the image
docker build -t feedback-analysis:latest .

# Verify it built
docker images | grep feedback-analysis

Expected output:

REPOSITORY           TAG      IMAGE ID      CREATED        SIZE
feedback-analysis    latest   abc123def456  2 minutes ago   2.5GB

1.2 Test Docker Image Locally (Optional)

# Run container
docker run -p 8001:8000 feedback-analysis:latest

# In another terminal, test
curl -X POST http://localhost:8001/health

Expected: {"status":"ok"}
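A freshly started container may need a few seconds before it answers, so a single curl can fail spuriously. Below is a minimal polling sketch, assuming the /health route accepts POST as in the curl command above. `probe_health` and `wait_until_healthy` are hypothetical helper names, not part of the project.

```python
import time
import urllib.request
from typing import Callable


def probe_health(url: str, timeout: float = 5.0) -> bool:
    """Send one POST to the health URL and report whether it answered 200."""
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses, as are socket errors.
        return False


def wait_until_healthy(
    probe: Callable[[], bool], attempts: int = 10, delay: float = 2.0
) -> bool:
    """Call probe until it succeeds or the attempts are used up."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # give the server time to finish starting
    return False
```

For the local container this would be called as `wait_until_healthy(lambda: probe_health("http://localhost:8001/health"))`.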


🔑 Step 2: Set Up Docker Registry

Option A: Docker Hub (Easiest)

2A.1 Create Docker Hub Account

2A.2 Login to Docker

docker login
# Enter your Docker Hub username and password

2A.3 Tag and Push Image

# Tag with your Docker Hub username (galbendavids is the example username used in this guide)
docker tag feedback-analysis:latest galbendavids/feedback-analysis:latest

# Push to Docker Hub
docker push galbendavids/feedback-analysis:latest

# Verify it's uploaded
# Visit https://hub.docker.com/r/YOUR_USERNAME/feedback-analysis
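If the tag and push steps are scripted, the image reference can be assembled from its parts so the username is substituted in one place. This is a small sketch with hypothetical helper names. It only builds the command lists and leaves running them (for example with `subprocess.run`) to the caller.

```python
def image_reference(username: str, repository: str, tag: str = "latest") -> str:
    """Build a Docker Hub reference of the form username/repository:tag."""
    if not username or not repository or not tag:
        raise ValueError("username, repository and tag must all be non-empty")
    return f"{username}/{repository}:{tag}"


def tag_and_push_commands(local_image: str, remote_image: str) -> list[list[str]]:
    """Return the docker tag and docker push commands as argument lists."""
    return [
        ["docker", "tag", local_image, remote_image],
        ["docker", "push", remote_image],
    ]
```

With the values from this guide, `image_reference("galbendavids", "feedback-analysis")` yields `galbendavids/feedback-analysis:latest`.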

Option B: Private Registry (Advanced)

  • Use AWS ECR, Google Artifact Registry (the successor to Google Container Registry), or Azure Container Registry
  • Follow their documentation for authentication and push

🚀 Step 3: Create Runpod Template

3.1 Access Runpod Console

  1. Go to https://www.runpod.io
  2. Sign in to your account (create if needed)
  3. Click "Console" in top menu
  4. Go to "Serverless" or "Pods" section

3.2 Create New Template

For Serverless Endpoints (Recommended):

  1. Click "Create New" → "API Endpoint Template"

  2. Fill in:

    • Template Name: feedback-analysis-sql
    • Docker Image: galbendavids/feedback-analysis:latest
    • Ports: 8000
    • GPU: None (CPU-only is fine)
    • Memory: 4GB minimum
    • Environment Variables:
      GEMINI_API_KEY=your_key_here (optional)
      OPENAI_API_KEY=sk-... (optional)
      
  3. Click "Save Template"

For Pods (Traditional VM):

  1. Click "Create" → "New Pod"
  2. Select template
  3. Choose GPU type (optional, not needed for this workload)
  4. Set min/max auto-scale settings
  5. Click "Run Pod"

3.3 Configure Networking

  • Expose Port: 8000
  • HTTPS: Enabled automatically
  • Public URL: Runpod generates automatically

🧪 Step 4: Test Deployed Endpoint

4.1 Get Endpoint URL

After deployment, Runpod provides a URL like:

https://your-endpoint-id.runpod-pods.net/

Or for Serverless:

https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/run

4.2 Test Basic Connectivity

# For Pods (direct connection)
curl -X POST https://your-endpoint-id.runpod-pods.net/health

# For Serverless (requires different format)
# See Runpod API documentation

Expected response:

{"status":"ok"}
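For the Serverless case, Runpod's API expects the job payload wrapped in an `input` field and an API key sent as a bearer token. The sketch below only builds such a request. The run URL is passed in as shown in the Runpod console, and the assumption that the worker accepts the same `query`/`top_k` fields as the /query route is not confirmed by this guide.

```python
import json
import urllib.request


def build_serverless_request(
    run_url: str, api_key: str, payload: dict
) -> urllib.request.Request:
    """Wrap payload in Runpod's {"input": ...} envelope and add the bearer token."""
    body = json.dumps({"input": payload}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        run_url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be sent with `urllib.request.urlopen(request, timeout=30)`. The response then describes a job rather than the `{"status":"ok"}` body shown above.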

4.3 Test Query Endpoint

curl -X POST https://your-endpoint-id.runpod-pods.net/query \
  -H "Content-Type: application/json" \
  -d '{"query":"כמה משתמשים כתבו תודה","top_k":5}'

Expected response:

{
  "query": "כמה משתמשים כתבו תודה",
  "summary": "1168 משובים מכילים ביטויי תודה.",
  "results": [...]
}
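The same query can be sent from Python. Here is a minimal sketch using only the standard library, assuming the request and response fields shown above. The function names are hypothetical. The body is encoded as UTF-8 with `ensure_ascii=False` so the Hebrew text travels as written.

```python
import json
import urllib.request


def build_query_request(
    base_url: str, query: str, top_k: int = 5
) -> urllib.request.Request:
    """Build a POST to <base_url>/query with a UTF-8 JSON body."""
    body = json.dumps({"query": query, "top_k": top_k}, ensure_ascii=False)
    return urllib.request.Request(
        base_url.rstrip("/") + "/query",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_response(raw: bytes) -> tuple[str, int]:
    """Return the summary text and the number of results from a response body."""
    payload = json.loads(raw.decode("utf-8"))
    return payload.get("summary", ""), len(payload.get("results", []))
```

A call would look like `urllib.request.urlopen(build_query_request(base_url, "כמה משתמשים כתבו תודה"))`, with the bytes from `.read()` passed to `summarize_response`.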

4.4 Test All Endpoints

Use the same curl commands from TESTING_CHECKLIST.md, but replace:

  • http://localhost:8000 → https://your-endpoint-id.runpod-pods.net

Or use Swagger UI:

  • https://your-endpoint-id.runpod-pods.net/docs
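When the local test URLs live in a script, the host replacement described above can be done programmatically instead of by hand. This is a small sketch; `retarget` is a hypothetical helper that keeps the path and query string and swaps only the scheme and host.

```python
from urllib.parse import urlsplit, urlunsplit


def retarget(url: str, deployed_base: str) -> str:
    """Swap the scheme and host of a local test URL for the deployed ones."""
    local = urlsplit(url)
    deployed = urlsplit(deployed_base)
    return urlunsplit(
        (deployed.scheme, deployed.netloc, local.path, local.query, local.fragment)
    )
```

For example, `retarget("http://localhost:8000/docs", "https://your-endpoint-id.runpod-pods.net")` produces the Swagger UI address listed above.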

💰 Step 5: Configure Auto-Scaling (Optional)

In Runpod Pod settings:

  1. Minimum GPUs: 0 (not needed)
  2. Maximum GPUs: 1 (if you add GPU support)
  3. Idle timeout: 5 minutes
  4. Auto-pause: Enabled (to save costs)

🔐 Step 6: Add API Keys (Optional)

If you want LLM summaries (optional; the system works without them):

6.1 In Runpod Dashboard

  1. Go to Pod settings
  2. Add Environment Variables:
    GEMINI_API_KEY=your_actual_key
    OPENAI_API_KEY=sk-your_actual_key
    
  3. Restart pod
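Inside the container these variables arrive through the process environment. The sketch below shows one way an application could decide whether LLM summaries are available. The precedence (Gemini first, then OpenAI) and the function name are assumptions for illustration, not the project's actual logic.

```python
import os
from typing import Mapping, Optional


def pick_llm_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return which provider has a key configured, or None if neither does."""
    # Assumed precedence: Gemini is preferred when both keys are present.
    if env.get("GEMINI_API_KEY"):
        return "gemini"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    return None  # no key set: run without LLM summaries
```

Treating an empty string the same as a missing variable avoids enabling a provider when the variable was declared in the template but left blank.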

6.2 Get API Keys

For Google Gemini:

  1. Go to https://makersuite.google.com/app/apikeys
  2. Click "Create API Key"
  3. Copy the key

For OpenAI:

  1. Go to https://platform.openai.com/api-keys
  2. Create new secret key
  3. Copy the key

📊 Step 7: Monitor & Manage

7.1 Check Logs

In Runpod dashboard:

  1. Click on your pod/endpoint
  2. View Logs tab
  3. Look for errors or warnings

7.2 Performance Metrics

Monitor:

  • CPU usage: Should be <50% at rest
  • Memory: Should be <80% usage
  • Response times: Query endpoint 1-3 seconds
  • Uptime: Should be 99%+
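The targets above can be turned into a simple check that flags readings outside the expected range. This is a sketch with a hypothetical function name. How the readings are collected (Runpod dashboard, an exporter, manual entry) is left open.

```python
def check_metrics(
    cpu_percent: float,
    memory_percent: float,
    query_seconds: float,
    uptime_percent: float,
) -> list[str]:
    """Compare readings with the guide's targets and return any warnings."""
    warnings = []
    if cpu_percent >= 50:
        warnings.append(f"CPU at rest is {cpu_percent:.0f}% (target: below 50%)")
    if memory_percent >= 80:
        warnings.append(f"Memory is {memory_percent:.0f}% (target: below 80%)")
    if query_seconds > 3:
        warnings.append(f"Query took {query_seconds:.1f}s (target: 1-3 seconds)")
    if uptime_percent < 99:
        warnings.append(f"Uptime is {uptime_percent:.1f}% (target: 99% or higher)")
    return warnings
```

An empty list means every reading is within its target.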

7.3 Scale & Pricing

  • Auto-scaling: Runpod manages based on demand
  • Costs: Typically $0.25-$0.50/hour for 4GB CPU-only pod
  • Savings: Pod auto-pauses when idle (no charge)
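To turn the hourly figure into a monthly estimate, multiply by the hours the pod is actually running. The sketch below assumes, as stated above, that paused time is not billed and that a month has 30 days. The rates are the guide's typical range, not a quote.

```python
def monthly_cost(
    hourly_rate_usd: float, active_hours_per_day: float, days: int = 30
) -> float:
    """Estimate the monthly cost from an hourly rate and daily active hours."""
    if not 0 <= active_hours_per_day <= 24:
        raise ValueError("active_hours_per_day must be between 0 and 24")
    return hourly_rate_usd * active_hours_per_day * days
```

At $0.25-$0.50/hour an always-on pod comes to $180-$360 per month, while 8 active hours a day at $0.25/hour comes to $60.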

🔄 Step 8: Update Deployment

When You Update Code

  1. Make changes locally

    # Edit code, test locally
    git add .
    git commit -m "feat: new feature"
    git push origin main
    
  2. Rebuild Docker image

    docker build -t feedback-analysis:v2 .
    docker tag feedback-analysis:v2 galbendavids/feedback-analysis:v2
    docker push galbendavids/feedback-analysis:v2
    
  3. Update Runpod template

    • Edit template image: galbendavids/feedback-analysis:v2
    • Save
    • Restart pod with new image
  4. Or redeploy

    • Delete old pod
    • Create new pod from updated template
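If releases follow the v1, v2, ... pattern used above, the next tag can be derived rather than typed by hand. This is a minimal sketch. `next_version_tag` is a hypothetical helper and assumes tags are a literal "v" followed by an integer.

```python
import re


def next_version_tag(current: str) -> str:
    """Return the tag that follows current, e.g. 'v2' after 'v1'."""
    match = re.fullmatch(r"v(\d+)", current)
    if match is None:
        raise ValueError(f"unsupported tag format: {current!r}")
    return f"v{int(match.group(1)) + 1}"
```

Rejecting tags such as `latest` keeps a mutable tag from being mistaken for a numbered release.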

✨ Advanced: Optimization for Cloud

A. Pre-download Models in Dockerfile

To avoid long first-request delays in cloud, add to Dockerfile:

# After the RUN pip install -r requirements.txt step

# Pre-download embedding model
RUN python3 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')"

# Pre-download sentiment model
RUN python3 -c "from transformers import pipeline; pipeline('sentiment-analysis', model='nlptown/bert-base-multilingual-uncased-sentiment')"

This adds ~2GB to image, but eliminates download on first request.

B. Use GPU for Faster Embeddings

# Install GPU support
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
RUN pip install faiss-gpu

Then in Runpod, select a GPU pod (more expensive but faster).

C. Enable Caching

Add to app/config.py:

EMBEDDING_CACHE_SIZE = 10000  # Cache more embeddings
INDEX_RELOAD_INTERVAL = 3600  # Reload index hourly
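How the application consumes these settings is not shown here. As an illustration of what a bounded embedding cache could look like, the sketch below uses `functools.lru_cache` with the size from the config. `_embed_uncached` is a stand-in for the real, expensive embedding call and is purely hypothetical.

```python
from functools import lru_cache

EMBEDDING_CACHE_SIZE = 10000  # same value as in the config above


def _embed_uncached(text: str) -> tuple[float, ...]:
    """Placeholder for an expensive embedding call (hypothetical)."""
    return (float(len(text)), float(sum(map(ord, text)) % 997))


@lru_cache(maxsize=EMBEDDING_CACHE_SIZE)
def embed(text: str) -> tuple[float, ...]:
    """Return the embedding for text, reusing cached results for repeats."""
    return _embed_uncached(text)
```

`embed.cache_info()` reports hits and misses, which helps judge whether the cache size is worth its memory cost.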

🐛 Troubleshooting

Problem: Pod won't start

Error: Container failed to start

Fix: Check Dockerfile syntax and ensure image builds locally first.

Problem: Out of memory

OOMKilled or similar

Fix: Increase the memory allocated in the pod settings (for example, from 4GB to 8GB).

Problem: Slow responses

Queries taking >10 seconds

Fix:

  • Add GPU support
  • Pre-download models (see optimization section)
  • Increase allocated CPU cores

Problem: Model not found

Error: Model 'xyz' not found

Fix: Add model download to Dockerfile (see optimization section).

Problem: HTTPS certificate error

SSL Certificate verification failed

Fix: Runpod manages HTTPS certificates automatically, so this error should not occur.


📈 Monitoring & Alerts

Set Up Alerts (Optional)

  1. Go to Runpod Billing tab
  2. Set max spend limit
  3. Enable email alerts

Check Status

# Query your endpoint
curl -X POST https://your-endpoint-id.runpod-pods.net/health

# If it fails, pod may be down
# Check Runpod dashboard for status

🔄 Rollback Plan

If deployment has issues:

  1. Keep previous image tagged

    docker tag galbendavids/feedback-analysis:v1 galbendavids/feedback-analysis:latest-stable
    docker push galbendavids/feedback-analysis:latest-stable
    
  2. If new deployment fails, revert

    • Update Runpod template back to latest-stable
    • Restart pod
    • Investigate issue locally
  3. Don't delete old pods immediately

    • Keep for at least 1 day
    • Then delete if new version stable

🎯 Testing Checklist Before Going Live

Before sharing endpoint with users:

  • /health endpoint responds
  • /query endpoint returns results
  • Hebrew queries work correctly
  • Response times acceptable (<5s for most queries)
  • Error handling working (try invalid JSON)
  • Swagger UI accessible at /docs
  • SSL/HTTPS working (URL is secure)
  • Logs show no errors
  • Auto-scaling responding to load

📋 Production Deployment Checklist

Before announcing to users:

  • Load tested with 100+ concurrent requests
  • Backup plan documented
  • Monitoring alerts set up
  • Support procedure documented
  • SLA defined (99.9% uptime target, etc.)
  • Rate limiting configured (optional)
  • API key authentication enforced (optional)
  • CORS settings reviewed
  • Backup of deployment config saved
  • Runpod support ticket submitted for any questions
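The load-test item can be approached with a thread pool that fires requests in parallel and records their latencies. This is a minimal sketch with hypothetical names. `send` is any zero-argument callable that performs one request and returns True on success, so the harness itself makes no assumption about the endpoint.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_load_test(
    send: Callable[[], bool], total_requests: int = 100, workers: int = 100
) -> dict:
    """Run send() total_requests times on up to `workers` threads."""
    if total_requests < 1 or workers < 1:
        raise ValueError("total_requests and workers must be at least 1")

    def timed_call(_: int) -> tuple[bool, float]:
        start = time.perf_counter()
        try:
            ok = send()
        except Exception:
            ok = False  # count any raised error as a failed request
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(elapsed for _, elapsed in outcomes)
    # Nearest-rank 95th percentile, computed with integers only.
    p95_index = (95 * len(latencies) + 99) // 100 - 1
    return {
        "requests": total_requests,
        "failures": sum(1 for ok, _ in outcomes if not ok),
        "p95_seconds": latencies[p95_index],
        "max_seconds": latencies[-1],
    }
```

With the defaults, up to 100 requests are in flight at once. Comparing `p95_seconds` with the response-time target gives a quick pass/fail signal.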

📞 Support & Resources


🎓 What's Next

After successful deployment:

  1. Monitor the endpoint - Check logs daily
  2. Gather feedback - What works well, what needs improvement
  3. Iterate - Make improvements, redeploy
  4. Scale - Add more features, more data
  5. Secure - Add authentication, rate limiting as needed

✅ Congratulations!

Your SQL-based feedback analysis agent is now live in the cloud! 🎉

Summary:

  • ✅ Local validation complete
  • ✅ Docker image built
  • ✅ Deployed to Runpod
  • ✅ Cloud endpoint tested
  • ✅ Ready for production

Next: Share the endpoint URL with users or integrate into your application.


Last Updated: Today
Version: 1.0
Status: Production Ready