# πŸš€ Deployment Guide
This guide covers deploying the LLM API Backend to various platforms.
## πŸ“‹ Prerequisites
- Node.js 18+ installed
- Encore CLI installed (`npm install -g encore.dev`)
- Git installed
- API keys for your chosen LLM provider
---
## 🌟 Hugging Face Spaces (Recommended for Demos)
### Step 1: Create a Space
1. Go to https://huggingface.co/spaces
2. Click **"Create new Space"**
3. Settings:
   - **Space name:** `llm-api-backend` (or your choice)
   - **SDK:** Docker
   - **Visibility:** Public or Private
4. Click **Create Space**
### Step 2: Clone and Push
```bash
# Clone your new Space
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
cd YOUR_SPACE_NAME
# Copy all files from this project
# (note: the * glob skips dotfiles; copy any needed hidden files separately)
cp -r /path/to/llm-api-backend/* .
# Commit and push
git add .
git commit -m "Initial deployment"
git push
```
### Step 3: Configure Secrets
1. Go to your Space page
2. Click **Settings** β†’ **Repository secrets**
3. Add the following secrets:
**For Hugging Face Provider:**
```
LLMProvider = huggingface
HuggingFaceAPIKey = hf_your_token_here
DefaultModel = mistralai/Mistral-7B-Instruct-v0.2
```
**For Ollama Provider (requires custom Docker setup):**
```
LLMProvider = ollama
OllamaBaseURL = http://localhost:11434
DefaultModel = llama3
```
### Step 4: Wait for Build
- Hugging Face will automatically build your Docker container
- Watch the build logs in the Space interface
- Once complete, your API is live!
### Step 5: Test Your API
```bash
# Replace with your actual Space URL
export SPACE_URL="https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space"
# Test chat endpoint
curl -X POST "$SPACE_URL/chat" \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!"}'
# Test health endpoint
curl $SPACE_URL/health
```
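Hand-writing the `-d` JSON body breaks as soon as the message itself contains a double quote. A minimal sketch of a safer approach (the `make_chat_payload` helper is our own, not part of the project):

```shell
# Hypothetical helper: build the chat JSON payload, escaping any embedded
# double quotes so the request body stays valid JSON.
make_chat_payload() {
  local msg=${1//\"/\\\"}   # replace " with \" in the message
  printf '{"message": "%s"}' "$msg"
}

make_chat_payload 'Hello!'
```

Then pass it to curl as `-d "$(make_chat_payload 'Say "hi" to the model')"`.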
---
## ☁️ Encore Cloud (Recommended for Production)
### Step 1: Install Encore
```bash
npm install -g encore.dev
```
### Step 2: Create Encore App
```bash
# If starting fresh
encore app create
# Or link existing app
encore app link
```
### Step 3: Set Secrets
```bash
# Choose ONE provider -- running both sets of commands simply overwrites
# LLMProvider and DefaultModel with whichever values were set last.

# For Hugging Face:
encore secret set LLMProvider huggingface
encore secret set HuggingFaceAPIKey hf_your_token_here
encore secret set DefaultModel mistralai/Mistral-7B-Instruct-v0.2

# For Ollama (local development):
encore secret set LLMProvider ollama
encore secret set OllamaBaseURL http://localhost:11434
encore secret set DefaultModel llama3
```
### Step 4: Deploy
```bash
# Deploy to staging
encore deploy
# Deploy to production
encore deploy --env production
```
### Step 5: Access Your API
```bash
# Your API will be available at:
# Staging: https://staging-YOUR_APP.encr.app
# Production: https://prod-YOUR_APP.encr.app
# Test it
curl https://staging-YOUR_APP.encr.app/health
```
---
## 🐳 Docker (Self-Hosted)
### Step 1: Build Image
```bash
docker build -t llm-api-backend .
```
### Step 2: Run Container
**Using Hugging Face:**
```bash
docker run -d \
  -p 7860:7860 \
  -e LLMProvider=huggingface \
  -e HuggingFaceAPIKey=hf_your_token \
  -e DefaultModel=mistralai/Mistral-7B-Instruct-v0.2 \
  --name llm-api \
  llm-api-backend
```
**Using Ollama (with host network):**
```bash
docker run -d \
  --network host \
  -e LLMProvider=ollama \
  -e OllamaBaseURL=http://localhost:11434 \
  -e DefaultModel=llama3 \
  --name llm-api \
  llm-api-backend
```
### Step 3: Test
```bash
curl http://localhost:7860/health
```
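The container can take a few seconds after `docker run` before it answers, so a single `curl` may fail spuriously. A polling sketch (the `wait_for_health` helper and its defaults are our own suggestion):

```shell
# Hypothetical helper: poll the health endpoint until the service responds,
# retrying up to $2 times (default 30) with a 1-second pause between attempts.
wait_for_health() {
  local url=$1 attempts=${2:-30} i
  for ((i = 1; i <= attempts; i++)); do
    if curl -fsS "$url/health" > /dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep 1
  done
  echo "not healthy after $attempts attempts" >&2
  return 1
}

# Usage: wait_for_health http://localhost:7860
```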
---
## πŸ–₯️ VPS / Bare Metal
### Step 1: Install Dependencies
```bash
# Install Node.js 20
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs
# Install Encore CLI
npm install -g encore.dev
```
### Step 2: Clone Repository
```bash
git clone https://github.com/YOUR_USERNAME/llm-api-backend.git
cd llm-api-backend
```
### Step 3: Configure Environment
```bash
# Copy example env file
cp .env.example .env
# Edit with your values
nano .env
```
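Assuming the project's `.env.example` mirrors the secrets used above, the edited file might look like this (all values are placeholders; keep the real file out of git):

```shell
# .env -- placeholder values; substitute your real token
LLMProvider=huggingface
HuggingFaceAPIKey=hf_your_token_here
DefaultModel=mistralai/Mistral-7B-Instruct-v0.2
```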
### Step 4: Set Encore Secrets
```bash
encore secret set LLMProvider huggingface
encore secret set HuggingFaceAPIKey hf_your_token
encore secret set DefaultModel mistralai/Mistral-7B-Instruct-v0.2
```
### Step 5: Run with PM2 (Production)
```bash
# Install PM2
npm install -g pm2
# Start application
pm2 start "encore run --port 8080" --name llm-api
# Save PM2 configuration
pm2 save
# Enable startup on boot
pm2 startup
```
### Step 6: Configure Nginx (Optional)
```nginx
server {
listen 80;
server_name your-domain.com;
location / {
proxy_pass http://localhost:8080;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_set_header Host $host;
proxy_cache_bypass $http_upgrade;
}
}
```
---
## πŸ” Security Checklist
Before deploying to production:
- [ ] All secrets configured properly
- [ ] API keys have appropriate permissions
- [ ] CORS configured for your frontend domains
- [ ] Rate limiting enabled (add middleware)
- [ ] HTTPS enabled (Encore/HF Spaces handle this)
- [ ] Environment variables not committed to git
- [ ] Monitoring and logging set up
- [ ] Error tracking configured
---
## πŸ“Š Monitoring
### Encore Cloud
- Built-in dashboard at https://app.encore.dev
- Real-time traces, logs, and metrics
- Performance monitoring
- Error tracking
### Hugging Face Spaces
- View logs in Space interface
- Use `/health` endpoint for uptime monitoring
- Configure external monitoring tools
### Self-Hosted
- Use `/health` endpoint
- Set up monitoring tools like:
  - Prometheus + Grafana
  - Datadog
  - New Relic
  - Sentry for errors
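Before reaching for a full monitoring stack, a cron-friendly probe against `/health` can feed a simple uptime log. A minimal sketch (the `probe` helper, schedule, and log path are our own suggestions):

```shell
# Hypothetical uptime probe: prints one timestamped up/down line per run.
# From cron, e.g.: */5 * * * * /opt/llm-api/probe.sh >> /var/log/llm-api-uptime.log
probe() {
  local url=$1 status
  if curl -fsS --max-time 10 "$url/health" > /dev/null 2>&1; then
    status=up
  else
    status=down
  fi
  echo "$(date -u +%FT%TZ) $status"
}

# Usage: probe http://localhost:8080
```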
---
## πŸ†˜ Troubleshooting
### Build Failures on HF Spaces
**Issue:** Docker build fails
```bash
# Check Dockerfile syntax
# Ensure all required files are committed
# Check Space build logs
```
### "Secret not set" Errors
**Issue:** Application can't access secrets
```bash
# On Encore: Use 'encore secret set' command
# On HF Spaces: Configure in Space settings
# On Docker: Pass as environment variables (-e flag)
```
### Model Loading Timeout
**Issue:** HF models take too long to load
```bash
# Solution: Wait 30-60 seconds for cold start
# Use smaller models for faster loading
# Check model availability on HF
```
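One way to ride out cold starts in scripts is to retry with exponential backoff instead of waiting a fixed time. A generic sketch (`retry` is our own helper name, not part of the project):

```shell
# Hypothetical retry helper: run a command up to $1 times, doubling the wait
# between attempts (1s, 2s, 4s, ...) to tolerate slow model cold starts.
retry() {
  local max=$1 delay=1 n; shift
  for ((n = 1; n <= max; n++)); do
    "$@" && return 0
    if ((n < max)); then
      sleep "$delay"
      delay=$((delay * 2))
    fi
  done
  return 1
}

# Usage: retry 5 curl -fsS -X POST "$SPACE_URL/chat" \
#   -H "Content-Type: application/json" -d '{"message": "Hello!"}'
```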
### Connection Refused (Ollama)
**Issue:** Can't connect to Ollama
```bash
# Ensure Ollama is running: ollama serve
# Check OllamaBaseURL is correct
# For Docker: Use --network host
```
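The checks above can be wrapped into one probe: Ollama's `GET /api/tags` route lists installed models, so a successful response confirms the daemon is up and the base URL is right (the `check_ollama` helper name is our own):

```shell
# Hypothetical connectivity probe for the Ollama daemon.
check_ollama() {
  local base=${1:-http://localhost:11434}
  if curl -fsS "$base/api/tags" > /dev/null 2>&1; then
    echo "Ollama reachable at $base"
  else
    echo "Ollama NOT reachable at $base" >&2
    return 1
  fi
}

# check_ollama                              # default local daemon
# check_ollama http://other-host:11434      # remote daemon
```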
---
## πŸ”„ Updates and Maintenance
### Updating on Hugging Face Spaces
```bash
git pull origin main
# Make your changes
git add .
git commit -m "Update: description"
git push
```
### Updating on Encore Cloud
```bash
# Make changes
git commit -am "Update: description"
encore deploy
```
### Updating Docker
```bash
# Rebuild image
docker build -t llm-api-backend .
# Stop old container
docker stop llm-api
docker rm llm-api
# Run new container
docker run -d [your flags] llm-api-backend
```
---
## πŸ“š Additional Resources
- [Encore Documentation](https://encore.dev/docs)
- [Hugging Face Spaces Docs](https://huggingface.co/docs/hub/spaces)
- [Docker Documentation](https://docs.docker.com)
- [Ollama Documentation](https://ollama.ai/docs)
---
**Need help?** Open an issue or check the main README.md for support options.