DocGenie Deployment Guide
Complete guide for deploying DocGenie API + Handwriting Service to production with all interdependencies resolved.
System Architecture
+-------------------------------------------------------------+
|                           Client                            |
+------------------------------+------------------------------+
                               |
                               v
+-------------------------------------------------------------+
|                        Railway (CPU)                        |
| +---------------------------------------------------------+ |
| | DocGenie API (Port 8000)                                | |
| | - FastAPI server                                        | |
| | - Imports: docgenie.generation.*                        | |
| | - Endpoints: /generate, /generate/pdf, /generate/async  | |
| +----------------------------+----------------------------+ |
|                              |                              |
| +----------------------------v----------------------------+ |
| | Background Worker                                       | |
| | - RQ worker (Redis Queue)                               | |
| | - ClaudeBatchedClient (50% cost savings)                | |
| | - Imports: docgenie.generation.*                        | |
| +----------------------------+----------------------------+ |
+------------------------------+------------------------------+
                               |
         +---------------------+---------------------+
         |                     |                     |
         v                     v                     v
+-----------------+   +-----------------+   +-----------------+
| Redis (Upstash) |   | Supabase        |   | Google Drive    |
| - Job queue     |   | - PostgreSQL    |   | - File storage  |
| - Free tier     |   | - Document DB   |   | - OAuth 2.0     |
+-----------------+   +-----------------+   +-----------------+
                               |
                               v
+-------------------------------------------------------------+
|                   RunPod Serverless (GPU)                   |
| +---------------------------------------------------------+ |
| | Handwriting Service (Port 8080)                         | |
| | - WordStylist diffusion model                           | |
| | - PyTorch + CUDA 11.8                                   | |
| | - NO docgenie imports (standalone)                      | |
| +---------------------------------------------------------+ |
+-------------------------------------------------------------+
Dependency Resolution
Problem: API imports from docgenie package
Solution: Deploy entire monorepo, install as package with pip install -e .
API Service imports:
# api/worker.py
from docgenie.generation.pipeline_01.claude_batching import ClaudeBatchedClient
from docgenie import ENV
# api/utils.py
from docgenie.generation.constants import BS_PARSER, HANDWRITING_CLASS_NAME
from docgenie.generation.pipeline_01.claude_batching import create_message
from docgenie.generation.pipeline_03_process_response import process_response
from docgenie.generation.pipeline_04_render_pdf_and_extract_geos import render_pdf
Dockerfile solution:
# Copy entire monorepo
COPY . .
# Install as editable package
RUN pip install -e .
# Install API requirements
RUN pip install -r api/requirements.txt
Handwriting Service is Independent
The handwriting service has no docgenie imports, so it can be deployed standalone.
# handwriting_service/main.py - NO docgenie imports
from handwriting_service.inference import HandwritingGenerator
from handwriting_service.models import HandwritingRequest
Pre-Deployment Checklist
1. Environment Variables
Create api/.env with all required variables:
# Claude API
ANTHROPIC_API_KEY=sk-ant-xxxxx
# Redis (will be replaced with Upstash URL)
REDIS_URL=redis://localhost:6379
# Handwriting Service
HANDWRITING_SERVICE_URL=http://localhost:8080
# Supabase
SUPABASE_URL=https://xxxxx.supabase.co
SUPABASE_KEY=eyJxxxxx
# Google Drive (for token refresh only)
# The frontend handles OAuth and sends tokens in API requests
# These credentials are only needed to refresh expired tokens during long jobs
GOOGLE_CLIENT_ID=xxxxx.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=GOCSPX-xxxxx
GOOGLE_DRIVE_FOLDER_NAME=DocGenie Documents
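A missing variable tends to surface only once a job is already running, so it can help to check the environment before starting the API or the worker. The following is a minimal sketch, not part of the DocGenie codebase: the helper name `missing_env_vars` is hypothetical, and the list of names is simply copied from the `api/.env` listing above.

```python
import os

# Variable names copied from the api/.env listing above.
REQUIRED_VARS = [
    "ANTHROPIC_API_KEY",
    "REDIS_URL",
    "HANDWRITING_SERVICE_URL",
    "SUPABASE_URL",
    "SUPABASE_KEY",
    "GOOGLE_CLIENT_ID",
    "GOOGLE_CLIENT_SECRET",
]


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank.

    `env` defaults to os.environ; pass a dict to check a parsed .env file.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not str(env.get(name, "")).strip()]
```

Calling `missing_env_vars()` at startup and aborting when the returned list is non-empty is one way to use it; the function itself only reports and never exits.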
2. Test Locally First
# Terminal 1: Start Redis
docker run -p 6379:6379 redis:7-alpine
# Terminal 2: Start Handwriting Service
cd handwriting_service
DEVICE=cpu uvicorn main:app --port 8080
# Terminal 3: Start API
cd api
source ../.venv/bin/activate
uvicorn main:app --reload --port 8000
# Terminal 4: Start Worker
cd api
source ../.venv/bin/activate
python worker.py
Test endpoints:
# Health check
curl http://localhost:8000/health
# Async generation (uses batched API)
curl -X POST http://localhost:8000/generate/async \
-H "Content-Type: application/json" \
-d '{"template_name": "DocGenie", "num_pages": 2}'
Deployment Steps
Option A: Railway + RunPod (RECOMMENDED - $10/month)
Step 1: Deploy Redis to Upstash (FREE)
- Go to https://upstash.com
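- Create account → New Redis Database
- Copy the Redis connection URL (looks like: rediss://default:xxxxx@xxxxx.upstash.io:6379). This is not UPSTASH_REDIS_REST_URL, which is an https:// address for the REST API and will not work as REDIS_URL.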
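Because the Upstash console shows both a REST URL and a Redis connection URL, it is easy to paste the wrong one into REDIS_URL. A small sanity check along these lines can catch that early; `check_redis_url` is a hypothetical helper written for illustration, and it only inspects the URL string without connecting to Redis.

```python
from urllib.parse import urlparse


def check_redis_url(url):
    """Return a list of problems found in a Redis connection URL (empty if none)."""
    parsed = urlparse(url)
    problems = []
    if parsed.scheme not in ("redis", "rediss"):
        problems.append(
            f"unexpected scheme {parsed.scheme!r}; expected redis:// or rediss://"
        )
    if not parsed.hostname:
        problems.append("missing host")
    if parsed.port is None:
        problems.append("missing port")
    return problems
```

An `https://` REST URL fails the scheme check, while a `redis://` or `rediss://` URL with host and port passes.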
Step 2: Deploy Handwriting Service to RunPod
Option A: Build from Git Repository (RECOMMENDED - No Docker Hub needed!)
This builds directly on RunPod's servers, so you do not have to upload a ~10GB image over your own connection.
- Prepare and push code to Git:
cd /media/ahad-hassan/Volume_E/FYP/FYP/docgenie
# First, prepare optimized WordStylist (removes 432MB of unnecessary files)
cd handwriting_service
./prepare_build.sh
cd ..
# Now commit the optimized WordStylist
git add handwriting_service/
git status # Verify WordStylist is included (should show WordStylist/models/ema_ckpt.pt, etc.)
git commit -m "Add handwriting service with optimized WordStylist"
git push origin main
- Deploy to RunPod:
- Go to https://runpod.io → Serverless → New Endpoint
- Click "Build from Git" (not Docker Image)
- Settings:
  - Name: docgenie-handwriting
  - Git URL: https://github.com/Ahadhassan-2003/FYP.git
  - Git Branch: main
  - Docker Build Context: docgenie/handwriting_service
  - Dockerfile Path: Dockerfile
  - GPU: RTX 4090 or A40
  - Container Disk: 15GB
  - Max Workers: 1
  - Idle Timeout: 5 seconds
  - Exposed Port: 8080
- Environment Variables: DEVICE=cuda PYTHONUNBUFFERED=1
- Build Args (prepare WordStylist): PREPARE_WORDSTYLIST=true
- Click "Deploy"
RunPod will clone your repo and build the image on its own servers.
Option B: Pre-built Docker Image (if Git unavailable)
cd handwriting_service
# Prepare optimized build (removes 432MB)
./prepare_build.sh
# Login to Docker Hub
docker login
# Build image
docker buildx build --platform linux/amd64 \
-t yourusername/docgenie-handwriting:latest \
--build-arg BUILDKIT_INLINE_CACHE=1 \
.
# Push to Docker Hub (may take 20-30 minutes for 10GB)
docker push yourusername/docgenie-handwriting:latest
Then deploy on RunPod:
- Go to https://runpod.io → Serverless → New Endpoint
- Docker Image: yourusername/docgenie-handwriting:latest
- GPU: RTX 4090 or A40
- Port: 8080
- Environment Variables: DEVICE=cuda
Step 3: Deploy API to Railway
- Install Railway CLI:
# Install Railway CLI
npm i -g @railway/cli
# Or use curl
bash <(curl -fsSL cli.new) railway
- Initialize Railway project:
cd /media/ahad-hassan/Volume_E/FYP/FYP/docgenie
# Login to Railway
railway login
# Create new project
railway init
# Link to project (creates railway.json)
railway link
- Set environment variables:
# Set all environment variables from api/.env
railway variables set ANTHROPIC_API_KEY=sk-ant-xxxxx
railway variables set REDIS_URL=redis://default:xxxxx@xxxxx.upstash.io:6379
railway variables set HANDWRITING_SERVICE_URL=https://api.runpod.ai/v2/xxxxx/runsync
railway variables set SUPABASE_URL=https://xxxxx.supabase.co
railway variables set SUPABASE_KEY=eyJxxxxx
# Google OAuth (for token refresh only - frontend provides tokens in requests)
railway variables set GOOGLE_CLIENT_ID=xxxxx.apps.googleusercontent.com
railway variables set GOOGLE_CLIENT_SECRET=GOCSPX-xxxxx
railway variables set GOOGLE_DRIVE_FOLDER_NAME="DocGenie Documents"
Note: Google access/refresh tokens are NOT environment variables! The frontend authenticates with Google OAuth, then passes google_drive_token and google_drive_refresh_token in the API request body. See API request schema.
- Deploy API + Worker:
# Railway will detect Dockerfile and deploy automatically
railway up
# Or connect to GitHub and deploy from there
railway connect
Option 1: Separate Worker Service (For Production Scale):
Note: Only needed if processing 50+ concurrent jobs. For most use cases, Option 2 (combined) is sufficient.
Method A: Connect to Same GitHub Repo (Recommended)
- Go to Railway dashboard → Your project → New Service
- Click "GitHub Repo" → Select your repo
- Name: docgenie-worker
- Settings → Deploy:
  - Builder: DOCKERFILE
  - Dockerfile Path: Dockerfile
  - Root Directory: / (same as API)
  - Custom Start Command: rq worker --url $REDIS_URL
- Variables: Add all environment variables (same as API service)
- Deploy
Method B: Use Same Docker Image as API
- Railway dashboard → New Service → Empty Service
- Name: docgenie-worker
- Settings → Source: Link to API service's image
- Custom Start Command: rq worker --url $REDIS_URL
- Variables: Copy from API service
- Deploy
Option 2: Combined API + Worker (Recommended for Getting Started):
Update railway.json to run both in one service:
{
  "deploy": {
    "startCommand": "uvicorn api.main:app --host 0.0.0.0 --port $PORT & rq worker --url $REDIS_URL & wait"
  }
}
Then push:
git add railway.json
git commit -m "feat: Run API and worker in combined service"
git push
Benefits:
- Single service ($5/month instead of $10/month)
- Simpler logs and monitoring
- Automatic scaling together
- Good for 90% of use cases
Get API URL:
- Railway dashboard → API service → Settings → Domains
- Generate domain (e.g., docgenie-api.up.railway.app)
Step 4: Update Frontend
Update your frontend API URL to Railway domain:
const API_URL = 'https://docgenie-api.up.railway.app';
Option B: AWS EC2 + RunPod (For Production)
Prerequisites
- AWS account with EC2 access
- Domain name (optional, for SSL)
Step 1: Launch EC2 Instance
# Launch t3.medium instance
aws ec2 run-instances \
--image-id ami-0c55b159cbfafe1f0 \
--instance-type t3.medium \
--key-name your-key-pair \
--security-group-ids sg-xxxxx \
--subnet-id subnet-xxxxx
Security Group Rules:
- Port 22 (SSH) - Your IP only
- Port 80 (HTTP) - 0.0.0.0/0
- Port 443 (HTTPS) - 0.0.0.0/0
- Port 8000 (API) - 0.0.0.0/0
Step 2: Setup EC2
# SSH into instance
ssh -i your-key.pem ubuntu@your-ec2-ip
# Update system
sudo apt update && sudo apt upgrade -y
# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu
# Install Docker Compose
sudo apt install docker-compose-plugin -y
# Install Git
sudo apt install git -y
# Clone repository
git clone https://gitlab.cs.hs-rm.de/diss_lamott/docgenie.git
cd docgenie
Step 3: Configure Environment
# Create .env file
cd api
nano .env
# Paste all environment variables
# Save: Ctrl+X, Y, Enter
# Update REDIS_URL to use Upstash
# Update HANDWRITING_SERVICE_URL to RunPod endpoint
Step 4: Deploy with Docker Compose
cd /home/ubuntu/docgenie
# Start services (API + Worker + Redis)
docker compose up -d api worker redis
# Check logs (the docker-compose-plugin installed in Step 2 provides "docker compose", not "docker-compose")
docker compose logs -f api
docker compose logs -f worker
Step 5: Setup Nginx Reverse Proxy
# Install Nginx
sudo apt install nginx -y
# Create config
sudo nano /etc/nginx/sites-available/docgenie
# Paste configuration:
server {
listen 80;
server_name your-domain.com; # Or use EC2 IP
location / {
proxy_pass http://localhost:8000;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_set_header Host $host;
proxy_cache_bypass $http_upgrade;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Increase timeout for long-running requests
proxy_read_timeout 300s;
proxy_connect_timeout 75s;
}
}
# Enable site
sudo ln -s /etc/nginx/sites-available/docgenie /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl restart nginx
# Optional: Setup SSL with Let's Encrypt
sudo apt install certbot python3-certbot-nginx -y
sudo certbot --nginx -d your-domain.com
Step 6: Setup Systemd Service (Auto-restart)
# Create service file
sudo nano /etc/systemd/system/docgenie.service
[Unit]
Description=DocGenie API
After=docker.service
Requires=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/home/ubuntu/docgenie
ExecStart=/usr/bin/docker compose up -d api worker redis
ExecStop=/usr/bin/docker compose down
User=ubuntu
[Install]
WantedBy=multi-user.target
# Enable service
sudo systemctl daemon-reload
sudo systemctl enable docgenie
sudo systemctl start docgenie
# Check status
sudo systemctl status docgenie
Testing Production Deployment
1. Health Check
curl https://your-domain.com/health
2. Sync Generation (Fast)
curl -X POST https://your-domain.com/generate \
-H "Content-Type: application/json" \
-d '{
"template_name": "DocGenie",
"num_pages": 1
}'
3. Async Generation (Batched, Cheap)
# Start async job
RESPONSE=$(curl -s -X POST https://your-domain.com/generate/async \
-H "Content-Type: application/json" \
-d '{
"template_name": "DocGenie",
"num_pages": 2
}')
REQUEST_ID=$(echo "$RESPONSE" | jq -r '.request_id')
echo "Request ID: $REQUEST_ID"
# Poll status
while true; do
STATUS=$(curl -s https://your-domain.com/jobs/$REQUEST_ID/status | jq -r '.status')
echo "Status: $STATUS"
if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ]; then
break
fi
sleep 10
done
# Get result
curl https://your-domain.com/jobs/$REQUEST_ID/status | jq
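The same polling loop can be written in Python when the client is a script rather than a shell. This is a sketch under stated assumptions: it relies only on the `/jobs/<request_id>/status` route and the `completed` / `failed` status values used in the shell example above; the function names, the `max_wait` cap and the injectable `fetch` / `sleep` parameters are illustrative additions.

```python
import json
import time
import urllib.request


def fetch_status(base_url, request_id):
    """GET /jobs/<request_id>/status and return the decoded JSON body."""
    url = f"{base_url}/jobs/{request_id}/status"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def wait_for_job(base_url, request_id, fetch=fetch_status,
                 interval=10.0, max_wait=3600.0, sleep=time.sleep):
    """Poll until the job is 'completed' or 'failed' and return the last body.

    Raises TimeoutError once roughly `max_wait` seconds of sleeping have
    passed without reaching a terminal status.
    """
    waited = 0.0
    while True:
        body = fetch(base_url, request_id)
        if body.get("status") in ("completed", "failed"):
            return body
        if waited >= max_wait:
            raise TimeoutError(f"job {request_id} not finished after {max_wait:.0f}s")
        sleep(interval)
        waited += interval
```

Unlike the shell loop, this version gives up after `max_wait` seconds, which matters because batched jobs can run for tens of minutes and a stuck job would otherwise be polled forever.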
Cost Breakdown
Railway + RunPod (Recommended)
| Service | Cost | Notes |
|---|---|---|
| Railway (API + Worker) | $5-10/month | Includes 500 hours |
| Upstash Redis | FREE | 10K requests/day |
| RunPod Serverless GPU | $0.20/hr | Only charged when active |
| Supabase | FREE | 500MB database |
| Total | ~$10-15/month | + $0.20/hr GPU usage |
EC2 + RunPod
| Service | Cost | Notes |
|---|---|---|
| EC2 t3.medium | $30/month | 2 vCPU, 4GB RAM |
| Upstash Redis | FREE | External Redis |
| RunPod Serverless GPU | $0.20/hr | Only when needed |
| Supabase | FREE | External DB |
| Total | ~$30/month | + $0.20/hr GPU usage |
EC2 + Dedicated GPU (Production)
| Service | Cost | Notes |
|---|---|---|
| EC2 g4dn.xlarge | $150/month | 4 vCPU, 16GB RAM, T4 GPU |
| Supabase | FREE | External DB |
| Total | ~$150/month | All-in-one solution |
Maintenance
Update Deployment
Railway:
# Push to main branch (auto-deploy)
git push origin main
# Or manual deploy
railway up
EC2:
ssh ubuntu@your-ec2-ip
cd docgenie
git pull
docker compose down
docker compose up -d --build
View Logs
Railway:
railway logs
EC2:
# API logs
docker compose logs -f api
# Worker logs
docker compose logs -f worker
# Nginx logs
sudo tail -f /var/log/nginx/access.log
sudo tail -f /var/log/nginx/error.log
Monitor Redis Queue
# Connect to Redis
redis-cli -u $REDIS_URL
# Check queue status
> LLEN rq:queue:default
> LRANGE rq:queue:default 0 -1
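Troubleshooting
Issue: Worker can't import docgenie package
Solution: Make sure the Dockerfile copies the entire monorepo and installs it with pip install -e .
Issue: Handwriting service connection timeout
Solution: Use RunPod's synchronous /runsync endpoint, not the asynchronous /run endpoint
Issue: Google token expired during job
Solution: Ensure GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET are set and that the request body includes google_drive_refresh_token (the refresh token is sent per request, not stored as an environment variable)
Issue: Railway build fails (too large)
Solution: Check that .dockerignore excludes the large data/ folders while still keeping data/prompt_templates/ and data/visual_element_prefabs/
Issue: Worker heartbeat timeout
Solution: The job is usually still running; the batched API takes 10-30 minutes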
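For the token-expiry issue above, the refresh itself is a standard OAuth 2.0 refresh-token exchange against Google's token endpoint. The sketch below shows the shape of that request using only the standard library; it is not DocGenie's actual implementation, and the function names are hypothetical. It assumes the refresh token arrives in the API request and the client credentials come from the environment, as this guide describes.

```python
import json
import os
import urllib.parse
import urllib.request

GOOGLE_TOKEN_URL = "https://oauth2.googleapis.com/token"


def build_refresh_request(refresh_token, client_id, client_secret):
    """Build the form-encoded POST that trades a refresh token for an access token."""
    form = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        GOOGLE_TOKEN_URL,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def refresh_access_token(refresh_token):
    """Return a fresh access token using the client credentials from the environment."""
    request = build_refresh_request(
        refresh_token,
        os.environ["GOOGLE_CLIENT_ID"],
        os.environ["GOOGLE_CLIENT_SECRET"],
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)["access_token"]
```

If either client credential is missing, `refresh_access_token` fails with a `KeyError` before any network call, which is the failure mode the troubleshooting entry describes.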
Next Steps
- Monitor costs: Railway dashboard, RunPod usage page
- Set up alerts: Railway → Settings → Notifications
- Scale workers: Railway → Worker service → Settings → Replicas
- Add caching: Redis cache for generated documents
- Set up CI/CD: GitHub Actions → Railway auto-deploy
You're Done!
Your DocGenie API is now deployed with:
- All docgenie package imports resolved
- GPU handwriting service on RunPod
- Background workers for batched API
- Auto-scaling and cost optimization
- Google token refresh working
- Database schema compatibility
API URL: https://your-domain.com
Docs: https://your-domain.com/docs
Health: https://your-domain.com/health
Local Testing Guide
Architecture
+---------------------------------+
| DocGenie API (Port 8000)        |
+----------------+----------------+
                 | HTTP (localhost:8080)
                 v
+---------------------------------+
| Handwriting Service (Port 8080) |
| - Loads WordStylist model       |
+---------------------------------+
Prerequisites
- Python environment: source .venv/bin/activate
- WordStylist model at WordStylist/models/ckpt.pt and ema_ckpt.pt
- api/.env with ANTHROPIC_API_KEY, HANDWRITING_SERVICE_ENABLED=true, HANDWRITING_SERVICE_URL=http://localhost:8080
Step-by-Step Setup
Terminal 1 - Handwriting Service:
cd handwriting_service
DEVICE=cpu ./start.sh # CPU (no GPU required)
# DEVICE=cuda ./start.sh # GPU (faster)
Terminal 2 - DocGenie API:
cd api
uvicorn main:app --reload
Terminal 3 - Test:
curl http://localhost:8080/health # Handwriting service
curl http://localhost:8000/health # API
cd api && python test_api.py
Performance Notes
- CPU mode: ~5-10 s/word | GPU mode: ~0.5-1 s/word
- Service processes all words in one batch for efficiency
Railway-Specific Configuration
Critical Issues & Fixes
1. .dockerignore - Keep required data folders:
!data/prompt_templates/
!data/visual_element_prefabs/
2. railway.json - Start both API and worker:
"startCommand": "cd api && uvicorn main:app --host 0.0.0.0 --port $PORT & rq worker --url $REDIS_URL & wait"
Environment Variables
Required
ANTHROPIC_API_KEY=sk-ant-api03-xxx
REDIS_URL=rediss://default:xxx@xxx.upstash.io:6379
HANDWRITING_SERVICE_URL=https://api.runpod.ai/v2/ht9ajgrduitgpr/runsync
HANDWRITING_SERVICE_ENABLED=true
SUPABASE_URL=https://xxx.supabase.co
SUPABASE_KEY=xxx
GOOGLE_CLIENT_ID=xxx.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=xxx
Recommended
RUNPOD_API_KEY=xxx
OCR_SERVICE_ENABLED=true
OCR_USE_LOCAL=true
OCR_ENGINE=microsoft_di
OCR_DPI=300
HANDWRITING_SERVICE_TIMEOUT=300
HANDWRITING_SERVICE_MAX_RETRIES=3
RQ_QUEUE_NAME=docgenie
LOG_LEVEL=INFO
Optional (defaults are fine)
API_HOST=0.0.0.0
API_PORT=8000
DEBUG_MODE=false
CLAUDE_MODEL=claude-sonnet-4-5-20250929
CORS_ORIGINS=*
GOOGLE_DRIVE_FOLDER_NAME=DocGenie Documents
TEMP_DIR=/tmp/docgenie_api
HANDWRITING_APPLY_BLUR=false
BBOX_NORMALIZATION_ENABLED=false
GT_VERIFICATION_ENABLED=false
ANALYSIS_ENABLED=false
DEBUG_VISUALIZATION_ENABLED=false
Validation Steps
# 1. Health check
curl https://your-app.up.railway.app/health
# 2. Sync generation
curl -X POST https://your-app.up.railway.app/api/generate \
-H "Content-Type: application/json" \
-d '{"document_category": "invoice", "pages": 1}'
# 3. Async generation
curl -X POST https://your-app.up.railway.app/api/async/generate \
-H "Content-Type: application/json" \
-d '{"document_category": "invoice", "pages": 1, "google_access_token": "ya29.xxx"}'
Common Railway Issues
| Issue | Cause | Solution |
|---|---|---|
| Worker not starting | Missing rq worker in start command | Check railway.json startCommand |
| Missing prompt templates | .dockerignore too aggressive | Add !data/prompt_templates/ |
| Playwright errors | Browser not installed | Ensure playwright install chromium in Dockerfile |
| Redis connection errors | Wrong REDIS_URL | Verify in Railway env variables |
| Handwriting timeout | Batch too large | Increase HANDWRITING_SERVICE_TIMEOUT |
| Large Docker image | data/ folders included | Check .dockerignore excludes datasets/embeddings |
RunPod Batch Optimization
Problem (Old Parallel Processing)
Each text was sent as a separate RunPod request → N texts = N workers = N× activation cost.
Example: 10 texts → 10 workers × 18 s = 180 worker-seconds + 10× activation fees
Solution (New Batch Processing)
All texts sent in one RunPod request → 1 worker handles everything.
Example: 10 texts → 1 worker × 190 s = 190 worker-seconds + 1× activation fee
Savings: ~45-60% cost reduction (activation fees dominate RunPod pricing)
Batch Request Format (handler.py)
{
"input": {
"texts": [
{"text": "Hello", "author_id": 42, "hw_id": "hw_0"},
{"text": "World", "author_id": 42, "hw_id": "hw_1"}
],
"apply_blur": true
}
}
Response:
{
"status": "COMPLETED",
"output": {
"images": [
{"image_base64": "...", "width": 217, "height": 61, "text": "Hello", "author_id": 42, "hw_id": "hw_0"},
{"image_base64": "...", "width": 195, "height": 58, "text": "World", "author_id": 42, "hw_id": "hw_1"}
],
"total_generated": 2
}
}
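On the client side, the request and response shapes above translate into a few lines of Python. This is a sketch, not the project's own client: the function names are hypothetical, the sequential `hw_<n>` ids simply follow the example, and `call_runsync` assumes the endpoint accepts RunPod's usual `Authorization: Bearer <RUNPOD_API_KEY>` header.

```python
import json
import urllib.request


def build_batch_payload(texts, author_id, apply_blur=True):
    """Wrap plain strings in the batch format, assigning hw_0, hw_1, ... ids."""
    items = [
        {"text": text, "author_id": author_id, "hw_id": f"hw_{i}"}
        for i, text in enumerate(texts)
    ]
    return {"input": {"texts": items, "apply_blur": apply_blur}}


def images_by_hw_id(response):
    """Index the images of a COMPLETED response by their hw_id."""
    if response.get("status") != "COMPLETED":
        raise RuntimeError(f"RunPod job not completed: {response.get('status')}")
    return {img["hw_id"]: img for img in response["output"]["images"]}


def call_runsync(url, api_key, payload, timeout=300):
    """POST the payload to a RunPod /runsync URL and return the decoded JSON."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Keying the result by `hw_id` lets the caller place each image back into the document without relying on the order of the response.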
Note: Backward-compatible. Single-text requests (old format) are still supported; the handler auto-detects batch vs single based on the "texts" key.
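The batch-versus-single detection can be pictured as a small normalization step at the top of the handler. The sketch below is an assumption about how such a step might look, not the code in handler.py: in particular, the shape of the old single-text input (with `text` and `author_id` directly on the input object) is guessed for illustration.

```python
def normalize_texts(job_input):
    """Return a list of text items for both batch and single-text inputs."""
    if "texts" in job_input:
        # New batch format: the items are already a list.
        return list(job_input["texts"])
    # Assumed legacy shape: the item fields sit directly on the input object.
    item = {"text": job_input["text"], "author_id": job_input["author_id"]}
    item["hw_id"] = job_input.get("hw_id", "hw_0")
    return [item]
```

Normalizing once up front means the rest of the handler only ever deals with a list, so the single-text path is just a batch of one.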
Timeout Configuration
Timeout is dynamically calculated: num_texts × 20 + 30 seconds.
For large batches (20+ texts), set RunPod endpoint max execution time to 600 s.
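The timeout rule is simple enough to state as code, which also makes it easy to see how many texts fit under a given RunPod execution limit. The function names are illustrative; the constants 20 and 30 are the ones from the formula above.

```python
def batch_timeout_seconds(num_texts, per_text=20, overhead=30):
    """Timeout for a batch: num_texts * 20 + 30 seconds by default."""
    if num_texts < 1:
        raise ValueError("num_texts must be at least 1")
    return num_texts * per_text + overhead


def max_texts_for_limit(limit_seconds, per_text=20, overhead=30):
    """Largest batch whose computed timeout still fits within limit_seconds."""
    return max(0, (limit_seconds - overhead) // per_text)
```

With the defaults, a 600 s execution limit covers batches of up to 28 texts; larger batches would need either a higher limit or splitting into several requests.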
Cost Comparison
| Scenario | OLD (parallel) | NEW (batched) | Savings |
|---|---|---|---|
| 2 texts | 2 workers × 18 s | 1 worker × 38 s | ~50% |
| 10 texts | 10 workers × 18 s | 1 worker × 190 s | ~55% |
| 25 texts | 25 workers × 18 s | 1 worker × 480 s | ~60% |
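The comparison can be explored with a small cost model. This is only a sketch: the per-worker activation fee is not stated in this guide, so `activation_fee` is a hypothetical parameter to be filled in from your own RunPod billing, and the model ignores anything beyond compute time plus a fixed per-activation cost.

```python
def parallel_cost(num_texts, seconds_per_worker, rate_per_second, activation_fee):
    """Cost when every text runs on its own worker."""
    return num_texts * (seconds_per_worker * rate_per_second + activation_fee)


def batched_cost(batch_seconds, rate_per_second, activation_fee):
    """Cost when one worker processes the whole batch."""
    return batch_seconds * rate_per_second + activation_fee


def savings_fraction(num_texts, seconds_per_worker, batch_seconds,
                     rate_per_second, activation_fee):
    """Fraction saved by batching (negative if batching costs more)."""
    old = parallel_cost(num_texts, seconds_per_worker, rate_per_second, activation_fee)
    new = batched_cost(batch_seconds, rate_per_second, activation_fee)
    return 1.0 - new / old
```

Plugging in the 10-text row (10 workers × 18 s versus 1 worker × 190 s) shows why the fee matters: with `activation_fee=0` the batched run is slightly more expensive, because 190 worker-seconds exceeds 180, so the savings in the table come entirely from paying the activation overhead once instead of N times.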
Integration Test
cd api
python test_runpod_integration.py