Deploying FastAPI Applications

FastAPI applications are deployed using ASGI servers. This guide covers production deployment with Uvicorn, Gunicorn, Docker, and related infrastructure considerations.

Uvicorn (Single Process)

Uvicorn is the recommended ASGI server for FastAPI. For development:

uvicorn main:app --reload --host 127.0.0.1 --port 8000

For production with a single process:

uvicorn main:app --host 0.0.0.0 --port 8000 --workers 1 --log-level info
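On a bare-metal or VM host, this command is usually wrapped in a process supervisor so the server restarts on failure and starts on boot. A minimal systemd unit sketch (the user, paths, and virtualenv location are illustrative assumptions, not part of Uvicorn itself):

```ini
[Unit]
Description=FastAPI application (Uvicorn)
After=network.target

[Service]
User=appuser
WorkingDirectory=/srv/myapi
ExecStart=/srv/myapi/.venv/bin/uvicorn main:app --host 0.0.0.0 --port 8000 --workers 1 --log-level info
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with systemctl enable --now, and Uvicorn's stdout/stderr logs land in the journal.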

Key Uvicorn configuration options:

| Flag | Default | Description |
| --- | --- | --- |
| --host | 127.0.0.1 | Bind address |
| --port | 8000 | Bind port |
| --workers | 1 | Number of worker processes |
| --loop | auto | Event loop (auto, asyncio, uvloop) |
| --http | auto | HTTP protocol (auto, h11, httptools) |
| --ws | auto | WebSocket protocol (auto, websockets, wsproto) |
| --log-level | info | Logging level (critical, error, warning, info, debug, trace) |
| --access-log | True | Enable/disable access log |
| --ws-max-size | 16777216 | Max WebSocket message size (16 MB) |
| --timeout-keep-alive | 5 | Keep-alive timeout in seconds |

Using uvloop and httptools (installed with the uvicorn[standard] extra on Linux; uvloop is not available on Windows) provides a 20-30% performance boost over the pure-Python asyncio and h11 alternatives.

Gunicorn with Uvicorn Workers

For production deployments requiring multiple worker processes, use Gunicorn as the process manager with Uvicorn workers:

gunicorn main:app \
    --workers 4 \
    --worker-class uvicorn.workers.UvicornWorker \
    --bind 0.0.0.0:8000 \
    --timeout 120 \
    --graceful-timeout 30 \
    --keep-alive 5 \
    --max-requests 1000 \
    --max-requests-jitter 50 \
    --access-logfile -

The recommended number of workers is (2 * CPU_CORES) + 1; for a server with 4 CPU cores, that is 9 workers. The --max-requests 1000 flag restarts each worker after it has handled 1,000 requests, which limits the impact of slow memory leaks. The --max-requests-jitter 50 flag adds a random offset (0-50) to each worker's threshold so workers do not all restart simultaneously.
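These two rules of thumb can be expressed directly; a small sketch (function names are illustrative, not Gunicorn API):

```python
import multiprocessing
import random

def recommended_workers(cpu_cores: int) -> int:
    """Gunicorn's (2 * cores) + 1 rule of thumb."""
    return (2 * cpu_cores) + 1

def restart_threshold(max_requests: int = 1000, jitter: int = 50) -> int:
    """Each worker is recycled after max_requests plus a random 0..jitter
    offset, so workers do not all restart at the same moment."""
    return max_requests + random.randint(0, jitter)

# Size the pool from the actual host at deploy time.
workers = recommended_workers(multiprocessing.cpu_count())
print(recommended_workers(4))  # 9
```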

The --timeout 120 flag sets the maximum time (in seconds) a worker can take to handle a request before being killed and restarted. The default is 30 seconds. The --graceful-timeout 30 gives workers 30 seconds to finish current requests during shutdown.

Docker Deployment

A production-ready Dockerfile:

FROM python:3.12-slim

WORKDIR /app

# Install dependencies first for layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY ./app ./app

# Create non-root user
RUN adduser --disabled-password --gecos "" appuser
USER appuser

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]

Build and run:

docker build -t myapi:latest .
docker run -d --name myapi -p 8000:8000 -e DATABASE_URL=postgresql://... myapi:latest

The python:3.12-slim base image is approximately 120 MB, compared to the full python:3.12 image at approximately 890 MB. For even smaller images, use python:3.12-alpine (approximately 50 MB), though it may require additional build dependencies for some Python packages.

Proxy Headers and HTTPS

When running behind a reverse proxy (Nginx, Traefik, AWS ALB), configure Uvicorn to trust proxy headers:

uvicorn main:app \
    --host 0.0.0.0 \
    --port 8000 \
    --proxy-headers \
    --forwarded-allow-ips="*"

The --proxy-headers flag tells Uvicorn to read X-Forwarded-For and X-Forwarded-Proto headers from the proxy. The --forwarded-allow-ips flag specifies which proxy IPs are trusted. Using "*" trusts all proxies (acceptable when the application is not directly exposed to the internet).
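The trust model behind --forwarded-allow-ips is worth understanding: a client can put anything in X-Forwarded-For, so the header is only believed when the directly connected peer is a trusted proxy. A simplified pure-Python sketch of that logic (function name and behavior are illustrative, not Uvicorn's actual implementation):

```python
def client_ip(x_forwarded_for: str, peer_ip: str, trusted: set[str]) -> str:
    """If the directly connected peer is a trusted proxy, take the last
    untrusted hop from X-Forwarded-For; otherwise ignore the header
    entirely, since clients can forge it."""
    if peer_ip not in trusted:
        return peer_ip
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    # Walk right-to-left, skipping trusted proxies appended along the way.
    for hop in reversed(hops):
        if hop not in trusted:
            return hop
    return peer_ip

# Request arrived via the trusted proxy 10.0.0.2: believe the header.
print(client_ip("203.0.113.7, 10.0.0.2", "10.0.0.2", {"10.0.0.2"}))  # 203.0.113.7
# Direct connection from an untrusted peer: the header is ignored.
print(client_ip("6.6.6.6", "198.51.100.9", {"10.0.0.2"}))  # 198.51.100.9
```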

An Nginx reverse proxy configuration:

upstream fastapi_backend {
    server 127.0.0.1:8000;
}

server {
    listen 443 ssl;
    server_name api.example.com;

    ssl_certificate /etc/ssl/certs/api.example.com.pem;
    ssl_certificate_key /etc/ssl/private/api.example.com.key;

    location / {
        proxy_pass http://fastapi_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_buffering off;
    }
}

Setting proxy_buffering off ensures streamed responses (like SSE or large file downloads) are forwarded immediately rather than buffered by Nginx.
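For long-lived SSE connections specifically, it can also help to raise Nginx's read timeout, since the default proxy_read_timeout of 60 seconds will close an idle event stream. A sketch of a dedicated location block (the /events path is an illustrative assumption):

```nginx
location /events {
    proxy_pass http://fastapi_backend;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_buffering off;
    proxy_cache off;
    proxy_read_timeout 3600s;    # keep idle event streams open for an hour
    proxy_http_version 1.1;      # required for keep-alive to the upstream
    proxy_set_header Connection "";
}
```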

Health Checks

Include a health check endpoint for container orchestrators:

@app.get("/health", status_code=200)
async def health_check():
    return {"status": "healthy"}
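Orchestrators often distinguish liveness (is the process up?) from readiness (can it serve traffic?). A readiness endpoint typically aggregates dependency checks and returns 503 when any fail; a framework-independent sketch of that aggregation (the check functions are hypothetical stand-ins for real database or cache probes):

```python
from typing import Callable

def readiness(checks: dict[str, Callable[[], bool]]) -> tuple[int, dict]:
    """Run each named dependency check; return an HTTP status code and body.
    Any failing check marks the service unready (503)."""
    results = {name: check() for name, check in checks.items()}
    healthy = all(results.values())
    status = 200 if healthy else 503
    return status, {"status": "ready" if healthy else "unready", "checks": results}

# Hypothetical probes; real ones would ping the database, cache, etc.
status, body = readiness({"database": lambda: True, "cache": lambda: False})
print(status)  # 503
```

A FastAPI route would then return this body with the computed status code, so the orchestrator stops routing traffic to an instance whose dependencies are down.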

Docker health check configuration:

HEALTHCHECK --interval=30s --timeout=10s --retries=3 --start-period=10s \
    CMD curl -f http://localhost:8000/health || exit 1

This checks health every 30 seconds, allows 10 seconds per check, retries 3 times before marking the container unhealthy, and waits 10 seconds after container start before the first check. Note that python:3.12-slim does not include curl by default, so it must be installed in the Dockerfile or replaced with a probe that uses the Python interpreter already in the image.
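One way to avoid shipping curl is to reuse the Python interpreter already present in the image. A sketch of the exit-code logic such a probe needs (Docker treats exit 0 as healthy and non-zero as unhealthy; the script path and URL are illustrative):

```python
import urllib.request

def probe(url: str, timeout: float = 10.0) -> int:
    """Return 0 (healthy) if the endpoint answers HTTP 200, else 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if resp.status == 200 else 1
    except Exception:
        return 1

# Inside the container this would run as the health command, e.g.:
#   HEALTHCHECK CMD python -c "import sys, healthcheck; \
#       sys.exit(healthcheck.probe('http://localhost:8000/health'))"
```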