# Deployment Guide This guide explains how to run LedgerShield locally, in Docker, or as a Docker-backed Hugging Face Space, and documents the runtime environment variables that control benchmark behavior. ## Deployment Modes ### Local Python process Best for development and testing. ```bash python -m venv .venv source .venv/bin/activate pip install -e . pip install -r requirements.txt python -m server.app ``` Default bind: - host: `0.0.0.0` - port: `8000` Health check: ```bash curl http://127.0.0.1:8000/health ``` ### Docker The repo ships with a ready-to-build [`../Dockerfile`](../Dockerfile). Build: ```bash docker build -t ledgershield:latest . ``` Run: ```bash docker run --rm -p 8000:8000 ledgershield:latest ``` Smoke test: ```bash curl http://127.0.0.1:8000/health ``` ### Hugging Face Spaces The root `README.md` includes Docker Space front matter, and `openenv.yaml` describes the benchmark metadata. For a Docker Space deployment: 1. create a new Hugging Face Space using the Docker SDK 2. push this repo contents to the Space 3. ensure the Space exposes port `8000` 4. verify `/health`, `/reset`, and `/step` ### CI-backed validation GitHub Actions already validates: - Python test runs - Docker build and container smoke test - `openenv.yaml` integrity See [`../.github/workflows/ci.yml`](../.github/workflows/ci.yml). ## Runtime Environment Variables ### Server bind settings | Variable | Default | Meaning | |---|---|---| | `HOST` | `0.0.0.0` | bind host used by `server.app:main` | | `PORT` | `8000` | bind port used by `server.app:main` | ### Case-loader controls These are read by [`../server/data_loader.py`](../server/data_loader.py). | Variable | Default | Meaning | |---|---|---| | `LEDGERSHIELD_INCLUDE_CHALLENGE` | `true` | include generated challenge variants in the loaded case pool | | `LEDGERSHIELD_CHALLENGE_VARIANTS` | `2` | number of generated challenge variants per hard case | | `LEDGERSHIELD_CHALLENGE_SEED` | `2026` | RNG seed for challenge generation | | `LEDGERSHIELD_INCLUDE_HOLDOUT` | `false` | include generated holdout cases in the loaded case pool | | `LEDGERSHIELD_HOLDOUT_VARIANTS` | `1` | holdout variants per hard case | | `LEDGERSHIELD_HOLDOUT_SEED` | `31415` | RNG seed for holdout generation | | `LEDGERSHIELD_INCLUDE_TWINS` | `false` | include benign contrastive twins in the loaded case pool | ### Agent-side variables Common variables used by `inference.py` and related scripts: | Variable | Typical use | |---|---| | `API_BASE_URL` | OpenAI-compatible API endpoint | | `MODEL_NAME` | model name for inference (determines `ModelCapabilityProfile` tier) | | `HF_TOKEN` | token used by the submission-safe agent | | `OPENAI_API_KEY` | credential for live comparison scripts | | `ENV_URL` | environment server base URL | | `LOCAL_IMAGE_NAME` | optional Docker image name for local environment use | | `LEDGERSHIELD_DEBUG` | set to `1` to enable stderr output from the inference agent (default: stderr suppressed) | ## Operational Checks ### Basic API checks ```bash curl http://127.0.0.1:8000/health curl http://127.0.0.1:8000/ ``` ### Reset a known case ```bash curl -X POST http://127.0.0.1:8000/reset \ -H 'Content-Type: application/json' \ -d '{"case_id":"CASE-A-001"}' ``` ### Run benchmark report generation locally ```bash python benchmark_report.py --format markdown ``` Generated artifacts land under `artifacts/` when written. ## Recommended Deployment Profiles ### Minimal benchmark server Use this when you only need the curated benchmark and generated challenge variants: ```bash HOST=0.0.0.0 PORT=8000 python -m server.app ``` ### Public benchmark only Disable generated challenge variants: ```bash LEDGERSHIELD_INCLUDE_CHALLENGE=0 python -m server.app ``` ### Holdout-enabled evaluation server ```bash LEDGERSHIELD_INCLUDE_HOLDOUT=1 \ LEDGERSHIELD_HOLDOUT_VARIANTS=1 \ python -m server.app ``` ### Calibration-heavy server with twins ```bash LEDGERSHIELD_INCLUDE_TWINS=1 python -m server.app ``` ## Production Notes LedgerShield is still a benchmark, not a payment system. For production-like hosting: - terminate TLS outside the app - health-check `/health` - treat the service as stateless and restartable - version-control `openenv.yaml` and benchmark artifacts - avoid mixing benchmark servers with live finance systems ## Troubleshooting ### Server starts but endpoints fail Check: - port `8000` is not already in use - dependencies from `requirements.txt` are installed - you are running from the repo root so fixture paths resolve correctly ### Docker container builds but health check fails Check: - `curl http://localhost:8000/health` - container logs for import/path issues - whether your host already has something bound to `8000` ### Unexpected case counts Remember that the loader includes challenge variants by default. If you expect only the curated 21-case benchmark, set: ```bash LEDGERSHIELD_INCLUDE_CHALLENGE=0 ``` ### Missing benchmark report endpoint data `/benchmark-report` and `/leaderboard` only return rich artifacts after report generation. Run: ```bash python benchmark_report.py --format json ```