Deployment Guide

This guide explains how to run LedgerShield locally, in Docker, or as a Docker-backed Hugging Face Space, and documents the runtime environment variables that control benchmark behavior.

Deployment Modes

Local Python process

Best for development and testing.

python -m venv .venv
source .venv/bin/activate
pip install -e .
pip install -r requirements.txt
python -m server.app

Default bind:

  • host: 0.0.0.0
  • port: 8000

Health check:

curl http://127.0.0.1:8000/health
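The same health check can be run from Python, for example inside a test harness. The sketch below is a minimal version under a few assumptions. It uses only the standard library and treats any non-200 answer or connection error as unhealthy. It also disables proxy environment variables, because the target is a local server. The function name `is_healthy` is hypothetical and not part of LedgerShield.

```python
import urllib.error
import urllib.request

# Opener that ignores HTTP(S)_PROXY settings: the server being checked is local.
_LOCAL_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_healthy(base_url: str = "http://127.0.0.1:8000", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200 (hypothetical helper)."""
    url = base_url.rstrip("/") + "/health"
    try:
        with _LOCAL_OPENER.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx statues (HTTPError is a
        # subclass of URLError) are all reported as "not healthy".
        return False
```

Calling `is_healthy()` with no arguments checks the default bind address shown above.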

Docker

The repo ships with a ready-to-build Dockerfile at the repository root (../Dockerfile).

Build:

docker build -t ledgershield:latest .

Run:

docker run --rm -p 8000:8000 ledgershield:latest

Smoke test:

curl http://127.0.0.1:8000/health
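A freshly started container can take a few seconds before /health answers, so scripted smoke tests usually retry instead of failing on the first attempt. The sketch below shows one way to do that. `wait_until_healthy` and `curl_health` are hypothetical helpers, and the attempt count and delay are arbitrary choices. The sleep function is injectable so the loop can be tested without waiting.

```python
import subprocess
import time
from typing import Callable


def curl_health(url: str = "http://127.0.0.1:8000/health") -> bool:
    """Run the curl smoke test; -f makes curl exit non-zero on HTTP errors."""
    result = subprocess.run(["curl", "-fsS", url], capture_output=True)
    return result.returncode == 0


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 30,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` up to `attempts` times, pausing `delay` seconds between tries.

    Returns True as soon as `check()` succeeds, or False if every attempt fails.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:  # no pointless sleep after the final attempt
            sleep(delay)
    return False
```

After `docker run`, a CI step could call `wait_until_healthy(curl_health)` and fail the job if it returns False.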

Hugging Face Spaces

The root README.md includes Docker Space front matter, and openenv.yaml describes the benchmark metadata. For a Docker Space deployment:

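  1. create a new Hugging Face Space using the Docker SDK
  2. push the contents of this repo to the Space
  3. ensure the Space exposes port 8000 (Docker Spaces take this from the app_port key in the README front matter; the platform default is 7860)
  4. verify that /health, /reset, and /step respond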

CI-backed validation

GitHub Actions already validates:

  • Python test runs
  • Docker build and container smoke test
  • openenv.yaml integrity

See ../.github/workflows/ci.yml.

Runtime Environment Variables

Server bind settings

| Variable | Default | Meaning |
|---|---|---|
| HOST | 0.0.0.0 | bind host used by server.app:main |
| PORT | 8000 | bind port used by server.app:main |
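The code of server.app:main is not reproduced in this guide. The following is only a sketch of how the two variables and their documented defaults could be combined, with basic validation of PORT. `resolve_bind` is a hypothetical name.

```python
import os
from typing import Mapping, Tuple


def resolve_bind(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Return (host, port) from HOST/PORT, falling back to 0.0.0.0:8000 (hypothetical helper)."""
    host = env.get("HOST") or "0.0.0.0"
    port_raw = env.get("PORT") or "8000"
    try:
        port = int(port_raw)
    except ValueError:
        raise ValueError(f"PORT must be an integer, got {port_raw!r}") from None
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return host, port
```

Passing a plain dict instead of os.environ makes the behavior easy to check. For example, `resolve_bind({})` gives `("0.0.0.0", 8000)`.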

Case-loader controls

These are read by ../server/data_loader.py.

| Variable | Default | Meaning |
|---|---|---|
| LEDGERSHIELD_INCLUDE_CHALLENGE | true | include generated challenge variants in the loaded case pool |
| LEDGERSHIELD_CHALLENGE_VARIANTS | 2 | number of generated challenge variants per hard case |
| LEDGERSHIELD_CHALLENGE_SEED | 2026 | RNG seed for challenge generation |
| LEDGERSHIELD_INCLUDE_HOLDOUT | false | include generated holdout cases in the loaded case pool |
| LEDGERSHIELD_HOLDOUT_VARIANTS | 1 | number of holdout variants per hard case |
| LEDGERSHIELD_HOLDOUT_SEED | 31415 | RNG seed for holdout generation |
| LEDGERSHIELD_INCLUDE_TWINS | false | include benign contrastive twins in the loaded case pool |
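The boolean defaults above are written as true/false, but the deployment profiles later in this guide set them with 1/0. A tolerant parser therefore has to accept both spellings. The sketch below shows that idea under stated assumptions. It is not the parsing logic of server/data_loader.py, which may differ. `env_bool`, `env_int`, and `loader_settings` are hypothetical names.

```python
import os
from typing import Mapping

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_bool(name: str, default: bool, env: Mapping[str, str] = os.environ) -> bool:
    """Parse a boolean flag; unset or empty falls back to `default`."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name} must be a boolean flag, got {raw!r}")


def env_int(name: str, default: int, env: Mapping[str, str] = os.environ) -> int:
    """Parse an integer setting; unset or empty falls back to `default`."""
    raw = env.get(name)
    return default if raw is None or raw.strip() == "" else int(raw)


def loader_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the case-loader variables with the defaults from the table above."""
    return {
        "include_challenge": env_bool("LEDGERSHIELD_INCLUDE_CHALLENGE", True, env),
        "challenge_variants": env_int("LEDGERSHIELD_CHALLENGE_VARIANTS", 2, env),
        "challenge_seed": env_int("LEDGERSHIELD_CHALLENGE_SEED", 2026, env),
        "include_holdout": env_bool("LEDGERSHIELD_INCLUDE_HOLDOUT", False, env),
        "holdout_variants": env_int("LEDGERSHIELD_HOLDOUT_VARIANTS", 1, env),
        "holdout_seed": env_int("LEDGERSHIELD_HOLDOUT_SEED", 31415, env),
        "include_twins": env_bool("LEDGERSHIELD_INCLUDE_TWINS", False, env),
    }
```

Rejecting unrecognized values, instead of silently treating them as false, surfaces typos such as `LEDGERSHIELD_INCLUDE_HOLDOUT=ture` at startup.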

Agent-side variables

Common variables used by inference.py and related scripts:

| Variable | Typical use |
|---|---|
| API_BASE_URL | OpenAI-compatible API endpoint |
| MODEL_NAME | model name for inference (determines ModelCapabilityProfile tier) |
| HF_TOKEN | token used by the submission-safe agent |
| OPENAI_API_KEY | credential for live comparison scripts |
| ENV_URL | environment server base URL |
| LOCAL_IMAGE_NAME | optional Docker image name for local environment use |
| LEDGERSHIELD_DEBUG | set to 1 to enable stderr output from the inference agent (default: stderr suppressed) |
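Below is a hedged sketch of how an agent script might gather these variables and apply the documented LEDGERSHIELD_DEBUG rule (stderr stays suppressed unless the value is 1). `AgentConfig`, `load_agent_config`, and `stderr_guard` are hypothetical. The sketch does not claim to mirror inference.py. OPENAI_API_KEY is left out because the table lists it for the comparison scripts only. The token is hidden from the dataclass repr so it does not leak into logs.

```python
import contextlib
import io
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class AgentConfig:
    api_base_url: Optional[str]
    model_name: Optional[str]
    hf_token: Optional[str] = field(repr=False)  # keep secrets out of repr()/logs
    env_url: Optional[str] = None
    local_image_name: Optional[str] = None
    debug: bool = False


def load_agent_config(env: Mapping[str, str] = os.environ) -> AgentConfig:
    """Read the agent-side variables; unset values become None (hypothetical helper)."""
    return AgentConfig(
        api_base_url=env.get("API_BASE_URL"),
        model_name=env.get("MODEL_NAME"),
        hf_token=env.get("HF_TOKEN"),
        env_url=env.get("ENV_URL"),
        local_image_name=env.get("LOCAL_IMAGE_NAME"),
        # Per the table above, only LEDGERSHIELD_DEBUG=1 turns stderr output on.
        debug=env.get("LEDGERSHIELD_DEBUG") == "1",
    )


def stderr_guard(cfg: AgentConfig) -> contextlib.AbstractContextManager:
    """Return a context manager that discards stderr unless debug output was requested."""
    if cfg.debug:
        return contextlib.nullcontext()
    return contextlib.redirect_stderr(io.StringIO())
```

An agent loop would then run inside `with stderr_guard(load_agent_config()):` so that diagnostic output appears only when debugging is enabled.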

Operational Checks

Basic API checks

curl http://127.0.0.1:8000/health
curl http://127.0.0.1:8000/

Reset a known case

curl -X POST http://127.0.0.1:8000/reset \
  -H 'Content-Type: application/json' \
  -d '{"case_id":"CASE-A-001"}'
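The following is a standard-library equivalent of this curl call. It can be useful in scripts where shelling out to curl is awkward. The request body matches the curl example. The assumption that /reset returns a JSON body is not documented here. `build_reset_request` and `reset_case` are hypothetical helper names.

```python
import json
import urllib.request


def build_reset_request(base_url: str, case_id: str) -> urllib.request.Request:
    """Build the POST /reset request shown in the curl example."""
    body = json.dumps({"case_id": case_id}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/reset",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reset_case(base_url: str, case_id: str, timeout: float = 10.0) -> dict:
    """Send the reset request and decode the response, assuming it is JSON."""
    req = build_reset_request(base_url, case_id)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `reset_case("http://127.0.0.1:8000", "CASE-A-001")` mirrors the curl command above.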

Run benchmark report generation locally

python benchmark_report.py --format markdown

When the script writes its output to disk, the generated files land under artifacts/.

Recommended Deployment Profiles

Minimal benchmark server

Use this when you only need the curated benchmark and generated challenge variants:

HOST=0.0.0.0 PORT=8000 python -m server.app

Public benchmark only

Disable generated challenge variants:

LEDGERSHIELD_INCLUDE_CHALLENGE=0 python -m server.app

Holdout-enabled evaluation server

LEDGERSHIELD_INCLUDE_HOLDOUT=1 \
LEDGERSHIELD_HOLDOUT_VARIANTS=1 \
python -m server.app

Calibration-heavy server with twins

LEDGERSHIELD_INCLUDE_TWINS=1 python -m server.app
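The four profiles above differ only in their environment overrides. A launcher script could therefore keep them in one table and start the server with the matching environment. The sketch below assumes that approach. The profile keys ("minimal", "public", "holdout", "twins") and the function names are hypothetical labels for the sections above.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional

# Environment overrides for each deployment profile (labels are hypothetical).
PROFILES: Dict[str, Dict[str, str]] = {
    "minimal": {"HOST": "0.0.0.0", "PORT": "8000"},
    "public": {"LEDGERSHIELD_INCLUDE_CHALLENGE": "0"},
    "holdout": {"LEDGERSHIELD_INCLUDE_HOLDOUT": "1", "LEDGERSHIELD_HOLDOUT_VARIANTS": "1"},
    "twins": {"LEDGERSHIELD_INCLUDE_TWINS": "1"},
}


def profile_env(name: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return `base` (default: the current environment) with the profile's overrides applied."""
    if name not in PROFILES:
        raise KeyError(f"unknown profile {name!r}; choose from {sorted(PROFILES)}")
    env = dict(os.environ if base is None else base)
    env.update(PROFILES[name])
    return env


def launch(name: str) -> subprocess.Popen:
    """Start `python -m server.app` (from the repo root) with the chosen profile."""
    return subprocess.Popen([sys.executable, "-m", "server.app"], env=profile_env(name))
```

Copying the current environment first keeps PATH and any agent variables intact while the profile settings take precedence.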

Production Notes

LedgerShield is still a benchmark, not a payment system. For production-like hosting:

  • terminate TLS outside the app
  • health-check /health
  • treat the service as stateless and restartable
  • version-control openenv.yaml and benchmark artifacts
  • avoid mixing benchmark servers with live finance systems

Troubleshooting

Server starts but endpoints fail

Check:

  • port 8000 is not already in use
  • dependencies from requirements.txt are installed
  • you are running from the repo root so fixture paths resolve correctly
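The first check, whether port 8000 is already taken, also applies to the Docker case below. It can be scripted with a plain TCP connect. The sketch assumes that a successful connection means some process is already listening. `port_in_use` is a hypothetical helper.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code (e.g. ECONNREFUSED) otherwise.
        return sock.connect_ex((host, port)) == 0
```

If `port_in_use(8000)` returns True before the server starts, either stop the other process or start LedgerShield with a different PORT.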

Docker container builds but health check fails

Check:

  • curl http://localhost:8000/health from the host
  • the container logs (docker logs <container>) for import or path errors
  • whether something else on the host is already bound to port 8000

Unexpected case counts

Remember that the loader includes challenge variants by default. If you expect only the curated 21-case benchmark, set:

LEDGERSHIELD_INCLUDE_CHALLENGE=0
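As a rough cross-check, the expected pool size can be estimated from the loader settings. The sketch below rests on assumptions this guide does not confirm. It assumes each hard case contributes exactly the configured number of challenge or holdout variants on top of the 21 curated cases, and it treats twins as a separately supplied count. The number of hard cases is not stated here, so it is a parameter. `expected_case_count` is a hypothetical helper.

```python
def expected_case_count(
    num_hard_cases: int,
    curated: int = 21,
    include_challenge: bool = True,
    challenge_variants: int = 2,
    include_holdout: bool = False,
    holdout_variants: int = 1,
    include_twins: bool = False,
    num_twins: int = 0,
) -> int:
    """Estimate the loaded case pool size under the additive assumption above."""
    total = curated
    if include_challenge:
        total += num_hard_cases * challenge_variants
    if include_holdout:
        total += num_hard_cases * holdout_variants
    if include_twins:
        total += num_twins
    return total
```

With a hypothetical 5 hard cases, the defaults give 21 + 5 × 2 = 31 cases. Setting `include_challenge=False` brings this back to 21.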

Missing benchmark report endpoint data

The /benchmark-report and /leaderboard endpoints return full report data only after a report has been generated. Run:

python benchmark_report.py --format json