Deployment Guide
This guide explains how to run LedgerShield locally, in Docker, or as a Docker-backed Hugging Face Space, and documents the runtime environment variables that control benchmark behavior.
Deployment Modes
Local Python process
Best for development and testing.
python -m venv .venv
source .venv/bin/activate
pip install -e .
pip install -r requirements.txt
python -m server.app
Default bind:
- host: 0.0.0.0
- port: 8000
Health check:
curl http://127.0.0.1:8000/health
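When the server is started from a script, a single curl can race the startup. A minimal sketch of a readiness poll follows; it assumes only that /health answers with HTTP 200 once the server is up, and the helper name wait_for_health and its timeout parameters are illustrative, not part of LedgerShield.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(base_url="http://127.0.0.1:8000", timeout_s=30.0, interval_s=0.5):
    """Poll GET {base_url}/health until it returns HTTP 200 or the timeout expires.

    Hypothetical helper: assumes /health responds with status 200 when ready.
    """
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, or a non-2xx status: keep waiting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling wait_for_health() right after launching python -m server.app returns True as soon as the endpoint responds, and False if it never does within the timeout.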
Docker
The repo ships with a ready-to-build ../Dockerfile.
Build:
docker build -t ledgershield:latest .
Run:
docker run --rm -p 8000:8000 ledgershield:latest
Smoke test:
curl http://127.0.0.1:8000/health
Hugging Face Spaces
The root README.md includes Docker Space front matter, and openenv.yaml describes the benchmark metadata. For a Docker Space deployment:
- create a new Hugging Face Space using the Docker SDK
- push this repo's contents to the Space
- ensure the Space exposes port 8000
- verify /health, /reset, and /step
CI-backed validation
GitHub Actions already validates:
- Python test runs
- Docker build and container smoke test
- openenv.yaml integrity
See ../.github/workflows/ci.yml.
Runtime Environment Variables
Server bind settings
| Variable | Default | Meaning |
|---|---|---|
| HOST | 0.0.0.0 | bind host used by server.app:main |
| PORT | 8000 | bind port used by server.app:main |
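To illustrate how these defaults apply, here is a small sketch that resolves the bind address from the environment. It is not the code in server.app:main; the function name resolve_bind and the port range check are assumptions added for the example.

```python
import os


def resolve_bind(environ=None):
    """Return (host, port) from HOST and PORT, using the documented defaults.

    Illustrative only: server.app:main may read these variables differently.
    """
    env = os.environ if environ is None else environ
    host = env.get("HOST", "0.0.0.0")
    port = int(env.get("PORT", "8000"))
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return host, port
```

With neither variable set, resolve_bind({}) yields the defaults from the table; passing {"PORT": "9000"} changes only the port.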
Case-loader controls
These are read by ../server/data_loader.py.
| Variable | Default | Meaning |
|---|---|---|
| LEDGERSHIELD_INCLUDE_CHALLENGE | true | include generated challenge variants in the loaded case pool |
| LEDGERSHIELD_CHALLENGE_VARIANTS | 2 | number of generated challenge variants per hard case |
| LEDGERSHIELD_CHALLENGE_SEED | 2026 | RNG seed for challenge generation |
| LEDGERSHIELD_INCLUDE_HOLDOUT | false | include generated holdout cases in the loaded case pool |
| LEDGERSHIELD_HOLDOUT_VARIANTS | 1 | holdout variants per hard case |
| LEDGERSHIELD_HOLDOUT_SEED | 31415 | RNG seed for holdout generation |
| LEDGERSHIELD_INCLUDE_TWINS | false | include benign contrastive twins in the loaded case pool |
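The sketch below shows one way these variables could be collected into a typed settings object with the documented defaults. It is an assumption-laden illustration, not the logic in server/data_loader.py: the names LoaderSettings and loader_settings are hypothetical, and the set of accepted truthy spellings (1, true, yes, on) is a guess that happens to cover the values used in this guide.

```python
import os
from dataclasses import dataclass

# Assumed truthy spellings; the real loader may accept a different set.
_TRUTHY = {"1", "true", "yes", "on"}


def _flag(env, name, default):
    """Read a boolean flag, returning the default when the variable is unset."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


@dataclass(frozen=True)
class LoaderSettings:
    """Hypothetical container; field defaults mirror the table above."""
    include_challenge: bool = True
    challenge_variants: int = 2
    challenge_seed: int = 2026
    include_holdout: bool = False
    holdout_variants: int = 1
    holdout_seed: int = 31415
    include_twins: bool = False


def loader_settings(environ=None):
    """Build LoaderSettings from LEDGERSHIELD_* variables in the given mapping."""
    env = os.environ if environ is None else environ
    return LoaderSettings(
        include_challenge=_flag(env, "LEDGERSHIELD_INCLUDE_CHALLENGE", True),
        challenge_variants=int(env.get("LEDGERSHIELD_CHALLENGE_VARIANTS", 2)),
        challenge_seed=int(env.get("LEDGERSHIELD_CHALLENGE_SEED", 2026)),
        include_holdout=_flag(env, "LEDGERSHIELD_INCLUDE_HOLDOUT", False),
        holdout_variants=int(env.get("LEDGERSHIELD_HOLDOUT_VARIANTS", 1)),
        holdout_seed=int(env.get("LEDGERSHIELD_HOLDOUT_SEED", 31415)),
        include_twins=_flag(env, "LEDGERSHIELD_INCLUDE_TWINS", False),
    )
```

Printing loader_settings() before starting the server is a quick way to confirm which toggles the current shell would pass along.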
Agent-side variables
Common variables used by inference.py and related scripts:
| Variable | Typical use |
|---|---|
| API_BASE_URL | OpenAI-compatible API endpoint |
| MODEL_NAME | model name for inference (determines ModelCapabilityProfile tier) |
| HF_TOKEN | token used by the submission-safe agent |
| OPENAI_API_KEY | credential for live comparison scripts |
| ENV_URL | environment server base URL |
| LOCAL_IMAGE_NAME | optional Docker image name for local environment use |
| LEDGERSHIELD_DEBUG | set to 1 to enable stderr output from the inference agent (default: stderr suppressed) |
Operational Checks
Basic API checks
curl http://127.0.0.1:8000/health
curl http://127.0.0.1:8000/
Reset a known case
curl -X POST http://127.0.0.1:8000/reset \
-H 'Content-Type: application/json' \
-d '{"case_id":"CASE-A-001"}'
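The same reset request can be issued from Python with only the standard library. This sketch assumes the endpoint replies with a JSON body; the response shape is not documented here, so the helper (reset_case, a hypothetical name) simply returns whatever JSON comes back.

```python
import json
import urllib.request


def reset_case(case_id, base_url="http://127.0.0.1:8000"):
    """POST {"case_id": ...} to /reset and return the decoded JSON response.

    Assumes the server answers with a JSON body; its fields are not specified here.
    """
    payload = json.dumps({"case_id": case_id}).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/reset",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, reset_case("CASE-A-001") mirrors the curl command above; a non-2xx response raises urllib.error.HTTPError rather than returning.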
Run benchmark report generation locally
python benchmark_report.py --format markdown
Generated artifacts land under artifacts/ when written.
Recommended Deployment Profiles
Minimal benchmark server
Use this when you only need the curated benchmark and generated challenge variants:
HOST=0.0.0.0 PORT=8000 python -m server.app
Public benchmark only
Disable generated challenge variants:
LEDGERSHIELD_INCLUDE_CHALLENGE=0 python -m server.app
Holdout-enabled evaluation server
LEDGERSHIELD_INCLUDE_HOLDOUT=1 \
LEDGERSHIELD_HOLDOUT_VARIANTS=1 \
python -m server.app
Calibration-heavy server with twins
LEDGERSHIELD_INCLUDE_TWINS=1 python -m server.app
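If you switch between these profiles often, the overrides can be kept in one place and applied when launching the server. The sketch below is one possible wrapper, not something shipped in the repo: the profile keys ("minimal", "public-only", "holdout", "twins") and the functions profile_env and launch are made-up names, while the variable values are copied from the commands above.

```python
import os
import subprocess
import sys

# Hypothetical profile labels; the overrides mirror the shell commands above.
PROFILES = {
    "minimal": {"HOST": "0.0.0.0", "PORT": "8000"},
    "public-only": {"LEDGERSHIELD_INCLUDE_CHALLENGE": "0"},
    "holdout": {
        "LEDGERSHIELD_INCLUDE_HOLDOUT": "1",
        "LEDGERSHIELD_HOLDOUT_VARIANTS": "1",
    },
    "twins": {"LEDGERSHIELD_INCLUDE_TWINS": "1"},
}


def profile_env(name, base=None):
    """Return a copy of the base environment with the profile's overrides applied."""
    env = dict(os.environ if base is None else base)
    env.update(PROFILES[name])  # raises KeyError for an unknown profile
    return env


def launch(name):
    """Start `python -m server.app` under the named profile (run from the repo root)."""
    return subprocess.Popen([sys.executable, "-m", "server.app"], env=profile_env(name))
```

launch("holdout") returns the Popen handle, so the caller can terminate() the server when the evaluation run finishes.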
Production Notes
LedgerShield is still a benchmark, not a payment system. For production-like hosting:
- terminate TLS outside the app
- health-check /health
- treat the service as stateless and restartable
- version-control openenv.yaml and benchmark artifacts
- avoid mixing benchmark servers with live finance systems
Troubleshooting
Server starts but endpoints fail
Check:
- port 8000 is not already in use
- dependencies from requirements.txt are installed
- you are running from the repo root so fixture paths resolve correctly
Docker container builds but health check fails
Check:
- curl http://localhost:8000/health
- container logs for import/path issues
- whether your host already has something bound to 8000
Unexpected case counts
Remember that the loader includes challenge variants by default. If you expect only the curated 21-case benchmark, set:
LEDGERSHIELD_INCLUDE_CHALLENGE=0
Missing benchmark report endpoint data
/benchmark-report and /leaderboard only return rich artifacts after report generation. Run:
python benchmark_report.py --format json