	# Deployment Guide

	This guide explains how to run LedgerShield locally, in Docker, or as a Docker-backed Hugging Face Space, and documents the runtime environment variables that control benchmark behavior.

	## Deployment Modes

	### Local Python process

	Best for development and testing.

	```bash
	python -m venv .venv
	source .venv/bin/activate
	pip install -e .
	pip install -r requirements.txt
	python -m server.app
	```

	Default bind:

	- host: `0.0.0.0`
	- port: `8000`

	Health check:

	```bash
	curl http://127.0.0.1:8000/health
	```

	### Docker

	The repo ships with a ready-to-build [`../Dockerfile`](../Dockerfile).

	Build:

	```bash
	docker build -t ledgershield:latest .
	```

	Run:

	```bash
	docker run --rm -p 8000:8000 ledgershield:latest
	```

	Smoke test:

	```bash
	curl http://127.0.0.1:8000/health
	```
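A container can take a moment to start accepting connections, so a single `curl` right after `docker run` may fail spuriously. The sketch below polls `/health` a few times before giving up. It assumes a healthy server answers with HTTP 200, which this guide does not state explicitly; `wait_for_health` and its parameters are illustrative names, not part of LedgerShield.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(base_url, attempts=10, delay=1.0, opener=urllib.request.urlopen):
    """Poll GET {base_url}/health until it answers with HTTP 200.

    Returns True on the first 200 response, False if every attempt fails.
    `opener` is injectable so the loop can be exercised without a live server.
    """
    url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        try:
            with opener(url, timeout=5) as response:
                # Assumption: a healthy server replies with status 200.
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not accepting connections yet, or it returned an error
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Example call against the default bind:
#   wait_for_health("http://127.0.0.1:8000")
```

Keeping the opener as a parameter is a small design choice that lets the retry logic be unit-tested with a stub instead of a running container.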

	### Hugging Face Spaces

	The root `README.md` includes Docker Space front matter, and `openenv.yaml` describes the benchmark metadata. For a Docker Space deployment:

	1. create a new Hugging Face Space using the Docker SDK
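	2. push this repo's contents to the Space
	3. ensure the Space exposes port `8000` (Docker Spaces default to port `7860`; the `app_port` field in the `README.md` front matter overrides it)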
	4. verify `/health`, `/reset`, and `/step`
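Before pushing, the port requirement in the steps above can be checked against the README front matter. The following is a minimal sketch, assuming the front matter uses flat `key: value` lines and that the Space is configured through the standard `sdk` and `app_port` fields; the helper names are hypothetical and the parser is deliberately not a full YAML implementation.

```python
def read_front_matter(readme_text):
    """Return top-level `key: value` pairs from a README's YAML front matter.

    Only flat scalar entries are handled; this is not a general YAML parser.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if ":" in line and not line.startswith((" ", "\t", "#")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip("'\"")
    return {}  # closing '---' never found


def check_docker_space(readme_text, expected_port=8000):
    """List problems that would keep a Docker Space from serving on expected_port."""
    fields = read_front_matter(readme_text)
    problems = []
    if fields.get("sdk") != "docker":
        problems.append("front matter does not set 'sdk: docker'")
    # Docker Spaces route traffic to app_port, which defaults to 7860 when unset.
    port = fields.get("app_port", "7860")
    if port != str(expected_port):
        problems.append(f"app_port is {port}, expected {expected_port}")
    return problems
```

An empty list from `check_docker_space(open("README.md").read())` suggests the Space should route traffic to the port the server binds by default.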

	### CI-backed validation

	GitHub Actions already validates:

	- Python test runs
	- Docker build and container smoke test
	- `openenv.yaml` integrity

	See [`../.github/workflows/ci.yml`](../.github/workflows/ci.yml).

	## Runtime Environment Variables

	### Server bind settings

	| Variable | Default | Meaning |
	|---|---|---|
	| `HOST` | `0.0.0.0` | bind host used by `server.app:main` |
	| `PORT` | `8000` | bind port used by `server.app:main` |
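As an illustration of how these two variables could be resolved, here is a minimal sketch that applies the documented defaults and rejects an invalid port early. It is not the code in `server.app:main`; `resolve_bind` is a hypothetical helper and the validation rules are assumptions.

```python
import os


def resolve_bind(env=None):
    """Return (host, port) from HOST/PORT, falling back to the documented defaults."""
    env = os.environ if env is None else env
    host = env.get("HOST", "0.0.0.0")
    raw_port = env.get("PORT", "8000")
    try:
        port = int(raw_port)
    except ValueError:
        raise ValueError(f"PORT must be an integer, got {raw_port!r}") from None
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return host, port
```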

	### Case-loader controls

	These are read by [`../server/data_loader.py`](../server/data_loader.py).

	| Variable | Default | Meaning |
	|---|---|---|
	| `LEDGERSHIELD_INCLUDE_CHALLENGE` | `true` | include generated challenge variants in the loaded case pool |
	| `LEDGERSHIELD_CHALLENGE_VARIANTS` | `2` | number of generated challenge variants per hard case |
	| `LEDGERSHIELD_CHALLENGE_SEED` | `2026` | RNG seed for challenge generation |
	| `LEDGERSHIELD_INCLUDE_HOLDOUT` | `false` | include generated holdout cases in the loaded case pool |
	| `LEDGERSHIELD_HOLDOUT_VARIANTS` | `1` | holdout variants per hard case |
	| `LEDGERSHIELD_HOLDOUT_SEED` | `31415` | RNG seed for holdout generation |
	| `LEDGERSHIELD_INCLUDE_TWINS` | `false` | include benign contrastive twins in the loaded case pool |
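To make the defaults in the table concrete, the sketch below reads the seven variables into one settings object. It is an assumption-laden illustration, not the logic in `server/data_loader.py`: the accepted boolean spellings (`1`/`0`, `true`/`false`, and so on) and the names `LoaderSettings` and `load_settings` are hypothetical.

```python
import os
from dataclasses import dataclass

# Assumption: these spellings are accepted; the real loader may differ.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def _flag(env, name, default):
    """Parse a boolean-like variable, returning `default` when it is unset."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name} must be a boolean-like value, got {raw!r}")


@dataclass(frozen=True)
class LoaderSettings:
    include_challenge: bool
    challenge_variants: int
    challenge_seed: int
    include_holdout: bool
    holdout_variants: int
    holdout_seed: int
    include_twins: bool


def load_settings(env=None):
    """Build LoaderSettings from the environment using the documented defaults."""
    env = os.environ if env is None else env
    return LoaderSettings(
        include_challenge=_flag(env, "LEDGERSHIELD_INCLUDE_CHALLENGE", True),
        challenge_variants=int(env.get("LEDGERSHIELD_CHALLENGE_VARIANTS", "2")),
        challenge_seed=int(env.get("LEDGERSHIELD_CHALLENGE_SEED", "2026")),
        include_holdout=_flag(env, "LEDGERSHIELD_INCLUDE_HOLDOUT", False),
        holdout_variants=int(env.get("LEDGERSHIELD_HOLDOUT_VARIANTS", "1")),
        holdout_seed=int(env.get("LEDGERSHIELD_HOLDOUT_SEED", "31415")),
        include_twins=_flag(env, "LEDGERSHIELD_INCLUDE_TWINS", False),
    )
```

Raising on unrecognized boolean values, rather than silently treating them as false, is one way to surface typos in deployment configuration.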

	### Agent-side variables

	Common variables used by `inference.py` and related scripts:

	| Variable | Typical use |
	|---|---|
	| `API_BASE_URL` | OpenAI-compatible API endpoint |
	| `MODEL_NAME` | model name for inference (determines `ModelCapabilityProfile` tier) |
	| `HF_TOKEN` | token used by the submission-safe agent |
	| `OPENAI_API_KEY` | credential for live comparison scripts |
	| `ENV_URL` | environment server base URL |
	| `LOCAL_IMAGE_NAME` | optional Docker image name for local environment use |
	| `LEDGERSHIELD_DEBUG` | set to `1` to enable stderr output from the inference agent (default: stderr suppressed) |
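A quick preflight can show which of these variables are set before an agent run, without echoing credentials. This is a sketch only: the guide does not say which variables each script requires, so the `required` list is left to the caller, and `describe_agent_env` and `missing` are hypothetical helpers.

```python
import os

AGENT_VARIABLES = (
    "API_BASE_URL",
    "MODEL_NAME",
    "HF_TOKEN",
    "OPENAI_API_KEY",
    "ENV_URL",
    "LOCAL_IMAGE_NAME",
    "LEDGERSHIELD_DEBUG",
)
# Values of these are never printed, only whether they are set.
SECRET_VARIABLES = {"HF_TOKEN", "OPENAI_API_KEY"}


def describe_agent_env(env=None):
    """Map each agent-side variable to a display-safe status string."""
    env = os.environ if env is None else env
    report = {}
    for name in AGENT_VARIABLES:
        value = env.get(name)
        if not value:
            report[name] = "<unset>"
        elif name in SECRET_VARIABLES:
            report[name] = "<set, hidden>"
        else:
            report[name] = value
    return report


def missing(required, env=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```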

	## Operational Checks

	### Basic API checks

	```bash
	curl http://127.0.0.1:8000/health
	curl http://127.0.0.1:8000/
	```

	### Reset a known case

	```bash
	curl -X POST http://127.0.0.1:8000/reset \
	-H 'Content-Type: application/json' \
	-d '{"case_id":"CASE-A-001"}'
	```
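The same reset call can be issued from Python with only the standard library. The sketch mirrors the `curl` command above; it assumes the endpoint replies with a JSON body, which this guide does not document, and the function names are illustrative.

```python
import json
import urllib.request


def build_reset_request(base_url, case_id):
    """Build the POST /reset request with a JSON body naming the case."""
    body = json.dumps({"case_id": case_id}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/reset",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reset_case(base_url, case_id, timeout=10):
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_reset_request(base_url, case_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call against a local server:
#   reset_case("http://127.0.0.1:8000", "CASE-A-001")
```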

	### Run benchmark report generation locally

	```bash
	python benchmark_report.py --format markdown
	```

	Generated artifacts land under `artifacts/` when written.

	## Recommended Deployment Profiles

	### Minimal benchmark server

	Use this when you only need the curated benchmark and generated challenge variants:

	```bash
	HOST=0.0.0.0 PORT=8000 python -m server.app
	```

	### Public benchmark only

	Disable generated challenge variants:

	```bash
	LEDGERSHIELD_INCLUDE_CHALLENGE=0 python -m server.app
	```

	### Holdout-enabled evaluation server

	```bash
	LEDGERSHIELD_INCLUDE_HOLDOUT=1 \
	LEDGERSHIELD_HOLDOUT_VARIANTS=1 \
	python -m server.app
	```

	### Calibration-heavy server with twins

	```bash
	LEDGERSHIELD_INCLUDE_TWINS=1 python -m server.app
	```
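The four profiles above differ only in environment overrides, so they can be kept as data and applied when launching the server from a script. This is a minimal sketch; the profile keys (`minimal`, `public-only`, `holdout`, `twins`) are hypothetical labels chosen here, not names LedgerShield recognizes.

```python
import os
import sys

# Overrides copied from the shell commands in this section.
PROFILES = {
    "minimal": {"HOST": "0.0.0.0", "PORT": "8000"},
    "public-only": {"LEDGERSHIELD_INCLUDE_CHALLENGE": "0"},
    "holdout": {
        "LEDGERSHIELD_INCLUDE_HOLDOUT": "1",
        "LEDGERSHIELD_HOLDOUT_VARIANTS": "1",
    },
    "twins": {"LEDGERSHIELD_INCLUDE_TWINS": "1"},
}


def profile_env(name, base_env=None):
    """Return a copy of base_env with the named profile's overrides applied."""
    if name not in PROFILES:
        raise KeyError(f"unknown profile {name!r}; choose from {sorted(PROFILES)}")
    env = dict(os.environ if base_env is None else base_env)
    env.update(PROFILES[name])
    return env


def launch_command():
    """Command equivalent to `python -m server.app` for the current interpreter."""
    return [sys.executable, "-m", "server.app"]


# Example launch from the repo root:
#   import subprocess
#   subprocess.run(launch_command(), env=profile_env("holdout"), check=True)
```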

	## Production Notes

	LedgerShield is still a benchmark, not a payment system. For production-like hosting:

	- terminate TLS outside the app
	- health-check `/health`
	- treat the service as stateless and restartable
	- version-control `openenv.yaml` and benchmark artifacts
	- avoid mixing benchmark servers with live finance systems

	## Troubleshooting

	### Server starts but endpoints fail

	Check:

	- port `8000` is not already in use
	- dependencies from `requirements.txt` are installed
	- you are running from the repo root so fixture paths resolve correctly

	### Docker container builds but health check fails

	Check:

	- the output of `curl http://localhost:8000/health`
	- container logs for import/path issues
	- whether your host already has something bound to `8000`

	### Unexpected case counts

	Remember that the loader includes challenge variants by default. If you expect only the curated 21-case benchmark, start the server with challenge variants disabled:

	```bash
	LEDGERSHIELD_INCLUDE_CHALLENGE=0 python -m server.app
	```

	### Missing benchmark report endpoint data

	`/benchmark-report` and `/leaderboard` return full report data only after a report has been generated. Run:

	```bash
	python benchmark_report.py --format json
	```