# Deployment Guide

This guide explains how to run LedgerShield locally, in Docker, or as a Docker-backed Hugging Face Space, and documents the runtime environment variables that control benchmark behavior.

## Deployment Modes

### Local Python process

Best for development and testing.

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
pip install -r requirements.txt
python -m server.app
```

Default bind:

- host: `0.0.0.0`
- port: `8000`

Health check:

```bash
curl http://127.0.0.1:8000/health
```
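
When scripting startup (for example in a test harness), it can help to poll the health endpoint until the server is ready instead of issuing a single `curl`. The following is a minimal sketch, not part of the repo: `wait_for_health` is a hypothetical helper, and it assumes `/health` answers with HTTP 200 once the server is up.

```python
import time
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy, since the target is a local server.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_health(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse.

    Assumption: a healthy server responds to the health route with status 200.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _OPENER.open(url, timeout=2.0) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Not accepting connections yet, or an error status; retry below.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_health("http://127.0.0.1:8000/health")` returns `True` as soon as the endpoint responds, or `False` once the timeout expires.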

### Docker

The repo ships with a ready-to-build [`../Dockerfile`](../Dockerfile).

Build:

```bash
docker build -t ledgershield:latest .
```

Run:

```bash
docker run --rm -p 8000:8000 ledgershield:latest
```

Smoke test:

```bash
curl http://127.0.0.1:8000/health
```

### Hugging Face Spaces

The root `README.md` includes Docker Space front matter, and `openenv.yaml` describes the benchmark metadata. For a Docker Space deployment:

1. create a new Hugging Face Space using the Docker SDK
2. push this repo's contents to the Space
3. ensure the Space exposes port `8000`
4. verify `/health`, `/reset`, and `/step`

### CI-backed validation

GitHub Actions already validates:

- Python test runs
- Docker build and container smoke test
- `openenv.yaml` integrity

See [`../.github/workflows/ci.yml`](../.github/workflows/ci.yml).

## Runtime Environment Variables

### Server bind settings

| Variable | Default | Meaning |
|---|---|---|
| `HOST` | `0.0.0.0` | bind host used by `server.app:main` |
| `PORT` | `8000` | bind port used by `server.app:main` |

### Case-loader controls

These are read by [`../server/data_loader.py`](../server/data_loader.py).

| Variable | Default | Meaning |
|---|---|---|
| `LEDGERSHIELD_INCLUDE_CHALLENGE` | `true` | include generated challenge variants in the loaded case pool |
| `LEDGERSHIELD_CHALLENGE_VARIANTS` | `2` | number of generated challenge variants per hard case |
| `LEDGERSHIELD_CHALLENGE_SEED` | `2026` | RNG seed for challenge generation |
| `LEDGERSHIELD_INCLUDE_HOLDOUT` | `false` | include generated holdout cases in the loaded case pool |
| `LEDGERSHIELD_HOLDOUT_VARIANTS` | `1` | holdout variants per hard case |
| `LEDGERSHIELD_HOLDOUT_SEED` | `31415` | RNG seed for holdout generation |
| `LEDGERSHIELD_INCLUDE_TWINS` | `false` | include benign contrastive twins in the loaded case pool |
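
To make the defaults in the table concrete, here is a hedged sketch of how these variables could be read into a typed settings object. It is not the code in `server/data_loader.py`: `LoaderSettings`, `env_flag`, and `read_loader_settings` are hypothetical names, and the set of accepted boolean spellings (`1`/`true`/`yes`/`on` and their opposites) is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumed boolean spellings; the real loader may accept a different set.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Interpret an environment variable as a boolean, falling back to `default`."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name}={raw!r} is not a recognized boolean value")


@dataclass(frozen=True)
class LoaderSettings:
    include_challenge: bool
    challenge_variants: int
    challenge_seed: int
    include_holdout: bool
    holdout_variants: int
    holdout_seed: int
    include_twins: bool


def read_loader_settings(env: Mapping[str, str] = os.environ) -> LoaderSettings:
    """Build settings from `env`, using the defaults documented in the table."""
    return LoaderSettings(
        include_challenge=env_flag(env, "LEDGERSHIELD_INCLUDE_CHALLENGE", True),
        challenge_variants=int(env.get("LEDGERSHIELD_CHALLENGE_VARIANTS", "2")),
        challenge_seed=int(env.get("LEDGERSHIELD_CHALLENGE_SEED", "2026")),
        include_holdout=env_flag(env, "LEDGERSHIELD_INCLUDE_HOLDOUT", False),
        holdout_variants=int(env.get("LEDGERSHIELD_HOLDOUT_VARIANTS", "1")),
        holdout_seed=int(env.get("LEDGERSHIELD_HOLDOUT_SEED", "31415")),
        include_twins=env_flag(env, "LEDGERSHIELD_INCLUDE_TWINS", False),
    )
```

Passing an explicit mapping instead of `os.environ` keeps the function easy to exercise in tests without mutating the process environment.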

### Agent-side variables

Common variables used by `inference.py` and related scripts:

| Variable | Typical use |
|---|---|
| `API_BASE_URL` | OpenAI-compatible API endpoint |
| `MODEL_NAME` | model name for inference (determines `ModelCapabilityProfile` tier) |
| `HF_TOKEN` | token used by the submission-safe agent |
| `OPENAI_API_KEY` | credential for live comparison scripts |
| `ENV_URL` | environment server base URL |
| `LOCAL_IMAGE_NAME` | optional Docker image name for local environment use |
| `LEDGERSHIELD_DEBUG` | set to `1` to enable stderr output from the inference agent (default: stderr suppressed) |

## Operational Checks

### Basic API checks

```bash
curl http://127.0.0.1:8000/health
curl http://127.0.0.1:8000/
```

### Reset a known case

```bash
curl -X POST http://127.0.0.1:8000/reset \
  -H 'Content-Type: application/json' \
  -d '{"case_id":"CASE-A-001"}'
```
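
The same reset call can be issued from Python using only the standard library. This is a sketch: `build_reset_request` is a hypothetical helper, and only the `case_id` field shown in the `curl` example above is assumed to be part of the payload.

```python
import json
import urllib.request


def build_reset_request(base_url: str, case_id: str) -> urllib.request.Request:
    """Build a POST request for `/reset` with a JSON body naming the case."""
    body = json.dumps({"case_id": case_id}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/reset",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(build_reset_request("http://127.0.0.1:8000", "CASE-A-001"))` and read the body from the returned response. The response schema is not described in this guide, so inspect the raw JSON before relying on specific fields.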

### Run benchmark report generation locally

```bash
python benchmark_report.py --format markdown
```

When the script writes output files, they land under `artifacts/`.

## Recommended Deployment Profiles

### Minimal benchmark server

Use this when you only need the curated benchmark and generated challenge variants:

```bash
HOST=0.0.0.0 PORT=8000 python -m server.app
```

### Public benchmark only

Disable generated challenge variants:

```bash
LEDGERSHIELD_INCLUDE_CHALLENGE=0 python -m server.app
```

### Holdout-enabled evaluation server

```bash
LEDGERSHIELD_INCLUDE_HOLDOUT=1 \
LEDGERSHIELD_HOLDOUT_VARIANTS=1 \
python -m server.app
```

### Calibration-heavy server with twins

```bash
LEDGERSHIELD_INCLUDE_TWINS=1 python -m server.app
```
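
If you switch between these profiles often, the environment overrides can be kept in one place. The sketch below is illustrative only: the profile names and the `profile_env` helper are hypothetical, while the variable values are the ones used in the commands above.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical profile names mapped to the overrides shown in this section.
PROFILES: Dict[str, Dict[str, str]] = {
    "minimal": {"HOST": "0.0.0.0", "PORT": "8000"},
    "public-only": {"LEDGERSHIELD_INCLUDE_CHALLENGE": "0"},
    "holdout": {
        "LEDGERSHIELD_INCLUDE_HOLDOUT": "1",
        "LEDGERSHIELD_HOLDOUT_VARIANTS": "1",
    },
    "twins": {"LEDGERSHIELD_INCLUDE_TWINS": "1"},
}


def profile_env(profile: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: the current environment) with the
    overrides for `profile` applied on top."""
    if profile not in PROFILES:
        raise KeyError(f"unknown profile: {profile!r}")
    env = dict(os.environ if base is None else base)
    env.update(PROFILES[profile])
    return env
```

A launcher could then start the server with `subprocess.run([sys.executable, "-m", "server.app"], env=profile_env("holdout"), check=True)`.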

## Production Notes

LedgerShield is still a benchmark, not a payment system. For production-like hosting:

- terminate TLS outside the app
- health-check `/health`
- treat the service as stateless and restartable
- version-control `openenv.yaml` and benchmark artifacts
- avoid mixing benchmark servers with live finance systems

## Troubleshooting

### Server starts but endpoints fail

Check:

- port `8000` is not already in use
- dependencies from `requirements.txt` are installed
- you are running from the repo root so fixture paths resolve correctly
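
The port check can be scripted with the standard library. This is a minimal sketch under the assumption that a successful TCP connection to the port means something is already listening there; `port_in_use` is a hypothetical helper, not part of the repo.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return probe.connect_ex((host, port)) == 0
```

Run `port_in_use(8000)` before starting the server. If it returns `True`, stop the other process or start LedgerShield with a different `PORT`.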

### Docker container builds but health check fails

Check:

- whether `curl http://localhost:8000/health` responds from the host, which also confirms that `-p 8000:8000` was passed to `docker run`
- container logs for import/path issues
- whether your host already has something bound to `8000`

### Unexpected case counts

Remember that the loader includes challenge variants by default. If you expect only the curated 21-case benchmark, start the server with them disabled:

```bash
LEDGERSHIELD_INCLUDE_CHALLENGE=0 python -m server.app
```
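
As a rough sanity check on pool size, the arithmetic implied by the case-loader table can be sketched as follows. This is an estimate under stated assumptions, not the loader's actual logic: `expected_pool_size` is a hypothetical helper, the number of hard cases is not given in this guide and must be supplied by you, and benign twins are left out because their count is not documented here.

```python
def expected_pool_size(
    curated: int,
    hard_cases: int,
    include_challenge: bool = True,
    challenge_variants: int = 2,
    include_holdout: bool = False,
    holdout_variants: int = 1,
) -> int:
    """Estimate the loaded case count from the documented per-hard-case settings.

    Assumption: each generated variant adds one case per hard case on top of
    the curated set. Twins are not counted.
    """
    total = curated
    if include_challenge:
        total += hard_cases * challenge_variants
    if include_holdout:
        total += hard_cases * holdout_variants
    return total
```

For example, with the 21 curated cases and a purely illustrative figure of 5 hard cases, the default settings would give `21 + 5 * 2`, and disabling challenge variants brings the estimate back to 21.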

### Missing benchmark report endpoint data

`/benchmark-report` and `/leaderboard` return full report data only after a report has been generated. Run:

```bash
python benchmark_report.py --format json
```