---
title: DispatchPulse
emoji: 🚑
colorFrom: red
colorTo: blue
sdk: docker
app_port: 8000
pinned: false
license: apache-2.0
---
# DispatchPulse
**An OpenEnv environment where an AI agent acts as a 911 emergency dispatch coordinator.**

The agent receives incoming calls, classifies their severity, and dispatches limited
emergency units (ALS / BLS ambulances, fire engines, police) under time pressure.
Patient outcomes are scored against **real clinical survival curves**: no
LLM-as-judge, just defensible math.

> Submission for the [Meta PyTorch OpenEnv Hackathon, India 2026](https://www.scaler.com/school-of-technology/meta-pytorch-hackathon).

---
## Why this environment
In India, an estimated 24,000+ people die every day because of slow emergency
response: average ambulance response time is 25–35 minutes, roughly half of the
trauma "golden hour", and only ~20% of ambulances carry advanced life support.
DispatchPulse simulates this crisis as an interactive RL environment in which the
agent has to learn the *counter-intuitive* strategies real dispatchers use:
- **The greedy "closest unit" strategy fails.** Dispatching the only ALS unit to a
  sprained ankle leaves nothing for the cardiac arrest that arrives 3 minutes
  later, and that patient's survival chance drops from 70% to 15%.
- **Triage matters more than speed.** A weighted reward (severity-1 calls
  count 3× more than severity-4 calls) means the agent has to *prioritise*,
  not just react.
- **Hospital choice matters.** Sending a stroke patient to a hospital without
  a stroke unit, or to one on diversion, costs you score.

The reward function uses real clinical survival curves from the EMS literature
(Larsen et al. 1993 for cardiac arrest; Saver 2006, "Time is Brain", for stroke;
golden-hour curves for trauma). It is deterministic, defensible, and gives a
continuous signal an RL agent can actually learn from.

---
## OpenEnv compliance
| Requirement | Status |
|---|---|
| Real-world task (not games or toys) | ✅ Emergency dispatch, an actual profession |
| Typed Pydantic models inheriting from OpenEnv `Action` / `Observation` / `State` | ✅ `models.py` |
| `Environment` base-class subclass with `reset()` / `step()` / `state` | ✅ `server/environment.py` |
| FastAPI server via `create_fastapi_app(...)` | ✅ `server/app.py` |
| `EnvClient` client with `_step_payload` / `_parse_result` / `_parse_state` | ✅ `client.py` |
| `openenv.yaml` manifest | ✅ |
| ≥ 3 tasks with graders, scores in 0.0–1.0 | ✅ easy / medium / hard |
| Meaningful reward + partial progress | ✅ survival curves + per-step rewards |
| `inference.py` at root, OpenAI client, mandatory env vars, `[START]/[STEP]/[END]` format | ✅ |
| Reproducible (fixed seed) | ✅ `seed=42` default everywhere |
| Pre-submission validator script | ✅ `scripts/validate-submission.sh` |
| Dockerfile + HF Spaces deploy | ✅ uses `openenv-base` |
| Runs on 2 vCPU / 8 GB RAM | ✅ pure Python math, no ML inference |

---
## Project layout (canonical OpenEnv structure)
```
DispatchPulse/
├── README.md
├── Dockerfile              # uses ghcr.io/meta-pytorch/openenv-base
├── openenv.yaml            # OpenEnv manifest
├── pyproject.toml
├── inference.py            # ROUND 1 ENTRY POINT (must be in root)
├── client.py               # DispatchPulseEnv (subclass of EnvClient)
├── models.py               # DispatchPulseAction / Observation / State
│                           # plus internal sim models
├── simulation.py           # DispatchSimulation engine
├── reward.py               # Survival curves + episode reward
├── grader.py               # Programmatic 0.0–1.0 grader
├── scenario_loader.py      # YAML task loader
├── text_view.py            # LLM-friendly dispatch center renderer
├── utils.py                # Distance / ETA / templates
├── server/
│   ├── __init__.py
│   ├── app.py              # FastAPI app via create_fastapi_app(...)
│   └── environment.py      # DispatchPulseEnvironment(Environment)
├── tasks/
│   ├── easy.yaml
│   ├── medium.yaml
│   └── hard.yaml
├── scripts/
│   └── validate-submission.sh  # runs the 3 grader checks locally
└── tests/
    ├── test_reward.py
    └── test_simulation.py
```
---
## Action space (typed Pydantic)
`DispatchPulseAction` supports these `action_type` values:

| `action_type` | Required fields | Time cost | What it does |
|---|---|---|---|
| `dispatch` | `call_id`, `unit_id`, `hospital_id` (optional) | 1 min | Send a unit to a call, optionally pre-routing it to a hospital. |
| `classify` | `call_id`, `severity` (1–5) | 1 min | Reclassify a call's severity. |
| `callback` | `call_id`, `message` | 1 min | Phone the caller back; 70% chance they clarify the true emergency type. |
| `wait` | `minutes` (default 1, max 5) | `minutes` min | Skip ahead in the simulation when there is nothing to do. |
| `view` | (none) | free | Re-fetch the dispatch center text without advancing time. |

The action also has a free-text `text` field: the server parses lines such as
`dispatch CALL-001 ALS-1 H1`, so an LLM can emit actions directly as text.
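
A minimal sketch of how the free-text grammar could map onto the typed fields. Only the `dispatch CALL-001 ALS-1 H1` form comes from this README; the `ParsedAction` class, the `parse_action_text` function, the `callback <call_id> <message>` syntax and the clamping of `wait` to 1–5 minutes are assumptions for illustration, not the server's actual parser.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedAction:
    """Hypothetical stand-in for the fields of DispatchPulseAction."""
    action_type: str
    call_id: Optional[str] = None
    unit_id: Optional[str] = None
    hospital_id: Optional[str] = None
    severity: Optional[int] = None
    message: Optional[str] = None
    minutes: int = 1


def parse_action_text(line: str) -> ParsedAction:
    """Parse one line such as 'dispatch CALL-001 ALS-1 H1' (sketch only)."""
    parts = line.strip().split()
    if not parts:
        raise ValueError("empty action")
    verb, args = parts[0].lower(), parts[1:]
    if verb == "dispatch" and len(args) in (2, 3):
        # dispatch <call_id> <unit_id> [hospital_id]
        hospital = args[2] if len(args) == 3 else None
        return ParsedAction("dispatch", call_id=args[0], unit_id=args[1], hospital_id=hospital)
    if verb == "classify" and len(args) == 2:
        severity = int(args[1])
        if not 1 <= severity <= 5:
            raise ValueError("severity must be between 1 and 5")
        return ParsedAction("classify", call_id=args[0], severity=severity)
    if verb == "callback" and len(args) >= 2:
        # Assumed syntax: everything after the call id is the message
        return ParsedAction("callback", call_id=args[0], message=" ".join(args[1:]))
    if verb == "wait" and len(args) <= 1:
        minutes = int(args[0]) if args else 1
        # Assumption: out-of-range values are clamped to 1..5 rather than rejected
        return ParsedAction("wait", minutes=max(1, min(minutes, 5)))
    if verb == "view" and not args:
        return ParsedAction("view")
    raise ValueError(f"cannot parse action: {line!r}")
```

For example, `parse_action_text("dispatch CALL-002 BLS-1")` yields a `dispatch` action with `hospital_id=None`, leaving hospital choice to the simulation.
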
## Observation space
`DispatchPulseObservation` has:
- `text` β€” formatted dispatch center view (the field the LLM reads)
- `current_time`, `time_limit`
- `calls_pending`, `units_available`, `calls_completed`, `calls_timed_out`, `total_calls`
- `last_action_error` β€” error string from the previous action, or `None`
- `info_message` β€” what just happened
- inherited `done`, `reward`, `metadata`
## Tasks
| Task | Calls | Units | Hospitals | Duration | Caller misreporting | What's hard about it |
|---|---|---|---|---|---|---|
| `easy` | 5 | 4 | 1 | 30 min | 0% | Basic dispatch β€” learn the action grammar |
| `medium` | 15 | 6 | 2 | 45 min | 20% | Mass casualty bus accident at minute 12; some callers lie |
| `hard` | 30 | 8 | 3 (1 on diversion) | 60 min | 35% | Earthquake: extreme scarcity, panicked callers, hospital triage matters |

All three tasks are deterministic given the seed.

---
## Reward function
The final episode score is a weighted combination of four components, each in [0, 1],
with the penalty subtracted
(`score = 0.60*survival + 0.15*efficiency + 0.15*triage - 0.10*penalty`):

| Component | Weight | What it measures |
|---|---|---|
| `survival_score` | 0.60 | Severity-weighted average outcome across all calls (clinical survival curve × unit effectiveness × hospital modifier) |
| `efficiency_score` | 0.15 | Fraction of calls dispatched, penalised for wasting ALS units on minor calls |
| `triage_accuracy` | 0.15 | Fraction of severity-1 calls dispatched within the first 25% of their timeout window |
| `penalty` | −0.10 | Deductions for timed-out critical calls and wrong-unit assignments |

Severity weights inside the survival score: **3× for severity 1, 2× for 2, 1.5× for 3, 1× for 4, 0.5× for 5**.
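
As a worked check of these weights, the hypothetical helpers below compute a severity-weighted survival score and combine the four components. The function names and the final clamp to [0, 1] are assumptions; the penalty is passed as a non-negative magnitude and subtracted, which is how the Penalty column of the baseline table in the next section reads.

```python
from typing import Iterable, Tuple

SEVERITY_WEIGHTS = {1: 3.0, 2: 2.0, 3: 1.5, 4: 1.0, 5: 0.5}


def survival_score(outcomes: Iterable[Tuple[int, float]]) -> float:
    """Severity-weighted mean of per-call outcomes given as (severity, outcome) pairs."""
    pairs = list(outcomes)
    total_weight = sum(SEVERITY_WEIGHTS[severity] for severity, _ in pairs)
    if total_weight == 0:
        return 0.0
    weighted = sum(SEVERITY_WEIGHTS[severity] * outcome for severity, outcome in pairs)
    return weighted / total_weight


def episode_score(survival: float, efficiency: float, triage: float, penalty: float) -> float:
    """Combine the four components; `penalty` is a non-negative magnitude."""
    raw = 0.60 * survival + 0.15 * efficiency + 0.15 * triage - 0.10 * penalty
    # Assumed clamp so the result stays in [0, 1]
    return max(0.0, min(1.0, raw))
```

Plugging in the heuristic baseline components (for example survival 0.463, efficiency 0.800, triage 1.000, penalty 0.000 on `easy`) reproduces the reported totals to within 0.001.
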
### Survival curves (from EMS literature)
| Emergency | Curve | Source / notes |
|---|---|---|
| Cardiac arrest | exponential, ~10%/min decay | Larsen et al. 1993 |
| Trauma | sigmoid centred at 45 min | "golden hour" |
| Stroke | exponential decay | Saver 2006 ("Time is Brain"): ~1.9M neurons lost per minute |
| Fire | exponential, doubles per minute | property loss |
| Breathing difficulty | gentler exponential | |
| Minor injury | nearly flat | stable patient |
| Mental health | gentler exponential | de-escalation success |

Each call's outcome is multiplied by:
- **Unit effectiveness** (e.g., ALS → cardiac = 1.0; BLS → cardiac = 0.5; fire engine → cardiac = 0.1)
- **Hospital modifier** (specialty match: +5%; on diversion or zero beds: −15%)

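
The curve shapes and multipliers can be sketched as follows. Only the ~10%/min cardiac decay, the 45-minute trauma midpoint, the three cardiac unit-effectiveness values and the +5% / −15% hospital modifiers come from this README; the function names, the stroke and minor-injury rates, the sigmoid steepness, the default effectiveness of 1.0 for unlisted pairs and the final clamp are assumptions, so the numbers will not match `reward.py` exactly.

```python
import math
from typing import Callable, Dict, Tuple


def cardiac_arrest(minutes: float) -> float:
    # Exponential decay: ~10% of the remaining survival chance lost per minute
    return 0.9 ** minutes


def trauma(minutes: float) -> float:
    # Sigmoid centred at 45 minutes; the steepness 0.1 is an assumption
    return 1.0 / (1.0 + math.exp(0.1 * (minutes - 45.0)))


def stroke(minutes: float) -> float:
    # Exponential decay; the 2%/min rate is an assumption
    return math.exp(-0.02 * minutes)


def minor_injury(minutes: float) -> float:
    # Nearly flat: a stable patient barely deteriorates (assumed rate)
    return math.exp(-0.001 * minutes)


SURVIVAL_CURVES: Dict[str, Callable[[float], float]] = {
    "cardiac_arrest": cardiac_arrest,
    "trauma": trauma,
    "stroke": stroke,
    "minor_injury": minor_injury,
}

# (unit type, emergency) -> effectiveness; only these cardiac entries are documented
UNIT_EFFECTIVENESS: Dict[Tuple[str, str], float] = {
    ("ALS", "cardiac_arrest"): 1.0,
    ("BLS", "cardiac_arrest"): 0.5,
    ("FIRE", "cardiac_arrest"): 0.1,
}


def hospital_modifier(specialty_match: bool, on_diversion: bool) -> float:
    # `on_diversion` also stands in for "zero beds"; it is checked first in this sketch
    if on_diversion:
        return 0.85
    return 1.05 if specialty_match else 1.0


def call_outcome(emergency: str, unit_type: str, minutes_to_arrival: float,
                 specialty_match: bool = False, on_diversion: bool = False) -> float:
    base = SURVIVAL_CURVES[emergency](minutes_to_arrival)
    effectiveness = UNIT_EFFECTIVENESS.get((unit_type, emergency), 1.0)
    modifier = hospital_modifier(specialty_match, on_diversion)
    # Clamp so the +5% boost cannot push an outcome above 1.0
    return max(0.0, min(1.0, base * effectiveness * modifier))
```

Because the modifiers are multiplicative, unit choice dominates: in this sketch an ALS unit at a cardiac arrest always scores 10× a fire engine arriving at the same minute.
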
---
## Baseline scores (heuristic agent, seed=42)
A simple rule-based heuristic (always pick the most-critical call, send the
most effective available unit, reserve ALS for high-severity calls) produces
the following calibrated scores:

| Task | Total | Survival | Efficiency | Triage | Penalty | Completed/Total |
|---|---|---|---|---|---|---|
| easy | 0.5476 | 0.463 | 0.800 | 1.000 | 0.000 | 4/5 |
| medium | 0.3750 | 0.377 | 0.600 | 0.500 | −0.160 | 9/15 |
| hard | 0.2183 | 0.214 | 0.433 | 0.500 | −0.500 | 13/30 |
| **Average** | **0.3803** | | | | | |

Scores decrease monotonically with difficulty (easy > medium > hard), which
indicates that the environment discriminates between the scenarios as designed.

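
The heuristic baseline described above can be sketched as a pure function over simple records. The `Call` and `Unit` classes, the `choose_dispatch` name, the tie-break on call age and the `als_max_severity=2` cut-off for "high-severity" are assumptions for illustration; the actual baseline may use different rules. The function returns an action in the free-text grammar, or `None` when the agent should `wait`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Call:
    call_id: str
    severity: int        # 1 = most critical, 5 = least
    received_at: float   # simulation minute the call arrived
    emergency: str       # e.g. "cardiac_arrest", "minor_injury"


@dataclass
class Unit:
    unit_id: str
    unit_type: str       # e.g. "ALS", "BLS", "FIRE", "POLICE"


def choose_dispatch(pending: List[Call], available: List[Unit],
                    effectiveness: Dict[Tuple[str, str], float],
                    als_max_severity: int = 2) -> Optional[str]:
    """Return a 'dispatch <call_id> <unit_id>' line, or None to wait."""
    if not pending or not available:
        return None
    # Most critical call first; among equals, the one that has waited longest
    call = min(pending, key=lambda c: (c.severity, c.received_at))
    # Hold ALS units in reserve unless the call is high-severity
    candidates = [u for u in available
                  if u.unit_type != "ALS" or call.severity <= als_max_severity]
    if not candidates:
        return None
    # Most effective remaining unit for this emergency (unknown pairs score 0.0)
    unit = max(candidates, key=lambda u: effectiveness.get((u.unit_type, call.emergency), 0.0))
    return f"dispatch {call.call_id} {unit.unit_id}"
```

Holding the ALS unit back for the minor call is exactly the "closest unit fails" lesson from the top of this README, encoded as a hard rule.
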
---
## Inference script: `inference.py`
Per the hackathon spec, `inference.py` is in the **project root** and follows
the mandatory contract:
### Required environment variables
| Variable | Purpose | Default in script |
|---|---|---|
| `API_BASE_URL` | LLM endpoint | `https://router.huggingface.co/v1` |
| `MODEL_NAME` | Which model to call | `Qwen/Qwen2.5-72B-Instruct` |
| `HF_TOKEN` | API key for the LLM | (no default) |
| `LOCAL_IMAGE_NAME` | Docker image for `from_docker_image()` | (no default) |
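| `DISPATCHPULSE_TASK` | Which task to run (`easy`/`medium`/`hard`) | `easy` |
| `ENV_BASE_URL` | URL of an already-running env server (used when `LOCAL_IMAGE_NAME` is unset) | (no default) |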
### Stdout format (verbatim)
```
[START] task=<task_name> env=dispatchpulse model=<model_name>
[STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
[END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
```
- One `[START]` line at episode begin
- One `[STEP]` line per step, immediately after `env.step()` returns
- One `[END]` line after `env.close()`, ALWAYS emitted (even on exception)
- `reward` and `rewards` to 2 decimal places; `score` to 3 decimal places
- `done` and `success` are lowercase booleans
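
A small sketch of helpers that emit the three line types in the format above. The function names are hypothetical, and flattening newlines in free text into spaces is an assumption made so that each record stays on a single line.

```python
from typing import Optional, Sequence


def _bool(value: bool) -> str:
    # Booleans must be lowercase: true / false
    return "true" if value else "false"


def _one_line(text: str) -> str:
    # Assumption: newlines are flattened so each record stays on one line
    return " ".join(text.splitlines())


def start_line(task: str, model: str) -> str:
    return f"[START] task={task} env=dispatchpulse model={model}"


def step_line(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> str:
    err = "null" if error is None else _one_line(error)
    return (f"[STEP] step={step} action={_one_line(action)} reward={reward:.2f} "
            f"done={_bool(done)} error={err}")


def end_line(success: bool, steps: int, score: float, rewards: Sequence[float]) -> str:
    # reward values to 2 decimal places, score to 3
    joined = ",".join(f"{r:.2f}" for r in rewards)
    return f"[END] success={_bool(success)} steps={steps} score={score:.3f} rewards={joined}"
```

Printing `end_line(...)` from a `finally:` block is one way to guarantee the `[END]` line even when the episode raises.
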
### Connection logic
1. If `LOCAL_IMAGE_NAME` is set → `await DispatchPulseEnv.from_docker_image(LOCAL_IMAGE_NAME)`
2. Else if `ENV_BASE_URL` is set → connect directly to a running env server
3. Otherwise → spin up an in-process simulation as a fallback (no env server or Docker needed)
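
The precedence above can be expressed as a small pure function. `choose_connection` and its return values are hypothetical, and treating an empty variable as unset is an assumption.

```python
from typing import Mapping, Tuple


def choose_connection(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (mode, target) following the documented precedence."""
    # 1. A local Docker image wins if one is named
    image = env.get("LOCAL_IMAGE_NAME", "").strip()
    if image:
        return ("docker", image)
    # 2. Otherwise connect to an already-running env server
    base_url = env.get("ENV_BASE_URL", "").strip()
    if base_url:
        return ("remote", base_url)
    # 3. Fall back to the in-process simulation
    return ("in_process", "")
```

Typical usage is `mode, target = choose_connection(os.environ)`. In the real script the `"docker"` mode would correspond to `DispatchPulseEnv.from_docker_image(...)`, and `"remote"` to constructing `DispatchPulseEnv(base_url=...)` as in the client example further down.
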
### Run it
```bash
# Against the live HF Space
ENV_BASE_URL=https://arun-sanjay-dispatchpulse.hf.space \
HF_TOKEN=$HF_TOKEN \
python inference.py
# Against a local Docker image
LOCAL_IMAGE_NAME=dispatchpulse:latest \
HF_TOKEN=$HF_TOKEN \
python inference.py
# In-process fallback (no env server, no Docker)
python inference.py
```
---
## Setup
### Run locally with Python
```bash
python -m venv .venv && source .venv/bin/activate
pip install -e .
python inference.py
```
### Run locally with Docker
```bash
docker build -t dispatchpulse .
docker run -p 8000:8000 dispatchpulse
# Then in another shell:
curl http://localhost:8000/health
```
### Use as a client (OpenEnv `EnvClient` pattern)
```python
import asyncio

from client import DispatchPulseEnv
from models import DispatchPulseAction


async def main():
    async with DispatchPulseEnv(base_url="https://arun-sanjay-dispatchpulse.hf.space") as env:
        result = await env.reset(task_name="easy", seed=42)
        # Trivial policy: wait one minute at a time until the episode ends
        while not result.done:
            action = DispatchPulseAction(action_type="wait", minutes=1, text="wait 1")
            result = await env.step(action)
            print(result.observation.text[:200])
        print(f"Final score: {result.reward}")


asyncio.run(main())
```
### Run on Hugging Face Spaces
Auto-built as a Docker Space:
[`https://huggingface.co/spaces/Arun-Sanjay/dispatchpulse`](https://huggingface.co/spaces/Arun-Sanjay/dispatchpulse)

---
## Pre-submission validator
Run the same three checks the hackathon's automated grader runs:
```bash
./scripts/validate-submission.sh https://arun-sanjay-dispatchpulse.hf.space .
```
It checks:
1. **HF Space deploys**: `POST /reset` returns HTTP 200
2. **Docker build**: `docker build .` succeeds (≤ 10 min)
3. **OpenEnv compliance**: `openenv validate` passes

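
Check 1 can be reproduced from Python with only the standard library. This is a sketch, not the validator script itself: the `reset_returns_200` name and the empty JSON body `{}` sent to `/reset` are assumptions.

```python
import http.client
import urllib.request


def reset_returns_200(base_url: str, timeout: float = 30.0) -> bool:
    """Return True if POST <base_url>/reset answers with HTTP 200 (sketch)."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/reset",
        data=b"{}",  # assumption: an empty JSON body is acceptable to /reset
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx) and socket timeouts all subclass OSError
        return False
```

Once the Space is up, `reset_returns_200("https://arun-sanjay-dispatchpulse.hf.space")` should return `True`.
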
---
## Calibration tests
The reward function ships with calibration tests that double as documentation:
```bash
python tests/test_reward.py
python tests/test_simulation.py
```
These verify that:
- Survival curves match published clinical numbers
- A "do-nothing" agent scores below 0.15 on every task
- A simple heuristic strictly outperforms the do-nothing agent
- Heuristic scores decrease monotonically easy → medium → hard
- ALS at cardiac arrest beats fire engine at cardiac arrest by ≥ 5×
- Specialty hospital match boosts outcome; diversion hurts it
---
## License
Apache 2.0. Built for the Meta PyTorch OpenEnv Hackathon, India 2026.