title: DispatchPulse
emoji: π
colorFrom: red
colorTo: blue
sdk: docker
app_port: 8000
pinned: false
license: apache-2.0
DispatchPulse
An OpenEnv environment where an AI agent acts as a 911 emergency dispatch coordinator. The agent receives incoming calls, classifies their severity, and dispatches limited emergency units (ALS / BLS ambulances, fire engines, police) under time pressure. Patient outcomes are scored against real clinical survival curves: no LLM-as-judge, just defensible math.
Submission for the Meta PyTorch OpenEnv Hackathon, India 2026.
Why this environment
In India, an estimated 24,000+ people die every day because of slow emergency response: average ambulance time is 25–35 minutes, well beyond the golden hour, and only ~20% of ambulances carry advanced life support. DispatchPulse simulates this crisis as an interactive RL environment where the agent has to learn the counter-intuitive strategies real dispatchers use:
- The greedy "closest unit" strategy fails. Dispatching the only ALS to a sprained ankle leaves nothing for the cardiac arrest that arrives 3 minutes later, and survival drops from 70% to 15%.
- Triage matters more than speed. A weighted reward (severity 1 calls count 3× more than severity 4) means the agent has to prioritise, not just react.
- Hospital choice matters. Sending a stroke patient to a hospital without a stroke unit, or to one on diversion, costs you score.
The reward function uses real clinical survival curves from the EMS literature (Larsen et al. 1993 for cardiac arrest; Saver 2006 "Time is Brain" for stroke; golden hour curves for trauma). It's deterministic, defensible, and gives a continuous signal an RL agent can actually learn from.
OpenEnv compliance
| Requirement | Status |
|---|---|
| Real-world task (not games or toys) | ✅ Emergency dispatch, an actual profession |
| Typed Pydantic models inheriting from OpenEnv `Action` / `Observation` / `State` | ✅ `models.py` |
| `Environment` base-class subclass with `reset()` / `step()` / `state` | ✅ `server/environment.py` |
| FastAPI server via `create_fastapi_app(...)` | ✅ `server/app.py` |
| `EnvClient` client with `_step_payload` / `_parse_result` / `_parse_state` | ✅ `client.py` |
| `openenv.yaml` manifest | ✅ |
| ≥ 3 tasks with graders, scores 0.0–1.0 | ✅ easy / medium / hard |
| Meaningful reward + partial progress | ✅ survival curves + per-step rewards |
| `inference.py` at root, OpenAI client, mandatory env vars, `[START]`/`[STEP]`/`[END]` format | ✅ |
| Reproducible (fixed seed) | ✅ `seed=42` default everywhere |
| Pre-submission validator script | ✅ `scripts/validate-submission.sh` |
| Dockerfile + HF Spaces deploy | ✅ uses `openenv-base` |
| Runs on 2 vCPU / 8 GB RAM | ✅ pure Python math, no ML inference |
Project layout (canonical OpenEnv structure)
DispatchPulse/
├── README.md
├── Dockerfile                  # uses ghcr.io/meta-pytorch/openenv-base
├── openenv.yaml                # OpenEnv manifest
├── pyproject.toml
├── inference.py                # ROUND 1 ENTRY POINT: must be in root
├── client.py                   # DispatchPulseEnv (subclass of EnvClient)
├── models.py                   # DispatchPulseAction / Observation / State
│                               # plus internal sim models
├── simulation.py               # DispatchSimulation engine
├── reward.py                   # Survival curves + episode reward
├── grader.py                   # Programmatic 0.0–1.0 grader
├── scenario_loader.py          # YAML task loader
├── text_view.py                # LLM-friendly dispatch center renderer
├── utils.py                    # Distance / ETA / templates
├── server/
│   ├── __init__.py
│   ├── app.py                  # FastAPI app via create_fastapi_app(...)
│   └── environment.py          # DispatchPulseEnvironment(Environment)
├── tasks/
│   ├── easy.yaml
│   ├── medium.yaml
│   └── hard.yaml
├── scripts/
│   └── validate-submission.sh  # runs the 3 grader checks locally
└── tests/
    ├── test_reward.py
    └── test_simulation.py
Action space (typed Pydantic)
DispatchPulseAction has these action_type values:
| `action_type` | Required fields | Time cost | What it does |
|---|---|---|---|
| `dispatch` | `call_id`, `unit_id`, `hospital_id?` | 1 min | Send a unit to a call (optionally pre-routing to a hospital). |
| `classify` | `call_id`, `severity` (1-5) | 1 min | Reclassify a call's severity. |
| `callback` | `call_id`, `message` | 1 min | Phone the caller back. 70% chance they clarify the true emergency type. |
| `wait` | `minutes` (default 1, max 5) | n min | Skip ahead in the simulation when there's nothing to do. |
| `view` | (none) | free | Re-fetch the dispatch center text without advancing time. |
The action also has a free-text `text` field: the server parses lines like
`dispatch CALL-001 ALS-1 H1` so an LLM can produce them directly.
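As a rough illustration of how such a free-text line might map onto the typed fields, here is a minimal sketch. The `parse_command` helper and its return shape are hypothetical, and the grammar for verbs other than `dispatch` is assumed from the action table; the server's actual parser may differ.

```python
from typing import Optional


def parse_command(line: str) -> Optional[dict]:
    """Parse one free-text action line into a dict of action fields.

    Hypothetical helper: field names follow the action table, but the real
    server-side parser may accept a different grammar.
    """
    tokens = line.strip().split()
    if not tokens:
        return None
    verb, args = tokens[0].lower(), tokens[1:]

    if verb == "dispatch" and len(args) in (2, 3):
        # hospital_id is optional, so the third token may be absent.
        return {
            "action_type": "dispatch",
            "call_id": args[0],
            "unit_id": args[1],
            "hospital_id": args[2] if len(args) == 3 else None,
        }
    if verb == "classify" and len(args) == 2 and args[1].isdigit():
        severity = int(args[1])
        if 1 <= severity <= 5:
            return {"action_type": "classify", "call_id": args[0], "severity": severity}
        return None
    if verb == "callback" and len(args) >= 2:
        return {"action_type": "callback", "call_id": args[0], "message": " ".join(args[1:])}
    if verb == "wait" and len(args) <= 1:
        minutes = int(args[0]) if args and args[0].isdigit() else 1
        # Clamp to the documented range (default 1, max 5).
        return {"action_type": "wait", "minutes": max(1, min(minutes, 5))}
    if verb == "view" and not args:
        return {"action_type": "view"}
    return None  # unrecognised line


print(parse_command("dispatch CALL-001 ALS-1 H1"))
```

Returning `None` for anything unrecognised lets the caller surface a parse error instead of guessing what the model meant.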
Observation space
DispatchPulseObservation has:
- `text`: formatted dispatch center view (the field the LLM reads)
- `current_time`, `time_limit`
- `calls_pending`, `units_available`, `calls_completed`, `calls_timed_out`, `total_calls`
- `last_action_error`: error string from the previous action, or `None`
- `info_message`: what just happened
- inherited `done`, `reward`, `metadata`
Tasks
| Task | Calls | Units | Hospitals | Duration | Caller misreporting | What's hard about it |
|---|---|---|---|---|---|---|
| `easy` | 5 | 4 | 1 | 30 min | 0% | Basic dispatch: learn the action grammar |
| `medium` | 15 | 6 | 2 | 45 min | 20% | Mass casualty bus accident at minute 12; some callers lie |
| `hard` | 30 | 8 | 3 (1 on diversion) | 60 min | 35% | Earthquake: extreme scarcity, panicked callers, hospital triage matters |
All three are deterministic given the seed.
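One common way to get this kind of determinism is to draw every random decision from a single seeded generator. The sketch below is an illustration only: `misreport_flags` is a hypothetical helper, and the project's scenario loader may sample caller misreporting differently.

```python
import random


def misreport_flags(num_calls: int, misreport_rate: float, seed: int = 42) -> list[bool]:
    """Return one flag per call: True if the caller misreports the emergency.

    Hypothetical helper. A private Random instance (rather than the global
    module state) keeps the draw reproducible regardless of other code.
    """
    rng = random.Random(seed)
    return [rng.random() < misreport_rate for _ in range(num_calls)]


# Same seed and rate give the same flags on every run (medium task: 15 calls, 20%).
medium_a = misreport_flags(num_calls=15, misreport_rate=0.20)
medium_b = misreport_flags(num_calls=15, misreport_rate=0.20)
print(medium_a == medium_b)
```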
Reward function
Final episode score = weighted combination of four components, all in [0, 1]:
| Component | Weight | What it measures |
|---|---|---|
| `survival_score` | 0.60 | Severity-weighted average outcome across all calls (uses clinical survival curves × unit effectiveness × hospital modifier) |
| `efficiency_score` | 0.15 | Fraction of calls dispatched, penalised for wasting ALS on minor calls |
| `triage_accuracy` | 0.15 | Fraction of severity-1 calls dispatched within 25% of their timeout window |
| `penalty` | −0.10 | Deductions for timed-out criticals and wrong-unit assignments |

Severity weights inside the survival score: 3× for severity 1, 2× for 2, 1.5× for 3, 1× for 4, 0.5× for 5.
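A minimal sketch of how the four components and the severity weights could combine is shown below. The function names are hypothetical, the penalty is assumed to enter as a magnitude in [0, 1] multiplied by the −0.10 weight, and the final clamp to [0, 1] is also an assumption; `reward.py` is the authoritative implementation.

```python
# Severity weights as listed in this README.
SEVERITY_WEIGHTS = {1: 3.0, 2: 2.0, 3: 1.5, 4: 1.0, 5: 0.5}


def survival_score(outcomes: list[tuple[int, float]]) -> float:
    """Severity-weighted mean of (severity, outcome) pairs, outcome in [0, 1]."""
    total_weight = sum(SEVERITY_WEIGHTS[sev] for sev, _ in outcomes)
    if total_weight == 0:
        return 0.0
    weighted = sum(SEVERITY_WEIGHTS[sev] * out for sev, out in outcomes)
    return weighted / total_weight


def episode_score(survival: float, efficiency: float, triage: float, penalty: float) -> float:
    """Combine the four components using the documented weights.

    `penalty` is assumed to be a non-negative magnitude; clamping the result
    to [0, 1] is an assumption of this sketch.
    """
    raw = 0.60 * survival + 0.15 * efficiency + 0.15 * triage - 0.10 * penalty
    return max(0.0, min(1.0, raw))


# A saved severity-1 call outweighs a lost severity-4 call three to one.
print(survival_score([(1, 1.0), (4, 0.0)]))
# Component values from the "easy" baseline row of this README.
print(round(episode_score(0.463, 0.800, 1.000, 0.0), 3))
```

With the rounded component values from the baseline table, this combination lands within rounding error of the reported totals for all three tasks, which suggests the weights are applied linearly as described.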
Survival curves (from EMS literature)
| Emergency | Curve | Source / notes |
|---|---|---|
| Cardiac arrest | exponential, ~10%/min decay | Larsen et al. 1993 |
| Trauma | sigmoid centred at 45 min | "golden hour" |
| Stroke | exponential decay | Saver 2006: every minute = 1.9M neurons lost |
| Fire | exponential, doubles per minute | property loss |
| Breathing difficulty | gentler exponential | |
| Minor injury | nearly flat | stable patient |
| Mental health | gentler exponential | de-escalation success |
Each call's outcome is multiplied by:
- Unit effectiveness (e.g., ALS → cardiac = 1.0; BLS → cardiac = 0.5; fire engine → cardiac = 0.1)
- Hospital modifier (specialty match: +5%; on diversion or zero beds: −15%)
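The curve shapes and multipliers above can be sketched roughly as follows. Only the shapes, the 45-minute midpoint, the three cardiac effectiveness values, and the +5% / −15% modifiers come from this README. The normalisation to 1.0 at minute 0, the sigmoid steepness, the multiplicative reading of the hospital modifier, and all function names are assumptions made for illustration.

```python
import math


def cardiac_survival(minutes: float, loss_per_min: float = 0.10) -> float:
    """Exponential decay losing ~10% of the remaining chance each minute.

    Normalised to 1.0 at minute 0 (an assumption of this sketch).
    """
    return (1.0 - loss_per_min) ** minutes


def trauma_survival(minutes: float, midpoint: float = 45.0, steepness: float = 0.1) -> float:
    """Sigmoid centred at `midpoint` minutes; `steepness` is an assumed value."""
    return 1.0 / (1.0 + math.exp(steepness * (minutes - midpoint)))


# Cardiac-arrest effectiveness values quoted in this README.
UNIT_EFFECTIVENESS = {
    ("ALS", "cardiac_arrest"): 1.0,
    ("BLS", "cardiac_arrest"): 0.5,
    ("FIRE", "cardiac_arrest"): 0.1,
}


def hospital_modifier(specialty_match: bool, diverted_or_full: bool) -> float:
    """+5% for a specialty match, -15% for diversion or zero beds.

    Letting the diversion penalty take precedence is an assumption.
    """
    if diverted_or_full:
        return 0.85
    return 1.05 if specialty_match else 1.0


def call_outcome(curve_value: float, unit: str, emergency: str,
                 specialty_match: bool = False, diverted_or_full: bool = False) -> float:
    """Curve value x unit effectiveness x hospital modifier, capped at 1.0."""
    effectiveness = UNIT_EFFECTIVENESS[(unit, emergency)]
    modifier = hospital_modifier(specialty_match, diverted_or_full)
    return min(1.0, curve_value * effectiveness * modifier)


base = cardiac_survival(5)  # five minutes after collapse
als = call_outcome(base, "ALS", "cardiac_arrest")
fire = call_outcome(base, "FIRE", "cardiac_arrest")
print(f"ALS={als:.3f} fire_engine={fire:.3f}")
```

Under these assumptions the ALS outcome is ten times the fire-engine outcome for the same arrival time, which is the kind of gap that makes unit choice matter more than raw proximity.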
Baseline scores (heuristic agent, seed=42)
A simple rule-based heuristic (always pick the most-critical call, send the most effective available unit, reserve ALS for high-severity calls) produces the following calibrated scores:
| Task | Total | Survival | Efficiency | Triage | Penalty | Completed/Total |
|---|---|---|---|---|---|---|
| easy | 0.5476 | 0.463 | 0.800 | 1.000 | −0.000 | 4/5 |
| medium | 0.3750 | 0.377 | 0.600 | 0.500 | −0.160 | 9/15 |
| hard | 0.2183 | 0.214 | 0.433 | 0.500 | −0.500 | 13/30 |
| Average | 0.3803 | | | | | |
The clean monotonic decrease across difficulty (easy > medium > hard) is consistent with the environment discriminating between scenarios as designed.
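The heuristic described at the start of this section can be sketched in a few lines. This is an illustrative reconstruction, not the project's baseline code: the `Call` and `Unit` dataclasses, the `choose_dispatch` function, the severity cutoff for reserving ALS, and the non-cardiac effectiveness entries are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Call:
    call_id: str
    severity: int   # 1 = most critical, 5 = least critical
    emergency: str


@dataclass
class Unit:
    unit_id: str
    kind: str       # e.g. "ALS", "BLS", "FIRE"


# Cardiac values are from this README; the minor-injury rows are placeholders.
EFFECTIVENESS = {
    ("ALS", "cardiac_arrest"): 1.0,
    ("BLS", "cardiac_arrest"): 0.5,
    ("FIRE", "cardiac_arrest"): 0.1,
    ("ALS", "minor_injury"): 1.0,
    ("BLS", "minor_injury"): 1.0,
}

ALS_SEVERITY_CUTOFF = 2  # assumed: ALS is reserved for severity 1-2 calls


def choose_dispatch(calls: list[Call], units: list[Unit]) -> Optional[tuple[str, str]]:
    """Pick (call_id, unit_id) for the most critical call, or None to wait."""
    if not calls or not units:
        return None
    call = min(calls, key=lambda c: c.severity)  # lowest number = most critical
    # Hold ALS units back when the call is not severe enough.
    candidates = [
        u for u in units
        if not (u.kind == "ALS" and call.severity > ALS_SEVERITY_CUTOFF)
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda u: EFFECTIVENESS.get((u.kind, call.emergency), 0.0))
    return call.call_id, best.unit_id


calls = [Call("CALL-001", 4, "minor_injury"), Call("CALL-002", 1, "cardiac_arrest")]
units = [Unit("BLS-1", "BLS"), Unit("ALS-1", "ALS")]
print(choose_dispatch(calls, units))
```

In this example the cardiac arrest is served first with the ALS unit, and a lone minor call would never consume the ALS even if it were the only unit free.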
Inference script β inference.py
Per the hackathon spec, inference.py is in the project root and follows
the mandatory contract:
Required environment variables
| Variable | Purpose | Default in script |
|---|---|---|
| `API_BASE_URL` | LLM endpoint | `https://router.huggingface.co/v1` |
| `MODEL_NAME` | Which model to call | `Qwen/Qwen2.5-72B-Instruct` |
| `HF_TOKEN` | API key for the LLM | (no default) |
| `LOCAL_IMAGE_NAME` | Docker image for `from_docker_image()` | (no default) |
| `DISPATCHPULSE_TASK` | Which task to run (`easy`/`medium`/`hard`) | `easy` |
Stdout format (verbatim)
[START] task=<task_name> env=dispatchpulse model=<model_name>
[STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
[END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
- One `[START]` line at episode begin
- One `[STEP]` line per step, immediately after `env.step()` returns
- One `[END]` line after `env.close()`, ALWAYS emitted (even on exception)
- `reward` and `rewards` to 2 decimal places; `score` to 3 decimal places
- `done` and `success` are lowercase booleans
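The formatting rules above can be captured in three small helpers. This is a sketch under the stated format only; the helper names are hypothetical and `inference.py` may build these lines differently.

```python
from typing import Optional, Sequence


def _bool(value: bool) -> str:
    """Render a Python bool as the lowercase literal the format requires."""
    return "true" if value else "false"


def format_start(task: str, model: str, env: str = "dispatchpulse") -> str:
    return f"[START] task={task} env={env} model={model}"


def format_step(step: int, action: str, reward: float, done: bool,
                error: Optional[str]) -> str:
    # A missing error is printed as the literal string "null".
    err = error if error else "null"
    return (f"[STEP] step={step} action={action} reward={reward:.2f} "
            f"done={_bool(done)} error={err}")


def format_end(success: bool, steps: int, score: float,
               rewards: Sequence[float]) -> str:
    # Per-step rewards use 2 decimals; the final score uses 3.
    joined = ",".join(f"{r:.2f}" for r in rewards)
    return (f"[END] success={_bool(success)} steps={steps} "
            f"score={score:.3f} rewards={joined}")


print(format_start("easy", "Qwen/Qwen2.5-72B-Instruct"))
print(format_step(1, "wait 1", 0.0, False, None))
print(format_end(True, 2, 0.5476, [0.0, 0.55]))
```

One way to guarantee that the `[END]` line is always emitted is to run the episode loop inside `try` and print the line from the matching `finally` block.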
Connection logic
- If `LOCAL_IMAGE_NAME` is set → `await DispatchPulseEnv.from_docker_image(LOCAL_IMAGE_NAME)`
- Else if `ENV_BASE_URL` is set → connect directly to a running env server
- Otherwise → spin up an in-process simulation as a fallback (for offline runs)
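The precedence among the three connection modes can be expressed as a small pure function. This is a sketch of the decision only: `select_connection` and its mode labels are hypothetical names, and the actual connection calls live in `inference.py`.

```python
import os
from typing import Mapping, Optional, Tuple


def select_connection(env: Mapping[str, str] = os.environ) -> Tuple[str, Optional[str]]:
    """Return (mode, target) following the documented precedence.

    Mode labels are illustrative: "docker" takes a local image name,
    "remote" takes a base URL, and "in_process" needs no target.
    """
    image = env.get("LOCAL_IMAGE_NAME")
    if image:
        return "docker", image
    base_url = env.get("ENV_BASE_URL")
    if base_url:
        return "remote", base_url
    return "in_process", None


# LOCAL_IMAGE_NAME wins even when ENV_BASE_URL is also set.
print(select_connection({
    "LOCAL_IMAGE_NAME": "dispatchpulse:latest",
    "ENV_BASE_URL": "http://localhost:8000",
}))
```

Passing the environment as a mapping keeps the precedence testable without touching the real process environment.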
Run it
# Against the live HF Space
ENV_BASE_URL=https://arun-sanjay-dispatchpulse.hf.space \
HF_TOKEN=$HF_TOKEN \
python inference.py
# Against a local Docker image
LOCAL_IMAGE_NAME=dispatchpulse:latest \
HF_TOKEN=$HF_TOKEN \
python inference.py
# In-process fallback (no network, no Docker)
python inference.py
Setup
Run locally with Python
python -m venv .venv && source .venv/bin/activate
pip install -e .
python inference.py
Run locally with Docker
docker build -t dispatchpulse .
docker run -p 8000:8000 dispatchpulse
# Then in another shell:
curl http://localhost:8000/health
Use as a client (OpenEnv EnvClient pattern)
import asyncio

from client import DispatchPulseEnv
from models import DispatchPulseAction


async def main():
    async with DispatchPulseEnv(base_url="https://arun-sanjay-dispatchpulse.hf.space") as env:
        # Start the easy task with the default seed for a reproducible episode.
        result = await env.reset(task_name="easy", seed=42)
        while not result.done:
            # Trivial policy: wait one minute per step until the episode ends.
            action = DispatchPulseAction(action_type="wait", minutes=1, text="wait 1")
            result = await env.step(action)
            print(result.observation.text[:200])
        print(f"Final score: {result.reward}")


asyncio.run(main())
Run on Hugging Face Spaces
Auto-built as a Docker Space:
https://huggingface.co/spaces/Arun-Sanjay/dispatchpulse
Pre-submission validator
Run the same three checks the hackathon's automated grader runs:
./scripts/validate-submission.sh https://arun-sanjay-dispatchpulse.hf.space .
It checks:
- HF Space deploys: `POST /reset` returns HTTP 200
- Docker build: `docker build .` succeeds (≤ 10 min)
- OpenEnv compliance: `openenv validate` passes
Calibration tests
The reward function ships with calibration tests that double as documentation:
python tests/test_reward.py
python tests/test_simulation.py
These verify that:
- Survival curves match published clinical numbers
- A "do-nothing" agent scores below 0.15 on every task
- A simple heuristic strictly outperforms the silent agent
- Heuristic scores monotonically decrease easy → medium → hard
- ALS at cardiac arrest beats fire engine at cardiac arrest by ≥5×
- Specialty hospital match boosts outcome; diversion hurts it
License
Apache 2.0. Built for the Meta PyTorch OpenEnv Hackathon, India 2026.