title: DispatchPulse
emoji: 🚑
colorFrom: red
colorTo: blue
sdk: docker
app_port: 8000
pinned: false
license: apache-2.0

DispatchPulse

An OpenEnv environment where an AI agent acts as a 911 emergency dispatch coordinator. The agent receives incoming calls, classifies their severity, and dispatches limited emergency units (ALS / BLS ambulances, fire engines, police) under time pressure. Patient outcomes are scored against real clinical survival curves: no LLM-as-judge, just defensible math.

Submission for the Meta PyTorch OpenEnv Hackathon, India 2026.


Why this environment

In India, an estimated 24,000+ people die every day because of slow emergency response: the average ambulance response time is 25–35 minutes, which consumes roughly half of the golden hour before care even begins, and only ~20% of ambulances carry advanced life support. DispatchPulse simulates this crisis as an interactive RL environment in which the agent has to learn the counter-intuitive strategies real dispatchers use:

  • The greedy "closest unit" strategy fails. Dispatching the only ALS unit to a sprained ankle leaves nothing for the cardiac arrest that arrives 3 minutes later, and survival drops from 70% to 15%.
  • Triage matters more than speed. A weighted reward (severity 1 calls count 3× more than severity 4) means the agent has to prioritise, not just react.
  • Hospital choice matters. Sending a stroke patient to a hospital without a stroke unit, or to one on diversion, costs you score.

The reward function uses real clinical survival curves from the EMS literature (Larsen et al. 1993 for cardiac arrest; Saver 2006 "Time is Brain" for stroke; golden hour curves for trauma). It's deterministic, defensible, and gives a continuous signal an RL agent can actually learn from.


OpenEnv compliance

| Requirement | Status |
| --- | --- |
| Real-world task (not games or toys) | ✅ Emergency dispatch, an actual profession |
| Typed Pydantic models inheriting from OpenEnv Action / Observation / State | ✅ models.py |
| Environment base-class subclass with reset() / step() / state | ✅ server/environment.py |
| FastAPI server via create_fastapi_app(...) | ✅ server/app.py |
| EnvClient client with _step_payload / _parse_result / _parse_state | ✅ client.py |
| openenv.yaml manifest | ✅ |
| ≥ 3 tasks with graders, scores 0.0–1.0 | ✅ easy / medium / hard |
| Meaningful reward + partial progress | ✅ survival curves + per-step rewards |
| inference.py at root, OpenAI client, mandatory env vars, [START]/[STEP]/[END] format | ✅ |
| Reproducible (fixed seed) | ✅ seed=42 default everywhere |
| Pre-submission validator script | ✅ scripts/validate-submission.sh |
| Dockerfile + HF Spaces deploy | ✅ uses openenv-base |
| Runs on 2 vCPU / 8 GB RAM | ✅ pure Python math, no ML inference |

Project layout (canonical OpenEnv structure)

DispatchPulse/
├── README.md
├── Dockerfile               # uses ghcr.io/meta-pytorch/openenv-base
├── openenv.yaml             # OpenEnv manifest
├── pyproject.toml
├── inference.py             # ROUND 1 ENTRY POINT: must be in root
├── client.py                # DispatchPulseEnv (subclass of EnvClient)
├── models.py                # DispatchPulseAction / Observation / State
│                            # plus internal sim models
├── simulation.py            # DispatchSimulation engine
├── reward.py                # Survival curves + episode reward
├── grader.py                # Programmatic 0.0–1.0 grader
├── scenario_loader.py       # YAML task loader
├── text_view.py             # LLM-friendly dispatch center renderer
├── utils.py                 # Distance / ETA / templates
├── server/
│   ├── __init__.py
│   ├── app.py               # FastAPI app via create_fastapi_app(...)
│   └── environment.py       # DispatchPulseEnvironment(Environment)
├── tasks/
│   ├── easy.yaml
│   ├── medium.yaml
│   └── hard.yaml
├── scripts/
│   └── validate-submission.sh   # runs the 3 grader checks locally
└── tests/
    ├── test_reward.py
    └── test_simulation.py

Action space (typed Pydantic)

DispatchPulseAction has these action_type values:

| action_type | Required fields | Time cost | What it does |
| --- | --- | --- | --- |
| dispatch | call_id, unit_id, hospital_id? | 1 min | Send a unit to a call (optionally pre-routing to a hospital). |
| classify | call_id, severity (1-5) | 1 min | Reclassify a call's severity. |
| callback | call_id, message | 1 min | Phone the caller back. 70% chance they clarify the true emergency type. |
| wait | minutes (default 1, max 5) | n min | Skip ahead in the simulation when there's nothing to do. |
| view | none | free | Re-fetch the dispatch center text without advancing time. |

The action also has a free-text text field: the server parses lines like dispatch CALL-001 ALS-1 H1, so an LLM can produce them directly.
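As an illustration of the command grammar in the table above, here is a minimal parsing sketch. It is not the server's actual parser: the function name parse_command, the returned dict shape, and the choice to reject out-of-range values with ValueError are all assumptions made for this example.

```python
from typing import Any, Dict


def parse_command(line: str) -> Dict[str, Any]:
    """Turn one free-text command into a dict of action fields (hypothetical helper)."""
    tokens = line.strip().split()
    if not tokens:
        raise ValueError("empty command")
    verb, args = tokens[0].lower(), tokens[1:]

    if verb == "dispatch":
        # hospital_id is optional, so accept two or three arguments.
        if len(args) not in (2, 3):
            raise ValueError("usage: dispatch CALL_ID UNIT_ID [HOSPITAL_ID]")
        action: Dict[str, Any] = {
            "action_type": "dispatch",
            "call_id": args[0],
            "unit_id": args[1],
        }
        if len(args) == 3:
            action["hospital_id"] = args[2]
        return action

    if verb == "classify":
        if len(args) != 2:
            raise ValueError("usage: classify CALL_ID SEVERITY")
        severity = int(args[1])
        if not 1 <= severity <= 5:
            raise ValueError("severity must be between 1 and 5")
        return {"action_type": "classify", "call_id": args[0], "severity": severity}

    if verb == "callback":
        if not args:
            raise ValueError("usage: callback CALL_ID MESSAGE")
        return {
            "action_type": "callback",
            "call_id": args[0],
            "message": " ".join(args[1:]),
        }

    if verb == "wait":
        minutes = int(args[0]) if args else 1  # default of 1 minute, per the table
        if not 1 <= minutes <= 5:
            # Assumption: reject rather than clamp values outside 1..5.
            raise ValueError("minutes must be between 1 and 5")
        return {"action_type": "wait", "minutes": minutes}

    if verb == "view":
        return {"action_type": "view"}

    raise ValueError(f"unknown action_type: {verb}")
```

With this sketch, parse_command("dispatch CALL-001 ALS-1 H1") yields a dict with action_type, call_id, unit_id and hospital_id set, which could then be passed as keyword arguments to a typed action model.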

Observation space

DispatchPulseObservation has:

  • text: formatted dispatch center view (the field the LLM reads)
  • current_time, time_limit
  • calls_pending, units_available, calls_completed, calls_timed_out, total_calls
  • last_action_error: error string from the previous action, or None
  • info_message: what just happened
  • inherited done, reward, metadata

Tasks

| Task | Calls | Units | Hospitals | Duration | Caller misreporting | What's hard about it |
| --- | --- | --- | --- | --- | --- | --- |
| easy | 5 | 4 | 1 | 30 min | 0% | Basic dispatch: learn the action grammar |
| medium | 15 | 6 | 2 | 45 min | 20% | Mass casualty bus accident at minute 12; some callers lie |
| hard | 30 | 8 | 3 (1 on diversion) | 60 min | 35% | Earthquake: extreme scarcity, panicked callers, hospital triage matters |

All three are deterministic given the seed.


Reward function

Final episode score = weighted combination of four components, all in [0, 1]:

| Component | Weight | What it measures |
| --- | --- | --- |
| survival_score | 0.60 | Severity-weighted average outcome across all calls (uses clinical survival curves × unit effectiveness × hospital modifier) |
| efficiency_score | 0.15 | Fraction of calls dispatched, penalised for wasting ALS on minor calls |
| triage_accuracy | 0.15 | Fraction of severity-1 calls dispatched within 25% of their timeout window |
| penalty | −0.10 | Deductions for timed-out criticals and wrong-unit assignments |

Severity weights inside the survival score: 3× for severity 1, 2× for 2, 1.5× for 3, 1× for 4, 0.5× for 5.
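The weighting above can be written out as a short sketch. The component weights and severity weights come from this section; the function names, the clamp to [0, 1], and the convention that penalty is passed as a non-negative magnitude are assumptions for illustration, not the contents of reward.py.

```python
from typing import Iterable, Tuple

# Severity weights as listed in this README.
SEVERITY_WEIGHTS = {1: 3.0, 2: 2.0, 3: 1.5, 4: 1.0, 5: 0.5}


def survival_score(outcomes: Iterable[Tuple[int, float]]) -> float:
    """Severity-weighted mean of per-call outcomes.

    Each item is (severity, outcome) with outcome in [0, 1].
    """
    pairs = list(outcomes)
    total_weight = sum(SEVERITY_WEIGHTS[sev] for sev, _ in pairs)
    if total_weight == 0:
        return 0.0
    weighted = sum(SEVERITY_WEIGHTS[sev] * out for sev, out in pairs)
    return weighted / total_weight


def episode_score(survival: float, efficiency: float,
                  triage: float, penalty: float) -> float:
    """Combine the four components with the weights from the table.

    Assumption: penalty is a non-negative magnitude, and the result is
    clamped to [0, 1].
    """
    raw = 0.60 * survival + 0.15 * efficiency + 0.15 * triage - 0.10 * penalty
    return max(0.0, min(1.0, raw))
```

As a sanity check, episode_score(0.463, 0.800, 1.000, 0.0) evaluates to about 0.548, in line with the 0.5476 reported for the easy task in the baseline table (the small gap comes from the rounded component values).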

Survival curves (from EMS literature)

| Emergency | Curve | Source / notes |
| --- | --- | --- |
| Cardiac arrest | exponential, ~10%/min decay | Larsen et al. 1993 |
| Trauma | sigmoid centred at 45 min | "golden hour" |
| Stroke | exponential decay | Saver 2006: every minute = 1.9M neurons |
| Fire | exponential, doubles per minute | property loss |
| Breathing difficulty | gentler exponential | |
| Minor injury | nearly flat | stable patient |
| Mental health | gentler exponential | de-escalation success |

Each call's outcome is multiplied by:

  • Unit effectiveness (e.g., ALS → cardiac = 1.0; BLS → cardiac = 0.5; fire engine → cardiac = 0.1)
  • Hospital modifier (specialty match: +5%; on diversion or zero beds: −15%)
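To make the multiplication concrete, here is a rough sketch for a cardiac arrest call. The effectiveness values and the +5% / −15% modifiers are taken from the bullets above; the exact curve form exp(-0.10 * t), the multiplicative use of the modifier, the rule that diversion overrides a specialty match, and the final cap at 1.0 are assumptions and may differ from reward.py.

```python
import math

# Unit effectiveness for cardiac calls, as listed in this README.
CARDIAC_EFFECTIVENESS = {"ALS": 1.0, "BLS": 0.5, "FIRE": 0.1}


def cardiac_survival(minutes: float) -> float:
    """Assumed curve: exponential decay of roughly 10% per minute."""
    return math.exp(-0.10 * max(minutes, 0.0))


def hospital_modifier(specialty_match: bool, diverted_or_full: bool) -> float:
    """+5% for a specialty match, -15% on diversion or zero beds.

    Assumption: the diversion penalty takes precedence over the bonus.
    """
    if diverted_or_full:
        return 0.85
    return 1.05 if specialty_match else 1.0


def cardiac_outcome(minutes: float, unit_type: str,
                    specialty_match: bool = False,
                    diverted_or_full: bool = False) -> float:
    """Survival curve x unit effectiveness x hospital modifier, capped at 1.0."""
    raw = (cardiac_survival(minutes)
           * CARDIAC_EFFECTIVENESS[unit_type]
           * hospital_modifier(specialty_match, diverted_or_full))
    return min(1.0, raw)
```

Under these assumptions an ALS unit yields ten times the outcome of a fire engine at the same response time, which is the kind of gap the calibration tests check for.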

Baseline scores (heuristic agent, seed=42)

A simple rule-based heuristic (always pick the most-critical call, send the most effective available unit, reserve ALS for high-severity calls) produces the following calibrated scores:

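| Task | Total | Survival | Efficiency | Triage | Penalty | Completed/Total |
| --- | --- | --- | --- | --- | --- | --- |
| easy | 0.5476 | 0.463 | 0.800 | 1.000 | −0.000 | 4/5 |
| medium | 0.3750 | 0.377 | 0.600 | 0.500 | −0.160 | 9/15 |
| hard | 0.2183 | 0.214 | 0.433 | 0.500 | −0.500 | 13/30 |
| Average | 0.3803 | | | | | |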

The clean monotonic decrease across difficulty (easy > medium > hard) confirms the env discriminates between scenarios as designed.
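For readers who want a feel for the rule-based baseline, the following is a loose sketch of the selection rule described above. It is not the heuristic used to produce the table: the dict-based call and unit records, the effectiveness lookup keyed by (unit type, emergency kind), and the severity threshold of 2 for releasing ALS units are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Call = Dict[str, object]   # hypothetical shape: {"id": str, "severity": int, "kind": str}
Unit = Dict[str, str]      # hypothetical shape: {"id": str, "type": str}


def pick_dispatch(calls: List[Call], units: List[Unit],
                  effectiveness: Dict[Tuple[str, str], float],
                  als_max_severity: int = 2) -> Optional[Tuple[str, str]]:
    """Return (call_id, unit_id) for the next dispatch, or None if nothing fits."""
    if not calls or not units:
        return None

    # Severity 1 is the most critical, so take the minimum.
    call = min(calls, key=lambda c: c["severity"])

    # Reserve ALS units for high-severity calls (threshold is an assumption).
    candidates = [
        u for u in units
        if u["type"] != "ALS" or call["severity"] <= als_max_severity
    ]
    if not candidates:
        return None

    # Among the remaining units, prefer the most effective for this emergency.
    unit = max(
        candidates,
        key=lambda u: effectiveness.get((u["type"], call["kind"]), 0.0),
    )
    return str(call["id"]), unit["id"]
```

Given a severity-1 cardiac call and a severity-4 minor call with one ALS and one BLS unit free, this sketch sends the ALS unit to the cardiac call first and keeps it away from the minor call.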


Inference script: inference.py

Per the hackathon spec, inference.py is in the project root and follows the mandatory contract:

Required environment variables

| Variable | Purpose | Default in script |
| --- | --- | --- |
| API_BASE_URL | LLM endpoint | https://router.huggingface.co/v1 |
| MODEL_NAME | Which model to call | Qwen/Qwen2.5-72B-Instruct |
| HF_TOKEN | API key for the LLM | (no default) |
| LOCAL_IMAGE_NAME | Docker image for from_docker_image() | (no default) |
| DISPATCHPULSE_TASK | Which task to run (easy/medium/hard) | easy |

Stdout format (verbatim)

[START] task=<task_name> env=dispatchpulse model=<model_name>
[STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
[END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
  • One [START] line at episode begin
  • One [STEP] line per step, immediately after env.step() returns
  • One [END] line after env.close(), ALWAYS emitted (even on exception)
  • reward and rewards to 2 decimal places; score to 3 decimal places
  • done and success are lowercase booleans
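The formatting rules above can be captured in a few helpers. This is a sketch rather than the code in inference.py: step_fn is a hypothetical stand-in for env.step() that returns (reward, done, error), success is assumed to mean "the loop finished without raising", and the final step reward is assumed to double as the episode score, as in the client example further down.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def fmt_bool(value: bool) -> str:
    """Render a bool as the lowercase literal the format requires."""
    return "true" if value else "false"


def start_line(task: str, model: str) -> str:
    return f"[START] task={task} env=dispatchpulse model={model}"


def step_line(step: int, action: str, reward: float, done: bool,
              error: Optional[str] = None) -> str:
    return (f"[STEP] step={step} action={action} reward={reward:.2f} "
            f"done={fmt_bool(done)} error={error if error else 'null'}")


def end_line(success: bool, steps: int, score: float,
             rewards: List[float]) -> str:
    joined = ",".join(f"{r:.2f}" for r in rewards)
    return (f"[END] success={fmt_bool(success)} steps={steps} "
            f"score={score:.3f} rewards={joined}")


# Hypothetical stand-in for env.step(): returns (reward, done, error).
StepFn = Callable[[str], Tuple[float, bool, Optional[str]]]


def run_logged_episode(task: str, model: str,
                       actions: Iterable[str], step_fn: StepFn) -> None:
    rewards: List[float] = []
    success = False
    print(start_line(task, model))
    try:
        for n, action in enumerate(actions, start=1):
            reward, done, error = step_fn(action)
            rewards.append(reward)
            print(step_line(n, action, reward, done, error))
            if done:
                break
        success = True  # assumption: no exception means success
    finally:
        # The finally block guarantees one [END] line even if step_fn raises.
        score = rewards[-1] if rewards else 0.0  # assumption: last reward is the score
        print(end_line(success, len(rewards), score, rewards))
```

Putting the [END] print in a finally block is one simple way to honour the "always emitted" rule; in a real script the env.close() call would sit just before it.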

Connection logic

  1. If LOCAL_IMAGE_NAME is set → await DispatchPulseEnv.from_docker_image(LOCAL_IMAGE_NAME)
  2. Else if ENV_BASE_URL is set → connect directly to a running env server
  3. Otherwise → spin up an in-process simulation as a fallback (for offline runs)
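The precedence among the three modes can be sketched as a small pure function. The environment variable names are the ones used in this README; the function name choose_connection and the returned (mode, target) tuple are hypothetical, and the real script goes on to construct a client or simulation rather than return a label.

```python
import os
from typing import Mapping, Optional, Tuple


def choose_connection(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[str]]:
    """Decide how to reach the environment, in the documented priority order."""
    env = os.environ if env is None else env

    image = env.get("LOCAL_IMAGE_NAME")
    if image:
        # Highest priority: start the env from a local Docker image.
        return ("docker", image)

    base_url = env.get("ENV_BASE_URL")
    if base_url:
        # Next: connect to an env server that is already running.
        return ("remote", base_url)

    # Fallback: in-process simulation, no network or Docker needed.
    return ("in_process", None)
```

Accepting the mapping as a parameter keeps the decision easy to test without touching the real process environment.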

Run it

# Against the live HF Space
ENV_BASE_URL=https://arun-sanjay-dispatchpulse.hf.space \
HF_TOKEN=$HF_TOKEN \
python inference.py

# Against a local Docker image
LOCAL_IMAGE_NAME=dispatchpulse:latest \
HF_TOKEN=$HF_TOKEN \
python inference.py

# In-process fallback (no network, no Docker)
python inference.py

Setup

Run locally with Python

python -m venv .venv && source .venv/bin/activate
pip install -e .
python inference.py

Run locally with Docker

docker build -t dispatchpulse .
docker run -p 8000:8000 dispatchpulse
# Then in another shell:
curl http://localhost:8000/health

Use as a client (OpenEnv EnvClient pattern)

import asyncio
from client import DispatchPulseEnv
from models import DispatchPulseAction

async def main():
    async with DispatchPulseEnv(base_url="https://arun-sanjay-dispatchpulse.hf.space") as env:
        result = await env.reset(task_name="easy", seed=42)
        while not result.done:
            action = DispatchPulseAction(action_type="wait", minutes=1, text="wait 1")
            result = await env.step(action)
            print(result.observation.text[:200])
        print(f"Final score: {result.reward}")

asyncio.run(main())

Run on Hugging Face Spaces

Auto-built as a Docker Space: https://huggingface.co/spaces/Arun-Sanjay/dispatchpulse


Pre-submission validator

Run the same three checks the hackathon's automated grader runs:

./scripts/validate-submission.sh https://arun-sanjay-dispatchpulse.hf.space .

It checks:

  1. HF Space deploys: POST /reset returns HTTP 200
  2. Docker build: docker build . succeeds (≤ 10 min)
  3. OpenEnv compliance: openenv validate passes

Calibration tests

The reward function ships with calibration tests that double as documentation:

python tests/test_reward.py
python tests/test_simulation.py

These verify that:

  • Survival curves match published clinical numbers
  • A "do-nothing" agent scores below 0.15 on every task
  • A simple heuristic strictly outperforms the silent agent
  • Heuristic scores monotonically decrease easy → medium → hard
  • ALS at cardiac arrest beats fire engine at cardiac arrest by ≥5×
  • Specialty hospital match boosts outcome; diversion hurts it

License

Apache 2.0. Built for the Meta PyTorch OpenEnv Hackathon, India 2026.