---
title: DispatchPulse
emoji: 🚑
colorFrom: red
colorTo: blue
sdk: docker
app_port: 8000
pinned: false
license: apache-2.0
---

# DispatchPulse

**An OpenEnv environment where an AI agent acts as a 911 emergency dispatch coordinator.**
The agent receives incoming calls, classifies their severity, and dispatches limited
emergency units (ALS / BLS ambulances, fire engines, police) under time pressure.
Patient outcomes are scored against **real clinical survival curves**: no
LLM-as-judge, just defensible math.

> Submission for the [Meta PyTorch OpenEnv Hackathon, India 2026](https://www.scaler.com/school-of-technology/meta-pytorch-hackathon).

---

## Why this environment

In India, an estimated 24,000+ people die every day because of slow emergency
response. Average ambulance response time is 25–35 minutes, which uses up roughly
half of the golden hour before the ambulance even arrives, and only ~20% of
ambulances carry advanced life support. DispatchPulse simulates this crisis as an
interactive RL environment where the agent has to learn the *counter-intuitive*
strategies real dispatchers use:

- **The greedy "closest unit" strategy fails.** Dispatching the only ALS to a
  sprained ankle leaves nothing for the cardiac arrest that arrives 3 minutes
  later, whose survival odds then drop from 70% to 15%.
- **Triage matters more than speed.** A weighted reward (severity 1 calls
  count 3× more than severity 4) means the agent has to *prioritise*, not
  just react.
- **Hospital choice matters.** Sending a stroke patient to a hospital without
  a stroke unit, or to one on diversion, costs you score.

The reward function uses real clinical survival curves from the EMS literature
(Larsen et al. 1993 for cardiac arrest; Saver 2006 "Time is Brain" for stroke;
golden hour curves for trauma). It's deterministic, defensible, and gives a
continuous signal an RL agent can actually learn from.

---

## OpenEnv compliance

| Requirement | Status |
|---|---|
| Real-world task (not games or toys) | ✅ Emergency dispatch, a real profession |
| Typed Pydantic models inheriting from OpenEnv `Action` / `Observation` / `State` | ✅ `models.py` |
| Subclass of the `Environment` base class with `reset()` / `step()` / `state` | ✅ `server/environment.py` |
| FastAPI server via `create_fastapi_app(...)` | ✅ `server/app.py` |
| `EnvClient` client with `_step_payload` / `_parse_result` / `_parse_state` | ✅ `client.py` |
| `openenv.yaml` manifest | ✅ |
| ≥ 3 tasks with graders, scores 0.0–1.0 | ✅ easy / medium / hard |
| Meaningful reward + partial progress | ✅ survival curves + per-step rewards |
| `inference.py` at root, OpenAI client, mandatory env vars, `[START]/[STEP]/[END]` format | ✅ |
| Reproducible (fixed seed) | ✅ `seed=42` default everywhere |
| Pre-submission validator script | ✅ `scripts/validate-submission.sh` |
| Dockerfile + HF Spaces deploy | ✅ uses `openenv-base` |
| Runs on 2 vCPU / 8 GB RAM | ✅ pure Python math, no ML inference |

---

## Project layout (canonical OpenEnv structure)

```
DispatchPulse/
├── README.md
├── Dockerfile               # uses ghcr.io/meta-pytorch/openenv-base
├── openenv.yaml             # OpenEnv manifest
├── pyproject.toml
├── inference.py             # ROUND 1 ENTRY POINT: must be in root
├── client.py                # DispatchPulseEnv (subclass of EnvClient)
├── models.py                # DispatchPulseAction / Observation / State
│                            # plus internal sim models
├── simulation.py            # DispatchSimulation engine
├── reward.py                # Survival curves + episode reward
├── grader.py                # Programmatic 0.0–1.0 grader
├── scenario_loader.py       # YAML task loader
├── text_view.py             # LLM-friendly dispatch center renderer
├── utils.py                 # Distance / ETA / templates
├── server/
│   ├── __init__.py
│   ├── app.py               # FastAPI app via create_fastapi_app(...)
│   └── environment.py       # DispatchPulseEnvironment(Environment)
├── tasks/
│   ├── easy.yaml
│   ├── medium.yaml
│   └── hard.yaml
├── scripts/
│   └── validate-submission.sh   # runs the 3 grader checks locally
└── tests/
    ├── test_reward.py
    └── test_simulation.py
```

---

## Action space (typed Pydantic)

`DispatchPulseAction` has these `action_type` values:

| `action_type` | Required fields | Time cost | What it does |
|---|---|---|---|
| `dispatch` | `call_id`, `unit_id`, `hospital_id?` | 1 min | Send a unit to a call (optionally pre-routing to a hospital). |
| `classify` | `call_id`, `severity` (1-5) | 1 min | Reclassify a call's severity. |
| `callback` | `call_id`, `message` | 1 min | Phone the caller back. 70% chance they clarify the true emergency type. |
| `wait` | `minutes` (default 1, max 5) | n min | Skip ahead in the simulation when there's nothing to do. |
| `view` | (none) | free | Re-fetch the dispatch center text without advancing time. |

The action also has a free-text `text` field: the server parses lines like
`dispatch CALL-001 ALS-1 H1`, so an LLM can emit commands directly.
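
A minimal sketch of that command grammar, useful for checking LLM output before it is sent. This is not the server's parser: `ParsedAction` and `parse_command` are hypothetical names, and the validation rules simply mirror the action table (severity 1-5, `wait` defaulting to 1 and capped at 5 minutes).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedAction:
    """Hypothetical stand-in for the fields of DispatchPulseAction."""
    action_type: str
    call_id: Optional[str] = None
    unit_id: Optional[str] = None
    hospital_id: Optional[str] = None
    severity: Optional[int] = None
    message: Optional[str] = None
    minutes: int = 1


VALID_1_TO_5 = {"1", "2", "3", "4", "5"}


def parse_command(text: str) -> ParsedAction:
    """Parse one command line, e.g. 'dispatch CALL-001 ALS-1 H1'."""
    parts = text.strip().split()
    if not parts:
        raise ValueError("empty command")
    verb, args = parts[0].lower(), parts[1:]

    if verb == "dispatch":
        # dispatch <call_id> <unit_id> [hospital_id]
        if len(args) not in (2, 3):
            raise ValueError("usage: dispatch <call_id> <unit_id> [hospital_id]")
        hospital = args[2] if len(args) == 3 else None
        return ParsedAction("dispatch", call_id=args[0], unit_id=args[1], hospital_id=hospital)

    if verb == "classify":
        # classify <call_id> <severity 1-5>
        if len(args) != 2 or args[1] not in VALID_1_TO_5:
            raise ValueError("usage: classify <call_id> <severity 1-5>")
        return ParsedAction("classify", call_id=args[0], severity=int(args[1]))

    if verb == "callback":
        # callback <call_id> <free-text message>; words are re-joined with single spaces
        if len(args) < 2:
            raise ValueError("usage: callback <call_id> <message>")
        return ParsedAction("callback", call_id=args[0], message=" ".join(args[1:]))

    if verb == "wait":
        # wait [minutes]: default 1, max 5
        if len(args) > 1 or (args and args[0] not in VALID_1_TO_5):
            raise ValueError("usage: wait [1-5]")
        return ParsedAction("wait", minutes=int(args[0]) if args else 1)

    if verb == "view":
        return ParsedAction("view")

    raise ValueError(f"unknown action type: {verb!r}")
```

Rejecting malformed lines locally lets the agent retry without spending a simulated minute on an action the server would refuse anyway.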

## Observation space

`DispatchPulseObservation` has:

- `text`: formatted dispatch center view (the field the LLM reads)
- `current_time`, `time_limit`
- `calls_pending`, `units_available`, `calls_completed`, `calls_timed_out`, `total_calls`
- `last_action_error`: error string from the previous action, or `None`
- `info_message`: what just happened
- inherited `done`, `reward`, `metadata`
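
As an illustration of how these fields can drive an agent loop, here is a hypothetical fallback rule: issue a free `view` after a rejected action, and `wait` when nothing can be dispatched. `ObsSummary` copies a subset of the observation fields by hand and assumes times are whole minutes; neither it nor `fallback_command` is part of the project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObsSummary:
    """Hypothetical subset of DispatchPulseObservation (times in whole minutes)."""
    current_time: int
    time_limit: int
    calls_pending: int
    units_available: int
    last_action_error: Optional[str] = None


def fallback_command(obs: ObsSummary) -> Optional[str]:
    """Return a safe command when there is nothing to decide, else None."""
    if obs.last_action_error is not None:
        # `view` costs no simulated time, so re-reading the board after an error is free.
        return "view"
    if obs.calls_pending == 0 or obs.units_available == 0:
        # Nothing can be dispatched: skip ahead, at most 5 minutes and not past the time limit.
        remaining = max(1, obs.time_limit - obs.current_time)
        return f"wait {min(5, remaining)}"
    return None  # a real decision (dispatch / classify / callback) is needed
```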

## Tasks

| Task | Calls | Units | Hospitals | Duration | Caller misreporting | What's hard about it |
|---|---|---|---|---|---|---|
| `easy` | 5 | 4 | 1 | 30 min | 0% | Basic dispatch β€” learn the action grammar |
| `medium` | 15 | 6 | 2 | 45 min | 20% | Mass casualty bus accident at minute 12; some callers lie |
| `hard` | 30 | 8 | 3 (1 on diversion) | 60 min | 35% | Earthquake: extreme scarcity, panicked callers, hospital triage matters |

All three are deterministic given the seed.
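
The README does not say how the simulator draws its randomness. One common way to get this kind of reproducibility is a private `random.Random(seed)` per episode, so nothing else in the process can disturb the sequence. The sketch below is hypothetical (`misreport_flags` is not a project function) and only shows the idea, using the per-task misreporting rates from the table.

```python
import random


def misreport_flags(n_calls: int, misreport_rate: float, seed: int = 42) -> list[bool]:
    """Hypothetical: decide, per call, whether the caller misreports the emergency."""
    rng = random.Random(seed)  # private generator, unaffected by the global random state
    return [rng.random() < misreport_rate for _ in range(n_calls)]


# Per-task caller misreporting rates from the task table.
TASK_MISREPORT_RATES = {"easy": 0.0, "medium": 0.20, "hard": 0.35}
```

Because `random.Random(42)` always yields the same stream, the hard task's misreporting pattern comes out identical on every run with the default seed.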

---

## Reward function

Final episode score = 0.60 × `survival_score` + 0.15 × `efficiency_score` + 0.15 × `triage_accuracy` − 0.10 × `penalty`, where each component lies in [0, 1]:

| Component | Weight | What it measures |
|---|---|---|
| `survival_score` | 0.60 | Severity-weighted average outcome across all calls (clinical survival curve × unit effectiveness × hospital modifier) |
| `efficiency_score` | 0.15 | Fraction of calls dispatched, penalised for wasting ALS on minor calls |
| `triage_accuracy` | 0.15 | Fraction of severity-1 calls dispatched within 25% of their timeout window |
| `penalty` | −0.10 | Deductions for timed-out criticals and wrong-unit assignments |

Severity weights inside the survival score: **3× for severity 1, 2× for 2, 1.5× for 3, 1× for 4, 0.5× for 5**.
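
The rules in this section can be written out as a short sketch. It is a hypothetical re-implementation, not the code in `reward.py` or `grader.py`: normalising the severity-weighted mean by the total weight, the `(delay, timeout)` input shape for triage, scoring triage as 1.0 when there are no severity-1 calls, and the final clamp to [0, 1] are all assumptions.

```python
SEVERITY_WEIGHTS = {1: 3.0, 2: 2.0, 3: 1.5, 4: 1.0, 5: 0.5}


def survival_score(outcomes: list[tuple[int, float]]) -> float:
    """Severity-weighted mean outcome; each item is (severity, outcome in [0, 1])."""
    total_weight = sum(SEVERITY_WEIGHTS[sev] for sev, _ in outcomes)
    if total_weight == 0:
        return 0.0
    return sum(SEVERITY_WEIGHTS[sev] * out for sev, out in outcomes) / total_weight


def triage_accuracy(sev1_calls: list[tuple[float, float]]) -> float:
    """Fraction of severity-1 calls dispatched within 25% of their timeout window.

    Each item is (minutes_until_dispatch, timeout_minutes); pass float("inf")
    as the delay for a call that was never dispatched.
    """
    if not sev1_calls:
        return 1.0  # assumption: no critical calls means nothing was mis-triaged
    on_time = sum(1 for delay, timeout in sev1_calls if delay <= 0.25 * timeout)
    return on_time / len(sev1_calls)


def final_score(survival: float, efficiency: float, triage: float, penalty: float) -> float:
    """0.60 * survival + 0.15 * efficiency + 0.15 * triage - 0.10 * penalty."""
    raw = 0.60 * survival + 0.15 * efficiency + 0.15 * triage - 0.10 * penalty
    return max(0.0, min(1.0, raw))  # assumption: clamped to the advertised 0-1 range
```

Plugging in the easy-task baseline components (0.463, 0.800, 1.000, 0.000) gives 0.5478, which matches the reported 0.5476 to within the rounding of the displayed components.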

### Survival curves (from EMS literature)

| Emergency | Curve | Source / notes |
|---|---|---|
| Cardiac arrest | exponential, ~10%/min decay | Larsen et al. 1993 |
| Trauma | sigmoid centred at 45 min | "golden hour" |
| Stroke | exponential decay | Saver 2006: ~1.9M neurons lost per minute of untreated stroke |
| Fire | exponential; fire size roughly doubles every minute | property loss |
| Breathing difficulty | gentler exponential | |
| Minor injury | nearly flat | stable patient |
| Mental health | gentler exponential | de-escalation success |

Each call's outcome is multiplied by:
- **Unit effectiveness** (e.g., ALS → cardiac = 1.0; BLS → cardiac = 0.5; fire engine → cardiac = 0.1)
- **Hospital modifier** (specialty match: +5%; on diversion or zero beds: −15%)
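
A sketch of how a curve value, unit effectiveness and hospital modifier might combine into one call outcome. Only the curve shapes and the cardiac-arrest effectiveness numbers come from this README; the 0.90-per-minute factor, the sigmoid steepness, treating the hospital adjustments as ×1.05 / ×0.85 multipliers, letting diversion override a specialty match, and capping outcomes at 1.0 are assumptions.

```python
import math

# Hypothetical constants; only the curve shapes come from the table above.
CARDIAC_DECAY_PER_MIN = 0.10   # ~10% relative loss per minute
TRAUMA_MIDPOINT_MIN = 45.0     # sigmoid centre ("golden hour" curve)
TRAUMA_STEEPNESS = 0.15        # assumed slope of the sigmoid

# Cardiac-arrest effectiveness values quoted in this README.
CARDIAC_EFFECTIVENESS = {"ALS": 1.0, "BLS": 0.5, "FIRE": 0.1}


def cardiac_arrest_curve(minutes: float) -> float:
    """Exponential decay: each minute keeps 90% of the previous minute's survival."""
    return (1.0 - CARDIAC_DECAY_PER_MIN) ** minutes


def trauma_curve(minutes: float) -> float:
    """Logistic curve equal to 0.5 at the midpoint and falling after it."""
    return 1.0 / (1.0 + math.exp(TRAUMA_STEEPNESS * (minutes - TRAUMA_MIDPOINT_MIN)))


def hospital_modifier(specialty_match: bool, on_diversion: bool, beds_free: int) -> float:
    """+5% for a specialty match, -15% on diversion or with no beds (the penalty wins)."""
    if on_diversion or beds_free == 0:
        return 0.85
    if specialty_match:
        return 1.05
    return 1.0


def call_outcome(curve_value: float, effectiveness: float, hosp_mod: float) -> float:
    """Multiply the three factors and cap the result at 1.0."""
    return min(1.0, curve_value * effectiveness * hosp_mod)
```

With these assumed constants, an ALS unit and a fire engine reaching the same cardiac arrest at the same time differ by exactly the 10× effectiveness ratio, comfortably above the ≥5× gap checked by the calibration tests.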

---

## Baseline scores (heuristic agent, seed=42)

A simple rule-based heuristic (always pick the most-critical call, send the
most effective available unit, reserve ALS for high-severity calls) produces
the following calibrated scores:

| Task | Total | Survival | Efficiency | Triage | Penalty (unweighted) | Completed/Total |
|---|---|---|---|---|---|---|
| easy   | 0.5476 | 0.463 | 0.800 | 1.000 | 0.000 | 4/5 |
| medium | 0.3750 | 0.377 | 0.600 | 0.500 | −0.160 | 9/15 |
| hard   | 0.2183 | 0.214 | 0.433 | 0.500 | −0.500 | 13/30 |
| **Average** | **0.3803** | | | | | |

The clean monotonic decrease across difficulty (easy > medium > hard) confirms
the env discriminates between scenarios as designed.
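
The heuristic itself is not reproduced in this README. The hypothetical sketch below follows its three stated rules; the `Call` / `Unit` shapes, every effectiveness value outside the cardiac-arrest row, and the severity cutoff for using an ALS unit are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Call:
    call_id: str
    severity: int        # 1 = most critical, 5 = least
    emergency: str       # e.g. "cardiac_arrest", "minor_injury"
    waiting_min: float   # minutes since the call arrived


@dataclass
class Unit:
    unit_id: str
    kind: str            # "ALS", "BLS", "FIRE" or "POLICE"


# Hypothetical table; only the cardiac ALS/BLS/FIRE values appear in this README.
EFFECTIVENESS = {
    "cardiac_arrest": {"ALS": 1.0, "BLS": 0.5, "FIRE": 0.1, "POLICE": 0.0},
    "minor_injury": {"ALS": 1.0, "BLS": 1.0, "FIRE": 0.3, "POLICE": 0.1},
}
ALS_MAX_SEVERITY = 2  # hypothetical cutoff: ALS only goes to severity 1-2 calls


def heuristic_step(calls: list[Call], units: list[Unit]) -> Optional[str]:
    """Return one dispatch command, or None if the agent should wait."""
    if not calls or not units:
        return None
    # Rule 1: most critical call first; ties go to the call that has waited longest.
    call = min(calls, key=lambda c: (c.severity, -c.waiting_min))
    # Rule 3: keep ALS units in reserve unless the call is high-severity.
    candidates = [u for u in units if u.kind != "ALS" or call.severity <= ALS_MAX_SEVERITY]
    if not candidates:
        return None
    # Rule 2: most effective available unit for this emergency type.
    scores = EFFECTIVENESS.get(call.emergency, {})
    best = max(candidates, key=lambda u: scores.get(u.kind, 0.0))
    return f"dispatch {call.call_id} {best.unit_id}"
```

Returning `None` when only an ALS unit is free for a low-severity call is what "reserving" ALS means in this sketch: the agent waits rather than spending its scarcest resource.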

---

## Inference script: `inference.py`

Per the hackathon spec, `inference.py` is in the **project root** and follows
the mandatory contract:

### Required environment variables

| Variable | Purpose | Default in script |
|---|---|---|
| `API_BASE_URL` | LLM endpoint | `https://router.huggingface.co/v1` |
| `MODEL_NAME` | Which model to call | `Qwen/Qwen2.5-72B-Instruct` |
| `HF_TOKEN` | API key for the LLM | (no default) |
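| `LOCAL_IMAGE_NAME` | Docker image for `from_docker_image()` | (no default) |
| `ENV_BASE_URL` | URL of a running env server, used when `LOCAL_IMAGE_NAME` is unset | (no default) |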
| `DISPATCHPULSE_TASK` | Which task to run (`easy`/`medium`/`hard`) | `easy` |

### Stdout format (verbatim)

```
[START] task=<task_name> env=dispatchpulse model=<model_name>
[STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
[END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
```

- One `[START]` line at episode begin
- One `[STEP]` line per step, immediately after `env.step()` returns
- One `[END]` line after `env.close()`, ALWAYS emitted (even on exception)
- `reward` and `rewards` to 2 decimal places; `score` to 3 decimal places
- `done` and `success` are lowercase booleans
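
A small set of formatters that produce exactly these lines. The function names are hypothetical, and how `inference.py` itself builds the lines is not shown here.

```python
from typing import Optional


def _flag(value: bool) -> str:
    return "true" if value else "false"  # lowercase booleans, as required


def start_line(task: str, model: str) -> str:
    return f"[START] task={task} env=dispatchpulse model={model}"


def step_line(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> str:
    err = "null" if error is None else error
    return f"[STEP] step={step} action={action} reward={reward:.2f} done={_flag(done)} error={err}"


def end_line(success: bool, steps: int, score: float, rewards: list[float]) -> str:
    joined = ",".join(f"{r:.2f}" for r in rewards)  # 2 decimals per reward
    return f"[END] success={_flag(success)} steps={steps} score={score:.3f} rewards={joined}"
```

To honour the always-emit rule, print `end_line(...)` from a `finally:` block wrapped around the episode loop.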

### Connection logic

1. If `LOCAL_IMAGE_NAME` is set → `await DispatchPulseEnv.from_docker_image(LOCAL_IMAGE_NAME)`
2. Else if `ENV_BASE_URL` is set → connect directly to a running env server
3. Otherwise → spin up an in-process simulation as a fallback (for offline runs)
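
The precedence can be expressed as a tiny pure function. This hypothetical `choose_connection` only decides which branch applies; in the real script the Docker branch is where `DispatchPulseEnv.from_docker_image(...)` is awaited. Treating an empty variable as unset is an assumption.

```python
import os
from typing import Mapping


def choose_connection(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (mode, target) following the documented precedence."""
    image = env.get("LOCAL_IMAGE_NAME")
    if image:  # 1. a local Docker image wins
        return ("docker", image)
    base_url = env.get("ENV_BASE_URL")
    if base_url:  # 2. otherwise an already-running env server
        return ("remote", base_url)
    return ("in_process", "")  # 3. offline fallback
```

Passing a plain dict instead of `os.environ` makes the branch selection easy to unit-test.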

### Run it

```bash
# Against the live HF Space
ENV_BASE_URL=https://arun-sanjay-dispatchpulse.hf.space \
HF_TOKEN=$HF_TOKEN \
python inference.py

# Against a local Docker image
LOCAL_IMAGE_NAME=dispatchpulse:latest \
HF_TOKEN=$HF_TOKEN \
python inference.py

# In-process fallback (no network, no Docker)
python inference.py
```

---

## Setup

### Run locally with Python

```bash
python -m venv .venv && source .venv/bin/activate
pip install -e .
python inference.py
```

### Run locally with Docker

```bash
docker build -t dispatchpulse .
docker run -p 8000:8000 dispatchpulse
# Then in another shell:
curl http://localhost:8000/health
```

### Use as a client (OpenEnv `EnvClient` pattern)

```python
import asyncio
from client import DispatchPulseEnv
from models import DispatchPulseAction

async def main():
    async with DispatchPulseEnv(base_url="https://arun-sanjay-dispatchpulse.hf.space") as env:
        result = await env.reset(task_name="easy", seed=42)
        while not result.done:
            action = DispatchPulseAction(action_type="wait", minutes=1, text="wait 1")
            result = await env.step(action)
            print(result.observation.text[:200])
        print(f"Final score: {result.reward}")

asyncio.run(main())
```

### Run on Hugging Face Spaces

Auto-built as a Docker Space:
[`https://huggingface.co/spaces/Arun-Sanjay/dispatchpulse`](https://huggingface.co/spaces/Arun-Sanjay/dispatchpulse)

---

## Pre-submission validator

Run the same three checks the hackathon's automated grader runs:

```bash
./scripts/validate-submission.sh https://arun-sanjay-dispatchpulse.hf.space .
```

It checks:
1. **HF Space deploys**: `POST /reset` returns HTTP 200
2. **Docker build**: `docker build .` succeeds (≤ 10 min)
3. **OpenEnv compliance**: `openenv validate` passes
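
For a quick manual re-run of check 1 without the shell script, a standard-library sketch follows. It assumes `/reset` accepts an empty JSON object as its body, which this README does not state; `reset_returns_200` is a hypothetical name, and the `opener` parameter exists only so the check can be exercised without network access.

```python
import json
import urllib.request


def reset_returns_200(base_url: str, opener=urllib.request.urlopen, timeout: float = 30.0) -> bool:
    """POST {base_url}/reset with an empty JSON body; True if the server answers HTTP 200."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/reset",
        data=json.dumps({}).encode("utf-8"),  # assumption: an empty body is accepted
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

For example, `reset_returns_200("https://arun-sanjay-dispatchpulse.hf.space")` should return `True` once the Space has finished building.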

---

## Calibration tests

The reward function ships with calibration tests that double as documentation:

```bash
python tests/test_reward.py
python tests/test_simulation.py
```

These verify that:
- Survival curves match published clinical numbers
- A "do-nothing" agent scores below 0.15 on every task
- A simple heuristic strictly outperforms the silent agent
- Heuristic scores monotonically decrease easy → medium → hard
- ALS at cardiac arrest beats fire engine at cardiac arrest by ≥5×
- Specialty hospital match boosts outcome; diversion hurts it

---

## License

Apache 2.0. Built for the Meta PyTorch OpenEnv Hackathon, India 2026.