garvitsachdeva committed on
Commit
43f2683
·
1 Parent(s): 8c359c3

Finalize OpenEnv baseline: OpenAI client, PORT binding, and docs

Dockerfile CHANGED
@@ -5,4 +5,4 @@ WORKDIR /app
5
  COPY . /app
6
  RUN pip install uv && uv sync --frozen
7
  EXPOSE 8000
8
- CMD ["uv", "run", "uvicorn", "src.server.app:app", "--host", "0.0.0.0", "--port", "8000"]
 
5
  COPY . /app
6
  RUN pip install uv && uv sync --frozen
7
  EXPOSE 8000
8
+ CMD ["sh", "-c", "uv run uvicorn src.server.app:app --host 0.0.0.0 --port ${PORT:-8000}"]
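The new `CMD` relies on the POSIX shell expansion `${PORT:-8000}`, which substitutes `8000` when `PORT` is unset or empty. As a rough illustration of that fallback rule, the sketch below mirrors it in Python. `resolve_port` is a hypothetical helper introduced here for illustration, not something in this repository.

```python
import os
from collections.abc import Mapping


def resolve_port(env: Mapping[str, str] = os.environ, default: int = 8000) -> int:
    """Mirror `${PORT:-8000}`: fall back when PORT is unset OR empty."""
    raw = env.get("PORT", "")
    # The `:-` form treats an empty string the same as an unset variable.
    return int(raw) if raw else default
```

Note that `os.environ.get("PORT", "8000")`, as used in the server's `main()`, only covers the unset case: an empty `PORT` would reach `int("")` and raise `ValueError`.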
README.md CHANGED
@@ -35,9 +35,12 @@ This project implements a benchmark environment for training and evaluating LLM
35
  |----------|----------|-------------|
36
  | `API_BASE_URL` | Yes | OpenAI-compatible endpoint base URL |
37
  | `MODEL_NAME` | Yes | Model identifier string |
38
- | `HF_TOKEN` | Yes (unless `USE_RANDOM=true`) | API key / HF token |
39
  | `USE_RANDOM` | No | Set to `true` to use deterministic random agent (no LLM) |
40
 
 
 
 
41
  ## Tasks
42
 
43
  ### 1. `single_incident`
@@ -114,8 +117,9 @@ uv sync
114
  # Run the demo (non-interactive episode visualization)
115
  uv run python demo.py
116
 
117
- # Run inference with LLM agent
118
- uv run python inference.py
 
119
 
120
  # Run API server
121
  uv run python -m src.server.app
@@ -144,7 +148,7 @@ python inference.py
144
  Run the random baseline agent against all 4 tasks:
145
 
146
  ```bash
147
- USE_RANDOM=true API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4 HF_TOKEN=x python inference.py
148
  ```
149
 
150
  Expected output (approximate):
@@ -243,7 +247,19 @@ curl -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d
243
 
244
  ## HF Space
245
 
246
- **Placeholder**: (add link here)
 
 
 
 
 
 
 
 
 
 
 
 
247
 
248
  ## License
249
 
 
35
  |----------|----------|-------------|
36
  | `API_BASE_URL` | Yes | OpenAI-compatible endpoint base URL |
37
  | `MODEL_NAME` | Yes | Model identifier string |
38
+ | `OPENAI_API_KEY` | Yes (unless `USE_RANDOM=true`) | API key used by the OpenAI Python client |
39
  | `USE_RANDOM` | No | Set to `true` to use deterministic random agent (no LLM) |
40
 
41
+ Notes:
42
+ - `HF_TOKEN` is supported as a backwards-compatible alias for `OPENAI_API_KEY`.
43
+
44
  ## Tasks
45
 
46
  ### 1. `single_incident`
 
117
  # Run the demo (non-interactive episode visualization)
118
  uv run python demo.py
119
 
120
+ # Run inference (random baseline, no API calls)
121
+ USE_RANDOM=true API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4 OPENAI_API_KEY=x \
122
+ uv run python inference.py
123
 
124
  # Run API server
125
  uv run python -m src.server.app
 
148
  Run the random baseline agent against all 4 tasks:
149
 
150
  ```bash
151
+ USE_RANDOM=true API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4 OPENAI_API_KEY=x python inference.py
152
  ```
153
 
154
  Expected output (approximate):
 
247
 
248
  ## HF Space
249
 
250
+ ### Deploying to Hugging Face Spaces (Docker)
251
+
252
+ This repository is compatible with **Docker Spaces** (the README frontmatter includes `sdk: docker` and the Space tags include `openenv`).
253
+
254
+ 1) Create a new Space → choose **Docker**.
255
+ 2) Push this repository to the Space.
256
+ 3) The server binds to the `PORT` environment variable (HF commonly sets `PORT=7860`).
257
+
258
+ Once running, the Space should respond to:
259
+ - `GET /health`
260
+ - `POST /reset`
261
+ - `POST /step`
262
+ - `GET /state`
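A deployed Space can be probed with a few lines of standard-library Python. The sketch below is only an assumption about how one might do that: `build_checks` and `smoke_test` are hypothetical names, the base URL is whatever the Space exposes, and only `/health` and `/reset` are covered because the diff shows `/reset` accepting an empty JSON body while the payloads for `/step` and `/state` are not shown here.

```python
import json
import urllib.parse
import urllib.request


def build_checks(base_url: str) -> list[urllib.request.Request]:
    """Build requests for /health (GET) and /reset (POST with an empty JSON body)."""
    base = base_url.rstrip("/")
    health = urllib.request.Request(f"{base}/health", method="GET")
    reset = urllib.request.Request(
        f"{base}/reset",
        data=json.dumps({}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return [health, reset]


def smoke_test(base_url: str, timeout: float = 10.0) -> dict[str, int]:
    """Send each request and record the HTTP status per path."""
    results: dict[str, int] = {}
    for req in build_checks(base_url):
        # urlopen raises HTTPError on 4xx/5xx, so a returned dict means success.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            results[urllib.parse.urlparse(req.full_url).path] = resp.status
    return results
```

Calling `smoke_test("http://localhost:8000")` against a locally running container should return a status for each of the two paths.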
263
 
264
  ## License
265
 
changes.md CHANGED
@@ -1,353 +1,117 @@
1
- # 911 Dispatch Supervisor: Fix & Polish for OpenEnv Submission
2
 
3
- You are working on the repo at the current directory. Apply ALL fixes below in order.
4
- Do not skip any item. After all fixes, run the final validation checklist.
5
 
6
  ---
7
 
8
- ## SECTION 1 — CRITICAL BUGS (fix these first)
9
-
10
- ### 1.1 Fix `openenv.yaml` — Replace entire file content
11
-
12
- The file uses hard tab characters which breaks YAML parsing. Replace the entire file with:
13
- ```yaml
14
- name: citywide-dispatch-supervisor
15
- version: "0.1.0"
16
- description: >
17
- City-wide 911 emergency dispatch supervisor RL environment.
18
- An LLM agent learns to manage simultaneous incidents by dispatching
19
- police, fire, and EMS units across a city grid under realistic constraints.
20
- entrypoint: src.openenv_environment:OpenEnvEnvironment
21
- tasks:
22
- - id: single_incident
23
- name: Single Incident Response
24
- description: One incident with a small unit pool; learn basic dispatch, correct unit type, and response time.
25
- - id: multi_incident
26
- name: Simultaneous Multi-Incident
27
- description: Multiple concurrent incidents requiring triage, prioritization, and correct unit matching.
28
- - id: mass_casualty
29
- name: Mass Casualty Event
30
- description: Wave-based Priority-1 surge with resource conflict; maximize survival outcomes.
31
- - id: shift_surge
32
- name: Shift Surge
33
- description: Incident waves combined with units going out of service; maintain coverage over time.
34
- ```
35
-
36
- Verify with: `python -c "import yaml; yaml.safe_load(open('openenv.yaml')); print('YAML OK')"`
37
 
38
- ---
39
-
40
- ### 1.2 Fix `src/server/app.py` — Server never starts
41
-
42
- Add these two lines at the very bottom of `src/server/app.py`, after the `def main()` block:
43
- ```python
44
- if __name__ == "__main__":
45
- main()
46
- ```
47
-
48
- Also update the `main()` function to:
49
- ```python
50
- def main():
51
- import uvicorn
52
- uvicorn.run("src.server.app:app", host="0.0.0.0", port=8000, reload=False)
53
- ```
54
-
55
- ---
56
-
57
- ### 1.3 Fix `src/server/app.py` — `/reset` rejects empty body
58
-
59
- Change `ResetRequest` so `task_id` has a default:
60
- ```python
61
- class ResetRequest(BaseModel):
62
- task_id: str = "single_incident"
63
- seed: int | None = None
64
- ```
65
-
66
- ---
67
 
68
- ### 1.4 Fix `Dockerfile` — Use uvicorn directly in CMD
 
 
 
69
 
70
- Replace the CMD line in the root `Dockerfile` with:
71
- ```dockerfile
72
- CMD ["uv", "run", "uvicorn", "src.server.app:app", "--host", "0.0.0.0", "--port", "8000"]
73
- ```
74
 
75
  ---
76
 
77
- ## SECTION 2: HIGH PRIORITY BUGS
 
 
78
 
79
- ### 2.1 Fix `validate_local.py` — `check_inference()` never uses random mode
 
 
 
80
 
81
- In `validate_local.py`, inside `check_inference()`, add `env["USE_RANDOM"] = "true"` before the `subprocess.run` call:
82
- ```python
83
- env["USE_RANDOM"] = "true"
84
- ```
85
-
86
- Also increase the timeout to 300 seconds if not already set.
87
-
88
- ---
89
-
90
- ### 2.2 Fix `pyproject.toml` — Add `asyncio_mode`
91
-
92
- In `[tool.pytest.ini_options]`, add:
93
- ```toml
94
- asyncio_mode = "auto"
95
- ```
96
 
97
  ---
98
 
99
- ### 2.3 Fix `inference.py`: Normalize exception error token
100
-
101
- In `inference.py`, inside the inner `except Exception as e` block within the step loop, change the error string:
102
- ```python
103
- except Exception as e:
104
- error_msg = "step_error"
105
- print(
106
- f"[STEP] step={step_count} action={action_str} "
107
- f"reward=0.00 done=true error={error_msg}"
108
- )
109
- success = False
110
- break
111
- ```
112
-
113
- ---
114
-
115
- ### 2.4 Fix `inference.py` — Score computation excludes reset reward
116
-
117
- Change score computation to exclude the initial reset observation score:
118
- ```python
119
- step_rewards = rewards[1:]
120
- if step_rewards:
121
- total_score = sum(step_rewards) / len(step_rewards)
122
- else:
123
- total_score = 0.0
124
- total_score = max(0.0, min(1.0, total_score))
125
 
126
- rewards_str = ",".join(f"{r:.2f}" for r in step_rewards) if step_rewards else "0.00"
127
- ```
 
128
 
129
- ---
130
-
131
- ### 2.5 Fix `src/server/app.py` — Guard `get_dashboard_state` against None env
132
-
133
- The `/dashboard/state` endpoint should return a safe empty structure before `/reset` is called. It already does this in the current code — verify it matches:
134
- ```python
135
- @app.get("/dashboard/state")
136
- async def get_dashboard_state() -> dict[str, Any]:
137
- if _env is None:
138
- return {
139
- "units": {},
140
- "incidents": {},
141
- "episode_id": "not-initialized",
142
- "step_count": 0,
143
- "task_id": "none",
144
- "city_time": 0.0,
145
- "metadata": {},
146
- "legal_actions": [],
147
- "issues": [],
148
- "observation": None,
149
- }
150
- # ... rest unchanged
151
- ```
152
 
153
  ---
154
 
155
- ## SECTION 3: ENVIRONMENT DESIGN IMPROVEMENTS
156
-
157
- ### 3.1 Improve `src/tasks/single_incident.py` grader
158
-
159
- Replace `SingleIncidentGrader.grade()` with:
160
- ```python
161
- def grade(self, state: State, rewards: list[float]) -> float:
162
- if not rewards:
163
- return 0.0
164
-
165
- incident = state.incidents.get("INC-001")
166
- if incident is None:
167
- return 0.0
168
 
169
- score = 0.0
 
 
170
 
171
- if incident.status.value == "RESOLVED":
172
- score += 0.50
 
173
 
174
- medic_dispatched = any(
175
- u.unit_type.value == "MEDIC"
176
- and (
177
- u.assigned_incident_id == "INC-001"
178
- or u.status.value in {"ON_SCENE", "DISPATCHED"}
179
- )
180
- for u in state.units.values()
181
- )
182
- if medic_dispatched:
183
- score += 0.30
184
-
185
- if incident.status.value == "RESOLVED" and state.step_count <= 10:
186
- score += 0.20
187
-
188
- return max(0.0, min(1.0, score))
189
- ```
190
 
191
  ---
192
 
193
- ### 3.2 Improve `src/tasks/multi_incident.py` grader
194
-
195
- Replace `MultiIncidentGrader.grade()` with:
196
- ```python
197
- def grade(self, state: State, rewards: list[float]) -> float:
198
- if not rewards:
199
- return 0.0
200
-
201
- total = len(state.incidents)
202
- if total == 0:
203
- return 0.0
204
-
205
- resolved = sum(1 for i in state.incidents.values() if i.status.value == "RESOLVED")
206
- failed = sum(1 for i in state.incidents.values() if i.status.value == "ESCALATED")
207
- p1_total = sum(1 for i in state.incidents.values() if i.severity.value == "PRIORITY_1")
208
- p1_resolved = sum(
209
- 1
210
- for iid in state.metadata.get("resolved_incidents", [])
211
- if state.incidents.get(iid)
212
- and state.incidents[iid].severity.value == "PRIORITY_1"
213
- )
214
 
215
- resolution_score = resolved / total
216
- p1_score = (p1_resolved / p1_total) if p1_total > 0 else 1.0
217
- failure_penalty = failed / total
 
 
218
 
219
- score = 0.5 * p1_score + 0.3 * resolution_score - 0.2 * failure_penalty
220
- return max(0.0, min(1.0, score))
221
- ```
222
 
223
  ---
224
 
225
- ### 3.3 Improve `src/tasks/mass_casualty.py` grader
226
 
227
- Replace `MassCasualtyGrader.grade()` with:
228
- ```python
229
- def grade(self, state: State, rewards: list[float]) -> float:
230
- if not rewards:
231
- return 0.0
232
 
233
- p1_seen = list(state.metadata.get("p1_seen", []))
234
- p1_resolved = [
235
- iid
236
- for iid in state.metadata.get("resolved_incidents", [])
237
- if iid in p1_seen and iid not in state.metadata.get("failed_incidents", [])
238
- ]
239
- p1_failed = list(state.metadata.get("failed_incidents", []))
240
 
241
- survival_score = len(p1_resolved) / max(len(p1_seen), 1)
242
- failure_penalty = len(p1_failed) / max(len(p1_seen), 1) * 0.5
243
-
244
- mean_reward = sum(rewards) / len(rewards)
245
- score = 0.6 * survival_score + 0.3 * mean_reward - failure_penalty
246
- return max(0.0, min(1.0, score))
247
- ```
248
-
249
- ---
250
-
251
- ### 3.4 Fix `src/rewards.py` — Triage key format mismatch
252
-
253
- In `_compute_triage()`, the metadata lookup uses inconsistent key formats. Ensure it tries both:
254
- ```python
255
- required_types = (
256
- required_map.get(incident.incident_type.value, [])
257
- or required_map.get(str(incident.incident_type), [])
258
- )
259
- ```
260
 
261
  ---
262
 
263
- ### 3.5 Fix `src/state_machine.py`: Use Manhattan distance for ETA
264
 
265
- In `_apply_dispatch()`, replace Euclidean distance with Manhattan:
266
- ```python
267
- dx = abs(unit.location_x - incident.location_x)
268
- dy = abs(unit.location_y - incident.location_y)
269
- manhattan_dist = dx + dy
270
- eta = manhattan_dist / max(speed, 1e-6)
271
- ```
272
 
273
- ---
274
-
275
- ## SECTION 4 — TEST FIXES
276
 
277
- ### 4.1 Update `tests/test_inference.py`: Add `step_error` to valid error tokens
278
 
279
- Find `valid_errors` in `test_step_line_error_format` and add `"step_error"`:
280
- ```python
281
- valid_errors = {"null", "max_steps_exceeded", "illegal_transition", "step_error"}
282
- ```
283
 
284
- ---
 
285
 
286
- ### 4.2 Verify `tests/test_openenv_integration.py` has these two tests
287
-
288
- Confirm the following tests exist (they appear to be already present based on the file):
289
- ```python
290
- def test_reset_with_empty_body_returns_200(self) -> None:
291
- c = TestClient(server_app.app)
292
- response = c.post("/reset", json={})
293
- assert response.status_code == 200
294
- data = response.json()
295
- assert data["result"] == "dispatch center online"
296
-
297
- def test_tasks_endpoint_returns_four_tasks(self) -> None:
298
- c = TestClient(server_app.app)
299
- response = c.get("/tasks")
300
- assert response.status_code == 200
301
- tasks = response.json()
302
- assert len(tasks) == 4
303
- task_ids = {t["task_id"] for t in tasks}
304
- assert task_ids == {"single_incident", "multi_incident", "mass_casualty", "shift_surge"}
305
- ```
306
-
307
- If missing, add them to the `TestTasksEndpoint` and `TestResetEndpoint` classes.
308
-
309
- ---
310
 
311
- ## SECTION 5 — FINAL VALIDATION CHECKLIST
312
-
313
- Run these commands in order and confirm each passes:
314
- ```bash
315
- # 1. YAML parse check
316
- python -c "import yaml; yaml.safe_load(open('openenv.yaml')); print('YAML OK')"
317
-
318
- # 2. Full test suite
319
- uv run python -m pytest tests/ -v --tb=short
320
-
321
- # 3. Inference script with random agent
322
- USE_RANDOM=true API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4 HF_TOKEN=x \
323
- uv run python inference.py 2>&1 | grep -E '^\[(START|STEP|END)\]' | head -20
324
-
325
- # 4. Demo script
326
- uv run python demo.py
327
-
328
- # 5. OpenEnv validate
329
- uv run openenv validate
330
-
331
- # 6. Docker build
332
- docker build -t citywide-dispatch-supervisor .
333
-
334
- # 7. Docker run + health check + empty reset
335
- docker run -d -p 8000:8000 --name test-dispatch citywide-dispatch-supervisor
336
- sleep 5
337
- curl -s http://localhost:8000/health
338
- curl -s -X POST http://localhost:8000/reset \
339
- -H "Content-Type: application/json" -d '{}'
340
- docker stop test-dispatch && docker rm test-dispatch
341
-
342
- # 8. Benchmark scores all in [0.0, 1.0]
343
- uv run python -c "
344
- from src.benchmark import run_all
345
- scores = run_all()
346
- for task_id, score in scores.items():
347
- assert 0.0 <= score <= 1.0, f'{task_id}: score {score} out of range'
348
- print(f'{task_id}: {score:.3f}')
349
- print('All scores in [0.0, 1.0] — PASS')
350
- "
351
- ```
352
-
353
- All 8 checks must pass before the submission is ready.
 
1
+ # Remaining Changes Needed: 911 Dispatch Supervisor (as of 2026-04-06)
2
 
3
+ This file lists ONLY the work still required to fully match the hackathon requirements provided (OpenAI client + OPENAI_API_KEY baseline, HF Spaces readiness, and portable validation tooling). Items already implemented (OpenEnv YAML, tasks/graders, reward shaping, Docker boot, /reset {} support, etc.) are intentionally omitted.
 
4
 
5
  ---
6
 
7
+ ## SECTION 1 — BASELINE INFERENCE MUST USE OPENAI CLIENT + OPENAI_API_KEY (REQUIRED)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
8
 
9
+ ### 1.1 Update inference auth variables to match requirement
10
+ **Problem:** The requirement explicitly calls for `OPENAI_API_KEY`. Current code requires `HF_TOKEN` and does not recognize `OPENAI_API_KEY`.
11
+ **Where:** [inference.py](inference.py), [README.md](README.md), [validate_local.py](validate_local.py), [tests/test_inference.py](tests/test_inference.py)
12
 
13
+ **Action:**
14
+ - Treat `OPENAI_API_KEY` as the primary credential env var.
15
+ - Keep backward-compatible support for `HF_TOKEN` (optional), but do not require it.
16
+ - Update README Environment Variables table + examples to show `OPENAI_API_KEY`.
17
 
18
+ **Verify:**
19
+ - `OPENAI_API_KEY=x USE_RANDOM=true API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4 uv run python inference.py`
20
+ - Must run and print `[START]` / `[STEP]` / `[END]` lines.
 
21
 
22
  ---
23
 
24
+ ### 1.2 Replace hand-rolled HTTPX chat call with the official OpenAI Python client
25
+ **Problem:** Requirement says “Uses the OpenAI API client”. Current LLM agent calls `/chat/completions` via HTTPX directly.
26
+ **Where:** [inference.py](inference.py)
27
 
28
+ **Action:**
29
+ - Implement the LLM agent using the `openai` Python package already present in dependencies.
30
+ - Continue supporting `API_BASE_URL` + `MODEL_NAME`.
31
+ - Ensure output format stays unchanged (tests depend on it).
32
 
33
+ **Verify:**
34
+ - With `USE_RANDOM=false` and a real key, it should complete at least one episode.
35
+ - With `USE_RANDOM=true`, it should not require any API key.
36
 
37
  ---
38
 
39
+ ### 1.3 Update env-var validation tests to reflect OPENAI_API_KEY support
40
+ **Problem:** Tests currently set `HF_TOKEN` and never mention `OPENAI_API_KEY`.
41
+ **Where:** [tests/test_inference.py](tests/test_inference.py)
42
 
43
+ **Action:**
44
+ - Update tests to provide `OPENAI_API_KEY` instead of `HF_TOKEN` (or accept either).
45
+ - Add/adjust a test that asserts: missing `OPENAI_API_KEY` fails only when `USE_RANDOM != true`.
46
 
47
+ **Verify:**
48
+ - `uv run python -m pytest tests/test_inference.py -q` passes.
49
 
50
  ---
51
 
52
+ ## SECTION 2: HF SPACES (DOCKER) READINESS (REQUIRED)
 
 
 
 
 
 
 
 
 
 
 
 
53
 
54
+ ### 2.1 Make server bind to the Hugging Face provided port
55
+ **Problem:** HF Docker Spaces typically set `PORT=7860`. Current server binds to port 8000 unconditionally.
56
+ **Where:** [src/server/app.py](src/server/app.py), and Docker entrypoints in [Dockerfile](Dockerfile) + [src/server/Dockerfile](src/server/Dockerfile)
57
 
58
+ **Action:**
59
+ - In the server `main()`, read port from `PORT` env var (default 8000).
60
+ - Ensure Docker CMD uses that same port behavior (either via the Python `main()` or uvicorn args).
61
 
62
+ **Verify:**
63
+ - `PORT=7860 uv run python -m src.server.app` listens on 7860.
64
+ - `docker run -e PORT=7860 -p 7860:7860 citywide-dispatch-supervisor` works and `/health` responds.
65
 
66
  ---
67
 
68
+ ### 2.2 Replace README “HF Space Placeholder” with real deploy instructions (or link)
69
+ **Problem:** Requirement says “Deploy to Hugging Face Spaces”. README currently has a placeholder only.
70
+ **Where:** [README.md](README.md)
71
 
72
+ **Action:**
73
+ - Add either:
74
+ - A real link to the deployed Space, OR
75
+ - Minimal, accurate deployment steps for creating a Docker Space (with required tags already present).
76
+ - Mention expected public URL and what endpoints should work (`/health`, `/reset`, `/step`, `/state`).
77
 
78
+ **Verify:**
79
+ - README no longer contains “Placeholder”.
 
80
 
81
  ---
82
 
83
+ ## SECTION 3: PORTABLE VALIDATION TOOLING (STRONGLY RECOMMENDED)
84
 
85
+ ### 3.1 Ensure `openenv validate` is installable from dependencies
86
+ **Problem:** Repo depends on `openenv-core`, but the CLI validator is provided by the `openenv` package. On a clean machine, `openenv validate` may be missing unless `openenv` is a dependency.
87
+ **Where:** [pyproject.toml](pyproject.toml), [requirements.txt](requirements.txt)
 
 
88
 
89
+ **Action:**
90
+ - Add `openenv>=0.2.0` (or the current compatible version) to dependencies so `openenv validate` is guaranteed available after install.
 
 
 
 
 
91
 
92
+ **Verify:**
93
+ - In a fresh venv after installing dependencies: `uv run openenv validate` succeeds.
94
 
95
  ---
96
 
97
+ ## SECTION 4: FINAL SUBMISSION CHECKS (RUN BEFORE SUBMITTING)
98
 
99
+ Run these in order:
 
 
 
 
 
 
100
 
101
+ 1) `python -c "import yaml; yaml.safe_load(open('openenv.yaml')); print('YAML OK')"`
 
 
102
 
103
+ 2) `uv run python -m pytest tests/ -q`
104
 
105
+ 3) Random baseline inference (no API key required):
106
+ - `USE_RANDOM=true API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4 uv run python inference.py`
 
 
107
 
108
+ 4) Local structure validation:
109
+ - `uv run openenv validate`
110
 
111
+ 5) Docker sanity:
112
+ - `docker build -t citywide-dispatch-supervisor .`
113
+ - `docker run -p 8000:8000 citywide-dispatch-supervisor`
114
+ - `curl -s http://localhost:8000/health`
115
+ - `curl -s -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d '{}'`
116
 
117
+ All must pass.
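The "run these in order, all must pass" rule can be automated with a small fail-fast runner. The following is a minimal sketch under stated assumptions: `run_checks` and `CHECKS` are hypothetical names, and only the steps that need no extra environment variables or a running container are listed.

```python
import subprocess
import sys
from collections.abc import Sequence


def run_checks(checks: Sequence[tuple[str, list[str]]]) -> list[str]:
    """Run each (label, argv) in order and stop at the first non-zero exit.

    Returns the labels of the checks that passed.
    """
    passed: list[str] = []
    for label, argv in checks:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"FAIL {label}\n{result.stderr}", file=sys.stderr)
            break
        passed.append(label)
    return passed


# Commands copied from the checklist; the labels are illustrative only.
CHECKS = [
    ("tests", ["uv", "run", "python", "-m", "pytest", "tests/", "-q"]),
    ("openenv", ["uv", "run", "openenv", "validate"]),
    ("docker build", ["docker", "build", "-t", "citywide-dispatch-supervisor", "."]),
]
```

A caller would compare `len(run_checks(CHECKS))` with `len(CHECKS)` to decide whether the submission is ready.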
inference.py CHANGED
@@ -7,6 +7,7 @@ import sys
7
  from typing import Any
8
 
9
  import httpx
 
10
 
11
  from src.models import Action, DispatchAction
12
  from src.openenv_environment import OpenEnvEnvironment
@@ -24,10 +25,15 @@ def _validate_env_vars() -> None:
24
  )
25
 
26
  use_random = os.environ.get("USE_RANDOM", "").lower() == "true"
27
- api_base_url = os.environ.get("API_BASE_URL", "")
28
- is_gemini = "gemini" in api_base_url.lower()
29
- if not use_random and not is_gemini and not os.environ.get("HF_TOKEN"):
30
- raise EnvironmentError("Missing required environment variable: HF_TOKEN")
 
 
 
 
 
31
 
32
 
33
  def _get_env(var: str) -> str:
@@ -65,30 +71,38 @@ class LLMAgent:
65
  self.base_url = base_url.rstrip("/")
66
  self.model = model
67
 
 
 
 
68
  async def chat(self, messages: list[dict]) -> str:
69
  """Send chat request to LLM endpoint with appropriate auth.
70
 
71
- Auth method depends on endpoint:
72
- - Gemini (contains 'gemini'): use ?key= query param
73
- - Groq (contains 'groq'): use Authorization: Bearer header
74
- - Other OpenAI-compatible: use Authorization: Bearer header
 
75
  """
76
  is_gemini = "gemini" in self.base_url.lower()
77
- headers = {"Content-Type": "application/json"}
78
-
79
  if is_gemini:
 
 
80
  url = f"{self.base_url}/chat/completions?key={self.api_key}"
81
- else:
82
- url = f"{self.base_url}/chat/completions"
83
- headers["Authorization"] = f"Bearer {self.api_key}"
 
 
 
 
 
 
84
 
85
- async with httpx.AsyncClient(timeout=60.0) as client:
86
- resp = await client.post(
87
- url, json={"model": self.model, "messages": messages}, headers=headers
88
- )
89
- resp.raise_for_status()
90
- data = resp.json()
91
- return data["choices"][0]["message"]["content"]
92
 
93
  async def select_action(
94
  self, legal_actions: list[Action], state_desc: str, prev_obs: Any = None
@@ -305,8 +319,8 @@ async def main() -> int:
305
  if use_random:
306
  agent: RandomAgent | LLMAgent = RandomAgent(seed=42)
307
  else:
308
- hf_token = os.environ.get("HF_TOKEN", "")
309
- agent = LLMAgent(api_key=hf_token, base_url=api_base_url, model=model_name)
310
 
311
  task_ids = ["single_incident", "multi_incident", "mass_casualty", "shift_surge"]
312
 
 
7
  from typing import Any
8
 
9
  import httpx
10
+ from openai import AsyncOpenAI
11
 
12
  from src.models import Action, DispatchAction
13
  from src.openenv_environment import OpenEnvEnvironment
 
25
  )
26
 
27
      use_random = os.environ.get("USE_RANDOM", "").lower() == "true"
28
+     if use_random:
29
+         return
30
+
31
+     # Prefer OPENAI_API_KEY for hackathon compliance; keep HF_TOKEN for backwards compatibility.
32
+     if os.environ.get("OPENAI_API_KEY"):
33
+         return
34
+     if os.environ.get("HF_TOKEN"):
35
+         return
36
+     raise EnvironmentError("Missing required environment variable: OPENAI_API_KEY")
37
 
38
 
39
  def _get_env(var: str) -> str:
 
71
  self.base_url = base_url.rstrip("/")
72
  self.model = model
73
 
74
+ # Official OpenAI Python client for OpenAI-compatible endpoints.
75
+ self._client = AsyncOpenAI(api_key=self.api_key, base_url=self.base_url)
76
+
77
      async def chat(self, messages: list[dict]) -> str:
78
          """Send chat request to LLM endpoint with appropriate auth.
79
 
80
+         Uses the official OpenAI client for OpenAI-compatible endpoints.
81
+
82
+         Note: Some non-OpenAI providers (e.g., certain Gemini endpoints) may not
83
+         be compatible with the OpenAI client; those are handled via a minimal
84
+         HTTPX fallback.
85
          """
86
          is_gemini = "gemini" in self.base_url.lower()
87
          if is_gemini:
88
+             # Fallback for Gemini-style "?key=" auth.
89
+             headers = {"Content-Type": "application/json"}
90
              url = f"{self.base_url}/chat/completions?key={self.api_key}"
91
+             async with httpx.AsyncClient(timeout=60.0) as client:
92
+                 resp = await client.post(
93
+                     url,
94
+                     json={"model": self.model, "messages": messages},
95
+                     headers=headers,
96
+                 )
97
+                 resp.raise_for_status()
98
+                 data = resp.json()
99
+                 return data["choices"][0]["message"]["content"]
100
 
101
+         resp = await self._client.chat.completions.create(
102
+             model=self.model,
103
+             messages=messages,
104
+         )
105
+         return resp.choices[0].message.content or ""
 
 
106
 
107
  async def select_action(
108
  self, legal_actions: list[Action], state_desc: str, prev_obs: Any = None
 
319
  if use_random:
320
  agent: RandomAgent | LLMAgent = RandomAgent(seed=42)
321
  else:
322
+ api_key = os.environ.get("OPENAI_API_KEY") or os.environ.get("HF_TOKEN", "")
323
+ agent = LLMAgent(api_key=api_key, base_url=api_base_url, model=model_name)
324
 
325
  task_ids = ["single_incident", "multi_incident", "mass_casualty", "shift_surge"]
326
 
pyproject.toml CHANGED
@@ -9,6 +9,7 @@ description = "911 Dispatch RL Environment. City-wide emergency dispatch benchma
9
  requires-python = ">=3.11"
10
  dependencies = [
11
  "pydantic>=2.7",
 
12
  "openenv-core>=0.2.0",
13
  "fastapi>=0.110",
14
  "uvicorn[standard]>=0.29",
 
9
  requires-python = ">=3.11"
10
  dependencies = [
11
  "pydantic>=2.7",
12
+ "openenv>=0.2.0",
13
  "openenv-core>=0.2.0",
14
  "fastapi>=0.110",
15
  "uvicorn[standard]>=0.29",
requirements.txt CHANGED
@@ -1,4 +1,5 @@
1
  pydantic>=2.7
 
2
  openenv-core>=0.2.0
3
  fastapi>=0.110
4
  uvicorn[standard]>=0.29
 
1
  pydantic>=2.7
2
+ openenv>=0.2.0
3
  openenv-core>=0.2.0
4
  fastapi>=0.110
5
  uvicorn[standard]>=0.29
src/server/Dockerfile CHANGED
@@ -10,4 +10,4 @@ COPY data/ /app/data/
10
 
11
  EXPOSE 8000
12
 
13
- CMD ["uvicorn", "src.server.app:app", "--host", "0.0.0.0", "--port", "8000"]
 
10
 
11
  EXPOSE 8000
12
 
13
+ CMD ["sh", "-c", "uvicorn src.server.app:app --host 0.0.0.0 --port ${PORT:-8000}"]
src/server/app.py CHANGED
@@ -173,8 +173,10 @@ async def get_dashboard_state() -> dict[str, Any]:
173
 
174
  def main():
175
  import uvicorn
 
176
 
177
- uvicorn.run("src.server.app:app", host="0.0.0.0", port=8000, reload=False)
 
178
 
179
 
180
  if __name__ == "__main__":
 
173
 
174
  def main():
175
      import uvicorn
176
+     import os
177
 
178
+     port = int(os.environ.get("PORT", "8000"))
179
+     uvicorn.run("src.server.app:app", host="0.0.0.0", port=port, reload=False)
180
 
181
 
182
  if __name__ == "__main__":
tests/test_inference.py CHANGED
@@ -29,7 +29,7 @@ class TestInferenceFormatCompliance:
29
  env = {
30
  "API_BASE_URL": "https://api.example.com",
31
  "MODEL_NAME": "test-model",
32
- "HF_TOKEN": "test-token",
33
  "USE_RANDOM": "true",
34
  }
35
  returncode, stdout, stderr = self._run_inference_capture(env)
@@ -46,7 +46,7 @@ class TestInferenceFormatCompliance:
46
  env = {
47
  "API_BASE_URL": "https://api.example.com",
48
  "MODEL_NAME": "test-model",
49
- "HF_TOKEN": "test-token",
50
  "USE_RANDOM": "true",
51
  }
52
  _, stdout, _ = self._run_inference_capture(env)
@@ -59,7 +59,7 @@ class TestInferenceFormatCompliance:
59
  env = {
60
  "API_BASE_URL": "https://api.example.com",
61
  "MODEL_NAME": "test-model",
62
- "HF_TOKEN": "test-token",
63
  "USE_RANDOM": "true",
64
  }
65
  _, stdout, _ = self._run_inference_capture(env)
@@ -84,6 +84,10 @@ class TestEnvVarValidation:
84
  merged_env.pop("API_BASE_URL", None)
85
  if "MODEL_NAME" not in env:
86
  merged_env.pop("MODEL_NAME", None)
 
 
 
 
87
  result = subprocess.run(
88
  cmd,
89
  capture_output=True,
@@ -94,13 +98,23 @@ class TestEnvVarValidation:
94
  return result.returncode, result.stdout, result.stderr
95
 
96
  def test_missing_api_base_url(self) -> None:
97
- env = {"MODEL_NAME": "m", "HF_TOKEN": "t", "USE_RANDOM": "true"}
98
  returncode, stdout, stderr = self._run_inference_capture(env)
99
  assert returncode != 0
100
  assert "API_BASE_URL" in (stdout + stderr)
101
 
102
  def test_missing_model_name(self) -> None:
103
- env = {"API_BASE_URL": "x", "HF_TOKEN": "t", "USE_RANDOM": "true"}
104
  returncode, stdout, stderr = self._run_inference_capture(env)
105
  assert returncode != 0
106
  assert "MODEL_NAME" in (stdout + stderr)
 
 
 
 
 
 
 
 
 
 
 
29
  env = {
30
  "API_BASE_URL": "https://api.example.com",
31
  "MODEL_NAME": "test-model",
32
+ "OPENAI_API_KEY": "test-token",
33
  "USE_RANDOM": "true",
34
  }
35
  returncode, stdout, stderr = self._run_inference_capture(env)
 
46
  env = {
47
  "API_BASE_URL": "https://api.example.com",
48
  "MODEL_NAME": "test-model",
49
+ "OPENAI_API_KEY": "test-token",
50
  "USE_RANDOM": "true",
51
  }
52
  _, stdout, _ = self._run_inference_capture(env)
 
59
  env = {
60
  "API_BASE_URL": "https://api.example.com",
61
  "MODEL_NAME": "test-model",
62
+ "OPENAI_API_KEY": "test-token",
63
  "USE_RANDOM": "true",
64
  }
65
  _, stdout, _ = self._run_inference_capture(env)
 
84
  merged_env.pop("API_BASE_URL", None)
85
  if "MODEL_NAME" not in env:
86
  merged_env.pop("MODEL_NAME", None)
87
+ if "OPENAI_API_KEY" not in env:
88
+ merged_env.pop("OPENAI_API_KEY", None)
89
+ if "HF_TOKEN" not in env:
90
+ merged_env.pop("HF_TOKEN", None)
91
  result = subprocess.run(
92
  cmd,
93
  capture_output=True,
 
98
  return result.returncode, result.stdout, result.stderr
99
 
100
  def test_missing_api_base_url(self) -> None:
101
+ env = {"MODEL_NAME": "m", "OPENAI_API_KEY": "t", "USE_RANDOM": "true"}
102
  returncode, stdout, stderr = self._run_inference_capture(env)
103
  assert returncode != 0
104
  assert "API_BASE_URL" in (stdout + stderr)
105
 
106
  def test_missing_model_name(self) -> None:
107
+ env = {"API_BASE_URL": "x", "OPENAI_API_KEY": "t", "USE_RANDOM": "true"}
108
  returncode, stdout, stderr = self._run_inference_capture(env)
109
  assert returncode != 0
110
  assert "MODEL_NAME" in (stdout + stderr)
111
+
112
+ def test_missing_openai_api_key_when_not_random(self) -> None:
113
+ env = {
114
+ "API_BASE_URL": "https://api.example.com",
115
+ "MODEL_NAME": "m",
116
+ "USE_RANDOM": "false",
117
+ }
118
+ returncode, stdout, stderr = self._run_inference_capture(env)
119
+ assert returncode != 0
120
+ assert "OPENAI_API_KEY" in (stdout + stderr)
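The new test only works because the helper strips `OPENAI_API_KEY` and `HF_TOKEN` from the inherited environment unless the test supplies them, so a key exported in the developer's shell cannot mask a missing-variable failure. A minimal sketch of that scrubbing step follows. It assumes the merged environment starts from a copy of the parent environment plus the test's overrides (the diff does not show how `merged_env` is built), and `build_subprocess_env` is a hypothetical name.

```python
from collections.abc import Mapping

# Variables that must come from the test itself, never from the parent shell.
SCRUBBED = ("API_BASE_URL", "MODEL_NAME", "OPENAI_API_KEY", "HF_TOKEN")


def build_subprocess_env(
    base: Mapping[str, str], overrides: Mapping[str, str]
) -> dict[str, str]:
    """Merge overrides onto base, dropping scrubbed names the test did not set."""
    merged = {**base, **overrides}
    for name in SCRUBBED:
        if name not in overrides:
            merged.pop(name, None)
    return merged
```

In a test, `base` would typically be `os.environ` and the result would be passed as `env=` to `subprocess.run`.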
validate_local.py CHANGED
@@ -39,7 +39,7 @@ def check_inference() -> bool:
39
  env = os.environ.copy()
40
  env["API_BASE_URL"] = "https://api.openai.com/v1"
41
  env["MODEL_NAME"] = "gpt-4"
42
- env["HF_TOKEN"] = "dummy-token-for-local-validation"
43
  env["USE_RANDOM"] = "true"
44
 
45
  print("\nNOTE: Running inference.py in random-agent mode for local validation")
 
39
  env = os.environ.copy()
40
  env["API_BASE_URL"] = "https://api.openai.com/v1"
41
  env["MODEL_NAME"] = "gpt-4"
42
+ env["OPENAI_API_KEY"] = "dummy-token-for-local-validation"
43
  env["USE_RANDOM"] = "true"
44
 
45
  print("\nNOTE: Running inference.py in random-agent mode for local validation")