Commit 43f2683
Parent(s): 8c359c3

Finalize OpenEnv baseline: OpenAI client, PORT binding, and docs

- Dockerfile +1 -1
- README.md +21 -5
- changes.md +73 -309
- inference.py +36 -22
- pyproject.toml +1 -0
- requirements.txt +1 -0
- src/server/Dockerfile +1 -1
- src/server/app.py +3 -1
- tests/test_inference.py +19 -5
- validate_local.py +1 -1

Dockerfile CHANGED
@@ -5,4 +5,4 @@ WORKDIR /app
 COPY . /app
 RUN pip install uv && uv sync --frozen
 EXPOSE 8000
-CMD ["
+CMD ["sh", "-c", "uv run uvicorn src.server.app:app --host 0.0.0.0 --port ${PORT:-8000}"]

README.md CHANGED
@@ -35,9 +35,12 @@ This project implements a benchmark environment for training and evaluating LLM
 |----------|----------|-------------|
 | `API_BASE_URL` | Yes | OpenAI-compatible endpoint base URL |
 | `MODEL_NAME` | Yes | Model identifier string |
-| `
+| `OPENAI_API_KEY` | Yes (unless `USE_RANDOM=true`) | API key used by the OpenAI Python client |
 | `USE_RANDOM` | No | Set to `true` to use deterministic random agent (no LLM) |
 
+Notes:
+- `HF_TOKEN` is supported as a backwards-compatible alias for `OPENAI_API_KEY`.
+
 ## Tasks
 
 ### 1. `single_incident`
@@ -114,8 +117,9 @@ uv sync
 # Run the demo (non-interactive episode visualization)
 uv run python demo.py
 
-# Run inference
-
+# Run inference (random baseline, no API calls)
+USE_RANDOM=true API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4 OPENAI_API_KEY=x \
+  uv run python inference.py
 
 # Run API server
 uv run python -m src.server.app
@@ -144,7 +148,7 @@ python inference.py
 Run the random baseline agent against all 4 tasks:
 
 ```bash
-USE_RANDOM=true API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4
+USE_RANDOM=true API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4 OPENAI_API_KEY=x python inference.py
 ```
 
 Expected output (approximate):
@@ -243,7 +247,19 @@ curl -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d
 
 ## HF Space
 
-
+### Deploying to Hugging Face Spaces (Docker)
+
+This repository is compatible with **Docker Spaces** (the README frontmatter includes `sdk: docker` and the Space tags include `openenv`).
+
+1) Create a new Space → choose **Docker**.
+2) Push this repository to the Space.
+3) The server binds to the `PORT` environment variable (HF commonly sets `PORT=7860`).
+
+Once running, the Space should respond to:
+- `GET /health`
+- `POST /reset`
+- `POST /step`
+- `GET /state`
 
 ## License
 

changes.md CHANGED
@@ -1,353 +1,117 @@
-#
+# Remaining Changes Needed — 911 Dispatch Supervisor (as of 2026-04-06)
 
-
-Do not skip any item. After all fixes, run the final validation checklist.
+This file lists ONLY the work still required to fully match the hackathon requirements provided (OpenAI client + OPENAI_API_KEY baseline, HF Spaces readiness, and portable validation tooling). Items already implemented (OpenEnv YAML, tasks/graders, reward shaping, Docker boot, /reset {} support, etc.) are intentionally omitted.
 
 ---
 
-## SECTION 1 —
-
-### 1.1 Fix `openenv.yaml` — Replace entire file content
-
-The file uses hard tab characters which breaks YAML parsing. Replace the entire file with:
-```yaml
-name: citywide-dispatch-supervisor
-version: "0.1.0"
-description: >
-  City-wide 911 emergency dispatch supervisor RL environment.
-  An LLM agent learns to manage simultaneous incidents by dispatching
-  police, fire, and EMS units across a city grid under realistic constraints.
-entrypoint: src.openenv_environment:OpenEnvEnvironment
-tasks:
-  - id: single_incident
-    name: Single Incident Response
-    description: One incident with a small unit pool; learn basic dispatch, correct unit type, and response time.
-  - id: multi_incident
-    name: Simultaneous Multi-Incident
-    description: Multiple concurrent incidents requiring triage, prioritization, and correct unit matching.
-  - id: mass_casualty
-    name: Mass Casualty Event
-    description: Wave-based Priority-1 surge with resource conflict; maximize survival outcomes.
-  - id: shift_surge
-    name: Shift Surge
-    description: Incident waves combined with units going out of service; maintain coverage over time.
-```
-
-Verify with: `python -c "import yaml; yaml.safe_load(open('openenv.yaml')); print('YAML OK')"`
+## SECTION 1 — BASELINE INFERENCE MUST USE OPENAI CLIENT + OPENAI_API_KEY (REQUIRED)
 
-
-
-
-
-Add these two lines at the very bottom of `src/server/app.py`, after the `def main()` block:
-```python
-if __name__ == "__main__":
-    main()
-```
-
-Also update the `main()` function to:
-```python
-def main():
-    import uvicorn
-    uvicorn.run("src.server.app:app", host="0.0.0.0", port=8000, reload=False)
-```
-
----
-
-### 1.3 Fix `src/server/app.py` — `/reset` rejects empty body
-
-Change `ResetRequest` so `task_id` has a default:
-```python
-class ResetRequest(BaseModel):
-    task_id: str = "single_incident"
-    seed: int | None = None
-```
-
----
+### 1.1 Update inference auth variables to match requirement
+**Problem:** The requirement explicitly calls for `OPENAI_API_KEY`. Current code requires `HF_TOKEN` and does not recognize `OPENAI_API_KEY`.
+**Where:** [inference.py](inference.py), [README.md](README.md), [validate_local.py](validate_local.py), [tests/test_inference.py](tests/test_inference.py)
 
-
+**Action:**
+- Treat `OPENAI_API_KEY` as the primary credential env var.
+- Keep backward-compatible support for `HF_TOKEN` (optional), but do not require it.
+- Update README Environment Variables table + examples to show `OPENAI_API_KEY`.
 
-
-``
-
-```
+**Verify:**
+- `OPENAI_API_KEY=x USE_RANDOM=true API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4 uv run python inference.py`
+- Must run and print `[START]` / `[STEP]` / `[END]` lines.
 
 ---
 
-##
+### 1.2 Replace hand-rolled HTTPX chat call with the official OpenAI Python client
+**Problem:** Requirement says “Uses the OpenAI API client”. Current LLM agent calls `/chat/completions` via HTTPX directly.
+**Where:** [inference.py](inference.py)
 
-
+**Action:**
+- Implement the LLM agent using the `openai` Python package already present in dependencies.
+- Continue supporting `API_BASE_URL` + `MODEL_NAME`.
+- Ensure output format stays unchanged (tests depend on it).
 
-
-``
-
-```
-
-Also increase the timeout to 300 seconds if not already set.
-
----
-
-### 2.2 Fix `pyproject.toml` — Add `asyncio_mode`
-
-In `[tool.pytest.ini_options]`, add:
-```toml
-asyncio_mode = "auto"
-```
+**Verify:**
+- With `USE_RANDOM=false` and a real key, it should complete at least one episode.
+- With `USE_RANDOM=true`, it should not require any API key.
 
 ---
 
-###
-
-
-```python
-except Exception as e:
-    error_msg = "step_error"
-    print(
-        f"[STEP] step={step_count} action={action_str} "
-        f"reward=0.00 done=true error={error_msg}"
-    )
-    success = False
-    break
-```
-
----
-
-### 2.4 Fix `inference.py` — Score computation excludes reset reward
-
-Change score computation to exclude the initial reset observation score:
-```python
-step_rewards = rewards[1:]
-if step_rewards:
-    total_score = sum(step_rewards) / len(step_rewards)
-else:
-    total_score = 0.0
-total_score = max(0.0, min(1.0, total_score))
+### 1.3 Update env-var validation tests to reflect OPENAI_API_KEY support
+**Problem:** Tests currently set `HF_TOKEN` and never mention `OPENAI_API_KEY`.
+**Where:** [tests/test_inference.py](tests/test_inference.py)
 
-
-```
+**Action:**
+- Update tests to provide `OPENAI_API_KEY` instead of `HF_TOKEN` (or accept either).
+- Add/adjust a test that asserts: missing `OPENAI_API_KEY` fails only when `USE_RANDOM != true`.
 
-
-
-### 2.5 Fix `src/server/app.py` — Guard `get_dashboard_state` against None env
-
-The `/dashboard/state` endpoint should return a safe empty structure before `/reset` is called. It already does this in the current code — verify it matches:
-```python
-@app.get("/dashboard/state")
-async def get_dashboard_state() -> dict[str, Any]:
-    if _env is None:
-        return {
-            "units": {},
-            "incidents": {},
-            "episode_id": "not-initialized",
-            "step_count": 0,
-            "task_id": "none",
-            "city_time": 0.0,
-            "metadata": {},
-            "legal_actions": [],
-            "issues": [],
-            "observation": None,
-        }
-    # ... rest unchanged
-```
+**Verify:**
+- `uv run python -m pytest tests/test_inference.py -q` passes.
 
 ---
 
-## SECTION
-
-### 3.1 Improve `src/tasks/single_incident.py` grader
-
-Replace `SingleIncidentGrader.grade()` with:
-```python
-def grade(self, state: State, rewards: list[float]) -> float:
-    if not rewards:
-        return 0.0
-
-    incident = state.incidents.get("INC-001")
-    if incident is None:
-        return 0.0
+## SECTION 2 — HF SPACES (DOCKER) READINESS (REQUIRED)
 
-
+### 2.1 Make server bind to the Hugging Face provided port
+**Problem:** HF Docker Spaces typically set `PORT=7860`. Current server binds to port 8000 unconditionally.
+**Where:** [src/server/app.py](src/server/app.py), and Docker entrypoints in [Dockerfile](Dockerfile) + [src/server/Dockerfile](src/server/Dockerfile)
 
-
-
+**Action:**
+- In the server `main()`, read port from `PORT` env var (default 8000).
+- Ensure Docker CMD uses that same port behavior (either via the Python `main()` or uvicorn args).
 
-
-
-
-            u.assigned_incident_id == "INC-001"
-            or u.status.value in {"ON_SCENE", "DISPATCHED"}
-        )
-        for u in state.units.values()
-    )
-    if medic_dispatched:
-        score += 0.30
-
-    if incident.status.value == "RESOLVED" and state.step_count <= 10:
-        score += 0.20
-
-    return max(0.0, min(1.0, score))
-```
+**Verify:**
+- `PORT=7860 uv run python -m src.server.app` listens on 7860.
+- `docker run -e PORT=7860 -p 7860:7860 citywide-dispatch-supervisor` works and `/health` responds.
 
 ---
 
-###
-
-
-```python
-def grade(self, state: State, rewards: list[float]) -> float:
-    if not rewards:
-        return 0.0
-
-    total = len(state.incidents)
-    if total == 0:
-        return 0.0
-
-    resolved = sum(1 for i in state.incidents.values() if i.status.value == "RESOLVED")
-    failed = sum(1 for i in state.incidents.values() if i.status.value == "ESCALATED")
-    p1_total = sum(1 for i in state.incidents.values() if i.severity.value == "PRIORITY_1")
-    p1_resolved = sum(
-        1
-        for iid in state.metadata.get("resolved_incidents", [])
-        if state.incidents.get(iid)
-        and state.incidents[iid].severity.value == "PRIORITY_1"
-    )
+### 2.2 Replace README “HF Space Placeholder” with real deploy instructions (or link)
+**Problem:** Requirement says “Deploy to Hugging Face Spaces”. README currently has a placeholder only.
+**Where:** [README.md](README.md)
 
-
-
-
+**Action:**
+- Add either:
+  - A real link to the deployed Space, OR
+  - Minimal, accurate deployment steps for creating a Docker Space (with required tags already present).
+- Mention expected public URL and what endpoints should work (`/health`, `/reset`, `/step`, `/state`).
 
-
-
-```
+**Verify:**
+- README no longer contains “Placeholder”.
 
 ---
 
-##
+## SECTION 3 — PORTABLE VALIDATION TOOLING (STRONGLY RECOMMENDED)
 
-
-```
-
-    if not rewards:
-        return 0.0
+### 3.1 Ensure `openenv validate` is installable from dependencies
+**Problem:** Repo depends on `openenv-core`, but the CLI validator is provided by the `openenv` package. On a clean machine, `openenv validate` may be missing unless `openenv` is a dependency.
+**Where:** [pyproject.toml](pyproject.toml), [requirements.txt](requirements.txt)
 
-
-
-        iid
-        for iid in state.metadata.get("resolved_incidents", [])
-        if iid in p1_seen and iid not in state.metadata.get("failed_incidents", [])
-    ]
-    p1_failed = list(state.metadata.get("failed_incidents", []))
+**Action:**
+- Add `openenv>=0.2.0` (or the current compatible version) to dependencies so `openenv validate` is guaranteed available after install.
 
-
-
-
-    mean_reward = sum(rewards) / len(rewards)
-    score = 0.6 * survival_score + 0.3 * mean_reward - failure_penalty
-    return max(0.0, min(1.0, score))
-```
-
----
-
-### 3.4 Fix `src/rewards.py` — Triage key format mismatch
-
-In `_compute_triage()`, the metadata lookup uses inconsistent key formats. Ensure it tries both:
-```python
-required_types = (
-    required_map.get(incident.incident_type.value, [])
-    or required_map.get(str(incident.incident_type), [])
-)
-```
+**Verify:**
+- In a fresh venv after installing dependencies: `uv run openenv validate` succeeds.
 
 ---
 
-##
+## SECTION 4 — FINAL SUBMISSION CHECKS (RUN BEFORE SUBMITTING)
 
-
-```python
-dx = abs(unit.location_x - incident.location_x)
-dy = abs(unit.location_y - incident.location_y)
-manhattan_dist = dx + dy
-eta = manhattan_dist / max(speed, 1e-6)
-```
+Run these in order:
 
--
-
-## SECTION 4 — TEST FIXES
+1) `python -c "import yaml; yaml.safe_load(open('openenv.yaml')); print('YAML OK')"`
 
-
+2) `uv run python -m pytest tests/ -q`
 
-
-`
-valid_errors = {"null", "max_steps_exceeded", "illegal_transition", "step_error"}
-```
+3) Random baseline inference (no API key required):
+   - `USE_RANDOM=true API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4 uv run python inference.py`
 
-
+4) Local structure validation:
+   - `uv run openenv validate`
 
-
-
-
-``
-
-    c = TestClient(server_app.app)
-    response = c.post("/reset", json={})
-    assert response.status_code == 200
-    data = response.json()
-    assert data["result"] == "dispatch center online"
-
-def test_tasks_endpoint_returns_four_tasks(self) -> None:
-    c = TestClient(server_app.app)
-    response = c.get("/tasks")
-    assert response.status_code == 200
-    tasks = response.json()
-    assert len(tasks) == 4
-    task_ids = {t["task_id"] for t in tasks}
-    assert task_ids == {"single_incident", "multi_incident", "mass_casualty", "shift_surge"}
-```
-
-If missing, add them to the `TestTasksEndpoint` and `TestResetEndpoint` classes.
-
----
+5) Docker sanity:
+   - `docker build -t citywide-dispatch-supervisor .`
+   - `docker run -p 8000:8000 citywide-dispatch-supervisor`
+   - `curl -s http://localhost:8000/health`
+   - `curl -s -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d '{}'`
 
-
-
-Run these commands in order and confirm each passes:
-```bash
-# 1. YAML parse check
-python -c "import yaml; yaml.safe_load(open('openenv.yaml')); print('YAML OK')"
-
-# 2. Full test suite
-uv run python -m pytest tests/ -v --tb=short
-
-# 3. Inference script with random agent
-USE_RANDOM=true API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4 HF_TOKEN=x \
-  uv run python inference.py 2>&1 | grep -E '^\[(START|STEP|END)\]' | head -20
-
-# 4. Demo script
-uv run python demo.py
-
-# 5. OpenEnv validate
-uv run openenv validate
-
-# 6. Docker build
-docker build -t citywide-dispatch-supervisor .
-
-# 7. Docker run + health check + empty reset
-docker run -d -p 8000:8000 --name test-dispatch citywide-dispatch-supervisor
-sleep 5
-curl -s http://localhost:8000/health
-curl -s -X POST http://localhost:8000/reset \
-  -H "Content-Type: application/json" -d '{}'
-docker stop test-dispatch && docker rm test-dispatch
-
-# 8. Benchmark scores all in [0.0, 1.0]
-uv run python -c "
-from src.benchmark import run_all
-scores = run_all()
-for task_id, score in scores.items():
-    assert 0.0 <= score <= 1.0, f'{task_id}: score {score} out of range'
-    print(f'{task_id}: {score:.3f}')
-print('All scores in [0.0, 1.0] — PASS')
-"
-```
-
-All 8 checks must pass before the submission is ready.
+All must pass.

inference.py CHANGED
@@ -7,6 +7,7 @@ import sys
 from typing import Any
 
 import httpx
+from openai import AsyncOpenAI
 
 from src.models import Action, DispatchAction
 from src.openenv_environment import OpenEnvEnvironment
@@ -24,10 +25,15 @@ def _validate_env_vars() -> None:
         )
 
     use_random = os.environ.get("USE_RANDOM", "").lower() == "true"
-
-
-
-
+    if use_random:
+        return
+
+    # Prefer OPENAI_API_KEY for hackathon compliance; keep HF_TOKEN for backwards compatibility.
+    if os.environ.get("OPENAI_API_KEY"):
+        return
+    if os.environ.get("HF_TOKEN"):
+        return
+    raise EnvironmentError("Missing required environment variable: OPENAI_API_KEY")
 
 
 def _get_env(var: str) -> str:
@@ -65,30 +71,38 @@ class LLMAgent:
         self.base_url = base_url.rstrip("/")
         self.model = model
 
+        # Official OpenAI Python client for OpenAI-compatible endpoints.
+        self._client = AsyncOpenAI(api_key=self.api_key, base_url=self.base_url)
+
     async def chat(self, messages: list[dict]) -> str:
         """Send chat request to LLM endpoint with appropriate auth.
 
-
-
--
-
+        Uses the official OpenAI client for OpenAI-compatible endpoints.
+
+        Note: Some non-OpenAI providers (e.g., certain Gemini endpoints) may not
+        be compatible with the OpenAI client; those are handled via a minimal
+        HTTPX fallback.
         """
         is_gemini = "gemini" in self.base_url.lower()
-        headers = {"Content-Type": "application/json"}
-
         if is_gemini:
+            # Fallback for Gemini-style "?key=" auth.
+            headers = {"Content-Type": "application/json"}
             url = f"{self.base_url}/chat/completions?key={self.api_key}"
-
-
-
+            async with httpx.AsyncClient(timeout=60.0) as client:
+                resp = await client.post(
+                    url,
+                    json={"model": self.model, "messages": messages},
+                    headers=headers,
+                )
+                resp.raise_for_status()
+                data = resp.json()
+                return data["choices"][0]["message"]["content"]
 
-
-
-
-
-
-        data = resp.json()
-        return data["choices"][0]["message"]["content"]
+        resp = await self._client.chat.completions.create(
+            model=self.model,
+            messages=messages,
+        )
+        return resp.choices[0].message.content or ""
 
     async def select_action(
         self, legal_actions: list[Action], state_desc: str, prev_obs: Any = None
@@ -305,8 +319,8 @@ async def main() -> int:
     if use_random:
         agent: RandomAgent | LLMAgent = RandomAgent(seed=42)
     else:
-
-        agent = LLMAgent(api_key=
+        api_key = os.environ.get("OPENAI_API_KEY") or os.environ.get("HF_TOKEN", "")
+        agent = LLMAgent(api_key=api_key, base_url=api_base_url, model=model_name)
 
     task_ids = ["single_incident", "multi_incident", "mass_casualty", "shift_surge"]
 

pyproject.toml CHANGED
@@ -9,6 +9,7 @@ description = "911 Dispatch RL Environment. City-wide emergency dispatch benchma
 requires-python = ">=3.11"
 dependencies = [
     "pydantic>=2.7",
+    "openenv>=0.2.0",
     "openenv-core>=0.2.0",
     "fastapi>=0.110",
     "uvicorn[standard]>=0.29",

requirements.txt CHANGED
@@ -1,4 +1,5 @@
 pydantic>=2.7
+openenv>=0.2.0
 openenv-core>=0.2.0
 fastapi>=0.110
 uvicorn[standard]>=0.29

src/server/Dockerfile CHANGED
@@ -10,4 +10,4 @@ COPY data/ /app/data/
 
 EXPOSE 8000
 
-CMD ["
+CMD ["sh", "-c", "uvicorn src.server.app:app --host 0.0.0.0 --port ${PORT:-8000}"]

src/server/app.py CHANGED
@@ -173,8 +173,10 @@ async def get_dashboard_state() -> dict[str, Any]:
 
 def main():
     import uvicorn
+    import os
 
-
+    port = int(os.environ.get("PORT", "8000"))
+    uvicorn.run("src.server.app:app", host="0.0.0.0", port=port, reload=False)
 
 
 if __name__ == "__main__":

tests/test_inference.py CHANGED
@@ -29,7 +29,7 @@ class TestInferenceFormatCompliance:
         env = {
             "API_BASE_URL": "https://api.example.com",
             "MODEL_NAME": "test-model",
-            "
+            "OPENAI_API_KEY": "test-token",
             "USE_RANDOM": "true",
         }
         returncode, stdout, stderr = self._run_inference_capture(env)
@@ -46,7 +46,7 @@ class TestInferenceFormatCompliance:
         env = {
             "API_BASE_URL": "https://api.example.com",
             "MODEL_NAME": "test-model",
-            "
+            "OPENAI_API_KEY": "test-token",
             "USE_RANDOM": "true",
         }
         _, stdout, _ = self._run_inference_capture(env)
@@ -59,7 +59,7 @@ class TestInferenceFormatCompliance:
         env = {
             "API_BASE_URL": "https://api.example.com",
             "MODEL_NAME": "test-model",
-            "
+            "OPENAI_API_KEY": "test-token",
             "USE_RANDOM": "true",
         }
         _, stdout, _ = self._run_inference_capture(env)
@@ -84,6 +84,10 @@ class TestEnvVarValidation:
             merged_env.pop("API_BASE_URL", None)
         if "MODEL_NAME" not in env:
             merged_env.pop("MODEL_NAME", None)
+        if "OPENAI_API_KEY" not in env:
+            merged_env.pop("OPENAI_API_KEY", None)
+        if "HF_TOKEN" not in env:
+            merged_env.pop("HF_TOKEN", None)
         result = subprocess.run(
             cmd,
             capture_output=True,
@@ -94,13 +98,23 @@ class TestEnvVarValidation:
         return result.returncode, result.stdout, result.stderr
 
     def test_missing_api_base_url(self) -> None:
-        env = {"MODEL_NAME": "m", "
+        env = {"MODEL_NAME": "m", "OPENAI_API_KEY": "t", "USE_RANDOM": "true"}
         returncode, stdout, stderr = self._run_inference_capture(env)
         assert returncode != 0
         assert "API_BASE_URL" in (stdout + stderr)
 
     def test_missing_model_name(self) -> None:
-        env = {"API_BASE_URL": "x", "
+        env = {"API_BASE_URL": "x", "OPENAI_API_KEY": "t", "USE_RANDOM": "true"}
         returncode, stdout, stderr = self._run_inference_capture(env)
         assert returncode != 0
         assert "MODEL_NAME" in (stdout + stderr)
+
+    def test_missing_openai_api_key_when_not_random(self) -> None:
+        env = {
+            "API_BASE_URL": "https://api.example.com",
+            "MODEL_NAME": "m",
+            "USE_RANDOM": "false",
+        }
+        returncode, stdout, stderr = self._run_inference_capture(env)
+        assert returncode != 0
+        assert "OPENAI_API_KEY" in (stdout + stderr)

validate_local.py CHANGED
@@ -39,7 +39,7 @@ def check_inference() -> bool:
     env = os.environ.copy()
     env["API_BASE_URL"] = "https://api.openai.com/v1"
     env["MODEL_NAME"] = "gpt-4"
-    env["
+    env["OPENAI_API_KEY"] = "dummy-token-for-local-validation"
     env["USE_RANDOM"] = "true"
 
     print("\nNOTE: Running inference.py in random-agent mode for local validation")