Space: garvitsachdeva/911
garvitsachdeva committed on Apr 6

Commit

43f2683

1 Parent(s): 8c359c3

Finalize OpenEnv baseline: OpenAI client, PORT binding, and docs

Files changed (10)

Dockerfile +1 -1
README.md +21 -5
changes.md +73 -309
inference.py +36 -22
pyproject.toml +1 -0
requirements.txt +1 -0
src/server/Dockerfile +1 -1
src/server/app.py +3 -1
tests/test_inference.py +19 -5
validate_local.py +1 -1

Dockerfile CHANGED

@@ -5,4 +5,4 @@ WORKDIR /app
 COPY . /app
 RUN pip install uv && uv sync --frozen
 EXPOSE 8000
-CMD ["uv", "run", "uvicorn", "src.server.app:app", "--host", "0.0.0.0", "--port", "8000"]

 COPY . /app
 RUN pip install uv && uv sync --frozen
 EXPOSE 8000
+CMD ["sh", "-c", "uv run uvicorn src.server.app:app --host 0.0.0.0 --port ${PORT:-8000}"]

README.md CHANGED

@@ -35,9 +35,12 @@ This project implements a benchmark environment for training and evaluating LLM
 |----------|----------|-------------|
 | `API_BASE_URL` | Yes | OpenAI-compatible endpoint base URL |
 | `MODEL_NAME` | Yes | Model identifier string |
-| `HF_TOKEN` | Yes (unless `USE_RANDOM=true`) | API key / HF token |
 | `USE_RANDOM` | No | Set to `true` to use deterministic random agent (no LLM) |
 ## Tasks
 ### 1. `single_incident`
@@ -114,8 +117,9 @@ uv sync
 # Run the demo (non-interactive episode visualization)
 uv run python demo.py
-# Run inference with LLM agent
-uv run python inference.py
 # Run API server
 uv run python -m src.server.app
@@ -144,7 +148,7 @@ python inference.py
 Run the random baseline agent against all 4 tasks:
 ```bash
-USE_RANDOM=true API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4 HF_TOKEN=x python inference.py
 ```
 Expected output (approximate):
@@ -243,7 +247,19 @@ curl -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d
 ## HF Space
-**Placeholder**: (add link here)
 ## License

 |----------|----------|-------------|
 | `API_BASE_URL` | Yes | OpenAI-compatible endpoint base URL |
 | `MODEL_NAME` | Yes | Model identifier string |
+| `OPENAI_API_KEY` | Yes (unless `USE_RANDOM=true`) | API key used by the OpenAI Python client |
 | `USE_RANDOM` | No | Set to `true` to use deterministic random agent (no LLM) |
+Notes:
+- `HF_TOKEN` is supported as a backwards-compatible alias for `OPENAI_API_KEY`.
 ## Tasks
 ### 1. `single_incident`
 # Run the demo (non-interactive episode visualization)
 uv run python demo.py
+# Run inference (random baseline, no API calls)
+USE_RANDOM=true API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4 OPENAI_API_KEY=x \
+  uv run python inference.py
 # Run API server
 uv run python -m src.server.app
 Run the random baseline agent against all 4 tasks:
 ```bash
+USE_RANDOM=true API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4 OPENAI_API_KEY=x python inference.py
 ```
 Expected output (approximate):
 ## HF Space
+### Deploying to Hugging Face Spaces (Docker)
+This repository is compatible with **Docker Spaces** (the README frontmatter includes `sdk: docker` and the Space tags include `openenv`).
+1) Create a new Space → choose **Docker**.
+2) Push this repository to the Space.
+3) The server binds to the `PORT` environment variable (HF commonly sets `PORT=7860`).
+Once running, the Space should respond to:
+- `GET /health`
+- `POST /reset`
+- `POST /step`
+- `GET /state`
 ## License
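
The endpoint list added to the README above can be checked with a small smoke test. The following is a minimal sketch, not part of this commit: `build_requests` and `check_space` are hypothetical helper names, and the base URL is whatever address the Space or local container is reachable at. Only `/health` and `/reset` are actually called, because the `/step` payload schema is not shown in this diff.

```python
import urllib.request

# Endpoints listed in the README deploy section, paired with their HTTP methods.
ENDPOINTS = [
    ("GET", "/health"),
    ("POST", "/reset"),
    ("POST", "/step"),
    ("GET", "/state"),
]


def build_requests(base_url: str) -> list[tuple[str, str]]:
    """Return (method, full_url) pairs for every documented endpoint."""
    base = base_url.rstrip("/")
    return [(method, f"{base}{path}") for method, path in ENDPOINTS]


def check_space(base_url: str, timeout: float = 10.0) -> dict[str, int]:
    """Call /health and /reset on a running server and return their HTTP statuses.

    Requires a live server; /reset is sent an empty JSON body, which the
    changes.md notes describe as supported.
    """
    base = base_url.rstrip("/")
    statuses: dict[str, int] = {}

    with urllib.request.urlopen(f"{base}/health", timeout=timeout) as resp:
        statuses["/health"] = resp.status

    reset_req = urllib.request.Request(
        f"{base}/reset",
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(reset_req, timeout=timeout) as resp:
        statuses["/reset"] = resp.status

    return statuses
```

For a local container started with `-e PORT=7860 -p 7860:7860`, calling `check_space("http://localhost:7860")` should return a status for each of the two endpoints if the server is up.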

changes.md CHANGED

@@ -1,353 +1,117 @@
-# 911 Dispatch Supervisor — Fix & Polish for OpenEnv Submission
-You are working on the repo at the current directory. Apply ALL fixes below in order.
-Do not skip any item. After all fixes, run the final validation checklist.
 ---
-## SECTION 1 — CRITICAL BUGS (fix these first)
-### 1.1 Fix `openenv.yaml` — Replace entire file content
-The file uses hard tab characters which breaks YAML parsing. Replace the entire file with:
-```yaml
-name: citywide-dispatch-supervisor
-version: "0.1.0"
-description: >
-  City-wide 911 emergency dispatch supervisor RL environment.
-  An LLM agent learns to manage simultaneous incidents by dispatching
-  police, fire, and EMS units across a city grid under realistic constraints.
-entrypoint: src.openenv_environment:OpenEnvEnvironment
-tasks:
-  - id: single_incident
-    name: Single Incident Response
-    description: One incident with a small unit pool; learn basic dispatch, correct unit type, and response time.
-  - id: multi_incident
-    name: Simultaneous Multi-Incident
-    description: Multiple concurrent incidents requiring triage, prioritization, and correct unit matching.
-  - id: mass_casualty
-    name: Mass Casualty Event
-    description: Wave-based Priority-1 surge with resource conflict; maximize survival outcomes.
-  - id: shift_surge
-    name: Shift Surge
-    description: Incident waves combined with units going out of service; maintain coverage over time.
-```
-Verify with: `python -c "import yaml; yaml.safe_load(open('openenv.yaml')); print('YAML OK')"`
----
-### 1.2 Fix `src/server/app.py` — Server never starts
-Add these two lines at the very bottom of `src/server/app.py`, after the `def main()` block:
-```python
-if __name__ == "__main__":
-    main()
-```
-Also update the `main()` function to:
-```python
-def main():
-    import uvicorn
-    uvicorn.run("src.server.app:app", host="0.0.0.0", port=8000, reload=False)
-```
----
-### 1.3 Fix `src/server/app.py` — `/reset` rejects empty body
-Change `ResetRequest` so `task_id` has a default:
-```python
-class ResetRequest(BaseModel):
-    task_id: str = "single_incident"
-    seed: int | None = None
-```
----
-### 1.4 Fix `Dockerfile` — Use uvicorn directly in CMD
-Replace the CMD line in the root `Dockerfile` with:
-```dockerfile
-CMD ["uv", "run", "uvicorn", "src.server.app:app", "--host", "0.0.0.0", "--port", "8000"]
-```
 ---
-## SECTION 2 — HIGH PRIORITY BUGS
-### 2.1 Fix `validate_local.py` — `check_inference()` never uses random mode
-In `validate_local.py`, inside `check_inference()`, add `env["USE_RANDOM"] = "true"` before the `subprocess.run` call:
-```python
-env["USE_RANDOM"] = "true"
-```
-Also increase the timeout to 300 seconds if not already set.
----
-### 2.2 Fix `pyproject.toml` — Add `asyncio_mode`
-In `[tool.pytest.ini_options]`, add:
-```toml
-asyncio_mode = "auto"
-```
 ---
-### 2.3 Fix `inference.py` — Normalize exception error token
-In `inference.py`, inside the inner `except Exception as e` block within the step loop, change the error string:
-```python
-except Exception as e:
-    error_msg = "step_error"
-    print(
-        f"[STEP] step={step_count} action={action_str} "
-        f"reward=0.00 done=true error={error_msg}"
-    )
-    success = False
-    break
-```
----
-### 2.4 Fix `inference.py` — Score computation excludes reset reward
-Change score computation to exclude the initial reset observation score:
-```python
-step_rewards = rewards[1:]
-if step_rewards:
-    total_score = sum(step_rewards) / len(step_rewards)
-else:
-    total_score = 0.0
-total_score = max(0.0, min(1.0, total_score))
-rewards_str = ",".join(f"{r:.2f}" for r in step_rewards) if step_rewards else "0.00"
-```
----
-### 2.5 Fix `src/server/app.py` — Guard `get_dashboard_state` against None env
-The `/dashboard/state` endpoint should return a safe empty structure before `/reset` is called. It already does this in the current code — verify it matches:
-```python
-@app.get("/dashboard/state")
-async def get_dashboard_state() -> dict[str, Any]:
-    if _env is None:
-        return {
-            "units": {},
-            "incidents": {},
-            "episode_id": "not-initialized",
-            "step_count": 0,
-            "task_id": "none",
-            "city_time": 0.0,
-            "metadata": {},
-            "legal_actions": [],
-            "issues": [],
-            "observation": None,
-        }
-    # ... rest unchanged
-```
 ---
-## SECTION 3 — ENVIRONMENT DESIGN IMPROVEMENTS
-### 3.1 Improve `src/tasks/single_incident.py` grader
-Replace `SingleIncidentGrader.grade()` with:
-```python
-def grade(self, state: State, rewards: list[float]) -> float:
-    if not rewards:
-        return 0.0
-    incident = state.incidents.get("INC-001")
-    if incident is None:
-        return 0.0
-    score = 0.0
-    if incident.status.value == "RESOLVED":
-        score += 0.50
-    medic_dispatched = any(
-        u.unit_type.value == "MEDIC"
-        and (
-            u.assigned_incident_id == "INC-001"
-            or u.status.value in {"ON_SCENE", "DISPATCHED"}
-        )
-        for u in state.units.values()
-    )
-    if medic_dispatched:
-        score += 0.30
-    if incident.status.value == "RESOLVED" and state.step_count <= 10:
-        score += 0.20
-    return max(0.0, min(1.0, score))
-```
 ---
-### 3.2 Improve `src/tasks/multi_incident.py` grader
-Replace `MultiIncidentGrader.grade()` with:
-```python
-def grade(self, state: State, rewards: list[float]) -> float:
-    if not rewards:
-        return 0.0
-    total = len(state.incidents)
-    if total == 0:
-        return 0.0
-    resolved = sum(1 for i in state.incidents.values() if i.status.value == "RESOLVED")
-    failed = sum(1 for i in state.incidents.values() if i.status.value == "ESCALATED")
-    p1_total = sum(1 for i in state.incidents.values() if i.severity.value == "PRIORITY_1")
-    p1_resolved = sum(
-        1
-        for iid in state.metadata.get("resolved_incidents", [])
-        if state.incidents.get(iid)
-        and state.incidents[iid].severity.value == "PRIORITY_1"
-    )
-    resolution_score = resolved / total
-    p1_score = (p1_resolved / p1_total) if p1_total > 0 else 1.0
-    failure_penalty = failed / total
-    score = 0.5 * p1_score + 0.3 * resolution_score - 0.2 * failure_penalty
-    return max(0.0, min(1.0, score))
-```
 ---
-### 3.3 Improve `src/tasks/mass_casualty.py` grader
-Replace `MassCasualtyGrader.grade()` with:
-```python
-def grade(self, state: State, rewards: list[float]) -> float:
-    if not rewards:
-        return 0.0
-    p1_seen = list(state.metadata.get("p1_seen", []))
-    p1_resolved = [
-        iid
-        for iid in state.metadata.get("resolved_incidents", [])
-        if iid in p1_seen and iid not in state.metadata.get("failed_incidents", [])
-    ]
-    p1_failed = list(state.metadata.get("failed_incidents", []))
-    survival_score = len(p1_resolved) / max(len(p1_seen), 1)
-    failure_penalty = len(p1_failed) / max(len(p1_seen), 1) * 0.5
-    mean_reward = sum(rewards) / len(rewards)
-    score = 0.6 * survival_score + 0.3 * mean_reward - failure_penalty
-    return max(0.0, min(1.0, score))
-```
----
-### 3.4 Fix `src/rewards.py` — Triage key format mismatch
-In `_compute_triage()`, the metadata lookup uses inconsistent key formats. Ensure it tries both:
-```python
-required_types = (
-    required_map.get(incident.incident_type.value, [])
-    or required_map.get(str(incident.incident_type), [])
-)
-```
 ---
-### 3.5 Fix `src/state_machine.py` — Use Manhattan distance for ETA
-In `_apply_dispatch()`, replace Euclidean distance with Manhattan:
-```python
-dx = abs(unit.location_x - incident.location_x)
-dy = abs(unit.location_y - incident.location_y)
-manhattan_dist = dx + dy
-eta = manhattan_dist / max(speed, 1e-6)
-```
----
-## SECTION 4 — TEST FIXES
-### 4.1 Update `tests/test_inference.py` — Add `step_error` to valid error tokens
-Find `valid_errors` in `test_step_line_error_format` and add `"step_error"`:
-```python
-valid_errors = {"null", "max_steps_exceeded", "illegal_transition", "step_error"}
-```
----
-### 4.2 Verify `tests/test_openenv_integration.py` has these two tests
-Confirm the following tests exist (they appear to be already present based on the file):
-```python
-def test_reset_with_empty_body_returns_200(self) -> None:
-    c = TestClient(server_app.app)
-    response = c.post("/reset", json={})
-    assert response.status_code == 200
-    data = response.json()
-    assert data["result"] == "dispatch center online"
-def test_tasks_endpoint_returns_four_tasks(self) -> None:
-    c = TestClient(server_app.app)
-    response = c.get("/tasks")
-    assert response.status_code == 200
-    tasks = response.json()
-    assert len(tasks) == 4
-    task_ids = {t["task_id"] for t in tasks}
-    assert task_ids == {"single_incident", "multi_incident", "mass_casualty", "shift_surge"}
-```
-If missing, add them to the `TestTasksEndpoint` and `TestResetEndpoint` classes.
----
-## SECTION 5 — FINAL VALIDATION CHECKLIST
-Run these commands in order and confirm each passes:
-```bash
-# 1. YAML parse check
-python -c "import yaml; yaml.safe_load(open('openenv.yaml')); print('YAML OK')"
-# 2. Full test suite
-uv run python -m pytest tests/ -v --tb=short
-# 3. Inference script with random agent
-USE_RANDOM=true API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4 HF_TOKEN=x \
-  uv run python inference.py 2>&1 | grep -E '^\[(START|STEP|END)\]' | head -20
-# 4. Demo script
-uv run python demo.py
-# 5. OpenEnv validate
-uv run openenv validate
-# 6. Docker build
-docker build -t citywide-dispatch-supervisor .
-# 7. Docker run + health check + empty reset
-docker run -d -p 8000:8000 --name test-dispatch citywide-dispatch-supervisor
-sleep 5
-curl -s http://localhost:8000/health
-curl -s -X POST http://localhost:8000/reset \
-  -H "Content-Type: application/json" -d '{}'
-docker stop test-dispatch && docker rm test-dispatch
-# 8. Benchmark scores all in [0.0, 1.0]
-uv run python -c "
-from src.benchmark import run_all
-scores = run_all()
-for task_id, score in scores.items():
-    assert 0.0 <= score <= 1.0, f'{task_id}: score {score} out of range'
-    print(f'{task_id}: {score:.3f}')
-print('All scores in [0.0, 1.0] — PASS')
-"
-```
-All 8 checks must pass before the submission is ready.

+# Remaining Changes Needed — 911 Dispatch Supervisor (as of 2026-04-06)
+This file lists ONLY the work still required to fully match the hackathon requirements provided (OpenAI client + OPENAI_API_KEY baseline, HF Spaces readiness, and portable validation tooling). Items already implemented (OpenEnv YAML, tasks/graders, reward shaping, Docker boot, /reset {} support, etc.) are intentionally omitted.
 ---
+## SECTION 1 — BASELINE INFERENCE MUST USE OPENAI CLIENT + OPENAI_API_KEY (REQUIRED)
+### 1.1 Update inference auth variables to match requirement
+**Problem:** The requirement explicitly calls for `OPENAI_API_KEY`. Current code requires `HF_TOKEN` and does not recognize `OPENAI_API_KEY`.
+**Where:** [inference.py](inference.py), [README.md](README.md), [validate_local.py](validate_local.py), [tests/test_inference.py](tests/test_inference.py)
+**Action:**
+- Treat `OPENAI_API_KEY` as the primary credential env var.
+- Keep backward-compatible support for `HF_TOKEN` (optional), but do not require it.
+- Update README Environment Variables table + examples to show `OPENAI_API_KEY`.
+**Verify:**
+- `OPENAI_API_KEY=x USE_RANDOM=true API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4 uv run python inference.py`
+  - Must run and print `[START]` / `[STEP]` / `[END]` lines.
 ---
+### 1.2 Replace hand-rolled HTTPX chat call with the official OpenAI Python client
+**Problem:** Requirement says “Uses the OpenAI API client”. Current LLM agent calls `/chat/completions` via HTTPX directly.
+**Where:** [inference.py](inference.py)
+**Action:**
+- Implement the LLM agent using the `openai` Python package already present in dependencies.
+- Continue supporting `API_BASE_URL` + `MODEL_NAME`.
+- Ensure output format stays unchanged (tests depend on it).
+**Verify:**
+- With `USE_RANDOM=false` and a real key, it should complete at least one episode.
+- With `USE_RANDOM=true`, it should not require any API key.
 ---
+### 1.3 Update env-var validation tests to reflect OPENAI_API_KEY support
+**Problem:** Tests currently set `HF_TOKEN` and never mention `OPENAI_API_KEY`.
+**Where:** [tests/test_inference.py](tests/test_inference.py)
+**Action:**
+- Update tests to provide `OPENAI_API_KEY` instead of `HF_TOKEN` (or accept either).
+- Add/adjust a test that asserts: missing `OPENAI_API_KEY` fails only when `USE_RANDOM != true`.
+**Verify:**
+- `uv run python -m pytest tests/test_inference.py -q` passes.
 ---
+## SECTION 2 — HF SPACES (DOCKER) READINESS (REQUIRED)
+### 2.1 Make server bind to the Hugging Face provided port
+**Problem:** HF Docker Spaces typically set `PORT=7860`. Current server binds to port 8000 unconditionally.
+**Where:** [src/server/app.py](src/server/app.py), and Docker entrypoints in [Dockerfile](Dockerfile) + [src/server/Dockerfile](src/server/Dockerfile)
+**Action:**
+- In the server `main()`, read port from `PORT` env var (default 8000).
+- Ensure Docker CMD uses that same port behavior (either via the Python `main()` or uvicorn args).
+**Verify:**
+- `PORT=7860 uv run python -m src.server.app` listens on 7860.
+- `docker run -e PORT=7860 -p 7860:7860 citywide-dispatch-supervisor` works and `/health` responds.
 ---
+### 2.2 Replace README “HF Space Placeholder” with real deploy instructions (or link)
+**Problem:** Requirement says “Deploy to Hugging Face Spaces”. README currently has a placeholder only.
+**Where:** [README.md](README.md)
+**Action:**
+- Add either:
+  - A real link to the deployed Space, OR
+  - Minimal, accurate deployment steps for creating a Docker Space (with required tags already present).
+- Mention expected public URL and what endpoints should work (`/health`, `/reset`, `/step`, `/state`).
+**Verify:**
+- README no longer contains “Placeholder”.
 ---
+## SECTION 3 — PORTABLE VALIDATION TOOLING (STRONGLY RECOMMENDED)
+### 3.1 Ensure `openenv validate` is installable from dependencies
+**Problem:** Repo depends on `openenv-core`, but the CLI validator is provided by the `openenv` package. On a clean machine, `openenv validate` may be missing unless `openenv` is a dependency.
+**Where:** [pyproject.toml](pyproject.toml), [requirements.txt](requirements.txt)
+**Action:**
+- Add `openenv>=0.2.0` (or the current compatible version) to dependencies so `openenv validate` is guaranteed available after install.
+**Verify:**
+- In a fresh venv after installing dependencies: `uv run openenv validate` succeeds.
 ---
+## SECTION 4 — FINAL SUBMISSION CHECKS (RUN BEFORE SUBMITTING)
+Run these in order:
+1) `python -c "import yaml; yaml.safe_load(open('openenv.yaml')); print('YAML OK')"`
+2) `uv run python -m pytest tests/ -q`
+3) Random baseline inference (no API key required):
+- `USE_RANDOM=true API_BASE_URL=https://api.openai.com/v1 MODEL_NAME=gpt-4 uv run python inference.py`
+4) Local structure validation:
+- `uv run openenv validate`
+5) Docker sanity:
+- `docker build -t citywide-dispatch-supervisor .`
+- `docker run -p 8000:8000 citywide-dispatch-supervisor`
+- `curl -s http://localhost:8000/health`
+- `curl -s -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d '{}'`
+All must pass.

inference.py CHANGED

@@ -7,6 +7,7 @@ import sys
 from typing import Any
 import httpx
 from src.models import Action, DispatchAction
 from src.openenv_environment import OpenEnvEnvironment
@@ -24,10 +25,15 @@ def _validate_env_vars() -> None:
         )
     use_random = os.environ.get("USE_RANDOM", "").lower() == "true"
-    api_base_url = os.environ.get("API_BASE_URL", "")
-    is_gemini = "gemini" in api_base_url.lower()
-    if not use_random and not is_gemini and not os.environ.get("HF_TOKEN"):
-        raise EnvironmentError("Missing required environment variable: HF_TOKEN")
 def _get_env(var: str) -> str:
@@ -65,30 +71,38 @@ class LLMAgent:
         self.base_url = base_url.rstrip("/")
         self.model = model
     async def chat(self, messages: list[dict]) -> str:
         """Send chat request to LLM endpoint with appropriate auth.
-        Auth method depends on endpoint:
-        - Gemini (contains 'gemini'): use ?key= query param
-        - Groq (contains 'groq'): use Authorization: Bearer header
-        - Other OpenAI-compatible: use Authorization: Bearer header
         """
         is_gemini = "gemini" in self.base_url.lower()
-        headers = {"Content-Type": "application/json"}
         if is_gemini:
             url = f"{self.base_url}/chat/completions?key={self.api_key}"
-        else:
-            url = f"{self.base_url}/chat/completions"
-            headers["Authorization"] = f"Bearer {self.api_key}"
-        async with httpx.AsyncClient(timeout=60.0) as client:
-            resp = await client.post(
-                url, json={"model": self.model, "messages": messages}, headers=headers
-            )
-            resp.raise_for_status()
-            data = resp.json()
-            return data["choices"][0]["message"]["content"]
     async def select_action(
         self, legal_actions: list[Action], state_desc: str, prev_obs: Any = None
@@ -305,8 +319,8 @@ async def main() -> int:
         if use_random:
             agent: RandomAgent | LLMAgent = RandomAgent(seed=42)
         else:
-            hf_token = os.environ.get("HF_TOKEN", "")
-            agent = LLMAgent(api_key=hf_token, base_url=api_base_url, model=model_name)
         task_ids = ["single_incident", "multi_incident", "mass_casualty", "shift_surge"]

 from typing import Any
 import httpx
+from openai import AsyncOpenAI
 from src.models import Action, DispatchAction
 from src.openenv_environment import OpenEnvEnvironment
         )
     use_random = os.environ.get("USE_RANDOM", "").lower() == "true"
+    if use_random:
+        return
+    # Prefer OPENAI_API_KEY for hackathon compliance; keep HF_TOKEN for backwards compatibility.
+    if os.environ.get("OPENAI_API_KEY"):
+        return
+    if os.environ.get("HF_TOKEN"):
+        return
+    raise EnvironmentError("Missing required environment variable: OPENAI_API_KEY")
 def _get_env(var: str) -> str:
         self.base_url = base_url.rstrip("/")
         self.model = model
+        # Official OpenAI Python client for OpenAI-compatible endpoints.
+        self._client = AsyncOpenAI(api_key=self.api_key, base_url=self.base_url)
     async def chat(self, messages: list[dict]) -> str:
         """Send chat request to LLM endpoint with appropriate auth.
+        Uses the official OpenAI client for OpenAI-compatible endpoints.
+        Note: Some non-OpenAI providers (e.g., certain Gemini endpoints) may not
+        be compatible with the OpenAI client; those are handled via a minimal
+        HTTPX fallback.
         """
         is_gemini = "gemini" in self.base_url.lower()
         if is_gemini:
+            # Fallback for Gemini-style "?key=" auth.
+            headers = {"Content-Type": "application/json"}
             url = f"{self.base_url}/chat/completions?key={self.api_key}"
+            async with httpx.AsyncClient(timeout=60.0) as client:
+                resp = await client.post(
+                    url,
+                    json={"model": self.model, "messages": messages},
+                    headers=headers,
+                )
+                resp.raise_for_status()
+                data = resp.json()
+                return data["choices"][0]["message"]["content"]
+        resp = await self._client.chat.completions.create(
+            model=self.model,
+            messages=messages,
+        )
+        return resp.choices[0].message.content or ""
     async def select_action(
         self, legal_actions: list[Action], state_desc: str, prev_obs: Any = None
         if use_random:
             agent: RandomAgent | LLMAgent = RandomAgent(seed=42)
         else:
+            api_key = os.environ.get("OPENAI_API_KEY") or os.environ.get("HF_TOKEN", "")
+            agent = LLMAgent(api_key=api_key, base_url=api_base_url, model=model_name)
         task_ids = ["single_incident", "multi_incident", "mass_casualty", "shift_surge"]
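
The credential precedence introduced in this file (random mode needs no key, `OPENAI_API_KEY` wins, `HF_TOKEN` is the fallback alias) is split between `_validate_env_vars()` and `main()`. A minimal sketch of the same rule as one function follows; `resolve_api_key` is a hypothetical name that does not exist in the repository, and it takes a plain mapping rather than reading `os.environ` so it can be exercised without touching the process environment.

```python
from collections.abc import Mapping


def resolve_api_key(env: Mapping[str, str]) -> str:
    """Return the API key to use, mirroring the precedence in this commit.

    - USE_RANDOM=true: no key is needed, so an empty string is returned.
    - Otherwise OPENAI_API_KEY is preferred, with HF_TOKEN as a fallback alias.
    - If neither is set (or both are empty), raise the same error the commit uses.
    """
    if env.get("USE_RANDOM", "").lower() == "true":
        return ""

    key = env.get("OPENAI_API_KEY") or env.get("HF_TOKEN")
    if not key:
        raise EnvironmentError(
            "Missing required environment variable: OPENAI_API_KEY"
        )
    return key
```

Calling it as `resolve_api_key(os.environ)` would give the value passed to `LLMAgent`. Keeping the rule in one place is a design suggestion only; the commit's two-site version behaves the same for the cases shown in the tests.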

pyproject.toml CHANGED

@@ -9,6 +9,7 @@ description = "911 Dispatch RL Environment. City-wide emergency dispatch benchma
 requires-python = ">=3.11"
 dependencies = [
     "pydantic>=2.7",
     "openenv-core>=0.2.0",
     "fastapi>=0.110",
     "uvicorn[standard]>=0.29",

 requires-python = ">=3.11"
 dependencies = [
     "pydantic>=2.7",
+    "openenv>=0.2.0",
     "openenv-core>=0.2.0",
     "fastapi>=0.110",
     "uvicorn[standard]>=0.29",

requirements.txt CHANGED

@@ -1,4 +1,5 @@
 pydantic>=2.7
 openenv-core>=0.2.0
 fastapi>=0.110
 uvicorn[standard]>=0.29

 pydantic>=2.7
+openenv>=0.2.0
 openenv-core>=0.2.0
 fastapi>=0.110
 uvicorn[standard]>=0.29

src/server/Dockerfile CHANGED

@@ -10,4 +10,4 @@ COPY data/ /app/data/
 EXPOSE 8000
-CMD ["uvicorn", "src.server.app:app", "--host", "0.0.0.0", "--port", "8000"]


 EXPOSE 8000
+CMD ["sh", "-c", "uvicorn src.server.app:app --host 0.0.0.0 --port ${PORT:-8000}"]

src/server/app.py CHANGED

@@ -173,8 +173,10 @@ async def get_dashboard_state() -> dict[str, Any]:
 def main():
     import uvicorn
-    uvicorn.run("src.server.app:app", host="0.0.0.0", port=8000, reload=False)
 if __name__ == "__main__":

 def main():
     import uvicorn
+    import os
+    port = int(os.environ.get("PORT", "8000"))
+    uvicorn.run("src.server.app:app", host="0.0.0.0", port=port, reload=False)
 if __name__ == "__main__":
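
One difference between the two port paths in this commit: the Dockerfile's `${PORT:-8000}` falls back to 8000 when `PORT` is unset or empty, while `int(os.environ.get("PORT", "8000"))` falls back only when `PORT` is unset, and raises `ValueError` on an empty string. A minimal sketch of a helper that matches the shell behaviour follows; `resolve_port` is a hypothetical name, and the range check is an added assumption rather than something the commit does.

```python
from collections.abc import Mapping


def resolve_port(env: Mapping[str, str], default: int = 8000) -> int:
    """Resolve the listen port like the shell's ${PORT:-8000}.

    Unset and empty PORT both yield the default. The 1..65535 range check
    is an extra guard that the original main() does not perform.
    """
    raw = env.get("PORT") or ""
    if not raw:
        return default

    port = int(raw)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port
```

In `main()` this could replace the inline expression as `port = resolve_port(os.environ)`, so the Python entrypoint and the Docker `CMD` agree on the empty-string case.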

tests/test_inference.py CHANGED

@@ -29,7 +29,7 @@ class TestInferenceFormatCompliance:
         env = {
             "API_BASE_URL": "https://api.example.com",
             "MODEL_NAME": "test-model",
-            "HF_TOKEN": "test-token",
             "USE_RANDOM": "true",
         }
         returncode, stdout, stderr = self._run_inference_capture(env)
@@ -46,7 +46,7 @@ class TestInferenceFormatCompliance:
         env = {
             "API_BASE_URL": "https://api.example.com",
             "MODEL_NAME": "test-model",
-            "HF_TOKEN": "test-token",
             "USE_RANDOM": "true",
         }
         _, stdout, _ = self._run_inference_capture(env)
@@ -59,7 +59,7 @@ class TestInferenceFormatCompliance:
         env = {
             "API_BASE_URL": "https://api.example.com",
             "MODEL_NAME": "test-model",
-            "HF_TOKEN": "test-token",
             "USE_RANDOM": "true",
         }
         _, stdout, _ = self._run_inference_capture(env)
@@ -84,6 +84,10 @@ class TestEnvVarValidation:
             merged_env.pop("API_BASE_URL", None)
         if "MODEL_NAME" not in env:
             merged_env.pop("MODEL_NAME", None)
         result = subprocess.run(
             cmd,
             capture_output=True,
@@ -94,13 +98,23 @@ class TestEnvVarValidation:
         return result.returncode, result.stdout, result.stderr
     def test_missing_api_base_url(self) -> None:
-        env = {"MODEL_NAME": "m", "HF_TOKEN": "t", "USE_RANDOM": "true"}
         returncode, stdout, stderr = self._run_inference_capture(env)
         assert returncode != 0
         assert "API_BASE_URL" in (stdout + stderr)
     def test_missing_model_name(self) -> None:
-        env = {"API_BASE_URL": "x", "HF_TOKEN": "t", "USE_RANDOM": "true"}
         returncode, stdout, stderr = self._run_inference_capture(env)
         assert returncode != 0
         assert "MODEL_NAME" in (stdout + stderr)

         env = {
             "API_BASE_URL": "https://api.example.com",
             "MODEL_NAME": "test-model",
+            "OPENAI_API_KEY": "test-token",
             "USE_RANDOM": "true",
         }
         returncode, stdout, stderr = self._run_inference_capture(env)
         env = {
             "API_BASE_URL": "https://api.example.com",
             "MODEL_NAME": "test-model",
+            "OPENAI_API_KEY": "test-token",
             "USE_RANDOM": "true",
         }
         _, stdout, _ = self._run_inference_capture(env)
         env = {
             "API_BASE_URL": "https://api.example.com",
             "MODEL_NAME": "test-model",
+            "OPENAI_API_KEY": "test-token",
             "USE_RANDOM": "true",
         }
         _, stdout, _ = self._run_inference_capture(env)
             merged_env.pop("API_BASE_URL", None)
         if "MODEL_NAME" not in env:
             merged_env.pop("MODEL_NAME", None)
+        if "OPENAI_API_KEY" not in env:
+            merged_env.pop("OPENAI_API_KEY", None)
+        if "HF_TOKEN" not in env:
+            merged_env.pop("HF_TOKEN", None)
         result = subprocess.run(
             cmd,
             capture_output=True,
         return result.returncode, result.stdout, result.stderr
     def test_missing_api_base_url(self) -> None:
+        env = {"MODEL_NAME": "m", "OPENAI_API_KEY": "t", "USE_RANDOM": "true"}
         returncode, stdout, stderr = self._run_inference_capture(env)
         assert returncode != 0
         assert "API_BASE_URL" in (stdout + stderr)
     def test_missing_model_name(self) -> None:
+        env = {"API_BASE_URL": "x", "OPENAI_API_KEY": "t", "USE_RANDOM": "true"}
         returncode, stdout, stderr = self._run_inference_capture(env)
         assert returncode != 0
         assert "MODEL_NAME" in (stdout + stderr)
+    def test_missing_openai_api_key_when_not_random(self) -> None:
+        env = {
+            "API_BASE_URL": "https://api.example.com",
+            "MODEL_NAME": "m",
+            "USE_RANDOM": "false",
+        }
+        returncode, stdout, stderr = self._run_inference_capture(env)
+        assert returncode != 0
+        assert "OPENAI_API_KEY" in (stdout + stderr)

validate_local.py CHANGED

@@ -39,7 +39,7 @@ def check_inference() -> bool:
     env = os.environ.copy()
     env["API_BASE_URL"] = "https://api.openai.com/v1"
     env["MODEL_NAME"] = "gpt-4"
-    env["HF_TOKEN"] = "dummy-token-for-local-validation"
     env["USE_RANDOM"] = "true"
     print("\nNOTE: Running inference.py in random-agent mode for local validation")

     env = os.environ.copy()
     env["API_BASE_URL"] = "https://api.openai.com/v1"
     env["MODEL_NAME"] = "gpt-4"
+    env["OPENAI_API_KEY"] = "dummy-token-for-local-validation"
     env["USE_RANDOM"] = "true"
     print("\nNOTE: Running inference.py in random-agent mode for local validation")