nik-55 committed on
Commit 4afc4db · verified · 1 Parent(s): 39cf111

Upload folder using huggingface_hub
CLAUDE.md ADDED
@@ -0,0 +1,105 @@
1
+ # CLAUDE.md
2
+
3
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4
+
5
+ ## Project Context
6
+
7
+ Hackathon submission for the Scaler School of Technology — Meta PyTorch Hackathon (OpenEnv track).
8
+ Deadline: 8 April 2026.
9
+
10
+ Environment: **MedChain Env** — hospital supply chain management simulation.
11
+ Agents operate a legacy ERP system across 4 tasks of increasing difficulty.
12
+
13
+ ## Key Commands
14
+
15
+ ```bash
16
+ # Build Docker image
17
+ docker build -t <LOCAL_IMAGE_NAME> -f server/Dockerfile .
18
+
19
+ # Set env vars then run inference
20
+ export API_BASE_URL=https://router.huggingface.co/v1
21
+ export MODEL_NAME=openai/gpt-oss-120b:groq
22
+ export HF_TOKEN=<your_token>
23
+ export LOCAL_IMAGE_NAME=nik-55_medchain-openenv
24
+
25
+ uv run python inference.py
26
+
27
+ # Run non-LLM integration test (writes test_inference.log)
28
+ uv run python test.py
29
+
30
+ # Run a single task container manually
31
+ docker run -e MEDCHAIN_TASK=hospital_network_crisis -p 8000:8000 $LOCAL_IMAGE_NAME
32
+ ```
33
+
34
+ ### Environment variables
35
+
36
+ | Variable | Description |
37
+ |----------|-------------|
38
+ | `API_BASE_URL` | LLM API endpoint (OpenAI-compatible) |
39
+ | `MODEL_NAME` / `MODEL` | Model identifier |
40
+ | `HF_TOKEN` / `API_KEY` | API authentication key |
41
+ | `LOCAL_IMAGE_NAME` | Docker image tag used by inference.py |
42
+ | `MEDCHAIN_TASK` | Select a single task in one container (default: `single_ward_stable`) |
43
+ | `SLEEP_BETWEEN_STEPS` | Seconds between LLM calls, default 2 |
44
+ | `LOG_LEVEL` | `INFO` (default) or `DEBUG` (writes timestamped log to `logs/`) |
45
+
46
+ ## Architecture
47
+
48
+ ### Package layout
49
+
50
+ The repo root is both the `medchain_env` package root and a Python namespace. `pyproject.toml` maps:
51
+ - `medchain_env` → repo root (`client.py`, `models.py`)
52
+ - `medchain_env.server` → `server/` subdirectory
53
+
54
+ `inference.py` imports from `medchain_env` directly (not `server/`).
55
+
56
+ ### Request flow (server-side)
57
+
58
+ ```
59
+ FastAPI (app.py)
60
+ └─ MedchainEnvironment (medchain_env_environment.py) ← reward logic lives here
61
+ ├─ MedchainSimulation (simulation.py) ← FEFO, demand, events, budget
62
+ └─ _MedchainMCPDelegate (MCPEnvironment) ← FastMCP schema + tool dispatch
63
+ └─ FastMCP server with 9 @mcp.tool functions
64
+ ```
65
+
66
+ `MedchainEnvironment` is an `Environment` (not an `MCPEnvironment`). It holds a `_MedchainMCPDelegate` for tool dispatch via composition, then wraps the result in a `MedchainToolObservation` with reward already computed. This pattern exists so that reward calculation happens in the outer class, not inside FastMCP.
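A minimal sketch of that composition pattern. It uses hypothetical stand-in classes (`ToolDelegate`, `OuterEnvironment`, `ToolObservation`) rather than the real OpenEnv or FastMCP types: the delegate only dispatches tools, and the outer class attaches the reward before returning the observation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToolObservation:
    """Hypothetical stand-in for MedchainToolObservation."""
    tool_name: str
    tool_result: str
    reward: float
    done: bool


class ToolDelegate:
    """Hypothetical stand-in for _MedchainMCPDelegate: it only dispatches tools."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[Dict[str, Any]], str]] = {
            "read_inbox": lambda args: "2 unread messages",
            "end_shift": lambda args: "Shift closed",
        }

    def call(self, name: str, args: Dict[str, Any]) -> str:
        return self._tools[name](args)


class OuterEnvironment:
    """Holds the delegate by composition; reward is computed here, not in the delegate."""

    def __init__(self) -> None:
        self._delegate = ToolDelegate()

    def _reward_for(self, name: str) -> float:
        # Placeholder reward rule, for illustration only.
        return 0.01 if name == "read_inbox" else 0.0

    def step(self, name: str, args: Dict[str, Any]) -> ToolObservation:
        result = self._delegate.call(name, args)  # tool dispatch only
        return ToolObservation(                   # wrap with the reward already attached
            tool_name=name,
            tool_result=result,
            reward=self._reward_for(name),
            done=name == "end_shift",
        )
```

Because the delegate never sees reward logic, reward rules can change without touching the tool schemas.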
67
+
68
+ ### Two reward streams
69
+
70
+ 1. **Per-step shaping** (`_shaping_reward` in `medchain_env_environment.py`): small fixed rewards for useful actions (read_inbox/query_erp: +0.01 first time per shift; submit_po success: +0.02; transfer/quarantine: +0.01; incoherent justification: −0.05).
71
+ 2. **Terminal score** (`grader.py`, called by `_sim.get_last_reward()` on `end_shift`): deterministic formula with task-specific weights — no LLM judge.
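The per-step shaping values can be sketched as a single function. This is an illustration, not the code in `medchain_env_environment.py`: the function name, its arguments, and the caller-managed `seen_this_shift` set are assumptions, and the +0.01 for a coherent justification is taken from the README's reward list.

```python
from typing import Optional, Set


def shaping_reward(
    tool: str,
    seen_this_shift: Set[str],
    success: bool = True,
    coherent: Optional[bool] = None,
) -> float:
    """Per-step shaping reward for one tool call (values from the list above)."""
    if tool in ("read_inbox", "query_erp"):
        # Only the first call of each lookup tool in a shift earns +0.01.
        if tool in seen_this_shift:
            return 0.0
        seen_this_shift.add(tool)
        return 0.01
    if tool == "submit_po":
        return 0.02 if success else 0.0
    if tool in ("transfer", "quarantine_lot"):
        return 0.01 if success else 0.0
    if tool == "file_justification" and coherent is not None:
        # Coherent: +0.01 (per README); incoherent, flagged by Finance: -0.05.
        return 0.01 if coherent else -0.05
    return 0.0
```

In this sketch the caller resets `seen_this_shift` to an empty set whenever `end_shift` is called.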
72
+
73
+ ### Expedited order flow (two-step)
74
+
75
+ `submit_po(..., priority="expedited")` → returns `BUDGET_OVERRIDE_REQUIRED` with a ticket ID → agent must call `file_justification(ticket_id=..., reason=...)` to proceed. Justification grading in `grader.py:grade_justification()` uses keyword matching against active event types — incoherent justifications score −0.05 each (capped at −0.15).
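A hedged sketch of keyword-based coherence checking with the capped penalty. The keyword table, event identifiers, and function names are hypothetical; only the matching idea, the generic fallback keywords (urgent, critical, stockout) and the -0.05 / -0.15 figures come from this document. Falling back to generic keywords when no active event has a keyword entry is one reading of the fallback rule.

```python
from typing import Dict, Iterable, List

# Hypothetical keyword table; the real dict lives in grader.py.
EVENT_KEYWORDS: Dict[str, List[str]] = {
    "mci": ["mass casualty", "mci", "trauma", "blood"],
    "cold_chain": ["cold chain", "refrigeration", "platelet"],
    "recall": ["recall", "lot", "quarantine"],
}
GENERIC_KEYWORDS: List[str] = ["urgent", "critical", "stockout"]


def is_coherent(reason: str, active_events: Iterable[str]) -> bool:
    """True if the reason mentions a keyword tied to a currently active event."""
    text = reason.lower()
    keywords = [kw for ev in active_events for kw in EVENT_KEYWORDS.get(ev, [])]
    if not keywords:
        # No active event has a keyword entry: fall back to generic urgency words.
        keywords = GENERIC_KEYWORDS
    return any(kw in text for kw in keywords)


def justification_penalty(reasons: Iterable[str], active_events: Iterable[str]) -> float:
    """-0.05 per incoherent justification, capped at -0.15 in total."""
    active = list(active_events)
    n_bad = sum(1 for r in reasons if not is_coherent(r, active))
    return max(-0.05 * n_bad, -0.15)
```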
76
+
77
+ ### Context window management (inference.py)
78
+
79
+ At each `end_shift`, the conversation is pruned to `[system_prompt, all_past_shift_summaries_msg, last SHIFT_HISTORY_KEEP (=6) messages]`. Summaries are extracted from the `end_shift` result text by keyword (DEMAND, FULFILLED, DELIVERIES, SPEND, WASTE, EXPIRY, CRITICAL, etc.). This keeps context size O(days) rather than O(steps).
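The pruning rule can be sketched as follows, assuming OpenAI-style `{"role", "content"}` message dicts. `summarise_shift` and `build_context` are hypothetical names, the summary message's role is an assumption, and the keyword tuple repeats only the keywords listed above.

```python
from typing import Dict, List

Message = Dict[str, str]
SHIFT_HISTORY_KEEP = 6
SUMMARY_KEYWORDS = ("DEMAND", "FULFILLED", "DELIVERIES", "SPEND", "WASTE", "EXPIRY", "CRITICAL")


def summarise_shift(end_shift_text: str) -> str:
    """Keep only the end_shift report lines that mention a tracked keyword."""
    lines = [
        line.strip()
        for line in end_shift_text.splitlines()
        if any(k in line.upper() for k in SUMMARY_KEYWORDS)
    ]
    return " | ".join(lines)


def build_context(system_msg: Message, summaries: List[str], recent: List[Message]) -> List[Message]:
    """System prompt + one message holding all past summaries + the last N recent messages."""
    summary_msg = {"role": "user", "content": "Past shift summaries:\n" + "\n".join(summaries)}
    return [system_msg, summary_msg] + recent[-SHIFT_HISTORY_KEEP:]
```

One caveat for a real implementation: slicing the last N messages can separate a tool result from the assistant message that issued the tool call, which OpenAI-compatible APIs typically reject, so the cut point may need adjusting.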
80
+
81
+ ### Key constants in inference.py
82
+
83
+ - `MAX_STEPS_PER_TASK = 150` — hard cap per task episode
84
+ - `TEMPERATURE = 0.1` — low temperature for deterministic ordering decisions
85
+ - `MAX_TOKENS = 6000` — max completion tokens
86
+ - `SHIFT_HISTORY_KEEP = 6` — recent messages retained across shift boundary
87
+ - `MAX_CONSECUTIVE_ERRORS = 5` — aborts episode after 5 consecutive BadRequestErrors
88
+
89
+ ## Tasks at a Glance
90
+
91
+ | Task | Days | Actions/shift | Budget |
92
+ |------|------|---------------|--------|
93
+ | `orientation_ward` | 2 | 5 | $5k |
94
+ | `single_ward_stable` | 3 | 6 | $20k |
95
+ | `multi_ward_seasonal` | 6 | 8 | $50k |
96
+ | `hospital_network_crisis` | 12 | 10 | $150k |
97
+
98
+ The hardest task, `hospital_network_crisis` (internally `TASK3`, 0-indexed), has 5 overlapping events: cold-chain breach (day 3), supplier force majeure (day 6), MCI standby warning (day 8), MCI activation with 3× blood demand (days 9-11), and mandatory IV saline lot recall (day 11).
99
+
100
+ ## Simulation internals
101
+
102
+ - **FEFO**: inventory stored as `(qty, expiry_day)` lots in `simulation.py`, consumed oldest-first.
103
+ - **Event system**: `SimEvent.trigger_day` fires the inbox message; `warning_message` fires one day earlier. The simulation checks active events each `end_shift()` to apply demand multipliers or supplier lead-time changes.
104
+ - **Justification coherence**: `grade_justification()` checks reason text against a dict of event-type keywords; falls back to generic keywords (urgent, critical, stockout…) if no active event type matches.
105
+ - **No external deps**: fully self-contained — no databases, no external APIs.
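A minimal FEFO sketch over the `(qty, expiry_day)` lot layout described above. The function names are hypothetical, and treating a lot as expired once `today` is past its `expiry_day` is an assumption about the simulation's day convention.

```python
from typing import List, Tuple

Lot = Tuple[int, int]  # (qty, expiry_day)


def consume_fefo(lots: List[Lot], demand: int) -> Tuple[List[Lot], int]:
    """Fill demand from the earliest-expiring lots first; return (remaining lots, fulfilled qty)."""
    remaining: List[Lot] = []
    fulfilled = 0
    for qty, expiry in sorted(lots, key=lambda lot: lot[1]):
        take = min(qty, demand - fulfilled)
        fulfilled += take
        if qty - take > 0:
            remaining.append((qty - take, expiry))
    return remaining, fulfilled


def drop_expired(lots: List[Lot], today: int) -> Tuple[List[Lot], int]:
    """Remove lots whose expiry_day is before today; return (kept lots, expired qty)."""
    kept = [lot for lot in lots if lot[1] >= today]
    wasted = sum(qty for qty, expiry in lots if expiry < today)
    return kept, wasted
```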
Dockerfile ADDED
@@ -0,0 +1,81 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ # Multi-stage build using openenv-base
8
+ # This Dockerfile is flexible and works for both:
9
+ # - In-repo environments (with local OpenEnv sources)
10
+ # - Standalone environments (with openenv from PyPI/Git)
11
+ # The build script (openenv build) handles context detection and sets appropriate build args.
12
+
13
+ ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
14
+ FROM ${BASE_IMAGE} AS builder
15
+
16
+ WORKDIR /app
17
+
18
+ # Ensure git is available (required for installing dependencies from VCS)
19
+ RUN apt-get update && \
20
+ apt-get install -y --no-install-recommends git && \
21
+ rm -rf /var/lib/apt/lists/*
22
+
23
+ # Build argument to control whether we're building standalone or in-repo
24
+ ARG BUILD_MODE=in-repo
25
+ ARG ENV_NAME=medchain_env
26
+
27
+ # Copy environment code (always at root of build context)
28
+ COPY . /app/env
29
+
30
+ # For in-repo builds, openenv is already vendored in the build context
31
+ # For standalone builds, openenv will be installed via pyproject.toml
32
+ WORKDIR /app/env
33
+
34
+ # Ensure uv is available (for local builds where base image lacks it)
35
+ RUN if ! command -v uv >/dev/null 2>&1; then \
36
+ curl -LsSf https://astral.sh/uv/install.sh | sh && \
37
+ mv /root/.local/bin/uv /usr/local/bin/uv && \
38
+ mv /root/.local/bin/uvx /usr/local/bin/uvx; \
39
+ fi
40
+
41
+ # Install dependencies using uv sync
42
+ # If uv.lock exists, use it; otherwise resolve on the fly
43
+ RUN --mount=type=cache,target=/root/.cache/uv \
44
+ if [ -f uv.lock ]; then \
45
+ uv sync --frozen --no-install-project --no-editable; \
46
+ else \
47
+ uv sync --no-install-project --no-editable; \
48
+ fi
49
+
50
+ RUN --mount=type=cache,target=/root/.cache/uv \
51
+ if [ -f uv.lock ]; then \
52
+ uv sync --frozen --no-editable; \
53
+ else \
54
+ uv sync --no-editable; \
55
+ fi
56
+
57
+ # Final runtime stage
58
+ FROM ${BASE_IMAGE}
59
+
60
+ WORKDIR /app
61
+
62
+ # Copy the virtual environment from builder
63
+ COPY --from=builder /app/env/.venv /app/.venv
64
+
65
+ # Copy the environment code
66
+ COPY --from=builder /app/env /app/env
67
+
68
+ # Set PATH to use the virtual environment
69
+ ENV PATH="/app/.venv/bin:$PATH"
70
+
71
+ # Set PYTHONPATH so imports work correctly
72
+ ENV PYTHONPATH="/app/env:$PYTHONPATH"
73
+
74
+ # Health check
75
+ HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
76
+ CMD curl -f http://localhost:8000/health || exit 1
77
+
78
+ # Run the FastAPI server
79
+ # The module path is constructed to work with the /app/env structure
80
+ ENV ENABLE_WEB_INTERFACE=true
81
+ CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
README.md CHANGED
@@ -1,10 +1,274 @@
1
  ---
2
- title: Medchain Openenv Hackathon
3
- emoji: 😻
4
- colorFrom: gray
5
- colorTo: gray
6
  sdk: docker
7
  pinned: false
8
  ---
9
 
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: Medchain Env Environment Server
3
+ emoji: 🎰
4
+ colorFrom: red
5
+ colorTo: yellow
6
  sdk: docker
7
  pinned: false
8
+ app_port: 8000
9
+ base_path: /web
10
+ tags:
11
+ - openenv
12
  ---
13
 
14
+ # MedChain Env: Hospital Supply Chain Management
15
+
16
+ > Train AI agents to keep hospitals stocked — where running out of blood costs lives and ordering too much costs money.
17
+
18
+ ---
19
+
20
+ ## What is hospital supply chain management?
21
+
22
+ A hospital runs on supplies: surgical gloves, IV bags, blood, insulin, saline — hundreds of products across dozens of wards and operating theatres. Someone has to make sure the right amount of each product is in the right place at the right time. That person is the supply chain manager.
23
+
24
+ The job sounds simple but has several compounding pressures:
25
+
26
+ **Orders take time to arrive.** A supplier might take 2–7 days to deliver. If a ward runs out of IV bags today, you can't just call and get them in an hour. You have to have ordered them days ago — which means constantly forecasting future need.
27
+
28
+ **Supplies expire.** Blood expires in 42 days. Platelets in 5 days. Insulin in 28 days. Order too much and it rots on the shelf, wasting both money and the product. Order too little and patients go without.
29
+
30
+ **Demand spikes without warning.** A multi-vehicle accident sends 18 trauma patients to the ER at once. Flu season doubles demand for antivirals over two weeks. An elective surgery backlog clears and suddenly the ward needs 35% more consumables. A manager who doesn't read the situation will be caught short.
31
+
32
+ **Emergencies require documentation.** In a real hospital, spending extra money to rush an urgent order isn't automatic — it requires a written justification audited by Finance. A manager who writes "routine restock" when they're actually responding to a mass casualty event raises a red flag.
33
+
34
+ **Multiple suppliers, multiple trade-offs.** The cheap supplier takes 7 days; the premium supplier delivers in 1 day at 40% higher cost. When the cheap supplier announces delays due to a warehouse strike, you have to pivot — but if you always use the premium supplier you'll blow the budget.
35
+
36
+ ---
37
+
38
+ ## What is this environment?
39
+
40
+ MedChain Env puts an AI agent in the role of a hospital supply chain manager. Each episode runs for 2–12 simulated days. Every day is a **shift**: the agent gets a limited number of actions to check the situation and respond before the simulation advances.
41
+
42
+ The agent interacts with a **simulated legacy ERP system** — the kind of fragmented, text-heavy enterprise software real hospitals actually use. There are three sub-systems to navigate:
43
+
44
+ - **COMMS Pager** — the inbox. Unstructured text messages from the incident command system, suppliers, ward managers, and the pharmacy. Critical alerts arrive here: MCI activations, supplier disruptions, lot recalls.
45
+ - **Inventory DB** — query stock levels by ward, product, and expiry date. Returns table-formatted output.
46
+ - **Procurement Portal** — place purchase orders, file justifications for expedited spending, track deliveries.
47
+
48
+ The agent must decide *what to look up* (with a limited action budget), interpret what it finds, and act. There is no clean "here is everything you need to know" state dump — just the same messy interface a human manager would use.
49
+
50
+ ---
51
+
52
+ ## Why does this matter for AI research?
53
+
54
+ Classical inventory management algorithms, like the (s, S) policy ("when stock drops below s, order up to S"), work reasonably well when demand is predictable and stable.
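For reference, a minimal sketch of the (s, S) rule: when the inventory position (on-hand plus on-order, ignoring backorders here) drops below the reorder point s, order enough to bring it back up to S. The function names are illustrative.

```python
def inventory_position(on_hand: int, on_order: int) -> int:
    """Stock available now plus stock already ordered but not yet delivered."""
    return on_hand + on_order


def order_quantity(position: int, s: int, S: int) -> int:
    """(s, S) rule: below the reorder point s, order up to level S; otherwise order nothing."""
    if position < s:
        return S - position
    return 0
```

Every input is a number; nothing in the rule can react to text such as an incident alert.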
55
+
56
+ They fail completely when the environment sends a message like:
57
+
58
+ > *"Incident Command activated. Multi-vehicle accident I-95 northbound. Confirmed 18 critically injured en route. Blood bank placed on AMBER alert."*
59
+
60
+ A classical algorithm sees only numbers. It cannot read that message and reason: "O-negative blood is the universal donor, expiry is 42 days but we only have 8 units left, lead time from the blood bank is 1 day, I need to order 30 units now and file a justification because this is expedited."
61
+
62
+ An LLM that reads and understands that message can respond correctly.
63
+
64
+ **The heuristic→LLM performance gap in Task 4, `hospital_network_crisis` (0.38 → 0.58), is the environment's core scientific contribution**: a quantified measure of how much contextual language understanding is worth in real operational decisions.
65
+
66
+ ---
67
+
68
+ ## Tasks
69
+
70
+ Four tasks of increasing difficulty. Each task is a self-contained episode with its own scenario, products, suppliers, and events.
71
+
72
+ ### Task 1 — `orientation_ward` (Easy, 2 days)
73
+
74
+ A single general ward, 3 non-perishable supplies, one reliable supplier with 1-day delivery. Initial stock covers only the first day. The agent's job: read the inbox to understand the situation, check what's in stock, and place at least one replenishment order before supplies run out on Day 2.
75
+
76
+ This task is purely about exploring the interface. No crises, no expiry pressure, no multi-supplier decisions.
77
+
78
+ | | |
79
+ |---|---|
80
+ | **Score formula** | 70% service level + 30% whether at least one order was placed |
81
+ | **Expected scores** | Random agent: 0.55 · Heuristic: 0.88 · LLM: 0.95 |
82
+
83
+ ---
84
+
85
+ ### Task 2 — `single_ward_stable` (Medium, 3 days)
86
+
87
+ One ward, 6 products (some with expiry dates), stable demand, 2-day delivery. Initial stock covers 2 days, so an agent that waits until Day 2 to order will reach Day 3 with empty shelves. The task introduces cost efficiency: placing sensibly sized orders (not over-ordering) is rewarded alongside not running out.
88
+
89
+ | | |
90
+ |---|---|
91
+ | **Score formula** | 50% service level + 50% cost efficiency vs. benchmark |
92
+ | **Expected scores** | Random agent: 0.30 · Heuristic: 0.68 · LLM: 0.82 |
93
+
94
+ ---
95
+
96
+ ### Task 3 — `multi_ward_seasonal` (Medium-Hard, 6 days)
97
+
98
+ Three wards plus a central pharmacy. Ten products. Two suppliers with different speed/cost trade-offs:
99
+ - **FastMed**: delivers in 1 day, costs 40% more
100
+ - **MedLine**: delivers in 4 days, base price
101
+
102
+ Two events unfold over the episode — both announced in the inbox before they hit:
103
+
104
+ **Day 2 (early warning) → Day 3–5 (active):** Regional influenza alert. Antiviral, mask, and paracetamol demand surges 50% above normal. An agent that pre-orders on Day 2 after reading the warning is protected. An agent that ignores the warning scrambles to catch up.
105
+
106
+ **Day 4–6:** MedLine warehouse strike. Standard delivery extends from 4 to 7 days — which means any order placed after Day 4 via MedLine arrives after the episode ends. The agent must pivot to FastMed (at higher cost) for anything urgent.
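The lead-time arithmetic behind this can be sketched under one assumption: an order placed on day d with lead time L arrives on day d + L. The helper names are hypothetical.

```python
def arrival_day(order_day: int, lead_time: int) -> int:
    """Day an order lands, assuming arrival = order day + lead time."""
    return order_day + lead_time


def arrives_in_time(order_day: int, lead_time: int, last_day: int) -> bool:
    """True if the order lands on or before the final day of the episode."""
    return arrival_day(order_day, lead_time) <= last_day


def latest_order_day(lead_time: int, last_day: int) -> int:
    """Last day an order can be placed and still arrive within the episode."""
    return last_day - lead_time
```

For this 6-day task, a MedLine order at the standard 4-day lead time must be placed by Day 2. At the 7-day strike lead time, a Day 4 order lands on Day 11. A FastMed order placed as late as Day 5 still arrives on Day 6.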
107
+
108
+ The task tests whether the agent can act on early warnings rather than reacting to crises after they arrive.
109
+
110
+ | | |
111
+ |---|---|
112
+ | **Score formula** | 40% service level + 35% cost efficiency + 15% capacity management + 10% transfer efficiency |
113
+ | **Expected scores** | Random: 0.22 · Heuristic (ignores alerts): 0.55 · LLM (reads alerts): 0.73 |
114
+
115
+ ---
116
+
117
+ ### Task 4 — `hospital_network_crisis` (Hard, 12 days)
118
+
119
+ A full regional network: 3 hospitals plus a regional distribution centre, 15 products including **life-critical perishables**: O-negative blood (universal donor, expires in 42 days), platelets (expires in 5 days), fresh frozen plasma. Budget ceiling: $150,000 outstanding at any time.
120
+
121
+ Five crisis events (counting the MCI warning and activation separately) unfold across the 12-day episode, some of them overlapping:
122
+
123
+ | Day | Event |
124
+ |-----|-------|
125
+ | 3 | **Cold chain breach** at the regional DC — refrigeration failure destroys all platelet inventory. An alert fires. Agent must order replacements immediately or hospitals run out within days. |
126
+ | 6–14 | **Supplier force majeure** — HealthCo Supplies lead time extends from 3 to 7 days due to flu absenteeism. Agent must switch to the premium express supplier for urgent items. |
127
+ | 8 (warning) → 9–11 (active) | **Mass casualty incident** — large multi-vehicle accident. Blood product demand triples across all hospitals for 3 days. An agent that reads the Day 8 warning and pre-orders blood survives this; one that waits until Day 9 faces critical stockouts. |
128
+ | 11 | **Mandatory product recall** — a specific lot of IV Saline is flagged by the health authority. The agent must find which locations hold that lot, quarantine all units, and order replacements. Failing to quarantine by end of shift is a patient safety failure. |
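A hedged sketch of how an event schedule like this could drive both inbox messages and demand. It follows the convention that a warning fires one day before the event's trigger day. The `Event` fields, the product IDs and the helper names are hypothetical, not the simulation's actual classes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    """Hypothetical event record (not the real SimEvent class)."""
    name: str
    trigger_day: int
    end_day: int
    demand_multiplier: float = 1.0
    products: Tuple[str, ...] = ()
    warning: Optional[str] = None


def inbox_messages(events: List[Event], day: int) -> List[str]:
    """Messages delivered on `day`: warnings one day early, alerts on the trigger day."""
    msgs: List[str] = []
    for ev in events:
        if ev.warning and day == ev.trigger_day - 1:
            msgs.append(f"WARNING: {ev.warning}")
        if day == ev.trigger_day:
            msgs.append(f"ALERT: {ev.name}")
    return msgs


def demand_multiplier(events: List[Event], product: str, day: int) -> float:
    """Product of the multipliers of all events active for this product on this day."""
    m = 1.0
    for ev in events:
        if ev.trigger_day <= day <= ev.end_day and product in ev.products:
            m *= ev.demand_multiplier
    return m
```

Modelling the MCI as triggering on Day 9 and ending on Day 11, with a 3.0 multiplier on blood products, reproduces both the Day 8 warning and the tripled demand in the table.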
129
+
130
+ This task also introduces the **paper trail mechanic**: any expedited (rush) order triggers a mandatory written justification that Finance reviews. If the agent writes "routine restock" when there is an active mass casualty event, the justification is flagged as incoherent and a score penalty applies.
131
+
132
+ | | |
133
+ |---|---|
134
+ | **Score formula** | 35% service level + 25% cost efficiency + 20% critical product availability + 15% waste reduction (expired product value) + 5% crisis response speed |
135
+ | **Justification penalty** | −0.05 per incoherent expedited justification (max −0.15) |
136
+ | **Expected scores** | Random: 0.12 · Heuristic (no alert reading): 0.38 · LLM: 0.58 · Near-optimal: 0.82 |
137
+
138
+ ---
139
+
140
+ ## How scoring works
141
+
142
+ Each task produces a score between 0.0 and 1.0 computed deterministically from the simulation history. There is no LLM judge.
143
+
144
+ ### Partial credit during the episode
145
+
146
+ Small reward signals after each action reinforce useful behaviour:
147
+ - Reading the inbox or checking inventory (first time each shift): **+0.01**
148
+ - Successfully placing an order: **+0.02**
149
+ - Executing a transfer or quarantine: **+0.01**
150
+ - Filing a coherent justification: **+0.01**
151
+ - Filing an incoherent justification (flagged by Finance): **−0.05**
152
+
153
+ ### Terminal score at episode end
154
+
155
+ The main score is computed when the episode completes. Components vary by task:
156
+
157
+ - **Service level** — what fraction of demand was actually fulfilled across all products, locations, and days
158
+ - **Cost efficiency** — how close actual spend was to a reasonable benchmark (spending less than the benchmark is good; spending far over it penalises over-ordering)
159
+ - **Critical product availability** (Task 4) — blood and platelets tracked separately; running out of these carries heavy penalty
160
+ - **Waste fraction** (Task 4) — value of expired inventory divided by total spend; rewards active expiry management
161
+ - **Crisis response score** (Task 4) — how quickly the agent positioned blood during the MCI window and handled the recall
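The per-task weights listed in the task sections combine as a weighted sum. This minimal sketch makes several assumptions:

- Every component has already been normalised to [0, 1]. For waste, the sketch uses `1 - waste_fraction`.
- The Task 4 justification penalty is added afterwards.
- The result is clamped to [0, 1].

`TASK_WEIGHTS`, `waste_component` and `terminal_score` are hypothetical names, not `grader.py`'s API.

```python
from typing import Dict

TASK_WEIGHTS: Dict[str, Dict[str, float]] = {
    "orientation_ward": {"service": 0.70, "order_placed": 0.30},
    "single_ward_stable": {"service": 0.50, "cost": 0.50},
    "multi_ward_seasonal": {"service": 0.40, "cost": 0.35, "capacity": 0.15, "transfer": 0.10},
    "hospital_network_crisis": {"service": 0.35, "cost": 0.25, "critical": 0.20,
                                "waste": 0.15, "crisis": 0.05},
}


def waste_component(expired_value: float, total_spend: float) -> float:
    """Assumed mapping: 1 - (expired value / total spend), floored at 0."""
    if total_spend <= 0:
        return 1.0
    return max(0.0, 1.0 - expired_value / total_spend)


def terminal_score(task: str, components: Dict[str, float], penalty: float = 0.0) -> float:
    """Weighted sum of [0, 1] components plus a (non-positive) penalty, clamped to [0, 1]."""
    weights = TASK_WEIGHTS[task]
    raw = sum(w * components[name] for name, w in weights.items())
    return min(1.0, max(0.0, raw + penalty))
```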
162
+
163
+ ---
164
+
165
+ ## How the agent interacts — the 9 tools
166
+
167
+ The agent has 9 tools, each consuming one action from the shift budget:
168
+
169
+ | Tool | What it does |
170
+ |------|-------------|
171
+ | `read_inbox` | Read messages from the COMMS pager (filter: unread / all / flagged) |
172
+ | `query_erp` | Query stock levels, expiry dates, pipeline orders, or demand history |
173
+ | `query_supplier` | Get a supplier's current lead time and any disruption notices |
174
+ | `query_forecast` | Request a demand forecast for a product at a location |
175
+ | `submit_po` | Place a purchase order (standard or expedited) |
176
+ | `transfer` | Move stock from one location to another |
177
+ | `quarantine_lot` | Isolate a specific inventory lot (for recalls or cold chain failures) |
178
+ | `file_justification` | Write the Finance audit reason for an expedited order |
179
+ | `end_shift` | Close the shift — simulation advances by one day |
180
+
181
+ **The action budget is the core constraint.** Each shift, the agent gets 5–10 actions depending on the task. Using all 10 actions to query every product at every location leaves no budget to place orders. The agent must triage: what is most urgent to check right now?
182
+
183
+ ---
184
+
185
+ ## How the LLM agent works (`inference.py`)
186
+
187
+ The inference script runs a multi-turn LLM agent using the OpenAI API format. Any model accessible through an OpenAI-compatible endpoint works.
188
+
189
+ **Each shift runs as a tool-calling loop:**
190
+
191
+ 1. The agent sees the shift dashboard (what day it is, how many actions remain, inbox alerts count)
192
+ 2. The LLM picks a tool to call
193
+ 3. The tool result is appended to the conversation
194
+ 4. Loop repeats until the agent calls `end_shift()` or runs out of action budget
195
+ 5. At `end_shift()`, the conversation is compressed into a short summary before the next shift begins
196
+
197
+ **Context window management:** After each shift, the history is pruned down to the system prompt + all past shift summaries + the last 6 messages from the current shift. This keeps context bounded regardless of how many days the episode runs.
198
+
199
+ **System prompt guidance:** The agent is told to prioritise: read inbox → check inventory → place orders → end shift. It's given explicit guidance on lead time arithmetic, expiry rotation, how to respond to MCI alerts, and when expedited orders are warranted.
200
+
201
+ ---
202
+
203
+ ## Setup & Running
204
+
205
+ ### Prerequisites
206
+
207
+ - Docker
208
+ - Python 3.10+ with `uv`
209
+
210
+ ### Build the Docker image
211
+
212
+ ```bash
213
+ docker build -t <LOCAL_IMAGE_NAME> -f server/Dockerfile .
214
+ ```
215
+
216
+ ### Set environment variables and run
217
+
218
+ ```bash
219
+ export API_BASE_URL=https://router.huggingface.co/v1
220
+ export MODEL_NAME=openai/gpt-oss-120b:groq
221
+ export HF_TOKEN=<your_token>
222
+ export LOCAL_IMAGE_NAME=nik-55_medchain-openenv
223
+
224
+ uv run python inference.py
225
+ ```
226
+
227
+ The script runs all 4 tasks in sequence and emits structured logs to stdout:
228
+
229
+ ```
230
+ [START] task=orientation_ward env=medchain model=openai/gpt-oss-120b:groq
231
+ [STEP] step=1 action=read_inbox({}) reward=0.01 done=false error=null
232
+ [STEP] step=2 action=query_erp({...}) reward=0.01 done=false error=null
233
+ ...
234
+ [END] success=true steps=10 score=0.923 rewards=0.01,0.01,0.02,...
235
+ ```
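To post-process a run, the `[STEP]` and `[END]` lines can be parsed with two regular expressions. This sketch assumes the field order shown in the sample above stays fixed. The action is matched greedily up to ` reward=`, so JSON arguments containing spaces still parse. The `submit_po` arguments used in the test are hypothetical.

```python
import re
from typing import Any, Dict, Optional

STEP_RE = re.compile(
    r"^\[STEP\] step=(?P<step>\d+) action=(?P<action>.*) "
    r"reward=(?P<reward>-?[\d.]+) done=(?P<done>true|false) error=(?P<error>.*)$"
)
END_RE = re.compile(
    r"^\[END\] success=(?P<success>true|false) steps=(?P<steps>\d+) "
    r"score=(?P<score>[\d.]+) rewards=(?P<rewards>.*)$"
)


def parse_step(line: str) -> Optional[Dict[str, Any]]:
    """Parse one [STEP] line; return None for any other line."""
    m = STEP_RE.match(line)
    if not m:
        return None
    return {
        "step": int(m["step"]),
        "action": m["action"],
        "reward": float(m["reward"]),
        "done": m["done"] == "true",
        "error": None if m["error"] == "null" else m["error"],
    }


def parse_end(line: str) -> Optional[Dict[str, Any]]:
    """Parse the [END] line, splitting the comma-separated per-step rewards."""
    m = END_RE.match(line)
    if not m:
        return None
    rewards = [float(r) for r in m["rewards"].split(",") if r and r != "..."]
    return {"success": m["success"] == "true", "steps": int(m["steps"]),
            "score": float(m["score"]), "rewards": rewards}
```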
236
+
237
+ ### Environment variables
238
+
239
+ | Variable | Description |
240
+ |----------|-------------|
241
+ | `API_BASE_URL` | LLM API endpoint (OpenAI-compatible) |
242
+ | `MODEL_NAME` | Model identifier |
243
+ | `HF_TOKEN` or `API_KEY` | API authentication key |
244
+ | `LOCAL_IMAGE_NAME` | Docker image tag used by inference.py to launch containers |
245
+ | `MEDCHAIN_TASK` | Run a single task in one container (default: `single_ward_stable`) |
246
+ | `SLEEP_BETWEEN_STEPS` | Seconds between LLM calls, default 2 |
247
+ | `LOG_LEVEL` | `INFO` (default) or `DEBUG` (writes timestamped log to `logs/`) |
248
+
249
+ ### Run a single task container
250
+
251
+ ```bash
252
+ docker run -e MEDCHAIN_TASK=hospital_network_crisis -p 8000:8000 <LOCAL_IMAGE_NAME>
253
+ ```
254
+
255
+ ---
256
+
257
+ ## Project Structure
258
+
259
+ ```
260
+ medchain_env/
261
+ ├── inference.py # Entry point — LLM agent runs all 4 tasks
262
+ ├── client.py # MedchainEnv OpenEnv client
263
+ ├── models.py # State and observation types
264
+ ├── openenv.yaml # OpenEnv manifest
265
+ ├── pyproject.toml # Dependencies
266
+ └── server/
267
+ ├── app.py # FastAPI application
268
+ ├── Dockerfile # Container build
269
+ ├── medchain_env_environment.py # OpenEnv Environment + 9 MCP tools
270
+ ├── simulation.py # Simulation engine (inventory, demand, events)
271
+ ├── tasks.py # Task configurations
272
+ ├── grader.py # Terminal reward computation
273
+ └── erp_formatter.py # ERP text output formatters
274
+ ```
__init__.py ADDED
@@ -0,0 +1,15 @@
1
+ """MedChain Env — Hospital Supply Chain Management Environment."""
2
+
3
+ from openenv.core.env_server.mcp_types import CallToolAction, CallToolObservation
4
+
5
+ from .client import MedchainEnv
6
+ from .models import AVAILABLE_TOOLS, MedchainState, MedchainToolObservation
7
+
8
+ __all__ = [
9
+ "MedchainEnv",
10
+ "MedchainState",
11
+ "MedchainToolObservation",
12
+ "AVAILABLE_TOOLS",
13
+ "CallToolAction",
14
+ "CallToolObservation",
15
+ ]
client.py ADDED
@@ -0,0 +1,157 @@
1
+ """MedChain Env Environment Client."""
2
+
3
+ import logging
4
+ import re
5
+ from typing import Any, Dict, List, Optional
6
+
7
+ from openenv.core import EnvClient
8
+ from openenv.core.client_types import StepResult
9
+ from openenv.core.env_server.mcp_types import (
10
+ CallToolAction,
11
+ ListToolsAction,
12
+ ListToolsObservation,
13
+ Tool,
14
+ )
15
+ from openenv.core.env_server.types import Observation, State
16
+
17
+ from .models import MedchainState
18
+
19
+ _log = logging.getLogger(__name__)
20
+
21
+
22
+ class MedchainEnv(EnvClient[CallToolAction, Observation, MedchainState]):
23
+ """
24
+ Client for the MedChain Env hospital supply chain environment.
25
+
26
+ Inherits from EnvClient and communicates via the standard OpenEnv
27
+ WebSocket protocol (simulation mode).
28
+
29
+ Example:
30
+ >>> async with MedchainEnv(base_url="http://localhost:8000") as env:
31
+ ... obs = await env.reset()
32
+ ... print(obs.observation.metadata["dashboard"])
33
+ ... tools = await env.list_tools()
34
+ ... result = await env.step(CallToolAction(tool_name="read_inbox", arguments={}))
35
+
36
+ Example with Docker:
37
+ >>> env = await MedchainEnv.from_docker_image(
38
+ ... "medchain_env-env:latest",
39
+ ... env_vars={"MEDCHAIN_TASK": "single_ward_stable"},
40
+ ... )
41
+ >>> obs = await env.reset()
42
+ """
43
+
44
+ def __init__(self, **kwargs: Any) -> None:
45
+ kwargs.setdefault("message_timeout_s", 1500.0)
46
+ super().__init__(**kwargs)
47
+ self._tools_cache: Optional[List[Tool]] = None
48
+
49
+ # ── EnvClient abstract methods ─────────────────────────────────────────
50
+
51
+ def _step_payload(self, action: Any) -> Dict[str, Any]:
52
+ if isinstance(action, ListToolsAction):
53
+ return {"type": "list_tools"}
54
+ if isinstance(action, CallToolAction):
55
+ return {
56
+ "type": "call_tool",
57
+ "tool_name": action.tool_name,
58
+ "arguments": action.arguments,
59
+ }
60
+ raise ValueError(f"Unsupported action type: {type(action).__name__}")
61
+
62
+ def _parse_result(self, payload: Dict[str, Any]) -> StepResult[Observation]:
63
+ obs_data = payload.get("observation", {})
64
+ reward = payload.get("reward")
65
+ done = payload.get("done", False) or obs_data.get("done", False)
66
+
67
+ # ── List-tools response ──────────────────────────────────────────
68
+ if "tools" in obs_data:
69
+ tools = [
70
+ Tool(
71
+ name=t.get("name", ""),
72
+ description=t.get("description", ""),
73
+ input_schema=t.get("input_schema", t.get("inputSchema", {})),
74
+ )
75
+ for t in obs_data.get("tools", [])
76
+ ]
77
+ observation = ListToolsObservation(
78
+ tools=tools,
79
+ done=done,
80
+ reward=reward,
81
+ )
82
+ return StepResult(observation=observation, reward=reward, done=done)
83
+
84
+ # ── Reset response (has "dashboard" field) ───────────────────────
85
+ if "dashboard" in obs_data:
86
+ observation = Observation(done=done, reward=reward, metadata=obs_data)
87
+ return StepResult(observation=observation, reward=reward, done=done)
88
+
89
+ # ── Tool-call response (has "tool_name" and "tool_result") ───────
90
+ if "tool_name" in obs_data:
91
+ result_text = obs_data.get("tool_result", "")
92
+
93
+ # Safety net: if reward is still None (should not happen after the
94
+ # serialization fix), fall back to parsing the Final Score from text.
95
+ if reward is None and result_text:
96
+ m = re.search(r"Final Score:\s*([\d.]+)", result_text)
97
+ if m:
98
+ reward = float(m.group(1))
99
+
100
+ observation = Observation(
101
+ done=done,
102
+ reward=reward,
103
+ metadata={"tool_result": result_text},
104
+ )
105
+ return StepResult(observation=observation, reward=reward, done=done)
106
+
107
+ # ── Generic fallback ─────────────────────────────────────────────
108
+ observation = Observation(done=done, reward=reward, metadata=obs_data)
109
+ return StepResult(observation=observation, reward=reward, done=done)
110
+
111
+ def _parse_state(self, payload: Dict[str, Any]) -> MedchainState:
112
+ return MedchainState(
113
+ episode_id=payload.get("episode_id"),
114
+ step_count=payload.get("step_count", 0),
115
+ task=payload.get("task", ""),
116
+ day=payload.get("day", 0),
117
+ max_days=payload.get("max_days", 0),
118
+ actions_remaining=payload.get("actions_remaining", 0),
119
+ budget_used=payload.get("budget_used", 0.0),
120
+ budget_limit=payload.get("budget_limit", 0.0),
121
+ unread_messages=payload.get("unread_messages", 0),
122
+ orders_in_transit=payload.get("orders_in_transit", 0),
123
+ )
124
+
125
+ # ── Tool discovery ─────────────────────────────────────────────────────
126
+
127
+ async def list_tools(self, use_cache: bool = True) -> List[Tool]:
128
+ """
129
+ Discover the 9 ERP tools available in this environment.
130
+
131
+ Args:
132
+ use_cache: Return cached tools if available (default True).
133
+
134
+ Returns:
135
+ List of Tool objects with name, description, and input_schema.
136
+ """
137
+ if use_cache and self._tools_cache is not None:
138
+ return self._tools_cache
139
+
140
+ result = await self.step(ListToolsAction())
141
+ if isinstance(result.observation, ListToolsObservation):
142
+ self._tools_cache = result.observation.tools
143
+ return self._tools_cache
144
+
145
+ self._tools_cache = []
146
+ return self._tools_cache
147
+
148
+ # ── Resource cleanup ───────────────────────────────────────────────────
149
+
150
+ async def close(self) -> None:
151
+ """Close client, tolerating Docker stop timeouts gracefully."""
152
+ try:
153
+ await super().close()
154
+ except Exception as e:
155
+ # docker stop can time out (10 s) when the container is slow to exit.
156
+ # Log and swallow so the inference script doesn't crash.
157
+ _log.warning("MedchainEnv.close() suppressed error during shutdown: %s", e)
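When the serialized reward is missing, the reward fallback above scrapes a `Final Score: <number>` line from the tool text. The following is a minimal standalone sketch of that idea, not the client's actual code. The helper name `extract_final_score` is hypothetical, and the exact wording of MedChain's end-of-episode text is assumed. The sketch matches with `\d+(?:\.\d+)?` rather than a bare character class such as `[\d.]+`, so a sentence-final period after the number is not swallowed into the match and passed to `float()`. It also clamps the result to the 0.0 to 1.0 range the hackathon requires of grader scores; the clamp is an added assumption.

```python
import re
from typing import Optional

# Hypothetical helper mirroring the client's text fallback; the
# "Final Score: <number>" wording is assumed, not taken from the server.
_FINAL_SCORE_RE = re.compile(r"Final Score:\s*(\d+(?:\.\d+)?)")


def extract_final_score(result_text: Optional[str]) -> Optional[float]:
    """Return the first 'Final Score' value in the text, or None if absent."""
    if not result_text:
        return None
    match = _FINAL_SCORE_RE.search(result_text)
    if match is None:
        return None
    # Clamp to the 0.0-1.0 range the hackathon graders are required to use.
    return min(1.0, max(0.0, float(match.group(1))))


if __name__ == "__main__":
    print(extract_final_score("EPISODE COMPLETE. Final Score: 0.734."))
    print(extract_final_score("Day 3 ended."))
```

Returning `None` rather than `0.0` when no score is found keeps "no score reported" distinguishable from a genuine zero, which fits the client applying the fallback only when `reward is None`.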
hackathon_guide.md ADDED
@@ -0,0 +1,365 @@
1
+ # Scaler School of Technology — Meta PyTorch Hackathon
2
+ ## OpenEnv Hackathon Dashboard
3
+
4
+ **URL:** https://www.scaler.com/school-of-technology/meta-pytorch-hackathon/dashboard#form
5
+
6
+ ---
7
+
8
+ ## Timeline
9
+
10
+ | Stage | Dates |
11
+ |--------------|--------------------------|
12
+ | Registration | 14th March – 3rd April |
13
+ | Declaration | Before Round 1 |
14
+ | Prepare | Now – 25th March |
15
+ | Round 1 | 25th March – 8th April |
16
+ | Results | 10th April |
17
+ | Finals       | 25th – 26th April        |
18
+
19
+ ---
20
+
21
+ ## Community
22
+
23
+ - **Discord:** Join the Discord Community; all announcements, mentor access, and team matching happen here.
24
+
25
+ ---
26
+
27
+ ## Participation
28
+
29
+ - Currently registered as **Solo Warrior**
30
+ - Locked for Round 1 — cannot switch to a team until Round 1 is over.
31
+
32
+ ---
33
+
34
+ ## Problem Statement
35
+
36
+ ### The Task
37
+
38
+ > Build a complete, real-world OpenEnv environment that an AI agent can learn from through the standard `step()` / `reset()` / `state()` API.
39
+
40
+ ---
41
+
42
+ ## Key Requirements at a Glance
43
+
44
+ - Must simulate a real-world task (not games or toys)
45
+ - Implement full OpenEnv spec: typed models, `step()`/`reset()`/`state()`, `openenv.yaml`
46
+ - Minimum 3 tasks with agent graders (easy → medium → hard, scores 0.0–1.0)
47
+ - Meaningful reward function with partial progress signals
48
+ - Baseline inference script with reproducible scores
49
+ - Deploy to Hugging Face Spaces + working Dockerfile
50
+ - README with environment description, action/observation spaces, setup instructions
51
+
52
+ ---
53
+
54
+ ## Detailed Requirements
55
+
56
+ ### Real-world task simulation
57
+ The environment must simulate a task humans actually do. Not games, not toys.
58
+ Examples: email triage, code review, data cleaning, scheduling, customer support, content moderation.
59
+
60
+ ### OpenEnv spec compliance
61
+ Implement the full OpenEnv interface:
62
+ - Typed `Observation`, `Action`, and `Reward` Pydantic models
63
+ - `step(action)` → returns observation, reward, done, info
64
+ - `reset()` → returns initial observation
65
+ - `state()` → returns current state
66
+ - `openenv.yaml` with metadata
67
+ - Tested via `openenv validate`
68
+
69
+ ### Minimum 3 tasks with agent graders
70
+ Each task defines a concrete objective an agent must accomplish, with a programmatic grader that scores performance (0.0–1.0). Tasks should range: easy → medium → hard. Graders must have clear, deterministic success/failure criteria.
71
+
72
+ ### Meaningful reward function
73
+ Provides signal over the full trajectory (not just binary end-of-episode). Rewards partial progress toward task completion. Penalizes clearly undesirable behavior (e.g. infinite loops, destructive actions).
74
+
75
+ ### Baseline inference script
76
+ Uses the OpenAI API client to run a model against the environment. Reads API credentials from environment variables (`OPENAI_API_KEY`). Produces a reproducible baseline score on all 3 tasks.
77
+
78
+ ---
79
+
80
+ ## Non-Functional Requirements
81
+
82
+ ### Deploys to a Hugging Face Space
83
+ Environment must run as a containerized HF Space tagged with `openenv`.
84
+ Must include a working Dockerfile. The environment should start cleanly with `docker build` + `docker run`.
85
+
86
+ ### Documentation
87
+ README must include:
88
+ - Environment description and motivation
89
+ - Action and observation space definitions
90
+ - Task descriptions with expected difficulty
91
+ - Setup and usage instructions
92
+ - Baseline scores
93
+
94
+ ---
95
+
96
+ ## Evaluation Criteria
97
+
98
+ | Parameter | Weight | Description |
99
+ |----------------------------|--------|-----------------------------------------------------------------------------|
100
+ | Real-world utility | 30% | Does the environment model a genuine task? Would someone actually use this to train or evaluate agents? |
101
+ | Task & grader quality | 25% | Are tasks well-defined with clear objectives? Do graders accurately and fairly measure success? Meaningful difficulty progression? |
102
+ | Environment design | 20% | Clean state management, sensible action/observation spaces, good reward shaping, proper episode boundaries |
103
+ | Code quality & spec compliance | 15% | Follows OpenEnv spec, clean project structure, typed models, documented, tested, Dockerfile works |
104
+ | Creativity & novelty | 10% | Novel problem domain, interesting mechanics, clever reward design, original approach |
105
+
106
+ ### Scoring Breakdown (Real-world utility)
107
+
108
+ - **0–5:** Toy/artificial problem with no practical application
109
+ - **6–15:** Valid domain but shallow modeling of the real task
110
+ - **16–25:** Good domain modeling, would be useful for agent evaluation
111
+ - **26–30:** Excellent — fills a real gap, immediate value for the RL/agent community
112
+
113
+ ### Scoring Checklist Questions
114
+
115
+ **Task & grader quality:**
116
+ - 3+ tasks with difficulty range?
117
+ - Graders produce scores between 0.0–1.0?
118
+ - Graders deterministic and reproducible?
119
+ - Hard task genuinely challenges frontier models?
120
+
121
+ **Environment design:**
122
+ - `reset()` produces clean state?
123
+ - Action/observation types well-designed and documented?
124
+ - Reward function provides useful varying signal (not just sparse)?
125
+ - Episode boundaries sensible?
126
+
127
+ **Code quality & spec compliance:**
128
+ - `openenv validate` passes?
129
+ - `docker build && docker run` works?
130
+ - HF Space deploys and responds?
131
+ - Baseline script runs and reproduces scores?
132
+
133
+ **Creativity & novelty:**
134
+ - Domain we haven't seen in OpenEnv before?
135
+ - Reward design has interesting properties?
136
+ - Clever mechanics that make the environment engaging?
137
+
138
+ ---
139
+
140
+ ## How Judging Works
141
+
142
+ - **Phase 1 — Automated Validation:** Pass/fail gate — HF Space deploys, OpenEnv spec compliance, Dockerfile builds, baseline reproduces, 3+ tasks with graders.
143
+ - **Phase 2 — Agentic Evaluation:** Scored — baseline agent re-run, standard Open LLM agent (e.g. Nemotron 3 Super) run against all environments, score variance check.
144
+ - **Phase 3 — Human Review:** Top submissions reviewed by Meta and Hugging Face engineers for real-world utility, creativity, and exploit checks.
145
+
146
+ ---
147
+
148
+ ## Disqualification Criteria
149
+
150
+ - Environment does not deploy or respond
151
+ - Plagiarized or trivially modified existing environments
152
+ - Graders that always return the same score
153
+ - No baseline inference script
154
+
155
+ ---
156
+
157
+ ## Pre-Submission Checklist — all must pass or you're disqualified
158
+
159
+ | Check | Requirement |
160
+ |--------------------------|-------------------------------------------------------------------------------------------------|
161
+ | HF Space deploys | Automated ping to the Space URL — must return 200 and respond to `reset()` |
162
+ | OpenEnv spec compliance | Validate `openenv.yaml`, typed models, `step()`/`reset()`/`state()` endpoints |
163
+ | Dockerfile builds | Automated docker build on the submitted repo |
164
+ | Baseline reproduces | Run the submitted inference script — must complete without error and produce scores |
165
+ | 3+ tasks with graders | Enumerate tasks, run each grader, verify scores in 0.0–1.0 range |
166
+ | Infra Restrictions | Runtime of inference script should be less than 20 min. Must run on vcpu=2, memory=8gb |
167
+ | Validator | Run the pre-submission validation script before submitting |
168
+
169
+ ### Mandatory Additional Instructions
170
+
171
+ Before submitting, ensure the following variables are defined in your environment configuration:
172
+
173
+ | Variable | Description |
174
+ |-----------------|------------------------------------------|
175
+ | `API_BASE_URL` | The API endpoint for the LLM |
176
+ | `MODEL_NAME` | The model identifier to use for inference |
177
+ | `HF_TOKEN` | Your Hugging Face / API key |
178
+
179
+ - The inference script must be named `inference.py` and placed in the root directory of the project.
180
+ - Participants must use the OpenAI Client for all LLM calls using the above variables.
181
+ - Participants must emit structured stdout logs strictly following the `[START]`, `[STEP]`, and `[END]` format defined in the sample inference script. Any deviation in field names, ordering, or formatting will result in incorrect evaluation scoring. Refer to [`sample_inference.py`](./sample_inference.py) for the complete format specification and examples.
182
+
183
+ ### Infra Restrictions
184
+
185
+ - Runtime of inference script must be less than 20 minutes.
186
+ - Ensure your env and inference can run on a machine with `vcpu=2`, `memory=8gb`.
187
+
188
+ ### Validator
189
+
190
+ Run the pre-submission validation script at [`pre_validate.sh`](./pre_validate.sh) before submitting.
191
+
192
+ ### Sample Inference Script
193
+
194
+ See [`sample_inference.py`](./sample_inference.py) for the complete example, including the mandatory `[START]`, `[STEP]`, and `[END]` structured log format.
195
+
196
+ ---
197
+
198
+ ## Submission
199
+
200
+ - **Submission window opens:** 28th March
201
+ - **Deadline:** 8 April 2026, 11:59 PM IST
202
+
203
+ ### Step 1
204
+ Choose solo or team before you can start the assessment.
205
+
206
+ ### Step 2
207
+ Complete Step 1 first. Problem Statement is live. Build and submit.
208
+
209
+ ---
210
+
211
+ ## Study Material
212
+
213
+ **4 modules · ~3.5 hours**
214
+
215
+ Each module: read the README first, then open the notebook in Colab. No local setup needed.
216
+
217
+ ### Module 1 — Essential for Round 1 (45 min)
218
+ **What you'll do:** Connect to 3 real AI environments hosted online — an Echo bot, a Catch game, and Wordle — and interact with each using the exact same code pattern.
219
+
220
+ ### Module 2 — Essential for Round 1 (50 min)
221
+ **What you'll do:** Write 4 different game-playing strategies for a Catch game, run a competition between them, then switch to a completely different game using the same code.
222
+
223
+ ### Module 3 — Essential for Round 1 (45 min)
224
+ **What you'll do:** Clone an existing environment, modify it, run it on your machine, then deploy your version live to Hugging Face Spaces with one command.
225
+
226
+ ### Module 4 — Most Important for Round 1
227
+ **What you'll do:** Build a complete word-guessing game environment from scratch — define the rules, implement the logic, test it locally, and deploy it live. About 100 lines of real code.
228
+
229
+ - View full course repository
230
+
231
+ ---
232
+
233
+ ## Guide
234
+
235
+ ### What to Expect
236
+ Example of what a problem statement looks like:
237
+ > "Build a mini-game RL environment with clearly defined tasks, automated graders, and deploy it live to Hugging Face Spaces."
238
+
239
+ ### Prerequisites (from Step 1 assessment)
240
+ - Write graders that verify task completion
241
+ - Define reward logic for scoring
242
+ - Package using OpenEnv for automated evaluation
243
+
244
+ **Install before April 1st:**
245
+
246
+ | Tool | Requirement | Command |
247
+ |-----------------------|--------------------------------------|----------------------------------------------|
248
+ | Python 3.10+ | Install 3.10, 3.11, or 3.12 | `python --version` |
249
+ | Git + GitHub account | Push your submission to GitHub or HF | `git --version` |
250
+ | Hugging Face CLI | Deploy to HF Spaces | `pip install huggingface_hub` |
251
+ | | | `huggingface-cli login` |
252
+ | OpenEnv | The framework | `pip install openenv-core` |
253
+ | Google Colab | Prep course runs in Colab (free tier works) | colab.research.google.com |
254
+ | Docker | Isolated container testing | `docker --version` |
255
+ | VS Code (Recommended) | Best Python + Docker support | |
256
+
257
+ ### Step 1 Evaluation Criteria
258
+
259
+ | Criteria | Standard |
260
+ |-----------------------|---------------------------------|
261
+ | Runtime correctness | Runs without errors |
262
+ | Interface compliance | Follows OpenEnv standard |
263
+ | Task design | Clear, realistic, testable |
264
+ | Grading logic | Reward system makes sense |
265
+
266
+ ### How to Submit
267
+
268
+ When Round 1 starts on 1 April:
269
+
270
+ **Step 1 — Application Form**
271
+ Choose your problem domain. The task is open-ended: build any real-world OpenEnv environment that models a task a human would actually do.
272
+
273
+ **Step 2 — Scaffold**
274
+ ```bash
275
+ openenv init my_env
276
+ ```
277
+ This generates the project structure.
278
+
279
+ **Step 3 — Build**
280
+ Define your environment in the generated files.
281
+
282
+ **Step 4 — Test locally**
283
+ ```bash
284
+ uv run server
285
+ ```
286
+
287
+ **Step 5 — Deploy**
288
+ ```bash
289
+ openenv push --repo-id your-username/my-env
290
+ ```
291
+
292
+ **Step 6 — Submit**
293
+ Paste your HF Spaces URL on the platform before the deadline.
294
+
295
+ - Submission window opens 28th March
296
+ - Deadline: 8 April 2026, 11:59 PM IST
297
+
298
+ > **Note:** Only team leaders can make the final submission.
299
+
300
+ > **Note:** The Guide above references "4–5 problem statements"; this is outdated. Round 1 is open-ended. There is no fixed list of problem statements to choose from. Build any real-world environment that models a task a human would actually do (e.g. email triage, code review, data cleaning). The requirements and evaluation criteria remain the same.
301
+
302
+ ---
303
+
304
+ ## FAQs
305
+
306
+ ### How does the team/solo declaration work?
307
+
308
+ If you choose to compete solo, you will participate individually for Round 1.
309
+
310
+ If you form a team (2–3 members), only the Team Lead fills out the team formation form before the Round 1 assessment window opens and adds teammates using their registered email IDs. Once a team is confirmed, it cannot be changed.
311
+
312
+ Note: Since Round 2 is a 48-hour in-person hackathon, solo participants who qualify will be matched with other qualifying participants to form teams for the final round.
313
+
314
+ ### Who should fill the team form?
315
+
316
+ Only the team lead completes the team registration form. Teammates do not need to fill out anything at this stage. Once the Team Lead submits the form, listed members will receive an invite on their dashboards. The team will be reflected on their dashboards only after they accept the invite.
317
+
318
+ ### What if someone already added me to their team?
319
+
320
+ You join their team only once you accept the invite. Your dashboard then updates automatically to reflect the team you have joined. After confirmation, you will not be able to switch to solo mode or join/form another team. Team assignments are permanent once confirmed.
321
+
322
+ ### Can I change my team or switch to solo after confirming?
323
+
324
+ No. Teams are permanent once confirmed and no changes are allowed. Solo declarations are locked for Round 1. A confirmation prompt is shown before submission, so please review carefully before proceeding.
325
+
326
+ ### Do I need to complete the prep course?
327
+
328
+ While not mandatory, it is strongly recommended.
329
+
330
+ ### What happens during Round 1?
331
+
332
+ You will select one problem statement from a set of challenges and build an RL environment using the OpenEnv framework.
333
+
334
+ ### Can I update my submission?
335
+
336
+ Yes. You may update your submission multiple times until the Round 1 deadline (8th April, 11:59 PM IST). Only the latest submission will be evaluated.
337
+
338
+ ### How are submissions evaluated?
339
+
340
+ Round 1 uses an LLM-based evaluator with structured rubrics. The finale includes LLM screening, manual review, and judging by Meta's global team. Evaluation criteria include runtime correctness, OpenEnv interface compliance, task design quality, grading logic, and overall code quality.
341
+
342
+ ### What framework must be used?
343
+
344
+ All environments must be built using the OpenEnv framework by Meta and Hugging Face.
345
+
346
+ ### What happens after Round 1?
347
+
348
+ Results will be announced on 10 April. The top 3,000 teams will advance to the Grand Finale, a 48-hour on-campus hackathon at Scaler School of Technology, Bangalore (25th–26th April).
349
+
350
+ ### What do I need to submit?
351
+
352
+ A public GitHub repository with your environment code, a `requirements.txt`, a demo script, and a README, plus a deployed Hugging Face Spaces URL showcasing your working demo.
353
+
354
+ ### Where can I get help?
355
+
356
+ Join the Discord community for announcements and support.
357
+
358
+ For account or registration issues, email: help_openenvhackathon@scaler.com
359
+
360
+ ---
361
+
362
+ ## Support
363
+
364
+ **Need help? Reach out to us:**
365
+ - Email: help_openenvhackathon@scaler.com
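The guide treats the `[START]`/`[STEP]`/`[END]` stdout lines as mandatory, and the inference script's docstring below spells out their fields. A quick format check before running the pre-submission validator can catch drift such as a missing decimal place or a capitalised boolean. This is a minimal sketch assuming exactly the layout in that docstring; the official `sample_inference.py` remains the authority, and the patterns and the `check_log_line` name are hypothetical.

```python
import re
from typing import Optional

# Hypothetical patterns derived from the STDOUT FORMAT rules in the
# inference.py docstring; verify against the official sample_inference.py.
_REWARD = r"-?\d+\.\d{2}"  # rewards are printed with exactly 2 decimals

_PATTERNS = {
    "START": re.compile(r"^\[START\] task=\S+ env=\S+ model=\S+$"),
    # action may contain spaces (json.dumps output), so match it lazily
    # up to the fixed " reward=" field.
    "STEP": re.compile(
        rf"^\[STEP\] step=\d+ action=.+? reward={_REWARD} "
        r"done=(?:true|false) error=.+$"
    ),
    # score has 3 decimals; rewards is a comma list that may be empty.
    "END": re.compile(
        r"^\[END\] success=(?:true|false) steps=\d+ score=-?\d+\.\d{3} "
        rf"rewards=(?:{_REWARD}(?:,{_REWARD})*)?$"
    ),
}


def check_log_line(line: str) -> Optional[str]:
    """Return the line type if it is well-formed, else None."""
    for kind, pattern in _PATTERNS.items():
        if pattern.match(line):
            return kind
    return None


if __name__ == "__main__":
    sample = [
        "[START] task=orientation_ward env=medchain model=openai/gpt-oss-120b:groq",
        '[STEP] step=1 action=query_erp({"table": "inventory"}) reward=0.05 done=false error=null',
        "[END] success=true steps=1 score=0.734 rewards=0.05",
        "[STEP] step=2 action=end_shift({}) reward=0.1 done=True error=null",
    ]
    for line in sample:
        print(check_log_line(line), "|", line)
```

Because `action` is built with `json.dumps`, it can contain spaces, so the `[STEP]` pattern anchors on the fixed ` reward=` field instead of splitting the line on whitespace.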
inference.py ADDED
@@ -0,0 +1,524 @@
1
+ """
2
+ MedChain Env — Inference Script
3
+ ================================
4
+ Runs all tasks sequentially and reports scores.
5
+
6
+ MANDATORY environment variables:
7
+ API_BASE_URL The API endpoint for the LLM
8
+ MODEL_NAME / MODEL The model identifier for inference
9
+ HF_TOKEN / API_KEY Your Hugging Face / API key
10
+
11
+ STDOUT FORMAT
12
+ - The script emits exactly three line types to stdout, in this order:
13
+
14
+ [START] task=<task_name> env=medchain model=<model_name>
15
+ [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
16
+ [END] success=<true|false> steps=<n> score=<0.000> rewards=<r1,r2,...,rn>
17
+
18
+ Rules:
19
+ - One [START] line at episode begin.
20
+ - One [STEP] line per step, immediately after env.step() returns.
21
+ - One [END] line after env.close(), always emitted (even on exception).
22
+ - reward and rewards are formatted to 2 decimal places; score to 3.
23
+ - done and success are lowercase booleans: true or false.
24
+ - error is the raw error string, or null if none.
25
+ - All fields on a single line with no newlines within a line.
26
+ """
27
+
28
+ import asyncio
29
+ import json
30
+ import logging
31
+ import os
32
+ import sys
33
+ from datetime import datetime
34
+ from pathlib import Path
35
+ from typing import Any, Dict, List, Optional
36
+
37
+ from openai import BadRequestError, OpenAI
38
+
39
+ sys.path.insert(0, str(Path(__file__).parent.parent))
40
+
41
+ from medchain_env import CallToolAction, MedchainEnv
42
+
43
+ # Both modes log to stdout. DEBUG additionally saves to a timestamped file under logs/
44
+ LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO").upper()
45
+
46
+ _log_fmt = logging.Formatter(
47
+ "%(asctime)s [%(levelname)s] %(message)s", datefmt="%H:%M:%S"
48
+ )
49
+ _stream_handler = logging.StreamHandler(sys.stdout)
50
+ _stream_handler.setFormatter(_log_fmt)
51
+ _handlers: list = [_stream_handler]
52
+
53
+ if LOG_LEVEL == "DEBUG":
54
+ os.makedirs("logs", exist_ok=True)
55
+ _log_filename = datetime.now().strftime("logs/inference_%Y%m%d_%H%M%S.log")
56
+ _file_handler = logging.FileHandler(_log_filename)
57
+ _file_handler.setFormatter(_log_fmt)
58
+ _handlers.append(_file_handler)
59
+ print(f"[DEBUG] Logging to file: {_log_filename}", flush=True)
60
+
61
+ logging.basicConfig(level=logging.WARNING, handlers=_handlers)
62
+ log = logging.getLogger(__name__)
63
+ log.setLevel(getattr(logging, LOG_LEVEL, logging.INFO))
64
+
65
+ API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
66
+ API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
67
+ MODEL_NAME = os.getenv("MODEL_NAME") or os.getenv("MODEL", "openai/gpt-oss-120b:groq")
68
+
69
+ TASKS = [
70
+ "orientation_ward",
71
+ "single_ward_stable",
72
+ "multi_ward_seasonal",
73
+ "hospital_network_crisis",
74
+ ]
75
+ MAX_STEPS_PER_TASK = 150
76
+ MAX_TOKENS = 6000
77
+ TEMPERATURE = 0.1
78
+ MAX_CONSECUTIVE_ERRORS = 5
79
+ SLEEP_BETWEEN_STEPS = int(os.getenv("SLEEP_BETWEEN_STEPS", "2"))
80
+ SHIFT_HISTORY_KEEP = 6
81
+
82
+ BENCHMARK = "medchain"
83
+
84
+ SYSTEM_PROMPT = """You are an experienced hospital supply chain manager operating a legacy ERP system.
85
+ Your goal is to maintain adequate medical supplies across all locations while controlling costs.
86
+
87
+ CRITICAL — ACTION BUDGET: You have a strictly limited number of actions per shift.
88
+ Budget does NOT roll over. Unspent actions are lost at end_shift().
89
+
90
+ Recommended budget allocation (highest priority first):
91
+ 1. read_inbox() — ALWAYS do this first to catch urgent alerts
92
+ 2. query_erp(table='inventory') — check current stock levels across all locations
93
+ 3. submit_po(...) — place orders for items below safety stock (PRIORITY)
94
+ 4. end_shift() — call this when budget is exhausted OR tasks are done
95
+
96
+ Query tools (query_erp expiry/pipeline, query_forecast, query_supplier) are LOW PRIORITY.
97
+ Only use them if you have budget remaining AFTER placing critical orders.
98
+
99
+ MANDATORY RULES:
100
+ - If you receive "Action budget exhausted" → call end_shift() as your VERY NEXT action.
101
+ Do NOT call any other tool. The budget cannot be restored until end_shift() is called.
102
+ - Order early: factor in lead times. If lead time is 2 days, order today to avoid stockout in 2 days.
103
+ - Expedited orders require file_justification(ticket_id=...) with a real clinical reason.
104
+ - FEFO: oldest stock consumed first — check expiry and rotate perishables proactively.
105
+ - Recalls: quarantine the recalled lot immediately, then order a replacement.
106
+ - MCI events: pre-emptive ordering beats reactive ordering. Order extra blood/critical supplies NOW.
107
+
108
+ Safety stock target: aim for at least (lead_time + 1) × daily_demand units on hand.
109
+
110
+ When calling tools, use the EXACT parameter names shown in the tool descriptions.
111
+ """
112
+
113
+
114
+ def log_start(task: str, model: str) -> None:
115
+ print(f"[START] task={task} env={BENCHMARK} model={model}", flush=True)
116
+
117
+
118
+ def log_step(
119
+ step: int, action: str, reward: float, done: bool, error: Optional[str]
120
+ ) -> None:
121
+ error_val = error if error else "null"
122
+ print(
123
+ f"[STEP] step={step} action={action} reward={reward:.2f} done={str(done).lower()} error={error_val}",
124
+ flush=True,
125
+ )
126
+
127
+
128
+ def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
129
+ rewards_str = ",".join(f"{r:.2f}" for r in rewards)
130
+ print(
131
+ f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}",
132
+ flush=True,
133
+ )
134
+
135
+
136
+ def _tools_to_openai_format(tools) -> List[dict]:
137
+ """Convert MCP tools to OpenAI function-calling format."""
138
+ openai_tools = []
139
+ for tool in tools:
140
+ properties = {}
141
+ required = []
142
+ if tool.input_schema and "properties" in tool.input_schema:
143
+ for name, schema in tool.input_schema["properties"].items():
144
+ properties[name] = {
145
+ "type": schema.get("type", "string"),
146
+ "description": schema.get("description", ""),
147
+ }
148
+ required = tool.input_schema.get("required", [])
149
+ openai_tools.append(
150
+ {
151
+ "type": "function",
152
+ "function": {
153
+ "name": tool.name,
154
+ "description": tool.description or "",
155
+ "parameters": {
156
+ "type": "object",
157
+ "properties": properties,
158
+ "required": required,
159
+ },
160
+ },
161
+ }
162
+ )
163
+ log.debug("Tool registered: %s (required=%s)", tool.name, required)
164
+ return openai_tools
165
+
166
+
167
+ def _make_shift_summary(shift_day: str, end_shift_result: str) -> str:
168
+ """Build a compact summary of a completed shift for the context window."""
169
+ lines = []
170
+ for line in (end_shift_result or "").splitlines():
171
+ stripped = line.strip()
172
+ if stripped and any(
173
+ kw in stripped
174
+ for kw in [
175
+ "DEMAND:",
176
+ "FULFILLED:",
177
+ "DELIVERIES:",
178
+ "SPEND:",
179
+ "WASTE:",
180
+ "EXPIRY:",
181
+ "STOCKOUT",
182
+ "CRITICAL",
183
+ "END OF SHIFT",
184
+ "Day ",
185
+ "Score:",
186
+ "═",
187
+ "─",
188
+ ]
189
+ ):
190
+ lines.append(stripped)
191
+ if len(lines) >= 20:
192
+ break
193
+ summary_body = "\n".join(lines) if lines else (end_shift_result or "")[:400]
194
+ return f"[SHIFT DAY {shift_day} SUMMARY]\n{summary_body}"
195
+
196
+
197
+ async def run_task_episode(
198
+ env: MedchainEnv,
199
+ client: OpenAI,
200
+ tools: List[dict],
201
+ task_name: str,
202
+ ) -> Dict[str, Any]:
203
+ """Run one episode of a task and return the result."""
204
+ tool_names = [t["function"]["name"] for t in tools]
205
+
206
+ obs = await env.reset()
207
+ obs = obs.observation
208
+ dashboard = obs.metadata.get("dashboard", "")
209
+
210
+ log_start(task=task_name, model=MODEL_NAME)
211
+ log.debug("[%s] Episode started. Tools: %s", task_name, tool_names)
212
+
213
+ chat_history: List[dict] = [
214
+ {"role": "system", "content": SYSTEM_PROMPT},
215
+ {
216
+ "role": "user",
217
+ "content": f"Your shift has started. Current dashboard:\n\n{dashboard}",
218
+ },
219
+ ]
220
+
221
+ step_count = 0
222
+ final_reward = 0.0
223
+ done = obs.done
224
+ consecutive_errors = 0
225
+ rewards: List[float] = []
226
+ past_shift_summaries: List[str] = []
227
+ current_shift_messages: List[dict] = []
228
+
229
+ while not done and step_count < MAX_STEPS_PER_TASK:
230
+ step_count += 1
231
+ log.debug(
232
+ "[%s] Step %d/%d — %d messages in context",
233
+ task_name,
234
+ step_count,
235
+ MAX_STEPS_PER_TASK,
236
+ len(chat_history),
237
+ )
238
+
239
+ try:
240
+ response = client.chat.completions.create(
241
+ model=MODEL_NAME,
242
+ messages=chat_history,
243
+ tools=tools,
244
+ tool_choice="required",
245
+ max_completion_tokens=MAX_TOKENS,
246
+ temperature=TEMPERATURE,
247
+ )
248
+ consecutive_errors = 0
249
+ except BadRequestError as e:
250
+ consecutive_errors += 1
251
+ log.warning(
252
+ "[%s] Step %d — BadRequestError (%d/%d): %s",
253
+ task_name,
254
+ step_count,
255
+ consecutive_errors,
256
+ MAX_CONSECUTIVE_ERRORS,
257
+ e,
258
+ )
259
+ if consecutive_errors >= MAX_CONSECUTIVE_ERRORS:
260
+ log.error(
261
+ "[%s] Aborting after %d consecutive errors",
262
+ task_name,
263
+ MAX_CONSECUTIVE_ERRORS,
264
+ )
265
+ break
266
+
267
+ err_msg = (
268
+ f"Your previous tool call was rejected with an error:\n{e}\n\n"
269
+ "Please retry with a valid tool call. If your budget is exhausted, call end_shift()."
270
+ )
271
+ chat_history.append({"role": "user", "content": err_msg})
272
+ current_shift_messages.append({"role": "user", "content": err_msg})
273
+ continue
274
+
275
+ message = response.choices[0].message
276
+ log.debug(
277
+ "[%s] Step %d — finish_reason=%s tool_calls=%d",
278
+ task_name,
279
+ step_count,
280
+ response.choices[0].finish_reason,
281
+ len(message.tool_calls) if message.tool_calls else 0,
282
+ )
283
+
284
+ if not message.tool_calls:
285
+ log.warning(
286
+ "[%s] Step %d — no tool_calls in response; falling back to end_shift",
287
+ task_name,
288
+ step_count,
289
+ )
290
+ tool_name = "end_shift"
291
+ tool_args = {}
292
+ tool_call_id = "fallback"
293
+ else:
294
+ tc = message.tool_calls[0]
295
+ tool_name = tc.function.name
296
+ tool_call_id = tc.id
297
+ try:
298
+ tool_args = json.loads(tc.function.arguments)
299
+ except (json.JSONDecodeError, AttributeError):
300
+ log.warning(
301
+ "[%s] Step %d — failed to parse tool arguments: %r",
302
+ task_name,
303
+ step_count,
304
+ tc.function.arguments,
305
+ )
306
+ tool_args = {}
307
+
308
+         if tool_name not in tool_names:
+             log.warning(
+                 "[%s] Step %d — unknown tool %r; falling back to end_shift",
+                 task_name,
+                 step_count,
+                 tool_name,
+             )
+             tool_name = "end_shift"
+             tool_args = {}
+
+         log.debug(
+             "[%s] Step %d — calling %s(%s)", task_name, step_count, tool_name, tool_args
+         )
+
+         assistant_msg = {
+             "role": "assistant",
+             "content": None,
+             "tool_calls": [
+                 {
+                     "id": tool_call_id,
+                     "type": "function",
+                     "function": {
+                         "name": tool_name,
+                         "arguments": json.dumps(tool_args),
+                     },
+                 }
+             ],
+         }
+         chat_history.append(assistant_msg)
+         current_shift_messages.append(assistant_msg)
+
+         action = CallToolAction(tool_name=tool_name, arguments=tool_args)
+         step_result = await env.step(action)
+         obs = step_result.observation
+         done = obs.done
+
+         result_text = obs.metadata.get("tool_result", str(obs.metadata))
+         step_reward = obs.reward or 0.0
+         step_error: Optional[str] = None
+
+         if "EPISODE COMPLETE" in (result_text or ""):
+             log.info("[%s] Step %d — episode complete detected", task_name, step_count)
+             done = True
+
+         if obs.reward is not None and obs.reward > 0:
+             final_reward = obs.reward
+
+         rewards.append(step_reward)
+         action_str = f"{tool_name}({json.dumps(tool_args)})"
+         log_step(
+             step=step_count,
+             action=action_str,
+             reward=step_reward,
+             done=done,
+             error=step_error,
+         )
+
+         tool_result_msg = {
+             "role": "tool",
+             "tool_call_id": tool_call_id,
+             "content": result_text[:2000] if result_text else "OK",
+         }
+         chat_history.append(tool_result_msg)
+         current_shift_messages.append(tool_result_msg)
+
+         # Budget exhausted — inject directive and skip sleep
+         if "Action budget exhausted" in (result_text or ""):
+             log.info(
+                 "[%s] Step %d — budget exhausted; injecting end_shift directive",
+                 task_name,
+                 step_count,
+             )
+             directive = (
+                 "SYSTEM ALERT: Your action budget for this shift is fully exhausted. "
+                 "You MUST call end_shift() as your very next action. "
+                 "Every other tool call will fail until you do."
+             )
+             chat_history.append({"role": "user", "content": directive})
+             current_shift_messages.append({"role": "user", "content": directive})
+             continue
+
+         if SLEEP_BETWEEN_STEPS > 0:
+             await asyncio.sleep(SLEEP_BETWEEN_STEPS)
+
+         # Shift ended — summarise and prune context, then set up next shift
+         if (
+             tool_name == "end_shift"
+             and "END OF SHIFT" in (result_text or "")
+             and not done
+         ):
+             shift_day = "?"
+             for part in (result_text or "").split():
+                 if part.isdigit():
+                     shift_day = part
+                     break
+
+             shift_summary = _make_shift_summary(shift_day, result_text or "")
+             past_shift_summaries.append(shift_summary)
+             log.info(
+                 "[%s] Step %d — shift %s ended; pruning context (%d summaries)",
+                 task_name,
+                 step_count,
+                 shift_day,
+                 len(past_shift_summaries),
+             )
+
+             summaries_msg = {
+                 "role": "user",
+                 "content": "COMPLETED SHIFT SUMMARIES:\n\n"
+                 + "\n\n".join(past_shift_summaries),
+             }
+             trimmed = (
+                 current_shift_messages[-SHIFT_HISTORY_KEEP:]
+                 if len(current_shift_messages) > SHIFT_HISTORY_KEEP
+                 else list(current_shift_messages)
+             )
+             # Strip orphaned leading tool-response messages to avoid API errors
+             while trimmed and trimmed[0].get("role") == "tool":
+                 log.debug(
+                     "[%s] Dropping orphaned leading tool msg (tool_call_id=%s)",
+                     task_name,
+                     trimmed[0].get("tool_call_id"),
+                 )
+                 trimmed = trimmed[1:]
+
+             chat_history = (
+                 [
+                     {"role": "system", "content": SYSTEM_PROMPT},
+                     summaries_msg,
+                 ]
+                 + trimmed
+                 + [
+                     {
+                         "role": "user",
+                         "content": "Your next shift has begun. The dashboard is shown above in the last tool result. "
+                         "Continue managing the supply chain.",
+                     },
+                 ]
+             )
+             current_shift_messages = []
+
+     log.info(
+         "[%s] Episode finished. steps=%d done=%s final_reward=%.4f",
+         task_name,
+         step_count,
+         done,
+         final_reward,
+     )
+     return {
+         "task": task_name,
+         "reward": final_reward,
+         "steps": step_count,
+         "done": done,
+         "rewards": rewards,
+     }
+
+
+ async def async_main() -> None:
+     if not API_KEY:
+         raise SystemExit("HF_TOKEN or API_KEY must be set.")
+     if not MODEL_NAME:
+         raise SystemExit("MODEL_NAME or MODEL must be set.")
+
+     log.info("Starting. API_BASE_URL=%s MODEL_NAME=%s", API_BASE_URL, MODEL_NAME)
+
+     client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+     results = []
+
+     for task_name in TASKS:
+         log.info("Launching task: %s", task_name)
+         env = await MedchainEnv.from_docker_image(
+             os.getenv("LOCAL_IMAGE_NAME", "medchain_env-env:latest"),
+             env_vars={"MEDCHAIN_TASK": task_name},
+         )
+         final_reward = 0.0
+         success = False
+         steps = 0
+         step_rewards: List[float] = []
+         try:
+             mcp_tools = await env.list_tools()
+             tools = _tools_to_openai_format(mcp_tools)
+             log.info("[%s] %d tools discovered", task_name, len(tools))
+
+             result = await run_task_episode(env, client, tools, task_name)
+             results.append(result)
+             final_reward = result["reward"]
+             steps = result["steps"]
+             success = result["done"]
+             step_rewards = result["rewards"]
+             log.info(
+                 "[%s] Task complete: reward=%.4f steps=%d",
+                 task_name,
+                 final_reward,
+                 steps,
+             )
+         except Exception as e:
+             log.error("[%s] Task failed with exception: %s", task_name, e)
+         finally:
+             try:
+                 await env.close()
+             except Exception as e:
+                 log.error("[%s] env.close() failed: %s", task_name, e)
+             log_end(
+                 success=success, steps=steps, score=final_reward, rewards=step_rewards
+             )
+
+     if results:
+         avg_reward = sum(r["reward"] for r in results) / len(results)
+         log.info("All tasks complete. avg_reward=%.4f", avg_reward)
+
+
+ def main() -> None:
+     asyncio.run(async_main())
+
+
+ if __name__ == "__main__":
+     main()
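When a shift ends, the branch in `run_task_episode` above rebuilds the chat history from four parts: the system prompt, one message holding all past shift summaries, the tail of the finished shift, and a "next shift" prompt. It also drops any `tool` messages left at the front of the tail. The following is a standalone sketch of that pruning step. `prune_shift_context` is a hypothetical helper name. The `keep` argument stands in for `SHIFT_HISTORY_KEEP`, whose value is not shown in this diff. The next-shift prompt is a shortened version of the one in `inference.py`.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def prune_shift_context(
    system_prompt: str,
    summaries: List[str],
    shift_messages: List[Message],
    keep: int,
) -> List[Message]:
    """Rebuild the chat history once a shift ends (hypothetical helper)."""
    # Keep only the tail of the finished shift; slicing already returns a copy,
    # and a slice longer than the list simply returns the whole list.
    trimmed = shift_messages[-keep:] if keep > 0 else []
    # A "tool" message must follow the assistant message that issued its
    # tool_call; drop orphans at the front so the API accepts the history.
    while trimmed and trimmed[0].get("role") == "tool":
        trimmed = trimmed[1:]
    summaries_msg: Message = {
        "role": "user",
        "content": "COMPLETED SHIFT SUMMARIES:\n\n" + "\n\n".join(summaries),
    }
    # Shortened version of the "next shift" prompt used in inference.py.
    next_shift_msg: Message = {
        "role": "user",
        "content": "Your next shift has begun. Continue managing the supply chain.",
    }
    return (
        [{"role": "system", "content": system_prompt}, summaries_msg]
        + trimmed
        + [next_shift_msg]
    )


if __name__ == "__main__":
    history = [
        {"role": "assistant", "content": None, "tool_calls": [{"id": "c1"}]},
        {"role": "tool", "tool_call_id": "c1", "content": "OK"},
        {"role": "assistant", "content": None, "tool_calls": [{"id": "c2"}]},
        {"role": "tool", "tool_call_id": "c2", "content": "END OF SHIFT"},
    ]
    pruned = prune_shift_context("SYS", ["Day 1: no stockouts"], history, keep=3)
    print([m["role"] for m in pruned])
    # ['system', 'user', 'assistant', 'tool', 'user']
```

With `keep=3`, the tail begins with the `tool` reply to `c1`. That reply is dropped because its assistant call was trimmed away, so only the complete `c2` call/response pair survives.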
logs/inference_20260403_015029.log ADDED
@@ -0,0 +1,79 @@
+ 01:50:29 [DEBUG] Using selector: EpollSelector
+ 01:50:29 [INFO] Starting. API_BASE_URL=https://router.huggingface.co/v1 MODEL_NAME=openai/gpt-oss-20b:groq
+ 01:50:29 [INFO] Launching task: orientation_ward
+ 01:50:30 [DEBUG] Starting new HTTP connection (1): localhost:60093
+ 01:50:30 [DEBUG] Starting new HTTP connection (1): localhost:60093
+ 01:50:31 [DEBUG] Starting new HTTP connection (1): localhost:60093
+ 01:50:31 [DEBUG] Starting new HTTP connection (1): localhost:60093
+ 01:50:32 [DEBUG] Starting new HTTP connection (1): localhost:60093
+ 01:50:32 [DEBUG] Starting new HTTP connection (1): localhost:60093
+ 01:50:33 [DEBUG] Starting new HTTP connection (1): localhost:60093
+ 01:50:33 [DEBUG] Starting new HTTP connection (1): localhost:60093
+ 01:50:34 [DEBUG] Starting new HTTP connection (1): localhost:60093
+ 01:50:34 [DEBUG] http://localhost:60093 "GET /health HTTP/1.1" 200 20
+ 01:50:34 [DEBUG] = connection is CONNECTING
+ 01:50:34 [DEBUG] > GET /ws HTTP/1.1
+ 01:50:34 [DEBUG] > Host: localhost:60093
+ 01:50:34 [DEBUG] > Upgrade: websocket
+ 01:50:34 [DEBUG] > Connection: Upgrade
+ 01:50:34 [DEBUG] > Sec-WebSocket-Key: +R8/wDU495sD6dSvTQfJJA==
+ 01:50:34 [DEBUG] > Sec-WebSocket-Version: 13
+ 01:50:34 [DEBUG] > Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
+ 01:50:34 [DEBUG] > User-Agent: Python/3.12 websockets/16.0
+ 01:50:34 [DEBUG] < HTTP/1.1 101 Switching Protocols
+ 01:50:34 [DEBUG] < Upgrade: websocket
+ 01:50:34 [DEBUG] < Connection: Upgrade
+ 01:50:34 [DEBUG] < Sec-WebSocket-Accept: dLM5YlCPG5KeH5IniKYMatgH6Ao=
+ 01:50:34 [DEBUG] < Sec-WebSocket-Extensions: permessage-deflate
+ 01:50:34 [DEBUG] < date: Thu, 02 Apr 2026 20:20:33 GMT
+ 01:50:34 [DEBUG] < server: uvicorn
+ 01:50:34 [DEBUG] = connection is OPEN
+ 01:50:34 [DEBUG] > TEXT '{"type": "step", "data": {"type": "list_tools"}}' [48 bytes]
+ 01:50:34 [DEBUG] < TEXT '{"type":"observation","data":{"observation":{"t...rd":null,"done":false}}' [5881 bytes]
+ 01:50:34 [DEBUG] Tool registered: read_inbox (required=[])
+ 01:50:34 [DEBUG] Tool registered: query_erp (required=['table'])
+ 01:50:34 [DEBUG] Tool registered: query_supplier (required=['supplier_id'])
+ 01:50:34 [DEBUG] Tool registered: query_forecast (required=['product_id', 'location_id'])
+ 01:50:34 [DEBUG] Tool registered: submit_po (required=['supplier_id', 'product_id', 'destination_id', 'quantity'])
+ 01:50:34 [DEBUG] Tool registered: transfer (required=['from_location_id', 'to_location_id', 'product_id', 'quantity'])
+ 01:50:34 [DEBUG] Tool registered: quarantine_lot (required=['location_id', 'sku', 'lot_id'])
+ 01:50:34 [DEBUG] Tool registered: file_justification (required=['ticket_id', 'reason'])
+ 01:50:34 [DEBUG] Tool registered: end_shift (required=[])
+ 01:50:34 [INFO] [orientation_ward] 9 tools discovered
+ 01:50:34 [DEBUG] > TEXT '{"type": "reset", "data": {}}' [29 bytes]
+ 01:50:34 [DEBUG] < TEXT '{"type":"observation","data":{"observation":{"d...ard":0.0,"done":false}}' [1985 bytes]
+ 01:50:34 [INFO] [orientation_ward] Episode started. Tools: ['read_inbox', 'query_erp', 'query_supplier', 'query_forecast', 'submit_po', 'transfer', 'quarantine_lot', 'file_justification', 'end_shift']
+ 01:50:34 [INFO] [orientation_ward] Step 1/150 — 2 messages in context
+ 01:50:34 [DEBUG] Request options: {'method': 'post', 'url': '/chat/completions', 'files': None, 'idempotency_key': 'stainless-python-retry-120d6e65-e8bc-4e74-9bc2-54608097b49a', 'content': None, 'json_data': {'messages': [{'role': 'system', 'content': 'You are an experienced hospital supply chain manager operating a legacy ERP system.\nYour goal is to maintain adequate medical supplies across all locations while controlling costs.\n\nCRITICAL — ACTION BUDGET: You have a strictly limited number of actions per shift.\nBudget does NOT roll over. Unspent actions are lost at end_shift().\n\nRecommended budget allocation (highest priority first):\n 1. read_inbox() — ALWAYS do this first to catch urgent alerts\n 2. query_erp(table=\'inventory\') — check current stock levels across all locations\n 3. submit_po(...) — place orders for items below safety stock (PRIORITY)\n 4. end_shift() — call this when budget is exhausted OR tasks are done\n\nQuery tools (query_erp expiry/pipeline, query_forecast, query_supplier) are LOW PRIORITY.\nOnly use them if you have budget remaining AFTER placing critical orders.\n\nMANDATORY RULES:\n- If you receive "Action budget exhausted" → call end_shift() as your VERY NEXT action.\n Do NOT call any other tool. The budget cannot be restored until end_shift() is called.\n- Order early: factor in lead times. If lead time is 2 days, order today to avoid stockout in 2 days.\n- Expedited orders require file_justification(ticket_id=...) with a real clinical reason.\n- FEFO: oldest stock consumed first — check expiry and rotate perishables proactively.\n- Recalls: quarantine the recalled lot immediately, then order a replacement.\n- MCI events: pre-emptive ordering beats reactive ordering. 
Order extra blood/critical supplies NOW.\n\nSafety stock target: aim for at least (lead_time + 1) × daily_demand units on hand.\n\nWhen calling tools, use the EXACT parameter names shown in the tool descriptions.\n'}, {'role': 'user', 'content': 'Your shift has started. Current dashboard:\n\n╔════════════════════════════════════════════════════════════════════╗\n║ MEDSUPPLY ERP v2.1 — CENTRAL HOSPITAL NETWORK ║\n║ Task: orientation_ward | Shift: Day 1 of 2 ║\n║ Actions remaining: 5/5 ║\n║ Budget used: $0 / $5,000 ║\n╠════════════════════════════════════════════════════════════════════╣\n║ [!] COMMS PAGER: 1 unread message(s) ║\n║ [·] INVDB: No expiry warnings ║\n║ [·] PROCURENET: 0 order(s) in transit ║\n╠════════════════════════════════════════════════════════════════════╣\n║ SUPPLIERS (use exact IDs below): ║\n║ MEDLINE → GLOVE-001, SYR-10, MASK-001 ║\n╚════════════════════════════════════════════════════════════════════╝\nAwaiting input.\nAvailable tools: read_inbox, query_erp, query_supplier, query_forecast, submit_po, transfer, quarantine_lot, file_justification, end_shift'}], 'model': 'openai/gpt-oss-20b:groq', 'max_completion_tokens': 6000, 'temperature': 0.1, 'tool_choice': 'required', 'tools': [{'type': 'function', 'function': {'name': 'read_inbox', 'description': "Read messages from the COMMS PAGER inbox.\n\nArgs:\n filter: Message filter — 'unread' (default), 'all', or 'flagged'\n\nReturns:\n Formatted inbox messages as raw text", 'parameters': {'type': 'object', 'properties': {'filter': {'type': 'string', 'description': ''}}, 'required': []}}}, {'type': 'function', 'function': {'name': 'query_erp', 'description': "Query the legacy ERP database.\n\nArgs:\n table: Table to query — 'inventory', 'expiry', 'pipeline_orders', or 'demand_history'\n location: Location ID or 'all'. E.g. 'ward_general', 'ward_icu', 'hospital_a'\n sku: Product SKU or 'all'. E.g. 
'B-001', 'IV-500', 'GLOVE-001'\n\nReturns:\n ASCII table with query results (legacy ERP format)", 'parameters': {'type': 'object', 'properties': {'table': {'type': 'string', 'description': ''}, 'location': {'type': 'string', 'description': ''}, 'sku': {'type': 'string', 'description': ''}}, 'required': ['table']}}}, {'type': 'function', 'function': {'name': 'query_supplier', 'description': 'Query supplier information including current lead times and disruptions.\n\nArgs:\n supplier_id: Supplier identifier. Check the dashboard for valid supplier IDs.\n\nReturns:\n Supplier status text including lead times and any active disruptions', 'parameters': {'type': 'object', 'properties': {'supplier_id': {'type': 'string', 'description': ''}}, 'required': ['supplier_id']}}}, {'type': 'function', 'function': {'name': 'query_forecast', 'description': "Get demand forecast for a product at a location.\n\nArgs:\n product_id: Product SKU to forecast. Use query_erp(table='inventory') to see available SKUs.\n location_id: Location to forecast for. Use query_erp(table='inventory') to see valid location IDs.\n horizon_days: Forecast horizon in days (1-21, default 7)\n\nReturns:\n Forecasted daily demand table", 'parameters': {'type': 'object', 'properties': {'product_id': {'type': 'string', 'description': ''}, 'location_id': {'type': 'string', 'description': ''}, 'horizon_days': {'type': 'integer', 'description': ''}}, 'required': ['product_id', 'location_id']}}}, {'type': 'function', 'function': {'name': 'submit_po', 'description': "Submit a purchase order to a supplier.\n\nArgs:\n supplier_id: Supplier to order from. Check the dashboard for valid supplier IDs.\n product_id: Product SKU to order. Use query_erp(table='inventory') to see available SKUs.\n destination_id: Delivery location. 
Use query_erp(table='inventory') to see valid location IDs.\n quantity: Number of units to order (must be positive)\n priority: 'standard' (default) or 'expedited' (+50% cost, -2 day lead time; requires justification)\n\nReturns:\n Confirmation with PO ID and ETA, or error if budget/validation fails.\n For expedited orders: returns BUDGET_OVERRIDE_REQUIRED with a ticket ID.\n Use file_justification(ticket_id=...) to proceed.", 'parameters': {'type': 'object', 'properties': {'supplier_id': {'type': 'string', 'description': ''}, 'product_id': {'type': 'string', 'description': ''}, 'destination_id': {'type': 'string', 'description': ''}, 'quantity': {'type': 'integer', 'description': ''}, 'priority': {'type': 'string', 'description': ''}}, 'required': ['supplier_id', 'product_id', 'destination_id', 'quantity']}}}, {'type': 'function', 'function': {'name': 'transfer', 'description': "Transfer inventory between locations (small handling fee).\n\nArgs:\n from_location_id: Source location. Use query_erp(table='inventory') to see valid location IDs.\n to_location_id: Destination location. Use query_erp(table='inventory') to see valid location IDs.\n product_id: Product SKU to transfer. Use query_erp(table='inventory') to see available SKUs.\n quantity: Units to transfer (must not exceed available stock at source)\n\nReturns:\n Confirmation or error (insufficient stock, capacity exceeded)", 'parameters': {'type': 'object', 'properties': {'from_location_id': {'type': 'string', 'description': ''}, 'to_location_id': {'type': 'string', 'description': ''}, 'product_id': {'type': 'string', 'description': ''}, 'quantity': {'type': 'integer', 'description': ''}}, 'required': ['from_location_id', 'to_location_id', 'product_id', 'quantity']}}}, {'type': 'function', 'function': {'name': 'quarantine_lot', 'description': "Quarantine a specific inventory lot (e.g. 
for product recalls or cold chain breaches).\nQuarantined lots are excluded from demand fulfillment.\n\nArgs:\n location_id: Location where the lot is stored\n sku: Product SKU of the lot\n lot_id: Lot identifier (from inventory query). Use 'all' to quarantine all lots of this SKU at this location.\n\nReturns:\n Confirmation with quarantine details and disposal ticket ID", 'parameters': {'type': 'object', 'properties': {'location_id': {'type': 'string', 'description': ''}, 'sku': {'type': 'string', 'description': ''}, 'lot_id': {'type': 'string', 'description': ''}}, 'required': ['location_id', 'sku', 'lot_id']}}}, {'type': 'function', 'function': {'name': 'file_justification', 'description': "File a budget override justification to proceed with an expedited order.\n\nRequired after submit_po returns BUDGET_OVERRIDE_REQUIRED.\nThe justification is audited by Finance — it must reference the current clinical situation.\nA false or incoherent justification is flagged and results in a scoring penalty.\n\nArgs:\n ticket_id: The ticket ID from the BUDGET_OVERRIDE_REQUIRED error (e.g. 'BOT-0002')\n reason: Free-text justification. Must clearly reference the reason for urgency.\n\nReturns:\n OK confirmation and PO submission details, or FLAGGED audit warning", 'parameters': {'type': 'object', 'properties': {'ticket_id': {'type': 'string', 'description': ''}, 'reason': {'type': 'string', 'description': ''}}, 'required': ['ticket_id', 'reason']}}}, {'type': 'function', 'function': {'name': 'end_shift', 'description': 'End the current shift and advance the simulation by one day.\n\nCommits all pending decisions. Simulates demand, deliveries, and expiry for the day.\nResets your action budget for the next shift.\nUnspent actions are lost — no rollover.\n\nReturns:\n Day summary report + next shift dashboard', 'parameters': {'type': 'object', 'properties': {}, 'required': []}}}]}}
+ 01:50:34 [DEBUG] Sending HTTP Request: POST https://router.huggingface.co/v1/chat/completions
+ 01:50:34 [DEBUG] connect_tcp.started host='router.huggingface.co' port=443 local_address=None timeout=5.0 socket_options=None
+ 01:50:34 [DEBUG] connect_tcp.complete return_value=<httpcore._backends.sync.SyncStream object at 0x747917eb6240>
+ 01:50:34 [DEBUG] start_tls.started ssl_context=<ssl.SSLContext object at 0x74791cb135d0> server_hostname='router.huggingface.co' timeout=5.0
+ 01:50:34 [DEBUG] start_tls.complete return_value=<httpcore._backends.sync.SyncStream object at 0x747917eb4500>
+ 01:50:34 [DEBUG] send_request_headers.started request=<Request [b'POST']>
+ 01:50:34 [DEBUG] send_request_headers.complete
+ 01:50:34 [DEBUG] send_request_body.started request=<Request [b'POST']>
+ 01:50:34 [DEBUG] send_request_body.complete
+ 01:50:34 [DEBUG] receive_response_headers.started request=<Request [b'POST']>
+ 01:50:35 [DEBUG] receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Content-Type', b'application/json'), (b'Transfer-Encoding', b'chunked'), (b'Connection', b'keep-alive'), (b'Date', b'Thu, 02 Apr 2026 20:20:35 GMT'), (b'x-ratelimit-reset-requests', b'60ms'), (b'x-ratelimit-reset-tokens', b'227ms'), (b'X-Powered-By', b'huggingface-moon'), (b'x-request-id', b'req_01kn7xnnw2et7b44t68w493sbh'), (b'cross-origin-opener-policy', b'same-origin'), (b'Referrer-Policy', b'strict-origin-when-cross-origin'), (b'vary', b'Origin'), (b'Access-Control-Allow-Origin', b'*'), (b'Access-Control-Expose-Headers', b'X-Repo-Commit,X-Request-Id,X-Error-Code,X-Error-Message,X-Total-Count,ETag,Link,Accept-Ranges,Content-Range,X-Linked-Size,X-Linked-ETag,X-Xet-Hash'), (b'X-Robots-Tag', b'none'), (b'x-inference-provider', b'groq'), (b'cache-control', b'private, max-age=0, no-store, no-cache, must-revalidate'), (b'cf-cache-status', b'DYNAMIC'), (b'cf-ray', b'9e6288f8ae430833-IAD'), (b'server', b'cloudflare'), (b'set-cookie', b'__cf_bm=3TErgywCHmJ_bGgyJphVZdQpluX.hxyLeoVsm8_e_CY-1775161235.3111424-1.0.1.1-U.ik6GSM5toZqCr9TaLOgDqgUJnOiV4SOllSRV14nfHpefPesAChv35s_asXm1_5rjY.rCEVP99ehFWf4o7xE4KUcikmwkYAGRHlPBg.mqYf.csPV611ScNfRGbN3h8G; HttpOnly; Secure; Path=/; Domain=groq.com; Expires=Thu, 02 Apr 2026 20:50:35 GMT'), (b'strict-transport-security', b'max-age=15552000'), (b'x-groq-region', b'msp'), (b'x-ratelimit-limit-requests', b'1440000'), (b'x-ratelimit-limit-tokens', b'750000'), (b'x-ratelimit-remaining-requests', b'1439999'), (b'x-ratelimit-remaining-tokens', b'747158'), (b'X-Cache', b'Miss from cloudfront'), (b'Via', b'1.1 e6a37c61d86d6e0bcdada5b6b948004c.cloudfront.net (CloudFront)'), (b'X-Amz-Cf-Pop', b'BLR50-P4'), (b'X-Amz-Cf-Id', b'E630iyt9rgMwsBGL2qEjR0o9NYKo5FhjdWsUglT3OhS6BLdKRxVETQ==')])
+ 01:50:35 [INFO] HTTP Request: POST https://router.huggingface.co/v1/chat/completions "HTTP/1.1 200 OK"
+ 01:50:35 [DEBUG] receive_response_body.started request=<Request [b'POST']>
+ 01:50:35 [DEBUG] receive_response_body.complete
+ 01:50:35 [DEBUG] response_closed.started
+ 01:50:35 [DEBUG] response_closed.complete
+ 01:50:35 [DEBUG] HTTP Response: POST https://router.huggingface.co/v1/chat/completions "200 OK" Headers({'content-type': 'application/json', 'transfer-encoding': 'chunked', 'connection': 'keep-alive', 'date': 'Thu, 02 Apr 2026 20:20:35 GMT', 'x-ratelimit-reset-requests': '60ms', 'x-ratelimit-reset-tokens': '227ms', 'x-powered-by': 'huggingface-moon', 'x-request-id': 'req_01kn7xnnw2et7b44t68w493sbh', 'cross-origin-opener-policy': 'same-origin', 'referrer-policy': 'strict-origin-when-cross-origin', 'vary': 'Origin', 'access-control-allow-origin': '*', 'access-control-expose-headers': 'X-Repo-Commit,X-Request-Id,X-Error-Code,X-Error-Message,X-Total-Count,ETag,Link,Accept-Ranges,Content-Range,X-Linked-Size,X-Linked-ETag,X-Xet-Hash', 'x-robots-tag': 'none', 'x-inference-provider': 'groq', 'cache-control': 'private, max-age=0, no-store, no-cache, must-revalidate', 'cf-cache-status': 'DYNAMIC', 'cf-ray': '9e6288f8ae430833-IAD', 'server': 'cloudflare', 'set-cookie': '__cf_bm=3TErgywCHmJ_bGgyJphVZdQpluX.hxyLeoVsm8_e_CY-1775161235.3111424-1.0.1.1-U.ik6GSM5toZqCr9TaLOgDqgUJnOiV4SOllSRV14nfHpefPesAChv35s_asXm1_5rjY.rCEVP99ehFWf4o7xE4KUcikmwkYAGRHlPBg.mqYf.csPV611ScNfRGbN3h8G; HttpOnly; Secure; Path=/; Domain=groq.com; Expires=Thu, 02 Apr 2026 20:50:35 GMT', 'strict-transport-security': 'max-age=15552000', 'x-groq-region': 'msp', 'x-ratelimit-limit-requests': '1440000', 'x-ratelimit-limit-tokens': '750000', 'x-ratelimit-remaining-requests': '1439999', 'x-ratelimit-remaining-tokens': '747158', 'x-cache': 'Miss from cloudfront', 'via': '1.1 e6a37c61d86d6e0bcdada5b6b948004c.cloudfront.net (CloudFront)', 'x-amz-cf-pop': 'BLR50-P4', 'x-amz-cf-id': 'E630iyt9rgMwsBGL2qEjR0o9NYKo5FhjdWsUglT3OhS6BLdKRxVETQ=='})
+ 01:50:35 [DEBUG] request_id: req_01kn7xnnw2et7b44t68w493sbh
+ 01:50:35 [DEBUG] [orientation_ward] Step 1 — finish_reason=tool_calls tool_calls=1
+ 01:50:35 [DEBUG] [orientation_ward] Step 1 — calling read_inbox({'filter': 'unread'})
+ 01:50:35 [DEBUG] > TEXT '{"type": "step", "data": {"type": "call_tool", ... {"filter": "unread"}}}' [109 bytes]
+ 01:50:35 [DEBUG] < TEXT '{"type":"observation","data":{"observation":{"t...rd":0.01,"done":false}}' [530 bytes]
+ 01:50:36 [DEBUG] > TEXT '{"type": "close"}' [17 bytes]
+ 01:50:36 [DEBUG] > CLOSE 1000 (OK) [2 bytes]
+ 01:50:36 [DEBUG] = connection is CLOSING
+ 01:50:36 [DEBUG] < CLOSE 1000 (OK) [2 bytes]
+ 01:50:36 [DEBUG] < EOF
+ 01:50:36 [DEBUG] > EOF
+ 01:50:36 [DEBUG] = connection is CLOSED
+ 01:50:36 [DEBUG] x half-closing TCP connection
+ 01:50:37 [DEBUG] close.started
+ 01:50:37 [DEBUG] close.complete
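The log above records each tool's required parameters as it is registered. The chat-completions request body shows the resulting `tools` list. Each entry has the form `{"type": "function", "function": {...}}`, and its `parameters` field is a JSON Schema object in which every property description is an empty string. The body of `_tools_to_openai_format` is not part of this diff, so the following is only a hypothetical sketch of such a conversion. It assumes each MCP tool arrives as a plain dict with `name`, `description`, and an `input_schema` JSON Schema. The objects that `env.list_tools()` actually returns may expose these fields differently.

```python
from typing import Any, Dict, List


def tools_to_openai_format(mcp_tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert MCP-style tool dicts into OpenAI function-tool specs (sketch)."""
    converted: List[Dict[str, Any]] = []
    for tool in mcp_tools:
        # Assumed input shape: {"name", "description", "input_schema"}.
        schema = tool.get("input_schema") or {}
        properties = {
            name: {
                # Fall back to "string" when the schema omits a type.
                "type": prop.get("type", "string"),
                "description": prop.get("description", ""),
            }
            for name, prop in (schema.get("properties") or {}).items()
        }
        converted.append(
            {
                "type": "function",
                "function": {
                    "name": tool["name"],
                    "description": tool.get("description", ""),
                    "parameters": {
                        "type": "object",
                        "properties": properties,
                        "required": list(schema.get("required", [])),
                    },
                },
            }
        )
    return converted


if __name__ == "__main__":
    specs = tools_to_openai_format(
        [
            {
                "name": "query_erp",
                "description": "Query the legacy ERP database.",
                "input_schema": {
                    "type": "object",
                    "properties": {"table": {"type": "string"}},
                    "required": ["table"],
                },
            }
        ]
    )
    print(specs[0]["function"]["parameters"]["required"])  # ['table']
```

A tool with no `input_schema` (such as `end_shift`, which takes no arguments) still gets an empty object schema. This matches the `{'properties': {}, 'required': []}` entry visible in the logged request.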
logs/inference_20260403_015146.log ADDED
@@ -0,0 +1,118 @@
+ 01:51:46 [DEBUG] Using selector: EpollSelector
+ 01:51:46 [INFO] Starting. API_BASE_URL=https://router.huggingface.co/v1 MODEL_NAME=openai/gpt-oss-20b:groq
+ 01:51:46 [INFO] Launching task: orientation_ward
+ 01:51:48 [DEBUG] Starting new HTTP connection (1): localhost:56401
+ 01:51:48 [DEBUG] Starting new HTTP connection (1): localhost:56401
+ 01:51:49 [DEBUG] Starting new HTTP connection (1): localhost:56401
+ 01:51:49 [DEBUG] Starting new HTTP connection (1): localhost:56401
+ 01:51:50 [DEBUG] Starting new HTTP connection (1): localhost:56401
+ 01:51:50 [DEBUG] Starting new HTTP connection (1): localhost:56401
+ 01:51:51 [DEBUG] Starting new HTTP connection (1): localhost:56401
+ 01:51:51 [DEBUG] Starting new HTTP connection (1): localhost:56401
+ 01:51:51 [DEBUG] http://localhost:56401 "GET /health HTTP/1.1" 200 20
+ 01:51:51 [DEBUG] = connection is CONNECTING
+ 01:51:51 [DEBUG] > GET /ws HTTP/1.1
+ 01:51:51 [DEBUG] > Host: localhost:56401
+ 01:51:51 [DEBUG] > Upgrade: websocket
+ 01:51:51 [DEBUG] > Connection: Upgrade
+ 01:51:51 [DEBUG] > Sec-WebSocket-Key: UPQpJ2hOA/eVbLZYiexy3A==
+ 01:51:51 [DEBUG] > Sec-WebSocket-Version: 13
+ 01:51:51 [DEBUG] > Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
+ 01:51:51 [DEBUG] > User-Agent: Python/3.12 websockets/16.0
+ 01:51:51 [DEBUG] < HTTP/1.1 101 Switching Protocols
+ 01:51:51 [DEBUG] < Upgrade: websocket
+ 01:51:51 [DEBUG] < Connection: Upgrade
+ 01:51:51 [DEBUG] < Sec-WebSocket-Accept: uozQiiC2n5BYDgzKNQ0WuU0Lrhs=
+ 01:51:51 [DEBUG] < Sec-WebSocket-Extensions: permessage-deflate
+ 01:51:51 [DEBUG] < date: Thu, 02 Apr 2026 20:21:51 GMT
+ 01:51:51 [DEBUG] < server: uvicorn
+ 01:51:51 [DEBUG] = connection is OPEN
+ 01:51:51 [DEBUG] > TEXT '{"type": "step", "data": {"type": "list_tools"}}' [48 bytes]
+ 01:51:51 [DEBUG] < TEXT '{"type":"observation","data":{"observation":{"t...rd":null,"done":false}}' [5881 bytes]
+ 01:51:51 [DEBUG] Tool registered: read_inbox (required=[])
+ 01:51:51 [DEBUG] Tool registered: query_erp (required=['table'])
+ 01:51:51 [DEBUG] Tool registered: query_supplier (required=['supplier_id'])
+ 01:51:51 [DEBUG] Tool registered: query_forecast (required=['product_id', 'location_id'])
+ 01:51:51 [DEBUG] Tool registered: submit_po (required=['supplier_id', 'product_id', 'destination_id', 'quantity'])
+ 01:51:51 [DEBUG] Tool registered: transfer (required=['from_location_id', 'to_location_id', 'product_id', 'quantity'])
+ 01:51:51 [DEBUG] Tool registered: quarantine_lot (required=['location_id', 'sku', 'lot_id'])
+ 01:51:51 [DEBUG] Tool registered: file_justification (required=['ticket_id', 'reason'])
+ 01:51:51 [DEBUG] Tool registered: end_shift (required=[])
+ 01:51:51 [INFO] [orientation_ward] 9 tools discovered
+ 01:51:51 [DEBUG] > TEXT '{"type": "reset", "data": {}}' [29 bytes]
+ 01:51:51 [DEBUG] < TEXT '{"type":"observation","data":{"observation":{"d...ard":0.0,"done":false}}' [1985 bytes]
+ 01:51:51 [INFO] [orientation_ward] Episode started. Tools: ['read_inbox', 'query_erp', 'query_supplier', 'query_forecast', 'submit_po', 'transfer', 'quarantine_lot', 'file_justification', 'end_shift']
+ 01:51:51 [INFO] [orientation_ward] Step 1/150 — 2 messages in context
+ 01:51:51 [DEBUG] Request options: {'method': 'post', 'url': '/chat/completions', 'files': None, 'idempotency_key': 'stainless-python-retry-e3ff487d-994c-4de9-bab8-4d1e8b2b2359', 'content': None, 'json_data': {'messages': [{'role': 'system', 'content': 'You are an experienced hospital supply chain manager operating a legacy ERP system.\nYour goal is to maintain adequate medical supplies across all locations while controlling costs.\n\nCRITICAL — ACTION BUDGET: You have a strictly limited number of actions per shift.\nBudget does NOT roll over. Unspent actions are lost at end_shift().\n\nRecommended budget allocation (highest priority first):\n 1. read_inbox() — ALWAYS do this first to catch urgent alerts\n 2. query_erp(table=\'inventory\') — check current stock levels across all locations\n 3. submit_po(...) — place orders for items below safety stock (PRIORITY)\n 4. end_shift() — call this when budget is exhausted OR tasks are done\n\nQuery tools (query_erp expiry/pipeline, query_forecast, query_supplier) are LOW PRIORITY.\nOnly use them if you have budget remaining AFTER placing critical orders.\n\nMANDATORY RULES:\n- If you receive "Action budget exhausted" → call end_shift() as your VERY NEXT action.\n Do NOT call any other tool. The budget cannot be restored until end_shift() is called.\n- Order early: factor in lead times. If lead time is 2 days, order today to avoid stockout in 2 days.\n- Expedited orders require file_justification(ticket_id=...) with a real clinical reason.\n- FEFO: oldest stock consumed first — check expiry and rotate perishables proactively.\n- Recalls: quarantine the recalled lot immediately, then order a replacement.\n- MCI events: pre-emptive ordering beats reactive ordering. 
Order extra blood/critical supplies NOW.\n\nSafety stock target: aim for at least (lead_time + 1) × daily_demand units on hand.\n\nWhen calling tools, use the EXACT parameter names shown in the tool descriptions.\n'}, {'role': 'user', 'content': 'Your shift has started. Current dashboard:\n\n╔════════════════════════════════════════════════════════════════════╗\n║ MEDSUPPLY ERP v2.1 — CENTRAL HOSPITAL NETWORK ║\n║ Task: orientation_ward | Shift: Day 1 of 2 ║\n║ Actions remaining: 5/5 ║\n║ Budget used: $0 / $5,000 ║\n╠════════════════════════════════════════════════════════════════════╣\n║ [!] COMMS PAGER: 1 unread message(s) ║\n║ [·] INVDB: No expiry warnings ║\n║ [·] PROCURENET: 0 order(s) in transit ║\n╠════════════════════════════════════════════════════════════════════╣\n║ SUPPLIERS (use exact IDs below): ║\n║ MEDLINE → GLOVE-001, SYR-10, MASK-001 ║\n╚════════════════════════════════════════════════════════════════════╝\nAwaiting input.\nAvailable tools: read_inbox, query_erp, query_supplier, query_forecast, submit_po, transfer, quarantine_lot, file_justification, end_shift'}], 'model': 'openai/gpt-oss-20b:groq', 'max_completion_tokens': 6000, 'temperature': 0.1, 'tool_choice': 'required', 'tools': [{'type': 'function', 'function': {'name': 'read_inbox', 'description': "Read messages from the COMMS PAGER inbox.\n\nArgs:\n filter: Message filter — 'unread' (default), 'all', or 'flagged'\n\nReturns:\n Formatted inbox messages as raw text", 'parameters': {'type': 'object', 'properties': {'filter': {'type': 'string', 'description': ''}}, 'required': []}}}, {'type': 'function', 'function': {'name': 'query_erp', 'description': "Query the legacy ERP database.\n\nArgs:\n table: Table to query — 'inventory', 'expiry', 'pipeline_orders', or 'demand_history'\n location: Location ID or 'all'. E.g. 'ward_general', 'ward_icu', 'hospital_a'\n sku: Product SKU or 'all'. E.g. 
'B-001', 'IV-500', 'GLOVE-001'\n\nReturns:\n ASCII table with query results (legacy ERP format)", 'parameters': {'type': 'object', 'properties': {'table': {'type': 'string', 'description': ''}, 'location': {'type': 'string', 'description': ''}, 'sku': {'type': 'string', 'description': ''}}, 'required': ['table']}}}, {'type': 'function', 'function': {'name': 'query_supplier', 'description': 'Query supplier information including current lead times and disruptions.\n\nArgs:\n supplier_id: Supplier identifier. Check the dashboard for valid supplier IDs.\n\nReturns:\n Supplier status text including lead times and any active disruptions', 'parameters': {'type': 'object', 'properties': {'supplier_id': {'type': 'string', 'description': ''}}, 'required': ['supplier_id']}}}, {'type': 'function', 'function': {'name': 'query_forecast', 'description': "Get demand forecast for a product at a location.\n\nArgs:\n product_id: Product SKU to forecast. Use query_erp(table='inventory') to see available SKUs.\n location_id: Location to forecast for. Use query_erp(table='inventory') to see valid location IDs.\n horizon_days: Forecast horizon in days (1-21, default 7)\n\nReturns:\n Forecasted daily demand table", 'parameters': {'type': 'object', 'properties': {'product_id': {'type': 'string', 'description': ''}, 'location_id': {'type': 'string', 'description': ''}, 'horizon_days': {'type': 'integer', 'description': ''}}, 'required': ['product_id', 'location_id']}}}, {'type': 'function', 'function': {'name': 'submit_po', 'description': "Submit a purchase order to a supplier.\n\nArgs:\n supplier_id: Supplier to order from. Check the dashboard for valid supplier IDs.\n product_id: Product SKU to order. Use query_erp(table='inventory') to see available SKUs.\n destination_id: Delivery location. 
Use query_erp(table='inventory') to see valid location IDs.\n quantity: Number of units to order (must be positive)\n priority: 'standard' (default) or 'expedited' (+50% cost, -2 day lead time; requires justification)\n\nReturns:\n Confirmation with PO ID and ETA, or error if budget/validation fails.\n For expedited orders: returns BUDGET_OVERRIDE_REQUIRED with a ticket ID.\n Use file_justification(ticket_id=...) to proceed.", 'parameters': {'type': 'object', 'properties': {'supplier_id': {'type': 'string', 'description': ''}, 'product_id': {'type': 'string', 'description': ''}, 'destination_id': {'type': 'string', 'description': ''}, 'quantity': {'type': 'integer', 'description': ''}, 'priority': {'type': 'string', 'description': ''}}, 'required': ['supplier_id', 'product_id', 'destination_id', 'quantity']}}}, {'type': 'function', 'function': {'name': 'transfer', 'description': "Transfer inventory between locations (small handling fee).\n\nArgs:\n from_location_id: Source location. Use query_erp(table='inventory') to see valid location IDs.\n to_location_id: Destination location. Use query_erp(table='inventory') to see valid location IDs.\n product_id: Product SKU to transfer. Use query_erp(table='inventory') to see available SKUs.\n quantity: Units to transfer (must not exceed available stock at source)\n\nReturns:\n Confirmation or error (insufficient stock, capacity exceeded)", 'parameters': {'type': 'object', 'properties': {'from_location_id': {'type': 'string', 'description': ''}, 'to_location_id': {'type': 'string', 'description': ''}, 'product_id': {'type': 'string', 'description': ''}, 'quantity': {'type': 'integer', 'description': ''}}, 'required': ['from_location_id', 'to_location_id', 'product_id', 'quantity']}}}, {'type': 'function', 'function': {'name': 'quarantine_lot', 'description': "Quarantine a specific inventory lot (e.g. 
for product recalls or cold chain breaches).\nQuarantined lots are excluded from demand fulfillment.\n\nArgs:\n location_id: Location where the lot is stored\n sku: Product SKU of the lot\n lot_id: Lot identifier (from inventory query). Use 'all' to quarantine all lots of this SKU at this location.\n\nReturns:\n Confirmation with quarantine details and disposal ticket ID", 'parameters': {'type': 'object', 'properties': {'location_id': {'type': 'string', 'description': ''}, 'sku': {'type': 'string', 'description': ''}, 'lot_id': {'type': 'string', 'description': ''}}, 'required': ['location_id', 'sku', 'lot_id']}}}, {'type': 'function', 'function': {'name': 'file_justification', 'description': "File a budget override justification to proceed with an expedited order.\n\nRequired after submit_po returns BUDGET_OVERRIDE_REQUIRED.\nThe justification is audited by Finance — it must reference the current clinical situation.\nA false or incoherent justification is flagged and results in a scoring penalty.\n\nArgs:\n ticket_id: The ticket ID from the BUDGET_OVERRIDE_REQUIRED error (e.g. 'BOT-0002')\n reason: Free-text justification. Must clearly reference the reason for urgency.\n\nReturns:\n OK confirmation and PO submission details, or FLAGGED audit warning", 'parameters': {'type': 'object', 'properties': {'ticket_id': {'type': 'string', 'description': ''}, 'reason': {'type': 'string', 'description': ''}}, 'required': ['ticket_id', 'reason']}}}, {'type': 'function', 'function': {'name': 'end_shift', 'description': 'End the current shift and advance the simulation by one day.\n\nCommits all pending decisions. Simulates demand, deliveries, and expiry for the day.\nResets your action budget for the next shift.\nUnspent actions are lost — no rollover.\n\nReturns:\n Day summary report + next shift dashboard', 'parameters': {'type': 'object', 'properties': {}, 'required': []}}}]}}
+ 01:51:51 [DEBUG] Sending HTTP Request: POST https://router.huggingface.co/v1/chat/completions
+ 01:51:51 [DEBUG] connect_tcp.started host='router.huggingface.co' port=443 local_address=None timeout=5.0 socket_options=None
+ 01:51:51 [DEBUG] connect_tcp.complete return_value=<httpcore._backends.sync.SyncStream object at 0x723ff2d1e240>
+ 01:51:51 [DEBUG] start_tls.started ssl_context=<ssl.SSLContext object at 0x723ff3a175d0> server_hostname='router.huggingface.co' timeout=5.0
+ 01:51:51 [DEBUG] start_tls.complete return_value=<httpcore._backends.sync.SyncStream object at 0x723ff2d5e1b0>
+ 01:51:51 [DEBUG] send_request_headers.started request=<Request [b'POST']>
+ 01:51:51 [DEBUG] send_request_headers.complete
+ 01:51:51 [DEBUG] send_request_body.started request=<Request [b'POST']>
+ 01:51:51 [DEBUG] send_request_body.complete
+ 01:51:51 [DEBUG] receive_response_headers.started request=<Request [b'POST']>
+ 01:51:52 [DEBUG] receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Content-Type', b'application/json'), (b'Transfer-Encoding', b'chunked'), (b'Connection', b'keep-alive'), (b'Date', b'Thu, 02 Apr 2026 20:21:52 GMT'), (b'x-ratelimit-reset-requests', b'60ms'), (b'x-ratelimit-reset-tokens', b'243ms'), (b'X-Powered-By', b'huggingface-moon'), (b'x-request-id', b'req_01kn7xr0r6esjbexpypgn91mcn'), (b'cross-origin-opener-policy', b'same-origin'), (b'Referrer-Policy', b'strict-origin-when-cross-origin'), (b'vary', b'Origin'), (b'Access-Control-Allow-Origin', b'*'), (b'Access-Control-Expose-Headers', b'X-Repo-Commit,X-Request-Id,X-Error-Code,X-Error-Message,X-Total-Count,ETag,Link,Accept-Ranges,Content-Range,X-Linked-Size,X-Linked-ETag,X-Xet-Hash'), (b'X-Robots-Tag', b'none'), (b'x-inference-provider', b'groq'), (b'cache-control', b'private, max-age=0, no-store, no-cache, must-revalidate'), (b'cf-cache-status', b'DYNAMIC'), (b'cf-ray', b'9e628ad7db6238b3-IAD'), (b'server', b'cloudflare'), (b'set-cookie', b'__cf_bm=3duIfDUqh7IEx.T59dNY7MCKZ6iACTEm0iEiUVIoY2U-1775161311.9782407-1.0.1.1-X0AsT1C1g8zYMMIH_MXwx4S8ez43Cv5CvQjRikgQxslvIxEKA.l.n68Qvk_S0dezSdAvUZJG5vYLnWbrIvo6IAlffVPELTYMA0aG8qTBxXhp7M94.R7HUdnNY27vxIG2; HttpOnly; Secure; Path=/; Domain=groq.com; Expires=Thu, 02 Apr 2026 20:51:52 GMT'), (b'strict-transport-security', b'max-age=15552000'), (b'x-groq-region', b'yul'), (b'x-ratelimit-limit-requests', b'1440000'), (b'x-ratelimit-limit-tokens', b'750000'), (b'x-ratelimit-remaining-requests', b'1439999'), (b'x-ratelimit-remaining-tokens', b'746961'), (b'X-Cache', b'Miss from cloudfront'), (b'Via', b'1.1 61abf3889106f07b08b458478f6aee64.cloudfront.net (CloudFront)'), (b'X-Amz-Cf-Pop', b'BLR50-P4'), (b'X-Amz-Cf-Id', b'7uub2pGJ0RN4YwVF3Pq0QciaRg4blFRbyBjxqtf4uVWgSrQC_pBGRQ==')])
+ 01:51:52 [INFO] HTTP Request: POST https://router.huggingface.co/v1/chat/completions "HTTP/1.1 200 OK"
+ 01:51:52 [DEBUG] receive_response_body.started request=<Request [b'POST']>
+ 01:51:52 [DEBUG] receive_response_body.complete
+ 01:51:52 [DEBUG] response_closed.started
+ 01:51:52 [DEBUG] response_closed.complete
+ 01:51:52 [DEBUG] HTTP Response: POST https://router.huggingface.co/v1/chat/completions "200 OK" Headers({'content-type': 'application/json', 'transfer-encoding': 'chunked', 'connection': 'keep-alive', 'date': 'Thu, 02 Apr 2026 20:21:52 GMT', 'x-ratelimit-reset-requests': '60ms', 'x-ratelimit-reset-tokens': '243ms', 'x-powered-by': 'huggingface-moon', 'x-request-id': 'req_01kn7xr0r6esjbexpypgn91mcn', 'cross-origin-opener-policy': 'same-origin', 'referrer-policy': 'strict-origin-when-cross-origin', 'vary': 'Origin', 'access-control-allow-origin': '*', 'access-control-expose-headers': 'X-Repo-Commit,X-Request-Id,X-Error-Code,X-Error-Message,X-Total-Count,ETag,Link,Accept-Ranges,Content-Range,X-Linked-Size,X-Linked-ETag,X-Xet-Hash', 'x-robots-tag': 'none', 'x-inference-provider': 'groq', 'cache-control': 'private, max-age=0, no-store, no-cache, must-revalidate', 'cf-cache-status': 'DYNAMIC', 'cf-ray': '9e628ad7db6238b3-IAD', 'server': 'cloudflare', 'set-cookie': '__cf_bm=3duIfDUqh7IEx.T59dNY7MCKZ6iACTEm0iEiUVIoY2U-1775161311.9782407-1.0.1.1-X0AsT1C1g8zYMMIH_MXwx4S8ez43Cv5CvQjRikgQxslvIxEKA.l.n68Qvk_S0dezSdAvUZJG5vYLnWbrIvo6IAlffVPELTYMA0aG8qTBxXhp7M94.R7HUdnNY27vxIG2; HttpOnly; Secure; Path=/; Domain=groq.com; Expires=Thu, 02 Apr 2026 20:51:52 GMT', 'strict-transport-security': 'max-age=15552000', 'x-groq-region': 'yul', 'x-ratelimit-limit-requests': '1440000', 'x-ratelimit-limit-tokens': '750000', 'x-ratelimit-remaining-requests': '1439999', 'x-ratelimit-remaining-tokens': '746961', 'x-cache': 'Miss from cloudfront', 'via': '1.1 61abf3889106f07b08b458478f6aee64.cloudfront.net (CloudFront)', 'x-amz-cf-pop': 'BLR50-P4', 'x-amz-cf-id': '7uub2pGJ0RN4YwVF3Pq0QciaRg4blFRbyBjxqtf4uVWgSrQC_pBGRQ=='})
+ 01:51:52 [DEBUG] request_id: req_01kn7xr0r6esjbexpypgn91mcn
+ 01:51:52 [DEBUG] [orientation_ward] Step 1 — finish_reason=tool_calls tool_calls=1
+ 01:51:52 [DEBUG] [orientation_ward] Step 1 — calling read_inbox({'filter': 'unread'})
+ 01:51:52 [DEBUG] > TEXT '{"type": "step", "data": {"type": "call_tool", ... {"filter": "unread"}}}' [109 bytes]
+ 01:51:52 [DEBUG] < TEXT '{"type":"observation","data":{"observation":{"t...rd":0.01,"done":false}}' [530 bytes]
+ 01:51:54 [INFO] [orientation_ward] Step 2/150 — 4 messages in context
+ 01:51:54 [DEBUG] Request options: {'method': 'post', 'url': '/chat/completions', 'files': None, 'idempotency_key': 'stainless-python-retry-c34f7880-6ee7-4fd1-8588-a7abbd81c6db', 'content': None, 'json_data': {'messages': [{'role': 'system', 'content': 'You are an experienced hospital supply chain manager operating a legacy ERP system.\nYour goal is to maintain adequate medical supplies across all locations while controlling costs.\n\nCRITICAL — ACTION BUDGET: You have a strictly limited number of actions per shift.\nBudget does NOT roll over. Unspent actions are lost at end_shift().\n\nRecommended budget allocation (highest priority first):\n 1. read_inbox() — ALWAYS do this first to catch urgent alerts\n 2. query_erp(table=\'inventory\') — check current stock levels across all locations\n 3. submit_po(...) — place orders for items below safety stock (PRIORITY)\n 4. end_shift() — call this when budget is exhausted OR tasks are done\n\nQuery tools (query_erp expiry/pipeline, query_forecast, query_supplier) are LOW PRIORITY.\nOnly use them if you have budget remaining AFTER placing critical orders.\n\nMANDATORY RULES:\n- If you receive "Action budget exhausted" → call end_shift() as your VERY NEXT action.\n Do NOT call any other tool. The budget cannot be restored until end_shift() is called.\n- Order early: factor in lead times. If lead time is 2 days, order today to avoid stockout in 2 days.\n- Expedited orders require file_justification(ticket_id=...) with a real clinical reason.\n- FEFO: oldest stock consumed first — check expiry and rotate perishables proactively.\n- Recalls: quarantine the recalled lot immediately, then order a replacement.\n- MCI events: pre-emptive ordering beats reactive ordering. 
Order extra blood/critical supplies NOW.\n\nSafety stock target: aim for at least (lead_time + 1) × daily_demand units on hand.\n\nWhen calling tools, use the EXACT parameter names shown in the tool descriptions.\n'}, {'role': 'user', 'content': 'Your shift has started. Current dashboard:\n\n╔════════════════════════════════════════════════════════════════════╗\n║ MEDSUPPLY ERP v2.1 — CENTRAL HOSPITAL NETWORK ║\n║ Task: orientation_ward | Shift: Day 1 of 2 ║\n║ Actions remaining: 5/5 ║\n║ Budget used: $0 / $5,000 ║\n╠════════════════════════════════════════════════════════════════════╣\n║ [!] COMMS PAGER: 1 unread message(s) ║\n║ [·] INVDB: No expiry warnings ║\n║ [·] PROCURENET: 0 order(s) in transit ║\n╠════════════════════════════════════════════════════════════════════╣\n║ SUPPLIERS (use exact IDs below): ║\n║ MEDLINE → GLOVE-001, SYR-10, MASK-001 ║\n╚════════════════════════════════════════════════════════════════════╝\nAwaiting input.\nAvailable tools: read_inbox, query_erp, query_supplier, query_forecast, submit_po, transfer, quarantine_lot, file_justification, end_shift'}, {'role': 'assistant', 'content': None, 'tool_calls': [{'id': 'fc_edbe312b-e6c1-4594-9789-a9b65675543d', 'type': 'function', 'function': {'name': 'read_inbox', 'arguments': '{"filter": "unread"}'}}]}, {'role': 'tool', 'tool_call_id': 'fc_edbe312b-e6c1-4594-9789-a9b65675543d', 'content': '\n[MSG MSG-0001 | READ | PRIORITY: LOW | Day 1 08:00]\nFROM: System\nSUBJ: Shift Handover Notes\n\nWelcome to the orientation_ward scenario.\nYou are managing medical supplies for 2 days.\nAction budget: 5 actions per shift.\nBudget ceiling: $5,000 outstanding orders.\n\nUse read_inbox to check messages, query_erp to check stock,\nsubmit_po to order supplies, and end_shift to advance the day.\n'}], 'model': 'openai/gpt-oss-20b:groq', 'max_completion_tokens': 6000, 'temperature': 0.1, 'tool_choice': 'required', 'tools': [{'type': 'function', 'function': {'name': 'read_inbox', 'description': "Read messages 
from the COMMS PAGER inbox.\n\nArgs:\n filter: Message filter — 'unread' (default), 'all', or 'flagged'\n\nReturns:\n Formatted inbox messages as raw text", 'parameters': {'type': 'object', 'properties': {'filter': {'type': 'string', 'description': ''}}, 'required': []}}}, {'type': 'function', 'function': {'name': 'query_erp', 'description': "Query the legacy ERP database.\n\nArgs:\n table: Table to query — 'inventory', 'expiry', 'pipeline_orders', or 'demand_history'\n location: Location ID or 'all'. E.g. 'ward_general', 'ward_icu', 'hospital_a'\n sku: Product SKU or 'all'. E.g. 'B-001', 'IV-500', 'GLOVE-001'\n\nReturns:\n ASCII table with query results (legacy ERP format)", 'parameters': {'type': 'object', 'properties': {'table': {'type': 'string', 'description': ''}, 'location': {'type': 'string', 'description': ''}, 'sku': {'type': 'string', 'description': ''}}, 'required': ['table']}}}, {'type': 'function', 'function': {'name': 'query_supplier', 'description': 'Query supplier information including current lead times and disruptions.\n\nArgs:\n supplier_id: Supplier identifier. Check the dashboard for valid supplier IDs.\n\nReturns:\n Supplier status text including lead times and any active disruptions', 'parameters': {'type': 'object', 'properties': {'supplier_id': {'type': 'string', 'description': ''}}, 'required': ['supplier_id']}}}, {'type': 'function', 'function': {'name': 'query_forecast', 'description': "Get demand forecast for a product at a location.\n\nArgs:\n product_id: Product SKU to forecast. Use query_erp(table='inventory') to see available SKUs.\n location_id: Location to forecast for. 
Use query_erp(table='inventory') to see valid location IDs.\n horizon_days: Forecast horizon in days (1-21, default 7)\n\nReturns:\n Forecasted daily demand table", 'parameters': {'type': 'object', 'properties': {'product_id': {'type': 'string', 'description': ''}, 'location_id': {'type': 'string', 'description': ''}, 'horizon_days': {'type': 'integer', 'description': ''}}, 'required': ['product_id', 'location_id']}}}, {'type': 'function', 'function': {'name': 'submit_po', 'description': "Submit a purchase order to a supplier.\n\nArgs:\n supplier_id: Supplier to order from. Check the dashboard for valid supplier IDs.\n product_id: Product SKU to order. Use query_erp(table='inventory') to see available SKUs.\n destination_id: Delivery location. Use query_erp(table='inventory') to see valid location IDs.\n quantity: Number of units to order (must be positive)\n priority: 'standard' (default) or 'expedited' (+50% cost, -2 day lead time; requires justification)\n\nReturns:\n Confirmation with PO ID and ETA, or error if budget/validation fails.\n For expedited orders: returns BUDGET_OVERRIDE_REQUIRED with a ticket ID.\n Use file_justification(ticket_id=...) to proceed.", 'parameters': {'type': 'object', 'properties': {'supplier_id': {'type': 'string', 'description': ''}, 'product_id': {'type': 'string', 'description': ''}, 'destination_id': {'type': 'string', 'description': ''}, 'quantity': {'type': 'integer', 'description': ''}, 'priority': {'type': 'string', 'description': ''}}, 'required': ['supplier_id', 'product_id', 'destination_id', 'quantity']}}}, {'type': 'function', 'function': {'name': 'transfer', 'description': "Transfer inventory between locations (small handling fee).\n\nArgs:\n from_location_id: Source location. Use query_erp(table='inventory') to see valid location IDs.\n to_location_id: Destination location. Use query_erp(table='inventory') to see valid location IDs.\n product_id: Product SKU to transfer. 
Use query_erp(table='inventory') to see available SKUs.\n quantity: Units to transfer (must not exceed available stock at source)\n\nReturns:\n Confirmation or error (insufficient stock, capacity exceeded)", 'parameters': {'type': 'object', 'properties': {'from_location_id': {'type': 'string', 'description': ''}, 'to_location_id': {'type': 'string', 'description': ''}, 'product_id': {'type': 'string', 'description': ''}, 'quantity': {'type': 'integer', 'description': ''}}, 'required': ['from_location_id', 'to_location_id', 'product_id', 'quantity']}}}, {'type': 'function', 'function': {'name': 'quarantine_lot', 'description': "Quarantine a specific inventory lot (e.g. for product recalls or cold chain breaches).\nQuarantined lots are excluded from demand fulfillment.\n\nArgs:\n location_id: Location where the lot is stored\n sku: Product SKU of the lot\n lot_id: Lot identifier (from inventory query). Use 'all' to quarantine all lots of this SKU at this location.\n\nReturns:\n Confirmation with quarantine details and disposal ticket ID", 'parameters': {'type': 'object', 'properties': {'location_id': {'type': 'string', 'description': ''}, 'sku': {'type': 'string', 'description': ''}, 'lot_id': {'type': 'string', 'description': ''}}, 'required': ['location_id', 'sku', 'lot_id']}}}, {'type': 'function', 'function': {'name': 'file_justification', 'description': "File a budget override justification to proceed with an expedited order.\n\nRequired after submit_po returns BUDGET_OVERRIDE_REQUIRED.\nThe justification is audited by Finance — it must reference the current clinical situation.\nA false or incoherent justification is flagged and results in a scoring penalty.\n\nArgs:\n ticket_id: The ticket ID from the BUDGET_OVERRIDE_REQUIRED error (e.g. 'BOT-0002')\n reason: Free-text justification. 
Must clearly reference the reason for urgency.\n\nReturns:\n OK confirmation and PO submission details, or FLAGGED audit warning", 'parameters': {'type': 'object', 'properties': {'ticket_id': {'type': 'string', 'description': ''}, 'reason': {'type': 'string', 'description': ''}}, 'required': ['ticket_id', 'reason']}}}, {'type': 'function', 'function': {'name': 'end_shift', 'description': 'End the current shift and advance the simulation by one day.\n\nCommits all pending decisions. Simulates demand, deliveries, and expiry for the day.\nResets your action budget for the next shift.\nUnspent actions are lost — no rollover.\n\nReturns:\n Day summary report + next shift dashboard', 'parameters': {'type': 'object', 'properties': {}, 'required': []}}}]}}
+ 01:51:54 [DEBUG] Sending HTTP Request: POST https://router.huggingface.co/v1/chat/completions
+ 01:51:54 [DEBUG] send_request_headers.started request=<Request [b'POST']>
+ 01:51:54 [DEBUG] send_request_headers.complete
+ 01:51:54 [DEBUG] send_request_body.started request=<Request [b'POST']>
+ 01:51:54 [DEBUG] send_request_body.complete
+ 01:51:54 [DEBUG] receive_response_headers.started request=<Request [b'POST']>
+ 01:51:55 [DEBUG] receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Content-Type', b'application/json'), (b'Transfer-Encoding', b'chunked'), (b'Connection', b'keep-alive'), (b'Date', b'Thu, 02 Apr 2026 20:21:55 GMT'), (b'x-ratelimit-reset-requests', b'60ms'), (b'x-ratelimit-reset-tokens', b'244ms'), (b'X-Powered-By', b'huggingface-moon'), (b'x-request-id', b'req_01kn7xr3eze10saazav3qq3avr'), (b'cross-origin-opener-policy', b'same-origin'), (b'Referrer-Policy', b'strict-origin-when-cross-origin'), (b'vary', b'Origin'), (b'Access-Control-Allow-Origin', b'*'), (b'Access-Control-Expose-Headers', b'X-Repo-Commit,X-Request-Id,X-Error-Code,X-Error-Message,X-Total-Count,ETag,Link,Accept-Ranges,Content-Range,X-Linked-Size,X-Linked-ETag,X-Xet-Hash'), (b'X-Robots-Tag', b'none'), (b'x-inference-provider', b'groq'), (b'cache-control', b'private, max-age=0, no-store, no-cache, must-revalidate'), (b'cf-cache-status', b'DYNAMIC'), (b'cf-ray', b'9e628ae939d838b3-IAD'), (b'server', b'cloudflare'), (b'set-cookie', b'__cf_bm=dBrvfDghu6i3ulHi3vL4dcahvKG84d.NJza2A7k.naI-1775161314.7556033-1.0.1.1-UPaGIuRInFzY_8EW8sUq7zBvVsbkkqOvo3mEBMwIh1VHd1MUMM.mZaZIM8rYH_sDKwHI5XzOqeMmvxjIIvF4iCClMg3tRiVcAWUIFPecnhLf7plP.bbH0kheaUYHsL7z; HttpOnly; Secure; Path=/; Domain=groq.com; Expires=Thu, 02 Apr 2026 20:51:55 GMT'), (b'strict-transport-security', b'max-age=15552000'), (b'x-groq-region', b'yul'), (b'x-ratelimit-limit-requests', b'1440000'), (b'x-ratelimit-limit-tokens', b'750000'), (b'x-ratelimit-remaining-requests', b'1439999'), (b'x-ratelimit-remaining-tokens', b'746947'), (b'X-Cache', b'Miss from cloudfront'), (b'Via', b'1.1 61abf3889106f07b08b458478f6aee64.cloudfront.net (CloudFront)'), (b'X-Amz-Cf-Pop', b'BLR50-P4'), (b'X-Amz-Cf-Id', b'G8Q1q3n2mQUXYa12dQp0QZxz7lhl-M_7TOJJZ16T0sy44b8Mpz3ERw==')])
+ 01:51:55 [INFO] HTTP Request: POST https://router.huggingface.co/v1/chat/completions "HTTP/1.1 200 OK"
+ 01:51:55 [DEBUG] receive_response_body.started request=<Request [b'POST']>
+ 01:51:55 [DEBUG] receive_response_body.complete
+ 01:51:55 [DEBUG] response_closed.started
+ 01:51:55 [DEBUG] response_closed.complete
+ 01:51:55 [DEBUG] HTTP Response: POST https://router.huggingface.co/v1/chat/completions "200 OK" Headers({'content-type': 'application/json', 'transfer-encoding': 'chunked', 'connection': 'keep-alive', 'date': 'Thu, 02 Apr 2026 20:21:55 GMT', 'x-ratelimit-reset-requests': '60ms', 'x-ratelimit-reset-tokens': '244ms', 'x-powered-by': 'huggingface-moon', 'x-request-id': 'req_01kn7xr3eze10saazav3qq3avr', 'cross-origin-opener-policy': 'same-origin', 'referrer-policy': 'strict-origin-when-cross-origin', 'vary': 'Origin', 'access-control-allow-origin': '*', 'access-control-expose-headers': 'X-Repo-Commit,X-Request-Id,X-Error-Code,X-Error-Message,X-Total-Count,ETag,Link,Accept-Ranges,Content-Range,X-Linked-Size,X-Linked-ETag,X-Xet-Hash', 'x-robots-tag': 'none', 'x-inference-provider': 'groq', 'cache-control': 'private, max-age=0, no-store, no-cache, must-revalidate', 'cf-cache-status': 'DYNAMIC', 'cf-ray': '9e628ae939d838b3-IAD', 'server': 'cloudflare', 'set-cookie': '__cf_bm=dBrvfDghu6i3ulHi3vL4dcahvKG84d.NJza2A7k.naI-1775161314.7556033-1.0.1.1-UPaGIuRInFzY_8EW8sUq7zBvVsbkkqOvo3mEBMwIh1VHd1MUMM.mZaZIM8rYH_sDKwHI5XzOqeMmvxjIIvF4iCClMg3tRiVcAWUIFPecnhLf7plP.bbH0kheaUYHsL7z; HttpOnly; Secure; Path=/; Domain=groq.com; Expires=Thu, 02 Apr 2026 20:51:55 GMT', 'strict-transport-security': 'max-age=15552000', 'x-groq-region': 'yul', 'x-ratelimit-limit-requests': '1440000', 'x-ratelimit-limit-tokens': '750000', 'x-ratelimit-remaining-requests': '1439999', 'x-ratelimit-remaining-tokens': '746947', 'x-cache': 'Miss from cloudfront', 'via': '1.1 61abf3889106f07b08b458478f6aee64.cloudfront.net (CloudFront)', 'x-amz-cf-pop': 'BLR50-P4', 'x-amz-cf-id': 'G8Q1q3n2mQUXYa12dQp0QZxz7lhl-M_7TOJJZ16T0sy44b8Mpz3ERw=='})
+ 01:51:55 [DEBUG] request_id: req_01kn7xr3eze10saazav3qq3avr
+ 01:51:55 [DEBUG] [orientation_ward] Step 2 — finish_reason=tool_calls tool_calls=1
+ 01:51:55 [DEBUG] [orientation_ward] Step 2 — calling query_erp({'location': 'all', 'sku': 'all', 'table': 'inventory'})
+ 01:51:55 [DEBUG] > TEXT '{"type": "step", "data": {"type": "call_tool", ..."table": "inventory"}}}' [143 bytes]
+ 01:51:55 [DEBUG] < TEXT '{"type":"observation","data":{"observation":{"t...rd":0.01,"done":false}}' [841 bytes]
+ 01:51:57 [INFO] [orientation_ward] Step 3/150 — 6 messages in context
+ 01:51:57 [DEBUG] Request options: {'method': 'post', 'url': '/chat/completions', 'files': None, 'idempotency_key': 'stainless-python-retry-38cd9502-480f-4ac6-a7d5-67b52541d9bb', 'content': None, 'json_data': {'messages': [{'role': 'system', 'content': 'You are an experienced hospital supply chain manager operating a legacy ERP system.\nYour goal is to maintain adequate medical supplies across all locations while controlling costs.\n\nCRITICAL — ACTION BUDGET: You have a strictly limited number of actions per shift.\nBudget does NOT roll over. Unspent actions are lost at end_shift().\n\nRecommended budget allocation (highest priority first):\n 1. read_inbox() — ALWAYS do this first to catch urgent alerts\n 2. query_erp(table=\'inventory\') — check current stock levels across all locations\n 3. submit_po(...) — place orders for items below safety stock (PRIORITY)\n 4. end_shift() — call this when budget is exhausted OR tasks are done\n\nQuery tools (query_erp expiry/pipeline, query_forecast, query_supplier) are LOW PRIORITY.\nOnly use them if you have budget remaining AFTER placing critical orders.\n\nMANDATORY RULES:\n- If you receive "Action budget exhausted" → call end_shift() as your VERY NEXT action.\n Do NOT call any other tool. The budget cannot be restored until end_shift() is called.\n- Order early: factor in lead times. If lead time is 2 days, order today to avoid stockout in 2 days.\n- Expedited orders require file_justification(ticket_id=...) with a real clinical reason.\n- FEFO: oldest stock consumed first — check expiry and rotate perishables proactively.\n- Recalls: quarantine the recalled lot immediately, then order a replacement.\n- MCI events: pre-emptive ordering beats reactive ordering. 
Order extra blood/critical supplies NOW.\n\nSafety stock target: aim for at least (lead_time + 1) × daily_demand units on hand.\n\nWhen calling tools, use the EXACT parameter names shown in the tool descriptions.\n'}, {'role': 'user', 'content': 'Your shift has started. Current dashboard:\n\n╔════════════════════════════════════════════════════════════════════╗\n║ MEDSUPPLY ERP v2.1 — CENTRAL HOSPITAL NETWORK ║\n║ Task: orientation_ward | Shift: Day 1 of 2 ║\n║ Actions remaining: 5/5 ║\n║ Budget used: $0 / $5,000 ║\n╠════════════════════════════════════════════════════════════════════╣\n║ [!] COMMS PAGER: 1 unread message(s) ║\n║ [·] INVDB: No expiry warnings ║\n║ [·] PROCURENET: 0 order(s) in transit ║\n╠════════════════════════════════════════════════════════════════════╣\n║ SUPPLIERS (use exact IDs below): ║\n║ MEDLINE → GLOVE-001, SYR-10, MASK-001 ║\n╚════════════════════════════════════════════════════════════════════╝\nAwaiting input.\nAvailable tools: read_inbox, query_erp, query_supplier, query_forecast, submit_po, transfer, quarantine_lot, file_justification, end_shift'}, {'role': 'assistant', 'content': None, 'tool_calls': [{'id': 'fc_edbe312b-e6c1-4594-9789-a9b65675543d', 'type': 'function', 'function': {'name': 'read_inbox', 'arguments': '{"filter": "unread"}'}}]}, {'role': 'tool', 'tool_call_id': 'fc_edbe312b-e6c1-4594-9789-a9b65675543d', 'content': '\n[MSG MSG-0001 | READ | PRIORITY: LOW | Day 1 08:00]\nFROM: System\nSUBJ: Shift Handover Notes\n\nWelcome to the orientation_ward scenario.\nYou are managing medical supplies for 2 days.\nAction budget: 5 actions per shift.\nBudget ceiling: $5,000 outstanding orders.\n\nUse read_inbox to check messages, query_erp to check stock,\nsubmit_po to order supplies, and end_shift to advance the day.\n'}, {'role': 'assistant', 'content': None, 'tool_calls': [{'id': 'fc_d42e8acc-f5ce-43e5-98b1-0b7628ad4eea', 'type': 'function', 'function': {'name': 'query_erp', 'arguments': '{"location": "all", "sku": "all", 
"table": "inventory"}'}}]}, {'role': 'tool', 'tool_call_id': 'fc_d42e8acc-f5ce-43e5-98b1-0b7628ad4eea', 'content': 'SYSTEM QUERY RESULT [TABLE: INVENTORY] [LOC: ALL] [SKU: ALL]\n[TIMESTAMP: Day 1]\n----------------------------------------------------------------------\nLOT_ID | DESC | QTY | EXP_DAY | DAYS_LEFT | STATUS\n----------------------------------------------------------------------\nINIT-GLOVE-001-ward_general | Surgical Gloves (box) | 20 | N/A | N/A | NON-PERISH\nINIT-MASK-001-ward_general | Surgical Masks | 25 | N/A | N/A | NON-PERISH\nINIT-SYR-10-ward_general | Syringes 10ml | 30 | N/A | N/A | NON-PERISH\n----------------------------------------------------------------------\nQUERY OK | 3 row(s) returned'}], 'model': 'openai/gpt-oss-20b:groq', 'max_completion_tokens': 6000, 'temperature': 0.1, 'tool_choice': 'required', 'tools': [{'type': 'function', 'function': {'name': 'read_inbox', 'description': "Read messages from the COMMS PAGER inbox.\n\nArgs:\n filter: Message filter — 'unread' (default), 'all', or 'flagged'\n\nReturns:\n Formatted inbox messages as raw text", 'parameters': {'type': 'object', 'properties': {'filter': {'type': 'string', 'description': ''}}, 'required': []}}}, {'type': 'function', 'function': {'name': 'query_erp', 'description': "Query the legacy ERP database.\n\nArgs:\n table: Table to query — 'inventory', 'expiry', 'pipeline_orders', or 'demand_history'\n location: Location ID or 'all'. E.g. 'ward_general', 'ward_icu', 'hospital_a'\n sku: Product SKU or 'all'. E.g. 
'B-001', 'IV-500', 'GLOVE-001'\n\nReturns:\n ASCII table with query results (legacy ERP format)", 'parameters': {'type': 'object', 'properties': {'table': {'type': 'string', 'description': ''}, 'location': {'type': 'string', 'description': ''}, 'sku': {'type': 'string', 'description': ''}}, 'required': ['table']}}}, {'type': 'function', 'function': {'name': 'query_supplier', 'description': 'Query supplier information including current lead times and disruptions.\n\nArgs:\n supplier_id: Supplier identifier. Check the dashboard for valid supplier IDs.\n\nReturns:\n Supplier status text including lead times and any active disruptions', 'parameters': {'type': 'object', 'properties': {'supplier_id': {'type': 'string', 'description': ''}}, 'required': ['supplier_id']}}}, {'type': 'function', 'function': {'name': 'query_forecast', 'description': "Get demand forecast for a product at a location.\n\nArgs:\n product_id: Product SKU to forecast. Use query_erp(table='inventory') to see available SKUs.\n location_id: Location to forecast for. Use query_erp(table='inventory') to see valid location IDs.\n horizon_days: Forecast horizon in days (1-21, default 7)\n\nReturns:\n Forecasted daily demand table", 'parameters': {'type': 'object', 'properties': {'product_id': {'type': 'string', 'description': ''}, 'location_id': {'type': 'string', 'description': ''}, 'horizon_days': {'type': 'integer', 'description': ''}}, 'required': ['product_id', 'location_id']}}}, {'type': 'function', 'function': {'name': 'submit_po', 'description': "Submit a purchase order to a supplier.\n\nArgs:\n supplier_id: Supplier to order from. Check the dashboard for valid supplier IDs.\n product_id: Product SKU to order. Use query_erp(table='inventory') to see available SKUs.\n destination_id: Delivery location. 
Use query_erp(table='inventory') to see valid location IDs.\n quantity: Number of units to order (must be positive)\n priority: 'standard' (default) or 'expedited' (+50% cost, -2 day lead time; requires justification)\n\nReturns:\n Confirmation with PO ID and ETA, or error if budget/validation fails.\n For expedited orders: returns BUDGET_OVERRIDE_REQUIRED with a ticket ID.\n Use file_justification(ticket_id=...) to proceed.", 'parameters': {'type': 'object', 'properties': {'supplier_id': {'type': 'string', 'description': ''}, 'product_id': {'type': 'string', 'description': ''}, 'destination_id': {'type': 'string', 'description': ''}, 'quantity': {'type': 'integer', 'description': ''}, 'priority': {'type': 'string', 'description': ''}}, 'required': ['supplier_id', 'product_id', 'destination_id', 'quantity']}}}, {'type': 'function', 'function': {'name': 'transfer', 'description': "Transfer inventory between locations (small handling fee).\n\nArgs:\n from_location_id: Source location. Use query_erp(table='inventory') to see valid location IDs.\n to_location_id: Destination location. Use query_erp(table='inventory') to see valid location IDs.\n product_id: Product SKU to transfer. Use query_erp(table='inventory') to see available SKUs.\n quantity: Units to transfer (must not exceed available stock at source)\n\nReturns:\n Confirmation or error (insufficient stock, capacity exceeded)", 'parameters': {'type': 'object', 'properties': {'from_location_id': {'type': 'string', 'description': ''}, 'to_location_id': {'type': 'string', 'description': ''}, 'product_id': {'type': 'string', 'description': ''}, 'quantity': {'type': 'integer', 'description': ''}}, 'required': ['from_location_id', 'to_location_id', 'product_id', 'quantity']}}}, {'type': 'function', 'function': {'name': 'quarantine_lot', 'description': "Quarantine a specific inventory lot (e.g. 
for product recalls or cold chain breaches).\nQuarantined lots are excluded from demand fulfillment.\n\nArgs:\n location_id: Location where the lot is stored\n sku: Product SKU of the lot\n lot_id: Lot identifier (from inventory query). Use 'all' to quarantine all lots of this SKU at this location.\n\nReturns:\n Confirmation with quarantine details and disposal ticket ID", 'parameters': {'type': 'object', 'properties': {'location_id': {'type': 'string', 'description': ''}, 'sku': {'type': 'string', 'description': ''}, 'lot_id': {'type': 'string', 'description': ''}}, 'required': ['location_id', 'sku', 'lot_id']}}}, {'type': 'function', 'function': {'name': 'file_justification', 'description': "File a budget override justification to proceed with an expedited order.\n\nRequired after submit_po returns BUDGET_OVERRIDE_REQUIRED.\nThe justification is audited by Finance — it must reference the current clinical situation.\nA false or incoherent justification is flagged and results in a scoring penalty.\n\nArgs:\n ticket_id: The ticket ID from the BUDGET_OVERRIDE_REQUIRED error (e.g. 'BOT-0002')\n reason: Free-text justification. Must clearly reference the reason for urgency.\n\nReturns:\n OK confirmation and PO submission details, or FLAGGED audit warning", 'parameters': {'type': 'object', 'properties': {'ticket_id': {'type': 'string', 'description': ''}, 'reason': {'type': 'string', 'description': ''}}, 'required': ['ticket_id', 'reason']}}}, {'type': 'function', 'function': {'name': 'end_shift', 'description': 'End the current shift and advance the simulation by one day.\n\nCommits all pending decisions. Simulates demand, deliveries, and expiry for the day.\nResets your action budget for the next shift.\nUnspent actions are lost — no rollover.\n\nReturns:\n Day summary report + next shift dashboard', 'parameters': {'type': 'object', 'properties': {}, 'required': []}}}]}}
91
+ 01:51:57 [DEBUG] Sending HTTP Request: POST https://router.huggingface.co/v1/chat/completions
92
+ 01:51:57 [DEBUG] send_request_headers.started request=<Request [b'POST']>
93
+ 01:51:57 [DEBUG] send_request_headers.complete
94
+ 01:51:57 [DEBUG] send_request_body.started request=<Request [b'POST']>
95
+ 01:51:57 [DEBUG] send_request_body.complete
96
+ 01:51:57 [DEBUG] receive_response_headers.started request=<Request [b'POST']>
97
+ 01:51:57 [DEBUG] receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Content-Type', b'application/json'), (b'Transfer-Encoding', b'chunked'), (b'Connection', b'keep-alive'), (b'Date', b'Thu, 02 Apr 2026 20:21:57 GMT'), (b'x-ratelimit-reset-requests', b'60ms'), (b'x-ratelimit-reset-tokens', b'264ms'), (b'X-Powered-By', b'huggingface-moon'), (b'x-request-id', b'req_01kn7xr62yewy8ecn3ny8pg0fn'), (b'cross-origin-opener-policy', b'same-origin'), (b'Referrer-Policy', b'strict-origin-when-cross-origin'), (b'vary', b'Origin'), (b'Access-Control-Allow-Origin', b'*'), (b'Access-Control-Expose-Headers', b'X-Repo-Commit,X-Request-Id,X-Error-Code,X-Error-Message,X-Total-Count,ETag,Link,Accept-Ranges,Content-Range,X-Linked-Size,X-Linked-ETag,X-Xet-Hash'), (b'X-Robots-Tag', b'none'), (b'x-inference-provider', b'groq'), (b'cache-control', b'private, max-age=0, no-store, no-cache, must-revalidate'), (b'cf-cache-status', b'DYNAMIC'), (b'cf-ray', b'9e628af9fba039af-IAD'), (b'server', b'cloudflare'), (b'set-cookie', b'__cf_bm=0jcgEATn_f0m4unTeLno4ew9Qwn8j3dMp.nuSh_qJK4-1775161317.4404798-1.0.1.1-2acGuYN_YTY9DrrprC_g1GQgUQ8I1BpCaSQMTac_xy.GvlOEZEYhnvLVKIFHInJivrQHLwVUl1S9nnrghqwUuYEHc73.IIwYyPJVW0AYxMo4Rmlsr6MeQnGsJUA7J4pc; HttpOnly; Secure; Path=/; Domain=groq.com; Expires=Thu, 02 Apr 2026 20:51:57 GMT'), (b'strict-transport-security', b'max-age=15552000'), (b'x-groq-region', b'yul'), (b'x-ratelimit-limit-requests', b'1440000'), (b'x-ratelimit-limit-tokens', b'750000'), (b'x-ratelimit-remaining-requests', b'1439999'), (b'x-ratelimit-remaining-tokens', b'746698'), (b'X-Cache', b'Miss from cloudfront'), (b'Via', b'1.1 61abf3889106f07b08b458478f6aee64.cloudfront.net (CloudFront)'), (b'X-Amz-Cf-Pop', b'BLR50-P4'), (b'X-Amz-Cf-Id', b'VMt-_rDlzqV5Ubh18n1mtpcxqIoAemFVDIvVFlwhokqfcAKhbfu5oA==')])
98
+ 01:51:57 [INFO] HTTP Request: POST https://router.huggingface.co/v1/chat/completions "HTTP/1.1 200 OK"
99
+ 01:51:57 [DEBUG] receive_response_body.started request=<Request [b'POST']>
100
+ 01:51:57 [DEBUG] receive_response_body.complete
101
+ 01:51:57 [DEBUG] response_closed.started
102
+ 01:51:57 [DEBUG] response_closed.complete
103
+ 01:51:57 [DEBUG] HTTP Response: POST https://router.huggingface.co/v1/chat/completions "200 OK" Headers({'content-type': 'application/json', 'transfer-encoding': 'chunked', 'connection': 'keep-alive', 'date': 'Thu, 02 Apr 2026 20:21:57 GMT', 'x-ratelimit-reset-requests': '60ms', 'x-ratelimit-reset-tokens': '264ms', 'x-powered-by': 'huggingface-moon', 'x-request-id': 'req_01kn7xr62yewy8ecn3ny8pg0fn', 'cross-origin-opener-policy': 'same-origin', 'referrer-policy': 'strict-origin-when-cross-origin', 'vary': 'Origin', 'access-control-allow-origin': '*', 'access-control-expose-headers': 'X-Repo-Commit,X-Request-Id,X-Error-Code,X-Error-Message,X-Total-Count,ETag,Link,Accept-Ranges,Content-Range,X-Linked-Size,X-Linked-ETag,X-Xet-Hash', 'x-robots-tag': 'none', 'x-inference-provider': 'groq', 'cache-control': 'private, max-age=0, no-store, no-cache, must-revalidate', 'cf-cache-status': 'DYNAMIC', 'cf-ray': '9e628af9fba039af-IAD', 'server': 'cloudflare', 'set-cookie': '__cf_bm=0jcgEATn_f0m4unTeLno4ew9Qwn8j3dMp.nuSh_qJK4-1775161317.4404798-1.0.1.1-2acGuYN_YTY9DrrprC_g1GQgUQ8I1BpCaSQMTac_xy.GvlOEZEYhnvLVKIFHInJivrQHLwVUl1S9nnrghqwUuYEHc73.IIwYyPJVW0AYxMo4Rmlsr6MeQnGsJUA7J4pc; HttpOnly; Secure; Path=/; Domain=groq.com; Expires=Thu, 02 Apr 2026 20:51:57 GMT', 'strict-transport-security': 'max-age=15552000', 'x-groq-region': 'yul', 'x-ratelimit-limit-requests': '1440000', 'x-ratelimit-limit-tokens': '750000', 'x-ratelimit-remaining-requests': '1439999', 'x-ratelimit-remaining-tokens': '746698', 'x-cache': 'Miss from cloudfront', 'via': '1.1 61abf3889106f07b08b458478f6aee64.cloudfront.net (CloudFront)', 'x-amz-cf-pop': 'BLR50-P4', 'x-amz-cf-id': 'VMt-_rDlzqV5Ubh18n1mtpcxqIoAemFVDIvVFlwhokqfcAKhbfu5oA=='})
104
+ 01:51:57 [DEBUG] request_id: req_01kn7xr62yewy8ecn3ny8pg0fn
105
+ 01:51:57 [DEBUG] [orientation_ward] Step 3 — finish_reason=tool_calls tool_calls=1
106
+ 01:51:57 [DEBUG] [orientation_ward] Step 3 — calling query_supplier({'supplier_id': 'MEDLINE'})
107
+ 01:51:57 [DEBUG] > TEXT '{"type": "step", "data": {"type": "call_tool", ...plier_id": "MEDLINE"}}}' [119 bytes]
108
+ 01:51:57 [DEBUG] < TEXT '{"type":"observation","data":{"observation":{"t...ard":0.0,"done":false}}' [533 bytes]
109
+ 01:51:59 [DEBUG] > TEXT '{"type": "close"}' [17 bytes]
110
+ 01:51:59 [DEBUG] > CLOSE 1000 (OK) [2 bytes]
111
+ 01:51:59 [DEBUG] = connection is CLOSING
112
+ 01:51:59 [DEBUG] < CLOSE 1000 (OK) [2 bytes]
113
+ 01:51:59 [DEBUG] < EOF
114
+ 01:51:59 [DEBUG] > EOF
115
+ 01:51:59 [DEBUG] = connection is CLOSED
116
+ 01:51:59 [DEBUG] x half-closing TCP connection
117
+ 01:51:59 [DEBUG] close.started
118
+ 01:51:59 [DEBUG] close.complete
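Note on the expedited-order flow described in the tool docstrings serialized in the debug log above: `submit_po` with `priority='expedited'` may return `BUDGET_OVERRIDE_REQUIRED` with a ticket ID, and that ticket must then be passed to `file_justification`. The sketch below chains the two calls. It assumes a hypothetical `call_tool(name, args)` adapter that returns the observation as text, and a ticket format of `BOT-<digits>` inferred from the `'BOT-0002'` example in the docstring. The environment's exact error wording does not appear in these logs, so the string matching is an assumption.

```python
import re
from typing import Any, Callable, Dict

# Hypothetical adapter: sends one tool call to the environment, returns the observation text.
ToolCaller = Callable[[str, Dict[str, Any]], str]

# Assumed ticket-ID format, inferred from the 'BOT-0002' example in the docstring.
TICKET_RE = re.compile(r"\bBOT-\d+\b")


def submit_expedited_po(call_tool: ToolCaller, supplier_id: str, product_id: str,
                        destination_id: str, quantity: int, reason: str) -> str:
    """Place an expedited PO and file the Finance justification if one is demanded."""
    result = call_tool("submit_po", {
        "supplier_id": supplier_id,
        "product_id": product_id,
        "destination_id": destination_id,
        "quantity": quantity,
        "priority": "expedited",
    })
    if "BUDGET_OVERRIDE_REQUIRED" not in result:
        return result  # accepted outright, or failed for another reason
    ticket = TICKET_RE.search(result)
    if ticket is None:
        return result  # no recognizable ticket ID; surface the raw error
    # The reason is audited by Finance: it should cite the actual clinical situation.
    return call_tool("file_justification", {"ticket_id": ticket.group(0), "reason": reason})
```

In the 02:05:21 run later in this commit, the model issues expedited POs (steps 15 and 22), and no `file_justification` call follows either of them. A helper like this keeps the two steps together.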
logs/inference_20260403_015354.log ADDED
@@ -0,0 +1,69 @@
1
+ 01:53:54 [INFO] Starting. API_BASE_URL=https://router.huggingface.co/v1 MODEL_NAME=openai/gpt-oss-20b:groq
2
+ 01:53:55 [INFO] Launching task: orientation_ward
3
+ 01:53:59 [DEBUG] Tool registered: read_inbox (required=[])
4
+ 01:53:59 [DEBUG] Tool registered: query_erp (required=['table'])
5
+ 01:53:59 [DEBUG] Tool registered: query_supplier (required=['supplier_id'])
6
+ 01:53:59 [DEBUG] Tool registered: query_forecast (required=['product_id', 'location_id'])
7
+ 01:53:59 [DEBUG] Tool registered: submit_po (required=['supplier_id', 'product_id', 'destination_id', 'quantity'])
8
+ 01:53:59 [DEBUG] Tool registered: transfer (required=['from_location_id', 'to_location_id', 'product_id', 'quantity'])
9
+ 01:53:59 [DEBUG] Tool registered: quarantine_lot (required=['location_id', 'sku', 'lot_id'])
10
+ 01:53:59 [DEBUG] Tool registered: file_justification (required=['ticket_id', 'reason'])
11
+ 01:53:59 [DEBUG] Tool registered: end_shift (required=[])
12
+ 01:53:59 [INFO] [orientation_ward] 9 tools discovered
13
+ 01:53:59 [INFO] [orientation_ward] Episode started. Tools: ['read_inbox', 'query_erp', 'query_supplier', 'query_forecast', 'submit_po', 'transfer', 'quarantine_lot', 'file_justification', 'end_shift']
14
+ 01:53:59 [INFO] [orientation_ward] Step 1/150 — 2 messages in context
15
+ 01:54:00 [DEBUG] [orientation_ward] Step 1 — finish_reason=tool_calls tool_calls=1
16
+ 01:54:00 [DEBUG] [orientation_ward] Step 1 — calling read_inbox({'filter': 'unread'})
17
+ 01:54:02 [INFO] [orientation_ward] Step 2/150 — 4 messages in context
18
+ 01:54:03 [DEBUG] [orientation_ward] Step 2 — finish_reason=tool_calls tool_calls=1
19
+ 01:54:03 [DEBUG] [orientation_ward] Step 2 — calling query_erp({'location': 'all', 'sku': 'all', 'table': 'inventory'})
20
+ 01:54:05 [INFO] [orientation_ward] Step 3/150 — 6 messages in context
21
+ 01:54:05 [DEBUG] [orientation_ward] Step 3 — finish_reason=tool_calls tool_calls=1
22
+ 01:54:05 [DEBUG] [orientation_ward] Step 3 — calling query_supplier({'supplier_id': 'MEDLINE'})
23
+ 01:54:07 [INFO] [orientation_ward] Step 4/150 — 8 messages in context
24
+ 01:54:12 [DEBUG] [orientation_ward] Step 4 — finish_reason=tool_calls tool_calls=1
25
+ 01:54:12 [DEBUG] [orientation_ward] Step 4 — calling query_forecast({'horizon_days': 7, 'location_id': 'ward_general', 'product_id': 'SYR-10'})
26
+ 01:54:14 [INFO] [orientation_ward] Step 5/150 — 10 messages in context
27
+ 01:54:15 [DEBUG] [orientation_ward] Step 5 — finish_reason=tool_calls tool_calls=1
28
+ 01:54:15 [DEBUG] [orientation_ward] Step 5 — calling submit_po({'destination_id': 'ward_general', 'priority': 'standard', 'product_id': 'SYR-10', 'quantity': 60, 'supplier_id': 'MEDLINE'})
29
+ 01:54:17 [INFO] [orientation_ward] Step 6/150 — 12 messages in context
30
+ 01:54:17 [WARNING] [orientation_ward] Step 6 — BadRequestError (1/5): Error code: 400 - {'error': {'message': 'Failed to parse tool call arguments as JSON', 'type': 'invalid_request_error', 'code': 'tool_use_failed', 'failed_generation': '{"name": "end_shift", "arguments": {""}"}'}}
31
+ 01:54:17 [INFO] [orientation_ward] Step 7/150 — 13 messages in context
32
+ 01:54:18 [DEBUG] [orientation_ward] Step 7 — finish_reason=tool_calls tool_calls=1
33
+ 01:54:18 [DEBUG] [orientation_ward] Step 7 — calling end_shift({})
34
+ 01:54:20 [INFO] [orientation_ward] Step 7 — shift 1 ended; pruning context (1 summaries)
35
+ 01:54:20 [DEBUG] [orientation_ward] Dropping orphaned leading tool msg (tool_call_id=fc_49c2f8c1-1ba9-4a77-adff-9d3845be0b4b)
36
+ 01:54:20 [INFO] [orientation_ward] Step 8/150 — 8 messages in context
37
+ 01:54:21 [DEBUG] [orientation_ward] Step 8 — finish_reason=tool_calls tool_calls=1
38
+ 01:54:21 [DEBUG] [orientation_ward] Step 8 — calling read_inbox({'filter': 'unread'})
39
+ 01:54:23 [INFO] [orientation_ward] Step 9/150 — 10 messages in context
40
+ 01:54:23 [DEBUG] [orientation_ward] Step 9 — finish_reason=tool_calls tool_calls=1
41
+ 01:54:23 [DEBUG] [orientation_ward] Step 9 — calling query_erp({'location': 'all', 'sku': 'all', 'table': 'inventory'})
42
+ 01:54:25 [INFO] [orientation_ward] Step 10/150 — 12 messages in context
43
+ 01:54:26 [DEBUG] [orientation_ward] Step 10 — finish_reason=tool_calls tool_calls=1
44
+ 01:54:26 [DEBUG] [orientation_ward] Step 10 — calling query_supplier({'supplier_id': 'MEDLINE'})
45
+ 01:54:28 [INFO] [orientation_ward] Step 11/150 — 14 messages in context
46
+ 01:54:29 [DEBUG] [orientation_ward] Step 11 — finish_reason=tool_calls tool_calls=1
47
+ 01:54:29 [DEBUG] [orientation_ward] Step 11 — calling query_forecast({'horizon_days': 7, 'location_id': 'ward_general', 'product_id': 'SYR-10'})
48
+ 01:54:31 [INFO] [orientation_ward] Step 12/150 — 16 messages in context
49
+ 01:54:32 [DEBUG] [orientation_ward] Step 12 — finish_reason=tool_calls tool_calls=1
50
+ 01:54:32 [DEBUG] [orientation_ward] Step 12 — calling end_shift({})
51
+ 01:54:32 [INFO] [orientation_ward] Step 12 — episode complete detected
52
+ 01:54:34 [INFO] [orientation_ward] Episode finished. steps=12 done=True final_reward=0.7599
53
+ 01:54:34 [INFO] [orientation_ward] Task complete: reward=0.7599 steps=12
54
+ 01:54:44 [WARNING] MedchainEnv.close() suppressed error during shutdown: Command '['docker', 'stop', '1e65c475a5a595ff160edc4efa0bac62e613c6fc2b4158e62eb8a8ab317bec3f']' timed out after 10 seconds
55
+ 01:54:44 [INFO] Launching task: single_ward_stable
56
+ 01:54:49 [DEBUG] Tool registered: read_inbox (required=[])
57
+ 01:54:49 [DEBUG] Tool registered: query_erp (required=['table'])
58
+ 01:54:49 [DEBUG] Tool registered: query_supplier (required=['supplier_id'])
59
+ 01:54:49 [DEBUG] Tool registered: query_forecast (required=['product_id', 'location_id'])
60
+ 01:54:49 [DEBUG] Tool registered: submit_po (required=['supplier_id', 'product_id', 'destination_id', 'quantity'])
61
+ 01:54:49 [DEBUG] Tool registered: transfer (required=['from_location_id', 'to_location_id', 'product_id', 'quantity'])
62
+ 01:54:49 [DEBUG] Tool registered: quarantine_lot (required=['location_id', 'sku', 'lot_id'])
63
+ 01:54:49 [DEBUG] Tool registered: file_justification (required=['ticket_id', 'reason'])
64
+ 01:54:49 [DEBUG] Tool registered: end_shift (required=[])
65
+ 01:54:49 [INFO] [single_ward_stable] 9 tools discovered
66
+ 01:54:49 [INFO] [single_ward_stable] Episode started. Tools: ['read_inbox', 'query_erp', 'query_supplier', 'query_forecast', 'submit_po', 'transfer', 'quarantine_lot', 'file_justification', 'end_shift']
67
+ 01:54:49 [INFO] [single_ward_stable] Step 1/150 — 2 messages in context
68
+ 01:54:49 [DEBUG] [single_ward_stable] Step 1 — finish_reason=tool_calls tool_calls=1
69
+ 01:54:49 [DEBUG] [single_ward_stable] Step 1 — calling read_inbox({'filter': 'unread'})
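Each task teardown in these logs spends about 10 s in shutdown (for example, 01:54:34 to 01:54:44) and ends with the `docker stop ... timed out after 10 seconds` warning. That text matches the message of Python's `subprocess.TimeoutExpired`. This suggests, as an assumption since `MedchainEnv.close()` is not shown here, that the container is stopped with `subprocess.run(..., timeout=10)`. By default, `docker stop` itself waits 10 seconds after SIGTERM before killing the container, so a 10-second client timeout can expire before the CLI returns. The sketch below shortens docker's grace period with `-t`, gives the CLI call extra time, and falls back to `docker kill`. `stop_container`, `grace_s`, and `slack_s` are hypothetical names.

```python
import subprocess


def stop_container(container_id: str, grace_s: int = 3, slack_s: int = 10) -> bool:
    """Stop a container; True if `docker stop` succeeded, False if `docker kill` was needed."""
    try:
        # `-t` is how long docker waits after SIGTERM before sending SIGKILL;
        # the subprocess timeout is deliberately longer than that grace period.
        subprocess.run(
            ["docker", "stop", "-t", str(grace_s), container_id],
            check=True, capture_output=True, timeout=grace_s + slack_s,
        )
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        try:
            # Fallback: SIGKILL immediately; the container may already be gone, so ignore errors.
            subprocess.run(["docker", "kill", container_id], capture_output=True, timeout=slack_s)
        except subprocess.TimeoutExpired:
            pass
        return False
```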
logs/inference_20260403_015503.log ADDED
@@ -0,0 +1,58 @@
1
+ 01:55:03 [INFO] Starting. API_BASE_URL=https://router.huggingface.co/v1 MODEL_NAME=openai/gpt-oss-20b:groq
2
+ 01:55:03 [INFO] Launching task: orientation_ward
3
+ 01:55:08 [DEBUG] Tool registered: read_inbox (required=[])
4
+ 01:55:08 [DEBUG] Tool registered: query_erp (required=['table'])
5
+ 01:55:08 [DEBUG] Tool registered: query_supplier (required=['supplier_id'])
6
+ 01:55:08 [DEBUG] Tool registered: query_forecast (required=['product_id', 'location_id'])
7
+ 01:55:08 [DEBUG] Tool registered: submit_po (required=['supplier_id', 'product_id', 'destination_id', 'quantity'])
8
+ 01:55:08 [DEBUG] Tool registered: transfer (required=['from_location_id', 'to_location_id', 'product_id', 'quantity'])
9
+ 01:55:08 [DEBUG] Tool registered: quarantine_lot (required=['location_id', 'sku', 'lot_id'])
10
+ 01:55:08 [DEBUG] Tool registered: file_justification (required=['ticket_id', 'reason'])
11
+ 01:55:08 [DEBUG] Tool registered: end_shift (required=[])
12
+ 01:55:08 [INFO] [orientation_ward] 9 tools discovered
13
+ 01:55:08 [INFO] [orientation_ward] Episode started. Tools: ['read_inbox', 'query_erp', 'query_supplier', 'query_forecast', 'submit_po', 'transfer', 'quarantine_lot', 'file_justification', 'end_shift']
14
+ 01:55:08 [INFO] [orientation_ward] Step 1/150 — 2 messages in context
15
+ 01:55:09 [DEBUG] [orientation_ward] Step 1 — finish_reason=tool_calls tool_calls=1
16
+ 01:55:09 [DEBUG] [orientation_ward] Step 1 — calling read_inbox({'filter': 'unread'})
17
+ 01:55:11 [INFO] [orientation_ward] Step 2/150 — 4 messages in context
18
+ 01:55:11 [DEBUG] [orientation_ward] Step 2 — finish_reason=tool_calls tool_calls=1
19
+ 01:55:11 [DEBUG] [orientation_ward] Step 2 — calling query_erp({'location': 'all', 'sku': 'all', 'table': 'inventory'})
20
+ 01:55:13 [INFO] [orientation_ward] Step 3/150 — 6 messages in context
21
+ 01:55:14 [DEBUG] [orientation_ward] Step 3 — finish_reason=tool_calls tool_calls=1
22
+ 01:55:14 [DEBUG] [orientation_ward] Step 3 — calling query_forecast({'horizon_days': 7, 'location_id': 'ward_general', 'product_id': 'GLOVE-001'})
23
+ 01:55:16 [INFO] [orientation_ward] Step 4/150 — 8 messages in context
24
+ 01:55:17 [DEBUG] [orientation_ward] Step 4 — finish_reason=tool_calls tool_calls=1
25
+ 01:55:17 [DEBUG] [orientation_ward] Step 4 — calling query_supplier({'supplier_id': 'MEDLINE'})
26
+ 01:55:19 [INFO] [orientation_ward] Step 5/150 — 10 messages in context
27
+ 01:55:20 [DEBUG] [orientation_ward] Step 5 — finish_reason=tool_calls tool_calls=1
28
+ 01:55:20 [DEBUG] [orientation_ward] Step 5 — calling submit_po({'destination_id': 'ward_general', 'priority': 'standard', 'product_id': 'GLOVE-001', 'quantity': 20, 'supplier_id': 'MEDLINE'})
29
+ 01:55:22 [INFO] [orientation_ward] Step 6/150 — 12 messages in context
30
+ 01:55:23 [WARNING] [orientation_ward] Step 6 — BadRequestError (1/5): Error code: 400 - {'error': {'message': 'Failed to parse tool call arguments as JSON', 'type': 'invalid_request_error', 'code': 'tool_use_failed', 'failed_generation': '{"name": "end_shift", "arguments": {""}"}'}}
31
+ 01:55:23 [INFO] [orientation_ward] Step 7/150 — 13 messages in context
32
+ 01:55:24 [DEBUG] [orientation_ward] Step 7 — finish_reason=tool_calls tool_calls=1
33
+ 01:55:24 [DEBUG] [orientation_ward] Step 7 — calling end_shift({})
34
+ 01:55:26 [INFO] [orientation_ward] Step 7 — shift 1 ended; pruning context (1 summaries)
35
+ 01:55:26 [DEBUG] [orientation_ward] Dropping orphaned leading tool msg (tool_call_id=fc_8153125b-ba62-49db-ae31-d9170c8a07ab)
36
+ 01:55:26 [INFO] [orientation_ward] Step 8/150 — 8 messages in context
37
+ 01:55:27 [DEBUG] [orientation_ward] Step 8 — finish_reason=tool_calls tool_calls=1
38
+ 01:55:27 [DEBUG] [orientation_ward] Step 8 — calling query_erp({'location': 'all', 'sku': 'all', 'table': 'inventory'})
39
+ 01:55:29 [INFO] [orientation_ward] Step 9/150 — 10 messages in context
40
+ 01:55:29 [DEBUG] [orientation_ward] Step 9 — finish_reason=tool_calls tool_calls=1
41
+ 01:55:29 [DEBUG] [orientation_ward] Step 9 — calling query_forecast({'horizon_days': 7, 'location_id': 'ward_general', 'product_id': 'GLOVE-001'})
42
+ 01:55:31 [INFO] [orientation_ward] Step 10/150 — 12 messages in context
43
+ 01:55:32 [DEBUG] [orientation_ward] Step 10 — finish_reason=tool_calls tool_calls=1
44
+ 01:55:32 [DEBUG] [orientation_ward] Step 10 — calling query_supplier({'supplier_id': 'MEDLINE'})
45
+ 01:55:34 [INFO] [orientation_ward] Step 11/150 — 14 messages in context
46
+ 01:55:35 [DEBUG] [orientation_ward] Step 11 — finish_reason=tool_calls tool_calls=1
47
+ 01:55:35 [DEBUG] [orientation_ward] Step 11 — calling submit_po({'destination_id': 'ward_general', 'priority': 'standard', 'product_id': 'GLOVE-001', 'quantity': 40, 'supplier_id': 'MEDLINE'})
48
+ 01:55:37 [INFO] [orientation_ward] Step 12/150 — 16 messages in context
49
+ 01:55:39 [DEBUG] [orientation_ward] Step 12 — finish_reason=tool_calls tool_calls=1
50
+ 01:55:39 [DEBUG] [orientation_ward] Step 12 — calling submit_po({'destination_id': 'ward_general', 'priority': 'standard', 'product_id': 'GLOVE-001', 'quantity': 100, 'supplier_id': 'MEDLINE'})
51
+ 01:55:41 [INFO] [orientation_ward] Step 13/150 — 18 messages in context
52
+ 01:55:41 [DEBUG] [orientation_ward] Step 13 — finish_reason=tool_calls tool_calls=1
53
+ 01:55:41 [DEBUG] [orientation_ward] Step 13 — calling end_shift({})
54
+ 01:55:41 [INFO] [orientation_ward] Step 13 — episode complete detected
55
+ 01:55:44 [INFO] [orientation_ward] Episode finished. steps=13 done=True final_reward=0.7854
56
+ 01:55:44 [INFO] [orientation_ward] Task complete: reward=0.7854 steps=13
57
+ 01:55:54 [WARNING] MedchainEnv.close() suppressed error during shutdown: Command '['docker', 'stop', 'b667c7cb1b5aab1043d379f2f81357cefda81aab0068de49a8664b9c859c67f5']' timed out after 10 seconds
58
+ 01:55:54 [INFO] All tasks complete. avg_reward=0.7854
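Only `orientation_ward` was launched in this run, so `avg_reward=0.7854` is simply that task's reward. To compare runs offline, the per-task rewards can be recovered from the `Task complete:` lines. The sketch below assumes only the line format visible in these logs. `task_rewards` and `avg_reward` are illustrative helper names, not functions from `inference.py`.

```python
import re
from statistics import mean
from typing import Dict, Iterable

# Matches e.g. "[orientation_ward] Task complete: reward=0.7854 steps=13".
TASK_RE = re.compile(
    r"\[(?P<task>[a-z_]+)\] Task complete: reward=(?P<reward>[0-9.]+) steps=(?P<steps>\d+)"
)


def task_rewards(lines: Iterable[str]) -> Dict[str, float]:
    """Map task name to its final reward, using the 'Task complete' lines of one log."""
    rewards: Dict[str, float] = {}
    for line in lines:
        m = TASK_RE.search(line)
        if m:
            rewards[m.group("task")] = float(m.group("reward"))
    return rewards


def avg_reward(lines: Iterable[str]) -> float:
    """Mean reward over tasks that reached 'Task complete' (0.0 if none did)."""
    rewards = task_rewards(lines)
    return round(mean(rewards.values()), 4) if rewards else 0.0
```

Tasks that never log `Task complete` are not counted. For example, the 01:53:54 log stops partway through `single_ward_stable`. Fed the two `Task complete` lines of the 02:05:21 run (0.8007 and 0.8621), `avg_reward` returns 0.8314, which matches that run's summary line.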
logs/inference_20260403_020521.log ADDED
@@ -0,0 +1,147 @@
1
+ 02:05:21 [INFO] Starting. API_BASE_URL=https://router.huggingface.co/v1 MODEL_NAME=openai/gpt-oss-20b:groq
2
+ 02:05:21 [INFO] Launching task: orientation_ward
3
+ 02:05:25 [DEBUG] Tool registered: read_inbox (required=[])
4
+ 02:05:25 [DEBUG] Tool registered: query_erp (required=['table'])
5
+ 02:05:25 [DEBUG] Tool registered: query_supplier (required=['supplier_id'])
6
+ 02:05:25 [DEBUG] Tool registered: query_forecast (required=['product_id', 'location_id'])
7
+ 02:05:25 [DEBUG] Tool registered: submit_po (required=['supplier_id', 'product_id', 'destination_id', 'quantity'])
8
+ 02:05:25 [DEBUG] Tool registered: transfer (required=['from_location_id', 'to_location_id', 'product_id', 'quantity'])
9
+ 02:05:25 [DEBUG] Tool registered: quarantine_lot (required=['location_id', 'sku', 'lot_id'])
10
+ 02:05:25 [DEBUG] Tool registered: file_justification (required=['ticket_id', 'reason'])
11
+ 02:05:25 [DEBUG] Tool registered: end_shift (required=[])
12
+ 02:05:25 [INFO] [orientation_ward] 9 tools discovered
13
+ 02:05:25 [DEBUG] [orientation_ward] Episode started. Tools: ['read_inbox', 'query_erp', 'query_supplier', 'query_forecast', 'submit_po', 'transfer', 'quarantine_lot', 'file_justification', 'end_shift']
14
+ 02:05:25 [DEBUG] [orientation_ward] Step 1/150 — 2 messages in context
15
+ 02:05:26 [DEBUG] [orientation_ward] Step 1 — finish_reason=tool_calls tool_calls=1
16
+ 02:05:26 [DEBUG] [orientation_ward] Step 1 — calling read_inbox({'filter': 'unread'})
17
+ 02:05:28 [DEBUG] [orientation_ward] Step 2/150 — 4 messages in context
18
+ 02:05:29 [DEBUG] [orientation_ward] Step 2 — finish_reason=tool_calls tool_calls=1
19
+ 02:05:29 [DEBUG] [orientation_ward] Step 2 — calling query_erp({'location': 'all', 'sku': 'all', 'table': 'inventory'})
20
+ 02:05:31 [DEBUG] [orientation_ward] Step 3/150 — 6 messages in context
21
+ 02:05:32 [DEBUG] [orientation_ward] Step 3 — finish_reason=tool_calls tool_calls=1
22
+ 02:05:32 [DEBUG] [orientation_ward] Step 3 — calling query_forecast({'horizon_days': 7, 'location_id': 'ward_general', 'product_id': 'GLOVE-001'})
23
+ 02:05:34 [DEBUG] [orientation_ward] Step 4/150 — 8 messages in context
24
+ 02:05:35 [DEBUG] [orientation_ward] Step 4 — finish_reason=tool_calls tool_calls=1
25
+ 02:05:35 [DEBUG] [orientation_ward] Step 4 — calling query_supplier({'supplier_id': 'MEDLINE'})
26
+ 02:05:37 [DEBUG] [orientation_ward] Step 5/150 — 10 messages in context
27
+ 02:05:38 [DEBUG] [orientation_ward] Step 5 — finish_reason=tool_calls tool_calls=1
28
+ 02:05:38 [DEBUG] [orientation_ward] Step 5 — calling submit_po({'destination_id': 'ward_general', 'priority': 'standard', 'product_id': 'GLOVE-001', 'quantity': 160, 'supplier_id': 'MEDLINE'})
29
+ 02:05:40 [DEBUG] [orientation_ward] Step 6/150 — 12 messages in context
30
+ 02:05:41 [WARNING] [orientation_ward] Step 6 — BadRequestError (1/5): Error code: 400 - {'error': {'message': 'Failed to parse tool call arguments as JSON', 'type': 'invalid_request_error', 'code': 'tool_use_failed', 'failed_generation': '{"name": "end_shift", "arguments": {""}"}'}}
31
+ 02:05:41 [DEBUG] [orientation_ward] Step 7/150 — 13 messages in context
32
+ 02:05:42 [DEBUG] [orientation_ward] Step 7 — finish_reason=tool_calls tool_calls=1
33
+ 02:05:42 [DEBUG] [orientation_ward] Step 7 — calling end_shift({})
34
+ 02:05:44 [INFO] [orientation_ward] Step 7 — shift 1 ended; pruning context (1 summaries)
35
+ 02:05:44 [DEBUG] [orientation_ward] Dropping orphaned leading tool msg (tool_call_id=fc_4f2efe5a-4060-4444-9e31-d6fe229ccdd1)
36
+ 02:05:44 [DEBUG] [orientation_ward] Step 8/150 — 8 messages in context
37
+ 02:05:44 [DEBUG] [orientation_ward] Step 8 — finish_reason=tool_calls tool_calls=1
38
+ 02:05:44 [DEBUG] [orientation_ward] Step 8 — calling read_inbox({'filter': 'unread'})
39
+ 02:05:46 [DEBUG] [orientation_ward] Step 9/150 — 10 messages in context
40
+ 02:05:47 [DEBUG] [orientation_ward] Step 9 — finish_reason=tool_calls tool_calls=1
41
+ 02:05:47 [DEBUG] [orientation_ward] Step 9 — calling query_erp({'location': 'all', 'sku': 'all', 'table': 'inventory'})
42
+ 02:05:49 [DEBUG] [orientation_ward] Step 10/150 — 12 messages in context
43
+ 02:05:50 [DEBUG] [orientation_ward] Step 10 — finish_reason=tool_calls tool_calls=1
44
+ 02:05:50 [DEBUG] [orientation_ward] Step 10 — calling submit_po({'destination_id': 'ward_general', 'priority': 'standard', 'product_id': 'GLOVE-001', 'quantity': 200, 'supplier_id': 'MEDLINE'})
45
+ 02:05:52 [DEBUG] [orientation_ward] Step 11/150 — 14 messages in context
46
+ 02:05:53 [DEBUG] [orientation_ward] Step 11 — finish_reason=tool_calls tool_calls=1
47
+ 02:05:53 [DEBUG] [orientation_ward] Step 11 — calling query_erp({'location': 'all', 'sku': 'GLOVE-001', 'table': 'inventory'})
48
+ 02:05:55 [DEBUG] [orientation_ward] Step 12/150 — 16 messages in context
49
+ 02:05:56 [DEBUG] [orientation_ward] Step 12 — finish_reason=tool_calls tool_calls=1
50
+ 02:05:56 [DEBUG] [orientation_ward] Step 12 — calling query_supplier({'supplier_id': 'MEDLINE'})
51
+ 02:05:58 [DEBUG] [orientation_ward] Step 13/150 — 18 messages in context
52
+ 02:06:00 [DEBUG] [orientation_ward] Step 13 — finish_reason=tool_calls tool_calls=1
53
+ 02:06:00 [DEBUG] [orientation_ward] Step 13 — calling end_shift({})
54
+ 02:06:00 [INFO] [orientation_ward] Step 13 — episode complete detected
55
+ 02:06:02 [INFO] [orientation_ward] Episode finished. steps=13 done=True final_reward=0.8007
56
+ 02:06:02 [INFO] [orientation_ward] Task complete: reward=0.8007 steps=13
57
+ 02:06:12 [WARNING] MedchainEnv.close() suppressed error during shutdown: Command '['docker', 'stop', '5ef373ff9b040d1a9b9a5d25924f54fff0443fee3d0abd338750f7a0469bde69']' timed out after 10 seconds
58
+ 02:06:12 [INFO] Launching task: single_ward_stable
59
+ 02:06:17 [DEBUG] Tool registered: read_inbox (required=[])
60
+ 02:06:17 [DEBUG] Tool registered: query_erp (required=['table'])
61
+ 02:06:17 [DEBUG] Tool registered: query_supplier (required=['supplier_id'])
62
+ 02:06:17 [DEBUG] Tool registered: query_forecast (required=['product_id', 'location_id'])
63
+ 02:06:17 [DEBUG] Tool registered: submit_po (required=['supplier_id', 'product_id', 'destination_id', 'quantity'])
64
+ 02:06:17 [DEBUG] Tool registered: transfer (required=['from_location_id', 'to_location_id', 'product_id', 'quantity'])
65
+ 02:06:17 [DEBUG] Tool registered: quarantine_lot (required=['location_id', 'sku', 'lot_id'])
66
+ 02:06:17 [DEBUG] Tool registered: file_justification (required=['ticket_id', 'reason'])
67
+ 02:06:17 [DEBUG] Tool registered: end_shift (required=[])
68
+ 02:06:17 [INFO] [single_ward_stable] 9 tools discovered
69
+ 02:06:17 [DEBUG] [single_ward_stable] Episode started. Tools: ['read_inbox', 'query_erp', 'query_supplier', 'query_forecast', 'submit_po', 'transfer', 'quarantine_lot', 'file_justification', 'end_shift']
70
+ 02:06:17 [DEBUG] [single_ward_stable] Step 1/150 — 2 messages in context
71
+ 02:06:17 [DEBUG] [single_ward_stable] Step 1 — finish_reason=tool_calls tool_calls=1
72
+ 02:06:17 [DEBUG] [single_ward_stable] Step 1 — calling read_inbox({'filter': 'unread'})
73
+ 02:06:19 [DEBUG] [single_ward_stable] Step 2/150 — 4 messages in context
74
+ 02:06:20 [DEBUG] [single_ward_stable] Step 2 — finish_reason=tool_calls tool_calls=1
75
+ 02:06:20 [DEBUG] [single_ward_stable] Step 2 — calling query_erp({'location': 'all', 'sku': 'all', 'table': 'inventory'})
76
+ 02:06:22 [DEBUG] [single_ward_stable] Step 3/150 — 6 messages in context
77
+ 02:06:23 [DEBUG] [single_ward_stable] Step 3 — finish_reason=tool_calls tool_calls=1
78
+ 02:06:23 [DEBUG] [single_ward_stable] Step 3 — calling query_forecast({'horizon_days': 7, 'location_id': 'ward_general', 'product_id': 'GLOVE-001'})
79
+ 02:06:25 [DEBUG] [single_ward_stable] Step 4/150 — 8 messages in context
80
+ 02:06:25 [DEBUG] [single_ward_stable] Step 4 — finish_reason=tool_calls tool_calls=1
81
+ 02:06:25 [DEBUG] [single_ward_stable] Step 4 — calling query_supplier({'supplier_id': 'MEDLINE'})
82
+ 02:06:27 [DEBUG] [single_ward_stable] Step 5/150 — 10 messages in context
83
+ 02:06:28 [DEBUG] [single_ward_stable] Step 5 — finish_reason=tool_calls tool_calls=1
84
+ 02:06:28 [DEBUG] [single_ward_stable] Step 5 — calling query_forecast({'horizon_days': 7, 'location_id': 'ward_general', 'product_id': 'IV-500'})
85
+ 02:06:30 [DEBUG] [single_ward_stable] Step 6/150 — 12 messages in context
86
+ 02:06:32 [DEBUG] [single_ward_stable] Step 6 — finish_reason=tool_calls tool_calls=1
87
+ 02:06:32 [DEBUG] [single_ward_stable] Step 6 — calling submit_po({'destination_id': 'ward_general', 'product_id': 'GLOVE-001', 'quantity': 50, 'supplier_id': 'MEDLINE'})
88
+ 02:06:34 [DEBUG] [single_ward_stable] Step 7/150 — 14 messages in context
89
+ 02:06:34 [WARNING] [single_ward_stable] Step 7 — BadRequestError (1/5): Error code: 400 - {'error': {'message': 'Failed to parse tool call arguments as JSON', 'type': 'invalid_request_error', 'code': 'tool_use_failed', 'failed_generation': '{"name": "end_shift", "arguments": {""}"}'}}
90
+ 02:06:34 [DEBUG] [single_ward_stable] Step 8/150 — 15 messages in context
91
+ 02:06:35 [DEBUG] [single_ward_stable] Step 8 — finish_reason=tool_calls tool_calls=1
92
+ 02:06:35 [DEBUG] [single_ward_stable] Step 8 — calling end_shift({})
93
+ 02:06:37 [INFO] [single_ward_stable] Step 8 — shift 1 ended; pruning context (1 summaries)
94
+ 02:06:37 [DEBUG] [single_ward_stable] Dropping orphaned leading tool msg (tool_call_id=fc_a6c7005e-8882-4b92-a1dc-828fd8e57012)
95
+ 02:06:37 [DEBUG] [single_ward_stable] Step 9/150 — 8 messages in context
96
+ 02:06:38 [DEBUG] [single_ward_stable] Step 9 — finish_reason=tool_calls tool_calls=1
97
+ 02:06:38 [DEBUG] [single_ward_stable] Step 9 — calling read_inbox({'filter': 'unread'})
98
+ 02:06:40 [DEBUG] [single_ward_stable] Step 10/150 — 10 messages in context
99
+ 02:06:41 [DEBUG] [single_ward_stable] Step 10 — finish_reason=tool_calls tool_calls=1
100
+ 02:06:41 [DEBUG] [single_ward_stable] Step 10 — calling query_erp({'location': 'all', 'sku': 'all', 'table': 'inventory'})
101
+ 02:06:43 [DEBUG] [single_ward_stable] Step 11/150 — 12 messages in context
102
+ 02:06:44 [DEBUG] [single_ward_stable] Step 11 — finish_reason=tool_calls tool_calls=1
103
+ 02:06:44 [DEBUG] [single_ward_stable] Step 11 — calling query_erp({'location': 'ward_general', 'sku': 'all', 'table': 'demand_history'})
104
+ 02:06:46 [DEBUG] [single_ward_stable] Step 12/150 — 14 messages in context
105
+ 02:06:47 [DEBUG] [single_ward_stable] Step 12 — finish_reason=tool_calls tool_calls=1
106
+ 02:06:47 [DEBUG] [single_ward_stable] Step 12 — calling query_forecast({'horizon_days': 7, 'location_id': 'ward_general', 'product_id': 'GLOVE-001'})
107
+ 02:06:49 [DEBUG] [single_ward_stable] Step 13/150 — 16 messages in context
108
+ 02:06:49 [DEBUG] [single_ward_stable] Step 13 — finish_reason=tool_calls tool_calls=1
109
+ 02:06:49 [DEBUG] [single_ward_stable] Step 13 — calling query_supplier({'supplier_id': 'MEDLINE'})
110
+ 02:06:51 [DEBUG] [single_ward_stable] Step 14/150 — 18 messages in context
111
+ 02:06:52 [DEBUG] [single_ward_stable] Step 14 — finish_reason=tool_calls tool_calls=1
112
+ 02:06:52 [DEBUG] [single_ward_stable] Step 14 — calling submit_po({'destination_id': 'ward_general', 'product_id': 'GLOVE-001', 'quantity': 102, 'supplier_id': 'MEDLINE'})
113
+ 02:06:54 [DEBUG] [single_ward_stable] Step 15/150 — 20 messages in context
114
+ 02:06:56 [DEBUG] [single_ward_stable] Step 15 — finish_reason=tool_calls tool_calls=1
115
+02:06:56 [DEBUG] [single_ward_stable] Step 15 — calling submit_po({'destination_id': 'ward_general', 'priority': 'expedited', 'product_id': 'GLOVE-001', 'quantity': 102, 'supplier_id': 'MEDLINE'})
+02:06:56 [INFO] [single_ward_stable] Step 15 — budget exhausted; injecting end_shift directive
+02:06:56 [DEBUG] [single_ward_stable] Step 16/150 — 23 messages in context
+02:06:56 [DEBUG] [single_ward_stable] Step 16 — finish_reason=tool_calls tool_calls=1
+02:06:56 [DEBUG] [single_ward_stable] Step 16 — calling end_shift({})
+02:06:58 [INFO] [single_ward_stable] Step 16 — shift 2 ended; pruning context (2 summaries)
+02:06:58 [DEBUG] [single_ward_stable] Dropping orphaned leading tool msg (tool_call_id=fc_3b3ef679-1e01-425a-bf58-a5e04cf204e4)
+02:06:58 [DEBUG] [single_ward_stable] Step 17/150 — 8 messages in context
+02:06:59 [DEBUG] [single_ward_stable] Step 17 — finish_reason=tool_calls tool_calls=1
+02:06:59 [DEBUG] [single_ward_stable] Step 17 — calling query_erp({'table': 'inventory'})
+02:07:01 [DEBUG] [single_ward_stable] Step 18/150 — 10 messages in context
+02:07:02 [DEBUG] [single_ward_stable] Step 18 — finish_reason=tool_calls tool_calls=1
+02:07:02 [DEBUG] [single_ward_stable] Step 18 — calling query_supplier({'supplier_id': 'MEDLINE'})
+02:07:04 [DEBUG] [single_ward_stable] Step 19/150 — 12 messages in context
+02:07:05 [DEBUG] [single_ward_stable] Step 19 — finish_reason=tool_calls tool_calls=1
+02:07:05 [DEBUG] [single_ward_stable] Step 19 — calling query_forecast({'horizon_days': 3, 'location_id': 'ward_general', 'product_id': 'GLOVE-001'})
+02:07:07 [DEBUG] [single_ward_stable] Step 20/150 — 14 messages in context
+02:07:07 [DEBUG] [single_ward_stable] Step 20 — finish_reason=tool_calls tool_calls=1
+02:07:07 [DEBUG] [single_ward_stable] Step 20 — calling submit_po({'destination_id': 'ward_general', 'product_id': 'GLOVE-001', 'quantity': 150, 'supplier_id': 'MEDLINE'})
+02:07:09 [DEBUG] [single_ward_stable] Step 21/150 — 16 messages in context
+02:07:10 [DEBUG] [single_ward_stable] Step 21 — finish_reason=tool_calls tool_calls=1
+02:07:10 [DEBUG] [single_ward_stable] Step 21 — calling read_inbox({'filter': 'unread'})
+02:07:12 [DEBUG] [single_ward_stable] Step 22/150 — 18 messages in context
+02:07:13 [DEBUG] [single_ward_stable] Step 22 — finish_reason=tool_calls tool_calls=1
+02:07:13 [DEBUG] [single_ward_stable] Step 22 — calling submit_po({'destination_id': 'ward_general', 'priority': 'expedited', 'product_id': 'GLOVE-001', 'quantity': 150, 'supplier_id': 'MEDLINE'})
+02:07:15 [DEBUG] [single_ward_stable] Step 23/150 — 20 messages in context
+02:07:16 [DEBUG] [single_ward_stable] Step 23 — finish_reason=tool_calls tool_calls=1
+02:07:16 [DEBUG] [single_ward_stable] Step 23 — calling end_shift({})
+02:07:16 [INFO] [single_ward_stable] Step 23 — episode complete detected
+02:07:18 [INFO] [single_ward_stable] Episode finished. steps=23 done=True final_reward=0.8621
+02:07:18 [INFO] [single_ward_stable] Task complete: reward=0.8621 steps=23
+02:07:28 [WARNING] MedchainEnv.close() suppressed error during shutdown: Command '['docker', 'stop', 'd3354d1bb205aedcaa7ecc10ce164bc21f891bf33b6ccd2f50759d487577e429']' timed out after 10 seconds
+02:07:28 [INFO] All tasks complete. avg_reward=0.8314
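The INFO lines at the end of each run (`Task complete: reward=<r> steps=<n>` and `All tasks complete. avg_reward=<r>`) are the most convenient place to pull results from when comparing runs. The following is a minimal sketch that assumes the log format shown above stays stable. `parse_results` is a hypothetical helper and is not part of the repository.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Patterns copied from the INFO lines in the logs above (assumed stable format).
TASK_DONE = re.compile(
    r"\[(?P<task>[\w-]+)\] Task complete: reward=(?P<reward>[\d.]+) steps=(?P<steps>\d+)"
)
AVG = re.compile(r"All tasks complete\. avg_reward=(?P<avg>[\d.]+)")


def parse_results(lines: Iterable[str]) -> Tuple[Dict[str, Tuple[float, int]], Optional[float]]:
    """Return ({task: (reward, steps)}, avg_reward) extracted from raw log lines."""
    per_task: Dict[str, Tuple[float, int]] = {}
    avg: Optional[float] = None
    for line in lines:
        if m := TASK_DONE.search(line):
            per_task[m["task"]] = (float(m["reward"]), int(m["steps"]))
        elif m := AVG.search(line):
            avg = float(m["avg"])
    return per_task, avg


sample = [
    "02:07:18 [INFO] [single_ward_stable] Episode finished. steps=23 done=True final_reward=0.8621",
    "02:07:18 [INFO] [single_ward_stable] Task complete: reward=0.8621 steps=23",
    "02:07:28 [INFO] All tasks complete. avg_reward=0.8314",
]
results, avg = parse_results(sample)
print(results, avg)
```

The `Episode finished.` line is deliberately ignored. Only the final per-task summary and the run average are collected.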
logs/inference_20260403_021847.log ADDED
@@ -0,0 +1,22 @@
+02:18:47 [INFO] Starting. API_BASE_URL=https://router.huggingface.co/v1 MODEL_NAME=openai/gpt-oss-20b:groq
+02:18:47 [INFO] Launching task: orientation_ward
+02:18:52 [DEBUG] Tool registered: read_inbox (required=[])
+02:18:52 [DEBUG] Tool registered: query_erp (required=['table'])
+02:18:52 [DEBUG] Tool registered: query_supplier (required=['supplier_id'])
+02:18:52 [DEBUG] Tool registered: query_forecast (required=['product_id', 'location_id'])
+02:18:52 [DEBUG] Tool registered: submit_po (required=['supplier_id', 'product_id', 'destination_id', 'quantity'])
+02:18:52 [DEBUG] Tool registered: transfer (required=['from_location_id', 'to_location_id', 'product_id', 'quantity'])
+02:18:52 [DEBUG] Tool registered: quarantine_lot (required=['location_id', 'sku', 'lot_id'])
+02:18:52 [DEBUG] Tool registered: file_justification (required=['ticket_id', 'reason'])
+02:18:52 [DEBUG] Tool registered: end_shift (required=[])
+02:18:52 [INFO] [orientation_ward] 9 tools discovered
+02:18:52 [DEBUG] [orientation_ward] Episode started. Tools: ['read_inbox', 'query_erp', 'query_supplier', 'query_forecast', 'submit_po', 'transfer', 'quarantine_lot', 'file_justification', 'end_shift']
+02:18:52 [DEBUG] [orientation_ward] Step 1/150 — 2 messages in context
+02:18:53 [DEBUG] [orientation_ward] Step 1 — finish_reason=tool_calls tool_calls=1
+02:18:53 [DEBUG] [orientation_ward] Step 1 — calling read_inbox({'filter': 'unread'})
+02:18:55 [DEBUG] [orientation_ward] Step 2/150 — 4 messages in context
+02:18:55 [DEBUG] [orientation_ward] Step 2 — finish_reason=tool_calls tool_calls=1
+02:18:55 [DEBUG] [orientation_ward] Step 2 — calling query_erp({'location': 'all', 'sku': 'all', 'table': 'inventory'})
+02:18:57 [DEBUG] [orientation_ward] Step 3/150 — 6 messages in context
+02:18:58 [DEBUG] [orientation_ward] Step 3 — finish_reason=tool_calls tool_calls=1
+02:18:58 [DEBUG] [orientation_ward] Step 3 — calling query_forecast({'horizon_days': 7, 'location_id': 'ward_general', 'product_id': 'GLOVE-001'})
logs/inference_20260403_130724.log ADDED
@@ -0,0 +1,63 @@
+13:07:24 [INFO] Starting. API_BASE_URL=https://router.huggingface.co/v1 MODEL_NAME=openai/gpt-oss-20b:groq
+13:07:24 [INFO] Launching task: orientation_ward
+13:07:29 [DEBUG] Tool registered: read_inbox (required=[])
+13:07:29 [DEBUG] Tool registered: query_erp (required=['table'])
+13:07:29 [DEBUG] Tool registered: query_supplier (required=['supplier_id'])
+13:07:29 [DEBUG] Tool registered: query_forecast (required=['product_id', 'location_id'])
+13:07:29 [DEBUG] Tool registered: submit_po (required=['supplier_id', 'product_id', 'destination_id', 'quantity'])
+13:07:29 [DEBUG] Tool registered: transfer (required=['from_location_id', 'to_location_id', 'product_id', 'quantity'])
+13:07:29 [DEBUG] Tool registered: quarantine_lot (required=['location_id', 'sku', 'lot_id'])
+13:07:29 [DEBUG] Tool registered: file_justification (required=['ticket_id', 'reason'])
+13:07:29 [DEBUG] Tool registered: end_shift (required=[])
+13:07:29 [INFO] [orientation_ward] 9 tools discovered
+13:07:29 [DEBUG] [orientation_ward] Episode started. Tools: ['read_inbox', 'query_erp', 'query_supplier', 'query_forecast', 'submit_po', 'transfer', 'quarantine_lot', 'file_justification', 'end_shift']
+13:07:29 [DEBUG] [orientation_ward] Step 1/150 — 2 messages in context
+13:07:30 [DEBUG] [orientation_ward] Step 1 — finish_reason=tool_calls tool_calls=1
+13:07:30 [DEBUG] [orientation_ward] Step 1 — calling read_inbox({'filter': 'unread'})
+13:07:32 [DEBUG] [orientation_ward] Step 2/150 — 4 messages in context
+13:07:33 [DEBUG] [orientation_ward] Step 2 — finish_reason=tool_calls tool_calls=1
+13:07:33 [DEBUG] [orientation_ward] Step 2 — calling query_erp({'location': 'all', 'sku': 'all', 'table': 'inventory'})
+13:07:35 [DEBUG] [orientation_ward] Step 3/150 — 6 messages in context
+13:07:36 [DEBUG] [orientation_ward] Step 3 — finish_reason=tool_calls tool_calls=1
+13:07:36 [DEBUG] [orientation_ward] Step 3 — calling query_forecast({'horizon_days': 7, 'location_id': 'ward_general', 'product_id': 'GLOVE-001'})
+13:07:38 [DEBUG] [orientation_ward] Step 4/150 — 8 messages in context
+13:07:38 [DEBUG] [orientation_ward] Step 4 — finish_reason=tool_calls tool_calls=1
+13:07:38 [DEBUG] [orientation_ward] Step 4 — calling query_supplier({'supplier_id': 'MEDLINE'})
+13:07:40 [DEBUG] [orientation_ward] Step 5/150 — 10 messages in context
+13:07:47 [WARNING] [orientation_ward] Step 5 — BadRequestError (1/5): Error code: 400 - {'error': {'message': 'Tool choice is required, but model did not call a tool', 'type': 'invalid_request_error', 'code': 'tool_use_failed', 'failed_generation': ''}}
+13:07:47 [DEBUG] [orientation_ward] Step 6/150 — 11 messages in context
+13:07:48 [DEBUG] [orientation_ward] Step 6 — finish_reason=tool_calls tool_calls=1
+13:07:48 [DEBUG] [orientation_ward] Step 6 — calling submit_po({'destination_id': 'ward_general', 'priority': 'standard', 'product_id': 'GLOVE-001', 'quantity': 20, 'supplier_id': 'MEDLINE'})
+13:07:50 [DEBUG] [orientation_ward] Step 7/150 — 13 messages in context
+13:07:50 [WARNING] [orientation_ward] Step 7 — BadRequestError (1/5): Error code: 400 - {'error': {'message': 'Failed to parse tool call arguments as JSON', 'type': 'invalid_request_error', 'code': 'tool_use_failed', 'failed_generation': '{"name": "end_shift", "arguments": {""}"}'}}
+13:07:50 [DEBUG] [orientation_ward] Step 8/150 — 14 messages in context
+13:07:51 [DEBUG] [orientation_ward] Step 8 — finish_reason=tool_calls tool_calls=1
+13:07:51 [DEBUG] [orientation_ward] Step 8 — calling end_shift({})
+13:07:53 [INFO] [orientation_ward] Step 8 — shift 1 ended; pruning context (1 summaries)
+13:07:53 [DEBUG] [orientation_ward] Step 9/150 — 9 messages in context
+13:07:54 [DEBUG] [orientation_ward] Step 9 — finish_reason=tool_calls tool_calls=1
+13:07:54 [DEBUG] [orientation_ward] Step 9 — calling read_inbox({'filter': 'unread'})
+13:07:56 [DEBUG] [orientation_ward] Step 10/150 — 11 messages in context
+13:07:56 [DEBUG] [orientation_ward] Step 10 — finish_reason=tool_calls tool_calls=1
+13:07:56 [DEBUG] [orientation_ward] Step 10 — calling query_erp({'table': 'inventory'})
+13:07:58 [DEBUG] [orientation_ward] Step 11/150 — 13 messages in context
+13:07:59 [DEBUG] [orientation_ward] Step 11 — finish_reason=tool_calls tool_calls=1
+13:07:59 [DEBUG] [orientation_ward] Step 11 — calling query_forecast({'horizon_days': 7, 'location_id': 'ward_general', 'product_id': 'SYR-10'})
+13:08:01 [DEBUG] [orientation_ward] Step 12/150 — 15 messages in context
+13:08:02 [DEBUG] [orientation_ward] Step 12 — finish_reason=tool_calls tool_calls=1
+13:08:02 [DEBUG] [orientation_ward] Step 12 — calling submit_po({'destination_id': 'ward_general', 'priority': 'standard', 'product_id': 'SYR-10', 'quantity': 100, 'supplier_id': 'MEDLINE'})
+13:08:04 [DEBUG] [orientation_ward] Step 13/150 — 17 messages in context
+13:08:05 [DEBUG] [orientation_ward] Step 13 — finish_reason=tool_calls tool_calls=1
+13:08:05 [DEBUG] [orientation_ward] Step 13 — calling query_supplier({'supplier_id': 'MEDLINE'})
+13:08:07 [DEBUG] [orientation_ward] Step 14/150 — 19 messages in context
+13:08:08 [DEBUG] [orientation_ward] Step 14 — finish_reason=tool_calls tool_calls=1
+13:08:08 [DEBUG] [orientation_ward] Step 14 — calling query_erp({'sku': 'GLOVE-001', 'table': 'inventory'})
+13:08:08 [INFO] [orientation_ward] Step 14 — budget exhausted; injecting end_shift directive
+13:08:08 [DEBUG] [orientation_ward] Step 15/150 — 22 messages in context
+13:08:08 [DEBUG] [orientation_ward] Step 15 — finish_reason=tool_calls tool_calls=1
+13:08:08 [DEBUG] [orientation_ward] Step 15 — calling end_shift({})
+13:08:08 [INFO] [orientation_ward] Step 15 — episode complete detected
+13:08:10 [INFO] [orientation_ward] Episode finished. steps=15 done=True final_reward=0.7854
+13:08:10 [INFO] [orientation_ward] Task complete: reward=0.7854 steps=15
+13:08:20 [WARNING] MedchainEnv.close() suppressed error during shutdown: Command '['docker', 'stop', 'a0b94417a826d0d040984577f5f06901a5ff78826054ff6be04878d0953f7706']' timed out after 10 seconds
+13:08:20 [INFO] All tasks complete. avg_reward=0.7854
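This run also shows two provider-side failures that the retry logic has to absorb: `Tool choice is required, but model did not call a tool` and `Failed to parse tool call arguments as JSON`. The sketch below audits a log by listing the tool calls per step and the steps that hit a `BadRequestError`. It assumes the DEBUG and WARNING line shapes shown above. Because the arguments are logged as a Python dict repr rather than as JSON, the sketch parses them with `ast.literal_eval`. The helper names are hypothetical.

```python
import ast
import re
from typing import List, Tuple

# Assumed line shapes, taken from the log above; \u2014 is the dash used in the log.
CALL = re.compile(r"Step (?P<step>\d+) \u2014 calling (?P<tool>\w+)\((?P<args>\{.*\})\)$")
RETRY = re.compile(r"Step (?P<step>\d+) \u2014 BadRequestError \(\d+/\d+\)")


def tool_calls(lines: List[str]) -> List[Tuple[int, str, dict]]:
    """Return (step, tool_name, arguments) for every logged tool call."""
    calls = []
    for line in lines:
        m = CALL.search(line)
        if m:
            # Arguments are a Python dict repr (single quotes), so json.loads would fail.
            calls.append((int(m["step"]), m["tool"], ast.literal_eval(m["args"])))
    return calls


def failed_steps(lines: List[str]) -> List[int]:
    """Steps whose LLM request was rejected and retried."""
    return [int(m["step"]) for line in lines if (m := RETRY.search(line))]


sample = [
    "13:07:47 [WARNING] [orientation_ward] Step 5 \u2014 BadRequestError (1/5): Error code: 400",
    "13:07:48 [DEBUG] [orientation_ward] Step 6 \u2014 calling submit_po({'destination_id': 'ward_general', "
    "'priority': 'standard', 'product_id': 'GLOVE-001', 'quantity': 20, 'supplier_id': 'MEDLINE'})",
    "13:07:51 [DEBUG] [orientation_ward] Step 8 \u2014 calling end_shift({})",
]
print(tool_calls(sample))
print(failed_steps(sample))
```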
models.py ADDED
@@ -0,0 +1,49 @@
+"""Client-side state and observation types for the MedChain Env environment."""
+
+from openenv.core.env_server import State
+from openenv.core.env_server.types import Observation
+from pydantic import Field
+from typing import List, Optional
+
+AVAILABLE_TOOLS = [
+    "read_inbox",
+    "query_erp",
+    "query_supplier",
+    "query_forecast",
+    "submit_po",
+    "transfer",
+    "quarantine_lot",
+    "file_justification",
+    "end_shift",
+]
+
+
+class MedchainState(State):
+    """Runtime state exposed by the environment server."""
+
+    task: str = Field(
+        default="",
+        description="Task name (orientation_ward / single_ward_stable / multi_ward_seasonal / hospital_network_crisis)",
+    )
+    day: int = Field(default=0, description="Current simulation day (1-indexed)")
+    max_days: int = Field(default=0, description="Total simulation days for this task")
+    actions_remaining: int = Field(default=0, description="Actions left this shift")
+    budget_used: float = Field(default=0.0, description="Outstanding committed PO budget ($)")
+    budget_limit: float = Field(default=0.0, description="Budget ceiling for outstanding orders ($)")
+    unread_messages: int = Field(default=0, description="Unread inbox messages")
+    orders_in_transit: int = Field(default=0, description="POs currently in transit")
+
+class MedObservation(Observation):
+    """Initial observation returned by reset(). Contains the shift dashboard text."""
+
+    dashboard: str = Field(default="", description="Dashboard state")
+    available_tools: List[str] = Field(default_factory=list, description="Available tools")
+    episode_id: str = Field(default="", description="Episode ID")
+
+
+class MedchainToolObservation(Observation):
+    """Observation returned for every tool-call step."""
+    tool_name: str = Field(default="", description="Name of the tool that was called")
+    tool_result: str = Field(default="", description="Text result from the tool")
+    error_msg: Optional[str] = Field(default=None, description="Error message if call failed")
+
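`MedchainState` exposes both the dollar budget (`budget_used` against `budget_limit`) and the per-shift action budget (`actions_remaining`). The sketch below shows client-side checks that an agent loop could run before calling `submit_po` or spending another action.

`StateView`, `budget_headroom`, `max_affordable_qty` and `should_end_shift` are hypothetical names. The stand-in dataclass mirrors only three `MedchainState` fields so that the example runs without `openenv` installed. `unit_cost` is an assumed input, for example a price read from a `query_supplier` result. The sketch also assumes that once `actions_remaining` reaches 0 the only useful call left is `end_shift`.

```python
import math
from dataclasses import dataclass


@dataclass
class StateView:
    """Hypothetical stand-in for the MedchainState fields used below."""
    actions_remaining: int
    budget_used: float
    budget_limit: float


def budget_headroom(state: StateView) -> float:
    """Dollars still available before outstanding POs reach the ceiling."""
    return max(state.budget_limit - state.budget_used, 0.0)


def max_affordable_qty(state: StateView, unit_cost: float) -> int:
    """Largest whole quantity whose total cost fits in the remaining headroom."""
    if unit_cost <= 0:
        raise ValueError("unit_cost must be positive")
    return math.floor(budget_headroom(state) / unit_cost)


def should_end_shift(state: StateView) -> bool:
    """Assumption: with no actions left, end_shift is the only sensible call."""
    return state.actions_remaining <= 0


state = StateView(actions_remaining=3, budget_used=400.0, budget_limit=1000.0)
print(budget_headroom(state), max_affordable_qty(state, unit_cost=2.5), should_end_shift(state))
```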
openenv.yaml ADDED
@@ -0,0 +1,78 @@
+spec_version: 1
+name: medchain_env
+type: space
+runtime: fastapi
+app: server.app:app
+port: 8000
+
+description: >
+  Hospital medical supply chain management environment.
+  AI agents operate a simulated legacy ERP system to manage
+  inventory across hospital wards, order supplies from suppliers,
+  and respond to crises (mass casualty incidents, product recalls,
+  supplier disruptions) — all through a fragmented text-based interface.
+
+tasks:
+  - id: orientation_ward
+    name: Orientation Day (Single Ward)
+    description: Explore the ERP tools for 2 days with 3 non-critical products and stable demand. Initial stock covers only Day 1 — agent must check inventory and place one replenishment order to avoid a Day 2 stockout.
+    difficulty: easy
+    max_days: 2
+    actions_per_shift: 5
+    expected_scores:
+      random: 0.55
+      heuristic: 0.88
+      llm_baseline: 0.95
+
+  - id: single_ward_stable
+    name: Single Ward (Stable Demand)
+    description: Manage one ward with 6 non-critical products and stable demand for 3 days. Initial stock covers only 2 days — agent must order on Day 1 to avoid stockout.
+    difficulty: medium
+    max_days: 3
+    actions_per_shift: 6
+    expected_scores:
+      random: 0.30
+      heuristic: 0.68
+      llm_baseline: 0.82
+
+  - id: multi_ward_seasonal
+    name: Multi-Ward (Seasonal Events)
+    description: Manage 3 wards + central pharmacy for 6 days. Two suppliers (fast/1-day vs standard/4-day). Flu surge activates Day 3 (+50% demand). Supplier delay fires Day 4 (standard lead time doubles — agent must pivot to fast supplier).
+    difficulty: medium-hard
+    max_days: 6
+    actions_per_shift: 8
+    expected_scores:
+      random: 0.22
+      heuristic: 0.55
+      llm_baseline: 0.73
+
+  - id: hospital_network_crisis
+    name: Hospital Network Crisis
+    description: Manage a 3-hospital + regional DC network for 12 days with 15 products including life-critical perishables. 5 crisis events — cold chain breach (day 3), supplier force majeure (day 6), MCI activation days 9-11 (blood demand ×3), and mandatory product recall (day 11).
+    difficulty: hard
+    max_days: 12
+    actions_per_shift: 10
+    expected_scores:
+      random: 0.12
+      heuristic: 0.38
+      llm_baseline: 0.58
+
+action_space:
+  type: mcp_tools
+  tools:
+    - read_inbox
+    - query_erp
+    - query_supplier
+    - query_forecast
+    - submit_po
+    - transfer
+    - quarantine_lot
+    - file_justification
+    - end_shift
+
+observation_space:
+  type: text
+  description: >
+    ASCII text output from simulated legacy MEDSUPPLY ERP v2.1.
+    Dashboard shows daily status. Tools return formatted text tables,
+    free-text messages, and system confirmations/errors.
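The `expected_scores` block gives each task a random reference point and a heuristic reference point. These make raw rewards easier to compare across tasks of different difficulty. One common way to use them is to rescale a reward so that the random baseline maps to 0 and a chosen reference maps to 1. That normalization is an assumption made here and is not defined by the spec. The numbers below are copied from the YAML above, and `normalized` is a hypothetical helper.

```python
from typing import Dict

# expected_scores copied from openenv.yaml above.
EXPECTED: Dict[str, Dict[str, float]] = {
    "orientation_ward": {"random": 0.55, "heuristic": 0.88, "llm_baseline": 0.95},
    "single_ward_stable": {"random": 0.30, "heuristic": 0.68, "llm_baseline": 0.82},
    "multi_ward_seasonal": {"random": 0.22, "heuristic": 0.55, "llm_baseline": 0.73},
    "hospital_network_crisis": {"random": 0.12, "heuristic": 0.38, "llm_baseline": 0.58},
}


def normalized(task: str, reward: float, ref: str = "heuristic") -> float:
    """Linear rescale: random baseline -> 0.0, chosen reference -> 1.0."""
    s = EXPECTED[task]
    return (reward - s["random"]) / (s[ref] - s["random"])


print(round(normalized("orientation_ward", 0.7854), 3))
print(round(normalized("single_ward_stable", 0.8621, ref="llm_baseline"), 3))
```

Applied to the runs logged above, orientation_ward at 0.7854 lands about 0.71 of the way from random to heuristic. single_ward_stable at 0.8621 is above the listed llm_baseline of 0.82, so its normalized value exceeds 1.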
pyproject.toml ADDED
@@ -0,0 +1,39 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+[build-system]
+requires = ["setuptools>=45", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "openenv-medchain_env"
+version = "0.1.0"
+description = "Medchain Env environment for OpenEnv"
+requires-python = ">=3.10"
+dependencies = [
+    "openenv-core[core]>=0.2.1",
+    "fastmcp>=2.0.0",
+    "pydantic>=2.0.0",
+    "numpy>=1.24.0",
+    "uvicorn>=0.24.0",
+    "fastapi>=0.115.0",
+]
+
+[project.optional-dependencies]
+dev = [
+    "pytest>=8.0.0",
+    "pytest-cov>=4.0.0",
+]
+
+[project.scripts]
+# Server entry point - enables running via: uv run --project . server
+# or: python -m medchain_env.server.app
+server = "medchain_env.server.app:main"
+
+[tool.setuptools]
+include-package-data = true
+packages = ["medchain_env", "medchain_env.server"]
+package-dir = { "medchain_env" = ".", "medchain_env.server" = "server" }
sample_inference.py ADDED
@@ -0,0 +1,187 @@
+"""
+Inference Script Example
+===================================
+MANDATORY
+- Before submitting, ensure the following variables are defined in your environment configuration:
+    API_BASE_URL   The API endpoint for the LLM.
+    MODEL_NAME     The model identifier to use for inference.
+    HF_TOKEN       Your Hugging Face / API key.
+    LOCAL_IMAGE_NAME The name of the local image to use for the environment if you are using from_docker_image()
+                     method
+
+- Defaults are set only for API_BASE_URL and MODEL_NAME
+    (and should reflect your active inference setup):
+    API_BASE_URL = os.getenv("API_BASE_URL", "<your-active-endpoint>")
+    MODEL_NAME = os.getenv("MODEL_NAME", "<your-active-model>")
+
+- The inference script must be named `inference.py` and placed in the root directory of the project
+- Participants must use OpenAI Client for all LLM calls using above variables
+
+STDOUT FORMAT
+- The script must emit exactly three line types to stdout, in this order:
+
+    [START] task=<task_name> env=<benchmark> model=<model_name>
+    [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
+    [END] success=<true|false> steps=<n> rewards=<r1,r2,...,rn>
+
+  Rules:
+    - One [START] line at episode begin.
+    - One [STEP] line per step, immediately after env.step() returns.
+    - One [END] line after env.close(), always emitted (even on exception).
+    - reward and rewards are formatted to 2 decimal places.
+    - done and success are lowercase booleans: true or false.
+    - error is the raw last_action_error string, or null if none.
+    - All fields on a single line with no newlines within a line.
+
+  Example:
+    [START] task=click-test env=miniwob model=Qwen3-VL-30B
+    [STEP] step=1 action=click('123') reward=0.00 done=false error=null
+    [STEP] step=2 action=fill('456','text') reward=0.00 done=false error=null
+    [STEP] step=3 action=click('789') reward=1.00 done=true error=null
+    [END] success=true steps=3 rewards=0.00,0.00,1.00
+"""
+
+import asyncio
+import os
+import textwrap
+from typing import List, Optional
+
+from openai import OpenAI
+
+from my_env_v4 import MyEnvV4Action, MyEnvV4Env
+IMAGE_NAME = os.getenv("LOCAL_IMAGE_NAME") or os.getenv("IMAGE_NAME")  # If you are using docker image
+API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
+
+API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
+MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
+TASK_NAME = os.getenv("MY_ENV_V4_TASK", "echo")
+BENCHMARK = os.getenv("MY_ENV_V4_BENCHMARK", "my_env_v4")
+MAX_STEPS = 8
+TEMPERATURE = 0.7
+MAX_TOKENS = 150
+SUCCESS_SCORE_THRESHOLD = 0.1  # normalized score in [0, 1]
+
+# Approximate normalization bound: treats each of the MAX_TOKENS tokens as one 0.1-reward character, across all steps
+_MAX_REWARD_PER_STEP = MAX_TOKENS * 0.1
+MAX_TOTAL_REWARD = MAX_STEPS * _MAX_REWARD_PER_STEP
+
+SYSTEM_PROMPT = textwrap.dedent(
+    """
+    You are interacting with a simple echo environment.
+    Each turn you must send a message. The environment will echo it back.
+    Reward is proportional to message length: reward = len(message) * 0.1
+    Your goal is to maximize total reward by sending meaningful, substantive messages.
+    Reply with exactly one message string — no quotes, no prefixes, just the message text.
+    """
+).strip()
+
+
+def log_start(task: str, env: str, model: str) -> None:
+    print(f"[START] task={task} env={env} model={model}", flush=True)
+
+
+def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
+    error_val = error if error else "null"
+    done_val = str(done).lower()
+    print(
+        f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
+        flush=True,
+    )
+
+
+def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
+    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+    print(f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}", flush=True)
+
+
+def build_user_prompt(step: int, last_echoed: str, last_reward: float, history: List[str]) -> str:
+    history_block = "\n".join(history[-4:]) if history else "None"
+    return textwrap.dedent(
+        f"""
+        Step: {step}
+        Last echoed message: {last_echoed!r}
+        Last reward: {last_reward:.2f}
+        Previous steps:
+        {history_block}
+        Send your next message.
+        """
+    ).strip()
+
+
+def get_model_message(client: OpenAI, step: int, last_echoed: str, last_reward: float, history: List[str]) -> str:
+    user_prompt = build_user_prompt(step, last_echoed, last_reward, history)
+    try:
+        completion = client.chat.completions.create(
+            model=MODEL_NAME,
+            messages=[
+                {"role": "system", "content": SYSTEM_PROMPT},
+                {"role": "user", "content": user_prompt},
+            ],
+            temperature=TEMPERATURE,
+            max_tokens=MAX_TOKENS,
+            stream=False,
+        )
+        text = (completion.choices[0].message.content or "").strip()
+        return text if text else "hello"
+    except Exception as exc:
+        print(f"[DEBUG] Model request failed: {exc}", flush=True)
+        return "hello"
+
+
+async def main() -> None:
+    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+
+    env = await MyEnvV4Env.from_docker_image(IMAGE_NAME)
+
+    history: List[str] = []
+    rewards: List[float] = []
+    steps_taken = 0
+    score = 0.0
+    success = False
+
+    log_start(task=TASK_NAME, env=BENCHMARK, model=MODEL_NAME)
+
+    try:
+        result = await env.reset()  # OpenEnv reset()
+        last_echoed = result.observation.echoed_message
+        last_reward = 0.0
+
+        for step in range(1, MAX_STEPS + 1):
+            if result.done:
+                break
+
+            message = get_model_message(client, step, last_echoed, last_reward, history)
+
+            result = await env.step(MyEnvV4Action(message=message))
+            obs = result.observation
+
+            reward = result.reward or 0.0
+            done = result.done
+            error = None
+
+            rewards.append(reward)
+            steps_taken = step
+            last_echoed = obs.echoed_message
+            last_reward = reward
+
+            log_step(step=step, action=message, reward=reward, done=done, error=error)
+
+            history.append(f"Step {step}: {message!r} -> reward {reward:+.2f}")
+
+            if done:
+                break
+
+        score = sum(rewards) / MAX_TOTAL_REWARD if MAX_TOTAL_REWARD > 0 else 0.0
+        score = min(max(score, 0.0), 1.0)  # clamp to [0, 1]
+        success = score >= SUCCESS_SCORE_THRESHOLD
+
+    finally:
+        try:
+            await env.close()
+        except Exception as e:
+            print(f"[DEBUG] env.close() error (container cleanup): {e}", flush=True)
+        log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
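The STDOUT FORMAT rules are strict enough to check mechanically before submitting. The following is a minimal validator sketch built on regular expressions. The sample's `log_end` also prints a `score=` field that the docstring's `[END]` template does not list, so the sketch accepts that field as optional. It also ignores non-protocol lines such as the sample's own `[DEBUG]` prints. `validate` is a hypothetical helper.

```python
import re
from typing import List

START = re.compile(r"^\[START\] task=\S+ env=\S+ model=\S+$")
STEP = re.compile(
    r"^\[STEP\] step=\d+ action=.+ reward=-?\d+\.\d{2} done=(?:true|false) error=.+$"
)
# score= is optional: log_end emits it, but the docstring template omits it.
END = re.compile(
    r"^\[END\] success=(?:true|false) steps=(?P<steps>\d+)(?: score=-?\d+\.\d{3})?"
    r" rewards=(?:-?\d+\.\d{2}(?:,-?\d+\.\d{2})*)?$"
)


def validate(stdout_lines: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the lines look valid."""
    # Keep only protocol lines; e.g. the sample's [DEBUG] prints are skipped.
    lines = [ln for ln in stdout_lines if ln.startswith(("[START]", "[STEP]", "[END]"))]
    problems: List[str] = []
    if not lines or not START.match(lines[0]):
        problems.append("first line must be a valid [START]")
    end = END.match(lines[-1]) if lines else None
    if end is None:
        problems.append("last line must be a valid [END]")
    elif int(end["steps"]) != len(lines) - 2:
        problems.append("steps= does not match the number of [STEP] lines")
    for i, line in enumerate(lines[1:-1], start=2):
        if not STEP.match(line):
            problems.append(f"line {i}: invalid [STEP]")
    return problems


example = [
    "[START] task=click-test env=miniwob model=Qwen3-VL-30B",
    "[STEP] step=1 action=click('123') reward=0.00 done=false error=null",
    "[STEP] step=2 action=fill('456','text') reward=0.00 done=false error=null",
    "[STEP] step=3 action=click('789') reward=1.00 done=true error=null",
    "[END] success=true steps=3 rewards=0.00,0.00,1.00",
]
print(validate(example))
```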
server/Dockerfile ADDED
@@ -0,0 +1,80 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+# Multi-stage build using openenv-base
+# This Dockerfile is flexible and works for both:
+# - In-repo environments (with local OpenEnv sources)
+# - Standalone environments (with openenv from PyPI/Git)
+# The build script (openenv build) handles context detection and sets appropriate build args.
+
+ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+FROM ${BASE_IMAGE} AS builder
+
+WORKDIR /app
+
+# Ensure git is available (required for installing dependencies from VCS)
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends git && \
+    rm -rf /var/lib/apt/lists/*
+
+# Build argument to control whether we're building standalone or in-repo
+ARG BUILD_MODE=in-repo
+ARG ENV_NAME=medchain_env
+
+# Copy environment code (always at root of build context)
+COPY . /app/env
+
+# For in-repo builds, openenv is already vendored in the build context
+# For standalone builds, openenv will be installed via pyproject.toml
+WORKDIR /app/env
+
+# Ensure uv is available (for local builds where base image lacks it)
+RUN if ! command -v uv >/dev/null 2>&1; then \
+        curl -LsSf https://astral.sh/uv/install.sh | sh && \
+        mv /root/.local/bin/uv /usr/local/bin/uv && \
+        mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+    fi
+
+# Install dependencies using uv sync
+# If uv.lock exists, use it; otherwise resolve on the fly
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-install-project --no-editable; \
+    else \
+        uv sync --no-install-project --no-editable; \
+    fi
+
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-editable; \
+    else \
+        uv sync --no-editable; \
+    fi
+
+# Final runtime stage
+FROM ${BASE_IMAGE}
+
+WORKDIR /app
+
+# Copy the virtual environment from builder
+COPY --from=builder /app/env/.venv /app/.venv
+
+# Copy the environment code
+COPY --from=builder /app/env /app/env
+
+# Set PATH to use the virtual environment
+ENV PATH="/app/.venv/bin:$PATH"
+
+# Set PYTHONPATH so imports work correctly
+ENV PYTHONPATH="/app/env:$PYTHONPATH"
+
+# Health check
+HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+    CMD curl -f http://localhost:8000/health || exit 1
+
+# Run the FastAPI server
+# The module path is constructed to work with the /app/env structure
+CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
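The HEALTHCHECK probes `http://localhost:8000/health` with `curl -f`. When the container is started by hand with `-p 8000:8000`, a client may want to wait on the same endpoint before connecting. The following is a stdlib sketch of that readiness loop. `http_ok` and `wait_until_healthy` are hypothetical names. The probe and sleep functions are injectable so the loop can be exercised without a running server.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Roughly what `curl -f` checks: a 2xx response within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError (4xx/5xx) and timeouts are all OSError subclasses
        return False


def wait_until_healthy(
    url: str = "http://localhost:8000/health",
    attempts: int = 10,
    interval: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the health endpoint until it answers or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:  # no pointless sleep after the final attempt
            sleep(interval)
    return False
```

Injecting `probe` and `sleep` keeps the retry logic separate from the network. The same loop can therefore be unit-tested with stubs or pointed at a different port.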
server/__init__.py ADDED
@@ -0,0 +1,5 @@
+"""MedChain Env environment server components."""
+
+from .medchain_env_environment import MedchainEnvironment
+
+__all__ = ["MedchainEnvironment"]
server/app.py ADDED
@@ -0,0 +1,51 @@
+"""
+FastAPI application for the MedChain Env environment.
+
+Environment Variables:
+    MEDCHAIN_TASK: Task to serve (default: single_ward_stable).
+        Options: orientation_ward, single_ward_stable, multi_ward_seasonal, hospital_network_crisis
+
+Usage:
+    uvicorn server.app:app --host 0.0.0.0 --port 8000
+"""
+
+import os
+
+from openenv.core.env_server.http_server import create_app
+from openenv.core.env_server.mcp_types import CallToolAction
+
+try:
+    from .medchain_env_environment import MedchainEnvironment
+    from ..models import MedchainToolObservation
+except ImportError:
+    from server.medchain_env_environment import MedchainEnvironment
+    from models import MedchainToolObservation
+
+TASK = os.environ.get("MEDCHAIN_TASK", "single_ward_stable")
+
+
+def _env_factory():
+    """Create a new MedchainEnvironment instance for each client session."""
+    return MedchainEnvironment(task=TASK)
+
+
+app = create_app(
+    _env_factory,
+    CallToolAction,
+    MedchainToolObservation,
+    env_name="medchain_env",
+)
+
+
+def main(host: str = "0.0.0.0", port: int = 8000):
+    import uvicorn
+    uvicorn.run(app, host=host, port=port)
+
+
+if __name__ == "__main__":
+    import argparse
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--port", type=int, default=8000)
+    args = parser.parse_args()
+    main(port=args.port)  # Bypass multi node deployment requirement in openenv: main()
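`TASK` is read from `MEDCHAIN_TASK` without validation, so a typo in the variable is passed straight to `MedchainEnvironment`. How that class reacts to an unknown name is not shown here. The fail-fast sketch below checks the value against the four task IDs listed in the docstring and in `openenv.yaml`. `resolve_task` is a hypothetical helper that could replace the `os.environ.get` call.

```python
import os
from typing import Mapping, Optional

# Task IDs listed in the app.py docstring and openenv.yaml.
KNOWN_TASKS = (
    "orientation_ward",
    "single_ward_stable",
    "multi_ward_seasonal",
    "hospital_network_crisis",
)


def resolve_task(env: Optional[Mapping[str, str]] = None, default: str = "single_ward_stable") -> str:
    """Read MEDCHAIN_TASK and fail fast on an unknown task name."""
    env = os.environ if env is None else env
    task = env.get("MEDCHAIN_TASK", default).strip()
    if task not in KNOWN_TASKS:
        raise ValueError(f"Unknown MEDCHAIN_TASK {task!r}; expected one of: {', '.join(KNOWN_TASKS)}")
    return task
```

The function accepts an optional mapping instead of reading `os.environ` directly, which keeps it testable. The server would call it with no arguments (`TASK = resolve_task()`).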
server/erp_formatter.py ADDED
@@ -0,0 +1,379 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ ASCII table formatters for the MEDSUPPLY ERP v2.1 interface.
3
+
4
+ All functions receive SimState + TaskConfig and return plain text strings
5
+ for display in an LLM chat interface.
6
+ """
7
+
8
+ from __future__ import annotations
9
+
10
+ import math
11
+ from typing import TYPE_CHECKING
12
+
13
+ if TYPE_CHECKING:
14
+ from .simulation import SimState
15
+ from .tasks import Product, Supplier, TaskConfig
16
+
17
+ _W = 70 # default box width
18
+
19
+
20
+ def _box(lines: list[str], width: int = _W) -> str:
21
+ """Wrap a list of content strings in a simple Unicode box."""
22
+ inner_width = width - 2
23
+ top = "╔" + "═" * inner_width + "╗"
24
+ mid = "╠" + "═" * inner_width + "╣"
25
+ bottom = "╚" + "═" * inner_width + "╝"
26
+
27
+ rows = [top]
28
+ for i, line in enumerate(lines):
29
+ if line == "---DIVIDER---":
30
+ rows.append(mid)
31
+ else:
32
+ padded = ("║ " + line).ljust(inner_width + 1) + "║"
33
+ rows.append(padded)
34
+ rows.append(bottom)
35
+ return "\n".join(rows)
36
+
37
+
38
+ def _sep(width: int = _W) -> str:
39
+ return "-" * width
40
+
41
+
42
+ def _status(days_left) -> str:
43
+ if days_left is None:
44
+ return "NON-PERISH"
45
+ if days_left <= 0:
46
+ return "*** EXPIRED ***"
47
+ if days_left <= 3:
48
+ return "WARN_CRITICAL"
49
+ if days_left <= 7:
50
+ return "WARN_LOW"
51
+ return "OK"
52
+
53
+
54
+ # ─── Dashboard ───────────────────────────────────────────────────────────────
55
+
56
+ def format_dashboard(state: "SimState", task_config: "TaskConfig") -> str:
57
+ """
58
+ Returns the ERP dashboard shown at start of each shift.
59
+ """
60
+ unread = sum(1 for m in state.inbox if not m.read)
61
+ in_transit = sum(1 for po in state.pipeline_orders if po.status == "in_transit")
62
+
63
+ # Expiry warning: any lot expiring within 7 days
64
+ expiry_warning = False
65
+ for lots in state.inventory.values():
66
+ for lot in lots:
67
+ if lot.expiry_day is not None and lot.lot_id not in state.quarantined_lots:
68
+ days_left = lot.expiry_day - state.day
69
+ if days_left <= 7:
70
+ expiry_warning = True
71
+ break
72
+
73
+ pager_icon = "[!]" if unread > 0 else "[·]"
74
+ pager_msg = f"{unread} unread message(s)" if unread > 0 else "No new messages"
75
+ expiry_icon = "[!]" if expiry_warning else "[·]"
76
+ expiry_msg = "Expiry warnings present — check INVDB" if expiry_warning else "No expiry warnings"
77
+ pipeline_icon = "[·]" # in-transit orders are informational, never a warning
78
+
79
+ supplier_lines = [
80
+ f" {s.supplier_id} → {', '.join(s.products)}"
81
+ for s in task_config.suppliers
82
+ ]
83
+
84
+ lines = [
85
+ " MEDSUPPLY ERP v2.1 — CENTRAL HOSPITAL NETWORK",
86
+ f" Task: {task_config.name} | Shift: Day {state.day} of {state.max_days}",
87
+ f" Actions remaining: {state.actions_remaining}/{state.actions_per_shift}",
88
+ f" Budget used: ${state.budget_used:,.0f} / ${state.budget_limit:,.0f}",
89
+ "---DIVIDER---",
90
+ f" {pager_icon} COMMS PAGER: {pager_msg}",
91
+ f" {expiry_icon} INVDB: {expiry_msg}",
92
+ f" {pipeline_icon} PROCURENET: {in_transit} order(s) in transit",
93
+ "---DIVIDER---",
94
+ " SUPPLIERS (use exact IDs below):",
95
+ ] + supplier_lines
96
+ box = _box(lines)
97
+ tools = "read_inbox, query_erp, query_supplier, query_forecast, submit_po, transfer, quarantine_lot, file_justification, end_shift"
98
+ return box + "\nAwaiting input.\nAvailable tools: " + tools
99
+
100
+
101
+ # ─── Inventory Table ─────────────────────────────────────────────────────────
102
+
103
+ def format_inventory_table(
104
+ state: "SimState",
105
+ task_config: "TaskConfig",
106
+ location: str,
107
+ sku: str,
108
+ ) -> str:
109
+ loc_filter = location.lower()
110
+ sku_filter = sku.lower()
111
+
112
+ loc_label = location.upper() if loc_filter != "all" else "ALL"
113
+ sku_label = sku.upper() if sku_filter != "all" else "ALL"
114
+
115
+ header = [
116
+ f"SYSTEM QUERY RESULT [TABLE: INVENTORY] [LOC: {loc_label}] [SKU: {sku_label}]",
117
+ f"[TIMESTAMP: Day {state.day}]",
118
+ ]
119
+ sep = _sep()
120
+ col_header = f"{'LOT_ID':<22} | {'DESC':<24} | {'QTY':>5} | {'EXP_DAY':>7} | {'DAYS_LEFT':>9} | STATUS"
121
+ rows = []
122
+
123
+ product_map = {p.product_id: p for p in task_config.products}
124
+
125
+ for (loc_id, product_id), lots in sorted(state.inventory.items()):
126
+ if loc_filter != "all" and loc_id.lower() != loc_filter:
127
+ continue
128
+ if sku_filter != "all" and product_id.lower() != sku_filter:
129
+ continue
130
+
131
+ product = product_map.get(product_id)
132
+ desc = product.name if product else product_id
133
+
134
+ for lot in lots:
135
+ if lot.qty == 0:
136
+ continue
137
+ is_quarantined = lot.lot_id in state.quarantined_lots
138
+ days_left = (lot.expiry_day - state.day) if lot.expiry_day is not None else None
139
+ status = "[QUARANTINED]" if is_quarantined else _status(days_left)
140
+ exp_str = f"{lot.expiry_day:07d}" if lot.expiry_day is not None else "N/A "
141
+ dl_str = f"{days_left:09d}" if days_left is not None else "N/A "
142
+ row = f"{lot.lot_id:<22} | {desc[:24]:<24} | {lot.qty:>5} | {exp_str:>7} | {dl_str:>9} | {status}"
143
+ rows.append(row)
144
+
145
+ lines = header + [sep, col_header, sep]
146
+ if rows:
147
+ lines += rows
148
+ else:
149
+ lines.append("(no stock found)")
150
+ lines += [sep, f"QUERY OK | {len(rows)} row(s) returned"]
151
+ return "\n".join(lines)
152
+
153
+
154
+ # ─── Expiry Table ─────────────────────────────────────────────────────────────
155
+
156
+ def format_expiry_table(
157
+ state: "SimState",
158
+ task_config: "TaskConfig",
159
+ location: str,
160
+ sku: str,
161
+ ) -> str:
162
+ loc_filter = location.lower()
163
+ sku_filter = sku.lower()
164
+
165
+ header = [
166
+ "SYSTEM QUERY RESULT [TABLE: EXPIRY] [Lots expiring within 14 days]",
167
+ f"[TIMESTAMP: Day {state.day}]",
168
+ ]
169
+ sep = _sep()
170
+ col_header = f"{'LOT_ID':<22} | {'LOC':<16} | {'SKU':<12} | {'QTY':>5} | {'EXP_DAY':>7} | {'DAYS_LEFT':>9} | STATUS"
171
+ rows = []
172
+
173
+ product_map = {p.product_id: p for p in task_config.products}
174
+
175
+ for (loc_id, product_id), lots in sorted(state.inventory.items()):
176
+ if loc_filter != "all" and loc_id.lower() != loc_filter:
177
+ continue
178
+ if sku_filter != "all" and product_id.lower() != sku_filter:
179
+ continue
180
+ for lot in lots:
181
+ if lot.qty == 0:
182
+ continue
183
+ if lot.expiry_day is None:
184
+ continue
185
+ days_left = lot.expiry_day - state.day
186
+ if days_left > 14:
187
+ continue
188
+ is_quarantined = lot.lot_id in state.quarantined_lots
189
+ status = "[QUARANTINED]" if is_quarantined else _status(days_left)
190
+ row = (
191
+ f"{lot.lot_id:<22} | {loc_id:<16} | {product_id:<12} | "
192
+ f"{lot.qty:>5} | {lot.expiry_day:>7} | {days_left:>9} | {status}"
193
+ )
194
+ rows.append(row)
195
+
196
+ lines = header + [sep, col_header, sep]
197
+ if rows:
198
+ lines += rows
199
+ else:
200
+ lines.append("(no expiry warnings)")
201
+ lines += [sep, f"QUERY OK | {len(rows)} row(s) returned"]
202
+ return "\n".join(lines)
203
+
204
+
205
+ # ─── Pipeline Orders Table ────────────────────────────────────────────────────
206
+
207
+ def format_pipeline_table(
208
+ state: "SimState",
209
+ location: str,
210
+ sku: str,
211
+ ) -> str:
212
+ loc_filter = location.lower()
213
+ sku_filter = sku.lower()
214
+
215
+ header = ["SYSTEM QUERY RESULT [TABLE: PIPELINE_ORDERS]", f"[TIMESTAMP: Day {state.day}]"]
216
+ sep = _sep()
217
+ col_header = f"{'PO_ID':<9} | {'SUPPLIER':<12} | {'SKU':<12} | {'DESTINATION':<16} | {'QTY':>5} | {'PRIORITY':<10} | {'ETA':>5} | STATUS"
218
+ rows = []
219
+
220
+ for po in state.pipeline_orders:
221
+ if loc_filter != "all" and po.destination_id.lower() != loc_filter:
222
+ continue
223
+ if sku_filter != "all" and po.product_id.lower() != sku_filter:
224
+ continue
225
+ status = po.status.upper().replace("_", " ")
226
+ row = (
227
+ f"{po.po_id:<9} | {po.supplier_id:<12} | {po.product_id:<12} | "
228
+ f"{po.destination_id:<16} | {po.quantity:>5} | {po.priority:<10} | "
229
+ f"D-{po.eta_day:02d} | {status}"
230
+ )
231
+ rows.append(row)
232
+
233
+ lines = header + [sep, col_header, sep]
234
+ if rows:
235
+ lines += rows
236
+ else:
237
+ lines.append("(no orders in pipeline)")
238
+ lines += [sep, f"QUERY OK | {len(rows)} row(s) returned"]
239
+ return "\n".join(lines)
240
+
241
+
242
+ # ─── Demand History Table ─────────────────────────────────────────────────────
243
+
244
+ def format_demand_history(
245
+ state: "SimState",
246
+ task_config: "TaskConfig",
247
+ location: str,
248
+ sku: str,
249
+ ) -> str:
250
+ loc_filter = location.lower()
251
+ sku_filter = sku.lower()
252
+
253
+ loc_label = location.upper() if loc_filter != "all" else "ALL"
254
+ sku_label = sku.upper() if sku_filter != "all" else "ALL"
255
+
256
+ header = [
257
+ f"SYSTEM QUERY RESULT [TABLE: DEMAND_HISTORY] [LOC: {loc_label}] [SKU: {sku_label}]",
258
+ f"[TIMESTAMP: Day {state.day}] [Days 1–{state.day - 1} recorded]",
259
+ ]
260
+ sep = _sep()
261
+
262
+ if loc_filter != "all" and sku_filter != "all":
263
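+ # Specific (location, product) pair, matched case-insensitively like the other ERP tables
265
+ key = next((k for k in state.daily_product_demand if k[0].lower() == loc_filter and k[1].lower() == sku_filter), (location, sku))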
265
+ demands = state.daily_product_demand.get(key, [])
266
+ fulfilleds = state.daily_product_fulfilled.get(key, [])
267
+ col_header = f"{'DAY':>4} | {'DEMAND':>6} | {'FULFILLED':>9} | SVC_LVL"
268
+ rows = []
269
+ for i, (d, f) in enumerate(zip(demands, fulfilleds)):
270
+ svc = f / max(d, 1)
271
+ rows.append(f"{i + 1:04d} | {d:>6} | {f:>9} | {svc:.2f}")
272
+ lines = header + [sep, col_header, sep]
273
+ lines += rows if rows else ["(no history yet)"]
274
+ else:
275
+ # Network-wide daily totals (location/SKU filters are not applied in this branch)
276
+ col_header = f"{'DAY':>4} | {'DEMAND':>8} | {'FULFILLED':>9} | SVC_LVL"
277
+ num_days = len(state.daily_demand)
278
+ rows = []
279
+ for i in range(num_days):
280
+ d = state.daily_demand[i]
281
+ f = state.daily_fulfilled[i]
282
+ svc = f / max(d, 1)
283
+ rows.append(f"{i + 1:04d} | {int(d):>8} | {int(f):>9} | {svc:.2f}")
284
+ lines = header + [sep, col_header, sep]
285
+ lines += rows if rows else ["(no history yet)"]
286
+
287
+ lines += [sep, f"QUERY OK | {len(rows)} row(s) returned"]
288
+ return "\n".join(lines)
289
+
290
+
291
+ # ─── Supplier Info ────────────────────────────────────────────────────────────
292
+
293
+ def format_supplier_info(
294
+ supplier: "Supplier",
295
+ effective_lead_time: int,
296
+ disruption_note: str,
297
+ ) -> str:
298
+ sep = _sep()
299
+ products_str = ", ".join(supplier.products)
300
+ lines = [
301
+ f"SUPPLIER INFO: {supplier.supplier_id}",
302
+ sep,
303
+ f"Name: {supplier.name}",
304
+ "Status: ACTIVE",
305
+ f"Lead Time: {effective_lead_time} days (effective)",
306
+ f"Base Lead: {supplier.base_lead_time} days",
307
+ f"Cost Mult: {supplier.cost_multiplier:.1f}× base price",
308
+ f"Products: {products_str}",
309
+ f"Notes: {disruption_note}",
310
+ sep,
311
+ ]
312
+ return "\n".join(lines)
313
+
314
+
315
+ # ─── Forecast Table ───────────────────────────────────────────────────────────
316
+
317
+ def format_forecast(
318
+ state: "SimState",
319
+ task_config: "TaskConfig",
320
+ product: "Product",
321
+ location_id: str,
322
+ horizon_days: int,
323
+ ) -> str:
326
+ sku_label = product.product_id.upper()
327
+ loc_label = location_id.upper() if location_id != "all" else "ALL"
328
+
329
+ header = [
330
+ f"SYSTEM QUERY RESULT [TABLE: FORECAST] [SKU: {sku_label}] [LOC: {loc_label}]",
331
+ f"[Forecast from Day {state.day} for {horizon_days} day(s)]",
332
+ ]
333
+ sep = _sep()
334
+ col_header = f"{'DAY':>4} | {'FORECAST_DEMAND':>15} | NOTES"
335
+
336
+ rows = []
337
+ locations_to_forecast = (
338
+ [location_id] if location_id != "all" else product.locations
339
+ )
340
+
341
+ for loc in locations_to_forecast:
342
+ for offset in range(horizon_days):
343
+ future_day = state.day + offset
344
+ base = product.base_demand
345
+
346
+ # Seasonal component (deterministic mean — no noise)
347
+ if product.seasonal_amplitude > 0 and product.seasonal_period > 0:
348
+ seasonal = product.seasonal_amplitude * math.sin(
349
+ 2 * math.pi * future_day / product.seasonal_period + product.seasonal_phase
350
+ )
351
+ base *= (1 + seasonal)
352
+
353
+ notes = []
354
+
355
+ # Apply active + future event effects
356
+ for event in task_config.events:
357
+ event_active_on_day = (
358
+ event.trigger_day <= future_day <= event.trigger_day + event.duration_days - 1
359
+ if event.duration_days > 0
360
+ else event.trigger_day == future_day
361
+ )
362
+ if event_active_on_day:
363
+ if event.event_type == "demand_surge":
364
+ if product.product_id in event.params.get("products", []):
365
+ base *= event.params.get("multiplier", 1.4)
366
+ notes.append("SURGE_ACTIVE")
367
+ elif event.event_type == "mci":
368
+ if product.criticality in ("CRITICAL", "HIGH") and loc in event.params.get("locations", []):
369
+ base *= event.params.get("demand_multiplier", 3.0)
370
+ notes.append("MCI_ACTIVE")
371
+
372
+ notes_str = ", ".join(notes) if notes else ""
373
+ prefix = f"{loc.upper()}:" if location_id == "all" and len(locations_to_forecast) > 1 else ""
374
+ rows.append(f"{future_day:04d} | {int(round(base)):>15} | {prefix}{notes_str}")
375
+
376
+ lines = header + [sep, col_header, sep]
377
+ lines += rows if rows else ["(no forecast data)"]
378
+ lines += [sep, "Forecast based on historical demand mean. Active alerts applied."]
379
+ return "\n".join(lines)
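For reference, the per-day value that format_forecast prints reduces to a simple formula: base demand, scaled by `1 + amplitude * sin(2π * day / period + phase)` when seasonality is configured, then multiplied by the factor of every demand_surge or mci event whose window covers that day. The standalone sketch below reproduces that arithmetic. `ForecastProduct`, `ForecastEvent`, `is_active` and `forecast_day` are hypothetical names, and the dataclasses only mirror the attributes format_forecast reads, not the full Product and event definitions in tasks.py.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ForecastProduct:
    # Hypothetical stand-in for tasks.Product: only the fields format_forecast reads.
    # The "LOW" default criticality is an assumption for illustration.
    product_id: str
    base_demand: float
    criticality: str = "LOW"
    seasonal_amplitude: float = 0.0
    seasonal_period: int = 0
    seasonal_phase: float = 0.0


@dataclass
class ForecastEvent:
    # Hypothetical stand-in for a scheduled task event.
    event_type: str
    trigger_day: int
    duration_days: int
    params: dict = field(default_factory=dict)


def is_active(event: ForecastEvent, day: int) -> bool:
    # Inclusive window of duration_days; a zero duration means a one-day event.
    if event.duration_days > 0:
        return event.trigger_day <= day <= event.trigger_day + event.duration_days - 1
    return event.trigger_day == day


def forecast_day(product: ForecastProduct, location: str, day: int, events: list) -> int:
    demand = product.base_demand
    # Deterministic seasonal factor (no noise), as in format_forecast.
    if product.seasonal_amplitude > 0 and product.seasonal_period > 0:
        demand *= 1 + product.seasonal_amplitude * math.sin(
            2 * math.pi * day / product.seasonal_period + product.seasonal_phase
        )
    # Every event active on this day multiplies demand for the products/locations it targets.
    for event in events:
        if not is_active(event, day):
            continue
        if event.event_type == "demand_surge" and product.product_id in event.params.get("products", []):
            demand *= event.params.get("multiplier", 1.4)
        elif (
            event.event_type == "mci"
            and product.criticality in ("CRITICAL", "HIGH")
            and location in event.params.get("locations", [])
        ):
            demand *= event.params.get("demand_multiplier", 3.0)
    return int(round(demand))


saline = ForecastProduct("IV-500", base_demand=40, criticality="CRITICAL")
mci = ForecastEvent("mci", trigger_day=5, duration_days=2, params={"locations": ["hospital_a"]})
print([forecast_day(saline, "hospital_a", d, [mci]) for d in range(4, 8)])
```

With the default 3x MCI multiplier active at hospital_a on days 5 and 6, the sketch yields `[40, 120, 120, 40]` for days 4 to 7; a location outside the event's list keeps the baseline of 40.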
server/grader.py ADDED
@@ -0,0 +1,225 @@
1
+ """
2
+ Deterministic terminal reward computation for the MedChain Env environment.
3
+
4
+ Two reward streams exist:
5
+ - Per-step shaping rewards (in medchain_env_environment.py)
6
+ - Terminal score on the final end_shift() call — this module
7
+
8
+ All formulas are deterministic — no LLM judge.
9
+ """
10
+
11
+ from __future__ import annotations
12
+
13
+ from typing import TYPE_CHECKING, Dict, List, Set
14
+
15
+ if TYPE_CHECKING:
16
+ from .simulation import SimState
17
+ from .tasks import TaskConfig
18
+
19
+
20
+ def compute_reward(state: "SimState", task_config: "TaskConfig") -> float:
21
+ """Dispatch to task-specific terminal scorer."""
22
+ if task_config.name == "orientation_ward":
23
+ return compute_reward_task0(state, task_config)
24
+ elif task_config.name == "single_ward_stable":
25
+ return compute_reward_task1(state, task_config)
26
+ elif task_config.name == "multi_ward_seasonal":
27
+ return compute_reward_task2(state, task_config)
28
+ elif task_config.name == "hospital_network_crisis":
29
+ return compute_reward_task3(state, task_config)
30
+ return 0.0
31
+
32
+
33
+ def compute_reward_task0(state: "SimState", task_config: "TaskConfig") -> float:
34
+ """
35
+ Intro task: score = 0.70 × service_level + 0.30 × ordered_at_least_once
36
+ Rewards reading the situation and placing at least one replenishment order.
37
+ """
38
+ if not state.daily_demand:
39
+ return 0.0
40
+
41
+ total_demand = sum(state.daily_demand)
42
+ total_fulfilled = sum(state.daily_fulfilled)
43
+ service_level = total_fulfilled / max(total_demand, 1)
44
+
45
+ ordered = 1.0 if state.pipeline_orders or state.total_spend > 0 else 0.0
46
+
47
+ return min(1.0, 0.70 * service_level + 0.30 * ordered)
48
+
49
+
50
+ def compute_reward_task1(state: "SimState", task_config: "TaskConfig") -> float:
51
+ """
52
+ score = 0.50 × avg_service_level + 0.50 × cost_efficiency_vs_benchmark
53
+ """
54
+ if not state.daily_demand:
55
+ return 0.0
56
+
57
+ total_demand = sum(state.daily_demand)
58
+ total_fulfilled = sum(state.daily_fulfilled)
59
+ service_level = total_fulfilled / max(total_demand, 1)
60
+
61
+ avg_unit_cost = (
62
+ sum(p.unit_cost * p.base_demand for p in task_config.products)
63
+ / max(sum(p.base_demand for p in task_config.products), 1)
64
+ )
65
+ benchmark_spend = total_fulfilled * avg_unit_cost * 1.15
66
+ actual_spend = state.total_spend
67
+ if actual_spend <= 0:
68
+ cost_efficiency = 0.0
69
+ else:
70
+ cost_efficiency = min(1.0, benchmark_spend / actual_spend)
71
+
72
+ return 0.50 * service_level + 0.50 * cost_efficiency
73
+
74
+
75
+ def compute_reward_task2(state: "SimState", task_config: "TaskConfig") -> float:
76
+ """
77
+ score = 0.40 × avg_service_level
78
+ + 0.35 × cost_efficiency
79
+ + 0.15 × capacity_score
80
+ + 0.10 × transfer_efficiency
81
+ """
82
+ if not state.daily_demand:
83
+ return 0.0
84
+
85
+ total_demand = sum(state.daily_demand)
86
+ total_fulfilled = sum(state.daily_fulfilled)
87
+ service_level = total_fulfilled / max(total_demand, 1)
88
+
89
+ avg_unit_cost = (
90
+ sum(p.unit_cost * p.base_demand for p in task_config.products)
91
+ / max(sum(p.base_demand for p in task_config.products), 1)
92
+ )
93
+ benchmark_spend = total_fulfilled * avg_unit_cost * 1.2
94
+ cost_efficiency = min(1.0, benchmark_spend / max(state.total_spend, 0.01))
95
+
96
+ total_days = len(state.daily_demand)
97
+ capacity_score = max(0.0, 1.0 - state.capacity_violation_days / max(total_days, 1))
98
+
99
+ avg_transfers_per_day = state.transfer_count / max(total_days, 1)
100
+ transfer_efficiency = max(0.0, 1.0 - max(0.0, avg_transfers_per_day - 10) / 10.0)
101
+
102
+ return (
103
+ 0.40 * service_level
104
+ + 0.35 * cost_efficiency
105
+ + 0.15 * capacity_score
106
+ + 0.10 * transfer_efficiency
107
+ )
108
+
109
+
110
+ def compute_reward_task3(state: "SimState", task_config: "TaskConfig") -> float:
111
+ """
112
+ score = 0.35 × avg_service_level
113
+ + 0.25 × cost_efficiency
114
+ + 0.20 × (1 - critical_stockout_rate)
115
+ + 0.15 × (1 - waste_fraction)
116
+ + 0.05 × crisis_response_score
117
+ - justification_penalty (capped at 0.15)
118
+ """
119
+ if not state.daily_demand:
120
+ return 0.0
121
+
122
+ total_demand = sum(state.daily_demand)
123
+ total_fulfilled = sum(state.daily_fulfilled)
124
+ service_level = total_fulfilled / max(total_demand, 1)
125
+
126
+ avg_unit_cost = (
127
+ sum(p.unit_cost * p.base_demand for p in task_config.products)
128
+ / max(sum(p.base_demand for p in task_config.products), 1)
129
+ )
130
+ benchmark_spend = total_fulfilled * avg_unit_cost * 1.2
131
+ cost_efficiency = min(1.0, benchmark_spend / max(state.total_spend, 0.01))
132
+
133
+ total_crit_dem = sum(state.daily_critical_demand)
134
+ total_crit_ful = sum(state.daily_critical_fulfilled)
135
+ critical_service = total_crit_ful / max(total_crit_dem, 1)
136
+ critical_stockout_rate = 1.0 - critical_service
137
+
138
+ waste_fraction = min(1.0, state.total_wasted_value / max(state.total_spend, 0.01))
139
+
140
+ crisis_response_score = _compute_crisis_response_score(state, task_config)
141
+
142
+ incoherent_count = sum(1 for r in state.justification_log if not r.is_coherent)
143
+ justification_penalty = min(0.15, incoherent_count * 0.05)
144
+
145
+ score = (
146
+ 0.35 * service_level
147
+ + 0.25 * cost_efficiency
148
+ + 0.20 * (1.0 - critical_stockout_rate)
149
+ + 0.15 * (1.0 - waste_fraction)
150
+ + 0.05 * crisis_response_score
151
+ - justification_penalty
152
+ )
153
+ return max(0.0, min(1.0, score))
154
+
155
+
156
+ def _compute_crisis_response_score(
157
+ state: "SimState",
158
+ task_config: "TaskConfig",
159
+ ) -> float:
160
+ """
161
+ Measures crisis response for MCI and recall events.
162
+ Returns 0.0 to 1.0.
163
+ """
164
+ score = 0.0
165
+ max_score = 0.0
166
+
167
+ mci_event = next((e for e in task_config.events if e.event_id == "mci_activation"), None)
168
+ if mci_event:
169
+ max_score += 0.6
170
+ total_crit_dem = sum(state.daily_critical_demand)
171
+ total_crit_ful = sum(state.daily_critical_fulfilled)
172
+ mci_service = total_crit_ful / max(total_crit_dem, 1)
173
+ score += 0.6 * mci_service
174
+
175
+ recall_event = next((e for e in task_config.events if e.event_id == "iv_saline_recall"), None)
176
+ if recall_event:
177
+ max_score += 0.4
178
+ if state.recall_handled_by_day is not None:
179
+ days_delayed = state.recall_handled_by_day - recall_event.trigger_day
180
+ if days_delayed <= 0:
181
+ score += 0.4
182
+ elif days_delayed <= 2:
183
+ score += 0.2
184
+
185
+ if max_score == 0:
186
+ return 1.0
187
+
188
+ return score / max_score
189
+
190
+
191
+ def grade_justification(reason: str, active_event_types: Set[str]) -> bool:
192
+ """
193
+ Deterministic keyword-based justification grading.
194
+ Returns True if coherent (no penalty), False if incoherent.
195
+ """
196
+ CRISIS_KEYWORDS: Dict[str, List[str]] = {
197
+ "mci": ["mci", "mass casualty", "trauma", "incident", "accident",
198
+ "emergency", "casualties", "blood", "critical patients"],
199
+ "supplier_disruption": ["disruption", "delay", "lead time", "supplier",
200
+ "shortage", "force majeure", "extended"],
201
+ "product_recall": ["recall", "quarantine", "contamination", "lot",
202
+ "health authority", "batch", "defective", "compromised"],
203
+ "budget_tighten": ["budget", "fiscal", "quarter", "constraint",
204
+ "ceiling", "limit", "finance"],
205
+ "cold_chain_breach": ["cold chain", "temperature", "breach",
206
+ "refriger", "spoilage", "compromised"],
207
+ "demand_surge": ["demand", "surge", "increased", "elevated",
208
+ "high usage", "outbreak", "flu", "influenza"],
209
+ }
210
+ GENERIC_KEYWORDS = [
211
+ "urgent", "critical", "shortage", "low stock",
212
+ "stockout", "emergency", "insufficient",
213
+ ]
214
+
215
+ reason_lower = reason.lower()
216
+
217
+ if not active_event_types:
218
+ return any(kw in reason_lower for kw in GENERIC_KEYWORDS)
219
+
220
+ for event_type in active_event_types:
221
+ keywords = CRISIS_KEYWORDS.get(event_type, [])
222
+ if any(kw in reason_lower for kw in keywords):
223
+ return True
224
+
225
+ return any(kw in reason_lower for kw in GENERIC_KEYWORDS)
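The task-3 terminal score is a clamped weighted sum. The sketch below recomputes it from already-aggregated totals so the weights can be checked by hand. `task3_score` and its flat argument list are hypothetical: compute_reward_task3 reads the same quantities from SimState and TaskConfig, and the crisis response value would come from _compute_crisis_response_score. Note that `0.20 * (1 - critical_stockout_rate)` is simply `0.20 * critical_service`.

```python
def task3_score(
    total_demand: float,
    total_fulfilled: float,
    avg_unit_cost: float,
    total_spend: float,
    critical_demand: float,
    critical_fulfilled: float,
    wasted_value: float,
    crisis_response: float,
    incoherent_justifications: int,
) -> float:
    # Hypothetical flat-argument version of compute_reward_task3.
    service_level = total_fulfilled / max(total_demand, 1)
    benchmark_spend = total_fulfilled * avg_unit_cost * 1.2  # 20% headroom over demand-weighted unit cost
    cost_efficiency = min(1.0, benchmark_spend / max(total_spend, 0.01))
    critical_service = critical_fulfilled / max(critical_demand, 1)
    waste_fraction = min(1.0, wasted_value / max(total_spend, 0.01))
    penalty = min(0.15, incoherent_justifications * 0.05)  # 0.05 per flagged justification, capped
    score = (
        0.35 * service_level
        + 0.25 * cost_efficiency
        + 0.20 * critical_service
        + 0.15 * (1.0 - waste_fraction)
        + 0.05 * crisis_response
        - penalty
    )
    # Clamp to [0, 1] exactly as the grader does.
    return max(0.0, min(1.0, score))


print(task3_score(1000, 900, 10.0, 12000, 200, 190, 600, 0.8, 1))
```

For example, 900 of 1,000 units fulfilled, $12,000 spent against a $10 weighted unit cost, 190 of 200 critical units fulfilled, $600 wasted, a crisis response of 0.8 and one flagged justification give 0.315 + 0.225 + 0.19 + 0.1425 + 0.04 - 0.05 = 0.8625.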
server/medchain_env_environment.py ADDED
@@ -0,0 +1,382 @@
1
+ """
2
+ MedChain Env — OpenEnv Environment implementation.
3
+
4
+ Tool dispatch is delegated to an internal _MedchainMCPDelegate (MCPEnvironment)
5
+ to retain FastMCP schema generation and MCP-level argument validation.
6
+ MedchainEnvironment controls reward computation and returns MedchainToolObservation
7
+ — a plain Pydantic model — so the reward field survives the WebSocket
8
+ serialization round-trip intact.
9
+ """
10
+
11
+ import uuid
12
+ from typing import Any, Optional
13
+
14
+ from fastmcp import FastMCP
15
+ from openenv.core.env_server.interfaces import Environment
16
+ from openenv.core.env_server.mcp_environment import MCPEnvironment
17
+ from openenv.core.env_server.mcp_types import (
18
+ CallToolAction,
19
+ CallToolObservation,
20
+ ListToolsAction,
21
+ ListToolsObservation,
22
+ )
23
+ from openenv.core.env_server.types import Action, Observation
24
+
25
+ try:
26
+ from ..models import AVAILABLE_TOOLS, MedchainState, MedObservation, MedchainToolObservation
27
+ from .simulation import MedchainSimulation
28
+ from .tasks import get_task_config
29
+ except ImportError:
30
+ from models import AVAILABLE_TOOLS, MedchainState, MedObservation, MedchainToolObservation
31
+ from server.simulation import MedchainSimulation
32
+ from server.tasks import get_task_config
33
+
34
+
35
+ class _MedchainMCPDelegate(MCPEnvironment):
36
+ """
37
+ Thin MCPEnvironment wrapper that holds the FastMCP server.
38
+
39
+ Provides _handle_call_tool() and _handle_list_tools() to
40
+ MedchainEnvironment without any reward logic.
41
+ """
42
+
43
+ def reset(self, **kwargs: Any) -> Observation: # type: ignore[override]
44
+ return Observation(done=False, reward=0.0)
45
+
46
+ def _step_impl(self, action: Action, **kwargs: Any) -> Observation:
47
+ return Observation(done=False, reward=0.0)
48
+
49
+ @property
50
+ def state(self) -> MedchainState:
51
+ return MedchainState()
52
+
53
+
54
+ class MedchainEnvironment(Environment):
55
+ """
56
+ Hospital supply chain management environment.
57
+
58
+ The agent operates a simulated legacy ERP system (MEDSUPPLY v2.1) with 9 tools.
59
+ Each call to end_shift() advances the simulation by one day.
60
+ The episode terminates after max_days with a terminal reward in [0, 1].
61
+ """
62
+
63
+ SUPPORTS_CONCURRENT_SESSIONS: bool = True
64
+
65
+ def __init__(self, task: str = "single_ward_stable"):
66
+ self._sim = MedchainSimulation(get_task_config(task))
67
+ self._step_count = 0
68
+ self._state = MedchainState()
69
+
70
+ # ── Register 9 MCP tools ──────────────────────────────────────────
71
+
72
+ mcp = FastMCP("medchain_env")
73
+
74
+ @mcp.tool
75
+ def read_inbox(filter: str = "unread") -> str:
76
+ """
77
+ Read messages from the COMMS PAGER inbox.
78
+
79
+ Args:
80
+ filter: Message filter — 'unread' (default), 'all', or 'flagged'
81
+
82
+ Returns:
83
+ Formatted inbox messages as raw text
84
+ """
85
+ return self._sim.read_inbox(filter)
86
+
87
+ @mcp.tool
88
+ def query_erp(table: str, location: str = "all", sku: str = "all") -> str:
89
+ """
90
+ Query the legacy ERP database.
91
+
92
+ Args:
93
+ table: Table to query — 'inventory', 'expiry', 'pipeline_orders', or 'demand_history'
94
+ location: Location ID or 'all'. E.g. 'ward_general', 'ward_icu', 'hospital_a'
95
+ sku: Product SKU or 'all'. E.g. 'B-001', 'IV-500', 'GLOVE-001'
96
+
97
+ Returns:
98
+ ASCII table with query results (legacy ERP format)
99
+ """
100
+ return self._sim.query_erp(table, location, sku)
101
+
102
+ @mcp.tool
103
+ def query_supplier(supplier_id: str) -> str:
104
+ """
105
+ Query supplier information including current lead times and disruptions.
106
+
107
+ Args:
108
+ supplier_id: Supplier identifier. Check the dashboard for valid supplier IDs.
109
+
110
+ Returns:
111
+ Supplier status text including lead times and any active disruptions
112
+ """
113
+ return self._sim.query_supplier(supplier_id)
114
+
115
+ @mcp.tool
116
+ def query_forecast(product_id: str, location_id: str, horizon_days: int = 7) -> str:
117
+ """
118
+ Get demand forecast for a product at a location.
119
+
120
+ Args:
121
+ product_id: Product SKU to forecast. Use query_erp(table='inventory') to see available SKUs.
122
+ location_id: Location to forecast for. Use query_erp(table='inventory') to see valid location IDs.
123
+ horizon_days: Forecast horizon in days (1-21, default 7)
124
+
125
+ Returns:
126
+ Forecasted daily demand table
127
+ """
128
+ return self._sim.query_forecast(product_id, location_id, horizon_days)
129
+
130
+ @mcp.tool
131
+ def submit_po(
132
+ supplier_id: str,
133
+ product_id: str,
134
+ destination_id: str,
135
+ quantity: int,
136
+ priority: str = "standard",
137
+ ) -> str:
138
+ """
139
+ Submit a purchase order to a supplier.
140
+
141
+ Args:
142
+ supplier_id: Supplier to order from. Check the dashboard for valid supplier IDs.
143
+ product_id: Product SKU to order. Use query_erp(table='inventory') to see available SKUs.
144
+ destination_id: Delivery location. Use query_erp(table='inventory') to see valid location IDs.
145
+ quantity: Number of units to order (must be positive)
146
+ priority: 'standard' (default) or 'expedited' (+50% cost, -2 day lead time; requires justification)
147
+
148
+ Returns:
149
+ Confirmation with PO ID and ETA, or error if budget/validation fails.
150
+ For expedited orders: returns BUDGET_OVERRIDE_REQUIRED with a ticket ID.
151
+ Use file_justification(ticket_id=...) to proceed.
152
+ """
153
+ return self._sim.submit_po(supplier_id, product_id, destination_id, quantity, priority)
154
+
155
+ @mcp.tool
156
+ def transfer(
157
+ from_location_id: str,
158
+ to_location_id: str,
159
+ product_id: str,
160
+ quantity: int,
161
+ ) -> str:
162
+ """
163
+ Transfer inventory between locations (small handling fee).
164
+
165
+ Args:
166
+ from_location_id: Source location. Use query_erp(table='inventory') to see valid location IDs.
167
+ to_location_id: Destination location. Use query_erp(table='inventory') to see valid location IDs.
168
+ product_id: Product SKU to transfer. Use query_erp(table='inventory') to see available SKUs.
169
+ quantity: Units to transfer (must not exceed available stock at source)
170
+
171
+ Returns:
172
+ Confirmation or error (insufficient stock, capacity exceeded)
173
+ """
174
+ return self._sim.transfer(from_location_id, to_location_id, product_id, quantity)
175
+
176
+ @mcp.tool
177
+ def quarantine_lot(location_id: str, sku: str, lot_id: str) -> str:
178
+ """
179
+ Quarantine a specific inventory lot (e.g. for product recalls or cold chain breaches).
180
+ Quarantined lots are excluded from demand fulfillment.
181
+
182
+ Args:
183
+ location_id: Location where the lot is stored
184
+ sku: Product SKU of the lot
185
+ lot_id: Lot identifier (from inventory query). Use 'all' to quarantine all lots of this SKU at this location.
186
+
187
+ Returns:
188
+ Confirmation with quarantine details and disposal ticket ID
189
+ """
190
+ return self._sim.quarantine_lot(location_id, sku, lot_id)
191
+
192
+ @mcp.tool
193
+ def file_justification(ticket_id: str, reason: str) -> str:
194
+ """
195
+ File a budget override justification to proceed with an expedited order.
196
+
197
+ Required after submit_po returns BUDGET_OVERRIDE_REQUIRED.
198
+ The justification is audited by Finance — it must reference the current clinical situation.
199
+ A false or incoherent justification is flagged and results in a scoring penalty.
200
+
201
+ Args:
202
+ ticket_id: The ticket ID from the BUDGET_OVERRIDE_REQUIRED error (e.g. 'BOT-0002')
203
+ reason: Free-text justification. Must clearly reference the reason for urgency.
204
+
205
+ Returns:
206
+ OK confirmation and PO submission details, or FLAGGED audit warning
207
+ """
208
+ return self._sim.file_justification(ticket_id, reason)
209
+
210
+ @mcp.tool
211
+ def end_shift() -> str:
212
+ """
213
+ End the current shift and advance the simulation by one day.
214
+
215
+ Commits all pending decisions. Simulates demand, deliveries, and expiry for the day.
216
+ Resets your action budget for the next shift.
217
+ Unspent actions are lost — no rollover.
218
+
219
+ Returns:
220
+ Day summary report + next shift dashboard
221
+ """
222
+ return self._sim.end_shift_tool()
223
+
224
+ # Initialise Environment base (no MCPEnvironment in MRO)
225
+ super().__init__()
226
+
227
+ # Composition: delegate tool dispatch to internal MCPEnvironment
228
+ self._mcp_env = _MedchainMCPDelegate(mcp)
229
+
230
+ # ── OpenEnv Interface ──────────────────────────────────────────────────
231
+
232
+ def reset(
233
+ self,
234
+ seed: Optional[int] = None,
235
+ episode_id: Optional[str] = None,
236
+ **kwargs: Any,
237
+ ) -> Observation:
238
+ self._step_count = 0
239
+ ep_id = episode_id or str(uuid.uuid4())
240
+ seed_val = seed if seed is not None else 42
241
+ dashboard = self._sim.reset(seed=seed_val, episode_id=ep_id)
242
+
243
+ return MedObservation(
244
+ dashboard=dashboard,
245
+ available_tools=AVAILABLE_TOOLS,
246
+ episode_id=ep_id,
247
+ done=False,
248
+ reward=0.0,
249
+ )
250
+
251
+ def step(
252
+ self,
253
+ action: Action,
254
+ timeout_s: Optional[float] = None,
255
+ **kwargs: Any,
256
+ ) -> Observation:
257
+ self._step_count += 1
258
+
259
+ if isinstance(action, ListToolsAction):
260
+ return self._mcp_env._handle_list_tools()
261
+
262
+ if isinstance(action, CallToolAction):
263
+ return self._handle_call_tool(action, timeout_s=timeout_s)
264
+
265
+ return Observation(
266
+ done=False,
267
+ reward=0.0,
268
+ metadata={
269
+ "error": (
270
+ f"Unknown action type: {type(action).__name__}. "
271
+ f"Use CallToolAction with one of: {AVAILABLE_TOOLS}"
272
+ )
273
+ },
274
+ )
275
+
276
+ def _handle_call_tool(
277
+ self,
278
+ action: CallToolAction,
279
+ timeout_s: Optional[float] = None,
280
+ ) -> MedchainToolObservation:
281
+ """
282
+ Dispatch a CallToolAction via the internal MCPEnvironment, compute
283
+ reward, and return a MedchainToolObservation with a plain float reward.
284
+ """
285
+ call_obs: CallToolObservation = self._mcp_env._handle_call_tool(
286
+ action, timeout_s=timeout_s
287
+ )
288
+ result_text = self._extract_result_text(call_obs)
289
+ is_error = (call_obs.error is not None) or result_text.startswith("ERROR")
290
+
291
+ if action.tool_name == "end_shift":
292
+ reward = self._sim.get_last_reward()
293
+ done = self._sim.is_done()
294
+ else:
295
+ reward = self._shaping_reward(action.tool_name, result_text, is_error)
296
+ done = False
297
+
298
+ error_msg: Optional[str] = None
299
+ if call_obs.error is not None:
300
+ err = call_obs.error
301
+ error_msg = err.message if hasattr(err, "message") else str(err)
302
+
303
+ return MedchainToolObservation(
304
+ tool_name=action.tool_name,
305
+ tool_result=result_text,
306
+ error_msg=error_msg,
307
+ done=done,
308
+ reward=reward,
309
+ )
310
+
+     @staticmethod
+     def _extract_result_text(call_obs: CallToolObservation) -> str:
+         """Extract plain text from a CallToolObservation."""
+         if call_obs.error is not None:
+             err = call_obs.error
+             msg = err.message if hasattr(err, "message") else str(err)
+             return f"ERROR: {msg}"
+
+         r = call_obs.result
+         if r is None:
+             return ""
+         if hasattr(r, "data") and r.data is not None:
+             return str(r.data)
+         if hasattr(r, "content") and r.content:
+             first = r.content[0]
+             return first.text if hasattr(first, "text") else str(first)
+         return str(r)
+
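The fallback order in `_extract_result_text` (error, then `None`, then `.data`, then the first content block, then `str()`) can be exercised without an MCP server. A minimal sketch, assuming `types.SimpleNamespace` objects as stand-ins for the real observation and result types:

```python
from types import SimpleNamespace
from typing import Any


def extract_result_text(call_obs: Any) -> str:
    # 1. Errors win: render as "ERROR: <message>", falling back to str(err).
    if call_obs.error is not None:
        err = call_obs.error
        msg = err.message if hasattr(err, "message") else str(err)
        return f"ERROR: {msg}"
    r = call_obs.result
    # 2. No result at all gives an empty string.
    if r is None:
        return ""
    # 3. A structured payload on .data takes priority when present.
    if getattr(r, "data", None) is not None:
        return str(r.data)
    # 4. Otherwise use the first content block's .text (or its str()).
    content = getattr(r, "content", None)
    if content:
        first = content[0]
        return first.text if hasattr(first, "text") else str(first)
    # 5. Last resort.
    return str(r)
```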
+     def _shaping_reward(
+         self, tool_name: str, result_text: str, is_error: bool
+     ) -> float:
+         """Per-call shaping rewards. Small signals that teach good ERP behaviour."""
+         if is_error and tool_name not in ("file_justification",):
+             return 0.0
+
+         state = self._sim._state
+         if state is None:
+             return 0.0
+
+         if tool_name in ("read_inbox", "query_erp"):
+             if tool_name not in state.info_rewards_given_this_shift:
+                 if tool_name == "read_inbox" and "INBOX EMPTY" in result_text:
+                     return 0.0
+                 state.info_rewards_given_this_shift.add(tool_name)
+                 return 0.01
+             return 0.0
+
+         if tool_name == "submit_po":
+             if not is_error and "BUDGET_OVERRIDE_REQUIRED" not in result_text:
+                 return 0.02
+             return 0.0
+
+         if tool_name == "transfer" and not is_error:
+             return 0.01
+
+         if tool_name == "quarantine_lot" and not is_error:
+             return 0.01
+
+         if tool_name == "file_justification":
+             if "FLAGGED" in result_text:
+                 return -0.05
+             return 0.01
+
+         return 0.0
+
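The per-call shaping table can be restated as a standalone function for testing. A minimal sketch, assuming a plain `set` (`seen`) in place of `state.info_rewards_given_this_shift`; the redundant `not is_error` checks are folded into the early return:

```python
from typing import Set


def shaping_reward(tool_name: str, result_text: str, is_error: bool, seen: Set[str]) -> float:
    # Errors earn nothing, except file_justification, which is exempt from this check.
    if is_error and tool_name != "file_justification":
        return 0.0
    # Information-gathering tools pay once per shift; an empty inbox pays nothing.
    if tool_name in ("read_inbox", "query_erp"):
        if tool_name in seen:
            return 0.0
        if tool_name == "read_inbox" and "INBOX EMPTY" in result_text:
            return 0.0
        seen.add(tool_name)
        return 0.01
    if tool_name == "submit_po":
        # Errors were filtered above, so only the override-ticket response is excluded.
        return 0.0 if "BUDGET_OVERRIDE_REQUIRED" in result_text else 0.02
    if tool_name in ("transfer", "quarantine_lot"):
        return 0.01
    if tool_name == "file_justification":
        return -0.05 if "FLAGGED" in result_text else 0.01
    return 0.0
```

As written, an erroring `file_justification` call (for example an unknown ticket ID) still earns +0.01, because it is exempt from the early error check and its text does not contain "FLAGGED".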
+     @property
+     def state(self) -> MedchainState:
+         s = self._sim._state
+         if s is None:
+             return MedchainState()
+         return MedchainState(
+             episode_id=s.episode_id,
+             step_count=self._step_count,
+             task=s.task,
+             day=s.day,
+             max_days=s.max_days,
+             actions_remaining=s.actions_remaining,
+             budget_used=s.budget_used,
+             budget_limit=s.budget_limit,
+             unread_messages=sum(1 for m in s.inbox if not m.read),
+             orders_in_transit=sum(1 for po in s.pipeline_orders if po.status == "in_transit"),
+         )
server/requirements.txt ADDED
@@ -0,0 +1,6 @@
+ openenv-core[core]>=0.2.1
+ fastmcp>=2.0.0
+ numpy>=1.24.0
+ fastapi>=0.115.0
+ uvicorn>=0.24.0
+ pydantic>=2.0.0
server/simulation.py ADDED
@@ -0,0 +1,994 @@
+ """
+ Core simulation engine for the MedChain Env environment.
+
+ MedchainSimulation manages the full episode lifecycle:
+ - Inventory tracked as FEFO lots per (location, product)
+ - Purchase order pipeline with stochastic lead times
+ - Event-driven inbox messages (crises, recalls, demand surges)
+ - Daily demand generation and fulfillment
+ """
+
+ from __future__ import annotations
+
+ import uuid
+ from dataclasses import dataclass, field
+ from typing import Dict, List, Optional, Set, Tuple
+
+ import numpy as np
+
+ from .tasks import SimEvent, TaskConfig
+
+
+ # ─── Simulation Dataclasses ───────────────────────────────────────────────────
+
+ @dataclass
+ class Lot:
+     lot_id: str
+     qty: int
+     expiry_day: Optional[int]  # None = non-perishable. Expired when current_day >= expiry_day.
+     cost_per_unit: float
+
+
+ @dataclass
+ class PurchaseOrder:
+     po_id: str
+     supplier_id: str
+     product_id: str
+     destination_id: str
+     quantity: int
+     priority: str  # "standard" or "expedited"
+     day_submitted: int
+     eta_day: int
+     unit_cost: float
+     total_cost: float
+     status: str  # "pending_justification", "in_transit", "delivered"
+     lot_id: str
+
+
+ @dataclass
+ class PendingBudgetOverride:
+     ticket_id: str
+     po: PurchaseOrder
+
+
+ @dataclass
+ class InboxMessage:
+     msg_id: str
+     priority: str
+     timestamp_str: str  # "Day {n} {HH:MM}"
+     sender: str
+     subject: str
+     body: str
+     read: bool
+     flagged: bool
+     event_id: str
+
+
+ @dataclass
+ class JustificationRecord:
+     ticket_id: str
+     po_id: str
+     reason: str
+     is_coherent: bool
+
+
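The `Lot` expiry rule and the FEFO ordering used throughout the simulation can be checked in isolation. A minimal sketch, reusing the `Lot` fields above; `is_expired` and `fefo_order` are hypothetical helper names, and the sort key is the one `transfer()` uses:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lot:
    lot_id: str
    qty: int
    expiry_day: Optional[int]  # None = non-perishable
    cost_per_unit: float


def is_expired(lot: Lot, current_day: int) -> bool:
    # Same rule as the Lot comment: expired once current_day >= expiry_day.
    return lot.expiry_day is not None and current_day >= lot.expiry_day


def fefo_order(lots: List[Lot]) -> List[Lot]:
    # First-expiry-first-out: earliest expiry first, non-perishables (None) last.
    return sorted(lots, key=lambda l: (l.expiry_day is None, l.expiry_day or 0))
```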
+ @dataclass
+ class SimState:
+     # Episode meta
+     task: str
+     episode_id: str
+     seed: int
+     rng: np.random.Generator
+
+     # Time
+     day: int
+     max_days: int
+
+     # Action budget
+     actions_remaining: int
+     actions_per_shift: int
+
+     # Budget
+     budget_used: float
+     budget_limit: float
+
+     # Inventory: (location_id, product_id) -> List[Lot] (FEFO-sorted)
+     inventory: Dict[Tuple[str, str], List[Lot]]
+
+     # Orders
+     pipeline_orders: List[PurchaseOrder]
+     po_counter: int
+
+     # Inbox
+     inbox: List[InboxMessage]
+     msg_counter: int
+
+     # Budget override tickets
+     pending_overrides: Dict[str, PendingBudgetOverride]
+
+     # Quarantine
+     quarantined_lots: Set[str]
+
+     # Demand / fulfillment tracking (one value per completed day)
+     daily_demand: List[float]
+     daily_fulfilled: List[float]
+     daily_critical_demand: List[float]
+     daily_critical_fulfilled: List[float]
+
+     # Per-(location, product) daily tracking (for demand_history queries)
+     daily_product_demand: Dict[Tuple[str, str], List[int]]
+     daily_product_fulfilled: Dict[Tuple[str, str], List[int]]
+
+     # Spend tracking
+     total_spend: float
+     total_wasted_value: float
+
+     # Transfer tracking (task 2)
+     transfer_count: int
+     transfer_cost_paid: float
+
+     # Capacity violations (task 2)
+     capacity_violation_days: int
+
+     # Active event effects: event_id -> last_day_active (inclusive)
+     active_events: Dict[str, int]
+
+     # Per-shift shaping reward helpers
+     info_rewards_given_this_shift: Set[str]
+     daily_stockout_count: int
+     daily_expired_lots: int
+
+     # Task 3 crisis tracking
+     justification_log: List[JustificationRecord]
+     mci_preemptive_order: bool
+     recall_handled_by_day: Optional[int]
+
+
+ # ─── MedchainSimulation ───────────────────────────────────────────────────────
+
+ class MedchainSimulation:
+     """
+     Core simulation engine. Called by MedchainEnvironment's MCP tools.
+
+     All public tool methods return a string (displayed to agent as ERP output).
+     end_shift_tool() also stores _last_reward and _done for the environment.
+     """
+
+     def __init__(self, task_config: TaskConfig):
+         self._task = task_config
+         self._state: Optional[SimState] = None
+         self._last_reward: float = 0.0
+         self._done: bool = False
+
+     # ── Called by environment.reset() ──────────────────────────────────────
+
+     def reset(self, seed: int, episode_id: str) -> str:
+         """Initialize a new episode. Returns dashboard text."""
+         self._done = False
+         self._last_reward = 0.0
+
+         rng = np.random.default_rng(seed)
+         self._state = SimState(
+             task=self._task.name,
+             episode_id=episode_id,
+             seed=seed,
+             rng=rng,
+             day=1,
+             max_days=self._task.max_days,
+             actions_remaining=self._task.actions_per_shift,
+             actions_per_shift=self._task.actions_per_shift,
+             budget_used=0.0,
+             budget_limit=self._task.budget_limit,
+             inventory={},
+             pipeline_orders=[],
+             po_counter=1,
+             inbox=[],
+             msg_counter=1,
+             pending_overrides={},
+             quarantined_lots=set(),
+             daily_demand=[],
+             daily_fulfilled=[],
+             daily_critical_demand=[],
+             daily_critical_fulfilled=[],
+             daily_product_demand={},
+             daily_product_fulfilled={},
+             total_spend=0.0,
+             total_wasted_value=0.0,
+             transfer_count=0,
+             transfer_cost_paid=0.0,
+             capacity_violation_days=0,
+             active_events={},
+             info_rewards_given_this_shift=set(),
+             daily_stockout_count=0,
+             daily_expired_lots=0,
+             justification_log=[],
+             mci_preemptive_order=False,
+             recall_handled_by_day=None,
+         )
+
+         self._initialize_inventory()
+         self._inject_day1_inbox()
+
+         from .erp_formatter import format_dashboard
+         return format_dashboard(self._state, self._task)
+
+     def _initialize_inventory(self):
+         """Seed initial inventory: initial_stock_days × base_demand per location/product."""
+         state = self._state
+         for product in self._task.products:
+             for loc_id in product.locations:
+                 key = (loc_id, product.product_id)
+                 qty = int(product.base_demand * self._task.initial_stock_days)
+                 expiry_day = (
+                     1 + int(product.shelf_life_days * 0.7)
+                     if product.shelf_life_days is not None
+                     else None
+                 )
+                 lot = Lot(
+                     lot_id=f"INIT-{product.product_id}-{loc_id}",
+                     qty=qty,
+                     expiry_day=expiry_day,
+                     cost_per_unit=product.unit_cost,
+                 )
+                 state.inventory[key] = [lot]
+
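The starting-stock arithmetic in `_initialize_inventory` reduces to two formulas: quantity is `int(base_demand × initial_stock_days)` and expiry is `1 + int(0.7 × shelf_life_days)`, so Day-1 stock has roughly 70% of its shelf life left. A minimal sketch; `initial_lot` is a hypothetical helper name and the numbers are illustrative:

```python
from typing import Optional, Tuple


def initial_lot(base_demand: float, initial_stock_days: float,
                shelf_life_days: Optional[int]) -> Tuple[int, Optional[int]]:
    # Quantity covers initial_stock_days of base demand, truncated to whole units.
    qty = int(base_demand * initial_stock_days)
    # Perishables start with about 70% of their shelf life remaining; None means no expiry.
    expiry_day = 1 + int(shelf_life_days * 0.7) if shelf_life_days is not None else None
    return qty, expiry_day
```

For example, a product with base demand 20 and a 30-day shelf life, seeded with 5 days of stock, starts with 100 units that expire on Day 22.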
+     def _inject_day1_inbox(self):
+         """Add Day 1 inbox messages (welcome + any Day 1 events)."""
+         state = self._state
+         welcome = InboxMessage(
+             msg_id=f"MSG-{state.msg_counter:04d}",
+             priority="LOW",
+             timestamp_str="Day 1 08:00",
+             sender="System",
+             subject="Shift Handover Notes",
+             body=(
+                 f"Welcome to the {self._task.name} scenario.\n"
+                 f"You are managing medical supplies for {self._task.max_days} days.\n"
+                 f"Action budget: {self._task.actions_per_shift} actions per shift.\n"
+                 f"Budget ceiling: ${self._task.budget_limit:,.0f} outstanding orders.\n\n"
+                 "Use read_inbox to check messages, query_erp to check stock,\n"
+                 "submit_po to order supplies, and end_shift to advance the day."
+             ),
+             read=False,
+             flagged=False,
+             event_id="system_welcome",
+         )
+         state.inbox.append(welcome)
+         state.msg_counter += 1
+         self._inject_events_for_day(1)
+
+     # ── Action Budget Helper ────────────────────────────────────────────────
+
+     def _check_action_budget(self, tool_name: str) -> Optional[str]:
+         """Returns error string if budget exhausted, None if OK. Does NOT decrement."""
+         if tool_name == "end_shift":
+             return None
+         if self._state is None:
+             return "ERROR: Environment not initialized. Call reset() first."
+         if self._state.actions_remaining <= 0:
+             return (
+                 "ERROR: Action budget exhausted for this shift.\n"
+                 f"Actions used: {self._state.actions_per_shift}/{self._state.actions_per_shift}\n"
+                 "Call end_shift() to advance to the next day and restore your action budget."
+             )
+         return None
+
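The action-budget protocol (check, then decrement; `end_shift` is always allowed and refills the budget) can be modelled as a tiny state machine. A minimal sketch; `ActionBudget` is a hypothetical class, not part of the codebase:

```python
from typing import Optional


class ActionBudget:
    """Hypothetical model of _check_action_budget plus the per-tool decrement."""

    def __init__(self, actions_per_shift: int) -> None:
        self.actions_per_shift = actions_per_shift
        self.remaining = actions_per_shift

    def check(self, tool_name: str) -> Optional[str]:
        # end_shift is always allowed; it is the only way to refill the budget.
        if tool_name == "end_shift":
            return None
        if self.remaining <= 0:
            return "ERROR: Action budget exhausted for this shift."
        return None

    def spend(self, tool_name: str) -> bool:
        # Check first, then decrement; returns False when the call would be refused.
        if self.check(tool_name) is not None:
            return False
        if tool_name == "end_shift":
            self.remaining = self.actions_per_shift
        else:
            self.remaining -= 1
        return True
```

One difference in the code that follows: most tools decrement immediately after the budget check, so a malformed call still costs an action, whereas `submit_po` decrements only after argument validation and the budget-limit check pass.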
+     # ── MCP Tool Implementations ────────────────────────────────────────────
+
+     def read_inbox(self, filter: str = "unread") -> str:
+         err = self._check_action_budget("read_inbox")
+         if err:
+             return err
+         self._state.actions_remaining -= 1
+
+         messages = list(self._state.inbox)
+         if filter == "unread":
+             messages = [m for m in messages if not m.read]
+         elif filter == "flagged":
+             messages = [m for m in messages if m.flagged]
+         # "all" (or any unrecognised value) → use full inbox
+
+         if not messages:
+             return f"INBOX EMPTY\nFilter: {filter} | No messages matching filter."
+
+         lines = []
+         for m in messages:
+             # Report the status as it was before this read; messages are marked read below.
+             read_status = "READ" if m.read else "UNREAD"
+             lines.append(
+                 f"\n[MSG {m.msg_id} | {read_status} | PRIORITY: {m.priority} | {m.timestamp_str}]"
+             )
+             lines.append(f"FROM: {m.sender}")
+             lines.append(f"SUBJ: {m.subject}")
+             lines.append("")
+             lines.append(m.body)
+             lines.append("")
+
+         for m in messages:
+             m.read = True
+         return "\n".join(lines)
+
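The inbox filter semantics can be shown with a stripped-down message type. A minimal sketch, assuming a hypothetical `Msg` dataclass with only the fields the filter touches; it formats each row before flipping `read`, so newly arrived mail is reported as UNREAD:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Msg:
    msg_id: str
    read: bool = False
    flagged: bool = False


def select_messages(inbox: List[Msg], filter_: str = "unread") -> List[str]:
    if filter_ == "unread":
        picked = [m for m in inbox if not m.read]
    elif filter_ == "flagged":
        picked = [m for m in inbox if m.flagged]
    else:  # "all" and any unrecognised value
        picked = list(inbox)
    # Capture status first, then mark everything returned as read.
    rows = [f"{m.msg_id} {'READ' if m.read else 'UNREAD'}" for m in picked]
    for m in picked:
        m.read = True
    return rows
```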
+     def query_erp(self, table: str, location: str = "all", sku: str = "all") -> str:
+         err = self._check_action_budget("query_erp")
+         if err:
+             return err
+         self._state.actions_remaining -= 1
+
+         valid_tables = ["inventory", "expiry", "pipeline_orders", "demand_history"]
+         if table not in valid_tables:
+             return f"ERROR: Unknown table '{table}'. Valid tables: {valid_tables}"
+
+         from .erp_formatter import (
+             format_demand_history,
+             format_expiry_table,
+             format_inventory_table,
+             format_pipeline_table,
+         )
+         if table == "inventory":
+             return format_inventory_table(self._state, self._task, location, sku)
+         elif table == "expiry":
+             return format_expiry_table(self._state, self._task, location, sku)
+         elif table == "pipeline_orders":
+             return format_pipeline_table(self._state, location, sku)
+         elif table == "demand_history":
+             return format_demand_history(self._state, self._task, location, sku)
+         return "ERROR: Unexpected table."
+
+     def query_supplier(self, supplier_id: str) -> str:
+         err = self._check_action_budget("query_supplier")
+         if err:
+             return err
+         self._state.actions_remaining -= 1
+
+         supplier = next((s for s in self._task.suppliers if s.supplier_id == supplier_id), None)
+         if not supplier:
+             available = [s.supplier_id for s in self._task.suppliers]
+             return f"ERROR: Supplier '{supplier_id}' not found. Available: {available}"
+
+         effective_lead_time = supplier.base_lead_time
+         disruption_note = "No disruptions reported."
+         for event_id, last_day in self._state.active_events.items():
+             event = next((e for e in self._task.events if e.event_id == event_id), None)
+             if (
+                 event
+                 and event.event_type == "supplier_disruption"
+                 and event.params.get("supplier_id") == supplier_id
+             ):
+                 effective_lead_time = event.params["new_lead_time"]
+                 disruption_note = (
+                     f"ACTIVE DISRUPTION: Lead time extended to {effective_lead_time} days. "
+                     f"Reason: {event.params['reason']}"
+                 )
+
+         from .erp_formatter import format_supplier_info
+         return format_supplier_info(supplier, effective_lead_time, disruption_note)
+
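The effective-lead-time lookup shared by `query_supplier` and `submit_po` can be sketched as one function. A minimal sketch, assuming a hypothetical `Event` dataclass as a stand-in for `SimEvent` (defined in `tasks.py`, not shown here) and made-up IDs such as `SUP-A`; it builds one dict instead of scanning the event list for every active event:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Event:
    """Hypothetical stand-in for SimEvent."""
    event_id: str
    event_type: str
    params: Dict[str, Any] = field(default_factory=dict)


def effective_lead_time(base_lead_time: int, supplier_id: str,
                        active_event_ids: List[str], events: List[Event]) -> int:
    by_id = {e.event_id: e for e in events}  # one lookup table instead of repeated scans
    lead = base_lead_time
    for eid in active_event_ids:
        e = by_id.get(eid)
        if (e is not None and e.event_type == "supplier_disruption"
                and e.params.get("supplier_id") == supplier_id):
            # A later matching disruption overrides an earlier one, as in the loop above.
            lead = int(e.params["new_lead_time"])
    return lead
```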
+     def query_forecast(self, product_id: str, location_id: str, horizon_days: int = 7) -> str:
+         err = self._check_action_budget("query_forecast")
+         if err:
+             return err
+         self._state.actions_remaining -= 1
+
+         horizon_days = max(1, min(21, horizon_days))
+         product = next((p for p in self._task.products if p.product_id == product_id), None)
+         if not product:
+             return f"ERROR: Product '{product_id}' not found."
+         if location_id not in product.locations and location_id != "all":
+             return f"ERROR: Product '{product_id}' is not stocked at '{location_id}'."
+
+         from .erp_formatter import format_forecast
+         return format_forecast(self._state, self._task, product, location_id, horizon_days)
+
+     def submit_po(
+         self,
+         supplier_id: str,
+         product_id: str,
+         destination_id: str,
+         quantity: int,
+         priority: str = "standard",
+     ) -> str:
+         err = self._check_action_budget("submit_po")
+         if err:
+             return err
+
+         if priority not in ("standard", "expedited"):
+             return "ERROR: priority must be 'standard' or 'expedited'."
+         if quantity <= 0:
+             return "ERROR: quantity must be positive."
+
+         supplier = next((s for s in self._task.suppliers if s.supplier_id == supplier_id), None)
+         if not supplier:
+             return f"ERROR: Supplier '{supplier_id}' not found."
+         if product_id not in supplier.products:
+             return f"ERROR: Supplier '{supplier_id}' does not supply '{product_id}'."
+
+         valid_locs = [l.location_id for l in self._task.locations]
+         if destination_id not in valid_locs:
+             return f"ERROR: Destination '{destination_id}' not found. Valid: {valid_locs}"
+
+         product = next((p for p in self._task.products if p.product_id == product_id), None)
+         expedited_multiplier = 1.5 if priority == "expedited" else 1.0
+         unit_cost = product.unit_cost * supplier.cost_multiplier * expedited_multiplier
+         total_cost = unit_cost * quantity
+
+         if self._state.budget_used + total_cost > self._state.budget_limit:
+             overage = (self._state.budget_used + total_cost) - self._state.budget_limit
+             return (
+                 f"ERROR: BUDGET_EXCEEDED\n"
+                 f"Order cost: ${total_cost:,.2f} | "
+                 f"Current outstanding: ${self._state.budget_used:,.2f} | "
+                 f"Limit: ${self._state.budget_limit:,.2f}\n"
+                 f"Overage: ${overage:,.2f}\n"
+                 f"Reduce order quantity or wait for existing orders to be delivered."
+             )
+
+         # Effective lead time (check active disruptions)
+         lead_time = supplier.base_lead_time
+         for event_id, last_day in self._state.active_events.items():
+             event = next((e for e in self._task.events if e.event_id == event_id), None)
+             if (
+                 event
+                 and event.event_type == "supplier_disruption"
+                 and event.params.get("supplier_id") == supplier_id
+             ):
+                 lead_time = event.params["new_lead_time"]
+
+         if priority == "expedited":
+             lead_time = max(1, lead_time - 2)
+
+         # Stochastic jitter for task 3
+         if supplier.lead_time_std > 0:
+             jitter = int(round(self._state.rng.normal(0, supplier.lead_time_std)))
+             lead_time = max(1, lead_time + jitter)
+
+         eta_day = self._state.day + lead_time
+         po_id = f"POD-{self._state.po_counter:04d}"
+         lot_id = f"LOT-{po_id}"
+         self._state.po_counter += 1
+
+         # Expedited: requires justification
+         if priority == "expedited":
+             ticket_id = f"BOT-{self._state.po_counter:04d}"
+             self._state.po_counter += 1
+             po = PurchaseOrder(
+                 po_id=po_id, supplier_id=supplier_id, product_id=product_id,
+                 destination_id=destination_id, quantity=quantity, priority=priority,
+                 day_submitted=self._state.day, eta_day=eta_day, unit_cost=unit_cost,
+                 total_cost=total_cost, status="pending_justification", lot_id=lot_id,
+             )
+             self._state.pending_overrides[ticket_id] = PendingBudgetOverride(
+                 ticket_id=ticket_id, po=po
+             )
+             self._state.actions_remaining -= 1
+             return (
+                 f"ERROR: BUDGET_OVERRIDE_REQUIRED\n"
+                 f"Order {po_id} ({priority}, ${total_cost:,.2f} incl. 50% expedite premium) "
+                 f"requires justification.\n"
+                 f"Ticket ID: {ticket_id}\n"
+                 f"Use file_justification(ticket_id=\"{ticket_id}\", reason=\"...\") to proceed.\n"
+                 f"Justification will be audited by Finance. False justifications are flagged."
+             )
+
+         # Standard order: submit immediately
+         self._state.actions_remaining -= 1
+         po = PurchaseOrder(
+             po_id=po_id, supplier_id=supplier_id, product_id=product_id,
+             destination_id=destination_id, quantity=quantity, priority=priority,
+             day_submitted=self._state.day, eta_day=eta_day, unit_cost=unit_cost,
+             total_cost=total_cost, status="in_transit", lot_id=lot_id,
+         )
+         self._state.pipeline_orders.append(po)
+         self._state.budget_used += total_cost
+
+         return (
+             f"OK — PO {po_id} submitted.\n"
+             f"Product: {product_id} × {quantity} units\n"
+             f"Supplier: {supplier_id} | Priority: {priority}\n"
+             f"Destination: {destination_id} | ETA: Day {eta_day}\n"
+             f"Cost: ${total_cost:,.2f} | "
+             f"Budget remaining: ${self._state.budget_limit - self._state.budget_used:,.2f}"
+         )
+
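The cost and lead-time arithmetic in `submit_po` can be checked without the rest of the simulation. A minimal sketch; `quote_po` is a hypothetical helper, `lead_time` is assumed to already include any active disruption, and the jitter uses the standard library's `random.Random.gauss` in place of numpy's `Generator.normal`:

```python
import random
from typing import Optional, Tuple


def quote_po(base_unit_cost: float, cost_multiplier: float, quantity: int,
             lead_time: int, priority: str = "standard",
             lead_time_std: float = 0.0,
             rng: Optional[random.Random] = None) -> Tuple[float, int]:
    """Return (total_cost, lead_time_days) following submit_po's arithmetic."""
    if priority not in ("standard", "expedited"):
        raise ValueError("priority must be 'standard' or 'expedited'")
    expedited = priority == "expedited"
    # Expedited orders carry a 50% premium on the supplier-adjusted unit cost.
    unit_cost = base_unit_cost * cost_multiplier * (1.5 if expedited else 1.0)
    if expedited:
        lead_time = max(1, lead_time - 2)  # two days faster, never below one day
    if lead_time_std > 0:
        rng = rng or random.Random()
        lead_time = max(1, lead_time + int(round(rng.gauss(0, lead_time_std))))
    return unit_cost * quantity, lead_time
```

In the environment the quote is then compared against the ceiling: the order is refused with `BUDGET_EXCEEDED` when `budget_used + total_cost > budget_limit`, and expedited orders additionally wait for `file_justification` before entering the pipeline.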
+     def transfer(
+         self,
+         from_location_id: str,
+         to_location_id: str,
+         product_id: str,
+         quantity: int,
+     ) -> str:
+         err = self._check_action_budget("transfer")
+         if err:
+             return err
+         self._state.actions_remaining -= 1
+
+         if quantity <= 0:
+             return "ERROR: quantity must be positive."
+
+         valid_locs = {l.location_id for l in self._task.locations}
+         if from_location_id not in valid_locs:
+             return f"ERROR: Location '{from_location_id}' not found."
+         if to_location_id not in valid_locs:
+             return f"ERROR: Location '{to_location_id}' not found."
+
+         key_from = (from_location_id, product_id)
+         lots = sorted(
+             [
+                 l for l in self._state.inventory.get(key_from, [])
+                 if l.lot_id not in self._state.quarantined_lots
+             ],
+             key=lambda l: (l.expiry_day is None, l.expiry_day or 0),
+         )
+         available = sum(l.qty for l in lots)
+         if available < quantity:
+             return (
+                 f"ERROR: Insufficient stock at {from_location_id}. "
+                 f"Available: {available} units of {product_id}."
+             )
+
+         # Check destination capacity (task 2)
+         dest_loc = next(
+             (l for l in self._task.locations if l.location_id == to_location_id), None
+         )
+         if dest_loc and dest_loc.capacity is not None:
+             current_at_dest = sum(
+                 sum(lot.qty for lot in lots2)
+                 for (loc, pid), lots2 in self._state.inventory.items()
+                 if loc == to_location_id
+             )
+             if current_at_dest + quantity > dest_loc.capacity:
+                 return (
+                     f"ERROR: CAPACITY_EXCEEDED — {to_location_id} capacity {dest_loc.capacity}. "
+                     f"Current: {current_at_dest}, Transfer: {quantity}."
+                 )
+
+         # FEFO transfer
+         remaining = quantity
+         key_to = (to_location_id, product_id)
+         if key_to not in self._state.inventory:
+             self._state.inventory[key_to] = []
+
+         for lot in lots:
+             if remaining <= 0:
+                 break
+             take = min(remaining, lot.qty)
+             lot.qty -= take
+             remaining -= take
+             self._state.inventory[key_to].append(
+                 Lot(
+                     lot_id=f"XFR-{lot.lot_id}",
+                     qty=take,
+                     expiry_day=lot.expiry_day,
+                     cost_per_unit=lot.cost_per_unit,
+                 )
+             )
+
+         self._state.inventory[key_from] = [
+             l for l in self._state.inventory[key_from] if l.qty > 0
+         ]
+
+         TRANSFER_FEE = 0.5
+         fee = quantity * TRANSFER_FEE
+         self._state.transfer_count += 1
+         self._state.transfer_cost_paid += fee
+
+         return (
+             f"OK — Transfer complete.\n"
+             f"{quantity} units of {product_id}: {from_location_id} → {to_location_id}\n"
+             f"Transfer fee: ${fee:.2f}"
+         )
+
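The FEFO split at the heart of `transfer()` can be sketched on its own. A minimal sketch, assuming a trimmed `Lot` without `cost_per_unit` and a hypothetical `fefo_split` helper; the capacity check and the $0.50-per-unit fee are left out:

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Lot:
    lot_id: str
    qty: int
    expiry_day: Optional[int]


def fefo_split(lots: List[Lot], quantity: int, quarantined: Set[str]) -> Tuple[List[Lot], List[Lot]]:
    """Return (remaining_source_lots, moved_lots), drawing from earliest expiry first."""
    usable = sorted((l for l in lots if l.lot_id not in quarantined),
                    key=lambda l: (l.expiry_day is None, l.expiry_day or 0))
    if sum(l.qty for l in usable) < quantity:
        raise ValueError("insufficient stock")
    moved: List[Lot] = []
    remaining = quantity
    for lot in usable:
        if remaining <= 0:
            break
        take = min(remaining, lot.qty)
        lot.qty -= take  # mutates the source lot in place, as transfer() does
        remaining -= take
        moved.append(Lot(f"XFR-{lot.lot_id}", take, lot.expiry_day))
    # Emptied lots are dropped; quarantined lots stay at the source untouched.
    return [l for l in lots if l.qty > 0], moved
```

Moved lots keep their original expiry day, so a transfer never refreshes shelf life.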
+     def quarantine_lot(self, location_id: str, sku: str, lot_id: str) -> str:
+         err = self._check_action_budget("quarantine_lot")
+         if err:
+             return err
+         self._state.actions_remaining -= 1
+
+         valid_locs = {l.location_id for l in self._task.locations}
+         if location_id not in valid_locs:
+             return f"ERROR: Location '{location_id}' not found."
+
+         key = (location_id, sku)
+         lots = self._state.inventory.get(key, [])
+
+         if lot_id == "all":
+             target_lots = [l for l in lots]
+         else:
+             target_lots = [l for l in lots if l.lot_id == lot_id]
+             if not target_lots:
+                 target_lots = [l for l in lots if lot_id in l.lot_id]
+
+         if not target_lots:
+             available_lots = [l.lot_id for l in lots]
+             return (
+                 f"ERROR: Lot '{lot_id}' not found at {location_id} for SKU {sku}. "
+                 f"Available lots: {available_lots}"
+             )
+
+         quarantined_qty = 0
+         disposal_ids = []
+         for lot in target_lots:
+             if lot.lot_id not in self._state.quarantined_lots:
+                 self._state.quarantined_lots.add(lot.lot_id)
+                 quarantined_qty += lot.qty
+                 disposal_ids.append(lot.lot_id)
+
+         # Track recall completion for task 3
+         if sku == "IV-SAL-500" and "RECALL-LOT" in lot_id:
+             self._check_recall_completion()
+
+         disposal_ticket = f"DIS-{self._state.po_counter:04d}"
+         self._state.po_counter += 1
+
+         return (
+             f"OK — Quarantine complete.\n"
+             f"SKU: {sku} | Location: {location_id}\n"
+             f"Lots quarantined: {disposal_ids}\n"
+             f"Units quarantined: {quarantined_qty}\n"
+             f"Disposal ticket: {disposal_ticket} created."
+         )
+
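Lot matching in `quarantine_lot` tries `"all"`, then an exact ID, then a substring fallback. A minimal sketch over plain ID strings; `match_lots` is a hypothetical helper and the IDs only imitate the `INIT-`/`LOT-POD-` patterns above:

```python
from typing import List


def match_lots(lot_ids: List[str], lot_id: str) -> List[str]:
    # "all" selects every lot of this SKU at this location.
    if lot_id == "all":
        return list(lot_ids)
    # Prefer an exact match; fall back to substring matching only if none exists.
    exact = [x for x in lot_ids if x == lot_id]
    return exact or [x for x in lot_ids if lot_id in x]
```

The substring fallback is forgiving but can over-match: a short fragment such as `POD-00` selects every purchase-order lot whose ID contains it.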
+     def file_justification(self, ticket_id: str, reason: str) -> str:
+         err = self._check_action_budget("file_justification")
+         if err:
+             return err
+         self._state.actions_remaining -= 1
+
+         if ticket_id not in self._state.pending_overrides:
+             return (
+                 f"ERROR: Ticket '{ticket_id}' not found or already processed.\n"
+                 f"Active tickets: {list(self._state.pending_overrides.keys())}"
+             )
+
+         override = self._state.pending_overrides.pop(ticket_id)
+         po = override.po
+
+         active_event_types: Set[str] = set()
+         for event_id in self._state.active_events:
+             event = next((e for e in self._task.events if e.event_id == event_id), None)
+             if event:
+                 active_event_types.add(event.event_type)
+
+         from .grader import grade_justification
+         is_coherent = grade_justification(reason, active_event_types)
+         record = JustificationRecord(
+             ticket_id=ticket_id, po_id=po.po_id, reason=reason, is_coherent=is_coherent
+         )
+         self._state.justification_log.append(record)
+
+         po.status = "in_transit"
+         self._state.pipeline_orders.append(po)
+         self._state.budget_used += po.total_cost
+
+         audit_note = ""
+         if not is_coherent:
+             audit_note = (
+                 "\n[AUDIT FLAG] Justification does not reference active crisis conditions. "
+                 "Flagged for Finance review. Penalty applied."
+             )
+
+         return (
+             f"OK — Justification {'accepted' if is_coherent else 'FLAGGED'}. "
+             f"PO {po.po_id} submitted.\n"
+             f"Product: {po.product_id} × {po.quantity} units | Destination: {po.destination_id}\n"
+             f"ETA: Day {po.eta_day} | Cost: ${po.total_cost:,.2f}"
+             f"{audit_note}"
+         )
+
+     def end_shift_tool(self) -> str:
+         """Advance simulation by one day. Stores _last_reward and _done."""
+         state = self._state
+         if state is None:
+             return "ERROR: Environment not initialized."
+
+         day = state.day
+         report_lines = [f"╔═══ END OF SHIFT — Day {day} {'═' * 40}╗"]
+
+         # ── Step 1: Deliver arriving orders ──────────────────────────────
+         delivered = []
+         for po in list(state.pipeline_orders):
+             if po.eta_day <= day:
+                 product = next(
+                     (p for p in self._task.products if p.product_id == po.product_id), None
+                 )
+                 key = (po.destination_id, po.product_id)
+                 if key not in state.inventory:
+                     state.inventory[key] = []
+                 expiry_day = (day + product.shelf_life_days) if product.shelf_life_days else None
+                 lot = Lot(
+                     lot_id=po.lot_id, qty=po.quantity,
+                     expiry_day=expiry_day, cost_per_unit=po.unit_cost
+                 )
+                 state.inventory[key].append(lot)
+                 state.budget_used -= po.total_cost
+                 state.total_spend += po.total_cost
+                 po.status = "delivered"
+                 delivered.append(po)
+
+         state.pipeline_orders = [po for po in state.pipeline_orders if po.status != "delivered"]
+         if delivered:
+             report_lines.append(f" DELIVERIES: {len(delivered)} order(s) received.")
+
+         # ── Step 2: Expire old lots ───────────────────────────────────────
+         total_expired_units = 0
+         total_expired_value = 0.0
+         for key in list(state.inventory.keys()):
+             fresh, expired = [], []
+             for lot in state.inventory[key]:
+                 if lot.expiry_day is not None and lot.expiry_day <= day:
+                     expired.append(lot)
+                 else:
+                     fresh.append(lot)
+             if expired:
+                 for lot in expired:
+                     total_expired_units += lot.qty
+                     total_expired_value += lot.qty * lot.cost_per_unit
+                     state.total_wasted_value += lot.qty * lot.cost_per_unit
+                 state.daily_expired_lots += len(expired)
+                 state.inventory[key] = fresh
+
+         if total_expired_units > 0:
+             report_lines.append(
+                 f" EXPIRED: {total_expired_units} units (${total_expired_value:,.2f} written off)"
+             )
+
+         # ── Step 3: Generate and fulfill demand ───────────────────────────
+         day_demand = 0.0
+         day_fulfilled = 0.0
+         day_critical_demand = 0.0
+         day_critical_fulfilled = 0.0
+
+         for product in self._task.products:
+             for loc_id in product.locations:
+                 demand = self._generate_demand(product, loc_id, day)
+                 fulfilled = self._fefo_fulfill(product.product_id, loc_id, demand, day)
+
+                 day_demand += demand
+                 day_fulfilled += fulfilled
+                 if product.criticality == "CRITICAL":
+                     day_critical_demand += demand
+                     day_critical_fulfilled += fulfilled
+
+                 # Per-product daily tracking
+                 key = (loc_id, product.product_id)
+                 if key not in state.daily_product_demand:
+                     state.daily_product_demand[key] = []
+                     state.daily_product_fulfilled[key] = []
+                 state.daily_product_demand[key].append(demand)
+                 state.daily_product_fulfilled[key].append(fulfilled)
+
+         state.daily_demand.append(day_demand)
+         state.daily_fulfilled.append(day_fulfilled)
+         state.daily_critical_demand.append(day_critical_demand)
+         state.daily_critical_fulfilled.append(day_critical_fulfilled)
+
+         day_svc = day_fulfilled / max(day_demand, 1)
+         report_lines.append(
+             f" DEMAND: {int(day_demand)} units | FULFILLED: {int(day_fulfilled)} ({100 * day_svc:.1f}%)"
+         )
+
+         # ── Step 4: Check capacity violations (task 2) ────────────────────
+         if any(l.capacity is not None for l in self._task.locations):
+             for location in self._task.locations:
+                 if location.capacity is None:
+                     continue
+                 current = sum(
+                     sum(lot.qty for lot in lots)
+                     for (lid, pid), lots in state.inventory.items()
+                     if lid == location.location_id
+                 )
+                 if current > location.capacity:
+                     state.capacity_violation_days += 1
+
+         # ── Step 5: Inject recall lot for task 3 (Day 2, silent) ─────────
+         if self._task.name == "hospital_network_crisis" and day == 2:
+             self._inject_recall_lot()
+
+         # ── Step 6: Advance day, reset budget, inject next-day events ────
+         state.day += 1
+         state.actions_remaining = state.actions_per_shift
+         self._update_active_events(state.day)
+         self._inject_events_for_day(state.day)
+
+         # ── Step 7: Daily shaping reward ──────────────────────────────────
+         shaping = 0.0
+         day_service = day_fulfilled / max(day_demand, 1)
+         shaping += 0.10 * day_service
+
+         total_units = sum(
+             lot.qty
+             for lots in state.inventory.values()
+             for lot in lots
+             if lot.lot_id not in state.quarantined_lots
+         )
+         shaping -= 0.00005 * total_units
+         shaping -= min(0.30, state.daily_expired_lots * 0.10)
+         shaping -= min(0.50, state.daily_stockout_count * 0.20)
+
+         state.info_rewards_given_this_shift = set()
+         state.daily_stockout_count = 0
+         state.daily_expired_lots = 0
+
+         # ── Step 8: Compute terminal score & check done ───────────────────
+         from .grader import compute_reward
+         final_score = compute_reward(state, self._task)
+         done = state.day > state.max_days
+
+         if done:
+             report_lines.append(
+                 f"╠═══ EPISODE COMPLETE — Final Score: {final_score:.3f} {'═' * 30}╣"
+             )
+             total_d = sum(state.daily_demand)
+             total_f = sum(state.daily_fulfilled)
+             report_lines.append(
+                 f" Service Level: {total_f / max(total_d, 1) * 100:.1f}%"
+             )
+             report_lines.append(f" Total Spend: ${state.total_spend:,.2f}")
+             report_lines.append(f" Waste Value: ${state.total_wasted_value:,.2f}")
+             report_lines.append(f"╚{'═' * 68}╝")
+             self._done = True
+             self._last_reward = final_score
+             return "\n".join(report_lines)
+
+         self._done = False
+         self._last_reward = shaping
+         report_lines.append(
+             f"╚═══ Day {day} committed. Day {state.day} begins. {'═' * 38}╝"
+         )
+         report_lines.append("")
+         from .erp_formatter import format_dashboard
+         report_lines.append(format_dashboard(state, self._task))
+         return "\n".join(report_lines)
+
841
+ # ── Private Helpers ────────────────────────────────────────────────────
842
+
843
+ def _generate_demand(self, product, location_id: str, day: int) -> int:
844
+ import math as _math
845
+ state = self._state
846
+ base = product.base_demand
847
+
848
+ if product.seasonal_amplitude > 0 and product.seasonal_period > 0:
849
+ seasonal = product.seasonal_amplitude * _math.sin(
850
+ 2 * _math.pi * day / product.seasonal_period + product.seasonal_phase
851
+ )
852
+ base *= (1 + seasonal)
853
+
854
+ for event_id, last_day in state.active_events.items():
855
+ event = next((e for e in self._task.events if e.event_id == event_id), None)
856
+ if event is None:
857
+ continue
858
+ if event.event_type == "mci":
859
+ if (
860
+ product.criticality in ("CRITICAL", "HIGH")
861
+ and location_id in event.params.get("locations", [])
862
+ ):
863
+ base *= event.params.get("demand_multiplier", 3.0)
864
+ elif event.event_type == "demand_surge":
865
+ if product.product_id in event.params.get("products", []):
866
+ base *= event.params.get("multiplier", 1.4)
867
+
868
+ noise = state.rng.normal(0, product.demand_std)
869
+ return max(0, int(round(base + noise)))
870
+
871
+ def _fefo_fulfill(
872
+ self, product_id: str, location_id: str, demand: int, day: int
873
+ ) -> int:
874
+ state = self._state
875
+ key = (location_id, product_id)
876
+ lots = state.inventory.get(key, [])
877
+
878
+ lots_sorted = sorted(
879
+ [l for l in lots if l.lot_id not in state.quarantined_lots and l.qty > 0],
880
+ key=lambda l: (l.expiry_day is None, l.expiry_day or 0),
881
+ )
882
+
883
+ fulfilled = 0
884
+ for lot in lots_sorted:
885
+ if fulfilled >= demand:
886
+ break
887
+ take = min(demand - fulfilled, lot.qty)
888
+ lot.qty -= take
889
+ fulfilled += take
890
+
891
+ state.inventory[key] = [l for l in lots if l.qty > 0]
892
+
893
+ if fulfilled < demand:
894
+ state.daily_stockout_count += 1
895
+
896
+ return fulfilled
897
+
898
+ def _update_active_events(self, day: int):
899
+ state = self._state
900
+ state.active_events = {
901
+ eid: last_day
902
+ for eid, last_day in state.active_events.items()
903
+ if last_day >= day
904
+ }
905
+ for event in self._task.events:
906
+ if event.trigger_day == day and event.duration_days > 0:
907
+ state.active_events[event.event_id] = day + event.duration_days - 1
908
+
909
+ def _inject_events_for_day(self, day: int):
910
+ state = self._state
911
+ for event in self._task.events:
912
+ if event.trigger_day == day:
913
+ msg = InboxMessage(
914
+ msg_id=f"MSG-{state.msg_counter:04d}",
915
+ priority=event.message.priority,
916
+ timestamp_str=f"Day {day} 06:00",
917
+ sender=event.message.sender,
918
+ subject=event.message.subject,
919
+ body=event.message.body,
920
+ read=False,
921
+ flagged=(event.message.priority == "CRITICAL"),
922
+ event_id=event.event_id,
923
+ )
924
+ state.inbox.append(msg)
925
+ state.msg_counter += 1
926
+
927
+ if event.event_type == "cold_chain_breach":
928
+ self._apply_cold_chain_breach(event)
929
+
930
+ if event.event_type == "budget_tighten":
931
+ state.budget_limit = event.params["new_budget_limit"]
932
+
933
+ if event.warning_message and event.trigger_day - 1 == day:
934
+ msg = InboxMessage(
935
+ msg_id=f"MSG-{state.msg_counter:04d}",
936
+ priority=event.warning_message.priority,
937
+ timestamp_str=f"Day {day} 18:00",
938
+ sender=event.warning_message.sender,
939
+ subject=event.warning_message.subject,
940
+ body=event.warning_message.body,
941
+ read=False,
942
+ flagged=False,
943
+ event_id=f"{event.event_id}_warning",
944
+ )
945
+ state.inbox.append(msg)
946
+ state.msg_counter += 1
947
+
948
+ def _apply_cold_chain_breach(self, event: SimEvent):
949
+ state = self._state
950
+ loc = event.params["location_id"]
951
+ prod = event.params["product_id"]
952
+ key = (loc, prod)
953
+ for lot in state.inventory.get(key, []):
954
+ state.quarantined_lots.add(lot.lot_id)
955
+
956
+ def _inject_recall_lot(self):
957
+ state = self._state
958
+ recall_lot_id = "RECALL-LOT-IV2026-9821"
959
+ for event in self._task.events:
960
+ if event.event_id == "iv_saline_recall":
961
+ qty = event.params["qty_per_location"]
962
+ product = next(
963
+ (p for p in self._task.products if p.product_id == "IV-SAL-500"), None
964
+ )
965
+ if product is None:
966
+ break
967
+ for loc_id in event.params["locations_with_lot"]:
968
+ key = (loc_id, "IV-SAL-500")
969
+ if key not in state.inventory:
970
+ state.inventory[key] = []
971
+ lot = Lot(
972
+ lot_id=recall_lot_id,
973
+ qty=qty,
974
+ expiry_day=None,
975
+ cost_per_unit=product.unit_cost,
976
+ )
977
+ state.inventory[key].append(lot)
978
+ break
979
+
980
+ def _check_recall_completion(self):
981
+ state = self._state
982
+ recall_lot_id = "RECALL-LOT-IV2026-9821"
983
+ if recall_lot_id not in state.quarantined_lots:
984
+ return
985
+ if state.recall_handled_by_day is None:
986
+ state.recall_handled_by_day = state.day
987
+
988
+ # ── Accessors used by MedchainEnvironment ──────────────────────────────
989
+
990
+ def get_last_reward(self) -> float:
991
+ return self._last_reward
992
+
993
+ def is_done(self) -> bool:
994
+ return self._done
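The `_fefo_fulfill` helper above draws demand first-expired-first-out (FEFO). It skips quarantined and empty lots and consumes dated lots earliest-expiry first. Non-perishable lots (`expiry_day is None`) are drawn last, and any shortfall counts as one stockout.

Below is a minimal standalone sketch of that ordering. It assumes a simplified `Lot` holding only `lot_id`, `qty` and `expiry_day`; the engine's `Lot` also carries `cost_per_unit`. Instead of mutating engine state, it returns a stockout flag.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Lot:
    # Simplified stand-in for the engine's Lot (cost_per_unit omitted).
    lot_id: str
    qty: int
    expiry_day: Optional[int]  # None = non-perishable


def fefo_fulfill(
    lots: List[Lot], demand: int, quarantined: Set[str]
) -> Tuple[int, List[Lot], bool]:
    """Fill `demand` first-expired-first-out from one (location, product) bin.

    Returns (units fulfilled, lots that still hold stock, stockout flag).
    """
    usable = sorted(
        (lot for lot in lots if lot.lot_id not in quarantined and lot.qty > 0),
        # False sorts before True, so dated lots come first (earliest expiry
        # first) and non-perishable lots (expiry_day None) are drawn last.
        key=lambda lot: (lot.expiry_day is None, lot.expiry_day or 0),
    )
    fulfilled = 0
    for lot in usable:
        if fulfilled >= demand:
            break
        take = min(demand - fulfilled, lot.qty)
        lot.qty -= take  # mutated in place, as in the engine
        fulfilled += take
    # Quarantined lots keep their stock; only emptied lots are dropped.
    remaining = [lot for lot in lots if lot.qty > 0]
    return fulfilled, remaining, fulfilled < demand


# "Q" is quarantined, so it is skipped even though it expires soonest.
lots = [Lot("A", 5, None), Lot("B", 3, 10), Lot("C", 4, 4), Lot("Q", 9, 1)]
filled, left, stockout = fefo_fulfill(lots, demand=6, quarantined={"Q"})
```

In the engine, a shortfall instead increments `state.daily_stockout_count`. That counter feeds the `min(0.50, count * 0.20)` penalty in the daily shaping reward.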
server/tasks.py ADDED
@@ -0,0 +1,404 @@
1
+ """
2
+ Task configurations for the MedChain Env environment.
3
+
4
+ Four tasks of increasing difficulty:
5
+ 0. orientation_ward — 2 days, 1 ward, no events (intro)
6
+ 1. single_ward_stable — 3 days, 1 ward, no events
7
+ 2. multi_ward_seasonal — 6 days, central pharmacy + 3 wards, flu surge + supplier delay
8
+ 3. hospital_network_crisis — 12 days, 4 sites, 5 overlapping crises
9
+ """
10
+
11
+ from dataclasses import dataclass, field
12
+ from typing import Dict, List, Optional
13
+
14
+
15
+ @dataclass
16
+ class Product:
17
+ product_id: str
18
+ name: str
19
+ shelf_life_days: Optional[int] # None = non-perishable
20
+ criticality: str # "CRITICAL", "HIGH", "NORMAL"
21
+ unit_cost: float
22
+ locations: List[str]
23
+ base_demand: float
24
+ demand_std: float
25
+ seasonal_amplitude: float # 0.0 = no seasonality
26
+ seasonal_period: int # 0 = no cycle
27
+ seasonal_phase: float # phase offset in radians
28
+
29
+
30
+ @dataclass
31
+ class Supplier:
32
+ supplier_id: str
33
+ name: str
34
+ base_lead_time: int
35
+ lead_time_std: float # 0.0 = deterministic
36
+ cost_multiplier: float
37
+ products: List[str]
38
+
39
+
40
+ @dataclass
41
+ class Location:
42
+ location_id: str
43
+ name: str
44
+ capacity: Optional[int] # None = unlimited
45
+
46
+
47
+ @dataclass(frozen=True)
48
+ class InboxMessageTemplate:
49
+ priority: str
50
+ sender: str
51
+ subject: str
52
+ body: str
53
+
54
+
55
+ @dataclass
56
+ class SimEvent:
57
+ event_id: str
58
+ event_type: str # "mci", "supplier_disruption", "product_recall",
59
+ # "demand_surge", "budget_tighten", "cold_chain_breach"
60
+ trigger_day: int
61
+ duration_days: int # 0 = instant/one-off
62
+ params: Dict
63
+ message: InboxMessageTemplate
64
+ warning_message: Optional[InboxMessageTemplate] # fires on trigger_day-1
65
+
66
+
67
+ @dataclass
68
+ class TaskConfig:
69
+ name: str
70
+ max_days: int
71
+ actions_per_shift: int
72
+ budget_limit: float
73
+ products: List[Product]
74
+ suppliers: List[Supplier]
75
+ locations: List[Location]
76
+ events: List[SimEvent]
77
+ initial_stock_days: int
78
+ benchmark_score: float
79
+
80
+
81
+ # ─── Task 0: Orientation Ward ────────────────────────────────────────────────
82
+
83
+ TASK0 = TaskConfig(
84
+ name="orientation_ward",
85
+ max_days=2,
86
+ actions_per_shift=5,
87
+ budget_limit=5_000.0,
88
+ initial_stock_days=1,
89
+ benchmark_score=0.88,
90
+ locations=[
91
+ Location("ward_general", "General Ward", capacity=None),
92
+ ],
93
+ products=[
94
+ Product("GLOVE-001", "Surgical Gloves (box)", None, "NORMAL", 5.0, ["ward_general"], 20, 3, 0.0, 0, 0.0),
95
+ Product("SYR-10", "Syringes 10ml", None, "NORMAL", 0.5, ["ward_general"], 30, 5, 0.0, 0, 0.0),
96
+ Product("MASK-001", "Surgical Masks", None, "NORMAL", 1.0, ["ward_general"], 25, 4, 0.0, 0, 0.0),
97
+ ],
98
+ suppliers=[
99
+ Supplier("MEDLINE", "MedLine Medical", base_lead_time=1, lead_time_std=0.0,
100
+ cost_multiplier=1.0,
101
+ products=["GLOVE-001", "SYR-10", "MASK-001"]),
102
+ ],
103
+ events=[],
104
+ )
105
+
106
+
107
+ # ─── Task 1: Single Ward Stable ──────────────────────────────────────────────
108
+
109
+ TASK1 = TaskConfig(
110
+ name="single_ward_stable",
111
+ max_days=3,
112
+ actions_per_shift=6,
113
+ budget_limit=20_000.0,
114
+ initial_stock_days=2,
115
+ benchmark_score=0.68,
116
+ locations=[
117
+ Location("ward_general", "General Ward", capacity=None),
118
+ ],
119
+ products=[
120
+ Product("GLOVE-001", "Surgical Gloves (box)", None, "NORMAL", 5.0, ["ward_general"], 50, 8, 0.0, 0, 0.0),
121
+ Product("IV-500", "IV Bags 500ml", 540, "NORMAL", 3.5, ["ward_general"], 30, 5, 0.0, 0, 0.0),
122
+ Product("SYR-10", "Syringes 10ml", None, "NORMAL", 0.5, ["ward_general"], 80, 12, 0.0, 0, 0.0),
123
+ Product("PARA-500", "Paracetamol 500mg", 360, "NORMAL", 0.1, ["ward_general"], 40, 6, 0.0, 0, 0.0),
124
+ Product("MASK-001", "Surgical Masks", None, "NORMAL", 1.0, ["ward_general"], 60, 10, 0.0, 0, 0.0),
125
+ Product("SAL-001", "Saline Solution", 720, "NORMAL", 2.5, ["ward_general"], 25, 4, 0.0, 0, 0.0),
126
+ ],
127
+ suppliers=[
128
+ Supplier("MEDLINE", "MedLine Medical", base_lead_time=2, lead_time_std=0.0,
129
+ cost_multiplier=1.0,
130
+ products=["GLOVE-001", "IV-500", "SYR-10", "PARA-500", "MASK-001", "SAL-001"]),
131
+ ],
132
+ events=[],
133
+ )
134
+
135
+
136
+ # ─── Task 2: Multi-Ward Seasonal ─────────────────────────────────────────────
137
+
138
+ TASK2 = TaskConfig(
139
+ name="multi_ward_seasonal",
140
+ max_days=6,
141
+ actions_per_shift=8,
142
+ budget_limit=50_000.0,
143
+ initial_stock_days=3,
144
+ benchmark_score=0.55,
145
+ locations=[
146
+ Location("central_pharmacy", "Central Pharmacy", capacity=15_000),
147
+ Location("ward_icu", "ICU", capacity=None),
148
+ Location("ward_emergency", "Emergency", capacity=None),
149
+ Location("ward_general", "General Medicine", capacity=None),
150
+ ],
151
+ products=[
152
+ Product("IV-500", "IV Bags 500ml", 540, "NORMAL", 3.5, ["central_pharmacy", "ward_icu"], 25, 4, 0.0, 0, 0.0),
153
+ Product("SAL-001", "Saline Solution", 720, "NORMAL", 2.5, ["central_pharmacy", "ward_icu", "ward_emergency"], 20, 3, 0.0, 0, 0.0),
154
+ Product("SYR-10", "Syringes 10ml", None, "NORMAL", 0.5, ["central_pharmacy", "ward_icu", "ward_emergency", "ward_general"], 70, 10, 0.0, 0, 0.0),
155
+ Product("ANTIVIR-01","Antiviral Medication", 365, "HIGH", 15.0, ["central_pharmacy", "ward_emergency"], 10, 2, 0.55, 365, 0.0),
156
+ Product("MASK-N95", "N95 Masks", None, "NORMAL", 2.5, ["central_pharmacy", "ward_emergency", "ward_icu"], 30, 5, 0.4, 365, 0.0),
157
+ Product("PARA-500", "Paracetamol 500mg", 360, "NORMAL", 0.1, ["central_pharmacy", "ward_general"], 40, 6, 0.3, 180, 0.0),
158
+ Product("GLOVE-001", "Surgical Gloves (box)", None, "NORMAL", 5.0, ["central_pharmacy", "ward_icu", "ward_emergency", "ward_general"], 50, 8, 0.0, 0, 0.0),
159
+ Product("SUTURE-01", "Suture Kit", None, "NORMAL", 12.0, ["central_pharmacy", "ward_general"], 8, 2, 0.0, 0, 0.0),
160
+ Product("DRAIN-01", "Surgical Drain", None, "NORMAL", 18.0, ["central_pharmacy", "ward_general"], 5, 1, 0.0, 0, 0.0),
161
+ Product("DRESS-01", "Wound Dressing Kit", None, "NORMAL", 6.0, ["central_pharmacy", "ward_general", "ward_emergency"], 20, 4, 0.0, 0, 0.0),
162
+ ],
163
+ suppliers=[
164
+ Supplier("FASTMED", "FastMed Express", base_lead_time=1, lead_time_std=0.0, cost_multiplier=1.4,
165
+ products=["IV-500", "SAL-001", "SYR-10", "ANTIVIR-01", "MASK-N95", "PARA-500", "GLOVE-001", "SUTURE-01", "DRAIN-01", "DRESS-01"]),
166
+ Supplier("MEDLINE", "MedLine Medical", base_lead_time=4, lead_time_std=0.0, cost_multiplier=1.0,
167
+ products=["IV-500", "SAL-001", "SYR-10", "ANTIVIR-01", "MASK-N95", "PARA-500", "GLOVE-001", "SUTURE-01", "DRAIN-01", "DRESS-01"]),
168
+ ],
169
+ events=[
170
+ SimEvent(
171
+ event_id="flu_surge",
172
+ event_type="demand_surge",
173
+ trigger_day=3,
174
+ duration_days=3,
175
+ params={"products": ["ANTIVIR-01", "MASK-N95", "PARA-500"], "multiplier": 1.5},
176
+ message=InboxMessageTemplate(
177
+ priority="HIGH",
178
+ sender="Regional Health Authority",
179
+ subject="Influenza Activity Alert — Demand Surge Active",
180
+ body=(
181
+ "Regional influenza activity has been elevated above seasonal baseline.\n"
182
+ "Emergency department visits up 40-55%. Surge in effect now.\n"
183
+ "Antiviral, N95, and analgesic stock levels require immediate review.\n"
184
+ "Prepare for increased ICU admissions over the next 3 days."
185
+ ),
186
+ ),
187
+ warning_message=InboxMessageTemplate(
188
+ priority="LOW",
189
+ sender="Regional Health Authority",
190
+ subject="Early Warning: Influenza Activity Rising",
191
+ body=(
192
+ "Early indicators suggest influenza activity is rising above seasonal norms.\n"
193
+ "Consider reviewing antiviral and PPE stock levels as a precaution."
194
+ ),
195
+ ),
196
+ ),
197
+ SimEvent(
198
+ event_id="medline_delay",
199
+ event_type="supplier_disruption",
200
+ trigger_day=4,
201
+ duration_days=3,
202
+ params={"supplier_id": "MEDLINE", "new_lead_time": 7, "reason": "industrial action"},
203
+ message=InboxMessageTemplate(
204
+ priority="HIGH",
205
+ sender="MedLine Medical — Supply Chain",
206
+ subject="Service Disruption Notice — Lead Time Extension",
207
+ body=(
208
+ "Due to ongoing industrial action at our primary warehouse facility,\n"
209
+ "MedLine Medical lead times are currently extended from 4 to 7 days.\n"
210
+ "This affects all standard orders. FastMed Express remains unaffected.\n"
211
+ "We apologise for the inconvenience. Reference: SUPDIS-2026-0006."
212
+ ),
213
+ ),
214
+ warning_message=None,
215
+ ),
216
+ ],
217
+ )
218
+
219
+
220
+ # ─── Task 3: Hospital Network Crisis ─────────────────────────────────────────
221
+
222
+ TASK3 = TaskConfig(
223
+ name="hospital_network_crisis",
224
+ max_days=12,
225
+ actions_per_shift=10,
226
+ budget_limit=150_000.0,
227
+ initial_stock_days=4,
228
+ benchmark_score=0.38,
229
+ locations=[
230
+ Location("regional_dc", "Regional Distribution Centre", capacity=None),
231
+ Location("hospital_a", "Hospital A", capacity=None),
232
+ Location("hospital_b", "Hospital B", capacity=None),
233
+ Location("hospital_c", "Hospital C", capacity=None),
234
+ ],
235
+ products=[
236
+ Product("B-001", "O-Negative Blood (RBC)", 42, "CRITICAL", 350.0, ["regional_dc", "hospital_a", "hospital_b", "hospital_c"], 3, 1, 0.0, 0, 0.0),
237
+ Product("B-002", "Platelet Pack", 5, "CRITICAL", 250.0, ["regional_dc", "hospital_a", "hospital_b", "hospital_c"], 2, 1, 0.0, 0, 0.0),
238
+ Product("FFP-001", "Fresh Frozen Plasma", 365, "HIGH", 120.0, ["regional_dc", "hospital_a", "hospital_b", "hospital_c"], 4, 1, 0.0, 0, 0.0),
239
+ Product("INS-001", "Insulin (opened vial)", 28, "HIGH", 45.0, ["hospital_a", "hospital_b", "hospital_c"], 8, 2, 0.0, 0, 0.0),
240
+ Product("CHEMO-01", "Chemotherapy Agent (recon.)", 7, "HIGH", 800.0, ["hospital_a", "hospital_b"], 1, 0, 0.0, 0, 0.0),
241
+ Product("IV-SAL-500","IV Saline Solution 500ml", 720, "NORMAL", 3.5, ["regional_dc", "hospital_a", "hospital_b", "hospital_c"], 60, 10, 0.0, 0, 0.0),
242
+ Product("IV-500", "IV Bags 500ml", 540, "NORMAL", 3.5, ["regional_dc", "hospital_a", "hospital_b", "hospital_c"], 40, 7, 0.0, 0, 0.0),
243
+ Product("SYR-10", "Syringes 10ml", None, "NORMAL", 0.5, ["regional_dc", "hospital_a", "hospital_b", "hospital_c"], 100,15, 0.0, 0, 0.0),
244
+ Product("GLOVE-001", "Surgical Gloves (box)", None, "NORMAL", 5.0, ["regional_dc", "hospital_a", "hospital_b", "hospital_c"], 40, 6, 0.0, 0, 0.0),
245
+ Product("PARA-500", "Paracetamol 500mg", 360, "NORMAL", 0.1, ["hospital_a", "hospital_b", "hospital_c"], 60, 8, 0.0, 0, 0.0),
246
+ Product("B-003", "AB-Positive Blood (RBC)", 42, "HIGH", 320.0, ["regional_dc", "hospital_a", "hospital_b", "hospital_c"], 2, 1, 0.0, 0, 0.0),
247
+ Product("MASK-001", "Surgical Masks", None, "NORMAL", 1.0, ["regional_dc", "hospital_a", "hospital_b", "hospital_c"], 80, 12, 0.0, 0, 0.0),
248
+ Product("DRAIN-01", "Surgical Drain", None, "NORMAL", 18.0, ["hospital_a", "hospital_b", "hospital_c"], 4, 1, 0.0, 0, 0.0),
249
+ Product("SAL-001", "Saline Solution", 720, "NORMAL", 2.5, ["regional_dc", "hospital_a", "hospital_b", "hospital_c"], 30, 5, 0.0, 0, 0.0),
250
+ Product("DRESS-01", "Wound Dressing Kit", None, "NORMAL", 6.0, ["hospital_a", "hospital_b", "hospital_c"], 25, 4, 0.0, 0, 0.0),
251
+ ],
252
+ suppliers=[
253
+ Supplier("BLOODBANK-A", "Regional Blood Bank", base_lead_time=1, lead_time_std=0.5, cost_multiplier=1.0,
254
+ products=["B-001", "B-002", "FFP-001", "B-003"]),
255
+ Supplier("SUPPLIER-A", "HealthCo Supplies", base_lead_time=3, lead_time_std=0.5, cost_multiplier=1.0,
256
+ products=["IV-SAL-500", "IV-500", "SYR-10", "GLOVE-001", "PARA-500", "MASK-001", "DRAIN-01", "SAL-001", "DRESS-01"]),
257
+ Supplier("SUPPLIER-B", "MedFast Express", base_lead_time=2, lead_time_std=0.0, cost_multiplier=1.4,
258
+ products=["IV-SAL-500", "IV-500", "SYR-10", "GLOVE-001", "PARA-500", "MASK-001", "DRAIN-01", "SAL-001", "DRESS-01"]),
259
+ Supplier("PHARMA-X", "PharmaCorp", base_lead_time=3, lead_time_std=0.5, cost_multiplier=1.0,
260
+ products=["INS-001", "CHEMO-01", "PARA-500"]),
261
+ ],
262
+ events=[
263
+ SimEvent(
264
+ event_id="cold_chain_breach",
265
+ event_type="cold_chain_breach",
266
+ trigger_day=3,
267
+ duration_days=0,
268
+ params={
269
+ "location_id": "regional_dc",
270
+ "product_id": "B-002",
271
+ "qty_affected": "all",
272
+ },
273
+ message=InboxMessageTemplate(
274
+ priority="CRITICAL",
275
+ sender="Regional DC Facilities Management",
276
+ subject="URGENT: Cold Chain Breach — Platelet Inventory Compromised",
277
+ body=(
278
+ "Cold chain monitoring detected temperature excursion in Fridge Unit 3.\n"
279
+ "Recorded +8°C for 4 hours (safe range: +20 to +24°C).\n"
280
+ "ALL platelet units at Regional DC are presumed compromised and auto-quarantined.\n"
281
+ "Affected product: Platelet Pack (B-002) — ALL lots at regional_dc.\n"
282
+ "Action required:\n"
283
+ " 1. Arrange emergency replacement order from BLOODBANK-A\n"
284
+ " 2. Notify clinical teams — platelet supply now limited to hospital stock only.\n"
285
+ "Ref: CCBR-2026-0003"
286
+ ),
287
+ ),
288
+ warning_message=None,
289
+ ),
290
+ SimEvent(
291
+ event_id="supplier_a_disruption",
292
+ event_type="supplier_disruption",
293
+ trigger_day=6,
294
+ duration_days=9,
295
+ params={"supplier_id": "SUPPLIER-A", "new_lead_time": 7, "reason": "force majeure — flu absenteeism"},
296
+ message=InboxMessageTemplate(
297
+ priority="HIGH",
298
+ sender="HealthCo Supplies — Logistics",
299
+ subject="Force Majeure Notice — Extended Lead Times",
300
+ body=(
301
+ "HealthCo Supplies hereby provides formal notice of force majeure event.\n"
302
+ "Widespread workforce absenteeism due to influenza has impacted fulfilment operations.\n"
303
+ "Effective immediately, standard lead times are extended from 3 to 7 days.\n"
304
+ "All product lines affected. MedFast Express (SUPPLIER-B) is unaffected.\n"
305
+ "Ref: FM-2026-0006"
306
+ ),
307
+ ),
308
+ warning_message=None,
309
+ ),
310
+ SimEvent(
311
+ event_id="mci_warning",
312
+ event_type="demand_surge",
313
+ trigger_day=8,
314
+ duration_days=0,
315
+ params={},
316
+ message=InboxMessageTemplate(
317
+ priority="HIGH",
318
+ sender="Emergency Management Coordination",
319
+ subject="Mass Casualty Incident — STANDBY",
320
+ body=(
321
+ "Multi-vehicle collision reported on Interstate highway.\n"
322
+ "Current status: STANDBY. Incident Command not yet activated.\n"
323
+ "Preliminary estimate: 15-25 critically injured.\n"
324
+ "All blood banks placed on AMBER alert.\n"
325
+ "Recommend pre-emptive review of O-neg and platelet stock levels NOW.\n"
326
+ "Further update to follow."
327
+ ),
328
+ ),
329
+ warning_message=None,
330
+ ),
331
+ SimEvent(
332
+ event_id="mci_activation",
333
+ event_type="mci",
334
+ trigger_day=9,
335
+ duration_days=3,
336
+ params={
337
+ "products": ["B-001", "B-002", "FFP-001"],
338
+ "demand_multiplier": 3.0,
339
+ "locations": ["hospital_a", "hospital_b", "hospital_c"],
340
+ "mci_tracking": True,
341
+ },
342
+ message=InboxMessageTemplate(
343
+ priority="CRITICAL",
344
+ sender="Incident Command System",
345
+ subject="MCI ACTIVATION — Mass Casualty Event",
346
+ body=(
347
+ "INCIDENT COMMAND ACTIVATED.\n"
348
+ "Multi-vehicle collision — confirmed 23 critically injured en route.\n"
349
+ "Hospital A ETA: 18 min | Hospital B ETA: 29 min | Hospital C ETA: 41 min.\n"
350
+ "Trauma surgery teams mobilised at all hospitals.\n"
351
+ "All blood banks placed on RED alert.\n"
352
+ "Est. blood product requirements: 60-90 units O-neg RBC, 30-40 platelet packs.\n"
353
+ "IMMEDIATE ACTION: Verify blood product inventory and initiate emergency procurement."
354
+ ),
355
+ ),
356
+ warning_message=None,
357
+ ),
358
+ SimEvent(
359
+ event_id="iv_saline_recall",
360
+ event_type="product_recall",
361
+ trigger_day=11,
362
+ duration_days=0,
363
+ params={
364
+ "product_id": "IV-SAL-500",
365
+ "lot_id": "RECALL-LOT-IV2026-9821",
366
+ "locations_with_lot": ["hospital_a", "hospital_b", "hospital_c", "regional_dc"],
367
+ "qty_per_location": 60,
368
+ },
369
+ message=InboxMessageTemplate(
370
+ priority="CRITICAL",
371
+ sender="Pharmacy Automated System",
372
+ subject="MANDATORY RECALL — IV Saline Solution 500ml",
373
+ body=(
374
+ "MANDATORY RECALL — Health Authority ref #HA-2026-0013\n"
375
+ "Product: IV Saline Solution 500ml (IV-SAL-500)\n"
376
+ "Affected lot: RECALL-LOT-IV2026-9821\n"
377
+ "Reason: Potential endotoxin contamination detected in batch.\n"
378
+ "ACTION REQUIRED:\n"
379
+ " 1. Query inventory at ALL locations for lot RECALL-LOT-IV2026-9821\n"
380
+ " 2. Quarantine ALL affected units immediately — do not use\n"
381
+ " 3. Submit replacement order\n"
382
+ "Supplier contact: SUPPLIER-A case #88291. Ref: RECALL-2026-0013."
383
+ ),
384
+ ),
385
+ warning_message=None,
386
+ ),
387
+ ],
388
+ )
389
+
390
+
391
+ def get_task_config(task_name: str) -> TaskConfig:
392
+ if task_name == "orientation_ward":
393
+ return TASK0
394
+ elif task_name == "single_ward_stable":
395
+ return TASK1
396
+ elif task_name == "multi_ward_seasonal":
397
+ return TASK2
398
+ elif task_name == "hospital_network_crisis":
399
+ return TASK3
400
+ else:
401
+ raise ValueError(
402
+ f"Unknown task: {task_name!r}. "
403
+ "Choose from: orientation_ward, single_ward_stable, multi_ward_seasonal, hospital_network_crisis"
404
+ )
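Each `Product` row above feeds the engine's `_generate_demand`, which builds demand in three stages:

- A sinusoidal seasonal factor scales `base_demand` when both `seasonal_amplitude` and `seasonal_period` are positive.
- Active `demand_surge` or `mci` events multiply the result.
- Gaussian noise with standard deviation `demand_std` is added, and the value is rounded and clamped at zero.

The sketch below reproduces only the noise-free expected value. It uses a hypothetical `expected_demand` helper, and a single `event_multiplier` argument stands in for the engine's event loop. By `_update_active_events`, `flu_surge` (`trigger_day=3`, `duration_days=3`) is active on days 3 to 5.

```python
import math


def expected_demand(
    base_demand: float,
    seasonal_amplitude: float,
    seasonal_period: int,
    seasonal_phase: float,
    day: int,
    event_multiplier: float = 1.0,
) -> float:
    """Noise-free demand for one product at one location on `day`.

    Follows the deterministic part of _generate_demand; the engine then adds
    rng.normal(0, demand_std), rounds, and clamps the result at zero.
    """
    demand = base_demand
    # Seasonality only applies when both amplitude and period are positive.
    if seasonal_amplitude > 0 and seasonal_period > 0:
        demand *= 1 + seasonal_amplitude * math.sin(
            2 * math.pi * day / seasonal_period + seasonal_phase
        )
    # Combined effect of all active surge / MCI multipliers for this product.
    return demand * event_multiplier


# ANTIVIR-01 in multi_ward_seasonal (base 10, amplitude 0.55, period 365,
# phase 0) on day 3, with the 1.5x flu_surge multiplier applied.
antivir_day3 = expected_demand(10, 0.55, 365, 0.0, day=3, event_multiplier=1.5)
```

The surge multiplier comes from the event's `params["multiplier"]` (1.5 for `flu_surge`). For `mci` events, the engine instead applies `demand_multiplier` (3.0 in hospital_network_crisis) to CRITICAL and HIGH products at the listed hospitals.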
test.py ADDED
@@ -0,0 +1,152 @@
1
+ """
2
+ Non-LLM integration test for medchain_env: replays predefined tool calls against the Docker environment and logs to test_inference.log.
3
+ """
4
+
5
+ import asyncio
6
+ import logging
7
+ import sys
8
+ from pathlib import Path
9
+
10
+ # Add project root to sys.path so we can import medchain_env
11
+ sys.path.insert(0, str(Path(__file__).parent))
12
+
13
+ from medchain_env import CallToolAction, MedchainEnv
14
+
15
+ _log_fmt = logging.Formatter("%(asctime)s [%(levelname)s] %(message)s", datefmt="%H:%M:%S")
16
+
17
+ class DualWriter:
18
+ def __init__(self, filepath):
19
+ self.terminal = sys.stdout
20
+ self.log = open(filepath, "w")
21
+
22
+ def write(self, message):
23
+ self.terminal.write(message)
24
+ self.log.write(message)
25
+ self.log.flush()
26
+
27
+ def flush(self):
28
+ self.terminal.flush()
29
+ self.log.flush()
30
+
31
+ sys.stdout = DualWriter("test_inference.log")
32
+
33
+ _stream_handler = logging.StreamHandler(sys.stdout)
34
+ _stream_handler.setFormatter(_log_fmt)
35
+ logging.basicConfig(level=logging.INFO, handlers=[_stream_handler])
36
+ log = logging.getLogger(__name__)
37
+
38
+ async def run_test(task_name: str, actions_to_take: list):
39
+ log.info("Starting task: %s", task_name)
40
+
41
+ # Hard-coded image tag; it must match your locally built image (inference.py reads the tag from LOCAL_IMAGE_NAME instead)
42
+ env = await MedchainEnv.from_docker_image(
43
+ "medchain_env-env:latest",
44
+ env_vars={"MEDCHAIN_TASK": task_name},
45
+ )
46
+
47
+ try:
48
+ log.info("[%s] Docker env started", task_name)
49
+ mcp_tools = await env.list_tools()
50
+ tool_names = [t.name for t in mcp_tools]
51
+ log.info("Available tools: %s", tool_names)
52
+
53
+ obs = await env.reset()
54
+ obs = obs.observation
55
+ dashboard = obs.metadata.get("dashboard", "")
56
+ log.info("[%s] env.reset() complete. done=%s metadata_keys=%s",
57
+ task_name, obs.done, list(obs.metadata.keys()))
58
+
59
+ print(f"\n{'=' * 60}")
60
+ print(f"TASK: {task_name}")
61
+ print(f"{'=' * 60}")
62
+ print(dashboard[:500])
63
+
64
+ step_count = 0
65
+ final_reward = 0.0
66
+ done = obs.done
67
+
68
+ for act_dict in actions_to_take:
69
+ if done:
70
+ log.info("[%s] Episode already done before taking action %s", task_name, act_dict["tool_name"])
71
+ break
72
+
73
+ step_count += 1
74
+ tool_name = act_dict["tool_name"]
75
+ tool_args = act_dict["arguments"]
76
+
77
+ print(f"\n{'─' * 60}")
78
+ print(f"[{task_name}] Step {step_count} — predefined action")
79
+ print(f"{'─' * 60}")
80
+
81
+ print(f"\n[{task_name}] Step {step_count} — SIMULATED AGENT RESPONSE:")
82
+ print(f" TOOL CALL: {tool_name}({tool_args})")
83
+
84
+ log.info("[%s] Step %d - calling tool: %s(%s)", task_name, step_count, tool_name, tool_args)
85
+ action = CallToolAction(tool_name=tool_name, arguments=tool_args)
86
+ step_result = await env.step(action)
87
+ obs = step_result.observation
88
+ done = obs.done
89
+
90
+ result_text = obs.metadata.get("tool_result", str(obs.metadata))
91
+
92
+ if "EPISODE COMPLETE" in (result_text or ""):
93
+ log.info("[%s] Step %d - 'EPISODE COMPLETE' detected in result text; marking done", task_name, step_count)
94
+ done = True
95
+
96
+ print(f"\n[{task_name}] Step {step_count} — SERVER RESPONSE (tool_result):")
97
+ print(f" {(result_text or 'EMPTY')[:500]}")
98
+ log.info("[%s] Step %d - env.step() returned. done=%s reward=%s result_preview=%r",
99
+ task_name, step_count, done, obs.reward, (result_text or "")[:120])
100
+
101
+ if done and obs.reward is not None:
102
+ final_reward = obs.reward  # terminal score, not a mid-episode shaping reward
103
+ print(f" Reward: {obs.reward:.4f} | Done: {done}")
104
+
105
+ # Brief pause between steps, echoing the inference script's inter-step delay, so the server is not overwhelmed
106
+ await asyncio.sleep(0.1)
107
+
108
+ log.info("[%s] Episode finished. steps=%d done=%s final_reward=%.4f", task_name, step_count, done, final_reward)
109
+ print(f" Final reward: {final_reward:.4f} | Steps: {step_count} | Done: {done}")
110
+
111
+ finally:
112
+ await env.close()
113
+
114
+ async def main():
115
+ # Intro task: orientation_ward (2 days, explore tools, place 2 orders)
116
+ intro_actions = [
117
+ {"tool_name": "read_inbox", "arguments": {"filter": "all"}},
118
+ {"tool_name": "query_erp", "arguments": {"table": "inventory"}},
119
+ {"tool_name": "submit_po", "arguments": {"supplier_id": "MEDLINE", "product_id": "GLOVE-001", "destination_id": "ward_general", "quantity": 40}},
120
+ {"tool_name": "submit_po", "arguments": {"supplier_id": "MEDLINE", "product_id": "SYR-10", "destination_id": "ward_general", "quantity": 60}},
121
+ {"tool_name": "end_shift", "arguments": {}},
122
+ {"tool_name": "end_shift", "arguments": {}},
123
+ ]
124
+ await run_test("orientation_ward", intro_actions)
125
+
126
+ # Easy task: single_ward_stable (3 days, 6 products, no events)
127
+ easy_actions = [
128
+ {"tool_name": "read_inbox", "arguments": {"filter": "unread"}},
129
+ {"tool_name": "query_erp", "arguments": {"table": "inventory"}},
130
+ {"tool_name": "submit_po", "arguments": {"supplier_id": "MEDLINE", "product_id": "IV-500", "destination_id": "ward_general", "quantity": 100}},
131
+ {"tool_name": "end_shift", "arguments": {}},
132
+ {"tool_name": "end_shift", "arguments": {}},
133
+ {"tool_name": "end_shift", "arguments": {}},
134
+ {"tool_name": "end_shift", "arguments": {}},
135
+ ]
136
+ await run_test("single_ward_stable", easy_actions)
137
+
138
+ # Medium task: multi_ward_seasonal (6 days, flu surge + supplier delay)
139
+ medium_actions = [
140
+ {"tool_name": "read_inbox", "arguments": {"filter": "unread"}},
141
+ {"tool_name": "query_erp", "arguments": {"table": "inventory"}},
142
+ {"tool_name": "transfer", "arguments": {"from_location_id": "central_pharmacy", "to_location_id": "ward_icu", "product_id": "IV-500", "quantity": 50}},
143
+ {"tool_name": "submit_po", "arguments": {"supplier_id": "MEDLINE", "product_id": "IV-500", "destination_id": "central_pharmacy", "quantity": 200}},
144
+ ]
145
+ # Add enough end_shift actions to finish the 6-day episode
146
+ for _ in range(7):
147
+ medium_actions.append({"tool_name": "end_shift", "arguments": {}})
148
+
149
+ await run_test("multi_ward_seasonal", medium_actions)
150
+
151
+ if __name__ == "__main__":
152
+ asyncio.run(main())
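The scripts in `main()` pad each action list with `end_shift` calls: 2 for the 2-day task, 4 for the 3-day task and 7 for the 6-day task. A hypothetical `with_end_shifts` helper could derive that count from `max_days` instead of hard-coding it.

The helper rests on two assumptions. The first shift is day 1, and each `end_shift` advances one day, matching the engine's `done = state.day > state.max_days` check. Spare calls are harmless because `run_test` breaks out of its loop once `done` is set.

```python
from typing import Any, Dict, List

Action = Dict[str, Any]


def with_end_shifts(prefix: List[Action], max_days: int, spare: int = 1) -> List[Action]:
    """Return `prefix` followed by enough end_shift actions to finish the episode.

    Assumes day 1 is the first shift and each end_shift advances one day, so
    `max_days` calls reach day max_days + 1, where the engine reports done.
    `spare` adds slack calls; run_test stops early once done is set.
    """
    # Build a fresh dict per call so no two actions share an arguments object.
    tail = [{"tool_name": "end_shift", "arguments": {}} for _ in range(max_days + spare)]
    return list(prefix) + tail


# Shortened version of the multi_ward_seasonal script (6-day task).
medium_actions = with_end_shifts(
    [
        {"tool_name": "read_inbox", "arguments": {"filter": "unread"}},
        {"tool_name": "query_erp", "arguments": {"table": "inventory"}},
    ],
    max_days=6,
)
```

With `spare=0` the helper reproduces the orientation_ward script's two end-of-shift calls.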
uv.lock ADDED
The diff for this file is too large to render.
 
validate-submission.sh ADDED
@@ -0,0 +1,185 @@
+ #!/usr/bin/env bash
+ #
+ # validate-submission.sh — OpenEnv Submission Validator
+ #
+ # Checks that your HF Space is live, Docker image builds, and openenv validate passes.
+ #
+ # Prerequisites:
+ #   - Docker:       https://docs.docker.com/get-docker/
+ #   - openenv-core: pip install openenv-core
+ #   - curl (usually pre-installed)
+ #
+ # Run:
+ #   curl -fsSL https://raw.githubusercontent.com/<owner>/<repo>/main/scripts/validate-submission.sh | bash -s -- <ping_url> [repo_dir]
+ #
+ #   Or download and run locally:
+ #     chmod +x validate-submission.sh
+ #     ./validate-submission.sh <ping_url> [repo_dir]
+ #
+ # Arguments:
+ #   ping_url   Your HuggingFace Space URL (e.g. https://your-space.hf.space)
+ #   repo_dir   Path to your repo (default: current directory)
+ #
+ # Examples:
+ #   ./validate-submission.sh https://my-team.hf.space
+ #   ./validate-submission.sh https://my-team.hf.space ./my-repo
+ #
+
+ set -uo pipefail
+
+ DOCKER_BUILD_TIMEOUT=600
+ if [ -t 1 ]; then
+   RED='\033[0;31m'
+   GREEN='\033[0;32m'
+   YELLOW='\033[1;33m'
+   BOLD='\033[1m'
+   NC='\033[0m'
+ else
+   RED='' GREEN='' YELLOW='' BOLD='' NC=''
+ fi
+
+ run_with_timeout() {
+   local secs="$1"; shift
+   if command -v timeout &>/dev/null; then
+     timeout "$secs" "$@"
+   elif command -v gtimeout &>/dev/null; then
+     gtimeout "$secs" "$@"
+   else
+     "$@" &
+     local pid=$!
+     ( sleep "$secs" && kill "$pid" 2>/dev/null ) >/dev/null 2>&1 &  # detached so an orphaned sleep cannot hold a caller's $(...) pipe open
+     local watcher=$!
+     wait "$pid" 2>/dev/null
+     local rc=$?
+     kill "$watcher" 2>/dev/null
+     wait "$watcher" 2>/dev/null
+     return $rc
+   fi
+ }
+
+ portable_mktemp() {
+   local prefix="${1:-validate}"
+   mktemp "${TMPDIR:-/tmp}/${prefix}-XXXXXX" 2>/dev/null || mktemp
+ }
+
+ CLEANUP_FILES=()
+ cleanup() { rm -f "${CLEANUP_FILES[@]+"${CLEANUP_FILES[@]}"}"; }
+ trap cleanup EXIT
+
+ PING_URL="${1:-}"
+ REPO_DIR="${2:-.}"
+
+ if [ -z "$PING_URL" ]; then
+   printf "Usage: %s <ping_url> [repo_dir]\n" "$0"
+   printf "\n"
+   printf "  ping_url   Your HuggingFace Space URL (e.g. https://your-space.hf.space)\n"
+   printf "  repo_dir   Path to your repo (default: current directory)\n"
+   exit 1
+ fi
+
+ if ! REPO_DIR="$(cd "$REPO_DIR" 2>/dev/null && pwd)"; then
+   printf "Error: directory '%s' not found\n" "${2:-.}"
+   exit 1
+ fi
+ PING_URL="${PING_URL%/}"
+ export PING_URL
+ PASS=0
+
+ log()  { printf "[%s] %b\n" "$(date -u +%H:%M:%S)" "$*"; }
+ pass() { log "${GREEN}PASSED${NC} -- $1"; PASS=$((PASS + 1)); }
+ fail() { log "${RED}FAILED${NC} -- $1"; }
+ hint() { printf "  ${YELLOW}Hint:${NC} %b\n" "$1"; }
+ stop_at() {
+   printf "\n"
+   printf "${RED}${BOLD}Validation stopped at %s.${NC} Fix the above before continuing.\n" "$1"
+   exit 1
+ }
+
+ printf "\n"
+ printf "${BOLD}========================================${NC}\n"
+ printf "${BOLD}  OpenEnv Submission Validator${NC}\n"
+ printf "${BOLD}========================================${NC}\n"
+ log "Repo:     $REPO_DIR"
+ log "Ping URL: $PING_URL"
+ printf "\n"
+
+ log "${BOLD}Step 1/3: Pinging HF Space${NC} ($PING_URL/reset) ..."
+
+ CURL_OUTPUT=$(portable_mktemp "validate-curl")
+ CLEANUP_FILES+=("$CURL_OUTPUT")
+ HTTP_CODE=$(curl -s -o "$CURL_OUTPUT" -w "%{http_code}" -X POST \
+   -H "Content-Type: application/json" -d '{}' \
+   "$PING_URL/reset" --max-time 30 2>"$CURL_OUTPUT") || HTTP_CODE="000"
+
+ if [ "$HTTP_CODE" = "200" ]; then
+   pass "HF Space is live and responds to /reset"
+ elif [ "$HTTP_CODE" = "000" ]; then
+   fail "HF Space not reachable (connection failed or timed out)"
+   hint "Check your network connection and that the Space is running."
+   hint "Try: curl -s -o /dev/null -w '%{http_code}' -X POST $PING_URL/reset"
+   stop_at "Step 1"
+ else
+   fail "HF Space /reset returned HTTP $HTTP_CODE (expected 200)"
+   hint "Make sure your Space is running and the URL is correct."
+   hint "Try opening $PING_URL in your browser first."
+   stop_at "Step 1"
+ fi
+
+ log "${BOLD}Step 2/3: Running docker build${NC} ..."
+
+ if ! command -v docker &>/dev/null; then
+   fail "docker command not found"
+   hint "Install Docker: https://docs.docker.com/get-docker/"
+   stop_at "Step 2"
+ fi
+
+ if [ -f "$REPO_DIR/Dockerfile" ]; then
+   DOCKER_CONTEXT="$REPO_DIR"
+ elif [ -f "$REPO_DIR/server/Dockerfile" ]; then
+   DOCKER_CONTEXT="$REPO_DIR/server"
+ else
+   fail "No Dockerfile found in repo root or server/ directory"
+   stop_at "Step 2"
+ fi
+
+ log "  Found Dockerfile in $DOCKER_CONTEXT"
+
+ BUILD_OK=false
+ BUILD_OUTPUT=$(run_with_timeout "$DOCKER_BUILD_TIMEOUT" docker build "$DOCKER_CONTEXT" 2>&1) && BUILD_OK=true
+
+ if [ "$BUILD_OK" = true ]; then
+   pass "Docker build succeeded"
+ else
+   fail "Docker build failed (timeout=${DOCKER_BUILD_TIMEOUT}s)"
+   printf "%s\n" "$BUILD_OUTPUT" | tail -20
+   stop_at "Step 2"
+ fi
+
+ log "${BOLD}Step 3/3: Running openenv validate${NC} ..."
+
+ if ! command -v openenv &>/dev/null; then
+   fail "openenv command not found"
+   hint "Install it: pip install openenv-core"
+   stop_at "Step 3"
+ fi
+
+ VALIDATE_OK=false
+ VALIDATE_OUTPUT=$(cd "$REPO_DIR" && openenv validate 2>&1) && VALIDATE_OK=true
+
+ if [ "$VALIDATE_OK" = true ]; then
+   pass "openenv validate passed"
+   [ -n "$VALIDATE_OUTPUT" ] && log "  $VALIDATE_OUTPUT"
+ else
+   fail "openenv validate failed"
+   printf "%s\n" "$VALIDATE_OUTPUT"
+   stop_at "Step 3"
+ fi
+
+ printf "\n"
+ printf "${BOLD}========================================${NC}\n"
+ printf "${GREEN}${BOLD}  All 3/3 checks passed!${NC}\n"
+ printf "${GREEN}${BOLD}  Your submission is ready to submit.${NC}\n"
+ printf "${BOLD}========================================${NC}\n"
+ printf "\n"
+
+ exit 0
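Step 1 of the validator comes down to one HTTP request. It POSTs an empty JSON object to `<ping_url>/reset` and treats any status other than 200 as a failure, with `000` meaning no response at all. The Python below is a rough equivalent that could check a Space from a notebook or CI job without the shell script. It is a sketch only, and `ping_reset` is a hypothetical name. One difference from the `curl` call: `urllib` follows some redirects that `curl` without `-L` would report as-is.

```python
import json
import urllib.error
import urllib.request


def ping_reset(ping_url: str, timeout: float = 30.0) -> str:
    """POST {} to <ping_url>/reset and return the HTTP status as a string.

    "000" mirrors the shell script: no HTTP response was received
    (connection refused, DNS failure, or timeout).
    """
    url = ping_url.rstrip("/") + "/reset"  # drop trailing slashes, then append /reset
    req = urllib.request.Request(
        url,
        data=json.dumps({}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return str(resp.status)
    except urllib.error.HTTPError as err:
        # Non-2xx responses are raised as HTTPError but still carry a status code.
        return str(err.code)
    except (urllib.error.URLError, OSError):
        # URLError covers refused connections and DNS errors; socket timeouts are OSError.
        return "000"
```

The `HTTPError` clause must come before the broader clause, because `HTTPError` is a subclass of `URLError`, which is itself a subclass of `OSError`.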
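The shell helper `run_with_timeout` has three branches because GNU `timeout` is not always installed. On macOS, Homebrew's coreutils installs it as `gtimeout`. In Python the same limit is a single `timeout=` argument to `subprocess.run`, which kills the child once the deadline passes. The sketch below returns 124 on timeout to match the exit status of GNU `timeout`. Its name mirrors the shell helper, but it is otherwise hypothetical and not part of the submission.

```python
import subprocess


def run_with_timeout(secs: float, cmd: list[str]) -> tuple[int, str]:
    """Run cmd and return (exit_code, combined stdout+stderr).

    On timeout the child is killed and 124 is returned, the same exit
    status GNU `timeout` uses.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge streams, like `2>&1` in the script
            text=True,
            timeout=secs,
        )
    except subprocess.TimeoutExpired as exc:
        # Partial output captured before the kill may arrive as bytes even in text mode.
        partial = exc.output or ""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return 124, partial
    return proc.returncode, proc.stdout
```

Unlike the background-watcher fallback in the script, this version never leaves a stray `sleep` process behind, because `subprocess.run` handles the deadline inside the calling process.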