Spaces: Majen/recalltrace-openenv


Majen committed 20 days ago

Commit 4d13031 (verified) · 1 Parent(s): 5eb405c

Initial submission


Files changed (24)

Dockerfile +16 -0
README.md +143 -32
baseline/__init__.py +1 -0
baseline/policy.py +100 -0
config/openenv.yaml +48 -156
docker/Dockerfile +16 -0
grader/__init__.py +1 -0
grader/grader.py +57 -220
inference.py +82 -0
inference/inference.py +9 -171
inference/policy.py +100 -0
openenv.yaml +48 -156
pyproject.toml +23 -35
requirements.txt +5 -2
scenario/__init__.py +1 -0
scenario/scenario.py +363 -189
server.py +5 -0
server/__init__.py +1 -0
server/app.py +152 -31
server/static/app.js +222 -0
server/static/index.html +149 -0
server/static/styles.css +499 -0
tests/test_env.py +72 -0
uv.toml +3 -0

Dockerfile ADDED

	@@ -0,0 +1,16 @@

+FROM python:3.12-slim
+WORKDIR /app
+ENV PYTHONDONTWRITEBYTECODE=1 \
+    PYTHONUNBUFFERED=1 \
+    PORT=7860
+COPY requirements.txt ./
+RUN pip install --no-cache-dir -r requirements.txt
+COPY . .
+EXPOSE 7860
+CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "7860"]

README.md CHANGED

@@ -1,32 +1,143 @@
----
-title: RecallTrace OpenEnv
-emoji: 🔍
-colorFrom: blue
-colorTo: green
-sdk: gradio
-sdk_version: "5.29.1"
-app_file: app.py
-pinned: false
----
-# RecallTrace OpenEnv
-A fully offline OpenEnv environment simulating **product recall traceability and containment** across a supply-chain network.
-**Team:** Shamanth MS · P G Ayush Rai · Shreya B J
-## What it does
-An AI agent must:
-- Trace contaminated lots through a shipment graph
-- Quarantine affected inventory precisely
-- Notify relevant nodes
-- Avoid disrupting safe stock
-## Tasks
-- **Easy** — Direct Recall (single contaminated lot)
-- **Medium** — Relabeled Inventory (lot repacked mid-chain)
-- **Hard** — Mixed Shipments (safe + unsafe co-shipped)
-## Scoring
-Final score: **0.0 → 1.0** based on quarantine accuracy + notification coverage.

+---
+title: RecallTrace OpenEnv
+emoji: 🚨
+colorFrom: red
+colorTo: blue
+sdk: docker
+app_port: 7860
+---
+# 🚀 RecallTrace OpenEnv
+RecallTrace is a **real-world AI environment** designed for **product recall tracing and precision containment**.
+It simulates how companies handle:
+- contaminated product recalls
+- supply chain tracing
+- selective quarantine decisions
+This environment evaluates **agent reasoning + decision-making**, not just correctness.
+---
+# 🧠 What This Environment Does
+Given a recall notice (e.g., *"Lot A is contaminated"*), the agent must:
+1. Trace where the product went
+2. Identify affected nodes (warehouses, stores)
+3. Handle relabeling / transformations
+4. Quarantine **only unsafe inventory**
+5. Avoid blocking safe stock
+6. Notify affected entities
+7. Finalize with correct containment
+---
+# 🎯 Why This Is Important
+This is a **real industry problem** seen in:
+- food recalls
+- pharma defects
+- logistics failures
+Challenges include:
+- Graph traversal
+- Partial observability
+- Lot transformations
+- Mixed inventory reasoning
+- Precision decision-making
+---
+# 🧩 Tasks (Scenarios)
+## 🔹 Easy — Direct Recall
+- Single contaminated lot
+- Straight supply chain
+- Goal: trace and quarantine correctly
+---
+## 🔹 Medium — Relabeled Inventory
+- Lot gets renamed (LotA → LotA1)
+- Goal: track transformations and quarantine
+---
+## 🔹 Hard — Mixed Inventory
+- Contaminated + safe stock mixed
+- Goal: isolate unsafe quantity **without over-blocking**
+---
+# ⚙️ Action Space
+| Action | Description |
+|------|------------|
+| inspect_node | View inventory at a node |
+| trace_lot | Follow product lineage |
+| quarantine | Block unsafe stock |
+| notify | Inform affected nodes |
+| finalize | End task |
+---
+# 📦 Observation Structure
+Each step returns:
+- recall_notice
+- inventory
+- action history
+- trace results
+- inspection data
+---
+# 🏆 Reward & Grading
+### Reward System
+- + Correct tracing
+- + Correct quarantine
+- + Correct notification
+- − Wrong node
+- − Over-quarantine
+- − Missed unsafe stock
+---
+### Final Score
+Range: **0.0 → 1.0**
+Based on:
+- accuracy
+- precision
+- efficiency
+---
+# 🧱 Project Structure
+```bash
+recalltrace-openenv/
+│
+├── env/                # Environment logic
+│   ├── env.py
+│   └── __init__.py
+│
+├── scenario/           # Scenario generation
+│   └── scenario.py
+│
+├── grader/             # Evaluation + reward
+│   └── grader.py
+│
+├── inference/          # Agent simulation
+│   └── inference.py
+│
+├── config/
+│   └── openenv.yaml
+│
+├── Dockerfile
+├── requirements.txt
+├── README.md

baseline/__init__.py ADDED

	@@ -0,0 +1 @@


+"""Baseline agent helpers for RecallTrace."""

baseline/policy.py ADDED

	@@ -0,0 +1,100 @@

+"""Heuristic baseline policy for RecallTrace."""
+from __future__ import annotations
+import json
+import re
+from typing import Any, Dict, Optional
+from openai import OpenAI
+from env.models import RecallAction, RecallObservation
+LOT_PATTERN = re.compile(r"\bLot[A-Za-z0-9_]+\b")
+def _extract_root_lot(observation: RecallObservation) -> str:
+    match = LOT_PATTERN.search(observation.recall_notice)
+    return match.group(0) if match else "LotA"
+def choose_heuristic_action(observation: RecallObservation) -> RecallAction:
+    """Choose the next deterministic action using only observable state."""
+    root_lot = _extract_root_lot(observation)
+    trace_result = observation.trace_results.get(root_lot)
+    if trace_result is None:
+        return RecallAction(type="trace_lot", lot_id=root_lot, rationale="Map the recall lineage first.")
+    affected_nodes = trace_result.get("affected_nodes", [])
+    for node_id in affected_nodes:
+        if node_id not in observation.inspected_nodes:
+            return RecallAction(type="inspect_node", node_id=node_id, rationale="Collect local evidence before quarantining.")
+    for node_id, findings in observation.inspection_results.items():
+        for lot_id, finding in findings.items():
+            unsafe_quantity = finding.unsafe_quantity
+            quarantined_quantity = observation.quarantined_inventory.get(node_id, {}).get(lot_id, 0)
+            available_quantity = observation.inventory.get(node_id, {}).get(lot_id, 0)
+            remaining_target = unsafe_quantity - quarantined_quantity
+            if remaining_target > 0 and available_quantity > 0:
+                return RecallAction(
+                    type="quarantine",
+                    node_id=node_id,
+                    lot_id=lot_id,
+                    quantity=min(remaining_target, available_quantity),
+                    rationale="Isolate the exact unsafe quantity discovered during inspection.",
+                )
+    missing_notifications = [node_id for node_id in affected_nodes if node_id not in observation.notified_nodes]
+    if missing_notifications:
+        return RecallAction(type="notify", node_id="all", rationale="Alert every impacted stakeholder before closing the incident.")
+    return RecallAction(type="finalize", rationale="Containment actions are complete.")
+def choose_llm_action(
+    client: Optional[OpenAI],
+    model_name: str,
+    observation: RecallObservation,
+    history: list[dict[str, Any]],
+) -> Optional[RecallAction]:
+    """Ask an LLM for the next action, returning None on failure."""
+    if client is None:
+        return None
+    prompt = {
+        "task_id": observation.task_id,
+        "phase": observation.phase,
+        "notice": observation.recall_notice,
+        "inventory": observation.inventory,
+        "inspection_results": {
+            node_id: {lot_id: evidence.model_dump() for lot_id, evidence in findings.items()}
+            for node_id, findings in observation.inspection_results.items()
+        },
+        "trace_results": observation.trace_results,
+        "notified_nodes": observation.notified_nodes,
+        "quarantined_inventory": observation.quarantined_inventory,
+        "steps_taken": observation.steps_taken,
+        "remaining_step_budget": observation.remaining_step_budget,
+        "history": history[-6:],
+        "instruction": "Return only compact JSON with keys type,node_id,lot_id,quantity,rationale. Use one valid action.",
+    }
+    try:
+        completion = client.chat.completions.create(
+            model=model_name,
+            temperature=0,
+            max_tokens=180,
+            messages=[
+                {"role": "system", "content": "You are operating a deterministic product recall environment. Respond with only valid JSON for the next action."},
+                {"role": "user", "content": json.dumps(prompt, sort_keys=True)},
+            ],
+        )
+        text = (completion.choices[0].message.content or "").strip()
+        if not text:
+            return None
+        return RecallAction.model_validate_json(text)
+    except Exception:
+        return None
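
The heuristic policy above depends on `LOT_PATTERN` to find the recalled lot in the notice text, falling back to `"LotA"` when nothing matches. The sketch below restates that lookup on plain strings to show how the pattern behaves on a few inputs. The function name `extract_root_lot` and the sample notices are illustrative assumptions; the real helper takes a `RecallObservation`, and the actual notice wording used by the scenarios is not visible in this part of the diff.

```python
import re

# Same pattern as in baseline/policy.py.
LOT_PATTERN = re.compile(r"\bLot[A-Za-z0-9_]+\b")


def extract_root_lot(notice: str, default: str = "LotA") -> str:
    """Return the first lot-like token in the notice, or the default."""
    match = LOT_PATTERN.search(notice)
    return match.group(0) if match else default


# A compact lot identifier is picked up as expected.
compact = extract_root_lot("Recall issued: LotA1 is contaminated.")

# "Lot A" (with a space) does not match, so the default is returned.
spaced = extract_root_lot("Lot A is contaminated")

# Any word that starts with "Lot" matches first, including ordinary words.
false_positive = extract_root_lot("Lots shipped last week include LotB")

print(compact, spaced, false_positive)
```

If notices can contain ordinary words beginning with "Lot", or identifiers written with a space, a stricter pattern or a structured field on the observation would be a safer source for the root lot.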

config/openenv.yaml CHANGED

@@ -1,156 +1,48 @@
-# RecallTrace OpenEnv — OpenEnv Spec (Task 5B — P G Ayush Rai)
-# Defines action space, observation models, tasks, and scoring contract.
-environment:
-  name: RecallTraceEnv
-  version: "1.0.0"
-  description: >
-    A fully offline, deterministic OpenEnv environment simulating product recall
-    traceability and containment across a supply-chain network.
-    An AI agent must identify contaminated lots, trace their movement through a
-    shipment graph, quarantine affected inventory precisely, and notify relevant nodes.
-# ── Action Space ─────────────────────────────────────────────────────────────
-actions:
-  - name: inspect_node
-    params:
-      node_id: str
-    description: >
-      Reveal the full inventory and outbound shipment edges of a node.
-      Adds the node to the discovered subgraph.
-    reward_hint: small positive for new nodes; 0 for repeat inspections
-  - name: trace_lot
-    params:
-      lot_id: str
-    description: >
-      Trace a lot across all nodes in the network — reveals which nodes hold it
-      and in what quantities (available + quarantined combined).
-    reward_hint: positive if lot is contaminated; small positive otherwise
-  - name: quarantine
-    params:
-      node_id: str
-      lot_id: str
-      quantity: int       # units to quarantine; defaults to full available stock
-    description: >
-      Move a specified quantity of a lot from active inventory to quarantine
-      at the given node. Excess quarantine (over correct qty) is penalised.
-    reward_hint: +0.4 correct exact; +0.2 partial; -0.3 wrong lot; -0.15 over-qty
-  - name: notify
-    params:
-      node_id: str        # or "all" to notify every node at once
-    description: >
-      Send a recall alert to a node or all nodes.
-      Rewarded only for affected nodes; penalised for unnecessary notifications.
-    reward_hint: +0.1 per correctly notified affected node; -0.05 for unneeded
-  - name: finalize
-    params: {}
-    description: >
-      Submit the containment plan. Triggers final scoring. Episode ends.
-    reward_hint: returns final_score in [0.0, 1.0]
-# ── Observation Space ─────────────────────────────────────────────────────────
-observation:
-  recall_notice:
-    type: str
-    description: Human-readable contamination alert issued at episode start
-  inventory:
-    type: dict
-    description: >
-      Full inventory snapshot across all nodes.
-      { node_id: { lot_id: quantity } }
-  discovered_shipments:
-    type: dict
-    description: >
-      Outbound shipment edges revealed so far (only for inspected nodes).
-      { node_id: [downstream_node_id, ...] }
-  history:
-    type: list[str]
-    description: Ordered log of all actions taken this episode
-  inspected_nodes:
-    type: list[str]
-    description: Sorted list of nodes that have been inspected
-  notified_nodes:
-    type: list[str]
-    description: Sorted list of nodes that have been sent recall alerts
-  quarantined_inventory:
-    type: dict
-    description: >
-      Inventory currently in quarantine (non-empty nodes only).
-      { node_id: { lot_id: quantity } }
-# ── Tasks ─────────────────────────────────────────────────────────────────────
-tasks:
-  - id: easy
-    name: "Task 1 — Direct Recall"
-    assign: "Shreya B J"
-    description: >
-      Single contaminated lot (LotA) distributed across a linear
-      warehouse → store1 → store2 chain. No relabeling.
-    nodes: [warehouse, store1, store2]
-    contaminated_lots: [LotA]
-  - id: medium
-    name: "Task 2 — Relabeled Inventory"
-    assign: "Shreya B J"
-    description: >
-      LotA is contaminated; it was repacked and relabeled as LotA1
-      at the distribution centre. Agent must trace the transformation.
-    nodes: [warehouse, dist_centre, store_north, store_south]
-    contaminated_lots: [LotA, LotA1]
-  - id: hard
-    name: "Task 3 — Mixed Shipments"
-    assign: "Shreya B J"
-    description: >
-      Two contaminated lots (LotX, LotY) co-shipped with safe stock
-      (LotB, LotC) across a hub-and-spoke network. Precise quarantine required.
-    nodes: [plant_a, plant_b, hub, retail_east, retail_west, retail_central]
-    contaminated_lots: [LotX, LotY]
-# ── Scoring ───────────────────────────────────────────────────────────────────
-scoring:
-  range: [0.0, 1.0]
-  formula: "(quarantine_score + notification_score) / 2  −  unnecessary_penalty"
-  components:
-    quarantine_score:
-      weight: 0.5
-      description: >
-        1 − ((missing_qty + over_qty) / total_affected_qty).
-        Full marks for exact quarantine of all affected lots.
-    notification_score:
-      weight: 0.5
-      description: >
-        fraction of affected nodes that were notified.
-    unnecessary_penalty:
-      max: 0.15
-      description: >
-        −0.05 per unnecessary quarantine (safe stock), capped at 0.15.
-# ── OpenEnv Compliance ────────────────────────────────────────────────────────
-compliance:
-  implements_reset: true
-  implements_step:  true
-  implements_state: true
-  deterministic:    true
-  typed_models:     true
-  offline:          true
-  reproducible:     true
-# ── Project Team ──────────────────────────────────────────────────────────────
-team:
-  - name: "Shamanth MS"
-    tasks: [env_core, action_handler, ground_truth_system, connect_components, submission]
-  - name: "P G Ayush Rai"
-    tasks: [openenv_spec, docker_setup, openenv_validation, deploy_hf_spaces]
-  - name: "Shreya B J"
-    tasks: [scenario_expansion, grader_system, reward_function]

+name: RecallTraceEnv
+version: 1.0.0
+description: Deterministic OpenEnv environment for supply-chain product recall tracing and precision containment.
+entrypoint:
+  module: env.env
+  class: RecallTraceEnv
+server:
+  module: server
+  app: app
+models:
+  action: env.models.RecallAction
+  observation: env.models.RecallObservation
+  reward: env.models.RewardSignal
+tasks:
+  - id: phase1_direct_recall
+    difficulty: easy
+    objective: Identify every location holding the recalled lot and quarantine all contaminated stock.
+  - id: phase2_relabel_recall
+    difficulty: medium
+    objective: Follow relabeled lots back to the source batch and quarantine every derived label precisely.
+  - id: phase3_mixed_shipments
+    difficulty: hard
+    objective: Contain only the unsafe quantity after contaminated stock was mixed with safe inventory during cross-docking.
+interfaces:
+  methods:
+    - reset
+    - step
+    - state
+  actions:
+    - inspect_node
+    - trace_lot
+    - quarantine
+    - notify
+    - finalize
+observation_fields:
+  - task_id
+  - phase
+  - recall_notice
+  - inventory
+  - discovered_shipments
+  - inspected_nodes
+  - inspection_results
+  - trace_results
+  - notified_nodes
+  - quarantined_inventory
+  - history
+  - steps_taken
+  - remaining_step_budget

docker/Dockerfile ADDED

	@@ -0,0 +1,16 @@

+FROM python:3.12-slim
+WORKDIR /app
+ENV PYTHONDONTWRITEBYTECODE=1 \
+    PYTHONUNBUFFERED=1 \
+    PORT=7860
+COPY requirements.txt ./
+RUN pip install --no-cache-dir -r requirements.txt
+COPY . .
+EXPOSE 7860
+CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "7860"]

grader/__init__.py CHANGED

	@@ -0,0 +1 @@


+"""Grader package for RecallTrace."""

grader/grader.py CHANGED

@@ -1,220 +1,57 @@
-"""
-RecallTrace — Grader + Reward Function
-Merged from: Shreya B J (sbj) grader logic + Shamanth MS (sham) env state format
-Task 6C  — Grader System  : Compare agent quarantines vs ground truth
-Task 7   — Reward Function: Partial rewards, penalties, terminal bonus
-Two usage modes:
-  1. Live mode  — called by env._handle_finalize() using env's internal state
-  2. Batch mode — called directly with agent_output dict (Shreya's inference style)
-"""
-from __future__ import annotations
-from typing import Any, Dict, List, Tuple
-# ─────────────────────────────────────────────────────────────────────────────
-# Ground Truth Builder (Shreya's logic, adapted for Sham's scenario format)
-# ─────────────────────────────────────────────────────────────────────────────
-def compute_ground_truth(scenario: Dict[str, Any]) -> Dict[str, Any]:
-    """
-    Build the hidden oracle from a scenario dict.
-    Handles:
-      - Direct contamination  (easy)
-      - Relabeled lots        (medium) via scenario['transformations']
-      - Multiple cont. lots   (hard)   via lot_catalog['contaminated'] flags
-    """
-    lot_catalog: Dict[str, Dict] = scenario.get("lot_catalog", {})
-    transformations: Dict[str, str] = scenario.get("transformations", {})
-    nodes: Dict[str, Dict] = scenario.get("nodes", {})
-    # Collect all contaminated lot IDs (including relabeled variants)
-    contaminated_lots = set()
-    for lot_id, meta in lot_catalog.items():
-        if meta.get("contaminated", False):
-            contaminated_lots.add(lot_id)
-    # Also resolve via transformations (e.g. LotA → LotA1)
-    primary = scenario.get("contaminated_lot", "")
-    if primary:
-        contaminated_lots.add(primary)
-        relabeled = transformations.get(primary)
-        if relabeled:
-            contaminated_lots.add(relabeled)
-    # For each node, collect the correct quantities to quarantine
-    affected_nodes: List[str] = []
-    correct_quantities: Dict[str, Dict[str, int]] = {}
-    for node_id, node_data in nodes.items():
-        inventory = node_data.get("inventory", {})
-        q_inv     = node_data.get("quarantined_inventory", {})
-        for lot_id in contaminated_lots:
-            available  = inventory.get(lot_id, 0)
-            quarantined = q_inv.get(lot_id, 0)
-            total = available + quarantined
-            if total > 0:
-                if node_id not in affected_nodes:
-                    affected_nodes.append(node_id)
-                correct_quantities.setdefault(node_id, {})[lot_id] = total
-    total_affected_qty = sum(
-        q for node_q in correct_quantities.values() for q in node_q.values()
-    )
-    return {
-        "contaminated_lots":     sorted(contaminated_lots),
-        "affected_lots":         sorted(contaminated_lots),   # alias used by Sham's env
-        "affected_nodes":        sorted(affected_nodes),
-        "correct_quantities":    correct_quantities,
-        "total_affected_quantity": total_affected_qty,
-    }
-# ─────────────────────────────────────────────────────────────────────────────
-# Batch Grader (Shreya's style — for standalone agent_output dicts)
-# ─────────────────────────────────────────────────────────────────────────────
-def grade(agent_output: Dict[str, Any], scenario: Dict[str, Any]) -> float:
-    """
-    Score an agent_output dict against ground truth.
-    agent_output = {"quarantine": [{"node": ..., "lot": ..., "qty": ...}, ...]}
-    Returns score in [0.0, 1.0].
-    """
-    ground_truth = compute_ground_truth(scenario)
-    correct_quantities = ground_truth["correct_quantities"]
-    total_targets = len(correct_quantities)
-    if total_targets == 0:
-        return 1.0
-    score = 0.0
-    for action in agent_output.get("quarantine", []):
-        node = action.get("node") or action.get("node_id")
-        lot  = action.get("lot")  or action.get("lot_id")
-        qty  = action.get("qty")  or action.get("quantity", 0)
-        if node in correct_quantities and lot in correct_quantities[node]:
-            score += 0.5                                           # correct lot/node
-            if qty == correct_quantities[node][lot]:
-                score += 0.5                                       # exact quantity
-    return round(min(score / total_targets, 1.0), 4)
-def compute_reward(agent_output: Dict[str, Any], scenario: Dict[str, Any]) -> float:
-    """
-    Compute a shaped reward for a batch agent_output.
-    Returns a float (can exceed 1.0 — raw reward, not normalised).
-    """
-    ground_truth = compute_ground_truth(scenario)
-    correct_quantities = ground_truth["correct_quantities"]
-    contaminated_lots  = set(ground_truth["contaminated_lots"])
-    reward = 0.0
-    for action in agent_output.get("quarantine", []):
-        node = action.get("node") or action.get("node_id")
-        lot  = action.get("lot")  or action.get("lot_id")
-        qty  = action.get("qty")  or action.get("quantity", 0)
-        if node in correct_quantities and lot in correct_quantities[node]:
-            reward += 10                                           # correct lot at correct node
-            if qty == correct_quantities[node][lot]:
-                reward += 10                                       # exact quantity bonus
-            else:
-                reward -= 3                                        # partial quantity penalty
-        elif lot not in contaminated_lots:
-            reward -= 5                                            # unnecessary quarantine
-        else:
-            reward -= 5                                            # wrong node
-    # Completion bonus
-    if grade(agent_output, scenario) >= 1.0:
-        reward += 20
-    return reward
-# ─────────────────────────────────────────────────────────────────────────────
-# Live Grader (integrated with Sham's env — called from env._handle_finalize)
-# ─────────────────────────────────────────────────────────────────────────────
-class LiveGrader:
-    """
-    Used inside RecallTraceEnv to compute the final score from env state.
-    Mirrors Sham's finalize logic but uses the unified ground truth builder.
-    """
-    def __init__(self, scenario: Dict[str, Any]):
-        self.ground_truth = compute_ground_truth(scenario)
-    def score(
-        self,
-        nodes: Dict[str, Dict],
-        notified_nodes: set,
-    ) -> Tuple[float, Dict[str, Any]]:
-        """
-        Compute final score from the env's live node state.
-        Returns (score_0_to_1, breakdown_dict).
-        """
-        correct_quantities = self.ground_truth["correct_quantities"]
-        affected_nodes_set = set(self.ground_truth["affected_nodes"])
-        contaminated_lots  = set(self.ground_truth["contaminated_lots"])
-        # ── Quarantine match ────────────────────────────────────────────
-        missing: Dict[str, Dict[str, int]] = {}
-        over:    Dict[str, Dict[str, int]] = {}
-        unnecessary: int = 0
-        for node_id, node_data in nodes.items():
-            q_inv    = node_data.get("quarantined_inventory", {})
-            expected = correct_quantities.get(node_id, {})
-            all_lots = set(expected) | set(q_inv)
-            for lot_id in all_lots:
-                exp_qty = expected.get(lot_id, 0)
-                act_qty = q_inv.get(lot_id, 0)
-                if lot_id not in contaminated_lots and act_qty > 0:
-                    unnecessary += 1          # quarantined safe stock
-                elif act_qty < exp_qty:
-                    missing.setdefault(node_id, {})[lot_id] = exp_qty - act_qty
-                elif act_qty > exp_qty:
-                    over.setdefault(node_id, {})[lot_id] = act_qty - exp_qty
-        total_affected_qty = self.ground_truth["total_affected_quantity"]
-        missing_total = sum(q for d in missing.values() for q in d.values())
-        over_total    = sum(q for d in over.values()    for q in d.values())
-        quarantine_score = (
-            max(0.0, 1.0 - ((missing_total + over_total) / total_affected_qty))
-            if total_affected_qty else 1.0
-        )
-        # ── Notification score ──────────────────────────────────────────
-        correctly_notified = len(notified_nodes & affected_nodes_set)
-        notification_score = (
-            correctly_notified / len(affected_nodes_set) if affected_nodes_set else 1.0
-        )
-        # ── Penalty for unnecessary quarantines ─────────────────────────
-        penalty = min(unnecessary * 0.05, 0.15)
-        # ── Final score ─────────────────────────────────────────────────
-        raw = (quarantine_score + notification_score) / 2.0 - penalty
-        final_score = round(max(0.0, min(1.0, raw)), 4)
-        breakdown = {
-            "final_score":             final_score,
-            "quarantine_score":        round(quarantine_score, 4),
-            "notification_score":      round(notification_score, 4),
-            "unnecessary_penalty":     round(-penalty, 4),
-            "missing_quantities":      missing,
-            "over_quarantined":        over,
-            "unnecessary_quarantines": unnecessary,
-            "correctly_notified":      correctly_notified,
-            "all_affected_notified":   notification_score == 1.0,
-            "all_stock_quarantined":   missing_total == 0 and over_total == 0,
-        }
-        return final_score, breakdown

+"""Deterministic graders for RecallTrace tasks."""
+from __future__ import annotations
+from typing import Iterable, List
+from env.env import RecallTraceEnv
+from env.models import RecallAction, TaskGrade
+def evaluate_action_plan(task_id: str, actions: Iterable[RecallAction | dict]) -> TaskGrade:
+    """Run an action plan against a task and return a deterministic score."""
+    env = RecallTraceEnv(task_id=task_id)
+    env.reset()
+    rewards: List[float] = []
+    final_info = {"message": "Episode never finalized."}
+    for action in actions:
+        _, reward, done, info = env.step(action)
+        rewards.append(reward)
+        final_info = info
+        if done:
+            break
+    if not env.done:
+        _, reward, done, info = env.step(RecallAction(type="finalize"))
+        rewards.append(reward)
+        final_info = info
+        assert done
+    score = float(final_info.get("score", 0.0))
+    state = env.state()
+    return TaskGrade(
+        task_id=task_id,
+        score=score,
+        success=score >= 0.9,
+        steps_taken=state.steps_taken,
+        max_steps=state.task.max_steps,
+        reward_total=round(sum(rewards), 4),
+        final_info=final_info,
+    )
+def grade_finalize_info(task_id: str, steps_taken: int, final_info: dict) -> TaskGrade:
+    """Build a TaskGrade object from a finalized episode payload."""
+    env = RecallTraceEnv(task_id=task_id)
+    env.reset()
+    return TaskGrade(
+        task_id=task_id,
+        score=float(final_info.get("score", 0.0)),
+        success=float(final_info.get("score", 0.0)) >= 0.9,
+        steps_taken=steps_taken,
+        max_steps=env.task.max_steps,
+        reward_total=float(final_info.get("score", 0.0)),
+        final_info=final_info,
+    )
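
For comparison with the env-driven score that the new grader reads from `final_info`, the formula removed from `LiveGrader.score` can be restated as a standalone function. This is a sketch reconstructed from the deleted lines only. The name `legacy_final_score` and its flat numeric arguments are illustrative, and the scoring now performed inside `env.env` is not shown in this diff and may differ.

```python
def legacy_final_score(
    missing_qty: int,
    over_qty: int,
    total_affected_qty: int,
    correctly_notified: int,
    affected_node_count: int,
    unnecessary_quarantines: int,
) -> float:
    """Restate the removed formula:
    (quarantine_score + notification_score) / 2 - unnecessary_penalty,
    clamped to [0.0, 1.0] and rounded to 4 decimals.
    """
    quarantine_score = (
        max(0.0, 1.0 - (missing_qty + over_qty) / total_affected_qty)
        if total_affected_qty
        else 1.0
    )
    notification_score = (
        correctly_notified / affected_node_count if affected_node_count else 1.0
    )
    # 0.05 per quarantine of safe stock, capped at 0.15.
    penalty = min(unnecessary_quarantines * 0.05, 0.15)
    raw = (quarantine_score + notification_score) / 2.0 - penalty
    return round(max(0.0, min(1.0, raw)), 4)


# Perfect containment: nothing missing, nothing extra, every node notified.
perfect = legacy_final_score(0, 0, 100, 3, 3, 0)

# 20 of 100 units missed, half the nodes notified, one safe lot quarantined.
partial = legacy_final_score(20, 0, 100, 2, 4, 1)

print(perfect, partial)
```

Under the new `success=score >= 0.9` threshold, the second case would not count as a success.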

inference.py ADDED

	@@ -0,0 +1,82 @@

+"""Submission-grade baseline inference runner for RecallTrace."""
+from __future__ import annotations
+import json
+import os
+from typing import Any, List
+from openai import OpenAI
+from env.env import RecallTraceEnv
+from env.models import RecallAction
+from grader.grader import grade_finalize_info
+from baseline.policy import choose_heuristic_action, choose_llm_action
+API_BASE_URL = os.getenv("API_BASE_URL", "https://api.openai.com/v1")
+MODEL_NAME = os.getenv("MODEL_NAME", "gpt-4o-mini")
+API_KEY = os.getenv("OPENAI_API_KEY") or os.getenv("HF_TOKEN", "")
+BENCHMARK = "RecallTrace"
+def log_start(task: str, env: str, model: str) -> None:
+    print(f"[START] task={task} env={env} model={model}", flush=True)
+def log_step(step: int, action: RecallAction, reward: float, done: bool, error: str | None) -> None:
+    payload = json.dumps(action.model_dump(exclude_none=True), sort_keys=True)
+    error_text = error if error is not None else "null"
+    print(f"[STEP] step={step} action={payload} reward={reward:.4f} done={str(done).lower()} error={error_text}", flush=True)
+def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
+    print(f"[END] success={str(success).lower()} steps={steps} score={score:.4f} rewards={json.dumps([round(r, 4) for r in rewards])}", flush=True)
+def run_task(task_id: str, client: OpenAI | None) -> float:
+    env = RecallTraceEnv(task_id=task_id)
+    observation = env.reset()
+    history: List[dict[str, Any]] = []
+    rewards: List[float] = []
+    steps_taken = 0
+    final_info: dict[str, Any] = {"score": 0.0}
+    log_start(task=task_id, env=BENCHMARK, model=MODEL_NAME if client else "heuristic-baseline")
+    for step in range(1, env.task.max_steps + 1):
+        llm_action = choose_llm_action(client, MODEL_NAME, observation, history)
+        action = llm_action or choose_heuristic_action(observation)
+        observation, reward, done, info = env.step(action)
+        rewards.append(reward)
+        steps_taken = step
+        final_info = info
+        log_step(step=step, action=action, reward=reward, done=done, error=info.get("error"))
+        history.append(
+            {
+                "step": step,
+                "action": action.model_dump(exclude_none=True),
+                "reward": reward,
+                "done": done,
+                "message": info.get("message"),
+            }
+        )
+        if done:
+            break
+    grade = grade_finalize_info(task_id, steps_taken, final_info)
+    log_end(success=grade.success, steps=steps_taken, score=grade.score, rewards=rewards)
+    return grade.score
+def main() -> None:
+    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY) if API_KEY else None
+    task_scores = [run_task(task.task_id, client) for task in RecallTraceEnv.available_tasks()]
+    average_score = sum(task_scores) / len(task_scores)
+    print(json.dumps({"benchmark": BENCHMARK, "average_score": round(average_score, 4), "task_scores": task_scores}), flush=True)
+if __name__ == "__main__":
+    main()
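
The `[STEP]` lines written by `log_step` follow a fixed layout, so a run log can be read back into structured records. The parser below is a minimal sketch that assumes exactly the format string shown in `log_step`. `STEP_RE` and `parse_step_line` are hypothetical names that do not exist in the repository, and the sample line is constructed for illustration rather than taken from a real run.

```python
import json
import re
from typing import Any, Dict, Optional

# Mirrors: [STEP] step=N action={json} reward=0.0000 done=true|false error=...
STEP_RE = re.compile(
    r"^\[STEP\] step=(\d+) action=(\{.*\}) "
    r"reward=(-?\d+\.\d{4}) done=(true|false) error=(.*)$"
)


def parse_step_line(line: str) -> Optional[Dict[str, Any]]:
    """Parse one [STEP] log line; return None if the line does not match."""
    match = STEP_RE.match(line.strip())
    if match is None:
        return None
    step, action, reward, done, error = match.groups()
    return {
        "step": int(step),
        "action": json.loads(action),
        "reward": float(reward),
        "done": done == "true",
        # log_step writes the literal text "null" when there is no error.
        "error": None if error == "null" else error,
    }


sample = (
    '[STEP] step=2 action={"node_id": "store1", "type": "inspect_node"} '
    "reward=0.0500 done=false error=null"
)
record = parse_step_line(sample)
print(record)
```

Because the action payload is free-form JSON that may contain spaces, the pattern anchors on the ` reward=` field that follows it instead of splitting the line on whitespace.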

inference/inference.py CHANGED

@@ -1,171 +1,9 @@
-"""
-RecallTrace — Inference Script  (Phase 3 / Task 9)
-Merged from: Shamanth MS (env) + Shreya B J (scenario/grader) + P G Ayush Rai (integration)
-Structured log format:
-  [START]  — episode initialisation
-  [STEP]   — each action + reward
-  [END]    — final score + breakdown
-Runs a rule-based baseline agent across all three difficulty levels.
-"""
-from __future__ import annotations
-import sys
-from pathlib import Path
-ROOT_DIR = Path(__file__).resolve().parents[1]
-if str(ROOT_DIR) not in sys.path:
-    sys.path.insert(0, str(ROOT_DIR))
-from recall_env.env import RecallTraceEnv
-from scenario.scenario import get_scenario, list_levels
-from grader.grader import grade, compute_reward
-# ─────────────────────────────────────────────────────────────────────────────
-# Baseline rule-based agent  (uses the scenario's ground truth directly)
-# ─────────────────────────────────────────────────────────────────────────────
-def build_action_sequence(env: RecallTraceEnv) -> list:
-    """
-    Build a perfect action sequence using the scenario's known structure.
-    In a real eval, the agent would not have access to ground truth —
-    this baseline is used for smoke testing only.
-    """
-    scenario = env._scenario_template
-    nodes = list(scenario["nodes"].keys())
-    lot_catalog = scenario.get("lot_catalog", {})
-    transformations = scenario.get("transformations", {})
-    contaminated_lot = scenario.get("contaminated_lot", "")
-    # Contaminated lots = all flagged in catalog + relabeled variants
-    contaminated_lots = {
-        lot for lot, meta in lot_catalog.items() if meta.get("contaminated", False)
-    }
-    contaminated_lots.add(contaminated_lot)
-    if contaminated_lot in transformations:
-        contaminated_lots.add(transformations[contaminated_lot])
-    actions = []
-    # 1. Inspect all nodes
-    for node_id in nodes:
-        actions.append({"type": "inspect_node", "node_id": node_id})
-    # 2. Trace all contaminated lots
-    for lot_id in sorted(contaminated_lots):
-        actions.append({"type": "trace_lot", "lot_id": lot_id})
-    # 3. Quarantine all contaminated inventory (exact quantities)
-    for node_id, node_data in scenario["nodes"].items():
-        for lot_id, qty in node_data["inventory"].items():
-            if lot_id in contaminated_lots and qty > 0:
-                actions.append({
-                    "type": "quarantine",
-                    "node_id": node_id,
-                    "lot_id": lot_id,
-                    "quantity": qty,
-                })
-    # 4. Notify all affected nodes
-    notified = set()
-    for node_id, node_data in scenario["nodes"].items():
-        if any(lot_id in contaminated_lots for lot_id in node_data["inventory"]):
-            if node_id not in notified:
-                actions.append({"type": "notify", "node_id": node_id})
-                notified.add(node_id)
-    # 5. Finalize
-    actions.append({"type": "finalize"})
-    return actions
-# ─────────────────────────────────────────────────────────────────────────────
-# Episode runner
-# ─────────────────────────────────────────────────────────────────────────────
-def run_episode(level: str) -> float:
-    DIVIDER = "=" * 62
-    print(f"\n{DIVIDER}")
-    print(f"[START] level={level.upper()}")
-    env = RecallTraceEnv(level=level)
-    obs = env.reset()
-    print(f"[START] scenario_id  : {env._scenario_template['scenario_id']}")
-    print(f"[START] recall_notice: {obs['recall_notice']}")
-    print(f"[START] nodes        : {obs['inspected_nodes']} (none inspected yet)")
-    print(DIVIDER)
-    actions = build_action_sequence(env)
-    # Re-initialise so env is clean (build_action_sequence used _scenario_template only)
-    env2 = RecallTraceEnv(level=level)
-    obs = env2.reset()
-    final_info = {}
-    for action in actions:
-        obs, reward, done, info = env2.step(action)
-        err = f"  ⚠  {info['error']}" if info.get("error") else ""
-        print(f"[STEP]  action={action}  reward={reward}  done={done}{err}")
-        if done:
-            final_info = info
-            break
-    score = final_info.get("score", 0.0)
-    breakdown = final_info.get("breakdown", {})
-    print(f"\n{DIVIDER}")
-    print(f"[END]  final_score         : {score}")
-    print(f"[END]  quarantine_score    : {breakdown.get('quarantine_score', '-')}")
-    print(f"[END]  notification_score  : {breakdown.get('notification_score', '-')}")
-    print(f"[END]  all_stock_quarantined : {breakdown.get('all_stock_quarantined', '-')}")
-    print(f"[END]  all_affected_notified : {breakdown.get('all_affected_notified', '-')}")
-    print(DIVIDER)
-    # ── Shreya's batch grader validation (independent cross-check) ──────
-    # Read quarantined_inventory from the LIVE env state (post-episode)
-    scenario = env2._scenario_template
-    live_nodes = env2.state_data["nodes"]
-    agent_output = {
-        "quarantine": [
-            {"node": node_id, "lot": lot_id, "qty": qty}
-            for node_id, node_data in live_nodes.items()
-            for lot_id, qty in node_data.get("quarantined_inventory", {}).items()
-            if qty > 0
-        ]
-    }
-    batch_score  = grade(agent_output, scenario)
-    batch_reward = compute_reward(agent_output, scenario)
-    print(f"[XCHECK] batch_grade={batch_score}  batch_reward={batch_reward}")
-    return score
-# ─────────────────────────────────────────────────────────────────────────────
-# Main
-# ─────────────────────────────────────────────────────────────────────────────
-def main():
-    print("\nRecallTrace OpenEnv — Local Smoke Test (Phase 3 / Task 10)")
-    print("Team: Shamanth MS | P G Ayush Rai | Shreya B J")
-    results = {}
-    for level in list_levels():
-        results[level] = run_episode(level)
-    print("\n" + "=" * 62)
-    print("FINAL SUMMARY")
-    print("-" * 62)
-    for level, score in results.items():
-        status = "✓ PASS" if score >= 0.8 else "✗ FAIL"
-        print(f"  {level:8s} → score={score:.4f}  {status}")
-    print("=" * 62)
-if __name__ == "__main__":
-    main()

+from pathlib import Path
+import runpy
+import sys
+if __name__ == "__main__":
+    root = Path(__file__).resolve().parents[1]
+    sys.path.insert(0, str(root))
+    runpy.run_path(str(root / "inference.py"), run_name="__main__")
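
The shim above hands control to the top-level `inference.py` through `runpy.run_path`. The following self-contained sketch shows the same mechanism on a throwaway script written to a temporary directory. The file name `target.py` and the variable `RESULT` are illustrative assumptions, not names from the repository.

```python
import runpy
import tempfile
from pathlib import Path


def run_as_main(script: Path) -> dict:
    """Execute a script as if it were launched directly and return its globals."""
    # run_name="__main__" makes the script's `if __name__ == "__main__":` block fire.
    return runpy.run_path(str(script), run_name="__main__")


def demo() -> int:
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "target.py"  # hypothetical stand-in for inference.py
        script.write_text(
            "RESULT = 0\n"
            "if __name__ == '__main__':\n"
            "    RESULT = 42\n"
        )
        return run_as_main(script)["RESULT"]


if __name__ == "__main__":
    print(demo())
```

`run_path` returns the executed module's globals, so a wrapper could inspect results afterwards; the shim in the commit simply discards them.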

inference/policy.py ADDED Viewed

@@ -0,0 +1,100 @@
+"""Heuristic baseline policy for RecallTrace."""
+from __future__ import annotations
+import json
+import re
+from typing import Any, Dict, Optional
+from openai import OpenAI
+from env.models import RecallAction, RecallObservation
+LOT_PATTERN = re.compile(r"\bLot[A-Za-z0-9_]+\b")
+def _extract_root_lot(observation: RecallObservation) -> str:
+    match = LOT_PATTERN.search(observation.recall_notice)
+    return match.group(0) if match else "LotA"
+def choose_heuristic_action(observation: RecallObservation) -> RecallAction:
+    """Choose the next deterministic action using only observable state."""
+    root_lot = _extract_root_lot(observation)
+    trace_result = observation.trace_results.get(root_lot)
+    if trace_result is None:
+        return RecallAction(type="trace_lot", lot_id=root_lot, rationale="Map the recall lineage first.")
+    affected_nodes = trace_result.get("affected_nodes", [])
+    for node_id in affected_nodes:
+        if node_id not in observation.inspected_nodes:
+            return RecallAction(type="inspect_node", node_id=node_id, rationale="Collect local evidence before quarantining.")
+    for node_id, findings in observation.inspection_results.items():
+        for lot_id, finding in findings.items():
+            unsafe_quantity = finding.unsafe_quantity
+            quarantined_quantity = observation.quarantined_inventory.get(node_id, {}).get(lot_id, 0)
+            available_quantity = observation.inventory.get(node_id, {}).get(lot_id, 0)
+            remaining_target = unsafe_quantity - quarantined_quantity
+            if remaining_target > 0 and available_quantity > 0:
+                return RecallAction(
+                    type="quarantine",
+                    node_id=node_id,
+                    lot_id=lot_id,
+                    quantity=min(remaining_target, available_quantity),
+                    rationale="Isolate the exact unsafe quantity discovered during inspection.",
+                )
+    missing_notifications = [node_id for node_id in affected_nodes if node_id not in observation.notified_nodes]
+    if missing_notifications:
+        return RecallAction(type="notify", node_id="all", rationale="Alert every impacted stakeholder before closing the incident.")
+    return RecallAction(type="finalize", rationale="Containment actions are complete.")
+def choose_llm_action(
+    client: Optional[OpenAI],
+    model_name: str,
+    observation: RecallObservation,
+    history: list[dict[str, Any]],
+) -> Optional[RecallAction]:
+    """Ask an LLM for the next action, returning None on failure."""
+    if client is None:
+        return None
+    prompt = {
+        "task_id": observation.task_id,
+        "phase": observation.phase,
+        "notice": observation.recall_notice,
+        "inventory": observation.inventory,
+        "inspection_results": {
+            node_id: {lot_id: evidence.model_dump() for lot_id, evidence in findings.items()}
+            for node_id, findings in observation.inspection_results.items()
+        },
+        "trace_results": observation.trace_results,
+        "notified_nodes": observation.notified_nodes,
+        "quarantined_inventory": observation.quarantined_inventory,
+        "steps_taken": observation.steps_taken,
+        "remaining_step_budget": observation.remaining_step_budget,
+        "history": history[-6:],
+        "instruction": "Return only compact JSON with keys type,node_id,lot_id,quantity,rationale. Use one valid action.",
+    }
+    try:
+        completion = client.chat.completions.create(
+            model=model_name,
+            temperature=0,
+            max_tokens=180,
+            messages=[
+                {"role": "system", "content": "You are operating a deterministic product recall environment. Respond with only valid JSON for the next action."},
+                {"role": "user", "content": json.dumps(prompt, sort_keys=True)},
+            ],
+        )
+        text = (completion.choices[0].message.content or "").strip()
+        if not text:
+            return None
+        return RecallAction.model_validate_json(text)
+    except Exception:
+        return None
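
Two small pieces of the heuristic policy above can be exercised without the environment: pulling the root lot out of the recall notice, and computing how many units a quarantine action should request. The sketch below reuses the regular expression from the diff. The function names `extract_root_lot` and `quarantine_quantity` are hypothetical helpers written for this illustration, and plain integers stand in for the Pydantic observation fields.

```python
import re

# Same pattern as the policy module: "Lot" followed by letters, digits, or underscores.
LOT_PATTERN = re.compile(r"\bLot[A-Za-z0-9_]+\b")


def extract_root_lot(notice: str, default: str = "LotA") -> str:
    """Return the first lot identifier mentioned in a recall notice."""
    match = LOT_PATTERN.search(notice)
    return match.group(0) if match else default


def quarantine_quantity(unsafe: int, quarantined: int, available: int) -> int:
    """Units still to quarantine, capped by available stock; 0 when nothing is due."""
    remaining = unsafe - quarantined
    if remaining <= 0 or available <= 0:
        return 0
    return min(remaining, available)


if __name__ == "__main__":
    notice = "Immediate recall: contaminated LotA detected in the cold-chain network."
    print(extract_root_lot(notice))
    # Mixed-lot case: 35 units on hand, of which 12 are flagged unsafe.
    print(quarantine_quantity(unsafe=12, quarantined=0, available=35))
```

Capping at `min(remaining, available)` is what keeps the policy from over-quarantining in the mixed-shipment task, where only part of a lot is unsafe.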

openenv.yaml CHANGED Viewed

@@ -1,156 +1,48 @@
-# RecallTrace OpenEnv — OpenEnv Spec (Task 5B — P G Ayush Rai)
-# Defines action space, observation models, tasks, and scoring contract.
-environment:
-  name: RecallTraceEnv
-  version: "1.0.0"
-  description: >
-    A fully offline, deterministic OpenEnv environment simulating product recall
-    traceability and containment across a supply-chain network.
-    An AI agent must identify contaminated lots, trace their movement through a
-    shipment graph, quarantine affected inventory precisely, and notify relevant nodes.
-# ── Action Space ─────────────────────────────────────────────────────────────
-actions:
-  - name: inspect_node
-    params:
-      node_id: str
-    description: >
-      Reveal the full inventory and outbound shipment edges of a node.
-      Adds the node to the discovered subgraph.
-    reward_hint: small positive for new nodes; 0 for repeat inspections
-  - name: trace_lot
-    params:
-      lot_id: str
-    description: >
-      Trace a lot across all nodes in the network — reveals which nodes hold it
-      and in what quantities (available + quarantined combined).
-    reward_hint: positive if lot is contaminated; small positive otherwise
-  - name: quarantine
-    params:
-      node_id: str
-      lot_id: str
-      quantity: int       # units to quarantine; defaults to full available stock
-    description: >
-      Move a specified quantity of a lot from active inventory to quarantine
-      at the given node. Excess quarantine (over correct qty) is penalised.
-    reward_hint: +0.4 correct exact; +0.2 partial; -0.3 wrong lot; -0.15 over-qty
-  - name: notify
-    params:
-      node_id: str        # or "all" to notify every node at once
-    description: >
-      Send a recall alert to a node or all nodes.
-      Rewarded only for affected nodes; penalised for unnecessary notifications.
-    reward_hint: +0.1 per correctly notified affected node; -0.05 for unneeded
-  - name: finalize
-    params: {}
-    description: >
-      Submit the containment plan. Triggers final scoring. Episode ends.
-    reward_hint: returns final_score in [0.0, 1.0]
-# ── Observation Space ─────────────────────────────────────────────────────────
-observation:
-  recall_notice:
-    type: str
-    description: Human-readable contamination alert issued at episode start
-  inventory:
-    type: dict
-    description: >
-      Full inventory snapshot across all nodes.
-      { node_id: { lot_id: quantity } }
-  discovered_shipments:
-    type: dict
-    description: >
-      Outbound shipment edges revealed so far (only for inspected nodes).
-      { node_id: [downstream_node_id, ...] }
-  history:
-    type: list[str]
-    description: Ordered log of all actions taken this episode
-  inspected_nodes:
-    type: list[str]
-    description: Sorted list of nodes that have been inspected
-  notified_nodes:
-    type: list[str]
-    description: Sorted list of nodes that have been sent recall alerts
-  quarantined_inventory:
-    type: dict
-    description: >
-      Inventory currently in quarantine (non-empty nodes only).
-      { node_id: { lot_id: quantity } }
-# ── Tasks ─────────────────────────────────────────────────────────────────────
-tasks:
-  - id: easy
-    name: "Task 1 — Direct Recall"
-    assign: "Shreya B J"
-    description: >
-      Single contaminated lot (LotA) distributed across a linear
-      warehouse → store1 → store2 chain. No relabeling.
-    nodes: [warehouse, store1, store2]
-    contaminated_lots: [LotA]
-  - id: medium
-    name: "Task 2 — Relabeled Inventory"
-    assign: "Shreya B J"
-    description: >
-      LotA is contaminated; it was repacked and relabeled as LotA1
-      at the distribution centre. Agent must trace the transformation.
-    nodes: [warehouse, dist_centre, store_north, store_south]
-    contaminated_lots: [LotA, LotA1]
-  - id: hard
-    name: "Task 3 — Mixed Shipments"
-    assign: "Shreya B J"
-    description: >
-      Two contaminated lots (LotX, LotY) co-shipped with safe stock
-      (LotB, LotC) across a hub-and-spoke network. Precise quarantine required.
-    nodes: [plant_a, plant_b, hub, retail_east, retail_west, retail_central]
-    contaminated_lots: [LotX, LotY]
-# ── Scoring ───────────────────────────────────────────────────────────────────
-scoring:
-  range: [0.0, 1.0]
-  formula: "(quarantine_score + notification_score) / 2  −  unnecessary_penalty"
-  components:
-    quarantine_score:
-      weight: 0.5
-      description: >
-        1 − ((missing_qty + over_qty) / total_affected_qty).
-        Full marks for exact quarantine of all affected lots.
-    notification_score:
-      weight: 0.5
-      description: >
-        fraction of affected nodes that were notified.
-    unnecessary_penalty:
-      max: 0.15
-      description: >
-        −0.05 per unnecessary quarantine (safe stock), capped at 0.15.
-# ── OpenEnv Compliance ────────────────────────────────────────────────────────
-compliance:
-  implements_reset: true
-  implements_step:  true
-  implements_state: true
-  deterministic:    true
-  typed_models:     true
-  offline:          true
-  reproducible:     true
-# ── Project Team ──────────────────────────────────────────────────────────────
-team:
-  - name: "Shamanth MS"
-    tasks: [env_core, action_handler, ground_truth_system, connect_components, submission]
-  - name: "P G Ayush Rai"
-    tasks: [openenv_spec, docker_setup, openenv_validation, deploy_hf_spaces]
-  - name: "Shreya B J"
-    tasks: [scenario_expansion, grader_system, reward_function]

+name: RecallTraceEnv
+version: 1.0.0
+description: Deterministic OpenEnv environment for supply-chain product recall tracing and precision containment.
+entrypoint:
+  module: env.env
+  class: RecallTraceEnv
+server:
+  module: server
+  app: app
+models:
+  action: env.models.RecallAction
+  observation: env.models.RecallObservation
+  reward: env.models.RewardSignal
+tasks:
+  - id: phase1_direct_recall
+    difficulty: easy
+    objective: Identify every location holding the recalled lot and quarantine all contaminated stock.
+  - id: phase2_relabel_recall
+    difficulty: medium
+    objective: Follow relabeled lots back to the source batch and quarantine every derived label precisely.
+  - id: phase3_mixed_shipments
+    difficulty: hard
+    objective: Contain only the unsafe quantity after contaminated stock was mixed with safe inventory during cross-docking.
+interfaces:
+  methods:
+    - reset
+    - step
+    - state
+  actions:
+    - inspect_node
+    - trace_lot
+    - quarantine
+    - notify
+    - finalize
+observation_fields:
+  - task_id
+  - phase
+  - recall_notice
+  - inventory
+  - discovered_shipments
+  - inspected_nodes
+  - inspection_results
+  - trace_results
+  - notified_nodes
+  - quarantined_inventory
+  - history
+  - steps_taken
+  - remaining_step_budget
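
The `observation_fields` list above can double as a lightweight contract check. The sketch below assumes the observation has already been serialized to a plain dict (for a Pydantic model, `model_dump()` would produce one). The helper `missing_fields` is hypothetical and is not part of the repository or of OpenEnv.

```python
# Field names copied from the observation_fields list in openenv.yaml.
OBSERVATION_FIELDS = [
    "task_id",
    "phase",
    "recall_notice",
    "inventory",
    "discovered_shipments",
    "inspected_nodes",
    "inspection_results",
    "trace_results",
    "notified_nodes",
    "quarantined_inventory",
    "history",
    "steps_taken",
    "remaining_step_budget",
]


def missing_fields(observation: dict) -> list[str]:
    """Return manifest fields absent from a serialized observation, in manifest order."""
    return [name for name in OBSERVATION_FIELDS if name not in observation]


if __name__ == "__main__":
    partial = {"task_id": "phase1_direct_recall", "phase": 1}
    print(missing_fields(partial))
```

A check of this kind fits naturally in a test that resets the environment and asserts the returned list is empty.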

pyproject.toml CHANGED Viewed

@@ -1,35 +1,23 @@
-[build-system]
-requires = ["setuptools>=68.0", "wheel"]
-build-backend = "setuptools.backends.legacy:build"
-[project]
-name = "recalltrace-openenv"
-version = "1.0.0"
-description = "Supply-chain recall traceability environment — OpenEnv compliant"
-readme = "README.md"
-requires-python = ">=3.10"
-license = { text = "MIT" }
-authors = [
-  { name = "Shamanth MS" },
-  { name = "P G Ayush Rai" },
-  { name = "Shreya B J" },
-]
-keywords = ["openenv", "reinforcement-learning", "supply-chain", "recall"]
-dependencies = [
-  "numpy",
-  "openenv>=0.1.13",
-]
-[project.optional-dependencies]
-dev = ["pytest", "gymnasium"]
-[project.scripts]
-server = "server.app:main"
-[tool.setuptools.packages.find]
-where = ["."]
-include = ["env*", "scenario*", "grader*", "inference*", "server*"]
-[tool.openenv]
-entry_point = "env.env:RecallTraceEnv"
-tasks = ["easy", "medium", "hard"]

+[build-system]
+requires = ["setuptools>=68", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "recalltrace-openenv"
+version = "1.0.0"
+description = "Deterministic OpenEnv environment for supply-chain recall tracing and precision containment"
+readme = "README.md"
+requires-python = ">=3.12"
+dependencies = [
+  "fastapi>=0.115.0,<1.0.0",
+  "openai>=2.7.2,<3.0.0",
+  "openenv-core>=0.2.0,<1.0.0",
+  "pydantic>=2.7.0,<3.0.0",
+  "uvicorn>=0.30.0,<1.0.0",
+]
+[project.scripts]
+server = "server.app:main"
+[tool.setuptools]
+packages = ["env", "grader", "scenario", "baseline", "server"]

requirements.txt CHANGED Viewed

@@ -1,2 +1,5 @@
-numpy
-openenv>=0.1.13

+fastapi>=0.115.0,<1.0.0
+openai>=2.7.2,<3.0.0
+pydantic>=2.7.0,<3.0.0
+uvicorn>=0.30.0,<1.0.0
+openenv-core>=0.2.0,<1.0.0

scenario/__init__.py CHANGED Viewed

@@ -0,0 +1 @@
+"""Scenario package for RecallTrace."""

scenario/scenario.py CHANGED Viewed

@@ -1,189 +1,363 @@
-"""
-RecallTrace — Scenario Definitions
-Merged from: Shreya B J (sbj) 3-task structure + Shamanth MS (sham) env-compatible format
-Tasks:
-  easy   → Task 1: Direct Recall       (single contaminated lot, linear chain)
-  medium → Task 2: Relabeled Inventory (lot transformed mid-chain)
-  hard   → Task 3: Mixed Shipments     (safe + unsafe inventory mixed)
-"""
-from __future__ import annotations
-from copy import deepcopy
-from typing import Any, Dict
-# ─────────────────────────────────────────────────────────────────────────────
-# EASY — Task 1: Direct Recall
-#   LotA is contaminated across a simple warehouse → store1 → store2 chain.
-# ─────────────────────────────────────────────────────────────────────────────
-_EASY: Dict[str, Any] = {
-    "scenario_id": "task1_direct_recall",
-    "level": "easy",
-    "recall_notice": (
-        "URGENT RECALL: LotA has tested positive for contamination at source. "
-        "All units of LotA must be quarantined immediately across the network."
-    ),
-    "contaminated_lot": "LotA",
-    "transformations": {},          # no relabeling in easy
-    "shipment_graph": {
-        "warehouse": ["store1", "store2"],
-        "store1": [],
-        "store2": [],
-    },
-    "lot_catalog": {
-        "LotA": {"contaminated": True,  "product": "ready_meal"},
-        "LotB": {"contaminated": False, "product": "ready_meal"},
-    },
-    "nodes": {
-        "warehouse": {
-            "inventory":           {"LotA": 100},
-            "quarantined_inventory": {},
-        },
-        "store1": {
-            "inventory":           {"LotA": 50},
-            "quarantined_inventory": {},
-        },
-        "store2": {
-            "inventory":           {"LotA": 50, "LotB": 30},
-            "quarantined_inventory": {},
-        },
-    },
-}
-# ─────────────────────────────────────────────────────────────────────────────
-# MEDIUM — Task 2: Relabeled Inventory
-#   LotA is contaminated; at the distribution centre it was repacked as LotA1.
-#   Agent must trace the transformation to also quarantine LotA1.
-# ─────────────────────────────────────────────────────────────────────────────
-_MEDIUM: Dict[str, Any] = {
-    "scenario_id": "task2_relabeled_inventory",
-    "level": "medium",
-    "recall_notice": (
-        "RECALL NOTICE: LotA has been confirmed contaminated. "
-        "Product may have been relabeled during distribution — investigate all related lots."
-    ),
-    "contaminated_lot": "LotA",
-    "transformations": {
-        "LotA": "LotA1"             # LotA was repacked → LotA1 at dist_centre
-    },
-    "shipment_graph": {
-        "warehouse":   ["dist_centre"],
-        "dist_centre": ["store_north", "store_south"],
-        "store_north": [],
-        "store_south": [],
-    },
-    "lot_catalog": {
-        "LotA":  {"contaminated": True,  "product": "frozen_goods"},
-        "LotA1": {"contaminated": True,  "product": "frozen_goods"},   # relabeled form
-        "LotB":  {"contaminated": False, "product": "frozen_goods"},
-    },
-    "nodes": {
-        "warehouse": {
-            "inventory":           {"LotA": 100},
-            "quarantined_inventory": {},
-        },
-        "dist_centre": {
-            "inventory":           {"LotA": 50, "LotA1": 150},    # 150 repacked
-            "quarantined_inventory": {},
-        },
-        "store_north": {
-            "inventory":           {"LotA1": 80},
-            "quarantined_inventory": {},
-        },
-        "store_south": {
-            "inventory":           {"LotA1": 70, "LotB": 60},
-            "quarantined_inventory": {},
-        },
-    },
-}
-# ─────────────────────────────────────────────────────────────────────────────
-# HARD — Task 3: Mixed Shipments
-#   Two contaminated lots (LotX, LotY) co-shipped with safe stock (LotB, LotC).
-#   Agent must quarantine precisely — over-quarantining safe stock is penalised.
-# ─────────────────────────────────────────────────────────────────────────────
-_HARD: Dict[str, Any] = {
-    "scenario_id": "task3_mixed_shipments",
-    "level": "hard",
-    "recall_notice": (
-        "MULTI-LOT RECALL: LotX and LotY are confirmed contaminated. "
-        "They have been co-shipped with safe inventory. "
-        "Quarantine only the contaminated stock — do not disrupt safe units."
-    ),
-    "contaminated_lot": "LotX",     # primary contaminated lot (LotY also affected)
-    "transformations": {},
-    "shipment_graph": {
-        "plant_a":        ["hub"],
-        "plant_b":        ["hub"],
-        "hub":            ["retail_east", "retail_west", "retail_central"],
-        "retail_east":    [],
-        "retail_west":    [],
-        "retail_central": [],
-    },
-    "lot_catalog": {
-        "LotX": {"contaminated": True,  "product": "canned_goods"},
-        "LotY": {"contaminated": True,  "product": "canned_goods"},
-        "LotB": {"contaminated": False, "product": "canned_goods"},
-        "LotC": {"contaminated": False, "product": "canned_goods"},
-    },
-    "nodes": {
-        "plant_a": {
-            "inventory":           {"LotX": 300, "LotB": 200},
-            "quarantined_inventory": {},
-        },
-        "plant_b": {
-            "inventory":           {"LotY": 250, "LotC": 300},
-            "quarantined_inventory": {},
-        },
-        "hub": {
-            "inventory":           {"LotX": 100, "LotY": 80, "LotB": 150, "LotC": 120},
-            "quarantined_inventory": {},
-        },
-        "retail_east": {
-            "inventory":           {"LotX": 60, "LotB": 90},
-            "quarantined_inventory": {},
-        },
-        "retail_west": {
-            "inventory":           {"LotY": 50, "LotC": 80},
-            "quarantined_inventory": {},
-        },
-        "retail_central": {
-            "inventory":           {"LotX": 40, "LotY": 30, "LotB": 60, "LotC": 40},
-            "quarantined_inventory": {},
-        },
-    },
-}
-# ─────────────────────────────────────────────────────────────────────────────
-# Public API
-# ─────────────────────────────────────────────────────────────────────────────
-_SCENARIOS = {
-    "easy":   _EASY,
-    "medium": _MEDIUM,
-    "hard":   _HARD,
-}
-# Legacy alias used by Shamanth's env
-SIMPLE_SCENARIO = _EASY
-def get_scenario(level: str = "easy") -> Dict[str, Any]:
-    """Return a deep copy of the requested scenario (easy / medium / hard)."""
-    if level not in _SCENARIOS:
-        raise ValueError(f"Unknown level '{level}'. Choose: easy, medium, hard.")
-    return deepcopy(_SCENARIOS[level])
-def build_phase1_scenario() -> Dict[str, Any]:
-    """Legacy alias used by Shamanth's env — returns the easy scenario."""
-    return get_scenario("easy")
-def list_levels() -> list:
-    return list(_SCENARIOS.keys())

+"""Deterministic scenario catalog for RecallTrace."""
+from __future__ import annotations
+from copy import deepcopy
+from typing import Any, Dict, List
+PHASE1_SCENARIO: Dict[str, Any] = {
+    "task_id": "phase1_direct_recall",
+    "phase": 1,
+    "difficulty": "easy",
+    "name": "Direct Recall Containment",
+    "objective": "Identify every location holding the recalled lot and quarantine all contaminated stock.",
+    "max_steps": 10,
+    "recall_notice": "Immediate recall: contaminated LotA detected in the cold-chain network.",
+    "contaminated_lot": "LotA",
+    "shipment_graph": {
+        "warehouse": ["store1", "store2"],
+        "store1": ["store2"],
+        "store2": [],
+    },
+    "lot_catalog": {
+        "LotA": {
+            "contaminated": True,
+            "product": "ready_meal",
+            "root_lot": "LotA",
+            "notes": "Original contaminated production batch.",
+        },
+        "LotB": {
+            "contaminated": False,
+            "product": "ready_meal",
+            "root_lot": "LotB",
+            "notes": "Safe control batch.",
+        },
+    },
+    "nodes": {
+        "warehouse": {
+            "inventory": {"LotA": 100},
+            "quarantined_inventory": {},
+            "inspection_findings": {
+                "LotA": {
+                    "status": "confirmed_contaminated",
+                    "unsafe_quantity": 100,
+                    "evidence": "QA retained sample matched the recall notice for LotA.",
+                }
+            },
+        },
+        "store1": {
+            "inventory": {"LotA": 50},
+            "quarantined_inventory": {},
+            "inspection_findings": {
+                "LotA": {
+                    "status": "confirmed_contaminated",
+                    "unsafe_quantity": 50,
+                    "evidence": "Receiving records show unopened cases from LotA.",
+                }
+            },
+        },
+        "store2": {
+            "inventory": {"LotA": 20, "LotB": 30},
+            "quarantined_inventory": {},
+            "inspection_findings": {
+                "LotA": {
+                    "status": "confirmed_contaminated",
+                    "unsafe_quantity": 20,
+                    "evidence": "Backroom scan confirms LotA units remain unsold.",
+                },
+                "LotB": {
+                    "status": "safe",
+                    "unsafe_quantity": 0,
+                    "evidence": "LotB is outside the recall scope.",
+                },
+            },
+        },
+    },
+}
+PHASE2_SCENARIO: Dict[str, Any] = {
+    "task_id": "phase2_relabel_recall",
+    "phase": 2,
+    "difficulty": "medium",
+    "name": "Relabeled Inventory Investigation",
+    "objective": "Follow relabeled lots back to the source batch and quarantine every derived label precisely.",
+    "max_steps": 14,
+    "recall_notice": "Urgent recall: source LotA was relabeled during repacking and must be traced across derived labels.",
+    "contaminated_lot": "LotA",
+    "shipment_graph": {
+        "warehouse": ["repack", "store1"],
+        "repack": ["store2", "store3"],
+        "store1": [],
+        "store2": [],
+        "store3": [],
+    },
+    "lot_catalog": {
+        "LotA": {
+            "contaminated": True,
+            "product": "ready_meal",
+            "root_lot": "LotA",
+            "notes": "Original contaminated batch.",
+        },
+        "LotA_R1": {
+            "contaminated": True,
+            "product": "ready_meal",
+            "root_lot": "LotA",
+            "relabeled_from": "LotA",
+            "notes": "Repacked under an internal secondary label.",
+        },
+        "LotA_R2": {
+            "contaminated": True,
+            "product": "ready_meal",
+            "root_lot": "LotA",
+            "relabeled_from": "LotA_R1",
+            "notes": "Retail-ready relabel shipped after repacking.",
+        },
+        "LotB": {
+            "contaminated": False,
+            "product": "ready_meal",
+            "root_lot": "LotB",
+            "notes": "Safe control batch.",
+        },
+    },
+    "nodes": {
+        "warehouse": {
+            "inventory": {"LotA": 40, "LotB": 30},
+            "quarantined_inventory": {},
+            "inspection_findings": {
+                "LotA": {
+                    "status": "confirmed_contaminated",
+                    "unsafe_quantity": 40,
+                    "evidence": "Source pallet labels match the recalled production run.",
+                },
+                "LotB": {
+                    "status": "safe",
+                    "unsafe_quantity": 0,
+                    "evidence": "LotB remains outside the repacking stream.",
+                },
+            },
+        },
+        "repack": {
+            "inventory": {"LotA_R1": 45},
+            "quarantined_inventory": {},
+            "inspection_findings": {
+                "LotA_R1": {
+                    "status": "confirmed_contaminated",
+                    "unsafe_quantity": 45,
+                    "evidence": "Repacking worksheet maps LotA directly to LotA_R1.",
+                }
+            },
+        },
+        "store1": {
+            "inventory": {"LotA": 15, "LotB": 20},
+            "quarantined_inventory": {},
+            "inspection_findings": {
+                "LotA": {
+                    "status": "confirmed_contaminated",
+                    "unsafe_quantity": 15,
+                    "evidence": "Store retains cases with original LotA stickers.",
+                },
+                "LotB": {
+                    "status": "safe",
+                    "unsafe_quantity": 0,
+                    "evidence": "LotB SKUs are unaffected.",
+                },
+            },
+        },
+        "store2": {
+            "inventory": {"LotA_R1": 25},
+            "quarantined_inventory": {},
+            "inspection_findings": {
+                "LotA_R1": {
+                    "status": "confirmed_contaminated",
+                    "unsafe_quantity": 25,
+                    "evidence": "Receiving scan ties LotA_R1 to the repack facility transfer.",
+                }
+            },
+        },
+        "store3": {
+            "inventory": {"LotA_R2": 20, "LotB": 10},
+            "quarantined_inventory": {},
+            "inspection_findings": {
+                "LotA_R2": {
+                    "status": "confirmed_contaminated",
+                    "unsafe_quantity": 20,
+                    "evidence": "Shelf tags reference the LotA_R2 relabel lineage.",
+                },
+                "LotB": {
+                    "status": "safe",
+                    "unsafe_quantity": 0,
+                    "evidence": "LotB is a later safe shipment.",
+                },
+            },
+        },
+    },
+}
+PHASE3_SCENARIO: Dict[str, Any] = {
+    "task_id": "phase3_mixed_shipments",
+    "phase": 3,
+    "difficulty": "hard",
+    "name": "Mixed Inventory Precision Containment",
+    "objective": "Contain only the unsafe quantity after contaminated stock was mixed with safe inventory during cross-docking.",
+    "max_steps": 16,
+    "recall_notice": "Critical recall: contaminated LotA was mixed with safe stock during cross-docking. Quarantine only the unsafe quantity.",
+    "contaminated_lot": "LotA",
+    "shipment_graph": {
+        "warehouse": ["crossdock", "store1"],
+        "crossdock": ["store2", "store3"],
+        "store1": [],
+        "store2": [],
+        "store3": [],
+    },
+    "lot_catalog": {
+        "LotA": {
+            "contaminated": True,
+            "product": "ready_meal",
+            "root_lot": "LotA",
+            "notes": "Contaminated upstream batch.",
+        },
+        "LotBlend": {
+            "contaminated": True,
+            "product": "ready_meal",
+            "root_lot": "LotA",
+            "mixed_from": ["LotA", "LotB"],
+            "notes": "Cross-docked mixed lot containing both safe and unsafe units.",
+        },
+        "LotB": {
+            "contaminated": False,
+            "product": "ready_meal",
+            "root_lot": "LotB",
+            "notes": "Safe batch mixed into downstream palletization.",
+        },
+    },
+    "nodes": {
+        "warehouse": {
+            "inventory": {"LotA": 30, "LotB": 25},
+            "quarantined_inventory": {},
+            "inspection_findings": {
+                "LotA": {
+                    "status": "confirmed_contaminated",
+                    "unsafe_quantity": 30,
+                    "evidence": "Source batch LotA remains fully unsafe at origin.",
+                },
+                "LotB": {
+                    "status": "safe",
+                    "unsafe_quantity": 0,
+                    "evidence": "LotB remains unaffected at origin.",
+                },
+            },
+        },
+        "crossdock": {
+            "inventory": {"LotBlend": 35, "LotB": 10},
+            "quarantined_inventory": {},
+            "inspection_findings": {
+                "LotBlend": {
+                    "status": "mixed",
+                    "unsafe_quantity": 12,
+                    "safe_quantity": 23,
+                    "evidence": "Cross-dock exception log shows 12 unsafe units merged into LotBlend.",
+                },
+                "LotB": {
+                    "status": "safe",
+                    "unsafe_quantity": 0,
+                    "evidence": "Standalone LotB pallet is outside the recall.",
+                },
+            },
+        },
+        "store1": {
+            "inventory": {"LotA": 10, "LotB": 20},
+            "quarantined_inventory": {},
+            "inspection_findings": {
+                "LotA": {
+                    "status": "confirmed_contaminated",
+                    "unsafe_quantity": 10,
+                    "evidence": "Original LotA cases shipped directly before blending.",
+                },
+                "LotB": {
+                    "status": "safe",
+                    "unsafe_quantity": 0,
+                    "evidence": "Store LotB stock is unaffected.",
+                },
+            },
+        },
+        "store2": {
+            "inventory": {"LotBlend": 15},
+            "quarantined_inventory": {},
+            "inspection_findings": {
+                "LotBlend": {
+                    "status": "mixed",
+                    "unsafe_quantity": 8,
+                    "safe_quantity": 7,
+                    "evidence": "Receiving variance report allocates 8 unsafe units to store2.",
+                }
+            },
+        },
+        "store3": {
+            "inventory": {"LotBlend": 20, "LotB": 5},
+            "quarantined_inventory": {},
+            "inspection_findings": {
+                "LotBlend": {
+                    "status": "mixed",
+                    "unsafe_quantity": 4,
+                    "safe_quantity": 16,
+                    "evidence": "Inventory reconciliation isolates 4 unsafe units in store3's mixed lot.",
+                },
+                "LotB": {
+                    "status": "safe",
+                    "unsafe_quantity": 0,
+                    "evidence": "Separate LotB shelf stock is unaffected.",
+                },
+            },
+        },
+    },
+}
+SCENARIOS: Dict[str, Dict[str, Any]] = {
+    PHASE1_SCENARIO["task_id"]: PHASE1_SCENARIO,
+    PHASE2_SCENARIO["task_id"]: PHASE2_SCENARIO,
+    PHASE3_SCENARIO["task_id"]: PHASE3_SCENARIO,
+}
+PHASE_LOOKUP: Dict[int, str] = {
+    1: PHASE1_SCENARIO["task_id"],
+    2: PHASE2_SCENARIO["task_id"],
+    3: PHASE3_SCENARIO["task_id"],
+}
+def build_scenario(task_id: str | None = None, phase: int | None = None) -> Dict[str, Any]:
+    """Return a fresh copy of the deterministic scenario for the requested task or phase."""
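+    if task_id is None:
+        if phase is None:
+            phase = 1
+        # Fail with a descriptive error instead of a bare KeyError for unknown phases.
+        if phase not in PHASE_LOOKUP:
+            raise ValueError(f"Unknown phase {phase!r}. Expected one of {sorted(PHASE_LOOKUP)}.")
+        task_id = PHASE_LOOKUP[phase]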
+    if task_id not in SCENARIOS:
+        raise ValueError(f"Unknown task_id '{task_id}'. Expected one of {sorted(SCENARIOS)}.")
+    return deepcopy(SCENARIOS[task_id])
+def build_phase1_scenario() -> Dict[str, Any]:
+    return build_scenario(task_id=PHASE1_SCENARIO["task_id"])
+def build_phase2_scenario() -> Dict[str, Any]:
+    return build_scenario(task_id=PHASE2_SCENARIO["task_id"])
+def build_phase3_scenario() -> Dict[str, Any]:
+    return build_scenario(task_id=PHASE3_SCENARIO["task_id"])
+def list_task_specs() -> List[Dict[str, Any]]:
+    """Return lightweight metadata for all tasks."""
+    return [
+        {
+            "task_id": scenario["task_id"],
+            "name": scenario["name"],
+            "difficulty": scenario["difficulty"],
+            "objective": scenario["objective"],
+            "max_steps": scenario["max_steps"],
+        }
+        for scenario in SCENARIOS.values()
+    ]
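
The phase-3 scenario above splits each `LotBlend` quantity into `unsafe_quantity` and `safe_quantity`. As a minimal sketch (not part of the commit), the following checks that split against on-hand inventory and derives the per-node quarantine targets. `quarantine_targets` is a hypothetical helper name, and `NODES` is an excerpt of the scenario data with the evidence strings omitted.

```python
from typing import Any, Dict, Tuple

# Excerpt of the phase-3 node data from the scenario (evidence strings omitted).
NODES: Dict[str, Dict[str, Any]] = {
    "warehouse": {
        "inventory": {"LotA": 30, "LotB": 25},
        "inspection_findings": {
            "LotA": {"status": "confirmed_contaminated", "unsafe_quantity": 30},
            "LotB": {"status": "safe", "unsafe_quantity": 0},
        },
    },
    "crossdock": {
        "inventory": {"LotBlend": 35, "LotB": 10},
        "inspection_findings": {
            "LotBlend": {"status": "mixed", "unsafe_quantity": 12, "safe_quantity": 23},
            "LotB": {"status": "safe", "unsafe_quantity": 0},
        },
    },
    "store1": {
        "inventory": {"LotA": 10, "LotB": 20},
        "inspection_findings": {
            "LotA": {"status": "confirmed_contaminated", "unsafe_quantity": 10},
            "LotB": {"status": "safe", "unsafe_quantity": 0},
        },
    },
    "store2": {
        "inventory": {"LotBlend": 15},
        "inspection_findings": {
            "LotBlend": {"status": "mixed", "unsafe_quantity": 8, "safe_quantity": 7},
        },
    },
    "store3": {
        "inventory": {"LotBlend": 20, "LotB": 5},
        "inspection_findings": {
            "LotBlend": {"status": "mixed", "unsafe_quantity": 4, "safe_quantity": 16},
            "LotB": {"status": "safe", "unsafe_quantity": 0},
        },
    },
}


def quarantine_targets(nodes: Dict[str, Dict[str, Any]]) -> Dict[Tuple[str, str], int]:
    """Hypothetical helper: map (node_id, lot_id) to the unsafe units to quarantine."""
    targets: Dict[Tuple[str, str], int] = {}
    for node_id, node in nodes.items():
        for lot_id, finding in node["inspection_findings"].items():
            on_hand = node["inventory"][lot_id]
            unsafe = finding["unsafe_quantity"]
            if finding["status"] == "mixed":
                # A mixed lot must account for every unit on hand.
                if unsafe + finding["safe_quantity"] != on_hand:
                    raise ValueError(f"{node_id}/{lot_id}: split does not match inventory")
            elif unsafe > on_hand:
                raise ValueError(f"{node_id}/{lot_id}: unsafe quantity exceeds inventory")
            if unsafe > 0:
                targets[(node_id, lot_id)] = unsafe
    return targets


targets = quarantine_targets(NODES)
print(targets)
print("total unsafe units:", sum(targets.values()))
```

Under these assumptions, quarantining the full `LotBlend` inventory at a node would also pull safe units, which is the over-containment the hard task is designed to penalize.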

server.py ADDED

	@@ -0,0 +1,5 @@

+from server.app import app, main
+if __name__ == "__main__":
+    main()

server/__init__.py CHANGED

	@@ -0,0 +1 @@


+"""Server package for RecallTrace."""

server/app.py CHANGED

@@ -1,31 +1,152 @@
-"""
-RecallTrace OpenEnv — Server Entry Point
-Required by openenv multi-mode deployment (Task 11)
-"""
-import sys
-from pathlib import Path
-ROOT_DIR = Path(__file__).resolve().parents[1]
-if str(ROOT_DIR) not in sys.path:
-    sys.path.insert(0, str(ROOT_DIR))
-from recall_env.env import RecallTraceEnv
-def make_env(level: str = "easy") -> RecallTraceEnv:
-    """Factory used by the openenv server to instantiate the environment."""
-    return RecallTraceEnv(level=level)
-def main():
-    """Entry point for the openenv server."""
-    env = make_env()
-    obs = env.reset()
-    print("RecallTrace OpenEnv server started.")
-    print(f"recall_notice: {obs['recall_notice']}")
-    return env
-if __name__ == "__main__":
-    main()

+"""FastAPI server for serving RecallTrace in Docker or Hugging Face Spaces."""
+from __future__ import annotations
+from pathlib import Path
+from typing import Optional
+import uvicorn
+from fastapi import FastAPI, HTTPException
+from fastapi.responses import FileResponse
+from fastapi.staticfiles import StaticFiles
+from pydantic import BaseModel
+from baseline.policy import choose_heuristic_action
+from env.env import RecallTraceEnv
+from env.models import RecallAction
+BASE_DIR = Path(__file__).resolve().parent
+STATIC_DIR = BASE_DIR / "static"
+app = FastAPI(title="RecallTrace OpenEnv", version="1.0.0")
+app.mount("/static", StaticFiles(directory=STATIC_DIR), name="static")
+ACTIVE_ENV = RecallTraceEnv()
+class ResetRequest(BaseModel):
+    task_id: Optional[str] = None
+    phase: Optional[int] = None
+class RunEpisodeRequest(BaseModel):
+    task_id: Optional[str] = None
+    phase: Optional[int] = None
+@app.get("/")
+def root() -> FileResponse:
+    return FileResponse(STATIC_DIR / "index.html")
+@app.get("/health")
+def health() -> dict:
+    return {"status": "healthy"}
+@app.get("/tasks")
+def tasks() -> dict:
+    return {"tasks": [task.model_dump() for task in RecallTraceEnv.available_tasks()]}
+@app.get("/api/tasks")
+def api_tasks() -> dict:
+    return tasks()
+@app.get("/reset")
+def reset_get(task_id: Optional[str] = None, phase: Optional[int] = None) -> dict:
+    try:
+        return ACTIVE_ENV.reset(task_id=task_id, phase=phase).model_dump()
+    except Exception as exc:
+        raise HTTPException(status_code=400, detail=str(exc)) from exc
+@app.post("/reset")
+def reset_post(request: ResetRequest) -> dict:
+    try:
+        return ACTIVE_ENV.reset(task_id=request.task_id, phase=request.phase).model_dump()
+    except Exception as exc:
+        raise HTTPException(status_code=400, detail=str(exc)) from exc
+@app.post("/step")
+def step(action: RecallAction) -> dict:
+    try:
+        observation, reward, done, info = ACTIVE_ENV.step(action)
+        return {
+            "observation": observation.model_dump(),
+            "reward": reward,
+            "done": done,
+            "info": info,
+        }
+    except Exception as exc:
+        raise HTTPException(status_code=400, detail=str(exc)) from exc
+@app.get("/state")
+def state() -> dict:
+    return ACTIVE_ENV.state().model_dump()
+def _run_episode(task_id: str | None = None, phase: int | None = None) -> dict:
+    env = RecallTraceEnv(task_id=task_id, phase=phase)
+    observation = env.reset(task_id=task_id, phase=phase)
+    logs = []
+    final_info = {"score": 0.0}
+    for step_number in range(1, env.task.max_steps + 1):
+        action = choose_heuristic_action(observation)
+        observation, reward, done, info = env.step(action)
+        logs.append(
+            {
+                "step": step_number,
+                "action": action.model_dump(exclude_none=True),
+                "reward": reward,
+                "done": done,
+                "message": info.get("message"),
+            }
+        )
+        final_info = info
+        if done:
+            break
+    return {
+        "task": env.task.model_dump(),
+        "score": float(final_info.get("score", 0.0)),
+        "success": float(final_info.get("score", 0.0)) >= 0.9,
+        "steps_taken": env.state().steps_taken,
+        "final_info": final_info,
+        "final_observation": observation.model_dump(),
+        "logs": logs,
+    }
+@app.post("/api/run_episode")
+def run_episode(request: RunEpisodeRequest) -> dict:
+    try:
+        return _run_episode(task_id=request.task_id, phase=request.phase)
+    except Exception as exc:
+        raise HTTPException(status_code=400, detail=str(exc)) from exc
+@app.get("/api/run_all")
+def run_all() -> dict:
+    try:
+        episodes = [_run_episode(task_id=task.task_id) for task in RecallTraceEnv.available_tasks()]
+        average_score = round(sum(item["score"] for item in episodes) / len(episodes), 4)
+        return {
+            "average_score": average_score,
+            "episodes": episodes,
+        }
+    except Exception as exc:
+        raise HTTPException(status_code=400, detail=str(exc)) from exc
+def main() -> None:
+    uvicorn.run(app, host="0.0.0.0", port=7860)
+if __name__ == "__main__":
+    main()
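
A client could exercise the `/reset` and `/api/run_episode` routes defined in `server/app.py` roughly as follows. This is a sketch using only the standard library. `BASE_URL` assumes the server is reachable locally on port 7860, and `build_post`, `post_json`, and `run_demo` are hypothetical helper names, not part of the commit.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:7860"  # assumption: local server on the port used by main()


def build_post(
    path: str, task_id: Optional[str] = None, phase: Optional[int] = None
) -> urllib.request.Request:
    """Build a JSON POST whose body matches ResetRequest / RunEpisodeRequest."""
    payload = {"task_id": task_id, "phase": phase}
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(request: urllib.request.Request) -> Dict[str, Any]:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def run_demo(task_id: str = "phase3_mixed_shipments") -> None:
    """Reset the shared environment, then run the baseline episode for one task."""
    post_json(build_post("/reset", task_id=task_id))
    episode = post_json(build_post("/api/run_episode", task_id=task_id))
    print(episode["score"], episode["success"], episode["steps_taken"])
```

Calling `run_demo()` requires the server to be running. Note that `/reset` acts on the module-level `ACTIVE_ENV`, while `/api/run_episode` builds its own `RecallTraceEnv`, so the two calls do not share state.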

server/static/app.js ADDED

	@@ -0,0 +1,222 @@

+const taskSelect = document.getElementById("task-select");
+const taskSummary = document.getElementById("task-summary");
+const currentScore = document.getElementById("current-score");
+const currentSteps = document.getElementById("current-steps");
+const currentStatus = document.getElementById("current-status");
+const allScore = document.getElementById("all-score");
+const allResults = document.getElementById("all-results");
+const episodeLog = document.getElementById("episode-log");
+const rewardChart = document.getElementById("reward-chart");
+const finalSummary = document.getElementById("final-summary");
+let taskCatalog = [];
+function renderTaskSummary(task) {
+  taskSummary.innerHTML = `
+    <h3>${task.name}</h3>
+    <p><strong>Difficulty:</strong> ${task.difficulty}</p>
+    <p>${task.objective}</p>
+    <p><strong>Max steps:</strong> ${task.max_steps}</p>
+  `;
+}
+function buildLineChart(logs) {
+  if (!logs.length) {
+    rewardChart.innerHTML = "No rewards available.";
+    return;
+  }
+  const width = 380;
+  const height = 220;
+  const padding = 28;
+  const values = logs.map((entry) => entry.reward);
+  const maxReward = Math.max(...values, 1);
+  const minReward = Math.min(...values, 0);
+  const range = Math.max(maxReward - minReward, 0.25);
+  const toX = (index) => {
+    if (logs.length === 1) {
+      return width / 2;
+    }
+    return padding + (index * (width - padding * 2)) / (logs.length - 1);
+  };
+  const toY = (value) => {
+    return height - padding - ((value - minReward) / range) * (height - padding * 2);
+  };
+  const linePoints = logs
+    .map((entry, index) => `${toX(index)},${toY(entry.reward)}`)
+    .join(" ");
+  const horizontalGuides = [0, 0.25, 0.5, 0.75, 1]
+    .map((ratio) => {
+      const y = padding + ratio * (height - padding * 2);
+      return `<line class="chart-grid" x1="${padding}" y1="${y}" x2="${width - padding}" y2="${y}"></line>`;
+    })
+    .join("");
+  const labels = logs
+    .map((entry, index) => {
+      const x = toX(index);
+      return `<text class="chart-label" x="${x}" y="${height - 8}" text-anchor="middle">S${entry.step}</text>`;
+    })
+    .join("");
+  const points = logs
+    .map((entry, index) => {
+      const x = toX(index);
+      const y = toY(entry.reward);
+      return `
+        <circle class="chart-point" cx="${x}" cy="${y}" r="5"></circle>
+        <text class="chart-label" x="${x}" y="${y - 10}" text-anchor="middle">${entry.reward.toFixed(2)}</text>
+      `;
+    })
+    .join("");
+  rewardChart.innerHTML = `
+    <svg viewBox="0 0 ${width} ${height}" aria-label="Reward line chart">
+      ${horizontalGuides}
+      <line class="chart-axis" x1="${padding}" y1="${height - padding}" x2="${width - padding}" y2="${height - padding}"></line>
+      <line class="chart-axis" x1="${padding}" y1="${padding}" x2="${padding}" y2="${height - padding}"></line>
+      <polyline class="chart-line" points="${linePoints}"></polyline>
+      ${points}
+      ${labels}
+    </svg>
+  `;
+}
+function renderEpisode(data) {
+  currentScore.textContent = data.score.toFixed(4);
+  currentSteps.textContent = String(data.steps_taken);
+  currentStatus.textContent = data.success ? "Contained" : "Needs work";
+  buildLineChart(data.logs);
+  finalSummary.innerHTML = `
+    <div class="summary-grid">
+      <div class="summary-pill">
+        <span>Final score</span>
+        <strong>${data.score.toFixed(4)}</strong>
+      </div>
+      <div class="summary-pill">
+        <span>Status</span>
+        <strong>${data.success ? "Success" : "Needs improvement"}</strong>
+      </div>
+      <div class="summary-pill">
+        <span>Steps used</span>
+        <strong>${data.steps_taken}</strong>
+      </div>
+      <div class="summary-pill">
+        <span>Quarantine quality</span>
+        <strong>${(data.final_info.quarantine_score ?? 0).toFixed(4)}</strong>
+      </div>
+    </div>
+    <div class="summary-card">
+      <strong>Containment outcome</strong>
+      <div>All affected nodes notified: ${data.final_info.all_affected_nodes_notified ? "Yes" : "No"}</div>
+      <div>All affected stock quarantined: ${data.final_info.all_affected_stock_quarantined ? "Yes" : "No"}</div>
+    </div>
+    <div class="summary-card">
+      <strong>Grader focus</strong>
+      <div>Notification score: ${(data.final_info.notification_score ?? 0).toFixed(4)}</div>
+      <div>Investigation score: ${(data.final_info.investigation_score ?? 0).toFixed(4)}</div>
+      <div>Efficiency score: ${(data.final_info.efficiency_score ?? 0).toFixed(4)}</div>
+    </div>
+  `;
+  const logMarkup = data.logs.map((entry) => {
+    const actionType = entry.action.type || "action";
+    const detailBits = [];
+    if (entry.action.node_id) detailBits.push(`Node: ${entry.action.node_id}`);
+    if (entry.action.lot_id) detailBits.push(`Lot: ${entry.action.lot_id}`);
+    if (entry.action.quantity) detailBits.push(`Qty: ${entry.action.quantity}`);
+    return `
+      <div class="log-step">
+        <div class="log-title">
+          <strong>Step ${entry.step}</strong>
+          <span class="action-chip">${actionType.replace(/_/g, " ")}</span>
+        </div>
+        <div class="action-meta">
+          <div>${detailBits.length ? detailBits.join(" | ") : "No extra parameters"}</div>
+          <div>Reward: ${entry.reward.toFixed(4)}</div>
+          <div>Message: ${entry.message || "-"}</div>
+        </div>
+      </div>
+    `;
+  }).join("");
+  episodeLog.innerHTML = `
+    <div class="log-step">
+      <strong>Task:</strong> ${data.task.name}
+    </div>
+    ${logMarkup}
+  `;
+}
+function renderRunAll(data) {
+  allScore.textContent = data.average_score.toFixed(4);
+  allResults.innerHTML = data.episodes.map((episode) => `
+    <div class="log-step">
+      <strong>${episode.task.name}</strong>
+      <div>Difficulty: ${episode.task.difficulty}</div>
+      <div>Score: ${episode.score.toFixed(4)}</div>
+      <div>Steps: ${episode.steps_taken}</div>
+      <div>Status: ${episode.success ? "Success" : "Needs work"}</div>
+    </div>
+  `).join("");
+}
+async function fetchTasks() {
+  const response = await fetch("/api/tasks");
+  const data = await response.json();
+  taskCatalog = data.tasks;
+  taskSelect.innerHTML = taskCatalog.map((task) => `
+    <option value="${task.task_id}">${task.difficulty.toUpperCase()} - ${task.name}</option>
+  `).join("");
+  renderTaskSummary(taskCatalog[0]);
+}
+async function resetTask() {
+  const taskId = taskSelect.value;
+  const response = await fetch(`/reset?task_id=${encodeURIComponent(taskId)}`);
+  const data = await response.json();
+  currentScore.textContent = "-";
+  currentSteps.textContent = String(data.steps_taken || 0);
+  currentStatus.textContent = "Reset";
+  rewardChart.innerHTML = "Task reset. Run a task to render the reward trajectory.";
+  finalSummary.innerHTML = "Readable scoring highlights will appear here.";
+  episodeLog.textContent = JSON.stringify(data, null, 2);
+}
+async function runEpisode() {
+  const response = await fetch("/api/run_episode", {
+    method: "POST",
+    headers: { "Content-Type": "application/json" },
+    body: JSON.stringify({ task_id: taskSelect.value }),
+  });
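+  const data = await response.json();
+  // The server reports failures as HTTP 400 with a JSON "detail" field.
+  if (!response.ok) {
+    currentStatus.textContent = "Error";
+    episodeLog.textContent = data.detail || "Request failed.";
+    return;
+  }
+  renderEpisode(data);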
+}
+async function runAllTasks() {
+  const response = await fetch("/api/run_all");
+  const data = await response.json();
+  renderRunAll(data);
+}
+taskSelect.addEventListener("change", () => {
+  const task = taskCatalog.find((item) => item.task_id === taskSelect.value);
+  if (task) {
+    renderTaskSummary(task);
+  }
+});
+document.getElementById("reset-button").addEventListener("click", resetTask);
+document.getElementById("run-button").addEventListener("click", runEpisode);
+document.getElementById("run-all-button").addEventListener("click", runAllTasks);
+fetchTasks();
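
The coordinate mapping inside `buildLineChart` can be checked in isolation. The sketch below is a Python port of the `toX` and `toY` closures, written only for illustration; `make_scalers` is a hypothetical name, and the constants are copied from the JavaScript above.

```python
from typing import Callable, List, Tuple

WIDTH, HEIGHT, PADDING = 380, 220, 28  # same constants as buildLineChart


def make_scalers(
    rewards: List[float],
) -> Tuple[Callable[[int], float], Callable[[float], float]]:
    """Return (to_x, to_y) mapping step indices and rewards to SVG coordinates."""
    if not rewards:
        raise ValueError("at least one reward is required")
    max_reward = max([*rewards, 1.0])  # upper bound is never below 1
    min_reward = min([*rewards, 0.0])  # lower bound is never above 0
    span = max(max_reward - min_reward, 0.25)
    plot_w = WIDTH - 2 * PADDING
    plot_h = HEIGHT - 2 * PADDING

    def to_x(index: int) -> float:
        # A single point is centred; otherwise points are spread evenly.
        if len(rewards) == 1:
            return WIDTH / 2
        return PADDING + index * plot_w / (len(rewards) - 1)

    def to_y(value: float) -> float:
        # SVG y grows downward, so larger rewards map to smaller y.
        return HEIGHT - PADDING - (value - min_reward) / span * plot_h

    return to_x, to_y


to_x, to_y = make_scalers([0.0, 0.5, 1.0])
print([(to_x(i), to_y(r)) for i, r in enumerate([0.0, 0.5, 1.0])])
```

Because the maximum is floored at 1 and the minimum is capped at 0, rewards in the range 0 to 1 always use the same vertical scale, which keeps charts comparable across episodes.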

server/static/index.html ADDED

	@@ -0,0 +1,149 @@

+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <title>RecallTrace OpenEnv</title>
+  <link rel="preconnect" href="https://fonts.googleapis.com">
+  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
+  <link href="https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;700&family=IBM+Plex+Mono:wght@400;500&display=swap" rel="stylesheet">
+  <link rel="stylesheet" href="/static/styles.css?v=4">
+</head>
+<body>
+  <div class="page-shell">
+    <header class="hero">
+      <div class="hero-copy">
+        <span class="eyebrow">Safety-Critical OpenEnv Benchmark</span>
+        <h1>RecallTrace OpenEnv</h1>
+        <p class="hero-text">
+          A real-world supply-chain recall benchmark where agents must trace contaminated lots,
+          follow relabeled inventory lineage, inspect evidence, and quarantine only the unsafe stock.
+        </p>
+        <div class="badge-row">
+          <span class="badge">OpenEnv compliant</span>
+          <span class="badge">Deterministic grading</span>
+          <span class="badge">3 escalating tasks</span>
+          <span class="badge">Precision containment</span>
+        </div>
+      </div>
+      <div class="hero-panel">
+        <div class="metric-card">
+          <span class="metric-label">Average baseline</span>
+          <strong id="metric-average">0.9677</strong>
+        </div>
+        <div class="metric-card">
+          <span class="metric-label">Hard task focus</span>
+          <strong>Mixed safe/unsafe inventory</strong>
+        </div>
+        <div class="metric-card">
+          <span class="metric-label">Judging edge</span>
+          <strong>Operational realism over toy mechanics</strong>
+        </div>
+      </div>
+    </header>
+    <main class="dashboard-grid">
+      <section class="panel panel-accent">
+        <div class="panel-header">
+          <h2>Task Runner</h2>
+          <p>Choose a task and run the deterministic baseline to inspect the full trajectory.</p>
+        </div>
+        <div class="controls">
+          <label class="field">
+            <span>Task level</span>
+            <select id="task-select"></select>
+          </label>
+          <div class="button-row">
+            <button id="reset-button" class="button button-secondary">Reset Task</button>
+            <button id="run-button" class="button button-primary">Run Episode</button>
+            <button id="run-all-button" class="button button-ghost">Run All Tasks</button>
+          </div>
+        </div>
+        <div id="task-summary" class="task-summary"></div>
+      </section>
+      <section class="panel">
+        <div class="panel-header">
+          <h2>Scoreboard</h2>
+          <p>Live summary of the current task and the multi-task baseline run.</p>
+        </div>
+        <div class="score-grid">
+          <div class="score-card">
+            <span>Current score</span>
+            <strong id="current-score">-</strong>
+          </div>
+          <div class="score-card">
+            <span>Steps taken</span>
+            <strong id="current-steps">-</strong>
+          </div>
+          <div class="score-card">
+            <span>Status</span>
+            <strong id="current-status">Ready</strong>
+          </div>
+          <div class="score-card">
+            <span>Average over all tasks</span>
+            <strong id="all-score">-</strong>
+          </div>
+        </div>
+        <div id="all-results" class="all-results empty-state">Run all tasks to compare easy, medium, and hard performance.</div>
+      </section>
+      <section class="panel panel-wide">
+        <div class="panel-header">
+          <h2>Episode Output</h2>
+          <p>Visual baseline trajectory, readable action summaries, and final grading highlights.</p>
+        </div>
+        <div class="episode-layout">
+          <div class="episode-visuals">
+            <div class="mini-panel">
+              <h3>Reward Curve</h3>
+              <div id="reward-chart" class="reward-chart empty-state">Run a task to render the reward trajectory.</div>
+            </div>
+            <div class="mini-panel">
+              <h3>Final Outcome</h3>
+              <div id="final-summary" class="final-summary empty-state">Readable scoring highlights will appear here.</div>
+            </div>
+          </div>
+          <div id="episode-log" class="episode-log empty-state">Run a task to populate the episode trajectory.</div>
+        </div>
+      </section>
+      <section class="panel">
+        <div class="panel-header">
+          <h2>Judge Lens</h2>
+        </div>
+        <div class="highlight-stack">
+          <div class="highlight-card">
+            <span class="highlight-title">Real-world utility</span>
+            <p>Models a safety-critical recall workflow that QA, operations, and supply-chain teams actually perform.</p>
+          </div>
+          <div class="highlight-card">
+            <span class="highlight-title">Frontier challenge</span>
+            <p>The hard task forces precision containment of mixed safe and unsafe stock under partial observability.</p>
+          </div>
+          <div class="highlight-card">
+            <span class="highlight-title">Benchmark quality</span>
+            <p>Deterministic graders evaluate precision, coverage, investigation depth, and efficiency with reproducible scores.</p>
+          </div>
+        </div>
+      </section>
+      <section class="panel">
+        <div class="panel-header">
+          <h2>Project Hub</h2>
+        </div>
+        <div class="link-list">
+          <a href="/health" target="_blank" rel="noreferrer">Health endpoint</a>
+          <a href="/reset" target="_blank" rel="noreferrer">Reset endpoint</a>
+          <a href="/tasks" target="_blank" rel="noreferrer">Task catalog JSON</a>
+          <a href="https://github.com/MS-Shamanth/recalltrace-openenv/tree/sham" target="_blank" rel="noreferrer">GitHub source</a>
+          <a href="https://huggingface.co/spaces/ms-shamanth/recalltrace-openenv/tree/main" target="_blank" rel="noreferrer">Space files</a>
+          <a href="https://www.docker.com/" target="_blank" rel="noreferrer">Docker runtime</a>
+          <a href="https://github.com/openenvai/openenv" target="_blank" rel="noreferrer">OpenEnv ecosystem</a>
+        </div>
+      </section>
+    </main>
+  </div>
+  <script src="/static/app.js?v=4"></script>
+</body>
+</html>

server/static/styles.css ADDED

	@@ -0,0 +1,499 @@

+:root {
+  --bg: #09111f;
+  --panel: rgba(16, 25, 40, 0.92);
+  --panel-strong: rgba(12, 20, 34, 0.98);
+  --text: #eef3ff;
+  --muted: #a8b4ca;
+  --border: rgba(255, 255, 255, 0.08);
+  --warning: #ff6f3c;
+  --warning-soft: rgba(255, 111, 60, 0.14);
+  --success: #38d39f;
+  --shadow: 0 24px 60px rgba(0, 0, 0, 0.4);
+}
+* {
+  box-sizing: border-box;
+}
+body {
+  margin: 0;
+  min-height: 100vh;
+  background:
+    radial-gradient(circle at top left, rgba(255, 111, 60, 0.18), transparent 30%),
+    radial-gradient(circle at top right, rgba(56, 211, 159, 0.14), transparent 26%),
+    linear-gradient(180deg, #08101d 0%, #050a14 100%);
+  color: var(--text);
+  font-family: "Space Grotesk", sans-serif;
+}
+.page-shell {
+  width: min(1280px, calc(100% - 32px));
+  margin: 32px auto 48px;
+}
+.hero,
+.panel {
+  border: 1px solid var(--border);
+  background: var(--panel);
+  box-shadow: var(--shadow);
+  backdrop-filter: blur(16px);
+}
+.hero {
+  display: grid;
+  grid-template-columns: 1.6fr 1fr;
+  gap: 24px;
+  padding: 28px;
+  border-radius: 28px;
+}
+.eyebrow {
+  display: inline-block;
+  margin-bottom: 12px;
+  color: var(--warning);
+  font-size: 0.9rem;
+  letter-spacing: 0.12em;
+  text-transform: uppercase;
+}
+h1, h2, h3 {
+  margin: 0;
+}
+h1 {
+  font-size: clamp(2.4rem, 6vw, 4.8rem);
+  line-height: 0.95;
+}
+.hero-text,
+.panel-header p,
+.task-summary p,
+.link-list,
+.all-results,
+.episode-log {
+  color: var(--muted);
+}
+.hero-text {
+  max-width: 60ch;
+  font-size: 1.08rem;
+  line-height: 1.6;
+}
+.badge-row {
+  display: flex;
+  flex-wrap: wrap;
+  gap: 10px;
+  margin-top: 18px;
+}
+.badge {
+  padding: 8px 12px;
+  border-radius: 999px;
+  background: rgba(255, 255, 255, 0.06);
+  border: 1px solid var(--border);
+  font-size: 0.92rem;
+}
+.hero-panel {
+  display: grid;
+  gap: 14px;
+}
+.metric-card,
+.score-card {
+  padding: 18px;
+  border-radius: 20px;
+  background: var(--panel-strong);
+  border: 1px solid var(--border);
+}
+.metric-card strong,
+.score-card strong {
+  display: block;
+  margin-top: 8px;
+  font-size: 1.25rem;
+  line-height: 1.3;
+}
+.metric-label,
+.score-card span,
+.field span {
+  color: var(--muted);
+  font-size: 0.95rem;
+}
+.dashboard-grid {
+  display: grid;
+  grid-template-columns: 1.1fr 0.9fr;
+  gap: 20px;
+  margin-top: 20px;
+}
+.panel {
+  padding: 24px;
+  border-radius: 24px;
+}
+.panel-accent {
+  background:
+    linear-gradient(180deg, rgba(255, 111, 60, 0.12), transparent 55%),
+    var(--panel);
+}
+.panel-wide {
+  grid-column: 1 / -1;
+}
+.panel-header {
+  margin-bottom: 18px;
+}
+.panel-header p {
+  margin-top: 8px;
+}
+.controls {
+  display: grid;
+  gap: 18px;
+}
+.field {
+  display: grid;
+  gap: 8px;
+}
+select,
+button {
+  font: inherit;
+}
+select {
+  padding: 14px 16px;
+  border-radius: 16px;
+  border: 1px solid var(--border);
+  background: rgba(7, 13, 24, 0.96);
+  color: var(--text);
+  font-weight: 600;
+  box-shadow: inset 0 0 0 1px rgba(255, 255, 255, 0.03);
+}
+select:focus {
+  outline: 2px solid rgba(255, 111, 60, 0.45);
+  outline-offset: 2px;
+}
+select option {
+  background: #0d1525;
+  color: var(--text);
+}
+.button-row {
+  display: flex;
+  flex-wrap: wrap;
+  gap: 12px;
+}
+.button {
+  border: none;
+  border-radius: 16px;
+  padding: 14px 18px;
+  cursor: pointer;
+  transition: transform 0.2s ease, opacity 0.2s ease, box-shadow 0.2s ease;
+}
+.button:hover {
+  transform: translateY(-1px);
+}
+.button-primary {
+  background: linear-gradient(135deg, #ff934f 0%, #ff6f3c 100%);
+  color: #fff;
+  box-shadow: 0 14px 32px rgba(255, 111, 60, 0.24);
+}
+.button-secondary {
+  background: rgba(255, 255, 255, 0.07);
+  color: var(--text);
+  border: 1px solid var(--border);
+}
+.button-ghost {
+  background: rgba(56, 211, 159, 0.12);
+  color: #dffff4;
+  border: 1px solid rgba(56, 211, 159, 0.24);
+}
+.task-summary {
+  margin-top: 18px;
+  padding: 18px;
+  border-radius: 18px;
+  background: rgba(255, 255, 255, 0.04);
+  border: 1px solid var(--border);
+}
+.task-summary h3 {
+  margin: 0 0 8px;
+}
+.score-grid {
+  display: grid;
+  grid-template-columns: repeat(2, minmax(0, 1fr));
+  gap: 12px;
+}
+.empty-state {
+  padding: 18px;
+  border: 1px dashed rgba(255, 255, 255, 0.16);
+  border-radius: 18px;
+  background: rgba(255, 255, 255, 0.03);
+}
+.episode-layout {
+  display: grid;
+  grid-template-columns: 460px minmax(0, 1fr);
+  gap: 22px;
+  align-items: start;
+}
+.episode-visuals {
+  display: grid;
+  gap: 18px;
+  position: sticky;
+  top: 16px;
+}
+.mini-panel {
+  padding: 18px;
+  border-radius: 20px;
+  background: var(--panel-strong);
+  border: 1px solid var(--border);
+}
+.episode-log,
+.all-results {
+  font-family: "IBM Plex Mono", monospace;
+  font-size: 0.93rem;
+  line-height: 1.6;
+  white-space: pre-wrap;
+}
+.episode-log {
+  max-height: 760px;
+  min-height: 760px;
+  overflow-y: auto;
+  overflow-x: hidden;
+  padding: 22px;
+  border-radius: 20px;
+  background: var(--panel-strong);
+  border: 1px solid var(--border);
+}
+.all-results {
+  max-height: 240px;
+  overflow-y: auto;
+  padding-right: 10px;
+}
+.reward-chart {
+  min-height: 240px;
+  padding: 12px 8px 8px;
+  border-radius: 18px;
+  background: rgba(255, 255, 255, 0.03);
+  border: 1px solid var(--border);
+}
+.reward-chart svg {
+  display: block;
+  width: 100%;
+  height: 240px;
+}
+.chart-axis {
+  stroke: rgba(255, 255, 255, 0.15);
+  stroke-width: 1;
+}
+.chart-grid {
+  stroke: rgba(255, 255, 255, 0.08);
+  stroke-width: 1;
+  stroke-dasharray: 4 4;
+}
+.chart-line {
+  fill: none;
+  stroke: #38d39f;
+  stroke-width: 3;
+  stroke-linecap: round;
+  stroke-linejoin: round;
+}
+.chart-point {
+  fill: #ff6f3c;
+  stroke: #fff;
+  stroke-width: 2;
+}
+.chart-label {
+  fill: #a8b4ca;
+  font-size: 11px;
+  font-family: "IBM Plex Mono", monospace;
+}
+.final-summary {
+  display: grid;
+  gap: 12px;
+}
+.summary-card {
+  padding: 14px;
+  border-radius: 16px;
+  background: rgba(255, 255, 255, 0.04);
+  border: 1px solid var(--border);
+}
+.summary-card strong {
+  display: block;
+  margin-bottom: 6px;
+  font-size: 0.96rem;
+}
+.summary-grid {
+  display: grid;
+  grid-template-columns: repeat(2, minmax(0, 1fr));
+  gap: 10px;
+}
+.summary-pill {
+  padding: 12px;
+  border-radius: 14px;
+  background: rgba(255, 255, 255, 0.05);
+  border: 1px solid var(--border);
+}
+.summary-pill span {
+  display: block;
+  color: var(--muted);
+  font-size: 0.82rem;
+  margin-bottom: 6px;
+}
+.summary-pill strong {
+  font-size: 1rem;
+}
+.episode-log::-webkit-scrollbar,
+.all-results::-webkit-scrollbar {
+  width: 10px;
+}
+.episode-log::-webkit-scrollbar-thumb,
+.all-results::-webkit-scrollbar-thumb {
+  background: rgba(255, 255, 255, 0.14);
+  border-radius: 999px;
+}
+.log-step {
+  padding: 18px 0;
+  border-bottom: 1px solid rgba(255, 255, 255, 0.06);
+}
+.log-step:first-child {
+  padding-top: 0;
+}
+.log-step:last-child {
+  border-bottom: none;
+  padding-bottom: 0;
+}
+.log-step strong {
+  color: var(--text);
+}
+.log-title {
+  display: flex;
+  justify-content: space-between;
+  gap: 12px;
+  align-items: center;
+  margin-bottom: 10px;
+}
+.action-chip {
+  padding: 4px 10px;
+  border-radius: 999px;
+  background: var(--warning-soft);
+  color: #ffd6c5;
+  border: 1px solid rgba(255, 111, 60, 0.22);
+  font-size: 0.76rem;
+  text-transform: uppercase;
+  letter-spacing: 0.08em;
+  white-space: nowrap;
+}
+.action-meta {
+  display: grid;
+  gap: 8px;
+  color: var(--muted);
+}
+.highlight-stack {
+  display: grid;
+  gap: 12px;
+}
+.highlight-card {
+  padding: 16px;
+  border-radius: 18px;
+  background: rgba(255, 255, 255, 0.04);
+  border: 1px solid var(--border);
+}
+.highlight-card p {
+  margin: 8px 0 0;
+  color: var(--muted);
+  line-height: 1.6;
+}
+.highlight-title {
+  color: var(--text);
+  font-weight: 700;
+}
+.link-list {
+  display: grid;
+  gap: 12px;
+}
+.link-list a {
+  color: #ffd7c7;
+  text-decoration: none;
+}
+.link-list a:hover {
+  text-decoration: underline;
+}
+@media (max-width: 1100px) {
+  .episode-layout {
+    grid-template-columns: 1fr;
+  }
+  .episode-visuals {
+    position: static;
+  }
+}
+@media (max-width: 960px) {
+  .hero,
+  .dashboard-grid,
+  .summary-grid,
+  .score-grid {
+    grid-template-columns: 1fr;
+  }
+  .episode-log {
+    min-height: 520px;
+    max-height: 520px;
+  }
+}

tests/test_env.py ADDED

	@@ -0,0 +1,72 @@

+"""Unit tests for RecallTrace."""
+
+from __future__ import annotations
+
+import unittest
+
+from env.env import RecallTraceEnv
+from grader.grader import evaluate_action_plan
+
+
+class RecallTraceEnvTests(unittest.TestCase):
+    def test_phase1_plan_scores_high(self) -> None:
+        grade = evaluate_action_plan(
+            "phase1_direct_recall",
+            [
+                {"type": "trace_lot", "lot_id": "LotA"},
+                {"type": "inspect_node", "node_id": "warehouse"},
+                {"type": "inspect_node", "node_id": "store1"},
+                {"type": "inspect_node", "node_id": "store2"},
+                {"type": "quarantine", "node_id": "warehouse", "lot_id": "LotA", "quantity": 100},
+                {"type": "quarantine", "node_id": "store1", "lot_id": "LotA", "quantity": 50},
+                {"type": "quarantine", "node_id": "store2", "lot_id": "LotA", "quantity": 20},
+                {"type": "notify", "node_id": "all"},
+                {"type": "finalize"},
+            ],
+        )
+        self.assertGreaterEqual(grade.score, 0.95)
+        self.assertTrue(grade.success)
+
+    def test_phase2_trace_reveals_relabels(self) -> None:
+        env = RecallTraceEnv(task_id="phase2_relabel_recall")
+        env.reset()
+        observation, reward, done, info = env.step({"type": "trace_lot", "lot_id": "LotA"})
+        self.assertFalse(done)
+        self.assertGreater(reward, 0)
+        self.assertEqual(info["matched_lots"], ["LotA", "LotA_R1", "LotA_R2"])
+        self.assertIn("store3", observation.trace_results["LotA"]["affected_nodes"])
+
+    def test_phase3_mixed_inventory_requires_exact_quarantine(self) -> None:
+        env = RecallTraceEnv(task_id="phase3_mixed_shipments")
+        env.reset()
+        env.step({"type": "trace_lot", "lot_id": "LotA"})
+        env.step({"type": "inspect_node", "node_id": "crossdock"})
+        _, reward, _, info = env.step({"type": "quarantine", "node_id": "crossdock", "lot_id": "LotBlend", "quantity": 15})
+        self.assertLess(reward, 0)
+        self.assertEqual(info["target_contaminated_quantity"], 12)
+
+    def test_phase3_full_plan_scores_high(self) -> None:
+        grade = evaluate_action_plan(
+            "phase3_mixed_shipments",
+            [
+                {"type": "trace_lot", "lot_id": "LotA"},
+                {"type": "inspect_node", "node_id": "warehouse"},
+                {"type": "inspect_node", "node_id": "crossdock"},
+                {"type": "inspect_node", "node_id": "store1"},
+                {"type": "inspect_node", "node_id": "store2"},
+                {"type": "inspect_node", "node_id": "store3"},
+                {"type": "quarantine", "node_id": "warehouse", "lot_id": "LotA", "quantity": 30},
+                {"type": "quarantine", "node_id": "crossdock", "lot_id": "LotBlend", "quantity": 12},
+                {"type": "quarantine", "node_id": "store1", "lot_id": "LotA", "quantity": 10},
+                {"type": "quarantine", "node_id": "store2", "lot_id": "LotBlend", "quantity": 8},
+                {"type": "quarantine", "node_id": "store3", "lot_id": "LotBlend", "quantity": 4},
+                {"type": "notify", "node_id": "all"},
+                {"type": "finalize"},
+            ],
+        )
+        self.assertGreaterEqual(grade.score, 0.95)
+        self.assertTrue(grade.final_info["all_affected_stock_quarantined"])
+
+
+if __name__ == "__main__":
+    unittest.main()
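
Note on `test_phase2_trace_reveals_relabels`: the test expects a trace of `LotA` to also match `LotA_R1` and `LotA_R2`. The scenario code that produces this is not part of this hunk, so the following is only a minimal sketch of one way such a trace could work, assuming relabels are stored as a mapping from an original lot to the lots it was repacked into. The names `RELABELS` and `trace_lot` are hypothetical and are not taken from the repository.

```python
from collections import deque

# Hypothetical relabel map: original lot -> lots it was repacked into.
RELABELS = {
    "LotA": ["LotA_R1", "LotA_R2"],
    "LotB": [],
}


def trace_lot(lot_id: str, relabels: dict[str, list[str]]) -> list[str]:
    """Return lot_id plus every lot reachable through relabel edges, sorted."""
    seen = {lot_id}
    queue = deque([lot_id])
    while queue:
        current = queue.popleft()
        # Follow each relabel edge once; the seen set guards against cycles.
        for child in relabels.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)
```

A breadth-first walk keeps the sketch correct even if a relabeled lot is itself relabeled again further down the chain, which a single-level lookup would miss.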

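Note on `test_phase3_mixed_inventory_requires_exact_quarantine`: the test asserts that quarantining 15 units of `LotBlend` at `crossdock` yields a negative reward when only 12 units are contaminated. The environment's actual reward function is not shown in this section. The toy rule below illustrates that kind of exact-quantity check; the function name `quarantine_reward`, the reward magnitudes, and the choice to also penalise under-quarantine are all assumptions made for illustration.

```python
def quarantine_reward(requested: int, contaminated: int) -> float:
    """Toy rule: reward an exact quarantine, penalise any mismatch.

    The magnitudes are illustrative assumptions, not the environment's values.
    """
    if requested <= 0:
        raise ValueError("requested quantity must be positive")
    if requested == contaminated:
        return 1.0
    # Penalty grows with the size of the mismatch, capped at -1.0.
    mismatch = abs(requested - contaminated)
    return -min(1.0, mismatch / max(contaminated, 1))
```

Under this sketch, a request for 15 units against 12 contaminated units is penalised, while a request for exactly 12 earns the full reward, which mirrors the direction of the assertions in the test.
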
uv.toml ADDED

	@@ -0,0 +1,3 @@

+no-cache = true
+python-preference = "only-system"
+python-downloads = "never"