Spaces: Torchflow1 / Multi-Agent-Incident-Command-Center


SwapnilPatil28 committed 24 days ago

Commit eb2d131 (verified) · 1 Parent(s): 906a5a5

New Upgrade to Multi-Agent Incident Command Center

Files changed (18)

.gitignore +5 -0
README.md +80 -40
__init__.py +6 -6
client.py +24 -13
inference.py +202 -50
models.py +58 -18
openenv.yaml +6 -6
pre_validate.sh +2 -0
pyproject.toml +15 -18
requirements.txt +10 -4
server/Dockerfile +3 -3
server/__init__.py +1 -0
server/app.py +18 -11
server/environment.py +501 -46
server/requirements.txt +1 -1
server/support_env_environment.py +5 -0
train_trl.py +194 -0
validate-submission.sh +5 -0

.gitignore ADDED

	@@ -0,0 +1,5 @@

+__pycache__/
+*.pyc
+.venv/
+artifacts/
+outputs/

README.md CHANGED

@@ -1,8 +1,8 @@
 ---
-title: Support Ticket Routing
-emoji: 🎫
-colorFrom: blue
-colorTo: indigo
 sdk: docker
 pinned: false
 app_port: 8000
@@ -11,65 +11,105 @@ tags:
   - openenv
   - reinforcement-learning
   - llm-agents
 ---
-# 🎫 Customer Support Ticket Routing Environment
-## 📝 Description and Motivation
-This environment simulates a production-grade customer support triage system. Automated agents are tasked with analyzing raw customer queries and routing them to the appropriate department: **Billing**, **Tech**, or **Sales**.
-In real-world scenarios, misrouting leads to high churn and operational costs. This benchmark measures the ability of LLM-based agents to perform high-precision classification in a restricted environment compliant with the `openenv-core` SDK.
-## 🎯 Environment Specification
 ### Action Space
-- `action_type`: Literal["route", "search"]
-- `department`: Optional[str] — Required for `route` action. Valid values: `"Billing"`, `"Tech"`, `"Sales"`.
 ### Observation Space
-- `ticket_id`: Unique tracking ID (e.g., T1, T4).
-- `content`: The raw text string of the customer's request.
-- `search_result`: Contextual data retrieved from the internal database (if the `search` action is invoked).
-- `available_departments`: A list of valid routing targets.
 ### Reward Function
-To facilitate stable training and clear evaluation metrics, this environment uses **strictly bounded rewards**:
-- **0.99**: Correct Department Routing.
-- **0.01**: Incorrect Department Routing.
-- **-0.05**: Search Penalty (Encourages efficiency unless context is truly needed).
-## 🏁 Tasks and Difficulty
-| Task ID | Tickets | Description |
-| :--- | :--- | :--- |
-| `easy` | 1 | Clear keywords (e.g., "Refund", "Invoice"). |
-| `medium` | 2 | Standard conversational support language. |
-| `hard` | 3 | Complex queries involving API logs and technical stack traces. |
-## 🚀 Setup & Benchmarking
-### 1. Installation
 ```bash
-pip install openenv-core uvicorn openai
 ```
-### 2. Run Local Validation
-Ensure your local setup matches the competition requirements:
 ```bash
-openenv validate
 ```
-### 3. Run Baseline Inference
-Execute the provided baseline using the Hugging Face Router and the Qwen2.5-72B model:
 ```bash
-export HF_TOKEN="your_huggingface_token"
 python inference.py
 ```
-## 🛠️ Technical Architecture
-- **Backend**: Python FastAPI serving `openenv-core` compatible endpoints.
-- **Infrastructure**: Containerized deployment via Docker on Hugging Face Spaces.
-- **Models**: Pydantic-based state and action validation.
 ---
-*Submission for the Scaler Meta PyTorch Hackathon.*
-*Environment ID: `support_env` | Powered by OpenEnv SDK.*

 ---
+title: Multi-Agent Incident Command Center
+emoji: 🚨
+colorFrom: red
+colorTo: purple
 sdk: docker
 pinned: false
 app_port: 8000
   - openenv
   - reinforcement-learning
   - llm-agents
+  - multi-agent
+  - long-horizon
 ---
+# 🚨 Multi-Agent Incident Command Center (OpenEnv Round 2)
+## Problem and Motivation
+This environment simulates incident management for a modern software platform under real operational constraints.
+The agent must coordinate multiple specialist roles and resolve incidents over long trajectories with partial observability, action costs, and SLA pressure. This targets Round-2 themes:
+- **Theme #1 Multi-Agent Interactions**: triage, investigator, and ops-manager role coordination
+- **Theme #3.1 World Modeling (Professional Tasks)**: realistic logs/metrics/KB workflows
+- **Theme #2 Long-Horizon Planning**: delayed rewards, carry-over constraints, budget-limited sessions
+## Environment Design
 ### Action Space
+- `inspect_logs(target)`
+- `inspect_metrics(target)`
+- `consult_kb(target)`
+- `negotiate_handoff(target)` where target is one of:
+  - `triage_agent`
+  - `investigator_agent`
+  - `ops_manager_agent`
+- `apply_fix(resolution_summary)`
+- `close_incident(root_cause, resolution_summary)`
 ### Observation Space
+- `incident_id`, `incident_title`, `incident_description`
+- `visible_signals` (partial clues)
+- `available_actions`, `available_teams`
+- `budget_remaining`, `sla_minutes_remaining`, `incidents_remaining`
+- `terminal_output` (response from world/tool execution)
 ### Reward Function
+- Dense shaping with delayed completion rewards:
+  - Small penalty for investigation actions to discourage brute-force scanning
+  - Positive reward for discovering new root-cause evidence
+  - Bonus for correct specialist handoff
+  - Positive reward for effective mitigation
+  - Large terminal reward for correct closure (with additional speed bonus)
+  - Strong negative reward for wrong closure, SLA exhaustion, or budget exhaustion
+## Task Levels
+- `easy`: 2 incidents
+- `medium`: 3 incidents
+- `hard`: 4 incidents with stricter planning requirements
+## Local Setup
 ```bash
+python -m venv .venv
+# Windows PowerShell:
+.venv\Scripts\Activate.ps1
+pip install -r requirements.txt
 ```
+### Run environment
 ```bash
+python -m server.app
 ```
+### Run baseline inference
 ```bash
 python inference.py
 ```
+### OpenEnv validation
+```bash
+openenv validate
+```
+## Training Script (TRL)
+This repo includes `train_trl.py` for minimum Round-2 training evidence using Hugging Face TRL.
+It does:
+1. Roll out trajectories from a baseline coordinator
+2. Convert trajectories into SFT-style chat examples
+3. Train a compact model with `SFTTrainer`
+4. Evaluate random vs heuristic policy and save plots
+```bash
+python train_trl.py
+```
+Artifacts are written to `artifacts/`:
+- `reward_curve.png`
+- `summary_metrics.json`
+## Hugging Face Space
+After testing locally, deploy this repo as a Docker Space and set `app_port=8000`.
+## Submission Checklist
+- [ ] OpenEnv latest runtime and `openenv validate` passing
+- [ ] HF Space URL live and reachable
+- [ ] `train_trl.py` (or Colab equivalent) run with real outputs
+- [ ] Reward/loss plot images committed and linked
+- [ ] 2-minute demo video/blog link added
+- [ ] README links all artifacts and references
 ---
+*Environment ID: `incident_command_center_env`*
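
Reviewer note: the README's "Reward Function" list names the shaping terms but not their magnitudes, and the relevant part of `server/environment.py` is not visible in this view. The sketch below shows one way those terms could combine into a per-step reward. Every constant, the `StepOutcome` type, and the `shaped_reward` function are hypothetical illustrations, not values or names taken from the commit.

```python
from dataclasses import dataclass

# All magnitudes are hypothetical placeholders, NOT values from server/environment.py.
INVESTIGATION_COST = -0.02
NEW_CLUE_REWARD = 0.10
HANDOFF_BONUS = 0.15
MITIGATION_REWARD = 0.30
CORRECT_CLOSURE_REWARD = 1.00
MAX_SPEED_BONUS = 0.25
FAILURE_PENALTY = -1.00


@dataclass
class StepOutcome:
    """Hypothetical summary of what one environment step achieved."""
    action_type: str
    found_new_clue: bool = False
    correct_handoff: bool = False
    effective_fix: bool = False
    correct_closure: bool = False
    sla_fraction_left: float = 0.0  # 0.0..1.0, only used on closure
    exhausted: bool = False  # budget or SLA ran out


def shaped_reward(outcome: StepOutcome) -> float:
    """Combine the shaping terms listed in the README into one scalar."""
    if outcome.exhausted:
        return FAILURE_PENALTY
    if outcome.action_type == "close_incident":
        if not outcome.correct_closure:
            return FAILURE_PENALTY
        # Terminal reward plus a speed bonus that scales with remaining SLA.
        return CORRECT_CLOSURE_REWARD + MAX_SPEED_BONUS * outcome.sla_fraction_left
    reward = 0.0
    if outcome.action_type in {"inspect_logs", "inspect_metrics", "consult_kb"}:
        reward += INVESTIGATION_COST  # discourage brute-force scanning
        if outcome.found_new_clue:
            reward += NEW_CLUE_REWARD
    elif outcome.action_type == "negotiate_handoff" and outcome.correct_handoff:
        reward += HANDOFF_BONUS
    elif outcome.action_type == "apply_fix" and outcome.effective_fix:
        reward += MITIGATION_REWARD
    return reward
```

With placeholders like these, an investigation step that finds nothing is slightly negative, one that finds a clue is net positive, and a wrong closure costs as much as a correct one earns, which matches the qualitative ordering the README describes.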

__init__.py CHANGED

@@ -4,13 +4,13 @@
 # This source code is licensed under the BSD-style license found in the
 # LICENSE file in the root directory of this source tree.
-"""Support Env Environment."""
-from .client import SupportEnv
-from .models import SupportAction, SupportObservation
 __all__ = [
-    "SupportAction",
-    "SupportObservation",
-    "SupportEnv",
 ]

 # This source code is licensed under the BSD-style license found in the
 # LICENSE file in the root directory of this source tree.
+"""Incident Command Center environment."""
+from .client import IncidentCommandEnvClient
+from .models import IncidentAction, IncidentObservation
 __all__ = [
+    "IncidentAction",
+    "IncidentObservation",
+    "IncidentCommandEnvClient",
 ]

client.py CHANGED

@@ -1,26 +1,37 @@
 from openenv.core.env_client import EnvClient
 from openenv.core.client_types import StepResult
-from models import SREAction, SREObservation, SREState
-class SREEnvClient(EnvClient[SREAction, SREObservation, SREState]):
-    def _step_payload(self, action: SREAction) -> dict:
         return action.model_dump(exclude_none=True)
     def _parse_result(self, payload: dict) -> StepResult:
         obs_data = payload.get("observation", {})
-        # Unpacking the new SRE variables safely just like your original code did
-        observation = SREObservation(
-            ticket_id=obs_data.get("ticket_id", ""),
-            content=obs_data.get("content", ""),
-            terminal_output=obs_data.get("terminal_output", "")
         )
         return StepResult(
             observation=observation,
             reward=payload.get("reward", 0.0),
-            done=payload.get("done", False)
         )
-    def _parse_state(self, payload: dict) -> SREState:
-        return SREState(**payload)

 from openenv.core.env_client import EnvClient
 from openenv.core.client_types import StepResult
+from models import IncidentAction, IncidentObservation, IncidentState
+class IncidentCommandEnvClient(EnvClient[IncidentAction, IncidentObservation, IncidentState]):
+    def _step_payload(self, action: IncidentAction) -> dict:
         return action.model_dump(exclude_none=True)
     def _parse_result(self, payload: dict) -> StepResult:
         obs_data = payload.get("observation", {})
+        observation = IncidentObservation(
+            incident_id=obs_data.get("incident_id", ""),
+            incident_title=obs_data.get("incident_title", ""),
+            incident_description=obs_data.get("incident_description", ""),
+            available_actions=obs_data.get("available_actions", []),
+            available_teams=obs_data.get("available_teams", []),
+            visible_signals=obs_data.get("visible_signals", []),
+            terminal_output=obs_data.get("terminal_output", ""),
+            budget_remaining=obs_data.get("budget_remaining", 0),
+            sla_minutes_remaining=obs_data.get("sla_minutes_remaining", 0),
+            incidents_remaining=obs_data.get("incidents_remaining", 0),
         )
         return StepResult(
             observation=observation,
             reward=payload.get("reward", 0.0),
+            done=payload.get("done", False),
         )
+    def _parse_state(self, payload: dict) -> IncidentState:
+        return IncidentState(**payload)
+# Backward-compatible alias for older imports.
+SREEnvClient = IncidentCommandEnvClient
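
Reviewer note: the `StepResult` fields parsed above (`observation`, `reward`, `done`) are consumed by a driver loop. The sketch below shows such a loop against an in-memory stand-in so it runs without a server. `FakeEnv`, `FakeResult`, and `run_episode` are hypothetical names, not part of `openenv` or this repo. It initialises `score` before the `try`, whereas the new `run_task` in `inference.py` assigns `score` only inside the `try` and then reads it in `finally`, which would likely raise if `reset` fails.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeResult:
    """Hypothetical stand-in for openenv's StepResult."""
    observation: dict
    reward: Optional[float]
    done: bool


class FakeEnv:
    """Hypothetical in-memory environment that ends after three steps."""

    def __init__(self) -> None:
        self._steps = 0

    def reset(self, task_name: str) -> FakeResult:
        self._steps = 0
        return FakeResult({"incident_id": "INC-1"}, None, False)

    def step(self, action: dict) -> FakeResult:
        self._steps += 1
        done = self._steps >= 3
        return FakeResult({"incident_id": "INC-1"}, 1.0 if done else -0.05, done)

    def close(self) -> None:
        pass


def run_episode(env: FakeEnv, task_name: str) -> float:
    """Run one episode and return the mean per-step reward."""
    rewards: List[float] = []
    score = 0.0  # defined up front so cleanup/logging never sees an unbound name
    try:
        res = env.reset(task_name=task_name)
        while not res.done:
            res = env.step({"action_type": "inspect_logs"})
            rewards.append(float(res.reward or 0.0))
        score = sum(rewards) / len(rewards) if rewards else 0.0
    finally:
        env.close()
    return score
```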

inference.py CHANGED

@@ -1,83 +1,235 @@
-import os
 import asyncio
-from typing import List, Optional
-from openai import OpenAI
-from client import SupportEnvClient, SupportAction
-# 1. Mandatory Environment Variables
-HF_TOKEN = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
-API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
-MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
-ENV_URL = os.getenv("ENV_URL", "https://swapnilpatil28-support-env.hf.space")
-BENCHMARK = "support_env"
-# 2. Logging Helpers (Exactly per Sample Script)
-def log_start(task: str, env: str, model: str) -> None:
-    print(f"[START] task={task} env={env} model={model}", flush=True)
 def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
     error_val = error if error else "null"
     done_val = str(done).lower()
-    print(f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}", flush=True)
 def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
     rewards_str = ",".join(f"{r:.2f}" for r in rewards)
-    print(f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}", flush=True)
-# 3. Model Interaction Logic
-def get_model_action(client: OpenAI, ticket_content: str) -> str:
-    try:
-        prompt = f"Ticket: {ticket_content}. Reply with ONE word: Billing, Tech, or Sales."
-        completion = client.chat.completions.create(
-            model=MODEL_NAME,
-            messages=[{"role": "user", "content": prompt}],
-            temperature=0.7,
-            max_tokens=10
         )
-        return completion.choices[0].message.content.strip().strip('.')
-    except Exception as e:
-        return "Tech" # Fallback
 async def run_task(task_name: str):
-    client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)
-    env = SupportEnvClient(base_url=ENV_URL).sync() # Sync wrapper used for simplicity
-    log_start(task=task_name, env=BENCHMARK, model=MODEL_NAME)
-    rewards = []
     steps_taken = 0
-    score = 0.0
     success = False
     try:
-        # Initial Reset
         res = env.reset(task_name=task_name)
         while not res.done:
             steps_taken += 1
-            action_str = get_model_action(client, res.observation.content)
-            # Step in environment
-            res = env.step(SupportAction(action_type="route", department=action_str))
             reward = float(res.reward or 0.0)
             rewards.append(reward)
-            log_step(step=steps_taken, action=action_str, reward=reward, done=res.done, error=None)
-        # Scoring Logic (Normalized [0,1])
         score = sum(rewards) / len(rewards) if rewards else 0.0
-        score = min(max(score, 0.0), 1.0)
-        success = score > 0.5
     finally:
         try:
             env.close()
-        except:
             pass
         log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
-if __name__ == "__main__":
-    # Iterate through tasks sequentially
     for task in ["easy", "medium", "hard"]:
         asyncio.run(run_task(task))

 import asyncio
+import os
+import random
+from typing import Dict, List, Optional
+from client import IncidentCommandEnvClient
+from models import IncidentAction
+ENV_URL = os.getenv("ENV_URL", "http://127.0.0.1:8000")
+BENCHMARK = "incident_command_center_env"
+RANDOM_BASELINE = os.getenv("RANDOM_BASELINE", "false").lower() == "true"
+def log_start(task: str, env: str, policy: str) -> None:
+    print(f"[START] task={task} env={env} policy={policy}", flush=True)
 def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
     error_val = error if error else "null"
     done_val = str(done).lower()
+    print(
+        f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
+        flush=True,
+    )
 def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
     rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+    print(
+        f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}",
+        flush=True,
+    )
+class HeuristicCoordinator:
+    """Simple policy for baseline demonstrations and offline data generation."""
+    def __init__(self) -> None:
+        self._phase_by_incident: Dict[str, int] = {}
+        self._suspects_by_incident: Dict[str, str] = {}
+    def select_action(self, observation) -> IncidentAction:
+        incident_id = observation.incident_id
+        text = (
+            f"{observation.incident_title} {observation.incident_description} "
+            f"{' '.join(observation.visible_signals)} {observation.terminal_output}"
+        ).lower()
+        phase = self._phase_by_incident.get(incident_id, 0)
+        if phase == 0:
+            self._phase_by_incident[incident_id] = 1
+            return IncidentAction(
+                actor="triage_agent",
+                action_type="inspect_logs",
+                target=self._pick_log_target(text),
+            )
+        if phase == 1:
+            self._phase_by_incident[incident_id] = 2
+            return IncidentAction(
+                actor="investigator_agent",
+                action_type="inspect_metrics",
+                target=self._pick_metric_target(text),
+            )
+        if phase == 2:
+            self._phase_by_incident[incident_id] = 3
+            owner = self._pick_owner(text)
+            return IncidentAction(
+                actor="ops_manager_agent",
+                action_type="negotiate_handoff",
+                target=owner,
+            )
+        if phase == 3:
+            self._phase_by_incident[incident_id] = 4
+            guess = self._infer_root_cause(text)
+            self._suspects_by_incident[incident_id] = guess
+            return IncidentAction(
+                actor="investigator_agent",
+                action_type="apply_fix",
+                resolution_summary=self._generate_fix_plan(guess),
+            )
+        guess = self._suspects_by_incident.get(incident_id, self._infer_root_cause(text))
+        return IncidentAction(
+            actor="ops_manager_agent",
+            action_type="close_incident",
+            root_cause=guess,
+            resolution_summary=f"Closed with hypothesis {guess}.",
         )
+    def _pick_log_target(self, text: str) -> str:
+        mapping = {
+            "checkout": "payments-api",
+            "login": "auth-service",
+            "catalog": "catalog-api",
+            "shipment": "route-planner",
+            "invoice": "billing-worker",
+            "cascade": "notification-gateway",
+            "export": "export-worker",
+            "alert": "alert-router",
+            "inventory": "inventory-ledger",
+        }
+        return self._pick_from_mapping(text, mapping, "auth-service")
+    def _pick_metric_target(self, text: str) -> str:
+        mapping = {
+            "checkout": "dash-redis",
+            "login": "dash-auth",
+            "catalog": "dash-kafka",
+            "shipment": "dash-eta",
+            "invoice": "dash-billing",
+            "cascade": "dash-notify",
+            "export": "dash-export",
+            "alert": "dash-alerts",
+            "inventory": "dash-inventory",
+        }
+        return self._pick_from_mapping(text, mapping, "dash-global")
+    def _pick_owner(self, text: str) -> str:
+        if any(token in text for token in ["deploy", "rate", "sla", "rotation"]):
+            return "ops_manager_agent"
+        if any(token in text for token in ["schema", "export", "cache", "inventory"]):
+            return "investigator_agent"
+        return "triage_agent"
+    def _infer_root_cause(self, text: str) -> str:
+        if "redis" in text and "pool" in text:
+            return "redis_connection_pool_exhausted"
+        if "jwt" in text or "token" in text:
+            return "jwt_clock_skew_mismatch"
+        if "cache" in text and "invalidation" in text:
+            return "cache_invalidation_topic_lag"
+        if "timezone" in text or "offset" in text:
+            return "timezone_normalization_bug"
+        if "idempotency" in text or "duplicate invoice" in text:
+            return "idempotency_key_regression"
+        if "429" in text or "promo" in text:
+            return "rate_limit_misconfigured_for_promo_segment"
+        if "schema" in text and "drift" in text:
+            return "schema_version_drift"
+        if "dedupe" in text or "alert storm" in text:
+            return "dedupe_rule_disabled"
+        if "out-of-order" in text or "oversell" in text:
+            return "event_ordering_race_condition"
+        return "unknown"
+    def _generate_fix_plan(self, root_cause: str) -> str:
+        fixes = {
+            "redis_connection_pool_exhausted": "increase redis pool and recycle stale connections",
+            "jwt_clock_skew_mismatch": "sync clock tolerance and increase jwt leeway",
+            "cache_invalidation_topic_lag": "scale invalidation consumer and replay partition 3",
+            "timezone_normalization_bug": "patch timezone parser and use iana timezone map",
+            "idempotency_key_regression": "restore idempotency guard and persist retry token first",
+            "rate_limit_misconfigured_for_promo_segment": "hotfix promo segment rate limits and enable exponential backoff",
+            "schema_version_drift": "enforce schema negotiation and pin serializer to v11",
+            "dedupe_rule_disabled": "restore dedupe rule and replay critical fingerprints",
+            "event_ordering_race_condition": "enable sequence guards and quarantine out-of-order events",
+        }
+        return fixes.get(root_cause, "collect additional diagnostics and rollback last change")
+    def _pick_from_mapping(self, text: str, mapping: Dict[str, str], default: str) -> str:
+        for token, value in mapping.items():
+            if token in text:
+                return value
+        return default
+def random_action(observation) -> IncidentAction:
+    action_type = random.choice(observation.available_actions or ["inspect_logs"])
+    teams = observation.available_teams or ["triage_agent", "investigator_agent", "ops_manager_agent"]
+    actor = random.choice(teams)
+    random_target = random.choice(
+        [
+            "payments-api",
+            "auth-service",
+            "dash-auth",
+            "dash-redis",
+            "kb-rate-limits",
+            "investigator_agent",
+        ]
+    )
+    return IncidentAction(
+        actor=actor,
+        action_type=action_type,
+        target=random_target,
+        root_cause="unknown",
+        resolution_summary="random baseline action",
+    )
 async def run_task(task_name: str):
+    env = IncidentCommandEnvClient(base_url=ENV_URL).sync()
+    policy_name = "random_baseline" if RANDOM_BASELINE else "heuristic_coordinator"
+    coordinator = HeuristicCoordinator()
+    log_start(task=task_name, env=BENCHMARK, policy=policy_name)
+    rewards: List[float] = []
     steps_taken = 0
     success = False
     try:
         res = env.reset(task_name=task_name)
         while not res.done:
             steps_taken += 1
+            action = random_action(res.observation) if RANDOM_BASELINE else coordinator.select_action(
+                res.observation
+            )
+            res = env.step(action)
             reward = float(res.reward or 0.0)
             rewards.append(reward)
+            log_step(
+                step=steps_taken,
+                action=f"{action.actor}:{action.action_type}:{action.target or '-'}",
+                reward=reward,
+                done=res.done,
+                error=None,
+            )
         score = sum(rewards) / len(rewards) if rewards else 0.0
+        success = score > 0.2
     finally:
         try:
             env.close()
+        except Exception:
             pass
         log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+def main() -> None:
     for task in ["easy", "medium", "hard"]:
         asyncio.run(run_task(task))
+if __name__ == "__main__":
+    main()
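
Reviewer note: `HeuristicCoordinator` relies on plain substring tests (`token in text`) and on dictionary insertion order. The sketch below reproduces `_pick_from_mapping` as a free function to make two consequences visible: the first key in the mapping wins regardless of word order in the text, and short tokens such as `"rate"` also match inside longer words. `LOG_TARGETS` is a shortened copy of the mapping for illustration, and `has_word` is a hypothetical word-boundary alternative, not something the commit contains.

```python
import re
from typing import Dict


def pick_from_mapping(text: str, mapping: Dict[str, str], default: str) -> str:
    """Same logic as HeuristicCoordinator._pick_from_mapping in the diff."""
    for token, value in mapping.items():
        if token in text:
            return value
    return default


# Shortened copy of the log-target mapping, kept in the same order.
LOG_TARGETS = {
    "checkout": "payments-api",
    "login": "auth-service",
    "invoice": "billing-worker",
}


def has_word(text: str, token: str) -> bool:
    """Hypothetical alternative: match the token only as a whole word."""
    return re.search(rf"\b{re.escape(token)}\b", text) is not None


# "login" appears first in the text, but "checkout" comes first in the mapping.
mixed = "login page errors after checkout deploy"
chosen = pick_from_mapping(mixed, LOG_TARGETS, "auth-service")

# Substring matching fires on "generate"; the word-boundary check does not.
substring_hit = "rate" in "failed to generate export"
word_hit = has_word("failed to generate export", "rate")
```

Whether the looser matching matters depends on the incident texts in `server/environment.py`; if it does, a word-boundary check like `has_word` is one option for `_pick_owner` and `_infer_root_cause`.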

models.py CHANGED

@@ -1,18 +1,58 @@
-from typing import Optional, Literal
-from openenv.core.env_server import Action, Observation, State
-from pydantic import Field
-class SREAction(Action):
-    action_type: Literal["query_logs", "check_metrics", "resolve_ticket"] = Field(..., description="Action to take")
-    service_name: Optional[str] = Field(None, description="Required for query_logs")
-    dashboard_id: Optional[str] = Field(None, description="Required for check_metrics")
-    root_cause: Optional[str] = Field(None, description="Required for resolve_ticket")
-class SREObservation(Observation):
-    ticket_id: str
-    content: str
-    terminal_output: str = ""
-class SREState(State):
-    task_id: str = "easy"
-    current_ticket_index: int = 0

+from typing import Dict, List, Literal, Optional
+from openenv.core.env_server import Action, Observation, State
+from pydantic import Field
+class IncidentAction(Action):
+    action_type: Literal[
+        "inspect_logs",
+        "inspect_metrics",
+        "consult_kb",
+        "negotiate_handoff",
+        "apply_fix",
+        "close_incident",
+    ] = Field(..., description="The action selected by the acting agent.")
+    target: Optional[str] = Field(
+        None,
+        description="Service/dashboard/knowledge id depending on action_type.",
+    )
+    root_cause: Optional[str] = Field(
+        None,
+        description="Predicted root cause when action_type=close_incident.",
+    )
+    resolution_summary: Optional[str] = Field(
+        None,
+        description="Human-readable fix summary for apply_fix/close_incident.",
+    )
+    actor: Literal["triage_agent", "investigator_agent", "ops_manager_agent"] = Field(
+        "triage_agent",
+        description="Which specialist is currently acting in the environment.",
+    )
+class IncidentObservation(Observation):
+    incident_id: str
+    incident_title: str
+    incident_description: str
+    available_actions: List[str] = Field(default_factory=list)
+    available_teams: List[str] = Field(default_factory=list)
+    visible_signals: List[str] = Field(default_factory=list)
+    terminal_output: str = ""
+    budget_remaining: int = 0
+    sla_minutes_remaining: int = 0
+    incidents_remaining: int = 0
+class IncidentState(State):
+    task_id: str = "easy"
+    current_incident_index: int = 0
+    incidents_resolved: int = 0
+    incidents_failed: int = 0
+    budget_remaining: int = 0
+    sla_minutes_remaining: int = 0
+    mitigation_applied: bool = False
+    clues_found: List[str] = Field(default_factory=list)
+    handoff_history: List[str] = Field(default_factory=list)
+    action_trace: List[str] = Field(default_factory=list)
+    per_incident_steps: Dict[str, int] = Field(default_factory=dict)
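
Reviewer note: `IncidentAction` declares `target`, `root_cause`, and `resolution_summary` as `Optional`, so the model itself accepts, for example, a `close_incident` action without a `root_cause`. The sketch below shows a per-action check that could run before stepping the environment. The `REQUIRED_FIELDS` table is an assumption inferred from the field descriptions, and `missing_fields` is a hypothetical helper; neither is in the commit. The README also lists `resolution_summary` for `close_incident`, which is left optional here.

```python
from typing import Dict, Optional, Tuple

# Assumed mapping from action_type to the optional fields it needs.
# Inferred from the Field descriptions; not enforced by the commit.
REQUIRED_FIELDS: Dict[str, Tuple[str, ...]] = {
    "inspect_logs": ("target",),
    "inspect_metrics": ("target",),
    "consult_kb": ("target",),
    "negotiate_handoff": ("target",),
    "apply_fix": ("resolution_summary",),
    "close_incident": ("root_cause",),
}


def missing_fields(action: Dict[str, Optional[str]]) -> Tuple[str, ...]:
    """Return the required fields that are absent or empty for this action."""
    action_type = action.get("action_type")
    if action_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    return tuple(name for name in REQUIRED_FIELDS[action_type] if not action.get(name))
```

For a Pydantic model, the dict passed in could come from `action.model_dump()`, the same call family `client.py` already uses.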

openenv.yaml CHANGED

@@ -1,10 +1,10 @@
-name: "support_env"
-version: "1.0"
-description: "A real-world environment for routing customer support tickets."
 tasks:
   - id: "easy"
-    description: "Route 1 obvious ticket."
   - id: "medium"
-    description: "Route 2 standard tickets."
   - id: "hard"
-    description: "Route 3 complex tickets."

+name: "incident_command_center_env"
+version: "2.0"
+description: "A multi-agent long-horizon environment for incident triage, investigation, and coordinated remediation."
 tasks:
   - id: "easy"
+    description: "Resolve 2 incidents with clear but noisy signals."
   - id: "medium"
+    description: "Resolve 3 incidents with partial observability and trade-offs."
   - id: "hard"
+    description: "Resolve 4 incidents under strict budget + SLA constraints."

pre_validate.sh CHANGED

@@ -10,6 +10,8 @@ openenv validate
 echo "[3/3] Checking Inference Script format..."
 if [ -f "inference.py" ]; then echo "  ✓ inference.py found"; else echo "  ✗ inference.py missing"; exit 1; fi
 echo "========================================"
 echo "  Ready for Submission!"
 echo "========================================"

 echo "[3/3] Checking Inference Script format..."
 if [ -f "inference.py" ]; then echo "  ✓ inference.py found"; else echo "  ✗ inference.py missing"; exit 1; fi
+if [ -f "train_trl.py" ]; then echo "  ✓ train_trl.py found"; else echo "  ✗ train_trl.py missing"; exit 1; fi
 echo "========================================"
 echo "  Ready for Submission!"
 echo "========================================"

pyproject.toml CHANGED

@@ -9,23 +9,21 @@ requires = ["setuptools>=45", "wheel"]
 build-backend = "setuptools.build_meta"
 [project]
-name = "openenv-support_env"
 version = "0.1.0"
-description = "Support Env environment for OpenEnv"
 requires-python = ">=3.10"
 dependencies = [
-    # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
-    # install from github
-    # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
     "openenv-core[core]>=0.2.2",
-    # Environment-specific dependencies
-    # Add all dependencies needed for your environment here
-    # Examples:
-    # "numpy>=1.19.0",
-    # "torch>=2.0.0",
-    # "gymnasium>=0.29.0",
-    # "openspiel>=1.0.0",
-    # "smolagents>=1.22.0,<2",
 ]
 [project.optional-dependencies]
@@ -35,11 +33,10 @@ dev = [
 ]
 [project.scripts]
-# Server entry point - enables running via: uv run --project . server
-# or: python -m support_env.server.app
-server = "support_env.server.app:main"
 [tool.setuptools]
 include-package-data = true
-packages = ["support_env", "support_env.server"]
-package-dir = { "support_env" = ".", "support_env.server" = "server" }

 build-backend = "setuptools.build_meta"
 [project]
+name = "openenv-incident-command-center"
 version = "0.1.0"
+description = "Multi-agent Incident Command Center environment for OpenEnv"
 requires-python = ">=3.10"
 dependencies = [
     "openenv-core[core]>=0.2.2",
+    "fastapi>=0.115.0",
+    "uvicorn>=0.30.0",
+    "pydantic>=2.7.0",
+    "transformers>=4.44.0",
+    "trl>=0.10.1",
+    "datasets>=2.20.0",
+    "accelerate>=0.33.0",
+    "peft>=0.12.0",
+    "matplotlib>=3.8.0",
 ]
 [project.optional-dependencies]
 ]
 [project.scripts]
+server = "server.app:main"
+run-baseline = "inference:main"
+run-training = "train_trl:main"
 [tool.setuptools]
 include-package-data = true
+py-modules = ["client", "models", "inference", "train_trl"]

requirements.txt CHANGED

@@ -1,4 +1,10 @@
-openenv-core
-fastapi
-uvicorn
-pydantic

+openenv-core[core]>=0.2.2
+fastapi>=0.115.0
+uvicorn>=0.30.0
+pydantic>=2.7.0
+transformers>=4.44.0
+trl>=0.10.1
+datasets>=2.20.0
+accelerate>=0.33.0
+peft>=0.12.0
+matplotlib>=3.8.0

server/Dockerfile CHANGED

@@ -1,6 +1,6 @@
 FROM python:3.11-slim
 WORKDIR /app
-COPY requirements.txt .
-RUN pip install --no-cache-dir -r requirements.txt
-COPY . .
 CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]

 FROM python:3.11-slim
 WORKDIR /app
+COPY server/requirements.txt /app/requirements.txt
+RUN pip install --no-cache-dir -r /app/requirements.txt
+COPY . /app
 CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]

server/__init__.py ADDED

	@@ -0,0 +1 @@


+"""Server package for Incident Command Center environment."""

server/app.py CHANGED

@@ -1,6 +1,6 @@
 from openenv.core.env_server import create_fastapi_app
-from models import SREAction, SREObservation
-from server.environment import SREEnvironment
 from fastapi.responses import HTMLResponse
 import uvicorn
@@ -10,7 +10,7 @@ dashboard_content = r"""
 <head>
     <meta charset='UTF-8'>
     <meta name='viewport' content='width=device-width, initial-scale=1.0'>
-    <title>SRE Debugging | OpenEnv Dashboard</title>
     <style>
         :root { --primary: #3b82f6; --bg: #0f172a; --card: #1e293b; --text: #e2e8f0; }
         body { font-family: -apple-system, sans-serif; background-color: var(--bg); color: var(--text); padding: 2rem; }
@@ -20,24 +20,31 @@ dashboard_content = r"""
 </head>
 <body>
     <div class='container'>
-        <h1>Long-Horizon SRE Debugging</h1>
-        <p>Theme 3.1: Professional Tasks - Multi-step agent triage simulator.</p>
         <h2>Action Space</h2>
         <ul>
-            <li><code>query_logs(service_name)</code></li>
-            <li><code>check_metrics(dashboard_id)</code></li>
-            <li><code>resolve_ticket(root_cause)</code></li>
         </ul>
         <h2>Reward Logic</h2>
-        <p>Correctly resolving a ticket yields <b>+1.0</b>. Querying logs or checking metrics costs a slight penalty of <b>-0.1</b> to encourage agent efficiency. Wrong resolution gives <b>-1.0</b>.</p>
     </div>
 </body>
 </html>
 """
-app = create_fastapi_app(SREEnvironment, SREAction, SREObservation)
 @app.get('/', response_class=HTMLResponse)
 @app.get('/web', response_class=HTMLResponse)
@@ -48,4 +55,4 @@ def main():
     uvicorn.run(app, host='0.0.0.0', port=8000)
 if __name__ == '__main__':
-    main()

 from openenv.core.env_server import create_fastapi_app
+from models import IncidentAction, IncidentObservation
+from server.environment import IncidentCommandCenterEnvironment
 from fastapi.responses import HTMLResponse
 import uvicorn
 <head>
     <meta charset='UTF-8'>
     <meta name='viewport' content='width=device-width, initial-scale=1.0'>
+    <title>Incident Command Center | OpenEnv Dashboard</title>
     <style>
         :root { --primary: #3b82f6; --bg: #0f172a; --card: #1e293b; --text: #e2e8f0; }
         body { font-family: -apple-system, sans-serif; background-color: var(--bg); color: var(--text); padding: 2rem; }
 </head>
 <body>
     <div class='container'>
+        <h1>Multi-Agent Incident Command Center</h1>
+        <p>Round-2 themes: Multi-Agent Interactions + World Modeling (Professional Tasks).</p>
         <h2>Action Space</h2>
         <ul>
+            <li><code>inspect_logs(target)</code></li>
+            <li><code>inspect_metrics(target)</code></li>
+            <li><code>consult_kb(target)</code></li>
+            <li><code>negotiate_handoff(target)</code></li>
+            <li><code>apply_fix(resolution_summary)</code></li>
+            <li><code>close_incident(root_cause)</code></li>
         </ul>
         <h2>Reward Logic</h2>
+        <p>Dense reward shaping for clue discovery, team coordination, and efficient resolution under budget + SLA constraints. Correct closure with mitigation gets the highest reward.</p>
     </div>
 </body>
 </html>
 """
+app = create_fastapi_app(
+    IncidentCommandCenterEnvironment,
+    IncidentAction,
+    IncidentObservation,
+)
 @app.get('/', response_class=HTMLResponse)
 @app.get('/web', response_class=HTMLResponse)
     uvicorn.run(app, host='0.0.0.0', port=8000)
 if __name__ == '__main__':
+    main()
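The six entries in the dashboard's Action Space list use different action fields. A minimal sketch of one payload per action type follows. `ActionSketch` is a stand-in dataclass, since the real `IncidentAction` lives in `models.py` and is not shown in this hunk. The field names come from `server/environment.py` and `train_trl.py`, the sample values come from the INC-E1 incident data, and the pairing of actors with actions is an assumption made for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ActionSketch:
    """Stand-in for IncidentAction (hypothetical; the real model is in models.py)."""
    actor: str
    action_type: str
    target: Optional[str] = None
    root_cause: Optional[str] = None
    resolution_summary: Optional[str] = None


def payload(action: ActionSketch) -> dict:
    # Drop unset fields, similar in spirit to model_dump(exclude_none=True).
    return {k: v for k, v in asdict(action).items() if v is not None}


# One example per action type; actor assignments are assumptions.
EXAMPLE_ACTIONS = [
    ActionSketch("triage_agent", "inspect_logs", target="redis-cluster"),
    ActionSketch("triage_agent", "inspect_metrics", target="dash-redis"),
    ActionSketch("investigator_agent", "consult_kb", target="kb-redis-pool"),
    ActionSketch("triage_agent", "negotiate_handoff", target="investigator_agent"),
    ActionSketch("ops_manager_agent", "apply_fix",
                 resolution_summary="increase redis pool"),
    ActionSketch("ops_manager_agent", "close_incident",
                 root_cause="redis_connection_pool_exhausted"),
]

if __name__ == "__main__":
    for action in EXAMPLE_ACTIONS:
        print(payload(action))
```

The lookup actions and `negotiate_handoff` read `target`, `apply_fix` reads `resolution_summary`, and `close_incident` reads `root_cause`.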

server/environment.py CHANGED

@@ -1,61 +1,516 @@
 import uuid
-from typing import List, Dict
 from openenv.core.env_server import Environment
-from models import SREAction, SREObservation, SREState
-class SREEnvironment(Environment):
     def __init__(self):
         super().__init__()
-        self.logs = {
-            "auth-service": "ERROR: Connection to DB timed out.",
-            "database": "WARN: Storage 99% full.",
-            "frontend": "500 Internal Server Error calling backend-service",
-            "backend-service": "Failed to authenticate request: timeout from auth-service"
-        }
-        self.metrics = {
-            "dash-db": "CPU: 15%, Memory: 99.9%, Disk: 99% full.",
-            "dash-auth": "CPU: 5%",
-            "dash-front": "CPU: 10%",
-            "dash-back": "CPU: 20%"
-        }
-        self.tasks = {
-            "easy": [{"id": "E1", "text": "Users can't login, check auth-service logs.", "root_cause": "database"}],
-            "medium": [{"id": "M1", "text": "Backend service failing.", "root_cause": "auth-service"}],
-            "hard": [{"id": "H1", "text": "Frontend reporting 500s. Trace it back to the root cause.", "root_cause": "database"}],
         }
-    def reset(self, task_name: str = "easy") -> SREObservation:
-        self.current_task = self.tasks.get(task_name, self.tasks["easy"])
-        self._state = SREState(episode_id=str(uuid.uuid4()), task_id=task_name, current_ticket_index=0)
-        t = self.current_task[0]
-        return SREObservation(done=False, reward=0.0, ticket_id=t["id"], content=t["text"], terminal_output="Environment initialized.")
-    def step(self, action: SREAction) -> SREObservation:
         self._state.step_count += 1
-        ticket = self.current_task[self._state.current_ticket_index]
         reward = 0.0
         terminal_output = ""
-        if action.action_type == "query_logs":
-            reward = -0.1
-            terminal_output = self.logs.get(action.service_name, f"No logs found for {action.service_name}")
-        elif action.action_type == "check_metrics":
-            reward = -0.1
-            terminal_output = self.metrics.get(action.dashboard_id, f"Dashboard {action.dashboard_id} not found")
-        elif action.action_type == "resolve_ticket":
-            correct = action.root_cause and action.root_cause.strip().lower() == ticket["root_cause"].lower()
-            reward = 1.0 if correct else -1.0
-            self._state.current_ticket_index += 1
-            if self._state.current_ticket_index < len(self.current_task):
-                t = self.current_task[self._state.current_ticket_index]
-                return SREObservation(done=False, reward=reward, ticket_id=t["id"], content=t["text"], terminal_output=f"Previous ticket resolved. Result: {'Correct' if correct else 'Incorrect'}. Next ticket.")
-            return SREObservation(done=True, reward=reward, ticket_id="EOF", content="Done.", terminal_output=f"Final ticket resolved. Result: {'Correct' if correct else 'Incorrect'}.")
-        return SREObservation(done=False, reward=reward, ticket_id=ticket["id"], content=ticket["text"], terminal_output=terminal_output)
     @property
-    def state(self) -> SREState:
-        return self._state

 import uuid
+from typing import Dict, List
 from openenv.core.env_server import Environment
+from models import IncidentAction, IncidentObservation, IncidentState
+class IncidentCommandCenterEnvironment(Environment):
+    """Multi-agent, long-horizon SRE incident simulation for OpenEnv."""
     def __init__(self):
         super().__init__()
+        self.tasks = self._build_tasks()
+        # Per-task limits: action budget (steps) and SLA clock (minutes).
+        self._task_budgets = {"easy": 24, "medium": 48, "hard": 72}
+        self._task_sla = {"easy": 90, "medium": 180, "hard": 300}
+        self.current_task: List[Dict[str, object]] = []
+    def _build_tasks(self) -> Dict[str, List[Dict[str, object]]]:
+        return {
+            "easy": [
+                {
+                    "id": "INC-E1",
+                    "title": "Checkout timeouts",
+                    "description": "Payment checkout is failing intermittently for premium users.",
+                    "root_cause": "redis_connection_pool_exhausted",
+                    "signals": [
+                        "Spike in checkout latency for premium cohort",
+                        "Error budget dropped from 99.9% to 99.2%",
+                    ],
+                    "logs": {
+                        "payments-api": "Timeout waiting for redis write lock",
+                        "checkout-worker": "Queue delay exceeds 12s under load",
+                        "redis-cluster": "Connection pool exhausted at 512/512",
+                    },
+                    "metrics": {
+                        "dash-checkout": "p99 latency 4.1s, error-rate 6.2%",
+                        "dash-redis": "connections 512/512, eviction 0, cpu 74%",
+                        "dash-worker": "queue_depth 440, consumer_lag 380",
+                    },
+                    "kb": {
+                        "kb-redis-pool": "Raise redis pool and recycle stale handles in checkout-worker.",
+                        "kb-checkout-fallback": "Degrade recommendation calls when payment queue > 300.",
+                    },
+                    "good_handoff": "investigator_agent",
+                    "accepted_fixes": [
+                        "increase redis pool",
+                        "recycle stale connections",
+                        "enable checkout fallback",
+                    ],
+                },
+                {
+                    "id": "INC-E2",
+                    "title": "Login failures after deploy",
+                    "description": "Users report frequent login retries after auth rollout.",
+                    "root_cause": "jwt_clock_skew_mismatch",
+                    "signals": [
+                        "Auth errors spike immediately after deployment",
+                        "Regional variance appears in mobile clients",
+                    ],
+                    "logs": {
+                        "auth-service": "Token issued-at in future; rejected by validator",
+                        "gateway": "401 bursts from auth-service route",
+                        "mobile-api": "Retrying auth flow due to invalid token state",
+                    },
+                    "metrics": {
+                        "dash-auth": "401_rate 14%, token_validation_failures high",
+                        "dash-gateway": "auth_route_retries 3.2x baseline",
+                    },
+                    "kb": {
+                        "kb-jwt-time": "Synchronize clock skew tolerance for issuer and verifier.",
+                        "kb-mobile-auth": "Fallback to server timestamp for token freshness checks.",
+                    },
+                    "good_handoff": "ops_manager_agent",
+                    "accepted_fixes": [
+                        "increase jwt leeway",
+                        "sync clock tolerance",
+                        "roll back token validator",
+                    ],
+                },
+            ],
+            "medium": [
+                {
+                    "id": "INC-M1",
+                    "title": "Catalog stale prices",
+                    "description": "Users see old prices during flash sale windows.",
+                    "root_cause": "cache_invalidation_topic_lag",
+                    "signals": [
+                        "Mismatch between checkout and catalog prices",
+                        "Issue concentrated in high-traffic products",
+                    ],
+                    "logs": {
+                        "catalog-api": "Read from cache generation=188, expected=193",
+                        "kafka-consumer": "Lag increased on invalidation-topic partition 3",
+                        "pricing-service": "Published invalidation events at 2.1k/s",
+                    },
+                    "metrics": {
+                        "dash-catalog": "cache_hit 98%, stale_reads elevated",
+                        "dash-kafka": "consumer_lag 5400 on partition 3",
+                    },
+                    "kb": {
+                        "kb-cache-invalidation": "Scale invalidation consumers and replay stalled partition.",
+                    },
+                    "good_handoff": "investigator_agent",
+                    "accepted_fixes": [
+                        "scale invalidation consumer",
+                        "replay partition 3",
+                        "flush impacted cache keys",
+                    ],
+                },
+                {
+                    "id": "INC-M2",
+                    "title": "Shipment ETA corruption",
+                    "description": "Shipping ETAs jump unpredictably after route service update.",
+                    "root_cause": "timezone_normalization_bug",
+                    "signals": [
+                        "ETA jumps by +24h in APAC region",
+                        "Warehouse scans are on-time, only UI estimate is wrong",
+                    ],
+                    "logs": {
+                        "route-planner": "Parsed timezone fallback=UTC for locale en-IN",
+                        "eta-service": "Normalization mismatch for offset +05:30",
+                    },
+                    "metrics": {
+                        "dash-eta": "eta_anomaly_rate 9.4%",
+                        "dash-route": "parser_warnings spike post deploy",
+                    },
+                    "kb": {
+                        "kb-timezone": "Use IANA timezone mapping and validate locale fallback path.",
+                    },
+                    "good_handoff": "triage_agent",
+                    "accepted_fixes": [
+                        "patch timezone parser",
+                        "use iana timezone map",
+                        "rollback route update",
+                    ],
+                },
+                {
+                    "id": "INC-M3",
+                    "title": "Invoice duplicates",
+                    "description": "A subset of merchants received duplicate invoices.",
+                    "root_cause": "idempotency_key_regression",
+                    "signals": [
+                        "Duplicate invoices share same order id",
+                        "Triggered after billing retry logic change",
+                    ],
+                    "logs": {
+                        "billing-worker": "Retry path ignored idempotency token for v2 flow",
+                        "billing-api": "POST /invoice executed twice for order O-92A",
+                    },
+                    "metrics": {
+                        "dash-billing": "duplicate_invoice_rate 3.7%",
+                        "dash-worker": "retry_attempts 2.4x",
+                    },
+                    "kb": {
+                        "kb-idempotency": "Persist retry token before dispatch and enforce dedupe check.",
+                    },
+                    "good_handoff": "ops_manager_agent",
+                    "accepted_fixes": [
+                        "restore idempotency guard",
+                        "persist retry token first",
+                        "dedupe duplicate invoice jobs",
+                    ],
+                },
+            ],
+            "hard": [
+                {
+                    "id": "INC-H1",
+                    "title": "Cross-service saturation cascade",
+                    "description": "A sudden promo launch causes cascading failures across checkout, auth, and notification services.",
+                    "root_cause": "rate_limit_misconfigured_for_promo_segment",
+                    "signals": [
+                        "Failure spreads from notifications to checkout within minutes",
+                        "Customer segment 'promo_mega' has concentrated failures",
+                    ],
+                    "logs": {
+                        "notification-gateway": "429 flood for promo_mega segment",
+                        "checkout-api": "Retries amplified upstream failures from notification sidecar",
+                        "auth-service": "Session refresh queue saturation due to retry storm",
+                    },
+                    "metrics": {
+                        "dash-global": "error budget burn 3.7x",
+                        "dash-notify": "429_rate 38%",
+                        "dash-auth": "session_queue_depth 940",
+                    },
+                    "kb": {
+                        "kb-rate-limits": "Segment-specific limits must be applied with gradual rollout and backoff.",
+                    },
+                    "good_handoff": "ops_manager_agent",
+                    "accepted_fixes": [
+                        "hotfix promo segment rate limits",
+                        "enable exponential backoff",
+                        "throttle notification fanout",
+                    ],
+                },
+                {
+                    "id": "INC-H2",
+                    "title": "Data export corruption",
+                    "description": "Enterprise customers report corrupted CSV exports from analytics dashboard.",
+                    "root_cause": "schema_version_drift",
+                    "signals": [
+                        "Corruption only in accounts migrated last week",
+                        "Export job success is high but data quality is low",
+                    ],
+                    "logs": {
+                        "export-worker": "Schema mismatch: expected v11 got v10 on tenant shard",
+                        "analytics-api": "Fallback serializer dropped nullable columns",
+                    },
+                    "metrics": {
+                        "dash-export": "job_success 97%, data_quality_score 61%",
+                        "dash-analytics": "schema_mismatch counter rising",
+                    },
+                    "kb": {
+                        "kb-schema-drift": "Force schema negotiation at read time and backfill migrated shards.",
+                    },
+                    "good_handoff": "investigator_agent",
+                    "accepted_fixes": [
+                        "enforce schema negotiation",
+                        "backfill migrated shards",
+                        "pin serializer to v11",
+                    ],
+                },
+                {
+                    "id": "INC-H3",
+                    "title": "On-call alert storm",
+                    "description": "On-call rotations are overwhelmed by noisy duplicate alerts, masking a real outage.",
+                    "root_cause": "dedupe_rule_disabled",
+                    "signals": [
+                        "Alert volume 10x baseline with low incident diversity",
+                        "Primary outage not visible in first-page alerts",
+                    ],
+                    "logs": {
+                        "alert-router": "Deduplication pipeline bypassed after config reload",
+                        "pager-service": "Repeated notifications for identical fingerprint",
+                    },
+                    "metrics": {
+                        "dash-alerts": "alerts_per_minute 1200",
+                        "dash-pager": "notification_duplicates 87%",
+                    },
+                    "kb": {
+                        "kb-alert-dedupe": "Restore dedupe stage and replay suppressed critical fingerprint set.",
+                    },
+                    "good_handoff": "triage_agent",
+                    "accepted_fixes": [
+                        "restore dedupe rule",
+                        "replay critical fingerprints",
+                        "mute duplicate alert channels",
+                    ],
+                },
+                {
+                    "id": "INC-H4",
+                    "title": "Inventory phantom stock",
+                    "description": "Inventory service reports available stock that does not exist in warehouse.",
+                    "root_cause": "event_ordering_race_condition",
+                    "signals": [
+                        "Negative physical stock but positive ledger entries",
+                        "Warehouse reconciliation jobs are delayed",
+                    ],
+                    "logs": {
+                        "inventory-ledger": "Out-of-order reserve/release events for same SKU",
+                        "warehouse-sync": "Late event merge exceeded ordering window",
+                    },
+                    "metrics": {
+                        "dash-inventory": "oversell_incidents 4.2%",
+                        "dash-sync": "late_event_ratio 17%",
+                    },
+                    "kb": {
+                        "kb-event-ordering": "Use monotonic sequence guards and quarantine out-of-order events.",
+                    },
+                    "good_handoff": "investigator_agent",
+                    "accepted_fixes": [
+                        "enable sequence guards",
+                        "quarantine out-of-order events",
+                        "reconcile affected skus",
+                    ],
+                },
+            ],
         }
+    def reset(self, task_name: str = "easy") -> IncidentObservation:
+        selected_task = task_name if task_name in self.tasks else "easy"
+        self.current_task = self.tasks[selected_task]
+        self._state = IncidentState(
+            episode_id=str(uuid.uuid4()),
+            task_id=selected_task,
+            current_incident_index=0,
+            budget_remaining=self._task_budgets[selected_task],
+            sla_minutes_remaining=self._task_sla[selected_task],
+        )
+        return self._observation_for_current_incident(
+            terminal_output=(
+                "Incident Command Center initialized. "
+                "Coordinate triage_agent, investigator_agent, and ops_manager_agent."
+            ),
+            reward=0.0,
+            done=False,
+        )
+    def step(self, action: IncidentAction) -> IncidentObservation:
         self._state.step_count += 1
+        # Every step, valid or not, costs 5 SLA minutes and 1 budget unit.
+        self._state.sla_minutes_remaining = max(0, self._state.sla_minutes_remaining - 5)
+        self._state.budget_remaining -= 1
+        if self._state.current_incident_index >= len(self.current_task):
+            return IncidentObservation(
+                done=True,
+                reward=0.0,
+                incident_id="EOF",
+                incident_title="All incidents completed",
+                incident_description="Episode ended.",
+                terminal_output="No remaining incidents.",
+            )
+        if self._state.budget_remaining < 0:
+            self._state.incidents_failed += 1
+            return IncidentObservation(
+                done=True,
+                reward=-1.5,
+                incident_id="BUDGET_EXHAUSTED",
+                incident_title="Resource budget exhausted",
+                incident_description="Agent used too many actions before finishing the task.",
+                terminal_output="Episode terminated: investigation budget exhausted.",
+                budget_remaining=0,
+                sla_minutes_remaining=self._state.sla_minutes_remaining,
+                incidents_remaining=len(self.current_task) - self._state.current_incident_index,
+            )
+        incident = self.current_task[self._state.current_incident_index]
+        incident_id = str(incident["id"])
+        self._state.per_incident_steps[incident_id] = (
+            self._state.per_incident_steps.get(incident_id, 0) + 1
+        )
+        self._state.action_trace.append(f"{action.actor}:{action.action_type}:{action.target or '-'}")
+        if self._state.sla_minutes_remaining <= 0:
+            self._state.incidents_failed += 1
+            return IncidentObservation(
+                done=True,
+                reward=-1.2,
+                incident_id=incident_id,
+                incident_title=str(incident["title"]),
+                incident_description=str(incident["description"]),
+                terminal_output="Episode terminated: global SLA budget reached zero.",
+                budget_remaining=max(self._state.budget_remaining, 0),
+                sla_minutes_remaining=0,
+                incidents_remaining=len(self.current_task) - self._state.current_incident_index,
+            )
         reward = 0.0
         terminal_output = ""
+        if action.action_type == "inspect_logs":
+            reward -= 0.04
+            lookup = (action.target or "").strip()
+            logs = incident["logs"]
+            terminal_output = logs.get(lookup, f"No logs found for target '{lookup}'.")
+            reward += self._grant_clue_reward(incident, terminal_output)
+        elif action.action_type == "inspect_metrics":
+            reward -= 0.04
+            lookup = (action.target or "").strip()
+            metrics = incident["metrics"]
+            terminal_output = metrics.get(lookup, f"No metrics found for target '{lookup}'.")
+            reward += self._grant_clue_reward(incident, terminal_output)
+        elif action.action_type == "consult_kb":
+            reward -= 0.03
+            lookup = (action.target or "").strip()
+            kb = incident["kb"]
+            terminal_output = kb.get(lookup, f"No KB article found for key '{lookup}'.")
+            reward += self._grant_clue_reward(incident, terminal_output)
+        elif action.action_type == "negotiate_handoff":
+            reward -= 0.02
+            team = (action.target or "").strip()
+            self._state.handoff_history.append(team)
+            if team == incident["good_handoff"]:
+                reward += 0.12
+                terminal_output = (
+                    f"Handoff accepted by {team}. "
+                    "New hypothesis confidence increased."
+                )
+            else:
+                reward -= 0.10
+                terminal_output = (
+                    f"Handoff to {team} introduced delay. "
+                    "This incident likely needs a different owner."
+                )
+        elif action.action_type == "apply_fix":
+            reward -= 0.02
+            fix_text = (action.resolution_summary or "").lower()
+            accepted_fixes = incident["accepted_fixes"]
+            # Substring match: any accepted phrase appearing in the lowercased summary counts.
+            is_good_fix = any(token in fix_text for token in accepted_fixes)
+            if is_good_fix:
+                self._state.mitigation_applied = True
+                reward += 0.35
+                terminal_output = "Mitigation accepted. Error rate is stabilizing."
+            else:
+                reward -= 0.30
+                terminal_output = "Applied mitigation appears ineffective."
+        elif action.action_type == "close_incident":
+            guess = (action.root_cause or "").strip().lower()
+            expected = str(incident["root_cause"]).lower()
+            correct = guess == expected
+            episode_done = False
+            if correct:
+                completion_reward = 0.80
+                if self._state.mitigation_applied:
+                    completion_reward += 0.30
+                completion_reward += self._speed_bonus(incident_id)
+                reward += completion_reward
+                self._state.incidents_resolved += 1
+                terminal_output = (
+                    "Incident resolved successfully. "
+                    f"Root cause confirmed: {incident['root_cause']}."
+                )
+            else:
+                reward -= 1.10
+                self._state.incidents_failed += 1
+                terminal_output = (
+                    "Incident closure rejected by postmortem checker. "
+                    f"Expected root cause differs from '{guess or 'unknown'}'."
+                )
+            self._advance_incident()
+            if self._state.current_incident_index >= len(self.current_task):
+                episode_done = True
+                terminal_output += " All assigned incidents processed."
+            else:
+                next_incident = self.current_task[self._state.current_incident_index]
+                terminal_output += f" Next incident: {next_incident['id']}."
+            return self._observation_for_current_incident(
+                terminal_output=terminal_output,
+                reward=reward,
+                done=episode_done,
+            )
+        else:
+            reward -= 0.25
+            terminal_output = f"Unsupported action_type: {action.action_type}"
+        return self._observation_for_current_incident(
+            terminal_output=terminal_output,
+            reward=reward,
+            done=False,
+        )
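Each call to `step` charges 5 SLA minutes and 1 budget unit before the action is evaluated, and the step on which the SLA clock reaches zero ends the episode with the -1.2 penalty. A small sketch of that arithmetic, using the budget and SLA values from `__init__`. The helper name and constants below are illustrative and not part of the environment, and the loop ignores episodes that end earlier because all incidents were closed.

```python
TASK_BUDGETS = {"easy": 24, "medium": 48, "hard": 72}
TASK_SLA_MINUTES = {"easy": 90, "medium": 180, "hard": 300}
SLA_COST_PER_STEP = 5


def usable_steps(task: str):
    """Count steps whose action is evaluated before a limit ends the episode."""
    budget = TASK_BUDGETS[task]
    sla = TASK_SLA_MINUTES[task]
    steps = 0
    while True:
        # Same order as step(): charge both clocks, then check budget, then SLA.
        sla = max(0, sla - SLA_COST_PER_STEP)
        budget -= 1
        if budget < 0:
            return steps, "budget"
        if sla <= 0:
            return steps, "sla"
        steps += 1


if __name__ == "__main__":
    for name in TASK_BUDGETS:
        print(name, usable_steps(name))
```

Under these numbers the SLA clock, not the action budget, is the limit that ends each task first.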
+    def _grant_clue_reward(self, incident: Dict[str, object], signal_text: str) -> float:
+        """Return 0.12 the first time a lookup result contains the root-cause ID verbatim."""
+        root = str(incident["root_cause"]).lower()
+        signal_key = signal_text.strip().lower()
+        if root in signal_key and signal_key not in self._state.clues_found:
+            self._state.clues_found.append(signal_key)
+            return 0.12
+        return 0.0
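`_grant_clue_reward` pays out only when the looked-up text contains the root-cause identifier as a literal substring. The identifiers in `_build_tasks` are snake_case while the bundled log, metric, and KB strings are free text, so the check can be exercised directly on a copy of the INC-E1 logs. The `token_overlap` variant below is a hypothetical alternative shown for comparison; it is not something this commit implements.

```python
import re

ROOT_CAUSE = "redis_connection_pool_exhausted"
INC_E1_LOGS = {
    "payments-api": "Timeout waiting for redis write lock",
    "checkout-worker": "Queue delay exceeds 12s under load",
    "redis-cluster": "Connection pool exhausted at 512/512",
}


def substring_match(root: str, text: str) -> bool:
    # Mirrors the `root in signal_key` test in _grant_clue_reward.
    return root.lower() in text.strip().lower()


def token_overlap(root: str, text: str, min_shared: int = 2) -> bool:
    """Hypothetical alternative: require several root-cause words in the text."""
    root_tokens = set(root.lower().split("_"))
    text_tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(root_tokens & text_tokens) >= min_shared


if __name__ == "__main__":
    for service, line in INC_E1_LOGS.items():
        print(service, substring_match(ROOT_CAUSE, line), token_overlap(ROOT_CAUSE, line))
```

On this data the substring test matches none of the three log lines, while the overlap variant matches only the `redis-cluster` line. Because the "not found" messages echo the requested target, the substring test also matches when the target string itself contains the identifier.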
+    def _speed_bonus(self, incident_id: str) -> float:
+        """Bonus for closing within 4 steps (0.20) or 7 steps (0.10) spent on this incident."""
+        steps_used = self._state.per_incident_steps.get(incident_id, 1)
+        if steps_used <= 4:
+            return 0.20
+        if steps_used <= 7:
+            return 0.10
+        return 0.0
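The reward for `close_incident` combines a base value, an optional mitigation bonus, and the speed bonus. A worked sketch using the constants from `step` and `_speed_bonus`; the function name is illustrative, and `steps_used` counts the closing step itself because `per_incident_steps` is incremented before the action is evaluated.

```python
def close_reward(correct: bool, mitigation_applied: bool, steps_used: int) -> float:
    """Reward contributed by a single close_incident action."""
    if not correct:
        return -1.10
    reward = 0.80                 # base reward for the correct root cause
    if mitigation_applied:
        reward += 0.30            # an accepted fix was applied before closing
    if steps_used <= 4:
        reward += 0.20            # fast resolution
    elif steps_used <= 7:
        reward += 0.10            # moderately fast resolution
    return reward


if __name__ == "__main__":
    print(round(close_reward(True, True, 3), 2))    # fix applied, closed on step 3
    print(round(close_reward(True, False, 9), 2))   # no fix, slow closure
    print(round(close_reward(False, True, 2), 2))   # wrong root cause
```

The best case for one incident is a correct closure with mitigation within four steps; a wrong closure costs more than the base reward of a correct one.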
+    def _advance_incident(self) -> None:
+        self._state.current_incident_index += 1
+        self._state.mitigation_applied = False
+        self._state.clues_found = []
+    def _observation_for_current_incident(
+        self, terminal_output: str, reward: float, done: bool
+    ) -> IncidentObservation:
+        if done:
+            return IncidentObservation(
+                done=True,
+                reward=reward,
+                incident_id="EOF",
+                incident_title="All incidents completed",
+                incident_description="Episode ended.",
+                available_actions=[],
+                available_teams=[],
+                visible_signals=[],
+                terminal_output=terminal_output,
+                budget_remaining=max(self._state.budget_remaining, 0),
+                sla_minutes_remaining=self._state.sla_minutes_remaining,
+                incidents_remaining=0,
+            )
+        incident = self.current_task[self._state.current_incident_index]
+        return IncidentObservation(
+            done=False,
+            reward=reward,
+            incident_id=str(incident["id"]),
+            incident_title=str(incident["title"]),
+            incident_description=str(incident["description"]),
+            available_actions=[
+                "inspect_logs",
+                "inspect_metrics",
+                "consult_kb",
+                "negotiate_handoff",
+                "apply_fix",
+                "close_incident",
+            ],
+            available_teams=["triage_agent", "investigator_agent", "ops_manager_agent"],
+            visible_signals=list(incident["signals"]),
+            terminal_output=terminal_output,
+            budget_remaining=max(self._state.budget_remaining, 0),
+            sla_minutes_remaining=self._state.sla_minutes_remaining,
+            incidents_remaining=len(self.current_task) - self._state.current_incident_index,
+        )
     @property
+    def state(self) -> IncidentState:
+        return self._state

server/requirements.txt CHANGED

@@ -1,4 +1,4 @@
-openenv[core]>=0.2.0
 fastapi>=0.115.0
 uvicorn>=0.24.0

+openenv-core[core]>=0.2.2
 fastapi>=0.115.0
 uvicorn>=0.24.0

server/support_env_environment.py ADDED

	@@ -0,0 +1,5 @@

+"""Backward-compatible alias for older imports."""
+from server.environment import IncidentCommandCenterEnvironment
+SupportEnvEnvironment = IncidentCommandCenterEnvironment

train_trl.py ADDED

	@@ -0,0 +1,194 @@

+import json
+import os
+import random
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Dict, List
+import matplotlib.pyplot as plt
+from datasets import Dataset
+from client import IncidentCommandEnvClient
+from inference import HeuristicCoordinator, random_action
+from models import IncidentAction
+ARTIFACT_DIR = Path("artifacts")
+ARTIFACT_DIR.mkdir(parents=True, exist_ok=True)
+ENV_URL = os.getenv("ENV_URL", "http://127.0.0.1:8000")
+BASE_MODEL = os.getenv("BASE_MODEL", "Qwen/Qwen2.5-1.5B-Instruct")
+MAX_ROLLOUT_STEPS = int(os.getenv("MAX_ROLLOUT_STEPS", "120"))
+@dataclass
+class EpisodeStats:
+    policy_name: str
+    task_name: str
+    total_reward: float
+    steps: int
+    success: bool
+def obs_to_prompt(obs) -> str:
+    return (
+        "You are controlling a multi-agent incident command center.\n"
+        f"Incident ID: {obs.incident_id}\n"
+        f"Title: {obs.incident_title}\n"
+        f"Description: {obs.incident_description}\n"
+        f"Visible signals: {', '.join(obs.visible_signals)}\n"
+        f"Budget remaining: {obs.budget_remaining}\n"
+        f"SLA minutes remaining: {obs.sla_minutes_remaining}\n"
+        f"Terminal output: {obs.terminal_output}\n"
+        "Return a JSON object with keys: actor, action_type, target, root_cause, resolution_summary."
+    )
+def action_to_json(action: IncidentAction) -> str:
+    return json.dumps(action.model_dump(exclude_none=True), ensure_ascii=True)
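The prompt asks the model for a JSON object with five keys, and `action_to_json` produces the matching training target. A hedged sketch of the inverse direction, turning a generated string back into an action dict. `parse_action` and its validation rules are assumptions for illustration and do not appear in this commit; the action types are the six listed in `server/environment.py`.

```python
import json
from typing import Optional

ALLOWED_KEYS = {"actor", "action_type", "target", "root_cause", "resolution_summary"}
ACTION_TYPES = {
    "inspect_logs",
    "inspect_metrics",
    "consult_kb",
    "negotiate_handoff",
    "apply_fix",
    "close_incident",
}


def parse_action(text: str) -> Optional[dict]:
    """Return a cleaned action dict, or None if the text is not a usable action."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if data.get("action_type") not in ACTION_TYPES:
        return None
    # Keep only the expected keys and drop explicit nulls.
    return {k: v for k, v in data.items() if k in ALLOWED_KEYS and v is not None}


if __name__ == "__main__":
    raw = '{"actor": "triage_agent", "action_type": "inspect_logs", "target": "gateway", "note": null}'
    print(parse_action(raw))
    print(parse_action("not json"))
```

Rejecting malformed output before it reaches the environment avoids spending a step on the -0.25 penalty for an unsupported `action_type`.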
+def rollout(policy_name: str, task_name: str, collect_dataset: bool = False):
+    env = IncidentCommandEnvClient(base_url=ENV_URL).sync()
+    coordinator = HeuristicCoordinator()
+    records: List[Dict[str, str]] = []
+    rewards: List[float] = []
+    steps = 0
+    try:
+        result = env.reset(task_name=task_name)
+        while not result.done and steps < MAX_ROLLOUT_STEPS:
+            steps += 1
+            if policy_name == "heuristic":
+                action = coordinator.select_action(result.observation)
+            else:
+                action = random_action(result.observation)
+            if collect_dataset:
+                records.append(
+                    {
+                        "prompt": obs_to_prompt(result.observation),
+                        "response": action_to_json(action),
+                    }
+                )
+            result = env.step(action)
+            rewards.append(float(result.reward or 0.0))
+    finally:
+        try:
+            env.close()
+        except Exception:
+            # Best-effort cleanup: ignore errors raised while closing the client.
+            pass
+    total_reward = sum(rewards)
+    # Proxy for success: positive cumulative reward, not a per-incident resolution check.
+    success = total_reward > 0.0
+    return EpisodeStats(policy_name, task_name, total_reward, steps, success), records, rewards
+def build_training_dataset(episodes_per_task: int = 4) -> Dataset:
+    all_rows: List[Dict[str, str]] = []
+    for task in ["easy", "medium", "hard"]:
+        for _ in range(episodes_per_task):
+            _, rows, _ = rollout(policy_name="heuristic", task_name=task, collect_dataset=True)
+            all_rows.extend(rows)
+    return Dataset.from_list(all_rows)
+def run_trl_sft(dataset: Dataset) -> None:
+    """
+    Run a minimal TRL supervised fine-tuning (SFT) pass over the rollout dataset.
+    The settings are intentionally lightweight so the script stays reproducible on CPU.
+    For actual hackathon runs, execute in Colab with a GPU and adjust the hyperparameters.
+    """
+    try:
+        from transformers import AutoModelForCausalLM, AutoTokenizer
+        from trl import SFTConfig, SFTTrainer
+    except ImportError as exc:
+        raise RuntimeError(
+            "Missing training dependencies. Install with: pip install -r requirements.txt"
+        ) from exc
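+    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
+    if tokenizer.pad_token is None:
+        tokenizer.pad_token = tokenizer.eos_token
+    # NOTE: this tokenizer is not passed to SFTTrainer below, so the trainer loads its own
+    # from the model checkpoint and the pad-token fallback above does not affect training.
+    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)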
+    def formatting_func(example):
+        return f"<|user|>\n{example['prompt']}\n<|assistant|>\n{example['response']}"
+    config = SFTConfig(
+        output_dir="outputs/sft_run",
+        per_device_train_batch_size=1,
+        gradient_accumulation_steps=2,
+        learning_rate=2e-5,
+        num_train_epochs=1,
+        max_seq_length=768,
+        logging_steps=5,
+        save_strategy="no",
+        report_to=[],
+    )
+    trainer = SFTTrainer(
+        model=model,
+        args=config,
+        train_dataset=dataset,
+        formatting_func=formatting_func,
+    )
+    trainer.train()
+def evaluate_policies() -> Dict[str, List[float]]:
+    random_scores: List[float] = []
+    heuristic_scores: List[float] = []
+    for task in ["easy", "medium", "hard"]:
+        random.seed(7)
+        random_stats, _, _ = rollout("random", task)
+        heuristic_stats, _, _ = rollout("heuristic", task)
+        random_scores.append(random_stats.total_reward)
+        heuristic_scores.append(heuristic_stats.total_reward)
+    return {"random": random_scores, "heuristic": heuristic_scores}
+def plot_rewards(score_map: Dict[str, List[float]]) -> None:
+    labels = ["easy", "medium", "hard"]
+    x = list(range(len(labels)))
+    plt.figure(figsize=(8, 4.5))
+    plt.plot(x, score_map["random"], marker="o", label="Random baseline")
+    plt.plot(x, score_map["heuristic"], marker="o", label="Heuristic coordinator")
+    plt.xticks(x, labels)
+    plt.xlabel("Task difficulty")
+    plt.ylabel("Episode total reward")
+    plt.title("Incident Command Center: baseline comparison")
+    plt.grid(alpha=0.3)
+    plt.legend()
+    plt.tight_layout()
+    plt.savefig(ARTIFACT_DIR / "reward_curve.png", dpi=160)
+    plt.close()
+def main() -> None:
+    dataset = build_training_dataset(episodes_per_task=3)
+    dataset.save_to_disk("artifacts/trl_dataset")
+    run_trl_sft(dataset)
+    scores = evaluate_policies()
+    plot_rewards(scores)
+    summary = {
+        "base_model": BASE_MODEL,
+        "dataset_rows": len(dataset),
+        "random_rewards": scores["random"],
+        "heuristic_rewards": scores["heuristic"],
+    }
+    with open(ARTIFACT_DIR / "summary_metrics.json", "w", encoding="utf-8") as f:
+        json.dump(summary, f, indent=2)
+    print("Training and evaluation complete.")
+    print(f"Saved artifacts in: {ARTIFACT_DIR.resolve()}")
+if __name__ == "__main__":
+    main()
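
The SFT data path in `train_trl.py` is easy to check in isolation: each rollout step becomes a `prompt`/`response` record, and `formatting_func` renders it into one chat-style training string. The sketch below reproduces that path without the environment server. It assumes a plain dict in place of `IncidentAction` (the real fields live in `models.py`), and the action values (`triage_agent`, `inspect_logs`, `payments-api`) are hypothetical placeholders, not names taken from the environment.

```python
import json


def action_to_json(action: dict) -> str:
    # Stand-in for IncidentAction.model_dump(exclude_none=True):
    # drop unset (None) fields before serializing.
    payload = {key: value for key, value in action.items() if value is not None}
    return json.dumps(payload, ensure_ascii=True)


def formatting_func(example: dict) -> str:
    # Same template as in run_trl_sft: user turn, then assistant turn.
    return f"<|user|>\n{example['prompt']}\n<|assistant|>\n{example['response']}"


# Hypothetical action; only the key names come from the prompt template.
action = {
    "actor": "triage_agent",
    "action_type": "inspect_logs",
    "target": "payments-api",
    "root_cause": None,
    "resolution_summary": None,
}

record = {
    "prompt": "Incident ID: INC-1\nTitle: Checkout latency spike",
    "response": action_to_json(action),
}

text = formatting_func(record)
print(text)
```

Because unset fields are dropped, the model is trained to emit only the keys relevant to the chosen action rather than a fixed five-key object.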

validate-submission.sh CHANGED

@@ -20,6 +20,11 @@ portable_mktemp() {
 PING_URL="${1:-}"
 REPO_DIR="${2:-.}"
+if [ -z "$PING_URL" ]; then
+  printf "Usage: ./validate-submission.sh <hf_space_url> [repo_dir]\n"
+  exit 1
+fi
 log()  { printf "[%s] %b\n" "$(date -u +%H:%M:%S)" "$*"; }
 pass() { log "${GREEN}PASSED${NC} -- $1"; }
 fail() { log "${RED}FAILED${NC} -- $1"; }