Spaces: Roopalgn/AIHack-ITHelpDesk

Roopalgn committed on Apr 8

Commit 043d9e1

1 Parent(s): 8241eb5

Upgrade helpdesk env with queue dynamics and operational actions

Files changed (13)

README.md +31 -17
inference.py +83 -2
models.py +22 -1
policy_learning.py +56 -2
server/environment.py +665 -15
server/grader.py +14 -4
server/tasks.py +55 -13
tests/test_api_integration.py +2 -2
tests/test_competitive_upgrade.py +39 -20
tests/test_extra_fields_penalty.py +102 -76
tests/test_grader_unit.py +196 -42
tests/test_policy_learning.py +1 -1
tests/test_tasks_unit.py +9 -10

README.md CHANGED

@@ -25,7 +25,7 @@ If a judge reads only one short explanation, it should be this:
 - this environment models a real enterprise workflow, not a toy classification task
 - each ticket requires typed routing decisions that are easy to score deterministically
-- the task ladder moves cleanly from single-field classification to full operational routing
 - the repo is small enough to rerun quickly and explicit enough to understand without hidden business logic
 ## What This Environment Simulates
@@ -34,9 +34,10 @@ The environment models a realistic helpdesk workflow:
 1. a new ticket enters the queue
 2. the agent reads the ticket title and description
-3. the agent may investigate with lightweight tools, then submit structured routing fields
-4. the grader assigns deterministic credit
-5. the environment advances to the next ticket until the queue is complete
 For hard-task tickets, the environment can now withhold decisive routing context until the agent uses the right investigation tool. That keeps the task from collapsing into one-shot classification and makes tool choice part of the policy.
@@ -45,7 +46,7 @@ This domain is useful for OpenEnv because it is operationally realistic, easy to
 ## Why This Is A Good Hackathon Domain
 - it reflects real enterprise support operations
-- the action space is structured and judge-friendly, with a small investigate-versus-submit split
 - correctness can be scored deterministically
 - the hard task is meaningfully harder than the easy and medium tasks
 - the environment is small enough to rerun quickly
@@ -55,9 +56,9 @@ This domain is useful for OpenEnv because it is operationally realistic, easy to
 The project uses a queue-based episode model.
 - `reset()` samples a task and a queue of 3 to 5 tickets
-- `step()` lets the agent investigate or submit one ticket at a time
 - `state()` exposes the internal episode snapshot
-- hard-task episodes also track queue-level capacity, alternate acceptable routes, and planning penalties across tickets
 - final evaluation is based on the queue outcome, not on isolated per-ticket classification alone
 The environment classes and vocabulary are intentionally frozen to keep collaboration and judging simple.
@@ -91,15 +92,15 @@ Artifacts are written to `analysis/policy_learning_runs/` by default:
 - `search_eval_episodes.jsonl`
 - `search_eval_trajectories.jsonl`
-The default submit policy inside this runner stays deterministic and local. It reuses the repo's heuristic routing logic plus planning-aware routing overrides, so the search loop can study both investigation policy and queue-aware submission quality without depending on external LLM latency or API cost.
 ## Task Ladder
 | ID | Name | Difficulty | Required Fields | What The Agent Must Do |
 |----|------|------------|-----------------|-------------------------|
-| 1 | Issue Type Classification | Easy | `issue_type` | classify the ticket into the best issue category |
-| 2 | Issue Type And Priority | Medium | `issue_type`, `priority` | classify the issue and estimate urgency |
-| 3 | Full Ticket Routing | Hard | `issue_type`, `priority`, `assignment_group`, `resolution_action` | perform full routing and next-step selection |
 ## Locked Vocabulary
@@ -151,10 +152,13 @@ Visible ticket fields:
 - `description`
 - optional `ambiguity_note`
 - optional `planning_note`
 - optional `related_ticket_id`
 - optional `related_ticket_preview`
 - optional `routing_options`
 - optional `capacity_state`
 Each observation also includes:
@@ -196,16 +200,23 @@ The internal `HelpdeskTicketState` tracks:
 - `team_capacity_remaining`
 - `high_priority_slots_remaining`
 - `escalation_slots_remaining`
 - `planning_penalty_total`
 ## Grading And Reward
 Scoring is deterministic and normalized to `[0.0, 1.0]`.
-The action model now supports two paths:
 - `action_type="submit"` for the final routing answer
 - `action_type="investigate"` with a small built-in tool surface before submission
 Available tools:
@@ -223,6 +234,8 @@ Hard-task investigation behavior:
 - blind or repeated probing does not pay by default
 - premature hard-task submission can incur a shaping penalty even when the visible text looks plausible
 - resource-greedy routing can add planning penalties later in the queue even when a single ticket looks correct in isolation
 - terminal `rubric_reward` remains the objective evaluation signal, while per-step `reward` is the denser training signal
 Per-field behavior:
@@ -237,9 +250,9 @@ Task weights:
 | Task | Issue Type | Priority | Assignment Group | Resolution Action |
 |------|------------|----------|------------------|-------------------|
-| 1 | 100% | - | - | - |
-| 2 | 60% | 40% | - | - |
-| 3 | 35% | 20% | 25% | 20% |
 Final episode rubric reward is queue-based:
@@ -251,7 +264,7 @@ Both `reward` and `rubric_reward` now use the closed interval `[0.0, 1.0]`.
 Step reward is lightly milestone-shaped: high per-ticket scores get a small bonus and very low scores get a small penalty before the final clamp.
-Final reward also includes a queue-economics penalty when the agent exceeds the free investigation budget. One investigation per queued ticket is free, but extra investigation steps reduce the final reward more noticeably than before. On hard-task queues, assignment-group capacity, high-priority slots, and escalation slots also create cross-ticket trade-offs.
 To make the environment more RL-friendly, each observation now also surfaces structured reward telemetry:
@@ -302,7 +315,7 @@ It includes:
 ## Difficulty Coverage
-The difficulty ladder is visible both in the task fields and in the dataset itself.
 Easy-style examples:
@@ -322,6 +335,7 @@ Hard-style examples:
 - `ticket-029`: seat expansion combined with a prorating question
 - `ticket-038`: follow-up billing thread with escalated urgency
 - `ticket-045`: repeated account suspension thread with legal-escalation pressure
 ## Repository Layout

 - this environment models a real enterprise workflow, not a toy classification task
 - each ticket requires typed routing decisions that are easy to score deterministically
+- the task ladder now keeps full routing on every task and scales observability, queue pressure, and operational controls instead
 - the repo is small enough to rerun quickly and explicit enough to understand without hidden business logic
 ## What This Environment Simulates
 1. a new ticket enters the queue
 2. the agent reads the ticket title and description
+3. the agent may investigate, request more information, open an incident, defer the ticket, or submit a routing decision
+4. the queue state mutates: capacity shrinks, incidents stay open, deferred tickets return later, and poor handling can spawn follow-up tickets
+5. the grader assigns deterministic credit
+6. the environment advances until the queue is complete
 For hard-task tickets, the environment can now withhold decisive routing context until the agent uses the right investigation tool. That keeps the task from collapsing into one-shot classification and makes tool choice part of the policy.
 ## Why This Is A Good Hackathon Domain
 - it reflects real enterprise support operations
+- the action space is structured and judge-friendly, but now includes meaningful operational controls beyond investigate-versus-submit
 - correctness can be scored deterministically
 - the hard task is meaningfully harder than the easy and medium tasks
 - the environment is small enough to rerun quickly
 The project uses a queue-based episode model.
 - `reset()` samples a task and a queue of 3 to 5 tickets
+- `step()` lets the agent investigate, request clarification, defer, open incidents, or submit one ticket at a time
 - `state()` exposes the internal episode snapshot
+- hard-task episodes also track queue-level capacity, incident slots, alternate acceptable routes, planning penalties, SLA pressure, and dynamic follow-up tickets across the queue
 - final evaluation is based on the queue outcome, not on isolated per-ticket classification alone
 The environment classes and vocabulary are intentionally frozen to keep collaboration and judging simple.
 - `search_eval_episodes.jsonl`
 - `search_eval_trajectories.jsonl`
+The default submit policy inside this runner stays deterministic and local. It reuses the repo's heuristic routing logic plus planning-aware routing overrides, and the policy loop can now also exercise operational actions such as `request_info`, `open_incident`, and `defer` without depending on external LLM latency or API cost.
 ## Task Ladder
 | ID | Name | Difficulty | Required Fields | What The Agent Must Do |
 |----|------|------------|-----------------|-------------------------|
+| 1 | Guided Full Routing | Easy | `issue_type`, `priority`, `assignment_group`, `resolution_action` | route a mostly visible ticket correctly |
+| 2 | Contextual Full Routing | Medium | `issue_type`, `priority`, `assignment_group`, `resolution_action` | route under partial observability with investigation and clarification |
+| 3 | Adaptive Queue Routing | Hard | `issue_type`, `priority`, `assignment_group`, `resolution_action` | route while managing queue pressure, incidents, deferrals, and downstream follow-ups |
 ## Locked Vocabulary
 - `description`
 - optional `ambiguity_note`
 - optional `planning_note`
+- optional `customer_update_note`
 - optional `related_ticket_id`
 - optional `related_ticket_preview`
 - optional `routing_options`
 - optional `capacity_state`
+- optional `operational_context`
+- optional `generated_from_ticket_id`
 Each observation also includes:
 - `team_capacity_remaining`
 - `high_priority_slots_remaining`
 - `escalation_slots_remaining`
+- `incident_slots_remaining`
 - `planning_penalty_total`
+- `incident_gap_total`
+- `sla_breach_count`
+- `dynamic_queue_events`
 ## Grading And Reward
 Scoring is deterministic and normalized to `[0.0, 1.0]`.
+The action model now supports five paths:
 - `action_type="submit"` for the final routing answer
 - `action_type="investigate"` with a small built-in tool surface before submission
+- `action_type="request_info"` to ask for customer / operator clarification on the current ticket
+- `action_type="open_incident"` to reserve incident handling capacity before routing risky tickets
+- `action_type="defer"` to push a ticket later in the queue and accept the downstream queue consequences
 Available tools:
 - blind or repeated probing does not pay by default
 - premature hard-task submission can incur a shaping penalty even when the visible text looks plausible
 - resource-greedy routing can add planning penalties later in the queue even when a single ticket looks correct in isolation
+- incident-sensitive tickets can require an explicit `open_incident` step to avoid future follow-up debt
+- bad or incomplete hard-task handling can append a deterministic follow-up ticket later in the same episode
 - terminal `rubric_reward` remains the objective evaluation signal, while per-step `reward` is the denser training signal
 Per-field behavior:
 | Task | Issue Type | Priority | Assignment Group | Resolution Action |
 |------|------------|----------|------------------|-------------------|
+| 1 | 40% | 20% | 20% | 20% |
+| 2 | 32% | 20% | 24% | 24% |
+| 3 | 30% | 20% | 25% | 25% |
 Final episode rubric reward is queue-based:
 Step reward is lightly milestone-shaped: high per-ticket scores get a small bonus and very low scores get a small penalty before the final clamp.
+Final reward also includes a queue-economics penalty when the agent exceeds the free investigation budget. One investigation-style step per queued ticket is free, but extra investigation or clarification steps reduce the final reward more noticeably than before. On hard-task queues, assignment-group capacity, high-priority slots, escalation slots, incident slots, and deferred-ticket SLA pressure all create cross-ticket trade-offs.
 To make the environment more RL-friendly, each observation now also surfaces structured reward telemetry:
 ## Difficulty Coverage
+The difficulty ladder is now visible in observability and control, not just in the submitted field count.
 Easy-style examples:
 - `ticket-029`: seat expansion combined with a prorating question
 - `ticket-038`: follow-up billing thread with escalated urgency
 - `ticket-045`: repeated account suspension thread with legal-escalation pressure
+- generated `*-followup` tickets: deterministic reopened cases that only appear when the earlier handling was incomplete or operationally risky
 ## Repository Layout
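The updated task-weights table can be read as a weighted sum over the four routing fields. The sketch below copies those weights but assumes exact-match, all-or-nothing credit per field, because this diff does not show the grader's partial-credit rules. It also counts steps beyond the free investigation budget. Following the README, it allows one free investigation-style step per queued ticket and assumes that both `investigate` and `request_info` count toward that budget. The actual penalty size is set in `server/grader.py` / `server/environment.py` and is not reproduced here. `weighted_ticket_score` and `extra_investigation_steps` are hypothetical helpers, not functions from the repo.

```python
from __future__ import annotations

# Per-task field weights, copied from the README's task-weights table.
TASK_FIELD_WEIGHTS: dict[int, dict[str, float]] = {
    1: {"issue_type": 0.40, "priority": 0.20, "assignment_group": 0.20, "resolution_action": 0.20},
    2: {"issue_type": 0.32, "priority": 0.20, "assignment_group": 0.24, "resolution_action": 0.24},
    3: {"issue_type": 0.30, "priority": 0.20, "assignment_group": 0.25, "resolution_action": 0.25},
}

# Assumption: these action types consume the free investigation budget.
INVESTIGATION_STYLE_ACTIONS = {"investigate", "request_info"}


def weighted_ticket_score(task_id: int, predicted: dict[str, str], expected: dict[str, str]) -> float:
    """Hypothetical helper: all-or-nothing credit per field, weighted, clamped to [0.0, 1.0]."""
    weights = TASK_FIELD_WEIGHTS[task_id]
    score = sum(
        weight
        for field_name, weight in weights.items()
        if field_name in expected and predicted.get(field_name) == expected[field_name]
    )
    # Clamp so float rounding can never push a perfect score past 1.0.
    return max(0.0, min(1.0, score))


def extra_investigation_steps(queue_length: int, action_types: list[str]) -> int:
    """Hypothetical helper: investigation-style steps beyond one free step per queued ticket."""
    used = sum(1 for action_type in action_types if action_type in INVESTIGATION_STYLE_ACTIONS)
    return max(0, used - queue_length)
```

Under these assumptions, a ticket with only `issue_type` correct earns 0.40 on task 1 but 0.30 on task 3. Because all three tasks now require full routing, the ladder shifts weight away from `issue_type` as difficulty rises.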

inference.py CHANGED

@@ -196,9 +196,11 @@ def format_recent_history_entries(
 def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions: str) -> str:
     ambiguity_note = ticket.get("ambiguity_note")
     planning_note = ticket.get("planning_note")
     related_preview = ticket.get("related_ticket_preview") or {}
     last_tool_result = ticket.get("last_tool_result")
     context_status = ticket.get("context_status") or {}
     recent_history = ticket.get("recent_history") or []
     feedback_summary = ticket.get("feedback_summary")
     last_reward_components = ticket.get("last_reward_components") or {}
@@ -213,6 +215,8 @@ def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions
         extra_context_lines.append(f"Ambiguity note: {ambiguity_note}")
     if planning_note:
         extra_context_lines.append(f"Planning note: {planning_note}")
     if related_preview:
         extra_context_lines.extend(
             [
@@ -230,6 +234,10 @@ def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions
         extra_context_lines.append(
             "Context status: " + json.dumps(context_status, sort_keys=True)
         )
     if capacity_state:
         extra_context_lines.append(
             "Queue capacity state: " + json.dumps(capacity_state, sort_keys=True)
@@ -572,16 +580,19 @@ def build_routing_text(ticket: dict) -> str:
     related_preview = ticket.get("related_ticket_preview") or {}
     last_tool_result = ticket.get("last_tool_result") or {}
     routing_options = ticket.get("routing_options") or []
     return " ".join(
         [
             ticket.get("title", ""),
             ticket.get("description", ""),
             ticket.get("ambiguity_note", ""),
             ticket.get("planning_note", ""),
             related_preview.get("title", ""),
             related_preview.get("description", ""),
             json.dumps(last_tool_result, sort_keys=True),
             json.dumps(routing_options, sort_keys=True),
             json.dumps(ticket.get("capacity_state") or {}, sort_keys=True),
             json.dumps(ticket.get("future_queue_demand") or {}, sort_keys=True),
         ]
@@ -909,9 +920,14 @@ def build_action(
         )
-def should_investigate(ticket: dict, history: list[dict[str, Any]]) -> tuple[bool, str | None]:
     if not ticket:
         return False, None
     context_status = ticket.get("context_status") or {}
     hidden_context_remaining = bool(context_status.get("hidden_context_remaining"))
     investigation_required = bool(context_status.get("investigation_required"))
@@ -945,6 +961,7 @@ def should_investigate(ticket: dict, history: list[dict[str, Any]]) -> tuple[boo
         tool_name
         for tool_name in context_status.get("recommended_tools", [])
         if tool_name not in used_tools
     ]
     if hidden_context_remaining and recommended_tools:
         return True, recommended_tools[0]
@@ -1018,6 +1035,8 @@ def should_investigate(ticket: dict, history: list[dict[str, Any]]) -> tuple[boo
         )
     for tool_name in preferred_tools:
         if tool_name not in used_tools:
             return True, tool_name
@@ -1026,6 +1045,39 @@ def should_investigate(ticket: dict, history: list[dict[str, Any]]) -> tuple[boo
     return False, None
 def merge_ticket_context(ticket: dict, observation: Any) -> dict:
     merged_ticket = dict(ticket)
     if getattr(observation, "last_tool_result", None) is not None:
@@ -1033,6 +1085,7 @@ def merge_ticket_context(ticket: dict, observation: Any) -> dict:
     merged_ticket["recent_history"] = list(getattr(observation, "history", []))
     merged_ticket["queue_position"] = getattr(observation, "queue_position", None)
     merged_ticket["tickets_remaining"] = getattr(observation, "tickets_remaining", None)
     merged_ticket["investigation_budget_remaining"] = getattr(
         observation,
         "investigation_budget_remaining",
@@ -1040,6 +1093,10 @@ def merge_ticket_context(ticket: dict, observation: Any) -> dict:
     )
     merged_ticket["average_score_so_far"] = getattr(observation, "average_score_so_far", None)
     merged_ticket["progress_fraction"] = getattr(observation, "progress_fraction", None)
     merged_ticket["last_reward_components"] = dict(
         getattr(observation, "last_reward_components", {}) or {}
     )
@@ -1096,7 +1153,11 @@ def run() -> None:
                     break
                 while getattr(obs, "investigation_budget_remaining", 0) > 0:
-                    investigate, tool_name = should_investigate(ticket, obs.history)
                     if not investigate or tool_name is None:
                         break
                     tool_action = HelpdeskTicketAction(
@@ -1129,6 +1190,26 @@ def run() -> None:
                     break
                 ticket_with_context = merge_ticket_context(ticket, obs)
                 action, action_source, fallback_reason = build_action(
                     ticket_with_context,
                     obs.allowed_fields,

 def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions: str) -> str:
     ambiguity_note = ticket.get("ambiguity_note")
     planning_note = ticket.get("planning_note")
+    customer_update_note = ticket.get("customer_update_note")
     related_preview = ticket.get("related_ticket_preview") or {}
     last_tool_result = ticket.get("last_tool_result")
     context_status = ticket.get("context_status") or {}
+    operational_context = ticket.get("operational_context") or {}
     recent_history = ticket.get("recent_history") or []
     feedback_summary = ticket.get("feedback_summary")
     last_reward_components = ticket.get("last_reward_components") or {}
         extra_context_lines.append(f"Ambiguity note: {ambiguity_note}")
     if planning_note:
         extra_context_lines.append(f"Planning note: {planning_note}")
+    if customer_update_note:
+        extra_context_lines.append(f"Customer update: {customer_update_note}")
     if related_preview:
         extra_context_lines.extend(
             [
         extra_context_lines.append(
             "Context status: " + json.dumps(context_status, sort_keys=True)
         )
+    if operational_context:
+        extra_context_lines.append(
+            "Operational context: " + json.dumps(operational_context, sort_keys=True)
+        )
     if capacity_state:
         extra_context_lines.append(
             "Queue capacity state: " + json.dumps(capacity_state, sort_keys=True)
     related_preview = ticket.get("related_ticket_preview") or {}
     last_tool_result = ticket.get("last_tool_result") or {}
     routing_options = ticket.get("routing_options") or []
+    operational_context = ticket.get("operational_context") or {}
     return " ".join(
         [
             ticket.get("title", ""),
             ticket.get("description", ""),
             ticket.get("ambiguity_note", ""),
             ticket.get("planning_note", ""),
+            ticket.get("customer_update_note", ""),
             related_preview.get("title", ""),
             related_preview.get("description", ""),
             json.dumps(last_tool_result, sort_keys=True),
             json.dumps(routing_options, sort_keys=True),
+            json.dumps(operational_context, sort_keys=True),
             json.dumps(ticket.get("capacity_state") or {}, sort_keys=True),
             json.dumps(ticket.get("future_queue_demand") or {}, sort_keys=True),
         ]
         )
+def should_investigate(
+    ticket: dict,
+    history: list[dict[str, Any]],
+    available_tools: list[str] | None = None,
+) -> tuple[bool, str | None]:
     if not ticket:
         return False, None
+    available_tool_set = set(available_tools or [])
     context_status = ticket.get("context_status") or {}
     hidden_context_remaining = bool(context_status.get("hidden_context_remaining"))
     investigation_required = bool(context_status.get("investigation_required"))
         tool_name
         for tool_name in context_status.get("recommended_tools", [])
         if tool_name not in used_tools
+        and (not available_tool_set or tool_name in available_tool_set)
     ]
     if hidden_context_remaining and recommended_tools:
         return True, recommended_tools[0]
         )
     for tool_name in preferred_tools:
+        if available_tool_set and tool_name not in available_tool_set:
+            continue
         if tool_name not in used_tools:
             return True, tool_name
     return False, None
+def choose_operational_action(
+    ticket: dict,
+    history: list[dict[str, Any]],
+    available_action_types: list[str] | None = None,
+) -> tuple[HelpdeskTicketAction | None, str | None]:
+    if not ticket:
+        return None, None
+    operational_context = ticket.get("operational_context") or {}
+    recommended_actions = list(operational_context.get("recommended_actions") or [])
+    available_action_set = set(available_action_types or [])
+    current_ticket_id = ticket.get("ticket_id")
+    prior_ticket_history = [
+        entry for entry in history if entry.get("ticket_id") == current_ticket_id
+    ]
+    used_action_types = {
+        entry.get("predicted", {}).get("action_type")
+        for entry in prior_ticket_history
+        if entry.get("predicted")
+    }
+    for action_name in ("open_incident", "request_info", "defer"):
+        if action_name not in recommended_actions:
+            continue
+        if available_action_set and action_name not in available_action_set:
+            continue
+        if action_name in used_action_types:
+            continue
+        if action_name == "defer" and ticket.get("tickets_after_current", 0) <= 0:
+            continue
+        return HelpdeskTicketAction(action_type=action_name), action_name
+    return None, None
 def merge_ticket_context(ticket: dict, observation: Any) -> dict:
     merged_ticket = dict(ticket)
     if getattr(observation, "last_tool_result", None) is not None:
     merged_ticket["recent_history"] = list(getattr(observation, "history", []))
     merged_ticket["queue_position"] = getattr(observation, "queue_position", None)
     merged_ticket["tickets_remaining"] = getattr(observation, "tickets_remaining", None)
+    merged_ticket["tickets_after_current"] = getattr(observation, "tickets_after_current", None)
     merged_ticket["investigation_budget_remaining"] = getattr(
         observation,
         "investigation_budget_remaining",
     )
     merged_ticket["average_score_so_far"] = getattr(observation, "average_score_so_far", None)
     merged_ticket["progress_fraction"] = getattr(observation, "progress_fraction", None)
+    merged_ticket["available_tools"] = list(getattr(observation, "available_tools", []) or [])
+    merged_ticket["available_action_types"] = list(
+        getattr(observation, "available_action_types", []) or []
+    )
     merged_ticket["last_reward_components"] = dict(
         getattr(observation, "last_reward_components", {}) or {}
     )
                     break
                 while getattr(obs, "investigation_budget_remaining", 0) > 0:
+                    investigate, tool_name = should_investigate(
+                        ticket,
+                        obs.history,
+                        list(getattr(obs, "available_tools", []) or []),
+                    )
                     if not investigate or tool_name is None:
                         break
                     tool_action = HelpdeskTicketAction(
                     break
                 ticket_with_context = merge_ticket_context(ticket, obs)
+                operational_action, operational_source = choose_operational_action(
+                    ticket_with_context,
+                    obs.history,
+                    list(getattr(obs, "available_action_types", []) or []),
+                )
+                if operational_action is not None and operational_source is not None:
+                    result = sync_client.step(operational_action)
+                    obs = result.observation
+                    step_num += 1
+                    reward = float(result.reward or 0.0)
+                    if result.reward is not None:
+                        task_step_rewards.append(reward)
+                    log_step(
+                        step=step_num,
+                        action=operational_action,
+                        reward=reward,
+                        done=bool(result.done),
+                        error=operational_source,
+                    )
+                    continue
                 action, action_source, fallback_reason = build_action(
                     ticket_with_context,
                     obs.allowed_fields,
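The new `choose_operational_action` in `inference.py` applies a fixed precedence: `open_incident`, then `request_info`, then `defer`. It skips any action the environment did not recommend, any action missing from a non-empty `available_action_types` list, any action already taken on the same ticket, and `defer` when no tickets remain after the current one. The sketch below is a hypothetical, dependency-free restatement. It returns the action name instead of a `HelpdeskTicketAction`. Since `merge_ticket_context` stores `None` for `tickets_after_current` when the observation lacks that attribute, the sketch coerces `None` to `0` before comparing.

```python
from __future__ import annotations

from typing import Any

# Same precedence as the loop in choose_operational_action.
OPERATIONAL_PRIORITY = ("open_incident", "request_info", "defer")


def pick_operational_action_name(
    ticket: dict[str, Any],
    history: list[dict[str, Any]],
    available_action_types: list[str] | None = None,
) -> str | None:
    """Hypothetical, model-free variant that returns an action name, not an action object."""
    if not ticket:
        return None
    operational_context = ticket.get("operational_context") or {}
    recommended = set(operational_context.get("recommended_actions") or [])
    available = set(available_action_types or [])
    ticket_id = ticket.get("ticket_id")
    # Action types already taken on this ticket, so each operational step runs at most once.
    used = {
        (entry.get("predicted") or {}).get("action_type")
        for entry in history
        if entry.get("ticket_id") == ticket_id and entry.get("predicted")
    }
    # merge_ticket_context stores None when the observation lacks this attribute.
    tickets_after = ticket.get("tickets_after_current") or 0
    for name in OPERATIONAL_PRIORITY:
        if name not in recommended:
            continue
        if available and name not in available:
            continue  # an empty availability list means "no restriction"
        if name in used:
            continue
        if name == "defer" and tickets_after <= 0:
            continue  # nothing left in the queue to defer behind
        return name
    return None
```

Putting `open_incident` first matches the README's description of that action as reserving incident capacity *before* a risky ticket is routed.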

models.py CHANGED

@@ -16,7 +16,13 @@ ISSUE_TYPE_SET = set(ISSUE_TYPES)
 PRIORITY_SET = set(PRIORITIES)
 ASSIGNMENT_GROUP_SET = set(ASSIGNMENT_GROUPS)
 RESOLUTION_ACTION_SET = set(RESOLUTION_ACTIONS)
-ACTION_TYPE_SET = {"submit", "investigate"}
 TOOL_NAME_SET = {"lookup_related_ticket", "lookup_requester_history"}
 TOOL_NAME_SET.add("lookup_internal_routing_note")
 TOOL_NAME_SET.add("lookup_queue_capacity_forecast")
@@ -54,6 +60,9 @@ class HelpdeskTicketRecord(BaseModel):
     alternate_assignment_group: Optional[str] = None
     alternate_resolution_action: Optional[str] = None
     alternate_route_score_multiplier: float = 0.0
     @field_validator("issue_type")
     @classmethod
@@ -203,4 +212,16 @@ class HelpdeskTicketState(State):
     escalation_slots_remaining: int = 0
     planning_penalty_total: float = 0.0
     capacity_pressure_tickets_resolved: int = 0
     history_entries: list[dict] = Field(default_factory=list)

 PRIORITY_SET = set(PRIORITIES)
 ASSIGNMENT_GROUP_SET = set(ASSIGNMENT_GROUPS)
 RESOLUTION_ACTION_SET = set(RESOLUTION_ACTIONS)
+ACTION_TYPE_SET = {
+    "submit",
+    "investigate",
+    "request_info",
+    "defer",
+    "open_incident",
+}
 TOOL_NAME_SET = {"lookup_related_ticket", "lookup_requester_history"}
 TOOL_NAME_SET.add("lookup_internal_routing_note")
 TOOL_NAME_SET.add("lookup_queue_capacity_forecast")
     alternate_assignment_group: Optional[str] = None
     alternate_resolution_action: Optional[str] = None
     alternate_route_score_multiplier: float = 0.0
+    customer_update_note: Optional[str] = None
+    incident_recommended: bool = False
+    generated_from_ticket_id: Optional[str] = None
     @field_validator("issue_type")
     @classmethod
     escalation_slots_remaining: int = 0
     planning_penalty_total: float = 0.0
     capacity_pressure_tickets_resolved: int = 0
+    ticket_request_info_usage: dict[str, int] = Field(default_factory=dict)
+    ticket_defer_counts: dict[str, int] = Field(default_factory=dict)
+    open_incident_ticket_ids: list[str] = Field(default_factory=list)
+    incident_slots_initial: int = 0
+    incident_slots_remaining: int = 0
+    incident_actions_used: int = 0
+    incident_gap_total: float = 0.0
+    deferred_ticket_count: int = 0
+    sla_breach_count: int = 0
+    spawned_follow_up_ticket_ids: list[str] = Field(default_factory=list)
+    spawned_follow_up_source_ids: list[str] = Field(default_factory=list)
+    dynamic_queue_events: list[dict[str, Any]] = Field(default_factory=list)
     history_entries: list[dict] = Field(default_factory=list)
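The widened `ACTION_TYPE_SET` now admits five action types next to the four tool names visible in this diff. The sketch below shows one way an action could be checked against those sets. It uses a plain dataclass so it runs without pydantic. `ActionSketch` is a hypothetical stand-in, not the repo's `HelpdeskTicketAction`. The rule that only `investigate` carries a `tool_name` is an assumption, since the action model's validators are not part of this diff.

```python
from __future__ import annotations

from dataclasses import dataclass

# Copied from the models.py diff above.
ACTION_TYPE_SET = {"submit", "investigate", "request_info", "defer", "open_incident"}
TOOL_NAME_SET = {
    "lookup_related_ticket",
    "lookup_requester_history",
    "lookup_internal_routing_note",
    "lookup_queue_capacity_forecast",
}


@dataclass(frozen=True)
class ActionSketch:
    """Hypothetical stand-in for HelpdeskTicketAction; not the repo's model."""

    action_type: str
    tool_name: str | None = None

    def __post_init__(self) -> None:
        if self.action_type not in ACTION_TYPE_SET:
            raise ValueError(f"unknown action_type: {self.action_type!r}")
        # Assumption: only investigate steps carry a tool name.
        if self.action_type == "investigate":
            if self.tool_name not in TOOL_NAME_SET:
                raise ValueError(f"investigate needs a known tool_name, got {self.tool_name!r}")
        elif self.tool_name is not None:
            raise ValueError(f"{self.action_type} does not take a tool_name")
```

Validating at construction time means a policy that emits an unsupported operational action fails before any environment step is taken.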

policy_learning.py CHANGED

@@ -244,8 +244,10 @@ def _routing_text(ticket: dict[str, Any]) -> str:
         str(ticket.get("description", "")),
         str(ticket.get("ambiguity_note", "")),
         str(ticket.get("planning_note", "")),
         json.dumps(ticket.get("last_tool_result") or {}, sort_keys=True),
         json.dumps(ticket.get("routing_options") or [], sort_keys=True),
         json.dumps(ticket.get("capacity_state") or {}, sort_keys=True),
         json.dumps(ticket.get("future_queue_demand") or {}, sort_keys=True),
     ]
@@ -265,6 +267,7 @@ def infer_ticket_cue(ticket: dict[str, Any]) -> str:
     if (
         ticket.get("planning_note")
         or ticket.get("routing_options")
         or "lookup_queue_capacity_forecast"
         in (context_status.get("recommended_tools") or [])
         or any(
@@ -316,6 +319,8 @@ def infer_ticket_cue(ticket: dict[str, Any]) -> str:
         for phrase in ("still", "again", "overdue", "legal", "priority")
     ):
         return "history_pressure"
     return "generic_hidden_context"
@@ -397,17 +402,54 @@ def select_cue_based_tool(
     *,
     hidden_context_remaining: bool,
     used_tools: set[str],
 ) -> str | None:
     preferred_tools = preferred_tool_order(
         ticket,
         hidden_context_remaining=hidden_context_remaining,
     )
     for tool_name in preferred_tools:
         if tool_name not in used_tools:
             return tool_name
     return None
 def choose_policy_action(
     policy: PolicyConfig,
     observation: HelpdeskTicketObservation,
@@ -425,6 +467,7 @@ def choose_policy_action(
         used_tools = set(used_tools_by_ticket.get(ticket_id, set()))
     context_status = ticket.get("context_status") or {}
     hidden_context_remaining = bool(context_status.get("hidden_context_remaining"))
     if ticket_investigations < policy.max_investigations_per_ticket:
         if policy.strategy == "adaptive" and adaptive_bandit is not None and hidden_context_remaining:
@@ -434,11 +477,13 @@ def choose_policy_action(
                     ticket,
                     hidden_context_remaining=hidden_context_remaining,
                 )
-                if tool_name not in used_tools
             ]
             if not candidate_tools:
                 candidate_tools = [
-                    tool_name for tool_name in AVAILABLE_TOOLS if tool_name not in used_tools
                 ]
             if candidate_tools:
                 cue = infer_ticket_cue(ticket)
@@ -454,6 +499,7 @@ def choose_policy_action(
                 ticket,
                 hidden_context_remaining=hidden_context_remaining,
                 used_tools=used_tools,
             )
             if tool_name is not None:
                 return (
@@ -492,6 +538,14 @@ def choose_policy_action(
                 infer_ticket_cue(ticket),
             )
     return submit_builder(ticket, list(observation.allowed_fields)), "submit", None

         str(ticket.get("description", "")),
         str(ticket.get("ambiguity_note", "")),
         str(ticket.get("planning_note", "")),
+        str(ticket.get("customer_update_note", "")),
         json.dumps(ticket.get("last_tool_result") or {}, sort_keys=True),
         json.dumps(ticket.get("routing_options") or [], sort_keys=True),
+        json.dumps(ticket.get("operational_context") or {}, sort_keys=True),
         json.dumps(ticket.get("capacity_state") or {}, sort_keys=True),
         json.dumps(ticket.get("future_queue_demand") or {}, sort_keys=True),
     ]
     if (
         ticket.get("planning_note")
         or ticket.get("routing_options")
+        or (ticket.get("operational_context") or {}).get("incident_recommended")
         or "lookup_queue_capacity_forecast"
         in (context_status.get("recommended_tools") or [])
         or any(
         for phrase in ("still", "again", "overdue", "legal", "priority")
     ):
         return "history_pressure"
+    if any(phrase in text for phrase in ("incident", "outage", "lockout", "company-wide")):
+        return "incident_pressure"
     return "generic_hidden_context"
     *,
     hidden_context_remaining: bool,
     used_tools: set[str],
+    available_tools: set[str] | None = None,
 ) -> str | None:
     preferred_tools = preferred_tool_order(
         ticket,
         hidden_context_remaining=hidden_context_remaining,
     )
+    available_tool_set = set(available_tools or [])
     for tool_name in preferred_tools:
+        if available_tool_set and tool_name not in available_tool_set:
+            continue
         if tool_name not in used_tools:
             return tool_name
     return None
+def choose_operational_action(
+    ticket: dict[str, Any],
+    history: list[dict[str, Any]],
+    available_action_types: list[str] | None = None,
+) -> tuple[HelpdeskTicketAction | None, str | None]:
+    if not ticket:
+        return None, None
+    operational_context = ticket.get("operational_context") or {}
+    recommended_actions = list(operational_context.get("recommended_actions") or [])
+    available_action_set = set(available_action_types or [])
+    current_ticket_id = str(ticket.get("ticket_id", ""))
+    prior_ticket_history = [
+        entry for entry in history if entry.get("ticket_id") == current_ticket_id
+    ]
+    used_action_types = {
+        entry.get("predicted", {}).get("action_type")
+        for entry in prior_ticket_history
+        if entry.get("predicted")
+    }
+    for action_name in ("open_incident", "request_info", "defer"):
+        if action_name not in recommended_actions:
+            continue
+        if available_action_set and action_name not in available_action_set:
+            continue
+        if action_name in used_action_types:
+            continue
+        if action_name == "defer" and not ticket.get("tickets_after_current", 0):
+            continue
+        return HelpdeskTicketAction(action_type=action_name), action_name
+    return None, None
 def choose_policy_action(
     policy: PolicyConfig,
     observation: HelpdeskTicketObservation,
         used_tools = set(used_tools_by_ticket.get(ticket_id, set()))
     context_status = ticket.get("context_status") or {}
     hidden_context_remaining = bool(context_status.get("hidden_context_remaining"))
+    available_tools = set(getattr(observation, "available_tools", []) or [])
     if ticket_investigations < policy.max_investigations_per_ticket:
         if policy.strategy == "adaptive" and adaptive_bandit is not None and hidden_context_remaining:
                     ticket,
                     hidden_context_remaining=hidden_context_remaining,
                 )
+                if tool_name not in used_tools and tool_name in available_tools
             ]
             if not candidate_tools:
                 candidate_tools = [
+                    tool_name
+                    for tool_name in AVAILABLE_TOOLS
+                    if tool_name not in used_tools and tool_name in available_tools
                 ]
             if candidate_tools:
                 cue = infer_ticket_cue(ticket)
                 ticket,
                 hidden_context_remaining=hidden_context_remaining,
                 used_tools=used_tools,
+                available_tools=available_tools,
             )
             if tool_name is not None:
                 return (
                 infer_ticket_cue(ticket),
             )
+    operational_action, operational_source = choose_operational_action(
+        ticket,
+        list(getattr(observation, "history", []) or []),
+        list(getattr(observation, "available_action_types", []) or []),
+    )
+    if operational_action is not None and operational_source is not None:
+        return operational_action, operational_source, infer_ticket_cue(ticket)
     return submit_builder(ticket, list(observation.allowed_fields)), "submit", None

server/environment.py CHANGED

@@ -26,17 +26,37 @@ from vocabulary import (
 QUEUE_SIZE_RANGE = (3, 5)
-AVAILABLE_ACTION_TYPES = ("submit", "investigate")
-AVAILABLE_TOOLS = (
     "lookup_related_ticket",
     "lookup_requester_history",
     "lookup_internal_routing_note",
     "lookup_queue_capacity_forecast",
 )
 FREE_INVESTIGATIONS_PER_TICKET = 1
 EXTRA_INVESTIGATION_COST = 0.04
 MAX_EXTRA_INVESTIGATION_PENALTY = 0.25
 USEFUL_INVESTIGATION_REWARD = 0.03
 PREMATURE_SUBMIT_PENALTY = 0.22
 NONDEFAULT_HIDDEN_CONTEXT_PENALTY = 0.08
 CONTEXT_COMPLETION_BONUS = 0.06
@@ -49,6 +69,11 @@ TEAM_CAPACITY_OVERFLOW_PENALTY = 0.08
 HIGH_PRIORITY_SLOT_OVERFLOW_PENALTY = 0.06
 ESCALATION_SLOT_OVERFLOW_PENALTY = 0.05
 PLANNING_SUCCESS_BONUS = 0.05
 TASK3_INVESTIGATION_TOOL_PLAN: dict[str, tuple[str, ...]] = {
     "ticket-021": ("lookup_related_ticket", "lookup_requester_history"),
@@ -170,6 +195,7 @@ class HelpdeskTicketRoutingEnvironment(
             team_capacity_initial,
             high_priority_slots_initial,
             escalation_slots_initial,
         ) = self._initial_capacity_state_for_queue(task_id)
         self._state = HelpdeskTicketState(
@@ -193,8 +219,20 @@ class HelpdeskTicketRoutingEnvironment(
             high_priority_slots_remaining=high_priority_slots_initial,
             escalation_slots_initial=escalation_slots_initial,
             escalation_slots_remaining=escalation_slots_initial,
             planning_penalty_total=0.0,
             capacity_pressure_tickets_resolved=0,
         )
         return self._build_observation(task)
@@ -215,9 +253,19 @@ class HelpdeskTicketRoutingEnvironment(
         current_ticket = self._queue[idx]
         task_id = self._state.current_task_id
         task = get_task_definition(task_id)
         if action.action_type == "investigate":
             return self._handle_investigation_action(task, current_ticket, action, idx)
         submitted_fields = {
             f
@@ -317,6 +365,7 @@ class HelpdeskTicketRoutingEnvironment(
             action,
             task_id=task_id,
         )
         capacity_penalty, capacity_details = self._apply_capacity_usage(
             current_ticket,
             action,
@@ -353,7 +402,7 @@ class HelpdeskTicketRoutingEnvironment(
                 trajectory_reward - self._state.planning_penalty_total
             )
             final_reward = clamp_open_unit_interval(
-                rubric_reward - context_penalty - capacity_penalty
             )
             self._state.total_reward = rubric_reward
             investigation_penalty = self._compute_episode_penalty()
@@ -363,7 +412,31 @@ class HelpdeskTicketRoutingEnvironment(
             self._state.step_count += 1
             self._state.current_ticket_index += 1
             final_reward = clamp_open_unit_interval(
-                step_reward - context_penalty - capacity_penalty
             )
         reward_components = self._build_reward_components(
@@ -379,6 +452,7 @@ class HelpdeskTicketRoutingEnvironment(
                 "context_gap_penalty": context_penalty,
                 "context_completion_bonus": process_bonus,
                 "risk_penalty": risk_penalty,
                 "capacity_penalty": capacity_penalty,
                 "delta_adjustment": step_adjustments["delta_adjustment"],
                 "required_investigation_count": len(self._required_tools_for_ticket(current_ticket)),
@@ -391,6 +465,7 @@ class HelpdeskTicketRoutingEnvironment(
                 "planning_success_bonus": self._planning_success_bonus()
                 if is_done
                 else 0.0,
                 "rubric_reward": rubric_reward,
                 "trajectory_average_reward": (
                     trajectory_components["average_reward"]
@@ -457,6 +532,10 @@ class HelpdeskTicketRoutingEnvironment(
     def _apply_episode_economics(self, base_reward: float) -> float:
         penalty = self._compute_episode_penalty()
         return clamp_open_unit_interval(base_reward - penalty)
     def _current_average_score(self) -> float:
@@ -464,6 +543,17 @@ class HelpdeskTicketRoutingEnvironment(
             return 0.0
         return sum(self._state.per_ticket_scores) / len(self._state.per_ticket_scores)
     def _ticket_has_alternate_route(self, ticket: HelpdeskTicketRecord) -> bool:
         return any(
             value is not None
@@ -552,9 +642,9 @@ class HelpdeskTicketRoutingEnvironment(
     def _initial_capacity_state_for_queue(
         self,
         task_id: int,
-    ) -> tuple[dict[str, int], int, int]:
         if task_id != 3:
-            return {}, 0, 0
         primary_group_demand: dict[str, int] = {}
         alternate_relief_by_group: dict[str, int] = {}
@@ -563,6 +653,7 @@ class HelpdeskTicketRoutingEnvironment(
         high_priority_relief = 0
         escalation_demand = 0
         escalation_relief = 0
         for ticket in self._queue:
             primary_route = self._route_for_ticket(ticket)
@@ -574,6 +665,8 @@ class HelpdeskTicketRoutingEnvironment(
                 high_priority_demand += 1
             if primary_route["resolution_action"] in {"assign", "escalate"}:
                 escalation_demand += 1
             if self._ticket_has_alternate_route(ticket):
                 alternate_route = self._route_for_ticket(ticket, use_alternate=True)
@@ -622,10 +715,16 @@ class HelpdeskTicketRoutingEnvironment(
         else:
             escalation_slots_initial = escalation_demand
         return (
             team_capacity_initial,
             high_priority_slots_initial,
             escalation_slots_initial,
         )
     def _future_queue_demand(self) -> dict[str, Any]:
@@ -634,6 +733,7 @@ class HelpdeskTicketRoutingEnvironment(
         high_priority_needed = 0
         escalation_needed = 0
         capacity_sensitive_tickets = 0
         for ticket in future_tickets:
             route = self._route_for_ticket(ticket)
@@ -646,6 +746,8 @@ class HelpdeskTicketRoutingEnvironment(
                 escalation_needed += 1
             if self._ticket_has_alternate_route(ticket):
                 capacity_sensitive_tickets += 1
         return {
             "remaining_ticket_count": len(future_tickets),
@@ -653,6 +755,7 @@ class HelpdeskTicketRoutingEnvironment(
             "high_priority_needed": high_priority_needed,
             "escalation_needed": escalation_needed,
             "capacity_sensitive_tickets": capacity_sensitive_tickets,
         }
     def _capacity_state_snapshot(self) -> dict[str, Any]:
@@ -663,6 +766,8 @@ class HelpdeskTicketRoutingEnvironment(
             "high_priority_slots_initial": self._state.high_priority_slots_initial,
             "escalation_slots_remaining": self._state.escalation_slots_remaining,
             "escalation_slots_initial": self._state.escalation_slots_initial,
         }
     def _planning_route_recommendation(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
@@ -897,8 +1002,191 @@ class HelpdeskTicketRoutingEnvironment(
             )
         )
     def _ticket_repeated_requester_count(self, ticket: HelpdeskTicketRecord) -> int:
-        return sum(1 for candidate in self._dataset if candidate.requester == ticket.requester)
     def _tool_has_available_context(
         self,
@@ -927,7 +1215,7 @@ class HelpdeskTicketRoutingEnvironment(
         task_id: int | None = None,
     ) -> list[str]:
         resolved_task_id = self._state.current_task_id if task_id is None else task_id
-        if resolved_task_id != 3:
             return []
         required_tools: list[str] = list(TASK3_INVESTIGATION_TOOL_PLAN.get(ticket.ticket_id, ()))
         if ticket.related_ticket_id is not None and "lookup_related_ticket" not in required_tools:
@@ -949,18 +1237,51 @@ class HelpdeskTicketRoutingEnvironment(
         ):
             required_tools.append("lookup_requester_history")
         if (
-            self._ticket_is_capacity_sensitive(ticket)
             and "lookup_queue_capacity_forecast" not in required_tools
         ):
             required_tools.append("lookup_queue_capacity_forecast")
         filtered_required_tools: list[str] = []
         for tool_name in required_tools:
             if tool_name in filtered_required_tools:
                 continue
             if self._tool_has_available_context(ticket, tool_name):
                 filtered_required_tools.append(tool_name)
         return filtered_required_tools
     def _used_tools_for_ticket(self, ticket_id: str) -> list[str]:
         return list(self._state.ticket_tool_usage.get(ticket_id, []))
@@ -983,6 +1304,8 @@ class HelpdeskTicketRoutingEnvironment(
         revealed_tools = self._used_tools_for_ticket(ticket.ticket_id)
         remaining_tools = self._remaining_tools_for_ticket(ticket)
         total_required = max(1, len(required_tools))
         return {
             "required_tools": required_tools,
             "revealed_tools": revealed_tools,
@@ -990,6 +1313,8 @@ class HelpdeskTicketRoutingEnvironment(
             "revealed_count": len(revealed_tools),
             "remaining_count": len(remaining_tools),
             "completeness": round(len(revealed_tools) / total_required, 2),
         }
     def _default_redacted_description(self, ticket: HelpdeskTicketRecord) -> str:
@@ -1028,7 +1353,7 @@ class HelpdeskTicketRoutingEnvironment(
         return "Helpdesk routing decision"
     def _visible_title(self, ticket: HelpdeskTicketRecord) -> str:
-        if self._state.current_task_id == 3 and self._remaining_tools_for_ticket(ticket):
             return HARD_TASK_TITLE_REDACTIONS.get(
                 ticket.ticket_id,
                 self._default_redacted_title(ticket),
@@ -1036,7 +1361,7 @@ class HelpdeskTicketRoutingEnvironment(
         return ticket.title
     def _visible_description(self, ticket: HelpdeskTicketRecord) -> str:
-        if self._state.current_task_id == 3 and self._remaining_tools_for_ticket(ticket):
             return HARD_TASK_DESCRIPTION_REDACTIONS.get(
                 ticket.ticket_id,
                 self._default_redacted_description(ticket),
@@ -1122,6 +1447,21 @@ class HelpdeskTicketRoutingEnvironment(
         return round(priority_penalty + resolution_penalty, 4)
     def _build_reward_components(
         self,
         *,
@@ -1198,7 +1538,7 @@ class HelpdeskTicketRoutingEnvironment(
                 "assignment_group": ticket.assignment_group,
                 "resolution_action": ticket.resolution_action,
             }
-            for ticket in self._dataset
             if ticket.requester == current_ticket.requester
             and ticket.ticket_id != current_ticket.ticket_id
         ]
@@ -1235,6 +1575,7 @@ class HelpdeskTicketRoutingEnvironment(
             "capacity_state": recommendation["capacity_state"],
             "future_queue_demand": recommendation["future_demand"],
             "routing_options": routing_options,
         }
     def _run_investigation_tool(
@@ -1262,6 +1603,8 @@ class HelpdeskTicketRoutingEnvironment(
     ) -> HelpdeskTicketObservation:
         if action.tool_name is None:
             raise ValueError("Investigate actions require tool_name")
         submitted_fields = {
             field
             for field in ("issue_type", "priority", "assignment_group", "resolution_action")
@@ -1332,10 +1675,279 @@ class HelpdeskTicketRoutingEnvironment(
         self._state.last_reward_components = reward_components
         return self._build_observation(task, done=False, reward=investigation_reward)
     def _build_ticket_view(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
         progress = self._tool_progress_for_ticket(ticket)
         remaining_tools = progress["remaining_tools"]
         used_tools = set(self._used_tools_for_ticket(ticket.ticket_id))
         ticket_view: dict[str, Any] = {
             "ticket_id": ticket.ticket_id,
             "title": self._visible_title(ticket),
@@ -1354,6 +1966,14 @@ class HelpdeskTicketRoutingEnvironment(
                 "investigations_used_for_ticket": progress["revealed_count"],
                 "recommended_tools": list(remaining_tools),
             }
         if ticket.ambiguity_note is not None and "lookup_internal_routing_note" not in remaining_tools:
             ticket_view["ambiguity_note"] = ticket.ambiguity_note
         if (
@@ -1361,6 +1981,8 @@ class HelpdeskTicketRoutingEnvironment(
             and "lookup_internal_routing_note" not in remaining_tools
         ):
             ticket_view["planning_note"] = ticket.planning_note
         if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
             ticket_view["related_ticket_id"] = ticket.related_ticket_id
             related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
@@ -1376,6 +1998,8 @@ class HelpdeskTicketRoutingEnvironment(
             or "lookup_queue_capacity_forecast" in used_tools
         ):
             ticket_view["routing_options"] = self._routing_options_for_ticket(ticket)
         return ticket_view
     def _build_feedback_summary(
@@ -1398,6 +2022,13 @@ class HelpdeskTicketRoutingEnvironment(
             parts.append(f"Investigation step used {tool_name or 'a tool'}")
             if reward_components and reward_components.get("new_context_revealed"):
                 parts.append("new context was revealed")
         elif penalty_reason is not None:
             parts.append(f"Penalty applied: {penalty_reason}")
         else:
@@ -1435,6 +2066,12 @@ class HelpdeskTicketRoutingEnvironment(
             planning_penalty_total = reward_components.get("planning_penalty_total")
             if planning_penalty_total:
                 parts.append(f"planning_penalty_total={planning_penalty_total:.2f}")
         return "; ".join(parts)
@@ -1463,6 +2100,12 @@ class HelpdeskTicketRoutingEnvironment(
             "score": score,
             "breakdown": breakdown,
             "queue_position": queue_position,
         }
         if self._state.current_task_id == 3:
             history_entry["capacity_state"] = self._capacity_state_snapshot()
@@ -1479,6 +2122,8 @@ class HelpdeskTicketRoutingEnvironment(
             and "lookup_internal_routing_note" not in remaining_tools
         ):
             history_entry["planning_note"] = ticket.planning_note
         if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
             history_entry["related_ticket_id"] = ticket.related_ticket_id
             related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
@@ -1503,6 +2148,8 @@ class HelpdeskTicketRoutingEnvironment(
             history_entry["tool_result"] = tool_result
         if reward_components is not None:
             history_entry["reward_components"] = reward_components
         if progress["required_tools"]:
             history_entry["context_progress"] = {
                 "hidden_context_remaining": bool(progress["remaining_count"]),
@@ -1562,12 +2209,15 @@ class HelpdeskTicketRoutingEnvironment(
                 and (ticket_view.get("context_status") or {}).get("hidden_context_remaining")
             ),
             "action_mode": "investigate_or_submit",
-            "available_action_types": list(AVAILABLE_ACTION_TYPES),
             "average_score_so_far": self._state.average_score_so_far,
             "progress_fraction": progress_fraction,
             "investigation_penalty_applied": self._state.investigation_penalty_applied,
             "planning_penalty_total": self._state.planning_penalty_total,
             "planning_penalty_applied": self._state.planning_penalty_applied,
         }
         if self._state.current_task_id == 3:
             metadata["capacity_state"] = self._capacity_state_snapshot()
@@ -1591,8 +2241,8 @@ class HelpdeskTicketRoutingEnvironment(
             task_name=task["name"],
             instructions=task["instructions"],
             allowed_fields=list(task["allowed_fields"]),
-            available_action_types=list(AVAILABLE_ACTION_TYPES),
-            available_tools=list(AVAILABLE_TOOLS),
             investigation_budget_remaining=self._state.investigation_budget_remaining,
             last_tool_result=self._state.last_tool_result,
             current_ticket=ticket_view,

 QUEUE_SIZE_RANGE = (3, 5)
+BASE_AVAILABLE_TOOLS = (
     "lookup_related_ticket",
     "lookup_requester_history",
     "lookup_internal_routing_note",
     "lookup_queue_capacity_forecast",
 )
+TASK_AVAILABLE_ACTION_TYPES: dict[int, tuple[str, ...]] = {
+    1: ("submit", "investigate"),
+    2: ("submit", "investigate", "request_info"),
+    3: ("submit", "investigate", "request_info", "defer", "open_incident"),
+}
+TASK_AVAILABLE_TOOLS: dict[int, tuple[str, ...]] = {
+    1: (
+        "lookup_related_ticket",
+        "lookup_requester_history",
+        "lookup_internal_routing_note",
+    ),
+    2: (
+        "lookup_related_ticket",
+        "lookup_requester_history",
+        "lookup_internal_routing_note",
+    ),
+    3: BASE_AVAILABLE_TOOLS,
+}
 FREE_INVESTIGATIONS_PER_TICKET = 1
 EXTRA_INVESTIGATION_COST = 0.04
 MAX_EXTRA_INVESTIGATION_PENALTY = 0.25
 USEFUL_INVESTIGATION_REWARD = 0.03
+USEFUL_REQUEST_INFO_REWARD = 0.025
+INCIDENT_OPEN_REWARD = 0.03
+REQUEST_INFO_CONTEXT_COMPLETION_BONUS = 0.02
 PREMATURE_SUBMIT_PENALTY = 0.22
 NONDEFAULT_HIDDEN_CONTEXT_PENALTY = 0.08
 CONTEXT_COMPLETION_BONUS = 0.06
 HIGH_PRIORITY_SLOT_OVERFLOW_PENALTY = 0.06
 ESCALATION_SLOT_OVERFLOW_PENALTY = 0.05
 PLANNING_SUCCESS_BONUS = 0.05
+INCIDENT_SLOT_OVERFLOW_PENALTY = 0.05
+INCIDENT_GAP_PENALTY = 0.07
+SLA_BREACH_PENALTY = 0.04
+FOLLOW_UP_SPAWN_THRESHOLD = 0.72
+MAX_DEFERS_PER_TICKET = 1
 TASK3_INVESTIGATION_TOOL_PLAN: dict[str, tuple[str, ...]] = {
     "ticket-021": ("lookup_related_ticket", "lookup_requester_history"),
             team_capacity_initial,
             high_priority_slots_initial,
             escalation_slots_initial,
+            incident_slots_initial,
         ) = self._initial_capacity_state_for_queue(task_id)
         self._state = HelpdeskTicketState(
             high_priority_slots_remaining=high_priority_slots_initial,
             escalation_slots_initial=escalation_slots_initial,
             escalation_slots_remaining=escalation_slots_initial,
+            incident_slots_initial=incident_slots_initial,
+            incident_slots_remaining=incident_slots_initial,
             planning_penalty_total=0.0,
             capacity_pressure_tickets_resolved=0,
+            ticket_request_info_usage={},
+            ticket_defer_counts={},
+            open_incident_ticket_ids=[],
+            incident_actions_used=0,
+            incident_gap_total=0.0,
+            deferred_ticket_count=0,
+            sla_breach_count=0,
+            spawned_follow_up_ticket_ids=[],
+            spawned_follow_up_source_ids=[],
+            dynamic_queue_events=[],
         )
         return self._build_observation(task)
         current_ticket = self._queue[idx]
         task_id = self._state.current_task_id
         task = get_task_definition(task_id)
+        if action.action_type not in self._available_action_types_for_task(task_id):
+            raise ValueError(
+                f"Unsupported action_type {action.action_type!r} for task {task_id}"
+            )
         if action.action_type == "investigate":
             return self._handle_investigation_action(task, current_ticket, action, idx)
+        if action.action_type == "request_info":
+            return self._handle_request_info_action(task, current_ticket, action, idx)
+        if action.action_type == "defer":
+            return self._handle_defer_action(task, current_ticket, action, idx)
+        if action.action_type == "open_incident":
+            return self._handle_open_incident_action(task, current_ticket, action, idx)
         submitted_fields = {
             f
             action,
             task_id=task_id,
         )
+        incident_gap_penalty = self._incident_gap_penalty(current_ticket, action)
         capacity_penalty, capacity_details = self._apply_capacity_usage(
             current_ticket,
             action,
                 trajectory_reward - self._state.planning_penalty_total
             )
             final_reward = clamp_open_unit_interval(
+                rubric_reward - context_penalty - capacity_penalty - incident_gap_penalty
             )
             self._state.total_reward = rubric_reward
             investigation_penalty = self._compute_episode_penalty()
             self._state.step_count += 1
             self._state.current_ticket_index += 1
             final_reward = clamp_open_unit_interval(
+                step_reward - context_penalty - capacity_penalty - incident_gap_penalty
+            )
+        spawned_follow_up_ticket_id = None
+        if self._should_spawn_follow_up(
+            current_ticket,
+            score=score,
+            context_penalty=context_penalty,
+            incident_gap_penalty=incident_gap_penalty,
+        ):
+            spawned_follow_up = self._spawn_follow_up_ticket(current_ticket)
+            spawned_follow_up_ticket_id = spawned_follow_up.ticket_id
+            if is_done:
+                is_done = False
+                trajectory_reward = None
+                trajectory_components = None
+                rubric_reward = None
+                final_reward = clamp_open_unit_interval(
+                    step_reward - context_penalty - capacity_penalty - incident_gap_penalty
+                )
+                self._state.total_reward = 0.0
+        if incident_gap_penalty > 0.0:
+            self._state.incident_gap_total = round(
+                self._state.incident_gap_total + incident_gap_penalty,
+                4,
             )
         reward_components = self._build_reward_components(
                 "context_gap_penalty": context_penalty,
                 "context_completion_bonus": process_bonus,
                 "risk_penalty": risk_penalty,
+                "incident_gap_penalty": incident_gap_penalty,
                 "capacity_penalty": capacity_penalty,
                 "delta_adjustment": step_adjustments["delta_adjustment"],
                 "required_investigation_count": len(self._required_tools_for_ticket(current_ticket)),
                 "planning_success_bonus": self._planning_success_bonus()
                 if is_done
                 else 0.0,
+                "spawned_follow_up_ticket_id": spawned_follow_up_ticket_id,
                 "rubric_reward": rubric_reward,
                 "trajectory_average_reward": (
                     trajectory_components["average_reward"]
     def _apply_episode_economics(self, base_reward: float) -> float:
         penalty = self._compute_episode_penalty()
+        penalty += min(
+            0.25,
+            self._state.sla_breach_count * SLA_BREACH_PENALTY + self._state.incident_gap_total,
+        )
         return clamp_open_unit_interval(base_reward - penalty)
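At episode end, the investigation penalty is now joined by an operational term: SLA breaches (0.04 each) plus the accumulated incident-gap total, capped at 0.25. The worked sketch below treats `_compute_episode_penalty()` as an opaque input. The body of `clamp_open_unit_interval` is a hypothetical stand-in, because its epsilon is not shown in the diff.

```python
SLA_BREACH_PENALTY = 0.04  # value from the diff
OPERATIONAL_PENALTY_CAP = 0.25  # the literal 0.25 in _apply_episode_economics


def clamp_open_unit_interval(value: float, eps: float = 1e-4) -> float:
    # Hypothetical stand-in: keeps the reward strictly inside (0, 1).
    return min(1.0 - eps, max(eps, value))


def apply_episode_economics(
    base_reward: float,
    investigation_penalty: float,
    sla_breach_count: int,
    incident_gap_total: float,
) -> float:
    penalty = investigation_penalty
    # SLA breaches and incident gaps share one capped budget.
    penalty += min(
        OPERATIONAL_PENALTY_CAP,
        sla_breach_count * SLA_BREACH_PENALTY + incident_gap_total,
    )
    return clamp_open_unit_interval(base_reward - penalty)


print(round(apply_episode_economics(0.9, 0.0, 2, 0.07), 4))  # 0.75 (0.08 + 0.07 = 0.15)
print(round(apply_episode_economics(0.9, 0.0, 5, 0.07), 4))  # 0.65 (0.27 capped at 0.25)
```

Because of the cap, once the SLA and incident terms reach 0.25, additional breaches stop lowering the final episode reward. Per-step penalties still apply.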
     def _current_average_score(self) -> float:
             return 0.0
         return sum(self._state.per_ticket_scores) / len(self._state.per_ticket_scores)
+    def _available_action_types_for_task(self, task_id: int | None = None) -> list[str]:
+        resolved_task_id = self._state.current_task_id if task_id is None else task_id
+        return list(TASK_AVAILABLE_ACTION_TYPES.get(int(resolved_task_id or 1), ("submit",)))
+    def _available_tools_for_task(self, task_id: int | None = None) -> list[str]:
+        resolved_task_id = self._state.current_task_id if task_id is None else task_id
+        return list(TASK_AVAILABLE_TOOLS.get(int(resolved_task_id or 1), ()))
+    def _sync_queue_ticket_ids(self) -> None:
+        self._state.queue_ticket_ids = [ticket.ticket_id for ticket in self._queue]
     def _ticket_has_alternate_route(self, ticket: HelpdeskTicketRecord) -> bool:
         return any(
             value is not None
     def _initial_capacity_state_for_queue(
         self,
         task_id: int,
+    ) -> tuple[dict[str, int], int, int, int]:
         if task_id != 3:
+            return {}, 0, 0, 0
         primary_group_demand: dict[str, int] = {}
         alternate_relief_by_group: dict[str, int] = {}
         high_priority_relief = 0
         escalation_demand = 0
         escalation_relief = 0
+        incident_demand = 0
         for ticket in self._queue:
             primary_route = self._route_for_ticket(ticket)
                 high_priority_demand += 1
             if primary_route["resolution_action"] in {"assign", "escalate"}:
                 escalation_demand += 1
+            if self._requires_incident(ticket):
+                incident_demand += 1
             if self._ticket_has_alternate_route(ticket):
                 alternate_route = self._route_for_ticket(ticket, use_alternate=True)
         else:
             escalation_slots_initial = escalation_demand
+        if incident_demand <= 1:
+            incident_slots_initial = incident_demand
+        else:
+            incident_slots_initial = max(1, incident_demand - 1)
         return (
             team_capacity_initial,
             high_priority_slots_initial,
             escalation_slots_initial,
+            incident_slots_initial,
         )
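For task 3, incident slots are provisioned from incident demand in the queue. Zero or one incident is fully covered. Beyond that, the queue gets one slot fewer than demand, never dropping below one, which appears intended to force the agent to prioritise. Other tasks get zero slots. A standalone sketch (`initial_incident_slots` is a hypothetical name):

```python
def initial_incident_slots(incident_demand: int) -> int:
    # Zero or one incident: provision exactly what is needed.
    if incident_demand <= 1:
        return incident_demand
    # Two or more: one slot short of demand, but at least one slot.
    return max(1, incident_demand - 1)


print([initial_incident_slots(n) for n in range(5)])  # [0, 1, 1, 2, 3]
```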
     def _future_queue_demand(self) -> dict[str, Any]:
         high_priority_needed = 0
         escalation_needed = 0
         capacity_sensitive_tickets = 0
+        incident_needed = 0
         for ticket in future_tickets:
             route = self._route_for_ticket(ticket)
                 escalation_needed += 1
             if self._ticket_has_alternate_route(ticket):
                 capacity_sensitive_tickets += 1
+            if self._requires_incident(ticket):
+                incident_needed += 1
         return {
             "remaining_ticket_count": len(future_tickets),
             "high_priority_needed": high_priority_needed,
             "escalation_needed": escalation_needed,
             "capacity_sensitive_tickets": capacity_sensitive_tickets,
+            "incident_needed": incident_needed,
         }
     def _capacity_state_snapshot(self) -> dict[str, Any]:
             "high_priority_slots_initial": self._state.high_priority_slots_initial,
             "escalation_slots_remaining": self._state.escalation_slots_remaining,
             "escalation_slots_initial": self._state.escalation_slots_initial,
+            "incident_slots_remaining": self._state.incident_slots_remaining,
+            "incident_slots_initial": self._state.incident_slots_initial,
         }
     def _planning_route_recommendation(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
             )
         )
+    def _ticket_text(self, ticket: HelpdeskTicketRecord) -> str:
+        return f"{ticket.title} {ticket.description}".lower()
+    def _requires_incident(self, ticket: HelpdeskTicketRecord) -> bool:
+        if ticket.incident_recommended:
+            return True
+        text = self._ticket_text(ticket)
+        return (
+            ticket.priority in {"high", "critical"}
+            and ticket.issue_type
+            in {"application_support", "identity_access", "security_compliance"}
+            and any(
+                phrase in text
+                for phrase in (
+                    "outage",
+                    "cannot log in",
+                    "login",
+                    "regression",
+                    "unstable",
+                    "blocked",
+                    "lockout",
+                    "company-wide",
+                    "production",
+                    "unresolved",
+                )
+            )
+        )
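`_requires_incident` returns true when the ticket is explicitly flagged, or when three checks all pass together. Below is a minimal standalone sketch of that logic. `TicketSketch` is a hypothetical stand-in for `HelpdeskTicketRecord` that carries only the fields the check reads.

```python
from dataclasses import dataclass

# Values copied from _requires_incident in the diff above.
INCIDENT_PRIORITIES = {"high", "critical"}
INCIDENT_ISSUE_TYPES = {"application_support", "identity_access", "security_compliance"}
INCIDENT_PHRASES = (
    "outage", "cannot log in", "login", "regression", "unstable",
    "blocked", "lockout", "company-wide", "production", "unresolved",
)


@dataclass
class TicketSketch:
    """Hypothetical subset of HelpdeskTicketRecord used by the check."""
    title: str
    description: str
    priority: str
    issue_type: str
    incident_recommended: bool = False


def requires_incident(ticket: TicketSketch) -> bool:
    # An explicit incident_recommended flag short-circuits everything else.
    if ticket.incident_recommended:
        return True
    text = f"{ticket.title} {ticket.description}".lower()
    # Otherwise all three must hold: urgent priority, an incident-prone
    # issue type, and at least one trigger phrase (plain substring match).
    return (
        ticket.priority in INCIDENT_PRIORITIES
        and ticket.issue_type in INCIDENT_ISSUE_TYPES
        and any(phrase in text for phrase in INCIDENT_PHRASES)
    )


sso = TicketSketch("SSO outage", "Staff cannot log in", "high", "identity_access")
print(requires_incident(sso))
```

The match is a plain substring test, so text such as "non-production" or "unblocked" also triggers it. The diff does not say whether that is intended.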
+    def _incident_open_for_ticket(self, ticket: HelpdeskTicketRecord) -> bool:
+        related_ids = {ticket.ticket_id}
+        if ticket.related_ticket_id:
+            related_ids.add(ticket.related_ticket_id)
+        if ticket.generated_from_ticket_id:
+            related_ids.add(ticket.generated_from_ticket_id)
+        return any(ticket_id in self._state.open_incident_ticket_ids for ticket_id in related_ids)
+    def _request_info_note_for_ticket(self, ticket: HelpdeskTicketRecord) -> str | None:
+        note_parts: list[str] = []
+        if ticket.customer_update_note:
+            note_parts.append(ticket.customer_update_note)
+        if ticket.related_ticket_id is not None:
+            note_parts.append(
+                "The requester confirmed this is connected to the earlier case and wants a single accountable owner."
+            )
+        if self._ticket_has_nondefault_routing(ticket):
+            note_parts.append(
+                "The requester clarified that the blocker owner matters more than the superficial request label."
+            )
+        if self._ticket_has_alternate_route(ticket):
+            note_parts.append(
+                "Operations said an acknowledged fallback path is acceptable if the preferred queue is saturated."
+            )
+        if self._requires_incident(ticket):
+            note_parts.append(
+                "Stakeholders asked for incident-style coordination because the issue is still operationally active."
+            )
+        if not note_parts:
+            return None
+        return " ".join(note_parts)
+    def _request_info_used(self, ticket_id: str) -> bool:
+        return self._state.ticket_request_info_usage.get(ticket_id, 0) > 0
+    def _defer_count(self, ticket_id: str) -> int:
+        return self._state.ticket_defer_counts.get(ticket_id, 0)
+    def _record_dynamic_queue_event(self, event_type: str, **details: Any) -> None:
+        self._state.dynamic_queue_events.append({"event_type": event_type, **details})
+    def _escalate_priority_level(self, priority: str) -> str:
+        if priority == "low":
+            return "medium"
+        if priority == "medium":
+            return "high"
+        return "critical"
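`_escalate_priority_level` moves a priority one step up the ladder low, medium, high, critical, and it saturates at `critical`. The sketch below reproduces that function and checks one consequence. The inline conditional that `_spawn_follow_up_ticket` uses for follow-up priority (`critical` for high or critical, otherwise one step up) gives the same result as plain one-step escalation at every level. `follow_up_priority` is a hypothetical name for that inline expression.

```python
PRIORITY_LADDER = ("low", "medium", "high", "critical")


def escalate_priority_level(priority: str) -> str:
    # Same branching as _escalate_priority_level: anything that is not
    # "low" or "medium" (including unknown strings) becomes "critical".
    if priority == "low":
        return "medium"
    if priority == "medium":
        return "high"
    return "critical"


def follow_up_priority(priority: str) -> str:
    # Inline expression from _spawn_follow_up_ticket, given a name here.
    if priority in {"high", "critical"}:
        return "critical"
    return escalate_priority_level(priority)


for level in PRIORITY_LADDER:
    print(level, "->", escalate_priority_level(level), follow_up_priority(level))
```

The two functions agree on every ladder value, so the special case could probably be replaced by a direct call. The fallthrough branch also maps an unexpected value, such as a misspelled priority, to `critical`.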
+    def _escalate_ticket_after_delay(
+        self,
+        ticket: HelpdeskTicketRecord,
+        *,
+        defer_count: int,
+    ) -> HelpdeskTicketRecord:
+        escalated_priority = self._escalate_priority_level(ticket.priority)
+        description_suffix = (
+            " The ticket was deferred earlier in the queue and now needs firmer ownership."
+        )
+        customer_update = (
+            ticket.customer_update_note
+            or "The requester followed up after the delay and wants a committed owner."
+        )
+        return ticket.model_copy(
+            update={
+                "priority": escalated_priority,
+                "title": (
+                    ticket.title
+                    if ticket.title.lower().startswith("re:")
+                    else f"Re: {ticket.title}"
+                ),
+                "description": f"{ticket.description}{description_suffix}",
+                "customer_update_note": customer_update,
+            }
+        )
+    def _should_spawn_follow_up(
+        self,
+        ticket: HelpdeskTicketRecord,
+        *,
+        score: float,
+        context_penalty: float,
+        incident_gap_penalty: float,
+    ) -> bool:
+        if self._state.current_task_id != 3:
+            return False
+        if ticket.generated_from_ticket_id is not None:
+            return False
+        if ticket.ticket_id in self._state.spawned_follow_up_source_ids:
+            return False
+        if not (
+            self._requires_incident(ticket)
+            or self._ticket_mentions_follow_up(ticket)
+            or ticket.related_ticket_id is not None
+            or ticket.priority in {"high", "critical"}
+        ):
+            return False
+        return (
+            score < FOLLOW_UP_SPAWN_THRESHOLD
+            or (context_penalty >= 0.15 and score < 0.9)
+            or incident_gap_penalty > 0.0
+        )
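`_should_spawn_follow_up` first applies three gates, then checks for any one of several triggers. The sketch below flattens it into a pure function over booleans and floats. `spawn_threshold` stands in for `FOLLOW_UP_SPAWN_THRESHOLD`, whose value this diff does not show, so the default of 0.6 is an assumption. `follow_up_relevant` collapses the four-way eligibility check (incident required, follow-up mentioned, related ticket present, high or critical priority) into a single flag.

```python
def should_spawn_follow_up(
    *,
    task_id: int,
    is_generated_follow_up: bool,
    already_spawned_for_source: bool,
    follow_up_relevant: bool,
    score: float,
    context_penalty: float,
    incident_gap_penalty: float,
    spawn_threshold: float = 0.6,  # hypothetical stand-in for FOLLOW_UP_SPAWN_THRESHOLD
) -> bool:
    # Gate 1: only the hard task (id 3) spawns follow-ups.
    # Gate 2: follow-ups never spawn follow-ups of their own.
    # Gate 3: each source ticket spawns at most one follow-up.
    if task_id != 3 or is_generated_follow_up or already_spawned_for_source:
        return False
    # Only tickets with some follow-up-worthy signal are eligible.
    if not follow_up_relevant:
        return False
    # Any one trigger is enough: a low score, a large context penalty on a
    # not-quite-perfect score, or a skipped incident.
    return (
        score < spawn_threshold
        or (context_penalty >= 0.15 and score < 0.9)
        or incident_gap_penalty > 0.0
    )
```

Gates 2 and 3 together limit each original ticket to at most one follow-up, so dynamic spawning can at most double the queue.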
+    def _spawn_follow_up_ticket(self, ticket: HelpdeskTicketRecord) -> HelpdeskTicketRecord:
+        follow_up_ticket = HelpdeskTicketRecord(
+            ticket_id=f"{ticket.ticket_id}-followup",
+            title=(
+                ticket.title
+                if ticket.title.lower().startswith("re:")
+                else f"Re: {ticket.title}"
+            ),
+            requester=ticket.requester,
+            description=(
+                "The earlier handling did not fully resolve the issue. The requester is "
+                f"following up on {ticket.ticket_id} and needs a single accountable owner now."
+            ),
+            issue_type=ticket.issue_type,
+            priority=(
+                "critical"
+                if ticket.priority in {"high", "critical"}
+                else self._escalate_priority_level(ticket.priority)
+            ),
+            assignment_group=ticket.assignment_group,
+            resolution_action=(
+                "escalate"
+                if ticket.priority in {"high", "critical"} or self._requires_incident(ticket)
+                else ticket.resolution_action
+            ),
+            ambiguity_note=(
+                ticket.ambiguity_note
+                or "Prior routing did not settle ownership; route to the team that can actually unblock the issue."
+            ),
+            related_ticket_id=ticket.ticket_id,
+            planning_note=ticket.planning_note,
+            customer_update_note=(
+                "The requester said the last response did not resolve the blocker and wants an accountable next owner."
+            ),
+            incident_recommended=self._requires_incident(ticket),
+            generated_from_ticket_id=ticket.ticket_id,
+        )
+        self._queue.append(follow_up_ticket)
+        self._tickets_by_id[follow_up_ticket.ticket_id] = follow_up_ticket
+        self._sync_queue_ticket_ids()
+        self._state.spawned_follow_up_ticket_ids.append(follow_up_ticket.ticket_id)
+        self._state.spawned_follow_up_source_ids.append(ticket.ticket_id)
+        self._record_dynamic_queue_event(
+            "spawn_follow_up",
+            source_ticket_id=ticket.ticket_id,
+            follow_up_ticket_id=follow_up_ticket.ticket_id,
+        )
+        return follow_up_ticket
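The record that `_spawn_follow_up_ticket` builds can be sketched with plain dicts. The keys mirror a subset of `HelpdeskTicketRecord` fields. Using dicts instead of the Pydantic model, and passing the incident signal in as an `incident` argument, are simplifications made for illustration.

```python
def re_title(title: str) -> str:
    # Add "Re: " once; a title already starting with "re:" (any case) is kept.
    return title if title.lower().startswith("re:") else f"Re: {title}"


def spawn_follow_up(
    queue: list[dict],
    tickets_by_id: dict[str, dict],
    source: dict,
    *,
    incident: bool,
) -> dict:
    urgent = source["priority"] in {"high", "critical"}
    one_step_up = {"low": "medium", "medium": "high"}
    follow_up = {
        "ticket_id": f"{source['ticket_id']}-followup",
        "title": re_title(source["title"]),
        # High/critical sources become critical; lower ones move up one step.
        "priority": "critical" if urgent else one_step_up.get(source["priority"], "critical"),
        # Urgent or incident-worthy follow-ups escalate; others keep the original action.
        "resolution_action": "escalate" if urgent or incident else source["resolution_action"],
        "related_ticket_id": source["ticket_id"],
        "generated_from_ticket_id": source["ticket_id"],
    }
    # Append to the tail of the queue and register it for id lookups.
    queue.append(follow_up)
    tickets_by_id[follow_up["ticket_id"]] = follow_up
    return follow_up
```

The follow-up lands at the tail of the queue, so the agent meets it only after the remaining original tickets.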
     def _ticket_repeated_requester_count(self, ticket: HelpdeskTicketRecord) -> int:
+        return sum(
+            1
+            for candidate in self._tickets_by_id.values()
+            if candidate.requester == ticket.requester
+        )
     def _tool_has_available_context(
         self,
         task_id: int | None = None,
     ) -> list[str]:
         resolved_task_id = self._state.current_task_id if task_id is None else task_id
+        if resolved_task_id is None or resolved_task_id < 2:
             return []
         required_tools: list[str] = list(TASK3_INVESTIGATION_TOOL_PLAN.get(ticket.ticket_id, ()))
         if ticket.related_ticket_id is not None and "lookup_related_ticket" not in required_tools:
         ):
             required_tools.append("lookup_requester_history")
         if (
+            resolved_task_id == 3
+            and self._ticket_is_capacity_sensitive(ticket)
             and "lookup_queue_capacity_forecast" not in required_tools
         ):
             required_tools.append("lookup_queue_capacity_forecast")
         filtered_required_tools: list[str] = []
+        allowed_tool_set = set(self._available_tools_for_task(resolved_task_id))
         for tool_name in required_tools:
             if tool_name in filtered_required_tools:
                 continue
+            if tool_name not in allowed_tool_set:
+                continue
             if self._tool_has_available_context(ticket, tool_name):
                 filtered_required_tools.append(tool_name)
         return filtered_required_tools
+    def _recommended_operational_actions(self, ticket: HelpdeskTicketRecord) -> list[str]:
+        recommended_actions: list[str] = []
+        available_action_types = set(self._available_action_types_for_task())
+        if (
+            "request_info" in available_action_types
+            and self._request_info_note_for_ticket(ticket) is not None
+            and not self._request_info_used(ticket.ticket_id)
+        ):
+            recommended_actions.append("request_info")
+        if (
+            "open_incident" in available_action_types
+            and self._requires_incident(ticket)
+            and not self._incident_open_for_ticket(ticket)
+        ):
+            recommended_actions.append("open_incident")
+        if (
+            "defer" in available_action_types
+            and self._defer_count(ticket.ticket_id) < MAX_DEFERS_PER_TICKET
+            and self._state.current_ticket_index < len(self._queue) - 1
+            and ticket.priority not in {"high", "critical"}
+            and (
+                bool(self._remaining_tools_for_ticket(ticket))
+                or self._ticket_is_capacity_sensitive(ticket)
+                or self._request_info_note_for_ticket(ticket) is not None
+            )
+        ):
+            recommended_actions.append("defer")
+        return recommended_actions
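`_recommended_operational_actions` returns up to three hints, always in the order request_info, open_incident, defer. Below is a pure-function sketch. The boolean parameter names are hypothetical: `has_hidden_context` stands for `bool(self._remaining_tools_for_ticket(ticket))`. `max_defers=1` is an assumed value, because `MAX_DEFERS_PER_TICKET` is defined outside this hunk.

```python
def recommended_operational_actions(
    available: set[str],
    *,
    has_request_info_note: bool,
    request_info_used: bool,
    requires_incident: bool,
    incident_open: bool,
    defer_count: int,
    is_last_in_queue: bool,
    priority: str,
    has_hidden_context: bool,
    capacity_sensitive: bool,
    max_defers: int = 1,  # hypothetical stand-in for MAX_DEFERS_PER_TICKET
) -> list[str]:
    actions: list[str] = []
    # Ask the requester only once, and only if there is something to learn.
    if "request_info" in available and has_request_info_note and not request_info_used:
        actions.append("request_info")
    # Suggest an incident only if one is needed and none is open yet.
    if "open_incident" in available and requires_incident and not incident_open:
        actions.append("open_incident")
    # Deferral: spare defers, not the last ticket, not urgent, and some reason to wait.
    if (
        "defer" in available
        and defer_count < max_defers
        and not is_last_in_queue
        and priority not in {"high", "critical"}
        and (has_hidden_context or capacity_sensitive or has_request_info_note)
    ):
        actions.append("defer")
    return actions
```

The `defer` branch checks whether a request-info note exists, not whether it has already been consumed. A ticket can therefore still be recommended for deferral after `request_info` has been used.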
     def _used_tools_for_ticket(self, ticket_id: str) -> list[str]:
         return list(self._state.ticket_tool_usage.get(ticket_id, []))
         revealed_tools = self._used_tools_for_ticket(ticket.ticket_id)
         remaining_tools = self._remaining_tools_for_ticket(ticket)
         total_required = max(1, len(required_tools))
+        request_info_used = self._request_info_used(ticket.ticket_id)
+        operational_actions = self._recommended_operational_actions(ticket)
         return {
             "required_tools": required_tools,
             "revealed_tools": revealed_tools,
             "revealed_count": len(revealed_tools),
             "remaining_count": len(remaining_tools),
             "completeness": round(len(revealed_tools) / total_required, 2),
+            "request_info_used": request_info_used,
+            "recommended_operational_actions": operational_actions,
         }
     def _default_redacted_description(self, ticket: HelpdeskTicketRecord) -> str:
         return "Helpdesk routing decision"
     def _visible_title(self, ticket: HelpdeskTicketRecord) -> str:
+        if self._state.current_task_id in {2, 3} and self._remaining_tools_for_ticket(ticket):
             return HARD_TASK_TITLE_REDACTIONS.get(
                 ticket.ticket_id,
                 self._default_redacted_title(ticket),
         return ticket.title
     def _visible_description(self, ticket: HelpdeskTicketRecord) -> str:
+        if self._state.current_task_id in {2, 3} and self._remaining_tools_for_ticket(ticket):
             return HARD_TASK_DESCRIPTION_REDACTIONS.get(
                 ticket.ticket_id,
                 self._default_redacted_description(ticket),
         return round(priority_penalty + resolution_penalty, 4)
+    def _incident_gap_penalty(
+        self,
+        ticket: HelpdeskTicketRecord,
+        action: HelpdeskTicketAction,
+    ) -> float:
+        if self._state.current_task_id != 3:
+            return 0.0
+        if not self._requires_incident(ticket):
+            return 0.0
+        if self._incident_open_for_ticket(ticket):
+            return 0.0
+        if action.resolution_action in {"escalate", "assign"}:
+            return round(INCIDENT_GAP_PENALTY / 2, 4)
+        return INCIDENT_GAP_PENALTY
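The incident-gap rule applies only to hard-task tickets that need an incident and have none open. It charges the full penalty when the submission neither escalates nor assigns, and half the penalty when it does. In the sketch below, `base_penalty=0.1` is a hypothetical placeholder for `INCIDENT_GAP_PENALTY`.

```python
def incident_gap_penalty(
    *,
    task_id: int,
    requires_incident: bool,
    incident_open: bool,
    resolution_action: str,
    base_penalty: float = 0.1,  # hypothetical stand-in for INCIDENT_GAP_PENALTY
) -> float:
    # No penalty outside task 3, when no incident is needed, or when one is already open.
    if task_id != 3 or not requires_incident or incident_open:
        return 0.0
    # Escalating or assigning partly covers the gap, so only half is charged.
    if resolution_action in {"escalate", "assign"}:
        return round(base_penalty / 2, 4)
    return base_penalty
```

`_incident_open_for_ticket` also checks `related_ticket_id` and `generated_from_ticket_id`. An incident opened on the original ticket therefore covers its follow-up as well and removes the gap penalty there.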
     def _build_reward_components(
         self,
         *,
                 "assignment_group": ticket.assignment_group,
                 "resolution_action": ticket.resolution_action,
             }
+            for ticket in self._tickets_by_id.values()
             if ticket.requester == current_ticket.requester
             and ticket.ticket_id != current_ticket.ticket_id
         ]
             "capacity_state": recommendation["capacity_state"],
             "future_queue_demand": recommendation["future_demand"],
             "routing_options": routing_options,
+            "incident_recommended": self._requires_incident(current_ticket),
         }
     def _run_investigation_tool(
     ) -> HelpdeskTicketObservation:
         if action.tool_name is None:
             raise ValueError("Investigate actions require tool_name")
+        if action.tool_name not in self._available_tools_for_task():
+            raise ValueError(f"Unsupported tool_name for current task: {action.tool_name}")
         submitted_fields = {
             field
             for field in ("issue_type", "priority", "assignment_group", "resolution_action")
         self._state.last_reward_components = reward_components
         return self._build_observation(task, done=False, reward=investigation_reward)
+    def _handle_request_info_action(
+        self,
+        task: dict,
+        current_ticket: HelpdeskTicketRecord,
+        action: HelpdeskTicketAction,
+        idx: int,
+    ) -> HelpdeskTicketObservation:
+        submitted_fields = {
+            field
+            for field in ("issue_type", "priority", "assignment_group", "resolution_action")
+            if getattr(action, field) is not None
+        }
+        if submitted_fields:
+            raise ValueError(
+                "request_info actions cannot include submit fields: "
+                f"{sorted(submitted_fields)}"
+            )
+        ticket_id = current_ticket.ticket_id
+        note = self._request_info_note_for_ticket(current_ticket)
+        already_used = self._request_info_used(ticket_id)
+        useful_request = note is not None and not already_used
+        self._state.ticket_request_info_usage[ticket_id] = (
+            self._state.ticket_request_info_usage.get(ticket_id, 0) + 1
+        )
+        self._state.step_count += 1
+        self._state.investigation_steps += 1
+        self._state.investigation_budget_remaining = max(
+            0,
+            self._state.investigation_budget_remaining - 1,
+        )
+        request_reward = USEFUL_REQUEST_INFO_REWARD if useful_request else 0.0
+        tool_result = {
+            "action_type": "request_info",
+            "found": useful_request,
+            "ticket_id": ticket_id,
+            "customer_update_note": note if useful_request else "",
+        }
+        self._state.last_tool_result = tool_result
+        self._state.last_step_reward = request_reward
+        self._state.reward = request_reward
+        self._state.done = False
+        self._state.investigation_penalty_applied = self._compute_episode_penalty()
+        progress = self._tool_progress_for_ticket(current_ticket)
+        reward_components = self._build_reward_components(
+            ticket_score=0.0,
+            field_breakdown={},
+            shaped_step_reward=request_reward,
+            reward_kind="operational",
+            final_reward=request_reward,
+            investigation_penalty=self._state.investigation_penalty_applied,
+            extra_details={
+                "operational_action": "request_info",
+                "new_context_revealed": useful_request,
+                "customer_update_visible": useful_request,
+                "hidden_context_remaining_count": progress["remaining_count"],
+                "context_completeness": progress["completeness"],
+            },
+        )
+        self._state.history_entries.append(
+            self._build_history_entry(
+                current_ticket,
+                predicted=action.model_dump(exclude_none=True),
+                score=0.0,
+                breakdown={},
+                queue_position=idx + 1,
+                reward=request_reward,
+                reward_kind="operational",
+                tool_result=tool_result,
+                reward_components=reward_components,
+            )
+        )
+        self._state.last_reward_components = reward_components
+        return self._build_observation(task, done=False, reward=request_reward)
+    def _handle_defer_action(
+        self,
+        task: dict,
+        current_ticket: HelpdeskTicketRecord,
+        action: HelpdeskTicketAction,
+        idx: int,
+    ) -> HelpdeskTicketObservation:
+        submitted_fields = {
+            field
+            for field in ("issue_type", "priority", "assignment_group", "resolution_action")
+            if getattr(action, field) is not None
+        }
+        if submitted_fields:
+            raise ValueError(
+                "defer actions cannot include submit fields: "
+                f"{sorted(submitted_fields)}"
+            )
+        ticket_id = current_ticket.ticket_id
+        existing_count = self._defer_count(ticket_id)
+        defer_allowed = (
+            existing_count < MAX_DEFERS_PER_TICKET
+            and idx < len(self._queue) - 1
+            and self._state.current_task_id in {2, 3}
+        )
+        defer_count = existing_count + 1
+        reward = 0.0
+        sla_risk = current_ticket.priority in {"high", "critical"} or self._ticket_mentions_follow_up(
+            current_ticket
+        )
+        moved_ticket = current_ticket
+        if defer_allowed:
+            self._state.ticket_defer_counts[ticket_id] = defer_count
+            self._state.deferred_ticket_count += 1
+            if sla_risk:
+                self._state.sla_breach_count += 1
+                moved_ticket = self._escalate_ticket_after_delay(
+                    current_ticket,
+                    defer_count=defer_count,
+                )
+            elif (
+                self._remaining_tools_for_ticket(current_ticket)
+                or self._request_info_note_for_ticket(current_ticket) is not None
+                or self._ticket_is_capacity_sensitive(current_ticket)
+            ):
+                reward = REQUEST_INFO_CONTEXT_COMPLETION_BONUS
+            self._queue.pop(idx)
+            self._queue.append(moved_ticket)
+            self._tickets_by_id[moved_ticket.ticket_id] = moved_ticket
+            self._sync_queue_ticket_ids()
+            self._record_dynamic_queue_event(
+                "defer",
+                ticket_id=ticket_id,
+                defer_count=defer_count,
+                sla_risk=sla_risk,
+            )
+        else:
+            self._state.sla_breach_count += 1
+            self._record_dynamic_queue_event(
+                "defer_denied",
+                ticket_id=ticket_id,
+                defer_count=defer_count,
+            )
+        self._state.step_count += 1
+        self._state.last_tool_result = {
+            "action_type": "defer",
+            "ticket_id": ticket_id,
+            "defer_allowed": defer_allowed,
+            "defer_count": defer_count,
+            "sla_risk": sla_risk,
+        }
+        self._state.last_step_reward = reward
+        self._state.reward = reward
+        self._state.done = False
+        reward_components = self._build_reward_components(
+            ticket_score=0.0,
+            field_breakdown={},
+            shaped_step_reward=reward,
+            reward_kind="operational",
+            final_reward=reward,
+            extra_details={
+                "operational_action": "defer",
+                "defer_allowed": defer_allowed,
+                "defer_count": defer_count,
+                "sla_breach_count": self._state.sla_breach_count,
+            },
+        )
+        self._state.history_entries.append(
+            self._build_history_entry(
+                current_ticket,
+                predicted=action.model_dump(exclude_none=True),
+                score=0.0,
+                breakdown={},
+                queue_position=idx + 1,
+                reward=reward,
+                reward_kind="operational",
+                tool_result=self._state.last_tool_result,
+                reward_components=reward_components,
+            )
+        )
+        self._state.last_reward_components = reward_components
+        return self._build_observation(task, done=False, reward=reward)
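When a defer is allowed, the handler moves the current ticket to the tail of the queue. The cursor index stays the same, so the next ticket slides into the current slot. Below is a sketch over lists of ticket ids. `max_defers=1` is an assumed value for `MAX_DEFERS_PER_TICKET`. The returned breach increment follows how the handler bumps `sla_breach_count`: on every denied defer, and on an allowed defer of an SLA-risk ticket. The priority bump and "Re:" retitling of the moved ticket are left out.

```python
def defer(
    queue: list[str],
    idx: int,
    defer_counts: dict[str, int],
    *,
    task_id: int,
    sla_risk: bool,
    max_defers: int = 1,  # hypothetical stand-in for MAX_DEFERS_PER_TICKET
) -> tuple[bool, int]:
    ticket_id = queue[idx]
    existing = defer_counts.get(ticket_id, 0)
    # Deferral needs spare defers, a ticket behind this one, and task 2 or 3.
    allowed = existing < max_defers and idx < len(queue) - 1 and task_id in {2, 3}
    if allowed:
        defer_counts[ticket_id] = existing + 1
        # Rotate: remove from the current slot and append to the tail.
        queue.append(queue.pop(idx))
    # A denied defer always counts as an SLA breach; an allowed one only if at risk.
    breach = 1 if (not allowed or sla_risk) else 0
    return allowed, breach
```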
+    def _handle_open_incident_action(
+        self,
+        task: dict,
+        current_ticket: HelpdeskTicketRecord,
+        action: HelpdeskTicketAction,
+        idx: int,
+    ) -> HelpdeskTicketObservation:
+        submitted_fields = {
+            field
+            for field in ("issue_type", "priority", "assignment_group", "resolution_action")
+            if getattr(action, field) is not None
+        }
+        if submitted_fields:
+            raise ValueError(
+                "open_incident actions cannot include submit fields: "
+                f"{sorted(submitted_fields)}"
+            )
+        useful_incident = (
+            self._state.current_task_id == 3
+            and self._requires_incident(current_ticket)
+            and not self._incident_open_for_ticket(current_ticket)
+        )
+        overflow = 0
+        incident_reward = 0.0
+        if useful_incident:
+            self._state.open_incident_ticket_ids.append(current_ticket.ticket_id)
+            self._state.incident_actions_used += 1
+            overflow = max(0, 1 - self._state.incident_slots_remaining)
+            self._state.incident_slots_remaining = max(
+                0,
+                self._state.incident_slots_remaining - 1,
+            )
+            overflow_penalty = round(overflow * INCIDENT_SLOT_OVERFLOW_PENALTY, 4)
+            if overflow_penalty > 0.0:
+                self._state.planning_penalty_total = round(
+                    self._state.planning_penalty_total + overflow_penalty,
+                    4,
+                )
+                self._state.planning_penalty_applied = overflow_penalty
+            incident_reward = clamp_open_unit_interval(
+                INCIDENT_OPEN_REWARD - overflow_penalty
+            )
+            self._record_dynamic_queue_event(
+                "open_incident",
+                ticket_id=current_ticket.ticket_id,
+                overflow=overflow,
+            )
+        self._state.step_count += 1
+        self._state.last_tool_result = {
+            "action_type": "open_incident",
+            "ticket_id": current_ticket.ticket_id,
+            "incident_open": useful_incident,
+            "incident_slots_remaining": self._state.incident_slots_remaining,
+            "overflow": overflow,
+        }
+        self._state.last_step_reward = incident_reward
+        self._state.reward = incident_reward
+        self._state.done = False
+        reward_components = self._build_reward_components(
+            ticket_score=0.0,
+            field_breakdown={},
+            shaped_step_reward=incident_reward,
+            reward_kind="operational",
+            final_reward=incident_reward,
+            extra_details={
+                "operational_action": "open_incident",
+                "incident_open": useful_incident,
+                "incident_slots_remaining": self._state.incident_slots_remaining,
+            },
+        )
+        self._state.history_entries.append(
+            self._build_history_entry(
+                current_ticket,
+                predicted=action.model_dump(exclude_none=True),
+                score=0.0,
+                breakdown={},
+                queue_position=idx + 1,
+                reward=incident_reward,
+                reward_kind="operational",
+                tool_result=self._state.last_tool_result,
+                reward_components=reward_components,
+            )
+        )
+        self._state.last_reward_components = reward_components
+        return self._build_observation(task, done=False, reward=incident_reward)
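Opening a useful incident consumes one incident slot. Once no slots remain, each further incident counts as an overflow: it adds `INCIDENT_SLOT_OVERFLOW_PENALTY` to the planning penalty and is subtracted from the step reward. The sketch uses assumed values (0.1 reward, 0.15 overflow penalty). It also uses a hypothetical `clamp_open_unit_interval` that keeps values strictly inside (0, 1) with a small epsilon. The real helper is defined outside this diff and may behave differently.

```python
EPSILON = 1e-4  # hypothetical margin for the open interval (0, 1)


def clamp_open_unit_interval(value: float) -> float:
    # Hypothetical reimplementation: keep value strictly between 0 and 1.
    return min(max(value, EPSILON), 1.0 - EPSILON)


def open_incident(
    slots_remaining: int,
    *,
    open_reward: float = 0.1,  # hypothetical stand-in for INCIDENT_OPEN_REWARD
    overflow_penalty: float = 0.15,  # hypothetical stand-in for INCIDENT_SLOT_OVERFLOW_PENALTY
) -> tuple[int, int, float, float]:
    # Overflow is 1 only when no slot is left; it never exceeds 1 per call.
    overflow = max(0, 1 - slots_remaining)
    new_slots = max(0, slots_remaining - 1)
    penalty = round(overflow * overflow_penalty, 4)
    reward = clamp_open_unit_interval(open_reward - penalty)
    return new_slots, overflow, penalty, reward
```

If the real helper clamps the way this sketch does, the overflow cost shows up mainly in `planning_penalty_total` rather than as a negative step reward. An incident requested on a ticket that does not need one returns zero reward and leaves the slots untouched.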
     def _build_ticket_view(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
         progress = self._tool_progress_for_ticket(ticket)
         remaining_tools = progress["remaining_tools"]
         used_tools = set(self._used_tools_for_ticket(ticket.ticket_id))
+        operational_actions = progress["recommended_operational_actions"]
         ticket_view: dict[str, Any] = {
             "ticket_id": ticket.ticket_id,
             "title": self._visible_title(ticket),
                 "investigations_used_for_ticket": progress["revealed_count"],
                 "recommended_tools": list(remaining_tools),
             }
+        ticket_view["operational_context"] = {
+            "request_info_available": self._request_info_note_for_ticket(ticket) is not None,
+            "request_info_used": progress["request_info_used"],
+            "defer_count": self._defer_count(ticket.ticket_id),
+            "incident_recommended": self._requires_incident(ticket),
+            "incident_open": self._incident_open_for_ticket(ticket),
+            "recommended_actions": operational_actions,
+        }
         if ticket.ambiguity_note is not None and "lookup_internal_routing_note" not in remaining_tools:
             ticket_view["ambiguity_note"] = ticket.ambiguity_note
         if (
             and "lookup_internal_routing_note" not in remaining_tools
         ):
             ticket_view["planning_note"] = ticket.planning_note
+        if self._request_info_used(ticket.ticket_id):
+            ticket_view["customer_update_note"] = self._request_info_note_for_ticket(ticket)
         if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
             ticket_view["related_ticket_id"] = ticket.related_ticket_id
             related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
             or "lookup_queue_capacity_forecast" in used_tools
         ):
             ticket_view["routing_options"] = self._routing_options_for_ticket(ticket)
+        if ticket.generated_from_ticket_id is not None:
+            ticket_view["generated_from_ticket_id"] = ticket.generated_from_ticket_id
         return ticket_view
     def _build_feedback_summary(
             parts.append(f"Investigation step used {tool_name or 'a tool'}")
             if reward_components and reward_components.get("new_context_revealed"):
                 parts.append("new context was revealed")
+        elif reward_kind == "operational":
+            operational_action = (
+                reward_components.get("operational_action")
+                if reward_components
+                else predicted.get("action_type")
+            )
+            parts.append(f"Operational step used {operational_action or 'an action'}")
         elif penalty_reason is not None:
             parts.append(f"Penalty applied: {penalty_reason}")
         else:
             planning_penalty_total = reward_components.get("planning_penalty_total")
             if planning_penalty_total:
                 parts.append(f"planning_penalty_total={planning_penalty_total:.2f}")
+            incident_gap_penalty = reward_components.get("incident_gap_penalty")
+            if incident_gap_penalty:
+                parts.append(f"incident_gap_penalty={incident_gap_penalty:.2f}")
+            spawned_follow_up_ticket_id = reward_components.get("spawned_follow_up_ticket_id")
+            if spawned_follow_up_ticket_id:
+                parts.append(f"spawned_follow_up={spawned_follow_up_ticket_id}")
         return "; ".join(parts)
             "score": score,
             "breakdown": breakdown,
             "queue_position": queue_position,
+            "operational_context": {
+                "request_info_used": progress["request_info_used"],
+                "defer_count": self._defer_count(ticket.ticket_id),
+                "incident_open": self._incident_open_for_ticket(ticket),
+                "recommended_actions": progress["recommended_operational_actions"],
+            },
         }
         if self._state.current_task_id == 3:
             history_entry["capacity_state"] = self._capacity_state_snapshot()
             and "lookup_internal_routing_note" not in remaining_tools
         ):
             history_entry["planning_note"] = ticket.planning_note
+        if self._request_info_used(ticket.ticket_id):
+            history_entry["customer_update_note"] = self._request_info_note_for_ticket(ticket)
         if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
             history_entry["related_ticket_id"] = ticket.related_ticket_id
             related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
             history_entry["tool_result"] = tool_result
         if reward_components is not None:
             history_entry["reward_components"] = reward_components
+        if ticket.generated_from_ticket_id is not None:
+            history_entry["generated_from_ticket_id"] = ticket.generated_from_ticket_id
         if progress["required_tools"]:
             history_entry["context_progress"] = {
                 "hidden_context_remaining": bool(progress["remaining_count"]),
                 and (ticket_view.get("context_status") or {}).get("hidden_context_remaining")
             ),
             "action_mode": "investigate_or_submit",
+            "available_action_types": self._available_action_types_for_task(),
             "average_score_so_far": self._state.average_score_so_far,
             "progress_fraction": progress_fraction,
             "investigation_penalty_applied": self._state.investigation_penalty_applied,
             "planning_penalty_total": self._state.planning_penalty_total,
             "planning_penalty_applied": self._state.planning_penalty_applied,
+            "sla_breach_count": self._state.sla_breach_count,
+            "incident_gap_total": self._state.incident_gap_total,
+            "dynamic_queue_events": list(self._state.dynamic_queue_events[-5:]),
         }
         if self._state.current_task_id == 3:
             metadata["capacity_state"] = self._capacity_state_snapshot()
             task_name=task["name"],
             instructions=task["instructions"],
             allowed_fields=list(task["allowed_fields"]),
+            available_action_types=self._available_action_types_for_task(),
+            available_tools=self._available_tools_for_task(),
             investigation_budget_remaining=self._state.investigation_budget_remaining,
             last_tool_result=self._state.last_tool_result,
             current_ticket=ticket_view,

server/grader.py CHANGED

@@ -64,13 +64,23 @@ PRIORITY_SCORES = {
 TASK_WEIGHTS = {
-    1: {"issue_type": 1.0},
-    2: {"issue_type": 0.6, "priority": 0.4},
+    1: {
+        "issue_type": 0.40,
+        "priority": 0.20,
+        "assignment_group": 0.20,
+        "resolution_action": 0.20,
+    },
+    2: {
+        "issue_type": 0.32,
+        "priority": 0.20,
+        "assignment_group": 0.24,
+        "resolution_action": 0.24,
+    },
     3: {
-        "issue_type": 0.35,
+        "issue_type": 0.30,
         "priority": 0.20,
         "assignment_group": 0.25,
-        "resolution_action": 0.20,
+        "resolution_action": 0.25,
     },
 }
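Under the new `TASK_WEIGHTS`, every task assigns credit to all four routing fields, and each task's weights sum to 1.0. The sketch assumes the grader combines per-field credits in [0, 1] as a weighted sum. The per-field scoring rules (for example, partial credit from `PRIORITY_SCORES`) are not shown here, so `weighted_ticket_score` is a hypothetical helper.

```python
import math

# Weights copied from the new TASK_WEIGHTS in server/grader.py.
TASK_WEIGHTS = {
    1: {"issue_type": 0.40, "priority": 0.20, "assignment_group": 0.20, "resolution_action": 0.20},
    2: {"issue_type": 0.32, "priority": 0.20, "assignment_group": 0.24, "resolution_action": 0.24},
    3: {"issue_type": 0.30, "priority": 0.20, "assignment_group": 0.25, "resolution_action": 0.25},
}


def weighted_ticket_score(task_id: int, field_credit: dict[str, float]) -> float:
    # Missing fields earn zero credit; each credit is assumed to lie in [0, 1].
    weights = TASK_WEIGHTS[task_id]
    return round(sum(w * field_credit.get(name, 0.0) for name, w in weights.items()), 4)


# Sanity check: each task's weights form a convex combination.
for task_id, weights in TASK_WEIGHTS.items():
    assert math.isclose(sum(weights.values()), 1.0), task_id

# Under the old weights a correct issue type alone earned full credit on task 1;
# under the new weights it earns 0.4.
print(weighted_ticket_score(1, {"issue_type": 1.0}))
```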

server/tasks.py CHANGED

@@ -10,28 +10,39 @@ from vocabulary import TASK_IDS
 TASKS = {
     1: {
         "id": 1,
-        "name": "Issue Type Classification",
         "difficulty": "easy",
         "instructions": (
-            "Read the ticket and select the single best IT issue type. "
-            "You may investigate first, then submit a final routing answer."
         ),
-        "allowed_fields": ["issue_type"],
     },
     2: {
         "id": 2,
-        "name": "Issue Type And Priority",
         "difficulty": "medium",
         "instructions": (
-            "Read the ticket, select the best IT issue type, and estimate the "
-            "correct operational priority. If the observation includes ambiguity "
-            "or follow-up context, use it. You may investigate before you submit."
         ),
-        "allowed_fields": ["issue_type", "priority"],
     },
     3: {
         "id": 3,
-        "name": "Full Ticket Routing",
         "difficulty": "hard",
         "instructions": (
             "Perform full helpdesk routing by selecting the best issue type, "
@@ -40,9 +51,8 @@ TASKS = {
             "forecasts, and planning state when present. "
             "Some hard tickets intentionally hide decisive routing context until "
             "you investigate with the available tools, and some hard episodes also "
-            "require queue-level capacity planning across multiple tickets, so "
-            "premature or resource-greedy routing can underperform even when the "
-            "visible text looks plausible."
         ),
         "allowed_fields": [
             "issue_type",
@@ -61,6 +71,10 @@ PLANNING_ROUTE_UPDATES: dict[str, dict] = {
             "customer-facing charge review as a lower-fidelity fallback while the bug "
             "investigation continues separately."
         ),
         "alternate_issue_type": "billing_license",
         "alternate_assignment_group": "license_ops",
         "alternate_resolution_action": "assign",
@@ -82,6 +96,10 @@ PLANNING_ROUTE_UPDATES: dict[str, dict] = {
             "Seat expansion is the preferred route, but license operations can still "
             "handle the prorating clarification when procurement is the bottleneck."
         ),
         "alternate_issue_type": "billing_license",
         "alternate_assignment_group": "license_ops",
         "alternate_resolution_action": "fulfill",
@@ -92,6 +110,10 @@ PLANNING_ROUTE_UPDATES: dict[str, dict] = {
             "The request can be treated either as roadmap feedback or as a support "
             "escalation if the operational impact is emphasized."
         ),
         "alternate_issue_type": "application_support",
         "alternate_priority": "high",
         "alternate_resolution_action": "escalate",
@@ -138,6 +160,10 @@ PLANNING_ROUTE_UPDATES: dict[str, dict] = {
             "Security scheduling is ideal, but a compliance acknowledgement is still "
             "acceptable when the security team only needs to confirm the process."
         ),
         "alternate_issue_type": "security_compliance",
         "alternate_resolution_action": "acknowledge",
         "alternate_route_score_multiplier": 0.8,
@@ -192,6 +218,10 @@ CURATED_EXPANSION_RECORDS: list[dict] = [
             "Security still owns the privileged-access review, but service desk can "
             "collect chronology and prepare the packet if the security queue is jammed."
         ),
         "alternate_assignment_group": "service_desk",
         "alternate_resolution_action": "assign",
         "alternate_route_score_multiplier": 0.72,
@@ -253,6 +283,10 @@ CURATED_EXPANSION_RECORDS: list[dict] = [
             "Immediate operational execution is preferred. Procurement can still own the "
             "approval path if service-desk capacity is already depleted."
         ),
         "alternate_assignment_group": "procurement",
         "alternate_resolution_action": "assign",
         "alternate_route_score_multiplier": 0.8,
@@ -273,6 +307,11 @@ CURATED_EXPANSION_RECORDS: list[dict] = [
             "Security owns the final unblock decision. If security is saturated, the "
             "application team can still take the first-response diagnostics path."
         ),
         "alternate_issue_type": "application_support",
         "alternate_priority": "high",
         "alternate_assignment_group": "application_team",
@@ -398,6 +437,9 @@ CURATED_EXPANSION_RECORDS: list[dict] = [
             "Application engineering is preferred because they own the evidence. Procurement "
             "can still coordinate the renewal communication if the evidence queue is saturated."
         ),
         "alternate_issue_type": "service_request",
         "alternate_priority": "medium",
         "alternate_assignment_group": "procurement",

 TASKS = {
     1: {
         "id": 1,
+        "name": "Guided Full Routing",
         "difficulty": "easy",
         "instructions": (
+            "Perform full helpdesk routing by selecting issue type, priority, "
+            "assignment group, and resolution action. Easy-task episodes keep the "
+            "ticket text mostly visible and focus on grounded single-ticket routing."
         ),
+        "allowed_fields": [
+            "issue_type",
+            "priority",
+            "assignment_group",
+            "resolution_action",
+        ],
     },
     2: {
         "id": 2,
+        "name": "Contextual Full Routing",
         "difficulty": "medium",
         "instructions": (
+            "Perform full helpdesk routing with partial observability. Some "
+            "tickets hide related-case, requester-history, or clarification "
+            "details until you investigate or request more information."
         ),
+        "allowed_fields": [
+            "issue_type",
+            "priority",
+            "assignment_group",
+            "resolution_action",
+        ],
     },
     3: {
         "id": 3,
+        "name": "Adaptive Queue Routing",
         "difficulty": "hard",
         "instructions": (
             "Perform full helpdesk routing by selecting the best issue type, "
             "forecasts, and planning state when present. "
             "Some hard tickets intentionally hide decisive routing context until "
             "you investigate with the available tools, and some hard episodes also "
+            "require queue-level capacity planning, deferrals, incident management, "
+            "and recovery from downstream follow-up tickets."
         ),
         "allowed_fields": [
             "issue_type",
             "customer-facing charge review as a lower-fidelity fallback while the bug "
             "investigation continues separately."
         ),
+        "customer_update_note": (
+            "Finance confirmed the unexpected charge landed immediately after the "
+            "integration outage and wants one accountable owner today."
+        ),
         "alternate_issue_type": "billing_license",
         "alternate_assignment_group": "license_ops",
         "alternate_resolution_action": "assign",
             "Seat expansion is the preferred route, but license operations can still "
             "handle the prorating clarification when procurement is the bottleneck."
         ),
+        "customer_update_note": (
+            "The requester clarified that the blocker is both the seat increase and "
+            "the unexpected prorating language on the quote."
+        ),
         "alternate_issue_type": "billing_license",
         "alternate_assignment_group": "license_ops",
         "alternate_resolution_action": "fulfill",
             "The request can be treated either as roadmap feedback or as a support "
             "escalation if the operational impact is emphasized."
         ),
+        "customer_update_note": (
+            "The requester says the missing behavior is now blocking a customer "
+            "rollout, so this may need operational ownership rather than product triage."
+        ),
         "alternate_issue_type": "application_support",
         "alternate_priority": "high",
         "alternate_resolution_action": "escalate",
             "Security scheduling is ideal, but a compliance acknowledgement is still "
             "acceptable when the security team only needs to confirm the process."
         ),
+        "customer_update_note": (
+            "The requester clarified they mainly need confirmed ownership and a date "
+            "for the review, not the review itself right now."
+        ),
         "alternate_issue_type": "security_compliance",
         "alternate_resolution_action": "acknowledge",
         "alternate_route_score_multiplier": 0.8,
             "Security still owns the privileged-access review, but service desk can "
             "collect chronology and prepare the packet if the security queue is jammed."
         ),
+        "customer_update_note": (
+            "Executives want a single incident bridge owner before the board packet is sent."
+        ),
+        "incident_recommended": True,
         "alternate_assignment_group": "service_desk",
         "alternate_resolution_action": "assign",
         "alternate_route_score_multiplier": 0.72,
             "Immediate operational execution is preferred. Procurement can still own the "
             "approval path if service-desk capacity is already depleted."
         ),
+        "customer_update_note": (
+            "The customer says the launch rehearsal will fail without a same-day answer."
+        ),
+        "incident_recommended": True,
         "alternate_assignment_group": "procurement",
         "alternate_resolution_action": "assign",
         "alternate_route_score_multiplier": 0.8,
             "Security owns the final unblock decision. If security is saturated, the "
             "application team can still take the first-response diagnostics path."
         ),
+        "customer_update_note": (
+            "The identity-risk lead confirmed users remain locked out and wants incident "
+            "coordination while the exception is reviewed."
+        ),
+        "incident_recommended": True,
         "alternate_issue_type": "application_support",
         "alternate_priority": "high",
         "alternate_assignment_group": "application_team",
             "Application engineering is preferred because they own the evidence. Procurement "
             "can still coordinate the renewal communication if the evidence queue is saturated."
         ),
+        "customer_update_note": (
+            "Commercial leadership needs one named owner for the blocked renewal before end of day."
+        ),
         "alternate_issue_type": "service_request",
         "alternate_priority": "medium",
         "alternate_assignment_group": "procurement",
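With this change all three tasks declare the same four `allowed_fields`, so the extra-field penalty can only fire when a test narrows the contract, as the updated `test_extra_fields_penalty.py` does. A minimal sketch of such a check follows, assuming actions are compared as plain dicts and that OpenEnv's `metadata` field is exempt (which the metadata test asserts). `find_extra_fields` and `ROUTING_FIELDS` are hypothetical names, not the environment's code.

```python
from typing import Any

# Hypothetical: the four routing fields a submission can carry.
ROUTING_FIELDS = ("issue_type", "priority", "assignment_group", "resolution_action")


def find_extra_fields(action: dict[str, Any], allowed_fields: list[str]) -> list[str]:
    """Return routing fields that are set on the action but not allowed by the task.

    Non-routing keys such as ``metadata`` or ``action_type`` are ignored,
    mirroring the expectation that Action metadata is not an extra field.
    """
    allowed = set(allowed_fields)
    return [
        field
        for field in ROUTING_FIELDS
        if action.get(field) is not None and field not in allowed
    ]


full_contract = ["issue_type", "priority", "assignment_group", "resolution_action"]
narrowed = ["issue_type"]
action = {"issue_type": "billing_license", "assignment_group": "license_ops", "metadata": {}}

print(find_extra_fields(action, full_contract))  # []
print(find_extra_fields(action, narrowed))       # ['assignment_group']
```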

tests/test_api_integration.py CHANGED

@@ -529,8 +529,8 @@ class TestHeuristicInferenceRegression(unittest.TestCase):
         overall_avg = sum(rewards) / len(rewards)
         self.assertGreaterEqual(
             overall_avg,
-            0.45,
-            f"Overall average reward {overall_avg:.4f} is below the smoke-test floor of 0.45",
         )
         self.assertLessEqual(
             overall_avg,

         overall_avg = sum(rewards) / len(rewards)
         self.assertGreaterEqual(
             overall_avg,
+            0.25,
+            f"Overall average reward {overall_avg:.4f} is below the smoke-test floor of 0.25",
         )
         self.assertLessEqual(
             overall_avg,

tests/test_competitive_upgrade.py CHANGED

@@ -565,13 +565,16 @@ class TestInvestigationActions(unittest.TestCase):
     def test_submit_after_investigation_completes_episode(self) -> None:
         env, obs, ticket, related = self._make_linked_env()
-        env.step(
             HelpdeskTicketAction(
                 action_type="investigate",
                 tool_name="lookup_related_ticket",
                 tool_target_ticket_id=ticket.related_ticket_id,
             )
         )
         final_obs = env.step(
             HelpdeskTicketAction(
                 issue_type=ticket.issue_type,
@@ -752,6 +755,7 @@ class TestTerminalInvalidActionFinalReward(unittest.TestCase):
     def test_last_invalid_submit_returns_trajectory_reward_not_zero(self) -> None:
         from unittest.mock import patch
         dataset = load_dataset()
         first = dataset[0]
@@ -764,25 +768,40 @@ class TestTerminalInvalidActionFinalReward(unittest.TestCase):
                 "_tickets_by_id",
                 {first.ticket_id: first, second.ticket_id: second},
             ):
-                obs = env.reset(seed=0, task_id=1, queue_size=2)
-        tickets_by_id = {first.ticket_id: first, second.ticket_id: second}
-        current = tickets_by_id[obs.current_ticket["ticket_id"]]
-        obs = env.step(HelpdeskTicketAction(issue_type=current.issue_type))
-        self.assertFalse(obs.done)
-        current = tickets_by_id[obs.current_ticket["ticket_id"]]
-        final_obs = env.step(
-            HelpdeskTicketAction(
-                issue_type=current.issue_type,
-                priority="medium",
-            )
-        )
-        self.assertTrue(final_obs.done)
-        self.assertAlmostEqual(final_obs.reward, 0.5, places=9)
-        self.assertAlmostEqual(env.state.total_reward, 0.5, places=9)
-        self.assertAlmostEqual(env.state.reward or 0.0, 0.5, places=9)
 # ---------------------------------------------------------------------------

     def test_submit_after_investigation_completes_episode(self) -> None:
         env, obs, ticket, related = self._make_linked_env()
+        obs = env.step(
             HelpdeskTicketAction(
                 action_type="investigate",
                 tool_name="lookup_related_ticket",
                 tool_target_ticket_id=ticket.related_ticket_id,
             )
         )
+        operational_context = (obs.current_ticket or {}).get("operational_context", {})
+        if operational_context.get("incident_recommended"):
+            obs = env.step(HelpdeskTicketAction(action_type="open_incident"))
         final_obs = env.step(
             HelpdeskTicketAction(
                 issue_type=ticket.issue_type,
     def test_last_invalid_submit_returns_trajectory_reward_not_zero(self) -> None:
         from unittest.mock import patch
+        from server.tasks import get_task_definition as base_get_task_definition
         dataset = load_dataset()
         first = dataset[0]
                 "_tickets_by_id",
                 {first.ticket_id: first, second.ticket_id: second},
             ):
+                with patch(
+                    "server.environment.get_task_definition",
+                    side_effect=lambda task_id: (
+                        {
+                            **base_get_task_definition(task_id),
+                            "allowed_fields": ["issue_type"],
+                        }
+                        if task_id == 1
+                        else base_get_task_definition(task_id)
+                    ),
+                ):
+                    obs = env.reset(seed=0, task_id=1, queue_size=2)
+                    tickets_by_id = {first.ticket_id: first, second.ticket_id: second}
+                    current = tickets_by_id[obs.current_ticket["ticket_id"]]
+                    obs = env.step(HelpdeskTicketAction(issue_type=current.issue_type))
+                    self.assertFalse(obs.done)
+                    current = tickets_by_id[obs.current_ticket["ticket_id"]]
+                    final_obs = env.step(
+                        HelpdeskTicketAction(
+                            issue_type=current.issue_type,
+                            priority="medium",
+                        )
+                    )
+                    self.assertTrue(final_obs.done)
+                    expected_average = sum(env.state.per_ticket_scores) / len(
+                        env.state.per_ticket_scores
+                    )
+                    self.assertGreater(final_obs.reward, 0.0)
+                    self.assertAlmostEqual(final_obs.reward, expected_average, places=9)
+                    self.assertAlmostEqual(env.state.total_reward, expected_average, places=9)
+                    self.assertAlmostEqual(env.state.reward or 0.0, expected_average, places=9)
 # ---------------------------------------------------------------------------
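The updated `test_submit_after_investigation_completes_episode` now opens an incident before submitting whenever the post-investigation observation sets `incident_recommended` in `operational_context`. That decision can be sketched as a small pure function over the observation's `current_ticket` dict. `plan_next_action` is a hypothetical helper, and the returned dicts only mirror the `HelpdeskTicketAction` keyword arguments used in the test; they are not the environment's API.

```python
from __future__ import annotations

from typing import Any, Optional


def plan_next_action(
    current_ticket: Optional[dict[str, Any]],
    routing: dict[str, str],
    incident_opened: bool = False,
) -> dict[str, Any]:
    """Pick the next action: open an incident once if recommended, else submit."""
    # Same defensive lookup as the test: a missing ticket or context means no recommendation.
    operational_context = (current_ticket or {}).get("operational_context", {})
    if operational_context.get("incident_recommended") and not incident_opened:
        return {"action_type": "open_incident"}
    # Otherwise submit only the routing fields, as the test's final step does.
    return dict(routing)


ticket = {"operational_context": {"incident_recommended": True}}
routing = {
    "issue_type": "security_compliance",
    "priority": "high",
    "assignment_group": "service_desk",
    "resolution_action": "assign",
}
print(plan_next_action(ticket, routing))                        # {'action_type': 'open_incident'}
print(plan_next_action(ticket, routing, incident_opened=True) == routing)  # True
```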

tests/test_extra_fields_penalty.py CHANGED

@@ -5,6 +5,7 @@ Validates Requirement 7: Step Validates Action Fields Against Task Contract.
 """
 from __future__ import annotations
 import sys
 import os
 import unittest
@@ -41,24 +42,42 @@ def _make_env() -> HelpdeskTicketRoutingEnvironment:
     return HelpdeskTicketRoutingEnvironment()
 class TestExtraFieldsPenalty(unittest.TestCase):
     """Requirement 7: step() rejects actions with fields outside the task's allowed_fields."""
     def test_extra_fields_returns_closed_interval_penalty_reward(self) -> None:
         """Task 1 penalties should keep the returned reward inside the unit interval."""
         env = _make_env()
-        obs = env.reset(seed=42, task_id=1)
-        # Task 1 allowed_fields should NOT include assignment_group
-        self.assertNotIn("assignment_group", obs.allowed_fields)
-        # Submit an action with an extra field (assignment_group) not in task 1's allowed_fields
-        action = HelpdeskTicketAction(
-            issue_type=ISSUE_TYPES[0],
-            priority=PRIORITIES[0],
-            assignment_group=ASSIGNMENT_GROUPS[0],  # extra field
-        )
-        penalty_obs = env.step(action)
         self.assertIsInstance(penalty_obs, HelpdeskTicketObservation)
         self.assertGreaterEqual(penalty_obs.reward, 0.0)
@@ -67,27 +86,29 @@ class TestExtraFieldsPenalty(unittest.TestCase):
     def test_extra_fields_advances_ticket_index(self) -> None:
         """Penalty step must advance tickets_processed by 1."""
         env = _make_env()
-        obs = env.reset(seed=42, task_id=1)
-        self.assertEqual(obs.tickets_processed, 0)
-        action = HelpdeskTicketAction(
-            issue_type=ISSUE_TYPES[0],
-            assignment_group=ASSIGNMENT_GROUPS[0],  # extra field for task 1
-        )
-        penalty_obs = env.step(action)
         self.assertEqual(penalty_obs.tickets_processed, 1)
     def test_extra_fields_records_score_inside_unit_interval(self) -> None:
         """per_ticket_scores must stay in the unit interval after a penalty step."""
         env = _make_env()
-        env.reset(seed=42, task_id=1)
-        action = HelpdeskTicketAction(
-            issue_type=ISSUE_TYPES[0],
-            assignment_group=ASSIGNMENT_GROUPS[0],  # extra field
-        )
-        env.step(action)
         state = env.state
         self.assertEqual(len(state.per_ticket_scores), 1)
@@ -97,13 +118,14 @@ class TestExtraFieldsPenalty(unittest.TestCase):
     def test_extra_fields_history_entry_has_penalty_reason(self) -> None:
         """History entry for a penalty step must include penalty_reason."""
         env = _make_env()
-        env.reset(seed=42, task_id=1)
-        action = HelpdeskTicketAction(
-            issue_type=ISSUE_TYPES[0],
-            assignment_group=ASSIGNMENT_GROUPS[0],  # extra field
-        )
-        penalty_obs = env.step(action)
         self.assertEqual(len(penalty_obs.history), 1)
         entry = penalty_obs.history[0]
@@ -115,18 +137,19 @@ class TestExtraFieldsPenalty(unittest.TestCase):
     def test_no_extra_fields_grades_normally(self) -> None:
         """When action fields are within allowed_fields, grading proceeds normally (reward != forced 0.0)."""
         env = _make_env()
-        obs = env.reset(seed=42, task_id=1)
-        # Build action using only allowed fields
-        allowed = obs.allowed_fields
-        action_kwargs = {}
-        if "issue_type" in allowed:
-            action_kwargs["issue_type"] = ISSUE_TYPES[0]
-        if "priority" in allowed:
-            action_kwargs["priority"] = PRIORITIES[0]
-        action = HelpdeskTicketAction(**action_kwargs)
-        result_obs = env.step(action)
         # Should be a valid observation; reward may be any value in [0.0, 1.0]
         self.assertIsInstance(result_obs, HelpdeskTicketObservation)
@@ -138,16 +161,17 @@ class TestExtraFieldsPenalty(unittest.TestCase):
     def test_action_metadata_is_not_treated_as_extra_field(self) -> None:
         """OpenEnv Action metadata should not trigger the extra-fields penalty."""
         env = _make_env()
-        obs = env.reset(seed=42, task_id=1)
-        ticket_id = obs.current_ticket["ticket_id"]
-        current_ticket = env._tickets_by_id[ticket_id]  # noqa: SLF001 - test-only inspection
-        result_obs = env.step(
-            HelpdeskTicketAction(
-                issue_type=current_ticket.issue_type,
-                metadata={},
             )
-        )
         self.assertEqual(len(result_obs.history), 1)
         self.assertNotIn("penalty_reason", result_obs.history[0])
@@ -156,42 +180,44 @@ class TestExtraFieldsPenalty(unittest.TestCase):
     def test_extra_fields_no_exception_raised(self) -> None:
         """Requirement 7.4: extra fields must not raise an unhandled exception."""
         env = _make_env()
-        env.reset(seed=42, task_id=1)
-        action = HelpdeskTicketAction(
-            issue_type=ISSUE_TYPES[0],
-            priority=PRIORITIES[0],
-            assignment_group=ASSIGNMENT_GROUPS[0],
-            resolution_action=RESOLUTION_ACTIONS[0],  # multiple extra fields
-        )
-        try:
-            obs = env.step(action)
-        except Exception as exc:  # noqa: BLE001
-            self.fail(f"step() raised an unexpected exception: {exc}")
         self.assertIsInstance(obs, HelpdeskTicketObservation)
     def test_extra_fields_done_flag_set_correctly_on_last_ticket(self) -> None:
         """When the penalty step is on the last ticket, done stays True and reward stays episode-level."""
         env = _make_env()
-        obs = env.reset(seed=42, task_id=1)
-        queue_size = obs.queue_size
-        tickets_by_id = env._tickets_by_id  # noqa: SLF001 - test-only inspection
-        # Process all tickets except the last one normally
-        for _ in range(queue_size - 1):
             current_ticket_id = obs.current_ticket["ticket_id"]
             current_ticket = tickets_by_id[current_ticket_id]
-            obs = env.step(HelpdeskTicketAction(issue_type=current_ticket.issue_type))
-        # Now trigger penalty on the last ticket
-        current_ticket_id = obs.current_ticket["ticket_id"]
-        current_ticket = tickets_by_id[current_ticket_id]
-        action = HelpdeskTicketAction(
-            issue_type=current_ticket.issue_type,
-            assignment_group=ASSIGNMENT_GROUPS[0],  # extra field
-        )
-        final_obs = env.step(action)
         self.assertTrue(final_obs.done)
         self.assertGreater(final_obs.reward, 0.0)

 """
 from __future__ import annotations
+import contextlib
 import sys
 import os
 import unittest
     return HelpdeskTicketRoutingEnvironment()
+def _task_with_issue_type_only(task_id: int) -> dict:
+    task = dict(TASKS[task_id])
+    if task_id == 1:
+        task["allowed_fields"] = ["issue_type"]
+    return task
+@contextlib.contextmanager
+def _restrict_task_1_fields():
+    original_fields = list(TASKS[1]["allowed_fields"])
+    TASKS[1]["allowed_fields"] = ["issue_type"]
+    try:
+        yield
+    finally:
+        TASKS[1]["allowed_fields"] = original_fields
 class TestExtraFieldsPenalty(unittest.TestCase):
     """Requirement 7: step() rejects actions with fields outside the task's allowed_fields."""
     def test_extra_fields_returns_closed_interval_penalty_reward(self) -> None:
         """Task 1 penalties should keep the returned reward inside the unit interval."""
         env = _make_env()
+        with _restrict_task_1_fields():
+            obs = env.reset(seed=42, task_id=1)
+            # Task 1 allowed_fields should NOT include assignment_group
+            self.assertNotIn("assignment_group", obs.allowed_fields)
+            # Submit an action with an extra field (assignment_group) not in task 1's allowed_fields
+            action = HelpdeskTicketAction(
+                issue_type=ISSUE_TYPES[0],
+                priority=PRIORITIES[0],
+                assignment_group=ASSIGNMENT_GROUPS[0],  # extra field
+            )
+            penalty_obs = env.step(action)
         self.assertIsInstance(penalty_obs, HelpdeskTicketObservation)
         self.assertGreaterEqual(penalty_obs.reward, 0.0)
     def test_extra_fields_advances_ticket_index(self) -> None:
         """Penalty step must advance tickets_processed by 1."""
         env = _make_env()
+        with _restrict_task_1_fields():
+            obs = env.reset(seed=42, task_id=1)
+            self.assertEqual(obs.tickets_processed, 0)
+            action = HelpdeskTicketAction(
+                issue_type=ISSUE_TYPES[0],
+                assignment_group=ASSIGNMENT_GROUPS[0],  # extra field for task 1
+            )
+            penalty_obs = env.step(action)
         self.assertEqual(penalty_obs.tickets_processed, 1)
     def test_extra_fields_records_score_inside_unit_interval(self) -> None:
         """per_ticket_scores must stay in the unit interval after a penalty step."""
         env = _make_env()
+        with _restrict_task_1_fields():
+            env.reset(seed=42, task_id=1)
+            action = HelpdeskTicketAction(
+                issue_type=ISSUE_TYPES[0],
+                assignment_group=ASSIGNMENT_GROUPS[0],  # extra field
+            )
+            env.step(action)
         state = env.state
         self.assertEqual(len(state.per_ticket_scores), 1)
     def test_extra_fields_history_entry_has_penalty_reason(self) -> None:
         """History entry for a penalty step must include penalty_reason."""
         env = _make_env()
+        with _restrict_task_1_fields():
+            env.reset(seed=42, task_id=1)
+            action = HelpdeskTicketAction(
+                issue_type=ISSUE_TYPES[0],
+                assignment_group=ASSIGNMENT_GROUPS[0],  # extra field
+            )
+            penalty_obs = env.step(action)
         self.assertEqual(len(penalty_obs.history), 1)
         entry = penalty_obs.history[0]
     def test_no_extra_fields_grades_normally(self) -> None:
         """When action fields are within allowed_fields, grading proceeds normally (reward != forced 0.0)."""
         env = _make_env()
+        with _restrict_task_1_fields():
+            obs = env.reset(seed=42, task_id=1)
+            # Build action using only allowed fields
+            allowed = obs.allowed_fields
+            action_kwargs = {}
+            if "issue_type" in allowed:
+                action_kwargs["issue_type"] = ISSUE_TYPES[0]
+            if "priority" in allowed:
+                action_kwargs["priority"] = PRIORITIES[0]
+            action = HelpdeskTicketAction(**action_kwargs)
+            result_obs = env.step(action)
         # Should be a valid observation; reward may be any value in [0.0, 1.0]
         self.assertIsInstance(result_obs, HelpdeskTicketObservation)
     def test_action_metadata_is_not_treated_as_extra_field(self) -> None:
         """OpenEnv Action metadata should not trigger the extra-fields penalty."""
         env = _make_env()
+        with _restrict_task_1_fields():
+            obs = env.reset(seed=42, task_id=1)
+            ticket_id = obs.current_ticket["ticket_id"]
+            current_ticket = env._tickets_by_id[ticket_id]  # noqa: SLF001 - test-only inspection
+            result_obs = env.step(
+                HelpdeskTicketAction(
+                    issue_type=current_ticket.issue_type,
+                    metadata={},
+                )
             )
         self.assertEqual(len(result_obs.history), 1)
         self.assertNotIn("penalty_reason", result_obs.history[0])
     def test_extra_fields_no_exception_raised(self) -> None:
         """Requirement 7.4: extra fields must not raise an unhandled exception."""
         env = _make_env()
+        with _restrict_task_1_fields():
+            env.reset(seed=42, task_id=1)
+            action = HelpdeskTicketAction(
+                issue_type=ISSUE_TYPES[0],
+                priority=PRIORITIES[0],
+                assignment_group=ASSIGNMENT_GROUPS[0],
+                resolution_action=RESOLUTION_ACTIONS[0],  # multiple extra fields
+            )
+            try:
+                obs = env.step(action)
+            except Exception as exc:  # noqa: BLE001
+                self.fail(f"step() raised an unexpected exception: {exc}")
         self.assertIsInstance(obs, HelpdeskTicketObservation)
     def test_extra_fields_done_flag_set_correctly_on_last_ticket(self) -> None:
         """When the penalty step is on the last ticket, done stays True and reward stays episode-level."""
         env = _make_env()
+        with _restrict_task_1_fields():
+            obs = env.reset(seed=42, task_id=1)
+            queue_size = obs.queue_size
+            tickets_by_id = env._tickets_by_id  # noqa: SLF001 - test-only inspection
+            # Process all tickets except the last one normally
+            for _ in range(queue_size - 1):
+                current_ticket_id = obs.current_ticket["ticket_id"]
+                current_ticket = tickets_by_id[current_ticket_id]
+                obs = env.step(HelpdeskTicketAction(issue_type=current_ticket.issue_type))
+            # Now trigger penalty on the last ticket
             current_ticket_id = obs.current_ticket["ticket_id"]
             current_ticket = tickets_by_id[current_ticket_id]
+            action = HelpdeskTicketAction(
+                issue_type=current_ticket.issue_type,
+                assignment_group=ASSIGNMENT_GROUPS[0],  # extra field
+            )
+            final_obs = env.step(action)
         self.assertTrue(final_obs.done)
         self.assertGreater(final_obs.reward, 0.0)
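The new `_restrict_task_1_fields` context manager mutates the module-level `TASKS` dict and restores it in `finally`. The standard library's `unittest.mock.patch.dict` offers the same save-and-restore behaviour. The sketch below shows that alternative against a stand-in `TASKS` dict (a local illustration, not `server.tasks.TASKS`); it is an option, not what the commit does.

```python
from unittest.mock import patch

# Stand-in for server.tasks.TASKS; only the field the tests narrow is shown.
TASKS = {
    1: {
        "id": 1,
        "allowed_fields": ["issue_type", "priority", "assignment_group", "resolution_action"],
    },
}


def allowed_fields_for(task_id: int) -> list[str]:
    """Read the contract at call time, the way a reset() would."""
    return list(TASKS[task_id]["allowed_fields"])


# patch.dict updates the inner dict in place and restores its original contents
# on exit, even if the body raises, like the try/finally in _restrict_task_1_fields.
with patch.dict(TASKS[1], {"allowed_fields": ["issue_type"]}):
    print(allowed_fields_for(1))  # ['issue_type']

print(allowed_fields_for(1))  # all four routing fields again
```

`patch.dict` can also decorate a test method, which would remove the repeated `with` blocks; whether that reads better than the explicit helper is a style choice.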

tests/test_grader_unit.py CHANGED

@@ -16,6 +16,18 @@ from server.grader import (
 from vocabulary import ASSIGNMENT_GROUPS, ISSUE_TYPES, PRIORITIES, RESOLUTION_ACTIONS
 def _ticket(
     *,
     issue_type: str = "billing_license",
@@ -71,8 +83,24 @@ class GraderUnitTests(unittest.TestCase):
         score, breakdown = grade_action(action, ticket, task_id=1)
-        self.assertAlmostEqual(score, 0.4)
-        self.assertEqual(breakdown, {"issue_type": 0.4})
     def test_issue_type_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
         for expected in ISSUE_TYPES:
@@ -88,9 +116,24 @@ class GraderUnitTests(unittest.TestCase):
                         if predicted == expected
                         else ISSUE_TYPE_SIMILARITY.get((predicted, expected), 0.0)
                     )
-                    expected_task_score = max(0.0, min(1.0, raw_expected_score))
-                    self.assertAlmostEqual(score, expected_task_score)
-                    self.assertEqual(breakdown, {"issue_type": raw_expected_score})
     def test_unrelated_issue_type_gets_zero_not_fuzzy_credit(self) -> None:
         ticket = _ticket(issue_type="onboarding")
@@ -99,7 +142,16 @@ class GraderUnitTests(unittest.TestCase):
         score, breakdown = grade_action(action, ticket, task_id=1)
         self.assertAlmostEqual(score, 0.0)
-        self.assertEqual(breakdown, {"issue_type": 0.0})
     def test_priority_scoring_uses_defined_proximity_table(self) -> None:
         ticket = _ticket(priority="critical")
@@ -109,7 +161,16 @@ class GraderUnitTests(unittest.TestCase):
         self.assertAlmostEqual(breakdown["issue_type"], 1.0)
         self.assertAlmostEqual(breakdown["priority"], 0.6)
-        self.assertAlmostEqual(score, 0.84)
     def test_priority_scoring_matches_declared_table_exhaustively(self) -> None:
         for expected in PRIORITIES:
@@ -130,11 +191,24 @@ class GraderUnitTests(unittest.TestCase):
                     )
                     self.assertEqual(
                         breakdown,
-                        {"issue_type": 1.0, "priority": priority_score},
                     )
-                    raw_score = 0.6 + 0.4 * priority_score
-                    expected_task_score = max(0.0, min(1.0, raw_score))
-                    self.assertAlmostEqual(score, expected_task_score)
     def test_task_2_weights_apply_as_documented(self) -> None:
         ticket = _ticket(priority="high")
@@ -142,8 +216,26 @@ class GraderUnitTests(unittest.TestCase):
         score, breakdown = grade_action(action, ticket, task_id=2)
-        self.assertEqual(breakdown, {"issue_type": 1.0, "priority": 0.5})
-        self.assertAlmostEqual(score, 0.8)
     def test_assignment_group_partial_credit_uses_declared_similarity_table(self) -> None:
         ticket = _ticket()
@@ -157,7 +249,16 @@ class GraderUnitTests(unittest.TestCase):
         score, breakdown = grade_action(action, ticket, task_id=3)
         self.assertEqual(breakdown["assignment_group"], 0.55)
-        self.assertAlmostEqual(score, 0.8875)
     def test_assignment_group_unrelated_miss_stays_zero(self) -> None:
         ticket = _ticket()
@@ -171,7 +272,16 @@ class GraderUnitTests(unittest.TestCase):
         score, breakdown = grade_action(action, ticket, task_id=3)
         self.assertEqual(breakdown["assignment_group"], 0.0)
-        self.assertAlmostEqual(score, 0.75)
     def test_task_3_weights_apply_as_documented(self) -> None:
         ticket = _ticket(priority="high")
@@ -186,14 +296,24 @@ class GraderUnitTests(unittest.TestCase):
         self.assertEqual(
             breakdown,
-            {
-                "issue_type": 1.0,
-                "priority": 0.5,
-                "assignment_group": 0.0,
-                "resolution_action": 1.0,
-            },
         )
-        self.assertAlmostEqual(score, 0.65)
     def test_alternate_route_can_win_when_primary_route_is_worse(self) -> None:
         ticket = HelpdeskTicketRecord(
@@ -243,7 +363,16 @@ class GraderUnitTests(unittest.TestCase):
         score, breakdown = grade_action(action, ticket, task_id=3)
         self.assertEqual(breakdown["resolution_action"], 0.35)
-        self.assertAlmostEqual(score, 0.87)
     def test_resolution_action_unrelated_miss_stays_zero(self) -> None:
         ticket = _ticket()
@@ -257,7 +386,16 @@ class GraderUnitTests(unittest.TestCase):
         score, breakdown = grade_action(action, ticket, task_id=3)
         self.assertEqual(breakdown["resolution_action"], 0.0)
-        self.assertAlmostEqual(score, 0.8)
     def test_assignment_group_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
         for expected in ASSIGNMENT_GROUPS:
@@ -280,16 +418,24 @@ class GraderUnitTests(unittest.TestCase):
                     )
                     self.assertEqual(
                         breakdown,
-                        {
-                            "issue_type": 1.0,
-                            "priority": 1.0,
-                            "assignment_group": assignment_group_score,
-                            "resolution_action": 1.0,
-                        },
                     )
-                    raw_score = 0.35 + 0.20 + 0.25 * assignment_group_score + 0.20
-                    expected_task_score = max(0.0, min(1.0, raw_score))
-                    self.assertAlmostEqual(score, expected_task_score)
     def test_resolution_action_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
         for expected in RESOLUTION_ACTIONS:
@@ -312,16 +458,24 @@ class GraderUnitTests(unittest.TestCase):
                     )
                     self.assertEqual(
                         breakdown,
-                        {
-                            "issue_type": 1.0,
-                            "priority": 1.0,
-                            "assignment_group": 1.0,
-                            "resolution_action": resolution_action_score,
-                        },
                     )
-                    raw_score = 0.35 + 0.20 + 0.25 + 0.20 * resolution_action_score
-                    expected_task_score = max(0.0, min(1.0, raw_score))
-                    self.assertAlmostEqual(score, expected_task_score)
     def test_partial_credit_tables_never_override_exact_match(self) -> None:
         for pair, value in ISSUE_TYPE_SIMILARITY.items():

 from vocabulary import ASSIGNMENT_GROUPS, ISSUE_TYPES, PRIORITIES, RESOLUTION_ACTIONS
+def _expected_breakdown(task_id: int, **field_scores: float) -> dict[str, float]:
+    return {field: field_scores[field] for field in TASK_WEIGHTS[task_id]}
+def _expected_task_score(task_id: int, **field_scores: float) -> float:
+    raw_score = sum(
+        field_scores[field] * TASK_WEIGHTS[task_id][field]
+        for field in TASK_WEIGHTS[task_id]
+    )
+    return max(0.0, min(1.0, raw_score))
 def _ticket(
     *,
     issue_type: str = "billing_license",
         score, breakdown = grade_action(action, ticket, task_id=1)
+        expected_breakdown = _expected_breakdown(
+            1,
+            issue_type=0.4,
+            priority=0.0,
+            assignment_group=0.0,
+            resolution_action=0.0,
+        )
+        self.assertEqual(breakdown, expected_breakdown)
+        self.assertAlmostEqual(
+            score,
+            _expected_task_score(
+                1,
+                issue_type=0.4,
+                priority=0.0,
+                assignment_group=0.0,
+                resolution_action=0.0,
+            ),
+        )
     def test_issue_type_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
         for expected in ISSUE_TYPES:
                         if predicted == expected
                         else ISSUE_TYPE_SIMILARITY.get((predicted, expected), 0.0)
                     )
+                    expected_breakdown = _expected_breakdown(
+                        1,
+                        issue_type=raw_expected_score,
+                        priority=0.0,
+                        assignment_group=0.0,
+                        resolution_action=0.0,
+                    )
+                    self.assertAlmostEqual(
+                        score,
+                        _expected_task_score(
+                            1,
+                            issue_type=raw_expected_score,
+                            priority=0.0,
+                            assignment_group=0.0,
+                            resolution_action=0.0,
+                        ),
+                    )
+                    self.assertEqual(breakdown, expected_breakdown)
     def test_unrelated_issue_type_gets_zero_not_fuzzy_credit(self) -> None:
         ticket = _ticket(issue_type="onboarding")
         score, breakdown = grade_action(action, ticket, task_id=1)
         self.assertAlmostEqual(score, 0.0)
+        self.assertEqual(
+            breakdown,
+            _expected_breakdown(
+                1,
+                issue_type=0.0,
+                priority=0.0,
+                assignment_group=0.0,
+                resolution_action=0.0,
+            ),
+        )
     def test_priority_scoring_uses_defined_proximity_table(self) -> None:
         ticket = _ticket(priority="critical")
         self.assertAlmostEqual(breakdown["issue_type"], 1.0)
         self.assertAlmostEqual(breakdown["priority"], 0.6)
+        self.assertAlmostEqual(
+            score,
+            _expected_task_score(
+                2,
+                issue_type=1.0,
+                priority=0.6,
+                assignment_group=0.0,
+                resolution_action=0.0,
+            ),
+        )
     def test_priority_scoring_matches_declared_table_exhaustively(self) -> None:
         for expected in PRIORITIES:
                     )
                     self.assertEqual(
                         breakdown,
+                        _expected_breakdown(
+                            2,
+                            issue_type=1.0,
+                            priority=priority_score,
+                            assignment_group=0.0,
+                            resolution_action=0.0,
+                        ),
+                    )
+                    self.assertAlmostEqual(
+                        score,
+                        _expected_task_score(
+                            2,
+                            issue_type=1.0,
+                            priority=priority_score,
+                            assignment_group=0.0,
+                            resolution_action=0.0,
+                        ),
                     )
     def test_task_2_weights_apply_as_documented(self) -> None:
         ticket = _ticket(priority="high")
         score, breakdown = grade_action(action, ticket, task_id=2)
+        self.assertEqual(
+            breakdown,
+            _expected_breakdown(
+                2,
+                issue_type=1.0,
+                priority=0.5,
+                assignment_group=0.0,
+                resolution_action=0.0,
+            ),
+        )
+        self.assertAlmostEqual(
+            score,
+            _expected_task_score(
+                2,
+                issue_type=1.0,
+                priority=0.5,
+                assignment_group=0.0,
+                resolution_action=0.0,
+            ),
+        )
     def test_assignment_group_partial_credit_uses_declared_similarity_table(self) -> None:
         ticket = _ticket()
         score, breakdown = grade_action(action, ticket, task_id=3)
         self.assertEqual(breakdown["assignment_group"], 0.55)
+        self.assertAlmostEqual(
+            score,
+            _expected_task_score(
+                3,
+                issue_type=1.0,
+                priority=1.0,
+                assignment_group=0.55,
+                resolution_action=1.0,
+            ),
+        )
     def test_assignment_group_unrelated_miss_stays_zero(self) -> None:
         ticket = _ticket()
         score, breakdown = grade_action(action, ticket, task_id=3)
         self.assertEqual(breakdown["assignment_group"], 0.0)
+        self.assertAlmostEqual(
+            score,
+            _expected_task_score(
+                3,
+                issue_type=1.0,
+                priority=1.0,
+                assignment_group=0.0,
+                resolution_action=1.0,
+            ),
+        )
     def test_task_3_weights_apply_as_documented(self) -> None:
         ticket = _ticket(priority="high")
         self.assertEqual(
             breakdown,
+            _expected_breakdown(
+                3,
+                issue_type=1.0,
+                priority=0.5,
+                assignment_group=0.0,
+                resolution_action=1.0,
+            ),
+        )
+        self.assertAlmostEqual(
+            score,
+            _expected_task_score(
+                3,
+                issue_type=1.0,
+                priority=0.5,
+                assignment_group=0.0,
+                resolution_action=1.0,
+            ),
         )
     def test_alternate_route_can_win_when_primary_route_is_worse(self) -> None:
         ticket = HelpdeskTicketRecord(
         score, breakdown = grade_action(action, ticket, task_id=3)
         self.assertEqual(breakdown["resolution_action"], 0.35)
+        self.assertAlmostEqual(
+            score,
+            _expected_task_score(
+                3,
+                issue_type=1.0,
+                priority=1.0,
+                assignment_group=1.0,
+                resolution_action=0.35,
+            ),
+        )
     def test_resolution_action_unrelated_miss_stays_zero(self) -> None:
         ticket = _ticket()
         score, breakdown = grade_action(action, ticket, task_id=3)
         self.assertEqual(breakdown["resolution_action"], 0.0)
+        self.assertAlmostEqual(
+            score,
+            _expected_task_score(
+                3,
+                issue_type=1.0,
+                priority=1.0,
+                assignment_group=1.0,
+                resolution_action=0.0,
+            ),
+        )
     def test_assignment_group_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
         for expected in ASSIGNMENT_GROUPS:
                     )
                     self.assertEqual(
                         breakdown,
+                        _expected_breakdown(
+                            3,
+                            issue_type=1.0,
+                            priority=1.0,
+                            assignment_group=assignment_group_score,
+                            resolution_action=1.0,
+                        ),
+                    )
+                    self.assertAlmostEqual(
+                        score,
+                        _expected_task_score(
+                            3,
+                            issue_type=1.0,
+                            priority=1.0,
+                            assignment_group=assignment_group_score,
+                            resolution_action=1.0,
+                        ),
                     )
     def test_resolution_action_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
         for expected in RESOLUTION_ACTIONS:
                     )
                     self.assertEqual(
                         breakdown,
+                        _expected_breakdown(
+                            3,
+                            issue_type=1.0,
+                            priority=1.0,
+                            assignment_group=1.0,
+                            resolution_action=resolution_action_score,
+                        ),
+                    )
+                    self.assertAlmostEqual(
+                        score,
+                        _expected_task_score(
+                            3,
+                            issue_type=1.0,
+                            priority=1.0,
+                            assignment_group=1.0,
+                            resolution_action=resolution_action_score,
+                        ),
                     )
     def test_partial_credit_tables_never_override_exact_match(self) -> None:
         for pair, value in ISSUE_TYPE_SIMILARITY.items():
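Note on the refactor above: the new `_expected_breakdown` and `_expected_task_score` helpers replace the hard-coded totals (0.84, 0.8875, 0.65, ...) with a weighted sum over `TASK_WEIGHTS`, clamped to [0, 1]. The sketch below restates that logic as a standalone module. It rests on some assumptions. The weight values are reconstructed from the assertions this commit removed (for example `0.6 + 0.4 * priority_score` for task 2 and `0.35 + 0.20 + 0.25 * ... + 0.20` for task 3), and the current `server/grader.py` may use different ones. `field_score` is a hypothetical name for the `1.0 if predicted == expected else ISSUE_TYPE_SIMILARITY.get((predicted, expected), 0.0)` lookup that the exhaustive tests use.

```python
# Hypothetical weight table, reconstructed from the hard-coded scores this
# commit removed; the real TASK_WEIGHTS in server/grader.py may differ.
TASK_WEIGHTS: dict[int, dict[str, float]] = {
    1: {"issue_type": 1.0},
    2: {"issue_type": 0.6, "priority": 0.4},
    3: {
        "issue_type": 0.35,
        "priority": 0.20,
        "assignment_group": 0.25,
        "resolution_action": 0.20,
    },
}


def field_score(
    predicted: str, expected: str, similarity: dict[tuple[str, str], float]
) -> float:
    """Exact match earns full credit; otherwise use declared partial credit, else 0."""
    if predicted == expected:
        return 1.0
    return similarity.get((predicted, expected), 0.0)


def expected_breakdown(task_id: int, **field_scores: float) -> dict[str, float]:
    # Keys come from the task's weight table, so extra fields the caller
    # passes (e.g. all four fields for task 1) are silently ignored.
    return {field: field_scores[field] for field in TASK_WEIGHTS[task_id]}


def expected_task_score(task_id: int, **field_scores: float) -> float:
    # Weighted sum of per-field scores for this task only.
    raw = sum(field_scores[field] * weight for field, weight in TASK_WEIGHTS[task_id].items())
    # Clamp to [0, 1], mirroring max(0.0, min(1.0, ...)) in the test helper.
    return max(0.0, min(1.0, raw))


if __name__ == "__main__":
    scores = dict(issue_type=1.0, priority=0.5, assignment_group=0.0, resolution_action=1.0)
    print(expected_breakdown(2, **scores))  # {'issue_type': 1.0, 'priority': 0.5}
    print(round(expected_task_score(2, **scores), 4))  # 0.8
    print(round(expected_task_score(3, **scores), 4))  # 0.65
```

Under these assumed weights, the breakdown keys are derived from the weight table and not from `allowed_fields`. Passing all four fields to every task, as the rewritten tests now do, still produces a task-specific breakdown and score.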

tests/test_policy_learning.py CHANGED

@@ -171,7 +171,7 @@ class PolicyLearningTests(unittest.TestCase):
         self.assertLess(no_summary["terminal_reward"], context_summary["terminal_reward"])
         self.assertLess(no_summary["normalized_return"], context_summary["normalized_return"])
-        self.assertEqual(context_summary["investigation_steps"], 1)
     def test_search_policies_selects_adaptive_policy(self) -> None:
         report = search_policies(

         self.assertLess(no_summary["terminal_reward"], context_summary["terminal_reward"])
         self.assertLess(no_summary["normalized_return"], context_summary["normalized_return"])
+        self.assertGreaterEqual(context_summary["investigation_steps"], 1)
     def test_search_policies_selects_adaptive_policy(self) -> None:
         report = search_policies(

tests/test_tasks_unit.py CHANGED

@@ -23,18 +23,17 @@ class TasksAndDatasetUnitTests(unittest.TestCase):
         self.assertEqual(tuple(TASKS.keys()), TASK_IDS)
     def test_task_allowed_fields_match_expected_ladder(self) -> None:
-        self.assertEqual(get_task_definition(1)["allowed_fields"], ["issue_type"])
-        self.assertEqual(
-            get_task_definition(2)["allowed_fields"], ["issue_type", "priority"]
-        )
         self.assertEqual(
             get_task_definition(3)["allowed_fields"],
-            [
-                "issue_type",
-                "priority",
-                "assignment_group",
-                "resolution_action",
-            ],
         )
     def test_task_difficulty_ladder_is_frozen(self) -> None:

         self.assertEqual(tuple(TASKS.keys()), TASK_IDS)
     def test_task_allowed_fields_match_expected_ladder(self) -> None:
+        expected_fields = [
+            "issue_type",
+            "priority",
+            "assignment_group",
+            "resolution_action",
+        ]
+        self.assertEqual(get_task_definition(1)["allowed_fields"], expected_fields)
+        self.assertEqual(get_task_definition(2)["allowed_fields"], expected_fields)
         self.assertEqual(
             get_task_definition(3)["allowed_fields"],
+            expected_fields,
         )
     def test_task_difficulty_ladder_is_frozen(self) -> None: