Spaces: Roopalgn/AIHack-ITHelpDesk


Roopalgn committed on Apr 7

Commit 8241eb5 (1 parent: d378e5d)

Add queue-planning helpdesk routing mechanics


Files changed (13)

README.md +32 -13
inference.py +179 -11
models.py +54 -0
policy_learning.py +77 -5
server/environment.py +481 -8
server/grader.py +71 -15
server/reward.py +3 -4
server/tasks.py +381 -4
tests/test_api_integration.py +4 -4
tests/test_competitive_upgrade.py +36 -2
tests/test_extra_fields_penalty.py +7 -7
tests/test_grader_unit.py +42 -6
tests/test_tasks_unit.py +18 -3

README.md CHANGED

@@ -55,21 +55,22 @@ This domain is useful for OpenEnv because it is operationally realistic, easy to
 The project uses a queue-based episode model.
 - `reset()` samples a task and a queue of 3 to 5 tickets
-- `step()` grades one ticket submission at a time
 - `state()` exposes the internal episode snapshot
-- final reward is based on average ticket quality across the queue
 The environment classes and vocabulary are intentionally frozen to keep collaboration and judging simple.
 ## Lightweight Policy Improvement Loop
-The repo now includes a small local learning runner in `policy_learning.py`. It does not update model weights, but it does run repeated rollouts over many seeds, log full trajectories, and select the best policy configuration from a discrete candidate set using observed reward.
-That gives the project a real improvement loop for judge demos:
-- compare `no_investigation` against `investigate_when_context_hidden`
-- log per-step rewards, feedback summaries, and reward components to JSONL
-- search over small policy variants such as `legacy_single_probe`, `context_chain`, and `hybrid_context`
 - select the best policy on train seeds, then re-evaluate it on holdout seeds
 Example commands:
@@ -90,7 +91,7 @@ Artifacts are written to `analysis/policy_learning_runs/` by default:
 - `search_eval_episodes.jsonl`
 - `search_eval_trajectories.jsonl`
-The default submit policy inside this runner stays deterministic and local. It reuses the repo's heuristic routing logic, so the discrete policy search focuses on investigation behavior and reward-driven policy selection rather than on external LLM latency or API cost.
 ## Task Ladder
@@ -149,8 +150,11 @@ Visible ticket fields:
 - `requester`
 - `description`
 - optional `ambiguity_note`
 - optional `related_ticket_id`
 - optional `related_ticket_preview`
 Each observation also includes:
@@ -173,6 +177,8 @@ Each observation also includes:
 - `last_reward_components`
 - `rubric_reward` on terminal observations
 - `metadata.last_feedback_summary` for compact reward / penalty feedback
 - standard OpenEnv fields such as `done` and `reward`
 The internal `HelpdeskTicketState` tracks:
@@ -187,6 +193,10 @@ The internal `HelpdeskTicketState` tracks:
 - `total_reward`
 - `reward`
 - `done`
 ## Grading And Reward
@@ -202,14 +212,17 @@ Available tools:
 - `lookup_related_ticket`
 - `lookup_requester_history`
 - `lookup_internal_routing_note`
 Hard-task investigation behavior:
 - some ambiguous and non-default-routing tickets start with both redacted titles and redacted descriptions
 - linked-ticket previews and internal routing notes stay hidden until the matching tool is used
 - only useful investigation steps return a small positive shaping reward
 - blind or repeated probing does not pay by default
 - premature hard-task submission can incur a shaping penalty even when the visible text looks plausible
 - terminal `rubric_reward` remains the objective evaluation signal, while per-step `reward` is the denser training signal
 Per-field behavior:
@@ -218,6 +231,7 @@ Per-field behavior:
 - `priority`: exact match or proximity credit
 - `assignment_group`: exact match, with a small declared partial-credit map for nearby ownership mistakes
 - `resolution_action`: exact match, with a small declared partial-credit map for nearby next-step mistakes
 Task weights:
@@ -227,22 +241,23 @@ Task weights:
 | 2 | 60% | 40% | - | - |
 | 3 | 35% | 20% | 25% | 20% |
-Final episode reward:
 ```text
-average(per_ticket_scores)
 ```
-The result is clamped to `[0.0, 1.0]`.
 Step reward is lightly milestone-shaped: high per-ticket scores get a small bonus and very low scores get a small penalty before the final clamp.
-Final reward also includes a queue-economics penalty when the agent exceeds the free investigation budget. One investigation per queued ticket is free, but extra investigation steps reduce the final reward more noticeably than before.
 To make the environment more RL-friendly, each observation now also surfaces structured reward telemetry:
 - `last_reward_components` exposes ticket score, shaped step reward, milestone adjustment, trajectory reward when applicable, and any investigation penalty applied
 - `average_score_so_far` and `progress_fraction` expose trajectory progress without leaking future labels
 - `history` retains the same reward components plus a compact `feedback_summary` string for downstream agents
 ## Grounded Scoring
@@ -253,6 +268,7 @@ The grader is intentionally narrow and declared, not fully fuzzy.
 - `assignment_group` and `resolution_action` now expose only a small declared partial-credit map for nearby mistakes
 - `priority` only gets proximity credit from the declared table in `server/grader.py`
 - `issue_type` only gets partial credit for a small declared similarity map
 - wrong labels outside those explicit maps score `0.0`
 That scoring policy is now backed by checked-in unit tests in `tests/test_grader_unit.py` and `tests/test_tasks_unit.py`.
@@ -267,7 +283,7 @@ That grounding pass supported keeping the current similarity map small and expla
 ## Dataset Snapshot
-The labeled dataset in `data/dataset.json` currently contains 45 tickets spanning straightforward and ambiguous helpdesk scenarios.
 It includes:
@@ -280,6 +296,9 @@ It includes:
 - onboarding tickets
 - feature requests
 - follow-up cases linked through `related_ticket_id`
 ## Difficulty Coverage

 The project uses a queue-based episode model.
 - `reset()` samples a task and a queue of 3 to 5 tickets
+- `step()` lets the agent investigate or submit one ticket at a time
 - `state()` exposes the internal episode snapshot
+- hard-task episodes also track queue-level capacity, alternate acceptable routes, and planning penalties across tickets
+- final evaluation is based on the queue outcome, not on isolated per-ticket classification alone
 The environment classes and vocabulary are intentionally frozen to keep collaboration and judging simple.
 ## Lightweight Policy Improvement Loop
+The repo includes a local policy runner in `policy_learning.py`. It still does not update model weights, but it now does more than cosmetic search: it evaluates repeated seeded rollouts, learns cue-conditioned tool preferences for investigation, uses the same planning-aware deterministic submit logic as `inference.py`, and ranks policies by terminal rubric reward first, with lower planning penalty as the tie-breaker.
+That gives the project a meaningful improvement loop for judge demos:
+- compare `no_investigation`, `investigate_when_context_hidden`, and `adaptive_cue_bandit`
+- log per-step rewards, feedback summaries, planning penalties, and reward components to JSONL
+- learn when to use `lookup_queue_capacity_forecast` versus the other investigation tools
 - select the best policy on train seeds, then re-evaluate it on holdout seeds
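The ranking rule described above (terminal rubric reward first, lower planning penalty as the tie-breaker) can be sketched as a two-part sort key. This is a minimal illustration, not the code in `policy_learning.py`: the `PolicyResult` record, its field names, and `rank_policies` are hypothetical, and the numbers are made-up placeholders rather than measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyResult:
    # Hypothetical per-policy summary; field names are assumptions.
    name: str
    mean_rubric_reward: float
    mean_planning_penalty: float


def rank_policies(results: list[PolicyResult]) -> list[PolicyResult]:
    # Higher rubric reward sorts first; on ties, the lower planning
    # penalty sorts first.
    return sorted(
        results,
        key=lambda r: (-r.mean_rubric_reward, r.mean_planning_penalty),
    )


# Placeholder numbers for illustration only, not results from the repo.
results = [
    PolicyResult("no_investigation", 0.61, 0.08),
    PolicyResult("investigate_when_context_hidden", 0.74, 0.05),
    PolicyResult("adaptive_cue_bandit", 0.74, 0.02),
]
best = rank_policies(results)[0]
print(best.name)
```

With these placeholder values the two investigating policies tie on reward, so the planning penalty decides the winner.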
 Example commands:
 - `search_eval_episodes.jsonl`
 - `search_eval_trajectories.jsonl`
+The default submit policy inside this runner stays deterministic and local. It reuses the repo's heuristic routing logic plus planning-aware routing overrides, so the search loop can study both investigation policy and queue-aware submission quality without depending on external LLM latency or API cost.
 ## Task Ladder
 - `requester`
 - `description`
 - optional `ambiguity_note`
+- optional `planning_note`
 - optional `related_ticket_id`
 - optional `related_ticket_preview`
+- optional `routing_options`
+- optional `capacity_state`
 Each observation also includes:
 - `last_reward_components`
 - `rubric_reward` on terminal observations
 - `metadata.last_feedback_summary` for compact reward / penalty feedback
+- `metadata.capacity_state` and `metadata.future_queue_demand` on hard-task episodes
+- `metadata.planning_penalty_total` and `metadata.planning_penalty_applied`
 - standard OpenEnv fields such as `done` and `reward`
 The internal `HelpdeskTicketState` tracks:
 - `total_reward`
 - `reward`
 - `done`
+- `team_capacity_remaining`
+- `high_priority_slots_remaining`
+- `escalation_slots_remaining`
+- `planning_penalty_total`
 ## Grading And Reward
 - `lookup_related_ticket`
 - `lookup_requester_history`
 - `lookup_internal_routing_note`
+- `lookup_queue_capacity_forecast`
 Hard-task investigation behavior:
 - some ambiguous and non-default-routing tickets start with both redacted titles and redacted descriptions
 - linked-ticket previews and internal routing notes stay hidden until the matching tool is used
+- capacity-sensitive tickets can expose queue pressure, future demand, and alternate routing options through `lookup_queue_capacity_forecast`
 - only useful investigation steps return a small positive shaping reward
 - blind or repeated probing does not pay by default
 - premature hard-task submission can incur a shaping penalty even when the visible text looks plausible
+- resource-greedy routing can add planning penalties later in the queue even when a single ticket looks correct in isolation
 - terminal `rubric_reward` remains the objective evaluation signal, while per-step `reward` is the denser training signal
 Per-field behavior:
 - `priority`: exact match or proximity credit
 - `assignment_group`: exact match, with a small declared partial-credit map for nearby ownership mistakes
 - `resolution_action`: exact match, with a small declared partial-credit map for nearby next-step mistakes
+- hard task only: some tickets also declare an alternate acceptable route with a reduced score multiplier, so the grader can reward capacity-aware fallback choices without collapsing into full fuzziness
 Task weights:
 | 2 | 60% | 40% | - | - |
 | 3 | 35% | 20% | 25% | 20% |
+Final episode rubric reward is queue-based:
 ```text
+clamp(average(per_ticket_scores) + trajectory bonuses - planning penalties - extra investigation penalties)
 ```
+Both `reward` and `rubric_reward` now use the closed interval `[0.0, 1.0]`.
 Step reward is lightly milestone-shaped: high per-ticket scores get a small bonus and very low scores get a small penalty before the final clamp.
+Final reward also includes a queue-economics penalty when the agent exceeds the free investigation budget. One investigation per queued ticket is free, but extra investigation steps reduce the final reward more noticeably than before. On hard-task queues, assignment-group capacity, high-priority slots, and escalation slots also create cross-ticket trade-offs.
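As a rough illustration of the queue-based formula above, the following sketch combines the four terms and clamps the result to `[0.0, 1.0]`. The function name, parameter names, and the empty-queue behavior are assumptions made for this example; the actual computation lives in the server code and may differ in detail.

```python
def final_rubric_reward(
    per_ticket_scores: list[float],
    trajectory_bonus: float = 0.0,
    planning_penalty: float = 0.0,
    extra_investigation_penalty: float = 0.0,
) -> float:
    # Assumption: an empty queue scores 0.0 (not specified in the README).
    if not per_ticket_scores:
        return 0.0
    average = sum(per_ticket_scores) / len(per_ticket_scores)
    raw = (
        average
        + trajectory_bonus
        - planning_penalty
        - extra_investigation_penalty
    )
    # Clamp to the closed interval [0.0, 1.0].
    return max(0.0, min(1.0, raw))
```

The clamp is applied once, after all bonuses and penalties, so a large planning penalty can pull an otherwise good queue down to `0.0` but never below it.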
 To make the environment more RL-friendly, each observation now also surfaces structured reward telemetry:
 - `last_reward_components` exposes ticket score, shaped step reward, milestone adjustment, trajectory reward when applicable, and any investigation penalty applied
 - `average_score_so_far` and `progress_fraction` expose trajectory progress without leaking future labels
+- hard-task telemetry includes planning penalties, capacity usage, and the post-action capacity snapshot
 - `history` retains the same reward components plus a compact `feedback_summary` string for downstream agents
 ## Grounded Scoring
 - `assignment_group` and `resolution_action` now expose only a small declared partial-credit map for nearby mistakes
 - `priority` only gets proximity credit from the declared table in `server/grader.py`
 - `issue_type` only gets partial credit for a small declared similarity map
+- hard-task alternate routes must be explicitly declared in the dataset and carry an explicit score multiplier
 - wrong labels outside those explicit maps score `0.0`
 That scoring policy is now backed by checked-in unit tests in `tests/test_grader_unit.py` and `tests/test_tasks_unit.py`.
 ## Dataset Snapshot
+The effective labeled dataset now contains 70 tickets spanning straightforward, ambiguous, and planning-sensitive helpdesk scenarios.
 It includes:
 - onboarding tickets
 - feature requests
 - follow-up cases linked through `related_ticket_id`
+- 16 tickets with explicit ambiguity notes
+- 7 linked follow-up cases
+- 22 tickets with declared alternate routes for queue-level planning
 ## Difficulty Coverage

inference.py CHANGED

@@ -195,6 +195,7 @@ def format_recent_history_entries(
 def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions: str) -> str:
     ambiguity_note = ticket.get("ambiguity_note")
     related_preview = ticket.get("related_ticket_preview") or {}
     last_tool_result = ticket.get("last_tool_result")
     context_status = ticket.get("context_status") or {}
@@ -204,9 +205,14 @@ def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions
     investigation_budget_remaining = ticket.get("investigation_budget_remaining")
     average_score_so_far = ticket.get("average_score_so_far")
     progress_fraction = ticket.get("progress_fraction")
     extra_context_lines: list[str] = []
     if ambiguity_note:
         extra_context_lines.append(f"Ambiguity note: {ambiguity_note}")
     if related_preview:
         extra_context_lines.extend(
             [
@@ -224,6 +230,18 @@ def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions
         extra_context_lines.append(
             "Context status: " + json.dumps(context_status, sort_keys=True)
         )
     if feedback_summary:
         extra_context_lines.append(f"Latest environment feedback: {feedback_summary}")
     if last_reward_components:
@@ -293,7 +311,7 @@ def _format_bool(value: bool) -> str:
 def clamp_reported_score(score: float) -> float:
-    return max(0.01, min(0.99, score))
 def _format_action_for_log(action: HelpdeskTicketAction) -> str:
@@ -553,14 +571,19 @@ TIME_SENSITIVE_PRIORITY_KEYWORDS = (
 def build_routing_text(ticket: dict) -> str:
     related_preview = ticket.get("related_ticket_preview") or {}
     last_tool_result = ticket.get("last_tool_result") or {}
     return " ".join(
         [
             ticket.get("title", ""),
             ticket.get("description", ""),
             ticket.get("ambiguity_note", ""),
             related_preview.get("title", ""),
             related_preview.get("description", ""),
             json.dumps(last_tool_result, sort_keys=True),
         ]
     ).lower()
@@ -630,6 +653,90 @@ def heuristic_action(
     return result
 def apply_domain_overrides(
     ticket: dict, candidate: dict[str, Any], allowed_fields: list[str]
 ) -> tuple[dict[str, Any], list[str]]:
@@ -697,9 +804,27 @@ def build_action(
     ticket: dict, allowed_fields: list[str], instructions: str
 ) -> tuple[HelpdeskTicketAction, str, str | None]:
     heuristic_dict = heuristic_action(ticket, allowed_fields)
     if llm_client is None:
-        return HelpdeskTicketAction(**heuristic_dict), "heuristic", None
     try:
         llm_dict = call_llm(ticket, allowed_fields, instructions)
@@ -731,9 +856,19 @@ def build_action(
             candidate,
             allowed_fields,
         )
         backfilled_fields = [field for field in allowed_fields if field not in accepted_fields]
-        if backfilled_fields or rejected_fields or override_reasons:
             reason_parts = []
             if backfilled_fields:
                 reason_parts.append(f"heuristic_backfill={backfilled_fields}")
@@ -741,6 +876,8 @@ def build_action(
                 reason_parts.append(f"invalid_llm_fields={rejected_fields}")
             if override_reasons:
                 reason_parts.append(f"domain_overrides={override_reasons}")
             return (
                 HelpdeskTicketAction(**candidate),
                 "llm_backfilled",
@@ -752,7 +889,23 @@ def build_action(
         return (
             HelpdeskTicketAction(**heuristic_dict),
             "heuristic_fallback",
-            str(exc),
         )
@@ -857,6 +1010,7 @@ def should_investigate(ticket: dict, history: list[dict[str, Any]]) -> tuple[boo
     if hidden_context_remaining:
         preferred_tools.extend(
             [
                 "lookup_related_ticket",
                 "lookup_internal_routing_note",
                 "lookup_requester_history",
@@ -892,6 +1046,14 @@ def merge_ticket_context(ticket: dict, observation: Any) -> dict:
     observation_metadata = getattr(observation, "metadata", {}) or {}
     if observation_metadata.get("last_feedback_summary"):
         merged_ticket["feedback_summary"] = observation_metadata["last_feedback_summary"]
     return merged_ticket
@@ -933,12 +1095,10 @@ def run() -> None:
                 if ticket is None:
                     break
-                investigate, tool_name = should_investigate(ticket, obs.history)
-                if (
-                    investigate
-                    and tool_name is not None
-                    and getattr(obs, "investigation_budget_remaining", 0) > 0
-                ):
                     tool_action = HelpdeskTicketAction(
                         action_type="investigate",
                         tool_name=tool_name,
@@ -947,10 +1107,13 @@ def run() -> None:
                     result = sync_client.step(tool_action)
                     obs = result.observation
                     step_num += 1
                     log_step(
                         step=step_num,
                         action=tool_action,
-                        reward=float(result.reward or 0.0),
                         done=bool(result.done),
                         error=None,
                     )
@@ -959,6 +1122,11 @@ def run() -> None:
                     ticket = obs.current_ticket
                     if ticket is None:
                         break
                 ticket_with_context = merge_ticket_context(ticket, obs)
                 action, action_source, fallback_reason = build_action(

 def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions: str) -> str:
     ambiguity_note = ticket.get("ambiguity_note")
+    planning_note = ticket.get("planning_note")
     related_preview = ticket.get("related_ticket_preview") or {}
     last_tool_result = ticket.get("last_tool_result")
     context_status = ticket.get("context_status") or {}
     investigation_budget_remaining = ticket.get("investigation_budget_remaining")
     average_score_so_far = ticket.get("average_score_so_far")
     progress_fraction = ticket.get("progress_fraction")
+    capacity_state = ticket.get("capacity_state")
+    future_queue_demand = ticket.get("future_queue_demand")
+    routing_options = ticket.get("routing_options") or []
     extra_context_lines: list[str] = []
     if ambiguity_note:
         extra_context_lines.append(f"Ambiguity note: {ambiguity_note}")
+    if planning_note:
+        extra_context_lines.append(f"Planning note: {planning_note}")
     if related_preview:
         extra_context_lines.extend(
             [
         extra_context_lines.append(
             "Context status: " + json.dumps(context_status, sort_keys=True)
         )
+    if capacity_state:
+        extra_context_lines.append(
+            "Queue capacity state: " + json.dumps(capacity_state, sort_keys=True)
+        )
+    if future_queue_demand:
+        extra_context_lines.append(
+            "Future queue demand: " + json.dumps(future_queue_demand, sort_keys=True)
+        )
+    if routing_options:
+        extra_context_lines.append(
+            "Routing options: " + json.dumps(routing_options, sort_keys=True)
+        )
     if feedback_summary:
         extra_context_lines.append(f"Latest environment feedback: {feedback_summary}")
     if last_reward_components:
 def clamp_reported_score(score: float) -> float:
+    return max(0.0, min(1.0, score))
 def _format_action_for_log(action: HelpdeskTicketAction) -> str:
 def build_routing_text(ticket: dict) -> str:
     related_preview = ticket.get("related_ticket_preview") or {}
     last_tool_result = ticket.get("last_tool_result") or {}
+    routing_options = ticket.get("routing_options") or []
     return " ".join(
         [
             ticket.get("title", ""),
             ticket.get("description", ""),
             ticket.get("ambiguity_note", ""),
+            ticket.get("planning_note", ""),
             related_preview.get("title", ""),
             related_preview.get("description", ""),
             json.dumps(last_tool_result, sort_keys=True),
+            json.dumps(routing_options, sort_keys=True),
+            json.dumps(ticket.get("capacity_state") or {}, sort_keys=True),
+            json.dumps(ticket.get("future_queue_demand") or {}, sort_keys=True),
         ]
     ).lower()
     return result
+def _get_routing_options(ticket: dict[str, Any]) -> list[dict[str, Any]]:
+    options = ticket.get("routing_options") or []
+    return [option for option in options if isinstance(option, dict)]
+def _get_routing_option_by_label(
+    ticket: dict[str, Any],
+    label: str | None,
+) -> dict[str, Any] | None:
+    if label is None:
+        return None
+    for option in _get_routing_options(ticket):
+        if option.get("label") == label:
+            return option
+    return None
+def _route_option_fields_match(
+    option: dict[str, Any],
+    candidate: dict[str, Any],
+    allowed_fields: list[str],
+) -> bool:
+    for field in ("issue_type", "priority", "assignment_group", "resolution_action"):
+        if field not in allowed_fields:
+            continue
+        option_value = option.get(field)
+        candidate_value = candidate.get(field)
+        if option_value is None or candidate_value is None:
+            continue
+        if str(option_value) != str(candidate_value):
+            return False
+    return True
+def _preferred_routing_label(ticket: dict[str, Any]) -> str | None:
+    last_tool_result = ticket.get("last_tool_result") or {}
+    tool_name = str(last_tool_result.get("tool_name", "") or "")
+    preferred_label = str(last_tool_result.get("preferred_route_label", "") or "")
+    if tool_name == "lookup_queue_capacity_forecast" and preferred_label in {
+        "primary",
+        "alternate",
+    }:
+        return preferred_label
+    return None
+def apply_capacity_planning_overrides(
+    ticket: dict[str, Any],
+    candidate: dict[str, Any],
+    allowed_fields: list[str],
+) -> tuple[dict[str, Any], list[str]]:
+    updated = dict(candidate)
+    reasons: list[str] = []
+    preferred_label = _preferred_routing_label(ticket)
+    preferred_option = _get_routing_option_by_label(ticket, preferred_label)
+    if preferred_option is None:
+        return updated, reasons
+    current_matching_label = None
+    for option in _get_routing_options(ticket):
+        if _route_option_fields_match(option, updated, allowed_fields):
+            current_matching_label = option.get("label")
+            break
+    if current_matching_label == preferred_label:
+        return updated, reasons
+    for field in ("issue_type", "priority", "assignment_group", "resolution_action"):
+        if field not in allowed_fields:
+            continue
+        option_value = preferred_option.get(field)
+        if option_value is None:
+            continue
+        updated[field] = option_value
+    last_tool_result = ticket.get("last_tool_result") or {}
+    reasons.append(
+        "planning_override="
+        f"{preferred_label}(primary_pressure={last_tool_result.get('primary_pressure')},"
+        f"alternate_pressure={last_tool_result.get('alternate_pressure')})"
+    )
+    return updated, reasons
 def apply_domain_overrides(
     ticket: dict, candidate: dict[str, Any], allowed_fields: list[str]
 ) -> tuple[dict[str, Any], list[str]]:
     ticket: dict, allowed_fields: list[str], instructions: str
 ) -> tuple[HelpdeskTicketAction, str, str | None]:
     heuristic_dict = heuristic_action(ticket, allowed_fields)
+    heuristic_dict, heuristic_override_reasons = apply_domain_overrides(
+        ticket,
+        heuristic_dict,
+        allowed_fields,
+    )
+    heuristic_dict, heuristic_planning_reasons = apply_capacity_planning_overrides(
+        ticket,
+        heuristic_dict,
+        allowed_fields,
+    )
     if llm_client is None:
+        fallback_reason = None
+        reason_parts = []
+        if heuristic_override_reasons:
+            reason_parts.append(f"domain_overrides={heuristic_override_reasons}")
+        if heuristic_planning_reasons:
+            reason_parts.append(f"planning_overrides={heuristic_planning_reasons}")
+        if reason_parts:
+            fallback_reason = "; ".join(reason_parts)
+        return HelpdeskTicketAction(**heuristic_dict), "heuristic", fallback_reason
     try:
         llm_dict = call_llm(ticket, allowed_fields, instructions)
             candidate,
             allowed_fields,
         )
+        candidate, planning_override_reasons = apply_capacity_planning_overrides(
+            ticket,
+            candidate,
+            allowed_fields,
+        )
         backfilled_fields = [field for field in allowed_fields if field not in accepted_fields]
+        if (
+            backfilled_fields
+            or rejected_fields
+            or override_reasons
+            or planning_override_reasons
+        ):
             reason_parts = []
             if backfilled_fields:
                 reason_parts.append(f"heuristic_backfill={backfilled_fields}")
                 reason_parts.append(f"invalid_llm_fields={rejected_fields}")
             if override_reasons:
                 reason_parts.append(f"domain_overrides={override_reasons}")
+            if planning_override_reasons:
+                reason_parts.append(f"planning_overrides={planning_override_reasons}")
             return (
                 HelpdeskTicketAction(**candidate),
                 "llm_backfilled",
         return (
             HelpdeskTicketAction(**heuristic_dict),
             "heuristic_fallback",
+            "; ".join(
+                part
+                for part in (
+                    str(exc),
+                    (
+                        f"domain_overrides={heuristic_override_reasons}"
+                        if heuristic_override_reasons
+                        else None
+                    ),
+                    (
+                        f"planning_overrides={heuristic_planning_reasons}"
+                        if heuristic_planning_reasons
+                        else None
+                    ),
+                )
+                if part
+            ),
         )
     if hidden_context_remaining:
         preferred_tools.extend(
             [
+                "lookup_queue_capacity_forecast",
                 "lookup_related_ticket",
                 "lookup_internal_routing_note",
                 "lookup_requester_history",
     observation_metadata = getattr(observation, "metadata", {}) or {}
     if observation_metadata.get("last_feedback_summary"):
         merged_ticket["feedback_summary"] = observation_metadata["last_feedback_summary"]
+    if observation_metadata.get("capacity_state") is not None:
+        merged_ticket["capacity_state"] = observation_metadata["capacity_state"]
+    if observation_metadata.get("future_queue_demand") is not None:
+        merged_ticket["future_queue_demand"] = observation_metadata["future_queue_demand"]
+    if observation_metadata.get("planning_penalty_total") is not None:
+        merged_ticket["planning_penalty_total"] = observation_metadata["planning_penalty_total"]
+    if observation_metadata.get("planning_penalty_applied") is not None:
+        merged_ticket["planning_penalty_applied"] = observation_metadata["planning_penalty_applied"]
     return merged_ticket
                 if ticket is None:
                     break
+                while getattr(obs, "investigation_budget_remaining", 0) > 0:
+                    investigate, tool_name = should_investigate(ticket, obs.history)
+                    if not investigate or tool_name is None:
+                        break
                     tool_action = HelpdeskTicketAction(
                         action_type="investigate",
                         tool_name=tool_name,
                     result = sync_client.step(tool_action)
                     obs = result.observation
                     step_num += 1
+                    reward = float(result.reward or 0.0)
+                    if result.reward is not None:
+                        task_step_rewards.append(reward)
                     log_step(
                         step=step_num,
                         action=tool_action,
+                        reward=reward,
                         done=bool(result.done),
                         error=None,
                     )
                     ticket = obs.current_ticket
                     if ticket is None:
                         break
+                if result.done:
+                    break
+                ticket = obs.current_ticket
+                if ticket is None:
+                    break
                 ticket_with_context = merge_ticket_context(ticket, obs)
                 action, action_source, fallback_reason = build_action(
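The capacity-planning override added in this file can be hard to follow in diff form, so here is a condensed, standalone restatement. It is a simplified sketch, not the committed code: it omits the early return taken when the candidate already matches the preferred route, and it drops the reason strings. The helper names `pick_preferred_option` and `apply_override`, and the `team_a` / `team_b` / `example_issue` values, are hypothetical.

```python
from typing import Any, Optional

ROUTE_FIELDS = ("issue_type", "priority", "assignment_group", "resolution_action")


def pick_preferred_option(ticket: dict[str, Any]) -> Optional[dict[str, Any]]:
    # Only a capacity-forecast tool result may nominate a route.
    tool_result = ticket.get("last_tool_result") or {}
    if tool_result.get("tool_name") != "lookup_queue_capacity_forecast":
        return None
    label = tool_result.get("preferred_route_label")
    if label not in {"primary", "alternate"}:
        return None
    for option in ticket.get("routing_options") or []:
        if isinstance(option, dict) and option.get("label") == label:
            return option
    return None


def apply_override(
    ticket: dict[str, Any],
    candidate: dict[str, Any],
    allowed_fields: list[str],
) -> dict[str, Any]:
    updated = dict(candidate)
    option = pick_preferred_option(ticket)
    if option is None:
        return updated
    # Copy only the allowed fields that the preferred route declares.
    for field in ROUTE_FIELDS:
        if field in allowed_fields and option.get(field) is not None:
            updated[field] = option[field]
    return updated


# Hypothetical ticket: the forecast tool prefers the alternate route.
ticket = {
    "routing_options": [
        {"label": "primary", "assignment_group": "team_a"},
        {"label": "alternate", "assignment_group": "team_b"},
    ],
    "last_tool_result": {
        "tool_name": "lookup_queue_capacity_forecast",
        "preferred_route_label": "alternate",
    },
}
candidate = {"issue_type": "example_issue", "assignment_group": "team_a"}
routed = apply_override(ticket, candidate, ["issue_type", "assignment_group"])
print(routed)
```

Because the override only fires after `lookup_queue_capacity_forecast` has been used, a policy that never calls that tool keeps its original routing.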

models.py CHANGED

@@ -19,6 +19,7 @@ RESOLUTION_ACTION_SET = set(RESOLUTION_ACTIONS)
 ACTION_TYPE_SET = {"submit", "investigate"}
 TOOL_NAME_SET = {"lookup_related_ticket", "lookup_requester_history"}
 TOOL_NAME_SET.add("lookup_internal_routing_note")
 def _validate_choice(value: str, allowed: set[str], field_name: str) -> str:
@@ -47,6 +48,12 @@ class HelpdeskTicketRecord(BaseModel):
     resolution_action: str
     ambiguity_note: Optional[str] = None
     related_ticket_id: Optional[str] = None
     @field_validator("issue_type")
     @classmethod
@@ -68,6 +75,44 @@ class HelpdeskTicketRecord(BaseModel):
     def validate_resolution_action(cls, value: str) -> str:
         return _validate_choice(value, RESOLUTION_ACTION_SET, "resolution_action")
 class HelpdeskTicketAction(Action):
     action_type: str = "submit"
@@ -146,7 +191,16 @@ class HelpdeskTicketState(State):
     investigation_steps: int = 0
     investigation_budget_remaining: int = 0
     investigation_penalty_applied: float = 0.0
     last_tool_result: Optional[dict[str, Any]] = None
     last_reward_components: dict[str, Any] = Field(default_factory=dict)
     ticket_tool_usage: dict[str, list[str]] = Field(default_factory=dict)
     history_entries: list[dict] = Field(default_factory=list)

 ACTION_TYPE_SET = {"submit", "investigate"}
 TOOL_NAME_SET = {"lookup_related_ticket", "lookup_requester_history"}
 TOOL_NAME_SET.add("lookup_internal_routing_note")
+TOOL_NAME_SET.add("lookup_queue_capacity_forecast")
 def _validate_choice(value: str, allowed: set[str], field_name: str) -> str:
     resolution_action: str
     ambiguity_note: Optional[str] = None
     related_ticket_id: Optional[str] = None
+    planning_note: Optional[str] = None
+    alternate_issue_type: Optional[str] = None
+    alternate_priority: Optional[str] = None
+    alternate_assignment_group: Optional[str] = None
+    alternate_resolution_action: Optional[str] = None
+    alternate_route_score_multiplier: float = 0.0
     @field_validator("issue_type")
     @classmethod
     def validate_resolution_action(cls, value: str) -> str:
         return _validate_choice(value, RESOLUTION_ACTION_SET, "resolution_action")
+    @field_validator("alternate_issue_type")
+    @classmethod
+    def validate_alternate_issue_type(cls, value: Optional[str]) -> Optional[str]:
+        return _validate_optional_choice(value, ISSUE_TYPE_SET, "alternate_issue_type")
+    @field_validator("alternate_priority")
+    @classmethod
+    def validate_alternate_priority(cls, value: Optional[str]) -> Optional[str]:
+        return _validate_optional_choice(value, PRIORITY_SET, "alternate_priority")
+    @field_validator("alternate_assignment_group")
+    @classmethod
+    def validate_alternate_assignment_group(cls, value: Optional[str]) -> Optional[str]:
+        return _validate_optional_choice(
+            value,
+            ASSIGNMENT_GROUP_SET,
+            "alternate_assignment_group",
+        )
+    @field_validator("alternate_resolution_action")
+    @classmethod
+    def validate_alternate_resolution_action(
+        cls,
+        value: Optional[str],
+    ) -> Optional[str]:
+        return _validate_optional_choice(
+            value,
+            RESOLUTION_ACTION_SET,
+            "alternate_resolution_action",
+        )
+    @field_validator("alternate_route_score_multiplier")
+    @classmethod
+    def validate_alternate_route_score_multiplier(cls, value: float) -> float:
+        if not 0.0 <= value <= 1.0:
+            raise ValueError("alternate_route_score_multiplier must be in [0.0, 1.0]")
+        return value
 class HelpdeskTicketAction(Action):
     action_type: str = "submit"
     investigation_steps: int = 0
     investigation_budget_remaining: int = 0
     investigation_penalty_applied: float = 0.0
+    planning_penalty_applied: float = 0.0
     last_tool_result: Optional[dict[str, Any]] = None
     last_reward_components: dict[str, Any] = Field(default_factory=dict)
     ticket_tool_usage: dict[str, list[str]] = Field(default_factory=dict)
+    team_capacity_initial: dict[str, int] = Field(default_factory=dict)
+    team_capacity_remaining: dict[str, int] = Field(default_factory=dict)
+    high_priority_slots_initial: int = 0
+    high_priority_slots_remaining: int = 0
+    escalation_slots_initial: int = 0
+    escalation_slots_remaining: int = 0
+    planning_penalty_total: float = 0.0
+    capacity_pressure_tickets_resolved: int = 0
     history_entries: list[dict] = Field(default_factory=list)

policy_learning.py CHANGED

@@ -88,6 +88,7 @@ AVAILABLE_TOOLS = (
     "lookup_related_ticket",
     "lookup_requester_history",
     "lookup_internal_routing_note",
 )
@@ -229,6 +230,11 @@ def default_submit_builder(
     inference = importlib.import_module("inference")
     candidate = inference.heuristic_action(ticket, allowed_fields)
     candidate, _ = inference.apply_domain_overrides(ticket, candidate, allowed_fields)
     return HelpdeskTicketAction(**candidate)
@@ -237,7 +243,11 @@ def _routing_text(ticket: dict[str, Any]) -> str:
         str(ticket.get("title", "")),
         str(ticket.get("description", "")),
         str(ticket.get("ambiguity_note", "")),
         json.dumps(ticket.get("last_tool_result") or {}, sort_keys=True),
     ]
     related_preview = ticket.get("related_ticket_preview") or {}
     parts.extend(
@@ -251,6 +261,24 @@ def _routing_text(ticket: dict[str, Any]) -> str:
 def infer_ticket_cue(ticket: dict[str, Any]) -> str:
     text = _routing_text(ticket)
     if any(
         phrase in text
         for phrase in ("re:", "follow-up", "following up", "regression", "reference ticket", "third update")
@@ -297,14 +325,20 @@ def preferred_tool_order(
     hidden_context_remaining: bool,
 ) -> list[str]:
     text = _routing_text(ticket)
     last_tool_result = ticket.get("last_tool_result") or {}
     last_tool_name = str(last_tool_result.get("tool_name", "") or "")
     preferred_tools: list[str] = []
     if last_tool_name == "lookup_related_ticket":
         preferred_tools.append("lookup_requester_history")
     if last_tool_name == "lookup_requester_history":
         preferred_tools.append("lookup_internal_routing_note")
     if any(
         phrase in text
@@ -336,9 +370,15 @@ def preferred_tool_order(
     ):
         preferred_tools.append("lookup_requester_history")
     if hidden_context_remaining:
         preferred_tools.extend(
             [
                 "lookup_internal_routing_note",
                 "lookup_related_ticket",
                 "lookup_requester_history",
@@ -545,6 +585,8 @@ def rollout_episode(
         "terminal_reward": terminal_reward,
         "terminal_rubric_reward": terminal_rubric_reward,
         "average_ticket_score": env.state.average_score_so_far,
         "per_ticket_scores": list(env.state.per_ticket_scores),
     }
     if adaptive_bandit is not None and policy.strategy == "adaptive":
@@ -583,6 +625,15 @@ def summarize_policy_episodes(
             "avg_terminal_rubric_reward": _safe_mean(
                 [float(episode["terminal_rubric_reward"]) for episode in task_episodes]
             ),
             "avg_investigation_steps": _safe_mean(
                 [float(episode["investigation_steps"]) for episode in task_episodes]
             ),
@@ -604,6 +655,15 @@ def summarize_policy_episodes(
         "avg_terminal_rubric_reward": _safe_mean(
             [float(episode["terminal_rubric_reward"]) for episode in episode_summaries]
         ),
         "avg_investigation_steps": _safe_mean(
             [float(episode["investigation_steps"]) for episode in episode_summaries]
         ),
@@ -653,11 +713,12 @@ def evaluate_policy(
     return result
-def _selection_tuple(summary: dict[str, Any]) -> tuple[float, float, float, float]:
     return (
-        float(summary["avg_normalized_return"]),
-        float(summary["avg_terminal_reward"]),
         float(summary["avg_terminal_rubric_reward"]),
         -float(summary["avg_investigation_steps"]),
     )
@@ -713,7 +774,7 @@ def compare_policies(
         "mode": "compare",
         "task_ids": task_ids,
         "seeds": seeds,
-        "selection_metric": "avg_normalized_return",
         "baseline_policy": baseline_run["policy"],
         "best_policy": best_run["policy"],
         "improvement_vs_baseline": {
@@ -731,6 +792,11 @@ def compare_policies(
                 baseline_run["summary"],
                 "avg_terminal_rubric_reward",
             ),
         },
         "policy_summaries": [run["summary"] for run in policy_runs],
         "ranking": [
@@ -825,7 +891,7 @@ def search_policies(
         "task_ids": task_ids,
         "train_seeds": train_seeds,
         "eval_seeds": eval_seeds,
-        "selection_metric": "avg_normalized_return",
         "candidate_policies": [policy.name for policy in candidate_policies],
         "selected_policy": selected_policy.name,
         "baseline_policy": baseline_policy.name,
@@ -856,6 +922,11 @@ def search_policies(
                 eval_baseline["summary"],
                 "avg_terminal_rubric_reward",
             ),
         },
         "artifacts": {
             "summary": str(output_dir / "search_summary.json"),
@@ -975,6 +1046,7 @@ def _print_summary(label: str, summary: dict[str, Any]) -> None:
                     "avg_normalized_return": summary["avg_normalized_return"],
                     "avg_terminal_reward": summary["avg_terminal_reward"],
                     "avg_terminal_rubric_reward": summary["avg_terminal_rubric_reward"],
                     "avg_investigation_steps": summary["avg_investigation_steps"],
                 }
             },

     "lookup_related_ticket",
     "lookup_requester_history",
     "lookup_internal_routing_note",
+    "lookup_queue_capacity_forecast",
 )
     inference = importlib.import_module("inference")
     candidate = inference.heuristic_action(ticket, allowed_fields)
     candidate, _ = inference.apply_domain_overrides(ticket, candidate, allowed_fields)
+    candidate, _ = inference.apply_capacity_planning_overrides(
+        ticket,
+        candidate,
+        allowed_fields,
+    )
     return HelpdeskTicketAction(**candidate)
         str(ticket.get("title", "")),
         str(ticket.get("description", "")),
         str(ticket.get("ambiguity_note", "")),
+        str(ticket.get("planning_note", "")),
         json.dumps(ticket.get("last_tool_result") or {}, sort_keys=True),
+        json.dumps(ticket.get("routing_options") or [], sort_keys=True),
+        json.dumps(ticket.get("capacity_state") or {}, sort_keys=True),
+        json.dumps(ticket.get("future_queue_demand") or {}, sort_keys=True),
     ]
     related_preview = ticket.get("related_ticket_preview") or {}
     parts.extend(
 def infer_ticket_cue(ticket: dict[str, Any]) -> str:
     text = _routing_text(ticket)
+    context_status = ticket.get("context_status") or {}
+    if (
+        ticket.get("planning_note")
+        or ticket.get("routing_options")
+        or "lookup_queue_capacity_forecast"
+        in (context_status.get("recommended_tools") or [])
+        or any(
+            phrase in text
+            for phrase in (
+                "capacity",
+                "saturated",
+                "backlog",
+                "resource pressure",
+                "alternate route",
+            )
+        )
+    ):
+        return "capacity_planning"
     if any(
         phrase in text
         for phrase in ("re:", "follow-up", "following up", "regression", "reference ticket", "third update")
     hidden_context_remaining: bool,
 ) -> list[str]:
     text = _routing_text(ticket)
+    context_status = ticket.get("context_status") or {}
     last_tool_result = ticket.get("last_tool_result") or {}
     last_tool_name = str(last_tool_result.get("tool_name", "") or "")
+    recommended_tools = list(context_status.get("recommended_tools") or [])
     preferred_tools: list[str] = []
+    if "lookup_queue_capacity_forecast" in recommended_tools:
+        preferred_tools.append("lookup_queue_capacity_forecast")
     if last_tool_name == "lookup_related_ticket":
         preferred_tools.append("lookup_requester_history")
     if last_tool_name == "lookup_requester_history":
         preferred_tools.append("lookup_internal_routing_note")
+    if last_tool_name == "lookup_internal_routing_note":
+        preferred_tools.append("lookup_queue_capacity_forecast")
     if any(
         phrase in text
     ):
         preferred_tools.append("lookup_requester_history")
+    if infer_ticket_cue(ticket) == "capacity_planning":
+        preferred_tools.append("lookup_queue_capacity_forecast")
+    preferred_tools.extend(recommended_tools)
     if hidden_context_remaining:
         preferred_tools.extend(
             [
+                "lookup_queue_capacity_forecast",
                 "lookup_internal_routing_note",
                 "lookup_related_ticket",
                 "lookup_requester_history",
         "terminal_reward": terminal_reward,
         "terminal_rubric_reward": terminal_rubric_reward,
         "average_ticket_score": env.state.average_score_so_far,
+        "planning_penalty_total": env.state.planning_penalty_total,
+        "capacity_pressure_tickets_resolved": env.state.capacity_pressure_tickets_resolved,
         "per_ticket_scores": list(env.state.per_ticket_scores),
     }
     if adaptive_bandit is not None and policy.strategy == "adaptive":
             "avg_terminal_rubric_reward": _safe_mean(
                 [float(episode["terminal_rubric_reward"]) for episode in task_episodes]
             ),
+            "avg_planning_penalty_total": _safe_mean(
+                [float(episode["planning_penalty_total"]) for episode in task_episodes]
+            ),
+            "avg_capacity_pressure_tickets_resolved": _safe_mean(
+                [
+                    float(episode["capacity_pressure_tickets_resolved"])
+                    for episode in task_episodes
+                ]
+            ),
             "avg_investigation_steps": _safe_mean(
                 [float(episode["investigation_steps"]) for episode in task_episodes]
             ),
         "avg_terminal_rubric_reward": _safe_mean(
             [float(episode["terminal_rubric_reward"]) for episode in episode_summaries]
         ),
+        "avg_planning_penalty_total": _safe_mean(
+            [float(episode["planning_penalty_total"]) for episode in episode_summaries]
+        ),
+        "avg_capacity_pressure_tickets_resolved": _safe_mean(
+            [
+                float(episode["capacity_pressure_tickets_resolved"])
+                for episode in episode_summaries
+            ]
+        ),
         "avg_investigation_steps": _safe_mean(
             [float(episode["investigation_steps"]) for episode in episode_summaries]
         ),
     return result
+def _selection_tuple(summary: dict[str, Any]) -> tuple[float, float, float, float, float]:
     return (
         float(summary["avg_terminal_rubric_reward"]),
+        -float(summary["avg_planning_penalty_total"]),
+        float(summary["avg_episode_return"]),
+        float(summary["avg_normalized_return"]),
         -float(summary["avg_investigation_steps"]),
     )
         "mode": "compare",
         "task_ids": task_ids,
         "seeds": seeds,
+        "selection_metric": "avg_terminal_rubric_reward_then_lower_planning_penalty",
         "baseline_policy": baseline_run["policy"],
         "best_policy": best_run["policy"],
         "improvement_vs_baseline": {
                 baseline_run["summary"],
                 "avg_terminal_rubric_reward",
             ),
+            "avg_planning_penalty_total": _delta(
+                best_run["summary"],
+                baseline_run["summary"],
+                "avg_planning_penalty_total",
+            ),
         },
         "policy_summaries": [run["summary"] for run in policy_runs],
         "ranking": [
         "task_ids": task_ids,
         "train_seeds": train_seeds,
         "eval_seeds": eval_seeds,
+        "selection_metric": "avg_terminal_rubric_reward_then_lower_planning_penalty",
         "candidate_policies": [policy.name for policy in candidate_policies],
         "selected_policy": selected_policy.name,
         "baseline_policy": baseline_policy.name,
                 eval_baseline["summary"],
                 "avg_terminal_rubric_reward",
             ),
+            "avg_planning_penalty_total": _delta(
+                eval_selected["summary"],
+                eval_baseline["summary"],
+                "avg_planning_penalty_total",
+            ),
         },
         "artifacts": {
             "summary": str(output_dir / "search_summary.json"),
                     "avg_normalized_return": summary["avg_normalized_return"],
                     "avg_terminal_reward": summary["avg_terminal_reward"],
                     "avg_terminal_rubric_reward": summary["avg_terminal_rubric_reward"],
+                    "avg_planning_penalty_total": summary["avg_planning_penalty_total"],
                     "avg_investigation_steps": summary["avg_investigation_steps"],
                 }
             },

server/environment.py CHANGED

@@ -31,6 +31,7 @@ AVAILABLE_TOOLS = (
     "lookup_related_ticket",
     "lookup_requester_history",
     "lookup_internal_routing_note",
 )
 FREE_INVESTIGATIONS_PER_TICKET = 1
 EXTRA_INVESTIGATION_COST = 0.04
@@ -44,6 +45,10 @@ PRIORITY_UNDERSHOOT_PENALTY = 0.03
 SEVERE_PRIORITY_UNDERSHOOT_PENALTY = 0.07
 DANGEROUS_RESOLUTION_PENALTY = 0.05
 NONDEFAULT_ROUTING_FOLLOWTHROUGH_BONUS = 0.02
 TASK3_INVESTIGATION_TOOL_PLAN: dict[str, tuple[str, ...]] = {
     "ticket-021": ("lookup_related_ticket", "lookup_requester_history"),
@@ -161,6 +166,11 @@ class HelpdeskTicketRoutingEnvironment(
         else:
             queue_size = min(queue_size_value, len(self._dataset))
         self._queue = self._rng.sample(self._dataset, min(queue_size, len(self._dataset)))
         self._state = HelpdeskTicketState(
             episode_id=episode_id or str(uuid.uuid4()),
@@ -174,8 +184,17 @@ class HelpdeskTicketRoutingEnvironment(
             average_score_so_far=0.0,
             investigation_budget_remaining=queue_size * FREE_INVESTIGATIONS_PER_TICKET,
             investigation_penalty_applied=0.0,
             last_reward_components={},
             ticket_tool_usage={},
         )
         return self._build_observation(task)
@@ -298,6 +317,10 @@ class HelpdeskTicketRoutingEnvironment(
             action,
             task_id=task_id,
         )
         step_adjustments = compute_step_adjustments(
             score,
             previous_average=previous_average,
@@ -321,11 +344,17 @@ class HelpdeskTicketRoutingEnvironment(
                 self._state.per_ticket_scores,
                 len(self._queue),
                 self._state.step_count,
-                completion_bonus=self._trajectory_consistency_bonus(),
             )
             trajectory_reward = trajectory_components["final_reward"]
-            rubric_reward = self._apply_episode_economics(trajectory_reward)
-            final_reward = clamp_open_unit_interval(rubric_reward - context_penalty)
             self._state.total_reward = rubric_reward
             investigation_penalty = self._compute_episode_penalty()
         else:
@@ -333,7 +362,9 @@ class HelpdeskTicketRoutingEnvironment(
             self._state.average_score_so_far = self._current_average_score()
             self._state.step_count += 1
             self._state.current_ticket_index += 1
-            final_reward = clamp_open_unit_interval(step_reward - context_penalty)
         reward_components = self._build_reward_components(
             ticket_score=score,
@@ -348,12 +379,18 @@ class HelpdeskTicketRoutingEnvironment(
                 "context_gap_penalty": context_penalty,
                 "context_completion_bonus": process_bonus,
                 "risk_penalty": risk_penalty,
                 "delta_adjustment": step_adjustments["delta_adjustment"],
                 "required_investigation_count": len(self._required_tools_for_ticket(current_ticket)),
                 "hidden_context_remaining_count": missing_required_count,
                 "hidden_context_revealed_count": len(
                     self._used_tools_for_ticket(current_ticket.ticket_id)
                 ),
                 "rubric_reward": rubric_reward,
                 "trajectory_average_reward": (
                     trajectory_components["average_reward"]
@@ -372,6 +409,7 @@ class HelpdeskTicketRoutingEnvironment(
                 ),
             },
         )
         history_entry = self._build_history_entry(
             current_ticket,
@@ -390,6 +428,7 @@ class HelpdeskTicketRoutingEnvironment(
         self._state.reward = final_reward
         self._state.done = is_done
         self._state.investigation_penalty_applied = self._compute_episode_penalty()
         self._state.last_tool_result = None
         self._state.last_reward_components = reward_components
@@ -425,14 +464,373 @@ class HelpdeskTicketRoutingEnvironment(
             return 0.0
         return sum(self._state.per_ticket_scores) / len(self._state.per_ticket_scores)
     def _internal_routing_note_for_ticket(
         self,
         ticket: HelpdeskTicketRecord,
     ) -> str | None:
-        if ticket.ambiguity_note is not None:
-            return ticket.ambiguity_note
         if self._state.current_task_id != 3:
-            return None
         default_group = ISSUE_TYPE_TO_ASSIGNMENT_GROUP.get(
             ticket.issue_type,
@@ -442,7 +840,6 @@ class HelpdeskTicketRoutingEnvironment(
             ticket.issue_type,
             ticket.resolution_action,
         )
-        note_parts: list[str] = []
         if ticket.assignment_group != default_group:
             note_parts.append(
@@ -517,6 +914,11 @@ class HelpdeskTicketRoutingEnvironment(
             return self._ticket_repeated_requester_count(ticket) >= 2
         if tool_name == "lookup_internal_routing_note":
             return self._internal_routing_note_for_ticket(ticket) is not None
         return False
     def _required_tools_for_ticket(
@@ -546,6 +948,11 @@ class HelpdeskTicketRoutingEnvironment(
             and "lookup_requester_history" not in required_tools
         ):
             required_tools.append("lookup_requester_history")
         filtered_required_tools: list[str] = []
         for tool_name in required_tools:
             if tool_name in filtered_required_tools:
@@ -596,6 +1003,11 @@ class HelpdeskTicketRoutingEnvironment(
                 "The visible request is not enough to choose the final owner and next step. "
                 "Additional routing context is available via investigation."
             )
         if self._ticket_has_nondefault_routing(ticket):
             return (
                 "The visible request looks straightforward, but the decisive routing detail is hidden until investigation."
@@ -609,6 +1021,8 @@ class HelpdeskTicketRoutingEnvironment(
             return "Follow-up request with hidden routing context"
         if self._internal_routing_note_for_ticket(ticket) is not None:
             return "Routing clarification required"
         if self._ticket_mentions_follow_up(ticket):
             return "Priority support follow-up"
         return "Helpdesk routing decision"
@@ -805,6 +1219,24 @@ class HelpdeskTicketRoutingEnvironment(
             "routing_note": routing_note if found else "",
         }
     def _run_investigation_tool(
         self,
         current_ticket: HelpdeskTicketRecord,
@@ -817,6 +1249,8 @@ class HelpdeskTicketRoutingEnvironment(
             return self._lookup_requester_history(current_ticket)
         if tool_name == "lookup_internal_routing_note":
             return self._lookup_internal_routing_note(current_ticket)
         raise ValueError(f"Unsupported tool_name: {tool_name}")
     def _handle_investigation_action(
@@ -901,12 +1335,15 @@ class HelpdeskTicketRoutingEnvironment(
     def _build_ticket_view(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
         progress = self._tool_progress_for_ticket(ticket)
         remaining_tools = progress["remaining_tools"]
         ticket_view: dict[str, Any] = {
             "ticket_id": ticket.ticket_id,
             "title": self._visible_title(ticket),
             "requester": ticket.requester,
             "description": self._visible_description(ticket),
         }
         if progress["required_tools"]:
             ticket_view["context_status"] = {
                 "investigation_required": True,
@@ -919,6 +1356,11 @@ class HelpdeskTicketRoutingEnvironment(
             }
         if ticket.ambiguity_note is not None and "lookup_internal_routing_note" not in remaining_tools:
             ticket_view["ambiguity_note"] = ticket.ambiguity_note
         if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
             ticket_view["related_ticket_id"] = ticket.related_ticket_id
             related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
@@ -929,6 +1371,11 @@ class HelpdeskTicketRoutingEnvironment(
                     "requester": related_ticket.requester,
                     "description": related_ticket.description,
                 }
         return ticket_view
     def _build_feedback_summary(
@@ -982,6 +1429,12 @@ class HelpdeskTicketRoutingEnvironment(
             risk_penalty = reward_components.get("risk_penalty")
             if risk_penalty:
                 parts.append(f"risk_penalty={risk_penalty:.2f}")
         return "; ".join(parts)
@@ -1011,6 +1464,8 @@ class HelpdeskTicketRoutingEnvironment(
             "breakdown": breakdown,
             "queue_position": queue_position,
         }
         if reward is not None:
             history_entry["reward"] = reward
         if rubric_reward is not None:
@@ -1019,6 +1474,11 @@ class HelpdeskTicketRoutingEnvironment(
             history_entry["reward_kind"] = reward_kind
         if ticket.ambiguity_note is not None and "lookup_internal_routing_note" not in remaining_tools:
             history_entry["ambiguity_note"] = ticket.ambiguity_note
         if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
             history_entry["related_ticket_id"] = ticket.related_ticket_id
             related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
@@ -1029,6 +1489,14 @@ class HelpdeskTicketRoutingEnvironment(
                     "requester": related_ticket.requester,
                     "description": related_ticket.description,
                 }
         if penalty_reason is not None:
             history_entry["penalty_reason"] = penalty_reason
         if tool_result is not None:
@@ -1098,7 +1566,12 @@ class HelpdeskTicketRoutingEnvironment(
             "average_score_so_far": self._state.average_score_so_far,
             "progress_fraction": progress_fraction,
             "investigation_penalty_applied": self._state.investigation_penalty_applied,
         }
         if last_history_entry is not None:
             metadata["last_score"] = last_history_entry.get("score")
             metadata["last_reward"] = last_history_entry.get("reward")

     "lookup_related_ticket",
     "lookup_requester_history",
     "lookup_internal_routing_note",
+    "lookup_queue_capacity_forecast",
 )
 FREE_INVESTIGATIONS_PER_TICKET = 1
 EXTRA_INVESTIGATION_COST = 0.04
 SEVERE_PRIORITY_UNDERSHOOT_PENALTY = 0.07
 DANGEROUS_RESOLUTION_PENALTY = 0.05
 NONDEFAULT_ROUTING_FOLLOWTHROUGH_BONUS = 0.02
+TEAM_CAPACITY_OVERFLOW_PENALTY = 0.08
+HIGH_PRIORITY_SLOT_OVERFLOW_PENALTY = 0.06
+ESCALATION_SLOT_OVERFLOW_PENALTY = 0.05
+PLANNING_SUCCESS_BONUS = 0.05
 TASK3_INVESTIGATION_TOOL_PLAN: dict[str, tuple[str, ...]] = {
     "ticket-021": ("lookup_related_ticket", "lookup_requester_history"),
         else:
             queue_size = min(queue_size_value, len(self._dataset))
         self._queue = self._rng.sample(self._dataset, min(queue_size, len(self._dataset)))
+        (
+            team_capacity_initial,
+            high_priority_slots_initial,
+            escalation_slots_initial,
+        ) = self._initial_capacity_state_for_queue(task_id)
         self._state = HelpdeskTicketState(
             episode_id=episode_id or str(uuid.uuid4()),
             average_score_so_far=0.0,
             investigation_budget_remaining=queue_size * FREE_INVESTIGATIONS_PER_TICKET,
             investigation_penalty_applied=0.0,
+            planning_penalty_applied=0.0,
             last_reward_components={},
             ticket_tool_usage={},
+            team_capacity_initial=team_capacity_initial,
+            team_capacity_remaining=dict(team_capacity_initial),
+            high_priority_slots_initial=high_priority_slots_initial,
+            high_priority_slots_remaining=high_priority_slots_initial,
+            escalation_slots_initial=escalation_slots_initial,
+            escalation_slots_remaining=escalation_slots_initial,
+            planning_penalty_total=0.0,
+            capacity_pressure_tickets_resolved=0,
         )
         return self._build_observation(task)
             action,
             task_id=task_id,
         )
+        capacity_penalty, capacity_details = self._apply_capacity_usage(
+            current_ticket,
+            action,
+        )
         step_adjustments = compute_step_adjustments(
             score,
             previous_average=previous_average,
                 self._state.per_ticket_scores,
                 len(self._queue),
                 self._state.step_count,
+                completion_bonus=(
+                    self._trajectory_consistency_bonus() + self._planning_success_bonus()
+                ),
             )
             trajectory_reward = trajectory_components["final_reward"]
+            rubric_reward = self._apply_episode_economics(
+                trajectory_reward - self._state.planning_penalty_total
+            )
+            final_reward = clamp_open_unit_interval(
+                rubric_reward - context_penalty - capacity_penalty
+            )
             self._state.total_reward = rubric_reward
             investigation_penalty = self._compute_episode_penalty()
         else:
             self._state.average_score_so_far = self._current_average_score()
             self._state.step_count += 1
             self._state.current_ticket_index += 1
+            final_reward = clamp_open_unit_interval(
+                step_reward - context_penalty - capacity_penalty
+            )
         reward_components = self._build_reward_components(
             ticket_score=score,
                 "context_gap_penalty": context_penalty,
                 "context_completion_bonus": process_bonus,
                 "risk_penalty": risk_penalty,
+                "capacity_penalty": capacity_penalty,
                 "delta_adjustment": step_adjustments["delta_adjustment"],
                 "required_investigation_count": len(self._required_tools_for_ticket(current_ticket)),
                 "hidden_context_remaining_count": missing_required_count,
                 "hidden_context_revealed_count": len(
                     self._used_tools_for_ticket(current_ticket.ticket_id)
                 ),
+                "planning_penalty_total": self._state.planning_penalty_total,
+                "planning_penalty_applied": self._state.planning_penalty_applied,
+                "planning_success_bonus": self._planning_success_bonus()
+                if is_done
+                else 0.0,
                 "rubric_reward": rubric_reward,
                 "trajectory_average_reward": (
                     trajectory_components["average_reward"]
                 ),
             },
         )
+        reward_components.update(capacity_details)
         history_entry = self._build_history_entry(
             current_ticket,
         self._state.reward = final_reward
         self._state.done = is_done
         self._state.investigation_penalty_applied = self._compute_episode_penalty()
+        self._state.planning_penalty_applied = capacity_penalty
         self._state.last_tool_result = None
         self._state.last_reward_components = reward_components
             return 0.0
         return sum(self._state.per_ticket_scores) / len(self._state.per_ticket_scores)
+    def _ticket_has_alternate_route(self, ticket: HelpdeskTicketRecord) -> bool:
+        return any(
+            value is not None
+            for value in (
+                ticket.alternate_issue_type,
+                ticket.alternate_priority,
+                ticket.alternate_assignment_group,
+                ticket.alternate_resolution_action,
+            )
+        ) and ticket.alternate_route_score_multiplier > 0.0
+    def _route_for_ticket(
+        self,
+        ticket: HelpdeskTicketRecord,
+        *,
+        use_alternate: bool = False,
+    ) -> dict[str, str]:
+        if use_alternate and self._ticket_has_alternate_route(ticket):
+            return {
+                "issue_type": ticket.alternate_issue_type or ticket.issue_type,
+                "priority": ticket.alternate_priority or ticket.priority,
+                "assignment_group": (
+                    ticket.alternate_assignment_group or ticket.assignment_group
+                ),
+                "resolution_action": (
+                    ticket.alternate_resolution_action or ticket.resolution_action
+                ),
+            }
+        return {
+            "issue_type": ticket.issue_type,
+            "priority": ticket.priority,
+            "assignment_group": ticket.assignment_group,
+            "resolution_action": ticket.resolution_action,
+        }
+    def _route_for_action(
+        self,
+        ticket: HelpdeskTicketRecord,
+        action: HelpdeskTicketAction,
+    ) -> dict[str, str]:
+        primary_route = self._route_for_ticket(ticket)
+        return {
+            "issue_type": action.issue_type or primary_route["issue_type"],
+            "priority": action.priority or primary_route["priority"],
+            "assignment_group": (
+                action.assignment_group or primary_route["assignment_group"]
+            ),
+            "resolution_action": (
+                action.resolution_action or primary_route["resolution_action"]
+            ),
+        }
+    def _route_capacity_cost(self, route: dict[str, str]) -> dict[str, Any]:
+        return {
+            "assignment_group": route["assignment_group"],
+            "team_slots": 1,
+            "high_priority_slots": 1
+            if route["priority"] in {"high", "critical"}
+            else 0,
+            "escalation_slots": 1
+            if route["resolution_action"] in {"assign", "escalate"}
+            else 0,
+        }
+    def _routing_options_for_ticket(self, ticket: HelpdeskTicketRecord) -> list[dict[str, Any]]:
+        options = [
+            {
+                "label": "primary",
+                "score_multiplier": 1.0,
+                **self._route_for_ticket(ticket),
+                "capacity_cost": self._route_capacity_cost(self._route_for_ticket(ticket)),
+            }
+        ]
+        if self._ticket_has_alternate_route(ticket):
+            alternate_route = self._route_for_ticket(ticket, use_alternate=True)
+            options.append(
+                {
+                    "label": "alternate",
+                    "score_multiplier": ticket.alternate_route_score_multiplier,
+                    **alternate_route,
+                    "capacity_cost": self._route_capacity_cost(alternate_route),
+                }
+            )
+        return options
+    def _initial_capacity_state_for_queue(
+        self,
+        task_id: int,
+    ) -> tuple[dict[str, int], int, int]:
+        if task_id != 3:
+            return {}, 0, 0
+        primary_group_demand: dict[str, int] = {}
+        alternate_relief_by_group: dict[str, int] = {}
+        all_groups: set[str] = set()
+        high_priority_demand = 0
+        high_priority_relief = 0
+        escalation_demand = 0
+        escalation_relief = 0
+        for ticket in self._queue:
+            primary_route = self._route_for_ticket(ticket)
+            all_groups.add(primary_route["assignment_group"])
+            primary_group_demand[primary_route["assignment_group"]] = (
+                primary_group_demand.get(primary_route["assignment_group"], 0) + 1
+            )
+            if primary_route["priority"] in {"high", "critical"}:
+                high_priority_demand += 1
+            if primary_route["resolution_action"] in {"assign", "escalate"}:
+                escalation_demand += 1
+            if self._ticket_has_alternate_route(ticket):
+                alternate_route = self._route_for_ticket(ticket, use_alternate=True)
+                all_groups.add(alternate_route["assignment_group"])
+                if alternate_route["assignment_group"] != primary_route["assignment_group"]:
+                    alternate_relief_by_group[primary_route["assignment_group"]] = (
+                        alternate_relief_by_group.get(
+                            primary_route["assignment_group"],
+                            0,
+                        )
+                        + 1
+                    )
+                if (
+                    primary_route["priority"] in {"high", "critical"}
+                    and alternate_route["priority"] not in {"high", "critical"}
+                ):
+                    high_priority_relief += 1
+                if (
+                    primary_route["resolution_action"] in {"assign", "escalate"}
+                    and alternate_route["resolution_action"] not in {"assign", "escalate"}
+                ):
+                    escalation_relief += 1
+        team_capacity_initial: dict[str, int] = {}
+        for group in sorted(all_groups):
+            demand = primary_group_demand.get(group, 0)
+            relief = alternate_relief_by_group.get(group, 0)
+            if demand <= 1:
+                team_capacity_initial[group] = 1 if group in all_groups else 0
+            elif relief > 0:
+                team_capacity_initial[group] = max(1, demand - 1)
+            else:
+                team_capacity_initial[group] = demand
+        if high_priority_demand <= 1:
+            high_priority_slots_initial = high_priority_demand
+        elif high_priority_relief > 0:
+            high_priority_slots_initial = max(1, high_priority_demand - 1)
+        else:
+            high_priority_slots_initial = high_priority_demand
+        if escalation_demand <= 1:
+            escalation_slots_initial = escalation_demand
+        elif escalation_relief > 0:
+            escalation_slots_initial = max(1, escalation_demand - 1)
+        else:
+            escalation_slots_initial = escalation_demand
+        return (
+            team_capacity_initial,
+            high_priority_slots_initial,
+            escalation_slots_initial,
+        )
+    def _future_queue_demand(self) -> dict[str, Any]:
+        future_tickets = self._queue[self._state.current_ticket_index + 1 :]
+        team_demand: dict[str, int] = {}
+        high_priority_needed = 0
+        escalation_needed = 0
+        capacity_sensitive_tickets = 0
+        for ticket in future_tickets:
+            route = self._route_for_ticket(ticket)
+            team_demand[route["assignment_group"]] = (
+                team_demand.get(route["assignment_group"], 0) + 1
+            )
+            if route["priority"] in {"high", "critical"}:
+                high_priority_needed += 1
+            if route["resolution_action"] in {"assign", "escalate"}:
+                escalation_needed += 1
+            if self._ticket_has_alternate_route(ticket):
+                capacity_sensitive_tickets += 1
+        return {
+            "remaining_ticket_count": len(future_tickets),
+            "team_demand": team_demand,
+            "high_priority_needed": high_priority_needed,
+            "escalation_needed": escalation_needed,
+            "capacity_sensitive_tickets": capacity_sensitive_tickets,
+        }
+    def _capacity_state_snapshot(self) -> dict[str, Any]:
+        return {
+            "team_capacity_remaining": dict(self._state.team_capacity_remaining),
+            "team_capacity_initial": dict(self._state.team_capacity_initial),
+            "high_priority_slots_remaining": self._state.high_priority_slots_remaining,
+            "high_priority_slots_initial": self._state.high_priority_slots_initial,
+            "escalation_slots_remaining": self._state.escalation_slots_remaining,
+            "escalation_slots_initial": self._state.escalation_slots_initial,
+        }
+    def _planning_route_recommendation(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
+        primary_route = self._route_for_ticket(ticket)
+        alternate_route = (
+            self._route_for_ticket(ticket, use_alternate=True)
+            if self._ticket_has_alternate_route(ticket)
+            else None
+        )
+        future_demand = self._future_queue_demand()
+        capacity_state = self._capacity_state_snapshot()
+        def pressure_score(route: dict[str, str]) -> int:
+            cost = self._route_capacity_cost(route)
+            group_remaining = capacity_state["team_capacity_remaining"].get(
+                route["assignment_group"],
+                1,
+            )
+            group_pressure = max(
+                0,
+                future_demand["team_demand"].get(route["assignment_group"], 0)
+                + cost["team_slots"]
+                - group_remaining,
+            )
+            priority_pressure = max(
+                0,
+                future_demand["high_priority_needed"] + cost["high_priority_slots"]
+                - capacity_state["high_priority_slots_remaining"],
+            )
+            escalation_pressure = max(
+                0,
+                future_demand["escalation_needed"] + cost["escalation_slots"]
+                - capacity_state["escalation_slots_remaining"],
+            )
+            return group_pressure + priority_pressure + escalation_pressure
+        primary_pressure = pressure_score(primary_route)
+        alternate_pressure = (
+            pressure_score(alternate_route) if alternate_route is not None else primary_pressure
+        )
+        preferred_label = (
+            "alternate"
+            if alternate_route is not None and alternate_pressure < primary_pressure
+            else "primary"
+        )
+        return {
+            "preferred_label": preferred_label,
+            "primary_pressure": primary_pressure,
+            "alternate_pressure": alternate_pressure,
+            "capacity_state": capacity_state,
+            "future_demand": future_demand,
+        }
+    def _ticket_is_capacity_sensitive(self, ticket: HelpdeskTicketRecord) -> bool:
+        if self._state.current_task_id != 3 or not self._ticket_has_alternate_route(ticket):
+            return False
+        recommendation = self._planning_route_recommendation(ticket)
+        return recommendation["preferred_label"] == "alternate" or any(
+            value > 0
+            for value in (
+                recommendation["primary_pressure"],
+                recommendation["alternate_pressure"],
+            )
+        )
+    def _route_matches_alternate(
+        self,
+        ticket: HelpdeskTicketRecord,
+        route: dict[str, str],
+    ) -> bool:
+        if not self._ticket_has_alternate_route(ticket):
+            return False
+        return route == self._route_for_ticket(ticket, use_alternate=True)
+    def _apply_capacity_usage(
+        self,
+        ticket: HelpdeskTicketRecord,
+        action: HelpdeskTicketAction,
+    ) -> tuple[float, dict[str, Any]]:
+        if self._state.current_task_id != 3:
+            return 0.0, {}
+        route = self._route_for_action(ticket, action)
+        capacity_cost = self._route_capacity_cost(route)
+        group = str(capacity_cost["assignment_group"])
+        if group not in self._state.team_capacity_remaining:
+            self._state.team_capacity_remaining[group] = 1
+            self._state.team_capacity_initial.setdefault(group, 1)
+        group_remaining = self._state.team_capacity_remaining[group]
+        group_overflow = max(0, int(capacity_cost["team_slots"]) - group_remaining)
+        self._state.team_capacity_remaining[group] = max(
+            0,
+            group_remaining - int(capacity_cost["team_slots"]),
+        )
+        high_priority_cost = int(capacity_cost["high_priority_slots"])
+        high_priority_overflow = max(
+            0,
+            high_priority_cost - self._state.high_priority_slots_remaining,
+        )
+        self._state.high_priority_slots_remaining = max(
+            0,
+            self._state.high_priority_slots_remaining - high_priority_cost,
+        )
+        escalation_cost = int(capacity_cost["escalation_slots"])
+        escalation_overflow = max(
+            0,
+            escalation_cost - self._state.escalation_slots_remaining,
+        )
+        self._state.escalation_slots_remaining = max(
+            0,
+            self._state.escalation_slots_remaining - escalation_cost,
+        )
+        capacity_penalty = round(
+            group_overflow * TEAM_CAPACITY_OVERFLOW_PENALTY
+            + high_priority_overflow * HIGH_PRIORITY_SLOT_OVERFLOW_PENALTY
+            + escalation_overflow * ESCALATION_SLOT_OVERFLOW_PENALTY,
+            4,
+        )
+        self._state.planning_penalty_total = round(
+            self._state.planning_penalty_total + capacity_penalty,
+            4,
+        )
+        self._state.planning_penalty_applied = capacity_penalty
+        used_alternate_route = self._route_matches_alternate(ticket, route)
+        if used_alternate_route:
+            self._state.capacity_pressure_tickets_resolved += 1
+        return capacity_penalty, {
+            "capacity_cost": capacity_cost,
+            "group_overflow": group_overflow,
+            "high_priority_overflow": high_priority_overflow,
+            "escalation_overflow": escalation_overflow,
+            "used_alternate_route": used_alternate_route,
+            "capacity_state_after_action": self._capacity_state_snapshot(),
+        }
+    def _planning_success_bonus(self) -> float:
+        if self._state.current_task_id != 3 or self._state.planning_penalty_total > 0.0:
+            return 0.0
+        capacity_sensitive_count = sum(
+            1 for ticket in self._queue if self._ticket_has_alternate_route(ticket)
+        )
+        if capacity_sensitive_count == 0:
+            return 0.0
+        coverage = min(
+            1.0,
+            self._state.capacity_pressure_tickets_resolved / capacity_sensitive_count,
+        )
+        return round(PLANNING_SUCCESS_BONUS * coverage, 4)
     def _internal_routing_note_for_ticket(
         self,
         ticket: HelpdeskTicketRecord,
     ) -> str | None:
         if self._state.current_task_id != 3:
+            return ticket.ambiguity_note or ticket.planning_note
+        note_parts: list[str] = []
+        if ticket.ambiguity_note is not None:
+            note_parts.append(ticket.ambiguity_note)
+        if ticket.planning_note is not None:
+            note_parts.append(ticket.planning_note)
         default_group = ISSUE_TYPE_TO_ASSIGNMENT_GROUP.get(
             ticket.issue_type,
             ticket.issue_type,
             ticket.resolution_action,
         )
         if ticket.assignment_group != default_group:
             note_parts.append(
             return self._ticket_repeated_requester_count(ticket) >= 2
         if tool_name == "lookup_internal_routing_note":
             return self._internal_routing_note_for_ticket(ticket) is not None
+        if tool_name == "lookup_queue_capacity_forecast":
+            return self._state.current_task_id == 3 and (
+                self._ticket_has_alternate_route(ticket)
+                or self._future_queue_demand()["remaining_ticket_count"] > 0
+            )
         return False
     def _required_tools_for_ticket(
             and "lookup_requester_history" not in required_tools
         ):
             required_tools.append("lookup_requester_history")
+        if (
+            self._ticket_is_capacity_sensitive(ticket)
+            and "lookup_queue_capacity_forecast" not in required_tools
+        ):
+            required_tools.append("lookup_queue_capacity_forecast")
         filtered_required_tools: list[str] = []
         for tool_name in required_tools:
             if tool_name in filtered_required_tools:
                 "The visible request is not enough to choose the final owner and next step. "
                 "Additional routing context is available via investigation."
             )
+        if self._ticket_has_alternate_route(ticket):
+            return (
+                "The queue is under resource pressure and this ticket may support more than "
+                "one acceptable routing path. Additional planning context is available via investigation."
+            )
         if self._ticket_has_nondefault_routing(ticket):
             return (
                 "The visible request looks straightforward, but the decisive routing detail is hidden until investigation."
             return "Follow-up request with hidden routing context"
         if self._internal_routing_note_for_ticket(ticket) is not None:
             return "Routing clarification required"
+        if self._ticket_has_alternate_route(ticket):
+            return "Capacity-sensitive routing decision"
         if self._ticket_mentions_follow_up(ticket):
             return "Priority support follow-up"
         return "Helpdesk routing decision"
             "routing_note": routing_note if found else "",
         }
+    def _lookup_queue_capacity_forecast(
+        self,
+        current_ticket: HelpdeskTicketRecord,
+    ) -> dict[str, Any]:
+        recommendation = self._planning_route_recommendation(current_ticket)
+        routing_options = self._routing_options_for_ticket(current_ticket)
+        return {
+            "tool_name": "lookup_queue_capacity_forecast",
+            "found": True,
+            "ticket_id": current_ticket.ticket_id,
+            "preferred_route_label": recommendation["preferred_label"],
+            "primary_pressure": recommendation["primary_pressure"],
+            "alternate_pressure": recommendation["alternate_pressure"],
+            "capacity_state": recommendation["capacity_state"],
+            "future_queue_demand": recommendation["future_demand"],
+            "routing_options": routing_options,
+        }
     def _run_investigation_tool(
         self,
         current_ticket: HelpdeskTicketRecord,
             return self._lookup_requester_history(current_ticket)
         if tool_name == "lookup_internal_routing_note":
             return self._lookup_internal_routing_note(current_ticket)
+        if tool_name == "lookup_queue_capacity_forecast":
+            return self._lookup_queue_capacity_forecast(current_ticket)
         raise ValueError(f"Unsupported tool_name: {tool_name}")
     def _handle_investigation_action(
     def _build_ticket_view(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
         progress = self._tool_progress_for_ticket(ticket)
         remaining_tools = progress["remaining_tools"]
+        used_tools = set(self._used_tools_for_ticket(ticket.ticket_id))
         ticket_view: dict[str, Any] = {
             "ticket_id": ticket.ticket_id,
             "title": self._visible_title(ticket),
             "requester": ticket.requester,
             "description": self._visible_description(ticket),
         }
+        if self._state.current_task_id == 3:
+            ticket_view["capacity_state"] = self._capacity_state_snapshot()
         if progress["required_tools"]:
             ticket_view["context_status"] = {
                 "investigation_required": True,
             }
         if ticket.ambiguity_note is not None and "lookup_internal_routing_note" not in remaining_tools:
             ticket_view["ambiguity_note"] = ticket.ambiguity_note
+        if (
+            ticket.planning_note is not None
+            and "lookup_internal_routing_note" not in remaining_tools
+        ):
+            ticket_view["planning_note"] = ticket.planning_note
         if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
             ticket_view["related_ticket_id"] = ticket.related_ticket_id
             related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
                     "requester": related_ticket.requester,
                     "description": related_ticket.description,
                 }
+        if self._ticket_has_alternate_route(ticket) and (
+            "lookup_internal_routing_note" in used_tools
+            or "lookup_queue_capacity_forecast" in used_tools
+        ):
+            ticket_view["routing_options"] = self._routing_options_for_ticket(ticket)
         return ticket_view
     def _build_feedback_summary(
             risk_penalty = reward_components.get("risk_penalty")
             if risk_penalty:
                 parts.append(f"risk_penalty={risk_penalty:.2f}")
+            capacity_penalty = reward_components.get("capacity_penalty")
+            if capacity_penalty:
+                parts.append(f"capacity_penalty={capacity_penalty:.2f}")
+            planning_penalty_total = reward_components.get("planning_penalty_total")
+            if planning_penalty_total:
+                parts.append(f"planning_penalty_total={planning_penalty_total:.2f}")
         return "; ".join(parts)
             "breakdown": breakdown,
             "queue_position": queue_position,
         }
+        if self._state.current_task_id == 3:
+            history_entry["capacity_state"] = self._capacity_state_snapshot()
         if reward is not None:
             history_entry["reward"] = reward
         if rubric_reward is not None:
             history_entry["reward_kind"] = reward_kind
         if ticket.ambiguity_note is not None and "lookup_internal_routing_note" not in remaining_tools:
             history_entry["ambiguity_note"] = ticket.ambiguity_note
+        if (
+            ticket.planning_note is not None
+            and "lookup_internal_routing_note" not in remaining_tools
+        ):
+            history_entry["planning_note"] = ticket.planning_note
         if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
             history_entry["related_ticket_id"] = ticket.related_ticket_id
             related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
                     "requester": related_ticket.requester,
                     "description": related_ticket.description,
                 }
+        if (
+            self._ticket_has_alternate_route(ticket)
+            and (
+                "lookup_internal_routing_note" not in remaining_tools
+                or "lookup_queue_capacity_forecast" in self._used_tools_for_ticket(ticket.ticket_id)
+            )
+        ):
+            history_entry["routing_options"] = self._routing_options_for_ticket(ticket)
         if penalty_reason is not None:
             history_entry["penalty_reason"] = penalty_reason
         if tool_result is not None:
             "average_score_so_far": self._state.average_score_so_far,
             "progress_fraction": progress_fraction,
             "investigation_penalty_applied": self._state.investigation_penalty_applied,
+            "planning_penalty_total": self._state.planning_penalty_total,
+            "planning_penalty_applied": self._state.planning_penalty_applied,
         }
+        if self._state.current_task_id == 3:
+            metadata["capacity_state"] = self._capacity_state_snapshot()
+            metadata["future_queue_demand"] = self._future_queue_demand()
         if last_history_entry is not None:
             metadata["last_score"] = last_history_entry.get("score")
             metadata["last_reward"] = last_history_entry.get("reward")
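
Reviewer note: the capacity bookkeeping added to `server/environment.py` above can be reduced to three small rules, sketched below as standalone functions. This is a minimal illustration, not the repository code. The function names and `ASSUMED_OVERFLOW_PENALTY` are hypothetical (the real `TEAM_CAPACITY_OVERFLOW_PENALTY`, `HIGH_PRIORITY_SLOT_OVERFLOW_PENALTY`, and `ESCALATION_SLOT_OVERFLOW_PENALTY` values are not visible in this diff), and `initial_slots` mirrors the scalar high-priority and escalation pools rather than the per-group variant, which floors at 1.

```python
from __future__ import annotations

# Hypothetical penalty weight; the real overflow constants are defined
# elsewhere in the repository and are not shown in this diff.
ASSUMED_OVERFLOW_PENALTY = 0.05


def initial_slots(demand: int, relief: int) -> int:
    """Size a capacity pool from primary-route demand.

    If at least one ticket can be relieved through an alternate route,
    the pool is one slot short of demand, which forces a planning choice.
    """
    if demand <= 1:
        return demand
    if relief > 0:
        return max(1, demand - 1)
    return demand


def consume(remaining: int, cost: int) -> tuple[int, int]:
    """Spend `cost` slots and return (new_remaining, overflow)."""
    overflow = max(0, cost - remaining)
    return max(0, remaining - cost), overflow


def pressure(future_demand: int, cost: int, remaining: int) -> int:
    """Slots that would be missing if this route is taken now."""
    return max(0, future_demand + cost - remaining)


# Two high-priority tickets, one of which has a lower-priority alternate.
slots = initial_slots(demand=2, relief=1)
slots, first_overflow = consume(slots, 1)
slots, second_overflow = consume(slots, 1)
penalty = round(second_overflow * ASSUMED_OVERFLOW_PENALTY, 4)
```

Under these assumptions the pool starts with one slot, so routing both tickets on their primary route overflows on the second submission. Taking the alternate route for one of them would have avoided the penalty, which is the trade-off `lookup_queue_capacity_forecast` is meant to surface.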

server/grader.py CHANGED

@@ -2,9 +2,6 @@ from __future__ import annotations
 from models import HelpdeskTicketAction, HelpdeskTicketRecord
-TASK_SCORE_EPSILON = 0.01
 ISSUE_TYPE_SIMILARITY = {
     ("billing_license", "service_request"): 0.4,
     ("service_request", "billing_license"): 0.4,
@@ -120,31 +117,90 @@ def _score_exact(predicted: str | None, expected: str) -> float:
     return 1.0 if _normalized(predicted) == _normalized(expected) and predicted else 0.0
-def grade_action(
     action: HelpdeskTicketAction,
-    ticket: HelpdeskTicketRecord,
     task_id: int,
 ) -> tuple[float, dict[str, float]]:
-    if task_id not in TASK_WEIGHTS:
-        raise ValueError(f"Unsupported task_id: {task_id}")
     field_scores = {
-        "issue_type": _score_exact_or_similar(action.issue_type, ticket.issue_type),
-        "priority": _score_priority(action.priority, ticket.priority),
         "assignment_group": _score_exact_or_table(
             action.assignment_group,
-            ticket.assignment_group,
             ASSIGNMENT_GROUP_SIMILARITY,
         ),
         "resolution_action": _score_exact_or_table(
             action.resolution_action,
-            ticket.resolution_action,
             RESOLUTION_ACTION_SIMILARITY,
         ),
     }
     weights = TASK_WEIGHTS[task_id]
     raw_score = sum(field_scores[field] * weight for field, weight in weights.items())
-    score = max(TASK_SCORE_EPSILON, min(1.0 - TASK_SCORE_EPSILON, raw_score))
-    breakdown = {field: field_scores[field] for field in weights}
     return score, breakdown

 from models import HelpdeskTicketAction, HelpdeskTicketRecord
 ISSUE_TYPE_SIMILARITY = {
     ("billing_license", "service_request"): 0.4,
     ("service_request", "billing_license"): 0.4,
     return 1.0 if _normalized(predicted) == _normalized(expected) and predicted else 0.0
+def _score_route(
     action: HelpdeskTicketAction,
+    *,
+    issue_type: str,
+    priority: str,
+    assignment_group: str,
+    resolution_action: str,
+    score_multiplier: float,
     task_id: int,
 ) -> tuple[float, dict[str, float]]:
     field_scores = {
+        "issue_type": _score_exact_or_similar(action.issue_type, issue_type),
+        "priority": _score_priority(action.priority, priority),
         "assignment_group": _score_exact_or_table(
             action.assignment_group,
+            assignment_group,
             ASSIGNMENT_GROUP_SIMILARITY,
         ),
         "resolution_action": _score_exact_or_table(
             action.resolution_action,
+            resolution_action,
             RESOLUTION_ACTION_SIMILARITY,
         ),
     }
+    if score_multiplier != 1.0:
+        field_scores = {
+            field: round(score * score_multiplier, 4)
+            for field, score in field_scores.items()
+        }
     weights = TASK_WEIGHTS[task_id]
     raw_score = sum(field_scores[field] * weight for field, weight in weights.items())
+    return raw_score, field_scores
+def _alternate_route_available(ticket: HelpdeskTicketRecord) -> bool:
+    return any(
+        value is not None
+        for value in (
+            ticket.alternate_issue_type,
+            ticket.alternate_priority,
+            ticket.alternate_assignment_group,
+            ticket.alternate_resolution_action,
+        )
+    ) and ticket.alternate_route_score_multiplier > 0.0
+def grade_action(
+    action: HelpdeskTicketAction,
+    ticket: HelpdeskTicketRecord,
+    task_id: int,
+) -> tuple[float, dict[str, float]]:
+    if task_id not in TASK_WEIGHTS:
+        raise ValueError(f"Unsupported task_id: {task_id}")
+    primary_score, primary_field_scores = _score_route(
+        action,
+        issue_type=ticket.issue_type,
+        priority=ticket.priority,
+        assignment_group=ticket.assignment_group,
+        resolution_action=ticket.resolution_action,
+        score_multiplier=1.0,
+        task_id=task_id,
+    )
+    chosen_score = primary_score
+    chosen_field_scores = primary_field_scores
+    if _alternate_route_available(ticket):
+        alternate_score, alternate_field_scores = _score_route(
+            action,
+            issue_type=ticket.alternate_issue_type or ticket.issue_type,
+            priority=ticket.alternate_priority or ticket.priority,
+            assignment_group=(
+                ticket.alternate_assignment_group or ticket.assignment_group
+            ),
+            resolution_action=(
+                ticket.alternate_resolution_action or ticket.resolution_action
+            ),
+            score_multiplier=ticket.alternate_route_score_multiplier,
+            task_id=task_id,
+        )
+        if alternate_score > chosen_score:
+            chosen_score = alternate_score
+            chosen_field_scores = alternate_field_scores
+    score = max(0.0, min(1.0, chosen_score))
+    breakdown = {field: chosen_field_scores[field] for field in TASK_WEIGHTS[task_id]}
     return score, breakdown
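
Reviewer note: the new `grade_action` scores the submission against the primary route and, when one exists, against the alternate route scaled by `alternate_route_score_multiplier`, then keeps the higher of the two. The sketch below shows that selection in isolation. It is an assumption-laden simplification: `ASSUMED_WEIGHTS` stands in for the real `TASK_WEIGHTS` table (not shown in this diff), matching is exact only (the real grader also uses similarity tables), and the primary route values are illustrative rather than taken from the dataset.

```python
from __future__ import annotations

# Hypothetical equal weights; the real TASK_WEIGHTS are not shown in this diff.
ASSUMED_WEIGHTS = {
    "issue_type": 0.25,
    "priority": 0.25,
    "assignment_group": 0.25,
    "resolution_action": 0.25,
}


def score_route(
    action: dict[str, str],
    route: dict[str, str],
    multiplier: float = 1.0,
) -> float:
    """Weighted exact-match score against one route, scaled per field."""
    total = 0.0
    for field, weight in ASSUMED_WEIGHTS.items():
        match = 1.0 if action.get(field) == route[field] else 0.0
        total += round(match * multiplier, 4) * weight
    return total


def grade(
    action: dict[str, str],
    primary: dict[str, str],
    alternate: dict[str, str] | None = None,
    multiplier: float = 0.0,
) -> float:
    """Keep the better of the primary and the discounted alternate score."""
    best = score_route(action, primary)
    if alternate is not None and multiplier > 0.0:
        best = max(best, score_route(action, alternate, multiplier))
    return max(0.0, min(1.0, best))


# Illustrative primary route (not real dataset values).
primary = {
    "issue_type": "application_support",
    "priority": "high",
    "assignment_group": "service_desk",
    "resolution_action": "escalate",
}
# Alternate overrides three fields; unset fields fall back to the primary.
alternate = {
    **primary,
    "issue_type": "billing_license",
    "assignment_group": "license_ops",
    "resolution_action": "assign",
}

primary_hit = grade(dict(primary), primary, alternate, 0.74)
alternate_hit = grade(dict(alternate), primary, alternate, 0.74)
```

With these inputs a submission that matches the primary route scores 1.0, while one that matches the alternate route scores about 0.74. That is higher than the 0.25 it would earn against the primary route alone, so the alternate is rewarded but still discounted.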

server/reward.py CHANGED

@@ -8,15 +8,14 @@ DELTA_REWARD_WEIGHT = 0.08
 DELTA_REWARD_CAP = 0.04
 PROCESS_BONUS_CAP = 0.08
 RISK_PENALTY_CAP = 0.12
-OPEN_INTERVAL_EPSILON = 0.01
 def _clamp_unit_interval(value: float) -> float:
     return max(0.0, min(1.0, value))
-def clamp_open_unit_interval(value: float, epsilon: float = OPEN_INTERVAL_EPSILON) -> float:
-    return max(epsilon, min(1.0 - epsilon, value))
 def compute_step_adjustments(
@@ -93,7 +92,7 @@ def compute_trajectory_adjustments(
     avg = sum(per_ticket_scores) / len(per_ticket_scores)
     bounded_completion_bonus = max(0.0, min(0.08, completion_bonus))
     bounded_consistency_bonus = max(0.0, min(0.05, consistency_bonus))
-    final_reward = clamp_open_unit_interval(
         avg + bounded_completion_bonus + bounded_consistency_bonus
     )
     return {

 DELTA_REWARD_CAP = 0.04
 PROCESS_BONUS_CAP = 0.08
 RISK_PENALTY_CAP = 0.12
 def _clamp_unit_interval(value: float) -> float:
     return max(0.0, min(1.0, value))
+def clamp_open_unit_interval(value: float, epsilon: float = 0.0) -> float:
+    return _clamp_unit_interval(value)
 def compute_step_adjustments(
     avg = sum(per_ticket_scores) / len(per_ticket_scores)
     bounded_completion_bonus = max(0.0, min(0.08, completion_bonus))
     bounded_consistency_bonus = max(0.0, min(0.05, consistency_bonus))
+    final_reward = _clamp_unit_interval(
         avg + bounded_completion_bonus + bounded_consistency_bonus
     )
     return {
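
Reviewer note: with the epsilon removed, the trajectory reward is now clamped to the closed interval [0, 1], so exact 0.0 and 1.0 are reachable. The sketch below restates that computation on its own. The function names are hypothetical; the 0.08 and 0.05 bonus caps are taken from the diff, and a non-empty score list is assumed.

```python
from __future__ import annotations


def clamp_unit_interval(value: float) -> float:
    """Clamp to the closed interval [0.0, 1.0]."""
    return max(0.0, min(1.0, value))


def final_reward(
    per_ticket_scores: list[float],
    completion_bonus: float,
    consistency_bonus: float,
) -> float:
    """Average ticket score plus capped bonuses, clamped to [0, 1].

    Assumes `per_ticket_scores` is non-empty.
    """
    avg = sum(per_ticket_scores) / len(per_ticket_scores)
    bounded_completion = max(0.0, min(0.08, completion_bonus))
    bounded_consistency = max(0.0, min(0.05, consistency_bonus))
    return clamp_unit_interval(avg + bounded_completion + bounded_consistency)


perfect = final_reward([1.0, 1.0], completion_bonus=0.5, consistency_bonus=0.5)
empty_handed = final_reward([0.0], completion_bonus=0.0, consistency_bonus=0.0)
```

Before this change the same inputs would have been clamped to 0.99 and 0.01, so any downstream check that assumes rewards lie strictly inside (0, 1) may need to be revisited.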

server/tasks.py CHANGED

@@ -36,10 +36,13 @@ TASKS = {
         "instructions": (
             "Perform full helpdesk routing by selecting the best issue type, "
             "priority, assignment group, and resolution action for the ticket. "
-            "Use any ambiguity notes or related-ticket previews when present. "
             "Some hard tickets intentionally hide decisive routing context until "
-            "you investigate with the available tools, so premature submission can "
-            "underperform even when the visible text looks plausible."
         ),
         "allowed_fields": [
             "issue_type",
@@ -50,6 +53,379 @@ TASKS = {
     },
 }
 assert tuple(TASKS.keys()) == TASK_IDS
@@ -58,7 +434,8 @@ def load_dataset() -> list[HelpdeskTicketRecord]:
     # Accept UTF-8 files saved with a BOM, which is common on Windows editors.
     with dataset_path.open("r", encoding="utf-8-sig") as f:
         raw = json.load(f)
-    return [HelpdeskTicketRecord.model_validate(r) for r in raw]
 def get_task_definition(task_id: int) -> dict:

         "instructions": (
             "Perform full helpdesk routing by selecting the best issue type, "
             "priority, assignment group, and resolution action for the ticket. "
+            "Use any ambiguity notes, related-ticket previews, queue-capacity "
+            "forecasts, and planning state when present. "
             "Some hard tickets intentionally hide decisive routing context until "
+            "you investigate with the available tools, and some hard episodes also "
+            "require queue-level capacity planning across multiple tickets, so "
+            "premature or resource-greedy routing can underperform even when the "
+            "visible text looks plausible."
         ),
         "allowed_fields": [
             "issue_type",
     },
 }
+PLANNING_ROUTE_UPDATES: dict[str, dict] = {
+    "ticket-022": {
+        "planning_note": (
+            "If the application queue is saturated, billing operations can own the "
+            "customer-facing charge review as a lower-fidelity fallback while the bug "
+            "investigation continues separately."
+        ),
+        "alternate_issue_type": "billing_license",
+        "alternate_assignment_group": "license_ops",
+        "alternate_resolution_action": "assign",
+        "alternate_route_score_multiplier": 0.74,
+    },
+    "ticket-027": {
+        "planning_note": (
+            "If procurement capacity is available, treat this like a commercial review. "
+            "If not, a lightweight service-desk acknowledgement is still acceptable."
+        ),
+        "alternate_issue_type": "service_request",
+        "alternate_priority": "medium",
+        "alternate_assignment_group": "procurement",
+        "alternate_resolution_action": "assign",
+        "alternate_route_score_multiplier": 0.92,
+    },
+    "ticket-029": {
+        "planning_note": (
+            "Seat expansion is the preferred route, but license operations can still "
+            "handle the prorating clarification when procurement is the bottleneck."
+        ),
+        "alternate_issue_type": "billing_license",
+        "alternate_assignment_group": "license_ops",
+        "alternate_resolution_action": "fulfill",
+        "alternate_route_score_multiplier": 0.82,
+    },
+    "ticket-040": {
+        "planning_note": (
+            "The request can be treated either as roadmap feedback or as a support "
+            "escalation if the operational impact is emphasized."
+        ),
+        "alternate_issue_type": "application_support",
+        "alternate_priority": "high",
+        "alternate_resolution_action": "escalate",
+        "alternate_route_score_multiplier": 0.76,
+    },
+    "ticket-047": {
+        "planning_note": (
+            "The preferred route is an immediate service-desk extension, but the "
+            "commercial owner can take it if operational fulfillment capacity is exhausted."
+        ),
+        "alternate_assignment_group": "procurement",
+        "alternate_resolution_action": "assign",
+        "alternate_route_score_multiplier": 0.78,
+    },
+    "ticket-048": {
+        "planning_note": (
+            "This belongs with procurement when commercial reviewers are available, "
+            "but a generic service-desk acknowledgement is an acceptable fallback."
+        ),
+        "alternate_assignment_group": "service_desk",
+        "alternate_resolution_action": "acknowledge",
+        "alternate_route_score_multiplier": 0.9,
+    },
+    "ticket-050": {
+        "planning_note": (
+            "Central coordination is preferred. If service-desk capacity is depleted, "
+            "onboarding operations can still run a reduced fulfillment path."
+        ),
+        "alternate_priority": "medium",
+        "alternate_assignment_group": "onboarding_ops",
+        "alternate_resolution_action": "fulfill",
+        "alternate_route_score_multiplier": 0.84,
+    },
+    "ticket-051": {
+        "planning_note": (
+            "Commercial procurement owns the contract amendment, but this can also "
+            "be treated as a service request when the commercial queue needs triage."
+        ),
+        "alternate_issue_type": "service_request",
+        "alternate_route_score_multiplier": 0.83,
+    },
+    "ticket-053": {
+        "planning_note": (
+            "Security scheduling is ideal, but a compliance acknowledgement is still "
+            "acceptable when the security team only needs to confirm the process."
+        ),
+        "alternate_issue_type": "security_compliance",
+        "alternate_resolution_action": "acknowledge",
+        "alternate_route_score_multiplier": 0.8,
+    },
+    "ticket-054": {
+        "planning_note": (
+            "License operations can fulfill the archive request directly. If that queue "
+            "is saturated, service desk can acknowledge and queue the retrieval."
+        ),
+        "alternate_assignment_group": "service_desk",
+        "alternate_resolution_action": "acknowledge",
+        "alternate_route_score_multiplier": 0.9,
+    },
+}
+CURATED_EXPANSION_RECORDS: list[dict] = [
+    {
+        "ticket_id": "ticket-056",
+        "title": "Vendor DPA redlines need an owner before pricing sign-off",
+        "requester": "procurement@harborcompliance.io",
+        "description": (
+            "Commercial review is already moving, but the team needs to know who owns "
+            "the vendor DPA redlines before pricing can be approved."
+        ),
+        "issue_type": "general_inquiry",
+        "priority": "medium",
+        "assignment_group": "procurement",
+        "resolution_action": "assign",
+        "planning_note": (
+            "Procurement is preferred, but service desk can acknowledge and route the "
+            "ownership question if the commercial queue is saturated."
+        ),
+        "alternate_assignment_group": "service_desk",
+        "alternate_resolution_action": "acknowledge",
+        "alternate_route_score_multiplier": 0.9,
+    },
+    {
+        "ticket_id": "ticket-057",
+        "title": "Board audit packet needs a timeline for the privileged-account lockout",
+        "requester": "security-ops@atlasbank.io",
+        "description": (
+            "Following up on ticket-046. The board pack needs a timeline and ownership "
+            "summary for the privileged admin lockout before tomorrow morning."
+        ),
+        "issue_type": "identity_access",
+        "priority": "high",
+        "assignment_group": "security_team",
+        "resolution_action": "escalate",
+        "related_ticket_id": "ticket-046",
+        "planning_note": (
+            "Security still owns the privileged-access review, but service desk can "
+            "collect chronology and prepare the packet if the security queue is jammed."
+        ),
+        "alternate_assignment_group": "service_desk",
+        "alternate_resolution_action": "assign",
+        "alternate_route_score_multiplier": 0.72,
+    },
+    {
+        "ticket_id": "ticket-058",
+        "title": "Temporary contractor extension during an onboarding surge",
+        "requester": "hr@talentbridge.co",
+        "description": (
+            "A contractor start date slipped by two weeks and the account needs to stay "
+            "active while the onboarding backlog is already full."
+        ),
+        "issue_type": "onboarding",
+        "priority": "medium",
+        "assignment_group": "service_desk",
+        "resolution_action": "assign",
+        "planning_note": (
+            "Service desk is preferred for cross-team coordination. If coordination "
+            "capacity is exhausted, onboarding operations can fulfill the extension directly."
+        ),
+        "alternate_assignment_group": "onboarding_ops",
+        "alternate_resolution_action": "fulfill",
+        "alternate_route_score_multiplier": 0.85,
+    },
+    {
+        "ticket_id": "ticket-059",
+        "title": "Archived invoice packet plus quarter-close clarification",
+        "requester": "boardops@silverpine.com",
+        "description": (
+            "Finance needs archived invoice PDFs plus a quick note explaining whether any "
+            "quarter-close adjustments are still pending."
+        ),
+        "issue_type": "general_inquiry",
+        "priority": "medium",
+        "assignment_group": "license_ops",
+        "resolution_action": "fulfill",
+        "planning_note": (
+            "Invoice operations can fulfill directly. If that queue is constrained, "
+            "service desk can acknowledge and schedule the retrieval."
+        ),
+        "alternate_assignment_group": "service_desk",
+        "alternate_resolution_action": "acknowledge",
+        "alternate_route_score_multiplier": 0.88,
+    },
+    {
+        "ticket_id": "ticket-060",
+        "title": "Re: Temporary sandbox extension for the signed pilot",
+        "requester": "solutions@bluequarry.io",
+        "description": (
+            "Following up on ticket-047. The customer launch rehearsal is tomorrow, so the "
+            "sandbox extension needs either immediate execution or a commercial owner to unblock it."
+        ),
+        "issue_type": "service_request",
+        "priority": "high",
+        "assignment_group": "service_desk",
+        "resolution_action": "escalate",
+        "related_ticket_id": "ticket-047",
+        "planning_note": (
+            "Immediate operational execution is preferred. Procurement can still own the "
+            "approval path if service-desk capacity is already depleted."
+        ),
+        "alternate_assignment_group": "procurement",
+        "alternate_resolution_action": "assign",
+        "alternate_route_score_multiplier": 0.8,
+    },
+    {
+        "ticket_id": "ticket-061",
+        "title": "Risk-exception review is blocking an SSO restore",
+        "requester": "identity-risk@sterlingmed.io",
+        "description": (
+            "Users cannot log in through SSO until a temporary risk exception is approved. "
+            "The product team may need logs, but the unblock decision is tied to the review."
+        ),
+        "issue_type": "identity_access",
+        "priority": "critical",
+        "assignment_group": "security_team",
+        "resolution_action": "escalate",
+        "planning_note": (
+            "Security owns the final unblock decision. If security is saturated, the "
+            "application team can still take the first-response diagnostics path."
+        ),
+        "alternate_issue_type": "application_support",
+        "alternate_priority": "high",
+        "alternate_assignment_group": "application_team",
+        "alternate_resolution_action": "escalate",
+        "alternate_route_score_multiplier": 0.74,
+    },
+    {
+        "ticket_id": "ticket-062",
+        "title": "Need product remediation evidence for a customer security questionnaire",
+        "requester": "assurance@clientgrid.com",
+        "description": (
+            "A customer questionnaire asks for evidence that a previously remediated "
+            "application vulnerability is fully closed."
+        ),
+        "issue_type": "security_compliance",
+        "priority": "medium",
+        "assignment_group": "application_team",
+        "resolution_action": "fulfill",
+        "planning_note": (
+            "Application engineering is preferred because they hold the remediation artifacts. "
+            "Security can still acknowledge the questionnaire and buy time when app capacity is tight."
+        ),
+        "alternate_assignment_group": "security_team",
+        "alternate_resolution_action": "acknowledge",
+        "alternate_route_score_multiplier": 0.82,
+    },
+    {
+        "ticket_id": "ticket-063",
+        "title": "Subsidiary admin training with a seat-transfer request",
+        "requester": "enablement@globalcorp.com",
+        "description": (
+            "A newly acquired subsidiary needs admin training next week and also wants "
+            "to transfer existing seats into the parent contract."
+        ),
+        "issue_type": "service_request",
+        "priority": "medium",
+        "assignment_group": "procurement",
+        "resolution_action": "assign",
+        "planning_note": (
+            "Procurement owns the commercial transfer. If that queue is overloaded, "
+            "onboarding operations can still deliver the training portion first."
+        ),
+        "alternate_issue_type": "onboarding",
+        "alternate_assignment_group": "onboarding_ops",
+        "alternate_resolution_action": "fulfill",
+        "alternate_route_score_multiplier": 0.78,
+    },
+    {
+        "ticket_id": "ticket-064",
+        "title": "Legal-hold export of invoice history",
+        "requester": "legalops@northshoreenergy.com",
+        "description": (
+            "Legal needs invoice history exported for a hold notice. No pricing change is "
+            "required, but the request must be acknowledged today."
+        ),
+        "issue_type": "general_inquiry",
+        "priority": "high",
+        "assignment_group": "license_ops",
+        "resolution_action": "fulfill",
+        "planning_note": (
+            "License operations can deliver the export. If they are capacity-constrained, "
+            "service desk can acknowledge the request and queue the retrieval."
+        ),
+        "alternate_assignment_group": "service_desk",
+        "alternate_resolution_action": "acknowledge",
+        "alternate_route_score_multiplier": 0.87,
+    },
+    {
+        "ticket_id": "ticket-065",
+        "title": "Cross-functional launch checklist for an acquired support team",
+        "requester": "integration@mergerco.com",
+        "description": (
+            "Twelve support agents from an acquired business need onboarding, mailbox "
+            "setup, and a security attestation before Monday."
+        ),
+        "issue_type": "onboarding",
+        "priority": "high",
+        "assignment_group": "service_desk",
+        "resolution_action": "assign",
+        "planning_note": (
+            "Central coordination is preferred. If service-desk capacity is exhausted, "
+            "onboarding operations can still run a reduced fulfillment path."
+        ),
+        "alternate_priority": "medium",
+        "alternate_assignment_group": "onboarding_ops",
+        "alternate_resolution_action": "fulfill",
+        "alternate_route_score_multiplier": 0.81,
+    },
+    {
+        "ticket_id": "ticket-066",
+        "title": "Pilot customer asks who approves a credential-defense allowlist",
+        "requester": "pilotops@cruxsystems.io",
+        "description": (
+            "A pilot customer needs to know who approves an IP allowlist for a credential-"
+            "defense control before they continue their test."
+        ),
+        "issue_type": "general_inquiry",
+        "priority": "medium",
+        "assignment_group": "security_team",
+        "resolution_action": "assign",
+        "planning_note": (
+            "Security should own the answer when available. If that queue is overloaded, "
+            "service desk can acknowledge and route the ownership question."
+        ),
+        "alternate_assignment_group": "service_desk",
+        "alternate_resolution_action": "acknowledge",
+        "alternate_route_score_multiplier": 0.84,
+    },
+    {
+        "ticket_id": "ticket-067",
+        "title": "Re: Remediation evidence package is now blocking a renewal signature",
+        "requester": "assurance@clientgrid.com",
+        "description": (
+            "Following up on ticket-052. Renewal signature is blocked until the remediation "
+            "evidence package is delivered or a commercial owner confirms the delay."
+        ),
+        "issue_type": "security_compliance",
+        "priority": "high",
+        "assignment_group": "application_team",
+        "resolution_action": "escalate",
+        "related_ticket_id": "ticket-052",
+        "planning_note": (
+            "Application engineering is preferred because they own the evidence. Procurement "
+            "can still coordinate the renewal communication if the evidence queue is saturated."
+        ),
+        "alternate_issue_type": "service_request",
+        "alternate_priority": "medium",
+        "alternate_assignment_group": "procurement",
+        "alternate_resolution_action": "assign",
+        "alternate_route_score_multiplier": 0.76,
+    },
+]
+def _apply_dataset_enhancements(
+    dataset: list[HelpdeskTicketRecord],
+) -> list[HelpdeskTicketRecord]:
+    """Overlay planning-route updates, then append the curated expansion records."""
+    enhanced_dataset: list[HelpdeskTicketRecord] = []
+    for record in dataset:
+        # Records without an entry in PLANNING_ROUTE_UPDATES pass through unchanged.
+        update = PLANNING_ROUTE_UPDATES.get(record.ticket_id)
+        enhanced_dataset.append(
+            record.model_copy(update=update) if update is not None else record
+        )
+    # Reject curated records whose ticket_id collides with a loaded or earlier curated one.
+    seen_ids = {record.ticket_id for record in enhanced_dataset}
+    for raw_record in CURATED_EXPANSION_RECORDS:
+        ticket_id = str(raw_record["ticket_id"])
+        if ticket_id in seen_ids:
+            raise ValueError(f"Duplicate ticket_id in curated expansion: {ticket_id}")
+        enhanced_dataset.append(HelpdeskTicketRecord.model_validate(raw_record))
+        seen_ids.add(ticket_id)
+    return enhanced_dataset
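The enhancement pass above can be sketched without the project's models. This is a minimal stand-in, not the repository's code: it assumes a frozen dataclass `Ticket` in place of `HelpdeskTicketRecord` and uses `dataclasses.replace` where the real function calls `model_copy(update=...)`. The names `Ticket`, `apply_enhancements`, `updates`, and `extras`, and the sample field values, are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Ticket:
    # Hypothetical stand-in for HelpdeskTicketRecord with only the fields used here.
    ticket_id: str
    assignment_group: str
    alternate_assignment_group: Optional[str] = None
    alternate_route_score_multiplier: float = 0.0


def apply_enhancements(dataset, updates, extras):
    """Overlay per-ticket updates, then append extra records with unique IDs."""
    enhanced = [
        replace(t, **updates[t.ticket_id]) if t.ticket_id in updates else t
        for t in dataset
    ]
    seen = {t.ticket_id for t in enhanced}
    for raw in extras:
        ticket_id = raw["ticket_id"]
        if ticket_id in seen:
            raise ValueError(f"Duplicate ticket_id in curated expansion: {ticket_id}")
        enhanced.append(Ticket(**raw))
        seen.add(ticket_id)
    return enhanced


# Illustrative data only; the real records carry many more fields.
base = [Ticket("ticket-047", "service_desk")]
updates = {
    "ticket-047": {
        "alternate_assignment_group": "procurement",
        "alternate_route_score_multiplier": 0.78,
    }
}
extras = [{"ticket_id": "ticket-058", "assignment_group": "service_desk"}]
result = apply_enhancements(base, updates, extras)
```

Because the input records are never mutated, the base dataset can be reused across calls; only the returned list carries the alternate-route fields.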
 assert tuple(TASKS.keys()) == TASK_IDS
     # Accept UTF-8 files saved with a BOM, which is common on Windows editors.
     with dataset_path.open("r", encoding="utf-8-sig") as f:
         raw = json.load(f)
+    dataset = [HelpdeskTicketRecord.model_validate(r) for r in raw]
+    return _apply_dataset_enhancements(dataset)
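The BOM comment in the loader can be checked with a small standalone sketch. It uses only the standard library; the file name `dataset.json` and the one-record payload are illustrative assumptions, not the project's dataset.

```python
import json
import tempfile
from pathlib import Path

payload = [{"ticket_id": "ticket-bom"}]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dataset.json"
    # Write the file with a leading UTF-8 byte order mark, as some Windows editors do.
    path.write_bytes(b"\xef\xbb\xbf" + json.dumps(payload).encode("utf-8"))

    # utf-8-sig strips the BOM when present and behaves like utf-8 otherwise.
    with path.open("r", encoding="utf-8-sig") as f:
        loaded = json.load(f)

    # Plain utf-8 keeps the BOM as U+FEFF, which the json module rejects.
    try:
        with path.open("r", encoding="utf-8") as f:
            json.load(f)
        plain_utf8_ok = True
    except json.JSONDecodeError:
        plain_utf8_ok = False
```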
 def get_task_definition(task_id: int) -> dict:

tests/test_api_integration.py CHANGED

@@ -517,8 +517,8 @@ class TestHeuristicInferenceRegression(unittest.TestCase):
                 self.assertIsInstance(reward, float)
     def test_overall_average_reward_in_expected_range(self):
-        """4.2.2 — Overall average reward across all 3 tasks is in [0.8, 1.0],
-        consistent with the recorded heuristic baseline of 0.9400.
         """
         rewards = []
         for task_id in self._TASKS:
@@ -529,8 +529,8 @@ class TestHeuristicInferenceRegression(unittest.TestCase):
         overall_avg = sum(rewards) / len(rewards)
         self.assertGreaterEqual(
             overall_avg,
-            0.75,
-            f"Overall average reward {overall_avg:.4f} is below the smoke-test floor of 0.75",
         )
         self.assertLessEqual(
             overall_avg,

                 self.assertIsInstance(reward, float)
     def test_overall_average_reward_in_expected_range(self):
+        """4.2.2 — Overall average reward across all 3 tasks stays in a healthy
+        smoke-test range for the plain no-investigation heuristic baseline.
         """
         rewards = []
         for task_id in self._TASKS:
         overall_avg = sum(rewards) / len(rewards)
         self.assertGreaterEqual(
             overall_avg,
+            0.45,
+            f"Overall average reward {overall_avg:.4f} is below the smoke-test floor of 0.45",
         )
         self.assertLessEqual(
             overall_avg,

tests/test_competitive_upgrade.py CHANGED

@@ -643,10 +643,39 @@ class TestInvestigationActions(unittest.TestCase):
                 tool_name="lookup_internal_routing_note",
             )
         )
-        self.assertEqual(obs.last_tool_result["routing_note"], ticket.ambiguity_note)
         self.assertEqual(obs.current_ticket["ambiguity_note"], ticket.ambiguity_note)
         self.assertGreater(obs.reward or 0.0, 0.0)
     def test_submit_without_required_investigation_gets_shaping_penalty(self) -> None:
         from unittest.mock import patch
@@ -710,7 +739,12 @@ class TestQueueEconomics(unittest.TestCase):
         final_obs = env.step(HelpdeskTicketAction(issue_type=ticket.issue_type))
         self.assertTrue(final_obs.done)
-        self.assertAlmostEqual(final_obs.reward, 0.95, places=9)
 class TestTerminalInvalidActionFinalReward(unittest.TestCase):

                 tool_name="lookup_internal_routing_note",
             )
         )
+        self.assertIn(ticket.ambiguity_note, obs.last_tool_result["routing_note"])
         self.assertEqual(obs.current_ticket["ambiguity_note"], ticket.ambiguity_note)
         self.assertGreater(obs.reward or 0.0, 0.0)
+    def test_queue_capacity_forecast_reveals_routing_options(self) -> None:
+        from unittest.mock import patch
+        dataset = load_dataset()
+        ticket = next(
+            (t for t in dataset if t.alternate_route_score_multiplier > 0.0),
+            None,
+        )
+        self.assertIsNotNone(ticket)
+        env = _make_env()
+        with patch.object(env, "_dataset", [ticket]):
+            with patch.object(env, "_tickets_by_id", {ticket.ticket_id: ticket}):
+                obs = env.reset(seed=0, task_id=3, queue_size=1)
+        self.assertNotIn("routing_options", obs.current_ticket)
+        obs = env.step(
+            HelpdeskTicketAction(
+                action_type="investigate",
+                tool_name="lookup_queue_capacity_forecast",
+            )
+        )
+        self.assertEqual(obs.last_tool_result["tool_name"], "lookup_queue_capacity_forecast")
+        self.assertTrue(obs.last_tool_result["found"])
+        self.assertIn("preferred_route_label", obs.last_tool_result)
+        self.assertIn("routing_options", obs.current_ticket)
+        self.assertGreaterEqual(len(obs.current_ticket["routing_options"]), 2)
     def test_submit_without_required_investigation_gets_shaping_penalty(self) -> None:
         from unittest.mock import patch
         final_obs = env.step(HelpdeskTicketAction(issue_type=ticket.issue_type))
         self.assertTrue(final_obs.done)
+        self.assertLess(final_obs.reward, 1.0)
+        self.assertAlmostEqual(
+            final_obs.last_reward_components.get("investigation_penalty_applied", 0.0),
+            0.04,
+            places=9,
+        )
 class TestTerminalInvalidActionFinalReward(unittest.TestCase):

tests/test_extra_fields_penalty.py CHANGED

@@ -44,8 +44,8 @@ def _make_env() -> HelpdeskTicketRoutingEnvironment:
 class TestExtraFieldsPenalty(unittest.TestCase):
     """Requirement 7: step() rejects actions with fields outside the task's allowed_fields."""
-    def test_extra_fields_returns_open_interval_penalty_reward(self) -> None:
-        """Task 1 penalties should keep the returned reward inside the open interval."""
         env = _make_env()
         obs = env.reset(seed=42, task_id=1)
@@ -61,7 +61,7 @@ class TestExtraFieldsPenalty(unittest.TestCase):
         penalty_obs = env.step(action)
         self.assertIsInstance(penalty_obs, HelpdeskTicketObservation)
-        self.assertGreater(penalty_obs.reward, 0.0)
         self.assertLess(penalty_obs.reward, 1.0)
     def test_extra_fields_advances_ticket_index(self) -> None:
@@ -78,8 +78,8 @@ class TestExtraFieldsPenalty(unittest.TestCase):
         self.assertEqual(penalty_obs.tickets_processed, 1)
-    def test_extra_fields_records_score_inside_open_interval(self) -> None:
-        """per_ticket_scores must stay in the open interval after a penalty step."""
         env = _make_env()
         env.reset(seed=42, task_id=1)
@@ -91,7 +91,7 @@ class TestExtraFieldsPenalty(unittest.TestCase):
         state = env.state
         self.assertEqual(len(state.per_ticket_scores), 1)
-        self.assertGreater(state.per_ticket_scores[0], 0.0)
         self.assertLess(state.per_ticket_scores[0], 1.0)
     def test_extra_fields_history_entry_has_penalty_reason(self) -> None:
@@ -109,7 +109,7 @@ class TestExtraFieldsPenalty(unittest.TestCase):
         entry = penalty_obs.history[0]
         self.assertIn("penalty_reason", entry)
         self.assertIn("assignment_group", entry["penalty_reason"])
-        self.assertGreater(entry["score"], 0.0)
         self.assertLess(entry["score"], 1.0)
     def test_no_extra_fields_grades_normally(self) -> None:

 class TestExtraFieldsPenalty(unittest.TestCase):
     """Requirement 7: step() rejects actions with fields outside the task's allowed_fields."""
+    def test_extra_fields_returns_closed_interval_penalty_reward(self) -> None:
+        """Task 1 penalties should keep the returned reward inside the unit interval."""
         env = _make_env()
         obs = env.reset(seed=42, task_id=1)
         penalty_obs = env.step(action)
         self.assertIsInstance(penalty_obs, HelpdeskTicketObservation)
+        self.assertGreaterEqual(penalty_obs.reward, 0.0)
         self.assertLess(penalty_obs.reward, 1.0)
     def test_extra_fields_advances_ticket_index(self) -> None:
         self.assertEqual(penalty_obs.tickets_processed, 1)
+    def test_extra_fields_records_score_inside_unit_interval(self) -> None:
+        """per_ticket_scores must stay in the unit interval after a penalty step."""
         env = _make_env()
         env.reset(seed=42, task_id=1)
         state = env.state
         self.assertEqual(len(state.per_ticket_scores), 1)
+        self.assertGreaterEqual(state.per_ticket_scores[0], 0.0)
         self.assertLess(state.per_ticket_scores[0], 1.0)
     def test_extra_fields_history_entry_has_penalty_reason(self) -> None:
         entry = penalty_obs.history[0]
         self.assertIn("penalty_reason", entry)
         self.assertIn("assignment_group", entry["penalty_reason"])
+        self.assertGreaterEqual(entry["score"], 0.0)
         self.assertLess(entry["score"], 1.0)
     def test_no_extra_fields_grades_normally(self) -> None:

tests/test_grader_unit.py CHANGED

@@ -47,7 +47,7 @@ class GraderUnitTests(unittest.TestCase):
         score, breakdown = grade_action(action, ticket, task_id=3)
-        self.assertAlmostEqual(score, 0.99)
         self.assertEqual(
             breakdown,
             {
@@ -88,7 +88,7 @@ class GraderUnitTests(unittest.TestCase):
                         if predicted == expected
                         else ISSUE_TYPE_SIMILARITY.get((predicted, expected), 0.0)
                     )
-                    expected_task_score = max(0.01, min(0.99, raw_expected_score))
                     self.assertAlmostEqual(score, expected_task_score)
                     self.assertEqual(breakdown, {"issue_type": raw_expected_score})
@@ -98,7 +98,7 @@ class GraderUnitTests(unittest.TestCase):
         score, breakdown = grade_action(action, ticket, task_id=1)
-        self.assertAlmostEqual(score, 0.01)
         self.assertEqual(breakdown, {"issue_type": 0.0})
     def test_priority_scoring_uses_defined_proximity_table(self) -> None:
@@ -133,7 +133,7 @@ class GraderUnitTests(unittest.TestCase):
                         {"issue_type": 1.0, "priority": priority_score},
                     )
                     raw_score = 0.6 + 0.4 * priority_score
-                    expected_task_score = max(0.01, min(0.99, raw_score))
                     self.assertAlmostEqual(score, expected_task_score)
     def test_task_2_weights_apply_as_documented(self) -> None:
@@ -195,6 +195,42 @@ class GraderUnitTests(unittest.TestCase):
         )
         self.assertAlmostEqual(score, 0.65)
     def test_resolution_action_partial_credit_uses_declared_similarity_table(self) -> None:
         ticket = _ticket()
         action = HelpdeskTicketAction(
@@ -252,7 +288,7 @@ class GraderUnitTests(unittest.TestCase):
                         },
                     )
                     raw_score = 0.35 + 0.20 + 0.25 * assignment_group_score + 0.20
-                    expected_task_score = max(0.01, min(0.99, raw_score))
                     self.assertAlmostEqual(score, expected_task_score)
     def test_resolution_action_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
@@ -284,7 +320,7 @@ class GraderUnitTests(unittest.TestCase):
                         },
                     )
                     raw_score = 0.35 + 0.20 + 0.25 + 0.20 * resolution_action_score
-                    expected_task_score = max(0.01, min(0.99, raw_score))
                     self.assertAlmostEqual(score, expected_task_score)
     def test_partial_credit_tables_never_override_exact_match(self) -> None:

         score, breakdown = grade_action(action, ticket, task_id=3)
+        self.assertAlmostEqual(score, 1.0)
         self.assertEqual(
             breakdown,
             {
                         if predicted == expected
                         else ISSUE_TYPE_SIMILARITY.get((predicted, expected), 0.0)
                     )
+                    expected_task_score = max(0.0, min(1.0, raw_expected_score))
                     self.assertAlmostEqual(score, expected_task_score)
                     self.assertEqual(breakdown, {"issue_type": raw_expected_score})
         score, breakdown = grade_action(action, ticket, task_id=1)
+        self.assertAlmostEqual(score, 0.0)
         self.assertEqual(breakdown, {"issue_type": 0.0})
     def test_priority_scoring_uses_defined_proximity_table(self) -> None:
                         {"issue_type": 1.0, "priority": priority_score},
                     )
                     raw_score = 0.6 + 0.4 * priority_score
+                    expected_task_score = max(0.0, min(1.0, raw_score))
                     self.assertAlmostEqual(score, expected_task_score)
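The expectation changes in these grader tests come down to the clamp bounds moving from `[0.01, 0.99]` to `[0.0, 1.0]`. A minimal sketch of the arithmetic, assuming the Task 2 formula `0.6 + 0.4 * priority_score` shown in the test with `issue_type` already correct; the helper names `clamp_unit`, `clamp_legacy`, and `task2_raw_score` are hypothetical:

```python
def clamp_unit(raw: float) -> float:
    """New behavior: clamp to the closed unit interval."""
    return max(0.0, min(1.0, raw))


def clamp_legacy(raw: float) -> float:
    """Old behavior: keep scores strictly inside (0, 1)."""
    return max(0.01, min(0.99, raw))


def task2_raw_score(priority_score: float) -> float:
    # Weights taken from the test above; assumes issue_type matched exactly.
    return 0.6 + 0.4 * priority_score


perfect = task2_raw_score(1.0)
new_perfect = clamp_unit(perfect)      # a perfect answer can now score 1.0
old_perfect = clamp_legacy(perfect)    # previously capped at 0.99
new_zero = clamp_unit(0.0)             # a total miss can now score 0.0
old_zero = clamp_legacy(0.0)           # previously floored at 0.01
```

This is why the penalty tests in `tests/test_extra_fields_penalty.py` switch from `assertGreater(..., 0.0)` to `assertGreaterEqual(..., 0.0)`: a score of exactly zero is now reachable.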
     def test_task_2_weights_apply_as_documented(self) -> None:
         )
         self.assertAlmostEqual(score, 0.65)
+    def test_alternate_route_can_win_when_primary_route_is_worse(self) -> None:
+        ticket = HelpdeskTicketRecord(
+            ticket_id="ticket-alt",
+            title="Planning ticket",
+            requester="planner@example.com",
+            description="Capacity-sensitive routing decision.",
+            issue_type="service_request",
+            priority="medium",
+            assignment_group="procurement",
+            resolution_action="assign",
+            alternate_issue_type="billing_license",
+            alternate_priority="high",
+            alternate_assignment_group="license_ops",
+            alternate_resolution_action="fulfill",
+            alternate_route_score_multiplier=0.85,
+        )
+        action = HelpdeskTicketAction(
+            issue_type="billing_license",
+            priority="high",
+            assignment_group="license_ops",
+            resolution_action="fulfill",
+        )
+        score, breakdown = grade_action(action, ticket, task_id=3)
+        self.assertAlmostEqual(score, 0.85)
+        self.assertEqual(
+            breakdown,
+            {
+                "issue_type": 0.85,
+                "priority": 0.85,
+                "assignment_group": 0.85,
+                "resolution_action": 0.85,
+            },
+        )
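One way to read the alternate-route test above is "grade against both routes, scale the alternate by its multiplier, keep the better result". The sketch below follows that reading under stated assumptions: the field weights are inferred from the Task 3 formulas in this test file, matching is exact only (the real grader also uses similarity tables), and unset alternate fields are assumed to fall back to the primary route. `WEIGHTS`, `score_route`, and `grade_with_alternate` are hypothetical names, not the `server/grader.py` API.

```python
WEIGHTS = {
    "issue_type": 0.35,
    "priority": 0.20,
    "assignment_group": 0.25,
    "resolution_action": 0.20,
}


def score_route(action: dict, route: dict, multiplier: float = 1.0):
    """Exact-match score for one route; each field's credit is scaled by multiplier."""
    breakdown = {
        field: (multiplier if action.get(field) == route.get(field) else 0.0)
        for field in WEIGHTS
    }
    total = sum(WEIGHTS[field] * breakdown[field] for field in WEIGHTS)
    return total, breakdown


def grade_with_alternate(action: dict, primary: dict, alternate: dict, multiplier: float):
    """Return the better of the primary score and the discounted alternate score."""
    # Assumption: alternate fields that are not set inherit the primary value.
    alternate_route = {**primary, **alternate}
    candidates = [
        score_route(action, primary),
        score_route(action, alternate_route, multiplier),
    ]
    return max(candidates, key=lambda candidate: candidate[0])


primary = {
    "issue_type": "service_request",
    "priority": "medium",
    "assignment_group": "procurement",
    "resolution_action": "assign",
}
alternate = {
    "issue_type": "billing_license",
    "priority": "high",
    "assignment_group": "license_ops",
    "resolution_action": "fulfill",
}
score, breakdown = grade_with_alternate(dict(alternate), primary, alternate, 0.85)
```

With the ticket from the test, the action matches no primary field, so the alternate route wins and every breakdown entry carries the 0.85 discount.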
     def test_resolution_action_partial_credit_uses_declared_similarity_table(self) -> None:
         ticket = _ticket()
         action = HelpdeskTicketAction(
                         },
                     )
                     raw_score = 0.35 + 0.20 + 0.25 * assignment_group_score + 0.20
+                    expected_task_score = max(0.0, min(1.0, raw_score))
                     self.assertAlmostEqual(score, expected_task_score)
     def test_resolution_action_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
                         },
                     )
                     raw_score = 0.35 + 0.20 + 0.25 + 0.20 * resolution_action_score
+                    expected_task_score = max(0.0, min(1.0, raw_score))
                     self.assertAlmostEqual(score, expected_task_score)
     def test_partial_credit_tables_never_override_exact_match(self) -> None:

tests/test_tasks_unit.py CHANGED

@@ -8,7 +8,7 @@ import openenv_test_stubs  # noqa: F401
 from models import HelpdeskTicketRecord
 from server import tasks as task_module
-from server.tasks import TASKS, get_task_definition, load_dataset
 from vocabulary import (
     ASSIGNMENT_GROUPS,
     ISSUE_TYPES,
@@ -51,7 +51,17 @@ class TasksAndDatasetUnitTests(unittest.TestCase):
         dataset = load_dataset()
         self.assertGreaterEqual(len(dataset), 45)
-        self.assertTrue(all(isinstance(record, HelpdeskTicketRecord) for record in dataset))
     def test_dataset_ticket_ids_are_unique(self) -> None:
         dataset = load_dataset()
@@ -100,9 +110,13 @@ class TasksAndDatasetUnitTests(unittest.TestCase):
         dataset = load_dataset()
         ambiguity_count = sum(1 for record in dataset if record.ambiguity_note)
         follow_up_count = sum(1 for record in dataset if record.related_ticket_id)
         self.assertGreaterEqual(ambiguity_count, 4)
         self.assertGreaterEqual(follow_up_count, 3)
     def test_load_dataset_accepts_utf8_bom(self) -> None:
         sample = (
@@ -129,7 +143,8 @@ class TasksAndDatasetUnitTests(unittest.TestCase):
         with mock.patch.object(task_module.Path, "open", fake_open):
             dataset = load_dataset()
-        self.assertEqual([record.ticket_id for record in dataset], ["ticket-bom"])
 if __name__ == "__main__":

 from models import HelpdeskTicketRecord
 from server import tasks as task_module
+from server.tasks import CURATED_EXPANSION_RECORDS, TASKS, get_task_definition, load_dataset
 from vocabulary import (
     ASSIGNMENT_GROUPS,
     ISSUE_TYPES,
         dataset = load_dataset()
         self.assertGreaterEqual(len(dataset), 45)
+        self.assertTrue(
+            all(
+                isinstance(record, HelpdeskTicketRecord)
+                or (
+                    record.__class__.__name__ == "HelpdeskTicketRecord"
+                    and hasattr(record, "model_dump")
+                    and hasattr(record, "ticket_id")
+                )
+                for record in dataset
+            )
+        )
     def test_dataset_ticket_ids_are_unique(self) -> None:
         dataset = load_dataset()
         dataset = load_dataset()
         ambiguity_count = sum(1 for record in dataset if record.ambiguity_note)
         follow_up_count = sum(1 for record in dataset if record.related_ticket_id)
+        alternate_route_count = sum(
+            1 for record in dataset if record.alternate_route_score_multiplier > 0.0
+        )
         self.assertGreaterEqual(ambiguity_count, 4)
         self.assertGreaterEqual(follow_up_count, 3)
+        self.assertGreaterEqual(alternate_route_count, 10)
     def test_load_dataset_accepts_utf8_bom(self) -> None:
         sample = (
         with mock.patch.object(task_module.Path, "open", fake_open):
             dataset = load_dataset()
+        self.assertIn("ticket-bom", [record.ticket_id for record in dataset])
+        self.assertEqual(len(dataset), 1 + len(CURATED_EXPANSION_RECORDS))
 if __name__ == "__main__":