Roopalgn committed on
Commit 043d9e1 · 1 Parent(s): 8241eb5

Upgrade helpdesk env with queue dynamics and operational actions

README.md CHANGED
@@ -25,7 +25,7 @@ If a judge reads only one short explanation, it should be this:
25
 
26
  - this environment models a real enterprise workflow, not a toy classification task
27
  - each ticket requires typed routing decisions that are easy to score deterministically
28
- - the task ladder moves cleanly from single-field classification to full operational routing
29
  - the repo is small enough to rerun quickly and explicit enough to understand without hidden business logic
30
 
31
  ## What This Environment Simulates
@@ -34,9 +34,10 @@ The environment models a realistic helpdesk workflow:
34
 
35
  1. a new ticket enters the queue
36
  2. the agent reads the ticket title and description
37
- 3. the agent may investigate with lightweight tools, then submit structured routing fields
38
- 4. the grader assigns deterministic credit
39
- 5. the environment advances to the next ticket until the queue is complete
 
40
 
41
  For hard-task tickets, the environment can now withhold decisive routing context until the agent uses the right investigation tool. That keeps the task from collapsing into one-shot classification and makes tool choice part of the policy.
42
 
@@ -45,7 +46,7 @@ This domain is useful for OpenEnv because it is operationally realistic, easy to
45
  ## Why This Is A Good Hackathon Domain
46
 
47
  - it reflects real enterprise support operations
48
- - the action space is structured and judge-friendly, with a small investigate-versus-submit split
49
  - correctness can be scored deterministically
50
  - the hard task is meaningfully harder than the easy and medium tasks
51
  - the environment is small enough to rerun quickly
@@ -55,9 +56,9 @@ This domain is useful for OpenEnv because it is operationally realistic, easy to
55
  The project uses a queue-based episode model.
56
 
57
  - `reset()` samples a task and a queue of 3 to 5 tickets
58
- - `step()` lets the agent investigate or submit one ticket at a time
59
  - `state()` exposes the internal episode snapshot
60
- - hard-task episodes also track queue-level capacity, alternate acceptable routes, and planning penalties across tickets
61
  - final evaluation is based on the queue outcome, not on isolated per-ticket classification alone
62
 
63
  The environment classes and vocabulary are intentionally frozen to keep collaboration and judging simple.
@@ -91,15 +92,15 @@ Artifacts are written to `analysis/policy_learning_runs/` by default:
91
  - `search_eval_episodes.jsonl`
92
  - `search_eval_trajectories.jsonl`
93
 
94
- The default submit policy inside this runner stays deterministic and local. It reuses the repo's heuristic routing logic plus planning-aware routing overrides, so the search loop can study both investigation policy and queue-aware submission quality without depending on external LLM latency or API cost.
95
 
96
  ## Task Ladder
97
 
  | ID | Name | Difficulty | Required Fields | What The Agent Must Do |
  |----|------|------------|-----------------|-------------------------|
- | 1 | Issue Type Classification | Easy | `issue_type` | classify the ticket into the best issue category |
- | 2 | Issue Type And Priority | Medium | `issue_type`, `priority` | classify the issue and estimate urgency |
- | 3 | Full Ticket Routing | Hard | `issue_type`, `priority`, `assignment_group`, `resolution_action` | perform full routing and next-step selection |
103
 
104
  ## Locked Vocabulary
105
 
@@ -151,10 +152,13 @@ Visible ticket fields:
151
  - `description`
152
  - optional `ambiguity_note`
153
  - optional `planning_note`
 
154
  - optional `related_ticket_id`
155
  - optional `related_ticket_preview`
156
  - optional `routing_options`
157
  - optional `capacity_state`
 
 
158
 
159
  Each observation also includes:
160
 
@@ -196,16 +200,23 @@ The internal `HelpdeskTicketState` tracks:
196
  - `team_capacity_remaining`
197
  - `high_priority_slots_remaining`
198
  - `escalation_slots_remaining`
 
199
  - `planning_penalty_total`
 
 
 
200
 
201
  ## Grading And Reward
202
 
203
  Scoring is deterministic and normalized to `[0.0, 1.0]`.
204
 
205
- The action model now supports two paths:
206
 
207
  - `action_type="submit"` for the final routing answer
208
  - `action_type="investigate"` with a small built-in tool surface before submission
 
 
 
209
 
210
  Available tools:
211
 
@@ -223,6 +234,8 @@ Hard-task investigation behavior:
223
  - blind or repeated probing does not pay by default
224
  - premature hard-task submission can incur a shaping penalty even when the visible text looks plausible
225
  - resource-greedy routing can add planning penalties later in the queue even when a single ticket looks correct in isolation
 
 
226
  - terminal `rubric_reward` remains the objective evaluation signal, while per-step `reward` is the denser training signal
227
 
228
  Per-field behavior:
@@ -237,9 +250,9 @@ Task weights:
237
 
  | Task | Issue Type | Priority | Assignment Group | Resolution Action |
  |------|------------|----------|------------------|-------------------|
- | 1 | 100% | - | - | - |
- | 2 | 60% | 40% | - | - |
- | 3 | 35% | 20% | 25% | 20% |
243
 
244
  Final episode rubric reward is queue-based:
245
 
@@ -251,7 +264,7 @@ Both `reward` and `rubric_reward` now use the closed interval `[0.0, 1.0]`.
251
 
252
  Step reward is lightly milestone-shaped: high per-ticket scores get a small bonus and very low scores get a small penalty before the final clamp.
253
 
254
- Final reward also includes a queue-economics penalty when the agent exceeds the free investigation budget. One investigation per queued ticket is free, but extra investigation steps reduce the final reward more noticeably than before. On hard-task queues, assignment-group capacity, high-priority slots, and escalation slots also create cross-ticket trade-offs.
255
 
256
  To make the environment more RL-friendly, each observation now also surfaces structured reward telemetry:
257
 
@@ -302,7 +315,7 @@ It includes:
302
 
303
  ## Difficulty Coverage
304
 
305
- The difficulty ladder is visible both in the task fields and in the dataset itself.
306
 
307
  Easy-style examples:
308
 
@@ -322,6 +335,7 @@ Hard-style examples:
322
  - `ticket-029`: seat expansion combined with a prorating question
323
  - `ticket-038`: follow-up billing thread with escalated urgency
324
  - `ticket-045`: repeated account suspension thread with legal-escalation pressure
 
325
 
326
  ## Repository Layout
327
 
 
25
 
26
  - this environment models a real enterprise workflow, not a toy classification task
27
  - each ticket requires typed routing decisions that are easy to score deterministically
28
+ - the task ladder now keeps full routing on every task and scales observability, queue pressure, and operational controls instead
29
  - the repo is small enough to rerun quickly and explicit enough to understand without hidden business logic
30
 
31
  ## What This Environment Simulates
 
34
 
35
  1. a new ticket enters the queue
36
  2. the agent reads the ticket title and description
37
+ 3. the agent may investigate, request more information, open an incident, defer the ticket, or submit a routing decision
38
+ 4. the queue state mutates: capacity shrinks, incidents stay open, deferred tickets return later, and poor handling can spawn follow-up tickets
39
+ 5. the grader assigns deterministic credit
40
+ 6. the environment advances until the queue is complete
41
 
42
  For hard-task tickets, the environment can now withhold decisive routing context until the agent uses the right investigation tool. That keeps the task from collapsing into one-shot classification and makes tool choice part of the policy.
43
 
 
46
  ## Why This Is A Good Hackathon Domain
47
 
48
  - it reflects real enterprise support operations
49
+ - the action space is structured and judge-friendly, but now includes meaningful operational controls beyond investigate-versus-submit
50
  - correctness can be scored deterministically
51
  - the hard task is meaningfully harder than the easy and medium tasks
52
  - the environment is small enough to rerun quickly
 
56
  The project uses a queue-based episode model.
57
 
58
  - `reset()` samples a task and a queue of 3 to 5 tickets
59
+ - `step()` lets the agent investigate, request clarification, defer, open incidents, or submit one ticket at a time
60
  - `state()` exposes the internal episode snapshot
61
+ - hard-task episodes also track queue-level capacity, incident slots, alternate acceptable routes, planning penalties, SLA pressure, and dynamic follow-up tickets across the queue
62
  - final evaluation is based on the queue outcome, not on isolated per-ticket classification alone
63
 
64
  The environment classes and vocabulary are intentionally frozen to keep collaboration and judging simple.
 
92
  - `search_eval_episodes.jsonl`
93
  - `search_eval_trajectories.jsonl`
94
 
95
+ The default submit policy inside this runner stays deterministic and local. It reuses the repo's heuristic routing logic plus planning-aware routing overrides, and the policy loop can now also exercise operational actions such as `request_info`, `open_incident`, and `defer` without depending on external LLM latency or API cost.
96
 
97
  ## Task Ladder
98
 
  | ID | Name | Difficulty | Required Fields | What The Agent Must Do |
  |----|------|------------|-----------------|-------------------------|
+ | 1 | Guided Full Routing | Easy | `issue_type`, `priority`, `assignment_group`, `resolution_action` | route a mostly visible ticket correctly |
+ | 2 | Contextual Full Routing | Medium | `issue_type`, `priority`, `assignment_group`, `resolution_action` | route under partial observability with investigation and clarification |
+ | 3 | Adaptive Queue Routing | Hard | `issue_type`, `priority`, `assignment_group`, `resolution_action` | route while managing queue pressure, incidents, deferrals, and downstream follow-ups |
104
 
105
  ## Locked Vocabulary
106
 
 
152
  - `description`
153
  - optional `ambiguity_note`
154
  - optional `planning_note`
155
+ - optional `customer_update_note`
156
  - optional `related_ticket_id`
157
  - optional `related_ticket_preview`
158
  - optional `routing_options`
159
  - optional `capacity_state`
160
+ - optional `operational_context`
161
+ - optional `generated_from_ticket_id`
162
 
163
  Each observation also includes:
164
 
 
200
  - `team_capacity_remaining`
201
  - `high_priority_slots_remaining`
202
  - `escalation_slots_remaining`
203
+ - `incident_slots_remaining`
204
  - `planning_penalty_total`
205
+ - `incident_gap_total`
206
+ - `sla_breach_count`
207
+ - `dynamic_queue_events`
208
 
209
  ## Grading And Reward
210
 
211
  Scoring is deterministic and normalized to `[0.0, 1.0]`.
212
 
213
+ The action model now supports five paths:
214
 
215
  - `action_type="submit"` for the final routing answer
216
  - `action_type="investigate"` with a small built-in tool surface before submission
217
+ - `action_type="request_info"` to ask for customer / operator clarification on the current ticket
218
+ - `action_type="open_incident"` to reserve incident handling capacity before routing risky tickets
219
+ - `action_type="defer"` to push a ticket later in the queue and accept the downstream queue consequences
220
 
221
  Available tools:
222
 
 
234
  - blind or repeated probing does not pay by default
235
  - premature hard-task submission can incur a shaping penalty even when the visible text looks plausible
236
  - resource-greedy routing can add planning penalties later in the queue even when a single ticket looks correct in isolation
237
+ - incident-sensitive tickets can require an explicit `open_incident` step to avoid future follow-up debt
238
+ - bad or incomplete hard-task handling can append a deterministic follow-up ticket later in the same episode
239
  - terminal `rubric_reward` remains the objective evaluation signal, while per-step `reward` is the denser training signal
240
 
241
  Per-field behavior:
 
250
 
  | Task | Issue Type | Priority | Assignment Group | Resolution Action |
  |------|------------|----------|------------------|-------------------|
+ | 1 | 40% | 20% | 20% | 20% |
+ | 2 | 32% | 20% | 24% | 24% |
+ | 3 | 30% | 20% | 25% | 25% |
256
 
257
  Final episode rubric reward is queue-based:
258
 
 
264
 
265
  Step reward is lightly milestone-shaped: high per-ticket scores get a small bonus and very low scores get a small penalty before the final clamp.
266
 
267
+ Final reward also includes a queue-economics penalty when the agent exceeds the free investigation budget. One investigation-style step per queued ticket is free, but extra investigation or clarification steps reduce the final reward more noticeably than before. On hard-task queues, assignment-group capacity, high-priority slots, escalation slots, incident slots, and deferred-ticket SLA pressure all create cross-ticket trade-offs.
268
 
269
  To make the environment more RL-friendly, each observation now also surfaces structured reward telemetry:
270
 
 
315
 
316
  ## Difficulty Coverage
317
 
318
+ The difficulty ladder is now visible in observability and control, not just in the submitted field count.
319
 
320
  Easy-style examples:
321
 
 
335
  - `ticket-029`: seat expansion combined with a prorating question
336
  - `ticket-038`: follow-up billing thread with escalated urgency
337
  - `ticket-045`: repeated account suspension thread with legal-escalation pressure
338
+ - generated `*-followup` tickets: deterministic reopened cases that only appear when the earlier handling was incomplete or operationally risky
339
 
340
  ## Repository Layout
341
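Reviewer note: the updated task-weights table in README.md can be read as a weighted sum over per-field scores. The following is a minimal sketch, assuming each field has already been scored in `[0.0, 1.0]`. The names `TASK_WEIGHTS` and `weighted_ticket_score` are hypothetical and do not come from the repo, and the real grader may combine fields differently (for example, with partial credit or planning penalties).

```python
# Hypothetical illustration of the README weight table; not the repo's grader.
TASK_WEIGHTS = {
    1: {"issue_type": 0.40, "priority": 0.20, "assignment_group": 0.20, "resolution_action": 0.20},
    2: {"issue_type": 0.32, "priority": 0.20, "assignment_group": 0.24, "resolution_action": 0.24},
    3: {"issue_type": 0.30, "priority": 0.20, "assignment_group": 0.25, "resolution_action": 0.25},
}


def weighted_ticket_score(task_id: int, field_scores: dict[str, float]) -> float:
    """Combine per-field scores (each assumed to lie in [0, 1]) using the task weights."""
    weights = TASK_WEIGHTS[task_id]
    # Fields missing from field_scores contribute nothing.
    total = sum(weight * field_scores.get(field, 0.0) for field, weight in weights.items())
    # Clamp to the closed interval the README states for rewards.
    return min(1.0, max(0.0, total))
```

Under this reading, a Task 3 submission that gets only `issue_type` and `priority` right would earn half of the per-ticket credit, because those two weights sum to 50%.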
 
inference.py CHANGED
@@ -196,9 +196,11 @@ def format_recent_history_entries(
196
  def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions: str) -> str:
197
  ambiguity_note = ticket.get("ambiguity_note")
198
  planning_note = ticket.get("planning_note")
 
199
  related_preview = ticket.get("related_ticket_preview") or {}
200
  last_tool_result = ticket.get("last_tool_result")
201
  context_status = ticket.get("context_status") or {}
 
202
  recent_history = ticket.get("recent_history") or []
203
  feedback_summary = ticket.get("feedback_summary")
204
  last_reward_components = ticket.get("last_reward_components") or {}
@@ -213,6 +215,8 @@ def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions
213
  extra_context_lines.append(f"Ambiguity note: {ambiguity_note}")
214
  if planning_note:
215
  extra_context_lines.append(f"Planning note: {planning_note}")
 
 
216
  if related_preview:
217
  extra_context_lines.extend(
218
  [
@@ -230,6 +234,10 @@ def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions
230
  extra_context_lines.append(
231
  "Context status: " + json.dumps(context_status, sort_keys=True)
232
  )
 
 
 
 
233
  if capacity_state:
234
  extra_context_lines.append(
235
  "Queue capacity state: " + json.dumps(capacity_state, sort_keys=True)
@@ -572,16 +580,19 @@ def build_routing_text(ticket: dict) -> str:
572
  related_preview = ticket.get("related_ticket_preview") or {}
573
  last_tool_result = ticket.get("last_tool_result") or {}
574
  routing_options = ticket.get("routing_options") or []
 
575
  return " ".join(
576
  [
577
  ticket.get("title", ""),
578
  ticket.get("description", ""),
579
  ticket.get("ambiguity_note", ""),
580
  ticket.get("planning_note", ""),
 
581
  related_preview.get("title", ""),
582
  related_preview.get("description", ""),
583
  json.dumps(last_tool_result, sort_keys=True),
584
  json.dumps(routing_options, sort_keys=True),
 
585
  json.dumps(ticket.get("capacity_state") or {}, sort_keys=True),
586
  json.dumps(ticket.get("future_queue_demand") or {}, sort_keys=True),
587
  ]
@@ -909,9 +920,14 @@ def build_action(
909
  )
910
 
911
 
912
- def should_investigate(ticket: dict, history: list[dict[str, Any]]) -> tuple[bool, str | None]:
 
 
 
 
913
  if not ticket:
914
  return False, None
 
915
  context_status = ticket.get("context_status") or {}
916
  hidden_context_remaining = bool(context_status.get("hidden_context_remaining"))
917
  investigation_required = bool(context_status.get("investigation_required"))
@@ -945,6 +961,7 @@ def should_investigate(ticket: dict, history: list[dict[str, Any]]) -> tuple[boo
945
  tool_name
946
  for tool_name in context_status.get("recommended_tools", [])
947
  if tool_name not in used_tools
 
948
  ]
949
  if hidden_context_remaining and recommended_tools:
950
  return True, recommended_tools[0]
@@ -1018,6 +1035,8 @@ def should_investigate(ticket: dict, history: list[dict[str, Any]]) -> tuple[boo
1018
  )
1019
 
1020
  for tool_name in preferred_tools:
 
 
1021
  if tool_name not in used_tools:
1022
  return True, tool_name
1023
 
@@ -1026,6 +1045,39 @@ def should_investigate(ticket: dict, history: list[dict[str, Any]]) -> tuple[boo
1026
  return False, None
1027
 
1028
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1029
  def merge_ticket_context(ticket: dict, observation: Any) -> dict:
1030
  merged_ticket = dict(ticket)
1031
  if getattr(observation, "last_tool_result", None) is not None:
@@ -1033,6 +1085,7 @@ def merge_ticket_context(ticket: dict, observation: Any) -> dict:
1033
  merged_ticket["recent_history"] = list(getattr(observation, "history", []))
1034
  merged_ticket["queue_position"] = getattr(observation, "queue_position", None)
1035
  merged_ticket["tickets_remaining"] = getattr(observation, "tickets_remaining", None)
 
1036
  merged_ticket["investigation_budget_remaining"] = getattr(
1037
  observation,
1038
  "investigation_budget_remaining",
@@ -1040,6 +1093,10 @@ def merge_ticket_context(ticket: dict, observation: Any) -> dict:
1040
  )
1041
  merged_ticket["average_score_so_far"] = getattr(observation, "average_score_so_far", None)
1042
  merged_ticket["progress_fraction"] = getattr(observation, "progress_fraction", None)
 
 
 
 
1043
  merged_ticket["last_reward_components"] = dict(
1044
  getattr(observation, "last_reward_components", {}) or {}
1045
  )
@@ -1096,7 +1153,11 @@ def run() -> None:
1096
  break
1097
 
1098
  while getattr(obs, "investigation_budget_remaining", 0) > 0:
1099
- investigate, tool_name = should_investigate(ticket, obs.history)
 
 
 
 
1100
  if not investigate or tool_name is None:
1101
  break
1102
  tool_action = HelpdeskTicketAction(
@@ -1129,6 +1190,26 @@ def run() -> None:
1129
  break
1130
 
1131
  ticket_with_context = merge_ticket_context(ticket, obs)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1132
  action, action_source, fallback_reason = build_action(
1133
  ticket_with_context,
1134
  obs.allowed_fields,
 
196
  def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions: str) -> str:
197
  ambiguity_note = ticket.get("ambiguity_note")
198
  planning_note = ticket.get("planning_note")
199
+ customer_update_note = ticket.get("customer_update_note")
200
  related_preview = ticket.get("related_ticket_preview") or {}
201
  last_tool_result = ticket.get("last_tool_result")
202
  context_status = ticket.get("context_status") or {}
203
+ operational_context = ticket.get("operational_context") or {}
204
  recent_history = ticket.get("recent_history") or []
205
  feedback_summary = ticket.get("feedback_summary")
206
  last_reward_components = ticket.get("last_reward_components") or {}
 
215
  extra_context_lines.append(f"Ambiguity note: {ambiguity_note}")
216
  if planning_note:
217
  extra_context_lines.append(f"Planning note: {planning_note}")
218
+ if customer_update_note:
219
+ extra_context_lines.append(f"Customer update: {customer_update_note}")
220
  if related_preview:
221
  extra_context_lines.extend(
222
  [
 
234
  extra_context_lines.append(
235
  "Context status: " + json.dumps(context_status, sort_keys=True)
236
  )
237
+ if operational_context:
238
+ extra_context_lines.append(
239
+ "Operational context: " + json.dumps(operational_context, sort_keys=True)
240
+ )
241
  if capacity_state:
242
  extra_context_lines.append(
243
  "Queue capacity state: " + json.dumps(capacity_state, sort_keys=True)
 
580
  related_preview = ticket.get("related_ticket_preview") or {}
581
  last_tool_result = ticket.get("last_tool_result") or {}
582
  routing_options = ticket.get("routing_options") or []
583
+ operational_context = ticket.get("operational_context") or {}
584
  return " ".join(
585
  [
586
  ticket.get("title", ""),
587
  ticket.get("description", ""),
588
  ticket.get("ambiguity_note", ""),
589
  ticket.get("planning_note", ""),
590
+ ticket.get("customer_update_note", ""),
591
  related_preview.get("title", ""),
592
  related_preview.get("description", ""),
593
  json.dumps(last_tool_result, sort_keys=True),
594
  json.dumps(routing_options, sort_keys=True),
595
+ json.dumps(operational_context, sort_keys=True),
596
  json.dumps(ticket.get("capacity_state") or {}, sort_keys=True),
597
  json.dumps(ticket.get("future_queue_demand") or {}, sort_keys=True),
598
  ]
 
920
  )
921
 
922
 
+ def should_investigate(
+     ticket: dict,
+     history: list[dict[str, Any]],
+     available_tools: list[str] | None = None,
+ ) -> tuple[bool, str | None]:
      if not ticket:
          return False, None
+     available_tool_set = set(available_tools or [])
      context_status = ticket.get("context_status") or {}
      hidden_context_remaining = bool(context_status.get("hidden_context_remaining"))
      investigation_required = bool(context_status.get("investigation_required"))
 
961
  tool_name
962
  for tool_name in context_status.get("recommended_tools", [])
963
  if tool_name not in used_tools
964
+ and (not available_tool_set or tool_name in available_tool_set)
965
  ]
966
  if hidden_context_remaining and recommended_tools:
967
  return True, recommended_tools[0]
 
1035
  )
1036
 
1037
  for tool_name in preferred_tools:
1038
+ if available_tool_set and tool_name not in available_tool_set:
1039
+ continue
1040
  if tool_name not in used_tools:
1041
  return True, tool_name
1042
 
 
      return False, None


+ def choose_operational_action(
+     ticket: dict,
+     history: list[dict[str, Any]],
+     available_action_types: list[str] | None = None,
+ ) -> tuple[HelpdeskTicketAction | None, str | None]:
+     if not ticket:
+         return None, None
+     operational_context = ticket.get("operational_context") or {}
+     recommended_actions = list(operational_context.get("recommended_actions") or [])
+     available_action_set = set(available_action_types or [])
+     current_ticket_id = ticket.get("ticket_id")
+     prior_ticket_history = [
+         entry for entry in history if entry.get("ticket_id") == current_ticket_id
+     ]
+     used_action_types = {
+         entry.get("predicted", {}).get("action_type")
+         for entry in prior_ticket_history
+         if entry.get("predicted")
+     }
+
+     for action_name in ("open_incident", "request_info", "defer"):
+         if action_name not in recommended_actions:
+             continue
+         if available_action_set and action_name not in available_action_set:
+             continue
+         if action_name in used_action_types:
+             continue
+         if action_name == "defer" and ticket.get("tickets_after_current", 0) <= 0:
+             continue
+         return HelpdeskTicketAction(action_type=action_name), action_name
+     return None, None
+
+
  def merge_ticket_context(ticket: dict, observation: Any) -> dict:
      merged_ticket = dict(ticket)
      if getattr(observation, "last_tool_result", None) is not None:
 
1085
  merged_ticket["recent_history"] = list(getattr(observation, "history", []))
1086
  merged_ticket["queue_position"] = getattr(observation, "queue_position", None)
1087
  merged_ticket["tickets_remaining"] = getattr(observation, "tickets_remaining", None)
1088
+ merged_ticket["tickets_after_current"] = getattr(observation, "tickets_after_current", None)
1089
  merged_ticket["investigation_budget_remaining"] = getattr(
1090
  observation,
1091
  "investigation_budget_remaining",
 
1093
  )
1094
  merged_ticket["average_score_so_far"] = getattr(observation, "average_score_so_far", None)
1095
  merged_ticket["progress_fraction"] = getattr(observation, "progress_fraction", None)
1096
+ merged_ticket["available_tools"] = list(getattr(observation, "available_tools", []) or [])
1097
+ merged_ticket["available_action_types"] = list(
1098
+ getattr(observation, "available_action_types", []) or []
1099
+ )
1100
  merged_ticket["last_reward_components"] = dict(
1101
  getattr(observation, "last_reward_components", {}) or {}
1102
  )
 
1153
  break
1154
 
1155
  while getattr(obs, "investigation_budget_remaining", 0) > 0:
1156
+ investigate, tool_name = should_investigate(
1157
+ ticket,
1158
+ obs.history,
1159
+ list(getattr(obs, "available_tools", []) or []),
1160
+ )
1161
  if not investigate or tool_name is None:
1162
  break
1163
  tool_action = HelpdeskTicketAction(
 
1190
  break
1191
 
1192
  ticket_with_context = merge_ticket_context(ticket, obs)
1193
+ operational_action, operational_source = choose_operational_action(
1194
+ ticket_with_context,
1195
+ obs.history,
1196
+ list(getattr(obs, "available_action_types", []) or []),
1197
+ )
1198
+ if operational_action is not None and operational_source is not None:
1199
+ result = sync_client.step(operational_action)
1200
+ obs = result.observation
1201
+ step_num += 1
1202
+ reward = float(result.reward or 0.0)
1203
+ if result.reward is not None:
1204
+ task_step_rewards.append(reward)
1205
+ log_step(
1206
+ step=step_num,
1207
+ action=operational_action,
1208
+ reward=reward,
1209
+ done=bool(result.done),
1210
+ error=operational_source,
1211
+ )
1212
+ continue
1213
  action, action_source, fallback_reason = build_action(
1214
  ticket_with_context,
1215
  obs.allowed_fields,
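
Reviewer note: the selection order in `choose_operational_action` (incident first, then clarification, then deferral) can be exercised without the environment. The sketch below mirrors the logic in the diff but returns only the action name, so it does not need `HelpdeskTicketAction`. The function name `pick_operational_action` is hypothetical. One deliberate difference: in the diff, `merge_ticket_context` defaults `tickets_after_current` to `None`, so this sketch coerces a missing or `None` value to `0` before comparing. Whether the environment ever returns `None` there is not visible in this excerpt.

```python
from typing import Any, Optional

# Priority order as it appears in the diff.
OPERATIONAL_PRIORITY = ("open_incident", "request_info", "defer")


def pick_operational_action(
    ticket: dict[str, Any],
    history: list[dict[str, Any]],
    available_action_types: Optional[list[str]] = None,
) -> Optional[str]:
    """Return the first recommended, available, not-yet-used operational action."""
    if not ticket:
        return None
    context = ticket.get("operational_context") or {}
    recommended = set(context.get("recommended_actions") or [])
    available = set(available_action_types or [])
    ticket_id = ticket.get("ticket_id")
    # Action types already taken on this ticket, read from the history entries.
    used = {
        (entry.get("predicted") or {}).get("action_type")
        for entry in history
        if entry.get("ticket_id") == ticket_id
    }
    for name in OPERATIONAL_PRIORITY:
        if name not in recommended:
            continue
        # An empty availability list means "no restriction", as in the diff.
        if available and name not in available:
            continue
        if name in used:
            continue
        # Assumption: treat None as 0 so the comparison cannot raise TypeError.
        if name == "defer" and (ticket.get("tickets_after_current") or 0) <= 0:
            continue
        return name
    return None


ticket = {
    "ticket_id": "ticket-045",
    "operational_context": {"recommended_actions": ["defer", "open_incident"]},
    "tickets_after_current": 2,
}
first = pick_operational_action(ticket, [])
second = pick_operational_action(
    ticket,
    [{"ticket_id": "ticket-045", "predicted": {"action_type": "open_incident"}}],
)
```

Here `first` is `"open_incident"` even though `defer` is listed first in the recommendations, because the priority tuple decides the order. Once the incident has been opened, `second` falls through to `"defer"`.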
models.py CHANGED
@@ -16,7 +16,13 @@ ISSUE_TYPE_SET = set(ISSUE_TYPES)
16
  PRIORITY_SET = set(PRIORITIES)
17
  ASSIGNMENT_GROUP_SET = set(ASSIGNMENT_GROUPS)
18
  RESOLUTION_ACTION_SET = set(RESOLUTION_ACTIONS)
19
- ACTION_TYPE_SET = {"submit", "investigate"}
 
 
 
 
 
 
20
  TOOL_NAME_SET = {"lookup_related_ticket", "lookup_requester_history"}
21
  TOOL_NAME_SET.add("lookup_internal_routing_note")
22
  TOOL_NAME_SET.add("lookup_queue_capacity_forecast")
@@ -54,6 +60,9 @@ class HelpdeskTicketRecord(BaseModel):
54
  alternate_assignment_group: Optional[str] = None
55
  alternate_resolution_action: Optional[str] = None
56
  alternate_route_score_multiplier: float = 0.0
 
 
 
57
 
58
  @field_validator("issue_type")
59
  @classmethod
@@ -203,4 +212,16 @@ class HelpdeskTicketState(State):
203
  escalation_slots_remaining: int = 0
204
  planning_penalty_total: float = 0.0
205
  capacity_pressure_tickets_resolved: int = 0
 
 
 
 
 
 
 
 
 
 
 
 
206
  history_entries: list[dict] = Field(default_factory=list)
 
  PRIORITY_SET = set(PRIORITIES)
  ASSIGNMENT_GROUP_SET = set(ASSIGNMENT_GROUPS)
  RESOLUTION_ACTION_SET = set(RESOLUTION_ACTIONS)
+ ACTION_TYPE_SET = {
+     "submit",
+     "investigate",
+     "request_info",
+     "defer",
+     "open_incident",
+ }
  TOOL_NAME_SET = {"lookup_related_ticket", "lookup_requester_history"}
  TOOL_NAME_SET.add("lookup_internal_routing_note")
  TOOL_NAME_SET.add("lookup_queue_capacity_forecast")
 
      alternate_assignment_group: Optional[str] = None
      alternate_resolution_action: Optional[str] = None
      alternate_route_score_multiplier: float = 0.0
+     customer_update_note: Optional[str] = None
+     incident_recommended: bool = False
+     generated_from_ticket_id: Optional[str] = None

      @field_validator("issue_type")
      @classmethod
 
      escalation_slots_remaining: int = 0
      planning_penalty_total: float = 0.0
      capacity_pressure_tickets_resolved: int = 0
+     ticket_request_info_usage: dict[str, int] = Field(default_factory=dict)
+     ticket_defer_counts: dict[str, int] = Field(default_factory=dict)
+     open_incident_ticket_ids: list[str] = Field(default_factory=list)
+     incident_slots_initial: int = 0
+     incident_slots_remaining: int = 0
+     incident_actions_used: int = 0
+     incident_gap_total: float = 0.0
+     deferred_ticket_count: int = 0
+     sla_breach_count: int = 0
+     spawned_follow_up_ticket_ids: list[str] = Field(default_factory=list)
+     spawned_follow_up_source_ids: list[str] = Field(default_factory=list)
+     dynamic_queue_events: list[dict[str, Any]] = Field(default_factory=list)
      history_entries: list[dict] = Field(default_factory=list)
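
Reviewer note: the widened `ACTION_TYPE_SET` implies that any code checking `action_type` now has to accept five values instead of two. This is a minimal sketch of such a membership check. The helper `validate_action_type` is hypothetical: the repo's actual validation uses pydantic validators that are not shown in this hunk, so treat this only as an illustration of the set's role.

```python
# Values copied from the models.py hunk.
ACTION_TYPE_SET = {
    "submit",
    "investigate",
    "request_info",
    "defer",
    "open_incident",
}


def validate_action_type(action_type: str) -> str:
    """Return the action type unchanged if it is allowed, else raise ValueError."""
    if action_type not in ACTION_TYPE_SET:
        allowed = ", ".join(sorted(ACTION_TYPE_SET))
        raise ValueError(f"action_type must be one of: {allowed}; got {action_type!r}")
    return action_type
```

A client built against the previous two-value set would still pass this check, so the change is additive for existing `submit` and `investigate` callers.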
policy_learning.py CHANGED
@@ -244,8 +244,10 @@ def _routing_text(ticket: dict[str, Any]) -> str:
244
  str(ticket.get("description", "")),
245
  str(ticket.get("ambiguity_note", "")),
246
  str(ticket.get("planning_note", "")),
 
247
  json.dumps(ticket.get("last_tool_result") or {}, sort_keys=True),
248
  json.dumps(ticket.get("routing_options") or [], sort_keys=True),
 
249
  json.dumps(ticket.get("capacity_state") or {}, sort_keys=True),
250
  json.dumps(ticket.get("future_queue_demand") or {}, sort_keys=True),
251
  ]
@@ -265,6 +267,7 @@ def infer_ticket_cue(ticket: dict[str, Any]) -> str:
265
  if (
266
  ticket.get("planning_note")
267
  or ticket.get("routing_options")
 
268
  or "lookup_queue_capacity_forecast"
269
  in (context_status.get("recommended_tools") or [])
270
  or any(
@@ -316,6 +319,8 @@ def infer_ticket_cue(ticket: dict[str, Any]) -> str:
316
  for phrase in ("still", "again", "overdue", "legal", "priority")
317
  ):
318
  return "history_pressure"
 
 
319
  return "generic_hidden_context"
320
 
321
 
@@ -397,17 +402,54 @@ def select_cue_based_tool(
397
  *,
398
  hidden_context_remaining: bool,
399
  used_tools: set[str],
 
400
  ) -> str | None:
401
  preferred_tools = preferred_tool_order(
402
  ticket,
403
  hidden_context_remaining=hidden_context_remaining,
404
  )
 
405
  for tool_name in preferred_tools:
 
 
406
  if tool_name not in used_tools:
407
  return tool_name
408
  return None
409
 
410
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
411
  def choose_policy_action(
412
  policy: PolicyConfig,
413
  observation: HelpdeskTicketObservation,
@@ -425,6 +467,7 @@ def choose_policy_action(
425
  used_tools = set(used_tools_by_ticket.get(ticket_id, set()))
426
  context_status = ticket.get("context_status") or {}
427
  hidden_context_remaining = bool(context_status.get("hidden_context_remaining"))
 
428
 
429
  if ticket_investigations < policy.max_investigations_per_ticket:
430
  if policy.strategy == "adaptive" and adaptive_bandit is not None and hidden_context_remaining:
@@ -434,11 +477,13 @@ def choose_policy_action(
434
  ticket,
435
  hidden_context_remaining=hidden_context_remaining,
436
  )
437
- if tool_name not in used_tools
438
  ]
439
  if not candidate_tools:
440
  candidate_tools = [
441
- tool_name for tool_name in AVAILABLE_TOOLS if tool_name not in used_tools
 
 
442
  ]
443
  if candidate_tools:
444
  cue = infer_ticket_cue(ticket)
@@ -454,6 +499,7 @@ def choose_policy_action(
454
  ticket,
455
  hidden_context_remaining=hidden_context_remaining,
456
  used_tools=used_tools,
 
457
  )
458
  if tool_name is not None:
459
  return (
@@ -492,6 +538,14 @@ def choose_policy_action(
492
  infer_ticket_cue(ticket),
493
  )
494
 
 
 
 
 
 
 
 
 
495
  return submit_builder(ticket, list(observation.allowed_fields)), "submit", None
496
 
497
 
 
244
  str(ticket.get("description", "")),
245
  str(ticket.get("ambiguity_note", "")),
246
  str(ticket.get("planning_note", "")),
247
+ str(ticket.get("customer_update_note", "")),
248
  json.dumps(ticket.get("last_tool_result") or {}, sort_keys=True),
249
  json.dumps(ticket.get("routing_options") or [], sort_keys=True),
250
+ json.dumps(ticket.get("operational_context") or {}, sort_keys=True),
251
  json.dumps(ticket.get("capacity_state") or {}, sort_keys=True),
252
  json.dumps(ticket.get("future_queue_demand") or {}, sort_keys=True),
253
  ]
 
267
  if (
268
  ticket.get("planning_note")
269
  or ticket.get("routing_options")
270
+ or (ticket.get("operational_context") or {}).get("incident_recommended")
271
  or "lookup_queue_capacity_forecast"
272
  in (context_status.get("recommended_tools") or [])
273
  or any(
 
319
  for phrase in ("still", "again", "overdue", "legal", "priority")
320
  ):
321
  return "history_pressure"
322
+ if any(phrase in text for phrase in ("incident", "outage", "lockout", "company-wide")):
323
+ return "incident_pressure"
324
  return "generic_hidden_context"
325
 
326
 
 
402
  *,
403
  hidden_context_remaining: bool,
404
  used_tools: set[str],
405
+ available_tools: set[str] | None = None,
406
  ) -> str | None:
407
  preferred_tools = preferred_tool_order(
408
  ticket,
409
  hidden_context_remaining=hidden_context_remaining,
410
  )
411
+ available_tool_set = set(available_tools or [])
412
  for tool_name in preferred_tools:
413
+ if available_tool_set and tool_name not in available_tool_set:
414
+ continue
415
  if tool_name not in used_tools:
416
  return tool_name
417
  return None
418
 
419
 
420
+ def choose_operational_action(
421
+ ticket: dict[str, Any],
422
+ history: list[dict[str, Any]],
423
+ available_action_types: list[str] | None = None,
424
+ ) -> tuple[HelpdeskTicketAction | None, str | None]:
425
+ if not ticket:
426
+ return None, None
427
+ operational_context = ticket.get("operational_context") or {}
428
+ recommended_actions = list(operational_context.get("recommended_actions") or [])
429
+ available_action_set = set(available_action_types or [])
430
+ current_ticket_id = str(ticket.get("ticket_id", ""))
431
+ prior_ticket_history = [
432
+ entry for entry in history if entry.get("ticket_id") == current_ticket_id
433
+ ]
434
+ used_action_types = {
435
+ entry.get("predicted", {}).get("action_type")
436
+ for entry in prior_ticket_history
437
+ if entry.get("predicted")
438
+ }
439
+
440
+ for action_name in ("open_incident", "request_info", "defer"):
441
+ if action_name not in recommended_actions:
442
+ continue
443
+ if available_action_set and action_name not in available_action_set:
444
+ continue
445
+ if action_name in used_action_types:
446
+ continue
447
+ if action_name == "defer" and not ticket.get("tickets_after_current", 0):
448
+ continue
449
+ return HelpdeskTicketAction(action_type=action_name), action_name
450
+ return None, None
451
+
452
+
453
  def choose_policy_action(
454
  policy: PolicyConfig,
455
  observation: HelpdeskTicketObservation,
 
467
  used_tools = set(used_tools_by_ticket.get(ticket_id, set()))
468
  context_status = ticket.get("context_status") or {}
469
  hidden_context_remaining = bool(context_status.get("hidden_context_remaining"))
470
+ available_tools = set(getattr(observation, "available_tools", []) or [])
471
 
472
  if ticket_investigations < policy.max_investigations_per_ticket:
473
  if policy.strategy == "adaptive" and adaptive_bandit is not None and hidden_context_remaining:
 
477
  ticket,
478
  hidden_context_remaining=hidden_context_remaining,
479
  )
480
+ if tool_name not in used_tools and tool_name in available_tools
481
  ]
482
  if not candidate_tools:
483
  candidate_tools = [
484
+ tool_name
485
+ for tool_name in AVAILABLE_TOOLS
486
+ if tool_name not in used_tools and tool_name in available_tools
487
  ]
488
  if candidate_tools:
489
  cue = infer_ticket_cue(ticket)
 
499
  ticket,
500
  hidden_context_remaining=hidden_context_remaining,
501
  used_tools=used_tools,
502
+ available_tools=available_tools,
503
  )
504
  if tool_name is not None:
505
  return (
 
538
  infer_ticket_cue(ticket),
539
  )
540
 
541
+ operational_action, operational_source = choose_operational_action(
542
+ ticket,
543
+ list(getattr(observation, "history", []) or []),
544
+ list(getattr(observation, "available_action_types", []) or []),
545
+ )
546
+ if operational_action is not None and operational_source is not None:
547
+ return operational_action, operational_source, infer_ticket_cue(ticket)
548
+
549
  return submit_builder(ticket, list(observation.allowed_fields)), "submit", None
550
 
551
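
The selection order that `choose_operational_action` applies in this diff can be sketched in isolation. This is a minimal illustration, not the repository's implementation: it returns the action name as a plain string instead of a `HelpdeskTicketAction`, and the names `pick_operational_action` and `OPERATIONAL_PRIORITY` are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Priority order taken from the diff: incidents first, then info requests, then deferral.
OPERATIONAL_PRIORITY = ("open_incident", "request_info", "defer")


def pick_operational_action(
    ticket: dict[str, Any],
    history: list[dict[str, Any]],
    available_action_types: list[str] | None = None,
) -> str | None:
    """Return the first recommended, allowed, not-yet-used operational action."""
    if not ticket:
        return None
    context = ticket.get("operational_context") or {}
    recommended = set(context.get("recommended_actions") or [])
    available = set(available_action_types or [])
    ticket_id = str(ticket.get("ticket_id", ""))
    # Action types already attempted on this ticket, read from the episode history.
    used = {
        entry["predicted"].get("action_type")
        for entry in history
        if entry.get("ticket_id") == ticket_id and entry.get("predicted")
    }
    for name in OPERATIONAL_PRIORITY:
        if name not in recommended:
            continue
        # An empty availability set is treated as "no restriction", as in the diff.
        if available and name not in available:
            continue
        if name in used:
            continue
        # Deferring only makes sense when other tickets remain in the queue.
        if name == "defer" and not ticket.get("tickets_after_current", 0):
            continue
        return name
    return None


ticket = {
    "ticket_id": "ticket-021",
    "operational_context": {"recommended_actions": ["defer", "request_info"]},
    "tickets_after_current": 0,
}
allowed = ["submit", "request_info", "defer"]
first = pick_operational_action(ticket, [], allowed)
second = pick_operational_action(
    ticket,
    [{"ticket_id": "ticket-021", "predicted": {"action_type": "request_info"}}],
    allowed,
)
```

With this sample ticket, the first call selects `request_info`; once that action appears in the history, the second call returns `None` because `defer` is skipped when no tickets remain after the current one.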
 
server/environment.py CHANGED
@@ -26,17 +26,37 @@ from vocabulary import (
26
 
27
 
28
  QUEUE_SIZE_RANGE = (3, 5)
29
- AVAILABLE_ACTION_TYPES = ("submit", "investigate")
30
- AVAILABLE_TOOLS = (
31
  "lookup_related_ticket",
32
  "lookup_requester_history",
33
  "lookup_internal_routing_note",
34
  "lookup_queue_capacity_forecast",
35
  )
36
  FREE_INVESTIGATIONS_PER_TICKET = 1
37
  EXTRA_INVESTIGATION_COST = 0.04
38
  MAX_EXTRA_INVESTIGATION_PENALTY = 0.25
39
  USEFUL_INVESTIGATION_REWARD = 0.03
 
 
 
40
  PREMATURE_SUBMIT_PENALTY = 0.22
41
  NONDEFAULT_HIDDEN_CONTEXT_PENALTY = 0.08
42
  CONTEXT_COMPLETION_BONUS = 0.06
@@ -49,6 +69,11 @@ TEAM_CAPACITY_OVERFLOW_PENALTY = 0.08
49
  HIGH_PRIORITY_SLOT_OVERFLOW_PENALTY = 0.06
50
  ESCALATION_SLOT_OVERFLOW_PENALTY = 0.05
51
  PLANNING_SUCCESS_BONUS = 0.05
52
 
53
  TASK3_INVESTIGATION_TOOL_PLAN: dict[str, tuple[str, ...]] = {
54
  "ticket-021": ("lookup_related_ticket", "lookup_requester_history"),
@@ -170,6 +195,7 @@ class HelpdeskTicketRoutingEnvironment(
170
  team_capacity_initial,
171
  high_priority_slots_initial,
172
  escalation_slots_initial,
 
173
  ) = self._initial_capacity_state_for_queue(task_id)
174
 
175
  self._state = HelpdeskTicketState(
@@ -193,8 +219,20 @@ class HelpdeskTicketRoutingEnvironment(
193
  high_priority_slots_remaining=high_priority_slots_initial,
194
  escalation_slots_initial=escalation_slots_initial,
195
  escalation_slots_remaining=escalation_slots_initial,
 
 
196
  planning_penalty_total=0.0,
197
  capacity_pressure_tickets_resolved=0,
198
  )
199
 
200
  return self._build_observation(task)
@@ -215,9 +253,19 @@ class HelpdeskTicketRoutingEnvironment(
215
  current_ticket = self._queue[idx]
216
  task_id = self._state.current_task_id
217
  task = get_task_definition(task_id)
218
 
219
  if action.action_type == "investigate":
220
  return self._handle_investigation_action(task, current_ticket, action, idx)
221
 
222
  submitted_fields = {
223
  f
@@ -317,6 +365,7 @@ class HelpdeskTicketRoutingEnvironment(
317
  action,
318
  task_id=task_id,
319
  )
 
320
  capacity_penalty, capacity_details = self._apply_capacity_usage(
321
  current_ticket,
322
  action,
@@ -353,7 +402,7 @@ class HelpdeskTicketRoutingEnvironment(
353
  trajectory_reward - self._state.planning_penalty_total
354
  )
355
  final_reward = clamp_open_unit_interval(
356
- rubric_reward - context_penalty - capacity_penalty
357
  )
358
  self._state.total_reward = rubric_reward
359
  investigation_penalty = self._compute_episode_penalty()
@@ -363,7 +412,31 @@ class HelpdeskTicketRoutingEnvironment(
363
  self._state.step_count += 1
364
  self._state.current_ticket_index += 1
365
  final_reward = clamp_open_unit_interval(
366
- step_reward - context_penalty - capacity_penalty
367
  )
368
 
369
  reward_components = self._build_reward_components(
@@ -379,6 +452,7 @@ class HelpdeskTicketRoutingEnvironment(
379
  "context_gap_penalty": context_penalty,
380
  "context_completion_bonus": process_bonus,
381
  "risk_penalty": risk_penalty,
 
382
  "capacity_penalty": capacity_penalty,
383
  "delta_adjustment": step_adjustments["delta_adjustment"],
384
  "required_investigation_count": len(self._required_tools_for_ticket(current_ticket)),
@@ -391,6 +465,7 @@ class HelpdeskTicketRoutingEnvironment(
391
  "planning_success_bonus": self._planning_success_bonus()
392
  if is_done
393
  else 0.0,
 
394
  "rubric_reward": rubric_reward,
395
  "trajectory_average_reward": (
396
  trajectory_components["average_reward"]
@@ -457,6 +532,10 @@ class HelpdeskTicketRoutingEnvironment(
457
 
458
  def _apply_episode_economics(self, base_reward: float) -> float:
459
  penalty = self._compute_episode_penalty()
460
  return clamp_open_unit_interval(base_reward - penalty)
461
 
462
  def _current_average_score(self) -> float:
@@ -464,6 +543,17 @@ class HelpdeskTicketRoutingEnvironment(
464
  return 0.0
465
  return sum(self._state.per_ticket_scores) / len(self._state.per_ticket_scores)
466

467
  def _ticket_has_alternate_route(self, ticket: HelpdeskTicketRecord) -> bool:
468
  return any(
469
  value is not None
@@ -552,9 +642,9 @@ class HelpdeskTicketRoutingEnvironment(
552
  def _initial_capacity_state_for_queue(
553
  self,
554
  task_id: int,
555
- ) -> tuple[dict[str, int], int, int]:
556
  if task_id != 3:
557
- return {}, 0, 0
558
 
559
  primary_group_demand: dict[str, int] = {}
560
  alternate_relief_by_group: dict[str, int] = {}
@@ -563,6 +653,7 @@ class HelpdeskTicketRoutingEnvironment(
563
  high_priority_relief = 0
564
  escalation_demand = 0
565
  escalation_relief = 0
 
566
 
567
  for ticket in self._queue:
568
  primary_route = self._route_for_ticket(ticket)
@@ -574,6 +665,8 @@ class HelpdeskTicketRoutingEnvironment(
574
  high_priority_demand += 1
575
  if primary_route["resolution_action"] in {"assign", "escalate"}:
576
  escalation_demand += 1
 
 
577
 
578
  if self._ticket_has_alternate_route(ticket):
579
  alternate_route = self._route_for_ticket(ticket, use_alternate=True)
@@ -622,10 +715,16 @@ class HelpdeskTicketRoutingEnvironment(
622
  else:
623
  escalation_slots_initial = escalation_demand
624

625
  return (
626
  team_capacity_initial,
627
  high_priority_slots_initial,
628
  escalation_slots_initial,
 
629
  )
630
 
631
  def _future_queue_demand(self) -> dict[str, Any]:
@@ -634,6 +733,7 @@ class HelpdeskTicketRoutingEnvironment(
634
  high_priority_needed = 0
635
  escalation_needed = 0
636
  capacity_sensitive_tickets = 0
 
637
 
638
  for ticket in future_tickets:
639
  route = self._route_for_ticket(ticket)
@@ -646,6 +746,8 @@ class HelpdeskTicketRoutingEnvironment(
646
  escalation_needed += 1
647
  if self._ticket_has_alternate_route(ticket):
648
  capacity_sensitive_tickets += 1
 
 
649
 
650
  return {
651
  "remaining_ticket_count": len(future_tickets),
@@ -653,6 +755,7 @@ class HelpdeskTicketRoutingEnvironment(
653
  "high_priority_needed": high_priority_needed,
654
  "escalation_needed": escalation_needed,
655
  "capacity_sensitive_tickets": capacity_sensitive_tickets,
 
656
  }
657
 
658
  def _capacity_state_snapshot(self) -> dict[str, Any]:
@@ -663,6 +766,8 @@ class HelpdeskTicketRoutingEnvironment(
663
  "high_priority_slots_initial": self._state.high_priority_slots_initial,
664
  "escalation_slots_remaining": self._state.escalation_slots_remaining,
665
  "escalation_slots_initial": self._state.escalation_slots_initial,
 
 
666
  }
667
 
668
  def _planning_route_recommendation(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
@@ -897,8 +1002,191 @@ class HelpdeskTicketRoutingEnvironment(
897
  )
898
  )
899

900
  def _ticket_repeated_requester_count(self, ticket: HelpdeskTicketRecord) -> int:
901
- return sum(1 for candidate in self._dataset if candidate.requester == ticket.requester)
902
 
903
  def _tool_has_available_context(
904
  self,
@@ -927,7 +1215,7 @@ class HelpdeskTicketRoutingEnvironment(
927
  task_id: int | None = None,
928
  ) -> list[str]:
929
  resolved_task_id = self._state.current_task_id if task_id is None else task_id
930
- if resolved_task_id != 3:
931
  return []
932
  required_tools: list[str] = list(TASK3_INVESTIGATION_TOOL_PLAN.get(ticket.ticket_id, ()))
933
  if ticket.related_ticket_id is not None and "lookup_related_ticket" not in required_tools:
@@ -949,18 +1237,51 @@ class HelpdeskTicketRoutingEnvironment(
949
  ):
950
  required_tools.append("lookup_requester_history")
951
  if (
952
- self._ticket_is_capacity_sensitive(ticket)
 
953
  and "lookup_queue_capacity_forecast" not in required_tools
954
  ):
955
  required_tools.append("lookup_queue_capacity_forecast")
956
  filtered_required_tools: list[str] = []
 
957
  for tool_name in required_tools:
958
  if tool_name in filtered_required_tools:
959
  continue
 
 
960
  if self._tool_has_available_context(ticket, tool_name):
961
  filtered_required_tools.append(tool_name)
962
  return filtered_required_tools
963

964
  def _used_tools_for_ticket(self, ticket_id: str) -> list[str]:
965
  return list(self._state.ticket_tool_usage.get(ticket_id, []))
966
 
@@ -983,6 +1304,8 @@ class HelpdeskTicketRoutingEnvironment(
983
  revealed_tools = self._used_tools_for_ticket(ticket.ticket_id)
984
  remaining_tools = self._remaining_tools_for_ticket(ticket)
985
  total_required = max(1, len(required_tools))
 
 
986
  return {
987
  "required_tools": required_tools,
988
  "revealed_tools": revealed_tools,
@@ -990,6 +1313,8 @@ class HelpdeskTicketRoutingEnvironment(
990
  "revealed_count": len(revealed_tools),
991
  "remaining_count": len(remaining_tools),
992
  "completeness": round(len(revealed_tools) / total_required, 2),
 
 
993
  }
994
 
995
  def _default_redacted_description(self, ticket: HelpdeskTicketRecord) -> str:
@@ -1028,7 +1353,7 @@ class HelpdeskTicketRoutingEnvironment(
1028
  return "Helpdesk routing decision"
1029
 
1030
  def _visible_title(self, ticket: HelpdeskTicketRecord) -> str:
1031
- if self._state.current_task_id == 3 and self._remaining_tools_for_ticket(ticket):
1032
  return HARD_TASK_TITLE_REDACTIONS.get(
1033
  ticket.ticket_id,
1034
  self._default_redacted_title(ticket),
@@ -1036,7 +1361,7 @@ class HelpdeskTicketRoutingEnvironment(
1036
  return ticket.title
1037
 
1038
  def _visible_description(self, ticket: HelpdeskTicketRecord) -> str:
1039
- if self._state.current_task_id == 3 and self._remaining_tools_for_ticket(ticket):
1040
  return HARD_TASK_DESCRIPTION_REDACTIONS.get(
1041
  ticket.ticket_id,
1042
  self._default_redacted_description(ticket),
@@ -1122,6 +1447,21 @@ class HelpdeskTicketRoutingEnvironment(
1122
 
1123
  return round(priority_penalty + resolution_penalty, 4)
1124

1125
  def _build_reward_components(
1126
  self,
1127
  *,
@@ -1198,7 +1538,7 @@ class HelpdeskTicketRoutingEnvironment(
1198
  "assignment_group": ticket.assignment_group,
1199
  "resolution_action": ticket.resolution_action,
1200
  }
1201
- for ticket in self._dataset
1202
  if ticket.requester == current_ticket.requester
1203
  and ticket.ticket_id != current_ticket.ticket_id
1204
  ]
@@ -1235,6 +1575,7 @@ class HelpdeskTicketRoutingEnvironment(
1235
  "capacity_state": recommendation["capacity_state"],
1236
  "future_queue_demand": recommendation["future_demand"],
1237
  "routing_options": routing_options,
 
1238
  }
1239
 
1240
  def _run_investigation_tool(
@@ -1262,6 +1603,8 @@ class HelpdeskTicketRoutingEnvironment(
1262
  ) -> HelpdeskTicketObservation:
1263
  if action.tool_name is None:
1264
  raise ValueError("Investigate actions require tool_name")
 
 
1265
  submitted_fields = {
1266
  field
1267
  for field in ("issue_type", "priority", "assignment_group", "resolution_action")
@@ -1332,10 +1675,279 @@ class HelpdeskTicketRoutingEnvironment(
1332
  self._state.last_reward_components = reward_components
1333
  return self._build_observation(task, done=False, reward=investigation_reward)
1334

1335
  def _build_ticket_view(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
1336
  progress = self._tool_progress_for_ticket(ticket)
1337
  remaining_tools = progress["remaining_tools"]
1338
  used_tools = set(self._used_tools_for_ticket(ticket.ticket_id))
 
1339
  ticket_view: dict[str, Any] = {
1340
  "ticket_id": ticket.ticket_id,
1341
  "title": self._visible_title(ticket),
@@ -1354,6 +1966,14 @@ class HelpdeskTicketRoutingEnvironment(
1354
  "investigations_used_for_ticket": progress["revealed_count"],
1355
  "recommended_tools": list(remaining_tools),
1356
  }
1357
  if ticket.ambiguity_note is not None and "lookup_internal_routing_note" not in remaining_tools:
1358
  ticket_view["ambiguity_note"] = ticket.ambiguity_note
1359
  if (
@@ -1361,6 +1981,8 @@ class HelpdeskTicketRoutingEnvironment(
1361
  and "lookup_internal_routing_note" not in remaining_tools
1362
  ):
1363
  ticket_view["planning_note"] = ticket.planning_note
 
 
1364
  if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
1365
  ticket_view["related_ticket_id"] = ticket.related_ticket_id
1366
  related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
@@ -1376,6 +1998,8 @@ class HelpdeskTicketRoutingEnvironment(
1376
  or "lookup_queue_capacity_forecast" in used_tools
1377
  ):
1378
  ticket_view["routing_options"] = self._routing_options_for_ticket(ticket)
 
 
1379
  return ticket_view
1380
 
1381
  def _build_feedback_summary(
@@ -1398,6 +2022,13 @@ class HelpdeskTicketRoutingEnvironment(
1398
  parts.append(f"Investigation step used {tool_name or 'a tool'}")
1399
  if reward_components and reward_components.get("new_context_revealed"):
1400
  parts.append("new context was revealed")
1401
  elif penalty_reason is not None:
1402
  parts.append(f"Penalty applied: {penalty_reason}")
1403
  else:
@@ -1435,6 +2066,12 @@ class HelpdeskTicketRoutingEnvironment(
1435
  planning_penalty_total = reward_components.get("planning_penalty_total")
1436
  if planning_penalty_total:
1437
  parts.append(f"planning_penalty_total={planning_penalty_total:.2f}")
1438
 
1439
  return "; ".join(parts)
1440
 
@@ -1463,6 +2100,12 @@ class HelpdeskTicketRoutingEnvironment(
1463
  "score": score,
1464
  "breakdown": breakdown,
1465
  "queue_position": queue_position,
1466
  }
1467
  if self._state.current_task_id == 3:
1468
  history_entry["capacity_state"] = self._capacity_state_snapshot()
@@ -1479,6 +2122,8 @@ class HelpdeskTicketRoutingEnvironment(
1479
  and "lookup_internal_routing_note" not in remaining_tools
1480
  ):
1481
  history_entry["planning_note"] = ticket.planning_note
 
 
1482
  if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
1483
  history_entry["related_ticket_id"] = ticket.related_ticket_id
1484
  related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
@@ -1503,6 +2148,8 @@ class HelpdeskTicketRoutingEnvironment(
1503
  history_entry["tool_result"] = tool_result
1504
  if reward_components is not None:
1505
  history_entry["reward_components"] = reward_components
 
 
1506
  if progress["required_tools"]:
1507
  history_entry["context_progress"] = {
1508
  "hidden_context_remaining": bool(progress["remaining_count"]),
@@ -1562,12 +2209,15 @@ class HelpdeskTicketRoutingEnvironment(
1562
  and (ticket_view.get("context_status") or {}).get("hidden_context_remaining")
1563
  ),
1564
  "action_mode": "investigate_or_submit",
1565
- "available_action_types": list(AVAILABLE_ACTION_TYPES),
1566
  "average_score_so_far": self._state.average_score_so_far,
1567
  "progress_fraction": progress_fraction,
1568
  "investigation_penalty_applied": self._state.investigation_penalty_applied,
1569
  "planning_penalty_total": self._state.planning_penalty_total,
1570
  "planning_penalty_applied": self._state.planning_penalty_applied,
 
 
 
1571
  }
1572
  if self._state.current_task_id == 3:
1573
  metadata["capacity_state"] = self._capacity_state_snapshot()
@@ -1591,8 +2241,8 @@ class HelpdeskTicketRoutingEnvironment(
1591
  task_name=task["name"],
1592
  instructions=task["instructions"],
1593
  allowed_fields=list(task["allowed_fields"]),
1594
- available_action_types=list(AVAILABLE_ACTION_TYPES),
1595
- available_tools=list(AVAILABLE_TOOLS),
1596
  investigation_budget_remaining=self._state.investigation_budget_remaining,
1597
  last_tool_result=self._state.last_tool_result,
1598
  current_ticket=ticket_view,
 
26
 
27
 
28
  QUEUE_SIZE_RANGE = (3, 5)
29
+ BASE_AVAILABLE_TOOLS = (
 
30
  "lookup_related_ticket",
31
  "lookup_requester_history",
32
  "lookup_internal_routing_note",
33
  "lookup_queue_capacity_forecast",
34
  )
35
+ TASK_AVAILABLE_ACTION_TYPES: dict[int, tuple[str, ...]] = {
36
+ 1: ("submit", "investigate"),
37
+ 2: ("submit", "investigate", "request_info"),
38
+ 3: ("submit", "investigate", "request_info", "defer", "open_incident"),
39
+ }
40
+ TASK_AVAILABLE_TOOLS: dict[int, tuple[str, ...]] = {
41
+ 1: (
42
+ "lookup_related_ticket",
43
+ "lookup_requester_history",
44
+ "lookup_internal_routing_note",
45
+ ),
46
+ 2: (
47
+ "lookup_related_ticket",
48
+ "lookup_requester_history",
49
+ "lookup_internal_routing_note",
50
+ ),
51
+ 3: BASE_AVAILABLE_TOOLS,
52
+ }
53
  FREE_INVESTIGATIONS_PER_TICKET = 1
54
  EXTRA_INVESTIGATION_COST = 0.04
55
  MAX_EXTRA_INVESTIGATION_PENALTY = 0.25
56
  USEFUL_INVESTIGATION_REWARD = 0.03
57
+ USEFUL_REQUEST_INFO_REWARD = 0.025
58
+ INCIDENT_OPEN_REWARD = 0.03
59
+ REQUEST_INFO_CONTEXT_COMPLETION_BONUS = 0.02
60
  PREMATURE_SUBMIT_PENALTY = 0.22
61
  NONDEFAULT_HIDDEN_CONTEXT_PENALTY = 0.08
62
  CONTEXT_COMPLETION_BONUS = 0.06
 
69
  HIGH_PRIORITY_SLOT_OVERFLOW_PENALTY = 0.06
70
  ESCALATION_SLOT_OVERFLOW_PENALTY = 0.05
71
  PLANNING_SUCCESS_BONUS = 0.05
72
+ INCIDENT_SLOT_OVERFLOW_PENALTY = 0.05
73
+ INCIDENT_GAP_PENALTY = 0.07
74
+ SLA_BREACH_PENALTY = 0.04
75
+ FOLLOW_UP_SPAWN_THRESHOLD = 0.72
76
+ MAX_DEFERS_PER_TICKET = 1
77
 
78
  TASK3_INVESTIGATION_TOOL_PLAN: dict[str, tuple[str, ...]] = {
79
  "ticket-021": ("lookup_related_ticket", "lookup_requester_history"),
 
195
  team_capacity_initial,
196
  high_priority_slots_initial,
197
  escalation_slots_initial,
198
+ incident_slots_initial,
199
  ) = self._initial_capacity_state_for_queue(task_id)
200
 
201
  self._state = HelpdeskTicketState(
 
219
  high_priority_slots_remaining=high_priority_slots_initial,
220
  escalation_slots_initial=escalation_slots_initial,
221
  escalation_slots_remaining=escalation_slots_initial,
222
+ incident_slots_initial=incident_slots_initial,
223
+ incident_slots_remaining=incident_slots_initial,
224
  planning_penalty_total=0.0,
225
  capacity_pressure_tickets_resolved=0,
226
+ ticket_request_info_usage={},
227
+ ticket_defer_counts={},
228
+ open_incident_ticket_ids=[],
229
+ incident_actions_used=0,
230
+ incident_gap_total=0.0,
231
+ deferred_ticket_count=0,
232
+ sla_breach_count=0,
233
+ spawned_follow_up_ticket_ids=[],
234
+ spawned_follow_up_source_ids=[],
235
+ dynamic_queue_events=[],
236
  )
237
 
238
  return self._build_observation(task)
 
253
  current_ticket = self._queue[idx]
254
  task_id = self._state.current_task_id
255
  task = get_task_definition(task_id)
256
+ if action.action_type not in self._available_action_types_for_task(task_id):
257
+ raise ValueError(
258
+ f"Unsupported action_type {action.action_type!r} for task {task_id}"
259
+ )
260
 
261
  if action.action_type == "investigate":
262
  return self._handle_investigation_action(task, current_ticket, action, idx)
263
+ if action.action_type == "request_info":
264
+ return self._handle_request_info_action(task, current_ticket, action, idx)
265
+ if action.action_type == "defer":
266
+ return self._handle_defer_action(task, current_ticket, action, idx)
267
+ if action.action_type == "open_incident":
268
+ return self._handle_open_incident_action(task, current_ticket, action, idx)
269
 
270
  submitted_fields = {
271
  f
 
365
  action,
366
  task_id=task_id,
367
  )
368
+ incident_gap_penalty = self._incident_gap_penalty(current_ticket, action)
369
  capacity_penalty, capacity_details = self._apply_capacity_usage(
370
  current_ticket,
371
  action,
 
402
  trajectory_reward - self._state.planning_penalty_total
403
  )
404
  final_reward = clamp_open_unit_interval(
405
+ rubric_reward - context_penalty - capacity_penalty - incident_gap_penalty
406
  )
407
  self._state.total_reward = rubric_reward
408
  investigation_penalty = self._compute_episode_penalty()
 
412
  self._state.step_count += 1
413
  self._state.current_ticket_index += 1
414
  final_reward = clamp_open_unit_interval(
415
+ step_reward - context_penalty - capacity_penalty - incident_gap_penalty
416
+ )
417
+
418
+ spawned_follow_up_ticket_id = None
419
+ if self._should_spawn_follow_up(
420
+ current_ticket,
421
+ score=score,
422
+ context_penalty=context_penalty,
423
+ incident_gap_penalty=incident_gap_penalty,
424
+ ):
425
+ spawned_follow_up = self._spawn_follow_up_ticket(current_ticket)
426
+ spawned_follow_up_ticket_id = spawned_follow_up.ticket_id
427
+ if is_done:
428
+ is_done = False
429
+ trajectory_reward = None
430
+ trajectory_components = None
431
+ rubric_reward = None
432
+ final_reward = clamp_open_unit_interval(
433
+ step_reward - context_penalty - capacity_penalty - incident_gap_penalty
434
+ )
435
+ self._state.total_reward = 0.0
436
+ if incident_gap_penalty > 0.0:
437
+ self._state.incident_gap_total = round(
438
+ self._state.incident_gap_total + incident_gap_penalty,
439
+ 4,
440
  )
441
 
442
  reward_components = self._build_reward_components(
 
452
  "context_gap_penalty": context_penalty,
453
  "context_completion_bonus": process_bonus,
454
  "risk_penalty": risk_penalty,
455
+ "incident_gap_penalty": incident_gap_penalty,
456
  "capacity_penalty": capacity_penalty,
457
  "delta_adjustment": step_adjustments["delta_adjustment"],
458
  "required_investigation_count": len(self._required_tools_for_ticket(current_ticket)),
 
465
  "planning_success_bonus": self._planning_success_bonus()
466
  if is_done
467
  else 0.0,
468
+ "spawned_follow_up_ticket_id": spawned_follow_up_ticket_id,
469
  "rubric_reward": rubric_reward,
470
  "trajectory_average_reward": (
471
  trajectory_components["average_reward"]
 
532
 
533
  def _apply_episode_economics(self, base_reward: float) -> float:
534
  penalty = self._compute_episode_penalty()
535
+ penalty += min(
536
+ 0.25,
537
+ self._state.sla_breach_count * SLA_BREACH_PENALTY + self._state.incident_gap_total,
538
+ )
539
  return clamp_open_unit_interval(base_reward - penalty)
540
 
541
  def _current_average_score(self) -> float:
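
The extra term this hunk adds to `_apply_episode_economics` can be worked through on its own. The sketch below covers only the capped SLA and incident-gap addition; it leaves out `_compute_episode_penalty()` and `clamp_open_unit_interval`, whose bodies are not shown here. `SLA_BREACH_PENALTY` and the `0.25` cap come from the diff, while the names `operational_penalty` and `OPERATIONAL_PENALTY_CAP` are hypothetical.

```python
SLA_BREACH_PENALTY = 0.04  # constant added in this commit
OPERATIONAL_PENALTY_CAP = 0.25  # literal 0.25 in the diff; the name is an assumption


def operational_penalty(sla_breach_count: int, incident_gap_total: float) -> float:
    """Capped penalty added on top of the episode investigation penalty."""
    raw = sla_breach_count * SLA_BREACH_PENALTY + incident_gap_total
    return min(OPERATIONAL_PENALTY_CAP, raw)


# Two SLA breaches plus one incident gap of 0.07 stay under the cap.
below_cap = operational_penalty(2, 0.07)
# Five breaches plus two incident gaps would sum to 0.34, so the cap applies.
at_cap = operational_penalty(5, 0.14)
```

Under these assumptions the first case comes to about 0.15 and the second is held at 0.25, so repeated SLA breaches and incident gaps cannot remove more than a quarter of the episode reward through this term.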
 
543
  return 0.0
544
  return sum(self._state.per_ticket_scores) / len(self._state.per_ticket_scores)
545
 
546
+ def _available_action_types_for_task(self, task_id: int | None = None) -> list[str]:
547
+ resolved_task_id = self._state.current_task_id if task_id is None else task_id
548
+ return list(TASK_AVAILABLE_ACTION_TYPES.get(int(resolved_task_id or 1), ("submit",)))
549
+
550
+ def _available_tools_for_task(self, task_id: int | None = None) -> list[str]:
551
+ resolved_task_id = self._state.current_task_id if task_id is None else task_id
552
+ return list(TASK_AVAILABLE_TOOLS.get(int(resolved_task_id or 1), ()))
553
+
554
+ def _sync_queue_ticket_ids(self) -> None:
555
+ self._state.queue_ticket_ids = [ticket.ticket_id for ticket in self._queue]
556
+
557
  def _ticket_has_alternate_route(self, ticket: HelpdeskTicketRecord) -> bool:
558
  return any(
559
  value is not None
 
642
  def _initial_capacity_state_for_queue(
643
  self,
644
  task_id: int,
645
+ ) -> tuple[dict[str, int], int, int, int]:
646
  if task_id != 3:
647
+ return {}, 0, 0, 0
648
 
649
  primary_group_demand: dict[str, int] = {}
650
  alternate_relief_by_group: dict[str, int] = {}
 
653
  high_priority_relief = 0
654
  escalation_demand = 0
655
  escalation_relief = 0
656
+ incident_demand = 0
657
 
658
  for ticket in self._queue:
659
  primary_route = self._route_for_ticket(ticket)
 
665
  high_priority_demand += 1
666
  if primary_route["resolution_action"] in {"assign", "escalate"}:
667
  escalation_demand += 1
668
+ if self._requires_incident(ticket):
669
+ incident_demand += 1
670
 
671
  if self._ticket_has_alternate_route(ticket):
672
  alternate_route = self._route_for_ticket(ticket, use_alternate=True)
 
715
  else:
716
  escalation_slots_initial = escalation_demand
717
 
718
+ if incident_demand <= 1:
719
+ incident_slots_initial = incident_demand
720
+ else:
721
+ incident_slots_initial = max(1, incident_demand - 1)
722
+
723
  return (
724
  team_capacity_initial,
725
  high_priority_slots_initial,
726
  escalation_slots_initial,
727
+ incident_slots_initial,
728
  )
729
 
730
  def _future_queue_demand(self) -> dict[str, Any]:
 
733
  high_priority_needed = 0
734
  escalation_needed = 0
735
  capacity_sensitive_tickets = 0
736
+ incident_needed = 0
737
 
738
  for ticket in future_tickets:
739
  route = self._route_for_ticket(ticket)
 
746
  escalation_needed += 1
747
  if self._ticket_has_alternate_route(ticket):
748
  capacity_sensitive_tickets += 1
749
+ if self._requires_incident(ticket):
750
+ incident_needed += 1
751
 
752
  return {
753
  "remaining_ticket_count": len(future_tickets),
 
755
  "high_priority_needed": high_priority_needed,
756
  "escalation_needed": escalation_needed,
757
  "capacity_sensitive_tickets": capacity_sensitive_tickets,
758
+ "incident_needed": incident_needed,
759
  }
760
 
761
  def _capacity_state_snapshot(self) -> dict[str, Any]:
 
766
  "high_priority_slots_initial": self._state.high_priority_slots_initial,
767
  "escalation_slots_remaining": self._state.escalation_slots_remaining,
768
  "escalation_slots_initial": self._state.escalation_slots_initial,
769
+ "incident_slots_remaining": self._state.incident_slots_remaining,
770
+ "incident_slots_initial": self._state.incident_slots_initial,
771
  }
772
 
773
  def _planning_route_recommendation(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
 
1002
  )
1003
  )
1004
 
1005
+ def _ticket_text(self, ticket: HelpdeskTicketRecord) -> str:
1006
+ return f"{ticket.title} {ticket.description}".lower()
1007
+
1008
+ def _requires_incident(self, ticket: HelpdeskTicketRecord) -> bool:
1009
+ if ticket.incident_recommended:
1010
+ return True
1011
+ text = self._ticket_text(ticket)
1012
+ return (
1013
+ ticket.priority in {"high", "critical"}
1014
+ and ticket.issue_type
1015
+ in {"application_support", "identity_access", "security_compliance"}
1016
+ and any(
1017
+ phrase in text
1018
+ for phrase in (
1019
+ "outage",
1020
+ "cannot log in",
1021
+ "login",
1022
+ "regression",
1023
+ "unstable",
1024
+ "blocked",
1025
+ "lockout",
1026
+ "company-wide",
1027
+ "production",
1028
+ "unresolved",
1029
+ )
1030
+ )
1031
+ )
1032
+
1033
+ def _incident_open_for_ticket(self, ticket: HelpdeskTicketRecord) -> bool:
1034
+ related_ids = {ticket.ticket_id}
1035
+ if ticket.related_ticket_id:
1036
+ related_ids.add(ticket.related_ticket_id)
1037
+ if ticket.generated_from_ticket_id:
1038
+ related_ids.add(ticket.generated_from_ticket_id)
1039
+ return any(ticket_id in self._state.open_incident_ticket_ids for ticket_id in related_ids)
1040
+
1041
+ def _request_info_note_for_ticket(self, ticket: HelpdeskTicketRecord) -> str | None:
1042
+ note_parts: list[str] = []
1043
+ if ticket.customer_update_note:
1044
+ note_parts.append(ticket.customer_update_note)
1045
+ if ticket.related_ticket_id is not None:
1046
+ note_parts.append(
1047
+ "The requester confirmed this is connected to the earlier case and wants a single accountable owner."
1048
+ )
1049
+ if self._ticket_has_nondefault_routing(ticket):
1050
+ note_parts.append(
1051
+ "The requester clarified that the blocker owner matters more than the superficial request label."
1052
+ )
1053
+ if self._ticket_has_alternate_route(ticket):
1054
+ note_parts.append(
1055
+ "Operations said an acknowledged fallback path is acceptable if the preferred queue is saturated."
1056
+ )
1057
+ if self._requires_incident(ticket):
1058
+ note_parts.append(
1059
+ "Stakeholders asked for incident-style coordination because the issue is still operationally active."
1060
+ )
1061
+ if not note_parts:
1062
+ return None
1063
+ return " ".join(note_parts)
1064
+
1065
+ def _request_info_used(self, ticket_id: str) -> bool:
1066
+ return self._state.ticket_request_info_usage.get(ticket_id, 0) > 0
1067
+
1068
+ def _defer_count(self, ticket_id: str) -> int:
1069
+ return self._state.ticket_defer_counts.get(ticket_id, 0)
1070
+
1071
+ def _record_dynamic_queue_event(self, event_type: str, **details: Any) -> None:
1072
+ self._state.dynamic_queue_events.append({"event_type": event_type, **details})
1073
+
1074
+ def _escalate_priority_level(self, priority: str) -> str:
1075
+ if priority == "low":
1076
+ return "medium"
1077
+ if priority == "medium":
1078
+ return "high"
1079
+ return "critical"
1080
+
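Review note: the escalation ladder in `_escalate_priority_level` is small enough to check by hand. This standalone sketch mirrors the method as a free function (the function name is illustrative, not part of the repo).

```python
def escalate_priority_level(priority: str) -> str:
    """One-step bump mirroring the diff: low -> medium -> high -> critical."""
    if priority == "low":
        return "medium"
    if priority == "medium":
        return "high"
    # Every other value falls through here: "high", "critical", and any
    # unrecognized label all map to "critical".
    return "critical"


for level in ("low", "medium", "high", "critical"):
    print(level, "->", escalate_priority_level(level))
```

Note that an unexpected priority string is silently escalated to `critical` rather than rejected. That is harmless if priorities are validated upstream, which this diff does not show.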
+    def _escalate_ticket_after_delay(
+        self,
+        ticket: HelpdeskTicketRecord,
+        *,
+        defer_count: int,
+    ) -> HelpdeskTicketRecord:
+        escalated_priority = self._escalate_priority_level(ticket.priority)
+        description_suffix = (
+            " The ticket was deferred earlier in the queue and now needs firmer ownership."
+        )
+        customer_update = (
+            ticket.customer_update_note
+            or "The requester followed up after the delay and wants a committed owner."
+        )
+        return ticket.model_copy(
+            update={
+                "priority": escalated_priority,
+                "title": (
+                    ticket.title
+                    if ticket.title.lower().startswith("re:")
+                    else f"Re: {ticket.title}"
+                ),
+                "description": f"{ticket.description}{description_suffix}",
+                "customer_update_note": customer_update,
+            }
+        )
+
+    def _should_spawn_follow_up(
+        self,
+        ticket: HelpdeskTicketRecord,
+        *,
+        score: float,
+        context_penalty: float,
+        incident_gap_penalty: float,
+    ) -> bool:
+        if self._state.current_task_id != 3:
+            return False
+        if ticket.generated_from_ticket_id is not None:
+            return False
+        if ticket.ticket_id in self._state.spawned_follow_up_source_ids:
+            return False
+        if not (
+            self._requires_incident(ticket)
+            or self._ticket_mentions_follow_up(ticket)
+            or ticket.related_ticket_id is not None
+            or ticket.priority in {"high", "critical"}
+        ):
+            return False
+        return (
+            score < FOLLOW_UP_SPAWN_THRESHOLD
+            or (context_penalty >= 0.15 and score < 0.9)
+            or incident_gap_penalty > 0.0
+        )
+
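Review note: once the eligibility guards pass, `_should_spawn_follow_up` reduces to a three-way score predicate. The sketch below isolates that predicate. The value of `FOLLOW_UP_SPAWN_THRESHOLD` is not visible in this diff, so `0.6` is an assumed placeholder, and the function name is hypothetical.

```python
# Assumed placeholder; the real constant is defined elsewhere in the repo.
FOLLOW_UP_SPAWN_THRESHOLD = 0.6


def score_triggers_follow_up(
    score: float,
    context_penalty: float,
    incident_gap_penalty: float,
    threshold: float = FOLLOW_UP_SPAWN_THRESHOLD,
) -> bool:
    """Final predicate from the diff, without the task and ticket guards."""
    return (
        score < threshold                                # weak routing overall
        or (context_penalty >= 0.15 and score < 0.9)     # skipped context and not near-perfect
        or incident_gap_penalty > 0.0                    # incident was needed but never opened
    )
```

Under this reading, any nonzero incident gap penalty spawns a follow-up regardless of how well the ticket was otherwise routed.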
+    def _spawn_follow_up_ticket(self, ticket: HelpdeskTicketRecord) -> HelpdeskTicketRecord:
+        follow_up_ticket = HelpdeskTicketRecord(
+            ticket_id=f"{ticket.ticket_id}-followup",
+            title=(
+                ticket.title
+                if ticket.title.lower().startswith("re:")
+                else f"Re: {ticket.title}"
+            ),
+            requester=ticket.requester,
+            description=(
+                "The earlier handling did not fully resolve the issue. The requester is "
+                f"following up on {ticket.ticket_id} and needs a single accountable owner now."
+            ),
+            issue_type=ticket.issue_type,
+            priority=(
+                "critical"
+                if ticket.priority in {"high", "critical"}
+                else self._escalate_priority_level(ticket.priority)
+            ),
+            assignment_group=ticket.assignment_group,
+            resolution_action=(
+                "escalate"
+                if ticket.priority in {"high", "critical"} or self._requires_incident(ticket)
+                else ticket.resolution_action
+            ),
+            ambiguity_note=(
+                ticket.ambiguity_note
+                or "Prior routing did not settle ownership; route to the team that can actually unblock the issue."
+            ),
+            related_ticket_id=ticket.ticket_id,
+            planning_note=ticket.planning_note,
+            customer_update_note=(
+                "The requester said the last response did not resolve the blocker and wants an accountable next owner."
+            ),
+            incident_recommended=self._requires_incident(ticket),
+            generated_from_ticket_id=ticket.ticket_id,
+        )
+        self._queue.append(follow_up_ticket)
+        self._tickets_by_id[follow_up_ticket.ticket_id] = follow_up_ticket
+        self._sync_queue_ticket_ids()
+        self._state.spawned_follow_up_ticket_ids.append(follow_up_ticket.ticket_id)
+        self._state.spawned_follow_up_source_ids.append(ticket.ticket_id)
+        self._record_dynamic_queue_event(
+            "spawn_follow_up",
+            source_ticket_id=ticket.ticket_id,
+            follow_up_ticket_id=follow_up_ticket.ticket_id,
+        )
+        return follow_up_ticket
+
     def _ticket_repeated_requester_count(self, ticket: HelpdeskTicketRecord) -> int:
+        return sum(
+            1
+            for candidate in self._tickets_by_id.values()
+            if candidate.requester == ticket.requester
+        )
1190
 
1191
  def _tool_has_available_context(
1192
  self,
 
1215
  task_id: int | None = None,
1216
  ) -> list[str]:
1217
  resolved_task_id = self._state.current_task_id if task_id is None else task_id
1218
+ if resolved_task_id is None or resolved_task_id < 2:
1219
  return []
1220
  required_tools: list[str] = list(TASK3_INVESTIGATION_TOOL_PLAN.get(ticket.ticket_id, ()))
1221
  if ticket.related_ticket_id is not None and "lookup_related_ticket" not in required_tools:
 
1237
  ):
1238
  required_tools.append("lookup_requester_history")
1239
  if (
1240
+ resolved_task_id == 3
1241
+ and self._ticket_is_capacity_sensitive(ticket)
1242
  and "lookup_queue_capacity_forecast" not in required_tools
1243
  ):
1244
  required_tools.append("lookup_queue_capacity_forecast")
1245
  filtered_required_tools: list[str] = []
1246
+ allowed_tool_set = set(self._available_tools_for_task(resolved_task_id))
1247
  for tool_name in required_tools:
1248
  if tool_name in filtered_required_tools:
1249
  continue
1250
+ if tool_name not in allowed_tool_set:
1251
+ continue
1252
  if self._tool_has_available_context(ticket, tool_name):
1253
  filtered_required_tools.append(tool_name)
1254
  return filtered_required_tools
1255
 
+    def _recommended_operational_actions(self, ticket: HelpdeskTicketRecord) -> list[str]:
+        recommended_actions: list[str] = []
+        available_action_types = set(self._available_action_types_for_task())
+        if (
+            "request_info" in available_action_types
+            and self._request_info_note_for_ticket(ticket) is not None
+            and not self._request_info_used(ticket.ticket_id)
+        ):
+            recommended_actions.append("request_info")
+        if (
+            "open_incident" in available_action_types
+            and self._requires_incident(ticket)
+            and not self._incident_open_for_ticket(ticket)
+        ):
+            recommended_actions.append("open_incident")
+        if (
+            "defer" in available_action_types
+            and self._defer_count(ticket.ticket_id) < MAX_DEFERS_PER_TICKET
+            and self._state.current_ticket_index < len(self._queue) - 1
+            and ticket.priority not in {"high", "critical"}
+            and (
+                bool(self._remaining_tools_for_ticket(ticket))
+                or self._ticket_is_capacity_sensitive(ticket)
+                or self._request_info_note_for_ticket(ticket) is not None
+            )
+        ):
+            recommended_actions.append("defer")
+        return recommended_actions
+
1285
  def _used_tools_for_ticket(self, ticket_id: str) -> list[str]:
1286
  return list(self._state.ticket_tool_usage.get(ticket_id, []))
1287
 
 
1304
  revealed_tools = self._used_tools_for_ticket(ticket.ticket_id)
1305
  remaining_tools = self._remaining_tools_for_ticket(ticket)
1306
  total_required = max(1, len(required_tools))
1307
+ request_info_used = self._request_info_used(ticket.ticket_id)
1308
+ operational_actions = self._recommended_operational_actions(ticket)
1309
  return {
1310
  "required_tools": required_tools,
1311
  "revealed_tools": revealed_tools,
 
1313
  "revealed_count": len(revealed_tools),
1314
  "remaining_count": len(remaining_tools),
1315
  "completeness": round(len(revealed_tools) / total_required, 2),
1316
+ "request_info_used": request_info_used,
1317
+ "recommended_operational_actions": operational_actions,
1318
  }
1319
 
1320
  def _default_redacted_description(self, ticket: HelpdeskTicketRecord) -> str:
 
1353
  return "Helpdesk routing decision"
1354
 
1355
  def _visible_title(self, ticket: HelpdeskTicketRecord) -> str:
1356
+ if self._state.current_task_id in {2, 3} and self._remaining_tools_for_ticket(ticket):
1357
  return HARD_TASK_TITLE_REDACTIONS.get(
1358
  ticket.ticket_id,
1359
  self._default_redacted_title(ticket),
 
1361
  return ticket.title
1362
 
1363
  def _visible_description(self, ticket: HelpdeskTicketRecord) -> str:
1364
+ if self._state.current_task_id in {2, 3} and self._remaining_tools_for_ticket(ticket):
1365
  return HARD_TASK_DESCRIPTION_REDACTIONS.get(
1366
  ticket.ticket_id,
1367
  self._default_redacted_description(ticket),
 
1447
 
1448
  return round(priority_penalty + resolution_penalty, 4)
1449
 
+    def _incident_gap_penalty(
+        self,
+        ticket: HelpdeskTicketRecord,
+        action: HelpdeskTicketAction,
+    ) -> float:
+        if self._state.current_task_id != 3:
+            return 0.0
+        if not self._requires_incident(ticket):
+            return 0.0
+        if self._incident_open_for_ticket(ticket):
+            return 0.0
+        if action.resolution_action in {"escalate", "assign"}:
+            return round(INCIDENT_GAP_PENALTY / 2, 4)
+        return INCIDENT_GAP_PENALTY
+
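Review note: `_incident_gap_penalty` halves the penalty when the submitted resolution is `escalate` or `assign`. A minimal standalone sketch of that rule follows. The value of `INCIDENT_GAP_PENALTY` is not shown in this diff, so `0.1` is an assumed placeholder, and the boolean parameters stand in for the environment lookups.

```python
# Assumed placeholder; the real constant lives elsewhere in the repo.
INCIDENT_GAP_PENALTY = 0.1


def incident_gap_penalty(
    task_id: int,
    requires_incident: bool,
    incident_open: bool,
    resolution_action: str,
    full_penalty: float = INCIDENT_GAP_PENALTY,
) -> float:
    # No penalty outside task 3, when no incident is needed, or when one is open.
    if task_id != 3 or not requires_incident or incident_open:
        return 0.0
    # Escalating or assigning without an incident is treated as a partial miss.
    if resolution_action in {"escalate", "assign"}:
        return round(full_penalty / 2, 4)
    return full_penalty
```

With the assumed constant, submitting `escalate` on an incident-worthy ticket without opening an incident costs half of what `acknowledge` would.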
1465
  def _build_reward_components(
1466
  self,
1467
  *,
 
1538
  "assignment_group": ticket.assignment_group,
1539
  "resolution_action": ticket.resolution_action,
1540
  }
1541
+ for ticket in self._tickets_by_id.values()
1542
  if ticket.requester == current_ticket.requester
1543
  and ticket.ticket_id != current_ticket.ticket_id
1544
  ]
 
1575
  "capacity_state": recommendation["capacity_state"],
1576
  "future_queue_demand": recommendation["future_demand"],
1577
  "routing_options": routing_options,
1578
+ "incident_recommended": self._requires_incident(current_ticket),
1579
  }
1580
 
1581
  def _run_investigation_tool(
 
1603
  ) -> HelpdeskTicketObservation:
1604
  if action.tool_name is None:
1605
  raise ValueError("Investigate actions require tool_name")
1606
+ if action.tool_name not in self._available_tools_for_task():
1607
+ raise ValueError(f"Unsupported tool_name for current task: {action.tool_name}")
1608
  submitted_fields = {
1609
  field
1610
  for field in ("issue_type", "priority", "assignment_group", "resolution_action")
 
1675
  self._state.last_reward_components = reward_components
1676
  return self._build_observation(task, done=False, reward=investigation_reward)
1677
 
1678
+ def _handle_request_info_action(
1679
+ self,
1680
+ task: dict,
1681
+ current_ticket: HelpdeskTicketRecord,
1682
+ action: HelpdeskTicketAction,
1683
+ idx: int,
1684
+ ) -> HelpdeskTicketObservation:
1685
+ submitted_fields = {
1686
+ field
1687
+ for field in ("issue_type", "priority", "assignment_group", "resolution_action")
1688
+ if getattr(action, field) is not None
1689
+ }
1690
+ if submitted_fields:
1691
+ raise ValueError(
1692
+ "request_info actions cannot include submit fields: "
1693
+ f"{sorted(submitted_fields)}"
1694
+ )
1695
+
1696
+ ticket_id = current_ticket.ticket_id
1697
+ note = self._request_info_note_for_ticket(current_ticket)
1698
+ already_used = self._request_info_used(ticket_id)
1699
+ useful_request = note is not None and not already_used
1700
+ self._state.ticket_request_info_usage[ticket_id] = (
1701
+ self._state.ticket_request_info_usage.get(ticket_id, 0) + 1
1702
+ )
1703
+ self._state.step_count += 1
1704
+ self._state.investigation_steps += 1
1705
+ self._state.investigation_budget_remaining = max(
1706
+ 0,
1707
+ self._state.investigation_budget_remaining - 1,
1708
+ )
1709
+ request_reward = USEFUL_REQUEST_INFO_REWARD if useful_request else 0.0
1710
+ tool_result = {
1711
+ "action_type": "request_info",
1712
+ "found": useful_request,
1713
+ "ticket_id": ticket_id,
1714
+ "customer_update_note": note if useful_request else "",
1715
+ }
1716
+ self._state.last_tool_result = tool_result
1717
+ self._state.last_step_reward = request_reward
1718
+ self._state.reward = request_reward
1719
+ self._state.done = False
1720
+ self._state.investigation_penalty_applied = self._compute_episode_penalty()
1721
+ progress = self._tool_progress_for_ticket(current_ticket)
1722
+ reward_components = self._build_reward_components(
1723
+ ticket_score=0.0,
1724
+ field_breakdown={},
1725
+ shaped_step_reward=request_reward,
1726
+ reward_kind="operational",
1727
+ final_reward=request_reward,
1728
+ investigation_penalty=self._state.investigation_penalty_applied,
1729
+ extra_details={
1730
+ "operational_action": "request_info",
1731
+ "new_context_revealed": useful_request,
1732
+ "customer_update_visible": useful_request,
1733
+ "hidden_context_remaining_count": progress["remaining_count"],
1734
+ "context_completeness": progress["completeness"],
1735
+ },
1736
+ )
1737
+ self._state.history_entries.append(
1738
+ self._build_history_entry(
1739
+ current_ticket,
1740
+ predicted=action.model_dump(exclude_none=True),
1741
+ score=0.0,
1742
+ breakdown={},
1743
+ queue_position=idx + 1,
1744
+ reward=request_reward,
1745
+ reward_kind="operational",
1746
+ tool_result=tool_result,
1747
+ reward_components=reward_components,
1748
+ )
1749
+ )
1750
+ self._state.last_reward_components = reward_components
1751
+ return self._build_observation(task, done=False, reward=request_reward)
1752
+
1753
+ def _handle_defer_action(
1754
+ self,
1755
+ task: dict,
1756
+ current_ticket: HelpdeskTicketRecord,
1757
+ action: HelpdeskTicketAction,
1758
+ idx: int,
1759
+ ) -> HelpdeskTicketObservation:
1760
+ submitted_fields = {
1761
+ field
1762
+ for field in ("issue_type", "priority", "assignment_group", "resolution_action")
1763
+ if getattr(action, field) is not None
1764
+ }
1765
+ if submitted_fields:
1766
+ raise ValueError(
1767
+ "defer actions cannot include submit fields: "
1768
+ f"{sorted(submitted_fields)}"
1769
+ )
1770
+
1771
+ ticket_id = current_ticket.ticket_id
1772
+ existing_count = self._defer_count(ticket_id)
1773
+ defer_allowed = (
1774
+ existing_count < MAX_DEFERS_PER_TICKET
1775
+ and idx < len(self._queue) - 1
1776
+ and self._state.current_task_id in {2, 3}
1777
+ )
1778
+ defer_count = existing_count + 1
1779
+ reward = 0.0
1780
+ sla_risk = current_ticket.priority in {"high", "critical"} or self._ticket_mentions_follow_up(
1781
+ current_ticket
1782
+ )
1783
+ moved_ticket = current_ticket
1784
+
1785
+ if defer_allowed:
1786
+ self._state.ticket_defer_counts[ticket_id] = defer_count
1787
+ self._state.deferred_ticket_count += 1
1788
+ if sla_risk:
1789
+ self._state.sla_breach_count += 1
1790
+ moved_ticket = self._escalate_ticket_after_delay(
1791
+ current_ticket,
1792
+ defer_count=defer_count,
1793
+ )
1794
+ elif (
1795
+ self._remaining_tools_for_ticket(current_ticket)
1796
+ or self._request_info_note_for_ticket(current_ticket) is not None
1797
+ or self._ticket_is_capacity_sensitive(current_ticket)
1798
+ ):
1799
+ reward = REQUEST_INFO_CONTEXT_COMPLETION_BONUS
1800
+ self._queue.pop(idx)
1801
+ self._queue.append(moved_ticket)
1802
+ self._tickets_by_id[moved_ticket.ticket_id] = moved_ticket
1803
+ self._sync_queue_ticket_ids()
1804
+ self._record_dynamic_queue_event(
1805
+ "defer",
1806
+ ticket_id=ticket_id,
1807
+ defer_count=defer_count,
1808
+ sla_risk=sla_risk,
1809
+ )
1810
+ else:
1811
+ self._state.sla_breach_count += 1
1812
+ self._record_dynamic_queue_event(
1813
+ "defer_denied",
1814
+ ticket_id=ticket_id,
1815
+ defer_count=defer_count,
1816
+ )
1817
+
1818
+ self._state.step_count += 1
1819
+ self._state.last_tool_result = {
1820
+ "action_type": "defer",
1821
+ "ticket_id": ticket_id,
1822
+ "defer_allowed": defer_allowed,
1823
+ "defer_count": defer_count,
1824
+ "sla_risk": sla_risk,
1825
+ }
1826
+ self._state.last_step_reward = reward
1827
+ self._state.reward = reward
1828
+ self._state.done = False
1829
+ reward_components = self._build_reward_components(
1830
+ ticket_score=0.0,
1831
+ field_breakdown={},
1832
+ shaped_step_reward=reward,
1833
+ reward_kind="operational",
1834
+ final_reward=reward,
1835
+ extra_details={
1836
+ "operational_action": "defer",
1837
+ "defer_allowed": defer_allowed,
1838
+ "defer_count": defer_count,
1839
+ "sla_breach_count": self._state.sla_breach_count,
1840
+ },
1841
+ )
1842
+ self._state.history_entries.append(
1843
+ self._build_history_entry(
1844
+ current_ticket,
1845
+ predicted=action.model_dump(exclude_none=True),
1846
+ score=0.0,
1847
+ breakdown={},
1848
+ queue_position=idx + 1,
1849
+ reward=reward,
1850
+ reward_kind="operational",
1851
+ tool_result=self._state.last_tool_result,
1852
+ reward_components=reward_components,
1853
+ )
1854
+ )
1855
+ self._state.last_reward_components = reward_components
1856
+ return self._build_observation(task, done=False, reward=reward)
1857
+
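Review note: the queue mutation in `_handle_defer_action` is a rotate-to-back guarded by a per-ticket defer cap. The sketch below shows only that mechanic on a list of ticket IDs. `MAX_DEFERS_PER_TICKET` is not visible in this diff, so the default of `1` is an assumption; the task-ID check, SLA bookkeeping, and escalation-on-defer from the handler are deliberately omitted.

```python
from __future__ import annotations


def defer_ticket(
    queue: list[str],
    idx: int,
    defer_counts: dict[str, int],
    max_defers: int = 1,  # assumed stand-in for MAX_DEFERS_PER_TICKET
) -> bool:
    """Move queue[idx] to the back if the defer is allowed; return whether it was."""
    ticket_id = queue[idx]
    allowed = (
        defer_counts.get(ticket_id, 0) < max_defers
        and idx < len(queue) - 1  # the last ticket has nothing to be deferred behind
    )
    if allowed:
        defer_counts[ticket_id] = defer_counts.get(ticket_id, 0) + 1
        queue.append(queue.pop(idx))
    return allowed


queue = ["T-1", "T-2", "T-3"]
counts: dict[str, int] = {}
print(defer_ticket(queue, 0, counts), queue)
```

After a successful defer the next ticket slides into the same index, which appears to be why the handler returns without advancing the ticket index.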
1858
+ def _handle_open_incident_action(
1859
+ self,
1860
+ task: dict,
1861
+ current_ticket: HelpdeskTicketRecord,
1862
+ action: HelpdeskTicketAction,
1863
+ idx: int,
1864
+ ) -> HelpdeskTicketObservation:
1865
+ submitted_fields = {
1866
+ field
1867
+ for field in ("issue_type", "priority", "assignment_group", "resolution_action")
1868
+ if getattr(action, field) is not None
1869
+ }
1870
+ if submitted_fields:
1871
+ raise ValueError(
1872
+ "open_incident actions cannot include submit fields: "
1873
+ f"{sorted(submitted_fields)}"
1874
+ )
1875
+
1876
+ useful_incident = (
1877
+ self._state.current_task_id == 3
1878
+ and self._requires_incident(current_ticket)
1879
+ and not self._incident_open_for_ticket(current_ticket)
1880
+ )
1881
+ overflow = 0
1882
+ incident_reward = 0.0
1883
+ if useful_incident:
1884
+ self._state.open_incident_ticket_ids.append(current_ticket.ticket_id)
1885
+ self._state.incident_actions_used += 1
1886
+ overflow = max(0, 1 - self._state.incident_slots_remaining)
1887
+ self._state.incident_slots_remaining = max(
1888
+ 0,
1889
+ self._state.incident_slots_remaining - 1,
1890
+ )
1891
+ overflow_penalty = round(overflow * INCIDENT_SLOT_OVERFLOW_PENALTY, 4)
1892
+ if overflow_penalty > 0.0:
1893
+ self._state.planning_penalty_total = round(
1894
+ self._state.planning_penalty_total + overflow_penalty,
1895
+ 4,
1896
+ )
1897
+ self._state.planning_penalty_applied = overflow_penalty
1898
+ incident_reward = clamp_open_unit_interval(
1899
+ INCIDENT_OPEN_REWARD - overflow_penalty
1900
+ )
1901
+ self._record_dynamic_queue_event(
1902
+ "open_incident",
1903
+ ticket_id=current_ticket.ticket_id,
1904
+ overflow=overflow,
1905
+ )
1906
+
1907
+ self._state.step_count += 1
1908
+ self._state.last_tool_result = {
1909
+ "action_type": "open_incident",
1910
+ "ticket_id": current_ticket.ticket_id,
1911
+ "incident_open": useful_incident,
1912
+ "incident_slots_remaining": self._state.incident_slots_remaining,
1913
+ "overflow": overflow,
1914
+ }
1915
+ self._state.last_step_reward = incident_reward
1916
+ self._state.reward = incident_reward
1917
+ self._state.done = False
1918
+ reward_components = self._build_reward_components(
1919
+ ticket_score=0.0,
1920
+ field_breakdown={},
1921
+ shaped_step_reward=incident_reward,
1922
+ reward_kind="operational",
1923
+ final_reward=incident_reward,
1924
+ extra_details={
1925
+ "operational_action": "open_incident",
1926
+ "incident_open": useful_incident,
1927
+ "incident_slots_remaining": self._state.incident_slots_remaining,
1928
+ },
1929
+ )
1930
+ self._state.history_entries.append(
1931
+ self._build_history_entry(
1932
+ current_ticket,
1933
+ predicted=action.model_dump(exclude_none=True),
1934
+ score=0.0,
1935
+ breakdown={},
1936
+ queue_position=idx + 1,
1937
+ reward=incident_reward,
1938
+ reward_kind="operational",
1939
+ tool_result=self._state.last_tool_result,
1940
+ reward_components=reward_components,
1941
+ )
1942
+ )
1943
+ self._state.last_reward_components = reward_components
1944
+ return self._build_observation(task, done=False, reward=incident_reward)
1945
+
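Review note: the slot accounting in `_handle_open_incident_action` can be read as "consume one slot, and report an overflow of 1 if none was left". A minimal sketch of just that arithmetic, with a hypothetical function name; the reward and penalty constants are not shown in this diff and are left out.

```python
def consume_incident_slot(slots_remaining: int) -> tuple[int, int]:
    """Return (new_slots_remaining, overflow) after opening one incident."""
    # Overflow is 1 only when no slot is left before the action.
    overflow = max(0, 1 - slots_remaining)
    # The remaining count never drops below zero.
    return max(0, slots_remaining - 1), overflow


for slots in (2, 1, 0):
    print(slots, "->", consume_incident_slot(slots))
```

In the handler, the overflow is then multiplied by `INCIDENT_SLOT_OVERFLOW_PENALTY` and subtracted from `INCIDENT_OPEN_REWARD`, so an incident opened with no slot left still registers as open but earns a reduced reward.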
1946
  def _build_ticket_view(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
1947
  progress = self._tool_progress_for_ticket(ticket)
1948
  remaining_tools = progress["remaining_tools"]
1949
  used_tools = set(self._used_tools_for_ticket(ticket.ticket_id))
1950
+ operational_actions = progress["recommended_operational_actions"]
1951
  ticket_view: dict[str, Any] = {
1952
  "ticket_id": ticket.ticket_id,
1953
  "title": self._visible_title(ticket),
 
1966
  "investigations_used_for_ticket": progress["revealed_count"],
1967
  "recommended_tools": list(remaining_tools),
1968
  }
1969
+ ticket_view["operational_context"] = {
1970
+ "request_info_available": self._request_info_note_for_ticket(ticket) is not None,
1971
+ "request_info_used": progress["request_info_used"],
1972
+ "defer_count": self._defer_count(ticket.ticket_id),
1973
+ "incident_recommended": self._requires_incident(ticket),
1974
+ "incident_open": self._incident_open_for_ticket(ticket),
1975
+ "recommended_actions": operational_actions,
1976
+ }
1977
  if ticket.ambiguity_note is not None and "lookup_internal_routing_note" not in remaining_tools:
1978
  ticket_view["ambiguity_note"] = ticket.ambiguity_note
1979
  if (
 
1981
  and "lookup_internal_routing_note" not in remaining_tools
1982
  ):
1983
  ticket_view["planning_note"] = ticket.planning_note
1984
+ if self._request_info_used(ticket.ticket_id):
1985
+ ticket_view["customer_update_note"] = self._request_info_note_for_ticket(ticket)
1986
  if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
1987
  ticket_view["related_ticket_id"] = ticket.related_ticket_id
1988
  related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
 
1998
  or "lookup_queue_capacity_forecast" in used_tools
1999
  ):
2000
  ticket_view["routing_options"] = self._routing_options_for_ticket(ticket)
2001
+ if ticket.generated_from_ticket_id is not None:
2002
+ ticket_view["generated_from_ticket_id"] = ticket.generated_from_ticket_id
2003
  return ticket_view
2004
 
2005
  def _build_feedback_summary(
 
2022
  parts.append(f"Investigation step used {tool_name or 'a tool'}")
2023
  if reward_components and reward_components.get("new_context_revealed"):
2024
  parts.append("new context was revealed")
2025
+ elif reward_kind == "operational":
2026
+ operational_action = (
2027
+ reward_components.get("operational_action")
2028
+ if reward_components
2029
+ else predicted.get("action_type")
2030
+ )
2031
+ parts.append(f"Operational step used {operational_action or 'an action'}")
2032
  elif penalty_reason is not None:
2033
  parts.append(f"Penalty applied: {penalty_reason}")
2034
  else:
 
2066
  planning_penalty_total = reward_components.get("planning_penalty_total")
2067
  if planning_penalty_total:
2068
  parts.append(f"planning_penalty_total={planning_penalty_total:.2f}")
2069
+ incident_gap_penalty = reward_components.get("incident_gap_penalty")
2070
+ if incident_gap_penalty:
2071
+ parts.append(f"incident_gap_penalty={incident_gap_penalty:.2f}")
2072
+ spawned_follow_up_ticket_id = reward_components.get("spawned_follow_up_ticket_id")
2073
+ if spawned_follow_up_ticket_id:
2074
+ parts.append(f"spawned_follow_up={spawned_follow_up_ticket_id}")
2075
 
2076
  return "; ".join(parts)
2077
 
 
2100
  "score": score,
2101
  "breakdown": breakdown,
2102
  "queue_position": queue_position,
2103
+ "operational_context": {
2104
+ "request_info_used": progress["request_info_used"],
2105
+ "defer_count": self._defer_count(ticket.ticket_id),
2106
+ "incident_open": self._incident_open_for_ticket(ticket),
2107
+ "recommended_actions": progress["recommended_operational_actions"],
2108
+ },
2109
  }
2110
  if self._state.current_task_id == 3:
2111
  history_entry["capacity_state"] = self._capacity_state_snapshot()
 
2122
  and "lookup_internal_routing_note" not in remaining_tools
2123
  ):
2124
  history_entry["planning_note"] = ticket.planning_note
2125
+ if self._request_info_used(ticket.ticket_id):
2126
+ history_entry["customer_update_note"] = self._request_info_note_for_ticket(ticket)
2127
  if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
2128
  history_entry["related_ticket_id"] = ticket.related_ticket_id
2129
  related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
 
2148
  history_entry["tool_result"] = tool_result
2149
  if reward_components is not None:
2150
  history_entry["reward_components"] = reward_components
2151
+ if ticket.generated_from_ticket_id is not None:
2152
+ history_entry["generated_from_ticket_id"] = ticket.generated_from_ticket_id
2153
  if progress["required_tools"]:
2154
  history_entry["context_progress"] = {
2155
  "hidden_context_remaining": bool(progress["remaining_count"]),
 
2209
  and (ticket_view.get("context_status") or {}).get("hidden_context_remaining")
2210
  ),
2211
  "action_mode": "investigate_or_submit",
2212
+ "available_action_types": self._available_action_types_for_task(),
2213
  "average_score_so_far": self._state.average_score_so_far,
2214
  "progress_fraction": progress_fraction,
2215
  "investigation_penalty_applied": self._state.investigation_penalty_applied,
2216
  "planning_penalty_total": self._state.planning_penalty_total,
2217
  "planning_penalty_applied": self._state.planning_penalty_applied,
2218
+ "sla_breach_count": self._state.sla_breach_count,
2219
+ "incident_gap_total": self._state.incident_gap_total,
2220
+ "dynamic_queue_events": list(self._state.dynamic_queue_events[-5:]),
2221
  }
2222
  if self._state.current_task_id == 3:
2223
  metadata["capacity_state"] = self._capacity_state_snapshot()
 
2241
  task_name=task["name"],
2242
  instructions=task["instructions"],
2243
  allowed_fields=list(task["allowed_fields"]),
2244
+ available_action_types=self._available_action_types_for_task(),
2245
+ available_tools=self._available_tools_for_task(),
2246
  investigation_budget_remaining=self._state.investigation_budget_remaining,
2247
  last_tool_result=self._state.last_tool_result,
2248
  current_ticket=ticket_view,
server/grader.py CHANGED
@@ -64,13 +64,23 @@ PRIORITY_SCORES = {
 
 
 TASK_WEIGHTS = {
-    1: {"issue_type": 1.0},
-    2: {"issue_type": 0.6, "priority": 0.4},
+    1: {
+        "issue_type": 0.40,
+        "priority": 0.20,
+        "assignment_group": 0.20,
+        "resolution_action": 0.20,
+    },
+    2: {
+        "issue_type": 0.32,
+        "priority": 0.20,
+        "assignment_group": 0.24,
+        "resolution_action": 0.24,
+    },
     3: {
-        "issue_type": 0.35,
+        "issue_type": 0.30,
         "priority": 0.20,
         "assignment_group": 0.25,
-        "resolution_action": 0.20,
+        "resolution_action": 0.25,
     },
 }
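Review note: all three tasks now score the same four fields, so it is worth confirming that each weight set still sums to 1. A quick standalone check, with the weights copied from the new side of this hunk and a hypothetical helper name:

```python
import math

# New weights copied from the diff.
TASK_WEIGHTS = {
    1: {"issue_type": 0.40, "priority": 0.20, "assignment_group": 0.20, "resolution_action": 0.20},
    2: {"issue_type": 0.32, "priority": 0.20, "assignment_group": 0.24, "resolution_action": 0.24},
    3: {"issue_type": 0.30, "priority": 0.20, "assignment_group": 0.25, "resolution_action": 0.25},
}


def weights_are_normalized(task_weights: dict) -> bool:
    """True when every task's field weights sum to 1, within float tolerance."""
    return all(
        math.isclose(sum(weights.values()), 1.0)
        for weights in task_weights.values()
    )


print(weights_are_normalized(TASK_WEIGHTS))
```

`math.isclose` is used instead of `==` because sums of decimal fractions such as `0.32 + 0.20 + 0.24 + 0.24` are not guaranteed to be exactly `1.0` in floating point.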
 
server/tasks.py CHANGED
@@ -10,28 +10,39 @@ from vocabulary import TASK_IDS
10
  TASKS = {
11
  1: {
12
  "id": 1,
13
- "name": "Issue Type Classification",
14
  "difficulty": "easy",
15
  "instructions": (
16
- "Read the ticket and select the single best IT issue type. "
17
- "You may investigate first, then submit a final routing answer."
 
18
  ),
19
- "allowed_fields": ["issue_type"],
 
 
 
 
 
20
  },
21
  2: {
22
  "id": 2,
23
- "name": "Issue Type And Priority",
24
  "difficulty": "medium",
25
  "instructions": (
26
- "Read the ticket, select the best IT issue type, and estimate the "
27
- "correct operational priority. If the observation includes ambiguity "
28
- "or follow-up context, use it. You may investigate before you submit."
29
  ),
30
- "allowed_fields": ["issue_type", "priority"],
 
 
 
 
 
31
  },
32
  3: {
33
  "id": 3,
34
- "name": "Full Ticket Routing",
35
  "difficulty": "hard",
36
  "instructions": (
37
  "Perform full helpdesk routing by selecting the best issue type, "
@@ -40,9 +51,8 @@ TASKS = {
40
  "forecasts, and planning state when present. "
41
  "Some hard tickets intentionally hide decisive routing context until "
42
  "you investigate with the available tools, and some hard episodes also "
43
- "require queue-level capacity planning across multiple tickets, so "
44
- "premature or resource-greedy routing can underperform even when the "
45
- "visible text looks plausible."
46
  ),
47
  "allowed_fields": [
48
  "issue_type",
@@ -61,6 +71,10 @@ PLANNING_ROUTE_UPDATES: dict[str, dict] = {
61
  "customer-facing charge review as a lower-fidelity fallback while the bug "
62
  "investigation continues separately."
63
  ),
 
 
 
 
64
  "alternate_issue_type": "billing_license",
65
  "alternate_assignment_group": "license_ops",
66
  "alternate_resolution_action": "assign",
@@ -82,6 +96,10 @@ PLANNING_ROUTE_UPDATES: dict[str, dict] = {
82
  "Seat expansion is the preferred route, but license operations can still "
83
  "handle the prorating clarification when procurement is the bottleneck."
84
  ),
 
 
 
 
85
  "alternate_issue_type": "billing_license",
86
  "alternate_assignment_group": "license_ops",
87
  "alternate_resolution_action": "fulfill",
@@ -92,6 +110,10 @@ PLANNING_ROUTE_UPDATES: dict[str, dict] = {
92
  "The request can be treated either as roadmap feedback or as a support "
93
  "escalation if the operational impact is emphasized."
94
  ),
 
 
 
 
95
  "alternate_issue_type": "application_support",
96
  "alternate_priority": "high",
97
  "alternate_resolution_action": "escalate",
@@ -138,6 +160,10 @@ PLANNING_ROUTE_UPDATES: dict[str, dict] = {
138
  "Security scheduling is ideal, but a compliance acknowledgement is still "
139
  "acceptable when the security team only needs to confirm the process."
140
  ),
 
 
 
 
141
  "alternate_issue_type": "security_compliance",
142
  "alternate_resolution_action": "acknowledge",
143
  "alternate_route_score_multiplier": 0.8,
@@ -192,6 +218,10 @@ CURATED_EXPANSION_RECORDS: list[dict] = [
192
  "Security still owns the privileged-access review, but service desk can "
193
  "collect chronology and prepare the packet if the security queue is jammed."
194
  ),
 
 
 
 
195
  "alternate_assignment_group": "service_desk",
196
  "alternate_resolution_action": "assign",
197
  "alternate_route_score_multiplier": 0.72,
@@ -253,6 +283,10 @@ CURATED_EXPANSION_RECORDS: list[dict] = [
253
  "Immediate operational execution is preferred. Procurement can still own the "
254
  "approval path if service-desk capacity is already depleted."
255
  ),
 
 
 
 
256
  "alternate_assignment_group": "procurement",
257
  "alternate_resolution_action": "assign",
258
  "alternate_route_score_multiplier": 0.8,
@@ -273,6 +307,11 @@ CURATED_EXPANSION_RECORDS: list[dict] = [
273
  "Security owns the final unblock decision. If security is saturated, the "
274
  "application team can still take the first-response diagnostics path."
275
  ),
 
 
 
 
 
276
  "alternate_issue_type": "application_support",
277
  "alternate_priority": "high",
278
  "alternate_assignment_group": "application_team",
@@ -398,6 +437,9 @@ CURATED_EXPANSION_RECORDS: list[dict] = [
398
  "Application engineering is preferred because they own the evidence. Procurement "
399
  "can still coordinate the renewal communication if the evidence queue is saturated."
400
  ),
 
 
 
401
  "alternate_issue_type": "service_request",
402
  "alternate_priority": "medium",
403
  "alternate_assignment_group": "procurement",
 
10
  TASKS = {
11
  1: {
12
  "id": 1,
13
+ "name": "Guided Full Routing",
14
  "difficulty": "easy",
15
  "instructions": (
16
+ "Perform full helpdesk routing by selecting issue type, priority, "
17
+ "assignment group, and resolution action. Easy-task episodes keep the "
18
+ "ticket text mostly visible and focus on grounded single-ticket routing."
19
  ),
20
+ "allowed_fields": [
21
+ "issue_type",
22
+ "priority",
23
+ "assignment_group",
24
+ "resolution_action",
25
+ ],
26
  },
27
  2: {
28
  "id": 2,
29
+ "name": "Contextual Full Routing",
30
  "difficulty": "medium",
31
  "instructions": (
32
+ "Perform full helpdesk routing with partial observability. Some "
33
+ "tickets hide related-case, requester-history, or clarification "
34
+ "details until you investigate or request more information."
35
  ),
36
+ "allowed_fields": [
37
+ "issue_type",
38
+ "priority",
39
+ "assignment_group",
40
+ "resolution_action",
41
+ ],
42
  },
43
  3: {
44
  "id": 3,
45
+ "name": "Adaptive Queue Routing",
46
  "difficulty": "hard",
47
  "instructions": (
48
  "Perform full helpdesk routing by selecting the best issue type, "
 
51
  "forecasts, and planning state when present. "
52
  "Some hard tickets intentionally hide decisive routing context until "
53
  "you investigate with the available tools, and some hard episodes also "
54
+ "require queue-level capacity planning, deferrals, incident management, "
55
+ "and recovery from downstream follow-up tickets."
 
56
  ),
57
  "allowed_fields": [
58
  "issue_type",
 
71
  "customer-facing charge review as a lower-fidelity fallback while the bug "
72
  "investigation continues separately."
73
  ),
74
+ "customer_update_note": (
75
+ "Finance confirmed the unexpected charge landed immediately after the "
76
+ "integration outage and wants one accountable owner today."
77
+ ),
78
  "alternate_issue_type": "billing_license",
79
  "alternate_assignment_group": "license_ops",
80
  "alternate_resolution_action": "assign",
 
96
  "Seat expansion is the preferred route, but license operations can still "
97
  "handle the prorating clarification when procurement is the bottleneck."
98
  ),
99
+ "customer_update_note": (
100
+ "The requester clarified that the blocker is both the seat increase and "
101
+ "the unexpected prorating language on the quote."
102
+ ),
103
  "alternate_issue_type": "billing_license",
104
  "alternate_assignment_group": "license_ops",
105
  "alternate_resolution_action": "fulfill",
 
110
  "The request can be treated either as roadmap feedback or as a support "
111
  "escalation if the operational impact is emphasized."
112
  ),
113
+ "customer_update_note": (
114
+ "The requester says the missing behavior is now blocking a customer "
115
+ "rollout, so this may need operational ownership rather than product triage."
116
+ ),
117
  "alternate_issue_type": "application_support",
118
  "alternate_priority": "high",
119
  "alternate_resolution_action": "escalate",
 
160
  "Security scheduling is ideal, but a compliance acknowledgement is still "
161
  "acceptable when the security team only needs to confirm the process."
162
  ),
163
+ "customer_update_note": (
164
+ "The requester clarified they mainly need confirmed ownership and a date "
165
+ "for the review, not the review itself right now."
166
+ ),
167
  "alternate_issue_type": "security_compliance",
168
  "alternate_resolution_action": "acknowledge",
169
  "alternate_route_score_multiplier": 0.8,
 
218
  "Security still owns the privileged-access review, but service desk can "
219
  "collect chronology and prepare the packet if the security queue is jammed."
220
  ),
221
+ "customer_update_note": (
222
+ "Executives want a single incident bridge owner before the board packet is sent."
223
+ ),
224
+ "incident_recommended": True,
225
  "alternate_assignment_group": "service_desk",
226
  "alternate_resolution_action": "assign",
227
  "alternate_route_score_multiplier": 0.72,
 
283
  "Immediate operational execution is preferred. Procurement can still own the "
284
  "approval path if service-desk capacity is already depleted."
285
  ),
286
+ "customer_update_note": (
287
+ "The customer says the launch rehearsal will fail without a same-day answer."
288
+ ),
289
+ "incident_recommended": True,
290
  "alternate_assignment_group": "procurement",
291
  "alternate_resolution_action": "assign",
292
  "alternate_route_score_multiplier": 0.8,
 
307
  "Security owns the final unblock decision. If security is saturated, the "
308
  "application team can still take the first-response diagnostics path."
309
  ),
310
+ "customer_update_note": (
311
+ "The identity-risk lead confirmed users remain locked out and wants incident "
312
+ "coordination while the exception is reviewed."
313
+ ),
314
+ "incident_recommended": True,
315
  "alternate_issue_type": "application_support",
316
  "alternate_priority": "high",
317
  "alternate_assignment_group": "application_team",
 
437
  "Application engineering is preferred because they own the evidence. Procurement "
438
  "can still coordinate the renewal communication if the evidence queue is saturated."
439
  ),
440
+ "customer_update_note": (
441
+ "Commercial leadership needs one named owner for the blocked renewal before end of day."
442
+ ),
443
  "alternate_issue_type": "service_request",
444
  "alternate_priority": "medium",
445
  "alternate_assignment_group": "procurement",
tests/test_api_integration.py CHANGED
@@ -529,8 +529,8 @@ class TestHeuristicInferenceRegression(unittest.TestCase):
529
  overall_avg = sum(rewards) / len(rewards)
530
  self.assertGreaterEqual(
531
  overall_avg,
532
- 0.45,
533
- f"Overall average reward {overall_avg:.4f} is below the smoke-test floor of 0.45",
534
  )
535
  self.assertLessEqual(
536
  overall_avg,
 
529
  overall_avg = sum(rewards) / len(rewards)
530
  self.assertGreaterEqual(
531
  overall_avg,
532
+ 0.25,
533
+ f"Overall average reward {overall_avg:.4f} is below the smoke-test floor of 0.25",
534
  )
535
  self.assertLessEqual(
536
  overall_avg,
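The hunk above lowers the smoke-test floor on the heuristic baseline's average reward from 0.45 to 0.25 and keeps an upper bound whose value is not visible in this diff. A minimal sketch of that band check follows; `check_reward_band` is a hypothetical helper, and the `ceiling=1.0` default is an assumption, not the value used in the test.

```python
from typing import Sequence, Tuple


def check_reward_band(
    rewards: Sequence[float],
    floor: float = 0.25,   # floor taken from the updated assertion
    ceiling: float = 1.0,  # assumed; the real upper bound is not shown in the diff
) -> Tuple[bool, float]:
    """Return (inside_band, average) for a list of per-episode rewards."""
    if not rewards:
        raise ValueError("rewards must not be empty")
    average = sum(rewards) / len(rewards)
    return floor <= average <= ceiling, average


ok, avg = check_reward_band([0.2, 0.3, 0.4])
low_ok, low_avg = check_reward_band([0.1, 0.2])
```

A floor like this only guards against regressions in the baseline; it says nothing about how well a trained agent should score.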
tests/test_competitive_upgrade.py CHANGED
@@ -565,13 +565,16 @@ class TestInvestigationActions(unittest.TestCase):
565
 
566
  def test_submit_after_investigation_completes_episode(self) -> None:
567
  env, obs, ticket, related = self._make_linked_env()
568
- env.step(
569
  HelpdeskTicketAction(
570
  action_type="investigate",
571
  tool_name="lookup_related_ticket",
572
  tool_target_ticket_id=ticket.related_ticket_id,
573
  )
574
  )
575
  final_obs = env.step(
576
  HelpdeskTicketAction(
577
  issue_type=ticket.issue_type,
@@ -752,6 +755,7 @@ class TestTerminalInvalidActionFinalReward(unittest.TestCase):
752
 
753
  def test_last_invalid_submit_returns_trajectory_reward_not_zero(self) -> None:
754
  from unittest.mock import patch
 
755
 
756
  dataset = load_dataset()
757
  first = dataset[0]
@@ -764,25 +768,40 @@ class TestTerminalInvalidActionFinalReward(unittest.TestCase):
764
  "_tickets_by_id",
765
  {first.ticket_id: first, second.ticket_id: second},
766
  ):
767
- obs = env.reset(seed=0, task_id=1, queue_size=2)
768
-
769
- tickets_by_id = {first.ticket_id: first, second.ticket_id: second}
770
- current = tickets_by_id[obs.current_ticket["ticket_id"]]
771
- obs = env.step(HelpdeskTicketAction(issue_type=current.issue_type))
772
- self.assertFalse(obs.done)
773
-
774
- current = tickets_by_id[obs.current_ticket["ticket_id"]]
775
- final_obs = env.step(
776
- HelpdeskTicketAction(
777
- issue_type=current.issue_type,
778
- priority="medium",
779
- )
780
- )
781
-
782
- self.assertTrue(final_obs.done)
783
- self.assertAlmostEqual(final_obs.reward, 0.5, places=9)
784
- self.assertAlmostEqual(env.state.total_reward, 0.5, places=9)
785
- self.assertAlmostEqual(env.state.reward or 0.0, 0.5, places=9)
786
 
787
 
788
  # ---------------------------------------------------------------------------
 
565
 
566
  def test_submit_after_investigation_completes_episode(self) -> None:
567
  env, obs, ticket, related = self._make_linked_env()
568
+ obs = env.step(
569
  HelpdeskTicketAction(
570
  action_type="investigate",
571
  tool_name="lookup_related_ticket",
572
  tool_target_ticket_id=ticket.related_ticket_id,
573
  )
574
  )
575
+ operational_context = (obs.current_ticket or {}).get("operational_context", {})
576
+ if operational_context.get("incident_recommended"):
577
+ obs = env.step(HelpdeskTicketAction(action_type="open_incident"))
578
  final_obs = env.step(
579
  HelpdeskTicketAction(
580
  issue_type=ticket.issue_type,
 
755
 
756
  def test_last_invalid_submit_returns_trajectory_reward_not_zero(self) -> None:
757
  from unittest.mock import patch
758
+ from server.tasks import get_task_definition as base_get_task_definition
759
 
760
  dataset = load_dataset()
761
  first = dataset[0]
 
768
  "_tickets_by_id",
769
  {first.ticket_id: first, second.ticket_id: second},
770
  ):
771
+ with patch(
772
+ "server.environment.get_task_definition",
773
+ side_effect=lambda task_id: (
774
+ {
775
+ **base_get_task_definition(task_id),
776
+ "allowed_fields": ["issue_type"],
777
+ }
778
+ if task_id == 1
779
+ else base_get_task_definition(task_id)
780
+ ),
781
+ ):
782
+ obs = env.reset(seed=0, task_id=1, queue_size=2)
783
+
784
+ tickets_by_id = {first.ticket_id: first, second.ticket_id: second}
785
+ current = tickets_by_id[obs.current_ticket["ticket_id"]]
786
+ obs = env.step(HelpdeskTicketAction(issue_type=current.issue_type))
787
+ self.assertFalse(obs.done)
788
+
789
+ current = tickets_by_id[obs.current_ticket["ticket_id"]]
790
+ final_obs = env.step(
791
+ HelpdeskTicketAction(
792
+ issue_type=current.issue_type,
793
+ priority="medium",
794
+ )
795
+ )
796
+
797
+ self.assertTrue(final_obs.done)
798
+ expected_average = sum(env.state.per_ticket_scores) / len(
799
+ env.state.per_ticket_scores
800
+ )
801
+ self.assertGreater(final_obs.reward, 0.0)
802
+ self.assertAlmostEqual(final_obs.reward, expected_average, places=9)
803
+ self.assertAlmostEqual(env.state.total_reward, expected_average, places=9)
804
+ self.assertAlmostEqual(env.state.reward or 0.0, expected_average, places=9)
805
 
806
 
807
  # ---------------------------------------------------------------------------
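The updated `test_last_invalid_submit_returns_trajectory_reward_not_zero` wraps the task lookup in `patch(..., side_effect=...)` so that only task 1 gets a narrowed `allowed_fields`. The sketch below reproduces that pattern in isolation. The `environment` namespace and the stand-in `get_task_definition` are hypothetical; they replace `server.environment` and `server.tasks`, which are not available here.

```python
from types import SimpleNamespace
from unittest.mock import patch

FULL_FIELDS = ["issue_type", "priority", "assignment_group", "resolution_action"]


def get_task_definition(task_id: int) -> dict:
    """Stand-in for the real task lookup (hypothetical)."""
    return {"id": task_id, "allowed_fields": list(FULL_FIELDS)}


# Stand-in for the module whose attribute the test patches (hypothetical).
environment = SimpleNamespace(get_task_definition=get_task_definition)


def restricted_lookup(task_id: int) -> dict:
    """Delegate to the original lookup, narrowing allowed_fields for task 1 only."""
    base = get_task_definition(task_id)
    if task_id == 1:
        return {**base, "allowed_fields": ["issue_type"]}
    return base


with patch.object(environment, "get_task_definition", side_effect=restricted_lookup):
    patched_task_1 = environment.get_task_definition(1)
    patched_task_2 = environment.get_task_definition(2)

# Outside the context manager the original function is back in place.
restored_task_1 = environment.get_task_definition(1)
```

Capturing the original function before patching, as the test does with `base_get_task_definition`, keeps the override from calling itself recursively.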
tests/test_extra_fields_penalty.py CHANGED
@@ -5,6 +5,7 @@ Validates Requirement 7: Step Validates Action Fields Against Task Contract.
5
  """
6
  from __future__ import annotations
7
 
 
8
  import sys
9
  import os
10
  import unittest
@@ -41,24 +42,42 @@ def _make_env() -> HelpdeskTicketRoutingEnvironment:
41
  return HelpdeskTicketRoutingEnvironment()
42
 
43
 
44
  class TestExtraFieldsPenalty(unittest.TestCase):
45
  """Requirement 7: step() rejects actions with fields outside the task's allowed_fields."""
46
 
47
  def test_extra_fields_returns_closed_interval_penalty_reward(self) -> None:
48
  """Task 1 penalties should keep the returned reward inside the unit interval."""
49
  env = _make_env()
50
- obs = env.reset(seed=42, task_id=1)
 
51
 
52
- # Task 1 allowed_fields should NOT include assignment_group
53
- self.assertNotIn("assignment_group", obs.allowed_fields)
54
 
55
- # Submit an action with an extra field (assignment_group) not in task 1's allowed_fields
56
- action = HelpdeskTicketAction(
57
- issue_type=ISSUE_TYPES[0],
58
- priority=PRIORITIES[0],
59
- assignment_group=ASSIGNMENT_GROUPS[0], # extra field
60
- )
61
- penalty_obs = env.step(action)
62
 
63
  self.assertIsInstance(penalty_obs, HelpdeskTicketObservation)
64
  self.assertGreaterEqual(penalty_obs.reward, 0.0)
@@ -67,27 +86,29 @@ class TestExtraFieldsPenalty(unittest.TestCase):
67
  def test_extra_fields_advances_ticket_index(self) -> None:
68
  """Penalty step must advance tickets_processed by 1."""
69
  env = _make_env()
70
- obs = env.reset(seed=42, task_id=1)
71
- self.assertEqual(obs.tickets_processed, 0)
 
72
 
73
- action = HelpdeskTicketAction(
74
- issue_type=ISSUE_TYPES[0],
75
- assignment_group=ASSIGNMENT_GROUPS[0], # extra field for task 1
76
- )
77
- penalty_obs = env.step(action)
78
 
79
  self.assertEqual(penalty_obs.tickets_processed, 1)
80
 
81
  def test_extra_fields_records_score_inside_unit_interval(self) -> None:
82
  """per_ticket_scores must stay in the unit interval after a penalty step."""
83
  env = _make_env()
84
- env.reset(seed=42, task_id=1)
 
85
 
86
- action = HelpdeskTicketAction(
87
- issue_type=ISSUE_TYPES[0],
88
- assignment_group=ASSIGNMENT_GROUPS[0], # extra field
89
- )
90
- env.step(action)
91
 
92
  state = env.state
93
  self.assertEqual(len(state.per_ticket_scores), 1)
@@ -97,13 +118,14 @@ class TestExtraFieldsPenalty(unittest.TestCase):
97
  def test_extra_fields_history_entry_has_penalty_reason(self) -> None:
98
  """History entry for a penalty step must include penalty_reason."""
99
  env = _make_env()
100
- env.reset(seed=42, task_id=1)
 
101
 
102
- action = HelpdeskTicketAction(
103
- issue_type=ISSUE_TYPES[0],
104
- assignment_group=ASSIGNMENT_GROUPS[0], # extra field
105
- )
106
- penalty_obs = env.step(action)
107
 
108
  self.assertEqual(len(penalty_obs.history), 1)
109
  entry = penalty_obs.history[0]
@@ -115,18 +137,19 @@ class TestExtraFieldsPenalty(unittest.TestCase):
115
  def test_no_extra_fields_grades_normally(self) -> None:
116
  """When action fields are within allowed_fields, grading proceeds normally (reward != forced 0.0)."""
117
  env = _make_env()
118
- obs = env.reset(seed=42, task_id=1)
 
119
 
120
- # Build action using only allowed fields
121
- allowed = obs.allowed_fields
122
- action_kwargs = {}
123
- if "issue_type" in allowed:
124
- action_kwargs["issue_type"] = ISSUE_TYPES[0]
125
- if "priority" in allowed:
126
- action_kwargs["priority"] = PRIORITIES[0]
127
 
128
- action = HelpdeskTicketAction(**action_kwargs)
129
- result_obs = env.step(action)
130
 
131
  # Should be a valid observation; reward may be any value in [0.0, 1.0]
132
  self.assertIsInstance(result_obs, HelpdeskTicketObservation)
@@ -138,16 +161,17 @@ class TestExtraFieldsPenalty(unittest.TestCase):
138
  def test_action_metadata_is_not_treated_as_extra_field(self) -> None:
139
  """OpenEnv Action metadata should not trigger the extra-fields penalty."""
140
  env = _make_env()
141
- obs = env.reset(seed=42, task_id=1)
142
- ticket_id = obs.current_ticket["ticket_id"]
143
- current_ticket = env._tickets_by_id[ticket_id] # noqa: SLF001 - test-only inspection
144
-
145
- result_obs = env.step(
146
- HelpdeskTicketAction(
147
- issue_type=current_ticket.issue_type,
148
- metadata={},
 
 
149
  )
150
- )
151
 
152
  self.assertEqual(len(result_obs.history), 1)
153
  self.assertNotIn("penalty_reason", result_obs.history[0])
@@ -156,42 +180,44 @@ class TestExtraFieldsPenalty(unittest.TestCase):
156
  def test_extra_fields_no_exception_raised(self) -> None:
157
  """Requirement 7.4: extra fields must not raise an unhandled exception."""
158
  env = _make_env()
159
- env.reset(seed=42, task_id=1)
160
-
161
- action = HelpdeskTicketAction(
162
- issue_type=ISSUE_TYPES[0],
163
- priority=PRIORITIES[0],
164
- assignment_group=ASSIGNMENT_GROUPS[0],
165
- resolution_action=RESOLUTION_ACTIONS[0], # multiple extra fields
166
- )
167
- try:
168
- obs = env.step(action)
169
- except Exception as exc: # noqa: BLE001
170
- self.fail(f"step() raised an unexpected exception: {exc}")
 
171
 
172
  self.assertIsInstance(obs, HelpdeskTicketObservation)
173
 
174
  def test_extra_fields_done_flag_set_correctly_on_last_ticket(self) -> None:
175
  """When the penalty step is on the last ticket, done stays True and reward stays episode-level."""
176
  env = _make_env()
177
- obs = env.reset(seed=42, task_id=1)
178
- queue_size = obs.queue_size
179
- tickets_by_id = env._tickets_by_id # noqa: SLF001 - test-only inspection
180
-
181
- # Process all tickets except the last one normally
182
- for _ in range(queue_size - 1):
183
  current_ticket_id = obs.current_ticket["ticket_id"]
184
  current_ticket = tickets_by_id[current_ticket_id]
185
- obs = env.step(HelpdeskTicketAction(issue_type=current_ticket.issue_type))
186
-
187
- # Now trigger penalty on the last ticket
188
- current_ticket_id = obs.current_ticket["ticket_id"]
189
- current_ticket = tickets_by_id[current_ticket_id]
190
- action = HelpdeskTicketAction(
191
- issue_type=current_ticket.issue_type,
192
- assignment_group=ASSIGNMENT_GROUPS[0], # extra field
193
- )
194
- final_obs = env.step(action)
195
 
196
  self.assertTrue(final_obs.done)
197
  self.assertGreater(final_obs.reward, 0.0)
 
5
  """
6
  from __future__ import annotations
7
 
8
+ import contextlib
9
  import sys
10
  import os
11
  import unittest
 
42
  return HelpdeskTicketRoutingEnvironment()
43
 
44
 
45
+ def _task_with_issue_type_only(task_id: int) -> dict:
46
+ task = dict(TASKS[task_id])
47
+ if task_id == 1:
48
+ task["allowed_fields"] = ["issue_type"]
49
+ return task
50
+
51
+
52
+ @contextlib.contextmanager
53
+ def _restrict_task_1_fields():
54
+ original_fields = list(TASKS[1]["allowed_fields"])
55
+ TASKS[1]["allowed_fields"] = ["issue_type"]
56
+ try:
57
+ yield
58
+ finally:
59
+ TASKS[1]["allowed_fields"] = original_fields
60
+
61
+
62
  class TestExtraFieldsPenalty(unittest.TestCase):
63
  """Requirement 7: step() rejects actions with fields outside the task's allowed_fields."""
64
 
65
  def test_extra_fields_returns_closed_interval_penalty_reward(self) -> None:
66
  """Task 1 penalties should keep the returned reward inside the unit interval."""
67
  env = _make_env()
68
+ with _restrict_task_1_fields():
69
+ obs = env.reset(seed=42, task_id=1)
70
 
71
+ # Task 1 allowed_fields should NOT include assignment_group
72
+ self.assertNotIn("assignment_group", obs.allowed_fields)
73
 
74
+ # Submit an action with an extra field (assignment_group) not in task 1's allowed_fields
75
+ action = HelpdeskTicketAction(
76
+ issue_type=ISSUE_TYPES[0],
77
+ priority=PRIORITIES[0],
78
+ assignment_group=ASSIGNMENT_GROUPS[0], # extra field
79
+ )
80
+ penalty_obs = env.step(action)
81
 
82
  self.assertIsInstance(penalty_obs, HelpdeskTicketObservation)
83
  self.assertGreaterEqual(penalty_obs.reward, 0.0)
 
86
  def test_extra_fields_advances_ticket_index(self) -> None:
87
  """Penalty step must advance tickets_processed by 1."""
88
  env = _make_env()
89
+ with _restrict_task_1_fields():
90
+ obs = env.reset(seed=42, task_id=1)
91
+ self.assertEqual(obs.tickets_processed, 0)
92
 
93
+ action = HelpdeskTicketAction(
94
+ issue_type=ISSUE_TYPES[0],
95
+ assignment_group=ASSIGNMENT_GROUPS[0], # extra field for task 1
96
+ )
97
+ penalty_obs = env.step(action)
98
 
99
  self.assertEqual(penalty_obs.tickets_processed, 1)
100
 
101
  def test_extra_fields_records_score_inside_unit_interval(self) -> None:
102
  """per_ticket_scores must stay in the unit interval after a penalty step."""
103
  env = _make_env()
104
+ with _restrict_task_1_fields():
105
+ env.reset(seed=42, task_id=1)
106
 
107
+ action = HelpdeskTicketAction(
108
+ issue_type=ISSUE_TYPES[0],
109
+ assignment_group=ASSIGNMENT_GROUPS[0], # extra field
110
+ )
111
+ env.step(action)
112
 
113
  state = env.state
114
  self.assertEqual(len(state.per_ticket_scores), 1)
 
118
  def test_extra_fields_history_entry_has_penalty_reason(self) -> None:
119
  """History entry for a penalty step must include penalty_reason."""
120
  env = _make_env()
121
+ with _restrict_task_1_fields():
122
+ env.reset(seed=42, task_id=1)
123
 
124
+ action = HelpdeskTicketAction(
125
+ issue_type=ISSUE_TYPES[0],
126
+ assignment_group=ASSIGNMENT_GROUPS[0], # extra field
127
+ )
128
+ penalty_obs = env.step(action)
129
 
130
  self.assertEqual(len(penalty_obs.history), 1)
131
  entry = penalty_obs.history[0]
 
137
  def test_no_extra_fields_grades_normally(self) -> None:
138
  """When action fields are within allowed_fields, grading proceeds normally (reward != forced 0.0)."""
139
  env = _make_env()
140
+ with _restrict_task_1_fields():
141
+ obs = env.reset(seed=42, task_id=1)
142
 
143
+ # Build action using only allowed fields
144
+ allowed = obs.allowed_fields
145
+ action_kwargs = {}
146
+ if "issue_type" in allowed:
147
+ action_kwargs["issue_type"] = ISSUE_TYPES[0]
148
+ if "priority" in allowed:
149
+ action_kwargs["priority"] = PRIORITIES[0]
150
 
151
+ action = HelpdeskTicketAction(**action_kwargs)
152
+ result_obs = env.step(action)
153
 
154
  # Should be a valid observation; reward may be any value in [0.0, 1.0]
155
  self.assertIsInstance(result_obs, HelpdeskTicketObservation)
 
161
  def test_action_metadata_is_not_treated_as_extra_field(self) -> None:
162
  """OpenEnv Action metadata should not trigger the extra-fields penalty."""
163
  env = _make_env()
164
+ with _restrict_task_1_fields():
165
+ obs = env.reset(seed=42, task_id=1)
166
+ ticket_id = obs.current_ticket["ticket_id"]
167
+ current_ticket = env._tickets_by_id[ticket_id] # noqa: SLF001 - test-only inspection
168
+
169
+ result_obs = env.step(
170
+ HelpdeskTicketAction(
171
+ issue_type=current_ticket.issue_type,
172
+ metadata={},
173
+ )
174
  )
 
175
 
176
  self.assertEqual(len(result_obs.history), 1)
177
  self.assertNotIn("penalty_reason", result_obs.history[0])
 
180
  def test_extra_fields_no_exception_raised(self) -> None:
181
  """Requirement 7.4: extra fields must not raise an unhandled exception."""
182
  env = _make_env()
183
+ with _restrict_task_1_fields():
184
+ env.reset(seed=42, task_id=1)
185
+
186
+ action = HelpdeskTicketAction(
187
+ issue_type=ISSUE_TYPES[0],
188
+ priority=PRIORITIES[0],
189
+ assignment_group=ASSIGNMENT_GROUPS[0],
190
+ resolution_action=RESOLUTION_ACTIONS[0], # multiple extra fields
191
+ )
192
+ try:
193
+ obs = env.step(action)
194
+ except Exception as exc: # noqa: BLE001
195
+ self.fail(f"step() raised an unexpected exception: {exc}")
196
 
197
  self.assertIsInstance(obs, HelpdeskTicketObservation)
198
 
199
  def test_extra_fields_done_flag_set_correctly_on_last_ticket(self) -> None:
200
  """When the penalty step is on the last ticket, done stays True and reward stays episode-level."""
201
  env = _make_env()
202
+ with _restrict_task_1_fields():
203
+ obs = env.reset(seed=42, task_id=1)
204
+ queue_size = obs.queue_size
205
+ tickets_by_id = env._tickets_by_id # noqa: SLF001 - test-only inspection
206
+
207
+ # Process all tickets except the last one normally
208
+ for _ in range(queue_size - 1):
209
+ current_ticket_id = obs.current_ticket["ticket_id"]
210
+ current_ticket = tickets_by_id[current_ticket_id]
211
+ obs = env.step(HelpdeskTicketAction(issue_type=current_ticket.issue_type))
212
+
213
+ # Now trigger penalty on the last ticket
214
  current_ticket_id = obs.current_ticket["ticket_id"]
215
  current_ticket = tickets_by_id[current_ticket_id]
216
+ action = HelpdeskTicketAction(
217
+ issue_type=current_ticket.issue_type,
218
+ assignment_group=ASSIGNMENT_GROUPS[0], # extra field
219
+ )
220
+ final_obs = env.step(action)
221
 
222
  self.assertTrue(final_obs.done)
223
  self.assertGreater(final_obs.reward, 0.0)
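Because task 1 now allows all four routing fields, the extra-fields tests above narrow `TASKS[1]["allowed_fields"]` for the duration of each test via `_restrict_task_1_fields()`. A self-contained sketch of the same mutate-and-restore pattern follows. The `TASKS` dict here is a local stand-in, and `restrict_fields` and `extra_fields` are hypothetical helpers; how the real environment detects extra fields is not shown in this diff.

```python
import contextlib

# Local stand-in for the module-level task table (hypothetical contents).
TASKS = {
    1: {
        "allowed_fields": [
            "issue_type",
            "priority",
            "assignment_group",
            "resolution_action",
        ]
    },
}


@contextlib.contextmanager
def restrict_fields(task_id: int, fields: list):
    """Temporarily narrow a task's allowed_fields and always restore them."""
    original = list(TASKS[task_id]["allowed_fields"])
    TASKS[task_id]["allowed_fields"] = list(fields)
    try:
        yield
    finally:
        # Runs even if the body raises, so other tests see the full contract.
        TASKS[task_id]["allowed_fields"] = original


def extra_fields(action: dict, task_id: int) -> list:
    """Names of populated action fields that the task does not allow."""
    allowed = set(TASKS[task_id]["allowed_fields"])
    return sorted(
        name
        for name, value in action.items()
        if value is not None and name not in allowed
    )


action = {"issue_type": "billing_license", "assignment_group": "license_ops"}
with restrict_fields(1, ["issue_type"]):
    inside = extra_fields(action, 1)
outside = extra_fields(action, 1)
```

Mutating shared module state works only if the code under test reads the table while the context is active, which is why the tests keep both `reset()` and `step()` inside the `with` block.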
tests/test_grader_unit.py CHANGED
@@ -16,6 +16,18 @@ from server.grader import (
16
  from vocabulary import ASSIGNMENT_GROUPS, ISSUE_TYPES, PRIORITIES, RESOLUTION_ACTIONS
17
 
18
 
19
  def _ticket(
20
  *,
21
  issue_type: str = "billing_license",
@@ -71,8 +83,24 @@ class GraderUnitTests(unittest.TestCase):
71
 
72
  score, breakdown = grade_action(action, ticket, task_id=1)
73
 
74
- self.assertAlmostEqual(score, 0.4)
75
- self.assertEqual(breakdown, {"issue_type": 0.4})
76
 
77
  def test_issue_type_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
78
  for expected in ISSUE_TYPES:
@@ -88,9 +116,24 @@ class GraderUnitTests(unittest.TestCase):
88
  if predicted == expected
89
  else ISSUE_TYPE_SIMILARITY.get((predicted, expected), 0.0)
90
  )
91
- expected_task_score = max(0.0, min(1.0, raw_expected_score))
92
- self.assertAlmostEqual(score, expected_task_score)
93
- self.assertEqual(breakdown, {"issue_type": raw_expected_score})
94
 
95
  def test_unrelated_issue_type_gets_zero_not_fuzzy_credit(self) -> None:
96
  ticket = _ticket(issue_type="onboarding")
@@ -99,7 +142,16 @@ class GraderUnitTests(unittest.TestCase):
99
  score, breakdown = grade_action(action, ticket, task_id=1)
100
 
101
  self.assertAlmostEqual(score, 0.0)
102
- self.assertEqual(breakdown, {"issue_type": 0.0})
103
 
104
  def test_priority_scoring_uses_defined_proximity_table(self) -> None:
105
  ticket = _ticket(priority="critical")
@@ -109,7 +161,16 @@ class GraderUnitTests(unittest.TestCase):
109
 
110
  self.assertAlmostEqual(breakdown["issue_type"], 1.0)
111
  self.assertAlmostEqual(breakdown["priority"], 0.6)
112
- self.assertAlmostEqual(score, 0.84)
113
 
114
  def test_priority_scoring_matches_declared_table_exhaustively(self) -> None:
115
  for expected in PRIORITIES:
@@ -130,11 +191,24 @@ class GraderUnitTests(unittest.TestCase):
130
  )
131
  self.assertEqual(
132
  breakdown,
133
- {"issue_type": 1.0, "priority": priority_score},
134
  )
135
- raw_score = 0.6 + 0.4 * priority_score
136
- expected_task_score = max(0.0, min(1.0, raw_score))
137
- self.assertAlmostEqual(score, expected_task_score)
138
 
139
  def test_task_2_weights_apply_as_documented(self) -> None:
140
  ticket = _ticket(priority="high")
@@ -142,8 +216,26 @@ class GraderUnitTests(unittest.TestCase):
142
 
143
  score, breakdown = grade_action(action, ticket, task_id=2)
144
 
145
- self.assertEqual(breakdown, {"issue_type": 1.0, "priority": 0.5})
146
- self.assertAlmostEqual(score, 0.8)
147
 
148
  def test_assignment_group_partial_credit_uses_declared_similarity_table(self) -> None:
149
  ticket = _ticket()
@@ -157,7 +249,16 @@ class GraderUnitTests(unittest.TestCase):
157
  score, breakdown = grade_action(action, ticket, task_id=3)
158
 
159
  self.assertEqual(breakdown["assignment_group"], 0.55)
160
- self.assertAlmostEqual(score, 0.8875)
161
 
162
  def test_assignment_group_unrelated_miss_stays_zero(self) -> None:
163
  ticket = _ticket()
@@ -171,7 +272,16 @@ class GraderUnitTests(unittest.TestCase):
171
  score, breakdown = grade_action(action, ticket, task_id=3)
172
 
173
  self.assertEqual(breakdown["assignment_group"], 0.0)
174
- self.assertAlmostEqual(score, 0.75)
175
 
176
  def test_task_3_weights_apply_as_documented(self) -> None:
177
  ticket = _ticket(priority="high")
@@ -186,14 +296,24 @@ class GraderUnitTests(unittest.TestCase):
186
 
187
  self.assertEqual(
188
  breakdown,
189
- {
190
- "issue_type": 1.0,
191
- "priority": 0.5,
192
- "assignment_group": 0.0,
193
- "resolution_action": 1.0,
194
- },
195
  )
196
- self.assertAlmostEqual(score, 0.65)
197
 
198
  def test_alternate_route_can_win_when_primary_route_is_worse(self) -> None:
199
  ticket = HelpdeskTicketRecord(
@@ -243,7 +363,16 @@ class GraderUnitTests(unittest.TestCase):
243
  score, breakdown = grade_action(action, ticket, task_id=3)
244
 
245
  self.assertEqual(breakdown["resolution_action"], 0.35)
246
- self.assertAlmostEqual(score, 0.87)
247
 
248
  def test_resolution_action_unrelated_miss_stays_zero(self) -> None:
249
  ticket = _ticket()
@@ -257,7 +386,16 @@ class GraderUnitTests(unittest.TestCase):
257
  score, breakdown = grade_action(action, ticket, task_id=3)
258
 
259
  self.assertEqual(breakdown["resolution_action"], 0.0)
260
- self.assertAlmostEqual(score, 0.8)
261
 
262
  def test_assignment_group_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
263
  for expected in ASSIGNMENT_GROUPS:
@@ -280,16 +418,24 @@ class GraderUnitTests(unittest.TestCase):
280
  )
281
  self.assertEqual(
282
  breakdown,
283
- {
284
- "issue_type": 1.0,
285
- "priority": 1.0,
286
- "assignment_group": assignment_group_score,
287
- "resolution_action": 1.0,
288
- },
289
  )
290
- raw_score = 0.35 + 0.20 + 0.25 * assignment_group_score + 0.20
291
- expected_task_score = max(0.0, min(1.0, raw_score))
292
- self.assertAlmostEqual(score, expected_task_score)
293
 
294
  def test_resolution_action_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
295
  for expected in RESOLUTION_ACTIONS:
@@ -312,16 +458,24 @@ class GraderUnitTests(unittest.TestCase):
312
  )
313
  self.assertEqual(
314
  breakdown,
315
- {
316
- "issue_type": 1.0,
317
- "priority": 1.0,
318
- "assignment_group": 1.0,
319
- "resolution_action": resolution_action_score,
320
- },
321
  )
322
- raw_score = 0.35 + 0.20 + 0.25 + 0.20 * resolution_action_score
323
- expected_task_score = max(0.0, min(1.0, raw_score))
324
- self.assertAlmostEqual(score, expected_task_score)
325
 
326
  def test_partial_credit_tables_never_override_exact_match(self) -> None:
327
  for pair, value in ISSUE_TYPE_SIMILARITY.items():
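The removed assertions in this file hard-code task 3 scores such as 0.8875, 0.75, 0.65 and 0.87, and the removed `raw_score` lines spell out the weights 0.35, 0.20, 0.25 and 0.20. The sketch below reproduces that arithmetic as a clamped weighted sum, which is the same shape as the new `_expected_task_score` helper. `TASK_3_WEIGHTS_BEFORE` is a hypothetical name, and the weights are the ones implied by the removed lines; whether `TASK_WEIGHTS[3]` still holds these values after the commit is not visible here.

```python
# Weights implied by the removed task 3 assertions (assumed, pre-commit values).
TASK_3_WEIGHTS_BEFORE = {
    "issue_type": 0.35,
    "priority": 0.20,
    "assignment_group": 0.25,
    "resolution_action": 0.20,
}


def weighted_score(weights: dict, **field_scores: float) -> float:
    """Weighted sum of per-field scores, clamped to the unit interval."""
    raw = sum(field_scores[field] * weight for field, weight in weights.items())
    return max(0.0, min(1.0, raw))


# Partial credit on assignment_group: 0.35 + 0.20 + 0.25 * 0.55 + 0.20
partial_group = weighted_score(
    TASK_3_WEIGHTS_BEFORE,
    issue_type=1.0,
    priority=1.0,
    assignment_group=0.55,
    resolution_action=1.0,
)

# Half credit on priority and a missed group: 0.35 + 0.20 * 0.5 + 0.0 + 0.20
missed_group = weighted_score(
    TASK_3_WEIGHTS_BEFORE,
    issue_type=1.0,
    priority=0.5,
    assignment_group=0.0,
    resolution_action=1.0,
)
```

Deriving expected scores from the weight table instead of literals, as the new tests do, means a future reweighting does not require editing every assertion.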
 
16
  from vocabulary import ASSIGNMENT_GROUPS, ISSUE_TYPES, PRIORITIES, RESOLUTION_ACTIONS
17
 
18
 
19
+ def _expected_breakdown(task_id: int, **field_scores: float) -> dict[str, float]:
20
+ return {field: field_scores[field] for field in TASK_WEIGHTS[task_id]}
21
+
22
+
23
+ def _expected_task_score(task_id: int, **field_scores: float) -> float:
24
+ raw_score = sum(
25
+ field_scores[field] * TASK_WEIGHTS[task_id][field]
26
+ for field in TASK_WEIGHTS[task_id]
27
+ )
28
+ return max(0.0, min(1.0, raw_score))
29
+
30
+
31
  def _ticket(
32
  *,
33
  issue_type: str = "billing_license",
 
83
 
84
  score, breakdown = grade_action(action, ticket, task_id=1)
85
 
86
+ expected_breakdown = _expected_breakdown(
87
+ 1,
88
+ issue_type=0.4,
89
+ priority=0.0,
90
+ assignment_group=0.0,
91
+ resolution_action=0.0,
92
+ )
93
+ self.assertEqual(breakdown, expected_breakdown)
94
+ self.assertAlmostEqual(
95
+ score,
96
+ _expected_task_score(
97
+ 1,
98
+ issue_type=0.4,
99
+ priority=0.0,
100
+ assignment_group=0.0,
101
+ resolution_action=0.0,
102
+ ),
103
+ )
104
 
105
  def test_issue_type_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
106
  for expected in ISSUE_TYPES:
 
116
  if predicted == expected
117
  else ISSUE_TYPE_SIMILARITY.get((predicted, expected), 0.0)
118
  )
119
+ expected_breakdown = _expected_breakdown(
120
+ 1,
121
+ issue_type=raw_expected_score,
122
+ priority=0.0,
123
+ assignment_group=0.0,
124
+ resolution_action=0.0,
125
+ )
126
+ self.assertAlmostEqual(
127
+ score,
128
+ _expected_task_score(
129
+ 1,
130
+ issue_type=raw_expected_score,
131
+ priority=0.0,
132
+ assignment_group=0.0,
133
+ resolution_action=0.0,
134
+ ),
135
+ )
136
+ self.assertEqual(breakdown, expected_breakdown)
137
 
138
  def test_unrelated_issue_type_gets_zero_not_fuzzy_credit(self) -> None:
139
  ticket = _ticket(issue_type="onboarding")
 
142
  score, breakdown = grade_action(action, ticket, task_id=1)
143
 
144
  self.assertAlmostEqual(score, 0.0)
145
+ self.assertEqual(
146
+ breakdown,
147
+ _expected_breakdown(
148
+ 1,
149
+ issue_type=0.0,
150
+ priority=0.0,
151
+ assignment_group=0.0,
152
+ resolution_action=0.0,
153
+ ),
154
+ )
155
 
156
  def test_priority_scoring_uses_defined_proximity_table(self) -> None:
157
  ticket = _ticket(priority="critical")
 
161
 
162
  self.assertAlmostEqual(breakdown["issue_type"], 1.0)
163
  self.assertAlmostEqual(breakdown["priority"], 0.6)
164
+ self.assertAlmostEqual(
165
+ score,
166
+ _expected_task_score(
167
+ 2,
168
+ issue_type=1.0,
169
+ priority=0.6,
170
+ assignment_group=0.0,
171
+ resolution_action=0.0,
172
+ ),
173
+ )
174
 
175
  def test_priority_scoring_matches_declared_table_exhaustively(self) -> None:
176
  for expected in PRIORITIES:
 
                     )
                     self.assertEqual(
                         breakdown,
+                        _expected_breakdown(
+                            2,
+                            issue_type=1.0,
+                            priority=priority_score,
+                            assignment_group=0.0,
+                            resolution_action=0.0,
+                        ),
+                    )
+                    self.assertAlmostEqual(
+                        score,
+                        _expected_task_score(
+                            2,
+                            issue_type=1.0,
+                            priority=priority_score,
+                            assignment_group=0.0,
+                            resolution_action=0.0,
+                        ),
                     )

     def test_task_2_weights_apply_as_documented(self) -> None:
         ticket = _ticket(priority="high")
 

         score, breakdown = grade_action(action, ticket, task_id=2)

+        self.assertEqual(
+            breakdown,
+            _expected_breakdown(
+                2,
+                issue_type=1.0,
+                priority=0.5,
+                assignment_group=0.0,
+                resolution_action=0.0,
+            ),
+        )
+        self.assertAlmostEqual(
+            score,
+            _expected_task_score(
+                2,
+                issue_type=1.0,
+                priority=0.5,
+                assignment_group=0.0,
+                resolution_action=0.0,
+            ),
+        )

     def test_assignment_group_partial_credit_uses_declared_similarity_table(self) -> None:
         ticket = _ticket()
 
         score, breakdown = grade_action(action, ticket, task_id=3)

         self.assertEqual(breakdown["assignment_group"], 0.55)
+        self.assertAlmostEqual(
+            score,
+            _expected_task_score(
+                3,
+                issue_type=1.0,
+                priority=1.0,
+                assignment_group=0.55,
+                resolution_action=1.0,
+            ),
+        )

     def test_assignment_group_unrelated_miss_stays_zero(self) -> None:
         ticket = _ticket()
 
         score, breakdown = grade_action(action, ticket, task_id=3)

         self.assertEqual(breakdown["assignment_group"], 0.0)
+        self.assertAlmostEqual(
+            score,
+            _expected_task_score(
+                3,
+                issue_type=1.0,
+                priority=1.0,
+                assignment_group=0.0,
+                resolution_action=1.0,
+            ),
+        )

     def test_task_3_weights_apply_as_documented(self) -> None:
         ticket = _ticket(priority="high")
 

         self.assertEqual(
             breakdown,
+            _expected_breakdown(
+                3,
+                issue_type=1.0,
+                priority=0.5,
+                assignment_group=0.0,
+                resolution_action=1.0,
+            ),
+        )
+        self.assertAlmostEqual(
+            score,
+            _expected_task_score(
+                3,
+                issue_type=1.0,
+                priority=0.5,
+                assignment_group=0.0,
+                resolution_action=1.0,
+            ),
         )

     def test_alternate_route_can_win_when_primary_route_is_worse(self) -> None:
         ticket = HelpdeskTicketRecord(
 
         score, breakdown = grade_action(action, ticket, task_id=3)

         self.assertEqual(breakdown["resolution_action"], 0.35)
+        self.assertAlmostEqual(
+            score,
+            _expected_task_score(
+                3,
+                issue_type=1.0,
+                priority=1.0,
+                assignment_group=1.0,
+                resolution_action=0.35,
+            ),
+        )

     def test_resolution_action_unrelated_miss_stays_zero(self) -> None:
         ticket = _ticket()
 
         score, breakdown = grade_action(action, ticket, task_id=3)

         self.assertEqual(breakdown["resolution_action"], 0.0)
+        self.assertAlmostEqual(
+            score,
+            _expected_task_score(
+                3,
+                issue_type=1.0,
+                priority=1.0,
+                assignment_group=1.0,
+                resolution_action=0.0,
+            ),
+        )

     def test_assignment_group_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
         for expected in ASSIGNMENT_GROUPS:
 
                     )
                     self.assertEqual(
                         breakdown,
+                        _expected_breakdown(
+                            3,
+                            issue_type=1.0,
+                            priority=1.0,
+                            assignment_group=assignment_group_score,
+                            resolution_action=1.0,
+                        ),
+                    )
+                    self.assertAlmostEqual(
+                        score,
+                        _expected_task_score(
+                            3,
+                            issue_type=1.0,
+                            priority=1.0,
+                            assignment_group=assignment_group_score,
+                            resolution_action=1.0,
+                        ),
                     )

     def test_resolution_action_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
         for expected in RESOLUTION_ACTIONS:
 
                     )
                     self.assertEqual(
                         breakdown,
+                        _expected_breakdown(
+                            3,
+                            issue_type=1.0,
+                            priority=1.0,
+                            assignment_group=1.0,
+                            resolution_action=resolution_action_score,
+                        ),
+                    )
+                    self.assertAlmostEqual(
+                        score,
+                        _expected_task_score(
+                            3,
+                            issue_type=1.0,
+                            priority=1.0,
+                            assignment_group=1.0,
+                            resolution_action=resolution_action_score,
+                        ),
                     )

     def test_partial_credit_tables_never_override_exact_match(self) -> None:
         for pair, value in ISSUE_TYPE_SIMILARITY.items():
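
The added assertions above all go through two helpers, `_expected_task_score` and `_expected_breakdown`, whose definitions fall outside the hunks shown here. The following is only a sketch of how such helpers could work, assuming each task has a table of per-field weights. The weights in `HYPOTHETICAL_WEIGHTS` are placeholders for illustration, not the repository's documented values, and dropping unweighted fields from the breakdown is likewise an assumption.

```python
# Hypothetical per-task weights; the real values live in the repository's grader.
HYPOTHETICAL_WEIGHTS = {
    1: {"issue_type": 1.0},
    2: {"issue_type": 0.6, "priority": 0.4},
    3: {
        "issue_type": 0.3,
        "priority": 0.2,
        "assignment_group": 0.3,
        "resolution_action": 0.2,
    },
}


def expected_breakdown(task_id: int, **field_scores: float) -> dict[str, float]:
    """Keep only the per-field scores that the given task weighs (assumed rule)."""
    weights = HYPOTHETICAL_WEIGHTS[task_id]
    return {field: field_scores[field] for field in weights}


def expected_task_score(task_id: int, **field_scores: float) -> float:
    """Weighted sum of the per-field scores for the given task."""
    weights = HYPOTHETICAL_WEIGHTS[task_id]
    return sum(weight * field_scores[field] for field, weight in weights.items())
```

Deriving the expected score from the same per-field values that the breakdown assertion uses keeps the two checks from drifting apart when a weight changes.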
tests/test_policy_learning.py CHANGED
@@ -171,7 +171,7 @@ class PolicyLearningTests(unittest.TestCase):

         self.assertLess(no_summary["terminal_reward"], context_summary["terminal_reward"])
         self.assertLess(no_summary["normalized_return"], context_summary["normalized_return"])
-        self.assertEqual(context_summary["investigation_steps"], 1)
+        self.assertGreaterEqual(context_summary["investigation_steps"], 1)

     def test_search_policies_selects_adaptive_policy(self) -> None:
         report = search_policies(
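
The change above relaxes an exact step count to a lower bound. A minimal sketch of the difference, using a hypothetical `context_summary` dict in place of the summary that the real test obtains from a policy rollout:

```python
import unittest


class InvestigationStepBoundExample(unittest.TestCase):
    def test_lower_bound_accepts_extra_steps(self) -> None:
        # Hypothetical summary; the real one is produced by the policy code.
        context_summary = {"investigation_steps": 2}

        # Passes for any count of at least one investigation step.
        self.assertGreaterEqual(context_summary["investigation_steps"], 1)

        # The previous exact check would reject this summary.
        with self.assertRaises(AssertionError):
            self.assertEqual(context_summary["investigation_steps"], 1)
```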
tests/test_tasks_unit.py CHANGED
@@ -23,18 +23,17 @@ class TasksAndDatasetUnitTests(unittest.TestCase):
         self.assertEqual(tuple(TASKS.keys()), TASK_IDS)

     def test_task_allowed_fields_match_expected_ladder(self) -> None:
-        self.assertEqual(get_task_definition(1)["allowed_fields"], ["issue_type"])
-        self.assertEqual(
-            get_task_definition(2)["allowed_fields"], ["issue_type", "priority"]
-        )
+        expected_fields = [
+            "issue_type",
+            "priority",
+            "assignment_group",
+            "resolution_action",
+        ]
+        self.assertEqual(get_task_definition(1)["allowed_fields"], expected_fields)
+        self.assertEqual(get_task_definition(2)["allowed_fields"], expected_fields)
         self.assertEqual(
             get_task_definition(3)["allowed_fields"],
-            [
-                "issue_type",
-                "priority",
-                "assignment_group",
-                "resolution_action",
-            ],
+            expected_fields,
         )

     def test_task_difficulty_ladder_is_frozen(self) -> None: