Roopalgn committed on
Commit
8241eb5
·
1 Parent(s): d378e5d

Add queue-planning helpdesk routing mechanics

README.md CHANGED
@@ -55,21 +55,22 @@ This domain is useful for OpenEnv because it is operationally realistic, easy to
55
  The project uses a queue-based episode model.
56
 
57
  - `reset()` samples a task and a queue of 3 to 5 tickets
58
- - `step()` grades one ticket submission at a time
59
  - `state()` exposes the internal episode snapshot
60
- - final reward is based on average ticket quality across the queue
 
61
 
62
  The environment classes and vocabulary are intentionally frozen to keep collaboration and judging simple.
63
 
64
  ## Lightweight Policy Improvement Loop
65
 
66
- The repo now includes a small local learning runner in `policy_learning.py`. It does not update model weights, but it does run repeated rollouts over many seeds, log full trajectories, and select the best policy configuration from a discrete candidate set using observed reward.
67
 
68
- That gives the project a real improvement loop for judge demos:
69
 
70
- - compare `no_investigation` against `investigate_when_context_hidden`
71
- - log per-step rewards, feedback summaries, and reward components to JSONL
72
- - search over small policy variants such as `legacy_single_probe`, `context_chain`, and `hybrid_context`
73
  - select the best policy on train seeds, then re-evaluate it on holdout seeds
74
 
75
  Example commands:
@@ -90,7 +91,7 @@ Artifacts are written to `analysis/policy_learning_runs/` by default:
90
  - `search_eval_episodes.jsonl`
91
  - `search_eval_trajectories.jsonl`
92
 
93
- The default submit policy inside this runner stays deterministic and local. It reuses the repo's heuristic routing logic, so the discrete policy search focuses on investigation behavior and reward-driven policy selection rather than on external LLM latency or API cost.
94
 
95
  ## Task Ladder
96
 
@@ -149,8 +150,11 @@ Visible ticket fields:
149
  - `requester`
150
  - `description`
151
  - optional `ambiguity_note`
 
152
  - optional `related_ticket_id`
153
  - optional `related_ticket_preview`
 
 
154
 
155
  Each observation also includes:
156
 
@@ -173,6 +177,8 @@ Each observation also includes:
173
  - `last_reward_components`
174
  - `rubric_reward` on terminal observations
175
  - `metadata.last_feedback_summary` for compact reward / penalty feedback
 
 
176
  - standard OpenEnv fields such as `done` and `reward`
177
 
178
  The internal `HelpdeskTicketState` tracks:
@@ -187,6 +193,10 @@ The internal `HelpdeskTicketState` tracks:
187
  - `total_reward`
188
  - `reward`
189
  - `done`
 
 
 
 
190
 
191
  ## Grading And Reward
192
 
@@ -202,14 +212,17 @@ Available tools:
202
  - `lookup_related_ticket`
203
  - `lookup_requester_history`
204
  - `lookup_internal_routing_note`
 
205
 
206
  Hard-task investigation behavior:
207
 
208
  - some ambiguous and non-default-routing tickets start with both redacted titles and redacted descriptions
209
  - linked-ticket previews and internal routing notes stay hidden until the matching tool is used
 
210
  - only useful investigation steps return a small positive shaping reward
211
  - blind or repeated probing does not pay by default
212
  - premature hard-task submission can incur a shaping penalty even when the visible text looks plausible
 
213
  - terminal `rubric_reward` remains the objective evaluation signal, while per-step `reward` is the denser training signal
214
 
215
  Per-field behavior:
@@ -218,6 +231,7 @@ Per-field behavior:
218
  - `priority`: exact match or proximity credit
219
  - `assignment_group`: exact match, with a small declared partial-credit map for nearby ownership mistakes
220
  - `resolution_action`: exact match, with a small declared partial-credit map for nearby next-step mistakes
 
221
 
222
  Task weights:
223
 
@@ -227,22 +241,23 @@ Task weights:
227
  | 2 | 60% | 40% | - | - |
228
  | 3 | 35% | 20% | 25% | 20% |
229
 
230
- Final episode reward:
231
 
232
  ```text
233
- average(per_ticket_scores)
234
  ```
235
 
236
- The result is clamped to `[0.0, 1.0]`.
237
 
238
  Step reward is lightly milestone-shaped: high per-ticket scores get a small bonus and very low scores get a small penalty before the final clamp.
239
 
240
- Final reward also includes a queue-economics penalty when the agent exceeds the free investigation budget. One investigation per queued ticket is free, but extra investigation steps reduce the final reward more noticeably than before.
241
 
242
  To make the environment more RL-friendly, each observation now also surfaces structured reward telemetry:
243
 
244
  - `last_reward_components` exposes ticket score, shaped step reward, milestone adjustment, trajectory reward when applicable, and any investigation penalty applied
245
  - `average_score_so_far` and `progress_fraction` expose trajectory progress without leaking future labels
 
246
  - `history` retains the same reward components plus a compact `feedback_summary` string for downstream agents
247
 
248
  ## Grounded Scoring
@@ -253,6 +268,7 @@ The grader is intentionally narrow and declared, not fully fuzzy.
253
  - `assignment_group` and `resolution_action` now expose only a small declared partial-credit map for nearby mistakes
254
  - `priority` only gets proximity credit from the declared table in `server/grader.py`
255
  - `issue_type` only gets partial credit for a small declared similarity map
 
256
  - wrong labels outside those explicit maps score `0.0`
257
 
258
  That scoring policy is now backed by checked-in unit tests in `tests/test_grader_unit.py` and `tests/test_tasks_unit.py`.
@@ -267,7 +283,7 @@ That grounding pass supported keeping the current similarity map small and expla
267
 
268
  ## Dataset Snapshot
269
 
270
- The labeled dataset in `data/dataset.json` currently contains 45 tickets spanning straightforward and ambiguous helpdesk scenarios.
271
 
272
  It includes:
273
 
@@ -280,6 +296,9 @@ It includes:
280
  - onboarding tickets
281
  - feature requests
282
  - follow-up cases linked through `related_ticket_id`
 
 
 
283
 
284
  ## Difficulty Coverage
285
 
 
55
  The project uses a queue-based episode model.
56
 
57
  - `reset()` samples a task and a queue of 3 to 5 tickets
58
+ - `step()` lets the agent investigate or submit one ticket at a time
59
  - `state()` exposes the internal episode snapshot
60
+ - hard-task episodes also track queue-level capacity, alternate acceptable routes, and planning penalties across tickets
61
+ - final evaluation is based on the queue outcome, not on isolated per-ticket classification alone
62
 
63
  The environment classes and vocabulary are intentionally frozen to keep collaboration and judging simple.
64
 
65
  ## Lightweight Policy Improvement Loop
66
 
67
+ The repo includes a local policy runner in `policy_learning.py`. It still does not update model weights, but it now does more than cosmetic search: it evaluates repeated seeded rollouts, learns cue-conditioned tool preferences for investigation, uses the same planning-aware deterministic submit logic as `inference.py`, and ranks policies by terminal rubric reward first, with lower planning penalty as the tie-breaker.
68
 
69
+ That gives the project a meaningful improvement loop for judge demos:
70
 
71
+ - compare `no_investigation`, `investigate_when_context_hidden`, and `adaptive_cue_bandit`
72
+ - log per-step rewards, feedback summaries, planning penalties, and reward components to JSONL
73
+ - learn when to use `lookup_queue_capacity_forecast` versus the other investigation tools
74
  - select the best policy on train seeds, then re-evaluate it on holdout seeds
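
The selection rule described in this section (rank by terminal rubric reward first, break ties on lower planning penalty) might look roughly like the sketch below. `PolicyResult`, `select_best_policy`, and all numbers are hypothetical illustrations, not taken from `policy_learning.py`.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PolicyResult:
    """Hypothetical per-policy summary collected over a set of train seeds."""

    name: str
    rubric_rewards: tuple[float, ...]
    planning_penalties: tuple[float, ...]

    @property
    def mean_rubric_reward(self) -> float:
        return mean(self.rubric_rewards)

    @property
    def mean_planning_penalty(self) -> float:
        return mean(self.planning_penalties)


def select_best_policy(results: list[PolicyResult]) -> PolicyResult:
    # Highest mean rubric reward wins; ties go to the lower planning penalty.
    return max(
        results,
        key=lambda r: (r.mean_rubric_reward, -r.mean_planning_penalty),
    )


# Made-up train-seed results for the three policies named in the README.
train_results = [
    PolicyResult("no_investigation", (0.5, 0.5), (0.25, 0.25)),
    PolicyResult("investigate_when_context_hidden", (0.75, 0.75), (0.25, 0.25)),
    PolicyResult("adaptive_cue_bandit", (0.75, 0.75), (0.0, 0.0)),
]
best = select_best_policy(train_results)
print(best.name)
```

In this sketch the last two policies tie on rubric reward, so the planning penalty decides. The winner would then be re-run on holdout seeds before being reported.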
75
 
76
  Example commands:
 
91
  - `search_eval_episodes.jsonl`
92
  - `search_eval_trajectories.jsonl`
93
 
94
+ The default submit policy inside this runner stays deterministic and local. It reuses the repo's heuristic routing logic plus planning-aware routing overrides, so the search loop can study both investigation policy and queue-aware submission quality without depending on external LLM latency or API cost.
95
 
96
  ## Task Ladder
97
 
 
150
  - `requester`
151
  - `description`
152
  - optional `ambiguity_note`
153
+ - optional `planning_note`
154
  - optional `related_ticket_id`
155
  - optional `related_ticket_preview`
156
+ - optional `routing_options`
157
+ - optional `capacity_state`
158
 
159
  Each observation also includes:
160
 
 
177
  - `last_reward_components`
178
  - `rubric_reward` on terminal observations
179
  - `metadata.last_feedback_summary` for compact reward / penalty feedback
180
+ - `metadata.capacity_state` and `metadata.future_queue_demand` on hard-task episodes
181
+ - `metadata.planning_penalty_total` and `metadata.planning_penalty_applied`
182
  - standard OpenEnv fields such as `done` and `reward`
183
 
184
  The internal `HelpdeskTicketState` tracks:
 
193
  - `total_reward`
194
  - `reward`
195
  - `done`
196
+ - `team_capacity_remaining`
197
+ - `high_priority_slots_remaining`
198
+ - `escalation_slots_remaining`
199
+ - `planning_penalty_total`
200
 
201
  ## Grading And Reward
202
 
 
212
  - `lookup_related_ticket`
213
  - `lookup_requester_history`
214
  - `lookup_internal_routing_note`
215
+ - `lookup_queue_capacity_forecast`
216
 
217
  Hard-task investigation behavior:
218
 
219
  - some ambiguous and non-default-routing tickets start with both redacted titles and redacted descriptions
220
  - linked-ticket previews and internal routing notes stay hidden until the matching tool is used
221
+ - capacity-sensitive tickets can expose queue pressure, future demand, and alternate routing options through `lookup_queue_capacity_forecast`
222
  - only useful investigation steps return a small positive shaping reward
223
  - blind or repeated probing does not pay by default
224
  - premature hard-task submission can incur a shaping penalty even when the visible text looks plausible
225
+ - resource-greedy routing can add planning penalties later in the queue even when a single ticket looks correct in isolation
226
  - terminal `rubric_reward` remains the objective evaluation signal, while per-step `reward` is the denser training signal
227
 
228
  Per-field behavior:
 
231
  - `priority`: exact match or proximity credit
232
  - `assignment_group`: exact match, with a small declared partial-credit map for nearby ownership mistakes
233
  - `resolution_action`: exact match, with a small declared partial-credit map for nearby next-step mistakes
234
+ - hard task only: some tickets also declare an alternate acceptable route with a reduced score multiplier, so the grader can reward capacity-aware fallback choices without collapsing into full fuzziness
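
As a rough illustration of the alternate-route rule in the last bullet, the sketch below scores a whole route at once. It assumes exact matching only and ignores the per-field weights and partial-credit maps; `score_route` and the group names are hypothetical, not the grader's actual API.

```python
from typing import Optional


def score_route(
    submitted: dict[str, str],
    primary: dict[str, str],
    alternate: Optional[dict[str, str]],
    alternate_multiplier: float,
) -> float:
    """Score a submitted route against the declared primary and alternate routes.

    Assumption: the primary route earns full credit, a declared alternate earns
    the reduced multiplier, and anything undeclared earns nothing.
    """
    if submitted == primary:
        return 1.0
    if alternate is not None and submitted == alternate:
        return alternate_multiplier
    return 0.0


# Hypothetical labels, for illustration only.
primary = {"assignment_group": "team_a", "resolution_action": "action_x"}
alternate = {"assignment_group": "team_b", "resolution_action": "action_x"}

print(score_route(primary, primary, alternate, 0.8))
print(score_route(alternate, primary, alternate, 0.8))
print(score_route({"assignment_group": "team_c", "resolution_action": "action_x"}, primary, alternate, 0.8))
```

Keeping the alternate as an explicit, declared route is what stops the grader from drifting into open-ended fuzzy matching.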
235
 
236
  Task weights:
237
 
 
241
  | 2 | 60% | 40% | - | - |
242
  | 3 | 35% | 20% | 25% | 20% |
243
 
244
+ Final episode rubric reward is queue-based:
245
 
246
  ```text
247
+ clamp(average(per_ticket_scores) + trajectory bonuses - planning penalties - extra investigation penalties)
248
  ```
249
 
250
+ Both `reward` and `rubric_reward` now use the closed interval `[0.0, 1.0]`.
251
 
252
  Step reward is lightly milestone-shaped: high per-ticket scores get a small bonus and very low scores get a small penalty before the final clamp.
253
 
254
+ Final reward also includes a queue-economics penalty when the agent exceeds the free investigation budget. One investigation per queued ticket is free, but extra investigation steps reduce the final reward more noticeably than before. On hard-task queues, assignment-group capacity, high-priority slots, and escalation slots also create cross-ticket trade-offs.
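
A minimal sketch of how the queue-level formula and the free investigation budget could combine, assuming a flat per-step penalty. The function name, its parameters, and the `0.05` penalty value are assumptions for illustration; the real constants live in the environment code and are not shown in this diff.

```python
def final_rubric_reward(
    per_ticket_scores: list[float],
    trajectory_bonus: float = 0.0,
    planning_penalty: float = 0.0,
    investigation_steps: int = 0,
    penalty_per_extra_step: float = 0.05,  # hypothetical value
) -> float:
    """Combine per-ticket scores with queue-level adjustments, then clamp."""
    if not per_ticket_scores:
        raise ValueError("per_ticket_scores must not be empty")

    # One investigation per queued ticket is free; only the excess is charged.
    free_budget = len(per_ticket_scores)
    extra_steps = max(0, investigation_steps - free_budget)

    raw = (
        sum(per_ticket_scores) / len(per_ticket_scores)
        + trajectory_bonus
        - planning_penalty
        - extra_steps * penalty_per_extra_step
    )
    # Both reward and rubric_reward use the closed interval [0.0, 1.0].
    return max(0.0, min(1.0, raw))


# Three tickets, five investigations (two over budget), some planning penalty.
print(final_rubric_reward([1.0, 0.5, 0.75], planning_penalty=0.15, investigation_steps=5))
```

With three tickets and five investigation steps, two steps fall outside the free budget and are charged on top of the planning penalty.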
255
 
256
  To make the environment more RL-friendly, each observation now also surfaces structured reward telemetry:
257
 
258
  - `last_reward_components` exposes ticket score, shaped step reward, milestone adjustment, trajectory reward when applicable, and any investigation penalty applied
259
  - `average_score_so_far` and `progress_fraction` expose trajectory progress without leaking future labels
260
+ - hard-task telemetry includes planning penalties, capacity usage, and the post-action capacity snapshot
261
  - `history` retains the same reward components plus a compact `feedback_summary` string for downstream agents
262
 
263
  ## Grounded Scoring
 
268
  - `assignment_group` and `resolution_action` now expose only a small declared partial-credit map for nearby mistakes
269
  - `priority` only gets proximity credit from the declared table in `server/grader.py`
270
  - `issue_type` only gets partial credit for a small declared similarity map
271
+ - hard-task alternate routes must be explicitly declared in the dataset and carry an explicit score multiplier
272
  - wrong labels outside those explicit maps score `0.0`
273
 
274
  That scoring policy is now backed by checked-in unit tests in `tests/test_grader_unit.py` and `tests/test_tasks_unit.py`.
 
283
 
284
  ## Dataset Snapshot
285
 
286
+ The effective labeled dataset now contains 70 tickets spanning straightforward, ambiguous, and planning-sensitive helpdesk scenarios.
287
 
288
  It includes:
289
 
 
296
  - onboarding tickets
297
  - feature requests
298
  - follow-up cases linked through `related_ticket_id`
299
+ - 16 tickets with explicit ambiguity notes
300
+ - 7 linked follow-up cases
301
+ - 22 tickets with declared alternate routes for queue-level planning
302
 
303
  ## Difficulty Coverage
304
 
inference.py CHANGED
@@ -195,6 +195,7 @@ def format_recent_history_entries(
195
 
196
  def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions: str) -> str:
197
  ambiguity_note = ticket.get("ambiguity_note")
 
198
  related_preview = ticket.get("related_ticket_preview") or {}
199
  last_tool_result = ticket.get("last_tool_result")
200
  context_status = ticket.get("context_status") or {}
@@ -204,9 +205,14 @@ def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions
204
  investigation_budget_remaining = ticket.get("investigation_budget_remaining")
205
  average_score_so_far = ticket.get("average_score_so_far")
206
  progress_fraction = ticket.get("progress_fraction")
 
 
 
207
  extra_context_lines: list[str] = []
208
  if ambiguity_note:
209
  extra_context_lines.append(f"Ambiguity note: {ambiguity_note}")
 
 
210
  if related_preview:
211
  extra_context_lines.extend(
212
  [
@@ -224,6 +230,18 @@ def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions
224
  extra_context_lines.append(
225
  "Context status: " + json.dumps(context_status, sort_keys=True)
226
  )
 
 
 
 
 
 
 
 
 
 
 
 
227
  if feedback_summary:
228
  extra_context_lines.append(f"Latest environment feedback: {feedback_summary}")
229
  if last_reward_components:
@@ -293,7 +311,7 @@ def _format_bool(value: bool) -> str:
293
 
294
 
295
  def clamp_reported_score(score: float) -> float:
296
- return max(0.01, min(0.99, score))
297
 
298
 
299
  def _format_action_for_log(action: HelpdeskTicketAction) -> str:
@@ -553,14 +571,19 @@ TIME_SENSITIVE_PRIORITY_KEYWORDS = (
553
  def build_routing_text(ticket: dict) -> str:
554
  related_preview = ticket.get("related_ticket_preview") or {}
555
  last_tool_result = ticket.get("last_tool_result") or {}
 
556
  return " ".join(
557
  [
558
  ticket.get("title", ""),
559
  ticket.get("description", ""),
560
  ticket.get("ambiguity_note", ""),
 
561
  related_preview.get("title", ""),
562
  related_preview.get("description", ""),
563
  json.dumps(last_tool_result, sort_keys=True),
 
 
 
564
  ]
565
  ).lower()
566
 
@@ -630,6 +653,90 @@ def heuristic_action(
630
  return result
631
 
632
 
633
  def apply_domain_overrides(
634
  ticket: dict, candidate: dict[str, Any], allowed_fields: list[str]
635
  ) -> tuple[dict[str, Any], list[str]]:
@@ -697,9 +804,27 @@ def build_action(
697
  ticket: dict, allowed_fields: list[str], instructions: str
698
  ) -> tuple[HelpdeskTicketAction, str, str | None]:
699
  heuristic_dict = heuristic_action(ticket, allowed_fields)
 
 
 
 
 
 
 
 
 
 
700
 
701
  if llm_client is None:
702
- return HelpdeskTicketAction(**heuristic_dict), "heuristic", None
 
 
 
 
 
 
 
 
703
 
704
  try:
705
  llm_dict = call_llm(ticket, allowed_fields, instructions)
@@ -731,9 +856,19 @@ def build_action(
731
  candidate,
732
  allowed_fields,
733
  )
 
 
 
 
 
734
 
735
  backfilled_fields = [field for field in allowed_fields if field not in accepted_fields]
736
- if backfilled_fields or rejected_fields or override_reasons:
 
 
 
 
 
737
  reason_parts = []
738
  if backfilled_fields:
739
  reason_parts.append(f"heuristic_backfill={backfilled_fields}")
@@ -741,6 +876,8 @@ def build_action(
741
  reason_parts.append(f"invalid_llm_fields={rejected_fields}")
742
  if override_reasons:
743
  reason_parts.append(f"domain_overrides={override_reasons}")
 
 
744
  return (
745
  HelpdeskTicketAction(**candidate),
746
  "llm_backfilled",
@@ -752,7 +889,23 @@ def build_action(
752
  return (
753
  HelpdeskTicketAction(**heuristic_dict),
754
  "heuristic_fallback",
755
- str(exc),
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
756
  )
757
 
758
 
@@ -857,6 +1010,7 @@ def should_investigate(ticket: dict, history: list[dict[str, Any]]) -> tuple[boo
857
  if hidden_context_remaining:
858
  preferred_tools.extend(
859
  [
 
860
  "lookup_related_ticket",
861
  "lookup_internal_routing_note",
862
  "lookup_requester_history",
@@ -892,6 +1046,14 @@ def merge_ticket_context(ticket: dict, observation: Any) -> dict:
892
  observation_metadata = getattr(observation, "metadata", {}) or {}
893
  if observation_metadata.get("last_feedback_summary"):
894
  merged_ticket["feedback_summary"] = observation_metadata["last_feedback_summary"]
 
 
 
 
 
 
 
 
895
  return merged_ticket
896
 
897
 
@@ -933,12 +1095,10 @@ def run() -> None:
933
  if ticket is None:
934
  break
935
 
936
- investigate, tool_name = should_investigate(ticket, obs.history)
937
- if (
938
- investigate
939
- and tool_name is not None
940
- and getattr(obs, "investigation_budget_remaining", 0) > 0
941
- ):
942
  tool_action = HelpdeskTicketAction(
943
  action_type="investigate",
944
  tool_name=tool_name,
@@ -947,10 +1107,13 @@ def run() -> None:
947
  result = sync_client.step(tool_action)
948
  obs = result.observation
949
  step_num += 1
 
 
 
950
  log_step(
951
  step=step_num,
952
  action=tool_action,
953
- reward=float(result.reward or 0.0),
954
  done=bool(result.done),
955
  error=None,
956
  )
@@ -959,6 +1122,11 @@ def run() -> None:
959
  ticket = obs.current_ticket
960
  if ticket is None:
961
  break
 
 
 
 
 
962
 
963
  ticket_with_context = merge_ticket_context(ticket, obs)
964
  action, action_source, fallback_reason = build_action(
 
195
 
196
  def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions: str) -> str:
197
  ambiguity_note = ticket.get("ambiguity_note")
198
+ planning_note = ticket.get("planning_note")
199
  related_preview = ticket.get("related_ticket_preview") or {}
200
  last_tool_result = ticket.get("last_tool_result")
201
  context_status = ticket.get("context_status") or {}
 
205
  investigation_budget_remaining = ticket.get("investigation_budget_remaining")
206
  average_score_so_far = ticket.get("average_score_so_far")
207
  progress_fraction = ticket.get("progress_fraction")
208
+ capacity_state = ticket.get("capacity_state")
209
+ future_queue_demand = ticket.get("future_queue_demand")
210
+ routing_options = ticket.get("routing_options") or []
211
  extra_context_lines: list[str] = []
212
  if ambiguity_note:
213
  extra_context_lines.append(f"Ambiguity note: {ambiguity_note}")
214
+ if planning_note:
215
+ extra_context_lines.append(f"Planning note: {planning_note}")
216
  if related_preview:
217
  extra_context_lines.extend(
218
  [
 
230
  extra_context_lines.append(
231
  "Context status: " + json.dumps(context_status, sort_keys=True)
232
  )
233
+ if capacity_state:
234
+ extra_context_lines.append(
235
+ "Queue capacity state: " + json.dumps(capacity_state, sort_keys=True)
236
+ )
237
+ if future_queue_demand:
238
+ extra_context_lines.append(
239
+ "Future queue demand: " + json.dumps(future_queue_demand, sort_keys=True)
240
+ )
241
+ if routing_options:
242
+ extra_context_lines.append(
243
+ "Routing options: " + json.dumps(routing_options, sort_keys=True)
244
+ )
245
  if feedback_summary:
246
  extra_context_lines.append(f"Latest environment feedback: {feedback_summary}")
247
  if last_reward_components:
 
311
 
312
 
313
  def clamp_reported_score(score: float) -> float:
314
+ return max(0.0, min(1.0, score))
315
 
316
 
317
  def _format_action_for_log(action: HelpdeskTicketAction) -> str:
 
571
  def build_routing_text(ticket: dict) -> str:
572
  related_preview = ticket.get("related_ticket_preview") or {}
573
  last_tool_result = ticket.get("last_tool_result") or {}
574
+ routing_options = ticket.get("routing_options") or []
575
  return " ".join(
576
  [
577
  ticket.get("title", ""),
578
  ticket.get("description", ""),
579
  ticket.get("ambiguity_note", ""),
580
+ ticket.get("planning_note", ""),
581
  related_preview.get("title", ""),
582
  related_preview.get("description", ""),
583
  json.dumps(last_tool_result, sort_keys=True),
584
+ json.dumps(routing_options, sort_keys=True),
585
+ json.dumps(ticket.get("capacity_state") or {}, sort_keys=True),
586
+ json.dumps(ticket.get("future_queue_demand") or {}, sort_keys=True),
587
  ]
588
  ).lower()
589
 
 
653
  return result
654
 
655
 
+ def _get_routing_options(ticket: dict[str, Any]) -> list[dict[str, Any]]:
+     options = ticket.get("routing_options") or []
+     return [option for option in options if isinstance(option, dict)]
+
+
+ def _get_routing_option_by_label(
+     ticket: dict[str, Any],
+     label: str | None,
+ ) -> dict[str, Any] | None:
+     if label is None:
+         return None
+     for option in _get_routing_options(ticket):
+         if option.get("label") == label:
+             return option
+     return None
+
+
+ def _route_option_fields_match(
+     option: dict[str, Any],
+     candidate: dict[str, Any],
+     allowed_fields: list[str],
+ ) -> bool:
+     for field in ("issue_type", "priority", "assignment_group", "resolution_action"):
+         if field not in allowed_fields:
+             continue
+         option_value = option.get(field)
+         candidate_value = candidate.get(field)
+         if option_value is None or candidate_value is None:
+             continue
+         if str(option_value) != str(candidate_value):
+             return False
+     return True
+
+
+ def _preferred_routing_label(ticket: dict[str, Any]) -> str | None:
+     last_tool_result = ticket.get("last_tool_result") or {}
+     tool_name = str(last_tool_result.get("tool_name", "") or "")
+     preferred_label = str(last_tool_result.get("preferred_route_label", "") or "")
+     if tool_name == "lookup_queue_capacity_forecast" and preferred_label in {
+         "primary",
+         "alternate",
+     }:
+         return preferred_label
+     return None
+
+
+ def apply_capacity_planning_overrides(
+     ticket: dict[str, Any],
+     candidate: dict[str, Any],
+     allowed_fields: list[str],
+ ) -> tuple[dict[str, Any], list[str]]:
+     updated = dict(candidate)
+     reasons: list[str] = []
+     preferred_label = _preferred_routing_label(ticket)
+     preferred_option = _get_routing_option_by_label(ticket, preferred_label)
+     if preferred_option is None:
+         return updated, reasons
+
+     current_matching_label = None
+     for option in _get_routing_options(ticket):
+         if _route_option_fields_match(option, updated, allowed_fields):
+             current_matching_label = option.get("label")
+             break
+
+     if current_matching_label == preferred_label:
+         return updated, reasons
+
+     for field in ("issue_type", "priority", "assignment_group", "resolution_action"):
+         if field not in allowed_fields:
+             continue
+         option_value = preferred_option.get(field)
+         if option_value is None:
+             continue
+         updated[field] = option_value
+
+     last_tool_result = ticket.get("last_tool_result") or {}
+     reasons.append(
+         "planning_override="
+         f"{preferred_label}(primary_pressure={last_tool_result.get('primary_pressure')},"
+         f"alternate_pressure={last_tool_result.get('alternate_pressure')})"
+     )
+     return updated, reasons
+
+
740
  def apply_domain_overrides(
741
  ticket: dict, candidate: dict[str, Any], allowed_fields: list[str]
742
  ) -> tuple[dict[str, Any], list[str]]:
 
804
  ticket: dict, allowed_fields: list[str], instructions: str
805
  ) -> tuple[HelpdeskTicketAction, str, str | None]:
806
  heuristic_dict = heuristic_action(ticket, allowed_fields)
807
+ heuristic_dict, heuristic_override_reasons = apply_domain_overrides(
808
+ ticket,
809
+ heuristic_dict,
810
+ allowed_fields,
811
+ )
812
+ heuristic_dict, heuristic_planning_reasons = apply_capacity_planning_overrides(
813
+ ticket,
814
+ heuristic_dict,
815
+ allowed_fields,
816
+ )
817
 
818
  if llm_client is None:
819
+ fallback_reason = None
820
+ reason_parts = []
821
+ if heuristic_override_reasons:
822
+ reason_parts.append(f"domain_overrides={heuristic_override_reasons}")
823
+ if heuristic_planning_reasons:
824
+ reason_parts.append(f"planning_overrides={heuristic_planning_reasons}")
825
+ if reason_parts:
826
+ fallback_reason = "; ".join(reason_parts)
827
+ return HelpdeskTicketAction(**heuristic_dict), "heuristic", fallback_reason
828
 
829
  try:
830
  llm_dict = call_llm(ticket, allowed_fields, instructions)
 
856
  candidate,
857
  allowed_fields,
858
  )
859
+ candidate, planning_override_reasons = apply_capacity_planning_overrides(
860
+ ticket,
861
+ candidate,
862
+ allowed_fields,
863
+ )
864
 
865
  backfilled_fields = [field for field in allowed_fields if field not in accepted_fields]
866
+ if (
867
+ backfilled_fields
868
+ or rejected_fields
869
+ or override_reasons
870
+ or planning_override_reasons
871
+ ):
872
  reason_parts = []
873
  if backfilled_fields:
874
  reason_parts.append(f"heuristic_backfill={backfilled_fields}")
 
876
  reason_parts.append(f"invalid_llm_fields={rejected_fields}")
877
  if override_reasons:
878
  reason_parts.append(f"domain_overrides={override_reasons}")
879
+ if planning_override_reasons:
880
+ reason_parts.append(f"planning_overrides={planning_override_reasons}")
881
  return (
882
  HelpdeskTicketAction(**candidate),
883
  "llm_backfilled",
 
889
  return (
890
  HelpdeskTicketAction(**heuristic_dict),
891
  "heuristic_fallback",
892
+ "; ".join(
893
+ part
894
+ for part in (
895
+ str(exc),
896
+ (
897
+ f"domain_overrides={heuristic_override_reasons}"
898
+ if heuristic_override_reasons
899
+ else None
900
+ ),
901
+ (
902
+ f"planning_overrides={heuristic_planning_reasons}"
903
+ if heuristic_planning_reasons
904
+ else None
905
+ ),
906
+ )
907
+ if part
908
+ ),
909
  )
910
 
911
 
 
1010
  if hidden_context_remaining:
1011
  preferred_tools.extend(
1012
  [
1013
+ "lookup_queue_capacity_forecast",
1014
  "lookup_related_ticket",
1015
  "lookup_internal_routing_note",
1016
  "lookup_requester_history",
 
1046
  observation_metadata = getattr(observation, "metadata", {}) or {}
1047
  if observation_metadata.get("last_feedback_summary"):
1048
  merged_ticket["feedback_summary"] = observation_metadata["last_feedback_summary"]
1049
+ if observation_metadata.get("capacity_state") is not None:
1050
+ merged_ticket["capacity_state"] = observation_metadata["capacity_state"]
1051
+ if observation_metadata.get("future_queue_demand") is not None:
1052
+ merged_ticket["future_queue_demand"] = observation_metadata["future_queue_demand"]
1053
+ if observation_metadata.get("planning_penalty_total") is not None:
1054
+ merged_ticket["planning_penalty_total"] = observation_metadata["planning_penalty_total"]
1055
+ if observation_metadata.get("planning_penalty_applied") is not None:
1056
+ merged_ticket["planning_penalty_applied"] = observation_metadata["planning_penalty_applied"]
1057
  return merged_ticket
1058
 
1059
 
 
1095
  if ticket is None:
1096
  break
1097
 
1098
+ while getattr(obs, "investigation_budget_remaining", 0) > 0:
1099
+ investigate, tool_name = should_investigate(ticket, obs.history)
1100
+ if not investigate or tool_name is None:
1101
+ break
 
 
1102
  tool_action = HelpdeskTicketAction(
1103
  action_type="investigate",
1104
  tool_name=tool_name,
 
1107
  result = sync_client.step(tool_action)
1108
  obs = result.observation
1109
  step_num += 1
1110
+ reward = float(result.reward or 0.0)
1111
+ if result.reward is not None:
1112
+ task_step_rewards.append(reward)
1113
  log_step(
1114
  step=step_num,
1115
  action=tool_action,
1116
+ reward=reward,
1117
  done=bool(result.done),
1118
  error=None,
1119
  )
 
1122
  ticket = obs.current_ticket
1123
  if ticket is None:
1124
  break
1125
+ if result.done:
1126
+ break
1127
+ ticket = obs.current_ticket
1128
+ if ticket is None:
1129
+ break
1130
 
1131
  ticket_with_context = merge_ticket_context(ticket, obs)
1132
  action, action_source, fallback_reason = build_action(
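
The planning-aware override added to `inference.py` in this commit can be hard to follow from the diff alone. The sketch below is a condensed, simplified restatement of the idea, not the committed code: it skips the "candidate already matches the preferred route" short-circuit and the reason strings. `pick_preferred_option`, `apply_override`, and the ticket values are hypothetical.

```python
from typing import Any, Optional

ROUTE_FIELDS = ("issue_type", "priority", "assignment_group", "resolution_action")


def pick_preferred_option(ticket: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the routing option that the capacity forecast prefers, if any."""
    tool_result = ticket.get("last_tool_result") or {}
    if tool_result.get("tool_name") != "lookup_queue_capacity_forecast":
        return None
    label = tool_result.get("preferred_route_label")
    if label not in {"primary", "alternate"}:
        return None
    for option in ticket.get("routing_options") or []:
        if isinstance(option, dict) and option.get("label") == label:
            return option
    return None


def apply_override(
    ticket: dict[str, Any],
    candidate: dict[str, Any],
    allowed_fields: list[str],
) -> dict[str, Any]:
    """Overwrite allowed routing fields with the preferred option's values."""
    updated = dict(candidate)  # never mutate the caller's dict
    option = pick_preferred_option(ticket)
    if option is None:
        return updated
    for field in ROUTE_FIELDS:
        if field in allowed_fields and option.get(field) is not None:
            updated[field] = option[field]
    return updated


# Hypothetical ticket: the forecast says the alternate route has more headroom.
ticket = {
    "routing_options": [
        {"label": "primary", "assignment_group": "team_a", "priority": "high"},
        {"label": "alternate", "assignment_group": "team_b", "priority": "medium"},
    ],
    "last_tool_result": {
        "tool_name": "lookup_queue_capacity_forecast",
        "preferred_route_label": "alternate",
    },
}
candidate = {"issue_type": "type_x", "priority": "high", "assignment_group": "team_a"}
result = apply_override(ticket, candidate, ["issue_type", "priority", "assignment_group"])
print(result)
```

The override only fires after `lookup_queue_capacity_forecast` has actually been called, so a policy that never investigates keeps its original route.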
models.py CHANGED
@@ -19,6 +19,7 @@ RESOLUTION_ACTION_SET = set(RESOLUTION_ACTIONS)
19
  ACTION_TYPE_SET = {"submit", "investigate"}
20
  TOOL_NAME_SET = {"lookup_related_ticket", "lookup_requester_history"}
21
  TOOL_NAME_SET.add("lookup_internal_routing_note")
 
22
 
23
 
24
  def _validate_choice(value: str, allowed: set[str], field_name: str) -> str:
@@ -47,6 +48,12 @@ class HelpdeskTicketRecord(BaseModel):
47
  resolution_action: str
48
  ambiguity_note: Optional[str] = None
49
  related_ticket_id: Optional[str] = None
 
 
 
 
 
 
50
 
51
  @field_validator("issue_type")
52
  @classmethod
@@ -68,6 +75,44 @@ class HelpdeskTicketRecord(BaseModel):
68
  def validate_resolution_action(cls, value: str) -> str:
69
  return _validate_choice(value, RESOLUTION_ACTION_SET, "resolution_action")
70
 
71
 
72
  class HelpdeskTicketAction(Action):
73
  action_type: str = "submit"
@@ -146,7 +191,16 @@ class HelpdeskTicketState(State):
146
  investigation_steps: int = 0
147
  investigation_budget_remaining: int = 0
148
  investigation_penalty_applied: float = 0.0
 
149
  last_tool_result: Optional[dict[str, Any]] = None
150
  last_reward_components: dict[str, Any] = Field(default_factory=dict)
151
  ticket_tool_usage: dict[str, list[str]] = Field(default_factory=dict)
 
 
 
 
 
 
 
 
152
  history_entries: list[dict] = Field(default_factory=list)
 
19
  ACTION_TYPE_SET = {"submit", "investigate"}
20
  TOOL_NAME_SET = {"lookup_related_ticket", "lookup_requester_history"}
21
  TOOL_NAME_SET.add("lookup_internal_routing_note")
22
+ TOOL_NAME_SET.add("lookup_queue_capacity_forecast")
23
 
24
 
25
  def _validate_choice(value: str, allowed: set[str], field_name: str) -> str:
 
48
  resolution_action: str
49
  ambiguity_note: Optional[str] = None
50
  related_ticket_id: Optional[str] = None
51
+ planning_note: Optional[str] = None
52
+ alternate_issue_type: Optional[str] = None
53
+ alternate_priority: Optional[str] = None
54
+ alternate_assignment_group: Optional[str] = None
55
+ alternate_resolution_action: Optional[str] = None
56
+ alternate_route_score_multiplier: float = 0.0
57
 
58
  @field_validator("issue_type")
59
  @classmethod
 
75
  def validate_resolution_action(cls, value: str) -> str:
76
  return _validate_choice(value, RESOLUTION_ACTION_SET, "resolution_action")
77
 
+     @field_validator("alternate_issue_type")
+     @classmethod
+     def validate_alternate_issue_type(cls, value: Optional[str]) -> Optional[str]:
+         return _validate_optional_choice(value, ISSUE_TYPE_SET, "alternate_issue_type")
+
+     @field_validator("alternate_priority")
+     @classmethod
+     def validate_alternate_priority(cls, value: Optional[str]) -> Optional[str]:
+         return _validate_optional_choice(value, PRIORITY_SET, "alternate_priority")
+
+     @field_validator("alternate_assignment_group")
+     @classmethod
+     def validate_alternate_assignment_group(cls, value: Optional[str]) -> Optional[str]:
+         return _validate_optional_choice(
+             value,
+             ASSIGNMENT_GROUP_SET,
+             "alternate_assignment_group",
+         )
+
+     @field_validator("alternate_resolution_action")
+     @classmethod
+     def validate_alternate_resolution_action(
+         cls,
+         value: Optional[str],
+     ) -> Optional[str]:
+         return _validate_optional_choice(
+             value,
+             RESOLUTION_ACTION_SET,
+             "alternate_resolution_action",
+         )
+
+     @field_validator("alternate_route_score_multiplier")
+     @classmethod
+     def validate_alternate_route_score_multiplier(cls, value: float) -> float:
+         if not 0.0 <= value <= 1.0:
+             raise ValueError("alternate_route_score_multiplier must be in [0.0, 1.0]")
+         return value
+
116
 
117
  class HelpdeskTicketAction(Action):
118
  action_type: str = "submit"
 
191
  investigation_steps: int = 0
192
  investigation_budget_remaining: int = 0
193
  investigation_penalty_applied: float = 0.0
194
+ planning_penalty_applied: float = 0.0
195
  last_tool_result: Optional[dict[str, Any]] = None
196
  last_reward_components: dict[str, Any] = Field(default_factory=dict)
197
  ticket_tool_usage: dict[str, list[str]] = Field(default_factory=dict)
198
+ team_capacity_initial: dict[str, int] = Field(default_factory=dict)
199
+ team_capacity_remaining: dict[str, int] = Field(default_factory=dict)
200
+ high_priority_slots_initial: int = 0
201
+ high_priority_slots_remaining: int = 0
202
+ escalation_slots_initial: int = 0
203
+ escalation_slots_remaining: int = 0
204
+ planning_penalty_total: float = 0.0
205
+ capacity_pressure_tickets_resolved: int = 0
206
  history_entries: list[dict] = Field(default_factory=list)
policy_learning.py CHANGED
@@ -88,6 +88,7 @@ AVAILABLE_TOOLS = (
88
  "lookup_related_ticket",
89
  "lookup_requester_history",
90
  "lookup_internal_routing_note",
 
91
  )
92
 
93
 
@@ -229,6 +230,11 @@ def default_submit_builder(
229
  inference = importlib.import_module("inference")
230
  candidate = inference.heuristic_action(ticket, allowed_fields)
231
  candidate, _ = inference.apply_domain_overrides(ticket, candidate, allowed_fields)
 
 
 
 
 
232
  return HelpdeskTicketAction(**candidate)
233
 
234
 
@@ -237,7 +243,11 @@ def _routing_text(ticket: dict[str, Any]) -> str:
237
  str(ticket.get("title", "")),
238
  str(ticket.get("description", "")),
239
  str(ticket.get("ambiguity_note", "")),
 
240
  json.dumps(ticket.get("last_tool_result") or {}, sort_keys=True),
 
 
 
241
  ]
242
  related_preview = ticket.get("related_ticket_preview") or {}
243
  parts.extend(
@@ -251,6 +261,24 @@ def _routing_text(ticket: dict[str, Any]) -> str:
251
 
252
  def infer_ticket_cue(ticket: dict[str, Any]) -> str:
253
  text = _routing_text(ticket)
254
  if any(
255
  phrase in text
256
  for phrase in ("re:", "follow-up", "following up", "regression", "reference ticket", "third update")
@@ -297,14 +325,20 @@ def preferred_tool_order(
297
  hidden_context_remaining: bool,
298
  ) -> list[str]:
299
  text = _routing_text(ticket)
 
300
  last_tool_result = ticket.get("last_tool_result") or {}
301
  last_tool_name = str(last_tool_result.get("tool_name", "") or "")
 
302
 
303
  preferred_tools: list[str] = []
 
 
304
  if last_tool_name == "lookup_related_ticket":
305
  preferred_tools.append("lookup_requester_history")
306
  if last_tool_name == "lookup_requester_history":
307
  preferred_tools.append("lookup_internal_routing_note")
 
 
308
 
309
  if any(
310
  phrase in text
@@ -336,9 +370,15 @@ def preferred_tool_order(
336
  ):
337
  preferred_tools.append("lookup_requester_history")
338
 
 
 
 
 
 
339
  if hidden_context_remaining:
340
  preferred_tools.extend(
341
  [
 
342
  "lookup_internal_routing_note",
343
  "lookup_related_ticket",
344
  "lookup_requester_history",
@@ -545,6 +585,8 @@ def rollout_episode(
545
  "terminal_reward": terminal_reward,
546
  "terminal_rubric_reward": terminal_rubric_reward,
547
  "average_ticket_score": env.state.average_score_so_far,
 
 
548
  "per_ticket_scores": list(env.state.per_ticket_scores),
549
  }
550
  if adaptive_bandit is not None and policy.strategy == "adaptive":
@@ -583,6 +625,15 @@ def summarize_policy_episodes(
583
  "avg_terminal_rubric_reward": _safe_mean(
584
  [float(episode["terminal_rubric_reward"]) for episode in task_episodes]
585
  ),
586
  "avg_investigation_steps": _safe_mean(
587
  [float(episode["investigation_steps"]) for episode in task_episodes]
588
  ),
@@ -604,6 +655,15 @@ def summarize_policy_episodes(
604
  "avg_terminal_rubric_reward": _safe_mean(
605
  [float(episode["terminal_rubric_reward"]) for episode in episode_summaries]
606
  ),
607
  "avg_investigation_steps": _safe_mean(
608
  [float(episode["investigation_steps"]) for episode in episode_summaries]
609
  ),
@@ -653,11 +713,12 @@ def evaluate_policy(
653
  return result
654
 
655
 
656
- def _selection_tuple(summary: dict[str, Any]) -> tuple[float, float, float, float]:
657
  return (
658
- float(summary["avg_normalized_return"]),
659
- float(summary["avg_terminal_reward"]),
660
  float(summary["avg_terminal_rubric_reward"]),
 
 
 
661
  -float(summary["avg_investigation_steps"]),
662
  )
663
 
@@ -713,7 +774,7 @@ def compare_policies(
713
  "mode": "compare",
714
  "task_ids": task_ids,
715
  "seeds": seeds,
716
- "selection_metric": "avg_normalized_return",
717
  "baseline_policy": baseline_run["policy"],
718
  "best_policy": best_run["policy"],
719
  "improvement_vs_baseline": {
@@ -731,6 +792,11 @@ def compare_policies(
731
  baseline_run["summary"],
732
  "avg_terminal_rubric_reward",
733
  ),
 
 
 
 
 
734
  },
735
  "policy_summaries": [run["summary"] for run in policy_runs],
736
  "ranking": [
@@ -825,7 +891,7 @@ def search_policies(
825
  "task_ids": task_ids,
826
  "train_seeds": train_seeds,
827
  "eval_seeds": eval_seeds,
828
- "selection_metric": "avg_normalized_return",
829
  "candidate_policies": [policy.name for policy in candidate_policies],
830
  "selected_policy": selected_policy.name,
831
  "baseline_policy": baseline_policy.name,
@@ -856,6 +922,11 @@ def search_policies(
856
  eval_baseline["summary"],
857
  "avg_terminal_rubric_reward",
858
  ),
 
 
 
 
 
859
  },
860
  "artifacts": {
861
  "summary": str(output_dir / "search_summary.json"),
@@ -975,6 +1046,7 @@ def _print_summary(label: str, summary: dict[str, Any]) -> None:
975
  "avg_normalized_return": summary["avg_normalized_return"],
976
  "avg_terminal_reward": summary["avg_terminal_reward"],
977
  "avg_terminal_rubric_reward": summary["avg_terminal_rubric_reward"],
 
978
  "avg_investigation_steps": summary["avg_investigation_steps"],
979
  }
980
  },
 
88
  "lookup_related_ticket",
89
  "lookup_requester_history",
90
  "lookup_internal_routing_note",
91
+ "lookup_queue_capacity_forecast",
92
  )
93
 
94
 
 
230
  inference = importlib.import_module("inference")
231
  candidate = inference.heuristic_action(ticket, allowed_fields)
232
  candidate, _ = inference.apply_domain_overrides(ticket, candidate, allowed_fields)
233
+ candidate, _ = inference.apply_capacity_planning_overrides(
234
+ ticket,
235
+ candidate,
236
+ allowed_fields,
237
+ )
238
  return HelpdeskTicketAction(**candidate)
239
 
240
 
 
243
  str(ticket.get("title", "")),
244
  str(ticket.get("description", "")),
245
  str(ticket.get("ambiguity_note", "")),
246
+ str(ticket.get("planning_note", "")),
247
  json.dumps(ticket.get("last_tool_result") or {}, sort_keys=True),
248
+ json.dumps(ticket.get("routing_options") or [], sort_keys=True),
249
+ json.dumps(ticket.get("capacity_state") or {}, sort_keys=True),
250
+ json.dumps(ticket.get("future_queue_demand") or {}, sort_keys=True),
251
  ]
252
  related_preview = ticket.get("related_ticket_preview") or {}
253
  parts.extend(
 
261
 
262
  def infer_ticket_cue(ticket: dict[str, Any]) -> str:
263
  text = _routing_text(ticket)
264
+ context_status = ticket.get("context_status") or {}
265
+ if (
266
+ ticket.get("planning_note")
267
+ or ticket.get("routing_options")
268
+ or "lookup_queue_capacity_forecast"
269
+ in (context_status.get("recommended_tools") or [])
270
+ or any(
271
+ phrase in text
272
+ for phrase in (
273
+ "capacity",
274
+ "saturated",
275
+ "backlog",
276
+ "resource pressure",
277
+ "alternate route",
278
+ )
279
+ )
280
+ ):
281
+ return "capacity_planning"
282
  if any(
283
  phrase in text
284
  for phrase in ("re:", "follow-up", "following up", "regression", "reference ticket", "third update")
 
325
  hidden_context_remaining: bool,
326
  ) -> list[str]:
327
  text = _routing_text(ticket)
328
+ context_status = ticket.get("context_status") or {}
329
  last_tool_result = ticket.get("last_tool_result") or {}
330
  last_tool_name = str(last_tool_result.get("tool_name", "") or "")
331
+ recommended_tools = list(context_status.get("recommended_tools") or [])
332
 
333
  preferred_tools: list[str] = []
334
+ if "lookup_queue_capacity_forecast" in recommended_tools:
335
+ preferred_tools.append("lookup_queue_capacity_forecast")
336
  if last_tool_name == "lookup_related_ticket":
337
  preferred_tools.append("lookup_requester_history")
338
  if last_tool_name == "lookup_requester_history":
339
  preferred_tools.append("lookup_internal_routing_note")
340
+ if last_tool_name == "lookup_internal_routing_note":
341
+ preferred_tools.append("lookup_queue_capacity_forecast")
342
 
343
  if any(
344
  phrase in text
 
370
  ):
371
  preferred_tools.append("lookup_requester_history")
372
 
373
+ if infer_ticket_cue(ticket) == "capacity_planning":
374
+ preferred_tools.append("lookup_queue_capacity_forecast")
375
+
376
+ preferred_tools.extend(recommended_tools)
377
+
378
  if hidden_context_remaining:
379
  preferred_tools.extend(
380
  [
381
+ "lookup_queue_capacity_forecast",
382
  "lookup_internal_routing_note",
383
  "lookup_related_ticket",
384
  "lookup_requester_history",
 
585
  "terminal_reward": terminal_reward,
586
  "terminal_rubric_reward": terminal_rubric_reward,
587
  "average_ticket_score": env.state.average_score_so_far,
588
+ "planning_penalty_total": env.state.planning_penalty_total,
589
+ "capacity_pressure_tickets_resolved": env.state.capacity_pressure_tickets_resolved,
590
  "per_ticket_scores": list(env.state.per_ticket_scores),
591
  }
592
  if adaptive_bandit is not None and policy.strategy == "adaptive":
 
625
  "avg_terminal_rubric_reward": _safe_mean(
626
  [float(episode["terminal_rubric_reward"]) for episode in task_episodes]
627
  ),
628
+ "avg_planning_penalty_total": _safe_mean(
629
+ [float(episode["planning_penalty_total"]) for episode in task_episodes]
630
+ ),
631
+ "avg_capacity_pressure_tickets_resolved": _safe_mean(
632
+ [
633
+ float(episode["capacity_pressure_tickets_resolved"])
634
+ for episode in task_episodes
635
+ ]
636
+ ),
637
  "avg_investigation_steps": _safe_mean(
638
  [float(episode["investigation_steps"]) for episode in task_episodes]
639
  ),
 
655
  "avg_terminal_rubric_reward": _safe_mean(
656
  [float(episode["terminal_rubric_reward"]) for episode in episode_summaries]
657
  ),
658
+ "avg_planning_penalty_total": _safe_mean(
659
+ [float(episode["planning_penalty_total"]) for episode in episode_summaries]
660
+ ),
661
+ "avg_capacity_pressure_tickets_resolved": _safe_mean(
662
+ [
663
+ float(episode["capacity_pressure_tickets_resolved"])
664
+ for episode in episode_summaries
665
+ ]
666
+ ),
667
  "avg_investigation_steps": _safe_mean(
668
  [float(episode["investigation_steps"]) for episode in episode_summaries]
669
  ),
 
713
  return result
714
 
715
 
+def _selection_tuple(summary: dict[str, Any]) -> tuple[float, float, float, float, float]:
     return (
         float(summary["avg_terminal_rubric_reward"]),
+        -float(summary["avg_planning_penalty_total"]),
+        float(summary["avg_episode_return"]),
+        float(summary["avg_normalized_return"]),
         -float(summary["avg_investigation_steps"]),
     )
724
 
 
774
  "mode": "compare",
775
  "task_ids": task_ids,
776
  "seeds": seeds,
777
+ "selection_metric": "avg_terminal_rubric_reward_then_lower_planning_penalty",
778
  "baseline_policy": baseline_run["policy"],
779
  "best_policy": best_run["policy"],
780
  "improvement_vs_baseline": {
 
792
  baseline_run["summary"],
793
  "avg_terminal_rubric_reward",
794
  ),
795
+ "avg_planning_penalty_total": _delta(
796
+ best_run["summary"],
797
+ baseline_run["summary"],
798
+ "avg_planning_penalty_total",
799
+ ),
800
  },
801
  "policy_summaries": [run["summary"] for run in policy_runs],
802
  "ranking": [
 
891
  "task_ids": task_ids,
892
  "train_seeds": train_seeds,
893
  "eval_seeds": eval_seeds,
894
+ "selection_metric": "avg_terminal_rubric_reward_then_lower_planning_penalty",
895
  "candidate_policies": [policy.name for policy in candidate_policies],
896
  "selected_policy": selected_policy.name,
897
  "baseline_policy": baseline_policy.name,
 
922
  eval_baseline["summary"],
923
  "avg_terminal_rubric_reward",
924
  ),
925
+ "avg_planning_penalty_total": _delta(
926
+ eval_selected["summary"],
927
+ eval_baseline["summary"],
928
+ "avg_planning_penalty_total",
929
+ ),
930
  },
931
  "artifacts": {
932
  "summary": str(output_dir / "search_summary.json"),
 
1046
  "avg_normalized_return": summary["avg_normalized_return"],
1047
  "avg_terminal_reward": summary["avg_terminal_reward"],
1048
  "avg_terminal_rubric_reward": summary["avg_terminal_rubric_reward"],
1049
+ "avg_planning_penalty_total": summary["avg_planning_penalty_total"],
1050
  "avg_investigation_steps": summary["avg_investigation_steps"],
1051
  }
1052
  },
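The reordered `_selection_tuple` relies on Python's lexicographic tuple comparison: rubric reward is compared first, and a lower planning penalty only breaks ties. A small sketch of that ranking, assuming policies are picked with `max` over the tuple (the selection call, the policy names, and all numbers below are illustrative and not taken from the repository):

```python
from typing import Any


def selection_tuple(summary: dict[str, Any]) -> tuple[float, float, float, float, float]:
    # Same field order as the new _selection_tuple in this diff.
    return (
        float(summary["avg_terminal_rubric_reward"]),
        -float(summary["avg_planning_penalty_total"]),
        float(summary["avg_episode_return"]),
        float(summary["avg_normalized_return"]),
        -float(summary["avg_investigation_steps"]),
    )


# Hypothetical summaries: equal rubric reward, different planning penalty.
summaries = {
    "policy_a": {
        "avg_terminal_rubric_reward": 0.80,
        "avg_planning_penalty_total": 0.08,
        "avg_episode_return": 3.1,
        "avg_normalized_return": 0.78,
        "avg_investigation_steps": 2.0,
    },
    "policy_b": {
        "avg_terminal_rubric_reward": 0.80,
        "avg_planning_penalty_total": 0.00,
        "avg_episode_return": 3.0,
        "avg_normalized_return": 0.75,
        "avg_investigation_steps": 3.0,
    },
}

# The tie on rubric reward is broken by the negated planning penalty,
# so later fields (episode return, investigation steps) are never consulted.
best = max(summaries, key=lambda name: selection_tuple(summaries[name]))
```

Here `policy_b` wins despite a lower episode return and more investigation steps, which matches the new `selection_metric` label `avg_terminal_rubric_reward_then_lower_planning_penalty`.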
server/environment.py CHANGED
@@ -31,6 +31,7 @@ AVAILABLE_TOOLS = (
31
  "lookup_related_ticket",
32
  "lookup_requester_history",
33
  "lookup_internal_routing_note",
 
34
  )
35
  FREE_INVESTIGATIONS_PER_TICKET = 1
36
  EXTRA_INVESTIGATION_COST = 0.04
@@ -44,6 +45,10 @@ PRIORITY_UNDERSHOOT_PENALTY = 0.03
44
  SEVERE_PRIORITY_UNDERSHOOT_PENALTY = 0.07
45
  DANGEROUS_RESOLUTION_PENALTY = 0.05
46
  NONDEFAULT_ROUTING_FOLLOWTHROUGH_BONUS = 0.02
 
 
 
 
47
 
48
  TASK3_INVESTIGATION_TOOL_PLAN: dict[str, tuple[str, ...]] = {
49
  "ticket-021": ("lookup_related_ticket", "lookup_requester_history"),
@@ -161,6 +166,11 @@ class HelpdeskTicketRoutingEnvironment(
161
  else:
162
  queue_size = min(queue_size_value, len(self._dataset))
163
  self._queue = self._rng.sample(self._dataset, min(queue_size, len(self._dataset)))
 
 
 
 
 
164
 
165
  self._state = HelpdeskTicketState(
166
  episode_id=episode_id or str(uuid.uuid4()),
@@ -174,8 +184,17 @@ class HelpdeskTicketRoutingEnvironment(
174
  average_score_so_far=0.0,
175
  investigation_budget_remaining=queue_size * FREE_INVESTIGATIONS_PER_TICKET,
176
  investigation_penalty_applied=0.0,
 
177
  last_reward_components={},
178
  ticket_tool_usage={},
 
 
 
 
 
 
 
 
179
  )
180
 
181
  return self._build_observation(task)
@@ -298,6 +317,10 @@ class HelpdeskTicketRoutingEnvironment(
298
  action,
299
  task_id=task_id,
300
  )
 
 
 
 
301
  step_adjustments = compute_step_adjustments(
302
  score,
303
  previous_average=previous_average,
@@ -321,11 +344,17 @@ class HelpdeskTicketRoutingEnvironment(
321
  self._state.per_ticket_scores,
322
  len(self._queue),
323
  self._state.step_count,
324
- completion_bonus=self._trajectory_consistency_bonus(),
 
 
325
  )
326
  trajectory_reward = trajectory_components["final_reward"]
327
- rubric_reward = self._apply_episode_economics(trajectory_reward)
328
- final_reward = clamp_open_unit_interval(rubric_reward - context_penalty)
 
 
 
 
329
  self._state.total_reward = rubric_reward
330
  investigation_penalty = self._compute_episode_penalty()
331
  else:
@@ -333,7 +362,9 @@ class HelpdeskTicketRoutingEnvironment(
333
  self._state.average_score_so_far = self._current_average_score()
334
  self._state.step_count += 1
335
  self._state.current_ticket_index += 1
336
- final_reward = clamp_open_unit_interval(step_reward - context_penalty)
 
 
337
 
338
  reward_components = self._build_reward_components(
339
  ticket_score=score,
@@ -348,12 +379,18 @@ class HelpdeskTicketRoutingEnvironment(
348
  "context_gap_penalty": context_penalty,
349
  "context_completion_bonus": process_bonus,
350
  "risk_penalty": risk_penalty,
 
351
  "delta_adjustment": step_adjustments["delta_adjustment"],
352
  "required_investigation_count": len(self._required_tools_for_ticket(current_ticket)),
353
  "hidden_context_remaining_count": missing_required_count,
354
  "hidden_context_revealed_count": len(
355
  self._used_tools_for_ticket(current_ticket.ticket_id)
356
  ),
 
 
 
 
 
357
  "rubric_reward": rubric_reward,
358
  "trajectory_average_reward": (
359
  trajectory_components["average_reward"]
@@ -372,6 +409,7 @@ class HelpdeskTicketRoutingEnvironment(
372
  ),
373
  },
374
  )
 
375
 
376
  history_entry = self._build_history_entry(
377
  current_ticket,
@@ -390,6 +428,7 @@ class HelpdeskTicketRoutingEnvironment(
390
  self._state.reward = final_reward
391
  self._state.done = is_done
392
  self._state.investigation_penalty_applied = self._compute_episode_penalty()
 
393
  self._state.last_tool_result = None
394
  self._state.last_reward_components = reward_components
395
 
@@ -425,14 +464,373 @@ class HelpdeskTicketRoutingEnvironment(
425
  return 0.0
426
  return sum(self._state.per_ticket_scores) / len(self._state.per_ticket_scores)
427
 
428
  def _internal_routing_note_for_ticket(
429
  self,
430
  ticket: HelpdeskTicketRecord,
431
  ) -> str | None:
432
- if ticket.ambiguity_note is not None:
433
- return ticket.ambiguity_note
434
  if self._state.current_task_id != 3:
435
- return None
 
 
 
 
 
 
436
 
437
  default_group = ISSUE_TYPE_TO_ASSIGNMENT_GROUP.get(
438
  ticket.issue_type,
@@ -442,7 +840,6 @@ class HelpdeskTicketRoutingEnvironment(
442
  ticket.issue_type,
443
  ticket.resolution_action,
444
  )
445
- note_parts: list[str] = []
446
 
447
  if ticket.assignment_group != default_group:
448
  note_parts.append(
@@ -517,6 +914,11 @@ class HelpdeskTicketRoutingEnvironment(
517
  return self._ticket_repeated_requester_count(ticket) >= 2
518
  if tool_name == "lookup_internal_routing_note":
519
  return self._internal_routing_note_for_ticket(ticket) is not None
 
 
 
 
 
520
  return False
521
 
522
  def _required_tools_for_ticket(
@@ -546,6 +948,11 @@ class HelpdeskTicketRoutingEnvironment(
546
  and "lookup_requester_history" not in required_tools
547
  ):
548
  required_tools.append("lookup_requester_history")
 
 
 
 
 
549
  filtered_required_tools: list[str] = []
550
  for tool_name in required_tools:
551
  if tool_name in filtered_required_tools:
@@ -596,6 +1003,11 @@ class HelpdeskTicketRoutingEnvironment(
596
  "The visible request is not enough to choose the final owner and next step. "
597
  "Additional routing context is available via investigation."
598
  )
 
 
 
 
 
599
  if self._ticket_has_nondefault_routing(ticket):
600
  return (
601
  "The visible request looks straightforward, but the decisive routing detail is hidden until investigation."
@@ -609,6 +1021,8 @@ class HelpdeskTicketRoutingEnvironment(
609
  return "Follow-up request with hidden routing context"
610
  if self._internal_routing_note_for_ticket(ticket) is not None:
611
  return "Routing clarification required"
 
 
612
  if self._ticket_mentions_follow_up(ticket):
613
  return "Priority support follow-up"
614
  return "Helpdesk routing decision"
@@ -805,6 +1219,24 @@ class HelpdeskTicketRoutingEnvironment(
805
  "routing_note": routing_note if found else "",
806
  }
807
 
808
  def _run_investigation_tool(
809
  self,
810
  current_ticket: HelpdeskTicketRecord,
@@ -817,6 +1249,8 @@ class HelpdeskTicketRoutingEnvironment(
817
  return self._lookup_requester_history(current_ticket)
818
  if tool_name == "lookup_internal_routing_note":
819
  return self._lookup_internal_routing_note(current_ticket)
 
 
820
  raise ValueError(f"Unsupported tool_name: {tool_name}")
821
 
822
  def _handle_investigation_action(
@@ -901,12 +1335,15 @@ class HelpdeskTicketRoutingEnvironment(
901
  def _build_ticket_view(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
902
  progress = self._tool_progress_for_ticket(ticket)
903
  remaining_tools = progress["remaining_tools"]
 
904
  ticket_view: dict[str, Any] = {
905
  "ticket_id": ticket.ticket_id,
906
  "title": self._visible_title(ticket),
907
  "requester": ticket.requester,
908
  "description": self._visible_description(ticket),
909
  }
 
 
910
  if progress["required_tools"]:
911
  ticket_view["context_status"] = {
912
  "investigation_required": True,
@@ -919,6 +1356,11 @@ class HelpdeskTicketRoutingEnvironment(
919
  }
920
  if ticket.ambiguity_note is not None and "lookup_internal_routing_note" not in remaining_tools:
921
  ticket_view["ambiguity_note"] = ticket.ambiguity_note
 
 
 
 
 
922
  if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
923
  ticket_view["related_ticket_id"] = ticket.related_ticket_id
924
  related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
@@ -929,6 +1371,11 @@ class HelpdeskTicketRoutingEnvironment(
929
  "requester": related_ticket.requester,
930
  "description": related_ticket.description,
931
  }
 
 
 
 
 
932
  return ticket_view
933
 
934
  def _build_feedback_summary(
@@ -982,6 +1429,12 @@ class HelpdeskTicketRoutingEnvironment(
982
  risk_penalty = reward_components.get("risk_penalty")
983
  if risk_penalty:
984
  parts.append(f"risk_penalty={risk_penalty:.2f}")
 
 
 
 
 
 
985
 
986
  return "; ".join(parts)
987
 
@@ -1011,6 +1464,8 @@ class HelpdeskTicketRoutingEnvironment(
1011
  "breakdown": breakdown,
1012
  "queue_position": queue_position,
1013
  }
 
 
1014
  if reward is not None:
1015
  history_entry["reward"] = reward
1016
  if rubric_reward is not None:
@@ -1019,6 +1474,11 @@ class HelpdeskTicketRoutingEnvironment(
1019
  history_entry["reward_kind"] = reward_kind
1020
  if ticket.ambiguity_note is not None and "lookup_internal_routing_note" not in remaining_tools:
1021
  history_entry["ambiguity_note"] = ticket.ambiguity_note
 
 
 
 
 
1022
  if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
1023
  history_entry["related_ticket_id"] = ticket.related_ticket_id
1024
  related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
@@ -1029,6 +1489,14 @@ class HelpdeskTicketRoutingEnvironment(
1029
  "requester": related_ticket.requester,
1030
  "description": related_ticket.description,
1031
  }
 
 
 
 
 
 
 
 
1032
  if penalty_reason is not None:
1033
  history_entry["penalty_reason"] = penalty_reason
1034
  if tool_result is not None:
@@ -1098,7 +1566,12 @@ class HelpdeskTicketRoutingEnvironment(
1098
  "average_score_so_far": self._state.average_score_so_far,
1099
  "progress_fraction": progress_fraction,
1100
  "investigation_penalty_applied": self._state.investigation_penalty_applied,
 
 
1101
  }
 
 
 
1102
  if last_history_entry is not None:
1103
  metadata["last_score"] = last_history_entry.get("score")
1104
  metadata["last_reward"] = last_history_entry.get("reward")
 
31
  "lookup_related_ticket",
32
  "lookup_requester_history",
33
  "lookup_internal_routing_note",
34
+ "lookup_queue_capacity_forecast",
35
  )
36
  FREE_INVESTIGATIONS_PER_TICKET = 1
37
  EXTRA_INVESTIGATION_COST = 0.04
 
45
  SEVERE_PRIORITY_UNDERSHOOT_PENALTY = 0.07
46
  DANGEROUS_RESOLUTION_PENALTY = 0.05
47
  NONDEFAULT_ROUTING_FOLLOWTHROUGH_BONUS = 0.02
48
+ TEAM_CAPACITY_OVERFLOW_PENALTY = 0.08
49
+ HIGH_PRIORITY_SLOT_OVERFLOW_PENALTY = 0.06
50
+ ESCALATION_SLOT_OVERFLOW_PENALTY = 0.05
51
+ PLANNING_SUCCESS_BONUS = 0.05
52
 
53
  TASK3_INVESTIGATION_TOOL_PLAN: dict[str, tuple[str, ...]] = {
54
  "ticket-021": ("lookup_related_ticket", "lookup_requester_history"),
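The three overflow constants added above are consumed by `_apply_capacity_usage`, whose body is not visible in this part of the diff. One plausible reading, sketched under the assumption that each exhausted resource contributes its own penalty and that available slots are decremented otherwise (the function below, its signature, and the `network` group name are hypothetical; only the constants and the cost rule from `_route_capacity_cost` come from the diff):

```python
TEAM_CAPACITY_OVERFLOW_PENALTY = 0.08
HIGH_PRIORITY_SLOT_OVERFLOW_PENALTY = 0.06
ESCALATION_SLOT_OVERFLOW_PENALTY = 0.05


def route_capacity_cost(route: dict[str, str]) -> dict:
    # Mirrors _route_capacity_cost from the diff: every route takes one team
    # slot; high/critical priority and assign/escalate actions take extra slots.
    return {
        "assignment_group": route["assignment_group"],
        "team_slots": 1,
        "high_priority_slots": 1 if route["priority"] in {"high", "critical"} else 0,
        "escalation_slots": 1 if route["resolution_action"] in {"assign", "escalate"} else 0,
    }


def apply_capacity_usage(route, team_remaining, high_remaining, escalation_remaining):
    """Hypothetical sketch: charge a route against remaining capacity."""
    cost = route_capacity_cost(route)
    team_remaining = dict(team_remaining)  # do not mutate the caller's dict
    penalty = 0.0

    group = cost["assignment_group"]
    if team_remaining.get(group, 0) >= cost["team_slots"]:
        team_remaining[group] -= cost["team_slots"]
    else:
        penalty += TEAM_CAPACITY_OVERFLOW_PENALTY

    if cost["high_priority_slots"]:
        if high_remaining >= 1:
            high_remaining -= 1
        else:
            penalty += HIGH_PRIORITY_SLOT_OVERFLOW_PENALTY

    if cost["escalation_slots"]:
        if escalation_remaining >= 1:
            escalation_remaining -= 1
        else:
            penalty += ESCALATION_SLOT_OVERFLOW_PENALTY

    return penalty, team_remaining, high_remaining, escalation_remaining


# Usage: a high-priority escalation sent to a saturated team with no
# high-priority slots left, but one escalation slot still available.
route = {"assignment_group": "network", "priority": "high", "resolution_action": "escalate"}
penalty, team_left, high_left, escalation_left = apply_capacity_usage(
    route, {"network": 0}, 0, 1
)
```

In this sketch the submission pays the team and high-priority overflow penalties and consumes the last escalation slot; the actual environment may combine or cap these penalties differently.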
 
166
  else:
167
  queue_size = min(queue_size_value, len(self._dataset))
168
  self._queue = self._rng.sample(self._dataset, min(queue_size, len(self._dataset)))
169
+ (
170
+ team_capacity_initial,
171
+ high_priority_slots_initial,
172
+ escalation_slots_initial,
173
+ ) = self._initial_capacity_state_for_queue(task_id)
174
 
175
  self._state = HelpdeskTicketState(
176
  episode_id=episode_id or str(uuid.uuid4()),
 
184
  average_score_so_far=0.0,
185
  investigation_budget_remaining=queue_size * FREE_INVESTIGATIONS_PER_TICKET,
186
  investigation_penalty_applied=0.0,
187
+ planning_penalty_applied=0.0,
188
  last_reward_components={},
189
  ticket_tool_usage={},
190
+ team_capacity_initial=team_capacity_initial,
191
+ team_capacity_remaining=dict(team_capacity_initial),
192
+ high_priority_slots_initial=high_priority_slots_initial,
193
+ high_priority_slots_remaining=high_priority_slots_initial,
194
+ escalation_slots_initial=escalation_slots_initial,
195
+ escalation_slots_remaining=escalation_slots_initial,
196
+ planning_penalty_total=0.0,
197
+ capacity_pressure_tickets_resolved=0,
198
  )
199
 
200
  return self._build_observation(task)
 
317
  action,
318
  task_id=task_id,
319
  )
320
+ capacity_penalty, capacity_details = self._apply_capacity_usage(
321
+ current_ticket,
322
+ action,
323
+ )
324
  step_adjustments = compute_step_adjustments(
325
  score,
326
  previous_average=previous_average,
 
344
  self._state.per_ticket_scores,
345
  len(self._queue),
346
  self._state.step_count,
347
+ completion_bonus=(
348
+ self._trajectory_consistency_bonus() + self._planning_success_bonus()
349
+ ),
350
  )
351
  trajectory_reward = trajectory_components["final_reward"]
352
+ rubric_reward = self._apply_episode_economics(
353
+ trajectory_reward - self._state.planning_penalty_total
354
+ )
355
+ final_reward = clamp_open_unit_interval(
356
+ rubric_reward - context_penalty - capacity_penalty
357
+ )
358
  self._state.total_reward = rubric_reward
359
  investigation_penalty = self._compute_episode_penalty()
360
  else:
 
362
  self._state.average_score_so_far = self._current_average_score()
363
  self._state.step_count += 1
364
  self._state.current_ticket_index += 1
365
+ final_reward = clamp_open_unit_interval(
366
+ step_reward - context_penalty - capacity_penalty
367
+ )
368
 
369
  reward_components = self._build_reward_components(
370
  ticket_score=score,
 
379
  "context_gap_penalty": context_penalty,
380
  "context_completion_bonus": process_bonus,
381
  "risk_penalty": risk_penalty,
382
+ "capacity_penalty": capacity_penalty,
383
  "delta_adjustment": step_adjustments["delta_adjustment"],
384
  "required_investigation_count": len(self._required_tools_for_ticket(current_ticket)),
385
  "hidden_context_remaining_count": missing_required_count,
386
  "hidden_context_revealed_count": len(
387
  self._used_tools_for_ticket(current_ticket.ticket_id)
388
  ),
389
+ "planning_penalty_total": self._state.planning_penalty_total,
390
+ "planning_penalty_applied": self._state.planning_penalty_applied,
391
+ "planning_success_bonus": self._planning_success_bonus()
392
+ if is_done
393
+ else 0.0,
394
  "rubric_reward": rubric_reward,
395
  "trajectory_average_reward": (
396
  trajectory_components["average_reward"]
 
409
  ),
410
  },
411
  )
412
+ reward_components.update(capacity_details)
413
 
414
  history_entry = self._build_history_entry(
415
  current_ticket,
 
428
  self._state.reward = final_reward
429
  self._state.done = is_done
430
  self._state.investigation_penalty_applied = self._compute_episode_penalty()
431
+ self._state.planning_penalty_applied = capacity_penalty
432
  self._state.last_tool_result = None
433
  self._state.last_reward_components = reward_components
434
 
 
464
  return 0.0
465
  return sum(self._state.per_ticket_scores) / len(self._state.per_ticket_scores)
466
 
+    def _ticket_has_alternate_route(self, ticket: HelpdeskTicketRecord) -> bool:
+        return any(
+            value is not None
+            for value in (
+                ticket.alternate_issue_type,
+                ticket.alternate_priority,
+                ticket.alternate_assignment_group,
+                ticket.alternate_resolution_action,
+            )
+        ) and ticket.alternate_route_score_multiplier > 0.0
477
+
478
+ def _route_for_ticket(
479
+ self,
480
+ ticket: HelpdeskTicketRecord,
481
+ *,
482
+ use_alternate: bool = False,
483
+ ) -> dict[str, str]:
484
+ if use_alternate and self._ticket_has_alternate_route(ticket):
485
+ return {
486
+ "issue_type": ticket.alternate_issue_type or ticket.issue_type,
487
+ "priority": ticket.alternate_priority or ticket.priority,
488
+ "assignment_group": (
489
+ ticket.alternate_assignment_group or ticket.assignment_group
490
+ ),
491
+ "resolution_action": (
492
+ ticket.alternate_resolution_action or ticket.resolution_action
493
+ ),
494
+ }
495
+ return {
496
+ "issue_type": ticket.issue_type,
497
+ "priority": ticket.priority,
498
+ "assignment_group": ticket.assignment_group,
499
+ "resolution_action": ticket.resolution_action,
500
+ }
501
+
502
+ def _route_for_action(
503
+ self,
504
+ ticket: HelpdeskTicketRecord,
505
+ action: HelpdeskTicketAction,
506
+ ) -> dict[str, str]:
507
+ primary_route = self._route_for_ticket(ticket)
508
+ return {
509
+ "issue_type": action.issue_type or primary_route["issue_type"],
510
+ "priority": action.priority or primary_route["priority"],
511
+ "assignment_group": (
512
+ action.assignment_group or primary_route["assignment_group"]
513
+ ),
514
+ "resolution_action": (
515
+ action.resolution_action or primary_route["resolution_action"]
516
+ ),
517
+ }
518
+
+    def _route_capacity_cost(self, route: dict[str, str]) -> dict[str, Any]:
+        return {
+            "assignment_group": route["assignment_group"],
+            "team_slots": 1,
+            "high_priority_slots": 1
+            if route["priority"] in {"high", "critical"}
+            else 0,
+            "escalation_slots": 1
+            if route["resolution_action"] in {"assign", "escalate"}
+            else 0,
+        }
530
+
+    def _routing_options_for_ticket(self, ticket: HelpdeskTicketRecord) -> list[dict[str, Any]]:
+        primary_route = self._route_for_ticket(ticket)
+        options = [
+            {
+                "label": "primary",
+                "score_multiplier": 1.0,
+                **primary_route,
+                "capacity_cost": self._route_capacity_cost(primary_route),
+            }
+        ]
+        if self._ticket_has_alternate_route(ticket):
+            alternate_route = self._route_for_ticket(ticket, use_alternate=True)
+            options.append(
+                {
+                    "label": "alternate",
+                    "score_multiplier": ticket.alternate_route_score_multiplier,
+                    **alternate_route,
+                    "capacity_cost": self._route_capacity_cost(alternate_route),
+                }
+            )
+        return options
+
+    def _initial_capacity_state_for_queue(
+        self,
+        task_id: int,
+    ) -> tuple[dict[str, int], int, int]:
+        if task_id != 3:
+            return {}, 0, 0
+
+        primary_group_demand: dict[str, int] = {}
+        alternate_relief_by_group: dict[str, int] = {}
+        all_groups: set[str] = set()
+        high_priority_demand = 0
+        high_priority_relief = 0
+        escalation_demand = 0
+        escalation_relief = 0
+
+        for ticket in self._queue:
+            primary_route = self._route_for_ticket(ticket)
+            all_groups.add(primary_route["assignment_group"])
+            primary_group_demand[primary_route["assignment_group"]] = (
+                primary_group_demand.get(primary_route["assignment_group"], 0) + 1
+            )
+            if primary_route["priority"] in {"high", "critical"}:
+                high_priority_demand += 1
+            if primary_route["resolution_action"] in {"assign", "escalate"}:
+                escalation_demand += 1
+
+            if self._ticket_has_alternate_route(ticket):
+                alternate_route = self._route_for_ticket(ticket, use_alternate=True)
+                all_groups.add(alternate_route["assignment_group"])
+                if alternate_route["assignment_group"] != primary_route["assignment_group"]:
+                    alternate_relief_by_group[primary_route["assignment_group"]] = (
+                        alternate_relief_by_group.get(
+                            primary_route["assignment_group"],
+                            0,
+                        )
+                        + 1
+                    )
+                if (
+                    primary_route["priority"] in {"high", "critical"}
+                    and alternate_route["priority"] not in {"high", "critical"}
+                ):
+                    high_priority_relief += 1
+                if (
+                    primary_route["resolution_action"] in {"assign", "escalate"}
+                    and alternate_route["resolution_action"] not in {"assign", "escalate"}
+                ):
+                    escalation_relief += 1
+
+        team_capacity_initial: dict[str, int] = {}
+        for group in sorted(all_groups):
+            demand = primary_group_demand.get(group, 0)
+            relief = alternate_relief_by_group.get(group, 0)
+            if demand <= 1:
+                # Every group seen in the queue keeps at least one slot.
+                team_capacity_initial[group] = 1
+            elif relief > 0:
+                team_capacity_initial[group] = max(1, demand - 1)
+            else:
+                team_capacity_initial[group] = demand
+
+        if high_priority_demand <= 1:
+            high_priority_slots_initial = high_priority_demand
+        elif high_priority_relief > 0:
+            high_priority_slots_initial = max(1, high_priority_demand - 1)
+        else:
+            high_priority_slots_initial = high_priority_demand
+
+        if escalation_demand <= 1:
+            escalation_slots_initial = escalation_demand
+        elif escalation_relief > 0:
+            escalation_slots_initial = max(1, escalation_demand - 1)
+        else:
+            escalation_slots_initial = escalation_demand
+
+        return (
+            team_capacity_initial,
+            high_priority_slots_initial,
+            escalation_slots_initial,
+        )
+
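The three budgets in `_initial_capacity_state_for_queue` follow one sizing rule: a budget is cut one slot below primary demand only when demand exceeds one and at least one ticket has an alternate that relieves it. The sketch below isolates that rule; the helper names `initial_shared_slots` and `initial_team_capacity` are hypothetical and not part of the commit.

```python
def initial_shared_slots(demand: int, relief: int) -> int:
    """Sizing rule used for the high-priority and escalation budgets."""
    if demand <= 1:
        return demand
    if relief > 0:
        # One slot short: at least one ticket must take its alternate route.
        return max(1, demand - 1)
    return demand


def initial_team_capacity(demand: int, relief: int) -> int:
    """Same rule per team, except every team seen in the queue gets at least one slot."""
    if demand <= 1:
        return 1
    if relief > 0:
        return max(1, demand - 1)
    return demand


# Three tickets want the same budget and one of them has a relieving alternate.
print(initial_shared_slots(demand=3, relief=1))   # scarce budget
print(initial_shared_slots(demand=3, relief=0))   # no alternate, so no scarcity
print(initial_team_capacity(demand=0, relief=0))  # alternate-only team still gets a slot
```

Read this way, scarcity is only introduced where the queue contains a legal way to avoid it, so an episode should not be forced into an overflow penalty by construction.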
+    def _future_queue_demand(self) -> dict[str, Any]:
+        future_tickets = self._queue[self._state.current_ticket_index + 1 :]
+        team_demand: dict[str, int] = {}
+        high_priority_needed = 0
+        escalation_needed = 0
+        capacity_sensitive_tickets = 0
+
+        for ticket in future_tickets:
+            route = self._route_for_ticket(ticket)
+            team_demand[route["assignment_group"]] = (
+                team_demand.get(route["assignment_group"], 0) + 1
+            )
+            if route["priority"] in {"high", "critical"}:
+                high_priority_needed += 1
+            if route["resolution_action"] in {"assign", "escalate"}:
+                escalation_needed += 1
+            if self._ticket_has_alternate_route(ticket):
+                capacity_sensitive_tickets += 1
+
+        return {
+            "remaining_ticket_count": len(future_tickets),
+            "team_demand": team_demand,
+            "high_priority_needed": high_priority_needed,
+            "escalation_needed": escalation_needed,
+            "capacity_sensitive_tickets": capacity_sensitive_tickets,
+        }
+
+    def _capacity_state_snapshot(self) -> dict[str, Any]:
+        return {
+            "team_capacity_remaining": dict(self._state.team_capacity_remaining),
+            "team_capacity_initial": dict(self._state.team_capacity_initial),
+            "high_priority_slots_remaining": self._state.high_priority_slots_remaining,
+            "high_priority_slots_initial": self._state.high_priority_slots_initial,
+            "escalation_slots_remaining": self._state.escalation_slots_remaining,
+            "escalation_slots_initial": self._state.escalation_slots_initial,
+        }
+
+    def _planning_route_recommendation(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
+        primary_route = self._route_for_ticket(ticket)
+        alternate_route = (
+            self._route_for_ticket(ticket, use_alternate=True)
+            if self._ticket_has_alternate_route(ticket)
+            else None
+        )
+        future_demand = self._future_queue_demand()
+        capacity_state = self._capacity_state_snapshot()
+
+        def pressure_score(route: dict[str, str]) -> int:
+            cost = self._route_capacity_cost(route)
+            group_remaining = capacity_state["team_capacity_remaining"].get(
+                route["assignment_group"],
+                1,
+            )
+            group_pressure = max(
+                0,
+                future_demand["team_demand"].get(route["assignment_group"], 0)
+                + cost["team_slots"]
+                - group_remaining,
+            )
+            priority_pressure = max(
+                0,
+                future_demand["high_priority_needed"] + cost["high_priority_slots"]
+                - capacity_state["high_priority_slots_remaining"],
+            )
+            escalation_pressure = max(
+                0,
+                future_demand["escalation_needed"] + cost["escalation_slots"]
+                - capacity_state["escalation_slots_remaining"],
+            )
+            return group_pressure + priority_pressure + escalation_pressure
+
+        primary_pressure = pressure_score(primary_route)
+        alternate_pressure = (
+            pressure_score(alternate_route) if alternate_route is not None else primary_pressure
+        )
+        preferred_label = (
+            "alternate"
+            if alternate_route is not None and alternate_pressure < primary_pressure
+            else "primary"
+        )
+        return {
+            "preferred_label": preferred_label,
+            "primary_pressure": primary_pressure,
+            "alternate_pressure": alternate_pressure,
+            "capacity_state": capacity_state,
+            "future_demand": future_demand,
+        }
+
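A worked example may help with the pressure comparison in `_planning_route_recommendation`. The sketch below reimplements the nested `pressure_score` as a free function over plain dictionaries; the queue state, demand figures, and both routes are invented for illustration, and only the arithmetic follows the diff.

```python
from typing import Any


def pressure_score(
    cost: dict[str, Any],
    future_demand: dict[str, Any],
    capacity_state: dict[str, Any],
) -> int:
    """Shortfall across the three budgets if this route is taken now."""
    group = cost["assignment_group"]
    group_remaining = capacity_state["team_capacity_remaining"].get(group, 1)
    group_pressure = max(
        0, future_demand["team_demand"].get(group, 0) + cost["team_slots"] - group_remaining
    )
    priority_pressure = max(
        0,
        future_demand["high_priority_needed"]
        + cost["high_priority_slots"]
        - capacity_state["high_priority_slots_remaining"],
    )
    escalation_pressure = max(
        0,
        future_demand["escalation_needed"]
        + cost["escalation_slots"]
        - capacity_state["escalation_slots_remaining"],
    )
    return group_pressure + priority_pressure + escalation_pressure


# Hypothetical mid-episode state: procurement has one slot left and a later ticket needs it.
capacity_state = {
    "team_capacity_remaining": {"procurement": 1, "service_desk": 2},
    "high_priority_slots_remaining": 1,
    "escalation_slots_remaining": 1,
}
future_demand = {
    "team_demand": {"procurement": 1},
    "high_priority_needed": 0,
    "escalation_needed": 1,
}
primary_cost = {"assignment_group": "procurement", "team_slots": 1,
                "high_priority_slots": 0, "escalation_slots": 1}
alternate_cost = {"assignment_group": "service_desk", "team_slots": 1,
                  "high_priority_slots": 0, "escalation_slots": 0}

primary_pressure = pressure_score(primary_cost, future_demand, capacity_state)
alternate_pressure = pressure_score(alternate_cost, future_demand, capacity_state)
preferred = "alternate" if alternate_pressure < primary_pressure else "primary"
print(primary_pressure, alternate_pressure, preferred)
```

In this made-up state the primary route is short one procurement slot and one escalation slot, so the alternate is preferred. Ties go to the primary route because the comparison is strict.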
+    def _ticket_is_capacity_sensitive(self, ticket: HelpdeskTicketRecord) -> bool:
+        if self._state.current_task_id != 3 or not self._ticket_has_alternate_route(ticket):
+            return False
+        recommendation = self._planning_route_recommendation(ticket)
+        return recommendation["preferred_label"] == "alternate" or any(
+            value > 0
+            for value in (
+                recommendation["primary_pressure"],
+                recommendation["alternate_pressure"],
+            )
+        )
+
+    def _route_matches_alternate(
+        self,
+        ticket: HelpdeskTicketRecord,
+        route: dict[str, str],
+    ) -> bool:
+        if not self._ticket_has_alternate_route(ticket):
+            return False
+        return route == self._route_for_ticket(ticket, use_alternate=True)
+
+    def _apply_capacity_usage(
+        self,
+        ticket: HelpdeskTicketRecord,
+        action: HelpdeskTicketAction,
+    ) -> tuple[float, dict[str, Any]]:
+        if self._state.current_task_id != 3:
+            return 0.0, {}
+
+        route = self._route_for_action(ticket, action)
+        capacity_cost = self._route_capacity_cost(route)
+        group = str(capacity_cost["assignment_group"])
+
+        if group not in self._state.team_capacity_remaining:
+            self._state.team_capacity_remaining[group] = 1
+            self._state.team_capacity_initial.setdefault(group, 1)
+
+        group_remaining = self._state.team_capacity_remaining[group]
+        group_overflow = max(0, int(capacity_cost["team_slots"]) - group_remaining)
+        self._state.team_capacity_remaining[group] = max(
+            0,
+            group_remaining - int(capacity_cost["team_slots"]),
+        )
+
+        high_priority_cost = int(capacity_cost["high_priority_slots"])
+        high_priority_overflow = max(
+            0,
+            high_priority_cost - self._state.high_priority_slots_remaining,
+        )
+        self._state.high_priority_slots_remaining = max(
+            0,
+            self._state.high_priority_slots_remaining - high_priority_cost,
+        )
+
+        escalation_cost = int(capacity_cost["escalation_slots"])
+        escalation_overflow = max(
+            0,
+            escalation_cost - self._state.escalation_slots_remaining,
+        )
+        self._state.escalation_slots_remaining = max(
+            0,
+            self._state.escalation_slots_remaining - escalation_cost,
+        )
+
+        capacity_penalty = round(
+            group_overflow * TEAM_CAPACITY_OVERFLOW_PENALTY
+            + high_priority_overflow * HIGH_PRIORITY_SLOT_OVERFLOW_PENALTY
+            + escalation_overflow * ESCALATION_SLOT_OVERFLOW_PENALTY,
+            4,
+        )
+        self._state.planning_penalty_total = round(
+            self._state.planning_penalty_total + capacity_penalty,
+            4,
+        )
+        self._state.planning_penalty_applied = capacity_penalty
+
+        used_alternate_route = self._route_matches_alternate(ticket, route)
+        if used_alternate_route:
+            self._state.capacity_pressure_tickets_resolved += 1
+
+        return capacity_penalty, {
+            "capacity_cost": capacity_cost,
+            "group_overflow": group_overflow,
+            "high_priority_overflow": high_priority_overflow,
+            "escalation_overflow": escalation_overflow,
+            "used_alternate_route": used_alternate_route,
+            "capacity_state_after_action": self._capacity_state_snapshot(),
+        }
+
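The consume-then-penalize pattern in `_apply_capacity_usage` can be sketched on a plain dictionary. The three penalty weights below are placeholders: the real `TEAM_CAPACITY_OVERFLOW_PENALTY`, `HIGH_PRIORITY_SLOT_OVERFLOW_PENALTY`, and `ESCALATION_SLOT_OVERFLOW_PENALTY` values are defined outside the lines shown in this diff. The `consume` helper and the sample state are also hypothetical.

```python
from typing import Any

# Placeholder weights, assumed for illustration only.
TEAM_CAPACITY_OVERFLOW_PENALTY = 0.05
HIGH_PRIORITY_SLOT_OVERFLOW_PENALTY = 0.04
ESCALATION_SLOT_OVERFLOW_PENALTY = 0.03


def consume(remaining: int, cost: int) -> tuple[int, int]:
    """Return (new_remaining, overflow); remaining is floored at zero."""
    return max(0, remaining - cost), max(0, cost - remaining)


def apply_capacity_usage(state: dict[str, Any], cost: dict[str, Any]) -> float:
    """Spend the route's slots and return the overflow penalty for this step."""
    group = cost["assignment_group"]
    teams = state["team_capacity_remaining"]
    # Groups never seen at reset start with a single slot, as in the diff.
    teams[group], group_overflow = consume(teams.get(group, 1), cost["team_slots"])
    state["high_priority_slots_remaining"], hp_overflow = consume(
        state["high_priority_slots_remaining"], cost["high_priority_slots"]
    )
    state["escalation_slots_remaining"], esc_overflow = consume(
        state["escalation_slots_remaining"], cost["escalation_slots"]
    )
    return round(
        group_overflow * TEAM_CAPACITY_OVERFLOW_PENALTY
        + hp_overflow * HIGH_PRIORITY_SLOT_OVERFLOW_PENALTY
        + esc_overflow * ESCALATION_SLOT_OVERFLOW_PENALTY,
        4,
    )


state = {
    "team_capacity_remaining": {"procurement": 0},
    "high_priority_slots_remaining": 1,
    "escalation_slots_remaining": 0,
}
cost = {"assignment_group": "procurement", "team_slots": 1,
        "high_priority_slots": 1, "escalation_slots": 1}
penalty = apply_capacity_usage(state, cost)
print(penalty, state)
```

Here the route overflows the procurement team and the escalation budget but fits the last high-priority slot, so two of the three penalty terms apply. Budgets never go negative, which means repeated overflows are each penalized once per step rather than compounding.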
+    def _planning_success_bonus(self) -> float:
+        if self._state.current_task_id != 3 or self._state.planning_penalty_total > 0.0:
+            return 0.0
+        capacity_sensitive_count = sum(
+            1 for ticket in self._queue if self._ticket_has_alternate_route(ticket)
+        )
+        if capacity_sensitive_count == 0:
+            return 0.0
+        coverage = min(
+            1.0,
+            self._state.capacity_pressure_tickets_resolved / capacity_sensitive_count,
+        )
+        return round(PLANNING_SUCCESS_BONUS * coverage, 4)
+
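The bonus in `_planning_success_bonus` is all-or-nothing on penalties and proportional on coverage. A minimal sketch of that calculation follows; the default `bonus=0.05` is an assumed stand-in for `PLANNING_SUCCESS_BONUS`, whose real value is not visible in these lines, and the function signature is hypothetical. The task-3 check from the diff is omitted for brevity.

```python
def planning_success_bonus(
    planning_penalty_total: float,
    alternate_routes_used: int,
    capacity_sensitive_count: int,
    bonus: float = 0.05,  # assumed placeholder for PLANNING_SUCCESS_BONUS
) -> float:
    """No bonus after any overflow; otherwise scale by alternate-route coverage."""
    if planning_penalty_total > 0.0 or capacity_sensitive_count == 0:
        return 0.0
    coverage = min(1.0, alternate_routes_used / capacity_sensitive_count)
    return round(bonus * coverage, 4)


print(planning_success_bonus(0.0, 1, 2))   # half coverage
print(planning_success_bonus(0.04, 2, 2))  # any penalty zeroes the bonus
print(planning_success_bonus(0.0, 0, 2))   # clean run, but no alternates used
```

One consequence worth noting: because coverage counts only tickets that were submitted on their alternate route, a queue completed without overflow purely on primary routes appears to earn no bonus under this formula.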
822
  def _internal_routing_note_for_ticket(
823
  self,
824
  ticket: HelpdeskTicketRecord,
825
  ) -> str | None:
 
 
826
  if self._state.current_task_id != 3:
827
+ return ticket.ambiguity_note or ticket.planning_note
828
+
829
+ note_parts: list[str] = []
830
+ if ticket.ambiguity_note is not None:
831
+ note_parts.append(ticket.ambiguity_note)
832
+ if ticket.planning_note is not None:
833
+ note_parts.append(ticket.planning_note)
834
 
835
  default_group = ISSUE_TYPE_TO_ASSIGNMENT_GROUP.get(
836
  ticket.issue_type,
 
840
  ticket.issue_type,
841
  ticket.resolution_action,
842
  )
 
843
 
844
  if ticket.assignment_group != default_group:
845
  note_parts.append(
 
914
  return self._ticket_repeated_requester_count(ticket) >= 2
915
  if tool_name == "lookup_internal_routing_note":
916
  return self._internal_routing_note_for_ticket(ticket) is not None
917
+ if tool_name == "lookup_queue_capacity_forecast":
918
+ return self._state.current_task_id == 3 and (
919
+ self._ticket_has_alternate_route(ticket)
920
+ or self._future_queue_demand()["remaining_ticket_count"] > 0
921
+ )
922
  return False
923
 
924
  def _required_tools_for_ticket(
 
948
  and "lookup_requester_history" not in required_tools
949
  ):
950
  required_tools.append("lookup_requester_history")
951
+ if (
952
+ self._ticket_is_capacity_sensitive(ticket)
953
+ and "lookup_queue_capacity_forecast" not in required_tools
954
+ ):
955
+ required_tools.append("lookup_queue_capacity_forecast")
956
  filtered_required_tools: list[str] = []
957
  for tool_name in required_tools:
958
  if tool_name in filtered_required_tools:
 
1003
  "The visible request is not enough to choose the final owner and next step. "
1004
  "Additional routing context is available via investigation."
1005
  )
1006
+ if self._ticket_has_alternate_route(ticket):
1007
+ return (
1008
+ "The queue is under resource pressure and this ticket may support more than "
1009
+ "one acceptable routing path. Additional planning context is available via investigation."
1010
+ )
1011
  if self._ticket_has_nondefault_routing(ticket):
1012
  return (
1013
  "The visible request looks straightforward, but the decisive routing detail is hidden until investigation."
 
1021
  return "Follow-up request with hidden routing context"
1022
  if self._internal_routing_note_for_ticket(ticket) is not None:
1023
  return "Routing clarification required"
1024
+ if self._ticket_has_alternate_route(ticket):
1025
+ return "Capacity-sensitive routing decision"
1026
  if self._ticket_mentions_follow_up(ticket):
1027
  return "Priority support follow-up"
1028
  return "Helpdesk routing decision"
 
1219
  "routing_note": routing_note if found else "",
1220
  }
1221
 
+    def _lookup_queue_capacity_forecast(
+        self,
+        current_ticket: HelpdeskTicketRecord,
+    ) -> dict[str, Any]:
+        recommendation = self._planning_route_recommendation(current_ticket)
+        routing_options = self._routing_options_for_ticket(current_ticket)
+        return {
+            "tool_name": "lookup_queue_capacity_forecast",
+            "found": True,
+            "ticket_id": current_ticket.ticket_id,
+            "preferred_route_label": recommendation["preferred_label"],
+            "primary_pressure": recommendation["primary_pressure"],
+            "alternate_pressure": recommendation["alternate_pressure"],
+            "capacity_state": recommendation["capacity_state"],
+            "future_queue_demand": recommendation["future_demand"],
+            "routing_options": routing_options,
+        }
+
1240
  def _run_investigation_tool(
1241
  self,
1242
  current_ticket: HelpdeskTicketRecord,
 
1249
  return self._lookup_requester_history(current_ticket)
1250
  if tool_name == "lookup_internal_routing_note":
1251
  return self._lookup_internal_routing_note(current_ticket)
1252
+ if tool_name == "lookup_queue_capacity_forecast":
1253
+ return self._lookup_queue_capacity_forecast(current_ticket)
1254
  raise ValueError(f"Unsupported tool_name: {tool_name}")
1255
 
1256
  def _handle_investigation_action(
 
1335
  def _build_ticket_view(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
1336
  progress = self._tool_progress_for_ticket(ticket)
1337
  remaining_tools = progress["remaining_tools"]
1338
+ used_tools = set(self._used_tools_for_ticket(ticket.ticket_id))
1339
  ticket_view: dict[str, Any] = {
1340
  "ticket_id": ticket.ticket_id,
1341
  "title": self._visible_title(ticket),
1342
  "requester": ticket.requester,
1343
  "description": self._visible_description(ticket),
1344
  }
1345
+ if self._state.current_task_id == 3:
1346
+ ticket_view["capacity_state"] = self._capacity_state_snapshot()
1347
  if progress["required_tools"]:
1348
  ticket_view["context_status"] = {
1349
  "investigation_required": True,
 
1356
  }
1357
  if ticket.ambiguity_note is not None and "lookup_internal_routing_note" not in remaining_tools:
1358
  ticket_view["ambiguity_note"] = ticket.ambiguity_note
1359
+ if (
1360
+ ticket.planning_note is not None
1361
+ and "lookup_internal_routing_note" not in remaining_tools
1362
+ ):
1363
+ ticket_view["planning_note"] = ticket.planning_note
1364
  if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
1365
  ticket_view["related_ticket_id"] = ticket.related_ticket_id
1366
  related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
 
1371
  "requester": related_ticket.requester,
1372
  "description": related_ticket.description,
1373
  }
1374
+ if self._ticket_has_alternate_route(ticket) and (
1375
+ "lookup_internal_routing_note" in used_tools
1376
+ or "lookup_queue_capacity_forecast" in used_tools
1377
+ ):
1378
+ ticket_view["routing_options"] = self._routing_options_for_ticket(ticket)
1379
  return ticket_view
1380
 
1381
  def _build_feedback_summary(
 
1429
  risk_penalty = reward_components.get("risk_penalty")
1430
  if risk_penalty:
1431
  parts.append(f"risk_penalty={risk_penalty:.2f}")
1432
+ capacity_penalty = reward_components.get("capacity_penalty")
1433
+ if capacity_penalty:
1434
+ parts.append(f"capacity_penalty={capacity_penalty:.2f}")
1435
+ planning_penalty_total = reward_components.get("planning_penalty_total")
1436
+ if planning_penalty_total:
1437
+ parts.append(f"planning_penalty_total={planning_penalty_total:.2f}")
1438
 
1439
  return "; ".join(parts)
1440
 
 
1464
  "breakdown": breakdown,
1465
  "queue_position": queue_position,
1466
  }
1467
+ if self._state.current_task_id == 3:
1468
+ history_entry["capacity_state"] = self._capacity_state_snapshot()
1469
  if reward is not None:
1470
  history_entry["reward"] = reward
1471
  if rubric_reward is not None:
 
1474
  history_entry["reward_kind"] = reward_kind
1475
  if ticket.ambiguity_note is not None and "lookup_internal_routing_note" not in remaining_tools:
1476
  history_entry["ambiguity_note"] = ticket.ambiguity_note
1477
+ if (
1478
+ ticket.planning_note is not None
1479
+ and "lookup_internal_routing_note" not in remaining_tools
1480
+ ):
1481
+ history_entry["planning_note"] = ticket.planning_note
1482
  if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
1483
  history_entry["related_ticket_id"] = ticket.related_ticket_id
1484
  related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
 
1489
  "requester": related_ticket.requester,
1490
  "description": related_ticket.description,
1491
  }
1492
+ if (
1493
+ self._ticket_has_alternate_route(ticket)
1494
+ and (
1495
+ "lookup_internal_routing_note" not in remaining_tools
1496
+ or "lookup_queue_capacity_forecast" in self._used_tools_for_ticket(ticket.ticket_id)
1497
+ )
1498
+ ):
1499
+ history_entry["routing_options"] = self._routing_options_for_ticket(ticket)
1500
  if penalty_reason is not None:
1501
  history_entry["penalty_reason"] = penalty_reason
1502
  if tool_result is not None:
 
1566
  "average_score_so_far": self._state.average_score_so_far,
1567
  "progress_fraction": progress_fraction,
1568
  "investigation_penalty_applied": self._state.investigation_penalty_applied,
1569
+ "planning_penalty_total": self._state.planning_penalty_total,
1570
+ "planning_penalty_applied": self._state.planning_penalty_applied,
1571
  }
1572
+ if self._state.current_task_id == 3:
1573
+ metadata["capacity_state"] = self._capacity_state_snapshot()
1574
+ metadata["future_queue_demand"] = self._future_queue_demand()
1575
  if last_history_entry is not None:
1576
  metadata["last_score"] = last_history_entry.get("score")
1577
  metadata["last_reward"] = last_history_entry.get("reward")
server/grader.py CHANGED
@@ -2,9 +2,6 @@ from __future__ import annotations
 
 from models import HelpdeskTicketAction, HelpdeskTicketRecord
 
-TASK_SCORE_EPSILON = 0.01
-
-
 ISSUE_TYPE_SIMILARITY = {
     ("billing_license", "service_request"): 0.4,
     ("service_request", "billing_license"): 0.4,
@@ -120,31 +117,90 @@ def _score_exact(predicted: str | None, expected: str) -> float:
     return 1.0 if _normalized(predicted) == _normalized(expected) and predicted else 0.0
 
 
-def grade_action(
+def _score_route(
     action: HelpdeskTicketAction,
-    ticket: HelpdeskTicketRecord,
+    *,
+    issue_type: str,
+    priority: str,
+    assignment_group: str,
+    resolution_action: str,
+    score_multiplier: float,
     task_id: int,
 ) -> tuple[float, dict[str, float]]:
-    if task_id not in TASK_WEIGHTS:
-        raise ValueError(f"Unsupported task_id: {task_id}")
-
     field_scores = {
-        "issue_type": _score_exact_or_similar(action.issue_type, ticket.issue_type),
-        "priority": _score_priority(action.priority, ticket.priority),
+        "issue_type": _score_exact_or_similar(action.issue_type, issue_type),
+        "priority": _score_priority(action.priority, priority),
         "assignment_group": _score_exact_or_table(
             action.assignment_group,
-            ticket.assignment_group,
+            assignment_group,
             ASSIGNMENT_GROUP_SIMILARITY,
         ),
         "resolution_action": _score_exact_or_table(
             action.resolution_action,
-            ticket.resolution_action,
+            resolution_action,
             RESOLUTION_ACTION_SIMILARITY,
         ),
     }
-
+    if score_multiplier != 1.0:
+        field_scores = {
+            field: round(score * score_multiplier, 4)
+            for field, score in field_scores.items()
+        }
     weights = TASK_WEIGHTS[task_id]
     raw_score = sum(field_scores[field] * weight for field, weight in weights.items())
-    score = max(TASK_SCORE_EPSILON, min(1.0 - TASK_SCORE_EPSILON, raw_score))
-    breakdown = {field: field_scores[field] for field in weights}
+    return raw_score, field_scores
+
+
+def _alternate_route_available(ticket: HelpdeskTicketRecord) -> bool:
+    return any(
+        value is not None
+        for value in (
+            ticket.alternate_issue_type,
+            ticket.alternate_priority,
+            ticket.alternate_assignment_group,
+            ticket.alternate_resolution_action,
+        )
+    ) and ticket.alternate_route_score_multiplier > 0.0
+
+
+def grade_action(
+    action: HelpdeskTicketAction,
+    ticket: HelpdeskTicketRecord,
+    task_id: int,
+) -> tuple[float, dict[str, float]]:
+    if task_id not in TASK_WEIGHTS:
+        raise ValueError(f"Unsupported task_id: {task_id}")
+
+    primary_score, primary_field_scores = _score_route(
+        action,
+        issue_type=ticket.issue_type,
+        priority=ticket.priority,
+        assignment_group=ticket.assignment_group,
+        resolution_action=ticket.resolution_action,
+        score_multiplier=1.0,
+        task_id=task_id,
+    )
+    chosen_score = primary_score
+    chosen_field_scores = primary_field_scores
+
+    if _alternate_route_available(ticket):
+        alternate_score, alternate_field_scores = _score_route(
+            action,
+            issue_type=ticket.alternate_issue_type or ticket.issue_type,
+            priority=ticket.alternate_priority or ticket.priority,
+            assignment_group=(
+                ticket.alternate_assignment_group or ticket.assignment_group
+            ),
+            resolution_action=(
+                ticket.alternate_resolution_action or ticket.resolution_action
+            ),
+            score_multiplier=ticket.alternate_route_score_multiplier,
+            task_id=task_id,
+        )
+        if alternate_score > chosen_score:
+            chosen_score = alternate_score
+            chosen_field_scores = alternate_field_scores
+
+    score = max(0.0, min(1.0, chosen_score))
+    breakdown = {field: chosen_field_scores[field] for field in TASK_WEIGHTS[task_id]}
     return score, breakdown
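The new `grade_action` scores a submission against both the primary and the alternate route and keeps the higher result. The sketch below shows that best-of selection in isolation. It is a simplification under stated assumptions: exact-match scoring stands in for the similarity tables, the equal `WEIGHTS` are invented rather than taken from `TASK_WEIGHTS`, and the ticket fields are hypothetical.

```python
WEIGHTS = {  # assumed equal weights, not the repository's TASK_WEIGHTS
    "issue_type": 0.25,
    "priority": 0.25,
    "assignment_group": 0.25,
    "resolution_action": 0.25,
}


def score_route(action: dict, expected: dict, multiplier: float) -> tuple[float, dict]:
    """Exact-match field scores, scaled by the route's multiplier."""
    field_scores = {f: (1.0 if action.get(f) == expected[f] else 0.0) for f in WEIGHTS}
    if multiplier != 1.0:
        field_scores = {f: round(s * multiplier, 4) for f, s in field_scores.items()}
    return sum(field_scores[f] * w for f, w in WEIGHTS.items()), field_scores


def grade(action: dict, primary: dict, alternate_overrides: dict, multiplier: float):
    """Keep whichever of the primary or alternate route scores higher."""
    best, fields = score_route(action, primary, 1.0)
    if alternate_overrides and multiplier > 0.0:
        # Unset alternate fields fall back to the primary values.
        alternate = {**primary, **alternate_overrides}
        alt, alt_fields = score_route(action, alternate, multiplier)
        if alt > best:
            best, fields = alt, alt_fields
    return max(0.0, min(1.0, best)), fields


primary = {"issue_type": "general_inquiry", "priority": "medium",
           "assignment_group": "procurement", "resolution_action": "assign"}
overrides = {"assignment_group": "service_desk", "resolution_action": "acknowledge"}

fallback_action = {**primary, **overrides}
fallback_score, _ = grade(fallback_action, primary, overrides, 0.9)
primary_score, _ = grade(dict(primary), primary, overrides, 0.9)
print(fallback_score, primary_score)
```

With these inputs a submission that matches the alternate route exactly is capped near its 0.9 multiplier, while a submission matching the primary route still scores 1.0, so the alternate is acceptable but never strictly better on the rubric alone.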
server/reward.py CHANGED
@@ -8,15 +8,14 @@ DELTA_REWARD_WEIGHT = 0.08
 DELTA_REWARD_CAP = 0.04
 PROCESS_BONUS_CAP = 0.08
 RISK_PENALTY_CAP = 0.12
-OPEN_INTERVAL_EPSILON = 0.01
 
 
 def _clamp_unit_interval(value: float) -> float:
     return max(0.0, min(1.0, value))
 
 
-def clamp_open_unit_interval(value: float, epsilon: float = OPEN_INTERVAL_EPSILON) -> float:
-    return max(epsilon, min(1.0 - epsilon, value))
+def clamp_open_unit_interval(value: float, epsilon: float = 0.0) -> float:
+    return _clamp_unit_interval(value)
 
 
 def compute_step_adjustments(
@@ -93,7 +92,7 @@ def compute_trajectory_adjustments(
     avg = sum(per_ticket_scores) / len(per_ticket_scores)
     bounded_completion_bonus = max(0.0, min(0.08, completion_bonus))
     bounded_consistency_bonus = max(0.0, min(0.05, consistency_bonus))
-    final_reward = clamp_open_unit_interval(
+    final_reward = _clamp_unit_interval(
         avg + bounded_completion_bonus + bounded_consistency_bonus
     )
     return {
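The practical effect of dropping the epsilon clamp is that the final reward can now reach exactly 0.0 or 1.0. The sketch below compares the old and new clamps on the same inputs; `final_reward` is a hypothetical wrapper around the three lines shown in the hunk, not the repository's `compute_trajectory_adjustments`.

```python
def clamp_closed(value: float) -> float:
    """New behavior: clamp to the closed interval [0, 1]."""
    return max(0.0, min(1.0, value))


def clamp_open(value: float, epsilon: float = 0.01) -> float:
    """Old behavior: keep the result inside [epsilon, 1 - epsilon]."""
    return max(epsilon, min(1.0 - epsilon, value))


def final_reward(per_ticket_scores, completion_bonus, consistency_bonus, clamp):
    avg = sum(per_ticket_scores) / len(per_ticket_scores)
    bounded_completion_bonus = max(0.0, min(0.08, completion_bonus))
    bounded_consistency_bonus = max(0.0, min(0.05, consistency_bonus))
    return clamp(avg + bounded_completion_bonus + bounded_consistency_bonus)


perfect = [1.0, 1.0, 1.0]
print(final_reward(perfect, 0.08, 0.05, clamp_open))    # capped just below 1
print(final_reward(perfect, 0.08, 0.05, clamp_closed))  # reaches 1.0
print(final_reward([0.0], 0.0, 0.0, clamp_closed))      # reaches 0.0
```

Callers that still pass `epsilon` to `clamp_open_unit_interval` keep working after this change, but the argument no longer has any effect.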
server/tasks.py CHANGED
@@ -36,10 +36,13 @@ TASKS = {
36
  "instructions": (
37
  "Perform full helpdesk routing by selecting the best issue type, "
38
  "priority, assignment group, and resolution action for the ticket. "
39
- "Use any ambiguity notes or related-ticket previews when present. "
 
40
  "Some hard tickets intentionally hide decisive routing context until "
41
- "you investigate with the available tools, so premature submission can "
42
- "underperform even when the visible text looks plausible."
 
 
43
  ),
44
  "allowed_fields": [
45
  "issue_type",
@@ -50,6 +53,379 @@ TASKS = {
50
  },
51
  }
52
 
53
  assert tuple(TASKS.keys()) == TASK_IDS
54
 
55
 
@@ -58,7 +434,8 @@ def load_dataset() -> list[HelpdeskTicketRecord]:
58
  # Accept UTF-8 files saved with a BOM, which is common on Windows editors.
59
  with dataset_path.open("r", encoding="utf-8-sig") as f:
60
  raw = json.load(f)
61
- return [HelpdeskTicketRecord.model_validate(r) for r in raw]
 
62
 
63
 
64
  def get_task_definition(task_id: int) -> dict:
 
36
  "instructions": (
37
  "Perform full helpdesk routing by selecting the best issue type, "
38
  "priority, assignment group, and resolution action for the ticket. "
39
+ "Use any ambiguity notes, related-ticket previews, queue-capacity "
40
+ "forecasts, and planning state when present. "
41
  "Some hard tickets intentionally hide decisive routing context until "
42
+ "you investigate with the available tools, and some hard episodes also "
43
+ "require queue-level capacity planning across multiple tickets, so "
44
+ "premature or resource-greedy routing can underperform even when the "
45
+ "visible text looks plausible."
46
  ),
47
  "allowed_fields": [
48
  "issue_type",
 
53
  },
54
  }
55
 
+
+PLANNING_ROUTE_UPDATES: dict[str, dict] = {
+    "ticket-022": {
+        "planning_note": (
+            "If the application queue is saturated, billing operations can own the "
+            "customer-facing charge review as a lower-fidelity fallback while the bug "
+            "investigation continues separately."
+        ),
+        "alternate_issue_type": "billing_license",
+        "alternate_assignment_group": "license_ops",
+        "alternate_resolution_action": "assign",
+        "alternate_route_score_multiplier": 0.74,
+    },
+    "ticket-027": {
+        "planning_note": (
+            "If procurement capacity is available, treat this like a commercial review. "
+            "If not, a lightweight service-desk acknowledgement is still acceptable."
+        ),
+        "alternate_issue_type": "service_request",
+        "alternate_priority": "medium",
+        "alternate_assignment_group": "procurement",
+        "alternate_resolution_action": "assign",
+        "alternate_route_score_multiplier": 0.92,
+    },
+    "ticket-029": {
+        "planning_note": (
+            "Seat expansion is the preferred route, but license operations can still "
+            "handle the prorating clarification when procurement is the bottleneck."
+        ),
+        "alternate_issue_type": "billing_license",
+        "alternate_assignment_group": "license_ops",
+        "alternate_resolution_action": "fulfill",
+        "alternate_route_score_multiplier": 0.82,
+    },
+    "ticket-040": {
+        "planning_note": (
+            "The request can be treated either as roadmap feedback or as a support "
+            "escalation if the operational impact is emphasized."
+        ),
+        "alternate_issue_type": "application_support",
+        "alternate_priority": "high",
+        "alternate_resolution_action": "escalate",
+        "alternate_route_score_multiplier": 0.76,
+    },
+    "ticket-047": {
+        "planning_note": (
+            "The preferred route is an immediate service-desk extension, but the "
+            "commercial owner can take it if operational fulfillment capacity is exhausted."
+        ),
+        "alternate_assignment_group": "procurement",
+        "alternate_resolution_action": "assign",
+        "alternate_route_score_multiplier": 0.78,
+    },
+    "ticket-048": {
+        "planning_note": (
+            "This belongs with procurement when commercial reviewers are available, "
+            "but a generic service-desk acknowledgement is an acceptable fallback."
+        ),
+        "alternate_assignment_group": "service_desk",
+        "alternate_resolution_action": "acknowledge",
+        "alternate_route_score_multiplier": 0.9,
+    },
+    "ticket-050": {
+        "planning_note": (
+            "Central coordination is preferred. If service-desk capacity is depleted, "
+            "onboarding operations can still run a reduced fulfillment path."
+        ),
+        "alternate_priority": "medium",
+        "alternate_assignment_group": "onboarding_ops",
+        "alternate_resolution_action": "fulfill",
+        "alternate_route_score_multiplier": 0.84,
+    },
+    "ticket-051": {
+        "planning_note": (
+            "Commercial procurement owns the contract amendment, but this can also "
+            "be treated as a service request when the commercial queue needs triage."
+        ),
+        "alternate_issue_type": "service_request",
+        "alternate_route_score_multiplier": 0.83,
+    },
+    "ticket-053": {
+        "planning_note": (
+            "Security scheduling is ideal, but a compliance acknowledgement is still "
+            "acceptable when the security team only needs to confirm the process."
+        ),
+        "alternate_issue_type": "security_compliance",
+        "alternate_resolution_action": "acknowledge",
+        "alternate_route_score_multiplier": 0.8,
+    },
+    "ticket-054": {
+        "planning_note": (
+            "License operations can fulfill the archive request directly. If that queue "
+            "is saturated, service desk can acknowledge and queue the retrieval."
+        ),
+        "alternate_assignment_group": "service_desk",
+        "alternate_resolution_action": "acknowledge",
+        "alternate_route_score_multiplier": 0.9,
+    },
+}
+
+
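The diff lines visible here do not show how `PLANNING_ROUTE_UPDATES` is applied to the loaded dataset. One plausible approach, sketched below purely as an assumption, is to overlay each update onto the raw record with the matching `ticket_id` before validation. The helper name `apply_route_updates` and the base record's primary fields are hypothetical; only the `ticket-054` overrides are copied from the table above.

```python
from typing import Any


def apply_route_updates(
    records: list[dict[str, Any]],
    updates: dict[str, dict[str, Any]],
) -> list[dict[str, Any]]:
    """Return new records with per-ticket overrides layered on top (inputs untouched)."""
    merged = []
    for record in records:
        overrides = updates.get(record["ticket_id"], {})
        merged.append({**record, **overrides})
    return merged


# Hypothetical raw records; the real dataset fields may differ.
raw_records = [
    {"ticket_id": "ticket-054", "assignment_group": "license_ops",
     "resolution_action": "fulfill"},
    {"ticket_id": "ticket-001", "assignment_group": "service_desk",
     "resolution_action": "acknowledge"},
]
updates = {
    "ticket-054": {
        "alternate_assignment_group": "service_desk",
        "alternate_resolution_action": "acknowledge",
        "alternate_route_score_multiplier": 0.9,
    },
}

merged_records = apply_route_updates(raw_records, updates)
print(merged_records[0])
```

Keeping the overrides in a separate table like this leaves the original dataset file unchanged, which may be why the commit adds the planning fields in code rather than editing the JSON directly.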
+CURATED_EXPANSION_RECORDS: list[dict] = [
+    {
+        "ticket_id": "ticket-056",
+        "title": "Vendor DPA redlines need an owner before pricing sign-off",
+        "requester": "procurement@harborcompliance.io",
+        "description": (
+            "Commercial review is already moving, but the team needs to know who owns "
+            "the vendor DPA redlines before pricing can be approved."
+        ),
+        "issue_type": "general_inquiry",
+        "priority": "medium",
+        "assignment_group": "procurement",
+        "resolution_action": "assign",
+        "planning_note": (
+            "Procurement is preferred, but service desk can acknowledge and route the "
+            "questionnaire logistics if the commercial queue is saturated."
+        ),
+        "alternate_assignment_group": "service_desk",
+        "alternate_resolution_action": "acknowledge",
+        "alternate_route_score_multiplier": 0.9,
+    },
+    {
+        "ticket_id": "ticket-057",
+        "title": "Board audit packet needs a timeline for the privileged-account lockout",
+        "requester": "security-ops@atlasbank.io",
+        "description": (
+            "Following up on ticket-046. The board pack needs a timeline and ownership "
+            "summary for the privileged admin lockout before tomorrow morning."
+        ),
+        "issue_type": "identity_access",
+        "priority": "high",
+        "assignment_group": "security_team",
+        "resolution_action": "escalate",
+        "related_ticket_id": "ticket-046",
+        "planning_note": (
+            "Security still owns the privileged-access review, but service desk can "
+            "collect chronology and prepare the packet if the security queue is jammed."
+        ),
+        "alternate_assignment_group": "service_desk",
+        "alternate_resolution_action": "assign",
+        "alternate_route_score_multiplier": 0.72,
+    },
+    {
+        "ticket_id": "ticket-058",
+        "title": "Temporary contractor extension during an onboarding surge",
+        "requester": "hr@talentbridge.co",
+        "description": (
+            "A contractor start date slipped by two weeks and the account needs to stay "
+            "active while the onboarding backlog is already full."
+        ),
+        "issue_type": "onboarding",
+        "priority": "medium",
+        "assignment_group": "service_desk",
+        "resolution_action": "assign",
+        "planning_note": (
+            "Service desk is preferred for cross-team coordination. If coordination "
+            "capacity is exhausted, onboarding operations can fulfill the extension directly."
+        ),
+        "alternate_assignment_group": "onboarding_ops",
+        "alternate_resolution_action": "fulfill",
+        "alternate_route_score_multiplier": 0.85,
+    },
+    {
+        "ticket_id": "ticket-059",
+        "title": "Archived invoice packet plus quarter-close clarification",
+        "requester": "boardops@silverpine.com",
+        "description": (
+            "Finance needs archived invoice PDFs plus a quick note explaining whether any "
+            "quarter-close adjustments are still pending."
+        ),
+        "issue_type": "general_inquiry",
+        "priority": "medium",
+        "assignment_group": "license_ops",
230
+ "resolution_action": "fulfill",
231
+ "planning_note": (
232
+ "License operations can fulfill directly. If that queue is constrained, "
233
+ "service desk can acknowledge and schedule the retrieval."
234
+ ),
235
+ "alternate_assignment_group": "service_desk",
236
+ "alternate_resolution_action": "acknowledge",
237
+ "alternate_route_score_multiplier": 0.88,
238
+ },
239
+ {
240
+ "ticket_id": "ticket-060",
241
+ "title": "Re: Temporary sandbox extension for the signed pilot",
242
+ "requester": "solutions@bluequarry.io",
243
+ "description": (
244
+ "Following up on ticket-047. The customer launch rehearsal is tomorrow, so the "
245
+ "sandbox extension needs either immediate execution or a commercial owner to unblock it."
246
+ ),
247
+ "issue_type": "service_request",
248
+ "priority": "high",
249
+ "assignment_group": "service_desk",
250
+ "resolution_action": "escalate",
251
+ "related_ticket_id": "ticket-047",
252
+ "planning_note": (
253
+ "Immediate operational execution is preferred. Procurement can still own the "
254
+ "approval path if service-desk capacity is already depleted."
255
+ ),
256
+ "alternate_assignment_group": "procurement",
257
+ "alternate_resolution_action": "assign",
258
+ "alternate_route_score_multiplier": 0.8,
259
+ },
260
+ {
261
+ "ticket_id": "ticket-061",
262
+ "title": "Risk-exception review is blocking an SSO restore",
263
+ "requester": "identity-risk@sterlingmed.io",
264
+ "description": (
265
+ "Users cannot log in through SSO until a temporary risk exception is approved. "
266
+ "The product team may need logs, but the unblock decision is tied to the review."
267
+ ),
268
+ "issue_type": "identity_access",
269
+ "priority": "critical",
270
+ "assignment_group": "security_team",
271
+ "resolution_action": "escalate",
272
+ "planning_note": (
273
+ "Security owns the final unblock decision. If security is saturated, the "
274
+ "application team can still take the first-response diagnostics path."
275
+ ),
276
+ "alternate_issue_type": "application_support",
277
+ "alternate_priority": "high",
278
+ "alternate_assignment_group": "application_team",
279
+ "alternate_resolution_action": "escalate",
280
+ "alternate_route_score_multiplier": 0.74,
281
+ },
282
+ {
283
+ "ticket_id": "ticket-062",
284
+ "title": "Need product remediation evidence for a customer security questionnaire",
285
+ "requester": "assurance@clientgrid.com",
286
+ "description": (
287
+ "A customer questionnaire asks for evidence that a previously remediated "
288
+ "application vulnerability is fully closed."
289
+ ),
290
+ "issue_type": "security_compliance",
291
+ "priority": "medium",
292
+ "assignment_group": "application_team",
293
+ "resolution_action": "fulfill",
294
+ "planning_note": (
295
+ "Application engineering is preferred because they hold the remediation artifacts. "
296
+ "Security can still acknowledge the questionnaire and buy time when app capacity is tight."
297
+ ),
298
+ "alternate_assignment_group": "security_team",
299
+ "alternate_resolution_action": "acknowledge",
300
+ "alternate_route_score_multiplier": 0.82,
301
+ },
302
+ {
303
+ "ticket_id": "ticket-063",
304
+ "title": "Subsidiary admin training with a seat-transfer request",
305
+ "requester": "enablement@globalcorp.com",
306
+ "description": (
307
+ "A newly acquired subsidiary needs admin training next week and also wants "
308
+ "to transfer existing seats into the parent contract."
309
+ ),
310
+ "issue_type": "service_request",
311
+ "priority": "medium",
312
+ "assignment_group": "procurement",
313
+ "resolution_action": "assign",
314
+ "planning_note": (
315
+ "Procurement owns the commercial transfer. If that queue is overloaded, "
316
+ "onboarding operations can still deliver the training portion first."
317
+ ),
318
+ "alternate_issue_type": "onboarding",
319
+ "alternate_assignment_group": "onboarding_ops",
320
+ "alternate_resolution_action": "fulfill",
321
+ "alternate_route_score_multiplier": 0.78,
322
+ },
323
+ {
324
+ "ticket_id": "ticket-064",
325
+ "title": "Legal-hold export of invoice history",
326
+ "requester": "legalops@northshoreenergy.com",
327
+ "description": (
328
+ "Legal needs invoice history exported for a hold notice. No pricing change is "
329
+ "required, but the request must be acknowledged today."
330
+ ),
331
+ "issue_type": "general_inquiry",
332
+ "priority": "high",
333
+ "assignment_group": "license_ops",
334
+ "resolution_action": "fulfill",
335
+ "planning_note": (
336
+ "License operations can deliver the export. If they are capacity-constrained, "
337
+ "service desk can acknowledge the request and queue the retrieval."
338
+ ),
339
+ "alternate_assignment_group": "service_desk",
340
+ "alternate_resolution_action": "acknowledge",
341
+ "alternate_route_score_multiplier": 0.87,
342
+ },
343
+ {
344
+ "ticket_id": "ticket-065",
345
+ "title": "Cross-functional launch checklist for an acquired support team",
346
+ "requester": "integration@mergerco.com",
347
+ "description": (
348
+ "Twelve support agents from an acquired business need onboarding, mailbox "
349
+ "setup, and a security attestation before Monday."
350
+ ),
351
+ "issue_type": "onboarding",
352
+ "priority": "high",
353
+ "assignment_group": "service_desk",
354
+ "resolution_action": "assign",
355
+ "planning_note": (
356
+ "Central coordination is preferred. If service-desk capacity is exhausted, "
357
+ "onboarding operations can still run a reduced fulfillment path."
358
+ ),
359
+ "alternate_priority": "medium",
360
+ "alternate_assignment_group": "onboarding_ops",
361
+ "alternate_resolution_action": "fulfill",
362
+ "alternate_route_score_multiplier": 0.81,
363
+ },
364
+ {
365
+ "ticket_id": "ticket-066",
366
+ "title": "Pilot customer asks who approves a credential-defense allowlist",
367
+ "requester": "pilotops@cruxsystems.io",
368
+ "description": (
369
+ "A pilot customer needs to know who approves an IP allowlist for a credential-"
370
+ "defense control before they continue their test."
371
+ ),
372
+ "issue_type": "general_inquiry",
373
+ "priority": "medium",
374
+ "assignment_group": "security_team",
375
+ "resolution_action": "assign",
376
+ "planning_note": (
377
+ "Security should own the answer when available. If that queue is overloaded, "
378
+ "service desk can acknowledge and route the ownership question."
379
+ ),
380
+ "alternate_assignment_group": "service_desk",
381
+ "alternate_resolution_action": "acknowledge",
382
+ "alternate_route_score_multiplier": 0.84,
383
+ },
384
+ {
385
+ "ticket_id": "ticket-067",
386
+ "title": "Re: Remediation evidence package is now blocking a renewal signature",
387
+ "requester": "assurance@clientgrid.com",
388
+ "description": (
389
+ "Following up on ticket-052. Renewal signature is blocked until the remediation "
390
+ "evidence package is delivered or a commercial owner confirms the delay."
391
+ ),
392
+ "issue_type": "security_compliance",
393
+ "priority": "high",
394
+ "assignment_group": "application_team",
395
+ "resolution_action": "escalate",
396
+ "related_ticket_id": "ticket-052",
397
+ "planning_note": (
398
+ "Application engineering is preferred because they own the evidence. Procurement "
399
+ "can still coordinate the renewal communication if the evidence queue is saturated."
400
+ ),
401
+ "alternate_issue_type": "service_request",
402
+ "alternate_priority": "medium",
403
+ "alternate_assignment_group": "procurement",
404
+ "alternate_resolution_action": "assign",
405
+ "alternate_route_score_multiplier": 0.76,
406
+ },
407
+ ]
408
+
409
+
410
+ def _apply_dataset_enhancements(
411
+ dataset: list[HelpdeskTicketRecord],
412
+ ) -> list[HelpdeskTicketRecord]:
413
+ enhanced_dataset: list[HelpdeskTicketRecord] = []
414
+ for record in dataset:
415
+ update = PLANNING_ROUTE_UPDATES.get(record.ticket_id)
416
+ enhanced_dataset.append(
417
+ record.model_copy(update=update) if update is not None else record
418
+ )
419
+
420
+ seen_ids = {record.ticket_id for record in enhanced_dataset}
421
+ for raw_record in CURATED_EXPANSION_RECORDS:
422
+ ticket_id = str(raw_record["ticket_id"])
423
+ if ticket_id in seen_ids:
424
+ raise ValueError(f"Duplicate ticket_id in curated expansion: {ticket_id}")
425
+ enhanced_dataset.append(HelpdeskTicketRecord.model_validate(raw_record))
426
+ seen_ids.add(ticket_id)
427
+ return enhanced_dataset
428
+
429
  assert tuple(TASKS.keys()) == TASK_IDS
430
 
431
 
 
434
  # Accept UTF-8 files saved with a BOM, which is common on Windows editors.
435
  with dataset_path.open("r", encoding="utf-8-sig") as f:
436
  raw = json.load(f)
437
+ dataset = [HelpdeskTicketRecord.model_validate(r) for r in raw]
438
+ return _apply_dataset_enhancements(dataset)
439
 
440
 
441
  def get_task_definition(task_id: int) -> dict:
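
The merge performed by `_apply_dataset_enhancements` above can be sketched in isolation. This is a minimal illustration, not the repository's code: `TicketStub` and `apply_enhancements` are hypothetical names, and a frozen dataclass stands in for the Pydantic `HelpdeskTicketRecord` (so `dataclasses.replace` plays the role of `model_copy(update=...)`).

```python
from dataclasses import dataclass, replace
from typing import Optional


# Hypothetical stand-in for HelpdeskTicketRecord (the real model is a Pydantic class).
@dataclass(frozen=True)
class TicketStub:
    ticket_id: str
    assignment_group: str
    alternate_assignment_group: Optional[str] = None
    alternate_route_score_multiplier: float = 0.0


def apply_enhancements(base, updates, expansion):
    """Overlay per-ticket updates, then append new records, rejecting ID clashes."""
    merged = [
        replace(ticket, **updates[ticket.ticket_id])
        if ticket.ticket_id in updates
        else ticket
        for ticket in base
    ]
    seen_ids = {ticket.ticket_id for ticket in merged}
    for raw in expansion:
        ticket_id = str(raw["ticket_id"])
        if ticket_id in seen_ids:
            raise ValueError(f"Duplicate ticket_id in curated expansion: {ticket_id}")
        merged.append(TicketStub(**raw))
        seen_ids.add(ticket_id)
    return merged


base = [TicketStub("ticket-054", "license_ops")]
updates = {
    "ticket-054": {
        "alternate_assignment_group": "service_desk",
        "alternate_route_score_multiplier": 0.9,
    }
}
expansion = [{"ticket_id": "ticket-056", "assignment_group": "procurement"}]
result = apply_enhancements(base, updates, expansion)
```

Because the expansion records are appended on every load, any test that patches the raw JSON file sees its own records plus the curated ones, which is why the BOM test further down asserts on `1 + len(CURATED_EXPANSION_RECORDS)`.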
tests/test_api_integration.py CHANGED
@@ -517,8 +517,8 @@ class TestHeuristicInferenceRegression(unittest.TestCase):
517
  self.assertIsInstance(reward, float)
518
 
519
  def test_overall_average_reward_in_expected_range(self):
520
- """4.2.2 — Overall average reward across all 3 tasks is in [0.8, 1.0],
521
- consistent with the recorded heuristic baseline of 0.9400.
522
  """
523
  rewards = []
524
  for task_id in self._TASKS:
@@ -529,8 +529,8 @@ class TestHeuristicInferenceRegression(unittest.TestCase):
529
  overall_avg = sum(rewards) / len(rewards)
530
  self.assertGreaterEqual(
531
  overall_avg,
532
- 0.75,
533
- f"Overall average reward {overall_avg:.4f} is below the smoke-test floor of 0.75",
534
  )
535
  self.assertLessEqual(
536
  overall_avg,
 
517
  self.assertIsInstance(reward, float)
518
 
519
  def test_overall_average_reward_in_expected_range(self):
520
+ """4.2.2 — Overall average reward across all 3 tasks stays in a healthy
521
+ smoke-test range (floor 0.45) for the plain no-investigation heuristic baseline.
522
  """
523
  rewards = []
524
  for task_id in self._TASKS:
 
529
  overall_avg = sum(rewards) / len(rewards)
530
  self.assertGreaterEqual(
531
  overall_avg,
532
+ 0.45,
533
+ f"Overall average reward {overall_avg:.4f} is below the smoke-test floor of 0.45",
534
  )
535
  self.assertLessEqual(
536
  overall_avg,
tests/test_competitive_upgrade.py CHANGED
@@ -643,10 +643,39 @@ class TestInvestigationActions(unittest.TestCase):
643
  tool_name="lookup_internal_routing_note",
644
  )
645
  )
646
- self.assertEqual(obs.last_tool_result["routing_note"], ticket.ambiguity_note)
647
  self.assertEqual(obs.current_ticket["ambiguity_note"], ticket.ambiguity_note)
648
  self.assertGreater(obs.reward or 0.0, 0.0)
649
 
650
  def test_submit_without_required_investigation_gets_shaping_penalty(self) -> None:
651
  from unittest.mock import patch
652
 
@@ -710,7 +739,12 @@ class TestQueueEconomics(unittest.TestCase):
710
  final_obs = env.step(HelpdeskTicketAction(issue_type=ticket.issue_type))
711
 
712
  self.assertTrue(final_obs.done)
713
- self.assertAlmostEqual(final_obs.reward, 0.95, places=9)
714
 
715
 
716
  class TestTerminalInvalidActionFinalReward(unittest.TestCase):
 
643
  tool_name="lookup_internal_routing_note",
644
  )
645
  )
646
+ self.assertIn(ticket.ambiguity_note, obs.last_tool_result["routing_note"])
647
  self.assertEqual(obs.current_ticket["ambiguity_note"], ticket.ambiguity_note)
648
  self.assertGreater(obs.reward or 0.0, 0.0)
649
 
650
+ def test_queue_capacity_forecast_reveals_routing_options(self) -> None:
651
+ from unittest.mock import patch
652
+
653
+ dataset = load_dataset()
654
+ ticket = next(
655
+ (t for t in dataset if t.alternate_route_score_multiplier > 0.0),
656
+ None,
657
+ )
658
+ self.assertIsNotNone(ticket)
659
+
660
+ env = _make_env()
661
+ with patch.object(env, "_dataset", [ticket]):
662
+ with patch.object(env, "_tickets_by_id", {ticket.ticket_id: ticket}):
663
+ obs = env.reset(seed=0, task_id=3, queue_size=1)
664
+
665
+ self.assertNotIn("routing_options", obs.current_ticket)
666
+ obs = env.step(
667
+ HelpdeskTicketAction(
668
+ action_type="investigate",
669
+ tool_name="lookup_queue_capacity_forecast",
670
+ )
671
+ )
672
+
673
+ self.assertEqual(obs.last_tool_result["tool_name"], "lookup_queue_capacity_forecast")
674
+ self.assertTrue(obs.last_tool_result["found"])
675
+ self.assertIn("preferred_route_label", obs.last_tool_result)
676
+ self.assertIn("routing_options", obs.current_ticket)
677
+ self.assertGreaterEqual(len(obs.current_ticket["routing_options"]), 2)
678
+
679
  def test_submit_without_required_investigation_gets_shaping_penalty(self) -> None:
680
  from unittest.mock import patch
681
 
 
739
  final_obs = env.step(HelpdeskTicketAction(issue_type=ticket.issue_type))
740
 
741
  self.assertTrue(final_obs.done)
742
+ self.assertLess(final_obs.reward, 1.0)
743
+ self.assertAlmostEqual(
744
+ final_obs.last_reward_components.get("investigation_penalty_applied", 0.0),
745
+ 0.04,
746
+ places=9,
747
+ )
748
 
749
 
750
  class TestTerminalInvalidActionFinalReward(unittest.TestCase):
tests/test_extra_fields_penalty.py CHANGED
@@ -44,8 +44,8 @@ def _make_env() -> HelpdeskTicketRoutingEnvironment:
44
  class TestExtraFieldsPenalty(unittest.TestCase):
45
  """Requirement 7: step() rejects actions with fields outside the task's allowed_fields."""
46
 
47
- def test_extra_fields_returns_open_interval_penalty_reward(self) -> None:
48
- """Task 1 penalties should keep the returned reward inside the open interval."""
49
  env = _make_env()
50
  obs = env.reset(seed=42, task_id=1)
51
 
@@ -61,7 +61,7 @@ class TestExtraFieldsPenalty(unittest.TestCase):
61
  penalty_obs = env.step(action)
62
 
63
  self.assertIsInstance(penalty_obs, HelpdeskTicketObservation)
64
- self.assertGreater(penalty_obs.reward, 0.0)
65
  self.assertLess(penalty_obs.reward, 1.0)
66
 
67
  def test_extra_fields_advances_ticket_index(self) -> None:
@@ -78,8 +78,8 @@ class TestExtraFieldsPenalty(unittest.TestCase):
78
 
79
  self.assertEqual(penalty_obs.tickets_processed, 1)
80
 
81
- def test_extra_fields_records_score_inside_open_interval(self) -> None:
82
- """per_ticket_scores must stay in the open interval after a penalty step."""
83
  env = _make_env()
84
  env.reset(seed=42, task_id=1)
85
 
@@ -91,7 +91,7 @@ class TestExtraFieldsPenalty(unittest.TestCase):
91
 
92
  state = env.state
93
  self.assertEqual(len(state.per_ticket_scores), 1)
94
- self.assertGreater(state.per_ticket_scores[0], 0.0)
95
  self.assertLess(state.per_ticket_scores[0], 1.0)
96
 
97
  def test_extra_fields_history_entry_has_penalty_reason(self) -> None:
@@ -109,7 +109,7 @@ class TestExtraFieldsPenalty(unittest.TestCase):
109
  entry = penalty_obs.history[0]
110
  self.assertIn("penalty_reason", entry)
111
  self.assertIn("assignment_group", entry["penalty_reason"])
112
- self.assertGreater(entry["score"], 0.0)
113
  self.assertLess(entry["score"], 1.0)
114
 
115
  def test_no_extra_fields_grades_normally(self) -> None:
 
44
  class TestExtraFieldsPenalty(unittest.TestCase):
45
  """Requirement 7: step() rejects actions with fields outside the task's allowed_fields."""
46
 
47
+ def test_extra_fields_returns_closed_interval_penalty_reward(self) -> None:
48
+ """Task 1 penalties should keep the returned reward in the half-open interval [0.0, 1.0)."""
49
  env = _make_env()
50
  obs = env.reset(seed=42, task_id=1)
51
 
 
61
  penalty_obs = env.step(action)
62
 
63
  self.assertIsInstance(penalty_obs, HelpdeskTicketObservation)
64
+ self.assertGreaterEqual(penalty_obs.reward, 0.0)
65
  self.assertLess(penalty_obs.reward, 1.0)
66
 
67
  def test_extra_fields_advances_ticket_index(self) -> None:
 
78
 
79
  self.assertEqual(penalty_obs.tickets_processed, 1)
80
 
81
+ def test_extra_fields_records_score_inside_unit_interval(self) -> None:
82
+ """per_ticket_scores must stay in the half-open interval [0.0, 1.0) after a penalty step."""
83
  env = _make_env()
84
  env.reset(seed=42, task_id=1)
85
 
 
91
 
92
  state = env.state
93
  self.assertEqual(len(state.per_ticket_scores), 1)
94
+ self.assertGreaterEqual(state.per_ticket_scores[0], 0.0)
95
  self.assertLess(state.per_ticket_scores[0], 1.0)
96
 
97
  def test_extra_fields_history_entry_has_penalty_reason(self) -> None:
 
109
  entry = penalty_obs.history[0]
110
  self.assertIn("penalty_reason", entry)
111
  self.assertIn("assignment_group", entry["penalty_reason"])
112
+ self.assertGreaterEqual(entry["score"], 0.0)
113
  self.assertLess(entry["score"], 1.0)
114
 
115
  def test_no_extra_fields_grades_normally(self) -> None:
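
The switch from `assertGreater(..., 0.0)` to `assertGreaterEqual(..., 0.0)` in this file is consistent with the grader tests replacing `max(0.01, min(0.99, raw))` with `max(0.0, min(1.0, raw))`. A minimal sketch of that clamp follows; `clamp_score` is a hypothetical helper name, not a function confirmed to exist in the repository.

```python
def clamp_score(raw: float, low: float = 0.0, high: float = 1.0) -> float:
    """Clamp a raw score into the closed interval [low, high]."""
    return max(low, min(high, raw))


# Old bounds: a fully wrong answer still scored 0.01 and a perfect one 0.99.
old_floor = clamp_score(0.0, low=0.01, high=0.99)
old_ceiling = clamp_score(1.0, low=0.01, high=0.99)

# New bounds: the endpoints 0.0 and 1.0 are reachable.
new_floor = clamp_score(0.0)
new_ceiling = clamp_score(1.0)
```

With the endpoints now reachable, a penalty step may legitimately score exactly 0.0, so a strict `> 0.0` assertion would no longer hold.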
tests/test_grader_unit.py CHANGED
@@ -47,7 +47,7 @@ class GraderUnitTests(unittest.TestCase):
47
 
48
  score, breakdown = grade_action(action, ticket, task_id=3)
49
 
50
- self.assertAlmostEqual(score, 0.99)
51
  self.assertEqual(
52
  breakdown,
53
  {
@@ -88,7 +88,7 @@ class GraderUnitTests(unittest.TestCase):
88
  if predicted == expected
89
  else ISSUE_TYPE_SIMILARITY.get((predicted, expected), 0.0)
90
  )
91
- expected_task_score = max(0.01, min(0.99, raw_expected_score))
92
  self.assertAlmostEqual(score, expected_task_score)
93
  self.assertEqual(breakdown, {"issue_type": raw_expected_score})
94
 
@@ -98,7 +98,7 @@ class GraderUnitTests(unittest.TestCase):
98
 
99
  score, breakdown = grade_action(action, ticket, task_id=1)
100
 
101
- self.assertAlmostEqual(score, 0.01)
102
  self.assertEqual(breakdown, {"issue_type": 0.0})
103
 
104
  def test_priority_scoring_uses_defined_proximity_table(self) -> None:
@@ -133,7 +133,7 @@ class GraderUnitTests(unittest.TestCase):
133
  {"issue_type": 1.0, "priority": priority_score},
134
  )
135
  raw_score = 0.6 + 0.4 * priority_score
136
- expected_task_score = max(0.01, min(0.99, raw_score))
137
  self.assertAlmostEqual(score, expected_task_score)
138
 
139
  def test_task_2_weights_apply_as_documented(self) -> None:
@@ -195,6 +195,42 @@ class GraderUnitTests(unittest.TestCase):
195
  )
196
  self.assertAlmostEqual(score, 0.65)
197
 
198
  def test_resolution_action_partial_credit_uses_declared_similarity_table(self) -> None:
199
  ticket = _ticket()
200
  action = HelpdeskTicketAction(
@@ -252,7 +288,7 @@ class GraderUnitTests(unittest.TestCase):
252
  },
253
  )
254
  raw_score = 0.35 + 0.20 + 0.25 * assignment_group_score + 0.20
255
- expected_task_score = max(0.01, min(0.99, raw_score))
256
  self.assertAlmostEqual(score, expected_task_score)
257
 
258
  def test_resolution_action_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
@@ -284,7 +320,7 @@ class GraderUnitTests(unittest.TestCase):
284
  },
285
  )
286
  raw_score = 0.35 + 0.20 + 0.25 + 0.20 * resolution_action_score
287
- expected_task_score = max(0.01, min(0.99, raw_score))
288
  self.assertAlmostEqual(score, expected_task_score)
289
 
290
  def test_partial_credit_tables_never_override_exact_match(self) -> None:
 
47
 
48
  score, breakdown = grade_action(action, ticket, task_id=3)
49
 
50
+ self.assertAlmostEqual(score, 1.0)
51
  self.assertEqual(
52
  breakdown,
53
  {
 
88
  if predicted == expected
89
  else ISSUE_TYPE_SIMILARITY.get((predicted, expected), 0.0)
90
  )
91
+ expected_task_score = max(0.0, min(1.0, raw_expected_score))
92
  self.assertAlmostEqual(score, expected_task_score)
93
  self.assertEqual(breakdown, {"issue_type": raw_expected_score})
94
 
 
98
 
99
  score, breakdown = grade_action(action, ticket, task_id=1)
100
 
101
+ self.assertAlmostEqual(score, 0.0)
102
  self.assertEqual(breakdown, {"issue_type": 0.0})
103
 
104
  def test_priority_scoring_uses_defined_proximity_table(self) -> None:
 
133
  {"issue_type": 1.0, "priority": priority_score},
134
  )
135
  raw_score = 0.6 + 0.4 * priority_score
136
+ expected_task_score = max(0.0, min(1.0, raw_score))
137
  self.assertAlmostEqual(score, expected_task_score)
138
 
139
  def test_task_2_weights_apply_as_documented(self) -> None:
 
195
  )
196
  self.assertAlmostEqual(score, 0.65)
197
 
198
+ def test_alternate_route_can_win_when_primary_route_is_worse(self) -> None:
199
+ ticket = HelpdeskTicketRecord(
200
+ ticket_id="ticket-alt",
201
+ title="Planning ticket",
202
+ requester="planner@example.com",
203
+ description="Capacity-sensitive routing decision.",
204
+ issue_type="service_request",
205
+ priority="medium",
206
+ assignment_group="procurement",
207
+ resolution_action="assign",
208
+ alternate_issue_type="billing_license",
209
+ alternate_priority="high",
210
+ alternate_assignment_group="license_ops",
211
+ alternate_resolution_action="fulfill",
212
+ alternate_route_score_multiplier=0.85,
213
+ )
214
+ action = HelpdeskTicketAction(
215
+ issue_type="billing_license",
216
+ priority="high",
217
+ assignment_group="license_ops",
218
+ resolution_action="fulfill",
219
+ )
220
+
221
+ score, breakdown = grade_action(action, ticket, task_id=3)
222
+
223
+ self.assertAlmostEqual(score, 0.85)
224
+ self.assertEqual(
225
+ breakdown,
226
+ {
227
+ "issue_type": 0.85,
228
+ "priority": 0.85,
229
+ "assignment_group": 0.85,
230
+ "resolution_action": 0.85,
231
+ },
232
+ )
233
+
234
  def test_resolution_action_partial_credit_uses_declared_similarity_table(self) -> None:
235
  ticket = _ticket()
236
  action = HelpdeskTicketAction(
 
288
  },
289
  )
290
  raw_score = 0.35 + 0.20 + 0.25 * assignment_group_score + 0.20
291
+ expected_task_score = max(0.0, min(1.0, raw_score))
292
  self.assertAlmostEqual(score, expected_task_score)
293
 
294
  def test_resolution_action_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
 
320
  },
321
  )
322
  raw_score = 0.35 + 0.20 + 0.25 + 0.20 * resolution_action_score
323
+ expected_task_score = max(0.0, min(1.0, raw_score))
324
  self.assertAlmostEqual(score, expected_task_score)
325
 
326
  def test_partial_credit_tables_never_override_exact_match(self) -> None:
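
The behaviour pinned down by `test_alternate_route_can_win_when_primary_route_is_worse` can be sketched as follows. This is an illustration under stated assumptions, not the actual `grade_action`: the field-to-weight mapping is inferred from the `raw_score` expressions in these tests, partial-credit similarity tables are omitted (exact match only), and alternate fields that a ticket leaves unset are assumed to fall back to the primary label. `grade_with_alternate` is a hypothetical name.

```python
# Task 3 weights as inferred from the raw_score expressions in the tests (assumption).
WEIGHTS = {
    "issue_type": 0.35,
    "priority": 0.20,
    "assignment_group": 0.25,
    "resolution_action": 0.20,
}


def grade_with_alternate(action, primary, alternate, multiplier):
    """Score against the primary and the discounted alternate route; keep the better."""

    def breakdown_for(target, scale):
        # Exact-match only; the real grader also applies similarity tables.
        return {f: (scale if action.get(f) == target.get(f) else 0.0) for f in WEIGHTS}

    def total(breakdown):
        return sum(WEIGHTS[f] * breakdown[f] for f in WEIGHTS)

    best = breakdown_for(primary, 1.0)
    if alternate and multiplier > 0.0:
        # Assumption: unset alternate fields inherit the primary label.
        alt_target = {**primary, **alternate}
        alt = breakdown_for(alt_target, multiplier)
        if total(alt) > total(best):
            best = alt
    return max(0.0, min(1.0, total(best))), best


primary = {"issue_type": "service_request", "priority": "medium",
           "assignment_group": "procurement", "resolution_action": "assign"}
alternate = {"issue_type": "billing_license", "priority": "high",
             "assignment_group": "license_ops", "resolution_action": "fulfill"}
action = dict(alternate)  # the agent submits the alternate route exactly

score, breakdown = grade_with_alternate(action, primary, alternate, 0.85)
```

Scaling every field by the multiplier, rather than only the total, matches the test's expectation that each breakdown entry equals 0.85 when the alternate route is matched exactly.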
tests/test_tasks_unit.py CHANGED
@@ -8,7 +8,7 @@ import openenv_test_stubs # noqa: F401
8
 
9
  from models import HelpdeskTicketRecord
10
  from server import tasks as task_module
11
- from server.tasks import TASKS, get_task_definition, load_dataset
12
  from vocabulary import (
13
  ASSIGNMENT_GROUPS,
14
  ISSUE_TYPES,
@@ -51,7 +51,17 @@ class TasksAndDatasetUnitTests(unittest.TestCase):
51
  dataset = load_dataset()
52
 
53
  self.assertGreaterEqual(len(dataset), 45)
54
- self.assertTrue(all(isinstance(record, HelpdeskTicketRecord) for record in dataset))
55
 
56
  def test_dataset_ticket_ids_are_unique(self) -> None:
57
  dataset = load_dataset()
@@ -100,9 +110,13 @@ class TasksAndDatasetUnitTests(unittest.TestCase):
100
  dataset = load_dataset()
101
  ambiguity_count = sum(1 for record in dataset if record.ambiguity_note)
102
  follow_up_count = sum(1 for record in dataset if record.related_ticket_id)
103
 
104
  self.assertGreaterEqual(ambiguity_count, 4)
105
  self.assertGreaterEqual(follow_up_count, 3)
106
 
107
  def test_load_dataset_accepts_utf8_bom(self) -> None:
108
  sample = (
@@ -129,7 +143,8 @@ class TasksAndDatasetUnitTests(unittest.TestCase):
129
  with mock.patch.object(task_module.Path, "open", fake_open):
130
  dataset = load_dataset()
131
 
132
- self.assertEqual([record.ticket_id for record in dataset], ["ticket-bom"])
133
 
134
 
135
  if __name__ == "__main__":
 
8
 
9
  from models import HelpdeskTicketRecord
10
  from server import tasks as task_module
11
+ from server.tasks import CURATED_EXPANSION_RECORDS, TASKS, get_task_definition, load_dataset
12
  from vocabulary import (
13
  ASSIGNMENT_GROUPS,
14
  ISSUE_TYPES,
 
51
  dataset = load_dataset()
52
 
53
  self.assertGreaterEqual(len(dataset), 45)
54
+ self.assertTrue(
55
+ all(
56
+ isinstance(record, HelpdeskTicketRecord)
57
+ or (
58
+ record.__class__.__name__ == "HelpdeskTicketRecord"
59
+ and hasattr(record, "model_dump")
60
+ and hasattr(record, "ticket_id")
61
+ )
62
+ for record in dataset
63
+ )
64
+ )
65
 
66
  def test_dataset_ticket_ids_are_unique(self) -> None:
67
  dataset = load_dataset()
 
110
  dataset = load_dataset()
111
  ambiguity_count = sum(1 for record in dataset if record.ambiguity_note)
112
  follow_up_count = sum(1 for record in dataset if record.related_ticket_id)
113
+ alternate_route_count = sum(
114
+ 1 for record in dataset if record.alternate_route_score_multiplier > 0.0
115
+ )
116
 
117
  self.assertGreaterEqual(ambiguity_count, 4)
118
  self.assertGreaterEqual(follow_up_count, 3)
119
+ self.assertGreaterEqual(alternate_route_count, 10)
120
 
121
  def test_load_dataset_accepts_utf8_bom(self) -> None:
122
  sample = (
 
143
  with mock.patch.object(task_module.Path, "open", fake_open):
144
  dataset = load_dataset()
145
 
146
+ self.assertIn("ticket-bom", [record.ticket_id for record in dataset])
147
+ self.assertEqual(len(dataset), 1 + len(CURATED_EXPANSION_RECORDS))
148
 
149
 
150
  if __name__ == "__main__":