Add queue-planning helpdesk routing mechanics
- README.md +32 -13
- inference.py +179 -11
- models.py +54 -0
- policy_learning.py +77 -5
- server/environment.py +481 -8
- server/grader.py +71 -15
- server/reward.py +3 -4
- server/tasks.py +381 -4
- tests/test_api_integration.py +4 -4
- tests/test_competitive_upgrade.py +36 -2
- tests/test_extra_fields_penalty.py +7 -7
- tests/test_grader_unit.py +42 -6
- tests/test_tasks_unit.py +18 -3
README.md
CHANGED
|
@@ -55,21 +55,22 @@ This domain is useful for OpenEnv because it is operationally realistic, easy to
|
|
| 55 |
The project uses a queue-based episode model.
|
| 56 |
|
| 57 |
- `reset()` samples a task and a queue of 3 to 5 tickets
|
| 58 |
-
- `step()`
|
| 59 |
- `state()` exposes the internal episode snapshot
|
| 60 |
-
-
|
|
|
|
| 61 |
|
| 62 |
The environment classes and vocabulary are intentionally frozen to keep collaboration and judging simple.
|
| 63 |
|
| 64 |
## Lightweight Policy Improvement Loop
|
| 65 |
|
| 66 |
-
The repo
|
| 67 |
|
| 68 |
-
That gives the project a
|
| 69 |
|
| 70 |
-
- compare `no_investigation`
|
| 71 |
-
- log per-step rewards, feedback summaries, and reward components to JSONL
|
| 72 |
-
-
|
| 73 |
- select the best policy on train seeds, then re-evaluate it on holdout seeds
|
| 74 |
|
| 75 |
Example commands:
|
|
@@ -90,7 +91,7 @@ Artifacts are written to `analysis/policy_learning_runs/` by default:
|
|
| 90 |
- `search_eval_episodes.jsonl`
|
| 91 |
- `search_eval_trajectories.jsonl`
|
| 92 |
|
| 93 |
-
The default submit policy inside this runner stays deterministic and local. It reuses the repo's heuristic routing logic, so the
|
| 94 |
|
| 95 |
## Task Ladder
|
| 96 |
|
|
@@ -149,8 +150,11 @@ Visible ticket fields:
|
|
| 149 |
- `requester`
|
| 150 |
- `description`
|
| 151 |
- optional `ambiguity_note`
|
|
|
|
| 152 |
- optional `related_ticket_id`
|
| 153 |
- optional `related_ticket_preview`
|
|
|
|
|
|
|
| 154 |
|
| 155 |
Each observation also includes:
|
| 156 |
|
|
@@ -173,6 +177,8 @@ Each observation also includes:
|
|
| 173 |
- `last_reward_components`
|
| 174 |
- `rubric_reward` on terminal observations
|
| 175 |
- `metadata.last_feedback_summary` for compact reward / penalty feedback
|
|
|
|
|
|
|
| 176 |
- standard OpenEnv fields such as `done` and `reward`
|
| 177 |
|
| 178 |
The internal `HelpdeskTicketState` tracks:
|
|
@@ -187,6 +193,10 @@ The internal `HelpdeskTicketState` tracks:
|
|
| 187 |
- `total_reward`
|
| 188 |
- `reward`
|
| 189 |
- `done`
|
| 190 |
|
| 191 |
## Grading And Reward
|
| 192 |
|
|
@@ -202,14 +212,17 @@ Available tools:
|
|
| 202 |
- `lookup_related_ticket`
|
| 203 |
- `lookup_requester_history`
|
| 204 |
- `lookup_internal_routing_note`
|
|
|
|
| 205 |
|
| 206 |
Hard-task investigation behavior:
|
| 207 |
|
| 208 |
- some ambiguous and non-default-routing tickets start with both redacted titles and redacted descriptions
|
| 209 |
- linked-ticket previews and internal routing notes stay hidden until the matching tool is used
|
|
|
|
| 210 |
- only useful investigation steps return a small positive shaping reward
|
| 211 |
- blind or repeated probing does not pay by default
|
| 212 |
- premature hard-task submission can incur a shaping penalty even when the visible text looks plausible
|
|
|
|
| 213 |
- terminal `rubric_reward` remains the objective evaluation signal, while per-step `reward` is the denser training signal
|
| 214 |
|
| 215 |
Per-field behavior:
|
|
@@ -218,6 +231,7 @@ Per-field behavior:
|
|
| 218 |
- `priority`: exact match or proximity credit
|
| 219 |
- `assignment_group`: exact match, with a small declared partial-credit map for nearby ownership mistakes
|
| 220 |
- `resolution_action`: exact match, with a small declared partial-credit map for nearby next-step mistakes
|
|
|
|
| 221 |
|
| 222 |
Task weights:
|
| 223 |
|
|
@@ -227,22 +241,23 @@ Task weights:
|
|
| 227 |
| 2 | 60% | 40% | - | - |
|
| 228 |
| 3 | 35% | 20% | 25% | 20% |
|
| 229 |
|
| 230 |
-
Final episode reward:
|
| 231 |
|
| 232 |
```text
|
| 233 |
-
average(per_ticket_scores)
|
| 234 |
```
|
| 235 |
|
| 236 |
-
|
| 237 |
|
| 238 |
Step reward is lightly milestone-shaped: high per-ticket scores get a small bonus and very low scores get a small penalty before the final clamp.
|
| 239 |
|
| 240 |
-
Final reward also includes a queue-economics penalty when the agent exceeds the free investigation budget. One investigation per queued ticket is free, but extra investigation steps reduce the final reward more noticeably than before.
|
| 241 |
|
| 242 |
To make the environment more RL-friendly, each observation now also surfaces structured reward telemetry:
|
| 243 |
|
| 244 |
- `last_reward_components` exposes ticket score, shaped step reward, milestone adjustment, trajectory reward when applicable, and any investigation penalty applied
|
| 245 |
- `average_score_so_far` and `progress_fraction` expose trajectory progress without leaking future labels
|
|
|
|
| 246 |
- `history` retains the same reward components plus a compact `feedback_summary` string for downstream agents
|
| 247 |
|
| 248 |
## Grounded Scoring
|
|
@@ -253,6 +268,7 @@ The grader is intentionally narrow and declared, not fully fuzzy.
|
|
| 253 |
- `assignment_group` and `resolution_action` now expose only a small declared partial-credit map for nearby mistakes
|
| 254 |
- `priority` only gets proximity credit from the declared table in `server/grader.py`
|
| 255 |
- `issue_type` only gets partial credit for a small declared similarity map
|
|
|
|
| 256 |
- wrong labels outside those explicit maps score `0.0`
|
| 257 |
|
| 258 |
That scoring policy is now backed by checked-in unit tests in `tests/test_grader_unit.py` and `tests/test_tasks_unit.py`.
|
|
@@ -267,7 +283,7 @@ That grounding pass supported keeping the current similarity map small and expla
|
|
| 267 |
|
| 268 |
## Dataset Snapshot
|
| 269 |
|
| 270 |
-
The labeled dataset
|
| 271 |
|
| 272 |
It includes:
|
| 273 |
|
|
@@ -280,6 +296,9 @@ It includes:
|
|
| 280 |
- onboarding tickets
|
| 281 |
- feature requests
|
| 282 |
- follow-up cases linked through `related_ticket_id`
|
| 283 |
|
| 284 |
## Difficulty Coverage
|
| 285 |
|
|
|
|
| 55 |
The project uses a queue-based episode model.
|
| 56 |
|
| 57 |
- `reset()` samples a task and a queue of 3 to 5 tickets
|
| 58 |
+
- `step()` lets the agent investigate or submit one ticket at a time
|
| 59 |
- `state()` exposes the internal episode snapshot
|
| 60 |
+
- hard-task episodes also track queue-level capacity, alternate acceptable routes, and planning penalties across tickets
|
| 61 |
+
- final evaluation is based on the queue outcome, not on isolated per-ticket classification alone
|
| 62 |
|
| 63 |
The environment classes and vocabulary are intentionally frozen to keep collaboration and judging simple.
|
| 64 |
|
| 65 |
## Lightweight Policy Improvement Loop
|
| 66 |
|
| 67 |
+
The repo includes a local policy runner in `policy_learning.py`. It still does not update model weights, but it now does more than cosmetic search: it evaluates repeated seeded rollouts, learns cue-conditioned tool preferences for investigation, uses the same planning-aware deterministic submit logic as `inference.py`, and ranks policies by terminal rubric reward first, with lower planning penalty as the tie-breaker.
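
The ranking rule described in this paragraph (terminal rubric reward first, lower planning penalty as the tie-breaker) could be sketched roughly as below. `PolicyResult` and `rank_policies` are hypothetical names used only for illustration, and the numbers are made up; the actual selection logic lives in `policy_learning.py` and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyResult:
    # Hypothetical summary record, not a type from the repo.
    name: str
    mean_rubric_reward: float
    mean_planning_penalty: float


def rank_policies(results: list[PolicyResult]) -> list[PolicyResult]:
    # Higher rubric reward sorts first; ties go to the lower planning penalty.
    return sorted(
        results,
        key=lambda r: (-r.mean_rubric_reward, r.mean_planning_penalty),
    )


# Illustrative values only; these are not measured results.
results = [
    PolicyResult("no_investigation", 0.61, 0.00),
    PolicyResult("investigate_when_context_hidden", 0.74, 0.06),
    PolicyResult("adaptive_cue_bandit", 0.74, 0.02),
]
ranked = rank_policies(results)
print([r.name for r in ranked])
```

Sorting on a tuple keeps the tie-breaker explicit: the planning penalty only matters when two policies reach the same rubric reward.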
|
| 68 |
|
| 69 |
+
That gives the project a meaningful improvement loop for judge demos:
|
| 70 |
|
| 71 |
+
- compare `no_investigation`, `investigate_when_context_hidden`, and `adaptive_cue_bandit`
|
| 72 |
+
- log per-step rewards, feedback summaries, planning penalties, and reward components to JSONL
|
| 73 |
+
- learn when to use `lookup_queue_capacity_forecast` versus the other investigation tools
|
| 74 |
- select the best policy on train seeds, then re-evaluate it on holdout seeds
|
| 75 |
|
| 76 |
Example commands:
|
|
|
|
| 91 |
- `search_eval_episodes.jsonl`
|
| 92 |
- `search_eval_trajectories.jsonl`
|
| 93 |
|
| 94 |
+
The default submit policy inside this runner stays deterministic and local. It reuses the repo's heuristic routing logic plus planning-aware routing overrides, so the search loop can study both investigation policy and queue-aware submission quality without depending on external LLM latency or API cost.
|
| 95 |
|
| 96 |
## Task Ladder
|
| 97 |
|
|
|
|
| 150 |
- `requester`
|
| 151 |
- `description`
|
| 152 |
- optional `ambiguity_note`
|
| 153 |
+
- optional `planning_note`
|
| 154 |
- optional `related_ticket_id`
|
| 155 |
- optional `related_ticket_preview`
|
| 156 |
+
- optional `routing_options`
|
| 157 |
+
- optional `capacity_state`
|
| 158 |
|
| 159 |
Each observation also includes:
|
| 160 |
|
|
|
|
| 177 |
- `last_reward_components`
|
| 178 |
- `rubric_reward` on terminal observations
|
| 179 |
- `metadata.last_feedback_summary` for compact reward / penalty feedback
|
| 180 |
+
- `metadata.capacity_state` and `metadata.future_queue_demand` on hard-task episodes
|
| 181 |
+
- `metadata.planning_penalty_total` and `metadata.planning_penalty_applied`
|
| 182 |
- standard OpenEnv fields such as `done` and `reward`
|
| 183 |
|
| 184 |
The internal `HelpdeskTicketState` tracks:
|
|
|
|
| 193 |
- `total_reward`
|
| 194 |
- `reward`
|
| 195 |
- `done`
|
| 196 |
+
- `team_capacity_remaining`
|
| 197 |
+
- `high_priority_slots_remaining`
|
| 198 |
+
- `escalation_slots_remaining`
|
| 199 |
+
- `planning_penalty_total`
|
| 200 |
|
| 201 |
## Grading And Reward
|
| 202 |
|
|
|
|
| 212 |
- `lookup_related_ticket`
|
| 213 |
- `lookup_requester_history`
|
| 214 |
- `lookup_internal_routing_note`
|
| 215 |
+
- `lookup_queue_capacity_forecast`
|
| 216 |
|
| 217 |
Hard-task investigation behavior:
|
| 218 |
|
| 219 |
- some ambiguous and non-default-routing tickets start with both redacted titles and redacted descriptions
|
| 220 |
- linked-ticket previews and internal routing notes stay hidden until the matching tool is used
|
| 221 |
+
- capacity-sensitive tickets can expose queue pressure, future demand, and alternate routing options through `lookup_queue_capacity_forecast`
|
| 222 |
- only useful investigation steps return a small positive shaping reward
|
| 223 |
- blind or repeated probing does not pay by default
|
| 224 |
- premature hard-task submission can incur a shaping penalty even when the visible text looks plausible
|
| 225 |
+
- resource-greedy routing can add planning penalties later in the queue even when a single ticket looks correct in isolation
|
| 226 |
- terminal `rubric_reward` remains the objective evaluation signal, while per-step `reward` is the denser training signal
|
| 227 |
|
| 228 |
Per-field behavior:
|
|
|
|
| 231 |
- `priority`: exact match or proximity credit
|
| 232 |
- `assignment_group`: exact match, with a small declared partial-credit map for nearby ownership mistakes
|
| 233 |
- `resolution_action`: exact match, with a small declared partial-credit map for nearby next-step mistakes
|
| 234 |
+
- hard task only: some tickets also declare an alternate acceptable route with a reduced score multiplier, so the grader can reward capacity-aware fallback choices without collapsing into full fuzziness
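
As a rough illustration of the alternate-route bullet, the sketch below checks a submission against a declared primary route and an optional declared alternate that carries a reduced multiplier. The function names, the label values, and the `0.8` multiplier are assumptions for this example; the real rules are declared in the dataset and applied in `server/grader.py`.

```python
from typing import Optional

ROUTE_FIELDS = ("issue_type", "priority", "assignment_group", "resolution_action")


def matches_route(route: dict, submitted: dict) -> bool:
    # A route matches when every field it declares equals the submitted value.
    return all(submitted.get(f) == route[f] for f in ROUTE_FIELDS if f in route)


def declared_route_multiplier(
    submitted: dict,
    primary: dict,
    alternate: Optional[dict] = None,
    alternate_multiplier: float = 0.0,
) -> Optional[float]:
    # Returns 1.0 for the primary route, the reduced multiplier for a declared
    # alternate, and None when neither matches (ordinary per-field grading
    # would apply in that case; it is not modelled here).
    if matches_route(primary, submitted):
        return 1.0
    if alternate is not None and matches_route(alternate, submitted):
        return alternate_multiplier
    return None


# Placeholder labels, not the repo's frozen vocabulary.
primary = {"assignment_group": "group_a", "priority": "high"}
alternate = {"assignment_group": "group_b", "priority": "medium"}

print(declared_route_multiplier(primary, primary, alternate, 0.8))
print(declared_route_multiplier(alternate, primary, alternate, 0.8))
```

Because the alternate must be declared explicitly, a submission that matches neither route gets no route-level credit from this check.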
|
| 235 |
|
| 236 |
Task weights:
|
| 237 |
|
|
|
|
| 241 |
| 2 | 60% | 40% | - | - |
|
| 242 |
| 3 | 35% | 20% | 25% | 20% |
|
| 243 |
|
| 244 |
+
Final episode rubric reward is queue-based:
|
| 245 |
|
| 246 |
```text
|
| 247 |
+
clamp(average(per_ticket_scores) + trajectory bonuses - planning penalties - extra investigation penalties)
|
| 248 |
```
|
| 249 |
|
| 250 |
+
Both `reward` and `rubric_reward` now use the closed interval `[0.0, 1.0]`.
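
A minimal sketch of the queue-based formula, assuming each bonus and penalty has already been reduced to a single number. `final_rubric_reward` and its parameter names are hypothetical, and returning `0.0` for an empty queue is an assumption of this sketch; the environment computes the real components.

```python
def final_rubric_reward(
    per_ticket_scores: list[float],
    trajectory_bonus: float = 0.0,
    planning_penalty: float = 0.0,
    investigation_penalty: float = 0.0,
) -> float:
    # Assumption: an empty queue scores 0.0 rather than raising.
    if not per_ticket_scores:
        return 0.0
    average = sum(per_ticket_scores) / len(per_ticket_scores)
    raw = average + trajectory_bonus - planning_penalty - investigation_penalty
    # Clamp into the closed interval [0.0, 1.0].
    return max(0.0, min(1.0, raw))


# Average 0.75, minus 0.10 planning and 0.05 investigation penalties.
print(final_rubric_reward([1.0, 0.5], planning_penalty=0.1, investigation_penalty=0.05))
```

The clamp is the same `max(0.0, min(1.0, score))` shape that `clamp_reported_score` uses in `inference.py`.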
|
| 251 |
|
| 252 |
Step reward is lightly milestone-shaped: high per-ticket scores get a small bonus and very low scores get a small penalty before the final clamp.
|
| 253 |
|
| 254 |
+
Final reward also includes a queue-economics penalty when the agent exceeds the free investigation budget. One investigation per queued ticket is free, but extra investigation steps reduce the final reward more noticeably than before. On hard-task queues, assignment-group capacity, high-priority slots, and escalation slots also create cross-ticket trade-offs.
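
One way to read the free-budget rule is sketched below, assuming the free budget equals the queue size and each extra investigation costs a flat amount. The function name and the `0.05` rate are placeholders, not values taken from the environment.

```python
def extra_investigation_penalty(
    investigation_steps: int,
    queue_size: int,
    penalty_per_extra_step: float = 0.05,  # placeholder rate, not from the repo
) -> float:
    # One investigation per queued ticket is free.
    free_budget = queue_size
    extra_steps = max(0, investigation_steps - free_budget)
    return extra_steps * penalty_per_extra_step


print(extra_investigation_penalty(investigation_steps=3, queue_size=4))
print(extra_investigation_penalty(investigation_steps=6, queue_size=4))
```

Under this reading the budget is shared across the queue, so an agent can spend two investigations on one hard ticket and none on an easy one without paying a penalty.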
|
| 255 |
|
| 256 |
To make the environment more RL-friendly, each observation now also surfaces structured reward telemetry:
|
| 257 |
|
| 258 |
- `last_reward_components` exposes ticket score, shaped step reward, milestone adjustment, trajectory reward when applicable, and any investigation penalty applied
|
| 259 |
- `average_score_so_far` and `progress_fraction` expose trajectory progress without leaking future labels
|
| 260 |
+
- hard-task telemetry includes planning penalties, capacity usage, and the post-action capacity snapshot
|
| 261 |
- `history` retains the same reward components plus a compact `feedback_summary` string for downstream agents
|
| 262 |
|
| 263 |
## Grounded Scoring
|
|
|
|
| 268 |
- `assignment_group` and `resolution_action` now expose only a small declared partial-credit map for nearby mistakes
|
| 269 |
- `priority` only gets proximity credit from the declared table in `server/grader.py`
|
| 270 |
- `issue_type` only gets partial credit for a small declared similarity map
|
| 271 |
+
- hard-task alternate routes must be explicitly declared in the dataset and carry an explicit score multiplier
|
| 272 |
- wrong labels outside those explicit maps score `0.0`
|
| 273 |
|
| 274 |
That scoring policy is now backed by checked-in unit tests in `tests/test_grader_unit.py` and `tests/test_tasks_unit.py`.
|
|
|
|
| 283 |
|
| 284 |
## Dataset Snapshot
|
| 285 |
|
| 286 |
+
The effective labeled dataset now contains 70 tickets spanning straightforward, ambiguous, and planning-sensitive helpdesk scenarios.
|
| 287 |
|
| 288 |
It includes:
|
| 289 |
|
|
|
|
| 296 |
- onboarding tickets
|
| 297 |
- feature requests
|
| 298 |
- follow-up cases linked through `related_ticket_id`
|
| 299 |
+
- 16 tickets with explicit ambiguity notes
|
| 300 |
+
- 7 linked follow-up cases
|
| 301 |
+
- 22 tickets with declared alternate routes for queue-level planning
|
| 302 |
|
| 303 |
## Difficulty Coverage
|
| 304 |
|
inference.py
CHANGED
|
@@ -195,6 +195,7 @@ def format_recent_history_entries(
|
|
| 195 |
|
| 196 |
def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions: str) -> str:
|
| 197 |
ambiguity_note = ticket.get("ambiguity_note")
|
|
|
|
| 198 |
related_preview = ticket.get("related_ticket_preview") or {}
|
| 199 |
last_tool_result = ticket.get("last_tool_result")
|
| 200 |
context_status = ticket.get("context_status") or {}
|
|
@@ -204,9 +205,14 @@ def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions
|
|
| 204 |
investigation_budget_remaining = ticket.get("investigation_budget_remaining")
|
| 205 |
average_score_so_far = ticket.get("average_score_so_far")
|
| 206 |
progress_fraction = ticket.get("progress_fraction")
|
| 207 |
extra_context_lines: list[str] = []
|
| 208 |
if ambiguity_note:
|
| 209 |
extra_context_lines.append(f"Ambiguity note: {ambiguity_note}")
|
|
|
|
|
|
|
| 210 |
if related_preview:
|
| 211 |
extra_context_lines.extend(
|
| 212 |
[
|
|
@@ -224,6 +230,18 @@ def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions
|
|
| 224 |
extra_context_lines.append(
|
| 225 |
"Context status: " + json.dumps(context_status, sort_keys=True)
|
| 226 |
)
|
| 227 |
if feedback_summary:
|
| 228 |
extra_context_lines.append(f"Latest environment feedback: {feedback_summary}")
|
| 229 |
if last_reward_components:
|
|
@@ -293,7 +311,7 @@ def _format_bool(value: bool) -> str:
|
|
| 293 |
|
| 294 |
|
| 295 |
def clamp_reported_score(score: float) -> float:
|
| 296 |
-
return max(0.
|
| 297 |
|
| 298 |
|
| 299 |
def _format_action_for_log(action: HelpdeskTicketAction) -> str:
|
|
@@ -553,14 +571,19 @@ TIME_SENSITIVE_PRIORITY_KEYWORDS = (
|
|
| 553 |
def build_routing_text(ticket: dict) -> str:
|
| 554 |
related_preview = ticket.get("related_ticket_preview") or {}
|
| 555 |
last_tool_result = ticket.get("last_tool_result") or {}
|
|
|
|
| 556 |
return " ".join(
|
| 557 |
[
|
| 558 |
ticket.get("title", ""),
|
| 559 |
ticket.get("description", ""),
|
| 560 |
ticket.get("ambiguity_note", ""),
|
|
|
|
| 561 |
related_preview.get("title", ""),
|
| 562 |
related_preview.get("description", ""),
|
| 563 |
json.dumps(last_tool_result, sort_keys=True),
|
| 564 |
]
|
| 565 |
).lower()
|
| 566 |
|
|
@@ -630,6 +653,90 @@ def heuristic_action(
|
|
| 630 |
return result
|
| 631 |
|
| 632 |
|
| 633 |
def apply_domain_overrides(
|
| 634 |
ticket: dict, candidate: dict[str, Any], allowed_fields: list[str]
|
| 635 |
) -> tuple[dict[str, Any], list[str]]:
|
|
@@ -697,9 +804,27 @@ def build_action(
|
|
| 697 |
ticket: dict, allowed_fields: list[str], instructions: str
|
| 698 |
) -> tuple[HelpdeskTicketAction, str, str | None]:
|
| 699 |
heuristic_dict = heuristic_action(ticket, allowed_fields)
|
| 700 |
|
| 701 |
if llm_client is None:
|
| 702 |
-
|
| 703 |
|
| 704 |
try:
|
| 705 |
llm_dict = call_llm(ticket, allowed_fields, instructions)
|
|
@@ -731,9 +856,19 @@ def build_action(
|
|
| 731 |
candidate,
|
| 732 |
allowed_fields,
|
| 733 |
)
|
| 734 |
|
| 735 |
backfilled_fields = [field for field in allowed_fields if field not in accepted_fields]
|
| 736 |
-
if
|
| 737 |
reason_parts = []
|
| 738 |
if backfilled_fields:
|
| 739 |
reason_parts.append(f"heuristic_backfill={backfilled_fields}")
|
|
@@ -741,6 +876,8 @@ def build_action(
|
|
| 741 |
reason_parts.append(f"invalid_llm_fields={rejected_fields}")
|
| 742 |
if override_reasons:
|
| 743 |
reason_parts.append(f"domain_overrides={override_reasons}")
|
|
|
|
|
|
|
| 744 |
return (
|
| 745 |
HelpdeskTicketAction(**candidate),
|
| 746 |
"llm_backfilled",
|
|
@@ -752,7 +889,23 @@ def build_action(
|
|
| 752 |
return (
|
| 753 |
HelpdeskTicketAction(**heuristic_dict),
|
| 754 |
"heuristic_fallback",
|
| 755 |
-
|
| 756 |
)
|
| 757 |
|
| 758 |
|
|
@@ -857,6 +1010,7 @@ def should_investigate(ticket: dict, history: list[dict[str, Any]]) -> tuple[boo
|
|
| 857 |
if hidden_context_remaining:
|
| 858 |
preferred_tools.extend(
|
| 859 |
[
|
|
|
|
| 860 |
"lookup_related_ticket",
|
| 861 |
"lookup_internal_routing_note",
|
| 862 |
"lookup_requester_history",
|
|
@@ -892,6 +1046,14 @@ def merge_ticket_context(ticket: dict, observation: Any) -> dict:
|
|
| 892 |
observation_metadata = getattr(observation, "metadata", {}) or {}
|
| 893 |
if observation_metadata.get("last_feedback_summary"):
|
| 894 |
merged_ticket["feedback_summary"] = observation_metadata["last_feedback_summary"]
|
| 895 |
return merged_ticket
|
| 896 |
|
| 897 |
|
|
@@ -933,12 +1095,10 @@ def run() -> None:
|
|
| 933 |
if ticket is None:
|
| 934 |
break
|
| 935 |
|
| 936 |
-
|
| 937 |
-
|
| 938 |
-
investigate
|
| 939 |
-
|
| 940 |
-
and getattr(obs, "investigation_budget_remaining", 0) > 0
|
| 941 |
-
):
|
| 942 |
tool_action = HelpdeskTicketAction(
|
| 943 |
action_type="investigate",
|
| 944 |
tool_name=tool_name,
|
|
@@ -947,10 +1107,13 @@ def run() -> None:
|
|
| 947 |
result = sync_client.step(tool_action)
|
| 948 |
obs = result.observation
|
| 949 |
step_num += 1
|
| 950 |
log_step(
|
| 951 |
step=step_num,
|
| 952 |
action=tool_action,
|
| 953 |
-
reward=
|
| 954 |
done=bool(result.done),
|
| 955 |
error=None,
|
| 956 |
)
|
|
@@ -959,6 +1122,11 @@ def run() -> None:
|
|
| 959 |
ticket = obs.current_ticket
|
| 960 |
if ticket is None:
|
| 961 |
break
|
| 962 |
|
| 963 |
ticket_with_context = merge_ticket_context(ticket, obs)
|
| 964 |
action, action_source, fallback_reason = build_action(
|
|
|
|
| 195 |
|
| 196 |
def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions: str) -> str:
|
| 197 |
ambiguity_note = ticket.get("ambiguity_note")
|
| 198 |
+
planning_note = ticket.get("planning_note")
|
| 199 |
related_preview = ticket.get("related_ticket_preview") or {}
|
| 200 |
last_tool_result = ticket.get("last_tool_result")
|
| 201 |
context_status = ticket.get("context_status") or {}
|
|
|
|
| 205 |
investigation_budget_remaining = ticket.get("investigation_budget_remaining")
|
| 206 |
average_score_so_far = ticket.get("average_score_so_far")
|
| 207 |
progress_fraction = ticket.get("progress_fraction")
|
| 208 |
+
capacity_state = ticket.get("capacity_state")
|
| 209 |
+
future_queue_demand = ticket.get("future_queue_demand")
|
| 210 |
+
routing_options = ticket.get("routing_options") or []
|
| 211 |
extra_context_lines: list[str] = []
|
| 212 |
if ambiguity_note:
|
| 213 |
extra_context_lines.append(f"Ambiguity note: {ambiguity_note}")
|
| 214 |
+
if planning_note:
|
| 215 |
+
extra_context_lines.append(f"Planning note: {planning_note}")
|
| 216 |
if related_preview:
|
| 217 |
extra_context_lines.extend(
|
| 218 |
[
|
|
|
|
| 230 |
extra_context_lines.append(
|
| 231 |
"Context status: " + json.dumps(context_status, sort_keys=True)
|
| 232 |
)
|
| 233 |
+
if capacity_state:
|
| 234 |
+
extra_context_lines.append(
|
| 235 |
+
"Queue capacity state: " + json.dumps(capacity_state, sort_keys=True)
|
| 236 |
+
)
|
| 237 |
+
if future_queue_demand:
|
| 238 |
+
extra_context_lines.append(
|
| 239 |
+
"Future queue demand: " + json.dumps(future_queue_demand, sort_keys=True)
|
| 240 |
+
)
|
| 241 |
+
if routing_options:
|
| 242 |
+
extra_context_lines.append(
|
| 243 |
+
"Routing options: " + json.dumps(routing_options, sort_keys=True)
|
| 244 |
+
)
|
| 245 |
if feedback_summary:
|
| 246 |
extra_context_lines.append(f"Latest environment feedback: {feedback_summary}")
|
| 247 |
if last_reward_components:
|
|
|
|
| 311 |
|
| 312 |
|
| 313 |
def clamp_reported_score(score: float) -> float:
|
| 314 |
+
return max(0.0, min(1.0, score))
|
| 315 |
|
| 316 |
|
| 317 |
def _format_action_for_log(action: HelpdeskTicketAction) -> str:
|
|
|
|
| 571 |
def build_routing_text(ticket: dict) -> str:
|
| 572 |
related_preview = ticket.get("related_ticket_preview") or {}
|
| 573 |
last_tool_result = ticket.get("last_tool_result") or {}
|
| 574 |
+
routing_options = ticket.get("routing_options") or []
|
| 575 |
return " ".join(
|
| 576 |
[
|
| 577 |
ticket.get("title", ""),
|
| 578 |
ticket.get("description", ""),
|
| 579 |
ticket.get("ambiguity_note", ""),
|
| 580 |
+
ticket.get("planning_note", ""),
|
| 581 |
related_preview.get("title", ""),
|
| 582 |
related_preview.get("description", ""),
|
| 583 |
json.dumps(last_tool_result, sort_keys=True),
|
| 584 |
+
json.dumps(routing_options, sort_keys=True),
|
| 585 |
+
json.dumps(ticket.get("capacity_state") or {}, sort_keys=True),
|
| 586 |
+
json.dumps(ticket.get("future_queue_demand") or {}, sort_keys=True),
|
| 587 |
]
|
| 588 |
).lower()
|
| 589 |
|
|
|
|
| 653 |
return result
|
| 654 |
|
| 655 |
|
| 656 |
+
def _get_routing_options(ticket: dict[str, Any]) -> list[dict[str, Any]]:
|
| 657 |
+
options = ticket.get("routing_options") or []
|
| 658 |
+
return [option for option in options if isinstance(option, dict)]
|
| 659 |
+
|
| 660 |
+
|
| 661 |
+
def _get_routing_option_by_label(
|
| 662 |
+
ticket: dict[str, Any],
|
| 663 |
+
label: str | None,
|
| 664 |
+
) -> dict[str, Any] | None:
|
| 665 |
+
if label is None:
|
| 666 |
+
return None
|
| 667 |
+
for option in _get_routing_options(ticket):
|
| 668 |
+
if option.get("label") == label:
|
| 669 |
+
return option
|
| 670 |
+
return None
|
| 671 |
+
|
| 672 |
+
|
| 673 |
+
def _route_option_fields_match(
|
| 674 |
+
option: dict[str, Any],
|
| 675 |
+
candidate: dict[str, Any],
|
| 676 |
+
allowed_fields: list[str],
|
| 677 |
+
) -> bool:
|
| 678 |
+
for field in ("issue_type", "priority", "assignment_group", "resolution_action"):
|
| 679 |
+
if field not in allowed_fields:
|
| 680 |
+
continue
|
| 681 |
+
option_value = option.get(field)
|
| 682 |
+
candidate_value = candidate.get(field)
|
| 683 |
+
if option_value is None or candidate_value is None:
|
| 684 |
+
continue
|
| 685 |
+
if str(option_value) != str(candidate_value):
|
| 686 |
+
return False
|
| 687 |
+
return True
|
| 688 |
+
|
| 689 |
+
|
| 690 |
+
def _preferred_routing_label(ticket: dict[str, Any]) -> str | None:
|
| 691 |
+
last_tool_result = ticket.get("last_tool_result") or {}
|
| 692 |
+
tool_name = str(last_tool_result.get("tool_name", "") or "")
|
| 693 |
+
preferred_label = str(last_tool_result.get("preferred_route_label", "") or "")
|
| 694 |
+
if tool_name == "lookup_queue_capacity_forecast" and preferred_label in {
|
| 695 |
+
"primary",
|
| 696 |
+
"alternate",
|
| 697 |
+
}:
|
| 698 |
+
return preferred_label
|
| 699 |
+
return None
|
| 700 |
+
|
| 701 |
+
|
| 702 |
+
def apply_capacity_planning_overrides(
|
| 703 |
+
ticket: dict[str, Any],
|
| 704 |
+
candidate: dict[str, Any],
|
| 705 |
+
allowed_fields: list[str],
|
| 706 |
+
) -> tuple[dict[str, Any], list[str]]:
|
| 707 |
+
updated = dict(candidate)
|
| 708 |
+
reasons: list[str] = []
|
| 709 |
+
preferred_label = _preferred_routing_label(ticket)
|
| 710 |
+
preferred_option = _get_routing_option_by_label(ticket, preferred_label)
|
| 711 |
+
if preferred_option is None:
|
| 712 |
+
return updated, reasons
|
| 713 |
+
|
| 714 |
+
current_matching_label = None
|
| 715 |
+
for option in _get_routing_options(ticket):
|
| 716 |
+
if _route_option_fields_match(option, updated, allowed_fields):
|
| 717 |
+
current_matching_label = option.get("label")
|
| 718 |
+
break
|
| 719 |
+
|
| 720 |
+
if current_matching_label == preferred_label:
|
| 721 |
+
return updated, reasons
|
| 722 |
+
|
| 723 |
+
for field in ("issue_type", "priority", "assignment_group", "resolution_action"):
|
| 724 |
+
if field not in allowed_fields:
|
| 725 |
+
continue
|
| 726 |
+
option_value = preferred_option.get(field)
|
| 727 |
+
if option_value is None:
|
| 728 |
+
continue
|
| 729 |
+
updated[field] = option_value
|
| 730 |
+
|
| 731 |
+
last_tool_result = ticket.get("last_tool_result") or {}
|
| 732 |
+
reasons.append(
|
| 733 |
+
"planning_override="
|
| 734 |
+
f"{preferred_label}(primary_pressure={last_tool_result.get('primary_pressure')},"
|
| 735 |
+
f"alternate_pressure={last_tool_result.get('alternate_pressure')})"
|
| 736 |
+
)
|
| 737 |
+
return updated, reasons
|
| 738 |
+
|
| 739 |
+
|
| 740 |
def apply_domain_overrides(
|
| 741 |
ticket: dict, candidate: dict[str, Any], allowed_fields: list[str]
|
| 742 |
) -> tuple[dict[str, Any], list[str]]:
|
|
|
|
| 804 |
ticket: dict, allowed_fields: list[str], instructions: str
|
| 805 |
) -> tuple[HelpdeskTicketAction, str, str | None]:
|
| 806 |
heuristic_dict = heuristic_action(ticket, allowed_fields)
|
| 807 |
+
heuristic_dict, heuristic_override_reasons = apply_domain_overrides(
|
| 808 |
+
ticket,
|
| 809 |
+
heuristic_dict,
|
| 810 |
+
allowed_fields,
|
| 811 |
+
)
|
| 812 |
+
heuristic_dict, heuristic_planning_reasons = apply_capacity_planning_overrides(
|
| 813 |
+
ticket,
|
| 814 |
+
heuristic_dict,
|
| 815 |
+
allowed_fields,
|
| 816 |
+
)
|
| 817 |
|
| 818 |
if llm_client is None:
|
| 819 |
+
fallback_reason = None
|
| 820 |
+
reason_parts = []
|
| 821 |
+
if heuristic_override_reasons:
|
| 822 |
+
reason_parts.append(f"domain_overrides={heuristic_override_reasons}")
|
| 823 |
+
if heuristic_planning_reasons:
|
| 824 |
+
reason_parts.append(f"planning_overrides={heuristic_planning_reasons}")
|
| 825 |
+
if reason_parts:
|
| 826 |
+
fallback_reason = "; ".join(reason_parts)
|
| 827 |
+
return HelpdeskTicketAction(**heuristic_dict), "heuristic", fallback_reason
|
| 828 |
|
| 829 |
try:
|
| 830 |
llm_dict = call_llm(ticket, allowed_fields, instructions)
|
|
|
|
| 856 |
candidate,
|
| 857 |
allowed_fields,
|
| 858 |
)
|
| 859 |
+
candidate, planning_override_reasons = apply_capacity_planning_overrides(
|
| 860 |
+
ticket,
|
| 861 |
+
candidate,
|
| 862 |
+
allowed_fields,
|
| 863 |
+
)
|
| 864 |
|
| 865 |
backfilled_fields = [field for field in allowed_fields if field not in accepted_fields]
|
| 866 |
+
if (
|
| 867 |
+
backfilled_fields
|
| 868 |
+
or rejected_fields
|
| 869 |
+
or override_reasons
|
| 870 |
+
or planning_override_reasons
|
| 871 |
+
):
|
| 872 |
reason_parts = []
|
| 873 |
if backfilled_fields:
|
| 874 |
reason_parts.append(f"heuristic_backfill={backfilled_fields}")
|
|
|
|
| 876 |
reason_parts.append(f"invalid_llm_fields={rejected_fields}")
|
| 877 |
if override_reasons:
|
| 878 |
reason_parts.append(f"domain_overrides={override_reasons}")
|
| 879 |
+
if planning_override_reasons:
|
| 880 |
+
reason_parts.append(f"planning_overrides={planning_override_reasons}")
|
| 881 |
return (
|
| 882 |
HelpdeskTicketAction(**candidate),
|
| 883 |
"llm_backfilled",
|
|
|
|
| 889 |
return (
|
| 890 |
HelpdeskTicketAction(**heuristic_dict),
|
| 891 |
"heuristic_fallback",
|
| 892 |
+
"; ".join(
|
| 893 |
+
part
|
| 894 |
+
for part in (
|
| 895 |
+
str(exc),
|
| 896 |
+
(
|
| 897 |
+
f"domain_overrides={heuristic_override_reasons}"
|
| 898 |
+
if heuristic_override_reasons
|
| 899 |
+
else None
|
| 900 |
+
),
|
| 901 |
+
(
|
| 902 |
+
f"planning_overrides={heuristic_planning_reasons}"
|
| 903 |
+
if heuristic_planning_reasons
|
| 904 |
+
else None
|
| 905 |
+
),
|
| 906 |
+
)
|
| 907 |
+
if part
|
| 908 |
+
),
|
| 909 |
)
|
| 910 |
|
| 911 |
|
|
|
|
| 1010 |
if hidden_context_remaining:
|
| 1011 |
preferred_tools.extend(
|
| 1012 |
[
|
| 1013 |
+
"lookup_queue_capacity_forecast",
|
| 1014 |
"lookup_related_ticket",
|
| 1015 |
"lookup_internal_routing_note",
|
| 1016 |
"lookup_requester_history",
|
|
|
|
| 1046 |
observation_metadata = getattr(observation, "metadata", {}) or {}
|
| 1047 |
if observation_metadata.get("last_feedback_summary"):
|
| 1048 |
merged_ticket["feedback_summary"] = observation_metadata["last_feedback_summary"]
|
| 1049 |
+
if observation_metadata.get("capacity_state") is not None:
|
| 1050 |
+
merged_ticket["capacity_state"] = observation_metadata["capacity_state"]
|
| 1051 |
+
if observation_metadata.get("future_queue_demand") is not None:
|
| 1052 |
+
merged_ticket["future_queue_demand"] = observation_metadata["future_queue_demand"]
|
| 1053 |
+
if observation_metadata.get("planning_penalty_total") is not None:
|
| 1054 |
+
merged_ticket["planning_penalty_total"] = observation_metadata["planning_penalty_total"]
|
| 1055 |
+
if observation_metadata.get("planning_penalty_applied") is not None:
|
| 1056 |
+
merged_ticket["planning_penalty_applied"] = observation_metadata["planning_penalty_applied"]
|
| 1057 |
return merged_ticket
|
| 1058 |
|
| 1059 |
|
|
|
|
| 1095 |
if ticket is None:
|
| 1096 |
break
|
| 1097 |
|
| 1098 |
+
while getattr(obs, "investigation_budget_remaining", 0) > 0:
|
| 1099 |
+
investigate, tool_name = should_investigate(ticket, obs.history)
|
| 1100 |
+
if not investigate or tool_name is None:
|
| 1101 |
+
break
|
|
|
|
|
|
|
| 1102 |
tool_action = HelpdeskTicketAction(
|
| 1103 |
action_type="investigate",
|
| 1104 |
tool_name=tool_name,
|
|
|
|
| 1107 |
result = sync_client.step(tool_action)
|
| 1108 |
obs = result.observation
|
| 1109 |
step_num += 1
|
| 1110 |
+
reward = float(result.reward or 0.0)
|
| 1111 |
+
if result.reward is not None:
|
| 1112 |
+
task_step_rewards.append(reward)
|
| 1113 |
log_step(
|
| 1114 |
step=step_num,
|
| 1115 |
action=tool_action,
|
| 1116 |
+
reward=reward,
|
| 1117 |
done=bool(result.done),
|
| 1118 |
error=None,
|
| 1119 |
)
|
|
|
|
| 1122 |
ticket = obs.current_ticket
|
| 1123 |
if ticket is None:
|
| 1124 |
break
|
| 1125 |
+
if result.done:
|
| 1126 |
+
break
|
| 1127 |
+
ticket = obs.current_ticket
|
| 1128 |
+
if ticket is None:
|
| 1129 |
+
break
|
| 1130 |
|
| 1131 |
ticket_with_context = merge_ticket_context(ticket, obs)
|
| 1132 |
action, action_source, fallback_reason = build_action(
|
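Note on the `merge_ticket_context` hunk in inference.py above: the four planning fields are copied only when they are not `None`, so empty or zero values such as `{}` and `0.0` are still passed through to the ticket. A minimal sketch of the same rule in loop form follows; the helper name `copy_planning_metadata` and the constant `PLANNING_METADATA_KEYS` are hypothetical and do not appear in the commit.

```python
from typing import Any

# The four metadata keys named in the hunk above.
PLANNING_METADATA_KEYS = (
    "capacity_state",
    "future_queue_demand",
    "planning_penalty_total",
    "planning_penalty_applied",
)


def copy_planning_metadata(
    merged_ticket: dict[str, Any],
    observation_metadata: dict[str, Any],
) -> dict[str, Any]:
    """Return a copy of merged_ticket with non-None planning fields added.

    Hypothetical helper: it mirrors the `is not None` checks in the diff,
    so falsy-but-present values ({} or 0.0) are kept.
    """
    result = dict(merged_ticket)
    for key in PLANNING_METADATA_KEYS:
        value = observation_metadata.get(key)
        if value is not None:
            result[key] = value
    return result
```

This differs from the existing `last_feedback_summary` check in the same function, which tests truthiness and therefore drops empty strings.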
models.py
CHANGED
|
@@ -19,6 +19,7 @@ RESOLUTION_ACTION_SET = set(RESOLUTION_ACTIONS)
|
|
| 19 |
ACTION_TYPE_SET = {"submit", "investigate"}
|
| 20 |
TOOL_NAME_SET = {"lookup_related_ticket", "lookup_requester_history"}
|
| 21 |
TOOL_NAME_SET.add("lookup_internal_routing_note")
|
|
|
|
| 22 |
|
| 23 |
|
| 24 |
def _validate_choice(value: str, allowed: set[str], field_name: str) -> str:
|
|
@@ -47,6 +48,12 @@ class HelpdeskTicketRecord(BaseModel):
|
|
| 47 |
resolution_action: str
|
| 48 |
ambiguity_note: Optional[str] = None
|
| 49 |
related_ticket_id: Optional[str] = None
|
| 50 |
|
| 51 |
@field_validator("issue_type")
|
| 52 |
@classmethod
|
|
@@ -68,6 +75,44 @@ class HelpdeskTicketRecord(BaseModel):
|
|
| 68 |
def validate_resolution_action(cls, value: str) -> str:
|
| 69 |
return _validate_choice(value, RESOLUTION_ACTION_SET, "resolution_action")
|
| 70 |
|
| 71 |
|
| 72 |
class HelpdeskTicketAction(Action):
|
| 73 |
action_type: str = "submit"
|
|
@@ -146,7 +191,16 @@ class HelpdeskTicketState(State):
|
|
| 146 |
investigation_steps: int = 0
|
| 147 |
investigation_budget_remaining: int = 0
|
| 148 |
investigation_penalty_applied: float = 0.0
|
|
|
|
| 149 |
last_tool_result: Optional[dict[str, Any]] = None
|
| 150 |
last_reward_components: dict[str, Any] = Field(default_factory=dict)
|
| 151 |
ticket_tool_usage: dict[str, list[str]] = Field(default_factory=dict)
|
| 152 |
history_entries: list[dict] = Field(default_factory=list)
|
|
|
|
| 19 |
ACTION_TYPE_SET = {"submit", "investigate"}
|
| 20 |
TOOL_NAME_SET = {"lookup_related_ticket", "lookup_requester_history"}
|
| 21 |
TOOL_NAME_SET.add("lookup_internal_routing_note")
|
| 22 |
+
TOOL_NAME_SET.add("lookup_queue_capacity_forecast")
|
| 23 |
|
| 24 |
|
| 25 |
def _validate_choice(value: str, allowed: set[str], field_name: str) -> str:
|
|
|
|
| 48 |
resolution_action: str
|
| 49 |
ambiguity_note: Optional[str] = None
|
| 50 |
related_ticket_id: Optional[str] = None
|
| 51 |
+
planning_note: Optional[str] = None
|
| 52 |
+
alternate_issue_type: Optional[str] = None
|
| 53 |
+
alternate_priority: Optional[str] = None
|
| 54 |
+
alternate_assignment_group: Optional[str] = None
|
| 55 |
+
alternate_resolution_action: Optional[str] = None
|
| 56 |
+
alternate_route_score_multiplier: float = 0.0
|
| 57 |
|
| 58 |
@field_validator("issue_type")
|
| 59 |
@classmethod
|
|
|
|
| 75 |
def validate_resolution_action(cls, value: str) -> str:
|
| 76 |
return _validate_choice(value, RESOLUTION_ACTION_SET, "resolution_action")
|
| 77 |
|
| 78 |
+
@field_validator("alternate_issue_type")
|
| 79 |
+
@classmethod
|
| 80 |
+
def validate_alternate_issue_type(cls, value: Optional[str]) -> Optional[str]:
|
| 81 |
+
return _validate_optional_choice(value, ISSUE_TYPE_SET, "alternate_issue_type")
|
| 82 |
+
|
| 83 |
+
@field_validator("alternate_priority")
|
| 84 |
+
@classmethod
|
| 85 |
+
def validate_alternate_priority(cls, value: Optional[str]) -> Optional[str]:
|
| 86 |
+
return _validate_optional_choice(value, PRIORITY_SET, "alternate_priority")
|
| 87 |
+
|
| 88 |
+
@field_validator("alternate_assignment_group")
|
| 89 |
+
@classmethod
|
| 90 |
+
def validate_alternate_assignment_group(cls, value: Optional[str]) -> Optional[str]:
|
| 91 |
+
return _validate_optional_choice(
|
| 92 |
+
value,
|
| 93 |
+
ASSIGNMENT_GROUP_SET,
|
| 94 |
+
"alternate_assignment_group",
|
| 95 |
+
)
|
| 96 |
+
|
| 97 |
+
@field_validator("alternate_resolution_action")
|
| 98 |
+
@classmethod
|
| 99 |
+
def validate_alternate_resolution_action(
|
| 100 |
+
cls,
|
| 101 |
+
value: Optional[str],
|
| 102 |
+
) -> Optional[str]:
|
| 103 |
+
return _validate_optional_choice(
|
| 104 |
+
value,
|
| 105 |
+
RESOLUTION_ACTION_SET,
|
| 106 |
+
"alternate_resolution_action",
|
| 107 |
+
)
|
| 108 |
+
|
| 109 |
+
@field_validator("alternate_route_score_multiplier")
|
| 110 |
+
@classmethod
|
| 111 |
+
def validate_alternate_route_score_multiplier(cls, value: float) -> float:
|
| 112 |
+
if not 0.0 <= value <= 1.0:
|
| 113 |
+
raise ValueError("alternate_route_score_multiplier must be in [0.0, 1.0]")
|
| 114 |
+
return value
|
| 115 |
+
|
| 116 |
|
| 117 |
class HelpdeskTicketAction(Action):
|
| 118 |
action_type: str = "submit"
|
|
|
|
| 191 |
investigation_steps: int = 0
|
| 192 |
investigation_budget_remaining: int = 0
|
| 193 |
investigation_penalty_applied: float = 0.0
|
| 194 |
+
planning_penalty_applied: float = 0.0
|
| 195 |
last_tool_result: Optional[dict[str, Any]] = None
|
| 196 |
last_reward_components: dict[str, Any] = Field(default_factory=dict)
|
| 197 |
ticket_tool_usage: dict[str, list[str]] = Field(default_factory=dict)
|
| 198 |
+
team_capacity_initial: dict[str, int] = Field(default_factory=dict)
|
| 199 |
+
team_capacity_remaining: dict[str, int] = Field(default_factory=dict)
|
| 200 |
+
high_priority_slots_initial: int = 0
|
| 201 |
+
high_priority_slots_remaining: int = 0
|
| 202 |
+
escalation_slots_initial: int = 0
|
| 203 |
+
escalation_slots_remaining: int = 0
|
| 204 |
+
planning_penalty_total: float = 0.0
|
| 205 |
+
capacity_pressure_tickets_resolved: int = 0
|
| 206 |
history_entries: list[dict] = Field(default_factory=list)
|
policy_learning.py
CHANGED
|
@@ -88,6 +88,7 @@ AVAILABLE_TOOLS = (
|
|
| 88 |
"lookup_related_ticket",
|
| 89 |
"lookup_requester_history",
|
| 90 |
"lookup_internal_routing_note",
|
|
|
|
| 91 |
)
|
| 92 |
|
| 93 |
|
|
@@ -229,6 +230,11 @@ def default_submit_builder(
|
|
| 229 |
inference = importlib.import_module("inference")
|
| 230 |
candidate = inference.heuristic_action(ticket, allowed_fields)
|
| 231 |
candidate, _ = inference.apply_domain_overrides(ticket, candidate, allowed_fields)
|
| 232 |
return HelpdeskTicketAction(**candidate)
|
| 233 |
|
| 234 |
|
|
@@ -237,7 +243,11 @@ def _routing_text(ticket: dict[str, Any]) -> str:
|
|
| 237 |
str(ticket.get("title", "")),
|
| 238 |
str(ticket.get("description", "")),
|
| 239 |
str(ticket.get("ambiguity_note", "")),
|
|
|
|
| 240 |
json.dumps(ticket.get("last_tool_result") or {}, sort_keys=True),
|
| 241 |
]
|
| 242 |
related_preview = ticket.get("related_ticket_preview") or {}
|
| 243 |
parts.extend(
|
|
@@ -251,6 +261,24 @@ def _routing_text(ticket: dict[str, Any]) -> str:
|
|
| 251 |
|
| 252 |
def infer_ticket_cue(ticket: dict[str, Any]) -> str:
|
| 253 |
text = _routing_text(ticket)
|
| 254 |
if any(
|
| 255 |
phrase in text
|
| 256 |
for phrase in ("re:", "follow-up", "following up", "regression", "reference ticket", "third update")
|
|
@@ -297,14 +325,20 @@ def preferred_tool_order(
|
|
| 297 |
hidden_context_remaining: bool,
|
| 298 |
) -> list[str]:
|
| 299 |
text = _routing_text(ticket)
|
|
|
|
| 300 |
last_tool_result = ticket.get("last_tool_result") or {}
|
| 301 |
last_tool_name = str(last_tool_result.get("tool_name", "") or "")
|
|
|
|
| 302 |
|
| 303 |
preferred_tools: list[str] = []
|
|
|
|
|
|
|
| 304 |
if last_tool_name == "lookup_related_ticket":
|
| 305 |
preferred_tools.append("lookup_requester_history")
|
| 306 |
if last_tool_name == "lookup_requester_history":
|
| 307 |
preferred_tools.append("lookup_internal_routing_note")
|
|
|
|
|
|
|
| 308 |
|
| 309 |
if any(
|
| 310 |
phrase in text
|
|
@@ -336,9 +370,15 @@ def preferred_tool_order(
|
|
| 336 |
):
|
| 337 |
preferred_tools.append("lookup_requester_history")
|
| 338 |
|
| 339 |
if hidden_context_remaining:
|
| 340 |
preferred_tools.extend(
|
| 341 |
[
|
|
|
|
| 342 |
"lookup_internal_routing_note",
|
| 343 |
"lookup_related_ticket",
|
| 344 |
"lookup_requester_history",
|
|
@@ -545,6 +585,8 @@ def rollout_episode(
|
|
| 545 |
"terminal_reward": terminal_reward,
|
| 546 |
"terminal_rubric_reward": terminal_rubric_reward,
|
| 547 |
"average_ticket_score": env.state.average_score_so_far,
|
|
|
|
|
|
|
| 548 |
"per_ticket_scores": list(env.state.per_ticket_scores),
|
| 549 |
}
|
| 550 |
if adaptive_bandit is not None and policy.strategy == "adaptive":
|
|
@@ -583,6 +625,15 @@ def summarize_policy_episodes(
|
|
| 583 |
"avg_terminal_rubric_reward": _safe_mean(
|
| 584 |
[float(episode["terminal_rubric_reward"]) for episode in task_episodes]
|
| 585 |
),
|
| 586 |
"avg_investigation_steps": _safe_mean(
|
| 587 |
[float(episode["investigation_steps"]) for episode in task_episodes]
|
| 588 |
),
|
|
@@ -604,6 +655,15 @@ def summarize_policy_episodes(
|
|
| 604 |
"avg_terminal_rubric_reward": _safe_mean(
|
| 605 |
[float(episode["terminal_rubric_reward"]) for episode in episode_summaries]
|
| 606 |
),
|
| 607 |
"avg_investigation_steps": _safe_mean(
|
| 608 |
[float(episode["investigation_steps"]) for episode in episode_summaries]
|
| 609 |
),
|
|
@@ -653,11 +713,12 @@ def evaluate_policy(
|
|
| 653 |
return result
|
| 654 |
|
| 655 |
|
| 656 |
-
def _selection_tuple(summary: dict[str, Any]) -> tuple[float, float, float, float]:
|
| 657 |
return (
|
| 658 |
-
float(summary["avg_normalized_return"]),
|
| 659 |
-
float(summary["avg_terminal_reward"]),
|
| 660 |
float(summary["avg_terminal_rubric_reward"]),
|
| 661 |
-float(summary["avg_investigation_steps"]),
|
| 662 |
)
|
| 663 |
|
|
@@ -713,7 +774,7 @@ def compare_policies(
|
|
| 713 |
"mode": "compare",
|
| 714 |
"task_ids": task_ids,
|
| 715 |
"seeds": seeds,
|
| 716 |
-
"selection_metric": "
|
| 717 |
"baseline_policy": baseline_run["policy"],
|
| 718 |
"best_policy": best_run["policy"],
|
| 719 |
"improvement_vs_baseline": {
|
|
@@ -731,6 +792,11 @@ def compare_policies(
|
|
| 731 |
baseline_run["summary"],
|
| 732 |
"avg_terminal_rubric_reward",
|
| 733 |
),
|
| 734 |
},
|
| 735 |
"policy_summaries": [run["summary"] for run in policy_runs],
|
| 736 |
"ranking": [
|
|
@@ -825,7 +891,7 @@ def search_policies(
|
|
| 825 |
"task_ids": task_ids,
|
| 826 |
"train_seeds": train_seeds,
|
| 827 |
"eval_seeds": eval_seeds,
|
| 828 |
-
"selection_metric": "
|
| 829 |
"candidate_policies": [policy.name for policy in candidate_policies],
|
| 830 |
"selected_policy": selected_policy.name,
|
| 831 |
"baseline_policy": baseline_policy.name,
|
|
@@ -856,6 +922,11 @@ def search_policies(
|
|
| 856 |
eval_baseline["summary"],
|
| 857 |
"avg_terminal_rubric_reward",
|
| 858 |
),
|
| 859 |
},
|
| 860 |
"artifacts": {
|
| 861 |
"summary": str(output_dir / "search_summary.json"),
|
|
@@ -975,6 +1046,7 @@ def _print_summary(label: str, summary: dict[str, Any]) -> None:
|
|
| 975 |
"avg_normalized_return": summary["avg_normalized_return"],
|
| 976 |
"avg_terminal_reward": summary["avg_terminal_reward"],
|
| 977 |
"avg_terminal_rubric_reward": summary["avg_terminal_rubric_reward"],
|
|
|
|
| 978 |
"avg_investigation_steps": summary["avg_investigation_steps"],
|
| 979 |
}
|
| 980 |
},
|
|
|
|
| 88 |
"lookup_related_ticket",
|
| 89 |
"lookup_requester_history",
|
| 90 |
"lookup_internal_routing_note",
|
| 91 |
+
"lookup_queue_capacity_forecast",
|
| 92 |
)
|
| 93 |
|
| 94 |
|
|
|
|
| 230 |
inference = importlib.import_module("inference")
|
| 231 |
candidate = inference.heuristic_action(ticket, allowed_fields)
|
| 232 |
candidate, _ = inference.apply_domain_overrides(ticket, candidate, allowed_fields)
|
| 233 |
+
candidate, _ = inference.apply_capacity_planning_overrides(
|
| 234 |
+
ticket,
|
| 235 |
+
candidate,
|
| 236 |
+
allowed_fields,
|
| 237 |
+
)
|
| 238 |
return HelpdeskTicketAction(**candidate)
|
| 239 |
|
| 240 |
|
|
|
|
| 243 |
str(ticket.get("title", "")),
|
| 244 |
str(ticket.get("description", "")),
|
| 245 |
str(ticket.get("ambiguity_note", "")),
|
| 246 |
+
str(ticket.get("planning_note", "")),
|
| 247 |
json.dumps(ticket.get("last_tool_result") or {}, sort_keys=True),
|
| 248 |
+
json.dumps(ticket.get("routing_options") or [], sort_keys=True),
|
| 249 |
+
json.dumps(ticket.get("capacity_state") or {}, sort_keys=True),
|
| 250 |
+
json.dumps(ticket.get("future_queue_demand") or {}, sort_keys=True),
|
| 251 |
]
|
| 252 |
related_preview = ticket.get("related_ticket_preview") or {}
|
| 253 |
parts.extend(
|
|
|
|
| 261 |
|
| 262 |
def infer_ticket_cue(ticket: dict[str, Any]) -> str:
|
| 263 |
text = _routing_text(ticket)
|
| 264 |
+
context_status = ticket.get("context_status") or {}
|
| 265 |
+
if (
|
| 266 |
+
ticket.get("planning_note")
|
| 267 |
+
or ticket.get("routing_options")
|
| 268 |
+
or "lookup_queue_capacity_forecast"
|
| 269 |
+
in (context_status.get("recommended_tools") or [])
|
| 270 |
+
or any(
|
| 271 |
+
phrase in text
|
| 272 |
+
for phrase in (
|
| 273 |
+
"capacity",
|
| 274 |
+
"saturated",
|
| 275 |
+
"backlog",
|
| 276 |
+
"resource pressure",
|
| 277 |
+
"alternate route",
|
| 278 |
+
)
|
| 279 |
+
)
|
| 280 |
+
):
|
| 281 |
+
return "capacity_planning"
|
| 282 |
if any(
|
| 283 |
phrase in text
|
| 284 |
for phrase in ("re:", "follow-up", "following up", "regression", "reference ticket", "third update")
|
|
|
|
| 325 |
hidden_context_remaining: bool,
|
| 326 |
) -> list[str]:
|
| 327 |
text = _routing_text(ticket)
|
| 328 |
+
context_status = ticket.get("context_status") or {}
|
| 329 |
last_tool_result = ticket.get("last_tool_result") or {}
|
| 330 |
last_tool_name = str(last_tool_result.get("tool_name", "") or "")
|
| 331 |
+
recommended_tools = list(context_status.get("recommended_tools") or [])
|
| 332 |
|
| 333 |
preferred_tools: list[str] = []
|
| 334 |
+
if "lookup_queue_capacity_forecast" in recommended_tools:
|
| 335 |
+
preferred_tools.append("lookup_queue_capacity_forecast")
|
| 336 |
if last_tool_name == "lookup_related_ticket":
|
| 337 |
preferred_tools.append("lookup_requester_history")
|
| 338 |
if last_tool_name == "lookup_requester_history":
|
| 339 |
preferred_tools.append("lookup_internal_routing_note")
|
| 340 |
+
if last_tool_name == "lookup_internal_routing_note":
|
| 341 |
+
preferred_tools.append("lookup_queue_capacity_forecast")
|
| 342 |
|
| 343 |
if any(
|
| 344 |
phrase in text
|
|
|
|
| 370 |
):
|
| 371 |
preferred_tools.append("lookup_requester_history")
|
| 372 |
|
| 373 |
+
if infer_ticket_cue(ticket) == "capacity_planning":
|
| 374 |
+
preferred_tools.append("lookup_queue_capacity_forecast")
|
| 375 |
+
|
| 376 |
+
preferred_tools.extend(recommended_tools)
|
| 377 |
+
|
| 378 |
if hidden_context_remaining:
|
| 379 |
preferred_tools.extend(
|
| 380 |
[
|
| 381 |
+
"lookup_queue_capacity_forecast",
|
| 382 |
"lookup_internal_routing_note",
|
| 383 |
"lookup_related_ticket",
|
| 384 |
"lookup_requester_history",
|
|
|
|
| 585 |
"terminal_reward": terminal_reward,
|
| 586 |
"terminal_rubric_reward": terminal_rubric_reward,
|
| 587 |
"average_ticket_score": env.state.average_score_so_far,
|
| 588 |
+
"planning_penalty_total": env.state.planning_penalty_total,
|
| 589 |
+
"capacity_pressure_tickets_resolved": env.state.capacity_pressure_tickets_resolved,
|
| 590 |
"per_ticket_scores": list(env.state.per_ticket_scores),
|
| 591 |
}
|
| 592 |
if adaptive_bandit is not None and policy.strategy == "adaptive":
|
|
|
|
| 625 |
"avg_terminal_rubric_reward": _safe_mean(
|
| 626 |
[float(episode["terminal_rubric_reward"]) for episode in task_episodes]
|
| 627 |
),
|
| 628 |
+
"avg_planning_penalty_total": _safe_mean(
|
| 629 |
+
[float(episode["planning_penalty_total"]) for episode in task_episodes]
|
| 630 |
+
),
|
| 631 |
+
"avg_capacity_pressure_tickets_resolved": _safe_mean(
|
| 632 |
+
[
|
| 633 |
+
float(episode["capacity_pressure_tickets_resolved"])
|
| 634 |
+
for episode in task_episodes
|
| 635 |
+
]
|
| 636 |
+
),
|
| 637 |
"avg_investigation_steps": _safe_mean(
|
| 638 |
[float(episode["investigation_steps"]) for episode in task_episodes]
|
| 639 |
),
|
|
|
|
| 655 |
"avg_terminal_rubric_reward": _safe_mean(
|
| 656 |
[float(episode["terminal_rubric_reward"]) for episode in episode_summaries]
|
| 657 |
),
|
| 658 |
+
"avg_planning_penalty_total": _safe_mean(
|
| 659 |
+
[float(episode["planning_penalty_total"]) for episode in episode_summaries]
|
| 660 |
+
),
|
| 661 |
+
"avg_capacity_pressure_tickets_resolved": _safe_mean(
|
| 662 |
+
[
|
| 663 |
+
float(episode["capacity_pressure_tickets_resolved"])
|
| 664 |
+
for episode in episode_summaries
|
| 665 |
+
]
|
| 666 |
+
),
|
| 667 |
"avg_investigation_steps": _safe_mean(
|
| 668 |
[float(episode["investigation_steps"]) for episode in episode_summaries]
|
| 669 |
),
|
|
|
|
| 713 |
return result
|
| 714 |
|
| 715 |
|
| 716 |
+
def _selection_tuple(summary: dict[str, Any]) -> tuple[float, float, float, float, float]:
|
| 717 |
return (
|
|
|
|
|
|
|
| 718 |
float(summary["avg_terminal_rubric_reward"]),
|
| 719 |
+
-float(summary["avg_planning_penalty_total"]),
|
| 720 |
+
float(summary["avg_episode_return"]),
|
| 721 |
+
float(summary["avg_normalized_return"]),
|
| 722 |
-float(summary["avg_investigation_steps"]),
|
| 723 |
)
|
| 724 |
|
|
|
|
| 774 |
"mode": "compare",
|
| 775 |
"task_ids": task_ids,
|
| 776 |
"seeds": seeds,
|
| 777 |
+
"selection_metric": "avg_terminal_rubric_reward_then_lower_planning_penalty",
|
| 778 |
"baseline_policy": baseline_run["policy"],
|
| 779 |
"best_policy": best_run["policy"],
|
| 780 |
"improvement_vs_baseline": {
|
|
|
|
| 792 |
baseline_run["summary"],
|
| 793 |
"avg_terminal_rubric_reward",
|
| 794 |
),
|
| 795 |
+
"avg_planning_penalty_total": _delta(
|
| 796 |
+
best_run["summary"],
|
| 797 |
+
baseline_run["summary"],
|
| 798 |
+
"avg_planning_penalty_total",
|
| 799 |
+
),
|
| 800 |
},
|
| 801 |
"policy_summaries": [run["summary"] for run in policy_runs],
|
| 802 |
"ranking": [
|
|
|
|
| 891 |
"task_ids": task_ids,
|
| 892 |
"train_seeds": train_seeds,
|
| 893 |
"eval_seeds": eval_seeds,
|
| 894 |
+
"selection_metric": "avg_terminal_rubric_reward_then_lower_planning_penalty",
|
| 895 |
"candidate_policies": [policy.name for policy in candidate_policies],
|
| 896 |
"selected_policy": selected_policy.name,
|
| 897 |
"baseline_policy": baseline_policy.name,
|
|
|
|
| 922 |
eval_baseline["summary"],
|
| 923 |
"avg_terminal_rubric_reward",
|
| 924 |
),
|
| 925 |
+
"avg_planning_penalty_total": _delta(
|
| 926 |
+
eval_selected["summary"],
|
| 927 |
+
eval_baseline["summary"],
|
| 928 |
+
"avg_planning_penalty_total",
|
| 929 |
+
),
|
| 930 |
},
|
| 931 |
"artifacts": {
|
| 932 |
"summary": str(output_dir / "search_summary.json"),
|
|
|
|
| 1046 |
"avg_normalized_return": summary["avg_normalized_return"],
|
| 1047 |
"avg_terminal_reward": summary["avg_terminal_reward"],
|
| 1048 |
"avg_terminal_rubric_reward": summary["avg_terminal_rubric_reward"],
|
| 1049 |
+
"avg_planning_penalty_total": summary["avg_planning_penalty_total"],
|
| 1050 |
"avg_investigation_steps": summary["avg_investigation_steps"],
|
| 1051 |
}
|
| 1052 |
},
|
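Note on the `_selection_tuple` change in policy_learning.py above: the new tuple ranks policies lexicographically, first by higher `avg_terminal_rubric_reward`, then by lower `avg_planning_penalty_total`, then by the two return metrics, and finally by fewer investigation steps. A minimal sketch of that ordering follows; the policy names and numbers are made up for illustration, and selecting with `max(..., key=...)` is an assumption about how the tuple is consumed.

```python
from typing import Any


def selection_tuple(summary: dict[str, Any]) -> tuple[float, float, float, float, float]:
    # Same field order as the updated _selection_tuple in the diff.
    return (
        float(summary["avg_terminal_rubric_reward"]),
        -float(summary["avg_planning_penalty_total"]),  # lower penalty ranks higher
        float(summary["avg_episode_return"]),
        float(summary["avg_normalized_return"]),
        -float(summary["avg_investigation_steps"]),  # fewer steps ranks higher
    )


# Hypothetical summaries, not taken from any real run.
summaries = [
    {"policy": "policy_a", "avg_terminal_rubric_reward": 0.80,
     "avg_planning_penalty_total": 0.10, "avg_episode_return": 3.0,
     "avg_normalized_return": 0.70, "avg_investigation_steps": 2.0},
    {"policy": "policy_b", "avg_terminal_rubric_reward": 0.80,
     "avg_planning_penalty_total": 0.02, "avg_episode_return": 2.5,
     "avg_normalized_return": 0.65, "avg_investigation_steps": 3.0},
    {"policy": "policy_c", "avg_terminal_rubric_reward": 0.75,
     "avg_planning_penalty_total": 0.00, "avg_episode_return": 3.5,
     "avg_normalized_return": 0.80, "avg_investigation_steps": 1.0},
]

best = max(summaries, key=selection_tuple)
```

Here `policy_b` ties `policy_a` on rubric reward and wins on the lower planning penalty, even though `policy_a` has the higher episode return. This matches the new `selection_metric` string, `avg_terminal_rubric_reward_then_lower_planning_penalty`.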
server/environment.py
CHANGED
|
@@ -31,6 +31,7 @@ AVAILABLE_TOOLS = (
|
|
| 31 |
"lookup_related_ticket",
|
| 32 |
"lookup_requester_history",
|
| 33 |
"lookup_internal_routing_note",
|
|
|
|
| 34 |
)
|
| 35 |
FREE_INVESTIGATIONS_PER_TICKET = 1
|
| 36 |
EXTRA_INVESTIGATION_COST = 0.04
|
|
@@ -44,6 +45,10 @@ PRIORITY_UNDERSHOOT_PENALTY = 0.03
|
|
| 44 |
SEVERE_PRIORITY_UNDERSHOOT_PENALTY = 0.07
|
| 45 |
DANGEROUS_RESOLUTION_PENALTY = 0.05
|
| 46 |
NONDEFAULT_ROUTING_FOLLOWTHROUGH_BONUS = 0.02
|
| 47 |
|
| 48 |
TASK3_INVESTIGATION_TOOL_PLAN: dict[str, tuple[str, ...]] = {
|
| 49 |
"ticket-021": ("lookup_related_ticket", "lookup_requester_history"),
|
|
@@ -161,6 +166,11 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 161 |
else:
|
| 162 |
queue_size = min(queue_size_value, len(self._dataset))
|
| 163 |
self._queue = self._rng.sample(self._dataset, min(queue_size, len(self._dataset)))
|
| 164 |
|
| 165 |
self._state = HelpdeskTicketState(
|
| 166 |
episode_id=episode_id or str(uuid.uuid4()),
|
|
@@ -174,8 +184,17 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 174 |
average_score_so_far=0.0,
|
| 175 |
investigation_budget_remaining=queue_size * FREE_INVESTIGATIONS_PER_TICKET,
|
| 176 |
investigation_penalty_applied=0.0,
|
|
|
|
| 177 |
last_reward_components={},
|
| 178 |
ticket_tool_usage={},
|
| 179 |
)
|
| 180 |
|
| 181 |
return self._build_observation(task)
|
|
@@ -298,6 +317,10 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 298 |
action,
|
| 299 |
task_id=task_id,
|
| 300 |
)
|
| 301 |
step_adjustments = compute_step_adjustments(
|
| 302 |
score,
|
| 303 |
previous_average=previous_average,
|
|
@@ -321,11 +344,17 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 321 |
self._state.per_ticket_scores,
|
| 322 |
len(self._queue),
|
| 323 |
self._state.step_count,
|
| 324 |
-
completion_bonus=
|
|
|
|
|
|
|
| 325 |
)
|
| 326 |
trajectory_reward = trajectory_components["final_reward"]
|
| 327 |
-
rubric_reward = self._apply_episode_economics(
|
| 328 |
-
|
| 329 |
self._state.total_reward = rubric_reward
|
| 330 |
investigation_penalty = self._compute_episode_penalty()
|
| 331 |
else:
|
|
@@ -333,7 +362,9 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 333 |
self._state.average_score_so_far = self._current_average_score()
|
| 334 |
self._state.step_count += 1
|
| 335 |
self._state.current_ticket_index += 1
|
| 336 |
-
final_reward = clamp_open_unit_interval(
|
|
|
|
|
|
|
| 337 |
|
| 338 |
reward_components = self._build_reward_components(
|
| 339 |
ticket_score=score,
|
|
@@ -348,12 +379,18 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 348 |
"context_gap_penalty": context_penalty,
|
| 349 |
"context_completion_bonus": process_bonus,
|
| 350 |
"risk_penalty": risk_penalty,
|
|
|
|
| 351 |
"delta_adjustment": step_adjustments["delta_adjustment"],
|
| 352 |
"required_investigation_count": len(self._required_tools_for_ticket(current_ticket)),
|
| 353 |
"hidden_context_remaining_count": missing_required_count,
|
| 354 |
"hidden_context_revealed_count": len(
|
| 355 |
self._used_tools_for_ticket(current_ticket.ticket_id)
|
| 356 |
),
|
| 357 |
"rubric_reward": rubric_reward,
|
| 358 |
"trajectory_average_reward": (
|
| 359 |
trajectory_components["average_reward"]
|
|
@@ -372,6 +409,7 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 372 |
),
|
| 373 |
},
|
| 374 |
)
|
|
|
|
| 375 |
|
| 376 |
history_entry = self._build_history_entry(
|
| 377 |
current_ticket,
|
|
@@ -390,6 +428,7 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 390 |
self._state.reward = final_reward
|
| 391 |
self._state.done = is_done
|
| 392 |
self._state.investigation_penalty_applied = self._compute_episode_penalty()
|
|
|
|
| 393 |
self._state.last_tool_result = None
|
| 394 |
self._state.last_reward_components = reward_components
|
| 395 |
|
|
@@ -425,14 +464,373 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 425 |
return 0.0
|
| 426 |
return sum(self._state.per_ticket_scores) / len(self._state.per_ticket_scores)
|
| 427 |
|
| 428 |
def _internal_routing_note_for_ticket(
|
| 429 |
self,
|
| 430 |
ticket: HelpdeskTicketRecord,
|
| 431 |
) -> str | None:
|
| 432 |
-
if ticket.ambiguity_note is not None:
|
| 433 |
-
return ticket.ambiguity_note
|
| 434 |
if self._state.current_task_id != 3:
|
| 435 |
-
return
|
| 436 |
|
| 437 |
default_group = ISSUE_TYPE_TO_ASSIGNMENT_GROUP.get(
|
| 438 |
ticket.issue_type,
|
|
@@ -442,7 +840,6 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 442 |
ticket.issue_type,
|
| 443 |
ticket.resolution_action,
|
| 444 |
)
|
| 445 |
-
note_parts: list[str] = []
|
| 446 |
|
| 447 |
if ticket.assignment_group != default_group:
|
| 448 |
note_parts.append(
|
|
@@ -517,6 +914,11 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 517 |
return self._ticket_repeated_requester_count(ticket) >= 2
|
| 518 |
if tool_name == "lookup_internal_routing_note":
|
| 519 |
return self._internal_routing_note_for_ticket(ticket) is not None
|
| 520 |
return False
|
| 521 |
|
| 522 |
def _required_tools_for_ticket(
|
|
@@ -546,6 +948,11 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 546 |
and "lookup_requester_history" not in required_tools
|
| 547 |
):
|
| 548 |
required_tools.append("lookup_requester_history")
|
| 549 |
filtered_required_tools: list[str] = []
|
| 550 |
for tool_name in required_tools:
|
| 551 |
if tool_name in filtered_required_tools:
|
|
@@ -596,6 +1003,11 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 596 |
"The visible request is not enough to choose the final owner and next step. "
|
| 597 |
"Additional routing context is available via investigation."
|
| 598 |
)
|
| 599 |
if self._ticket_has_nondefault_routing(ticket):
|
| 600 |
return (
|
| 601 |
"The visible request looks straightforward, but the decisive routing detail is hidden until investigation."
|
|
@@ -609,6 +1021,8 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 609 |
return "Follow-up request with hidden routing context"
|
| 610 |
if self._internal_routing_note_for_ticket(ticket) is not None:
|
| 611 |
return "Routing clarification required"
|
|
|
|
|
|
|
| 612 |
if self._ticket_mentions_follow_up(ticket):
|
| 613 |
return "Priority support follow-up"
|
| 614 |
return "Helpdesk routing decision"
|
|
@@ -805,6 +1219,24 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 805 |
"routing_note": routing_note if found else "",
|
| 806 |
}
|
| 807 |
|
| 808 |
def _run_investigation_tool(
|
| 809 |
self,
|
| 810 |
current_ticket: HelpdeskTicketRecord,
|
|
@@ -817,6 +1249,8 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 817 |
return self._lookup_requester_history(current_ticket)
|
| 818 |
if tool_name == "lookup_internal_routing_note":
|
| 819 |
return self._lookup_internal_routing_note(current_ticket)
|
|
|
|
|
|
|
| 820 |
raise ValueError(f"Unsupported tool_name: {tool_name}")
|
| 821 |
|
| 822 |
def _handle_investigation_action(
|
|
@@ -901,12 +1335,15 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 901 |
def _build_ticket_view(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
|
| 902 |
progress = self._tool_progress_for_ticket(ticket)
|
| 903 |
remaining_tools = progress["remaining_tools"]
|
|
|
|
| 904 |
ticket_view: dict[str, Any] = {
|
| 905 |
"ticket_id": ticket.ticket_id,
|
| 906 |
"title": self._visible_title(ticket),
|
| 907 |
"requester": ticket.requester,
|
| 908 |
"description": self._visible_description(ticket),
|
| 909 |
}
|
|
|
|
|
|
|
| 910 |
if progress["required_tools"]:
|
| 911 |
ticket_view["context_status"] = {
|
| 912 |
"investigation_required": True,
|
|
@@ -919,6 +1356,11 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 919 |
}
|
| 920 |
if ticket.ambiguity_note is not None and "lookup_internal_routing_note" not in remaining_tools:
|
| 921 |
ticket_view["ambiguity_note"] = ticket.ambiguity_note
|
| 922 |
if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
|
| 923 |
ticket_view["related_ticket_id"] = ticket.related_ticket_id
|
| 924 |
related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
|
|
@@ -929,6 +1371,11 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 929 |
"requester": related_ticket.requester,
|
| 930 |
"description": related_ticket.description,
|
| 931 |
}
|
| 932 |
return ticket_view
|
| 933 |
|
| 934 |
def _build_feedback_summary(
|
|
@@ -982,6 +1429,12 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 982 |
risk_penalty = reward_components.get("risk_penalty")
|
| 983 |
if risk_penalty:
|
| 984 |
parts.append(f"risk_penalty={risk_penalty:.2f}")
|
| 985 |
|
| 986 |
return "; ".join(parts)
|
| 987 |
|
|
@@ -1011,6 +1464,8 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1011 |
"breakdown": breakdown,
|
| 1012 |
"queue_position": queue_position,
|
| 1013 |
}
|
|
|
|
|
|
|
| 1014 |
if reward is not None:
|
| 1015 |
history_entry["reward"] = reward
|
| 1016 |
if rubric_reward is not None:
|
|
@@ -1019,6 +1474,11 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1019 |
history_entry["reward_kind"] = reward_kind
|
| 1020 |
if ticket.ambiguity_note is not None and "lookup_internal_routing_note" not in remaining_tools:
|
| 1021 |
history_entry["ambiguity_note"] = ticket.ambiguity_note
|
| 1022 |
if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
|
| 1023 |
history_entry["related_ticket_id"] = ticket.related_ticket_id
|
| 1024 |
related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
|
|
@@ -1029,6 +1489,14 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1029 |
"requester": related_ticket.requester,
|
| 1030 |
"description": related_ticket.description,
|
| 1031 |
}
|
| 1032 |
if penalty_reason is not None:
|
| 1033 |
history_entry["penalty_reason"] = penalty_reason
|
| 1034 |
if tool_result is not None:
|
|
@@ -1098,7 +1566,12 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1098 |
"average_score_so_far": self._state.average_score_so_far,
|
| 1099 |
"progress_fraction": progress_fraction,
|
| 1100 |
"investigation_penalty_applied": self._state.investigation_penalty_applied,
|
| 1101 |
}
|
| 1102 |
if last_history_entry is not None:
|
| 1103 |
metadata["last_score"] = last_history_entry.get("score")
|
| 1104 |
metadata["last_reward"] = last_history_entry.get("reward")
|
|
|
|
| 31 |
"lookup_related_ticket",
|
| 32 |
"lookup_requester_history",
|
| 33 |
"lookup_internal_routing_note",
|
| 34 |
+
"lookup_queue_capacity_forecast",
|
| 35 |
)
|
| 36 |
FREE_INVESTIGATIONS_PER_TICKET = 1
|
| 37 |
EXTRA_INVESTIGATION_COST = 0.04
|
|
|
|
| 45 |
SEVERE_PRIORITY_UNDERSHOOT_PENALTY = 0.07
|
| 46 |
DANGEROUS_RESOLUTION_PENALTY = 0.05
|
| 47 |
NONDEFAULT_ROUTING_FOLLOWTHROUGH_BONUS = 0.02
|
| 48 |
+
TEAM_CAPACITY_OVERFLOW_PENALTY = 0.08
|
| 49 |
+
HIGH_PRIORITY_SLOT_OVERFLOW_PENALTY = 0.06
|
| 50 |
+
ESCALATION_SLOT_OVERFLOW_PENALTY = 0.05
|
| 51 |
+
PLANNING_SUCCESS_BONUS = 0.05
|
| 52 |
|
| 53 |
TASK3_INVESTIGATION_TOOL_PLAN: dict[str, tuple[str, ...]] = {
|
| 54 |
"ticket-021": ("lookup_related_ticket", "lookup_requester_history"),
|
|
|
|
| 166 |
else:
|
| 167 |
queue_size = min(queue_size_value, len(self._dataset))
|
| 168 |
self._queue = self._rng.sample(self._dataset, min(queue_size, len(self._dataset)))
|
| 169 |
+
(
|
| 170 |
+
team_capacity_initial,
|
| 171 |
+
high_priority_slots_initial,
|
| 172 |
+
escalation_slots_initial,
|
| 173 |
+
) = self._initial_capacity_state_for_queue(task_id)
|
| 174 |
|
| 175 |
self._state = HelpdeskTicketState(
|
| 176 |
episode_id=episode_id or str(uuid.uuid4()),
|
|
|
|
| 184 |
average_score_so_far=0.0,
|
| 185 |
investigation_budget_remaining=queue_size * FREE_INVESTIGATIONS_PER_TICKET,
|
| 186 |
investigation_penalty_applied=0.0,
|
| 187 |
+
planning_penalty_applied=0.0,
|
| 188 |
last_reward_components={},
|
| 189 |
ticket_tool_usage={},
|
| 190 |
+
team_capacity_initial=team_capacity_initial,
|
| 191 |
+
team_capacity_remaining=dict(team_capacity_initial),
|
| 192 |
+
high_priority_slots_initial=high_priority_slots_initial,
|
| 193 |
+
high_priority_slots_remaining=high_priority_slots_initial,
|
| 194 |
+
escalation_slots_initial=escalation_slots_initial,
|
| 195 |
+
escalation_slots_remaining=escalation_slots_initial,
|
| 196 |
+
planning_penalty_total=0.0,
|
| 197 |
+
capacity_pressure_tickets_resolved=0,
|
| 198 |
)
|
| 199 |
|
| 200 |
return self._build_observation(task)
|
|
|
|
| 317 |
action,
|
| 318 |
task_id=task_id,
|
| 319 |
)
|
| 320 |
+
capacity_penalty, capacity_details = self._apply_capacity_usage(
|
| 321 |
+
current_ticket,
|
| 322 |
+
action,
|
| 323 |
+
)
|
| 324 |
step_adjustments = compute_step_adjustments(
|
| 325 |
score,
|
| 326 |
previous_average=previous_average,
|
|
|
|
| 344 |
self._state.per_ticket_scores,
|
| 345 |
len(self._queue),
|
| 346 |
self._state.step_count,
|
| 347 |
+
completion_bonus=(
|
| 348 |
+
self._trajectory_consistency_bonus() + self._planning_success_bonus()
|
| 349 |
+
),
|
| 350 |
)
|
| 351 |
trajectory_reward = trajectory_components["final_reward"]
|
| 352 |
+
rubric_reward = self._apply_episode_economics(
|
| 353 |
+
trajectory_reward - self._state.planning_penalty_total
|
| 354 |
+
)
|
| 355 |
+
final_reward = clamp_open_unit_interval(
|
| 356 |
+
rubric_reward - context_penalty - capacity_penalty
|
| 357 |
+
)
|
| 358 |
self._state.total_reward = rubric_reward
|
| 359 |
investigation_penalty = self._compute_episode_penalty()
|
| 360 |
else:
|
|
|
|
| 362 |
self._state.average_score_so_far = self._current_average_score()
|
| 363 |
self._state.step_count += 1
|
| 364 |
self._state.current_ticket_index += 1
|
| 365 |
+
final_reward = clamp_open_unit_interval(
|
| 366 |
+
step_reward - context_penalty - capacity_penalty
|
| 367 |
+
)
|
| 368 |
|
| 369 |
reward_components = self._build_reward_components(
|
| 370 |
ticket_score=score,
|
|
|
|
| 379 |
"context_gap_penalty": context_penalty,
|
| 380 |
"context_completion_bonus": process_bonus,
|
| 381 |
"risk_penalty": risk_penalty,
|
| 382 |
+
"capacity_penalty": capacity_penalty,
|
| 383 |
"delta_adjustment": step_adjustments["delta_adjustment"],
|
| 384 |
"required_investigation_count": len(self._required_tools_for_ticket(current_ticket)),
|
| 385 |
"hidden_context_remaining_count": missing_required_count,
|
| 386 |
"hidden_context_revealed_count": len(
|
| 387 |
self._used_tools_for_ticket(current_ticket.ticket_id)
|
| 388 |
),
|
| 389 |
+
"planning_penalty_total": self._state.planning_penalty_total,
|
| 390 |
+
"planning_penalty_applied": self._state.planning_penalty_applied,
|
| 391 |
+
"planning_success_bonus": self._planning_success_bonus()
|
| 392 |
+
if is_done
|
| 393 |
+
else 0.0,
|
| 394 |
"rubric_reward": rubric_reward,
|
| 395 |
"trajectory_average_reward": (
|
| 396 |
trajectory_components["average_reward"]
|
|
|
|
| 409 |
),
|
| 410 |
},
|
| 411 |
)
|
| 412 |
+
reward_components.update(capacity_details)
|
| 413 |
|
| 414 |
history_entry = self._build_history_entry(
|
| 415 |
current_ticket,
|
|
|
|
| 428 |
self._state.reward = final_reward
|
| 429 |
self._state.done = is_done
|
| 430 |
self._state.investigation_penalty_applied = self._compute_episode_penalty()
|
| 431 |
+
self._state.planning_penalty_applied = capacity_penalty
|
| 432 |
self._state.last_tool_result = None
|
| 433 |
self._state.last_reward_components = reward_components
|
| 434 |
|
|
|
|
| 464 |
return 0.0
|
| 465 |
return sum(self._state.per_ticket_scores) / len(self._state.per_ticket_scores)
|
| 466 |
|
| 467 |
+
def _ticket_has_alternate_route(self, ticket: HelpdeskTicketRecord) -> bool:
|
| 468 |
+
return any(
|
| 469 |
+
value is not None
|
| 470 |
+
for value in (
|
| 471 |
+
ticket.alternate_issue_type,
|
| 472 |
+
ticket.alternate_priority,
|
| 473 |
+
ticket.alternate_assignment_group,
|
| 474 |
+
ticket.alternate_resolution_action,
|
| 475 |
+
)
|
| 476 |
+
) and ticket.alternate_route_score_multiplier > 0.0
|
| 477 |
+
|
| 478 |
+
def _route_for_ticket(
|
| 479 |
+
self,
|
| 480 |
+
ticket: HelpdeskTicketRecord,
|
| 481 |
+
*,
|
| 482 |
+
use_alternate: bool = False,
|
| 483 |
+
) -> dict[str, str]:
|
| 484 |
+
if use_alternate and self._ticket_has_alternate_route(ticket):
|
| 485 |
+
return {
|
| 486 |
+
"issue_type": ticket.alternate_issue_type or ticket.issue_type,
|
| 487 |
+
"priority": ticket.alternate_priority or ticket.priority,
|
| 488 |
+
"assignment_group": (
|
| 489 |
+
ticket.alternate_assignment_group or ticket.assignment_group
|
| 490 |
+
),
|
| 491 |
+
"resolution_action": (
|
| 492 |
+
ticket.alternate_resolution_action or ticket.resolution_action
|
| 493 |
+
),
|
| 494 |
+
}
|
| 495 |
+
return {
|
| 496 |
+
"issue_type": ticket.issue_type,
|
| 497 |
+
"priority": ticket.priority,
|
| 498 |
+
"assignment_group": ticket.assignment_group,
|
| 499 |
+
"resolution_action": ticket.resolution_action,
|
| 500 |
+
}
|
| 501 |
+
|
| 502 |
+
def _route_for_action(
|
| 503 |
+
self,
|
| 504 |
+
ticket: HelpdeskTicketRecord,
|
| 505 |
+
action: HelpdeskTicketAction,
|
| 506 |
+
) -> dict[str, str]:
|
| 507 |
+
primary_route = self._route_for_ticket(ticket)
|
| 508 |
+
return {
|
| 509 |
+
"issue_type": action.issue_type or primary_route["issue_type"],
|
| 510 |
+
"priority": action.priority or primary_route["priority"],
|
| 511 |
+
"assignment_group": (
|
| 512 |
+
action.assignment_group or primary_route["assignment_group"]
|
| 513 |
+
),
|
| 514 |
+
"resolution_action": (
|
| 515 |
+
action.resolution_action or primary_route["resolution_action"]
|
| 516 |
+
),
|
| 517 |
+
}
|
| 518 |
+
|
| 519 |
+
def _route_capacity_cost(self, route: dict[str, str]) -> dict[str, Any]:
|
| 520 |
+
return {
|
| 521 |
+
"assignment_group": route["assignment_group"],
|
| 522 |
+
"team_slots": 1,
|
| 523 |
+
"high_priority_slots": 1
|
| 524 |
+
if route["priority"] in {"high", "critical"}
|
| 525 |
+
else 0,
|
| 526 |
+
"escalation_slots": 1
|
| 527 |
+
if route["resolution_action"] in {"assign", "escalate"}
|
| 528 |
+
else 0,
|
| 529 |
+
}
|
| 530 |
+
|
| 531 |
+
def _routing_options_for_ticket(self, ticket: HelpdeskTicketRecord) -> list[dict[str, Any]]:
|
| 532 |
+
options = [
|
| 533 |
+
{
|
| 534 |
+
"label": "primary",
|
| 535 |
+
"score_multiplier": 1.0,
|
| 536 |
+
**self._route_for_ticket(ticket),
|
| 537 |
+
"capacity_cost": self._route_capacity_cost(self._route_for_ticket(ticket)),
|
| 538 |
+
}
|
| 539 |
+
]
|
| 540 |
+
if self._ticket_has_alternate_route(ticket):
|
| 541 |
+
alternate_route = self._route_for_ticket(ticket, use_alternate=True)
|
| 542 |
+
options.append(
|
| 543 |
+
{
|
| 544 |
+
"label": "alternate",
|
| 545 |
+
"score_multiplier": ticket.alternate_route_score_multiplier,
|
| 546 |
+
**alternate_route,
|
| 547 |
+
"capacity_cost": self._route_capacity_cost(alternate_route),
|
| 548 |
+
}
|
| 549 |
+
)
|
| 550 |
+
return options
|
| 551 |
+
|
| 552 |
+
def _initial_capacity_state_for_queue(
|
| 553 |
+
self,
|
| 554 |
+
task_id: int,
|
| 555 |
+
) -> tuple[dict[str, int], int, int]:
|
| 556 |
+
if task_id != 3:
|
| 557 |
+
return {}, 0, 0
|
| 558 |
+
|
| 559 |
+
primary_group_demand: dict[str, int] = {}
|
| 560 |
+
alternate_relief_by_group: dict[str, int] = {}
|
| 561 |
+
all_groups: set[str] = set()
|
| 562 |
+
high_priority_demand = 0
|
| 563 |
+
high_priority_relief = 0
|
| 564 |
+
escalation_demand = 0
|
| 565 |
+
escalation_relief = 0
|
| 566 |
+
|
| 567 |
+
for ticket in self._queue:
|
| 568 |
+
primary_route = self._route_for_ticket(ticket)
|
| 569 |
+
all_groups.add(primary_route["assignment_group"])
|
| 570 |
+
primary_group_demand[primary_route["assignment_group"]] = (
|
| 571 |
+
primary_group_demand.get(primary_route["assignment_group"], 0) + 1
|
| 572 |
+
)
|
| 573 |
+
if primary_route["priority"] in {"high", "critical"}:
|
| 574 |
+
high_priority_demand += 1
|
| 575 |
+
if primary_route["resolution_action"] in {"assign", "escalate"}:
|
| 576 |
+
escalation_demand += 1
|
| 577 |
+
|
| 578 |
+
if self._ticket_has_alternate_route(ticket):
|
| 579 |
+
alternate_route = self._route_for_ticket(ticket, use_alternate=True)
|
| 580 |
+
all_groups.add(alternate_route["assignment_group"])
|
| 581 |
+
if alternate_route["assignment_group"] != primary_route["assignment_group"]:
|
| 582 |
+
alternate_relief_by_group[primary_route["assignment_group"]] = (
|
| 583 |
+
alternate_relief_by_group.get(
|
| 584 |
+
primary_route["assignment_group"],
|
| 585 |
+
0,
|
| 586 |
+
)
|
| 587 |
+
+ 1
|
| 588 |
+
)
|
| 589 |
+
if (
|
| 590 |
+
primary_route["priority"] in {"high", "critical"}
|
| 591 |
+
and alternate_route["priority"] not in {"high", "critical"}
|
| 592 |
+
):
|
| 593 |
+
high_priority_relief += 1
|
| 594 |
+
if (
|
| 595 |
+
primary_route["resolution_action"] in {"assign", "escalate"}
|
| 596 |
+
and alternate_route["resolution_action"] not in {"assign", "escalate"}
|
| 597 |
+
):
|
| 598 |
+
escalation_relief += 1
|
| 599 |
+
|
| 600 |
+
team_capacity_initial: dict[str, int] = {}
|
| 601 |
+
for group in sorted(all_groups):
|
| 602 |
+
demand = primary_group_demand.get(group, 0)
|
| 603 |
+
relief = alternate_relief_by_group.get(group, 0)
|
| 604 |
+
if demand <= 1:
|
| 605 |
+
team_capacity_initial[group] = 1 if group in all_groups else 0
|
| 606 |
+
elif relief > 0:
|
| 607 |
+
team_capacity_initial[group] = max(1, demand - 1)
|
| 608 |
+
else:
|
| 609 |
+
team_capacity_initial[group] = demand
|
| 610 |
+
|
| 611 |
+
if high_priority_demand <= 1:
|
| 612 |
+
high_priority_slots_initial = high_priority_demand
|
| 613 |
+
elif high_priority_relief > 0:
|
| 614 |
+
high_priority_slots_initial = max(1, high_priority_demand - 1)
|
| 615 |
+
else:
|
| 616 |
+
high_priority_slots_initial = high_priority_demand
|
| 617 |
+
|
| 618 |
+
if escalation_demand <= 1:
|
| 619 |
+
escalation_slots_initial = escalation_demand
|
| 620 |
+
elif escalation_relief > 0:
|
| 621 |
+
escalation_slots_initial = max(1, escalation_demand - 1)
|
| 622 |
+
else:
|
| 623 |
+
escalation_slots_initial = escalation_demand
|
| 624 |
+
|
| 625 |
+
return (
|
| 626 |
+
team_capacity_initial,
|
| 627 |
+
high_priority_slots_initial,
|
| 628 |
+
escalation_slots_initial,
|
| 629 |
+
)
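The sizing rule in `_initial_capacity_state_for_queue` can be read on its own: a shared pool is tightened by exactly one slot, and only when at least one ticket in the queue has an alternate route that relieves it. The sketch below restates that rule, assuming demand and relief have already been counted. The function names `initial_shared_slots` and `initial_team_capacity` are hypothetical helpers written for illustration and do not exist in the repository.

```python
def initial_shared_slots(demand: int, relief: int) -> int:
    """Sizing used for the high-priority and escalation pools in the diff."""
    if demand <= 1:
        # Nothing to plan around: grant exactly what the primary routes need.
        return demand
    if relief > 0:
        # At least one ticket can be rerouted, so withhold a single slot.
        return max(1, demand - 1)
    # No ticket can relieve the pool, so the primary routes must all fit.
    return demand


def initial_team_capacity(demand: int, relief: int) -> int:
    """Per-group variant: every known group gets at least one slot."""
    if demand <= 1:
        return 1
    if relief > 0:
        return max(1, demand - 1)
    return demand
```

The only difference between the two variants is the floor: a group that appears only as an alternate destination has zero primary demand but still receives one slot, whereas a shared pool with zero demand stays at zero.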
|
| 630 |
+
|
| 631 |
+
def _future_queue_demand(self) -> dict[str, Any]:
|
| 632 |
+
future_tickets = self._queue[self._state.current_ticket_index + 1 :]
|
| 633 |
+
team_demand: dict[str, int] = {}
|
| 634 |
+
high_priority_needed = 0
|
| 635 |
+
escalation_needed = 0
|
| 636 |
+
capacity_sensitive_tickets = 0
|
| 637 |
+
|
| 638 |
+
for ticket in future_tickets:
|
| 639 |
+
route = self._route_for_ticket(ticket)
|
| 640 |
+
team_demand[route["assignment_group"]] = (
|
| 641 |
+
team_demand.get(route["assignment_group"], 0) + 1
|
| 642 |
+
)
|
| 643 |
+
if route["priority"] in {"high", "critical"}:
|
| 644 |
+
high_priority_needed += 1
|
| 645 |
+
if route["resolution_action"] in {"assign", "escalate"}:
|
| 646 |
+
escalation_needed += 1
|
| 647 |
+
if self._ticket_has_alternate_route(ticket):
|
| 648 |
+
capacity_sensitive_tickets += 1
|
| 649 |
+
|
| 650 |
+
return {
|
| 651 |
+
"remaining_ticket_count": len(future_tickets),
|
| 652 |
+
"team_demand": team_demand,
|
| 653 |
+
"high_priority_needed": high_priority_needed,
|
| 654 |
+
"escalation_needed": escalation_needed,
|
| 655 |
+
"capacity_sensitive_tickets": capacity_sensitive_tickets,
|
| 656 |
+
}
|
| 657 |
+
|
| 658 |
+
def _capacity_state_snapshot(self) -> dict[str, Any]:
|
| 659 |
+
return {
|
| 660 |
+
"team_capacity_remaining": dict(self._state.team_capacity_remaining),
|
| 661 |
+
"team_capacity_initial": dict(self._state.team_capacity_initial),
|
| 662 |
+
"high_priority_slots_remaining": self._state.high_priority_slots_remaining,
|
| 663 |
+
"high_priority_slots_initial": self._state.high_priority_slots_initial,
|
| 664 |
+
"escalation_slots_remaining": self._state.escalation_slots_remaining,
|
| 665 |
+
"escalation_slots_initial": self._state.escalation_slots_initial,
|
| 666 |
+
}
|
| 667 |
+
|
| 668 |
+
def _planning_route_recommendation(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
|
| 669 |
+
primary_route = self._route_for_ticket(ticket)
|
| 670 |
+
alternate_route = (
|
| 671 |
+
self._route_for_ticket(ticket, use_alternate=True)
|
| 672 |
+
if self._ticket_has_alternate_route(ticket)
|
| 673 |
+
else None
|
| 674 |
+
)
|
| 675 |
+
future_demand = self._future_queue_demand()
|
| 676 |
+
capacity_state = self._capacity_state_snapshot()
|
| 677 |
+
|
| 678 |
+
def pressure_score(route: dict[str, str]) -> int:
|
| 679 |
+
cost = self._route_capacity_cost(route)
|
| 680 |
+
group_remaining = capacity_state["team_capacity_remaining"].get(
|
| 681 |
+
route["assignment_group"],
|
| 682 |
+
1,
|
| 683 |
+
)
|
| 684 |
+
group_pressure = max(
|
| 685 |
+
0,
|
| 686 |
+
future_demand["team_demand"].get(route["assignment_group"], 0)
|
| 687 |
+
+ cost["team_slots"]
|
| 688 |
+
- group_remaining,
|
| 689 |
+
)
|
| 690 |
+
priority_pressure = max(
|
| 691 |
+
0,
|
| 692 |
+
future_demand["high_priority_needed"] + cost["high_priority_slots"]
|
| 693 |
+
- capacity_state["high_priority_slots_remaining"],
|
| 694 |
+
)
|
| 695 |
+
escalation_pressure = max(
|
| 696 |
+
0,
|
| 697 |
+
future_demand["escalation_needed"] + cost["escalation_slots"]
|
| 698 |
+
- capacity_state["escalation_slots_remaining"],
|
| 699 |
+
)
|
| 700 |
+
return group_pressure + priority_pressure + escalation_pressure
|
| 701 |
+
|
| 702 |
+
primary_pressure = pressure_score(primary_route)
|
| 703 |
+
alternate_pressure = (
|
| 704 |
+
pressure_score(alternate_route) if alternate_route is not None else primary_pressure
|
| 705 |
+
)
|
| 706 |
+
preferred_label = (
|
| 707 |
+
"alternate"
|
| 708 |
+
if alternate_route is not None and alternate_pressure < primary_pressure
|
| 709 |
+
else "primary"
|
| 710 |
+
)
|
| 711 |
+
return {
|
| 712 |
+
"preferred_label": preferred_label,
|
| 713 |
+
"primary_pressure": primary_pressure,
|
| 714 |
+
"alternate_pressure": alternate_pressure,
|
| 715 |
+
"capacity_state": capacity_state,
|
| 716 |
+
"future_demand": future_demand,
|
| 717 |
+
}
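The pressure heuristic inside `_planning_route_recommendation` can be exercised without the environment. The following is a minimal standalone sketch, assuming plain dictionaries shaped like the outputs of `_capacity_state_snapshot` and `_future_queue_demand`. The free functions `pressure_score` and `preferred_label` are written for illustration only and are not the repository's methods.

```python
from __future__ import annotations


def route_capacity_cost(route: dict[str, str]) -> dict[str, int]:
    """Slots one route consumes, mirroring `_route_capacity_cost` in the diff."""
    return {
        "team_slots": 1,
        "high_priority_slots": 1 if route["priority"] in {"high", "critical"} else 0,
        "escalation_slots": 1 if route["resolution_action"] in {"assign", "escalate"} else 0,
    }


def pressure_score(route: dict[str, str], capacity: dict, future: dict) -> int:
    """Total shortfall if this route is taken and all future primary routes follow."""
    cost = route_capacity_cost(route)
    group = route["assignment_group"]
    # Unknown groups default to one remaining slot, as in the diff.
    group_remaining = capacity["team_capacity_remaining"].get(group, 1)
    group_pressure = max(
        0, future["team_demand"].get(group, 0) + cost["team_slots"] - group_remaining
    )
    priority_pressure = max(
        0,
        future["high_priority_needed"]
        + cost["high_priority_slots"]
        - capacity["high_priority_slots_remaining"],
    )
    escalation_pressure = max(
        0,
        future["escalation_needed"]
        + cost["escalation_slots"]
        - capacity["escalation_slots_remaining"],
    )
    return group_pressure + priority_pressure + escalation_pressure


def preferred_label(primary: dict, alternate: dict | None, capacity: dict, future: dict) -> str:
    """Prefer the alternate only when it strictly lowers pressure."""
    if alternate is None:
        return "primary"
    if pressure_score(alternate, capacity, future) < pressure_score(primary, capacity, future):
        return "alternate"
    return "primary"
```

Because the comparison is strict, a tie keeps the primary route, which matches the `alternate_pressure < primary_pressure` condition in the diff.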
|
| 718 |
+
|
| 719 |
+
def _ticket_is_capacity_sensitive(self, ticket: HelpdeskTicketRecord) -> bool:
|
| 720 |
+
if self._state.current_task_id != 3 or not self._ticket_has_alternate_route(ticket):
|
| 721 |
+
return False
|
| 722 |
+
recommendation = self._planning_route_recommendation(ticket)
|
| 723 |
+
return recommendation["preferred_label"] == "alternate" or any(
|
| 724 |
+
value > 0
|
| 725 |
+
for value in (
|
| 726 |
+
recommendation["primary_pressure"],
|
| 727 |
+
recommendation["alternate_pressure"],
|
| 728 |
+
)
|
| 729 |
+
)
|
| 730 |
+
|
| 731 |
+
def _route_matches_alternate(
|
| 732 |
+
self,
|
| 733 |
+
ticket: HelpdeskTicketRecord,
|
| 734 |
+
route: dict[str, str],
|
| 735 |
+
) -> bool:
|
| 736 |
+
if not self._ticket_has_alternate_route(ticket):
|
| 737 |
+
return False
|
| 738 |
+
return route == self._route_for_ticket(ticket, use_alternate=True)
|
| 739 |
+
|
| 740 |
+
def _apply_capacity_usage(
|
| 741 |
+
self,
|
| 742 |
+
ticket: HelpdeskTicketRecord,
|
| 743 |
+
action: HelpdeskTicketAction,
|
| 744 |
+
) -> tuple[float, dict[str, Any]]:
|
| 745 |
+
if self._state.current_task_id != 3:
|
| 746 |
+
return 0.0, {}
|
| 747 |
+
|
| 748 |
+
route = self._route_for_action(ticket, action)
|
| 749 |
+
capacity_cost = self._route_capacity_cost(route)
|
| 750 |
+
group = str(capacity_cost["assignment_group"])
|
| 751 |
+
|
| 752 |
+
if group not in self._state.team_capacity_remaining:
|
| 753 |
+
self._state.team_capacity_remaining[group] = 1
|
| 754 |
+
self._state.team_capacity_initial.setdefault(group, 1)
|
| 755 |
+
|
| 756 |
+
group_remaining = self._state.team_capacity_remaining[group]
|
| 757 |
+
group_overflow = max(0, int(capacity_cost["team_slots"]) - group_remaining)
|
| 758 |
+
self._state.team_capacity_remaining[group] = max(
|
| 759 |
+
0,
|
| 760 |
+
group_remaining - int(capacity_cost["team_slots"]),
|
| 761 |
+
)
|
| 762 |
+
|
| 763 |
+
high_priority_cost = int(capacity_cost["high_priority_slots"])
|
| 764 |
+
high_priority_overflow = max(
|
| 765 |
+
0,
|
| 766 |
+
high_priority_cost - self._state.high_priority_slots_remaining,
|
| 767 |
+
)
|
| 768 |
+
self._state.high_priority_slots_remaining = max(
|
| 769 |
+
0,
|
| 770 |
+
self._state.high_priority_slots_remaining - high_priority_cost,
|
| 771 |
+
)
|
| 772 |
+
|
| 773 |
+
escalation_cost = int(capacity_cost["escalation_slots"])
|
| 774 |
+
escalation_overflow = max(
|
| 775 |
+
0,
|
| 776 |
+
escalation_cost - self._state.escalation_slots_remaining,
|
| 777 |
+
)
|
| 778 |
+
self._state.escalation_slots_remaining = max(
|
| 779 |
+
0,
|
| 780 |
+
self._state.escalation_slots_remaining - escalation_cost,
|
| 781 |
+
)
|
| 782 |
+
|
| 783 |
+
capacity_penalty = round(
|
| 784 |
+
group_overflow * TEAM_CAPACITY_OVERFLOW_PENALTY
|
| 785 |
+
+ high_priority_overflow * HIGH_PRIORITY_SLOT_OVERFLOW_PENALTY
|
| 786 |
+
+ escalation_overflow * ESCALATION_SLOT_OVERFLOW_PENALTY,
|
| 787 |
+
4,
|
| 788 |
+
)
|
| 789 |
+
self._state.planning_penalty_total = round(
|
| 790 |
+
self._state.planning_penalty_total + capacity_penalty,
|
| 791 |
+
4,
|
| 792 |
+
)
|
| 793 |
+
self._state.planning_penalty_applied = capacity_penalty
|
| 794 |
+
|
| 795 |
+
used_alternate_route = self._route_matches_alternate(ticket, route)
|
| 796 |
+
if used_alternate_route:
|
| 797 |
+
self._state.capacity_pressure_tickets_resolved += 1
|
| 798 |
+
|
| 799 |
+
return capacity_penalty, {
|
| 800 |
+
"capacity_cost": capacity_cost,
|
| 801 |
+
"group_overflow": group_overflow,
|
| 802 |
+
"high_priority_overflow": high_priority_overflow,
|
| 803 |
+
"escalation_overflow": escalation_overflow,
|
| 804 |
+
"used_alternate_route": used_alternate_route,
|
| 805 |
+
"capacity_state_after_action": self._capacity_state_snapshot(),
|
| 806 |
+
}
|
| 807 |
+
|
| 808 |
+
def _planning_success_bonus(self) -> float:
|
| 809 |
+
if self._state.current_task_id != 3 or self._state.planning_penalty_total > 0.0:
|
| 810 |
+
return 0.0
|
| 811 |
+
capacity_sensitive_count = sum(
|
| 812 |
+
1 for ticket in self._queue if self._ticket_has_alternate_route(ticket)
|
| 813 |
+
)
|
| 814 |
+
if capacity_sensitive_count == 0:
|
| 815 |
+
return 0.0
|
| 816 |
+
coverage = min(
|
| 817 |
+
1.0,
|
| 818 |
+
self._state.capacity_pressure_tickets_resolved / capacity_sensitive_count,
|
| 819 |
+
)
|
| 820 |
+
return round(PLANNING_SUCCESS_BONUS * coverage, 4)
|
| 821 |
+
|
| 822 |
def _internal_routing_note_for_ticket(
|
| 823 |
self,
|
| 824 |
ticket: HelpdeskTicketRecord,
|
| 825 |
) -> str | None:
|
| 826 |
if self._state.current_task_id != 3:
|
| 827 |
+
return ticket.ambiguity_note or ticket.planning_note
|
| 828 |
+
|
| 829 |
+
note_parts: list[str] = []
|
| 830 |
+
if ticket.ambiguity_note is not None:
|
| 831 |
+
note_parts.append(ticket.ambiguity_note)
|
| 832 |
+
if ticket.planning_note is not None:
|
| 833 |
+
note_parts.append(ticket.planning_note)
|
| 834 |
|
| 835 |
default_group = ISSUE_TYPE_TO_ASSIGNMENT_GROUP.get(
|
| 836 |
ticket.issue_type,
|
|
|
|
| 840 |
ticket.issue_type,
|
| 841 |
ticket.resolution_action,
|
| 842 |
)
|
|
|
|
| 843 |
|
| 844 |
if ticket.assignment_group != default_group:
|
| 845 |
note_parts.append(
|
|
|
|
| 914 |
return self._ticket_repeated_requester_count(ticket) >= 2
|
| 915 |
if tool_name == "lookup_internal_routing_note":
|
| 916 |
return self._internal_routing_note_for_ticket(ticket) is not None
|
| 917 |
+
if tool_name == "lookup_queue_capacity_forecast":
|
| 918 |
+
return self._state.current_task_id == 3 and (
|
| 919 |
+
self._ticket_has_alternate_route(ticket)
|
| 920 |
+
or self._future_queue_demand()["remaining_ticket_count"] > 0
|
| 921 |
+
)
|
| 922 |
return False
|
| 923 |
|
| 924 |
def _required_tools_for_ticket(
|
|
|
|
| 948 |
and "lookup_requester_history" not in required_tools
|
| 949 |
):
|
| 950 |
required_tools.append("lookup_requester_history")
|
| 951 |
+
if (
|
| 952 |
+
self._ticket_is_capacity_sensitive(ticket)
|
| 953 |
+
and "lookup_queue_capacity_forecast" not in required_tools
|
| 954 |
+
):
|
| 955 |
+
required_tools.append("lookup_queue_capacity_forecast")
|
| 956 |
filtered_required_tools: list[str] = []
|
| 957 |
for tool_name in required_tools:
|
| 958 |
if tool_name in filtered_required_tools:
|
|
|
|
| 1003 |
"The visible request is not enough to choose the final owner and next step. "
|
| 1004 |
"Additional routing context is available via investigation."
|
| 1005 |
)
|
| 1006 |
+
if self._ticket_has_alternate_route(ticket):
|
| 1007 |
+
return (
|
| 1008 |
+
"The queue is under resource pressure and this ticket may support more than "
|
| 1009 |
+
"one acceptable routing path. Additional planning context is available via investigation."
|
| 1010 |
+
)
|
| 1011 |
if self._ticket_has_nondefault_routing(ticket):
|
| 1012 |
return (
|
| 1013 |
"The visible request looks straightforward, but the decisive routing detail is hidden until investigation."
|
|
|
|
| 1021 |
return "Follow-up request with hidden routing context"
|
| 1022 |
if self._internal_routing_note_for_ticket(ticket) is not None:
|
| 1023 |
return "Routing clarification required"
|
| 1024 |
+
if self._ticket_has_alternate_route(ticket):
|
| 1025 |
+
return "Capacity-sensitive routing decision"
|
| 1026 |
if self._ticket_mentions_follow_up(ticket):
|
| 1027 |
return "Priority support follow-up"
|
| 1028 |
return "Helpdesk routing decision"
|
|
|
|
| 1219 |
"routing_note": routing_note if found else "",
|
| 1220 |
}
|
| 1221 |
|
| 1222 |
+
def _lookup_queue_capacity_forecast(
|
| 1223 |
+
self,
|
| 1224 |
+
current_ticket: HelpdeskTicketRecord,
|
| 1225 |
+
) -> dict[str, Any]:
|
| 1226 |
+
recommendation = self._planning_route_recommendation(current_ticket)
|
| 1227 |
+
routing_options = self._routing_options_for_ticket(current_ticket)
|
| 1228 |
+
return {
|
| 1229 |
+
"tool_name": "lookup_queue_capacity_forecast",
|
| 1230 |
+
"found": True,
|
| 1231 |
+
"ticket_id": current_ticket.ticket_id,
|
| 1232 |
+
"preferred_route_label": recommendation["preferred_label"],
|
| 1233 |
+
"primary_pressure": recommendation["primary_pressure"],
|
| 1234 |
+
"alternate_pressure": recommendation["alternate_pressure"],
|
| 1235 |
+
"capacity_state": recommendation["capacity_state"],
|
| 1236 |
+
"future_queue_demand": recommendation["future_demand"],
|
| 1237 |
+
"routing_options": routing_options,
|
| 1238 |
+
}
|
| 1239 |
+
|
| 1240 |
def _run_investigation_tool(
|
| 1241 |
self,
|
| 1242 |
current_ticket: HelpdeskTicketRecord,
|
|
|
|
| 1249 |
return self._lookup_requester_history(current_ticket)
|
| 1250 |
if tool_name == "lookup_internal_routing_note":
|
| 1251 |
return self._lookup_internal_routing_note(current_ticket)
|
| 1252 |
+
if tool_name == "lookup_queue_capacity_forecast":
|
| 1253 |
+
return self._lookup_queue_capacity_forecast(current_ticket)
|
| 1254 |
raise ValueError(f"Unsupported tool_name: {tool_name}")
|
| 1255 |
|
| 1256 |
def _handle_investigation_action(
|
|
|
|
| 1335 |
def _build_ticket_view(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
|
| 1336 |
progress = self._tool_progress_for_ticket(ticket)
|
| 1337 |
remaining_tools = progress["remaining_tools"]
|
| 1338 |
+
used_tools = set(self._used_tools_for_ticket(ticket.ticket_id))
|
| 1339 |
ticket_view: dict[str, Any] = {
|
| 1340 |
"ticket_id": ticket.ticket_id,
|
| 1341 |
"title": self._visible_title(ticket),
|
| 1342 |
"requester": ticket.requester,
|
| 1343 |
"description": self._visible_description(ticket),
|
| 1344 |
}
|
| 1345 |
+
if self._state.current_task_id == 3:
|
| 1346 |
+
ticket_view["capacity_state"] = self._capacity_state_snapshot()
|
| 1347 |
if progress["required_tools"]:
|
| 1348 |
ticket_view["context_status"] = {
|
| 1349 |
"investigation_required": True,
|
|
|
|
| 1356 |
}
|
| 1357 |
if ticket.ambiguity_note is not None and "lookup_internal_routing_note" not in remaining_tools:
|
| 1358 |
ticket_view["ambiguity_note"] = ticket.ambiguity_note
|
| 1359 |
+
if (
|
| 1360 |
+
ticket.planning_note is not None
|
| 1361 |
+
and "lookup_internal_routing_note" not in remaining_tools
|
| 1362 |
+
):
|
| 1363 |
+
ticket_view["planning_note"] = ticket.planning_note
|
| 1364 |
if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
|
| 1365 |
ticket_view["related_ticket_id"] = ticket.related_ticket_id
|
| 1366 |
related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
|
|
|
|
| 1371 |
"requester": related_ticket.requester,
|
| 1372 |
"description": related_ticket.description,
|
| 1373 |
}
|
| 1374 |
+
if self._ticket_has_alternate_route(ticket) and (
|
| 1375 |
+
"lookup_internal_routing_note" in used_tools
|
| 1376 |
+
or "lookup_queue_capacity_forecast" in used_tools
|
| 1377 |
+
):
|
| 1378 |
+
ticket_view["routing_options"] = self._routing_options_for_ticket(ticket)
|
| 1379 |
return ticket_view
|
| 1380 |
|
| 1381 |
def _build_feedback_summary(
|
|
|
|
| 1429 |
risk_penalty = reward_components.get("risk_penalty")
|
| 1430 |
if risk_penalty:
|
| 1431 |
parts.append(f"risk_penalty={risk_penalty:.2f}")
|
| 1432 |
+
capacity_penalty = reward_components.get("capacity_penalty")
|
| 1433 |
+
if capacity_penalty:
|
| 1434 |
+
parts.append(f"capacity_penalty={capacity_penalty:.2f}")
|
| 1435 |
+
planning_penalty_total = reward_components.get("planning_penalty_total")
|
| 1436 |
+
if planning_penalty_total:
|
| 1437 |
+
parts.append(f"planning_penalty_total={planning_penalty_total:.2f}")
|
| 1438 |
|
| 1439 |
return "; ".join(parts)
|
| 1440 |
|
|
|
|
| 1464 |
"breakdown": breakdown,
|
| 1465 |
"queue_position": queue_position,
|
| 1466 |
}
|
| 1467 |
+
if self._state.current_task_id == 3:
|
| 1468 |
+
history_entry["capacity_state"] = self._capacity_state_snapshot()
|
| 1469 |
if reward is not None:
|
| 1470 |
history_entry["reward"] = reward
|
| 1471 |
if rubric_reward is not None:
|
|
|
|
| 1474 |
history_entry["reward_kind"] = reward_kind
|
| 1475 |
if ticket.ambiguity_note is not None and "lookup_internal_routing_note" not in remaining_tools:
|
| 1476 |
history_entry["ambiguity_note"] = ticket.ambiguity_note
|
| 1477 |
+
if (
|
| 1478 |
+
ticket.planning_note is not None
|
| 1479 |
+
and "lookup_internal_routing_note" not in remaining_tools
|
| 1480 |
+
):
|
| 1481 |
+
history_entry["planning_note"] = ticket.planning_note
|
| 1482 |
if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
|
| 1483 |
history_entry["related_ticket_id"] = ticket.related_ticket_id
|
| 1484 |
related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
|
|
|
|
| 1489 |
"requester": related_ticket.requester,
|
| 1490 |
"description": related_ticket.description,
|
| 1491 |
}
|
| 1492 |
+
if (
|
| 1493 |
+
self._ticket_has_alternate_route(ticket)
|
| 1494 |
+
and (
|
| 1495 |
+
"lookup_internal_routing_note" not in remaining_tools
|
| 1496 |
+
or "lookup_queue_capacity_forecast" in self._used_tools_for_ticket(ticket.ticket_id)
|
| 1497 |
+
)
|
| 1498 |
+
):
|
| 1499 |
+
history_entry["routing_options"] = self._routing_options_for_ticket(ticket)
|
| 1500 |
if penalty_reason is not None:
|
| 1501 |
history_entry["penalty_reason"] = penalty_reason
|
| 1502 |
if tool_result is not None:
|
|
|
|
| 1566 |
"average_score_so_far": self._state.average_score_so_far,
|
| 1567 |
"progress_fraction": progress_fraction,
|
| 1568 |
"investigation_penalty_applied": self._state.investigation_penalty_applied,
|
| 1569 |
+
"planning_penalty_total": self._state.planning_penalty_total,
|
| 1570 |
+
"planning_penalty_applied": self._state.planning_penalty_applied,
|
| 1571 |
}
|
| 1572 |
+
if self._state.current_task_id == 3:
|
| 1573 |
+
metadata["capacity_state"] = self._capacity_state_snapshot()
|
| 1574 |
+
metadata["future_queue_demand"] = self._future_queue_demand()
|
| 1575 |
if last_history_entry is not None:
|
| 1576 |
metadata["last_score"] = last_history_entry.get("score")
|
| 1577 |
metadata["last_reward"] = last_history_entry.get("reward")
|
server/grader.py
CHANGED
|
@@ -2,9 +2,6 @@ from __future__ import annotations
|
|
| 2 |
|
| 3 |
from models import HelpdeskTicketAction, HelpdeskTicketRecord
|
| 4 |
|
| 5 |
-
TASK_SCORE_EPSILON = 0.01
|
| 6 |
-
|
| 7 |
-
|
| 8 |
ISSUE_TYPE_SIMILARITY = {
|
| 9 |
("billing_license", "service_request"): 0.4,
|
| 10 |
("service_request", "billing_license"): 0.4,
|
|
@@ -120,31 +117,90 @@ def _score_exact(predicted: str | None, expected: str) -> float:
|
|
| 120 |
return 1.0 if _normalized(predicted) == _normalized(expected) and predicted else 0.0
|
| 121 |
|
| 122 |
|
| 123 |
-
def
|
| 124 |
action: HelpdeskTicketAction,
|
| 125 |
-
|
| 126 |
task_id: int,
|
| 127 |
) -> tuple[float, dict[str, float]]:
|
| 128 |
-
if task_id not in TASK_WEIGHTS:
|
| 129 |
-
raise ValueError(f"Unsupported task_id: {task_id}")
|
| 130 |
-
|
| 131 |
field_scores = {
|
| 132 |
-
"issue_type": _score_exact_or_similar(action.issue_type,
|
| 133 |
-
"priority": _score_priority(action.priority,
|
| 134 |
"assignment_group": _score_exact_or_table(
|
| 135 |
action.assignment_group,
|
| 136 |
-
|
| 137 |
ASSIGNMENT_GROUP_SIMILARITY,
|
| 138 |
),
|
| 139 |
"resolution_action": _score_exact_or_table(
|
| 140 |
action.resolution_action,
|
| 141 |
-
|
| 142 |
RESOLUTION_ACTION_SIMILARITY,
|
| 143 |
),
|
| 144 |
}
|
| 145 |
-
|
| 146 |
weights = TASK_WEIGHTS[task_id]
|
| 147 |
raw_score = sum(field_scores[field] * weight for field, weight in weights.items())
|
| 148 |
-
|
| 149 |
-
|
| 150 |
return score, breakdown
|
| 2 |
|
| 3 |
from models import HelpdeskTicketAction, HelpdeskTicketRecord
|
| 4 |
|
| 5 |
ISSUE_TYPE_SIMILARITY = {
|
| 6 |
("billing_license", "service_request"): 0.4,
|
| 7 |
("service_request", "billing_license"): 0.4,
|
|
|
|
| 117 |
return 1.0 if _normalized(predicted) == _normalized(expected) and predicted else 0.0
|
| 118 |
|
| 119 |
|
| 120 |
+
def _score_route(
|
| 121 |
action: HelpdeskTicketAction,
|
| 122 |
+
*,
|
| 123 |
+
issue_type: str,
|
| 124 |
+
priority: str,
|
| 125 |
+
assignment_group: str,
|
| 126 |
+
resolution_action: str,
|
| 127 |
+
score_multiplier: float,
|
| 128 |
task_id: int,
|
| 129 |
) -> tuple[float, dict[str, float]]:
|
| 130 |
field_scores = {
|
| 131 |
+
"issue_type": _score_exact_or_similar(action.issue_type, issue_type),
|
| 132 |
+
"priority": _score_priority(action.priority, priority),
|
| 133 |
"assignment_group": _score_exact_or_table(
|
| 134 |
action.assignment_group,
|
| 135 |
+
assignment_group,
|
| 136 |
ASSIGNMENT_GROUP_SIMILARITY,
|
| 137 |
),
|
| 138 |
"resolution_action": _score_exact_or_table(
|
| 139 |
action.resolution_action,
|
| 140 |
+
resolution_action,
|
| 141 |
RESOLUTION_ACTION_SIMILARITY,
|
| 142 |
),
|
| 143 |
}
|
| 144 |
+
if score_multiplier != 1.0:
|
| 145 |
+
field_scores = {
|
| 146 |
+
field: round(score * score_multiplier, 4)
|
| 147 |
+
for field, score in field_scores.items()
|
| 148 |
+
}
|
| 149 |
weights = TASK_WEIGHTS[task_id]
|
| 150 |
raw_score = sum(field_scores[field] * weight for field, weight in weights.items())
|
| 151 |
+
return raw_score, field_scores
|
| 152 |
+
|
| 153 |
+
|
| 154 |
+
def _alternate_route_available(ticket: HelpdeskTicketRecord) -> bool:
|
| 155 |
+
return any(
|
| 156 |
+
value is not None
|
| 157 |
+
for value in (
|
| 158 |
+
ticket.alternate_issue_type,
|
| 159 |
+
ticket.alternate_priority,
|
| 160 |
+
ticket.alternate_assignment_group,
|
| 161 |
+
ticket.alternate_resolution_action,
|
| 162 |
+
)
|
| 163 |
+
) and ticket.alternate_route_score_multiplier > 0.0
|
| 164 |
+
|
| 165 |
+
|
| 166 |
+
def grade_action(
|
| 167 |
+
action: HelpdeskTicketAction,
|
| 168 |
+
ticket: HelpdeskTicketRecord,
|
| 169 |
+
task_id: int,
|
| 170 |
+
) -> tuple[float, dict[str, float]]:
|
| 171 |
+
if task_id not in TASK_WEIGHTS:
|
| 172 |
+
raise ValueError(f"Unsupported task_id: {task_id}")
|
| 173 |
+
|
| 174 |
+
primary_score, primary_field_scores = _score_route(
|
| 175 |
+
action,
|
| 176 |
+
issue_type=ticket.issue_type,
|
| 177 |
+
priority=ticket.priority,
|
| 178 |
+
assignment_group=ticket.assignment_group,
|
| 179 |
+
resolution_action=ticket.resolution_action,
|
| 180 |
+
score_multiplier=1.0,
|
| 181 |
+
task_id=task_id,
|
| 182 |
+
)
|
| 183 |
+
chosen_score = primary_score
|
| 184 |
+
chosen_field_scores = primary_field_scores
|
| 185 |
+
|
| 186 |
+
if _alternate_route_available(ticket):
|
| 187 |
+
alternate_score, alternate_field_scores = _score_route(
|
| 188 |
+
action,
|
| 189 |
+
issue_type=ticket.alternate_issue_type or ticket.issue_type,
|
| 190 |
+
priority=ticket.alternate_priority or ticket.priority,
|
| 191 |
+
assignment_group=(
|
| 192 |
+
ticket.alternate_assignment_group or ticket.assignment_group
|
| 193 |
+
),
|
| 194 |
+
resolution_action=(
|
| 195 |
+
ticket.alternate_resolution_action or ticket.resolution_action
|
| 196 |
+
),
|
| 197 |
+
score_multiplier=ticket.alternate_route_score_multiplier,
|
| 198 |
+
task_id=task_id,
|
| 199 |
+
)
|
| 200 |
+
if alternate_score > chosen_score:
|
| 201 |
+
chosen_score = alternate_score
|
| 202 |
+
chosen_field_scores = alternate_field_scores
|
| 203 |
+
|
| 204 |
+
score = max(0.0, min(1.0, chosen_score))
|
| 205 |
+
breakdown = {field: chosen_field_scores[field] for field in TASK_WEIGHTS[task_id]}
|
| 206 |
return score, breakdown
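
The alternate-route grading added to `server/grader.py` above can be illustrated with a minimal standalone sketch. It assumes exact-match field scoring and equal field weights; the real `TASK_WEIGHTS` values and similarity tables are not shown in this diff, and the real code also rounds the scaled field scores. `WEIGHTS`, `Ticket`, `score_route`, and `grade` are hypothetical stand-ins, not the repository's API.

```python
from dataclasses import dataclass
from typing import Optional

FIELDS = ("issue_type", "priority", "assignment_group", "resolution_action")
# Assumption: equal weights. The repository's TASK_WEIGHTS may differ per task.
WEIGHTS = {field: 0.25 for field in FIELDS}


@dataclass
class Ticket:
    issue_type: str
    priority: str
    assignment_group: str
    resolution_action: str
    alternate_issue_type: Optional[str] = None
    alternate_priority: Optional[str] = None
    alternate_assignment_group: Optional[str] = None
    alternate_resolution_action: Optional[str] = None
    alternate_route_score_multiplier: float = 0.0


def score_route(action: dict, expected: dict, multiplier: float) -> float:
    # Exact match only; the real grader also gives partial credit.
    return sum(
        WEIGHTS[f] * multiplier * (1.0 if action.get(f) == expected[f] else 0.0)
        for f in FIELDS
    )


def grade(action: dict, ticket: Ticket) -> float:
    primary = {f: getattr(ticket, f) for f in FIELDS}
    best = score_route(action, primary, 1.0)

    alternates = {f: getattr(ticket, f"alternate_{f}") for f in FIELDS}
    has_alternate = any(value is not None for value in alternates.values())
    if has_alternate and ticket.alternate_route_score_multiplier > 0.0:
        # Unset alternate fields fall back to the primary expectation.
        fallback = {f: alternates[f] or primary[f] for f in FIELDS}
        best = max(
            best,
            score_route(action, fallback, ticket.alternate_route_score_multiplier),
        )
    return max(0.0, min(1.0, best))


# Field values copied from ticket-056 in this diff.
ticket = Ticket(
    issue_type="general_inquiry",
    priority="medium",
    assignment_group="procurement",
    resolution_action="assign",
    alternate_assignment_group="service_desk",
    alternate_resolution_action="acknowledge",
    alternate_route_score_multiplier=0.9,
)
primary_action = {
    "issue_type": "general_inquiry",
    "priority": "medium",
    "assignment_group": "procurement",
    "resolution_action": "assign",
}
fallback_action = {**primary_action, "assignment_group": "service_desk",
                   "resolution_action": "acknowledge"}
```

Under these assumptions the preferred route grades at full credit, while the fallback route is capped by the multiplier (about 0.9 here) instead of being scored as a half-wrong primary route. Taking the higher of the two route scores means an alternate can never lower a ticket's score.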
|
server/reward.py
CHANGED
|
@@ -8,15 +8,14 @@ DELTA_REWARD_WEIGHT = 0.08
|
|
| 8 |
DELTA_REWARD_CAP = 0.04
|
| 9 |
PROCESS_BONUS_CAP = 0.08
|
| 10 |
RISK_PENALTY_CAP = 0.12
|
| 11 |
-
OPEN_INTERVAL_EPSILON = 0.01
|
| 12 |
|
| 13 |
|
| 14 |
def _clamp_unit_interval(value: float) -> float:
|
| 15 |
return max(0.0, min(1.0, value))
|
| 16 |
|
| 17 |
|
| 18 |
-
def clamp_open_unit_interval(value: float, epsilon: float =
|
| 19 |
-
return
|
| 20 |
|
| 21 |
|
| 22 |
def compute_step_adjustments(
|
|
@@ -93,7 +92,7 @@ def compute_trajectory_adjustments(
|
|
| 93 |
avg = sum(per_ticket_scores) / len(per_ticket_scores)
|
| 94 |
bounded_completion_bonus = max(0.0, min(0.08, completion_bonus))
|
| 95 |
bounded_consistency_bonus = max(0.0, min(0.05, consistency_bonus))
|
| 96 |
-
final_reward =
|
| 97 |
avg + bounded_completion_bonus + bounded_consistency_bonus
|
| 98 |
)
|
| 99 |
return {
|
|
|
|
| 8 |
DELTA_REWARD_CAP = 0.04
|
| 9 |
PROCESS_BONUS_CAP = 0.08
|
| 10 |
RISK_PENALTY_CAP = 0.12
|
|
|
|
| 11 |
|
| 12 |
|
| 13 |
def _clamp_unit_interval(value: float) -> float:
|
| 14 |
return max(0.0, min(1.0, value))
|
| 15 |
|
| 16 |
|
| 17 |
+
def clamp_open_unit_interval(value: float, epsilon: float = 0.0) -> float:
|
| 18 |
+
return _clamp_unit_interval(value)
|
| 19 |
|
| 20 |
|
| 21 |
def compute_step_adjustments(
|
|
|
|
| 92 |
avg = sum(per_ticket_scores) / len(per_ticket_scores)
|
| 93 |
bounded_completion_bonus = max(0.0, min(0.08, completion_bonus))
|
| 94 |
bounded_consistency_bonus = max(0.0, min(0.05, consistency_bonus))
|
| 95 |
+
final_reward = _clamp_unit_interval(
|
| 96 |
avg + bounded_completion_bonus + bounded_consistency_bonus
|
| 97 |
)
|
| 98 |
return {
|
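
The `server/reward.py` change above drops the epsilon constant and makes `clamp_open_unit_interval` delegate to the closed-interval clamp. A minimal sketch of the resulting trajectory arithmetic follows, using the bonus caps visible in the hunk (0.08 and 0.05). The function name `final_trajectory_reward` is hypothetical, and the real `compute_trajectory_adjustments` returns a dictionary with more fields than shown here.

```python
def clamp_unit_interval(value: float) -> float:
    # Closed interval: 0.0 and 1.0 are both reachable.
    return max(0.0, min(1.0, value))


def final_trajectory_reward(
    per_ticket_scores: list[float],
    completion_bonus: float,
    consistency_bonus: float,
) -> float:
    # Hypothetical helper mirroring the lines shown in the diff.
    avg = sum(per_ticket_scores) / len(per_ticket_scores)
    bounded_completion_bonus = max(0.0, min(0.08, completion_bonus))
    bounded_consistency_bonus = max(0.0, min(0.05, consistency_bonus))
    return clamp_unit_interval(
        avg + bounded_completion_bonus + bounded_consistency_bonus
    )
```

With this clamp a perfect queue plus bonuses saturates at exactly 1.0, and an all-zero queue with no bonuses yields exactly 0.0. That matches the updated tests in this commit, which assert `reward < 1.0` for penalized steps.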
server/tasks.py
CHANGED
|
@@ -36,10 +36,13 @@ TASKS = {
|
|
| 36 |
"instructions": (
|
| 37 |
"Perform full helpdesk routing by selecting the best issue type, "
|
| 38 |
"priority, assignment group, and resolution action for the ticket. "
|
| 39 |
-
"Use any ambiguity notes
|
|
|
|
| 40 |
"Some hard tickets intentionally hide decisive routing context until "
|
| 41 |
-
"you investigate with the available tools,
|
| 42 |
-
"
|
|
|
|
|
|
|
| 43 |
),
|
| 44 |
"allowed_fields": [
|
| 45 |
"issue_type",
|
|
@@ -50,6 +53,379 @@ TASKS = {
|
|
| 50 |
},
|
| 51 |
}
|
| 52 |
|
| 53 |
assert tuple(TASKS.keys()) == TASK_IDS
|
| 54 |
|
| 55 |
|
|
@@ -58,7 +434,8 @@ def load_dataset() -> list[HelpdeskTicketRecord]:
|
|
| 58 |
# Accept UTF-8 files saved with a BOM, which is common on Windows editors.
|
| 59 |
with dataset_path.open("r", encoding="utf-8-sig") as f:
|
| 60 |
raw = json.load(f)
|
| 61 |
-
|
|
|
|
| 62 |
|
| 63 |
|
| 64 |
def get_task_definition(task_id: int) -> dict:
|
|
|
|
| 36 |
"instructions": (
|
| 37 |
"Perform full helpdesk routing by selecting the best issue type, "
|
| 38 |
"priority, assignment group, and resolution action for the ticket. "
|
| 39 |
+
"Use any ambiguity notes, related-ticket previews, queue-capacity "
|
| 40 |
+
"forecasts, and planning state when present. "
|
| 41 |
"Some hard tickets intentionally hide decisive routing context until "
|
| 42 |
+
"you investigate with the available tools, and some hard episodes also "
|
| 43 |
+
"require queue-level capacity planning across multiple tickets, so "
|
| 44 |
+
"premature or resource-greedy routing can underperform even when the "
|
| 45 |
+
"visible text looks plausible."
|
| 46 |
),
|
| 47 |
"allowed_fields": [
|
| 48 |
"issue_type",
|
|
|
|
| 53 |
},
|
| 54 |
}
|
| 55 |
|
| 56 |
+
|
| 57 |
+
PLANNING_ROUTE_UPDATES: dict[str, dict] = {
|
| 58 |
+
"ticket-022": {
|
| 59 |
+
"planning_note": (
|
| 60 |
+
"If the application queue is saturated, billing operations can own the "
|
| 61 |
+
"customer-facing charge review as a lower-fidelity fallback while the bug "
|
| 62 |
+
"investigation continues separately."
|
| 63 |
+
),
|
| 64 |
+
"alternate_issue_type": "billing_license",
|
| 65 |
+
"alternate_assignment_group": "license_ops",
|
| 66 |
+
"alternate_resolution_action": "assign",
|
| 67 |
+
"alternate_route_score_multiplier": 0.74,
|
| 68 |
+
},
|
| 69 |
+
"ticket-027": {
|
| 70 |
+
"planning_note": (
|
| 71 |
+
"If procurement capacity is available, treat this like a commercial review. "
|
| 72 |
+
"If not, a lightweight service-desk acknowledgement is still acceptable."
|
| 73 |
+
),
|
| 74 |
+
"alternate_issue_type": "service_request",
|
| 75 |
+
"alternate_priority": "medium",
|
| 76 |
+
"alternate_assignment_group": "procurement",
|
| 77 |
+
"alternate_resolution_action": "assign",
|
| 78 |
+
"alternate_route_score_multiplier": 0.92,
|
| 79 |
+
},
|
| 80 |
+
"ticket-029": {
|
| 81 |
+
"planning_note": (
|
| 82 |
+
"Seat expansion is the preferred route, but license operations can still "
|
| 83 |
+
"handle the prorating clarification when procurement is the bottleneck."
|
| 84 |
+
),
|
| 85 |
+
"alternate_issue_type": "billing_license",
|
| 86 |
+
"alternate_assignment_group": "license_ops",
|
| 87 |
+
"alternate_resolution_action": "fulfill",
|
| 88 |
+
"alternate_route_score_multiplier": 0.82,
|
| 89 |
+
},
|
| 90 |
+
"ticket-040": {
|
| 91 |
+
"planning_note": (
|
| 92 |
+
"The request can be treated either as roadmap feedback or as a support "
|
| 93 |
+
"escalation if the operational impact is emphasized."
|
| 94 |
+
),
|
| 95 |
+
"alternate_issue_type": "application_support",
|
| 96 |
+
"alternate_priority": "high",
|
| 97 |
+
"alternate_resolution_action": "escalate",
|
| 98 |
+
"alternate_route_score_multiplier": 0.76,
|
| 99 |
+
},
|
| 100 |
+
"ticket-047": {
|
| 101 |
+
"planning_note": (
|
| 102 |
+
"The preferred route is an immediate service-desk extension, but the "
|
| 103 |
+
"commercial owner can take it if operational fulfillment capacity is exhausted."
|
| 104 |
+
),
|
| 105 |
+
"alternate_assignment_group": "procurement",
|
| 106 |
+
"alternate_resolution_action": "assign",
|
| 107 |
+
"alternate_route_score_multiplier": 0.78,
|
| 108 |
+
},
|
| 109 |
+
"ticket-048": {
|
| 110 |
+
"planning_note": (
|
| 111 |
+
"This belongs with procurement when commercial reviewers are available, "
|
| 112 |
+
"but a generic service-desk acknowledgement is an acceptable fallback."
|
| 113 |
+
),
|
| 114 |
+
"alternate_assignment_group": "service_desk",
|
| 115 |
+
"alternate_resolution_action": "acknowledge",
|
| 116 |
+
"alternate_route_score_multiplier": 0.9,
|
| 117 |
+
},
|
| 118 |
+
"ticket-050": {
|
| 119 |
+
"planning_note": (
|
| 120 |
+
"Central coordination is preferred. If service-desk capacity is depleted, "
|
| 121 |
+
"onboarding operations can still run a reduced fulfillment path."
|
| 122 |
+
),
|
| 123 |
+
"alternate_priority": "medium",
|
| 124 |
+
"alternate_assignment_group": "onboarding_ops",
|
| 125 |
+
"alternate_resolution_action": "fulfill",
|
| 126 |
+
"alternate_route_score_multiplier": 0.84,
|
| 127 |
+
},
|
| 128 |
+
"ticket-051": {
|
| 129 |
+
"planning_note": (
|
| 130 |
+
"Commercial procurement owns the contract amendment, but this can also "
|
| 131 |
+
"be treated as a service request when the commercial queue needs triage."
|
| 132 |
+
),
|
| 133 |
+
"alternate_issue_type": "service_request",
|
| 134 |
+
"alternate_route_score_multiplier": 0.83,
|
| 135 |
+
},
|
| 136 |
+
"ticket-053": {
|
| 137 |
+
"planning_note": (
|
| 138 |
+
"Security scheduling is ideal, but a compliance acknowledgement is still "
|
| 139 |
+
"acceptable when the security team only needs to confirm the process."
|
| 140 |
+
),
|
| 141 |
+
"alternate_issue_type": "security_compliance",
|
| 142 |
+
"alternate_resolution_action": "acknowledge",
|
| 143 |
+
"alternate_route_score_multiplier": 0.8,
|
| 144 |
+
},
|
| 145 |
+
"ticket-054": {
|
| 146 |
+
"planning_note": (
|
| 147 |
+
"License operations can fulfill the archive request directly. If that queue "
|
| 148 |
+
"is saturated, service desk can acknowledge and queue the retrieval."
|
| 149 |
+
),
|
| 150 |
+
"alternate_assignment_group": "service_desk",
|
| 151 |
+
"alternate_resolution_action": "acknowledge",
|
| 152 |
+
"alternate_route_score_multiplier": 0.9,
|
| 153 |
+
},
|
| 154 |
+
}
|
| 155 |
+
|
| 156 |
+
|
| 157 |
+
CURATED_EXPANSION_RECORDS: list[dict] = [
|
| 158 |
+
{
|
| 159 |
+
"ticket_id": "ticket-056",
|
| 160 |
+
"title": "Vendor DPA redlines need an owner before pricing sign-off",
|
| 161 |
+
"requester": "procurement@harborcompliance.io",
|
| 162 |
+
"description": (
|
| 163 |
+
"Commercial review is already moving, but the team needs to know who owns "
|
| 164 |
+
"the vendor DPA redlines before pricing can be approved."
|
| 165 |
+
),
|
| 166 |
+
"issue_type": "general_inquiry",
|
| 167 |
+
"priority": "medium",
|
| 168 |
+
"assignment_group": "procurement",
|
| 169 |
+
"resolution_action": "assign",
|
| 170 |
+
"planning_note": (
|
| 171 |
+
"Procurement is preferred, but service desk can acknowledge and route the "
|
| 172 |
+
"questionnaire logistics if the commercial queue is saturated."
|
| 173 |
+
),
|
| 174 |
+
"alternate_assignment_group": "service_desk",
|
| 175 |
+
"alternate_resolution_action": "acknowledge",
|
| 176 |
+
"alternate_route_score_multiplier": 0.9,
|
| 177 |
+
},
|
| 178 |
+
{
|
| 179 |
+
"ticket_id": "ticket-057",
|
| 180 |
+
"title": "Board audit packet needs a timeline for the privileged-account lockout",
|
| 181 |
+
"requester": "security-ops@atlasbank.io",
|
| 182 |
+
"description": (
|
| 183 |
+
"Following up on ticket-046. The board pack needs a timeline and ownership "
|
| 184 |
+
"summary for the privileged admin lockout before tomorrow morning."
|
| 185 |
+
),
|
| 186 |
+
"issue_type": "identity_access",
|
| 187 |
+
"priority": "high",
|
| 188 |
+
"assignment_group": "security_team",
|
| 189 |
+
"resolution_action": "escalate",
|
| 190 |
+
"related_ticket_id": "ticket-046",
|
| 191 |
+
"planning_note": (
|
| 192 |
+
"Security still owns the privileged-access review, but service desk can "
|
| 193 |
+
"collect chronology and prepare the packet if the security queue is jammed."
|
| 194 |
+
),
|
| 195 |
+
"alternate_assignment_group": "service_desk",
|
| 196 |
+
"alternate_resolution_action": "assign",
|
| 197 |
+
"alternate_route_score_multiplier": 0.72,
|
| 198 |
+
},
|
| 199 |
+
{
|
| 200 |
+
"ticket_id": "ticket-058",
|
| 201 |
+
"title": "Temporary contractor extension during an onboarding surge",
|
| 202 |
+
"requester": "hr@talentbridge.co",
|
| 203 |
+
"description": (
|
| 204 |
+
"A contractor start date slipped by two weeks and the account needs to stay "
|
| 205 |
+
"active while the onboarding backlog is already full."
|
| 206 |
+
),
|
| 207 |
+
"issue_type": "onboarding",
|
| 208 |
+
"priority": "medium",
|
| 209 |
+
"assignment_group": "service_desk",
|
| 210 |
+
"resolution_action": "assign",
|
| 211 |
+
"planning_note": (
|
| 212 |
+
"Service desk is preferred for cross-team coordination. If coordination "
|
| 213 |
+
"capacity is exhausted, onboarding operations can fulfill the extension directly."
|
| 214 |
+
),
|
| 215 |
+
"alternate_assignment_group": "onboarding_ops",
|
| 216 |
+
"alternate_resolution_action": "fulfill",
|
| 217 |
+
"alternate_route_score_multiplier": 0.85,
|
| 218 |
+
},
|
| 219 |
+
{
|
| 220 |
+
"ticket_id": "ticket-059",
|
| 221 |
+
"title": "Archived invoice packet plus quarter-close clarification",
|
| 222 |
+
"requester": "boardops@silverpine.com",
|
| 223 |
+
"description": (
|
| 224 |
+
"Finance needs archived invoice PDFs plus a quick note explaining whether any "
|
| 225 |
+
"quarter-close adjustments are still pending."
|
| 226 |
+
),
|
| 227 |
+
"issue_type": "general_inquiry",
|
| 228 |
+
"priority": "medium",
|
| 229 |
+
"assignment_group": "license_ops",
|
| 230 |
+
"resolution_action": "fulfill",
|
| 231 |
+
"planning_note": (
|
| 232 |
+
"Invoice operations can fulfill directly. If that queue is constrained, "
|
| 233 |
+
"service desk can acknowledge and schedule the retrieval."
|
| 234 |
+
),
|
| 235 |
+
"alternate_assignment_group": "service_desk",
|
| 236 |
+
"alternate_resolution_action": "acknowledge",
|
| 237 |
+
"alternate_route_score_multiplier": 0.88,
|
| 238 |
+
},
|
| 239 |
+
{
|
| 240 |
+
"ticket_id": "ticket-060",
|
| 241 |
+
"title": "Re: Temporary sandbox extension for the signed pilot",
|
| 242 |
+
"requester": "solutions@bluequarry.io",
|
| 243 |
+
"description": (
|
| 244 |
+
"Following up on ticket-047. The customer launch rehearsal is tomorrow, so the "
|
| 245 |
+
"sandbox extension needs either immediate execution or a commercial owner to unblock it."
|
| 246 |
+
),
|
| 247 |
+
"issue_type": "service_request",
|
| 248 |
+
"priority": "high",
|
| 249 |
+
"assignment_group": "service_desk",
|
| 250 |
+
"resolution_action": "escalate",
|
| 251 |
+
"related_ticket_id": "ticket-047",
|
| 252 |
+
"planning_note": (
|
| 253 |
+
"Immediate operational execution is preferred. Procurement can still own the "
|
| 254 |
+
"approval path if service-desk capacity is already depleted."
|
| 255 |
+
),
|
| 256 |
+
"alternate_assignment_group": "procurement",
|
| 257 |
+
"alternate_resolution_action": "assign",
|
| 258 |
+
"alternate_route_score_multiplier": 0.8,
|
| 259 |
+
},
|
| 260 |
+
{
|
| 261 |
+
"ticket_id": "ticket-061",
|
| 262 |
+
"title": "Risk-exception review is blocking an SSO restore",
|
| 263 |
+
"requester": "identity-risk@sterlingmed.io",
|
| 264 |
+
"description": (
|
| 265 |
+
"Users cannot log in through SSO until a temporary risk exception is approved. "
|
| 266 |
+
"The product team may need logs, but the unblock decision is tied to the review."
|
| 267 |
+
),
|
| 268 |
+
"issue_type": "identity_access",
|
| 269 |
+
"priority": "critical",
|
| 270 |
+
"assignment_group": "security_team",
|
| 271 |
+
"resolution_action": "escalate",
|
| 272 |
+
"planning_note": (
|
| 273 |
+
"Security owns the final unblock decision. If security is saturated, the "
|
| 274 |
+
"application team can still take the first-response diagnostics path."
|
| 275 |
+
),
|
| 276 |
+
"alternate_issue_type": "application_support",
|
| 277 |
+
"alternate_priority": "high",
|
| 278 |
+
"alternate_assignment_group": "application_team",
|
| 279 |
+
"alternate_resolution_action": "escalate",
|
| 280 |
+
"alternate_route_score_multiplier": 0.74,
|
| 281 |
+
},
|
| 282 |
+
{
|
| 283 |
+
"ticket_id": "ticket-062",
|
| 284 |
+
"title": "Need product remediation evidence for a customer security questionnaire",
|
| 285 |
+
"requester": "assurance@clientgrid.com",
|
| 286 |
+
"description": (
|
| 287 |
+
"A customer questionnaire asks for evidence that a previously remediated "
|
| 288 |
+
"application vulnerability is fully closed."
|
| 289 |
+
),
|
| 290 |
+
"issue_type": "security_compliance",
|
| 291 |
+
"priority": "medium",
|
| 292 |
+
"assignment_group": "application_team",
|
| 293 |
+
"resolution_action": "fulfill",
|
| 294 |
+
"planning_note": (
|
| 295 |
+
"Application engineering is preferred because they hold the remediation artifacts. "
|
| 296 |
+
"Security can still acknowledge the questionnaire and buy time when app capacity is tight."
|
| 297 |
+
),
|
| 298 |
+
"alternate_assignment_group": "security_team",
|
| 299 |
+
"alternate_resolution_action": "acknowledge",
|
| 300 |
+
"alternate_route_score_multiplier": 0.82,
|
| 301 |
+
},
|
| 302 |
+
{
|
| 303 |
+
"ticket_id": "ticket-063",
|
| 304 |
+
"title": "Subsidiary admin training with a seat-transfer request",
|
| 305 |
+
"requester": "enablement@globalcorp.com",
|
| 306 |
+
"description": (
|
| 307 |
+
"A newly acquired subsidiary needs admin training next week and also wants "
|
| 308 |
+
"to transfer existing seats into the parent contract."
|
| 309 |
+
),
|
| 310 |
+
"issue_type": "service_request",
|
| 311 |
+
"priority": "medium",
|
| 312 |
+
"assignment_group": "procurement",
|
| 313 |
+
"resolution_action": "assign",
|
| 314 |
+
"planning_note": (
|
| 315 |
+
"Procurement owns the commercial transfer. If that queue is overloaded, "
|
| 316 |
+
"onboarding operations can still deliver the training portion first."
|
| 317 |
+
),
|
| 318 |
+
"alternate_issue_type": "onboarding",
|
| 319 |
+
"alternate_assignment_group": "onboarding_ops",
|
| 320 |
+
"alternate_resolution_action": "fulfill",
|
| 321 |
+
"alternate_route_score_multiplier": 0.78,
|
| 322 |
+
},
|
| 323 |
+
{
|
| 324 |
+
"ticket_id": "ticket-064",
|
| 325 |
+
"title": "Legal-hold export of invoice history",
|
| 326 |
+
"requester": "legalops@northshoreenergy.com",
|
| 327 |
+
"description": (
|
| 328 |
+
"Legal needs invoice history exported for a hold notice. No pricing change is "
|
| 329 |
+
"required, but the request must be acknowledged today."
|
| 330 |
+
),
|
| 331 |
+
"issue_type": "general_inquiry",
|
| 332 |
+
"priority": "high",
|
| 333 |
+
"assignment_group": "license_ops",
|
| 334 |
+
"resolution_action": "fulfill",
|
| 335 |
+
"planning_note": (
|
| 336 |
+
"License operations can deliver the export. If they are capacity-constrained, "
|
| 337 |
+
"service desk can acknowledge the request and queue the retrieval."
|
| 338 |
+
),
|
| 339 |
+
"alternate_assignment_group": "service_desk",
|
| 340 |
+
"alternate_resolution_action": "acknowledge",
|
| 341 |
+
"alternate_route_score_multiplier": 0.87,
|
| 342 |
+
},
|
| 343 |
+
{
|
| 344 |
+
"ticket_id": "ticket-065",
|
| 345 |
+
"title": "Cross-functional launch checklist for an acquired support team",
|
| 346 |
+
"requester": "integration@mergerco.com",
|
| 347 |
+
"description": (
|
| 348 |
+
"Twelve support agents from an acquired business need onboarding, mailbox "
|
| 349 |
+
"setup, and a security attestation before Monday."
|
| 350 |
+
),
|
| 351 |
+
"issue_type": "onboarding",
|
| 352 |
+
"priority": "high",
|
| 353 |
+
"assignment_group": "service_desk",
|
| 354 |
+
"resolution_action": "assign",
|
| 355 |
+
"planning_note": (
|
| 356 |
+
"Central coordination is preferred. If service-desk capacity is exhausted, "
|
| 357 |
+
"onboarding operations can still run a reduced fulfillment path."
|
| 358 |
+
),
|
| 359 |
+
"alternate_priority": "medium",
|
| 360 |
+
"alternate_assignment_group": "onboarding_ops",
|
| 361 |
+
"alternate_resolution_action": "fulfill",
|
| 362 |
+
"alternate_route_score_multiplier": 0.81,
|
| 363 |
+
},
|
| 364 |
+
{
|
| 365 |
+
"ticket_id": "ticket-066",
|
| 366 |
+
"title": "Pilot customer asks who approves a credential-defense allowlist",
|
| 367 |
+
"requester": "pilotops@cruxsystems.io",
|
| 368 |
+
"description": (
|
| 369 |
+
"A pilot customer needs to know who approves an IP allowlist for a credential-"
|
| 370 |
+
"defense control before they continue their test."
|
| 371 |
+
),
|
| 372 |
+
"issue_type": "general_inquiry",
|
| 373 |
+
"priority": "medium",
|
| 374 |
+
"assignment_group": "security_team",
|
| 375 |
+
"resolution_action": "assign",
|
| 376 |
+
"planning_note": (
|
| 377 |
+
"Security should own the answer when available. If that queue is overloaded, "
|
| 378 |
+
"service desk can acknowledge and route the ownership question."
|
| 379 |
+
),
|
| 380 |
+
"alternate_assignment_group": "service_desk",
|
| 381 |
+
"alternate_resolution_action": "acknowledge",
|
| 382 |
+
"alternate_route_score_multiplier": 0.84,
|
| 383 |
+
},
|
| 384 |
+
{
|
| 385 |
+
"ticket_id": "ticket-067",
|
| 386 |
+
"title": "Re: Remediation evidence package is now blocking a renewal signature",
|
| 387 |
+
"requester": "assurance@clientgrid.com",
|
| 388 |
+
"description": (
|
| 389 |
+
"Following up on ticket-052. Renewal signature is blocked until the remediation "
|
| 390 |
+
"evidence package is delivered or a commercial owner confirms the delay."
|
| 391 |
+
),
|
| 392 |
+
"issue_type": "security_compliance",
|
| 393 |
+
"priority": "high",
|
| 394 |
+
"assignment_group": "application_team",
|
| 395 |
+
"resolution_action": "escalate",
|
| 396 |
+
"related_ticket_id": "ticket-052",
|
| 397 |
+
"planning_note": (
|
| 398 |
+
"Application engineering is preferred because they own the evidence. Procurement "
|
| 399 |
+
"can still coordinate the renewal communication if the evidence queue is saturated."
|
| 400 |
+
),
|
| 401 |
+
"alternate_issue_type": "service_request",
|
| 402 |
+
"alternate_priority": "medium",
|
| 403 |
+
"alternate_assignment_group": "procurement",
|
| 404 |
+
"alternate_resolution_action": "assign",
|
| 405 |
+
"alternate_route_score_multiplier": 0.76,
|
| 406 |
+
},
|
| 407 |
+
]
|
| 408 |
+
|
| 409 |
+
|
| 410 |
+
def _apply_dataset_enhancements(
|
| 411 |
+
dataset: list[HelpdeskTicketRecord],
|
| 412 |
+
) -> list[HelpdeskTicketRecord]:
|
| 413 |
+
enhanced_dataset: list[HelpdeskTicketRecord] = []
|
| 414 |
+
for record in dataset:
|
| 415 |
+
update = PLANNING_ROUTE_UPDATES.get(record.ticket_id)
|
| 416 |
+
enhanced_dataset.append(
|
| 417 |
+
record.model_copy(update=update) if update is not None else record
|
| 418 |
+
)
|
| 419 |
+
|
| 420 |
+
seen_ids = {record.ticket_id for record in enhanced_dataset}
|
| 421 |
+
for raw_record in CURATED_EXPANSION_RECORDS:
|
| 422 |
+
ticket_id = str(raw_record["ticket_id"])
|
| 423 |
+
if ticket_id in seen_ids:
|
| 424 |
+
raise ValueError(f"Duplicate ticket_id in curated expansion: {ticket_id}")
|
| 425 |
+
enhanced_dataset.append(HelpdeskTicketRecord.model_validate(raw_record))
|
| 426 |
+
seen_ids.add(ticket_id)
|
| 427 |
+
return enhanced_dataset
|
| 428 |
+
|
| 429 |
assert tuple(TASKS.keys()) == TASK_IDS
|
| 430 |
|
| 431 |
|
|
|
|
| 434 |
# Accept UTF-8 files saved with a BOM, which is common on Windows editors.
|
| 435 |
with dataset_path.open("r", encoding="utf-8-sig") as f:
|
| 436 |
raw = json.load(f)
|
| 437 |
+
dataset = [HelpdeskTicketRecord.model_validate(r) for r in raw]
|
| 438 |
+
return _apply_dataset_enhancements(dataset)
|
| 439 |
|
| 440 |
|
| 441 |
def get_task_definition(task_id: int) -> dict:
|
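
The dataset enhancement step added to `server/tasks.py` above overlays planning updates onto existing records, then appends curated records while rejecting duplicate ids. A minimal sketch of that merge using plain dictionaries is given below. The repository works with `HelpdeskTicketRecord` models, so `apply_enhancements` and its dict-based records are assumptions made only to keep the example self-contained.

```python
def apply_enhancements(
    dataset: list[dict],
    route_updates: dict[str, dict],
    expansion_records: list[dict],
) -> list[dict]:
    # Overlay per-ticket updates; records without an update pass through.
    enhanced = [
        {**record, **route_updates.get(record["ticket_id"], {})}
        for record in dataset
    ]

    seen_ids = {record["ticket_id"] for record in enhanced}
    for raw_record in expansion_records:
        ticket_id = str(raw_record["ticket_id"])
        if ticket_id in seen_ids:
            raise ValueError(
                f"Duplicate ticket_id in curated expansion: {ticket_id}"
            )
        enhanced.append(dict(raw_record))
        seen_ids.add(ticket_id)
    return enhanced


base = [{"ticket_id": "ticket-051", "issue_type": "general_inquiry"}]
updates = {
    "ticket-051": {
        "alternate_issue_type": "service_request",
        "alternate_route_score_multiplier": 0.83,
    }
}
extra = [{"ticket_id": "ticket-056", "issue_type": "general_inquiry"}]
merged = apply_enhancements(base, updates, extra)
```

Because the duplicate check also adds each appended id to the seen set, two curated records sharing an id fail as loudly as a curated record colliding with the base dataset.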
tests/test_api_integration.py
CHANGED
|
@@ -517,8 +517,8 @@ class TestHeuristicInferenceRegression(unittest.TestCase):
|
|
| 517 |
self.assertIsInstance(reward, float)
|
| 518 |
|
| 519 |
def test_overall_average_reward_in_expected_range(self):
|
| 520 |
-
"""4.2.2 — Overall average reward across all 3 tasks
|
| 521 |
-
|
| 522 |
"""
|
| 523 |
rewards = []
|
| 524 |
for task_id in self._TASKS:
|
|
@@ -529,8 +529,8 @@ class TestHeuristicInferenceRegression(unittest.TestCase):
|
|
| 529 |
overall_avg = sum(rewards) / len(rewards)
|
| 530 |
self.assertGreaterEqual(
|
| 531 |
overall_avg,
|
| 532 |
-
0.
|
| 533 |
-
f"Overall average reward {overall_avg:.4f} is below the smoke-test floor of 0.
|
| 534 |
)
|
| 535 |
self.assertLessEqual(
|
| 536 |
overall_avg,
|
|
|
|
| 517 |
self.assertIsInstance(reward, float)
|
| 518 |
|
| 519 |
def test_overall_average_reward_in_expected_range(self):
|
| 520 |
+
"""4.2.2 — Overall average reward across all 3 tasks stays in a healthy
|
| 521 |
+
smoke-test range for the plain no-investigation heuristic baseline.
|
| 522 |
"""
|
| 523 |
rewards = []
|
| 524 |
for task_id in self._TASKS:
|
|
|
|
| 529 |
overall_avg = sum(rewards) / len(rewards)
|
| 530 |
self.assertGreaterEqual(
|
| 531 |
overall_avg,
|
| 532 |
+
0.45,
|
| 533 |
+
f"Overall average reward {overall_avg:.4f} is below the smoke-test floor of 0.45",
|
| 534 |
)
|
| 535 |
self.assertLessEqual(
|
| 536 |
overall_avg,
|
tests/test_competitive_upgrade.py
CHANGED
|
@@ -643,10 +643,39 @@ class TestInvestigationActions(unittest.TestCase):
|
|
| 643 |
tool_name="lookup_internal_routing_note",
|
| 644 |
)
|
| 645 |
)
|
| 646 |
-
self.
|
| 647 |
self.assertEqual(obs.current_ticket["ambiguity_note"], ticket.ambiguity_note)
|
| 648 |
self.assertGreater(obs.reward or 0.0, 0.0)
|
| 649 |
|
| 650 |
def test_submit_without_required_investigation_gets_shaping_penalty(self) -> None:
|
| 651 |
from unittest.mock import patch
|
| 652 |
|
|
@@ -710,7 +739,12 @@ class TestQueueEconomics(unittest.TestCase):
|
|
| 710 |
final_obs = env.step(HelpdeskTicketAction(issue_type=ticket.issue_type))
|
| 711 |
|
| 712 |
self.assertTrue(final_obs.done)
|
| 713 |
-
self.
|
| 714 |
|
| 715 |
|
| 716 |
class TestTerminalInvalidActionFinalReward(unittest.TestCase):
|
|
|
|
| 643 |
tool_name="lookup_internal_routing_note",
|
| 644 |
)
|
| 645 |
)
|
| 646 |
+
self.assertIn(ticket.ambiguity_note, obs.last_tool_result["routing_note"])
|
| 647 |
self.assertEqual(obs.current_ticket["ambiguity_note"], ticket.ambiguity_note)
|
| 648 |
self.assertGreater(obs.reward or 0.0, 0.0)
|
| 649 |
|
| 650 |
+
def test_queue_capacity_forecast_reveals_routing_options(self) -> None:
|
| 651 |
+
from unittest.mock import patch
|
| 652 |
+
|
| 653 |
+
dataset = load_dataset()
|
| 654 |
+
ticket = next(
|
| 655 |
+
(t for t in dataset if t.alternate_route_score_multiplier > 0.0),
|
| 656 |
+
None,
|
| 657 |
+
)
|
| 658 |
+
self.assertIsNotNone(ticket)
|
| 659 |
+
|
| 660 |
+
env = _make_env()
|
| 661 |
+
with patch.object(env, "_dataset", [ticket]):
|
| 662 |
+
with patch.object(env, "_tickets_by_id", {ticket.ticket_id: ticket}):
|
| 663 |
+
obs = env.reset(seed=0, task_id=3, queue_size=1)
|
| 664 |
+
|
| 665 |
+
self.assertNotIn("routing_options", obs.current_ticket)
|
| 666 |
+
obs = env.step(
|
| 667 |
+
HelpdeskTicketAction(
|
| 668 |
+
action_type="investigate",
|
| 669 |
+
tool_name="lookup_queue_capacity_forecast",
|
| 670 |
+
)
|
| 671 |
+
)
|
| 672 |
+
|
| 673 |
+
self.assertEqual(obs.last_tool_result["tool_name"], "lookup_queue_capacity_forecast")
|
| 674 |
+
self.assertTrue(obs.last_tool_result["found"])
|
| 675 |
+
self.assertIn("preferred_route_label", obs.last_tool_result)
|
| 676 |
+
self.assertIn("routing_options", obs.current_ticket)
|
| 677 |
+
self.assertGreaterEqual(len(obs.current_ticket["routing_options"]), 2)
|
| 678 |
+
|
| 679 |
def test_submit_without_required_investigation_gets_shaping_penalty(self) -> None:
|
| 680 |
from unittest.mock import patch
|
| 681 |
|
|
|
|
| 739 |
final_obs = env.step(HelpdeskTicketAction(issue_type=ticket.issue_type))
|
| 740 |
|
| 741 |
self.assertTrue(final_obs.done)
|
| 742 |
+
self.assertLess(final_obs.reward, 1.0)
|
| 743 |
+
self.assertAlmostEqual(
|
| 744 |
+
final_obs.last_reward_components.get("investigation_penalty_applied", 0.0),
|
| 745 |
+
0.04,
|
| 746 |
+
places=9,
|
| 747 |
+
)
|
| 748 |
|
| 749 |
|
| 750 |
class TestTerminalInvalidActionFinalReward(unittest.TestCase):
|
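The `test_queue_capacity_forecast_reveals_routing_options` hunk above checks a reveal-on-investigate pattern: `routing_options` is absent from `current_ticket` until the agent calls `lookup_queue_capacity_forecast`. The following is only a minimal sketch of that pattern, not the code in `server/environment.py`. `SketchTicket`, `SketchEpisode`, and their methods are hypothetical names introduced for illustration; only the tool name and the result keys (`tool_name`, `found`, `preferred_route_label`, `routing_options`) come from the test.

```python
from dataclasses import dataclass, field
from typing import Any

# Tool name taken from the test; everything else here is a hypothetical stand-in.
FORECAST_TOOL = "lookup_queue_capacity_forecast"


@dataclass
class SketchTicket:
    """Hypothetical ticket holding hidden planning data."""
    ticket_id: str
    routing_options: list[dict[str, Any]]
    preferred_route_label: str


@dataclass
class SketchEpisode:
    """Hypothetical episode that tracks which tools were used on the ticket."""
    ticket: SketchTicket
    revealed: set[str] = field(default_factory=set)

    def current_ticket(self) -> dict[str, Any]:
        # Only expose routing_options once the forecast tool has been used.
        view: dict[str, Any] = {"ticket_id": self.ticket.ticket_id}
        if FORECAST_TOOL in self.revealed:
            view["routing_options"] = list(self.ticket.routing_options)
        return view

    def investigate(self, tool_name: str) -> dict[str, Any]:
        # Unknown tools return a "not found" result and reveal nothing.
        if tool_name != FORECAST_TOOL:
            return {"tool_name": tool_name, "found": False}
        self.revealed.add(tool_name)
        return {
            "tool_name": tool_name,
            "found": True,
            "preferred_route_label": self.ticket.preferred_route_label,
        }
```

Keeping the reveal state on the episode rather than on the ticket means a fresh `reset()` would hide the options again, which is the behaviour the first `assertNotIn` in the test relies on.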
tests/test_extra_fields_penalty.py
CHANGED
|
@@ -44,8 +44,8 @@ def _make_env() -> HelpdeskTicketRoutingEnvironment:
|
|
| 44 |
class TestExtraFieldsPenalty(unittest.TestCase):
|
| 45 |
"""Requirement 7: step() rejects actions with fields outside the task's allowed_fields."""
|
| 46 |
|
| 47 |
-
def
|
| 48 |
-
"""Task 1 penalties should keep the returned reward inside the
|
| 49 |
env = _make_env()
|
| 50 |
obs = env.reset(seed=42, task_id=1)
|
| 51 |
|
|
@@ -61,7 +61,7 @@ class TestExtraFieldsPenalty(unittest.TestCase):
|
|
| 61 |
penalty_obs = env.step(action)
|
| 62 |
|
| 63 |
self.assertIsInstance(penalty_obs, HelpdeskTicketObservation)
|
| 64 |
-
self.
|
| 65 |
self.assertLess(penalty_obs.reward, 1.0)
|
| 66 |
|
| 67 |
def test_extra_fields_advances_ticket_index(self) -> None:
|
|
@@ -78,8 +78,8 @@ class TestExtraFieldsPenalty(unittest.TestCase):
|
|
| 78 |
|
| 79 |
self.assertEqual(penalty_obs.tickets_processed, 1)
|
| 80 |
|
| 81 |
-
def
|
| 82 |
-
"""per_ticket_scores must stay in the
|
| 83 |
env = _make_env()
|
| 84 |
env.reset(seed=42, task_id=1)
|
| 85 |
|
|
@@ -91,7 +91,7 @@ class TestExtraFieldsPenalty(unittest.TestCase):
|
|
| 91 |
|
| 92 |
state = env.state
|
| 93 |
self.assertEqual(len(state.per_ticket_scores), 1)
|
| 94 |
-
self.
|
| 95 |
self.assertLess(state.per_ticket_scores[0], 1.0)
|
| 96 |
|
| 97 |
def test_extra_fields_history_entry_has_penalty_reason(self) -> None:
|
|
@@ -109,7 +109,7 @@ class TestExtraFieldsPenalty(unittest.TestCase):
|
|
| 109 |
entry = penalty_obs.history[0]
|
| 110 |
self.assertIn("penalty_reason", entry)
|
| 111 |
self.assertIn("assignment_group", entry["penalty_reason"])
|
| 112 |
-
self.
|
| 113 |
self.assertLess(entry["score"], 1.0)
|
| 114 |
|
| 115 |
def test_no_extra_fields_grades_normally(self) -> None:
|
|
|
|
| 44 |
class TestExtraFieldsPenalty(unittest.TestCase):
|
| 45 |
"""Requirement 7: step() rejects actions with fields outside the task's allowed_fields."""
|
| 46 |
|
| 47 |
+
def test_extra_fields_returns_closed_interval_penalty_reward(self) -> None:
|
| 48 |
+
"""Task 1 penalties should keep the returned reward inside the unit interval."""
|
| 49 |
env = _make_env()
|
| 50 |
obs = env.reset(seed=42, task_id=1)
|
| 51 |
|
|
|
|
| 61 |
penalty_obs = env.step(action)
|
| 62 |
|
| 63 |
self.assertIsInstance(penalty_obs, HelpdeskTicketObservation)
|
| 64 |
+
self.assertGreaterEqual(penalty_obs.reward, 0.0)
|
| 65 |
self.assertLess(penalty_obs.reward, 1.0)
|
| 66 |
|
| 67 |
def test_extra_fields_advances_ticket_index(self) -> None:
|
|
|
|
| 78 |
|
| 79 |
self.assertEqual(penalty_obs.tickets_processed, 1)
|
| 80 |
|
| 81 |
+
def test_extra_fields_records_score_inside_unit_interval(self) -> None:
|
| 82 |
+
"""per_ticket_scores must stay in the unit interval after a penalty step."""
|
| 83 |
env = _make_env()
|
| 84 |
env.reset(seed=42, task_id=1)
|
| 85 |
|
|
|
|
| 91 |
|
| 92 |
state = env.state
|
| 93 |
self.assertEqual(len(state.per_ticket_scores), 1)
|
| 94 |
+
self.assertGreaterEqual(state.per_ticket_scores[0], 0.0)
|
| 95 |
self.assertLess(state.per_ticket_scores[0], 1.0)
|
| 96 |
|
| 97 |
def test_extra_fields_history_entry_has_penalty_reason(self) -> None:
|
|
|
|
| 109 |
entry = penalty_obs.history[0]
|
| 110 |
self.assertIn("penalty_reason", entry)
|
| 111 |
self.assertIn("assignment_group", entry["penalty_reason"])
|
| 112 |
+
self.assertGreaterEqual(entry["score"], 0.0)
|
| 113 |
self.assertLess(entry["score"], 1.0)
|
| 114 |
|
| 115 |
def test_no_extra_fields_grades_normally(self) -> None:
|
tests/test_grader_unit.py
CHANGED
|
@@ -47,7 +47,7 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 47 |
|
| 48 |
score, breakdown = grade_action(action, ticket, task_id=3)
|
| 49 |
|
| 50 |
-
self.assertAlmostEqual(score,
|
| 51 |
self.assertEqual(
|
| 52 |
breakdown,
|
| 53 |
{
|
|
@@ -88,7 +88,7 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 88 |
if predicted == expected
|
| 89 |
else ISSUE_TYPE_SIMILARITY.get((predicted, expected), 0.0)
|
| 90 |
)
|
| 91 |
-
expected_task_score = max(0.
|
| 92 |
self.assertAlmostEqual(score, expected_task_score)
|
| 93 |
self.assertEqual(breakdown, {"issue_type": raw_expected_score})
|
| 94 |
|
|
@@ -98,7 +98,7 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 98 |
|
| 99 |
score, breakdown = grade_action(action, ticket, task_id=1)
|
| 100 |
|
| 101 |
-
self.assertAlmostEqual(score, 0.
|
| 102 |
self.assertEqual(breakdown, {"issue_type": 0.0})
|
| 103 |
|
| 104 |
def test_priority_scoring_uses_defined_proximity_table(self) -> None:
|
|
@@ -133,7 +133,7 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 133 |
{"issue_type": 1.0, "priority": priority_score},
|
| 134 |
)
|
| 135 |
raw_score = 0.6 + 0.4 * priority_score
|
| 136 |
-
expected_task_score = max(0.
|
| 137 |
self.assertAlmostEqual(score, expected_task_score)
|
| 138 |
|
| 139 |
def test_task_2_weights_apply_as_documented(self) -> None:
|
|
@@ -195,6 +195,42 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 195 |
)
|
| 196 |
self.assertAlmostEqual(score, 0.65)
|
| 197 |
|
| 198 |
def test_resolution_action_partial_credit_uses_declared_similarity_table(self) -> None:
|
| 199 |
ticket = _ticket()
|
| 200 |
action = HelpdeskTicketAction(
|
|
@@ -252,7 +288,7 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 252 |
},
|
| 253 |
)
|
| 254 |
raw_score = 0.35 + 0.20 + 0.25 * assignment_group_score + 0.20
|
| 255 |
-
expected_task_score = max(0.
|
| 256 |
self.assertAlmostEqual(score, expected_task_score)
|
| 257 |
|
| 258 |
def test_resolution_action_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
|
|
@@ -284,7 +320,7 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 284 |
},
|
| 285 |
)
|
| 286 |
raw_score = 0.35 + 0.20 + 0.25 + 0.20 * resolution_action_score
|
| 287 |
-
expected_task_score = max(0.
|
| 288 |
self.assertAlmostEqual(score, expected_task_score)
|
| 289 |
|
| 290 |
def test_partial_credit_tables_never_override_exact_match(self) -> None:
|
|
|
|
| 47 |
|
| 48 |
score, breakdown = grade_action(action, ticket, task_id=3)
|
| 49 |
|
| 50 |
+
self.assertAlmostEqual(score, 1.0)
|
| 51 |
self.assertEqual(
|
| 52 |
breakdown,
|
| 53 |
{
|
|
|
|
| 88 |
if predicted == expected
|
| 89 |
else ISSUE_TYPE_SIMILARITY.get((predicted, expected), 0.0)
|
| 90 |
)
|
| 91 |
+
expected_task_score = max(0.0, min(1.0, raw_expected_score))
|
| 92 |
self.assertAlmostEqual(score, expected_task_score)
|
| 93 |
self.assertEqual(breakdown, {"issue_type": raw_expected_score})
|
| 94 |
|
|
|
|
| 98 |
|
| 99 |
score, breakdown = grade_action(action, ticket, task_id=1)
|
| 100 |
|
| 101 |
+
self.assertAlmostEqual(score, 0.0)
|
| 102 |
self.assertEqual(breakdown, {"issue_type": 0.0})
|
| 103 |
|
| 104 |
def test_priority_scoring_uses_defined_proximity_table(self) -> None:
|
|
|
|
| 133 |
{"issue_type": 1.0, "priority": priority_score},
|
| 134 |
)
|
| 135 |
raw_score = 0.6 + 0.4 * priority_score
|
| 136 |
+
expected_task_score = max(0.0, min(1.0, raw_score))
|
| 137 |
self.assertAlmostEqual(score, expected_task_score)
|
| 138 |
|
| 139 |
def test_task_2_weights_apply_as_documented(self) -> None:
|
|
|
|
| 195 |
)
|
| 196 |
self.assertAlmostEqual(score, 0.65)
|
| 197 |
|
| 198 |
+
def test_alternate_route_can_win_when_primary_route_is_worse(self) -> None:
|
| 199 |
+
ticket = HelpdeskTicketRecord(
|
| 200 |
+
ticket_id="ticket-alt",
|
| 201 |
+
title="Planning ticket",
|
| 202 |
+
requester="planner@example.com",
|
| 203 |
+
description="Capacity-sensitive routing decision.",
|
| 204 |
+
issue_type="service_request",
|
| 205 |
+
priority="medium",
|
| 206 |
+
assignment_group="procurement",
|
| 207 |
+
resolution_action="assign",
|
| 208 |
+
alternate_issue_type="billing_license",
|
| 209 |
+
alternate_priority="high",
|
| 210 |
+
alternate_assignment_group="license_ops",
|
| 211 |
+
alternate_resolution_action="fulfill",
|
| 212 |
+
alternate_route_score_multiplier=0.85,
|
| 213 |
+
)
|
| 214 |
+
action = HelpdeskTicketAction(
|
| 215 |
+
issue_type="billing_license",
|
| 216 |
+
priority="high",
|
| 217 |
+
assignment_group="license_ops",
|
| 218 |
+
resolution_action="fulfill",
|
| 219 |
+
)
|
| 220 |
+
|
| 221 |
+
score, breakdown = grade_action(action, ticket, task_id=3)
|
| 222 |
+
|
| 223 |
+
self.assertAlmostEqual(score, 0.85)
|
| 224 |
+
self.assertEqual(
|
| 225 |
+
breakdown,
|
| 226 |
+
{
|
| 227 |
+
"issue_type": 0.85,
|
| 228 |
+
"priority": 0.85,
|
| 229 |
+
"assignment_group": 0.85,
|
| 230 |
+
"resolution_action": 0.85,
|
| 231 |
+
},
|
| 232 |
+
)
|
| 233 |
+
|
| 234 |
def test_resolution_action_partial_credit_uses_declared_similarity_table(self) -> None:
|
| 235 |
ticket = _ticket()
|
| 236 |
action = HelpdeskTicketAction(
|
|
|
|
| 288 |
},
|
| 289 |
)
|
| 290 |
raw_score = 0.35 + 0.20 + 0.25 * assignment_group_score + 0.20
|
| 291 |
+
expected_task_score = max(0.0, min(1.0, raw_score))
|
| 292 |
self.assertAlmostEqual(score, expected_task_score)
|
| 293 |
|
| 294 |
def test_resolution_action_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
|
|
|
|
| 320 |
},
|
| 321 |
)
|
| 322 |
raw_score = 0.35 + 0.20 + 0.25 + 0.20 * resolution_action_score
|
| 323 |
+
expected_task_score = max(0.0, min(1.0, raw_score))
|
| 324 |
self.assertAlmostEqual(score, expected_task_score)
|
| 325 |
|
| 326 |
def test_partial_credit_tables_never_override_exact_match(self) -> None:
|
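The grader hunks above imply two mechanics: the task score is clamped to the unit interval with `max(0.0, min(1.0, raw_score))`, and a ticket's alternate route can outscore the primary route when the action matches it, scaled by `alternate_route_score_multiplier`. The sketch below shows one way those two pieces could fit together; it is not the implementation in `server/grader.py`. The helper names (`clamp_unit`, `grade_route`, `grade_with_alternate`) are hypothetical, the weight-to-field mapping is inferred from the `raw_score` expressions in the tests, and partial-credit similarity tables are deliberately left out, so only exact matches earn credit here.

```python
FIELDS = ("issue_type", "priority", "assignment_group", "resolution_action")

# Assumption: weights inferred from the raw_score expressions in the tests above.
TASK_3_WEIGHTS = {
    "issue_type": 0.35,
    "priority": 0.20,
    "assignment_group": 0.25,
    "resolution_action": 0.20,
}


def clamp_unit(value: float) -> float:
    """Clamp a raw score into the closed interval [0.0, 1.0]."""
    return max(0.0, min(1.0, value))


def grade_route(action: dict, route: dict, multiplier: float = 1.0):
    """Score exact field matches against one route, scaled by a multiplier."""
    breakdown = {
        name: (multiplier if action.get(name) == route.get(name) else 0.0)
        for name in FIELDS
    }
    raw = sum(TASK_3_WEIGHTS[name] * breakdown[name] for name in FIELDS)
    return clamp_unit(raw), breakdown


def grade_with_alternate(action: dict, primary: dict, alternate: dict, multiplier: float):
    """Return the better of the primary grade and the discounted alternate grade."""
    best = grade_route(action, primary)
    if multiplier > 0.0:
        candidate = grade_route(action, alternate, multiplier)
        if candidate[0] > best[0]:
            best = candidate
    return best
```

With the field values from `test_alternate_route_can_win_when_primary_route_is_worse`, the action matches none of the primary fields and all of the alternate fields, so the alternate grade wins and every breakdown entry equals the multiplier.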
tests/test_tasks_unit.py
CHANGED
|
@@ -8,7 +8,7 @@ import openenv_test_stubs # noqa: F401
|
|
| 8 |
|
| 9 |
from models import HelpdeskTicketRecord
|
| 10 |
from server import tasks as task_module
|
| 11 |
-
from server.tasks import TASKS, get_task_definition, load_dataset
|
| 12 |
from vocabulary import (
|
| 13 |
ASSIGNMENT_GROUPS,
|
| 14 |
ISSUE_TYPES,
|
|
@@ -51,7 +51,17 @@ class TasksAndDatasetUnitTests(unittest.TestCase):
|
|
| 51 |
dataset = load_dataset()
|
| 52 |
|
| 53 |
self.assertGreaterEqual(len(dataset), 45)
|
| 54 |
-
self.assertTrue(
|
| 55 |
|
| 56 |
def test_dataset_ticket_ids_are_unique(self) -> None:
|
| 57 |
dataset = load_dataset()
|
|
@@ -100,9 +110,13 @@ class TasksAndDatasetUnitTests(unittest.TestCase):
|
|
| 100 |
dataset = load_dataset()
|
| 101 |
ambiguity_count = sum(1 for record in dataset if record.ambiguity_note)
|
| 102 |
follow_up_count = sum(1 for record in dataset if record.related_ticket_id)
|
| 103 |
|
| 104 |
self.assertGreaterEqual(ambiguity_count, 4)
|
| 105 |
self.assertGreaterEqual(follow_up_count, 3)
|
|
|
|
| 106 |
|
| 107 |
def test_load_dataset_accepts_utf8_bom(self) -> None:
|
| 108 |
sample = (
|
|
@@ -129,7 +143,8 @@ class TasksAndDatasetUnitTests(unittest.TestCase):
|
|
| 129 |
with mock.patch.object(task_module.Path, "open", fake_open):
|
| 130 |
dataset = load_dataset()
|
| 131 |
|
| 132 |
-
self.
|
|
|
|
| 133 |
|
| 134 |
|
| 135 |
if __name__ == "__main__":
|
|
|
|
| 8 |
|
| 9 |
from models import HelpdeskTicketRecord
|
| 10 |
from server import tasks as task_module
|
| 11 |
+
from server.tasks import CURATED_EXPANSION_RECORDS, TASKS, get_task_definition, load_dataset
|
| 12 |
from vocabulary import (
|
| 13 |
ASSIGNMENT_GROUPS,
|
| 14 |
ISSUE_TYPES,
|
|
|
|
| 51 |
dataset = load_dataset()
|
| 52 |
|
| 53 |
self.assertGreaterEqual(len(dataset), 45)
|
| 54 |
+
self.assertTrue(
|
| 55 |
+
all(
|
| 56 |
+
isinstance(record, HelpdeskTicketRecord)
|
| 57 |
+
or (
|
| 58 |
+
record.__class__.__name__ == "HelpdeskTicketRecord"
|
| 59 |
+
and hasattr(record, "model_dump")
|
| 60 |
+
and hasattr(record, "ticket_id")
|
| 61 |
+
)
|
| 62 |
+
for record in dataset
|
| 63 |
+
)
|
| 64 |
+
)
|
| 65 |
|
| 66 |
def test_dataset_ticket_ids_are_unique(self) -> None:
|
| 67 |
dataset = load_dataset()
|
|
|
|
| 110 |
dataset = load_dataset()
|
| 111 |
ambiguity_count = sum(1 for record in dataset if record.ambiguity_note)
|
| 112 |
follow_up_count = sum(1 for record in dataset if record.related_ticket_id)
|
| 113 |
+
alternate_route_count = sum(
|
| 114 |
+
1 for record in dataset if record.alternate_route_score_multiplier > 0.0
|
| 115 |
+
)
|
| 116 |
|
| 117 |
self.assertGreaterEqual(ambiguity_count, 4)
|
| 118 |
self.assertGreaterEqual(follow_up_count, 3)
|
| 119 |
+
self.assertGreaterEqual(alternate_route_count, 10)
|
| 120 |
|
| 121 |
def test_load_dataset_accepts_utf8_bom(self) -> None:
|
| 122 |
sample = (
|
|
|
|
| 143 |
with mock.patch.object(task_module.Path, "open", fake_open):
|
| 144 |
dataset = load_dataset()
|
| 145 |
|
| 146 |
+
self.assertIn("ticket-bom", [record.ticket_id for record in dataset])
|
| 147 |
+
self.assertEqual(len(dataset), 1 + len(CURATED_EXPANSION_RECORDS))
|
| 148 |
|
| 149 |
|
| 150 |
if __name__ == "__main__":
|
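`test_load_dataset_accepts_utf8_bom` above expects the loader to tolerate a UTF-8 byte order mark at the start of the dataset file. One common way to get that behaviour in Python is the `utf-8-sig` codec, which strips a leading BOM if present and otherwise behaves like `utf-8`. The sketch below assumes the dataset is JSON, which the diff does not confirm, and `parse_dataset_bytes` is a hypothetical helper rather than the project's `load_dataset`.

```python
import json


def parse_dataset_bytes(raw: bytes) -> list[dict]:
    """Decode dataset bytes, dropping a leading UTF-8 BOM when one is present."""
    # "utf-8-sig" removes the BOM; plain "utf-8" would keep it as U+FEFF,
    # which json.loads rejects.
    text = raw.decode("utf-8-sig")
    return json.loads(text)


# The same payload with and without a BOM parses to identical records.
payload = json.dumps([{"ticket_id": "ticket-bom"}]).encode("utf-8")
with_bom = parse_dataset_bytes(b"\xef\xbb\xbf" + payload)
without_bom = parse_dataset_bytes(payload)
```

When reading from disk, passing `encoding="utf-8-sig"` to `open()` gives the same effect without a manual decode step.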