Upgrade helpdesk env with queue dynamics and operational actions
- README.md +31 -17
- inference.py +83 -2
- models.py +22 -1
- policy_learning.py +56 -2
- server/environment.py +665 -15
- server/grader.py +14 -4
- server/tasks.py +55 -13
- tests/test_api_integration.py +2 -2
- tests/test_competitive_upgrade.py +39 -20
- tests/test_extra_fields_penalty.py +102 -76
- tests/test_grader_unit.py +196 -42
- tests/test_policy_learning.py +1 -1
- tests/test_tasks_unit.py +9 -10
README.md
CHANGED
|
@@ -25,7 +25,7 @@ If a judge reads only one short explanation, it should be this:
|
|
| 25 |
|
| 26 |
- this environment models a real enterprise workflow, not a toy classification task
|
| 27 |
- each ticket requires typed routing decisions that are easy to score deterministically
|
| 28 |
-
- the task ladder
|
| 29 |
- the repo is small enough to rerun quickly and explicit enough to understand without hidden business logic
|
| 30 |
|
| 31 |
## What This Environment Simulates
|
|
@@ -34,9 +34,10 @@ The environment models a realistic helpdesk workflow:
|
|
| 34 |
|
| 35 |
1. a new ticket enters the queue
|
| 36 |
2. the agent reads the ticket title and description
|
| 37 |
-
3. the agent may investigate
|
| 38 |
-
4. the
|
| 39 |
-
5. the
|
|
|
|
| 40 |
|
| 41 |
For hard-task tickets, the environment can now withhold decisive routing context until the agent uses the right investigation tool. That keeps the task from collapsing into one-shot classification and makes tool choice part of the policy.
|
| 42 |
|
|
@@ -45,7 +46,7 @@ This domain is useful for OpenEnv because it is operationally realistic, easy to
|
|
| 45 |
## Why This Is A Good Hackathon Domain
|
| 46 |
|
| 47 |
- it reflects real enterprise support operations
|
| 48 |
-
- the action space is structured and judge-friendly,
|
| 49 |
- correctness can be scored deterministically
|
| 50 |
- the hard task is meaningfully harder than the easy and medium tasks
|
| 51 |
- the environment is small enough to rerun quickly
|
|
@@ -55,9 +56,9 @@ This domain is useful for OpenEnv because it is operationally realistic, easy to
|
|
| 55 |
The project uses a queue-based episode model.
|
| 56 |
|
| 57 |
- `reset()` samples a task and a queue of 3 to 5 tickets
|
| 58 |
-
- `step()` lets the agent investigate or submit one ticket at a time
|
| 59 |
- `state()` exposes the internal episode snapshot
|
| 60 |
-
- hard-task episodes also track queue-level capacity, alternate acceptable routes,
|
| 61 |
- final evaluation is based on the queue outcome, not on isolated per-ticket classification alone
|
| 62 |
|
| 63 |
The environment classes and vocabulary are intentionally frozen to keep collaboration and judging simple.
|
|
@@ -91,15 +92,15 @@ Artifacts are written to `analysis/policy_learning_runs/` by default:
|
|
| 91 |
- `search_eval_episodes.jsonl`
|
| 92 |
- `search_eval_trajectories.jsonl`
|
| 93 |
|
| 94 |
-
The default submit policy inside this runner stays deterministic and local. It reuses the repo's heuristic routing logic plus planning-aware routing overrides,
|
| 95 |
|
| 96 |
## Task Ladder
|
| 97 |
|
| 98 |
| ID | Name | Difficulty | Required Fields | What The Agent Must Do |
|
| 99 |
|----|------|------------|-----------------|-------------------------|
|
| 100 |
-
| 1 |
|
| 101 |
-
| 2 |
|
| 102 |
-
| 3 |
|
| 103 |
|
| 104 |
## Locked Vocabulary
|
| 105 |
|
|
@@ -151,10 +152,13 @@ Visible ticket fields:
|
|
| 151 |
- `description`
|
| 152 |
- optional `ambiguity_note`
|
| 153 |
- optional `planning_note`
|
|
|
|
| 154 |
- optional `related_ticket_id`
|
| 155 |
- optional `related_ticket_preview`
|
| 156 |
- optional `routing_options`
|
| 157 |
- optional `capacity_state`
|
|
|
|
|
|
|
| 158 |
|
| 159 |
Each observation also includes:
|
| 160 |
|
|
@@ -196,16 +200,23 @@ The internal `HelpdeskTicketState` tracks:
|
|
| 196 |
- `team_capacity_remaining`
|
| 197 |
- `high_priority_slots_remaining`
|
| 198 |
- `escalation_slots_remaining`
|
|
|
|
| 199 |
- `planning_penalty_total`
|
| 200 |
|
| 201 |
## Grading And Reward
|
| 202 |
|
| 203 |
Scoring is deterministic and normalized to `[0.0, 1.0]`.
|
| 204 |
|
| 205 |
-
The action model now supports
|
| 206 |
|
| 207 |
- `action_type="submit"` for the final routing answer
|
| 208 |
- `action_type="investigate"` with a small built-in tool surface before submission
|
| 209 |
|
| 210 |
Available tools:
|
| 211 |
|
|
@@ -223,6 +234,8 @@ Hard-task investigation behavior:
|
|
| 223 |
- blind or repeated probing does not pay by default
|
| 224 |
- premature hard-task submission can incur a shaping penalty even when the visible text looks plausible
|
| 225 |
- resource-greedy routing can add planning penalties later in the queue even when a single ticket looks correct in isolation
|
|
|
|
|
|
|
| 226 |
- terminal `rubric_reward` remains the objective evaluation signal, while per-step `reward` is the denser training signal
|
| 227 |
|
| 228 |
Per-field behavior:
|
|
@@ -237,9 +250,9 @@ Task weights:
|
|
| 237 |
|
| 238 |
| Task | Issue Type | Priority | Assignment Group | Resolution Action |
|
| 239 |
|------|------------|----------|------------------|-------------------|
|
| 240 |
-
| 1 |
|
| 241 |
-
| 2 |
|
| 242 |
-
| 3 |
|
| 243 |
|
| 244 |
Final episode rubric reward is queue-based:
|
| 245 |
|
|
@@ -251,7 +264,7 @@ Both `reward` and `rubric_reward` now use the closed interval `[0.0, 1.0]`.
|
|
| 251 |
|
| 252 |
Step reward is lightly milestone-shaped: high per-ticket scores get a small bonus and very low scores get a small penalty before the final clamp.
|
| 253 |
|
| 254 |
-
Final reward also includes a queue-economics penalty when the agent exceeds the free investigation budget. One investigation per queued ticket is free, but extra investigation steps reduce the final reward more noticeably than before. On hard-task queues, assignment-group capacity, high-priority slots,
|
| 255 |
|
| 256 |
To make the environment more RL-friendly, each observation now also surfaces structured reward telemetry:
|
| 257 |
|
|
@@ -302,7 +315,7 @@ It includes:
|
|
| 302 |
|
| 303 |
## Difficulty Coverage
|
| 304 |
|
| 305 |
-
The difficulty ladder is visible
|
| 306 |
|
| 307 |
Easy-style examples:
|
| 308 |
|
|
@@ -322,6 +335,7 @@ Hard-style examples:
|
|
| 322 |
- `ticket-029`: seat expansion combined with a prorating question
|
| 323 |
- `ticket-038`: follow-up billing thread with escalated urgency
|
| 324 |
- `ticket-045`: repeated account suspension thread with legal-escalation pressure
|
|
|
|
| 325 |
|
| 326 |
## Repository Layout
|
| 327 |
|
|
|
|
| 25 |
|
| 26 |
- this environment models a real enterprise workflow, not a toy classification task
|
| 27 |
- each ticket requires typed routing decisions that are easy to score deterministically
|
| 28 |
+
- the task ladder now keeps full routing on every task and scales observability, queue pressure, and operational controls instead
|
| 29 |
- the repo is small enough to rerun quickly and explicit enough to understand without hidden business logic
|
| 30 |
|
| 31 |
## What This Environment Simulates
|
|
|
|
| 34 |
|
| 35 |
1. a new ticket enters the queue
|
| 36 |
2. the agent reads the ticket title and description
|
| 37 |
+
3. the agent may investigate, request more information, open an incident, defer the ticket, or submit a routing decision
|
| 38 |
+
4. the queue state mutates: capacity shrinks, incidents stay open, deferred tickets return later, and poor handling can spawn follow-up tickets
|
| 39 |
+
5. the grader assigns deterministic credit
|
| 40 |
+
6. the environment advances until the queue is complete
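The six steps above can be read as a loop over a mutable queue. The sketch below is only an illustration of that flow, not the code in `server/environment.py`: `ToyQueue`, its `step` method, and the rule that a deferred ticket moves to the back of the queue are hypothetical stand-ins.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Hypothetical stand-in for the environment's ticket queue."""
    tickets: deque
    resolved: list = field(default_factory=list)
    defer_counts: dict = field(default_factory=dict)

    def step(self, action_type: str) -> bool:
        """Apply one action to the head ticket; return True once the queue is complete."""
        ticket_id = self.tickets[0]
        if action_type == "defer" and len(self.tickets) > 1:
            # Deferred tickets return later in the same episode.
            self.tickets.rotate(-1)
            self.defer_counts[ticket_id] = self.defer_counts.get(ticket_id, 0) + 1
        elif action_type == "submit":
            self.resolved.append(self.tickets.popleft())
        # Any other action (investigate, request_info, open_incident)
        # leaves the ticket at the head of the queue in this toy model.
        return not self.tickets


queue = ToyQueue(deque(["ticket-029", "ticket-038", "ticket-045"]))
done = False
for action_type in ["investigate", "defer", "submit", "submit", "submit"]:
    done = queue.step(action_type)

print(queue.resolved)      # submission order; the deferred ticket comes last
print(queue.defer_counts)  # how often each ticket was deferred
```

In this toy run the first ticket is investigated, deferred, and then submitted after the other two, which is the kind of cross-ticket ordering effect the queue dynamics are meant to expose.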
|
| 41 |
|
| 42 |
For hard-task tickets, the environment can now withhold decisive routing context until the agent uses the right investigation tool. That keeps the task from collapsing into one-shot classification and makes tool choice part of the policy.
|
| 43 |
|
|
|
|
| 46 |
## Why This Is A Good Hackathon Domain
|
| 47 |
|
| 48 |
- it reflects real enterprise support operations
|
| 49 |
+
- the action space is structured and judge-friendly, but now includes meaningful operational controls beyond investigate-versus-submit
|
| 50 |
- correctness can be scored deterministically
|
| 51 |
- the hard task is meaningfully harder than the easy and medium tasks
|
| 52 |
- the environment is small enough to rerun quickly
|
|
|
|
| 56 |
The project uses a queue-based episode model.
|
| 57 |
|
| 58 |
- `reset()` samples a task and a queue of 3 to 5 tickets
|
| 59 |
+
- `step()` lets the agent investigate, request clarification, defer, open incidents, or submit one ticket at a time
|
| 60 |
- `state()` exposes the internal episode snapshot
|
| 61 |
+
- hard-task episodes also track queue-level capacity, incident slots, alternate acceptable routes, planning penalties, SLA pressure, and dynamic follow-up tickets across the queue
|
| 62 |
- final evaluation is based on the queue outcome, not on isolated per-ticket classification alone
|
| 63 |
|
| 64 |
The environment classes and vocabulary are intentionally frozen to keep collaboration and judging simple.
|
|
|
|
| 92 |
- `search_eval_episodes.jsonl`
|
| 93 |
- `search_eval_trajectories.jsonl`
|
| 94 |
|
| 95 |
+
The default submit policy inside this runner stays deterministic and local. It reuses the repo's heuristic routing logic plus planning-aware routing overrides, and the policy loop can now also exercise operational actions such as `request_info`, `open_incident`, and `defer` without depending on external LLM latency or API cost.
|
| 96 |
|
| 97 |
## Task Ladder
|
| 98 |
|
| 99 |
| ID | Name | Difficulty | Required Fields | What The Agent Must Do |
|
| 100 |
|----|------|------------|-----------------|-------------------------|
|
| 101 |
+
| 1 | Guided Full Routing | Easy | `issue_type`, `priority`, `assignment_group`, `resolution_action` | route a mostly visible ticket correctly |
|
| 102 |
+
| 2 | Contextual Full Routing | Medium | `issue_type`, `priority`, `assignment_group`, `resolution_action` | route under partial observability with investigation and clarification |
|
| 103 |
+
| 3 | Adaptive Queue Routing | Hard | `issue_type`, `priority`, `assignment_group`, `resolution_action` | route while managing queue pressure, incidents, deferrals, and downstream follow-ups |
|
| 104 |
|
| 105 |
## Locked Vocabulary
|
| 106 |
|
|
|
|
| 152 |
- `description`
|
| 153 |
- optional `ambiguity_note`
|
| 154 |
- optional `planning_note`
|
| 155 |
+
- optional `customer_update_note`
|
| 156 |
- optional `related_ticket_id`
|
| 157 |
- optional `related_ticket_preview`
|
| 158 |
- optional `routing_options`
|
| 159 |
- optional `capacity_state`
|
| 160 |
+
- optional `operational_context`
|
| 161 |
+
- optional `generated_from_ticket_id`
|
| 162 |
|
| 163 |
Each observation also includes:
|
| 164 |
|
|
|
|
| 200 |
- `team_capacity_remaining`
|
| 201 |
- `high_priority_slots_remaining`
|
| 202 |
- `escalation_slots_remaining`
|
| 203 |
+
- `incident_slots_remaining`
|
| 204 |
- `planning_penalty_total`
|
| 205 |
+
- `incident_gap_total`
|
| 206 |
+
- `sla_breach_count`
|
| 207 |
+
- `dynamic_queue_events`
|
| 208 |
|
| 209 |
## Grading And Reward
|
| 210 |
|
| 211 |
Scoring is deterministic and normalized to `[0.0, 1.0]`.
|
| 212 |
|
| 213 |
+
The action model now supports five paths:
|
| 214 |
|
| 215 |
- `action_type="submit"` for the final routing answer
|
| 216 |
- `action_type="investigate"` with a small built-in tool surface before submission
|
| 217 |
+
- `action_type="request_info"` to ask for customer / operator clarification on the current ticket
|
| 218 |
+
- `action_type="open_incident"` to reserve incident handling capacity before routing risky tickets
|
| 219 |
+
- `action_type="defer"` to push a ticket later in the queue and accept the downstream queue consequences
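A minimal sketch of how the five action paths could be validated on the client side. The action-type names come from the list above; `ToyAction`, its `tool_name` field, and the rule that only `investigate` needs a tool are assumptions for illustration and may not match `HelpdeskTicketAction` in `models.py`.

```python
from dataclasses import dataclass
from typing import Optional

# The five action types named in the README list above.
ACTION_TYPE_SET = {"submit", "investigate", "request_info", "defer", "open_incident"}


@dataclass
class ToyAction:
    """Hypothetical stand-in for HelpdeskTicketAction with only two fields."""
    action_type: str
    tool_name: Optional[str] = None  # assumed field name

    def __post_init__(self) -> None:
        if self.action_type not in ACTION_TYPE_SET:
            raise ValueError(f"unknown action_type: {self.action_type!r}")
        # Assumption: only "investigate" carries a tool name.
        if self.action_type == "investigate" and not self.tool_name:
            raise ValueError("investigate requires tool_name")


probe = ToyAction("investigate", tool_name="lookup_related_ticket")
later = ToyAction("defer")

try:
    ToyAction("escalate")  # not one of the five supported paths
    rejected = False
except ValueError:
    rejected = True
```

Rejecting unknown action types before calling `step()` keeps malformed actions out of the reward signal, whatever the server-side validation does.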
|
| 220 |
|
| 221 |
Available tools:
|
| 222 |
|
|
|
|
| 234 |
- blind or repeated probing does not pay by default
|
| 235 |
- premature hard-task submission can incur a shaping penalty even when the visible text looks plausible
|
| 236 |
- resource-greedy routing can add planning penalties later in the queue even when a single ticket looks correct in isolation
|
| 237 |
+
- incident-sensitive tickets can require an explicit `open_incident` step to avoid future follow-up debt
|
| 238 |
+
- bad or incomplete hard-task handling can append a deterministic follow-up ticket later in the same episode
|
| 239 |
- terminal `rubric_reward` remains the objective evaluation signal, while per-step `reward` is the denser training signal
|
| 240 |
|
| 241 |
Per-field behavior:
|
|
|
|
| 250 |
|
| 251 |
| Task | Issue Type | Priority | Assignment Group | Resolution Action |
|
| 252 |
|------|------------|----------|------------------|-------------------|
|
| 253 |
+
| 1 | 40% | 20% | 20% | 20% |
|
| 254 |
+
| 2 | 32% | 20% | 24% | 24% |
|
| 255 |
+
| 3 | 30% | 20% | 25% | 25% |
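As a worked example of the weights table, the sketch below combines per-field scores into one per-ticket score. The weights are copied from the table; the function name, the treatment of missing fields as 0.0, and the rounding are assumptions, and the actual grader in `server/grader.py` may award partial credit per field differently.

```python
# Task weights copied from the table above, as fractions of 1.0.
TASK_WEIGHTS = {
    1: {"issue_type": 0.40, "priority": 0.20, "assignment_group": 0.20, "resolution_action": 0.20},
    2: {"issue_type": 0.32, "priority": 0.20, "assignment_group": 0.24, "resolution_action": 0.24},
    3: {"issue_type": 0.30, "priority": 0.20, "assignment_group": 0.25, "resolution_action": 0.25},
}


def weighted_ticket_score(task_id: int, field_scores: dict) -> float:
    """Combine per-field scores in [0, 1] into one per-ticket score.

    Assumption: a field missing from field_scores counts as 0.0.
    """
    weights = TASK_WEIGHTS[task_id]
    total = sum(weight * field_scores.get(name, 0.0) for name, weight in weights.items())
    # Clamp and round so float noise never pushes the score outside [0.0, 1.0].
    return round(min(1.0, max(0.0, total)), 4)


# Task 3 ticket with the right issue type and assignment group only.
partial = weighted_ticket_score(3, {"issue_type": 1.0, "assignment_group": 1.0})
# Task 1 ticket with every field correct.
perfect = weighted_ticket_score(
    1, {"issue_type": 1.0, "priority": 1.0, "assignment_group": 1.0, "resolution_action": 1.0}
)
```

Under these assumptions the partial task-3 ticket scores 0.30 + 0.25 = 0.55, and each row of the table sums to 1.0 for a fully correct ticket.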
|
| 256 |
|
| 257 |
Final episode rubric reward is queue-based:
|
| 258 |
|
|
|
|
| 264 |
|
| 265 |
Step reward is lightly milestone-shaped: high per-ticket scores get a small bonus and very low scores get a small penalty before the final clamp.
|
| 266 |
|
| 267 |
+
Final reward also includes a queue-economics penalty when the agent exceeds the free investigation budget. One investigation-style step per queued ticket is free, but extra investigation or clarification steps reduce the final reward more noticeably than before. On hard-task queues, assignment-group capacity, high-priority slots, escalation slots, incident slots, and deferred-ticket SLA pressure all create cross-ticket trade-offs.
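One way to picture the free investigation budget described above. This is a sketch under stated assumptions: the linear form and the `penalty_per_extra_step` value are hypothetical and are not taken from the repository; only the "one free step per queued ticket" rule and the `[0.0, 1.0]` clamp come from the README text.

```python
def queue_economics_penalty(
    queue_size: int,
    investigation_steps: int,
    penalty_per_extra_step: float = 0.05,  # hypothetical rate, not from the repo
) -> float:
    """Penalty for exceeding one free investigation-style step per queued ticket."""
    free_budget = queue_size
    extra_steps = max(0, investigation_steps - free_budget)
    return extra_steps * penalty_per_extra_step


def final_reward(base_reward: float, penalty: float) -> float:
    """Subtract the penalty and clamp to the closed interval [0.0, 1.0]."""
    return min(1.0, max(0.0, base_reward - penalty))


within_budget = queue_economics_penalty(queue_size=4, investigation_steps=3)
over_budget = queue_economics_penalty(queue_size=4, investigation_steps=6)
clamped = final_reward(base_reward=0.05, penalty=over_budget)
```

With a queue of four tickets, three investigation-style steps cost nothing, while six steps are two over budget and reduce the final reward before the clamp is applied.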
|
| 268 |
|
| 269 |
To make the environment more RL-friendly, each observation now also surfaces structured reward telemetry:
|
| 270 |
|
|
|
|
| 315 |
|
| 316 |
## Difficulty Coverage
|
| 317 |
|
| 318 |
+
The difficulty ladder is now visible in observability and control, not just in the submitted field count.
|
| 319 |
|
| 320 |
Easy-style examples:
|
| 321 |
|
|
|
|
| 335 |
- `ticket-029`: seat expansion combined with a prorating question
|
| 336 |
- `ticket-038`: follow-up billing thread with escalated urgency
|
| 337 |
- `ticket-045`: repeated account suspension thread with legal-escalation pressure
|
| 338 |
+
- generated `*-followup` tickets: deterministic reopened cases that only appear when the earlier handling was incomplete or operationally risky
|
| 339 |
|
| 340 |
## Repository Layout
|
| 341 |
|
inference.py
CHANGED
|
@@ -196,9 +196,11 @@ def format_recent_history_entries(
|
|
| 196 |
def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions: str) -> str:
|
| 197 |
ambiguity_note = ticket.get("ambiguity_note")
|
| 198 |
planning_note = ticket.get("planning_note")
|
|
|
|
| 199 |
related_preview = ticket.get("related_ticket_preview") or {}
|
| 200 |
last_tool_result = ticket.get("last_tool_result")
|
| 201 |
context_status = ticket.get("context_status") or {}
|
|
|
|
| 202 |
recent_history = ticket.get("recent_history") or []
|
| 203 |
feedback_summary = ticket.get("feedback_summary")
|
| 204 |
last_reward_components = ticket.get("last_reward_components") or {}
|
|
@@ -213,6 +215,8 @@ def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions
|
|
| 213 |
extra_context_lines.append(f"Ambiguity note: {ambiguity_note}")
|
| 214 |
if planning_note:
|
| 215 |
extra_context_lines.append(f"Planning note: {planning_note}")
|
|
|
|
|
|
|
| 216 |
if related_preview:
|
| 217 |
extra_context_lines.extend(
|
| 218 |
[
|
|
@@ -230,6 +234,10 @@ def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions
|
|
| 230 |
extra_context_lines.append(
|
| 231 |
"Context status: " + json.dumps(context_status, sort_keys=True)
|
| 232 |
)
|
| 233 |
if capacity_state:
|
| 234 |
extra_context_lines.append(
|
| 235 |
"Queue capacity state: " + json.dumps(capacity_state, sort_keys=True)
|
|
@@ -572,16 +580,19 @@ def build_routing_text(ticket: dict) -> str:
|
|
| 572 |
related_preview = ticket.get("related_ticket_preview") or {}
|
| 573 |
last_tool_result = ticket.get("last_tool_result") or {}
|
| 574 |
routing_options = ticket.get("routing_options") or []
|
|
|
|
| 575 |
return " ".join(
|
| 576 |
[
|
| 577 |
ticket.get("title", ""),
|
| 578 |
ticket.get("description", ""),
|
| 579 |
ticket.get("ambiguity_note", ""),
|
| 580 |
ticket.get("planning_note", ""),
|
|
|
|
| 581 |
related_preview.get("title", ""),
|
| 582 |
related_preview.get("description", ""),
|
| 583 |
json.dumps(last_tool_result, sort_keys=True),
|
| 584 |
json.dumps(routing_options, sort_keys=True),
|
|
|
|
| 585 |
json.dumps(ticket.get("capacity_state") or {}, sort_keys=True),
|
| 586 |
json.dumps(ticket.get("future_queue_demand") or {}, sort_keys=True),
|
| 587 |
]
|
|
@@ -909,9 +920,14 @@ def build_action(
|
|
| 909 |
)
|
| 910 |
|
| 911 |
|
| 912 |
-
def should_investigate(
|
| 913 |
if not ticket:
|
| 914 |
return False, None
|
|
|
|
| 915 |
context_status = ticket.get("context_status") or {}
|
| 916 |
hidden_context_remaining = bool(context_status.get("hidden_context_remaining"))
|
| 917 |
investigation_required = bool(context_status.get("investigation_required"))
|
|
@@ -945,6 +961,7 @@ def should_investigate(ticket: dict, history: list[dict[str, Any]]) -> tuple[boo
|
|
| 945 |
tool_name
|
| 946 |
for tool_name in context_status.get("recommended_tools", [])
|
| 947 |
if tool_name not in used_tools
|
|
|
|
| 948 |
]
|
| 949 |
if hidden_context_remaining and recommended_tools:
|
| 950 |
return True, recommended_tools[0]
|
|
@@ -1018,6 +1035,8 @@ def should_investigate(ticket: dict, history: list[dict[str, Any]]) -> tuple[boo
|
|
| 1018 |
)
|
| 1019 |
|
| 1020 |
for tool_name in preferred_tools:
|
|
|
|
|
|
|
| 1021 |
if tool_name not in used_tools:
|
| 1022 |
return True, tool_name
|
| 1023 |
|
|
@@ -1026,6 +1045,39 @@ def should_investigate(ticket: dict, history: list[dict[str, Any]]) -> tuple[boo
|
|
| 1026 |
return False, None
|
| 1027 |
|
| 1028 |
|
| 1029 |
def merge_ticket_context(ticket: dict, observation: Any) -> dict:
|
| 1030 |
merged_ticket = dict(ticket)
|
| 1031 |
if getattr(observation, "last_tool_result", None) is not None:
|
|
@@ -1033,6 +1085,7 @@ def merge_ticket_context(ticket: dict, observation: Any) -> dict:
|
|
| 1033 |
merged_ticket["recent_history"] = list(getattr(observation, "history", []))
|
| 1034 |
merged_ticket["queue_position"] = getattr(observation, "queue_position", None)
|
| 1035 |
merged_ticket["tickets_remaining"] = getattr(observation, "tickets_remaining", None)
|
|
|
|
| 1036 |
merged_ticket["investigation_budget_remaining"] = getattr(
|
| 1037 |
observation,
|
| 1038 |
"investigation_budget_remaining",
|
|
@@ -1040,6 +1093,10 @@ def merge_ticket_context(ticket: dict, observation: Any) -> dict:
|
|
| 1040 |
)
|
| 1041 |
merged_ticket["average_score_so_far"] = getattr(observation, "average_score_so_far", None)
|
| 1042 |
merged_ticket["progress_fraction"] = getattr(observation, "progress_fraction", None)
|
| 1043 |
merged_ticket["last_reward_components"] = dict(
|
| 1044 |
getattr(observation, "last_reward_components", {}) or {}
|
| 1045 |
)
|
|
@@ -1096,7 +1153,11 @@ def run() -> None:
|
|
| 1096 |
break
|
| 1097 |
|
| 1098 |
while getattr(obs, "investigation_budget_remaining", 0) > 0:
|
| 1099 |
-
investigate, tool_name = should_investigate(
|
| 1100 |
if not investigate or tool_name is None:
|
| 1101 |
break
|
| 1102 |
tool_action = HelpdeskTicketAction(
|
|
@@ -1129,6 +1190,26 @@ def run() -> None:
|
|
| 1129 |
break
|
| 1130 |
|
| 1131 |
ticket_with_context = merge_ticket_context(ticket, obs)
|
| 1132 |
action, action_source, fallback_reason = build_action(
|
| 1133 |
ticket_with_context,
|
| 1134 |
obs.allowed_fields,
|
|
|
|
| 196 |
def build_llm_user_message(ticket: dict, allowed_fields: list[str], instructions: str) -> str:
|
| 197 |
ambiguity_note = ticket.get("ambiguity_note")
|
| 198 |
planning_note = ticket.get("planning_note")
|
| 199 |
+
customer_update_note = ticket.get("customer_update_note")
|
| 200 |
related_preview = ticket.get("related_ticket_preview") or {}
|
| 201 |
last_tool_result = ticket.get("last_tool_result")
|
| 202 |
context_status = ticket.get("context_status") or {}
|
| 203 |
+
operational_context = ticket.get("operational_context") or {}
|
| 204 |
recent_history = ticket.get("recent_history") or []
|
| 205 |
feedback_summary = ticket.get("feedback_summary")
|
| 206 |
last_reward_components = ticket.get("last_reward_components") or {}
|
|
|
|
| 215 |
extra_context_lines.append(f"Ambiguity note: {ambiguity_note}")
|
| 216 |
if planning_note:
|
| 217 |
extra_context_lines.append(f"Planning note: {planning_note}")
|
| 218 |
+
if customer_update_note:
|
| 219 |
+
extra_context_lines.append(f"Customer update: {customer_update_note}")
|
| 220 |
if related_preview:
|
| 221 |
extra_context_lines.extend(
|
| 222 |
[
|
|
|
|
| 234 |
extra_context_lines.append(
|
| 235 |
"Context status: " + json.dumps(context_status, sort_keys=True)
|
| 236 |
)
|
| 237 |
+
if operational_context:
|
| 238 |
+
extra_context_lines.append(
|
| 239 |
+
"Operational context: " + json.dumps(operational_context, sort_keys=True)
|
| 240 |
+
)
|
| 241 |
if capacity_state:
|
| 242 |
extra_context_lines.append(
|
| 243 |
"Queue capacity state: " + json.dumps(capacity_state, sort_keys=True)
|
|
|
|
| 580 |
related_preview = ticket.get("related_ticket_preview") or {}
|
| 581 |
last_tool_result = ticket.get("last_tool_result") or {}
|
| 582 |
routing_options = ticket.get("routing_options") or []
|
| 583 |
+
operational_context = ticket.get("operational_context") or {}
|
| 584 |
return " ".join(
|
| 585 |
[
|
| 586 |
ticket.get("title", ""),
|
| 587 |
ticket.get("description", ""),
|
| 588 |
ticket.get("ambiguity_note", ""),
|
| 589 |
ticket.get("planning_note", ""),
|
| 590 |
+
ticket.get("customer_update_note", ""),
|
| 591 |
related_preview.get("title", ""),
|
| 592 |
related_preview.get("description", ""),
|
| 593 |
json.dumps(last_tool_result, sort_keys=True),
|
| 594 |
json.dumps(routing_options, sort_keys=True),
|
| 595 |
+
json.dumps(operational_context, sort_keys=True),
|
| 596 |
json.dumps(ticket.get("capacity_state") or {}, sort_keys=True),
|
| 597 |
json.dumps(ticket.get("future_queue_demand") or {}, sort_keys=True),
|
| 598 |
]
|
|
|
|
| 920 |
)
|
| 921 |
|
| 922 |
|
| 923 |
+
def should_investigate(
|
| 924 |
+
ticket: dict,
|
| 925 |
+
history: list[dict[str, Any]],
|
| 926 |
+
available_tools: list[str] | None = None,
|
| 927 |
+
) -> tuple[bool, str | None]:
|
| 928 |
if not ticket:
|
| 929 |
return False, None
|
| 930 |
+
available_tool_set = set(available_tools or [])
|
| 931 |
context_status = ticket.get("context_status") or {}
|
| 932 |
hidden_context_remaining = bool(context_status.get("hidden_context_remaining"))
|
| 933 |
investigation_required = bool(context_status.get("investigation_required"))
|
|
|
|
| 961 |
tool_name
|
| 962 |
for tool_name in context_status.get("recommended_tools", [])
|
| 963 |
if tool_name not in used_tools
|
| 964 |
+
and (not available_tool_set or tool_name in available_tool_set)
|
| 965 |
]
|
| 966 |
if hidden_context_remaining and recommended_tools:
|
| 967 |
return True, recommended_tools[0]
|
|
|
|
| 1035 |
)
|
| 1036 |
|
| 1037 |
for tool_name in preferred_tools:
|
| 1038 |
+
if available_tool_set and tool_name not in available_tool_set:
|
| 1039 |
+
continue
|
| 1040 |
if tool_name not in used_tools:
|
| 1041 |
return True, tool_name
|
| 1042 |
|
|
|
|
| 1045 |
return False, None
|
| 1046 |
|
| 1047 |
|
| 1048 |
+
def choose_operational_action(
|
| 1049 |
+
ticket: dict,
|
| 1050 |
+
history: list[dict[str, Any]],
|
| 1051 |
+
available_action_types: list[str] | None = None,
|
| 1052 |
+
) -> tuple[HelpdeskTicketAction | None, str | None]:
|
| 1053 |
+
if not ticket:
|
| 1054 |
+
return None, None
|
| 1055 |
+
operational_context = ticket.get("operational_context") or {}
|
| 1056 |
+
recommended_actions = list(operational_context.get("recommended_actions") or [])
|
| 1057 |
+
available_action_set = set(available_action_types or [])
|
| 1058 |
+
current_ticket_id = ticket.get("ticket_id")
|
| 1059 |
+
prior_ticket_history = [
|
| 1060 |
+
entry for entry in history if entry.get("ticket_id") == current_ticket_id
|
| 1061 |
+
]
|
| 1062 |
+
used_action_types = {
|
| 1063 |
+
entry.get("predicted", {}).get("action_type")
|
| 1064 |
+
for entry in prior_ticket_history
|
| 1065 |
+
if entry.get("predicted")
|
| 1066 |
+
}
|
| 1067 |
+
|
| 1068 |
+
for action_name in ("open_incident", "request_info", "defer"):
|
| 1069 |
+
if action_name not in recommended_actions:
|
| 1070 |
+
continue
|
| 1071 |
+
if available_action_set and action_name not in available_action_set:
|
| 1072 |
+
continue
|
| 1073 |
+
if action_name in used_action_types:
|
| 1074 |
+
continue
|
| 1075 |
+
if action_name == "defer" and ticket.get("tickets_after_current", 0) <= 0:
|
| 1076 |
+
continue
|
| 1077 |
+
return HelpdeskTicketAction(action_type=action_name), action_name
|
| 1078 |
+
return None, None
|
| 1079 |
+
|
| 1080 |
+
|
| 1081 |
def merge_ticket_context(ticket: dict, observation: Any) -> dict:
|
| 1082 |
merged_ticket = dict(ticket)
|
| 1083 |
if getattr(observation, "last_tool_result", None) is not None:
|
|
|
|
| 1085 |
merged_ticket["recent_history"] = list(getattr(observation, "history", []))
|
| 1086 |
merged_ticket["queue_position"] = getattr(observation, "queue_position", None)
|
| 1087 |
merged_ticket["tickets_remaining"] = getattr(observation, "tickets_remaining", None)
|
| 1088 |
+
merged_ticket["tickets_after_current"] = getattr(observation, "tickets_after_current", None)
|
| 1089 |
merged_ticket["investigation_budget_remaining"] = getattr(
|
| 1090 |
observation,
|
| 1091 |
"investigation_budget_remaining",
|
|
|
|
| 1093 |
)
|
| 1094 |
merged_ticket["average_score_so_far"] = getattr(observation, "average_score_so_far", None)
|
| 1095 |
merged_ticket["progress_fraction"] = getattr(observation, "progress_fraction", None)
|
| 1096 |
+
merged_ticket["available_tools"] = list(getattr(observation, "available_tools", []) or [])
|
| 1097 |
+
merged_ticket["available_action_types"] = list(
|
| 1098 |
+
getattr(observation, "available_action_types", []) or []
|
| 1099 |
+
)
|
| 1100 |
merged_ticket["last_reward_components"] = dict(
|
| 1101 |
getattr(observation, "last_reward_components", {}) or {}
|
| 1102 |
)
|
|
|
|
| 1153 |
break
|
| 1154 |
|
| 1155 |
while getattr(obs, "investigation_budget_remaining", 0) > 0:
|
| 1156 |
+
investigate, tool_name = should_investigate(
|
| 1157 |
+
ticket,
|
| 1158 |
+
obs.history,
|
| 1159 |
+
list(getattr(obs, "available_tools", []) or []),
|
| 1160 |
+
)
|
| 1161 |
if not investigate or tool_name is None:
|
| 1162 |
break
|
| 1163 |
tool_action = HelpdeskTicketAction(
|
|
|
|
| 1190 |
break
|
| 1191 |
|
| 1192 |
ticket_with_context = merge_ticket_context(ticket, obs)
|
| 1193 |
+
operational_action, operational_source = choose_operational_action(
|
| 1194 |
+
ticket_with_context,
|
| 1195 |
+
obs.history,
|
| 1196 |
+
list(getattr(obs, "available_action_types", []) or []),
|
| 1197 |
+
)
|
| 1198 |
+
if operational_action is not None and operational_source is not None:
|
| 1199 |
+
result = sync_client.step(operational_action)
|
| 1200 |
+
obs = result.observation
|
| 1201 |
+
step_num += 1
|
| 1202 |
+
reward = float(result.reward or 0.0)
|
| 1203 |
+
if result.reward is not None:
|
| 1204 |
+
task_step_rewards.append(reward)
|
| 1205 |
+
log_step(
|
| 1206 |
+
step=step_num,
|
| 1207 |
+
action=operational_action,
|
| 1208 |
+
reward=reward,
|
| 1209 |
+
done=bool(result.done),
|
| 1210 |
+
error=operational_source,
|
| 1211 |
+
)
|
| 1212 |
+
continue
|
| 1213 |
action, action_source, fallback_reason = build_action(
|
| 1214 |
ticket_with_context,
|
| 1215 |
obs.allowed_fields,
|
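The selection order in `choose_operational_action` from the `inference.py` hunks above can be exercised in isolation. The sketch below restates that logic with a hypothetical `ToyAction` dataclass in place of `HelpdeskTicketAction`, and a hypothetical function name. It also coerces a missing or `None` `tickets_after_current` to 0, because `merge_ticket_context` stores `None` when the observation lacks that attribute and `None <= 0` raises `TypeError` in Python 3; that coercion is an addition here, not part of the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyAction:
    """Hypothetical stand-in for HelpdeskTicketAction."""
    action_type: str


def pick_operational_action(
    ticket: dict,
    history: list,
    available_action_types: Optional[list] = None,
) -> Optional[ToyAction]:
    recommended = (ticket.get("operational_context") or {}).get("recommended_actions") or []
    available = set(available_action_types or [])
    # Action types already tried on this ticket are not repeated.
    used = {
        entry.get("predicted", {}).get("action_type")
        for entry in history
        if entry.get("ticket_id") == ticket.get("ticket_id") and entry.get("predicted")
    }
    # Same fixed priority as the diff: incident, then clarification, then deferral.
    for name in ("open_incident", "request_info", "defer"):
        if name not in recommended or name in used:
            continue
        if available and name not in available:
            continue
        # Deferring is pointless when no tickets remain behind the current one.
        if name == "defer" and (ticket.get("tickets_after_current") or 0) <= 0:
            continue
        return ToyAction(action_type=name)
    return None


ticket = {
    "ticket_id": "ticket-045",
    "operational_context": {"recommended_actions": ["defer", "open_incident"]},
    "tickets_after_current": None,
}
first = pick_operational_action(ticket, history=[])
history = [{"ticket_id": "ticket-045", "predicted": {"action_type": "open_incident"}}]
second = pick_operational_action(ticket, history)
```

Here `first` selects `open_incident` even though `defer` is listed first in the recommendation, and `second` is `None` because the incident was already opened and there is nothing behind the ticket to defer to.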
models.py
CHANGED
|
@@ -16,7 +16,13 @@ ISSUE_TYPE_SET = set(ISSUE_TYPES)
|
|
| 16 |
PRIORITY_SET = set(PRIORITIES)
|
| 17 |
ASSIGNMENT_GROUP_SET = set(ASSIGNMENT_GROUPS)
|
| 18 |
RESOLUTION_ACTION_SET = set(RESOLUTION_ACTIONS)
|
| 19 |
-
ACTION_TYPE_SET = {
|
| 20 |
TOOL_NAME_SET = {"lookup_related_ticket", "lookup_requester_history"}
|
| 21 |
TOOL_NAME_SET.add("lookup_internal_routing_note")
|
| 22 |
TOOL_NAME_SET.add("lookup_queue_capacity_forecast")
|
|
@@ -54,6 +60,9 @@ class HelpdeskTicketRecord(BaseModel):
|
|
| 54 |
alternate_assignment_group: Optional[str] = None
|
| 55 |
alternate_resolution_action: Optional[str] = None
|
| 56 |
alternate_route_score_multiplier: float = 0.0
|
|
|
|
|
|
|
|
|
|
| 57 |
|
| 58 |
@field_validator("issue_type")
|
| 59 |
@classmethod
|
|
@@ -203,4 +212,16 @@ class HelpdeskTicketState(State):
|
|
| 203 |
escalation_slots_remaining: int = 0
|
| 204 |
planning_penalty_total: float = 0.0
|
| 205 |
capacity_pressure_tickets_resolved: int = 0
|
| 206 |
history_entries: list[dict] = Field(default_factory=list)
|
|
|
|
| 16 |
PRIORITY_SET = set(PRIORITIES)
|
| 17 |
ASSIGNMENT_GROUP_SET = set(ASSIGNMENT_GROUPS)
|
| 18 |
RESOLUTION_ACTION_SET = set(RESOLUTION_ACTIONS)
|
| 19 |
+
ACTION_TYPE_SET = {
|
| 20 |
+
"submit",
|
| 21 |
+
"investigate",
|
| 22 |
+
"request_info",
|
| 23 |
+
"defer",
|
| 24 |
+
"open_incident",
|
| 25 |
+
}
|
| 26 |
TOOL_NAME_SET = {"lookup_related_ticket", "lookup_requester_history"}
|
| 27 |
TOOL_NAME_SET.add("lookup_internal_routing_note")
|
| 28 |
TOOL_NAME_SET.add("lookup_queue_capacity_forecast")
|
|
|
|
| 60 |
alternate_assignment_group: Optional[str] = None
|
| 61 |
alternate_resolution_action: Optional[str] = None
|
| 62 |
alternate_route_score_multiplier: float = 0.0
|
| 63 |
+
customer_update_note: Optional[str] = None
|
| 64 |
+
incident_recommended: bool = False
|
| 65 |
+
generated_from_ticket_id: Optional[str] = None
|
| 66 |
|
| 67 |
@field_validator("issue_type")
|
| 68 |
@classmethod
|
|
|
|
| 212 |
escalation_slots_remaining: int = 0
|
| 213 |
planning_penalty_total: float = 0.0
|
| 214 |
capacity_pressure_tickets_resolved: int = 0
|
| 215 |
+
ticket_request_info_usage: dict[str, int] = Field(default_factory=dict)
|
| 216 |
+
ticket_defer_counts: dict[str, int] = Field(default_factory=dict)
|
| 217 |
+
open_incident_ticket_ids: list[str] = Field(default_factory=list)
|
| 218 |
+
incident_slots_initial: int = 0
|
| 219 |
+
incident_slots_remaining: int = 0
|
| 220 |
+
incident_actions_used: int = 0
|
| 221 |
+
incident_gap_total: float = 0.0
|
| 222 |
+
deferred_ticket_count: int = 0
|
| 223 |
+
sla_breach_count: int = 0
|
| 224 |
+
spawned_follow_up_ticket_ids: list[str] = Field(default_factory=list)
|
| 225 |
+
spawned_follow_up_source_ids: list[str] = Field(default_factory=list)
|
| 226 |
+
dynamic_queue_events: list[dict[str, Any]] = Field(default_factory=list)
|
| 227 |
history_entries: list[dict] = Field(default_factory=list)
|
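To make the new incident counters on `HelpdeskTicketState` concrete, the sketch below shows one plausible bookkeeping rule for `open_incident`. The field names are taken from the `models.py` diff above; the `ToyIncidentState` dataclass, the `open_incident` helper, and the rules that a ticket holds at most one slot and that requests are rejected when no slots remain are assumptions, not the behaviour of `server/environment.py`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIncidentState:
    """Subset of the new state counters, as a plain dataclass instead of a pydantic model."""
    incident_slots_initial: int = 0
    incident_slots_remaining: int = 0
    incident_actions_used: int = 0
    open_incident_ticket_ids: list = field(default_factory=list)


def open_incident(state: ToyIncidentState, ticket_id: str) -> bool:
    """Reserve one incident slot for ticket_id; return False if nothing changed.

    Assumption: a ticket holds at most one slot, and a request is rejected
    when no slots remain.
    """
    if ticket_id in state.open_incident_ticket_ids or state.incident_slots_remaining <= 0:
        return False
    state.incident_slots_remaining -= 1
    state.incident_actions_used += 1
    state.open_incident_ticket_ids.append(ticket_id)
    return True


state = ToyIncidentState(incident_slots_initial=1, incident_slots_remaining=1)
granted = open_incident(state, "ticket-045")
denied = open_incident(state, "ticket-038")  # no slots left for a second incident
```

Keeping `incident_slots_initial` alongside `incident_slots_remaining` lets a grader or telemetry consumer recover how much capacity was spent without replaying the episode.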
policy_learning.py
CHANGED
|
@@ -244,8 +244,10 @@ def _routing_text(ticket: dict[str, Any]) -> str:
|
|
| 244 |
str(ticket.get("description", "")),
|
| 245 |
str(ticket.get("ambiguity_note", "")),
|
| 246 |
str(ticket.get("planning_note", "")),
|
|
|
|
| 247 |
json.dumps(ticket.get("last_tool_result") or {}, sort_keys=True),
|
| 248 |
json.dumps(ticket.get("routing_options") or [], sort_keys=True),
|
|
|
|
| 249 |
json.dumps(ticket.get("capacity_state") or {}, sort_keys=True),
|
| 250 |
json.dumps(ticket.get("future_queue_demand") or {}, sort_keys=True),
|
| 251 |
]
|
|
@@ -265,6 +267,7 @@ def infer_ticket_cue(ticket: dict[str, Any]) -> str:
|
|
| 265 |
if (
|
| 266 |
ticket.get("planning_note")
|
| 267 |
or ticket.get("routing_options")
|
|
|
|
| 268 |
or "lookup_queue_capacity_forecast"
|
| 269 |
in (context_status.get("recommended_tools") or [])
|
| 270 |
or any(
|
|
@@ -316,6 +319,8 @@ def infer_ticket_cue(ticket: dict[str, Any]) -> str:
|
|
| 316 |
for phrase in ("still", "again", "overdue", "legal", "priority")
|
| 317 |
):
|
| 318 |
return "history_pressure"
|
|
|
|
|
|
|
| 319 |
return "generic_hidden_context"
|
| 320 |
|
| 321 |
|
|
@@ -397,17 +402,54 @@ def select_cue_based_tool(
|
|
| 397 |
*,
|
| 398 |
hidden_context_remaining: bool,
|
| 399 |
used_tools: set[str],
|
|
|
|
| 400 |
) -> str | None:
|
| 401 |
preferred_tools = preferred_tool_order(
|
| 402 |
ticket,
|
| 403 |
hidden_context_remaining=hidden_context_remaining,
|
| 404 |
)
|
|
|
|
| 405 |
for tool_name in preferred_tools:
|
|
|
|
|
|
|
| 406 |
if tool_name not in used_tools:
|
| 407 |
return tool_name
|
| 408 |
return None
|
| 409 |
|
| 410 |
|
| 411 |
def choose_policy_action(
|
| 412 |
policy: PolicyConfig,
|
| 413 |
observation: HelpdeskTicketObservation,
|
|
@@ -425,6 +467,7 @@ def choose_policy_action(
|
|
| 425 |
used_tools = set(used_tools_by_ticket.get(ticket_id, set()))
|
| 426 |
context_status = ticket.get("context_status") or {}
|
| 427 |
hidden_context_remaining = bool(context_status.get("hidden_context_remaining"))
|
|
|
|
| 428 |
|
| 429 |
if ticket_investigations < policy.max_investigations_per_ticket:
|
| 430 |
if policy.strategy == "adaptive" and adaptive_bandit is not None and hidden_context_remaining:
|
|
@@ -434,11 +477,13 @@ def choose_policy_action(
|
|
| 434 |
ticket,
|
| 435 |
hidden_context_remaining=hidden_context_remaining,
|
| 436 |
)
|
| 437 |
-
if tool_name not in used_tools
|
| 438 |
]
|
| 439 |
if not candidate_tools:
|
| 440 |
candidate_tools = [
|
| 441 |
-
tool_name
|
|
|
|
|
|
|
| 442 |
]
|
| 443 |
if candidate_tools:
|
| 444 |
cue = infer_ticket_cue(ticket)
|
|
@@ -454,6 +499,7 @@ def choose_policy_action(
|
|
| 454 |
ticket,
|
| 455 |
hidden_context_remaining=hidden_context_remaining,
|
| 456 |
used_tools=used_tools,
|
|
|
|
| 457 |
)
|
| 458 |
if tool_name is not None:
|
| 459 |
return (
|
|
@@ -492,6 +538,14 @@ def choose_policy_action(
|
|
| 492 |
infer_ticket_cue(ticket),
|
| 493 |
)
|
| 494 |
|
| 495 |
return submit_builder(ticket, list(observation.allowed_fields)), "submit", None
|
| 496 |
|
| 497 |
|
|
|
|
| 244 |
str(ticket.get("description", "")),
|
| 245 |
str(ticket.get("ambiguity_note", "")),
|
| 246 |
str(ticket.get("planning_note", "")),
|
| 247 |
+
str(ticket.get("customer_update_note", "")),
|
| 248 |
json.dumps(ticket.get("last_tool_result") or {}, sort_keys=True),
|
| 249 |
json.dumps(ticket.get("routing_options") or [], sort_keys=True),
|
| 250 |
+
json.dumps(ticket.get("operational_context") or {}, sort_keys=True),
|
| 251 |
json.dumps(ticket.get("capacity_state") or {}, sort_keys=True),
|
| 252 |
json.dumps(ticket.get("future_queue_demand") or {}, sort_keys=True),
|
| 253 |
]
|
|
|
|
| 267 |
if (
|
| 268 |
ticket.get("planning_note")
|
| 269 |
or ticket.get("routing_options")
|
| 270 |
+
or (ticket.get("operational_context") or {}).get("incident_recommended")
|
| 271 |
or "lookup_queue_capacity_forecast"
|
| 272 |
in (context_status.get("recommended_tools") or [])
|
| 273 |
or any(
|
|
|
|
| 319 |
for phrase in ("still", "again", "overdue", "legal", "priority")
|
| 320 |
):
|
| 321 |
return "history_pressure"
|
| 322 |
+
if any(phrase in text for phrase in ("incident", "outage", "lockout", "company-wide")):
|
| 323 |
+
return "incident_pressure"
|
| 324 |
return "generic_hidden_context"
|
| 325 |
|
| 326 |
|
|
|
|
| 402 |
*,
|
| 403 |
hidden_context_remaining: bool,
|
| 404 |
used_tools: set[str],
|
| 405 |
+
available_tools: set[str] | None = None,
|
| 406 |
) -> str | None:
|
| 407 |
preferred_tools = preferred_tool_order(
|
| 408 |
ticket,
|
| 409 |
hidden_context_remaining=hidden_context_remaining,
|
| 410 |
)
|
| 411 |
+
available_tool_set = set(available_tools or [])
|
| 412 |
for tool_name in preferred_tools:
|
| 413 |
+
if available_tool_set and tool_name not in available_tool_set:
|
| 414 |
+
continue
|
| 415 |
if tool_name not in used_tools:
|
| 416 |
return tool_name
|
| 417 |
return None
|
| 418 |
|
| 419 |
|
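A minimal standalone sketch of the filtering order that the updated `select_cue_based_tool` appears to apply. The helper name `pick_first_unused_tool` and the plain-list inputs are illustrative assumptions, not part of the repository. Note that, as in the diff, an empty or missing `available_tools` set disables the availability filter rather than blocking every tool.

```python
from __future__ import annotations

from typing import Iterable


def pick_first_unused_tool(
    preferred_tools: Iterable[str],
    used_tools: set[str],
    available_tools: set[str] | None = None,
) -> str | None:
    """Return the first preferred tool that is allowed and not yet used.

    Hypothetical helper mirroring the loop in the diff above.
    """
    # An empty or None set means "no availability restriction".
    available_tool_set = set(available_tools or [])
    for tool_name in preferred_tools:
        if available_tool_set and tool_name not in available_tool_set:
            continue  # tool is not exposed for this task
        if tool_name not in used_tools:
            return tool_name
    return None
```

A caller that wants "no tools allowed" therefore cannot express it by passing an empty set; it would need to skip the call entirely.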
| 420 |
+
def choose_operational_action(
|
| 421 |
+
ticket: dict[str, Any],
|
| 422 |
+
history: list[dict[str, Any]],
|
| 423 |
+
available_action_types: list[str] | None = None,
|
| 424 |
+
) -> tuple[HelpdeskTicketAction | None, str | None]:
|
| 425 |
+
if not ticket:
|
| 426 |
+
return None, None
|
| 427 |
+
operational_context = ticket.get("operational_context") or {}
|
| 428 |
+
recommended_actions = list(operational_context.get("recommended_actions") or [])
|
| 429 |
+
available_action_set = set(available_action_types or [])
|
| 430 |
+
current_ticket_id = str(ticket.get("ticket_id", ""))
|
| 431 |
+
prior_ticket_history = [
|
| 432 |
+
entry for entry in history if entry.get("ticket_id") == current_ticket_id
|
| 433 |
+
]
|
| 434 |
+
used_action_types = {
|
| 435 |
+
entry.get("predicted", {}).get("action_type")
|
| 436 |
+
for entry in prior_ticket_history
|
| 437 |
+
if entry.get("predicted")
|
| 438 |
+
}
|
| 439 |
+
|
| 440 |
+
for action_name in ("open_incident", "request_info", "defer"):
|
| 441 |
+
if action_name not in recommended_actions:
|
| 442 |
+
continue
|
| 443 |
+
if available_action_set and action_name not in available_action_set:
|
| 444 |
+
continue
|
| 445 |
+
if action_name in used_action_types:
|
| 446 |
+
continue
|
| 447 |
+
if action_name == "defer" and not ticket.get("tickets_after_current", 0):
|
| 448 |
+
continue
|
| 449 |
+
return HelpdeskTicketAction(action_type=action_name), action_name
|
| 450 |
+
return None, None
|
| 451 |
+
|
| 452 |
+
|
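The selection order in `choose_operational_action` can be sketched without the project's model classes. The function below is a hypothetical, dictionary-only restatement that returns just the action name; `pick_operational_action` and `OPERATIONAL_PRIORITY` are names introduced here for illustration, and the real function returns a `HelpdeskTicketAction` as shown in the diff.

```python
from __future__ import annotations

from typing import Any

# Priority order taken from the loop in the diff above.
OPERATIONAL_PRIORITY = ("open_incident", "request_info", "defer")


def pick_operational_action(
    ticket: dict[str, Any],
    history: list[dict[str, Any]],
    available_action_types: list[str] | None = None,
) -> str | None:
    """Return the first recommended, allowed, not-yet-used operational action."""
    if not ticket:
        return None
    context = ticket.get("operational_context") or {}
    recommended = list(context.get("recommended_actions") or [])
    available = set(available_action_types or [])
    ticket_id = str(ticket.get("ticket_id", ""))
    # Action types already taken on this same ticket earlier in the episode.
    used = {
        entry.get("predicted", {}).get("action_type")
        for entry in history
        if entry.get("ticket_id") == ticket_id and entry.get("predicted")
    }
    for name in OPERATIONAL_PRIORITY:
        if name not in recommended:
            continue
        if available and name not in available:
            continue
        if name in used:
            continue
        # Deferring only makes sense when other tickets are still queued.
        if name == "defer" and not ticket.get("tickets_after_current", 0):
            continue
        return name
    return None
```

Because each action type is skipped once it appears in the ticket's history, this policy issues at most one of each operational action per ticket before falling through to a submit.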
| 453 |
def choose_policy_action(
|
| 454 |
policy: PolicyConfig,
|
| 455 |
observation: HelpdeskTicketObservation,
|
|
|
|
| 467 |
used_tools = set(used_tools_by_ticket.get(ticket_id, set()))
|
| 468 |
context_status = ticket.get("context_status") or {}
|
| 469 |
hidden_context_remaining = bool(context_status.get("hidden_context_remaining"))
|
| 470 |
+
available_tools = set(getattr(observation, "available_tools", []) or [])
|
| 471 |
|
| 472 |
if ticket_investigations < policy.max_investigations_per_ticket:
|
| 473 |
if policy.strategy == "adaptive" and adaptive_bandit is not None and hidden_context_remaining:
|
|
|
|
| 477 |
ticket,
|
| 478 |
hidden_context_remaining=hidden_context_remaining,
|
| 479 |
)
|
| 480 |
+
if tool_name not in used_tools and tool_name in available_tools
|
| 481 |
]
|
| 482 |
if not candidate_tools:
|
| 483 |
candidate_tools = [
|
| 484 |
+
tool_name
|
| 485 |
+
for tool_name in AVAILABLE_TOOLS
|
| 486 |
+
if tool_name not in used_tools and tool_name in available_tools
|
| 487 |
]
|
| 488 |
if candidate_tools:
|
| 489 |
cue = infer_ticket_cue(ticket)
|
|
|
|
| 499 |
ticket,
|
| 500 |
hidden_context_remaining=hidden_context_remaining,
|
| 501 |
used_tools=used_tools,
|
| 502 |
+
available_tools=available_tools,
|
| 503 |
)
|
| 504 |
if tool_name is not None:
|
| 505 |
return (
|
|
|
|
| 538 |
infer_ticket_cue(ticket),
|
| 539 |
)
|
| 540 |
|
| 541 |
+
operational_action, operational_source = choose_operational_action(
|
| 542 |
+
ticket,
|
| 543 |
+
list(getattr(observation, "history", []) or []),
|
| 544 |
+
list(getattr(observation, "available_action_types", []) or []),
|
| 545 |
+
)
|
| 546 |
+
if operational_action is not None and operational_source is not None:
|
| 547 |
+
return operational_action, operational_source, infer_ticket_cue(ticket)
|
| 548 |
+
|
| 549 |
return submit_builder(ticket, list(observation.allowed_fields)), "submit", None
|
| 550 |
|
| 551 |
|
server/environment.py
CHANGED
|
@@ -26,17 +26,37 @@ from vocabulary import (
|
|
| 26 |
|
| 27 |
|
| 28 |
QUEUE_SIZE_RANGE = (3, 5)
|
| 29 |
-
|
| 30 |
-
AVAILABLE_TOOLS = (
|
| 31 |
"lookup_related_ticket",
|
| 32 |
"lookup_requester_history",
|
| 33 |
"lookup_internal_routing_note",
|
| 34 |
"lookup_queue_capacity_forecast",
|
| 35 |
)
|
| 36 |
FREE_INVESTIGATIONS_PER_TICKET = 1
|
| 37 |
EXTRA_INVESTIGATION_COST = 0.04
|
| 38 |
MAX_EXTRA_INVESTIGATION_PENALTY = 0.25
|
| 39 |
USEFUL_INVESTIGATION_REWARD = 0.03
|
| 40 |
PREMATURE_SUBMIT_PENALTY = 0.22
|
| 41 |
NONDEFAULT_HIDDEN_CONTEXT_PENALTY = 0.08
|
| 42 |
CONTEXT_COMPLETION_BONUS = 0.06
|
|
@@ -49,6 +69,11 @@ TEAM_CAPACITY_OVERFLOW_PENALTY = 0.08
|
|
| 49 |
HIGH_PRIORITY_SLOT_OVERFLOW_PENALTY = 0.06
|
| 50 |
ESCALATION_SLOT_OVERFLOW_PENALTY = 0.05
|
| 51 |
PLANNING_SUCCESS_BONUS = 0.05
|
| 52 |
|
| 53 |
TASK3_INVESTIGATION_TOOL_PLAN: dict[str, tuple[str, ...]] = {
|
| 54 |
"ticket-021": ("lookup_related_ticket", "lookup_requester_history"),
|
|
@@ -170,6 +195,7 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 170 |
team_capacity_initial,
|
| 171 |
high_priority_slots_initial,
|
| 172 |
escalation_slots_initial,
|
|
|
|
| 173 |
) = self._initial_capacity_state_for_queue(task_id)
|
| 174 |
|
| 175 |
self._state = HelpdeskTicketState(
|
|
@@ -193,8 +219,20 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 193 |
high_priority_slots_remaining=high_priority_slots_initial,
|
| 194 |
escalation_slots_initial=escalation_slots_initial,
|
| 195 |
escalation_slots_remaining=escalation_slots_initial,
|
|
|
|
|
|
|
| 196 |
planning_penalty_total=0.0,
|
| 197 |
capacity_pressure_tickets_resolved=0,
|
| 198 |
)
|
| 199 |
|
| 200 |
return self._build_observation(task)
|
|
@@ -215,9 +253,19 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 215 |
current_ticket = self._queue[idx]
|
| 216 |
task_id = self._state.current_task_id
|
| 217 |
task = get_task_definition(task_id)
|
| 218 |
|
| 219 |
if action.action_type == "investigate":
|
| 220 |
return self._handle_investigation_action(task, current_ticket, action, idx)
|
| 221 |
|
| 222 |
submitted_fields = {
|
| 223 |
f
|
|
@@ -317,6 +365,7 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 317 |
action,
|
| 318 |
task_id=task_id,
|
| 319 |
)
|
|
|
|
| 320 |
capacity_penalty, capacity_details = self._apply_capacity_usage(
|
| 321 |
current_ticket,
|
| 322 |
action,
|
|
@@ -353,7 +402,7 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 353 |
trajectory_reward - self._state.planning_penalty_total
|
| 354 |
)
|
| 355 |
final_reward = clamp_open_unit_interval(
|
| 356 |
-
rubric_reward - context_penalty - capacity_penalty
|
| 357 |
)
|
| 358 |
self._state.total_reward = rubric_reward
|
| 359 |
investigation_penalty = self._compute_episode_penalty()
|
|
@@ -363,7 +412,31 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 363 |
self._state.step_count += 1
|
| 364 |
self._state.current_ticket_index += 1
|
| 365 |
final_reward = clamp_open_unit_interval(
|
| 366 |
-
step_reward - context_penalty - capacity_penalty
|
| 367 |
)
|
| 368 |
|
| 369 |
reward_components = self._build_reward_components(
|
|
@@ -379,6 +452,7 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 379 |
"context_gap_penalty": context_penalty,
|
| 380 |
"context_completion_bonus": process_bonus,
|
| 381 |
"risk_penalty": risk_penalty,
|
|
|
|
| 382 |
"capacity_penalty": capacity_penalty,
|
| 383 |
"delta_adjustment": step_adjustments["delta_adjustment"],
|
| 384 |
"required_investigation_count": len(self._required_tools_for_ticket(current_ticket)),
|
|
@@ -391,6 +465,7 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 391 |
"planning_success_bonus": self._planning_success_bonus()
|
| 392 |
if is_done
|
| 393 |
else 0.0,
|
|
|
|
| 394 |
"rubric_reward": rubric_reward,
|
| 395 |
"trajectory_average_reward": (
|
| 396 |
trajectory_components["average_reward"]
|
|
@@ -457,6 +532,10 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 457 |
|
| 458 |
def _apply_episode_economics(self, base_reward: float) -> float:
|
| 459 |
penalty = self._compute_episode_penalty()
|
| 460 |
return clamp_open_unit_interval(base_reward - penalty)
|
| 461 |
|
| 462 |
def _current_average_score(self) -> float:
|
|
@@ -464,6 +543,17 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 464 |
return 0.0
|
| 465 |
return sum(self._state.per_ticket_scores) / len(self._state.per_ticket_scores)
|
| 466 |
|
| 467 |
def _ticket_has_alternate_route(self, ticket: HelpdeskTicketRecord) -> bool:
|
| 468 |
return any(
|
| 469 |
value is not None
|
|
@@ -552,9 +642,9 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 552 |
def _initial_capacity_state_for_queue(
|
| 553 |
self,
|
| 554 |
task_id: int,
|
| 555 |
-
) -> tuple[dict[str, int], int, int]:
|
| 556 |
if task_id != 3:
|
| 557 |
-
return {}, 0, 0
|
| 558 |
|
| 559 |
primary_group_demand: dict[str, int] = {}
|
| 560 |
alternate_relief_by_group: dict[str, int] = {}
|
|
@@ -563,6 +653,7 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 563 |
high_priority_relief = 0
|
| 564 |
escalation_demand = 0
|
| 565 |
escalation_relief = 0
|
|
|
|
| 566 |
|
| 567 |
for ticket in self._queue:
|
| 568 |
primary_route = self._route_for_ticket(ticket)
|
|
@@ -574,6 +665,8 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 574 |
high_priority_demand += 1
|
| 575 |
if primary_route["resolution_action"] in {"assign", "escalate"}:
|
| 576 |
escalation_demand += 1
|
|
|
|
|
|
|
| 577 |
|
| 578 |
if self._ticket_has_alternate_route(ticket):
|
| 579 |
alternate_route = self._route_for_ticket(ticket, use_alternate=True)
|
|
@@ -622,10 +715,16 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 622 |
else:
|
| 623 |
escalation_slots_initial = escalation_demand
|
| 624 |
|
| 625 |
return (
|
| 626 |
team_capacity_initial,
|
| 627 |
high_priority_slots_initial,
|
| 628 |
escalation_slots_initial,
|
|
|
|
| 629 |
)
|
| 630 |
|
| 631 |
def _future_queue_demand(self) -> dict[str, Any]:
|
|
@@ -634,6 +733,7 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 634 |
high_priority_needed = 0
|
| 635 |
escalation_needed = 0
|
| 636 |
capacity_sensitive_tickets = 0
|
|
|
|
| 637 |
|
| 638 |
for ticket in future_tickets:
|
| 639 |
route = self._route_for_ticket(ticket)
|
|
@@ -646,6 +746,8 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 646 |
escalation_needed += 1
|
| 647 |
if self._ticket_has_alternate_route(ticket):
|
| 648 |
capacity_sensitive_tickets += 1
|
|
|
|
|
|
|
| 649 |
|
| 650 |
return {
|
| 651 |
"remaining_ticket_count": len(future_tickets),
|
|
@@ -653,6 +755,7 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 653 |
"high_priority_needed": high_priority_needed,
|
| 654 |
"escalation_needed": escalation_needed,
|
| 655 |
"capacity_sensitive_tickets": capacity_sensitive_tickets,
|
|
|
|
| 656 |
}
|
| 657 |
|
| 658 |
def _capacity_state_snapshot(self) -> dict[str, Any]:
|
|
@@ -663,6 +766,8 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 663 |
"high_priority_slots_initial": self._state.high_priority_slots_initial,
|
| 664 |
"escalation_slots_remaining": self._state.escalation_slots_remaining,
|
| 665 |
"escalation_slots_initial": self._state.escalation_slots_initial,
|
|
|
|
|
|
|
| 666 |
}
|
| 667 |
|
| 668 |
def _planning_route_recommendation(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
|
|
@@ -897,8 +1002,191 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 897 |
)
|
| 898 |
)
|
| 899 |
|
| 900 |
def _ticket_repeated_requester_count(self, ticket: HelpdeskTicketRecord) -> int:
|
| 901 |
-
return sum(
|
| 902 |
|
| 903 |
def _tool_has_available_context(
|
| 904 |
self,
|
|
@@ -927,7 +1215,7 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 927 |
task_id: int | None = None,
|
| 928 |
) -> list[str]:
|
| 929 |
resolved_task_id = self._state.current_task_id if task_id is None else task_id
|
| 930 |
-
if resolved_task_id
|
| 931 |
return []
|
| 932 |
required_tools: list[str] = list(TASK3_INVESTIGATION_TOOL_PLAN.get(ticket.ticket_id, ()))
|
| 933 |
if ticket.related_ticket_id is not None and "lookup_related_ticket" not in required_tools:
|
|
@@ -949,18 +1237,51 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 949 |
):
|
| 950 |
required_tools.append("lookup_requester_history")
|
| 951 |
if (
|
| 952 |
-
|
|
|
|
| 953 |
and "lookup_queue_capacity_forecast" not in required_tools
|
| 954 |
):
|
| 955 |
required_tools.append("lookup_queue_capacity_forecast")
|
| 956 |
filtered_required_tools: list[str] = []
|
|
|
|
| 957 |
for tool_name in required_tools:
|
| 958 |
if tool_name in filtered_required_tools:
|
| 959 |
continue
|
|
|
|
|
|
|
| 960 |
if self._tool_has_available_context(ticket, tool_name):
|
| 961 |
filtered_required_tools.append(tool_name)
|
| 962 |
return filtered_required_tools
|
| 963 |
|
| 964 |
def _used_tools_for_ticket(self, ticket_id: str) -> list[str]:
|
| 965 |
return list(self._state.ticket_tool_usage.get(ticket_id, []))
|
| 966 |
|
|
@@ -983,6 +1304,8 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 983 |
revealed_tools = self._used_tools_for_ticket(ticket.ticket_id)
|
| 984 |
remaining_tools = self._remaining_tools_for_ticket(ticket)
|
| 985 |
total_required = max(1, len(required_tools))
|
|
|
|
|
|
|
| 986 |
return {
|
| 987 |
"required_tools": required_tools,
|
| 988 |
"revealed_tools": revealed_tools,
|
|
@@ -990,6 +1313,8 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 990 |
"revealed_count": len(revealed_tools),
|
| 991 |
"remaining_count": len(remaining_tools),
|
| 992 |
"completeness": round(len(revealed_tools) / total_required, 2),
|
|
|
|
|
|
|
| 993 |
}
|
| 994 |
|
| 995 |
def _default_redacted_description(self, ticket: HelpdeskTicketRecord) -> str:
|
|
@@ -1028,7 +1353,7 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1028 |
return "Helpdesk routing decision"
|
| 1029 |
|
| 1030 |
def _visible_title(self, ticket: HelpdeskTicketRecord) -> str:
|
| 1031 |
-
if self._state.current_task_id
|
| 1032 |
return HARD_TASK_TITLE_REDACTIONS.get(
|
| 1033 |
ticket.ticket_id,
|
| 1034 |
self._default_redacted_title(ticket),
|
|
@@ -1036,7 +1361,7 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1036 |
return ticket.title
|
| 1037 |
|
| 1038 |
def _visible_description(self, ticket: HelpdeskTicketRecord) -> str:
|
| 1039 |
-
if self._state.current_task_id
|
| 1040 |
return HARD_TASK_DESCRIPTION_REDACTIONS.get(
|
| 1041 |
ticket.ticket_id,
|
| 1042 |
self._default_redacted_description(ticket),
|
|
@@ -1122,6 +1447,21 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1122 |
|
| 1123 |
return round(priority_penalty + resolution_penalty, 4)
|
| 1124 |
|
| 1125 |
def _build_reward_components(
|
| 1126 |
self,
|
| 1127 |
*,
|
|
@@ -1198,7 +1538,7 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1198 |
"assignment_group": ticket.assignment_group,
|
| 1199 |
"resolution_action": ticket.resolution_action,
|
| 1200 |
}
|
| 1201 |
-
for ticket in self.
|
| 1202 |
if ticket.requester == current_ticket.requester
|
| 1203 |
and ticket.ticket_id != current_ticket.ticket_id
|
| 1204 |
]
|
|
@@ -1235,6 +1575,7 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1235 |
"capacity_state": recommendation["capacity_state"],
|
| 1236 |
"future_queue_demand": recommendation["future_demand"],
|
| 1237 |
"routing_options": routing_options,
|
|
|
|
| 1238 |
}
|
| 1239 |
|
| 1240 |
def _run_investigation_tool(
|
|
@@ -1262,6 +1603,8 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1262 |
) -> HelpdeskTicketObservation:
|
| 1263 |
if action.tool_name is None:
|
| 1264 |
raise ValueError("Investigate actions require tool_name")
|
|
|
|
|
|
|
| 1265 |
submitted_fields = {
|
| 1266 |
field
|
| 1267 |
for field in ("issue_type", "priority", "assignment_group", "resolution_action")
|
|
@@ -1332,10 +1675,279 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1332 |
self._state.last_reward_components = reward_components
|
| 1333 |
return self._build_observation(task, done=False, reward=investigation_reward)
|
| 1334 |
|
| 1335 |
def _build_ticket_view(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
|
| 1336 |
progress = self._tool_progress_for_ticket(ticket)
|
| 1337 |
remaining_tools = progress["remaining_tools"]
|
| 1338 |
used_tools = set(self._used_tools_for_ticket(ticket.ticket_id))
|
|
|
|
| 1339 |
ticket_view: dict[str, Any] = {
|
| 1340 |
"ticket_id": ticket.ticket_id,
|
| 1341 |
"title": self._visible_title(ticket),
|
|
@@ -1354,6 +1966,14 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1354 |
"investigations_used_for_ticket": progress["revealed_count"],
|
| 1355 |
"recommended_tools": list(remaining_tools),
|
| 1356 |
}
|
| 1357 |
if ticket.ambiguity_note is not None and "lookup_internal_routing_note" not in remaining_tools:
|
| 1358 |
ticket_view["ambiguity_note"] = ticket.ambiguity_note
|
| 1359 |
if (
|
|
@@ -1361,6 +1981,8 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1361 |
and "lookup_internal_routing_note" not in remaining_tools
|
| 1362 |
):
|
| 1363 |
ticket_view["planning_note"] = ticket.planning_note
|
|
|
|
|
|
|
| 1364 |
if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
|
| 1365 |
ticket_view["related_ticket_id"] = ticket.related_ticket_id
|
| 1366 |
related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
|
|
@@ -1376,6 +1998,8 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1376 |
or "lookup_queue_capacity_forecast" in used_tools
|
| 1377 |
):
|
| 1378 |
ticket_view["routing_options"] = self._routing_options_for_ticket(ticket)
|
|
|
|
|
|
|
| 1379 |
return ticket_view
|
| 1380 |
|
| 1381 |
def _build_feedback_summary(
|
|
@@ -1398,6 +2022,13 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1398 |
parts.append(f"Investigation step used {tool_name or 'a tool'}")
|
| 1399 |
if reward_components and reward_components.get("new_context_revealed"):
|
| 1400 |
parts.append("new context was revealed")
|
| 1401 |
elif penalty_reason is not None:
|
| 1402 |
parts.append(f"Penalty applied: {penalty_reason}")
|
| 1403 |
else:
|
|
@@ -1435,6 +2066,12 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1435 |
planning_penalty_total = reward_components.get("planning_penalty_total")
|
| 1436 |
if planning_penalty_total:
|
| 1437 |
parts.append(f"planning_penalty_total={planning_penalty_total:.2f}")
|
| 1438 |
|
| 1439 |
return "; ".join(parts)
|
| 1440 |
|
|
@@ -1463,6 +2100,12 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1463 |
"score": score,
|
| 1464 |
"breakdown": breakdown,
|
| 1465 |
"queue_position": queue_position,
|
| 1466 |
}
|
| 1467 |
if self._state.current_task_id == 3:
|
| 1468 |
history_entry["capacity_state"] = self._capacity_state_snapshot()
|
|
@@ -1479,6 +2122,8 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1479 |
and "lookup_internal_routing_note" not in remaining_tools
|
| 1480 |
):
|
| 1481 |
history_entry["planning_note"] = ticket.planning_note
|
|
|
|
|
|
|
| 1482 |
if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
|
| 1483 |
history_entry["related_ticket_id"] = ticket.related_ticket_id
|
| 1484 |
related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
|
|
@@ -1503,6 +2148,8 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1503 |
history_entry["tool_result"] = tool_result
|
| 1504 |
if reward_components is not None:
|
| 1505 |
history_entry["reward_components"] = reward_components
|
|
|
|
|
|
|
| 1506 |
if progress["required_tools"]:
|
| 1507 |
history_entry["context_progress"] = {
|
| 1508 |
"hidden_context_remaining": bool(progress["remaining_count"]),
|
|
@@ -1562,12 +2209,15 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1562 |
and (ticket_view.get("context_status") or {}).get("hidden_context_remaining")
|
| 1563 |
),
|
| 1564 |
"action_mode": "investigate_or_submit",
|
| 1565 |
-
"available_action_types":
|
| 1566 |
"average_score_so_far": self._state.average_score_so_far,
|
| 1567 |
"progress_fraction": progress_fraction,
|
| 1568 |
"investigation_penalty_applied": self._state.investigation_penalty_applied,
|
| 1569 |
"planning_penalty_total": self._state.planning_penalty_total,
|
| 1570 |
"planning_penalty_applied": self._state.planning_penalty_applied,
|
| 1571 |
}
|
| 1572 |
if self._state.current_task_id == 3:
|
| 1573 |
metadata["capacity_state"] = self._capacity_state_snapshot()
|
|
@@ -1591,8 +2241,8 @@ class HelpdeskTicketRoutingEnvironment(
|
|
| 1591 |
task_name=task["name"],
|
| 1592 |
instructions=task["instructions"],
|
| 1593 |
allowed_fields=list(task["allowed_fields"]),
|
| 1594 |
-
available_action_types=
|
| 1595 |
-
available_tools=
|
| 1596 |
investigation_budget_remaining=self._state.investigation_budget_remaining,
|
| 1597 |
last_tool_result=self._state.last_tool_result,
|
| 1598 |
current_ticket=ticket_view,
|
|
|
|
| 26 |
|
| 27 |
|
| 28 |
QUEUE_SIZE_RANGE = (3, 5)
|
| 29 |
+
BASE_AVAILABLE_TOOLS = (
|
|
|
|
| 30 |
"lookup_related_ticket",
|
| 31 |
"lookup_requester_history",
|
| 32 |
"lookup_internal_routing_note",
|
| 33 |
"lookup_queue_capacity_forecast",
|
| 34 |
)
|
| 35 |
+
TASK_AVAILABLE_ACTION_TYPES: dict[int, tuple[str, ...]] = {
|
| 36 |
+
1: ("submit", "investigate"),
|
| 37 |
+
2: ("submit", "investigate", "request_info"),
|
| 38 |
+
3: ("submit", "investigate", "request_info", "defer", "open_incident"),
|
| 39 |
+
}
|
| 40 |
+
TASK_AVAILABLE_TOOLS: dict[int, tuple[str, ...]] = {
|
| 41 |
+
1: (
|
| 42 |
+
"lookup_related_ticket",
|
| 43 |
+
"lookup_requester_history",
|
| 44 |
+
"lookup_internal_routing_note",
|
| 45 |
+
),
|
| 46 |
+
2: (
|
| 47 |
+
"lookup_related_ticket",
|
| 48 |
+
"lookup_requester_history",
|
| 49 |
+
"lookup_internal_routing_note",
|
| 50 |
+
),
|
| 51 |
+
3: BASE_AVAILABLE_TOOLS,
|
| 52 |
+
}
|
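The per-task action table above feeds the `action_type` guard added to `step`. A rough sketch of that gate follows, assuming a simple dictionary lookup. The body of `_available_action_types_for_task` is not visible in this diff, so the standalone helper here, and in particular its fallback to task 1 for unknown ids, is an assumption made only for illustration.

```python
from __future__ import annotations

# Copied from the diff above.
TASK_AVAILABLE_ACTION_TYPES: dict[int, tuple[str, ...]] = {
    1: ("submit", "investigate"),
    2: ("submit", "investigate", "request_info"),
    3: ("submit", "investigate", "request_info", "defer", "open_incident"),
}


def available_action_types_for_task(task_id: int) -> tuple[str, ...]:
    """Hypothetical lookup; the fallback for unknown task ids is an assumption."""
    return TASK_AVAILABLE_ACTION_TYPES.get(task_id, TASK_AVAILABLE_ACTION_TYPES[1])


def validate_action_type(action_type: str, task_id: int) -> None:
    """Raise ValueError for actions the task does not expose, as in step()."""
    if action_type not in available_action_types_for_task(task_id):
        raise ValueError(
            f"Unsupported action_type {action_type!r} for task {task_id}"
        )
```

With this table, `defer` and `open_incident` are only reachable on task 3, so a policy tuned on the hard task would raise if it replayed those actions against tasks 1 or 2.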
| 53 |
FREE_INVESTIGATIONS_PER_TICKET = 1
|
| 54 |
EXTRA_INVESTIGATION_COST = 0.04
|
| 55 |
MAX_EXTRA_INVESTIGATION_PENALTY = 0.25
|
| 56 |
USEFUL_INVESTIGATION_REWARD = 0.03
|
| 57 |
+
USEFUL_REQUEST_INFO_REWARD = 0.025
|
| 58 |
+
INCIDENT_OPEN_REWARD = 0.03
|
| 59 |
+
REQUEST_INFO_CONTEXT_COMPLETION_BONUS = 0.02
|
| 60 |
PREMATURE_SUBMIT_PENALTY = 0.22
|
| 61 |
NONDEFAULT_HIDDEN_CONTEXT_PENALTY = 0.08
|
| 62 |
CONTEXT_COMPLETION_BONUS = 0.06
|
|
|
|
| 69 |
HIGH_PRIORITY_SLOT_OVERFLOW_PENALTY = 0.06
|
| 70 |
ESCALATION_SLOT_OVERFLOW_PENALTY = 0.05
|
| 71 |
PLANNING_SUCCESS_BONUS = 0.05
|
| 72 |
+
INCIDENT_SLOT_OVERFLOW_PENALTY = 0.05
|
| 73 |
+
INCIDENT_GAP_PENALTY = 0.07
|
| 74 |
+
SLA_BREACH_PENALTY = 0.04
|
| 75 |
+
FOLLOW_UP_SPAWN_THRESHOLD = 0.72
|
| 76 |
+
MAX_DEFERS_PER_TICKET = 1
|
| 77 |
|
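The new penalty constants enter the per-step reward as plain subtractions before clamping, as the updated `final_reward` expressions later in this diff show. The sketch below restates that arithmetic in isolation. `clamp_open_unit_interval` is not defined in the visible diff, so the version here, including the `EPS` margin, is an assumed stand-in rather than the repository's implementation.

```python
EPS = 1e-6  # assumed margin; the real helper's bounds are not shown in the diff


def clamp_open_unit_interval(value: float) -> float:
    """Assumed behaviour: keep the reward strictly inside (0, 1)."""
    return max(EPS, min(1.0 - EPS, value))


def final_step_reward(
    step_reward: float,
    context_penalty: float,
    capacity_penalty: float,
    incident_gap_penalty: float,
) -> float:
    """Mirror of the updated expression: subtract all penalties, then clamp."""
    return clamp_open_unit_interval(
        step_reward - context_penalty - capacity_penalty - incident_gap_penalty
    )
```

For example, a step reward of 0.9 with a 0.08 context penalty and the 0.07 `INCIDENT_GAP_PENALTY` would come out near 0.75, while a heavily penalised low score would be held at the lower bound instead of going negative.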
| 78 |
TASK3_INVESTIGATION_TOOL_PLAN: dict[str, tuple[str, ...]] = {
|
| 79 |
"ticket-021": ("lookup_related_ticket", "lookup_requester_history"),
|
|
|
|
| 195 |
team_capacity_initial,
|
| 196 |
high_priority_slots_initial,
|
| 197 |
escalation_slots_initial,
|
| 198 |
+
incident_slots_initial,
|
| 199 |
) = self._initial_capacity_state_for_queue(task_id)
|
| 200 |
|
| 201 |
self._state = HelpdeskTicketState(
|
|
|
|
| 219 |
high_priority_slots_remaining=high_priority_slots_initial,
|
| 220 |
escalation_slots_initial=escalation_slots_initial,
|
| 221 |
escalation_slots_remaining=escalation_slots_initial,
|
| 222 |
+
incident_slots_initial=incident_slots_initial,
|
| 223 |
+
incident_slots_remaining=incident_slots_initial,
|
| 224 |
planning_penalty_total=0.0,
|
| 225 |
capacity_pressure_tickets_resolved=0,
|
| 226 |
+
ticket_request_info_usage={},
|
| 227 |
+
ticket_defer_counts={},
|
| 228 |
+
open_incident_ticket_ids=[],
|
| 229 |
+
incident_actions_used=0,
|
| 230 |
+
incident_gap_total=0.0,
|
| 231 |
+
deferred_ticket_count=0,
|
| 232 |
+
sla_breach_count=0,
|
| 233 |
+
spawned_follow_up_ticket_ids=[],
|
| 234 |
+
spawned_follow_up_source_ids=[],
|
| 235 |
+
dynamic_queue_events=[],
|
| 236 |
)
|
| 237 |
|
| 238 |
return self._build_observation(task)
|
|
|
|
253           current_ticket = self._queue[idx]
254           task_id = self._state.current_task_id
255           task = get_task_definition(task_id)
256 +         if action.action_type not in self._available_action_types_for_task(task_id):
257 +             raise ValueError(
258 +                 f"Unsupported action_type {action.action_type!r} for task {task_id}"
259 +             )
260
261           if action.action_type == "investigate":
262               return self._handle_investigation_action(task, current_ticket, action, idx)
263 +         if action.action_type == "request_info":
264 +             return self._handle_request_info_action(task, current_ticket, action, idx)
265 +         if action.action_type == "defer":
266 +             return self._handle_defer_action(task, current_ticket, action, idx)
267 +         if action.action_type == "open_incident":
268 +             return self._handle_open_incident_action(task, current_ticket, action, idx)
269
270           submitted_fields = {
271               f
...
365               action,
366               task_id=task_id,
367           )
368 +         incident_gap_penalty = self._incident_gap_penalty(current_ticket, action)
369           capacity_penalty, capacity_details = self._apply_capacity_usage(
370               current_ticket,
371               action,
...
402                   trajectory_reward - self._state.planning_penalty_total
403               )
404               final_reward = clamp_open_unit_interval(
405 +                 rubric_reward - context_penalty - capacity_penalty - incident_gap_penalty
406               )
407               self._state.total_reward = rubric_reward
408               investigation_penalty = self._compute_episode_penalty()
...
412               self._state.step_count += 1
413               self._state.current_ticket_index += 1
414               final_reward = clamp_open_unit_interval(
415 +                 step_reward - context_penalty - capacity_penalty - incident_gap_penalty
416 +             )
417 +
418 +         spawned_follow_up_ticket_id = None
419 +         if self._should_spawn_follow_up(
420 +             current_ticket,
421 +             score=score,
422 +             context_penalty=context_penalty,
423 +             incident_gap_penalty=incident_gap_penalty,
424 +         ):
425 +             spawned_follow_up = self._spawn_follow_up_ticket(current_ticket)
426 +             spawned_follow_up_ticket_id = spawned_follow_up.ticket_id
427 +             if is_done:
428 +                 is_done = False
429 +                 trajectory_reward = None
430 +                 trajectory_components = None
431 +                 rubric_reward = None
432 +                 final_reward = clamp_open_unit_interval(
433 +                     step_reward - context_penalty - capacity_penalty - incident_gap_penalty
434 +                 )
435 +                 self._state.total_reward = 0.0
436 +         if incident_gap_penalty > 0.0:
437 +             self._state.incident_gap_total = round(
438 +                 self._state.incident_gap_total + incident_gap_penalty,
439 +                 4,
440               )
441
442           reward_components = self._build_reward_components(
...
452                   "context_gap_penalty": context_penalty,
453                   "context_completion_bonus": process_bonus,
454                   "risk_penalty": risk_penalty,
455 +                 "incident_gap_penalty": incident_gap_penalty,
456                   "capacity_penalty": capacity_penalty,
457                   "delta_adjustment": step_adjustments["delta_adjustment"],
458                   "required_investigation_count": len(self._required_tools_for_ticket(current_ticket)),
...
465                   "planning_success_bonus": self._planning_success_bonus()
466                   if is_done
467                   else 0.0,
468 +                 "spawned_follow_up_ticket_id": spawned_follow_up_ticket_id,
469                   "rubric_reward": rubric_reward,
470                   "trajectory_average_reward": (
471                       trajectory_components["average_reward"]
...
532
533       def _apply_episode_economics(self, base_reward: float) -> float:
534           penalty = self._compute_episode_penalty()
535 +         penalty += min(
536 +             0.25,
537 +             self._state.sla_breach_count * SLA_BREACH_PENALTY + self._state.incident_gap_total,
538 +         )
539           return clamp_open_unit_interval(base_reward - penalty)
540
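Reviewer sketch, not part of the commit: the lines added to `_apply_episode_economics` cap the combined SLA-breach and incident-gap surcharge at 0.25 before it is added to the episode penalty. The standalone function below reproduces only that arithmetic. `operational_surcharge` is a hypothetical name, and `SLA_BREACH_PENALTY = 0.05` is an assumed value because the constant's definition is not shown in this diff.

```python
# Assumed value for illustration; the real SLA_BREACH_PENALTY is defined elsewhere.
SLA_BREACH_PENALTY = 0.05
SURCHARGE_CAP = 0.25  # literal taken from the diff


def operational_surcharge(sla_breach_count: int, incident_gap_total: float) -> float:
    """Mirror the min(0.25, ...) term that the diff adds to the episode penalty."""
    raw = sla_breach_count * SLA_BREACH_PENALTY + incident_gap_total
    return min(SURCHARGE_CAP, raw)
```

With these assumed numbers, two breaches plus an accumulated gap of 0.1 stay under the cap, while ten breaches saturate it, so repeated deferrals cannot push the surcharge past 0.25.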
541       def _current_average_score(self) -> float:
...
543               return 0.0
544           return sum(self._state.per_ticket_scores) / len(self._state.per_ticket_scores)
545
546 +     def _available_action_types_for_task(self, task_id: int | None = None) -> list[str]:
547 +         resolved_task_id = self._state.current_task_id if task_id is None else task_id
548 +         return list(TASK_AVAILABLE_ACTION_TYPES.get(int(resolved_task_id or 1), ("submit",)))
549 +
550 +     def _available_tools_for_task(self, task_id: int | None = None) -> list[str]:
551 +         resolved_task_id = self._state.current_task_id if task_id is None else task_id
552 +         return list(TASK_AVAILABLE_TOOLS.get(int(resolved_task_id or 1), ()))
553 +
554 +     def _sync_queue_ticket_ids(self) -> None:
555 +         self._state.queue_ticket_ids = [ticket.ticket_id for ticket in self._queue]
556 +
557       def _ticket_has_alternate_route(self, ticket: HelpdeskTicketRecord) -> bool:
558           return any(
559               value is not None
...
642       def _initial_capacity_state_for_queue(
643           self,
644           task_id: int,
645 +     ) -> tuple[dict[str, int], int, int, int]:
646           if task_id != 3:
647 +             return {}, 0, 0, 0
648
649           primary_group_demand: dict[str, int] = {}
650           alternate_relief_by_group: dict[str, int] = {}
...
653           high_priority_relief = 0
654           escalation_demand = 0
655           escalation_relief = 0
656 +         incident_demand = 0
657
658           for ticket in self._queue:
659               primary_route = self._route_for_ticket(ticket)
...
665                   high_priority_demand += 1
666               if primary_route["resolution_action"] in {"assign", "escalate"}:
667                   escalation_demand += 1
668 +             if self._requires_incident(ticket):
669 +                 incident_demand += 1
670
671               if self._ticket_has_alternate_route(ticket):
672                   alternate_route = self._route_for_ticket(ticket, use_alternate=True)
...
715           else:
716               escalation_slots_initial = escalation_demand
717
718 +         if incident_demand <= 1:
719 +             incident_slots_initial = incident_demand
720 +         else:
721 +             incident_slots_initial = max(1, incident_demand - 1)
722 +
723           return (
724               team_capacity_initial,
725               high_priority_slots_initial,
726               escalation_slots_initial,
727 +             incident_slots_initial,
728           )
729
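Reviewer sketch, not part of the commit: the added branch sizes the incident budget one slot below demand whenever more than one ticket in the queue needs an incident, which appears intended to force a trade-off on the hard task. The function name `incident_slots_for_demand` is hypothetical; the arithmetic is copied from the added lines.

```python
def incident_slots_for_demand(incident_demand: int) -> int:
    """Return the initial incident slots for a given number of incident-worthy tickets."""
    if incident_demand <= 1:
        # Zero or one incident-worthy ticket: the budget matches demand exactly.
        return incident_demand
    # Otherwise the budget is one short of demand, but never below one slot.
    return max(1, incident_demand - 1)
```

For example, a queue with three incident-worthy tickets would start with two slots under this rule.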
730       def _future_queue_demand(self) -> dict[str, Any]:
...
733           high_priority_needed = 0
734           escalation_needed = 0
735           capacity_sensitive_tickets = 0
736 +         incident_needed = 0
737
738           for ticket in future_tickets:
739               route = self._route_for_ticket(ticket)
...
746                   escalation_needed += 1
747               if self._ticket_has_alternate_route(ticket):
748                   capacity_sensitive_tickets += 1
749 +             if self._requires_incident(ticket):
750 +                 incident_needed += 1
751
752           return {
753               "remaining_ticket_count": len(future_tickets),
...
755               "high_priority_needed": high_priority_needed,
756               "escalation_needed": escalation_needed,
757               "capacity_sensitive_tickets": capacity_sensitive_tickets,
758 +             "incident_needed": incident_needed,
759           }
760
761       def _capacity_state_snapshot(self) -> dict[str, Any]:
...
766               "high_priority_slots_initial": self._state.high_priority_slots_initial,
767               "escalation_slots_remaining": self._state.escalation_slots_remaining,
768               "escalation_slots_initial": self._state.escalation_slots_initial,
769 +             "incident_slots_remaining": self._state.incident_slots_remaining,
770 +             "incident_slots_initial": self._state.incident_slots_initial,
771           }
772
773       def _planning_route_recommendation(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
...
1002               )
1003           )
1004
1005 +     def _ticket_text(self, ticket: HelpdeskTicketRecord) -> str:
1006 +         return f"{ticket.title} {ticket.description}".lower()
1007 +
1008 +     def _requires_incident(self, ticket: HelpdeskTicketRecord) -> bool:
1009 +         if ticket.incident_recommended:
1010 +             return True
1011 +         text = self._ticket_text(ticket)
1012 +         return (
1013 +             ticket.priority in {"high", "critical"}
1014 +             and ticket.issue_type
1015 +             in {"application_support", "identity_access", "security_compliance"}
1016 +             and any(
1017 +                 phrase in text
1018 +                 for phrase in (
1019 +                     "outage",
1020 +                     "cannot log in",
1021 +                     "login",
1022 +                     "regression",
1023 +                     "unstable",
1024 +                     "blocked",
1025 +                     "lockout",
1026 +                     "company-wide",
1027 +                     "production",
1028 +                     "unresolved",
1029 +                 )
1030 +             )
1031 +         )
1032 +
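Reviewer sketch, not part of the commit: `_requires_incident` combines an explicit flag with a keyword heuristic over the lowercased title and description. The standalone version below takes plain arguments instead of a `HelpdeskTicketRecord` (an assumption made so that it runs on its own); the phrase list and issue types are copied from the added lines.

```python
INCIDENT_PHRASES = (
    "outage", "cannot log in", "login", "regression", "unstable",
    "blocked", "lockout", "company-wide", "production", "unresolved",
)
INCIDENT_ISSUE_TYPES = {"application_support", "identity_access", "security_compliance"}


def requires_incident(
    title: str,
    description: str,
    priority: str,
    issue_type: str,
    incident_recommended: bool = False,
) -> bool:
    """Standalone rendering of the heuristic: explicit flag first, then keywords."""
    if incident_recommended:
        return True
    text = f"{title} {description}".lower()
    return (
        priority in {"high", "critical"}
        and issue_type in INCIDENT_ISSUE_TYPES
        and any(phrase in text for phrase in INCIDENT_PHRASES)
    )
```

Because the check is a substring test, a word such as "unblocked" also matches the phrase "blocked". That may be worth keeping in mind when writing ticket fixtures.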
1033 +     def _incident_open_for_ticket(self, ticket: HelpdeskTicketRecord) -> bool:
1034 +         related_ids = {ticket.ticket_id}
1035 +         if ticket.related_ticket_id:
1036 +             related_ids.add(ticket.related_ticket_id)
1037 +         if ticket.generated_from_ticket_id:
1038 +             related_ids.add(ticket.generated_from_ticket_id)
1039 +         return any(ticket_id in self._state.open_incident_ticket_ids for ticket_id in related_ids)
1040 +
1041 +     def _request_info_note_for_ticket(self, ticket: HelpdeskTicketRecord) -> str | None:
1042 +         note_parts: list[str] = []
1043 +         if ticket.customer_update_note:
1044 +             note_parts.append(ticket.customer_update_note)
1045 +         if ticket.related_ticket_id is not None:
1046 +             note_parts.append(
1047 +                 "The requester confirmed this is connected to the earlier case and wants a single accountable owner."
1048 +             )
1049 +         if self._ticket_has_nondefault_routing(ticket):
1050 +             note_parts.append(
1051 +                 "The requester clarified that the blocker owner matters more than the superficial request label."
1052 +             )
1053 +         if self._ticket_has_alternate_route(ticket):
1054 +             note_parts.append(
1055 +                 "Operations said an acknowledged fallback path is acceptable if the preferred queue is saturated."
1056 +             )
1057 +         if self._requires_incident(ticket):
1058 +             note_parts.append(
1059 +                 "Stakeholders asked for incident-style coordination because the issue is still operationally active."
1060 +             )
1061 +         if not note_parts:
1062 +             return None
1063 +         return " ".join(note_parts)
1064 +
1065 +     def _request_info_used(self, ticket_id: str) -> bool:
1066 +         return self._state.ticket_request_info_usage.get(ticket_id, 0) > 0
1067 +
1068 +     def _defer_count(self, ticket_id: str) -> int:
1069 +         return self._state.ticket_defer_counts.get(ticket_id, 0)
1070 +
1071 +     def _record_dynamic_queue_event(self, event_type: str, **details: Any) -> None:
1072 +         self._state.dynamic_queue_events.append({"event_type": event_type, **details})
1073 +
1074 +     def _escalate_priority_level(self, priority: str) -> str:
1075 +         if priority == "low":
1076 +             return "medium"
1077 +         if priority == "medium":
1078 +             return "high"
1079 +         return "critical"
1080 +
1081 +     def _escalate_ticket_after_delay(
1082 +         self,
1083 +         ticket: HelpdeskTicketRecord,
1084 +         *,
1085 +         defer_count: int,
1086 +     ) -> HelpdeskTicketRecord:
1087 +         escalated_priority = self._escalate_priority_level(ticket.priority)
1088 +         description_suffix = (
1089 +             " The ticket was deferred earlier in the queue and now needs firmer ownership."
1090 +         )
1091 +         customer_update = (
1092 +             ticket.customer_update_note
1093 +             or "The requester followed up after the delay and wants a committed owner."
1094 +         )
1095 +         return ticket.model_copy(
1096 +             update={
1097 +                 "priority": escalated_priority,
1098 +                 "title": (
1099 +                     ticket.title
1100 +                     if ticket.title.lower().startswith("re:")
1101 +                     else f"Re: {ticket.title}"
1102 +                 ),
1103 +                 "description": f"{ticket.description}{description_suffix}",
1104 +                 "customer_update_note": customer_update,
1105 +             }
1106 +         )
1107 +
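Reviewer sketch, not part of the commit: the two helpers above bump a deferred ticket one step up the priority ladder and prefix its title with `Re:` exactly once. The standalone functions below reproduce that behavior on plain strings; `escalate_priority_level` and `reply_title` are illustrative names, not functions from the repo.

```python
def escalate_priority_level(priority: str) -> str:
    """One step up the ladder; anything at or above 'high' lands on 'critical'."""
    if priority == "low":
        return "medium"
    if priority == "medium":
        return "high"
    return "critical"


def reply_title(title: str) -> str:
    """Prefix with 'Re: ' unless the title already starts with it (case-insensitive)."""
    return title if title.lower().startswith("re:") else f"Re: {title}"
```

Note that any value other than `"low"` or `"medium"`, including an unexpected string, falls through to `"critical"` under this logic.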
1108 +     def _should_spawn_follow_up(
1109 +         self,
1110 +         ticket: HelpdeskTicketRecord,
1111 +         *,
1112 +         score: float,
1113 +         context_penalty: float,
1114 +         incident_gap_penalty: float,
1115 +     ) -> bool:
1116 +         if self._state.current_task_id != 3:
1117 +             return False
1118 +         if ticket.generated_from_ticket_id is not None:
1119 +             return False
1120 +         if ticket.ticket_id in self._state.spawned_follow_up_source_ids:
1121 +             return False
1122 +         if not (
1123 +             self._requires_incident(ticket)
1124 +             or self._ticket_mentions_follow_up(ticket)
1125 +             or ticket.related_ticket_id is not None
1126 +             or ticket.priority in {"high", "critical"}
1127 +         ):
1128 +             return False
1129 +         return (
1130 +             score < FOLLOW_UP_SPAWN_THRESHOLD
1131 +             or (context_penalty >= 0.15 and score < 0.9)
1132 +             or incident_gap_penalty > 0.0
1133 +         )
1134 +
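Reviewer sketch, not part of the commit: once a ticket passes the eligibility guards in `_should_spawn_follow_up`, the final `return` decides on three numeric conditions. The function below isolates that last expression. `FOLLOW_UP_SPAWN_THRESHOLD = 0.6` is an assumed value, since the constant is defined outside this diff, and `score_triggers_follow_up` is a hypothetical name.

```python
# Assumed value for illustration; the real threshold is defined elsewhere in the repo.
FOLLOW_UP_SPAWN_THRESHOLD = 0.6


def score_triggers_follow_up(
    score: float,
    context_penalty: float,
    incident_gap_penalty: float,
    threshold: float = FOLLOW_UP_SPAWN_THRESHOLD,
) -> bool:
    """Final clause of the spawn check: low score, costly context gap, or missed incident."""
    return (
        score < threshold
        or (context_penalty >= 0.15 and score < 0.9)
        or incident_gap_penalty > 0.0
    )
```

Under this reading, a near-perfect score (0.9 or above) still spawns a follow-up if an incident should have been opened and was not.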
1135 +     def _spawn_follow_up_ticket(self, ticket: HelpdeskTicketRecord) -> HelpdeskTicketRecord:
1136 +         follow_up_ticket = HelpdeskTicketRecord(
1137 +             ticket_id=f"{ticket.ticket_id}-followup",
1138 +             title=(
1139 +                 ticket.title
1140 +                 if ticket.title.lower().startswith("re:")
1141 +                 else f"Re: {ticket.title}"
1142 +             ),
1143 +             requester=ticket.requester,
1144 +             description=(
1145 +                 "The earlier handling did not fully resolve the issue. The requester is "
1146 +                 f"following up on {ticket.ticket_id} and needs a single accountable owner now."
1147 +             ),
1148 +             issue_type=ticket.issue_type,
1149 +             priority=(
1150 +                 "critical"
1151 +                 if ticket.priority in {"high", "critical"}
1152 +                 else self._escalate_priority_level(ticket.priority)
1153 +             ),
1154 +             assignment_group=ticket.assignment_group,
1155 +             resolution_action=(
1156 +                 "escalate"
1157 +                 if ticket.priority in {"high", "critical"} or self._requires_incident(ticket)
1158 +                 else ticket.resolution_action
1159 +             ),
1160 +             ambiguity_note=(
1161 +                 ticket.ambiguity_note
1162 +                 or "Prior routing did not settle ownership; route to the team that can actually unblock the issue."
1163 +             ),
1164 +             related_ticket_id=ticket.ticket_id,
1165 +             planning_note=ticket.planning_note,
1166 +             customer_update_note=(
1167 +                 "The requester said the last response did not resolve the blocker and wants an accountable next owner."
1168 +             ),
1169 +             incident_recommended=self._requires_incident(ticket),
1170 +             generated_from_ticket_id=ticket.ticket_id,
1171 +         )
1172 +         self._queue.append(follow_up_ticket)
1173 +         self._tickets_by_id[follow_up_ticket.ticket_id] = follow_up_ticket
1174 +         self._sync_queue_ticket_ids()
1175 +         self._state.spawned_follow_up_ticket_ids.append(follow_up_ticket.ticket_id)
1176 +         self._state.spawned_follow_up_source_ids.append(ticket.ticket_id)
1177 +         self._record_dynamic_queue_event(
1178 +             "spawn_follow_up",
1179 +             source_ticket_id=ticket.ticket_id,
1180 +             follow_up_ticket_id=follow_up_ticket.ticket_id,
1181 +         )
1182 +         return follow_up_ticket
1183 +
1184       def _ticket_repeated_requester_count(self, ticket: HelpdeskTicketRecord) -> int:
1185 +         return sum(
1186 +             1
1187 +             for candidate in self._tickets_by_id.values()
1188 +             if candidate.requester == ticket.requester
1189 +         )
1190
1191       def _tool_has_available_context(
1192           self,
...
1215           task_id: int | None = None,
1216       ) -> list[str]:
1217           resolved_task_id = self._state.current_task_id if task_id is None else task_id
1218 +         if resolved_task_id is None or resolved_task_id < 2:
1219               return []
1220           required_tools: list[str] = list(TASK3_INVESTIGATION_TOOL_PLAN.get(ticket.ticket_id, ()))
1221           if ticket.related_ticket_id is not None and "lookup_related_ticket" not in required_tools:
...
1237           ):
1238               required_tools.append("lookup_requester_history")
1239           if (
1240 +             resolved_task_id == 3
1241 +             and self._ticket_is_capacity_sensitive(ticket)
1242               and "lookup_queue_capacity_forecast" not in required_tools
1243           ):
1244               required_tools.append("lookup_queue_capacity_forecast")
1245           filtered_required_tools: list[str] = []
1246 +         allowed_tool_set = set(self._available_tools_for_task(resolved_task_id))
1247           for tool_name in required_tools:
1248               if tool_name in filtered_required_tools:
1249                   continue
1250 +             if tool_name not in allowed_tool_set:
1251 +                 continue
1252               if self._tool_has_available_context(ticket, tool_name):
1253                   filtered_required_tools.append(tool_name)
1254           return filtered_required_tools
1255
1256 +     def _recommended_operational_actions(self, ticket: HelpdeskTicketRecord) -> list[str]:
1257 +         recommended_actions: list[str] = []
1258 +         available_action_types = set(self._available_action_types_for_task())
1259 +         if (
1260 +             "request_info" in available_action_types
1261 +             and self._request_info_note_for_ticket(ticket) is not None
1262 +             and not self._request_info_used(ticket.ticket_id)
1263 +         ):
1264 +             recommended_actions.append("request_info")
1265 +         if (
1266 +             "open_incident" in available_action_types
1267 +             and self._requires_incident(ticket)
1268 +             and not self._incident_open_for_ticket(ticket)
1269 +         ):
1270 +             recommended_actions.append("open_incident")
1271 +         if (
1272 +             "defer" in available_action_types
1273 +             and self._defer_count(ticket.ticket_id) < MAX_DEFERS_PER_TICKET
1274 +             and self._state.current_ticket_index < len(self._queue) - 1
1275 +             and ticket.priority not in {"high", "critical"}
1276 +             and (
1277 +                 bool(self._remaining_tools_for_ticket(ticket))
1278 +                 or self._ticket_is_capacity_sensitive(ticket)
1279 +                 or self._request_info_note_for_ticket(ticket) is not None
1280 +             )
1281 +         ):
1282 +             recommended_actions.append("defer")
1283 +         return recommended_actions
1284 +
1285       def _used_tools_for_ticket(self, ticket_id: str) -> list[str]:
1286           return list(self._state.ticket_tool_usage.get(ticket_id, []))
1287
...
1304           revealed_tools = self._used_tools_for_ticket(ticket.ticket_id)
1305           remaining_tools = self._remaining_tools_for_ticket(ticket)
1306           total_required = max(1, len(required_tools))
1307 +         request_info_used = self._request_info_used(ticket.ticket_id)
1308 +         operational_actions = self._recommended_operational_actions(ticket)
1309           return {
1310               "required_tools": required_tools,
1311               "revealed_tools": revealed_tools,
...
1313               "revealed_count": len(revealed_tools),
1314               "remaining_count": len(remaining_tools),
1315               "completeness": round(len(revealed_tools) / total_required, 2),
1316 +             "request_info_used": request_info_used,
1317 +             "recommended_operational_actions": operational_actions,
1318           }
1319
1320       def _default_redacted_description(self, ticket: HelpdeskTicketRecord) -> str:
...
1353           return "Helpdesk routing decision"
1354
1355       def _visible_title(self, ticket: HelpdeskTicketRecord) -> str:
1356 +         if self._state.current_task_id in {2, 3} and self._remaining_tools_for_ticket(ticket):
1357               return HARD_TASK_TITLE_REDACTIONS.get(
1358                   ticket.ticket_id,
1359                   self._default_redacted_title(ticket),
...
1361           return ticket.title
1362
1363       def _visible_description(self, ticket: HelpdeskTicketRecord) -> str:
1364 +         if self._state.current_task_id in {2, 3} and self._remaining_tools_for_ticket(ticket):
1365               return HARD_TASK_DESCRIPTION_REDACTIONS.get(
1366                   ticket.ticket_id,
1367                   self._default_redacted_description(ticket),
...
1447
1448           return round(priority_penalty + resolution_penalty, 4)
1449
1450 +     def _incident_gap_penalty(
1451 +         self,
1452 +         ticket: HelpdeskTicketRecord,
1453 +         action: HelpdeskTicketAction,
1454 +     ) -> float:
1455 +         if self._state.current_task_id != 3:
1456 +             return 0.0
1457 +         if not self._requires_incident(ticket):
1458 +             return 0.0
1459 +         if self._incident_open_for_ticket(ticket):
1460 +             return 0.0
1461 +         if action.resolution_action in {"escalate", "assign"}:
1462 +             return round(INCIDENT_GAP_PENALTY / 2, 4)
1463 +         return INCIDENT_GAP_PENALTY
1464 +
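Reviewer sketch, not part of the commit: `_incident_gap_penalty` charges a submit that skips a needed incident, and halves the charge when the submitted resolution is at least `escalate` or `assign`. The version below takes booleans in place of the ticket lookups so that it runs standalone. `INCIDENT_GAP_PENALTY = 0.08` is an assumed value; the real constant is not shown in this diff.

```python
# Assumed value for illustration; the real INCIDENT_GAP_PENALTY is defined elsewhere.
INCIDENT_GAP_PENALTY = 0.08


def incident_gap_penalty(
    task_id: int,
    requires_incident: bool,
    incident_open: bool,
    resolution_action: str,
) -> float:
    """Penalty for submitting an incident-worthy ticket without an open incident."""
    if task_id != 3 or not requires_incident or incident_open:
        return 0.0
    if resolution_action in {"escalate", "assign"}:
        # Partial credit: the ticket was at least routed to an owner.
        return round(INCIDENT_GAP_PENALTY / 2, 4)
    return INCIDENT_GAP_PENALTY
```

The three early returns in the diff are collapsed into one guard here; the outcomes are the same.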
1465       def _build_reward_components(
1466           self,
1467           *,
...
1538                       "assignment_group": ticket.assignment_group,
1539                       "resolution_action": ticket.resolution_action,
1540                   }
1541 +                 for ticket in self._tickets_by_id.values()
1542                   if ticket.requester == current_ticket.requester
1543                   and ticket.ticket_id != current_ticket.ticket_id
1544               ]
...
1575               "capacity_state": recommendation["capacity_state"],
1576               "future_queue_demand": recommendation["future_demand"],
1577               "routing_options": routing_options,
1578 +             "incident_recommended": self._requires_incident(current_ticket),
1579           }
1580
1581       def _run_investigation_tool(
...
1603       ) -> HelpdeskTicketObservation:
1604           if action.tool_name is None:
1605               raise ValueError("Investigate actions require tool_name")
1606 +         if action.tool_name not in self._available_tools_for_task():
1607 +             raise ValueError(f"Unsupported tool_name for current task: {action.tool_name}")
1608           submitted_fields = {
1609               field
1610               for field in ("issue_type", "priority", "assignment_group", "resolution_action")
...
1675           self._state.last_reward_components = reward_components
1676           return self._build_observation(task, done=False, reward=investigation_reward)
1677
1678 +     def _handle_request_info_action(
1679 +         self,
1680 +         task: dict,
1681 +         current_ticket: HelpdeskTicketRecord,
1682 +         action: HelpdeskTicketAction,
1683 +         idx: int,
1684 +     ) -> HelpdeskTicketObservation:
1685 +         submitted_fields = {
1686 +             field
1687 +             for field in ("issue_type", "priority", "assignment_group", "resolution_action")
1688 +             if getattr(action, field) is not None
1689 +         }
1690 +         if submitted_fields:
1691 +             raise ValueError(
1692 +                 "request_info actions cannot include submit fields: "
1693 +                 f"{sorted(submitted_fields)}"
1694 +             )
1695 +
1696 +         ticket_id = current_ticket.ticket_id
1697 +         note = self._request_info_note_for_ticket(current_ticket)
1698 +         already_used = self._request_info_used(ticket_id)
1699 +         useful_request = note is not None and not already_used
1700 +         self._state.ticket_request_info_usage[ticket_id] = (
1701 +             self._state.ticket_request_info_usage.get(ticket_id, 0) + 1
1702 +         )
1703 +         self._state.step_count += 1
1704 +         self._state.investigation_steps += 1
1705 +         self._state.investigation_budget_remaining = max(
1706 +             0,
1707 +             self._state.investigation_budget_remaining - 1,
1708 +         )
1709 +         request_reward = USEFUL_REQUEST_INFO_REWARD if useful_request else 0.0
1710 +         tool_result = {
1711 +             "action_type": "request_info",
1712 +             "found": useful_request,
1713 +             "ticket_id": ticket_id,
1714 +             "customer_update_note": note if useful_request else "",
1715 +         }
1716 +         self._state.last_tool_result = tool_result
1717 +         self._state.last_step_reward = request_reward
1718 +         self._state.reward = request_reward
1719 +         self._state.done = False
1720 +         self._state.investigation_penalty_applied = self._compute_episode_penalty()
1721 +         progress = self._tool_progress_for_ticket(current_ticket)
1722 +         reward_components = self._build_reward_components(
1723 +             ticket_score=0.0,
1724 +             field_breakdown={},
1725 +             shaped_step_reward=request_reward,
1726 +             reward_kind="operational",
1727 +             final_reward=request_reward,
1728 +             investigation_penalty=self._state.investigation_penalty_applied,
1729 +             extra_details={
1730 +                 "operational_action": "request_info",
1731 +                 "new_context_revealed": useful_request,
1732 +                 "customer_update_visible": useful_request,
1733 +                 "hidden_context_remaining_count": progress["remaining_count"],
1734 +                 "context_completeness": progress["completeness"],
1735 +             },
1736 +         )
1737 +         self._state.history_entries.append(
1738 +             self._build_history_entry(
1739 +                 current_ticket,
1740 +                 predicted=action.model_dump(exclude_none=True),
1741 +                 score=0.0,
1742 +                 breakdown={},
1743 +                 queue_position=idx + 1,
1744 +                 reward=request_reward,
1745 +                 reward_kind="operational",
1746 +                 tool_result=tool_result,
1747 +                 reward_components=reward_components,
1748 +             )
1749 +         )
1750 +         self._state.last_reward_components = reward_components
1751 +         return self._build_observation(task, done=False, reward=request_reward)
1752 +
1753 +     def _handle_defer_action(
1754 +         self,
1755 +         task: dict,
1756 +         current_ticket: HelpdeskTicketRecord,
1757 +         action: HelpdeskTicketAction,
1758 +         idx: int,
1759 +     ) -> HelpdeskTicketObservation:
1760 +         submitted_fields = {
1761 +             field
1762 +             for field in ("issue_type", "priority", "assignment_group", "resolution_action")
1763 +             if getattr(action, field) is not None
1764 +         }
1765 +         if submitted_fields:
1766 +             raise ValueError(
1767 +                 "defer actions cannot include submit fields: "
1768 +                 f"{sorted(submitted_fields)}"
1769 +             )
1770 +
1771 +         ticket_id = current_ticket.ticket_id
1772 +         existing_count = self._defer_count(ticket_id)
1773 +         defer_allowed = (
1774 +             existing_count < MAX_DEFERS_PER_TICKET
1775 +             and idx < len(self._queue) - 1
1776 +             and self._state.current_task_id in {2, 3}
1777 +         )
1778 +         defer_count = existing_count + 1
1779 +         reward = 0.0
1780 +         sla_risk = current_ticket.priority in {"high", "critical"} or self._ticket_mentions_follow_up(
1781 +             current_ticket
1782 +         )
1783 +         moved_ticket = current_ticket
1784 +
1785 +         if defer_allowed:
1786 +             self._state.ticket_defer_counts[ticket_id] = defer_count
1787 +             self._state.deferred_ticket_count += 1
1788 +             if sla_risk:
1789 +                 self._state.sla_breach_count += 1
1790 +                 moved_ticket = self._escalate_ticket_after_delay(
1791 +                     current_ticket,
1792 +                     defer_count=defer_count,
1793 +                 )
1794 +             elif (
1795 +                 self._remaining_tools_for_ticket(current_ticket)
1796 +                 or self._request_info_note_for_ticket(current_ticket) is not None
1797 +                 or self._ticket_is_capacity_sensitive(current_ticket)
1798 +             ):
1799 +                 reward = REQUEST_INFO_CONTEXT_COMPLETION_BONUS
1800 +             self._queue.pop(idx)
1801 +             self._queue.append(moved_ticket)
1802 +             self._tickets_by_id[moved_ticket.ticket_id] = moved_ticket
1803 +             self._sync_queue_ticket_ids()
1804 +             self._record_dynamic_queue_event(
1805 +                 "defer",
1806 +                 ticket_id=ticket_id,
1807 +                 defer_count=defer_count,
1808 +                 sla_risk=sla_risk,
1809 +             )
1810 +         else:
1811 +             self._state.sla_breach_count += 1
1812 +             self._record_dynamic_queue_event(
1813 +                 "defer_denied",
1814 +                 ticket_id=ticket_id,
1815 +                 defer_count=defer_count,
1816 +             )
1817 +
1818 +         self._state.step_count += 1
1819 +         self._state.last_tool_result = {
1820 +             "action_type": "defer",
1821 +             "ticket_id": ticket_id,
1822 +             "defer_allowed": defer_allowed,
1823 +             "defer_count": defer_count,
1824 +             "sla_risk": sla_risk,
1825 +         }
1826 +         self._state.last_step_reward = reward
1827 +         self._state.reward = reward
1828 +         self._state.done = False
1829 +         reward_components = self._build_reward_components(
1830 +             ticket_score=0.0,
1831 +             field_breakdown={},
1832 +             shaped_step_reward=reward,
1833 +             reward_kind="operational",
1834 +             final_reward=reward,
1835 +             extra_details={
1836 +                 "operational_action": "defer",
1837 +                 "defer_allowed": defer_allowed,
1838 +                 "defer_count": defer_count,
1839 +                 "sla_breach_count": self._state.sla_breach_count,
1840 +             },
1841 +         )
1842 +         self._state.history_entries.append(
1843 +             self._build_history_entry(
1844 +                 current_ticket,
1845 +                 predicted=action.model_dump(exclude_none=True),
1846 +                 score=0.0,
1847 +                 breakdown={},
1848 +                 queue_position=idx + 1,
1849 +                 reward=reward,
1850 +                 reward_kind="operational",
1851 +                 tool_result=self._state.last_tool_result,
1852 +                 reward_components=reward_components,
1853 +             )
1854 +         )
1855 +         self._state.last_reward_components = reward_components
1856 +         return self._build_observation(task, done=False, reward=reward)
1857 +
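Reviewer sketch, not part of the commit: the defer handler above first checks eligibility, then moves the ticket from its current index to the back of the queue. The two standalone functions below show those steps on a list of ticket ids. `MAX_DEFERS_PER_TICKET = 1` is an assumed value (the constant is defined outside this diff), and both function names are hypothetical.

```python
# Assumed value for illustration; the real limit is defined elsewhere in the repo.
MAX_DEFERS_PER_TICKET = 1


def defer_allowed(existing_count: int, idx: int, queue_len: int, task_id: int) -> bool:
    """A defer is allowed under the per-ticket limit, off the last slot, on tasks 2 and 3."""
    return (
        existing_count < MAX_DEFERS_PER_TICKET
        and idx < queue_len - 1
        and task_id in {2, 3}
    )


def move_to_back(queue: list[str], idx: int) -> list[str]:
    """Return a copy of the queue with the ticket at idx moved to the end."""
    reordered = list(queue)
    reordered.append(reordered.pop(idx))
    return reordered
```

Because the handler does not advance `current_ticket_index` after the move, the next ticket slides into the same index. A denied defer in the diff still consumes a step and increments `sla_breach_count`.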
1858 +     def _handle_open_incident_action(
1859 +         self,
1860 +         task: dict,
1861 +         current_ticket: HelpdeskTicketRecord,
1862 +         action: HelpdeskTicketAction,
1863 +         idx: int,
1864 +     ) -> HelpdeskTicketObservation:
1865 +         submitted_fields = {
1866 +             field
1867 +             for field in ("issue_type", "priority", "assignment_group", "resolution_action")
1868 +             if getattr(action, field) is not None
1869 +         }
1870 +         if submitted_fields:
1871 +             raise ValueError(
1872 +                 "open_incident actions cannot include submit fields: "
1873 +                 f"{sorted(submitted_fields)}"
1874 +             )
1875 +
1876 +         useful_incident = (
1877 +             self._state.current_task_id == 3
1878 +             and self._requires_incident(current_ticket)
1879 +             and not self._incident_open_for_ticket(current_ticket)
1880 +         )
1881 +         overflow = 0
1882 +         incident_reward = 0.0
1883 +         if useful_incident:
1884 +             self._state.open_incident_ticket_ids.append(current_ticket.ticket_id)
1885 +             self._state.incident_actions_used += 1
1886 +             overflow = max(0, 1 - self._state.incident_slots_remaining)
1887 +             self._state.incident_slots_remaining = max(
1888 +                 0,
1889 +                 self._state.incident_slots_remaining - 1,
1890 +             )
1891 +             overflow_penalty = round(overflow * INCIDENT_SLOT_OVERFLOW_PENALTY, 4)
1892 +             if overflow_penalty > 0.0:
1893 +                 self._state.planning_penalty_total = round(
1894 +                     self._state.planning_penalty_total + overflow_penalty,
1895 +                     4,
1896 +                 )
1897 +                 self._state.planning_penalty_applied = overflow_penalty
1898 +             incident_reward = clamp_open_unit_interval(
1899 +                 INCIDENT_OPEN_REWARD - overflow_penalty
1900 +             )
1901 +             self._record_dynamic_queue_event(
1902 +                 "open_incident",
1903 +                 ticket_id=current_ticket.ticket_id,
1904 +                 overflow=overflow,
1905 +             )
1906 +
1907 +         self._state.step_count += 1
1908 +         self._state.last_tool_result = {
1909 +             "action_type": "open_incident",
1910 +             "ticket_id": current_ticket.ticket_id,
1911 +             "incident_open": useful_incident,
1912 +             "incident_slots_remaining": self._state.incident_slots_remaining,
1913 +             "overflow": overflow,
1914 +         }
1915 +         self._state.last_step_reward = incident_reward
1916 +         self._state.reward = incident_reward
1917 +         self._state.done = False
1918 +         reward_components = self._build_reward_components(
1919 +             ticket_score=0.0,
1920 +             field_breakdown={},
1921 +             shaped_step_reward=incident_reward,
1922 +             reward_kind="operational",
1923 +             final_reward=incident_reward,
1924 +             extra_details={
1925 +                 "operational_action": "open_incident",
1926 +                 "incident_open": useful_incident,
1927 +                 "incident_slots_remaining": self._state.incident_slots_remaining,
1928 +             },
1929 +         )
1930 +         self._state.history_entries.append(
1931 +             self._build_history_entry(
1932 +                 current_ticket,
1933 +                 predicted=action.model_dump(exclude_none=True),
1934 +                 score=0.0,
1935 +                 breakdown={},
1936 +                 queue_position=idx + 1,
1937 +                 reward=incident_reward,
1938 +                 reward_kind="operational",
1939 +                 tool_result=self._state.last_tool_result,
1940 +                 reward_components=reward_components,
1941 +             )
1942 +         )
1943 +         self._state.last_reward_components = reward_components
1944 +         return self._build_observation(task, done=False, reward=incident_reward)
1945 +
| 1946 |
def _build_ticket_view(self, ticket: HelpdeskTicketRecord) -> dict[str, Any]:
|
| 1947 |
progress = self._tool_progress_for_ticket(ticket)
|
| 1948 |
remaining_tools = progress["remaining_tools"]
|
| 1949 |
used_tools = set(self._used_tools_for_ticket(ticket.ticket_id))
|
| 1950 |
+
operational_actions = progress["recommended_operational_actions"]
|
| 1951 |
ticket_view: dict[str, Any] = {
|
| 1952 |
"ticket_id": ticket.ticket_id,
|
| 1953 |
"title": self._visible_title(ticket),
|
|
|
|
| 1966 |
"investigations_used_for_ticket": progress["revealed_count"],
|
| 1967 |
"recommended_tools": list(remaining_tools),
|
| 1968 |
}
|
| 1969 |
+
ticket_view["operational_context"] = {
|
| 1970 |
+
"request_info_available": self._request_info_note_for_ticket(ticket) is not None,
|
| 1971 |
+
"request_info_used": progress["request_info_used"],
|
| 1972 |
+
"defer_count": self._defer_count(ticket.ticket_id),
|
| 1973 |
+
"incident_recommended": self._requires_incident(ticket),
|
| 1974 |
+
"incident_open": self._incident_open_for_ticket(ticket),
|
| 1975 |
+
"recommended_actions": operational_actions,
|
| 1976 |
+
}
|
| 1977 |
if ticket.ambiguity_note is not None and "lookup_internal_routing_note" not in remaining_tools:
|
| 1978 |
ticket_view["ambiguity_note"] = ticket.ambiguity_note
|
| 1979 |
if (
|
|
|
|
| 1981 |
and "lookup_internal_routing_note" not in remaining_tools
|
| 1982 |
):
|
| 1983 |
ticket_view["planning_note"] = ticket.planning_note
|
| 1984 |
+
if self._request_info_used(ticket.ticket_id):
|
| 1985 |
+
ticket_view["customer_update_note"] = self._request_info_note_for_ticket(ticket)
|
| 1986 |
if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
|
| 1987 |
ticket_view["related_ticket_id"] = ticket.related_ticket_id
|
| 1988 |
related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
|
|
|
|
| 1998 |
or "lookup_queue_capacity_forecast" in used_tools
|
| 1999 |
):
|
| 2000 |
ticket_view["routing_options"] = self._routing_options_for_ticket(ticket)
|
| 2001 |
+
if ticket.generated_from_ticket_id is not None:
|
| 2002 |
+
ticket_view["generated_from_ticket_id"] = ticket.generated_from_ticket_id
|
| 2003 |
return ticket_view
|
| 2004 |
|
| 2005 |
def _build_feedback_summary(
|
|
|
|
| 2022 |
parts.append(f"Investigation step used {tool_name or 'a tool'}")
|
| 2023 |
if reward_components and reward_components.get("new_context_revealed"):
|
| 2024 |
parts.append("new context was revealed")
|
| 2025 |
+
elif reward_kind == "operational":
|
| 2026 |
+
operational_action = (
|
| 2027 |
+
reward_components.get("operational_action")
|
| 2028 |
+
if reward_components
|
| 2029 |
+
else predicted.get("action_type")
|
| 2030 |
+
)
|
| 2031 |
+
parts.append(f"Operational step used {operational_action or 'an action'}")
|
| 2032 |
elif penalty_reason is not None:
|
| 2033 |
parts.append(f"Penalty applied: {penalty_reason}")
|
| 2034 |
else:
|
|
|
|
| 2066 |
planning_penalty_total = reward_components.get("planning_penalty_total")
|
| 2067 |
if planning_penalty_total:
|
| 2068 |
parts.append(f"planning_penalty_total={planning_penalty_total:.2f}")
|
| 2069 |
+
incident_gap_penalty = reward_components.get("incident_gap_penalty")
|
| 2070 |
+
if incident_gap_penalty:
|
| 2071 |
+
parts.append(f"incident_gap_penalty={incident_gap_penalty:.2f}")
|
| 2072 |
+
spawned_follow_up_ticket_id = reward_components.get("spawned_follow_up_ticket_id")
|
| 2073 |
+
if spawned_follow_up_ticket_id:
|
| 2074 |
+
parts.append(f"spawned_follow_up={spawned_follow_up_ticket_id}")
|
| 2075 |
|
| 2076 |
return "; ".join(parts)
|
| 2077 |
|
|
|
|
| 2100 |
"score": score,
|
| 2101 |
"breakdown": breakdown,
|
| 2102 |
"queue_position": queue_position,
|
| 2103 |
+
"operational_context": {
|
| 2104 |
+
"request_info_used": progress["request_info_used"],
|
| 2105 |
+
"defer_count": self._defer_count(ticket.ticket_id),
|
| 2106 |
+
"incident_open": self._incident_open_for_ticket(ticket),
|
| 2107 |
+
"recommended_actions": progress["recommended_operational_actions"],
|
| 2108 |
+
},
|
| 2109 |
}
|
| 2110 |
if self._state.current_task_id == 3:
|
| 2111 |
history_entry["capacity_state"] = self._capacity_state_snapshot()
|
|
|
|
| 2122 |
and "lookup_internal_routing_note" not in remaining_tools
|
| 2123 |
):
|
| 2124 |
history_entry["planning_note"] = ticket.planning_note
|
| 2125 |
+
if self._request_info_used(ticket.ticket_id):
|
| 2126 |
+
history_entry["customer_update_note"] = self._request_info_note_for_ticket(ticket)
|
| 2127 |
if ticket.related_ticket_id is not None and "lookup_related_ticket" not in remaining_tools:
|
| 2128 |
history_entry["related_ticket_id"] = ticket.related_ticket_id
|
| 2129 |
related_ticket = self._tickets_by_id.get(ticket.related_ticket_id)
|
|
|
|
| 2148 |
history_entry["tool_result"] = tool_result
|
| 2149 |
if reward_components is not None:
|
| 2150 |
history_entry["reward_components"] = reward_components
|
| 2151 |
+
if ticket.generated_from_ticket_id is not None:
|
| 2152 |
+
history_entry["generated_from_ticket_id"] = ticket.generated_from_ticket_id
|
| 2153 |
if progress["required_tools"]:
|
| 2154 |
history_entry["context_progress"] = {
|
| 2155 |
"hidden_context_remaining": bool(progress["remaining_count"]),
|
|
|
|
| 2209 |
and (ticket_view.get("context_status") or {}).get("hidden_context_remaining")
|
| 2210 |
),
|
| 2211 |
"action_mode": "investigate_or_submit",
|
| 2212 |
+
"available_action_types": self._available_action_types_for_task(),
|
| 2213 |
"average_score_so_far": self._state.average_score_so_far,
|
| 2214 |
"progress_fraction": progress_fraction,
|
| 2215 |
"investigation_penalty_applied": self._state.investigation_penalty_applied,
|
| 2216 |
"planning_penalty_total": self._state.planning_penalty_total,
|
| 2217 |
"planning_penalty_applied": self._state.planning_penalty_applied,
|
| 2218 |
+
"sla_breach_count": self._state.sla_breach_count,
|
| 2219 |
+
"incident_gap_total": self._state.incident_gap_total,
|
| 2220 |
+
"dynamic_queue_events": list(self._state.dynamic_queue_events[-5:]),
|
| 2221 |
}
|
| 2222 |
if self._state.current_task_id == 3:
|
| 2223 |
metadata["capacity_state"] = self._capacity_state_snapshot()
|
|
|
|
| 2241 |
task_name=task["name"],
|
| 2242 |
instructions=task["instructions"],
|
| 2243 |
allowed_fields=list(task["allowed_fields"]),
|
| 2244 |
+
available_action_types=self._available_action_types_for_task(),
|
| 2245 |
+
available_tools=self._available_tools_for_task(),
|
| 2246 |
investigation_budget_remaining=self._state.investigation_budget_remaining,
|
| 2247 |
last_tool_result=self._state.last_tool_result,
|
| 2248 |
current_ticket=ticket_view,
|
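The `open_incident` branch added above consumes one incident slot and charges a planning penalty when no slot is left. A minimal sketch of that arithmetic follows. The two constant values and the body of `clamp_open_unit_interval` are assumptions made for illustration (the diff references these names but does not show their definitions), and `IncidentOutcome` and the standalone `open_incident` function are hypothetical helpers, not part of the repository.

```python
from dataclasses import dataclass

# Hypothetical values: the diff uses these names but does not show their definitions.
INCIDENT_OPEN_REWARD = 0.10
INCIDENT_SLOT_OVERFLOW_PENALTY = 0.05


def clamp_open_unit_interval(value: float, eps: float = 1e-4) -> float:
    """Assumed behavior: keep a reward strictly inside the open interval (0, 1)."""
    return min(1.0 - eps, max(eps, value))


@dataclass
class IncidentOutcome:
    overflow: int
    slots_remaining: int
    overflow_penalty: float
    reward: float


def open_incident(slots_remaining: int) -> IncidentOutcome:
    """Mirror the slot bookkeeping from the diff for a single open_incident step."""
    # Overflow is 1 when no slot is left, 0 when at least one slot is free.
    overflow = max(0, 1 - slots_remaining)
    # The slot counter never drops below zero.
    new_slots = max(0, slots_remaining - 1)
    penalty = round(overflow * INCIDENT_SLOT_OVERFLOW_PENALTY, 4)
    reward = clamp_open_unit_interval(INCIDENT_OPEN_REWARD - penalty)
    return IncidentOutcome(overflow, new_slots, penalty, reward)
```

Under these assumed constants, opening an incident with a free slot returns the full operational reward, while opening one with zero slots remaining returns the reward minus one overflow penalty.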
server/grader.py
CHANGED
|
@@ -64,13 +64,23 @@ PRIORITY_SCORES = {
|
|
| 64 |
|
| 65 |
|
| 66 |
TASK_WEIGHTS = {
|
| 67 |
-
1: {
|
| 68 |
-
|
| 69 |
3: {
|
| 70 |
-
"issue_type": 0.
|
| 71 |
"priority": 0.20,
|
| 72 |
"assignment_group": 0.25,
|
| 73 |
-
"resolution_action": 0.
|
| 74 |
},
|
| 75 |
}
|
| 76 |
|
|
|
|
| 64 |
|
| 65 |
|
| 66 |
TASK_WEIGHTS = {
|
| 67 |
+
1: {
|
| 68 |
+
"issue_type": 0.40,
|
| 69 |
+
"priority": 0.20,
|
| 70 |
+
"assignment_group": 0.20,
|
| 71 |
+
"resolution_action": 0.20,
|
| 72 |
+
},
|
| 73 |
+
2: {
|
| 74 |
+
"issue_type": 0.32,
|
| 75 |
+
"priority": 0.20,
|
| 76 |
+
"assignment_group": 0.24,
|
| 77 |
+
"resolution_action": 0.24,
|
| 78 |
+
},
|
| 79 |
3: {
|
| 80 |
+
"issue_type": 0.30,
|
| 81 |
"priority": 0.20,
|
| 82 |
"assignment_group": 0.25,
|
| 83 |
+
"resolution_action": 0.25,
|
| 84 |
},
|
| 85 |
}
|
| 86 |
|
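The new `TASK_WEIGHTS` table gives every task the same four fields, with the weight on `issue_type` shrinking as difficulty rises. The sketch below copies the weights from the new side of the diff and checks that each task's weights sum to 1. The `weighted_ticket_score` helper is hypothetical: it assumes a plain linear combination of per-field scores, which the grader may or may not use.

```python
import math

# Weights copied from the new side of the server/grader.py diff.
TASK_WEIGHTS = {
    1: {"issue_type": 0.40, "priority": 0.20, "assignment_group": 0.20, "resolution_action": 0.20},
    2: {"issue_type": 0.32, "priority": 0.20, "assignment_group": 0.24, "resolution_action": 0.24},
    3: {"issue_type": 0.30, "priority": 0.20, "assignment_group": 0.25, "resolution_action": 0.25},
}


def weights_are_normalized(task_id: int) -> bool:
    """Return True when the weights for one task sum to 1 (within float tolerance)."""
    return math.isclose(sum(TASK_WEIGHTS[task_id].values()), 1.0)


def weighted_ticket_score(task_id: int, field_scores: dict[str, float]) -> float:
    """Hypothetical scorer: linear combination of per-field scores in [0, 1]."""
    weights = TASK_WEIGHTS[task_id]
    # Fields missing from field_scores contribute nothing.
    return sum(weight * field_scores.get(field, 0.0) for field, weight in weights.items())
```

With this assumed scorer, a hard-task submission that gets only `issue_type` right would earn 0.30, and a fully correct one would earn 1.0.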
server/tasks.py
CHANGED
|
@@ -10,28 +10,39 @@ from vocabulary import TASK_IDS
|
|
| 10 |
TASKS = {
|
| 11 |
1: {
|
| 12 |
"id": 1,
|
| 13 |
-
"name": "
|
| 14 |
"difficulty": "easy",
|
| 15 |
"instructions": (
|
| 16 |
-
"
|
| 17 |
-
"
|
|
|
|
| 18 |
),
|
| 19 |
-
"allowed_fields": [
|
| 20 |
},
|
| 21 |
2: {
|
| 22 |
"id": 2,
|
| 23 |
-
"name": "
|
| 24 |
"difficulty": "medium",
|
| 25 |
"instructions": (
|
| 26 |
-
"
|
| 27 |
-
"
|
| 28 |
-
"
|
| 29 |
),
|
| 30 |
-
"allowed_fields": [
|
| 31 |
},
|
| 32 |
3: {
|
| 33 |
"id": 3,
|
| 34 |
-
"name": "
|
| 35 |
"difficulty": "hard",
|
| 36 |
"instructions": (
|
| 37 |
"Perform full helpdesk routing by selecting the best issue type, "
|
|
@@ -40,9 +51,8 @@ TASKS = {
|
|
| 40 |
"forecasts, and planning state when present. "
|
| 41 |
"Some hard tickets intentionally hide decisive routing context until "
|
| 42 |
"you investigate with the available tools, and some hard episodes also "
|
| 43 |
-
"require queue-level capacity planning
|
| 44 |
-
"
|
| 45 |
-
"visible text looks plausible."
|
| 46 |
),
|
| 47 |
"allowed_fields": [
|
| 48 |
"issue_type",
|
|
@@ -61,6 +71,10 @@ PLANNING_ROUTE_UPDATES: dict[str, dict] = {
|
|
| 61 |
"customer-facing charge review as a lower-fidelity fallback while the bug "
|
| 62 |
"investigation continues separately."
|
| 63 |
),
|
| 64 |
"alternate_issue_type": "billing_license",
|
| 65 |
"alternate_assignment_group": "license_ops",
|
| 66 |
"alternate_resolution_action": "assign",
|
|
@@ -82,6 +96,10 @@ PLANNING_ROUTE_UPDATES: dict[str, dict] = {
|
|
| 82 |
"Seat expansion is the preferred route, but license operations can still "
|
| 83 |
"handle the prorating clarification when procurement is the bottleneck."
|
| 84 |
),
|
| 85 |
"alternate_issue_type": "billing_license",
|
| 86 |
"alternate_assignment_group": "license_ops",
|
| 87 |
"alternate_resolution_action": "fulfill",
|
|
@@ -92,6 +110,10 @@ PLANNING_ROUTE_UPDATES: dict[str, dict] = {
|
|
| 92 |
"The request can be treated either as roadmap feedback or as a support "
|
| 93 |
"escalation if the operational impact is emphasized."
|
| 94 |
),
|
| 95 |
"alternate_issue_type": "application_support",
|
| 96 |
"alternate_priority": "high",
|
| 97 |
"alternate_resolution_action": "escalate",
|
|
@@ -138,6 +160,10 @@ PLANNING_ROUTE_UPDATES: dict[str, dict] = {
|
|
| 138 |
"Security scheduling is ideal, but a compliance acknowledgement is still "
|
| 139 |
"acceptable when the security team only needs to confirm the process."
|
| 140 |
),
|
| 141 |
"alternate_issue_type": "security_compliance",
|
| 142 |
"alternate_resolution_action": "acknowledge",
|
| 143 |
"alternate_route_score_multiplier": 0.8,
|
|
@@ -192,6 +218,10 @@ CURATED_EXPANSION_RECORDS: list[dict] = [
|
|
| 192 |
"Security still owns the privileged-access review, but service desk can "
|
| 193 |
"collect chronology and prepare the packet if the security queue is jammed."
|
| 194 |
),
|
| 195 |
"alternate_assignment_group": "service_desk",
|
| 196 |
"alternate_resolution_action": "assign",
|
| 197 |
"alternate_route_score_multiplier": 0.72,
|
|
@@ -253,6 +283,10 @@ CURATED_EXPANSION_RECORDS: list[dict] = [
|
|
| 253 |
"Immediate operational execution is preferred. Procurement can still own the "
|
| 254 |
"approval path if service-desk capacity is already depleted."
|
| 255 |
),
|
| 256 |
"alternate_assignment_group": "procurement",
|
| 257 |
"alternate_resolution_action": "assign",
|
| 258 |
"alternate_route_score_multiplier": 0.8,
|
|
@@ -273,6 +307,11 @@ CURATED_EXPANSION_RECORDS: list[dict] = [
|
|
| 273 |
"Security owns the final unblock decision. If security is saturated, the "
|
| 274 |
"application team can still take the first-response diagnostics path."
|
| 275 |
),
|
| 276 |
"alternate_issue_type": "application_support",
|
| 277 |
"alternate_priority": "high",
|
| 278 |
"alternate_assignment_group": "application_team",
|
|
@@ -398,6 +437,9 @@ CURATED_EXPANSION_RECORDS: list[dict] = [
|
|
| 398 |
"Application engineering is preferred because they own the evidence. Procurement "
|
| 399 |
"can still coordinate the renewal communication if the evidence queue is saturated."
|
| 400 |
),
|
| 401 |
"alternate_issue_type": "service_request",
|
| 402 |
"alternate_priority": "medium",
|
| 403 |
"alternate_assignment_group": "procurement",
|
|
|
|
| 10 |
TASKS = {
|
| 11 |
1: {
|
| 12 |
"id": 1,
|
| 13 |
+
"name": "Guided Full Routing",
|
| 14 |
"difficulty": "easy",
|
| 15 |
"instructions": (
|
| 16 |
+
"Perform full helpdesk routing by selecting issue type, priority, "
|
| 17 |
+
"assignment group, and resolution action. Easy-task episodes keep the "
|
| 18 |
+
"ticket text mostly visible and focus on grounded single-ticket routing."
|
| 19 |
),
|
| 20 |
+
"allowed_fields": [
|
| 21 |
+
"issue_type",
|
| 22 |
+
"priority",
|
| 23 |
+
"assignment_group",
|
| 24 |
+
"resolution_action",
|
| 25 |
+
],
|
| 26 |
},
|
| 27 |
2: {
|
| 28 |
"id": 2,
|
| 29 |
+
"name": "Contextual Full Routing",
|
| 30 |
"difficulty": "medium",
|
| 31 |
"instructions": (
|
| 32 |
+
"Perform full helpdesk routing with partial observability. Some "
|
| 33 |
+
"tickets hide related-case, requester-history, or clarification "
|
| 34 |
+
"details until you investigate or request more information."
|
| 35 |
),
|
| 36 |
+
"allowed_fields": [
|
| 37 |
+
"issue_type",
|
| 38 |
+
"priority",
|
| 39 |
+
"assignment_group",
|
| 40 |
+
"resolution_action",
|
| 41 |
+
],
|
| 42 |
},
|
| 43 |
3: {
|
| 44 |
"id": 3,
|
| 45 |
+
"name": "Adaptive Queue Routing",
|
| 46 |
"difficulty": "hard",
|
| 47 |
"instructions": (
|
| 48 |
"Perform full helpdesk routing by selecting the best issue type, "
|
|
|
|
| 51 |
"forecasts, and planning state when present. "
|
| 52 |
"Some hard tickets intentionally hide decisive routing context until "
|
| 53 |
"you investigate with the available tools, and some hard episodes also "
|
| 54 |
+
"require queue-level capacity planning, deferrals, incident management, "
|
| 55 |
+
"and recovery from downstream follow-up tickets."
|
|
|
|
| 56 |
),
|
| 57 |
"allowed_fields": [
|
| 58 |
"issue_type",
|
|
|
|
| 71 |
"customer-facing charge review as a lower-fidelity fallback while the bug "
|
| 72 |
"investigation continues separately."
|
| 73 |
),
|
| 74 |
+
"customer_update_note": (
|
| 75 |
+
"Finance confirmed the unexpected charge landed immediately after the "
|
| 76 |
+
"integration outage and wants one accountable owner today."
|
| 77 |
+
),
|
| 78 |
"alternate_issue_type": "billing_license",
|
| 79 |
"alternate_assignment_group": "license_ops",
|
| 80 |
"alternate_resolution_action": "assign",
|
|
|
|
| 96 |
"Seat expansion is the preferred route, but license operations can still "
|
| 97 |
"handle the prorating clarification when procurement is the bottleneck."
|
| 98 |
),
|
| 99 |
+
"customer_update_note": (
|
| 100 |
+
"The requester clarified that the blocker is both the seat increase and "
|
| 101 |
+
"the unexpected prorating language on the quote."
|
| 102 |
+
),
|
| 103 |
"alternate_issue_type": "billing_license",
|
| 104 |
"alternate_assignment_group": "license_ops",
|
| 105 |
"alternate_resolution_action": "fulfill",
|
|
|
|
| 110 |
"The request can be treated either as roadmap feedback or as a support "
|
| 111 |
"escalation if the operational impact is emphasized."
|
| 112 |
),
|
| 113 |
+
"customer_update_note": (
|
| 114 |
+
"The requester says the missing behavior is now blocking a customer "
|
| 115 |
+
"rollout, so this may need operational ownership rather than product triage."
|
| 116 |
+
),
|
| 117 |
"alternate_issue_type": "application_support",
|
| 118 |
"alternate_priority": "high",
|
| 119 |
"alternate_resolution_action": "escalate",
|
|
|
|
| 160 |
"Security scheduling is ideal, but a compliance acknowledgement is still "
|
| 161 |
"acceptable when the security team only needs to confirm the process."
|
| 162 |
),
|
| 163 |
+
"customer_update_note": (
|
| 164 |
+
"The requester clarified they mainly need confirmed ownership and a date "
|
| 165 |
+
"for the review, not the review itself right now."
|
| 166 |
+
),
|
| 167 |
"alternate_issue_type": "security_compliance",
|
| 168 |
"alternate_resolution_action": "acknowledge",
|
| 169 |
"alternate_route_score_multiplier": 0.8,
|
|
|
|
| 218 |
"Security still owns the privileged-access review, but service desk can "
|
| 219 |
"collect chronology and prepare the packet if the security queue is jammed."
|
| 220 |
),
|
| 221 |
+
"customer_update_note": (
|
| 222 |
+
"Executives want a single incident bridge owner before the board packet is sent."
|
| 223 |
+
),
|
| 224 |
+
"incident_recommended": True,
|
| 225 |
"alternate_assignment_group": "service_desk",
|
| 226 |
"alternate_resolution_action": "assign",
|
| 227 |
"alternate_route_score_multiplier": 0.72,
|
|
|
|
| 283 |
"Immediate operational execution is preferred. Procurement can still own the "
|
| 284 |
"approval path if service-desk capacity is already depleted."
|
| 285 |
),
|
| 286 |
+
"customer_update_note": (
|
| 287 |
+
"The customer says the launch rehearsal will fail without a same-day answer."
|
| 288 |
+
),
|
| 289 |
+
"incident_recommended": True,
|
| 290 |
"alternate_assignment_group": "procurement",
|
| 291 |
"alternate_resolution_action": "assign",
|
| 292 |
"alternate_route_score_multiplier": 0.8,
|
|
|
|
| 307 |
"Security owns the final unblock decision. If security is saturated, the "
|
| 308 |
"application team can still take the first-response diagnostics path."
|
| 309 |
),
|
| 310 |
+
"customer_update_note": (
|
| 311 |
+
"The identity-risk lead confirmed users remain locked out and wants incident "
|
| 312 |
+
"coordination while the exception is reviewed."
|
| 313 |
+
),
|
| 314 |
+
"incident_recommended": True,
|
| 315 |
"alternate_issue_type": "application_support",
|
| 316 |
"alternate_priority": "high",
|
| 317 |
"alternate_assignment_group": "application_team",
|
|
|
|
| 437 |
"Application engineering is preferred because they own the evidence. Procurement "
|
| 438 |
"can still coordinate the renewal communication if the evidence queue is saturated."
|
| 439 |
),
|
| 440 |
+
"customer_update_note": (
|
| 441 |
+
"Commercial leadership needs one named owner for the blocked renewal before end of day."
|
| 442 |
+
),
|
| 443 |
"alternate_issue_type": "service_request",
|
| 444 |
"alternate_priority": "medium",
|
| 445 |
"alternate_assignment_group": "procurement",
|
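The `customer_update_note` and `incident_recommended` keys added to these records feed the `operational_context` block that `_build_ticket_view` now attaches to each ticket. The sketch below illustrates the gating: the note only appears in the view once `request_info` has been used. It works on plain dictionaries and assumes the environment's private helpers simply read these record fields. That is an assumption, since their bodies are not shown in the diff, and `visible_ticket_view` and the ticket id are hypothetical.

```python
from typing import Any


def visible_ticket_view(record: dict[str, Any], request_info_used: bool) -> dict[str, Any]:
    """Hypothetical, simplified stand-in for the operational part of a ticket view."""
    note = record.get("customer_update_note")
    view: dict[str, Any] = {
        "ticket_id": record["ticket_id"],
        "operational_context": {
            # The note exists on the record, but its text stays hidden until requested.
            "request_info_available": note is not None,
            "request_info_used": request_info_used,
            "incident_recommended": bool(record.get("incident_recommended", False)),
        },
    }
    if request_info_used:
        view["customer_update_note"] = note
    return view


# Illustrative record; the ticket id is made up, the note text is from the diff.
record = {
    "ticket_id": "ticket-example",
    "customer_update_note": (
        "Executives want a single incident bridge owner before the board packet is sent."
    ),
    "incident_recommended": True,
}
before = visible_ticket_view(record, request_info_used=False)
after = visible_ticket_view(record, request_info_used=True)
```

Here `before` advertises that more information is available without revealing it, and `after` carries the note itself.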
tests/test_api_integration.py
CHANGED
|
@@ -529,8 +529,8 @@ class TestHeuristicInferenceRegression(unittest.TestCase):
|
|
| 529 |
overall_avg = sum(rewards) / len(rewards)
|
| 530 |
self.assertGreaterEqual(
|
| 531 |
overall_avg,
|
| 532 |
-
0.
|
| 533 |
-
f"Overall average reward {overall_avg:.4f} is below the smoke-test floor of 0.
|
| 534 |
)
|
| 535 |
self.assertLessEqual(
|
| 536 |
overall_avg,
|
|
|
|
| 529 |
overall_avg = sum(rewards) / len(rewards)
|
| 530 |
self.assertGreaterEqual(
|
| 531 |
overall_avg,
|
| 532 |
+
0.25,
|
| 533 |
+
f"Overall average reward {overall_avg:.4f} is below the smoke-test floor of 0.25",
|
| 534 |
)
|
| 535 |
self.assertLessEqual(
|
| 536 |
overall_avg,
|
tests/test_competitive_upgrade.py
CHANGED
|
@@ -565,13 +565,16 @@ class TestInvestigationActions(unittest.TestCase):
|
|
| 565 |
|
| 566 |
def test_submit_after_investigation_completes_episode(self) -> None:
|
| 567 |
env, obs, ticket, related = self._make_linked_env()
|
| 568 |
-
env.step(
|
| 569 |
HelpdeskTicketAction(
|
| 570 |
action_type="investigate",
|
| 571 |
tool_name="lookup_related_ticket",
|
| 572 |
tool_target_ticket_id=ticket.related_ticket_id,
|
| 573 |
)
|
| 574 |
)
|
| 575 |
final_obs = env.step(
|
| 576 |
HelpdeskTicketAction(
|
| 577 |
issue_type=ticket.issue_type,
|
|
@@ -752,6 +755,7 @@ class TestTerminalInvalidActionFinalReward(unittest.TestCase):
|
|
| 752 |
|
| 753 |
def test_last_invalid_submit_returns_trajectory_reward_not_zero(self) -> None:
|
| 754 |
from unittest.mock import patch
|
|
|
|
| 755 |
|
| 756 |
dataset = load_dataset()
|
| 757 |
first = dataset[0]
|
|
@@ -764,25 +768,40 @@ class TestTerminalInvalidActionFinalReward(unittest.TestCase):
|
|
| 764 |
"_tickets_by_id",
|
| 765 |
{first.ticket_id: first, second.ticket_id: second},
|
| 766 |
):
|
| 767 |
-
|
| 768 |
-
|
| 769 |
-
|
| 770 |
-
|
| 771 |
-
|
| 772 |
-
|
| 773 |
-
|
| 774 |
-
|
| 775 |
-
|
| 776 |
-
|
| 777 |
-
|
| 778 |
-
|
| 779 |
-
|
| 780 |
-
|
| 781 |
-
|
| 782 |
-
|
| 783 |
-
|
| 784 |
-
|
| 785 |
-
|
| 786 |
|
| 787 |
|
| 788 |
# ---------------------------------------------------------------------------
|
|
|
|
| 565 |
|
| 566 |
def test_submit_after_investigation_completes_episode(self) -> None:
|
| 567 |
env, obs, ticket, related = self._make_linked_env()
|
| 568 |
+
obs = env.step(
|
| 569 |
HelpdeskTicketAction(
|
| 570 |
action_type="investigate",
|
| 571 |
tool_name="lookup_related_ticket",
|
| 572 |
tool_target_ticket_id=ticket.related_ticket_id,
|
| 573 |
)
|
| 574 |
)
|
| 575 |
+
operational_context = (obs.current_ticket or {}).get("operational_context", {})
|
| 576 |
+
if operational_context.get("incident_recommended"):
|
| 577 |
+
obs = env.step(HelpdeskTicketAction(action_type="open_incident"))
|
| 578 |
final_obs = env.step(
|
| 579 |
HelpdeskTicketAction(
|
| 580 |
issue_type=ticket.issue_type,
|
|
|
|
| 755 |
|
| 756 |
def test_last_invalid_submit_returns_trajectory_reward_not_zero(self) -> None:
|
| 757 |
from unittest.mock import patch
|
| 758 |
+
from server.tasks import get_task_definition as base_get_task_definition
|
| 759 |
|
| 760 |
dataset = load_dataset()
|
| 761 |
first = dataset[0]
|
|
|
|
| 768 |
"_tickets_by_id",
|
| 769 |
{first.ticket_id: first, second.ticket_id: second},
|
| 770 |
):
|
| 771 |
+
with patch(
|
| 772 |
+
"server.environment.get_task_definition",
|
| 773 |
+
side_effect=lambda task_id: (
|
| 774 |
+
{
|
| 775 |
+
**base_get_task_definition(task_id),
|
| 776 |
+
"allowed_fields": ["issue_type"],
|
| 777 |
+
}
|
| 778 |
+
if task_id == 1
|
| 779 |
+
else base_get_task_definition(task_id)
|
| 780 |
+
),
|
| 781 |
+
):
|
| 782 |
+
obs = env.reset(seed=0, task_id=1, queue_size=2)
|
| 783 |
+
|
| 784 |
+
tickets_by_id = {first.ticket_id: first, second.ticket_id: second}
|
| 785 |
+
current = tickets_by_id[obs.current_ticket["ticket_id"]]
|
| 786 |
+
obs = env.step(HelpdeskTicketAction(issue_type=current.issue_type))
|
| 787 |
+
self.assertFalse(obs.done)
|
| 788 |
+
|
| 789 |
+
current = tickets_by_id[obs.current_ticket["ticket_id"]]
|
| 790 |
+
final_obs = env.step(
|
| 791 |
+
HelpdeskTicketAction(
|
| 792 |
+
issue_type=current.issue_type,
|
| 793 |
+
priority="medium",
|
| 794 |
+
)
|
| 795 |
+
)
|
| 796 |
+
|
| 797 |
+
self.assertTrue(final_obs.done)
|
| 798 |
+
expected_average = sum(env.state.per_ticket_scores) / len(
|
| 799 |
+
env.state.per_ticket_scores
|
| 800 |
+
)
|
| 801 |
+
self.assertGreater(final_obs.reward, 0.0)
|
| 802 |
+
self.assertAlmostEqual(final_obs.reward, expected_average, places=9)
|
| 803 |
+
self.assertAlmostEqual(env.state.total_reward, expected_average, places=9)
|
| 804 |
+
self.assertAlmostEqual(env.state.reward or 0.0, expected_average, places=9)
|
| 805 |
|
| 806 |
|
| 807 |
# ---------------------------------------------------------------------------
|
tests/test_extra_fields_penalty.py
CHANGED
|
@@ -5,6 +5,7 @@ Validates Requirement 7: Step Validates Action Fields Against Task Contract.
|
|
| 5 |
"""
|
| 6 |
from __future__ import annotations
|
| 7 |
|
|
|
|
| 8 |
import sys
|
| 9 |
import os
|
| 10 |
import unittest
|
|
@@ -41,24 +42,42 @@ def _make_env() -> HelpdeskTicketRoutingEnvironment:
|
|
| 41 |
return HelpdeskTicketRoutingEnvironment()
|
| 42 |
|
| 43 |
|
| 44 |
class TestExtraFieldsPenalty(unittest.TestCase):
|
| 45 |
"""Requirement 7: step() rejects actions with fields outside the task's allowed_fields."""
|
| 46 |
|
| 47 |
def test_extra_fields_returns_closed_interval_penalty_reward(self) -> None:
|
| 48 |
"""Task 1 penalties should keep the returned reward inside the unit interval."""
|
| 49 |
env = _make_env()
|
| 50 |
-
|
|
|
|
| 51 |
|
| 52 |
-
|
| 53 |
-
|
| 54 |
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
|
| 62 |
|
| 63 |
self.assertIsInstance(penalty_obs, HelpdeskTicketObservation)
|
| 64 |
self.assertGreaterEqual(penalty_obs.reward, 0.0)
|
|
@@ -67,27 +86,29 @@ class TestExtraFieldsPenalty(unittest.TestCase):
|
|
| 67 |
def test_extra_fields_advances_ticket_index(self) -> None:
|
| 68 |
"""Penalty step must advance tickets_processed by 1."""
|
| 69 |
env = _make_env()
|
| 70 |
-
|
| 71 |
-
|
|
|
|
| 72 |
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
|
| 78 |
|
| 79 |
self.assertEqual(penalty_obs.tickets_processed, 1)
|
| 80 |
|
| 81 |
def test_extra_fields_records_score_inside_unit_interval(self) -> None:
|
| 82 |
"""per_ticket_scores must stay in the unit interval after a penalty step."""
|
| 83 |
env = _make_env()
|
| 84 |
-
|
|
|
|
| 85 |
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
|
| 91 |
|
| 92 |
state = env.state
|
| 93 |
self.assertEqual(len(state.per_ticket_scores), 1)
|
|
@@ -97,13 +118,14 @@ class TestExtraFieldsPenalty(unittest.TestCase):
|
|
| 97 |
def test_extra_fields_history_entry_has_penalty_reason(self) -> None:
|
| 98 |
"""History entry for a penalty step must include penalty_reason."""
|
| 99 |
env = _make_env()
|
| 100 |
-
|
|
|
|
| 101 |
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
|
| 107 |
|
| 108 |
self.assertEqual(len(penalty_obs.history), 1)
|
| 109 |
entry = penalty_obs.history[0]
|
|
@@ -115,18 +137,19 @@ class TestExtraFieldsPenalty(unittest.TestCase):
|
|
| 115 |
def test_no_extra_fields_grades_normally(self) -> None:
|
| 116 |
"""When action fields are within allowed_fields, grading proceeds normally (reward != forced 0.0)."""
|
| 117 |
env = _make_env()
|
| 118 |
-
|
|
|
|
| 119 |
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
|
| 126 |
-
|
| 127 |
|
| 128 |
-
|
| 129 |
-
|
| 130 |
|
| 131 |
# Should be a valid observation; reward may be any value in [0.0, 1.0]
|
| 132 |
self.assertIsInstance(result_obs, HelpdeskTicketObservation)
|
|
@@ -138,16 +161,17 @@ class TestExtraFieldsPenalty(unittest.TestCase):
|
|
| 138 |
def test_action_metadata_is_not_treated_as_extra_field(self) -> None:
|
| 139 |
"""OpenEnv Action metadata should not trigger the extra-fields penalty."""
|
| 140 |
env = _make_env()
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
|
| 147 |
-
|
| 148 |
-
|
|
|
|
|
|
|
| 149 |
)
|
| 150 |
-
)
|
| 151 |
|
| 152 |
self.assertEqual(len(result_obs.history), 1)
|
| 153 |
self.assertNotIn("penalty_reason", result_obs.history[0])
|
|
@@ -156,42 +180,44 @@ class TestExtraFieldsPenalty(unittest.TestCase):
|
|
| 156 |
def test_extra_fields_no_exception_raised(self) -> None:
|
| 157 |
"""Requirement 7.4: extra fields must not raise an unhandled exception."""
|
| 158 |
env = _make_env()
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
|
| 166 |
-
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
|
| 170 |
-
|
|
|
|
| 171 |
|
| 172 |
self.assertIsInstance(obs, HelpdeskTicketObservation)
|
| 173 |
|
| 174 |
def test_extra_fields_done_flag_set_correctly_on_last_ticket(self) -> None:
|
| 175 |
"""When the penalty step is on the last ticket, done stays True and reward stays episode-level."""
|
| 176 |
env = _make_env()
|
| 177 |
-
|
| 178 |
-
|
| 179 |
-
|
| 180 |
-
|
| 181 |
-
|
| 182 |
-
|
|
| 183 |
current_ticket_id = obs.current_ticket["ticket_id"]
|
| 184 |
current_ticket = tickets_by_id[current_ticket_id]
|
| 185 |
-
|
| 186 |
-
|
| 187 |
-
|
| 188 |
-
|
| 189 |
-
|
| 190 |
-
action = HelpdeskTicketAction(
|
| 191 |
-
issue_type=current_ticket.issue_type,
|
| 192 |
-
assignment_group=ASSIGNMENT_GROUPS[0], # extra field
|
| 193 |
-
)
|
| 194 |
-
final_obs = env.step(action)
|
| 195 |
|
| 196 |
self.assertTrue(final_obs.done)
|
| 197 |
self.assertGreater(final_obs.reward, 0.0)
|
|
|
|
| 5 |
"""
|
| 6 |
from __future__ import annotations
|
| 7 |
|
| 8 |
+
import contextlib
|
| 9 |
import sys
|
| 10 |
import os
|
| 11 |
import unittest
|
|
|
|
| 42 |
return HelpdeskTicketRoutingEnvironment()
|
| 43 |
|
| 44 |
|
| 45 |
+
def _task_with_issue_type_only(task_id: int) -> dict:
|
| 46 |
+
task = dict(TASKS[task_id])
|
| 47 |
+
if task_id == 1:
|
| 48 |
+
task["allowed_fields"] = ["issue_type"]
|
| 49 |
+
return task
|
| 50 |
+
|
| 51 |
+
|
| 52 |
+
@contextlib.contextmanager
|
| 53 |
+
def _restrict_task_1_fields():
|
| 54 |
+
original_fields = list(TASKS[1]["allowed_fields"])
|
| 55 |
+
TASKS[1]["allowed_fields"] = ["issue_type"]
|
| 56 |
+
try:
|
| 57 |
+
yield
|
| 58 |
+
finally:
|
| 59 |
+
TASKS[1]["allowed_fields"] = original_fields
|
| 60 |
+
|
| 61 |
+
|
| 62 |
class TestExtraFieldsPenalty(unittest.TestCase):
|
| 63 |
"""Requirement 7: step() rejects actions with fields outside the task's allowed_fields."""
|
| 64 |
|
| 65 |
def test_extra_fields_returns_closed_interval_penalty_reward(self) -> None:
|
| 66 |
"""Task 1 penalties should keep the returned reward inside the unit interval."""
|
| 67 |
env = _make_env()
|
| 68 |
+
with _restrict_task_1_fields():
|
| 69 |
+
obs = env.reset(seed=42, task_id=1)
|
| 70 |
|
| 71 |
+
# Task 1 allowed_fields should NOT include assignment_group
|
| 72 |
+
self.assertNotIn("assignment_group", obs.allowed_fields)
|
| 73 |
|
| 74 |
+
# Submit an action with an extra field (assignment_group) not in task 1's allowed_fields
|
| 75 |
+
action = HelpdeskTicketAction(
|
| 76 |
+
issue_type=ISSUE_TYPES[0],
|
| 77 |
+
priority=PRIORITIES[0],
|
| 78 |
+
assignment_group=ASSIGNMENT_GROUPS[0], # extra field
|
| 79 |
+
)
|
| 80 |
+
penalty_obs = env.step(action)
|
| 81 |
|
| 82 |
self.assertIsInstance(penalty_obs, HelpdeskTicketObservation)
|
| 83 |
self.assertGreaterEqual(penalty_obs.reward, 0.0)
|
|
|
|
| 86 |
def test_extra_fields_advances_ticket_index(self) -> None:
|
| 87 |
"""Penalty step must advance tickets_processed by 1."""
|
| 88 |
env = _make_env()
|
| 89 |
+
with _restrict_task_1_fields():
|
| 90 |
+
obs = env.reset(seed=42, task_id=1)
|
| 91 |
+
self.assertEqual(obs.tickets_processed, 0)
|
| 92 |
|
| 93 |
+
action = HelpdeskTicketAction(
|
| 94 |
+
issue_type=ISSUE_TYPES[0],
|
| 95 |
+
assignment_group=ASSIGNMENT_GROUPS[0], # extra field for task 1
|
| 96 |
+
)
|
| 97 |
+
penalty_obs = env.step(action)
|
| 98 |
|
| 99 |
self.assertEqual(penalty_obs.tickets_processed, 1)
|
| 100 |
|
| 101 |
def test_extra_fields_records_score_inside_unit_interval(self) -> None:
|
| 102 |
"""per_ticket_scores must stay in the unit interval after a penalty step."""
|
| 103 |
env = _make_env()
|
| 104 |
+
with _restrict_task_1_fields():
|
| 105 |
+
env.reset(seed=42, task_id=1)
|
| 106 |
|
| 107 |
+
action = HelpdeskTicketAction(
|
| 108 |
+
issue_type=ISSUE_TYPES[0],
|
| 109 |
+
assignment_group=ASSIGNMENT_GROUPS[0], # extra field
|
| 110 |
+
)
|
| 111 |
+
env.step(action)
|
| 112 |
|
| 113 |
state = env.state
|
| 114 |
self.assertEqual(len(state.per_ticket_scores), 1)
|
|
|
|
| 118 |
def test_extra_fields_history_entry_has_penalty_reason(self) -> None:
|
| 119 |
"""History entry for a penalty step must include penalty_reason."""
|
| 120 |
env = _make_env()
|
| 121 |
+
with _restrict_task_1_fields():
|
| 122 |
+
env.reset(seed=42, task_id=1)
|
| 123 |
|
| 124 |
+
action = HelpdeskTicketAction(
|
| 125 |
+
issue_type=ISSUE_TYPES[0],
|
| 126 |
+
assignment_group=ASSIGNMENT_GROUPS[0], # extra field
|
| 127 |
+
)
|
| 128 |
+
penalty_obs = env.step(action)
|
| 129 |
|
| 130 |
self.assertEqual(len(penalty_obs.history), 1)
|
| 131 |
entry = penalty_obs.history[0]
|
|
|
|
| 137 |
def test_no_extra_fields_grades_normally(self) -> None:
|
| 138 |
"""When action fields are within allowed_fields, grading proceeds normally (reward != forced 0.0)."""
|
| 139 |
env = _make_env()
|
| 140 |
+
with _restrict_task_1_fields():
|
| 141 |
+
obs = env.reset(seed=42, task_id=1)
|
| 142 |
|
| 143 |
+
# Build action using only allowed fields
|
| 144 |
+
allowed = obs.allowed_fields
|
| 145 |
+
action_kwargs = {}
|
| 146 |
+
if "issue_type" in allowed:
|
| 147 |
+
action_kwargs["issue_type"] = ISSUE_TYPES[0]
|
| 148 |
+
if "priority" in allowed:
|
| 149 |
+
action_kwargs["priority"] = PRIORITIES[0]
|
| 150 |
|
| 151 |
+
action = HelpdeskTicketAction(**action_kwargs)
|
| 152 |
+
result_obs = env.step(action)
|
| 153 |
|
| 154 |
# Should be a valid observation; reward may be any value in [0.0, 1.0]
|
| 155 |
self.assertIsInstance(result_obs, HelpdeskTicketObservation)
|
|
|
|
| 161 |
def test_action_metadata_is_not_treated_as_extra_field(self) -> None:
|
| 162 |
"""OpenEnv Action metadata should not trigger the extra-fields penalty."""
|
| 163 |
env = _make_env()
|
| 164 |
+
with _restrict_task_1_fields():
|
| 165 |
+
obs = env.reset(seed=42, task_id=1)
|
| 166 |
+
ticket_id = obs.current_ticket["ticket_id"]
|
| 167 |
+
current_ticket = env._tickets_by_id[ticket_id] # noqa: SLF001 - test-only inspection
|
| 168 |
+
|
| 169 |
+
result_obs = env.step(
|
| 170 |
+
HelpdeskTicketAction(
|
| 171 |
+
issue_type=current_ticket.issue_type,
|
| 172 |
+
metadata={},
|
| 173 |
+
)
|
| 174 |
)
|
|
|
|
| 175 |
|
| 176 |
self.assertEqual(len(result_obs.history), 1)
|
| 177 |
self.assertNotIn("penalty_reason", result_obs.history[0])
|
|
|
|
| 180 |
def test_extra_fields_no_exception_raised(self) -> None:
|
| 181 |
"""Requirement 7.4: extra fields must not raise an unhandled exception."""
|
| 182 |
env = _make_env()
|
| 183 |
+
with _restrict_task_1_fields():
|
| 184 |
+
env.reset(seed=42, task_id=1)
|
| 185 |
+
|
| 186 |
+
action = HelpdeskTicketAction(
|
| 187 |
+
issue_type=ISSUE_TYPES[0],
|
| 188 |
+
priority=PRIORITIES[0],
|
| 189 |
+
assignment_group=ASSIGNMENT_GROUPS[0],
|
| 190 |
+
resolution_action=RESOLUTION_ACTIONS[0], # multiple extra fields
|
| 191 |
+
)
|
| 192 |
+
try:
|
| 193 |
+
obs = env.step(action)
|
| 194 |
+
except Exception as exc: # noqa: BLE001
|
| 195 |
+
self.fail(f"step() raised an unexpected exception: {exc}")
|
| 196 |
|
| 197 |
self.assertIsInstance(obs, HelpdeskTicketObservation)
|
| 198 |
|
| 199 |
def test_extra_fields_done_flag_set_correctly_on_last_ticket(self) -> None:
|
| 200 |
"""When the penalty step is on the last ticket, done stays True and reward stays episode-level."""
|
| 201 |
env = _make_env()
|
| 202 |
+
with _restrict_task_1_fields():
|
| 203 |
+
obs = env.reset(seed=42, task_id=1)
|
| 204 |
+
queue_size = obs.queue_size
|
| 205 |
+
tickets_by_id = env._tickets_by_id # noqa: SLF001 - test-only inspection
|
| 206 |
+
|
| 207 |
+
# Process all tickets except the last one normally
|
| 208 |
+
for _ in range(queue_size - 1):
|
| 209 |
+
current_ticket_id = obs.current_ticket["ticket_id"]
|
| 210 |
+
current_ticket = tickets_by_id[current_ticket_id]
|
| 211 |
+
obs = env.step(HelpdeskTicketAction(issue_type=current_ticket.issue_type))
|
| 212 |
+
|
| 213 |
+
# Now trigger penalty on the last ticket
|
| 214 |
current_ticket_id = obs.current_ticket["ticket_id"]
|
| 215 |
current_ticket = tickets_by_id[current_ticket_id]
|
| 216 |
+
action = HelpdeskTicketAction(
|
| 217 |
+
issue_type=current_ticket.issue_type,
|
| 218 |
+
assignment_group=ASSIGNMENT_GROUPS[0], # extra field
|
| 219 |
+
)
|
| 220 |
+
final_obs = env.step(action)
|
| 221 |
|
| 222 |
self.assertTrue(final_obs.done)
|
| 223 |
self.assertGreater(final_obs.reward, 0.0)
|
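The tests above wrap `env.reset(...)` in a `_restrict_task_1_fields()` helper whose definition is not shown in this diff. A minimal sketch of how such a helper could work, assuming a module-level `TASKS` dict keyed by task id with an `allowed_fields` list. The dict layout, the name `restrict_task_1_fields`, and the restricted field set are assumptions for illustration, not the repository's code:

```python
from contextlib import contextmanager
from typing import Iterator

# Hypothetical stand-in for the real task registry; the layout is assumed.
TASKS: dict[int, dict[str, list[str]]] = {
    1: {"allowed_fields": ["issue_type", "priority", "assignment_group", "resolution_action"]},
}


@contextmanager
def restrict_task_1_fields(fields: tuple[str, ...] = ("issue_type",)) -> Iterator[None]:
    """Temporarily narrow task 1's allowed_fields, restoring them on exit."""
    original = TASKS[1]["allowed_fields"]
    TASKS[1]["allowed_fields"] = list(fields)
    try:
        yield
    finally:
        # Restore even if the test body raises, so other tests are unaffected.
        TASKS[1]["allowed_fields"] = original


with restrict_task_1_fields():
    inside = list(TASKS[1]["allowed_fields"])
after = list(TASKS[1]["allowed_fields"])
```

Restoring in a `finally` block keeps the restriction scoped to the `with` body, which would explain why each test performs its `reset` and `step` calls inside the context.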
tests/test_grader_unit.py
CHANGED
|
@@ -16,6 +16,18 @@ from server.grader import (
|
|
| 16 |
from vocabulary import ASSIGNMENT_GROUPS, ISSUE_TYPES, PRIORITIES, RESOLUTION_ACTIONS
|
| 17 |
|
| 18 |
|
| 19 |
def _ticket(
|
| 20 |
*,
|
| 21 |
issue_type: str = "billing_license",
|
|
@@ -71,8 +83,24 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 71 |
|
| 72 |
score, breakdown = grade_action(action, ticket, task_id=1)
|
| 73 |
|
| 74 |
-
|
| 75 |
-
|
| 76 |
|
| 77 |
def test_issue_type_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
|
| 78 |
for expected in ISSUE_TYPES:
|
|
@@ -88,9 +116,24 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 88 |
if predicted == expected
|
| 89 |
else ISSUE_TYPE_SIMILARITY.get((predicted, expected), 0.0)
|
| 90 |
)
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
|
| 94 |
|
| 95 |
def test_unrelated_issue_type_gets_zero_not_fuzzy_credit(self) -> None:
|
| 96 |
ticket = _ticket(issue_type="onboarding")
|
|
@@ -99,7 +142,16 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 99 |
score, breakdown = grade_action(action, ticket, task_id=1)
|
| 100 |
|
| 101 |
self.assertAlmostEqual(score, 0.0)
|
| 102 |
-
self.assertEqual(
|
| 103 |
|
| 104 |
def test_priority_scoring_uses_defined_proximity_table(self) -> None:
|
| 105 |
ticket = _ticket(priority="critical")
|
|
@@ -109,7 +161,16 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 109 |
|
| 110 |
self.assertAlmostEqual(breakdown["issue_type"], 1.0)
|
| 111 |
self.assertAlmostEqual(breakdown["priority"], 0.6)
|
| 112 |
-
self.assertAlmostEqual(
|
| 113 |
|
| 114 |
def test_priority_scoring_matches_declared_table_exhaustively(self) -> None:
|
| 115 |
for expected in PRIORITIES:
|
|
@@ -130,11 +191,24 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 130 |
)
|
| 131 |
self.assertEqual(
|
| 132 |
breakdown,
|
| 133 |
-
|
| 134 |
)
|
| 135 |
-
raw_score = 0.6 + 0.4 * priority_score
|
| 136 |
-
expected_task_score = max(0.0, min(1.0, raw_score))
|
| 137 |
-
self.assertAlmostEqual(score, expected_task_score)
|
| 138 |
|
| 139 |
def test_task_2_weights_apply_as_documented(self) -> None:
|
| 140 |
ticket = _ticket(priority="high")
|
|
@@ -142,8 +216,26 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 142 |
|
| 143 |
score, breakdown = grade_action(action, ticket, task_id=2)
|
| 144 |
|
| 145 |
-
self.assertEqual(
|
| 146 |
-
|
| 147 |
|
| 148 |
def test_assignment_group_partial_credit_uses_declared_similarity_table(self) -> None:
|
| 149 |
ticket = _ticket()
|
|
@@ -157,7 +249,16 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 157 |
score, breakdown = grade_action(action, ticket, task_id=3)
|
| 158 |
|
| 159 |
self.assertEqual(breakdown["assignment_group"], 0.55)
|
| 160 |
-
self.assertAlmostEqual(
|
| 161 |
|
| 162 |
def test_assignment_group_unrelated_miss_stays_zero(self) -> None:
|
| 163 |
ticket = _ticket()
|
|
@@ -171,7 +272,16 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 171 |
score, breakdown = grade_action(action, ticket, task_id=3)
|
| 172 |
|
| 173 |
self.assertEqual(breakdown["assignment_group"], 0.0)
|
| 174 |
-
self.assertAlmostEqual(
|
| 175 |
|
| 176 |
def test_task_3_weights_apply_as_documented(self) -> None:
|
| 177 |
ticket = _ticket(priority="high")
|
|
@@ -186,14 +296,24 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 186 |
|
| 187 |
self.assertEqual(
|
| 188 |
breakdown,
|
| 189 |
-
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
|
| 195 |
)
|
| 196 |
-
self.assertAlmostEqual(score, 0.65)
|
| 197 |
|
| 198 |
def test_alternate_route_can_win_when_primary_route_is_worse(self) -> None:
|
| 199 |
ticket = HelpdeskTicketRecord(
|
|
@@ -243,7 +363,16 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 243 |
score, breakdown = grade_action(action, ticket, task_id=3)
|
| 244 |
|
| 245 |
self.assertEqual(breakdown["resolution_action"], 0.35)
|
| 246 |
-
self.assertAlmostEqual(
|
| 247 |
|
| 248 |
def test_resolution_action_unrelated_miss_stays_zero(self) -> None:
|
| 249 |
ticket = _ticket()
|
|
@@ -257,7 +386,16 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 257 |
score, breakdown = grade_action(action, ticket, task_id=3)
|
| 258 |
|
| 259 |
self.assertEqual(breakdown["resolution_action"], 0.0)
|
| 260 |
-
self.assertAlmostEqual(
|
| 261 |
|
| 262 |
def test_assignment_group_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
|
| 263 |
for expected in ASSIGNMENT_GROUPS:
|
|
@@ -280,16 +418,24 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 280 |
)
|
| 281 |
self.assertEqual(
|
| 282 |
breakdown,
|
| 283 |
-
|
| 284 |
-
|
| 285 |
-
|
| 286 |
-
|
| 287 |
-
|
| 288 |
-
|
| 289 |
)
|
| 290 |
-
raw_score = 0.35 + 0.20 + 0.25 * assignment_group_score + 0.20
|
| 291 |
-
expected_task_score = max(0.0, min(1.0, raw_score))
|
| 292 |
-
self.assertAlmostEqual(score, expected_task_score)
|
| 293 |
|
| 294 |
def test_resolution_action_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
|
| 295 |
for expected in RESOLUTION_ACTIONS:
|
|
@@ -312,16 +458,24 @@ class GraderUnitTests(unittest.TestCase):
|
|
| 312 |
)
|
| 313 |
self.assertEqual(
|
| 314 |
breakdown,
|
| 315 |
-
|
| 316 |
-
|
| 317 |
-
|
| 318 |
-
|
| 319 |
-
|
| 320 |
-
|
| 321 |
)
|
| 322 |
-
raw_score = 0.35 + 0.20 + 0.25 + 0.20 * resolution_action_score
|
| 323 |
-
expected_task_score = max(0.0, min(1.0, raw_score))
|
| 324 |
-
self.assertAlmostEqual(score, expected_task_score)
|
| 325 |
|
| 326 |
def test_partial_credit_tables_never_override_exact_match(self) -> None:
|
| 327 |
for pair, value in ISSUE_TYPE_SIMILARITY.items():
|
|
|
|
| 16 |
from vocabulary import ASSIGNMENT_GROUPS, ISSUE_TYPES, PRIORITIES, RESOLUTION_ACTIONS
|
| 17 |
|
| 18 |
|
| 19 |
+
def _expected_breakdown(task_id: int, **field_scores: float) -> dict[str, float]:
|
| 20 |
+
return {field: field_scores[field] for field in TASK_WEIGHTS[task_id]}
|
| 21 |
+
|
| 22 |
+
|
| 23 |
+
def _expected_task_score(task_id: int, **field_scores: float) -> float:
|
| 24 |
+
raw_score = sum(
|
| 25 |
+
field_scores[field] * TASK_WEIGHTS[task_id][field]
|
| 26 |
+
for field in TASK_WEIGHTS[task_id]
|
| 27 |
+
)
|
| 28 |
+
return max(0.0, min(1.0, raw_score))
|
| 29 |
+
|
| 30 |
+
|
| 31 |
def _ticket(
|
| 32 |
*,
|
| 33 |
issue_type: str = "billing_license",
|
|
|
|
| 83 |
|
| 84 |
score, breakdown = grade_action(action, ticket, task_id=1)
|
| 85 |
|
| 86 |
+
expected_breakdown = _expected_breakdown(
|
| 87 |
+
1,
|
| 88 |
+
issue_type=0.4,
|
| 89 |
+
priority=0.0,
|
| 90 |
+
assignment_group=0.0,
|
| 91 |
+
resolution_action=0.0,
|
| 92 |
+
)
|
| 93 |
+
self.assertEqual(breakdown, expected_breakdown)
|
| 94 |
+
self.assertAlmostEqual(
|
| 95 |
+
score,
|
| 96 |
+
_expected_task_score(
|
| 97 |
+
1,
|
| 98 |
+
issue_type=0.4,
|
| 99 |
+
priority=0.0,
|
| 100 |
+
assignment_group=0.0,
|
| 101 |
+
resolution_action=0.0,
|
| 102 |
+
),
|
| 103 |
+
)
|
| 104 |
|
| 105 |
def test_issue_type_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
|
| 106 |
for expected in ISSUE_TYPES:
|
|
|
|
| 116 |
if predicted == expected
|
| 117 |
else ISSUE_TYPE_SIMILARITY.get((predicted, expected), 0.0)
|
| 118 |
)
|
| 119 |
+
expected_breakdown = _expected_breakdown(
|
| 120 |
+
1,
|
| 121 |
+
issue_type=raw_expected_score,
|
| 122 |
+
priority=0.0,
|
| 123 |
+
assignment_group=0.0,
|
| 124 |
+
resolution_action=0.0,
|
| 125 |
+
)
|
| 126 |
+
self.assertAlmostEqual(
|
| 127 |
+
score,
|
| 128 |
+
_expected_task_score(
|
| 129 |
+
1,
|
| 130 |
+
issue_type=raw_expected_score,
|
| 131 |
+
priority=0.0,
|
| 132 |
+
assignment_group=0.0,
|
| 133 |
+
resolution_action=0.0,
|
| 134 |
+
),
|
| 135 |
+
)
|
| 136 |
+
self.assertEqual(breakdown, expected_breakdown)
|
| 137 |
|
| 138 |
def test_unrelated_issue_type_gets_zero_not_fuzzy_credit(self) -> None:
|
| 139 |
ticket = _ticket(issue_type="onboarding")
|
|
|
|
| 142 |
score, breakdown = grade_action(action, ticket, task_id=1)
|
| 143 |
|
| 144 |
self.assertAlmostEqual(score, 0.0)
|
| 145 |
+
self.assertEqual(
|
| 146 |
+
breakdown,
|
| 147 |
+
_expected_breakdown(
|
| 148 |
+
1,
|
| 149 |
+
issue_type=0.0,
|
| 150 |
+
priority=0.0,
|
| 151 |
+
assignment_group=0.0,
|
| 152 |
+
resolution_action=0.0,
|
| 153 |
+
),
|
| 154 |
+
)
|
| 155 |
|
| 156 |
def test_priority_scoring_uses_defined_proximity_table(self) -> None:
|
| 157 |
ticket = _ticket(priority="critical")
|
|
|
|
| 161 |
|
| 162 |
self.assertAlmostEqual(breakdown["issue_type"], 1.0)
|
| 163 |
self.assertAlmostEqual(breakdown["priority"], 0.6)
|
| 164 |
+
self.assertAlmostEqual(
|
| 165 |
+
score,
|
| 166 |
+
_expected_task_score(
|
| 167 |
+
2,
|
| 168 |
+
issue_type=1.0,
|
| 169 |
+
priority=0.6,
|
| 170 |
+
assignment_group=0.0,
|
| 171 |
+
resolution_action=0.0,
|
| 172 |
+
),
|
| 173 |
+
)
|
| 174 |
|
| 175 |
def test_priority_scoring_matches_declared_table_exhaustively(self) -> None:
|
| 176 |
for expected in PRIORITIES:
|
|
|
|
| 191 |
)
|
| 192 |
self.assertEqual(
|
| 193 |
breakdown,
|
| 194 |
+
_expected_breakdown(
|
| 195 |
+
2,
|
| 196 |
+
issue_type=1.0,
|
| 197 |
+
priority=priority_score,
|
| 198 |
+
assignment_group=0.0,
|
| 199 |
+
resolution_action=0.0,
|
| 200 |
+
),
|
| 201 |
+
)
|
| 202 |
+
self.assertAlmostEqual(
|
| 203 |
+
score,
|
| 204 |
+
_expected_task_score(
|
| 205 |
+
2,
|
| 206 |
+
issue_type=1.0,
|
| 207 |
+
priority=priority_score,
|
| 208 |
+
assignment_group=0.0,
|
| 209 |
+
resolution_action=0.0,
|
| 210 |
+
),
|
| 211 |
)
|
| 212 |
|
| 213 |
def test_task_2_weights_apply_as_documented(self) -> None:
|
| 214 |
ticket = _ticket(priority="high")
|
|
|
|
| 216 |
|
| 217 |
score, breakdown = grade_action(action, ticket, task_id=2)
|
| 218 |
|
| 219 |
+
self.assertEqual(
|
| 220 |
+
breakdown,
|
| 221 |
+
_expected_breakdown(
|
| 222 |
+
2,
|
| 223 |
+
issue_type=1.0,
|
| 224 |
+
priority=0.5,
|
| 225 |
+
assignment_group=0.0,
|
| 226 |
+
resolution_action=0.0,
|
| 227 |
+
),
|
| 228 |
+
)
|
| 229 |
+
self.assertAlmostEqual(
|
| 230 |
+
score,
|
| 231 |
+
_expected_task_score(
|
| 232 |
+
2,
|
| 233 |
+
issue_type=1.0,
|
| 234 |
+
priority=0.5,
|
| 235 |
+
assignment_group=0.0,
|
| 236 |
+
resolution_action=0.0,
|
| 237 |
+
),
|
| 238 |
+
)
|
| 239 |
|
| 240 |
def test_assignment_group_partial_credit_uses_declared_similarity_table(self) -> None:
|
| 241 |
ticket = _ticket()
|
|
|
|
| 249 |
score, breakdown = grade_action(action, ticket, task_id=3)
|
| 250 |
|
| 251 |
self.assertEqual(breakdown["assignment_group"], 0.55)
|
| 252 |
+
self.assertAlmostEqual(
|
| 253 |
+
score,
|
| 254 |
+
_expected_task_score(
|
| 255 |
+
3,
|
| 256 |
+
issue_type=1.0,
|
| 257 |
+
priority=1.0,
|
| 258 |
+
assignment_group=0.55,
|
| 259 |
+
resolution_action=1.0,
|
| 260 |
+
),
|
| 261 |
+
)
|
| 262 |
|
| 263 |
def test_assignment_group_unrelated_miss_stays_zero(self) -> None:
|
| 264 |
ticket = _ticket()
|
|
|
|
| 272 |
score, breakdown = grade_action(action, ticket, task_id=3)
|
| 273 |
|
| 274 |
self.assertEqual(breakdown["assignment_group"], 0.0)
|
| 275 |
+
self.assertAlmostEqual(
|
| 276 |
+
score,
|
| 277 |
+
_expected_task_score(
|
| 278 |
+
3,
|
| 279 |
+
issue_type=1.0,
|
| 280 |
+
priority=1.0,
|
| 281 |
+
assignment_group=0.0,
|
| 282 |
+
resolution_action=1.0,
|
| 283 |
+
),
|
| 284 |
+
)
|
| 285 |
|
| 286 |
def test_task_3_weights_apply_as_documented(self) -> None:
|
| 287 |
ticket = _ticket(priority="high")
|
|
|
|
| 296 |
|
| 297 |
self.assertEqual(
|
| 298 |
breakdown,
|
| 299 |
+
_expected_breakdown(
|
| 300 |
+
3,
|
| 301 |
+
issue_type=1.0,
|
| 302 |
+
priority=0.5,
|
| 303 |
+
assignment_group=0.0,
|
| 304 |
+
resolution_action=1.0,
|
| 305 |
+
),
|
| 306 |
+
)
|
| 307 |
+
self.assertAlmostEqual(
|
| 308 |
+
score,
|
| 309 |
+
_expected_task_score(
|
| 310 |
+
3,
|
| 311 |
+
issue_type=1.0,
|
| 312 |
+
priority=0.5,
|
| 313 |
+
assignment_group=0.0,
|
| 314 |
+
resolution_action=1.0,
|
| 315 |
+
),
|
| 316 |
)
|
|
|
|
| 317 |
|
| 318 |
def test_alternate_route_can_win_when_primary_route_is_worse(self) -> None:
|
| 319 |
ticket = HelpdeskTicketRecord(
|
|
|
|
| 363 |
score, breakdown = grade_action(action, ticket, task_id=3)
|
| 364 |
|
| 365 |
self.assertEqual(breakdown["resolution_action"], 0.35)
|
| 366 |
+
self.assertAlmostEqual(
|
| 367 |
+
score,
|
| 368 |
+
_expected_task_score(
|
| 369 |
+
3,
|
| 370 |
+
issue_type=1.0,
|
| 371 |
+
priority=1.0,
|
| 372 |
+
assignment_group=1.0,
|
| 373 |
+
resolution_action=0.35,
|
| 374 |
+
),
|
| 375 |
+
)
|
| 376 |
|
| 377 |
def test_resolution_action_unrelated_miss_stays_zero(self) -> None:
|
| 378 |
ticket = _ticket()
|
|
|
|
| 386 |
score, breakdown = grade_action(action, ticket, task_id=3)
|
| 387 |
|
| 388 |
self.assertEqual(breakdown["resolution_action"], 0.0)
|
| 389 |
+
self.assertAlmostEqual(
|
| 390 |
+
score,
|
| 391 |
+
_expected_task_score(
|
| 392 |
+
3,
|
| 393 |
+
issue_type=1.0,
|
| 394 |
+
priority=1.0,
|
| 395 |
+
assignment_group=1.0,
|
| 396 |
+
resolution_action=0.0,
|
| 397 |
+
),
|
| 398 |
+
)
|
| 399 |
|
| 400 |
def test_assignment_group_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
|
| 401 |
for expected in ASSIGNMENT_GROUPS:
|
|
|
|
| 418 |
)
|
| 419 |
self.assertEqual(
|
| 420 |
breakdown,
|
| 421 |
+
_expected_breakdown(
|
| 422 |
+
3,
|
| 423 |
+
issue_type=1.0,
|
| 424 |
+
priority=1.0,
|
| 425 |
+
assignment_group=assignment_group_score,
|
| 426 |
+
resolution_action=1.0,
|
| 427 |
+
),
|
| 428 |
+
)
|
| 429 |
+
self.assertAlmostEqual(
|
| 430 |
+
score,
|
| 431 |
+
_expected_task_score(
|
| 432 |
+
3,
|
| 433 |
+
issue_type=1.0,
|
| 434 |
+
priority=1.0,
|
| 435 |
+
assignment_group=assignment_group_score,
|
| 436 |
+
resolution_action=1.0,
|
| 437 |
+
),
|
| 438 |
)
|
| 439 |
|
| 440 |
def test_resolution_action_scoring_matches_declared_similarity_table_exhaustively(self) -> None:
|
| 441 |
for expected in RESOLUTION_ACTIONS:
|
|
|
|
| 458 |
)
|
| 459 |
self.assertEqual(
|
| 460 |
breakdown,
|
| 461 |
+
_expected_breakdown(
|
| 462 |
+
3,
|
| 463 |
+
issue_type=1.0,
|
| 464 |
+
priority=1.0,
|
| 465 |
+
assignment_group=1.0,
|
| 466 |
+
resolution_action=resolution_action_score,
|
| 467 |
+
),
|
| 468 |
+
)
|
| 469 |
+
self.assertAlmostEqual(
|
| 470 |
+
score,
|
| 471 |
+
_expected_task_score(
|
| 472 |
+
3,
|
| 473 |
+
issue_type=1.0,
|
| 474 |
+
priority=1.0,
|
| 475 |
+
assignment_group=1.0,
|
| 476 |
+
resolution_action=resolution_action_score,
|
| 477 |
+
),
|
| 478 |
)
|
| 479 |
|
| 480 |
def test_partial_credit_tables_never_override_exact_match(self) -> None:
|
| 481 |
for pair, value in ISSUE_TYPE_SIMILARITY.items():
|
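The refactor above replaces hard-coded arithmetic such as `0.35 + 0.20 + 0.25 * assignment_group_score + 0.20` with a lookup into `TASK_WEIGHTS`. A minimal standalone sketch of that weighted-sum pattern follows. The weights for tasks 2 and 3 are copied from the constants in the removed assertions, the task 1 entry is an assumption, and the real `TASK_WEIGHTS` in `server/grader.py` may differ:

```python
# Illustrative weights only: tasks 2 and 3 mirror the constants in the removed
# assertions; the task 1 entry is assumed. The real TASK_WEIGHTS may differ.
TASK_WEIGHTS: dict[int, dict[str, float]] = {
    1: {"issue_type": 1.0},
    2: {"issue_type": 0.6, "priority": 0.4},
    3: {
        "issue_type": 0.35,
        "priority": 0.20,
        "assignment_group": 0.25,
        "resolution_action": 0.20,
    },
}


def expected_task_score(task_id: int, **field_scores: float) -> float:
    """Weighted sum of per-field scores, clamped to the unit interval."""
    weights = TASK_WEIGHTS[task_id]
    raw_score = sum(field_scores[field] * weight for field, weight in weights.items())
    return max(0.0, min(1.0, raw_score))


# Task 3 with a half-credit priority and a missed assignment group.
score = expected_task_score(
    3, issue_type=1.0, priority=0.5, assignment_group=0.0, resolution_action=1.0
)
print(round(score, 2))
```

Under these assumed weights the example gives 0.35 + 0.10 + 0.00 + 0.20 = 0.65, the value the removed `assertAlmostEqual(score, 0.65)` hard-coded. Deriving the expectation from the weight table means the tests keep passing if the weights are retuned, at the cost of no longer pinning the exact numbers.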
tests/test_policy_learning.py
CHANGED
|
@@ -171,7 +171,7 @@ class PolicyLearningTests(unittest.TestCase):
|
|
| 171 |
|
| 172 |
self.assertLess(no_summary["terminal_reward"], context_summary["terminal_reward"])
|
| 173 |
self.assertLess(no_summary["normalized_return"], context_summary["normalized_return"])
|
| 174 |
-
self.
|
| 175 |
|
| 176 |
def test_search_policies_selects_adaptive_policy(self) -> None:
|
| 177 |
report = search_policies(
|
|
|
|
| 171 |
|
| 172 |
self.assertLess(no_summary["terminal_reward"], context_summary["terminal_reward"])
|
| 173 |
self.assertLess(no_summary["normalized_return"], context_summary["normalized_return"])
|
| 174 |
+
self.assertGreaterEqual(context_summary["investigation_steps"], 1)
|
| 175 |
|
| 176 |
def test_search_policies_selects_adaptive_policy(self) -> None:
|
| 177 |
report = search_policies(
|
tests/test_tasks_unit.py
CHANGED
|
@@ -23,18 +23,17 @@ class TasksAndDatasetUnitTests(unittest.TestCase):
|
|
| 23 |
self.assertEqual(tuple(TASKS.keys()), TASK_IDS)
|
| 24 |
|
| 25 |
def test_task_allowed_fields_match_expected_ladder(self) -> None:
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
self.assertEqual(
|
| 31 |
get_task_definition(3)["allowed_fields"],
|
| 32 |
-
|
| 33 |
-
"issue_type",
|
| 34 |
-
"priority",
|
| 35 |
-
"assignment_group",
|
| 36 |
-
"resolution_action",
|
| 37 |
-
],
|
| 38 |
)
|
| 39 |
|
| 40 |
def test_task_difficulty_ladder_is_frozen(self) -> None:
|
|
|
|
| 23 |
self.assertEqual(tuple(TASKS.keys()), TASK_IDS)
|
| 24 |
|
| 25 |
def test_task_allowed_fields_match_expected_ladder(self) -> None:
|
| 26 |
+
expected_fields = [
|
| 27 |
+
"issue_type",
|
| 28 |
+
"priority",
|
| 29 |
+
"assignment_group",
|
| 30 |
+
"resolution_action",
|
| 31 |
+
]
|
| 32 |
+
self.assertEqual(get_task_definition(1)["allowed_fields"], expected_fields)
|
| 33 |
+
self.assertEqual(get_task_definition(2)["allowed_fields"], expected_fields)
|
| 34 |
self.assertEqual(
|
| 35 |
get_task_definition(3)["allowed_fields"],
|
| 36 |
+
expected_fields,
|
| 37 |
)
|
| 38 |
|
| 39 |
def test_task_difficulty_ladder_is_frozen(self) -> None:
|