Commit 3752981 · 0 parent(s)
Initial commit
- KNOWLEDGE.md +258 -0
- LABEL_AUDIT.md +56 -0
- MARCH30_STATUS.md +117 -0
- MENTAL_MODEL.md +155 -0
- PLAN.md +147 -0
- Preparation +0 -0
- ProblemDetails +472 -0
- README.md +258 -0
- ROADMAP.md +339 -0
- __init__.cpython-313.pyc +0 -0
- __init__.py +0 -0
- app.cpython-313.pyc +0 -0
- client.cpython-313.pyc +0 -0
- client.py +28 -0
- data/dataset.json +543 -0
- environment.cpython-313.pyc +0 -0
- grader.cpython-313.pyc +0 -0
- inference.py +276 -0
- models.cpython-313.pyc +0 -0
- models.py +114 -0
- openenv.yaml +59 -0
- pyproject.toml +26 -0
- requirements.txt +6 -0
- reward.cpython-313.pyc +0 -0
- server/Dockerfile +12 -0
- server/app.py +43 -0
- server/environment.py +163 -0
- server/grader.py +103 -0
- server/reward.py +16 -0
- server/tasks.py +60 -0
- studymaterialLinks +16 -0
- tasks.cpython-313.pyc +0 -0
- vocabulary.py +67 -0
KNOWLEDGE.md
ADDED
@@ -0,0 +1,258 @@
# IT Helpdesk Ticket Routing OpenEnv - Knowledge Guide

## Part 1: What The Hackathon Wants

The hackathon is asking for a real-world environment that an AI agent can learn from through the standard OpenEnv interface.

In plain terms, the judges want:

1. a real human job, not a toy problem
2. typed models for actions, observations, and state
3. `reset()`, `step()`, and `state()`
4. at least 3 tasks with increasing difficulty
5. deterministic graders that return scores from `0.0` to `1.0`
6. a meaningful reward function
7. a baseline `inference.py`
8. Docker and deployment readiness

## Part 2: Why This Repo Uses IT Helpdesk Ticket Routing

IT helpdesk ticket routing is a strong OpenEnv domain because it is:

- a real operational workflow
- naturally multi-step
- easy to express with typed actions and observations
- easy to score deterministically
- useful for evaluating planning, classification, and routing ability in agents

## Part 3: The Core Mental Model

Think of this environment as a queue of helpdesk tickets.

For each ticket, the agent must answer:

- what kind of issue is this
- how urgent is it
- which resolver group should own it
- what should happen next

The environment shows one ticket at a time. The agent responds with structured fields. The grader scores that response. Then the environment moves to the next ticket.

## Part 4: Main Files

### `models.py`

Defines the typed objects used everywhere:

- `HelpdeskTicketRecord`
- `HelpdeskTicketAction`
- `HelpdeskTicketObservation`
- `HelpdeskTicketState`

### `server/environment.py`

This is the core engine.

It:

- loads the dataset
- samples a queue of 3 to 5 tickets
- tracks progress
- grades each step
- computes the final episode reward

### `server/grader.py`

Contains deterministic scoring logic.

It gives:

- exact or partial credit for `issue_type`
- exact or proximity credit for `priority`
- exact credit for `assignment_group`
- exact credit for `resolution_action`
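
The per-field scoring rules above can be sketched roughly as follows. This is a minimal illustration, not the contents of `server/grader.py`: the function names, the `0.5` partial-credit value, and the priority scale (`low` to `critical`) are assumptions, since this guide does not list the priority labels. The two near-miss pairs are the ones named in `LABEL_AUDIT.md`.

```python
# Hypothetical sketch of per-field scoring; the real grader may differ.

# Near-miss issue-type pairs named in LABEL_AUDIT.md.
NEAR_MISS_PAIRS = {
    frozenset({"application_support", "feature_request"}),
    frozenset({"general_inquiry", "service_request"}),
}

# ASSUMPTION: the priority labels and their order are not given in this guide.
PRIORITY_ORDER = ["low", "medium", "high", "critical"]


def score_issue_type(predicted: str, expected: str) -> float:
    """Exact match scores 1.0; a known near-miss pair scores 0.5 (assumed value)."""
    if predicted == expected:
        return 1.0
    if frozenset({predicted, expected}) in NEAR_MISS_PAIRS:
        return 0.5
    return 0.0


def score_priority(predicted: str, expected: str) -> float:
    """Proximity credit: lose 0.5 per step of distance on the assumed scale."""
    if predicted not in PRIORITY_ORDER or expected not in PRIORITY_ORDER:
        return 0.0
    distance = abs(PRIORITY_ORDER.index(predicted) - PRIORITY_ORDER.index(expected))
    return max(0.0, 1.0 - 0.5 * distance)


def score_exact(predicted: str, expected: str) -> float:
    """Used for assignment_group and resolution_action: all or nothing."""
    return 1.0 if predicted == expected else 0.0
```

Because every rule is a pure function of the predicted and expected labels, the same action always receives the same score.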

### `server/reward.py`

Contains reward helpers:

- per-step reward clamping
- final trajectory reward calculation

### `server/tasks.py`

Defines the difficulty ladder:

- Task 1: issue type only
- Task 2: issue type plus priority
- Task 3: full routing
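
One way to picture the difficulty ladder is as a small registry that maps each task to the fields the agent is allowed to predict. The sketch below is illustrative only: `TaskDefinition`, `TASKS`, and `get_task` are hypothetical names, not the actual contents of `server/tasks.py`. The task names and field lists follow Part 5 of this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskDefinition:
    """Hypothetical container for one rung of the difficulty ladder."""
    task_id: int
    name: str
    allowed_fields: tuple[str, ...]


# Each task adds fields on top of the previous one.
TASKS = {
    1: TaskDefinition(1, "Issue Type Classification", ("issue_type",)),
    2: TaskDefinition(2, "Issue Type And Priority", ("issue_type", "priority")),
    3: TaskDefinition(
        3,
        "Full Ticket Routing",
        ("issue_type", "priority", "assignment_group", "resolution_action"),
    ),
}


def get_task(task_id: int) -> TaskDefinition:
    """Look up a task and fail clearly on an unsupported ID."""
    try:
        return TASKS[task_id]
    except KeyError:
        raise ValueError(f"Unsupported task id: {task_id!r}") from None
```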

### `server/app.py`

Creates the OpenEnv app and exposes a custom `/tasks` route.

### `client.py`

Typed client used by the inference script.

### `inference.py`

The baseline agent runner.

It can:

- use a real LLM through an OpenAI-compatible API
- or fall back to a keyword heuristic

## Part 5: Tasks

### Task 1: Issue Type Classification

The agent predicts:

- `issue_type`

### Task 2: Issue Type And Priority

The agent predicts:

- `issue_type`
- `priority`

### Task 3: Full Ticket Routing

The agent predicts:

- `issue_type`
- `priority`
- `assignment_group`
- `resolution_action`

## Part 6: Ticket Vocabulary

### Issue types

- `billing_license`
- `identity_access`
- `application_support`
- `service_request`
- `spam_phishing`
- `general_inquiry`
- `security_compliance`
- `onboarding`
- `feature_request`

### Assignment groups

- `license_ops`
- `service_desk`
- `application_team`
- `procurement`
- `security_team`
- `onboarding_ops`

### Resolution actions

- `fulfill`
- `escalate`
- `assign`
- `ignore`
- `acknowledge`

## Part 7: Episode Flow

### `reset()`

Starts a new episode:

1. chooses a task
2. samples a queue of tickets
3. resets state
4. returns the first observation

### `step(action)`

Processes one ticket:

1. grades the action
2. stores the score
3. advances the queue index
4. returns the next ticket or the final reward

### `state`

Returns the internal state snapshot.
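
The episode flow above can be sketched with a toy queue environment. This is a simplified stand-in, not `HelpdeskTicketRoutingEnvironment`: the class name `MiniQueueEnv`, the `grade_fn` callable, the dict-shaped observations, and the `(observation, reward, done)` return tuple are all assumptions made for illustration. Task selection is left out to keep the sketch short.

```python
import random


class MiniQueueEnv:
    """Toy queue environment illustrating reset/step/state (hypothetical)."""

    def __init__(self, tickets, grade_fn):
        if len(tickets) < 3:
            raise ValueError("need at least 3 tickets to sample a queue")
        self._tickets = list(tickets)
        self._grade = grade_fn  # callable(action, ticket) -> score in [0.0, 1.0]
        self._queue, self._index, self._scores = [], 0, []

    def _observe(self):
        # Only expose what the agent should see, never the gold labels.
        ticket = self._queue[self._index]
        return {"ticket_id": ticket["ticket_id"], "title": ticket["title"]}

    def reset(self, seed=None):
        rng = random.Random(seed)  # a fixed seed gives a reproducible queue
        size = rng.randint(3, min(5, len(self._tickets)))
        self._queue = rng.sample(self._tickets, size)
        self._index, self._scores = 0, []
        return self._observe()

    def step(self, action):
        if self._index >= len(self._queue):
            raise RuntimeError("episode is over; call reset()")
        score = self._grade(action, self._queue[self._index])  # 1. grade
        self._scores.append(score)                             # 2. store
        self._index += 1                                       # 3. advance
        if self._index >= len(self._queue):                    # 4. finish...
            return None, sum(self._scores) / len(self._scores), True
        return self._observe(), score, False                   # ...or continue

    @property
    def state(self):
        return {
            "index": self._index,
            "scores": list(self._scores),
            "queue_size": len(self._queue),
        }
```

A driver loop calls `reset()` once and then `step()` until `done` is true; the reward returned on the last step is the episode average rather than the last ticket's score.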

## Part 8: Reward Logic

Step reward:

- just the current ticket score clamped to `[0.0, 1.0]`

Final reward:

- average of all per-ticket scores
- minus a small overshoot penalty if too many steps were taken

This keeps the signal dense and easy to interpret.
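
As a rough sketch of the two reward rules above: the function names and the `0.05` per-extra-step penalty are assumptions, since this guide only says the penalty is small. The actual helpers in `server/reward.py` may be shaped differently.

```python
def clamp_step_reward(score: float) -> float:
    """Step reward: the ticket score clamped to [0.0, 1.0]."""
    return max(0.0, min(1.0, score))


def trajectory_reward(scores, steps_taken, queue_size, overshoot_penalty=0.05):
    """Final reward: mean ticket score minus a penalty per extra step.

    ASSUMPTION: overshoot_penalty=0.05 is an illustrative value only.
    """
    if not scores:
        return 0.0
    average = sum(scores) / len(scores)
    extra_steps = max(0, steps_taken - queue_size)
    return clamp_step_reward(average - overshoot_penalty * extra_steps)
```

For example, scores of `1.0` and `0.5` on a two-ticket queue average to `0.75` when no extra steps are taken.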

## Part 9: Dataset Shape

Each ticket record contains:

- `ticket_id`
- `title`
- `requester`
- `description`
- `issue_type`
- `priority`
- `assignment_group`
- `resolution_action`
- optional `ambiguity_note`
- optional `related_ticket_id`

The current dataset contains 45 tickets.

It includes:

- straightforward tickets
- ambiguous tickets
- follow-up references to earlier tickets

## Part 10: Inference Script In Simple Terms

`inference.py` is the script that actually "plays" the environment.

For each task it:

1. connects to the server
2. resets the environment
3. reads the current ticket
4. decides an action
5. sends the action back
6. collects scores
7. prints a summary

If LLM credentials are available, it uses an LLM.
If not, it uses keyword rules.
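
A keyword fallback of the kind described above might look like the sketch below. The keyword lists, the rule order, and the `general_inquiry` default are invented for illustration; only the issue-type labels come from the vocabulary in Part 6. The rules actually used by `inference.py` are not shown in this guide.

```python
# Hypothetical keyword rules; first matching rule wins.
KEYWORD_RULES = [
    (("invoice", "license", "billing"), "billing_license"),
    (("password", "login", "mfa"), "identity_access"),
    (("phishing", "suspicious link"), "spam_phishing"),
    (("new hire", "onboard"), "onboarding"),
]

# ASSUMPTION: fall back to the most generic label when nothing matches.
DEFAULT_ISSUE_TYPE = "general_inquiry"


def heuristic_issue_type(title: str, description: str) -> str:
    """Pick an issue type by scanning the ticket text for known keywords."""
    text = f"{title} {description}".lower()
    for keywords, label in KEYWORD_RULES:
        if any(keyword in text for keyword in keywords):
            return label
    return DEFAULT_ISSUE_TYPE
```

A rule-based fallback like this keeps the baseline runnable and deterministic when no model credentials are configured.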

## Part 11: What Still Needs Verification

The important next checks are:

1. run the server locally
2. verify the ticket-routing client path works end to end
3. rerun `inference.py`
4. record fresh baseline scores
5. validate Docker and OpenEnv behavior

## Part 12: One-Minute Summary

If you only remember one thing, remember this:

- this repo is now an IT helpdesk ticket router
- the mechanics are still the same multi-step OpenEnv pattern
- one ticket is shown at a time
- the agent predicts structured routing fields
- the grader gives deterministic partial credit
- `inference.py` is the baseline agent runner
LABEL_AUDIT.md
ADDED
@@ -0,0 +1,56 @@
# Label Audit Notes

This file records the March 31 and April 1 label-and-grader pass on the Roopal-owned files:

- `data/dataset.json`
- `server/tasks.py`
- `server/grader.py`

## Dataset Decisions

### Tightened ambiguity cases

- `ticket-022`
  Reworded to make the billing-versus-application ambiguity clearer while keeping the chosen label as `application_support`.

- `ticket-027`
  Reworded to make the vendor-offer ambiguity clearer between `general_inquiry` and `service_request`.

- `ticket-029`
  Reworded to make the seat-expansion versus prorating ambiguity clearer and changed `resolution_action` from `fulfill` to `assign`.

- `ticket-040`
  Reworded to make the feature-gap versus support-issue ambiguity clearer.

### Corrected label consistency

- `ticket-026`
  Changed from `feature_request` / `application_team` to `general_inquiry` / `service_desk` because it is a thank-you note, not a product change request.

## Task Wording Changes

The task instructions in `server/tasks.py` were tightened so they now:

- sound more like helpdesk triage
- emphasize choosing the single best label
- describe operational priority more clearly
- describe full triage more concretely for Task 3

## Grader Changes

The grader was polished by:

- making task weights explicit in `TASK_WEIGHTS`
- adding partial-credit pairs for:
  - `application_support` vs `feature_request`
  - `general_inquiry` vs `service_request`
- keeping the scoring deterministic and task-specific
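
To illustrate what explicit task weights could look like, here is a hedged sketch. Only the name `TASK_WEIGHTS` comes from this file; the weight values, the integer task IDs, and the `combine_field_scores` helper are hypothetical and chosen so that each task's weights sum to `1.0`.

```python
# ASSUMPTION: these weights are illustrative, not the values in server/grader.py.
TASK_WEIGHTS = {
    1: {"issue_type": 1.0},
    2: {"issue_type": 0.6, "priority": 0.4},
    3: {
        "issue_type": 0.3,
        "priority": 0.2,
        "assignment_group": 0.25,
        "resolution_action": 0.25,
    },
}


def combine_field_scores(task_id: int, field_scores: dict) -> float:
    """Weight per-field scores for one task and clamp the total to [0.0, 1.0]."""
    if task_id not in TASK_WEIGHTS:
        raise ValueError(f"Unsupported task id: {task_id!r}")
    weights = TASK_WEIGHTS[task_id]
    # Fields the agent did not score contribute nothing.
    total = sum(w * field_scores.get(name, 0.0) for name, w in weights.items())
    return max(0.0, min(1.0, total))
```

Keeping the weights in one table makes the task-specific behavior easy to review: a field that is absent from a task's entry cannot influence that task's score.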

## Intent

These edits are meant to improve:

- dataset realism
- label consistency
- hard-task ambiguity quality
- reviewability for judges and teammates
MARCH30_STATUS.md
ADDED
@@ -0,0 +1,117 @@
# March 30 Status Report

This file captures the code checkpoint completed for March 30, 2026 so both Codex sessions can compare against the same source of truth.

## Scope Completed

The March 30 code checkpoint is complete for the foundational files named in `ROADMAP.md`:

- `models.py`
- `server/tasks.py`
- `server/grader.py`
- `server/environment.py`

Related supporting files were also aligned:

- `client.py`
- `server/app.py`
- `inference.py`
- `vocabulary.py`

## What Is Locked

### Team and project identity

- Team: Hackstreet Boys
- Members: Roopal Guha Neogi, Suyash Kumar
- Domain: IT Helpdesk Ticket Routing

### Frozen class names

- `HelpdeskTicketRecord`
- `HelpdeskTicketAction`
- `HelpdeskTicketObservation`
- `HelpdeskTicketState`
- `HelpdeskTicketRoutingEnvironment`
- `HelpdeskTicketEnvClient`

### Frozen field names

- `ticket_id`
- `title`
- `requester`
- `description`
- `issue_type`
- `priority`
- `assignment_group`
- `resolution_action`
- `related_ticket_id`

## Code That Exists Now

### `vocabulary.py`

Shared frozen constants now live in one place:

- team metadata
- environment names
- issue types
- priorities
- assignment groups
- resolution actions
- default issue-type mappings used by inference

### `models.py`

The typed models are defined and the vocabulary is enforced through validators, so unsupported labels should fail fast instead of silently drifting.
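
The fail-fast behavior described above can be sketched with a standard-library dataclass. This is an assumption-laden illustration: the real `models.py` may use a different mechanism (for example Pydantic validators), and `ActionSketch` and `_require` are hypothetical names. The label sets are the ones listed in the project docs; `priority` is left out because its labels are not listed there.

```python
from dataclasses import dataclass
from typing import Optional

ISSUE_TYPES = frozenset({
    "billing_license", "identity_access", "application_support",
    "service_request", "spam_phishing", "general_inquiry",
    "security_compliance", "onboarding", "feature_request",
})
ASSIGNMENT_GROUPS = frozenset({
    "license_ops", "service_desk", "application_team",
    "procurement", "security_team", "onboarding_ops",
})
RESOLUTION_ACTIONS = frozenset({
    "fulfill", "escalate", "assign", "ignore", "acknowledge",
})


def _require(field_name: str, value: str, allowed: frozenset) -> None:
    """Raise immediately if a label is outside the frozen vocabulary."""
    if value not in allowed:
        raise ValueError(f"Unsupported {field_name}: {value!r}")


@dataclass(frozen=True)
class ActionSketch:
    """Hypothetical stand-in for HelpdeskTicketAction (priority omitted)."""
    issue_type: str
    assignment_group: Optional[str] = None
    resolution_action: Optional[str] = None

    def __post_init__(self):
        # Validation runs at construction time, so bad labels never propagate.
        _require("issue_type", self.issue_type, ISSUE_TYPES)
        if self.assignment_group is not None:
            _require("assignment_group", self.assignment_group, ASSIGNMENT_GROUPS)
        if self.resolution_action is not None:
            _require("resolution_action", self.resolution_action, RESOLUTION_ACTIONS)
```

With this shape, a misspelled label such as `billing` raises a `ValueError` when the action is built rather than producing a silently wrong score later.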

### `server/tasks.py`

All three tasks are defined with locked names, instructions, and allowed fields.

### `server/grader.py`

Deterministic scoring is in place with:

- partial credit for near-miss `issue_type`
- proximity scoring for `priority`
- exact match for `assignment_group`
- exact match for `resolution_action`

### `server/environment.py`

The environment implements:

- queue sampling
- reset flow
- step flow
- state tracking
- final trajectory reward handoff

### `inference.py`

The baseline runner is aligned to the locked vocabulary and supports:

- LLM mode
- heuristic mode
- task loop over all 3 tasks

## Expected Agreement For The Other Codex Session

Your teammate's Codex should agree on all of the following:

1. the schema names above are frozen
2. the vocabulary now has a single source of truth in `vocabulary.py`
3. no one should rename labels after this checkpoint
4. future work should build on these names, not replace them

## What Is Not Verified Yet

This checkpoint is a code-and-consistency checkpoint, not a runtime-complete checkpoint.

Still pending:

- local execution
- heuristic baseline run
- Docker validation
- final benchmark numbers
MENTAL_MODEL.md
ADDED
@@ -0,0 +1,155 @@
# IT Helpdesk Ticket Routing Mental Model

This file is the practical mental model of the repo in its current form.

## What The Project Is

This repository is an OpenEnv environment for IT helpdesk ticket routing.

The environment presents a small queue of tickets. For each ticket, the agent must decide:

- issue type
- priority
- assignment group
- resolution action

## Main Runtime Flow

```text
inference.py
    |
    v
client.py <----> server/app.py
                      |
                      v
             server/environment.py
               |        |        |
               v        v        v
          grader.py  reward.py  tasks.py
                                   |
                                   v
                           data/dataset.json
```

## Main Files

- `models.py`
  Typed models for tickets, actions, observations, and state.

- `server/environment.py`
  Main environment engine.

- `server/grader.py`
  Deterministic partial-credit scorer.

- `server/reward.py`
  Step and trajectory reward helpers.

- `server/tasks.py`
  Task definitions and dataset loading.

- `client.py`
  Typed client used for multi-step interaction.

- `inference.py`
  Baseline runner with LLM mode and heuristic mode.

## Task Ladder

### Task 1

- predict `issue_type`

### Task 2

- predict `issue_type`
- predict `priority`

### Task 3

- predict `issue_type`
- predict `priority`
- predict `assignment_group`
- predict `resolution_action`

## Label Vocabulary

### Issue types

- `billing_license`
- `identity_access`
- `application_support`
- `service_request`
- `spam_phishing`
- `general_inquiry`
- `security_compliance`
- `onboarding`
- `feature_request`

### Assignment groups

- `license_ops`
- `service_desk`
- `application_team`
- `procurement`
- `security_team`
- `onboarding_ops`

### Resolution actions

- `fulfill`
- `escalate`
- `assign`
- `ignore`
- `acknowledge`

## Observation And State

The observation exposes:

- task metadata
- the current ticket
- queue progress counters
- history
- reward and done status

The state tracks:

- current task
- seed
- queue ticket IDs
- current ticket index
- per-ticket scores
- total reward

## Reward Logic

- each step returns the current ticket score
- the final reward is the average of per-ticket scores
- a small overshoot penalty exists as a safeguard

## Dataset Shape

Each record includes:

- `ticket_id`
- `title`
- `requester`
- `description`
- `issue_type`
- `priority`
- `assignment_group`
- `resolution_action`
- optional `ambiguity_note`
- optional `related_ticket_id`

## Short Version

If coming back later, remember this:

- the repo is a helpdesk ticket router
- the architecture is a small OpenEnv stack
- one ticket is shown at a time
- the agent predicts structured routing fields
- the grader gives deterministic partial credit
- `inference.py` is the baseline agent runner
PLAN.md
ADDED
@@ -0,0 +1,147 @@
# IT Helpdesk Ticket Routing OpenEnv - Project Plan

## Project Goal

Build a polished OpenEnv environment for IT helpdesk ticket routing that satisfies:

- real-world utility
- strong task and grader quality
- clean environment design
- OpenEnv spec compliance
- reproducible baseline inference
- Docker and Hugging Face deployment readiness

## Current Product Definition

The environment simulates a helpdesk queue. An agent receives one ticket at a time and predicts:

- `issue_type`
- `priority`
- `assignment_group`
- `resolution_action`

The project keeps three tasks:

1. Issue Type Classification
2. Issue Type And Priority
3. Full Ticket Routing

## What Must Be True At Submission

### Pass or fail requirements

- the environment responds correctly
- OpenEnv metadata is valid
- `reset()`, `step()`, and `state()` work
- there are at least 3 tasks
- graders return scores in `[0.0, 1.0]`
- `inference.py` runs and prints reproducible results
- Docker builds and starts cleanly

### Scored requirements

- the task should clearly feel like real helpdesk work
- the hard task should require meaningful reasoning
- partial credit should be useful and deterministic
- docs should be clear enough for judges to understand quickly

## Core Files

### Runtime

- `models.py`
- `server/environment.py`
- `server/grader.py`
- `server/reward.py`
- `server/tasks.py`
- `server/app.py`
- `client.py`
- `inference.py`

### Data and metadata

- `data/dataset.json`
- `openenv.yaml`
- `server/Dockerfile`
- `pyproject.toml`
- `requirements.txt`

### Docs

- `README.md`
- `KNOWLEDGE.md`
- `MENTAL_MODEL.md`

## Technical Priorities

### P0

1. keep the environment behavior correct
2. verify the task definitions and graders
3. make the baseline script reliable
4. confirm dataset coverage and label consistency

### P1

1. validate Docker
2. validate deployment assumptions
3. record baseline scores
4. polish docs

### P2

1. strengthen ticket wording for realism
2. expand hard-case examples if needed
3. remove low-signal artifacts from the repo

## Quality Checks To Perform

### Environment

- reset starts a clean episode
- each step advances the queue correctly
- the final step returns trajectory reward
- state reflects the real internal status

### Grader

- exact matches score `1.0`
- near misses get partial credit where intended
- unsupported task IDs fail clearly
- scores vary across examples

### Inference

- heuristic mode works without model credentials
- LLM mode reads `API_BASE_URL`, `MODEL_NAME`, and `HF_TOKEN`
- output is reproducible when the seed is fixed
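
The mode-selection check could be exercised with something like the sketch below. The helper name `select_mode` and the rule that all three variables must be non-empty before LLM mode is chosen are assumptions; the plan only states which variables LLM mode reads and that heuristic mode must work without credentials.

```python
import os

# Environment variables named in the plan for LLM mode.
REQUIRED_LLM_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def select_mode(env=None) -> str:
    """Return "llm" when every credential is set, otherwise "heuristic".

    ASSUMPTION: a missing or empty variable falls back to heuristic mode.
    Pass a plain dict as `env` to test without touching os.environ.
    """
    env = os.environ if env is None else env
    if all(env.get(name) for name in REQUIRED_LLM_VARS):
        return "llm"
    return "heuristic"
```

Accepting the environment as a parameter keeps the check deterministic in tests, which supports the reproducibility item above.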

### Docs

- no outdated domain references remain
- team and project metadata are correct
- setup and run instructions are accurate

## Risks

### Runtime risk

The repo still needs a proper local execution pass to confirm everything after the latest edits.

### Benchmark risk

Fresh scores must be generated and then reflected in docs.

### Deployment risk

Docker and Hugging Face behavior should be validated before the final submission window.

## Definition Of Done

The project is ready when:

1. the environment runs locally end to end
2. the heuristic baseline runs successfully
3. Docker build and run both succeed
4. the docs are clean, current, and submission-ready
5. the repo clearly presents Hackstreet Boys as the team
Preparation
ADDED
File without changes
ProblemDetails
ADDED
@@ -0,0 +1,472 @@
| 1 |
+
Round 1 — Problem Statement
|
| 2 |
+
|
| 3 |
+
The Task
|
| 4 |
+
|
| 5 |
+
Build a complete, real-world OpenEnv environment that an AI agent can learn from through the standard step() / reset() / state() API.
|
| 6 |
+
|
| 7 |
+
Key Requirements at a Glance
|
| 8 |
+
|
| 9 |
+
Must simulate a real-world task (not games or toys)
|
| 10 |
+
|
| 11 |
+
Implement full OpenEnv spec: typed models, step()/reset()/state(), openenv.yaml
|
| 12 |
+
|
| 13 |
+
Minimum 3 tasks with agent graders (easy → medium → hard, scores 0.0–1.0)
|
| 14 |
+
|
| 15 |
+
Meaningful reward function with partial progress signals
|
| 16 |
+
|
| 17 |
+
Baseline inference script with reproducible scores
|
| 18 |
+
|
| 19 |
+
Deploy to Hugging Face Spaces + working Dockerfile
|
| 20 |
+
|
| 21 |
+
README with environment description, action/observation spaces, setup instructions
|
| 22 |
+
|
| 23 |
+
|
| 24 |
+
Real-world task simulation
|
| 25 |
+
|
| 26 |
+
The environment must simulate a task humans actually do. Not games, not toys. Examples: email triage, code review, data cleaning, scheduling, customer support, content moderation.
|
| 27 |
+
|
| 28 |
+
OpenEnv spec compliance
|
| 29 |
+
|
| 30 |
+
Implement the full OpenEnv interface: typed Observation, Action, and Reward Pydantic models. step(action) → returns observation, reward, done, info. reset() → returns initial observation. state() → returns current state. openenv.yaml with metadata. Tested via openenv validate.
|
| 31 |
+
|
| 32 |
+
Minimum 3 tasks with agent graders
|
| 33 |
+
|
| 34 |
+
Each task defines a concrete objective an agent must accomplish, with a programmatic grader that scores performance (0.0–1.0). Tasks should range: easy → medium → hard. Graders must have clear, deterministic success/failure criteria.
|
| 35 |
+
|
| 36 |
+
Meaningful reward function
|
| 37 |
+
|
| 38 |
+
Provides signal over the full trajectory (not just binary end-of-episode). Rewards partial progress toward task completion. Penalizes clearly undesirable behavior (e.g. infinite loops, destructive actions).
|
| 39 |
+
|
| 40 |
+
Baseline inference script
|
| 41 |
+
|
| 42 |
+
Uses the OpenAI API client to run a model against the environment. Reads API credentials from environment variables (OPENAI_API_KEY). Produces a reproducible baseline score on all 3 tasks.
|
| 43 |
+
___________________________________________
|
| 44 |
+
Detailed Requirements
|
| 45 |
+
|
| 46 |
+
Non-Functional Requirements
|
| 47 |
+
|
| 48 |
+
Deploys to a Hugging Face Space
|
| 49 |
+
|
| 50 |
+
Environment must run as a containerized HF Space tagged with openenv.
|
| 51 |
+
|
| 52 |
+
Containerized execution
|
| 53 |
+
|
| 54 |
+
Must include a working Dockerfile. The environment should start cleanly with docker build + docker run.
|
| 55 |
+
|
| 56 |
+
Documentation
|
| 57 |
+
|
| 58 |
+
README must include: environment description and motivation, action and observation space definitions, task descriptions with expected difficulty, setup and usage instructions, baseline scores.
|
| 59 |
+
___________________________________________
|
| 60 |
+
|
| 61 |
+
Parameter
|
| 62 |
+
|
| 63 |
+
Weight
|
| 64 |
+
|
| 65 |
+
Description
|
| 66 |
+
|
| 67 |
+
Real-world utility
|
| 68 |
+
|
| 69 |
+
30%
|
| 70 |
+
|
| 71 |
+
Does the environment model a genuine task? Would someone actually use this to train or evaluate agents?
|
| 72 |
+
|
| 73 |
+
Task & grader quality
|
| 74 |
+
|
| 75 |
+
25%
|
| 76 |
+
|
| 77 |
+
Are tasks well-defined with clear objectives? Do graders accurately and fairly measure success? Meaningful difficulty progression?
|
| 78 |
+
|
| 79 |
+
Environment design
|
| 80 |
+
|
| 81 |
+
20%
|
| 82 |
+
|
| 83 |
+
Clean state management, sensible action/observation spaces, good reward shaping, proper episode boundaries.
|
| 84 |
+
|
| 85 |
+
Code quality & spec compliance
|
| 86 |
+
|
| 87 |
+
15%
|
| 88 |
+
|
| 89 |
+
Follows OpenEnv spec, clean project structure, typed models, documented, tested, Dockerfile works.
|
| 90 |
+
|
| 91 |
+
Creativity & novelty
|
| 92 |
+
|
| 93 |
+
10%
|
| 94 |
+
|
| 95 |
+
Novel problem domain, interesting mechanics, clever reward design, original approach.
|
| 96 |
+
|
| 97 |
+
Scoring Breakdown
|
| 98 |
+
|
| 99 |
+
Real-world utility (30%)
|
| 100 |
+
|
| 101 |
+
• 0–5: Toy/artificial problem with no practical application
|
| 102 |
+
|
| 103 |
+
• 6–15: Valid domain but shallow modeling of the real task
|
| 104 |
+
|
| 105 |
+
• 16–25: Good domain modeling, would be useful for agent evaluation
|
| 106 |
+
|
| 107 |
+
• 26–30: Excellent — fills a real gap, immediate value for the RL/agent community
|
| 108 |
+
|
| 109 |
+
Task & grader quality (25%)
|
| 110 |
+
|
| 111 |
+
• 3+ tasks with difficulty range?
|
| 112 |
+
|
| 113 |
+
• Graders produce scores between 0.0–1.0?
|
| 114 |
+
|
| 115 |
+
• Graders deterministic and reproducible?
|
| 116 |
+
|
| 117 |
+
• Hard task genuinely challenges frontier models?
|
| 118 |
+
|
| 119 |
+
Environment design (20%)
|
| 120 |
+
|
| 121 |
+
• reset() produces clean state?
|
| 122 |
+
|
| 123 |
+
• Action/observation types well-designed and documented?
|
| 124 |
+
|
| 125 |
+
• Reward function provides useful varying signal (not just sparse)?
|
| 126 |
+
|
| 127 |
+
• Episode boundaries sensible?
|
| 128 |
+
|
| 129 |
+
Code quality & spec compliance (15%)
|
| 130 |
+
|
| 131 |
+
• openenv validate passes?
|
| 132 |
+
|
| 133 |
+
• docker build && docker run works?
|
| 134 |
+
|
| 135 |
+
• HF Space deploys and responds?
|
| 136 |
+
|
| 137 |
+
• Baseline script runs and reproduces scores?
|
| 138 |
+
|
| 139 |
+
Creativity & novelty (10%)
|
| 140 |
+
|
| 141 |
+
• Domain we haven’t seen in OpenEnv before?
|
| 142 |
+
|
| 143 |
+
• Reward design has interesting properties?
|
| 144 |
+
|
| 145 |
+
• Clever mechanics that make the environment engaging?
|
| 146 |
+
________________________________________
|
| 147 |
+
|
| 148 |
+
Phase 1: Automated Validation
|
| 149 |
+
|
| 150 |
+
Pass/fail gate — HF Space deploys, OpenEnv spec compliance, Dockerfile builds, baseline reproduces, 3+ tasks with graders.
|
| 151 |
+
|
| 152 |
+
Phase 2: Agentic Evaluation
|
| 153 |
+
|
| 154 |
+
Scored — baseline agent re-run, standard Open LLM agent (e.g. Nemotron 3 Super) run against all environments, score variance check.
|
| 155 |
+
|
| 156 |
+
Phase 3: Human Review
|
| 157 |
+
|
| 158 |
+
Top submissions reviewed by Meta and Hugging Face engineers for real-world utility, creativity, and exploit checks.
|
| 159 |
+
|
| 160 |
+
Disqualification Criteria
|
| 161 |
+
|
| 162 |
+
Environment does not deploy or respond
|
| 163 |
+
|
| 164 |
+
Plagiarized or trivially modified existing environments
|
| 165 |
+
|
| 166 |
+
Graders that always return the same score
|
| 167 |
+
|
| 168 |
+
No baseline inference script
|
| 169 |
+
__________________________________________
|
| 170 |
+
|
| 171 |
+
HF Space deploys
|
| 172 |
+
|
| 173 |
+
Automated ping to the Space URL — must return 200 and respond to reset()
|
| 174 |
+
|
| 175 |
+
OpenEnv spec compliance
|
| 176 |
+
|
| 177 |
+
Validate openenv.yaml, typed models, step()/reset()/state() endpoints
|
| 178 |
+
|
| 179 |
+
Dockerfile builds
|
| 180 |
+
|
| 181 |
+
Automated docker build on the submitted repo
|
| 182 |
+
|
| 183 |
+
Baseline reproduces
|
| 184 |
+
|
| 185 |
+
Run the submitted inference script — must complete without error and produce scores
|
| 186 |
+
|
| 187 |
+
3+ tasks with graders
|
| 188 |
+
|
| 189 |
+
Enumerate tasks, run each grader, verify scores in 0.0–1.0 range
|
| 190 |
+
|
| 191 |
+
Additional Instructions
|
| 192 |
+
|
| 193 |
+
Before submitting, ensure the following variables are defined in your environment configuration:
|
| 194 |
+
|
| 195 |
+
API_BASE_URL The API endpoint for the LLM.
|
| 196 |
+
|
| 197 |
+
MODEL_NAME The model identifier to use for inference.
|
| 198 |
+
|
| 199 |
+
HF_TOKEN Your Hugging Face / API key.
|
| 200 |
+
|
| 201 |
+
The inference script must be named `inference.py` and placed in the root directory of the project
|
| 202 |
+
|
| 203 |
+
Participants must use the OpenAI client for all LLM calls, configured with the variables above.
|
| 204 |
+
|
| 205 |
+
Infra Restrictions
|
| 206 |
+
|
| 207 |
+
The inference script must finish in under 20 minutes.
|
| 208 |
+
|
| 209 |
+
Make sure your environment and inference script can run on a machine with 2 vCPUs and 8 GB of memory.
|
| 210 |
+
|
| 211 |
+
Validator
|
| 212 |
+
|
| 213 |
+
Run the pre-submission validation script before submitting
|
| 214 |
+
|
| 215 |
+
__________________________________________
|
| 216 |
+
SAMPLE INFERENCE SCRIPT:
|
| 217 |
+
________________________
|
| 218 |
+
"""Inference Script Example
|
| 219 |
+
===================================
|
| 220 |
+
MANDATORY
|
| 221 |
+
- Before submitting, ensure the following variables are defined in your environment configuration:
|
| 222 |
+
API_BASE_URL The API endpoint for the LLM.
|
| 223 |
+
MODEL_NAME The model identifier to use for inference.
|
| 224 |
+
HF_TOKEN Your Hugging Face / API key.
|
| 225 |
+
|
| 226 |
+
- The inference script must be named `inference.py` and placed in the root directory of the project
|
| 227 |
+
- Participants must use the OpenAI client for all LLM calls, configured with the variables above.
|
| 228 |
+
"""
|
| 229 |
+
|
| 230 |
+
import os
|
| 231 |
+
import re
|
| 232 |
+
import base64
|
| 233 |
+
import textwrap
|
| 234 |
+
from io import BytesIO
|
| 235 |
+
from typing import List, Optional, Dict
|
| 236 |
+
|
| 237 |
+
from openai import OpenAI
|
| 238 |
+
import numpy as np
|
| 239 |
+
from PIL import Image
|
| 240 |
+
|
| 241 |
+
from browsergym_env import BrowserGymAction, BrowserGymEnv
|
| 242 |
+
|
| 243 |
+
API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
|
| 244 |
+
API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
|
| 245 |
+
MODEL_NAME = os.getenv("MODEL_NAME")
|
| 246 |
+
MAX_STEPS = 8
|
| 247 |
+
MAX_DOM_CHARS = 3500
|
| 248 |
+
TEMPERATURE = 0.2
|
| 249 |
+
MAX_TOKENS = 200
|
| 250 |
+
FALLBACK_ACTION = "noop()"
|
| 251 |
+
|
| 252 |
+
DEBUG = True
|
| 253 |
+
ACTION_PREFIX_RE = re.compile(
|
| 254 |
+
r"^(action|next action)\s*[:\-]\s*",
|
| 255 |
+
re.IGNORECASE,
|
| 256 |
+
)
|
| 257 |
+
ACTION_PATTERN = re.compile(r"[A-Za-z_]+\s*\(.*\)", re.DOTALL)
|
| 258 |
+
|
| 259 |
+
|
| 260 |
+
SYSTEM_PROMPT = textwrap.dedent(
|
| 261 |
+
"""
|
| 262 |
+
You control a web browser through BrowserGym.
|
| 263 |
+
Reply with exactly one action string.
|
| 264 |
+
The action must be a valid BrowserGym command such as:
|
| 265 |
+
- noop()
|
| 266 |
+
- click('<BID>')
|
| 267 |
+
- type('selector', 'text to enter')
|
| 268 |
+
- fill('selector', 'text to enter')
|
| 269 |
+
- send_keys('Enter')
|
| 270 |
+
- scroll('down')
|
| 271 |
+
Use single quotes around string arguments.
|
| 272 |
+
When clicking, use the BrowserGym element IDs (BIDs) listed in the user message.
|
| 273 |
+
If you are unsure, respond with noop().
|
| 274 |
+
Do not include explanations or additional text.
|
| 275 |
+
"""
|
| 276 |
+
).strip()
|
| 277 |
+
|
| 278 |
+
|
| 279 |
+
def build_history_lines(history: List[str]) -> str:
|
| 280 |
+
if not history:
|
| 281 |
+
return "None"
|
| 282 |
+
return "\n".join(history[-4:])
|
| 283 |
+
|
| 284 |
+
|
| 285 |
+
def extract_screenshot_uri(observation) -> Optional[str]:
|
| 286 |
+
if observation.screenshot is None:
|
| 287 |
+
return None
|
| 288 |
+
screen_array = np.array(observation.screenshot, dtype=np.uint8)
|
| 289 |
+
image = Image.fromarray(screen_array)
|
| 290 |
+
buffer = BytesIO()
|
| 291 |
+
image.save(buffer, format="PNG")
|
| 292 |
+
buffer.seek(0)
|
| 293 |
+
data_uri = base64.b64encode(buffer.read()).decode("utf-8")
|
| 294 |
+
return f"data:image/png;base64,{data_uri}"
|
| 295 |
+
|
| 296 |
+
|
| 297 |
+
def extract_clickable_elements(observation) -> List[Dict[str, str]]:
|
| 298 |
+
"""Collect BrowserGym element IDs that can be clicked."""
|
| 299 |
+
|
| 300 |
+
metadata = getattr(observation, "metadata", {}) or {}
|
| 301 |
+
obs_dict = metadata.get("browsergym_obs", {}) or {}
|
| 302 |
+
extra_props = obs_dict.get("extra_element_properties", {}) or {}
|
| 303 |
+
|
| 304 |
+
clickables: List[Dict[str, str]] = []
|
| 305 |
+
for bid, props in extra_props.items():
|
| 306 |
+
if not props.get("clickable"):
|
| 307 |
+
continue
|
| 308 |
+
|
| 309 |
+
bbox = props.get("bbox") or []
|
| 310 |
+
bbox_str = ", ".join(str(value) for value in bbox) if bbox else "?"
|
| 311 |
+
clickables.append(
|
| 312 |
+
{
|
| 313 |
+
"bid": str(bid),
|
| 314 |
+
"bbox": bbox_str,
|
| 315 |
+
}
|
| 316 |
+
)
|
| 317 |
+
|
| 318 |
+
# Keep a stable ordering for readability
|
| 319 |
+
clickables.sort(key=lambda item: item["bid"])
|
| 320 |
+
return clickables
|
| 321 |
+
|
| 322 |
+
|
| 323 |
+
def build_user_prompt(step: int, observation, history: List[str]) -> str:
|
| 324 |
+
goal = observation.goal or "(not provided)"
|
| 325 |
+
url = observation.url or "(unknown)"
|
| 326 |
+
error_note = "Yes" if observation.last_action_error else "No"
|
| 327 |
+
|
| 328 |
+
clickables = extract_clickable_elements(observation)
|
| 329 |
+
if clickables:
|
| 330 |
+
actions_hint = "\n".join(
|
| 331 |
+
f" - {item['bid']} (bbox: {item['bbox']})" for item in clickables
|
| 332 |
+
)
|
| 333 |
+
else:
|
| 334 |
+
actions_hint = " (none detected)"
|
| 335 |
+
|
| 336 |
+
prompt = textwrap.dedent(
|
| 337 |
+
f"""
|
| 338 |
+
Step: {step}
|
| 339 |
+
Goal: {goal}
|
| 340 |
+
Current URL: {url}
|
| 341 |
+
Previous steps:
|
| 342 |
+
{build_history_lines(history)}
|
| 343 |
+
Last action error: {error_note}
|
| 344 |
+
Available clickable element IDs: {actions_hint}
|
| 345 |
+
Reply with exactly one BrowserGym action string.
|
| 346 |
+
"""
|
| 347 |
+
).strip()
|
| 348 |
+
return prompt
|
| 349 |
+
|
| 350 |
+
|
| 351 |
+
def parse_model_action(response_text: str) -> str:
|
| 352 |
+
if not response_text:
|
| 353 |
+
return FALLBACK_ACTION
|
| 354 |
+
|
| 355 |
+
# Prefer the first line that looks like an action string
|
| 356 |
+
lines = response_text.splitlines()
|
| 357 |
+
for raw_line in lines:
|
| 358 |
+
line = raw_line.strip()
|
| 359 |
+
if not line:
|
| 360 |
+
continue
|
| 361 |
+
line = ACTION_PREFIX_RE.sub("", line)
|
| 362 |
+
match = ACTION_PATTERN.search(line)
|
| 363 |
+
if match:
|
| 364 |
+
action = match.group(0).strip()
|
| 365 |
+
# Collapse internal whitespace
|
| 366 |
+
action = re.sub(r"\s+", " ", action)
|
| 367 |
+
# Return the first action-like match as-is. Natural-language click targets
|
| 368 |
+
# are not remapped to BrowserGym element IDs here.
|
| 369 |
+
return action
|
| 370 |
+
|
| 371 |
+
# Fall back to searching the whole response
|
| 372 |
+
match = ACTION_PATTERN.search(response_text)
|
| 373 |
+
if match:
|
| 374 |
+
action = match.group(0).strip()
|
| 375 |
+
action = re.sub(r"\s+", " ", action)
|
| 376 |
+
return action
|
| 377 |
+
|
| 378 |
+
return FALLBACK_ACTION
|
| 379 |
+
|
| 380 |
+
|
| 381 |
+
def main() -> None:
|
| 382 |
+
client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
|
| 383 |
+
|
| 384 |
+
env = BrowserGymEnv.from_docker_image(
|
| 385 |
+
image="browsergym-env:latest",
|
| 386 |
+
env_vars={
|
| 387 |
+
"BROWSERGYM_BENCHMARK": "miniwob",
|
| 388 |
+
"BROWSERGYM_TASK_NAME": "click-test",
|
| 389 |
+
},
|
| 390 |
+
)
|
| 391 |
+
|
| 392 |
+
history: List[str] = []
|
| 393 |
+
|
| 394 |
+
try:
|
| 395 |
+
result = env.reset()
|
| 396 |
+
observation = result.observation
|
| 397 |
+
print(f"Episode goal: {observation.goal}")
|
| 398 |
+
|
| 399 |
+
for step in range(1, MAX_STEPS + 1):
|
| 400 |
+
if result.done:
|
| 401 |
+
print("Environment signalled done. Stopping early.")
|
| 402 |
+
break
|
| 403 |
+
|
| 404 |
+
user_prompt = build_user_prompt(step, observation, history)
|
| 405 |
+
user_content = [{"type": "text", "text": user_prompt}]
|
| 406 |
+
screenshot_uri = extract_screenshot_uri(observation)
|
| 407 |
+
if screenshot_uri:
|
| 408 |
+
user_content.append(
|
| 409 |
+
{
|
| 410 |
+
"type": "image_url",
|
| 411 |
+
"image_url": {"url": screenshot_uri},
|
| 412 |
+
}
|
| 413 |
+
)
|
| 414 |
+
|
| 415 |
+
messages = [
|
| 416 |
+
{
|
| 417 |
+
"role": "system",
|
| 418 |
+
"content": [{"type": "text", "text": SYSTEM_PROMPT}],
|
| 419 |
+
},
|
| 420 |
+
{
|
| 421 |
+
"role": "user",
|
| 422 |
+
"content": user_content,
|
| 423 |
+
},
|
| 424 |
+
]
|
| 425 |
+
|
| 426 |
+
try:
|
| 427 |
+
completion = client.chat.completions.create(
|
| 428 |
+
model=MODEL_NAME,
|
| 429 |
+
messages=messages,
|
| 430 |
+
temperature=TEMPERATURE,
|
| 431 |
+
max_tokens=MAX_TOKENS,
|
| 432 |
+
stream=False,
|
| 433 |
+
)
|
| 434 |
+
response_text = completion.choices[0].message.content or ""
|
| 435 |
+
# pylint: disable=broad-except
|
| 436 |
+
except Exception as exc: # noqa: BLE001
|
| 437 |
+
failure_msg = f"Model request failed ({exc}). Using fallback action."
|
| 438 |
+
print(failure_msg)
|
| 439 |
+
response_text = FALLBACK_ACTION
|
| 440 |
+
|
| 441 |
+
action_str = parse_model_action(response_text)
|
| 442 |
+
print(f"Step {step}: model suggested -> {action_str}")
|
| 443 |
+
|
| 444 |
+
result = env.step(BrowserGymAction(action_str=action_str))
|
| 445 |
+
observation = result.observation
|
| 446 |
+
|
| 447 |
+
reward = result.reward or 0.0
|
| 448 |
+
error_flag = " ERROR" if observation.last_action_error else ""
|
| 449 |
+
history_line = (
|
| 450 |
+
f"Step {step}: {action_str} -> reward {reward:+.2f}{error_flag}"
|
| 451 |
+
)
|
| 452 |
+
history.append(history_line)
|
| 453 |
+
print(
|
| 454 |
+
" Reward: "
|
| 455 |
+
f"{reward:+.2f} | Done: {result.done} | Last action error: "
|
| 456 |
+
f"{observation.last_action_error}"
|
| 457 |
+
)
|
| 458 |
+
|
| 459 |
+
if result.done:
|
| 460 |
+
print("Episode complete.")
|
| 461 |
+
break
|
| 462 |
+
|
| 463 |
+
else:
|
| 464 |
+
print(f"Reached max steps ({MAX_STEPS}).")
|
| 465 |
+
|
| 466 |
+
finally:
|
| 467 |
+
env.close()
|
| 468 |
+
|
| 469 |
+
|
| 470 |
+
if __name__ == "__main__":
|
| 471 |
+
main()
|
| 472 |
+
____________________________________
|
README.md
ADDED
|
@@ -0,0 +1,258 @@
|
| 1 |
+
# IT Helpdesk Ticket Routing OpenEnv
|
| 2 |
+
|
| 3 |
+
> Meta PyTorch OpenEnv Hackathon - Round 1 Submission
|
| 4 |
+
> Team Hackstreet Boys - Roopal Guha Neogi, Suyash Kumar
|
| 5 |
+
|
| 6 |
+
A deterministic, multi-step IT helpdesk ticket routing environment built on the OpenEnv framework. An AI agent receives a small queue of helpdesk tickets and must classify the issue type, estimate priority, assign the correct resolver group, and choose the best next action.
|
| 7 |
+
|
| 8 |
+
## Why IT Helpdesk Ticket Routing?
|
| 9 |
+
|
| 10 |
+
IT service desks do this work every day:
|
| 11 |
+
|
| 12 |
+
- read a newly created ticket
|
| 13 |
+
- decide what kind of issue it is
|
| 14 |
+
- judge urgency
|
| 15 |
+
- route it to the right team
|
| 16 |
+
- decide whether to fulfill, escalate, assign, ignore, or acknowledge it
|
| 17 |
+
|
| 18 |
+
This makes the domain:
|
| 19 |
+
|
| 20 |
+
- genuinely real-world
|
| 21 |
+
- easy to evaluate deterministically
|
| 22 |
+
- naturally multi-step
|
| 23 |
+
- well aligned with enterprise support and agent-routing workflows
|
| 24 |
+
|
| 25 |
+
## Architecture
|
| 26 |
+
|
| 27 |
+
```text
|
| 28 |
+
inference.py
|
| 29 |
+
|
|
| 30 |
+
v
|
| 31 |
+
client.py <----> server/app.py
|
| 32 |
+
|
|
| 33 |
+
v
|
| 34 |
+
server/environment.py
|
| 35 |
+
| | |
|
| 36 |
+
v v v
|
| 37 |
+
grader.py reward.py tasks.py
|
| 38 |
+
|
|
| 39 |
+
v
|
| 40 |
+
data/dataset.json
|
| 41 |
+
```
|
| 42 |
+
|
| 43 |
+
Key architectural detail:
|
| 44 |
+
|
| 45 |
+
- the environment is designed as a multi-step ticket queue
|
| 46 |
+
- the client path is used for persistent episode flow
|
| 47 |
+
- the environment still follows the standard OpenEnv `reset()`, `step()`, and `state()` interface
|
| 48 |
+
|
| 49 |
+
## Tasks
|
| 50 |
+
|
| 51 |
+
| ID | Name | Difficulty | Fields Required | Description |
|
| 52 |
+
|----|------|------------|-----------------|-------------|
|
| 53 |
+
| 1 | Issue Type Classification | Easy | `issue_type` | Classify the ticket into the correct IT issue type |
|
| 54 |
+
| 2 | Issue Type And Priority | Medium | `issue_type`, `priority` | Classify the issue and estimate urgency |
|
| 55 |
+
| 3 | Full Ticket Routing | Hard | `issue_type`, `priority`, `assignment_group`, `resolution_action` | Perform full helpdesk routing |
|
| 56 |
+
|
| 57 |
+
## Action Space
|
| 58 |
+
|
| 59 |
+
The agent submits a `HelpdeskTicketAction`. Only the fields relevant to the current task are scored.
|
| 60 |
+
|
| 61 |
+
```json
|
| 62 |
+
{
|
| 63 |
+
"issue_type": "billing_license | identity_access | application_support | service_request | spam_phishing | general_inquiry | security_compliance | onboarding | feature_request",
|
| 64 |
+
"priority": "critical | high | medium | low",
|
| 65 |
+
"assignment_group": "license_ops | service_desk | application_team | procurement | security_team | onboarding_ops",
|
| 66 |
+
"resolution_action": "fulfill | escalate | assign | ignore | acknowledge"
|
| 67 |
+
}
|
| 68 |
+
```
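The vocabularies above lend themselves to a simple membership check. The following is a minimal sketch only: the helper name `validate_action` and the plain-dict representation are assumptions made for illustration, and the actual `HelpdeskTicketAction` model in `models.py` may validate differently.

```python
from typing import Dict, List

# Allowed values copied from the action schema above.
ALLOWED_VALUES: Dict[str, List[str]] = {
    "issue_type": [
        "billing_license", "identity_access", "application_support",
        "service_request", "spam_phishing", "general_inquiry",
        "security_compliance", "onboarding", "feature_request",
    ],
    "priority": ["critical", "high", "medium", "low"],
    "assignment_group": [
        "license_ops", "service_desk", "application_team",
        "procurement", "security_team", "onboarding_ops",
    ],
    "resolution_action": ["fulfill", "escalate", "assign", "ignore", "acknowledge"],
}


def validate_action(action: Dict[str, str]) -> List[str]:
    """Hypothetical helper: return a list of problems, empty if the action is valid."""
    problems: List[str] = []
    for field, value in action.items():
        if field not in ALLOWED_VALUES:
            problems.append(f"unknown field: {field}")
        elif value not in ALLOWED_VALUES[field]:
            problems.append(f"invalid {field}: {value}")
    return problems
```

Returning a list of problems rather than raising lets a caller report every invalid field at once.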
|
| 69 |
+
|
| 70 |
+
## Observation Space
|
| 71 |
+
|
| 72 |
+
Each observation contains:
|
| 73 |
+
|
| 74 |
+
- `task_id`
|
| 75 |
+
- `task_name`
|
| 76 |
+
- `instructions`
|
| 77 |
+
- `allowed_fields`
|
| 78 |
+
- `current_ticket`
|
| 79 |
+
- `queue_size`
|
| 80 |
+
- `tickets_remaining`
|
| 81 |
+
- `tickets_processed`
|
| 82 |
+
- `history`
|
| 83 |
+
- inherited OpenEnv fields such as `done` and `reward`
|
| 84 |
+
|
| 85 |
+
The visible ticket fields are:
|
| 86 |
+
|
| 87 |
+
- `ticket_id`
|
| 88 |
+
- `title`
|
| 89 |
+
- `requester`
|
| 90 |
+
- `description`
|
| 91 |
+
|
| 92 |
+
Ground-truth labels are not exposed to the agent.
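One way to keep labels hidden is to copy only the visible fields into the observation. This is a sketch under assumptions: the function name `visible_ticket` and the shape of a dataset record are hypothetical, and the repo's `server/environment.py` may build observations differently.

```python
from typing import Any, Dict

# The four ticket fields listed above as visible to the agent.
VISIBLE_TICKET_FIELDS = ("ticket_id", "title", "requester", "description")


def visible_ticket(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: keep only agent-visible fields, dropping any labels."""
    return {key: record[key] for key in VISIBLE_TICKET_FIELDS if key in record}
```

Using an allow-list means a label field added to the dataset later stays hidden by default.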
|
| 93 |
+
|
| 94 |
+
## State
|
| 95 |
+
|
| 96 |
+
The internal `HelpdeskTicketState` tracks:
|
| 97 |
+
|
| 98 |
+
- `episode_id`
|
| 99 |
+
- `step_count`
|
| 100 |
+
- `current_task_id`
|
| 101 |
+
- `seed`
|
| 102 |
+
- `queue_ticket_ids`
|
| 103 |
+
- `current_ticket_index`
|
| 104 |
+
- `per_ticket_scores`
|
| 105 |
+
- `total_reward`
|
| 106 |
+
|
| 107 |
+
## Grading
|
| 108 |
+
|
| 109 |
+
Scoring is deterministic and ranges from `0.0` to `1.0`.
|
| 110 |
+
|
| 111 |
+
### Per-field logic
|
| 112 |
+
|
| 113 |
+
- `issue_type`: exact match or partial credit for near-miss pairs
|
| 114 |
+
- `priority`: exact match or proximity score
|
| 115 |
+
- `assignment_group`: exact match
|
| 116 |
+
- `resolution_action`: exact match
|
| 117 |
+
|
| 118 |
+
### Task weights
|
| 119 |
+
|
| 120 |
+
| Task | Issue Type | Priority | Assignment Group | Resolution Action |
|
| 121 |
+
|------|------------|----------|------------------|-------------------|
|
| 122 |
+
| 1 | 100% | - | - | - |
|
| 123 |
+
| 2 | 60% | 40% | - | - |
|
| 124 |
+
| 3 | 35% | 20% | 25% | 20% |
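The weights in this table can be applied as a weighted sum of per-field scores. The sketch below assumes each field has already been scored in `[0.0, 1.0]` by the per-field logic above. The names `TASK_WEIGHTS` and `ticket_score` are illustrative and are not taken from `server/grader.py`.

```python
from typing import Dict

# Weights transcribed from the task-weights table (as fractions).
TASK_WEIGHTS: Dict[int, Dict[str, float]] = {
    1: {"issue_type": 1.0},
    2: {"issue_type": 0.6, "priority": 0.4},
    3: {
        "issue_type": 0.35,
        "priority": 0.20,
        "assignment_group": 0.25,
        "resolution_action": 0.20,
    },
}


def ticket_score(task_id: int, field_scores: Dict[str, float]) -> float:
    """Hypothetical helper: combine per-field scores using the task's weights."""
    weights = TASK_WEIGHTS[task_id]
    # Fields missing from field_scores count as 0.0 (an assumption).
    total = sum(weight * field_scores.get(name, 0.0) for name, weight in weights.items())
    # Clamp to guard against floating-point drift just above 1.0.
    return min(1.0, max(0.0, total))
```

For example, a Task 2 answer with the correct issue type but a fully wrong priority would score 0.6 under these assumptions.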
|
| 125 |
+
|
| 126 |
+
### Trajectory reward
|
| 127 |
+
|
| 128 |
+
At episode end:
|
| 129 |
+
|
| 130 |
+
```text
|
| 131 |
+
trajectory_reward = average(per_ticket_scores) - 0.03 * max(0, steps_taken - queue_size)
|
| 132 |
+
```
|
| 133 |
+
|
| 134 |
+
The result is clamped to `[0.0, 1.0]`.
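The formula and clamp can be written out directly. This is a minimal sketch of the stated formula; the function name and the handling of an empty score list are assumptions, and `server/reward.py` may differ in detail.

```python
from typing import List


def trajectory_reward(per_ticket_scores: List[float], steps_taken: int, queue_size: int) -> float:
    """Average ticket score minus 0.03 per step beyond the queue size, clamped to [0, 1]."""
    if not per_ticket_scores:
        return 0.0  # Assumption: an episode with no scored tickets earns nothing.
    average = sum(per_ticket_scores) / len(per_ticket_scores)
    extra_steps = max(0, steps_taken - queue_size)
    return min(1.0, max(0.0, average - 0.03 * extra_steps))
```

With scores `[1.0, 0.5]` and a queue of two tickets, taking exactly two steps gives 0.75, while taking four steps subtracts 0.06.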
|
| 135 |
+
|
| 136 |
+
## Dataset
|
| 137 |
+
|
| 138 |
+
`data/dataset.json` contains 45 labeled helpdesk tickets covering:
|
| 139 |
+
|
| 140 |
+
- issue classification
|
| 141 |
+
- access requests
|
| 142 |
+
- application incidents
|
| 143 |
+
- procurement and service requests
|
| 144 |
+
- phishing or spam reports
|
| 145 |
+
- security and compliance work
|
| 146 |
+
- onboarding tickets
|
| 147 |
+
- feature requests
|
| 148 |
+
|
| 149 |
+
The dataset also includes:
|
| 150 |
+
|
| 151 |
+
- ambiguous cases
|
| 152 |
+
- follow-up thread references
|
| 153 |
+
- multiple priority levels
|
| 154 |
+
|
| 155 |
+
## Project Structure
|
| 156 |
+
|
| 157 |
+
```text
|
| 158 |
+
server/
|
| 159 |
+
  app.py
|
| 160 |
+
  environment.py
|
| 161 |
+
  grader.py
|
| 162 |
+
  reward.py
|
| 163 |
+
  tasks.py
|
| 164 |
+
  Dockerfile
|
| 165 |
+
data/
|
| 166 |
+
  dataset.json
|
| 167 |
+
models.py
|
| 168 |
+
client.py
|
| 169 |
+
inference.py
|
| 170 |
+
openenv.yaml
|
| 171 |
+
pyproject.toml
|
| 172 |
+
requirements.txt
|
| 173 |
+
README.md
|
| 174 |
+
KNOWLEDGE.md
|
| 175 |
+
PLAN.md
|
| 176 |
+
MENTAL_MODEL.md
|
| 177 |
+
```
|
| 178 |
+
|
| 179 |
+
## Setup
|
| 180 |
+
|
| 181 |
+
Install dependencies:
|
| 182 |
+
|
| 183 |
+
```bash
|
| 184 |
+
pip install -r requirements.txt
|
| 185 |
+
```
|
| 186 |
+
|
| 187 |
+
Start the server:
|
| 188 |
+
|
| 189 |
+
```bash
|
| 190 |
+
uvicorn server.app:app --host 0.0.0.0 --port 8000
|
| 191 |
+
```
|
| 192 |
+
|
| 193 |
+
Basic checks:
|
| 194 |
+
|
| 195 |
+
```bash
|
| 196 |
+
curl http://localhost:8000/health
|
| 197 |
+
curl http://localhost:8000/tasks
|
| 198 |
+
```
|
| 199 |
+
|
| 200 |
+
## Running Inference
|
| 201 |
+
|
| 202 |
+
### LLM mode
|
| 203 |
+
|
| 204 |
+
Set:
|
| 205 |
+
|
| 206 |
+
- `API_BASE_URL`
|
| 207 |
+
- `MODEL_NAME`
|
| 208 |
+
- `HF_TOKEN`
|
| 209 |
+
|
| 210 |
+
Then run:
|
| 211 |
+
|
| 212 |
+
```bash
|
| 213 |
+
python inference.py
|
| 214 |
+
```
|
| 215 |
+
|
| 216 |
+
### Heuristic mode
|
| 217 |
+
|
| 218 |
+
If those variables are not set, the script falls back to a keyword-based ticket router:
|
| 219 |
+
|
| 220 |
+
```bash
|
| 221 |
+
python inference.py
|
| 222 |
+
```
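The choice between the two modes can be sketched as a check on the three variables. This is an illustration only: `select_mode` is a hypothetical name, and treating any missing or empty variable as a trigger for heuristic mode is an assumption about how `inference.py` behaves.

```python
import os
from typing import Mapping

# The three variables the LLM mode expects.
REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def select_mode(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: "llm" if all variables are non-empty, else "heuristic"."""
    if all(env.get(name) for name in REQUIRED_VARS):
        return "llm"
    return "heuristic"
```

Passing the environment as a mapping keeps the check easy to test without touching real credentials.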
|
| 223 |
+
|
| 224 |
+
Optional server target:
|
| 225 |
+
|
| 226 |
+
- `ENV_URL` default: `http://localhost:8000`
|
| 227 |
+
|
| 228 |
+
## Docker
|
| 229 |
+
|
| 230 |
+
Build and run:
|
| 231 |
+
|
| 232 |
+
```bash
|
| 233 |
+
docker build -f server/Dockerfile -t helpdesk-ticket-routing .
|
| 234 |
+
docker run -p 7860:7860 helpdesk-ticket-routing
|
| 235 |
+
```
|
| 236 |
+
|
| 237 |
+
## API Endpoints
|
| 238 |
+
|
| 239 |
+
OpenEnv auto-generates the main endpoints, and the repo adds `/tasks`.
|
| 240 |
+
|
| 241 |
+
| Method | Path | Description |
|
| 242 |
+
|--------|------|-------------|
|
| 243 |
+
| GET | `/health` | Health check |
|
| 244 |
+
| POST | `/reset` | Start a new episode |
|
| 245 |
+
| POST | `/step` | Submit an action |
|
| 246 |
+
| GET | `/state` | Inspect state |
|
| 247 |
+
| WebSocket | `/ws` | Persistent client channel |
|
| 248 |
+
| GET | `/tasks` | List available tasks |
|
| 249 |
+
| GET | `/docs` | API docs |
|
| 250 |
+
|
| 251 |
+
## Baseline Status
|
| 252 |
+
|
| 253 |
+
Fresh baseline scores should be recorded after the next validation pass. The recommended order is:
|
| 254 |
+
|
| 255 |
+
1. run the environment locally
|
| 256 |
+
2. run the heuristic baseline in `inference.py`
|
| 257 |
+
3. record per-task and overall scores
|
| 258 |
+
4. update the docs only after those numbers are verified
|
ROADMAP.md
ADDED
|
@@ -0,0 +1,339 @@
|
| 1 |
+
# Hackstreet Boys Roadmap
|
| 2 |
+
|
| 3 |
+
## Team
|
| 4 |
+
|
| 5 |
+
- Team name: Hackstreet Boys
|
| 6 |
+
- Members:
|
| 7 |
+
- Roopal Guha Neogi
|
| 8 |
+
- Suyash Kumar
|
| 9 |
+
- Submission deadline: April 8, 2026, 11:59 PM IST
|
| 10 |
+
|
| 11 |
+
## Goal
|
| 12 |
+
|
| 13 |
+
Ship a clean, well-documented OpenEnv environment for IT helpdesk ticket routing that:
|
| 14 |
+
|
| 15 |
+
- passes all submission gates
|
| 16 |
+
- scores well on real-world utility
|
| 17 |
+
- has deterministic, defensible grading
|
| 18 |
+
- is easy for judges to understand and rerun
|
| 19 |
+
|
| 20 |
+
## When You Start Coding
|
| 21 |
+
|
| 22 |
+
Start coding on **March 30, 2026**, right after a short 30- to 60-minute alignment pass.
|
| 23 |
+
|
| 24 |
+
That first coding session should do only high-leverage foundation work:
|
| 25 |
+
|
| 26 |
+
- lock the exact ticket vocabulary
|
| 27 |
+
- freeze field names in `models.py`
|
| 28 |
+
- confirm task fields in `server/tasks.py`
|
| 29 |
+
- agree on grader labels in `server/grader.py`
|
| 30 |
+
- agree that no one changes schema names casually after this point
|
| 31 |
+
|
| 32 |
+
### First coding targets on March 30, 2026
|
| 33 |
+
|
| 34 |
+
Roopal should start with:
|
| 35 |
+
|
| 36 |
+
- `data/dataset.json`
|
| 37 |
+
- `server/tasks.py`
|
| 38 |
+
- `server/grader.py`
|
| 39 |
+
|
| 40 |
+
Suyash should start with:
|
| 41 |
+
|
| 42 |
+
- `models.py`
|
| 43 |
+
- `server/environment.py`
|
| 44 |
+
- `inference.py`
|
| 45 |
+
|
| 46 |
+
By the end of the first coding block, both of you should have:
|
| 47 |
+
|
| 48 |
+
- matching field names
|
| 49 |
+
- matching task labels
|
| 50 |
+
- matching issue-type vocabulary
|
| 51 |
+
- no unresolved schema disagreements
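
One way to turn "matching field names" and "matching issue-type vocabulary" into something checkable is a small script that both of you run against the dataset. The sketch below is only an illustration: the field names and labels are the ones that appear in the visible records of `data/dataset.json`, while `EXPECTED_FIELDS`, `ALLOWED`, and `find_schema_problems` are hypothetical names, and the real `vocabulary.py` may define the vocabulary differently.

```python
from typing import Any

# Field names copied from the visible dataset records (assumed complete).
EXPECTED_FIELDS = {
    "ticket_id", "title", "requester", "description", "issue_type",
    "priority", "assignment_group", "resolution_action",
    "ambiguity_note", "related_ticket_id",
}

# Labels observed in the visible dataset records; the full file may use more.
ALLOWED = {
    "issue_type": {
        "billing_license", "identity_access", "application_support",
        "service_request", "spam_phishing", "security_compliance",
        "onboarding", "feature_request", "general_inquiry",
    },
    "priority": {"low", "medium", "high", "critical"},
    "assignment_group": {
        "license_ops", "service_desk", "application_team",
        "procurement", "security_team", "onboarding_ops",
    },
    "resolution_action": {"escalate", "fulfill", "assign", "ignore", "acknowledge"},
}


def find_schema_problems(records: list[dict[str, Any]]) -> list[str]:
    """Return one human-readable message per schema or vocabulary mismatch."""
    problems: list[str] = []
    for index, record in enumerate(records):
        ticket = record.get("ticket_id", f"record #{index}")
        missing = EXPECTED_FIELDS - set(record)
        extra = set(record) - EXPECTED_FIELDS
        if missing:
            problems.append(f"{ticket}: missing fields {sorted(missing)}")
        if extra:
            problems.append(f"{ticket}: unexpected fields {sorted(extra)}")
        for field, allowed in ALLOWED.items():
            # Only check labels that are present; absent fields are reported above.
            if field in record and record[field] not in allowed:
                problems.append(
                    f"{ticket}: {field}={record[field]!r} is not in the shared vocabulary"
                )
    return problems


if __name__ == "__main__":
    sample = {name: None for name in EXPECTED_FIELDS}
    sample.update(
        ticket_id="ticket-001",
        issue_type="billing",  # deliberately off-vocabulary
        priority="high",
        assignment_group="license_ops",
        resolution_action="escalate",
    )
    for message in find_schema_problems([sample]):
        print(message)
```

Running a check like this at each daily sync would surface a renamed field or an invented label before it reaches the grader.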

## Working Model For Two People

The safest way for two people to work separately and merge cleanly is to divide ownership by file groups, not by abstract ideas.

### Roopal ownership

- `data/dataset.json`
- `server/tasks.py`
- `server/grader.py`
- `README.md`
- `KNOWLEDGE.md`
- `MENTAL_MODEL.md`

Primary responsibilities:

- dataset quality
- label consistency
- task wording
- grader realism
- documentation clarity
- judging-story polish

### Suyash ownership

- `models.py`
- `server/environment.py`
- `server/app.py`
- `server/reward.py`
- `client.py`
- `inference.py`
- `openenv.yaml`
- `server/Dockerfile`
- `pyproject.toml`
- `requirements.txt`

Primary responsibilities:

- runtime correctness
- OpenEnv interface
- inference reliability
- Docker and deployment readiness
- integration behavior

## Merge Strategy

To keep parallel work easy to combine:

1. avoid editing the same file on the same day unless planned
2. use one shared terminology list and do not invent alternate labels
3. sync once daily with a 10 minute review of:
   - changed files
   - open blockers
   - any schema changes
4. freeze the dataset schema early
5. freeze the action and observation field names early

## Shared Source Of Truth

These files should be treated as authoritative:

- `README.md` for the public project story
- `PLAN.md` for project requirements and definition of done
- `MENTAL_MODEL.md` for the current system shape
- `openenv.yaml` for environment metadata
- `server/tasks.py` and `server/grader.py` for task rules

## AI Usage Policy

AI is permitted, so use it aggressively where it saves time, but do not outsource judgment.

Good uses of AI:

- draft clearer task descriptions
- propose additional hard-case tickets
- suggest edge cases and label audits
- improve prompts in `inference.py`
- generate test ideas and checklists
- improve README structure and wording

Human review required for:

- final dataset labels
- grader weights and partial-credit rules
- any claims in README
- final benchmark numbers
- submission metadata and deployment settings

## Submission Criteria Checklist

### Must pass

- environment starts correctly
- `reset()`, `step()`, and `state()` behave correctly
- 3 tasks exist and are meaningfully different
- grader scores are in `[0.0, 1.0]`
- `inference.py` runs without error
- Docker builds and starts
- docs are complete and current

### Must score well

- the task feels like real IT helpdesk work
- the hard task is genuinely harder
- the grader gives partial credit in sensible ways
- the environment is easy to understand and rerun
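
To make the two grader items above concrete (scores in `[0.0, 1.0]` and sensible partial credit), here is a minimal deterministic sketch. It is not the logic in `server/grader.py`: the equal weights, the half credit for a priority that is off by one level, and the names `FIELD_WEIGHTS`, `PRIORITY_ORDER`, and `score_prediction` are all assumptions made for illustration.

```python
from typing import Any, Mapping

# Hypothetical weights; the real grader may weight fields differently.
FIELD_WEIGHTS = {
    "issue_type": 0.25,
    "priority": 0.25,
    "assignment_group": 0.25,
    "resolution_action": 0.25,
}

# Priority labels as they appear in the dataset, ordered from least to most urgent.
PRIORITY_ORDER = ["low", "medium", "high", "critical"]


def score_prediction(predicted: Mapping[str, Any], expected: Mapping[str, Any]) -> float:
    """Score one routed ticket; the result is always within [0.0, 1.0]."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        guess = predicted.get(field)
        truth = expected[field]
        if guess == truth:
            total += weight
        elif field == "priority" and guess in PRIORITY_ORDER:
            # Assumed partial-credit rule: half credit when the priority
            # is exactly one level away from the expected label.
            distance = abs(PRIORITY_ORDER.index(guess) - PRIORITY_ORDER.index(truth))
            if distance == 1:
                total += weight * 0.5
    # Clamp defensively so a later weight change cannot leave the valid range.
    return max(0.0, min(1.0, total))


if __name__ == "__main__":
    expected = {
        "issue_type": "application_support",
        "priority": "critical",
        "assignment_group": "application_team",
        "resolution_action": "escalate",
    }
    print(score_prediction(dict(expected, priority="high"), expected))
```

Because the function has no randomness and reads only its two arguments, the same prediction always receives the same score, which is what makes the grading defensible when judges rerun it.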

## Timeline

### March 30, 2026

- lock team name, domain, and vocabulary
- finish repo cleanup
- agree on ownership split
- start coding the core schema and task logic immediately after the vocabulary lock
- target a same-day checkpoint on:
  - `models.py`
  - `server/tasks.py`
  - `server/grader.py`
  - `server/environment.py`

### March 31, 2026

Roopal:

- audit `data/dataset.json` labels end to end
- tighten ambiguous cases
- review task wording in `server/tasks.py`
- continue code work in `server/grader.py` if partial-credit tuning is still needed

Suyash:

- sanity-check `models.py`, `server/environment.py`, and `client.py`
- check that the field names align everywhere
- continue code work in `inference.py` and `server/app.py`

Shared checkpoint:

- confirm no schema changes are still pending

### April 1, 2026

Roopal:

- polish `server/grader.py`
- confirm hard-task logic and partial-credit behavior
- finish any remaining dataset label corrections

Suyash:

- polish `inference.py`
- confirm heuristic mode uses the new ticket vocabulary consistently
- finish runtime code adjustments in `client.py`, `server/app.py`, and `server/reward.py`

Shared checkpoint:

- agree on the exact labels and examples used in docs

### April 2, 2026

Roopal:

- improve `README.md`
- improve `KNOWLEDGE.md`

Suyash:

- validate `openenv.yaml`
- validate `server/Dockerfile`
- validate dependency files

Shared checkpoint:

- ensure docs and code tell the same story

### April 3, 2026

Roopal:

- do a dataset realism pass
- make sure examples clearly cover easy, medium, and hard cases

Suyash:

- perform the first full local runtime pass
- run heuristic inference
- note bugs or schema mismatches

Shared checkpoint:

- bug triage and fix list

+
### Practical coding rule
|
| 245 |
+
|
| 246 |
+
If you are wondering "should we still be planning or should we code now?", the answer is:
|
| 247 |
+
|
| 248 |
+
- **March 30 to April 4, 2026 = active coding and fixes**
|
| 249 |
+
- **April 5 to April 6, 2026 = validation, docs, and score recording**
|
| 250 |
+
- **April 7 to April 8, 2026 = freeze, smoke tests, and submission**
|
| 251 |
+
|
| 252 |
+
### April 4, 2026
|
| 253 |
+
|
| 254 |
+
Roopal:
|
| 255 |
+
|
| 256 |
+
- fix data, wording, and documentation issues from runtime feedback
|
| 257 |
+
|
| 258 |
+
Suyash:
|
| 259 |
+
|
| 260 |
+
- fix environment, inference, and Docker issues from runtime feedback
|
| 261 |
+
|
| 262 |
+
Shared checkpoint:
|
| 263 |
+
|
| 264 |
+
- second full local run
|
| 265 |
+
|
| 266 |
+
### April 5, 2026
|
| 267 |
+
|
| 268 |
+
Roopal:
|
| 269 |
+
|
| 270 |
+
- finalize README and knowledge docs
|
| 271 |
+
- prepare a concise judge-facing explanation of the domain
|
| 272 |
+
|
| 273 |
+
Suyash:
|
| 274 |
+
|
| 275 |
+
- confirm Docker flow
|
| 276 |
+
- confirm all required env vars are documented and handled
|
| 277 |
+
|
| 278 |
+
Shared checkpoint:
|
| 279 |
+
|
| 280 |
+
- record benchmark numbers if stable
|
| 281 |
+
|
| 282 |
+
### April 6, 2026
|
| 283 |
+
|
| 284 |
+
- full dry run from a clean copy if possible
|
| 285 |
+
- verify every required file is present
|
| 286 |
+
- check for stale claims and outdated wording
|
| 287 |
+
|
| 288 |
+
### April 7, 2026
|
| 289 |
+
|
| 290 |
+
- freeze feature changes
|
| 291 |
+
- only bug fixes, validation, and submission packaging
|
| 292 |
+
- verify final docs, metadata, and benchmark numbers
|
| 293 |
+
|
| 294 |
+
### April 8, 2026
|
| 295 |
+
|
| 296 |
+
- do one last deployment and smoke test early in the day
|
| 297 |
+
- stop risky edits several hours before deadline
|
| 298 |
+
- submit before 11:59 PM IST
|
| 299 |
+
|
## Integration Rules

To keep merges painless:

1. do not rename schemas after April 1, 2026
2. do not change task labels after April 2, 2026 without both agreeing
3. do not edit ownership files casually
4. if one person must touch the other person's file, call it out before doing it
5. keep a short daily changelog in chat or a shared note

## Definition Of Done For Each Member

### Roopal done means

- dataset labels are internally consistent
- docs are submission-ready
- the hard task feels meaningfully harder than the easy and medium tasks

### Suyash done means

- the environment runs end to end
- the inference script works in heuristic mode
- Docker and metadata are in good shape
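
"Dataset labels are internally consistent" can be checked mechanically before the human label audit. The sketch below is a rough starting point, not part of the repository: `audit_labels` is a hypothetical helper, and the rule that each `issue_type` routes to a single `assignment_group` is an assumption drawn from the visible records in `data/dataset.json`, so drop that check if the full dataset intentionally breaks it.

```python
from collections import defaultdict
from typing import Any


def audit_labels(records: list[dict[str, Any]]) -> list[str]:
    """Return a list of consistency problems found across the whole dataset."""
    problems: list[str] = []
    seen_ids: set[str] = set()
    groups_by_issue: dict[str, set[str]] = defaultdict(set)

    for record in records:
        ticket_id = record["ticket_id"]
        if ticket_id in seen_ids:
            problems.append(f"duplicate ticket_id: {ticket_id}")
        seen_ids.add(ticket_id)
        groups_by_issue[record["issue_type"]].add(record["assignment_group"])

    # Follow-up tickets must point at a ticket that exists in the dataset.
    for record in records:
        related = record.get("related_ticket_id")
        if related is not None and related not in seen_ids:
            problems.append(
                f"{record['ticket_id']}: related_ticket_id {related!r} does not exist"
            )

    # Assumed rule: one assignment group per issue type.
    for issue_type, groups in sorted(groups_by_issue.items()):
        if len(groups) > 1:
            problems.append(
                f"{issue_type}: routed to multiple groups {sorted(groups)}"
            )
    return problems


if __name__ == "__main__":
    sample = [
        {"ticket_id": "ticket-003", "issue_type": "application_support",
         "assignment_group": "application_team", "related_ticket_id": None},
        {"ticket_id": "ticket-021", "issue_type": "application_support",
         "assignment_group": "application_team", "related_ticket_id": "ticket-003"},
    ]
    print(audit_labels(sample))
```

To run it against the real file, load the records with `json.load` and pass the resulting list to `audit_labels`.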

## Final Two-Day Priority Order

If time gets tight, prioritize in this exact order:

1. working environment
2. working inference script
3. valid grader and tasks
4. Docker and metadata
5. README clarity
6. extra polish

## Simple Rule To Remember

Roopal owns the story and the labels.
Suyash owns the runtime and the rails.
Both review the final submission together.

__init__.cpython-313.pyc
ADDED
Binary file (166 Bytes).
__init__.py
ADDED
File without changes
app.cpython-313.pyc
ADDED
Binary file (1.54 kB).
client.cpython-313.pyc
ADDED
Binary file (1.86 kB).
client.py
ADDED
@@ -0,0 +1,28 @@
from __future__ import annotations

from typing import Any, Dict, Optional

from openenv.core.env_client import EnvClient, StepResult

from models import HelpdeskTicketAction, HelpdeskTicketObservation, HelpdeskTicketState


class HelpdeskTicketEnvClient(
    EnvClient[HelpdeskTicketAction, HelpdeskTicketObservation, HelpdeskTicketState]
):
    def _step_payload(self, action: HelpdeskTicketAction) -> Dict[str, Any]:
        return action.model_dump(exclude_none=True)

    def _parse_result(
        self, payload: Dict[str, Any]
    ) -> StepResult[HelpdeskTicketObservation]:
        obs_data = payload.get("observation", payload)
        obs = HelpdeskTicketObservation.model_validate(obs_data)
        return StepResult(
            observation=obs,
            reward=payload.get("reward", obs.reward),
            done=payload.get("done", obs.done),
        )

    def _parse_state(self, payload: Dict[str, Any]) -> HelpdeskTicketState:
        return HelpdeskTicketState.model_validate(payload)
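
The fallback order in `_parse_result` above is easy to misread, so here is a standard-library sketch of the same pattern using plain dictionaries instead of the pydantic models. `split_step_payload` and the sample payloads are hypothetical; only the `observation`, `reward`, and `done` keys come from the client code.

```python
from typing import Any, Dict, Tuple


def split_step_payload(payload: Dict[str, Any]) -> Tuple[Dict[str, Any], Any, Any]:
    """Mirror the lookup order used by `_parse_result`, without pydantic."""
    # Use the nested "observation" object when the server sends one;
    # otherwise treat the whole payload as the observation.
    obs = payload.get("observation", payload)
    # Top-level reward/done win; the observation's own values are the fallback.
    reward = payload.get("reward", obs.get("reward"))
    done = payload.get("done", obs.get("done"))
    return obs, reward, done


if __name__ == "__main__":
    nested = {
        "observation": {"reward": 0.5, "done": False},
        "reward": 1.0,
        "done": True,
    }
    flat = {"reward": 0.25, "done": False}
    print(split_step_payload(nested))
    print(split_step_payload(flat))
```

The practical consequence is that the client accepts both a wrapped response and a bare observation, which helps while the server response shape is still settling.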
data/dataset.json
ADDED
|
@@ -0,0 +1,543 @@
[
  {
    "ticket_id": "ticket-001",
    "title": "Urgent: customer charged twice for March invoice",
    "requester": "ap@northstar-retail.com",
    "description": "Our finance team found two charges on the same invoice and needs a refund processed today.",
    "issue_type": "billing_license",
    "priority": "high",
    "assignment_group": "license_ops",
    "resolution_action": "escalate",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-002",
    "title": "Can not sign in after 2FA reset",
    "requester": "ops@laneeight.io",
    "description": "I was forced to reset 2FA and now the account stays locked even with the backup code.",
    "issue_type": "identity_access",
    "priority": "high",
    "assignment_group": "service_desk",
    "resolution_action": "fulfill",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-003",
    "title": "Production checkout throwing null reference exception",
    "requester": "sre@paperkite.dev",
    "description": "Customers can not complete payment in production. This is blocking revenue right now.",
    "issue_type": "application_support",
    "priority": "critical",
    "assignment_group": "application_team",
    "resolution_action": "escalate",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-004",
    "title": "Requesting pricing for 300-seat rollout",
    "requester": "procurement@solsticehealth.org",
    "description": "We are evaluating vendors and want a quote for an enterprise rollout next quarter.",
    "issue_type": "service_request",
    "priority": "medium",
    "assignment_group": "procurement",
    "resolution_action": "assign",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-005",
    "title": "Guaranteed crypto income from home",
    "requester": "promo@fastwealth.example",
    "description": "Limited time offer. Click now to multiply your income and unsubscribe never.",
    "issue_type": "spam_phishing",
    "priority": "low",
    "assignment_group": "security_team",
    "resolution_action": "ignore",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-006",
    "title": "Refund still missing for canceled annual plan",
    "requester": "controller@redcedar.co",
    "description": "We canceled three weeks ago and the refund has not arrived. Please confirm status.",
    "issue_type": "billing_license",
    "priority": "medium",
    "assignment_group": "license_ops",
    "resolution_action": "fulfill",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-007",
    "title": "GDPR data deletion request — 30 day deadline",
    "requester": "legal@eurocorp.de",
    "description": "Per GDPR Article 17, we request deletion of all personal data associated with our account within 30 days. Failure to comply may result in regulatory action.",
    "issue_type": "security_compliance",
    "priority": "critical",
    "assignment_group": "security_team",
    "resolution_action": "escalate",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-008",
    "title": "Welcome aboard — getting started with your new account",
    "requester": "success@brightpath.io",
    "description": "Thanks for signing up! We\u0027d like to schedule an onboarding call this week. What time works for your team?",
    "issue_type": "onboarding",
    "priority": "medium",
    "assignment_group": "onboarding_ops",
    "resolution_action": "fulfill",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-009",
    "title": "Feature suggestion: dark mode for dashboard",
    "requester": "ux-team@designhub.co",
    "description": "Our users have been requesting dark mode for months. Would love to see this on the roadmap.",
    "issue_type": "feature_request",
    "priority": "low",
    "assignment_group": "application_team",
    "resolution_action": "acknowledge",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-010",
    "title": "Password reset link expired before I could use it",
    "requester": "jsmith@midtownlogistics.com",
    "description": "I requested a password reset but by the time I checked my email the link had expired. Can you send a new one?",
    "issue_type": "identity_access",
    "priority": "medium",
    "assignment_group": "service_desk",
    "resolution_action": "fulfill",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-011",
    "title": "API rate limiting causing data sync failures",
    "requester": "devops@streamline.app",
    "description": "Our integration is hitting 429 errors every hour during peak load. We need the rate limit raised or a bulk endpoint.",
    "issue_type": "application_support",
    "priority": "high",
    "assignment_group": "application_team",
    "resolution_action": "escalate",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-012",
    "title": "Interested in a live demo for our leadership team",
    "requester": "cto@nexwave.io",
    "description": "We have budget allocated for Q3 and would like a 30-minute demo with our CTO and VP Eng.",
    "issue_type": "service_request",
    "priority": "high",
    "assignment_group": "procurement",
    "resolution_action": "assign",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-013",
    "title": "Free vacation giveaway — claim your prize",
    "requester": "offers@tropicaldeals.example",
    "description": "Congratulations! You have been selected for an all-expenses-paid trip. Click here immediately.",
    "issue_type": "spam_phishing",
    "priority": "low",
    "assignment_group": "security_team",
    "resolution_action": "ignore",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-014",
    "title": "Audit report findings — action required by Friday",
    "requester": "audit@compliancepartners.com",
    "description": "The SOC2 audit uncovered three medium-severity findings. Remediation evidence is due by end of week.",
    "issue_type": "security_compliance",
    "priority": "high",
    "assignment_group": "security_team",
    "resolution_action": "escalate",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-015",
    "title": "Invoice discrepancy for order #4821",
    "requester": "accounts@meridianfoods.com",
    "description": "The invoice total doesn\u0027t match our purchase order. There\u0027s a $2,400 overcharge on the line items.",
    "issue_type": "billing_license",
    "priority": "high",
    "assignment_group": "license_ops",
    "resolution_action": "fulfill",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-016",
    "title": "New hire onboarding checklist incomplete",
    "requester": "hr@talentbridge.co",
    "description": "Three new engineers start Monday and their accounts haven\u0027t been provisioned yet. Please expedite.",
    "issue_type": "onboarding",
    "priority": "high",
    "assignment_group": "onboarding_ops",
    "resolution_action": "fulfill",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-017",
    "title": "Dashboard latency is unacceptable",
    "requester": "ops-lead@fastfreight.com",
    "description": "Pages are taking 12+ seconds to load. This is impacting our dispatchers during peak hours. We need this fixed ASAP.",
    "issue_type": "application_support",
    "priority": "critical",
    "assignment_group": "application_team",
    "resolution_action": "escalate",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-018",
    "title": "Question about enterprise tier pricing",
    "requester": "finance@urbanstack.io",
    "description": "We\u0027re comparing your enterprise plan against two competitors. Can you send over a detailed pricing breakdown?",
    "issue_type": "service_request",
    "priority": "medium",
    "assignment_group": "procurement",
    "resolution_action": "assign",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-019",
    "title": "Make $5000/week with this one simple trick",
    "requester": "noreply@quickcash.example",
    "description": "No experience needed. Start earning today. Limited spots available. Act now before it\u0027s too late.",
    "issue_type": "spam_phishing",
    "priority": "low",
    "assignment_group": "security_team",
    "resolution_action": "ignore",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-020",
    "title": "General inquiry about your platform capabilities",
    "requester": "info@greenleaf.org",
    "description": "Hi, I stumbled across your website and was curious about what your platform does. Can you send some information?",
    "issue_type": "general_inquiry",
    "priority": "low",
    "assignment_group": "service_desk",
    "resolution_action": "acknowledge",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-021",
    "title": "Re: Production checkout throwing null reference exception",
    "requester": "sre@paperkite.dev",
    "description": "Following up on ticket-003. The hotfix was deployed but we\u0027re seeing a regression in staging. Same null reference on the payment confirmation page. This is still blocking.",
    "issue_type": "application_support",
    "priority": "critical",
    "assignment_group": "application_team",
    "resolution_action": "escalate",
    "ambiguity_note": null,
    "related_ticket_id": "ticket-003"
  },
  {
    "ticket_id": "ticket-022",
    "title": "Usage charge dispute tied to API failures",
    "requester": "admin@crossfitbayarea.com",
    "description": "Our usage charges increased while the integration returned 500 errors for two weeks. We need both charge review and API investigation before approving the invoice.",
    "issue_type": "application_support",
    "priority": "high",
    "assignment_group": "application_team",
    "resolution_action": "escalate",
    "ambiguity_note": "Mentions billing, but the root cause is an application issue. The issue type could reasonably be billing_license or application_support.",
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-023",
    "title": "Cancel subscription and process final refund",
    "requester": "ops@smallbatch.co",
    "description": "We\u0027ve decided to go with another vendor. Please cancel our subscription effective immediately and refund the remaining balance on our annual plan.",
    "issue_type": "billing_license",
    "priority": "medium",
    "assignment_group": "license_ops",
    "resolution_action": "fulfill",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-024",
    "title": "SSO configuration failing silently",
    "requester": "it@megacorp.com",
    "description": "We configured SAML SSO per your docs but users get redirected to a blank page. No error messages. This is affecting 2000+ employees.",
    "issue_type": "application_support",
    "priority": "critical",
    "assignment_group": "application_team",
    "resolution_action": "escalate",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-025",
    "title": "Data residency requirements for EU deployment",
    "requester": "dpo@nordicbank.fi",
    "description": "We need confirmation that all data for EU customers is stored within EU borders. Please provide your data processing addendum.",
    "issue_type": "security_compliance",
    "priority": "high",
    "assignment_group": "security_team",
    "resolution_action": "fulfill",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-026",
    "title": "Positive feedback on recent API support case",
    "requester": "pm@littlefox.dev",
    "description": "Sharing positive feedback after last week\u0027s API support case. No action is needed beyond acknowledging the note and logging the feedback.",
    "issue_type": "general_inquiry",
    "priority": "low",
    "assignment_group": "service_desk",
    "resolution_action": "acknowledge",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-027",
    "title": "Vendor upgrade offer for Premium tier",
    "requester": "marketing@legitsaas.com",
    "description": "A current vendor sent a 30% Premium-tier offer that expires in 48 hours. The team is unsure whether this should just be acknowledged or routed for procurement review.",
    "issue_type": "general_inquiry",
    "priority": "low",
    "assignment_group": "service_desk",
    "resolution_action": "acknowledge",
    "ambiguity_note": "Could be treated as general_inquiry or escalated into a service_request if procurement wants to review the offer.",
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-028",
    "title": "Webhook delivery failures since Tuesday",
    "requester": "backend@paystream.io",
    "description": "Our webhook endpoint hasn\u0027t received any events since Tuesday. We\u0027ve verified our server is up. Is there an outage on your side?",
    "issue_type": "application_support",
    "priority": "high",
    "assignment_group": "application_team",
    "resolution_action": "fulfill",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-029",
    "title": "Seat expansion request with prorating question",
    "requester": "admin@growthworks.co",
    "description": "Our team needs 50 additional seats immediately. We also need to know how prorating will be handled before the change is approved.",
    "issue_type": "service_request",
    "priority": "medium",
    "assignment_group": "procurement",
    "resolution_action": "assign",
    "ambiguity_note": "Could be billing_license (prorating) or service_request (seat expansion).",
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-030",
    "title": "Account suspended without warning",
    "requester": "ceo@startupxyz.io",
    "description": "Our entire company account was suspended this morning with no prior notice. We have 80 employees locked out. This is unacceptable and needs immediate resolution.",
    "issue_type": "identity_access",
    "priority": "critical",
    "assignment_group": "service_desk",
    "resolution_action": "escalate",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-031",
    "title": "Payment method update required",
    "requester": "billing@yourplatform.com",
    "description": "The credit card on file for account #7829 expired last month. We attempted to charge three times without success. Please update your payment method to avoid service interruption.",
    "issue_type": "billing_license",
    "priority": "medium",
    "assignment_group": "license_ops",
    "resolution_action": "fulfill",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-032",
    "title": "Penetration test results — critical vulnerabilities found",
    "requester": "security@redteam-auditors.com",
    "description": "Our pentest revealed two critical and five high-severity vulnerabilities in your API endpoints. Full report attached. Remediation should begin immediately.",
    "issue_type": "security_compliance",
    "priority": "critical",
    "assignment_group": "security_team",
    "resolution_action": "escalate",
    "ambiguity_note": null,
    "related_ticket_id": null
  },
  {
    "ticket_id": "ticket-033",
    "title": "Getting started guide seems outdated",
|
| 389 |
+
"requester": "newuser@freshstart.io",
|
| 390 |
+
"description": "I just signed up yesterday and the getting started guide references features that don\u0027t seem to exist in the current UI. Can you point me to updated docs?",
|
| 391 |
+
"issue_type": "onboarding",
|
| 392 |
+
"priority": "medium",
|
| 393 |
+
"assignment_group": "onboarding_ops",
|
| 394 |
+
"resolution_action": "fulfill",
|
| 395 |
+
"ambiguity_note": null,
|
| 396 |
+
"related_ticket_id": null
|
| 397 |
+
},
|
| 398 |
+
{
|
| 399 |
+
"ticket_id": "ticket-034",
|
| 400 |
+
"title": "Mobile app crashes on launch after latest update",
|
| 401 |
+
"requester": "qa@betatesters.org",
|
| 402 |
+
"description": "Version 4.2.1 crashes immediately on iOS 18. Reproducible on iPhone 15 and 16. Stack trace included below.",
|
| 403 |
+
"issue_type": "application_support",
|
| 404 |
+
"priority": "high",
|
| 405 |
+
"assignment_group": "application_team",
|
| 406 |
+
"resolution_action": "fulfill",
|
| 407 |
+
"ambiguity_note": null,
|
| 408 |
+
"related_ticket_id": null
|
| 409 |
+
},
|
| 410 |
+
{
|
| 411 |
+
"ticket_id": "ticket-035",
|
| 412 |
+
"title": "Wire transfer for annual enterprise contract",
|
| 413 |
+
"requester": "treasury@bigbank.com",
|
| 414 |
+
"description": "We\u0027ve initiated a wire transfer of $240,000 for the annual enterprise contract. Please confirm receipt and send the signed agreement.",
|
| 415 |
+
"issue_type": "billing_license",
|
| 416 |
+
"priority": "high",
|
| 417 |
+
"assignment_group": "license_ops",
|
| 418 |
+
"resolution_action": "fulfill",
|
| 419 |
+
"ambiguity_note": null,
|
| 420 |
+
"related_ticket_id": null
|
| 421 |
+
},
|
| 422 |
+
{
|
| 423 |
+
"ticket_id": "ticket-036",
|
| 424 |
+
"title": "Can we get API access for a proof of concept?",
|
| 425 |
+
"requester": "architect@cloudnine.tech",
|
| 426 |
+
"description": "We are evaluating your platform for a large migration project. Is there a sandbox or trial API we can use for a 2-week proof of concept?",
|
| 427 |
+
"issue_type": "service_request",
|
| 428 |
+
"priority": "medium",
|
| 429 |
+
"assignment_group": "procurement",
|
| 430 |
+
"resolution_action": "assign",
|
| 431 |
+
"ambiguity_note": null,
|
| 432 |
+
"related_ticket_id": null
|
| 433 |
+
},
|
| 434 |
+
{
|
| 435 |
+
"ticket_id": "ticket-037",
|
| 436 |
+
"title": "Earn a degree in just 2 weeks!",
|
| 437 |
+
"requester": "admissions@diplomamill.example",
|
| 438 |
+
"description": "No exams, no classes. Get your accredited degree today. Reply for more information.",
|
| 439 |
+
"issue_type": "spam_phishing",
|
| 440 |
+
"priority": "low",
|
| 441 |
+
"assignment_group": "security_team",
|
| 442 |
+
"resolution_action": "ignore",
|
| 443 |
+
"ambiguity_note": null,
|
| 444 |
+
"related_ticket_id": null
|
| 445 |
+
},
|
| 446 |
+
{
|
| 447 |
+
"ticket_id": "ticket-038",
|
| 448 |
+
"title": "Re: Invoice discrepancy for order #4821",
|
| 449 |
+
"requester": "accounts@meridianfoods.com",
|
| 450 |
+
"description": "Following up on ticket-015. We still haven\u0027t received the corrected invoice. Our payment is now 15 days overdue because of this. Please prioritize.",
|
| 451 |
+
"issue_type": "billing_license",
|
| 452 |
+
"priority": "critical",
|
| 453 |
+
"assignment_group": "license_ops",
|
| 454 |
+
"resolution_action": "escalate",
|
| 455 |
+
"ambiguity_note": null,
|
| 456 |
+
"related_ticket_id": "ticket-015"
|
| 457 |
+
},
|
| 458 |
+
{
|
| 459 |
+
"ticket_id": "ticket-039",
|
| 460 |
+
"title": "MFA enrollment mandatory for all users by EOD Friday",
|
| 461 |
+
"requester": "security@internal.corp",
|
| 462 |
+
"description": "Per our updated security policy, all user accounts must have MFA enabled by end of day Friday. Non-compliant accounts will be suspended.",
|
| 463 |
+
"issue_type": "security_compliance",
|
| 464 |
+
"priority": "high",
|
| 465 |
+
"assignment_group": "security_team",
|
| 466 |
+
"resolution_action": "fulfill",
|
| 467 |
+
"ambiguity_note": null,
|
| 468 |
+
"related_ticket_id": null
|
| 469 |
+
},
|
| 470 |
+
{
|
| 471 |
+
"ticket_id": "ticket-040",
|
| 472 |
+
"title": "Reporting module needs better export options",
|
| 473 |
+
"requester": "analyst@datacrunchers.co",
|
| 474 |
+
"description": "CSV export exists, but the team also needs Excel and PDF with date filters. This blocks monthly reporting and could be interpreted as either a feature gap or an application-support issue.",
|
| 475 |
+
"issue_type": "feature_request",
|
| 476 |
+
"priority": "medium",
|
| 477 |
+
"assignment_group": "application_team",
|
| 478 |
+
"resolution_action": "acknowledge",
|
| 479 |
+
"ambiguity_note": "Could be feature_request or application_support depending on urgency interpretation.",
|
| 480 |
+
"related_ticket_id": null
|
| 481 |
+
},
|
| 482 |
+
{
|
| 483 |
+
"ticket_id": "ticket-041",
|
| 484 |
+
"title": "Account access request for new contractor",
|
| 485 |
+
"requester": "pm@buildit.agency",
|
| 486 |
+
"description": "We have a new contractor starting next week who needs read-only access to our project dashboard. Please set up their account.",
|
| 487 |
+
"issue_type": "onboarding",
|
| 488 |
+
"priority": "medium",
|
| 489 |
+
"assignment_group": "onboarding_ops",
|
| 490 |
+
"resolution_action": "fulfill",
|
| 491 |
+
"ambiguity_note": null,
|
| 492 |
+
"related_ticket_id": null
|
| 493 |
+
},
|
| 494 |
+
{
|
| 495 |
+
"ticket_id": "ticket-042",
|
| 496 |
+
"title": "Database migration script failing on large tables",
|
| 497 |
+
"requester": "dba@megastore.com",
|
| 498 |
+
"description": "The v3 to v4 migration script times out on tables with more than 10M rows. We have three such tables. Need guidance or a fix.",
|
| 499 |
+
"issue_type": "application_support",
|
| 500 |
+
"priority": "high",
|
| 501 |
+
"assignment_group": "application_team",
|
| 502 |
+
"resolution_action": "fulfill",
|
| 503 |
+
"ambiguity_note": null,
|
| 504 |
+
"related_ticket_id": null
|
| 505 |
+
},
|
| 506 |
+
{
|
| 507 |
+
"ticket_id": "ticket-043",
|
| 508 |
+
"title": "Negotiate volume discount for 1000+ licenses",
|
| 509 |
+
"requester": "procurement@globalcorp.com",
|
| 510 |
+
"description": "We\u0027re looking to standardize on your platform across all subsidiaries. Approximately 1200 seats. What volume discount can you offer?",
|
| 511 |
+
"issue_type": "service_request",
|
| 512 |
+
"priority": "high",
|
| 513 |
+
"assignment_group": "procurement",
|
| 514 |
+
"resolution_action": "assign",
|
| 515 |
+
"ambiguity_note": null,
|
| 516 |
+
"related_ticket_id": null
|
| 517 |
+
},
|
| 518 |
+
{
|
| 519 |
+
"ticket_id": "ticket-044",
|
| 520 |
+
"title": "Your account has been compromised — act now",
|
| 521 |
+
"requester": "security-alert@phishing.example",
|
| 522 |
+
"description": "We detected unusual activity on your account. Click the link below to verify your identity and secure your account immediately.",
|
| 523 |
+
"issue_type": "spam_phishing",
|
| 524 |
+
"priority": "low",
|
| 525 |
+
"assignment_group": "security_team",
|
| 526 |
+
"resolution_action": "ignore",
|
| 527 |
+
"ambiguity_note": null,
|
| 528 |
+
"related_ticket_id": null
|
| 529 |
+
},
|
| 530 |
+
{
|
| 531 |
+
"ticket_id": "ticket-045",
|
| 532 |
+
"title": "Re: Account suspended without warning",
|
| 533 |
+
"requester": "ceo@startupxyz.io",
|
| 534 |
+
"description": "This is my third update about this in 24 hours. 80 people are still locked out. If this isn\u0027t resolved in the next 2 hours we\u0027re escalating to legal. Reference ticket-030.",
|
| 535 |
+
"issue_type": "identity_access",
|
| 536 |
+
"priority": "critical",
|
| 537 |
+
"assignment_group": "service_desk",
|
| 538 |
+
"resolution_action": "escalate",
|
| 539 |
+
"ambiguity_note": null,
|
| 540 |
+
"related_ticket_id": "ticket-030"
|
| 541 |
+
}
|
| 542 |
+
]
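The `related_ticket_id` field links a follow-up to an earlier ticket (ticket-038 points at ticket-015, ticket-045 at ticket-030). A minimal sketch of a consistency check for those links, assuming the dataset is a JSON array of ticket objects shaped like the entries above. `find_dangling_references` is a hypothetical helper, not something the repository is known to contain, and the sample list is a trimmed stand-in for the real file.

```python
def find_dangling_references(tickets):
    """Return (ticket_id, related_ticket_id) pairs whose target is missing."""
    known_ids = {t["ticket_id"] for t in tickets}
    return [
        (t["ticket_id"], t["related_ticket_id"])
        for t in tickets
        if t.get("related_ticket_id") is not None
        and t["related_ticket_id"] not in known_ids
    ]


# Trimmed sample: ticket-015 is deliberately left out to show a reported miss.
sample = [
    {"ticket_id": "ticket-030", "related_ticket_id": None},
    {"ticket_id": "ticket-038", "related_ticket_id": "ticket-015"},
    {"ticket_id": "ticket-045", "related_ticket_id": "ticket-030"},
]

dangling = find_dangling_references(sample)
print(dangling)
```

Against the full `data/dataset.json` the same call would be expected to return an empty list, provided every referenced ticket is present in the file.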
environment.cpython-313.pyc
ADDED
Binary file (6.66 kB)
grader.cpython-313.pyc
ADDED
Binary file (3.25 kB)
inference.py
ADDED
@@ -0,0 +1,276 @@
#!/usr/bin/env python3
"""
Inference script for the IT Helpdesk Ticket Routing OpenEnv environment.

Uses the competition-mandated environment variables:
    API_BASE_URL - LLM provider base URL
    MODEL_NAME - model identifier
    HF_TOKEN - authentication token

Can run against a local server (default http://localhost:8000) or a
remote HuggingFace Space URL passed via ENV_URL.

Uses the WebSocket-based EnvClient for multi-step episodes.
"""
from __future__ import annotations

import json
import os

import httpx
from openai import OpenAI

from client import HelpdeskTicketEnvClient
from models import HelpdeskTicketAction
from vocabulary import (
    ASSIGNMENT_GROUPS,
    ISSUE_TYPES,
    ISSUE_TYPE_TO_ASSIGNMENT_GROUP,
    ISSUE_TYPE_TO_RESOLUTION_ACTION,
    PRIORITIES,
    RESOLUTION_ACTIONS,
    TASK_IDS,
)

# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------
API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
MODEL_NAME = os.getenv("MODEL_NAME", "")
HF_TOKEN = os.getenv("HF_TOKEN", "")
ENV_URL = os.getenv("ENV_URL", "http://localhost:8000")

SEED = 42
TASKS = list(TASK_IDS)

# ---------------------------------------------------------------------------
# LLM helper
# ---------------------------------------------------------------------------

llm_client: OpenAI | None = None

if MODEL_NAME and HF_TOKEN:
    llm_client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)


SYSTEM_PROMPT = """\
You are an expert IT helpdesk ticket routing agent. Given a helpdesk ticket, you must produce a JSON object with the requested fields.

Valid values:
- issue_type: {issue_types}
- priority: {priorities}
- assignment_group: {assignment_groups}
- resolution_action: {resolution_actions}

Return ONLY valid JSON with the requested fields. No markdown, no explanation.""".format(
    issue_types=", ".join(ISSUE_TYPES),
    priorities=", ".join(PRIORITIES),
    assignment_groups=", ".join(ASSIGNMENT_GROUPS),
    resolution_actions=", ".join(RESOLUTION_ACTIONS),
)


def call_llm(ticket: dict, allowed_fields: list[str], instructions: str) -> dict:
    assert llm_client is not None, "LLM client not configured"

    user_msg = (
        f"Instructions: {instructions}\n\n"
        f"Allowed fields: {', '.join(allowed_fields)}\n\n"
        f"Title: {ticket['title']}\n"
        f"Requester: {ticket['requester']}\n"
        f"Description: {ticket['description']}\n\n"
        f"Respond with JSON containing ONLY these fields: {', '.join(allowed_fields)}"
    )

    response = llm_client.chat.completions.create(
        model=MODEL_NAME,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_msg},
        ],
        temperature=0.0,
        max_tokens=256,
    )

    text = response.choices[0].message.content or "{}"
    text = text.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[-1].rsplit("```", 1)[0].strip()

    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {}


# ---------------------------------------------------------------------------
# Heuristic fallback (no LLM needed)
# ---------------------------------------------------------------------------

KEYWORD_ISSUE_TYPES = {
    "invoice": "billing_license",
    "charge": "billing_license",
    "refund": "billing_license",
    "payment": "billing_license",
    "billing": "billing_license",
    "license": "billing_license",
    "sign in": "identity_access",
    "login": "identity_access",
    "password": "identity_access",
    "locked": "identity_access",
    "2fa": "identity_access",
    "sso": "identity_access",
    "bug": "application_support",
    "error": "application_support",
    "exception": "application_support",
    "crash": "application_support",
    "production": "application_support",
    "latency": "application_support",
    "timeout": "application_support",
    "webhook": "application_support",
    "migration": "application_support",
    "pricing": "service_request",
    "quote": "service_request",
    "demo": "service_request",
    "enterprise": "service_request",
    "rollout": "service_request",
    "sandbox": "service_request",
    "trial": "service_request",
    "seat": "service_request",
    "seats": "service_request",
    "spam": "spam_phishing",
    "click now": "spam_phishing",
    "guaranteed": "spam_phishing",
    "unsubscribe": "spam_phishing",
    "phishing": "spam_phishing",
    "compromised": "spam_phishing",
    "compliance": "security_compliance",
    "regulation": "security_compliance",
    "gdpr": "security_compliance",
    "audit": "security_compliance",
    "pentest": "security_compliance",
    "vulnerabilities": "security_compliance",
    "security policy": "security_compliance",
    "onboarding": "onboarding",
    "welcome": "onboarding",
    "getting started": "onboarding",
    "new hire": "onboarding",
    "contractor": "onboarding",
    "feedback": "feature_request",
    "suggestion": "feature_request",
    "improve": "feature_request",
    "roadmap": "feature_request",
    "export": "feature_request",
}

def heuristic_action(ticket: dict, allowed_fields: list[str]) -> dict:
    text = (ticket.get("title", "") + " " + ticket.get("description", "")).lower()

    issue_type = "general_inquiry"
    for kw, mapped_issue_type in KEYWORD_ISSUE_TYPES.items():
        if kw in text:
            issue_type = mapped_issue_type
            break

    priority = "medium"
    if any(w in text for w in ["urgent", "critical", "blocking", "asap", "immediately"]):
        priority = "critical"
    elif any(w in text for w in ["important", "high priority", "revenue"]):
        priority = "high"
    elif any(w in text for w in ["low", "whenever", "no rush"]):
        priority = "low"

    result: dict = {}
    if "issue_type" in allowed_fields:
        result["issue_type"] = issue_type
    if "priority" in allowed_fields:
        result["priority"] = priority
    if "assignment_group" in allowed_fields:
        result["assignment_group"] = ISSUE_TYPE_TO_ASSIGNMENT_GROUP.get(
            issue_type, "service_desk"
        )
    if "resolution_action" in allowed_fields:
        result["resolution_action"] = ISSUE_TYPE_TO_RESOLUTION_ACTION.get(
            issue_type, "acknowledge"
        )
    return result


# ---------------------------------------------------------------------------
# Main loop using WebSocket client for multi-step episodes
# ---------------------------------------------------------------------------

def run():
    # Quick HTTP health check
    http = httpx.Client(base_url=ENV_URL, timeout=30.0)
    health = http.get("/health")
    health.raise_for_status()
    print(f"Connected to {ENV_URL}: {health.json()}")

    tasks_resp = http.get("/tasks")
    tasks_resp.raise_for_status()
    available_tasks = {t["id"]: t for t in tasks_resp.json()["tasks"]}
    print(f"Available tasks: {[t['name'] for t in available_tasks.values()]}")
    http.close()

    all_scores: dict[int, list[float]] = {}

    for task_id in TASKS:
        if task_id not in available_tasks:
            print(f"Task {task_id} not available, skipping")
            continue

        task = available_tasks[task_id]
        print(f"\n--- Task {task_id}: {task['name']} ({task['difficulty']}) ---")

        # Use sync WebSocket client for multi-step episode
        sync_client = HelpdeskTicketEnvClient(base_url=ENV_URL).sync()
        with sync_client:
            result = sync_client.reset(seed=SEED, task_id=task_id)
            obs = result.observation

            task_scores: list[float] = []
            step_num = 0

            while not result.done:
                ticket = obs.current_ticket
                if ticket is None:
                    break

                allowed = obs.allowed_fields
                instructions = obs.instructions

                if llm_client is not None:
                    action_dict = call_llm(ticket, allowed, instructions)
                else:
                    action_dict = heuristic_action(ticket, allowed)

                action = HelpdeskTicketAction(**action_dict)
                result = sync_client.step(action)
                obs = result.observation

                step_num += 1
                print(f" Step {step_num}: reward={result.reward} done={result.done}")

                if result.reward is not None:
                    task_scores.append(result.reward)

        all_scores[task_id] = task_scores
        final = task_scores[-1] if task_scores else 0.0
        print(f" Task {task_id} final reward: {final:.4f}")

    # Summary
    print("\n=== RESULTS ===")
    overall = []
    for tid in TASKS:
        if tid in all_scores:
            scores = all_scores[tid]
            avg = sum(scores) / len(scores) if scores else 0.0
            overall.append(avg)
            print(f"Task {tid}: avg_score={avg:.4f} ({len(scores)} steps)")
    if overall:
        print(f"Overall: {sum(overall) / len(overall):.4f}")


if __name__ == "__main__":
    run()
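`heuristic_action` tests its priority keywords with plain substring checks, so `"low"` also matches inside longer words. For example, ticket-038 opens with "Following up on ticket-015" and contains no higher-priority keyword, so it would come out as `low` even though the dataset labels it `critical`. A possible word-boundary variant is sketched below, assuming the same three keyword lists. `heuristic_priority` and `_contains_phrase` are hypothetical names and not part of `inference.py`.

```python
import re

# Keyword lists copied from heuristic_action; the matching strategy is the only change.
CRITICAL_WORDS = ["urgent", "critical", "blocking", "asap", "immediately"]
HIGH_WORDS = ["important", "high priority", "revenue"]
LOW_WORDS = ["low", "whenever", "no rush"]


def _contains_phrase(text: str, phrases: list[str]) -> bool:
    """True if any phrase occurs in text delimited by word boundaries."""
    return any(re.search(rf"\b{re.escape(p)}\b", text) for p in phrases)


def heuristic_priority(text: str) -> str:
    """Map free text to a priority label, defaulting to 'medium'."""
    text = text.lower()
    if _contains_phrase(text, CRITICAL_WORDS):
        return "critical"
    if _contains_phrase(text, HIGH_WORDS):
        return "high"
    if _contains_phrase(text, LOW_WORDS):
        return "low"
    return "medium"


follow_up = "Following up on ticket-015. Please prioritize."
print("low" in follow_up.lower())     # substring check fires on "following"
print(heuristic_priority(follow_up))  # word-boundary check falls through to the default
```

This only removes the false `low` match. The ticket still lands on the `medium` default, so keyword rules alone do not recover the `critical` label for follow-ups like this one.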
models.cpython-313.pyc
ADDED
Binary file (2.62 kB)
models.py
ADDED
@@ -0,0 +1,114 @@
from __future__ import annotations

from typing import Any, Optional

from pydantic import BaseModel, Field, field_validator
from openenv.core.env_server.types import Action, Observation, State
from vocabulary import (
    ASSIGNMENT_GROUPS,
    ISSUE_TYPES,
    PRIORITIES,
    RESOLUTION_ACTIONS,
)


ISSUE_TYPE_SET = set(ISSUE_TYPES)
PRIORITY_SET = set(PRIORITIES)
ASSIGNMENT_GROUP_SET = set(ASSIGNMENT_GROUPS)
RESOLUTION_ACTION_SET = set(RESOLUTION_ACTIONS)


def _validate_choice(value: str, allowed: set[str], field_name: str) -> str:
    if value not in allowed:
        allowed_values = ", ".join(sorted(allowed))
        raise ValueError(f"{field_name} must be one of: {allowed_values}")
    return value


def _validate_optional_choice(
    value: Optional[str], allowed: set[str], field_name: str
) -> Optional[str]:
    if value is None:
        return None
    return _validate_choice(value, allowed, field_name)


class HelpdeskTicketRecord(BaseModel):
    ticket_id: str
    title: str
    requester: str
    description: str
    issue_type: str
    priority: str
    assignment_group: str
    resolution_action: str
    ambiguity_note: Optional[str] = None
    related_ticket_id: Optional[str] = None

    @field_validator("issue_type")
    @classmethod
    def validate_issue_type(cls, value: str) -> str:
        return _validate_choice(value, ISSUE_TYPE_SET, "issue_type")

    @field_validator("priority")
    @classmethod
    def validate_priority(cls, value: str) -> str:
        return _validate_choice(value, PRIORITY_SET, "priority")

    @field_validator("assignment_group")
    @classmethod
    def validate_assignment_group(cls, value: str) -> str:
        return _validate_choice(value, ASSIGNMENT_GROUP_SET, "assignment_group")

    @field_validator("resolution_action")
    @classmethod
    def validate_resolution_action(cls, value: str) -> str:
        return _validate_choice(value, RESOLUTION_ACTION_SET, "resolution_action")


class HelpdeskTicketAction(Action):
    issue_type: Optional[str] = None
    priority: Optional[str] = None
    assignment_group: Optional[str] = None
    resolution_action: Optional[str] = None

    @field_validator("issue_type")
    @classmethod
    def validate_issue_type(cls, value: Optional[str]) -> Optional[str]:
        return _validate_optional_choice(value, ISSUE_TYPE_SET, "issue_type")

    @field_validator("priority")
    @classmethod
    def validate_priority(cls, value: Optional[str]) -> Optional[str]:
        return _validate_optional_choice(value, PRIORITY_SET, "priority")

    @field_validator("assignment_group")
    @classmethod
    def validate_assignment_group(cls, value: Optional[str]) -> Optional[str]:
        return _validate_optional_choice(value, ASSIGNMENT_GROUP_SET, "assignment_group")

    @field_validator("resolution_action")
    @classmethod
    def validate_resolution_action(cls, value: Optional[str]) -> Optional[str]:
        return _validate_optional_choice(value, RESOLUTION_ACTION_SET, "resolution_action")


class HelpdeskTicketObservation(Observation):
    task_id: int = 0
    task_name: str = ""
    instructions: str = ""
    allowed_fields: list[str] = Field(default_factory=list)
    current_ticket: Optional[dict[str, str]] = None
    queue_size: int = 0
    tickets_remaining: int = 0
    tickets_processed: int = 0
    history: list[dict[str, Any]] = Field(default_factory=list)


class HelpdeskTicketState(State):
    current_task_id: Optional[int] = None
    seed: Optional[int] = None
    queue_ticket_ids: list[str] = Field(default_factory=list)
    current_ticket_index: int = 0
    per_ticket_scores: list[float] = Field(default_factory=list)
    total_reward: float = 0.0
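The validators on `HelpdeskTicketAction` reject any value outside the vocabulary, so a model reply with an unexpected label would fail when the action is constructed. One way a caller might guard against that is to filter the raw dictionary first. This is a sketch under stated assumptions: `sanitize_action` is a hypothetical helper, and `ASSUMED_VOCAB` holds only values seen in the dataset excerpt, while the authoritative lists live in `vocabulary.py`.

```python
from typing import Any

# Assumption: subset of labels observed in data/dataset.json, not the real vocabulary.
ASSUMED_VOCAB: dict[str, set[str]] = {
    "issue_type": {
        "general_inquiry", "application_support", "service_request",
        "identity_access", "billing_license", "security_compliance",
        "onboarding", "spam_phishing", "feature_request",
    },
    "priority": {"low", "medium", "high", "critical"},
}


def sanitize_action(
    raw: dict[str, Any],
    allowed_fields: list[str],
    vocab: dict[str, set[str]],
) -> dict[str, str]:
    """Keep only allowed fields whose value is a known label for that field."""
    clean: dict[str, str] = {}
    for field in allowed_fields:
        value = raw.get(field)
        if isinstance(value, str) and value in vocab.get(field, set()):
            clean[field] = value
    return clean


raw_reply = {"issue_type": "billing_license", "priority": "URGENT", "notes": "n/a"}
print(sanitize_action(raw_reply, ["issue_type", "priority"], ASSUMED_VOCAB))
```

Dropped fields fall back to the `None` defaults on the action model. How the grader scores a missing field is not shown in this excerpt.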
|
openenv.yaml
ADDED
|
@@ -0,0 +1,59 @@
|
| 1 |
+
name: it_helpdesk_ticket_routing_openenv
|
| 2 |
+
version: "0.1.0"
|
| 3 |
+
description: >
|
| 4 |
+
Deterministic IT helpdesk ticket routing environment for issue classification,
|
| 5 |
+
prioritization, assignment, and resolution decisions. Built on the OpenEnv framework.
|
| 6 |
+
author: Hackstreet Boys - Roopal Guha Neogi, Suyash Kumar
|
| 7 |
+
|
| 8 |
+
environment:
|
| 9 |
+
type: openenv
|
| 10 |
+
entry_point: server.environment:HelpdeskTicketRoutingEnvironment
|
| 11 |
+
action_model: models:HelpdeskTicketAction
|
| 12 |
+
observation_model: models:HelpdeskTicketObservation
|
| 13 |
+
state_model: models:HelpdeskTicketState
|
| 14 |
+
|
| 15 |
+
tasks:
|
| 16 |
+
- name: Issue Type Classification
|
| 17 |
+
difficulty: easy
|
| 18 |
+
objective: Predict the correct IT issue type for a helpdesk ticket.
|
| 19 |
+
- name: Issue Type And Priority
|
| 20 |
+
difficulty: medium
|
| 21 |
+
objective: Predict the correct issue type and priority.
|
| 22 |
+
- name: Full Ticket Routing
|
| 23 |
+
difficulty: hard
|
| 24 |
+
objective: Predict issue type, priority, assignment group, and resolution action.
|
| 25 |
+
|
| 26 |
+
api:
|
| 27 |
+
endpoints:
|
| 28 |
+
- /health
|
| 29 |
+
- /reset
|
| 30 |
+
- /step
|
| 31 |
+
- /state
|
| 32 |
+
- /tasks
|
| 33 |
+
- /docs
|
| 34 |
+
|
| 35 |
+
evaluation:
|
| 36 |
+
reward_range:
|
| 37 |
+
min: 0.0
|
| 38 |
+
max: 1.0
|
| 39 |
+
deterministic: true
|
| 40 |
+
|
| 41 |
+
grading: normalized
|
| 42 |
+
reproducible: true
|
| 43 |
+
|
| 44 |
+
inference:
|
| 45 |
+
script: inference.py
|
| 46 |
+
env_vars:
|
| 47 |
+
- API_BASE_URL
|
| 48 |
+
- MODEL_NAME
|
| 49 |
+
- HF_TOKEN
|
| 50 |
+
|
| 51 |
+
requirements:
|
| 52 |
+
python: ">=3.11"
|
| 53 |
+
dependencies:
|
| 54 |
+
- openenv-core
|
| 55 |
+
- fastapi>=0.115
|
| 56 |
+
- pydantic>=2.7
|
| 57 |
+
- uvicorn>=0.30
|
| 58 |
+
- httpx>=0.25
|
| 59 |
+
- openai>=1.68
pyproject.toml
ADDED
@@ -0,0 +1,26 @@
[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "it-helpdesk-ticket-routing-openenv"
version = "0.1.0"
description = "IT helpdesk ticket routing environment for the OpenEnv framework"
requires-python = ">=3.11"
dependencies = [
    "openenv-core @ git+https://github.com/meta-pytorch/OpenEnv.git",
    "fastapi>=0.115",
    "pydantic>=2.7",
    "uvicorn>=0.30",
    "openai>=1.0",
    "httpx>=0.25",
]

[project.optional-dependencies]
dev = ["pytest", "httpx"]

[tool.setuptools]
py-modules = ["models", "client", "vocabulary"]

[tool.setuptools.packages.find]
include = ["server*"]
requirements.txt
ADDED
@@ -0,0 +1,6 @@
openenv-core @ git+https://github.com/meta-pytorch/OpenEnv.git
fastapi>=0.115
pydantic>=2.7
uvicorn>=0.30
openai>=1.0
httpx>=0.25
reward.cpython-313.pyc
ADDED
Binary file (1 kB)
server/Dockerfile
ADDED
@@ -0,0 +1,12 @@
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 7860

CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "7860"]
|
server/app.py
ADDED
@@ -0,0 +1,43 @@
+import sys
+from pathlib import Path
+
+# Ensure repo root is on sys.path so `models` and `server` are importable
+_repo_root = str(Path(__file__).resolve().parent.parent)
+if _repo_root not in sys.path:
+    sys.path.insert(0, _repo_root)
+
+from openenv.core.env_server import create_app
+
+from models import HelpdeskTicketAction, HelpdeskTicketObservation
+from server.environment import HelpdeskTicketRoutingEnvironment
+from server.tasks import TASKS
+from vocabulary import APP_ENV_NAME
+
+app = create_app(
+    HelpdeskTicketRoutingEnvironment,
+    HelpdeskTicketAction,
+    HelpdeskTicketObservation,
+    env_name=APP_ENV_NAME,
+)
+
+
+@app.get("/tasks")
+def list_tasks():
+    return {
+        "tasks": [
+            {
+                "id": t["id"],
+                "name": t["name"],
+                "difficulty": t["difficulty"],
+                "instructions": t["instructions"],
+                "allowed_fields": t["allowed_fields"],
+            }
+            for t in TASKS.values()
+        ]
+    }
+
+
+if __name__ == "__main__":
+    import uvicorn
+
+    uvicorn.run("server.app:app", host="0.0.0.0", port=8000, reload=True)
server/environment.py
ADDED
@@ -0,0 +1,163 @@
+from __future__ import annotations
+
+import random
+import uuid
+from typing import Any, Optional
+
+from openenv.core.env_server.interfaces import Environment
+
+from models import (
+    HelpdeskTicketAction,
+    HelpdeskTicketObservation,
+    HelpdeskTicketRecord,
+    HelpdeskTicketState,
+)
+from server.grader import grade_action
+from server.reward import compute_step_reward, compute_trajectory_reward
+from server.tasks import get_task_definition, load_dataset
+
+
+QUEUE_SIZE_RANGE = (3, 5)
+
+
+class HelpdeskTicketRoutingEnvironment(
+    Environment[HelpdeskTicketAction, HelpdeskTicketObservation, HelpdeskTicketState]
+):
+    def __init__(self) -> None:
+        super().__init__()
+        self._dataset = load_dataset()
+        self._rng = random.Random()
+        self._queue: list[HelpdeskTicketRecord] = []
+        self._state = HelpdeskTicketState()
+
+    # ------------------------------------------------------------------
+    # OpenEnv required interface
+    # ------------------------------------------------------------------
+
+    def reset(
+        self,
+        seed: Optional[int] = None,
+        episode_id: Optional[str] = None,
+        **kwargs: Any,
+    ) -> HelpdeskTicketObservation:
+        task_id: int = kwargs.get("task_id", 1)
+        task = get_task_definition(task_id)
+
+        if seed is not None:
+            self._rng.seed(seed)
+
+        queue_size = self._rng.randint(*QUEUE_SIZE_RANGE)
+        self._queue = self._rng.sample(self._dataset, min(queue_size, len(self._dataset)))
+
+        self._state = HelpdeskTicketState(
+            episode_id=episode_id or str(uuid.uuid4()),
+            step_count=0,
+            current_task_id=task_id,
+            seed=seed,
+            queue_ticket_ids=[t.ticket_id for t in self._queue],
+            current_ticket_index=0,
+            per_ticket_scores=[],
+            total_reward=0.0,
+        )
+
+        return self._build_observation(task)
+
+    def step(
+        self,
+        action: HelpdeskTicketAction,
+        timeout_s: Optional[float] = None,
+        **kwargs: Any,
+    ) -> HelpdeskTicketObservation:
+        if not self._queue or self._state.current_task_id is None:
+            raise RuntimeError("Environment has not been reset.")
+
+        idx = self._state.current_ticket_index
+        if idx >= len(self._queue):
+            raise RuntimeError("Episode already done — call reset().")
+
+        current_ticket = self._queue[idx]
+        task_id = self._state.current_task_id
+        task = get_task_definition(task_id)
+
+        score, breakdown = grade_action(action, current_ticket, task_id)
+        step_reward = compute_step_reward(score)
+
+        self._state.per_ticket_scores.append(score)
+        self._state.step_count += 1
+        self._state.current_ticket_index += 1
+
+        is_done = self._state.current_ticket_index >= len(self._queue)
+
+        if is_done:
+            traj_reward = compute_trajectory_reward(
+                self._state.per_ticket_scores,
+                len(self._queue),
+                self._state.step_count,
+            )
+            self._state.total_reward = traj_reward
+            final_reward = traj_reward
+        else:
+            final_reward = step_reward
+
+        history_entry = {
+            "ticket_id": current_ticket.ticket_id,
+            "score": score,
+            "breakdown": breakdown,
+        }
+
+        return self._build_observation(
+            task,
+            done=is_done,
+            reward=final_reward,
+            extra_history=history_entry,
+        )
+
+    @property
+    def state(self) -> HelpdeskTicketState:
+        return self._state.model_copy(deep=True)
+
+    # ------------------------------------------------------------------
+    # Helpers
+    # ------------------------------------------------------------------
+
+    def _build_observation(
+        self,
+        task: dict,
+        done: bool = False,
+        reward: float | None = None,
+        extra_history: dict | None = None,
+    ) -> HelpdeskTicketObservation:
+        idx = self._state.current_ticket_index
+        queue_size = len(self._queue)
+
+        if idx < queue_size:
+            ticket = self._queue[idx]
+            ticket_view = {
+                "ticket_id": ticket.ticket_id,
+                "title": ticket.title,
+                "requester": ticket.requester,
+                "description": ticket.description,
+            }
+        else:
+            ticket_view = None
+
+        history: list[dict] = []
+        for i, s in enumerate(self._state.per_ticket_scores):
+            history.append({"step": i + 1, "score": s})
+        if extra_history and history:
+            history[-1] = {"step": len(history), **extra_history}
+
+        return HelpdeskTicketObservation(
+            done=done,
+            reward=reward,
+            metadata={},
+            task_id=task["id"],
+            task_name=task["name"],
+            instructions=task["instructions"],
+            allowed_fields=list(task["allowed_fields"]),
+            current_ticket=ticket_view,
+            queue_size=queue_size,
+            tickets_remaining=max(0, queue_size - idx),
+            tickets_processed=idx,
+            history=history,
+        )
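Note on server/environment.py: `reset()` draws the queue size and then the tickets from one `random.Random` instance, so passing the same `seed` should reproduce the same queue. The sketch below isolates that sampling step so it can be run without OpenEnv. The `sample_queue` helper and the ticket IDs are hypothetical stand-ins for illustration, not part of the commit.

```python
import random

QUEUE_SIZE_RANGE = (3, 5)  # same bounds as in server/environment.py


def sample_queue(dataset, seed=None, rng=None):
    """Hypothetical helper mirroring the sampling logic in reset()."""
    rng = rng or random.Random()
    if seed is not None:
        rng.seed(seed)
    # Draw the queue size first, then sample that many tickets without replacement.
    queue_size = rng.randint(*QUEUE_SIZE_RANGE)
    return rng.sample(dataset, min(queue_size, len(dataset)))


# Made-up ticket IDs standing in for the records loaded from data/dataset.json.
tickets = [f"TICKET-{i:03d}" for i in range(1, 11)]

first = sample_queue(tickets, seed=42)
second = sample_queue(tickets, seed=42)
print(first == second)  # same seed, same queue
```

Without a seed, the environment keeps using the generator's existing state, so consecutive resets are expected to produce different queues.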
server/grader.py
ADDED
@@ -0,0 +1,103 @@
+from __future__ import annotations
+
+from models import HelpdeskTicketAction, HelpdeskTicketRecord
+
+
+ISSUE_TYPE_SIMILARITY = {
+    ("billing_license", "service_request"): 0.4,
+    ("service_request", "billing_license"): 0.4,
+    ("application_support", "identity_access"): 0.5,
+    ("identity_access", "application_support"): 0.5,
+    ("application_support", "feature_request"): 0.35,
+    ("feature_request", "application_support"): 0.35,
+    ("onboarding", "identity_access"): 0.4,
+    ("identity_access", "onboarding"): 0.4,
+    ("general_inquiry", "feature_request"): 0.3,
+    ("feature_request", "general_inquiry"): 0.3,
+    ("general_inquiry", "service_request"): 0.25,
+    ("service_request", "general_inquiry"): 0.25,
+    ("spam_phishing", "security_compliance"): 0.4,
+    ("security_compliance", "spam_phishing"): 0.4,
+    ("security_compliance", "billing_license"): 0.2,
+    ("billing_license", "security_compliance"): 0.2,
+}
+
+PRIORITY_SCORES = {
+    ("critical", "high"): 0.6,
+    ("high", "critical"): 0.6,
+    ("high", "medium"): 0.5,
+    ("medium", "high"): 0.5,
+    ("medium", "low"): 0.4,
+    ("low", "medium"): 0.4,
+    ("critical", "medium"): 0.3,
+    ("medium", "critical"): 0.3,
+    ("critical", "low"): 0.1,
+    ("low", "critical"): 0.1,
+    ("high", "low"): 0.2,
+    ("low", "high"): 0.2,
+}
+
+
+TASK_WEIGHTS = {
+    1: {"issue_type": 1.0},
+    2: {"issue_type": 0.6, "priority": 0.4},
+    3: {
+        "issue_type": 0.35,
+        "priority": 0.20,
+        "assignment_group": 0.25,
+        "resolution_action": 0.20,
+    },
+}
+
+
+def _normalized(value: str | None) -> str:
+    return (value or "").strip().lower()
+
+
+def _score_exact_or_similar(predicted: str | None, expected: str) -> float:
+    pred = _normalized(predicted)
+    exp = _normalized(expected)
+    if not pred:
+        return 0.0
+    if pred == exp:
+        return 1.0
+    return ISSUE_TYPE_SIMILARITY.get((pred, exp), 0.0)
+
+
+def _score_priority(predicted: str | None, expected: str) -> float:
+    pred = _normalized(predicted)
+    exp = _normalized(expected)
+    if not pred:
+        return 0.0
+    if pred == exp:
+        return 1.0
+    return PRIORITY_SCORES.get((pred, exp), 0.0)
+
+
+def _score_exact(predicted: str | None, expected: str) -> float:
+    return 1.0 if _normalized(predicted) == _normalized(expected) and predicted else 0.0
+
+
+def grade_action(
+    action: HelpdeskTicketAction,
+    ticket: HelpdeskTicketRecord,
+    task_id: int,
+) -> tuple[float, dict[str, float]]:
+    if task_id not in TASK_WEIGHTS:
+        raise ValueError(f"Unsupported task_id: {task_id}")
+
+    field_scores = {
+        "issue_type": _score_exact_or_similar(action.issue_type, ticket.issue_type),
+        "priority": _score_priority(action.priority, ticket.priority),
+        "assignment_group": _score_exact(
+            action.assignment_group, ticket.assignment_group
+        ),
+        "resolution_action": _score_exact(
+            action.resolution_action, ticket.resolution_action
+        ),
+    }
+
+    weights = TASK_WEIGHTS[task_id]
+    score = sum(field_scores[field] * weight for field, weight in weights.items())
+    breakdown = {field: field_scores[field] for field in weights}
+    return score, breakdown
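Note on server/grader.py: `grade_action` combines per-field scores into one weighted sum using the task's weights. As a worked example of that arithmetic for task 3, the sketch below recomputes the sum for one made-up set of field scores. The `weighted_score` helper and the example prediction are assumptions for illustration; only the weights and the partial-credit values (0.5 and 0.6) are taken from the tables in the file.

```python
# Task 3 weights as listed in TASK_WEIGHTS in server/grader.py.
TASK_3_WEIGHTS = {
    "issue_type": 0.35,
    "priority": 0.20,
    "assignment_group": 0.25,
    "resolution_action": 0.20,
}


def weighted_score(field_scores, weights):
    """Hypothetical helper: the same weighted sum that grade_action computes."""
    return sum(field_scores[field] * weight for field, weight in weights.items())


# Made-up prediction: partially right on two fields, exact on one, wrong on one.
example_field_scores = {
    "issue_type": 0.5,         # application_support predicted, identity_access expected
    "priority": 0.6,           # high predicted, critical expected
    "assignment_group": 1.0,   # exact match
    "resolution_action": 0.0,  # mismatch
}

score = weighted_score(example_field_scores, TASK_3_WEIGHTS)
# 0.5 * 0.35 + 0.6 * 0.20 + 1.0 * 0.25 + 0.0 * 0.20 = 0.545
print(round(score, 3))
```

Because each set of task weights sums to 1.0 and every field score lies in [0, 1], the combined score also stays in [0, 1].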
server/reward.py
ADDED
@@ -0,0 +1,16 @@
+from __future__ import annotations
+
+
+def compute_step_reward(score: float) -> float:
+    return max(0.0, min(1.0, score))
+
+
+def compute_trajectory_reward(
+    per_ticket_scores: list[float], queue_size: int, steps_taken: int
+) -> float:
+    if not per_ticket_scores:
+        return 0.0
+    avg = sum(per_ticket_scores) / len(per_ticket_scores)
+    overshoot = max(0, steps_taken - queue_size)
+    penalty = overshoot * 0.03
+    return max(0.0, min(1.0, avg - penalty))
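Note on server/reward.py: the trajectory reward is the mean per-ticket score minus 0.03 for every step beyond the queue size, clamped to [0, 1]. The sketch below copies the function so it runs standalone and evaluates it on made-up scores. As far as the code in server/environment.py shows, `step()` advances exactly one ticket per call, so `steps_taken` equals `queue_size` when an episode ends and the penalty term would be zero there. The overshoot case is included only to illustrate the formula.

```python
def compute_trajectory_reward(per_ticket_scores, queue_size, steps_taken):
    """Copy of the function in server/reward.py, repeated so this runs standalone."""
    if not per_ticket_scores:
        return 0.0
    avg = sum(per_ticket_scores) / len(per_ticket_scores)
    overshoot = max(0, steps_taken - queue_size)
    penalty = overshoot * 0.03
    return max(0.0, min(1.0, avg - penalty))


scores = [1.0, 0.6, 0.4]  # made-up per-ticket scores for a three-ticket queue

# One step per ticket: the reward is just the mean score (2.0 / 3).
on_budget = compute_trajectory_reward(scores, queue_size=3, steps_taken=3)

# Hypothetical overshoot of two steps: the mean is reduced by 2 * 0.03.
over_budget = compute_trajectory_reward(scores, queue_size=3, steps_taken=5)

print(round(on_budget, 4), round(over_budget, 4))
```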
server/tasks.py
ADDED
@@ -0,0 +1,60 @@
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from models import HelpdeskTicketRecord
+from vocabulary import TASK_IDS
+
+
+TASKS = {
+    1: {
+        "id": 1,
+        "name": "Issue Type Classification",
+        "difficulty": "easy",
+        "instructions": (
+            "Read the ticket and select the single best IT issue type."
+        ),
+        "allowed_fields": ["issue_type"],
+    },
+    2: {
+        "id": 2,
+        "name": "Issue Type And Priority",
+        "difficulty": "medium",
+        "instructions": (
+            "Read the ticket, select the best IT issue type, and estimate the "
+            "correct operational priority."
+        ),
+        "allowed_fields": ["issue_type", "priority"],
+    },
+    3: {
+        "id": 3,
+        "name": "Full Ticket Routing",
+        "difficulty": "hard",
+        "instructions": (
+            "Perform full helpdesk triage by selecting the best issue type, "
+            "priority, assignment group, and resolution action for the ticket."
+        ),
+        "allowed_fields": [
+            "issue_type",
+            "priority",
+            "assignment_group",
+            "resolution_action",
+        ],
+    },
+}
+
+assert tuple(TASKS.keys()) == TASK_IDS
+
+
+def load_dataset() -> list[HelpdeskTicketRecord]:
+    dataset_path = Path(__file__).resolve().parent.parent / "data" / "dataset.json"
+    with dataset_path.open("r", encoding="utf-8") as f:
+        raw = json.load(f)
+    return [HelpdeskTicketRecord.model_validate(r) for r in raw]
+
+
+def get_task_definition(task_id: int) -> dict:
+    if task_id not in TASKS:
+        raise ValueError(f"Unsupported task_id: {task_id}")
+    return TASKS[task_id]
studymaterialLinks
ADDED
@@ -0,0 +1,16 @@
+The following study material links were provided by the competition:
+
+Module 1: Why OpenEnv?
+https://github.com/meta-pytorch/OpenEnv/blob/main/tutorial/01-environments.md
+
+Module 2: Using Existing Environments
+https://github.com/meta-pytorch/OpenEnv/blob/main/tutorial/02-deployment.md
+
+Module 3: Deploying Environments
+https://github.com/meta-pytorch/OpenEnv/blob/main/tutorial/03-scaling.md
+
+Module 4: Building Your Own Environment
+
+MOST IMPORTANT FOR ROUND 1
+https://github.com/meta-pytorch/OpenEnv/blob/main/tutorial/04-training.md
+
tasks.cpython-313.pyc
ADDED
Binary file (1.93 kB).
vocabulary.py
ADDED
@@ -0,0 +1,67 @@
+from __future__ import annotations
+
+TEAM_NAME = "Hackstreet Boys"
+TEAM_MEMBERS = ("Roopal Guha Neogi", "Suyash Kumar")
+
+PROJECT_TITLE = "IT Helpdesk Ticket Routing OpenEnv"
+DOMAIN_NAME = "IT Helpdesk Ticket Routing"
+
+OPENENV_NAME = "it_helpdesk_ticket_routing_openenv"
+APP_ENV_NAME = "it_helpdesk_ticket_routing"
+
+ISSUE_TYPES = (
+    "billing_license",
+    "identity_access",
+    "application_support",
+    "service_request",
+    "spam_phishing",
+    "general_inquiry",
+    "security_compliance",
+    "onboarding",
+    "feature_request",
+)
+
+PRIORITIES = ("critical", "high", "medium", "low")
+
+ASSIGNMENT_GROUPS = (
+    "license_ops",
+    "service_desk",
+    "application_team",
+    "procurement",
+    "security_team",
+    "onboarding_ops",
+)
+
+RESOLUTION_ACTIONS = (
+    "fulfill",
+    "escalate",
+    "assign",
+    "ignore",
+    "acknowledge",
+)
+
+TASK_IDS = (1, 2, 3)
+
+ISSUE_TYPE_TO_ASSIGNMENT_GROUP = {
+    "billing_license": "license_ops",
+    "identity_access": "service_desk",
+    "application_support": "application_team",
+    "service_request": "procurement",
+    "spam_phishing": "security_team",
+    "general_inquiry": "service_desk",
+    "security_compliance": "security_team",
+    "onboarding": "onboarding_ops",
+    "feature_request": "application_team",
+}
+
+ISSUE_TYPE_TO_RESOLUTION_ACTION = {
+    "billing_license": "fulfill",
+    "identity_access": "fulfill",
+    "application_support": "escalate",
+    "service_request": "assign",
+    "spam_phishing": "ignore",
+    "general_inquiry": "acknowledge",
+    "security_compliance": "escalate",
+    "onboarding": "fulfill",
+    "feature_request": "acknowledge",
+}
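Note on vocabulary.py: the two lookup tables pair each issue type with a default assignment group and resolution action. The sketch below shows one way those tables could be read together. The `default_routing` helper is hypothetical, and only three of the nine issue types are copied to keep the excerpt short. Whether the labels in data/dataset.json always follow these defaults is not shown in this diff.

```python
# Excerpt of the two lookup tables from vocabulary.py (three of nine issue types).
ISSUE_TYPE_TO_ASSIGNMENT_GROUP = {
    "billing_license": "license_ops",
    "spam_phishing": "security_team",
    "onboarding": "onboarding_ops",
}
ISSUE_TYPE_TO_RESOLUTION_ACTION = {
    "billing_license": "fulfill",
    "spam_phishing": "ignore",
    "onboarding": "fulfill",
}


def default_routing(issue_type: str) -> tuple[str, str]:
    """Hypothetical helper: default (assignment_group, resolution_action) for an issue type."""
    key = issue_type.strip().lower()
    return ISSUE_TYPE_TO_ASSIGNMENT_GROUP[key], ISSUE_TYPE_TO_RESOLUTION_ACTION[key]


group, action = default_routing("spam_phishing")
print(group, action)
```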