Roopalgn committed on
Commit
3752981
·
0 Parent(s):

Initial commit

KNOWLEDGE.md ADDED
@@ -0,0 +1,258 @@
1
+ # IT Helpdesk Ticket Routing OpenEnv - Knowledge Guide
2
+
3
+ ## Part 1: What The Hackathon Wants
4
+
5
+ The hackathon is asking for a real-world environment that an AI agent can learn from through the standard OpenEnv interface.
6
+
7
+ In plain terms, the judges want:
8
+
9
+ 1. a real human job, not a toy problem
10
+ 2. typed models for actions, observations, and state
11
+ 3. `reset()`, `step()`, and `state()`
12
+ 4. at least 3 tasks with increasing difficulty
13
+ 5. deterministic graders that return scores from `0.0` to `1.0`
14
+ 6. a meaningful reward function
15
+ 7. a baseline `inference.py`
16
+ 8. Docker and deployment readiness
17
+
18
+ ## Part 2: Why This Repo Uses IT Helpdesk Ticket Routing
19
+
20
+ IT helpdesk ticket routing is a strong OpenEnv domain because it is:
21
+
22
+ - a real operational workflow
23
+ - naturally multi-step
24
+ - easy to express with typed actions and observations
25
+ - easy to score deterministically
26
+ - useful for evaluating planning, classification, and routing ability in agents
27
+
28
+ ## Part 3: The Core Mental Model
29
+
30
+ Think of this environment as a queue of helpdesk tickets.
31
+
32
+ For each ticket, the agent must answer:
33
+
34
+ - what kind of issue is this
35
+ - how urgent is it
36
+ - which resolver group should own it
37
+ - what should happen next
38
+
39
+ The environment shows one ticket at a time. The agent responds with structured fields. The grader scores that response. Then the environment moves to the next ticket.
40
+
41
+ ## Part 4: Main Files
42
+
43
+ ### `models.py`
44
+
45
+ Defines the typed objects used everywhere:
46
+
47
+ - `HelpdeskTicketRecord`
48
+ - `HelpdeskTicketAction`
49
+ - `HelpdeskTicketObservation`
50
+ - `HelpdeskTicketState`
51
+
52
+ ### `server/environment.py`
53
+
54
+ This is the core engine.
55
+
56
+ It:
57
+
58
+ - loads the dataset
59
+ - samples a queue of 3 to 5 tickets
60
+ - tracks progress
61
+ - grades each step
62
+ - computes the final episode reward
63
+
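The queue sampling described above can be sketched as follows. This is a minimal illustration, not the code in `server/environment.py`: the function name `sample_queue`, its parameters, and the `ticket-NNN` ID format are assumptions made for the example.

```python
import random
from typing import List, Sequence


def sample_queue(ticket_ids: Sequence[str], seed: int,
                 min_size: int = 3, max_size: int = 5) -> List[str]:
    """Pick a reproducible queue of distinct ticket IDs (hypothetical helper)."""
    rng = random.Random(seed)               # local RNG, global random state untouched
    size = rng.randint(min_size, max_size)  # inclusive on both ends
    return rng.sample(list(ticket_ids), size)


# 45 tickets, matching the dataset size stated in this guide; the ID format is assumed.
all_ids = [f"ticket-{n:03d}" for n in range(1, 46)]
queue = sample_queue(all_ids, seed=7)
```

Using a seeded local `random.Random` keeps episodes reproducible for a fixed seed without touching the global random state.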
64
+ ### `server/grader.py`
65
+
66
+ Contains deterministic scoring logic.
67
+
68
+ It gives:
69
+
70
+ - exact or partial credit for `issue_type`
71
+ - exact or proximity credit for `priority`
72
+ - exact credit for `assignment_group`
73
+ - exact credit for `resolution_action`
74
+
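A rough sketch of how proximity credit for `priority` and per-task weighting could fit together. The priority labels, the half-credit value, and the weights in `TASK_WEIGHTS` are all assumptions for illustration (this guide does not list them), and partial credit for `issue_type` is left out to keep the example short.

```python
from typing import Dict, Optional

# Assumed priority scale; the real labels live in the project's vocabulary.
PRIORITY_ORDER = ["low", "medium", "high", "critical"]

# Assumed weights per task; only the field names come from this guide.
TASK_WEIGHTS: Dict[int, Dict[str, float]] = {
    1: {"issue_type": 1.0},
    2: {"issue_type": 0.6, "priority": 0.4},
    3: {"issue_type": 0.4, "priority": 0.2,
        "assignment_group": 0.2, "resolution_action": 0.2},
}


def priority_score(predicted: Optional[str], expected: str) -> float:
    """Full credit for an exact match, half credit for an adjacent level."""
    if predicted not in PRIORITY_ORDER or expected not in PRIORITY_ORDER:
        return 0.0
    distance = abs(PRIORITY_ORDER.index(predicted) - PRIORITY_ORDER.index(expected))
    return {0: 1.0, 1: 0.5}.get(distance, 0.0)


def grade(task_id: int, action: Dict[str, str], expected: Dict[str, str]) -> float:
    """Weighted sum of per-field scores, clamped to [0.0, 1.0]."""
    if task_id not in TASK_WEIGHTS:
        raise ValueError(f"unsupported task id: {task_id}")
    total = 0.0
    for field_name, weight in TASK_WEIGHTS[task_id].items():
        predicted = action.get(field_name)
        if field_name == "priority":
            field_score = priority_score(predicted, expected[field_name])
        else:
            field_score = 1.0 if predicted == expected[field_name] else 0.0
        total += weight * field_score
    return max(0.0, min(1.0, total))
```

Because the function depends only on its inputs, the same action and expected labels always produce the same score.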
75
+ ### `server/reward.py`
76
+
77
+ Contains reward helpers:
78
+
79
+ - per-step reward clamping
80
+ - final trajectory reward calculation
81
+
82
+ ### `server/tasks.py`
83
+
84
+ Defines the difficulty ladder:
85
+
86
+ - Task 1: issue type only
87
+ - Task 2: issue type plus priority
88
+ - Task 3: full routing
89
+
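The difficulty ladder can be pictured as a mapping from task ID to the fields the agent must predict. The names `TASK_FIELDS`, `fields_for_task`, and `restrict_action` are hypothetical and may not match `server/tasks.py`; only the field names and the three-task split come from this guide.

```python
from typing import Dict, Tuple

# Field lists per task, following the ladder described above.
TASK_FIELDS: Dict[int, Tuple[str, ...]] = {
    1: ("issue_type",),
    2: ("issue_type", "priority"),
    3: ("issue_type", "priority", "assignment_group", "resolution_action"),
}


def fields_for_task(task_id: int) -> Tuple[str, ...]:
    """Return the fields a task grades, failing clearly on unknown IDs."""
    if task_id not in TASK_FIELDS:
        raise ValueError(f"unsupported task id: {task_id}")
    return TASK_FIELDS[task_id]


def restrict_action(task_id: int, action: Dict[str, str]) -> Dict[str, str]:
    """Drop any fields the current task does not ask for."""
    allowed = fields_for_task(task_id)
    return {name: action[name] for name in allowed if name in action}
```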
90
+ ### `server/app.py`
91
+
92
+ Creates the OpenEnv app and exposes a custom `/tasks` route.
93
+
94
+ ### `client.py`
95
+
96
+ Typed client used by the inference script.
97
+
98
+ ### `inference.py`
99
+
100
+ The baseline agent runner.
101
+
102
+ It can:
103
+
104
+ - use a real LLM through an OpenAI-compatible API
105
+ - or fall back to a keyword heuristic
106
+
107
+ ## Part 5: Tasks
108
+
109
+ ### Task 1: Issue Type Classification
110
+
111
+ The agent predicts:
112
+
113
+ - `issue_type`
114
+
115
+ ### Task 2: Issue Type And Priority
116
+
117
+ The agent predicts:
118
+
119
+ - `issue_type`
120
+ - `priority`
121
+
122
+ ### Task 3: Full Ticket Routing
123
+
124
+ The agent predicts:
125
+
126
+ - `issue_type`
127
+ - `priority`
128
+ - `assignment_group`
129
+ - `resolution_action`
130
+
131
+ ## Part 6: Ticket Vocabulary
132
+
133
+ ### Issue types
134
+
135
+ - `billing_license`
136
+ - `identity_access`
137
+ - `application_support`
138
+ - `service_request`
139
+ - `spam_phishing`
140
+ - `general_inquiry`
141
+ - `security_compliance`
142
+ - `onboarding`
143
+ - `feature_request`
144
+
145
+ ### Assignment groups
146
+
147
+ - `license_ops`
148
+ - `service_desk`
149
+ - `application_team`
150
+ - `procurement`
151
+ - `security_team`
152
+ - `onboarding_ops`
153
+
154
+ ### Resolution actions
155
+
156
+ - `fulfill`
157
+ - `escalate`
158
+ - `assign`
159
+ - `ignore`
160
+ - `acknowledge`
161
+
162
+ ## Part 7: Episode Flow
163
+
164
+ ### `reset()`
165
+
166
+ Starts a new episode:
167
+
168
+ 1. chooses a task
169
+ 2. samples a queue of tickets
170
+ 3. resets state
171
+ 4. returns the first observation
172
+
173
+ ### `step(action)`
174
+
175
+ Processes one ticket:
176
+
177
+ 1. grades the action
178
+ 2. stores the score
179
+ 3. advances the queue index
180
+ 4. returns the next ticket or the final reward
181
+
182
+ ### `state`
183
+
184
+ Returns the internal state snapshot.
185
+
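The `reset()` and `step()` flow above can be sketched with a toy queue environment. This is an illustration under simplifying assumptions, not the real `HelpdeskTicketRoutingEnvironment`: the class name, the plain-dict observations, and the exact-match scoring on `issue_type` alone are all made up for the example.

```python
from typing import Dict, List, Optional, Tuple

Observation = Dict[str, str]


class QueueEnvSketch:
    """Toy queue environment: one ticket per step, final reward is the mean score."""

    def __init__(self, tickets: List[Dict[str, str]]) -> None:
        self._tickets = tickets
        self._index = 0
        self._scores: List[float] = []

    def _observe(self) -> Observation:
        # Expose only the visible fields, never the gold label.
        ticket = self._tickets[self._index]
        return {"ticket_id": ticket["ticket_id"], "title": ticket["title"]}

    def reset(self) -> Observation:
        self._index = 0
        self._scores = []
        return self._observe()

    def step(self, action: Dict[str, str]) -> Tuple[Optional[Observation], float, bool]:
        expected = self._tickets[self._index]
        score = 1.0 if action.get("issue_type") == expected["issue_type"] else 0.0
        self._scores.append(score)   # store the score
        self._index += 1             # advance the queue index
        if self._index >= len(self._tickets):
            # Queue exhausted: return the final trajectory reward.
            return None, sum(self._scores) / len(self._scores), True
        return self._observe(), score, False


tickets = [
    {"ticket_id": "t-1", "title": "Cannot log in", "issue_type": "identity_access"},
    {"ticket_id": "t-2", "title": "Suspicious email", "issue_type": "spam_phishing"},
]
env = QueueEnvSketch(tickets)
first = env.reset()
second, step_reward, done = env.step({"issue_type": "identity_access"})
final_obs, final_reward, finished = env.step({"issue_type": "general_inquiry"})
```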
186
+ ## Part 8: Reward Logic
187
+
188
+ Step reward:
189
+
190
+ - just the current ticket score clamped to `[0.0, 1.0]`
191
+
192
+ Final reward:
193
+
194
+ - average of all per-ticket scores
195
+ - minus a small overshoot penalty if too many steps were taken
196
+
197
+ This keeps the signal dense and easy to interpret.
198
+
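A minimal sketch of the two reward helpers described above. The function names and the penalty of `0.05` per extra step are assumptions; the guide only says the overshoot penalty is small.

```python
from typing import Sequence


def clamp_step_reward(score: float) -> float:
    """Keep a per-ticket score inside [0.0, 1.0]."""
    return max(0.0, min(1.0, score))


def trajectory_reward(scores: Sequence[float], steps_taken: int,
                      penalty_per_extra_step: float = 0.05) -> float:
    """Average the per-ticket scores, minus a penalty for steps beyond the queue length."""
    if not scores:
        return 0.0
    average = sum(scores) / len(scores)
    overshoot = max(0, steps_taken - len(scores))  # extra steps past the queue
    return clamp_step_reward(average - penalty_per_extra_step * overshoot)
```

For example, scores of `1.0` and `0.5` over exactly two steps average to `0.75`; a third, unnecessary step would lower that by the assumed penalty.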
199
+ ## Part 9: Dataset Shape
200
+
201
+ Each ticket record contains:
202
+
203
+ - `ticket_id`
204
+ - `title`
205
+ - `requester`
206
+ - `description`
207
+ - `issue_type`
208
+ - `priority`
209
+ - `assignment_group`
210
+ - `resolution_action`
211
+ - optional `ambiguity_note`
212
+ - optional `related_ticket_id`
213
+
214
+ The current dataset contains 45 tickets.
215
+
216
+ It includes:
217
+
218
+ - straightforward tickets
219
+ - ambiguous tickets
220
+ - follow-up references to earlier tickets
221
+
222
+ ## Part 10: Inference Script In Simple Terms
223
+
224
+ `inference.py` is the script that actually "plays" the environment.
225
+
226
+ For each task it:
227
+
228
+ 1. connects to the server
229
+ 2. resets the environment
230
+ 3. reads the current ticket
231
+ 4. decides an action
232
+ 5. sends the action back
233
+ 6. collects scores
234
+ 7. prints a summary
235
+
236
+ If LLM credentials are available, it uses an LLM.
237
+ If not, it uses keyword rules.
238
+
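The choice between LLM mode and the keyword fallback might look like the sketch below. The keyword list, the default label, and both function names are hypothetical; only the environment variable names and the issue-type labels appear elsewhere in this commit.

```python
from typing import List, Mapping, Tuple

# Hypothetical keyword rules; the real heuristic in inference.py may differ.
KEYWORD_RULES: List[Tuple[str, str]] = [
    ("phishing", "spam_phishing"),
    ("password", "identity_access"),
    ("invoice", "billing_license"),
    ("new hire", "onboarding"),
]
DEFAULT_ISSUE_TYPE = "general_inquiry"  # assumed fallback label


def heuristic_issue_type(title: str, description: str) -> str:
    """Return the label of the first keyword found in the ticket text."""
    text = f"{title} {description}".lower()
    for keyword, issue_type in KEYWORD_RULES:
        if keyword in text:
            return issue_type
    return DEFAULT_ISSUE_TYPE


def llm_credentials_available(env: Mapping[str, str]) -> bool:
    """LLM mode needs all three variables to be set and non-empty."""
    return all(env.get(name) for name in ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN"))
```

In a real run the mapping passed to `llm_credentials_available` would be `os.environ`; taking it as a parameter keeps the check easy to test.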
239
+ ## Part 11: What Still Needs Verification
240
+
241
+ The important next checks are:
242
+
243
+ 1. run the server locally
244
+ 2. verify the ticket-routing client path works end to end
245
+ 3. rerun `inference.py`
246
+ 4. record fresh baseline scores
247
+ 5. validate Docker and OpenEnv behavior
248
+
249
+ ## Part 12: One-Minute Summary
250
+
251
+ If you only remember a few things, remember these:
252
+
253
+ - this repo is now an IT helpdesk ticket router
254
+ - the mechanics are still the same multi-step OpenEnv pattern
255
+ - one ticket is shown at a time
256
+ - the agent predicts structured routing fields
257
+ - the grader gives deterministic partial credit
258
+ - `inference.py` is the baseline agent runner
LABEL_AUDIT.md ADDED
@@ -0,0 +1,56 @@
1
+ # Label Audit Notes
2
+
3
+ This file records the March 31 and April 1 label-and-grader pass on the Roopal-owned files:
4
+
5
+ - `data/dataset.json`
6
+ - `server/tasks.py`
7
+ - `server/grader.py`
8
+
9
+ ## Dataset Decisions
10
+
11
+ ### Tightened ambiguity cases
12
+
13
+ - `ticket-022`
14
+ Reworded to make the billing-versus-application ambiguity clearer while keeping the chosen label as `application_support`.
15
+
16
+ - `ticket-027`
17
+ Reworded to make the vendor-offer ambiguity clearer between `general_inquiry` and `service_request`.
18
+
19
+ - `ticket-029`
20
+ Reworded to make the seat-expansion versus prorating ambiguity clearer and changed `resolution_action` from `fulfill` to `assign`.
21
+
22
+ - `ticket-040`
23
+ Reworded to make the feature-gap versus support-issue ambiguity clearer.
24
+
25
+ ### Corrected label consistency
26
+
27
+ - `ticket-026`
28
+ Changed from `feature_request` / `application_team` to `general_inquiry` / `service_desk` because it is a thank-you note, not a product change request.
29
+
30
+ ## Task Wording Changes
31
+
32
+ The task instructions in `server/tasks.py` were tightened so they now:
33
+
34
+ - sound more like helpdesk triage
35
+ - emphasize choosing the single best label
36
+ - describe operational priority more clearly
37
+ - describe full triage more concretely for Task 3
38
+
39
+ ## Grader Changes
40
+
41
+ The grader was polished by:
42
+
43
+ - making task weights explicit in `TASK_WEIGHTS`
44
+ - adding partial-credit pairs for:
45
+ - `application_support` vs `feature_request`
46
+ - `general_inquiry` vs `service_request`
47
+ - keeping the scoring deterministic and task-specific
48
+
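One way to represent the partial-credit pairs named above is a set of unordered pairs, so the lookup is symmetric. This is a sketch, not the contents of `server/grader.py`: the half-credit value and the names `PARTIAL_CREDIT_PAIRS` and `issue_type_score` are assumptions, and the real grader may define further pairs.

```python
# Unordered pairs from the list above; frozenset makes (a, b) equal to (b, a).
PARTIAL_CREDIT_PAIRS = {
    frozenset({"application_support", "feature_request"}),
    frozenset({"general_inquiry", "service_request"}),
}
PARTIAL_CREDIT = 0.5  # assumed value, not taken from the grader


def issue_type_score(predicted: str, expected: str) -> float:
    """Exact match scores 1.0, a listed near miss scores partial credit, anything else 0.0."""
    if predicted == expected:
        return 1.0
    if frozenset({predicted, expected}) in PARTIAL_CREDIT_PAIRS:
        return PARTIAL_CREDIT
    return 0.0
```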
49
+ ## Intent
50
+
51
+ These edits are meant to improve:
52
+
53
+ - dataset realism
54
+ - label consistency
55
+ - hard-task ambiguity quality
56
+ - reviewability for judges and teammates
MARCH30_STATUS.md ADDED
@@ -0,0 +1,117 @@
1
+ # March 30 Status Report
2
+
3
+ This file captures the code checkpoint completed for March 30, 2026 so both Codex sessions can compare against the same source of truth.
4
+
5
+ ## Scope Completed
6
+
7
+ The March 30 code checkpoint is complete for the foundational files named in `ROADMAP.md`:
8
+
9
+ - `models.py`
10
+ - `server/tasks.py`
11
+ - `server/grader.py`
12
+ - `server/environment.py`
13
+
14
+ Related supporting files were also aligned:
15
+
16
+ - `client.py`
17
+ - `server/app.py`
18
+ - `inference.py`
19
+ - `vocabulary.py`
20
+
21
+ ## What Is Locked
22
+
23
+ ### Team and project identity
24
+
25
+ - Team: Hackstreet Boys
26
+ - Members: Roopal Guha Neogi, Suyash Kumar
27
+ - Domain: IT Helpdesk Ticket Routing
28
+
29
+ ### Frozen class names
30
+
31
+ - `HelpdeskTicketRecord`
32
+ - `HelpdeskTicketAction`
33
+ - `HelpdeskTicketObservation`
34
+ - `HelpdeskTicketState`
35
+ - `HelpdeskTicketRoutingEnvironment`
36
+ - `HelpdeskTicketEnvClient`
37
+
38
+ ### Frozen field names
39
+
40
+ - `ticket_id`
41
+ - `title`
42
+ - `requester`
43
+ - `description`
44
+ - `issue_type`
45
+ - `priority`
46
+ - `assignment_group`
47
+ - `resolution_action`
48
+ - `related_ticket_id`
49
+
50
+ ## Code That Exists Now
51
+
52
+ ### `vocabulary.py`
53
+
54
+ Shared frozen constants now live in one place:
55
+
56
+ - team metadata
57
+ - environment names
58
+ - issue types
59
+ - priorities
60
+ - assignment groups
61
+ - resolution actions
62
+ - default issue-type mappings used by inference
63
+
64
+ ### `models.py`
65
+
66
+ The typed models are defined and the vocabulary is enforced through validators, so unsupported labels should fail fast instead of silently drifting.
67
+
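Fail-fast vocabulary validation could be sketched as below. The project's models may well be Pydantic classes; this example uses a standard-library dataclass instead so it stays self-contained, and the class name `TicketActionSketch` is hypothetical. The label sets are copied from the vocabulary lists in this commit; `priority` is left unvalidated because its labels are not listed here.

```python
from dataclasses import dataclass
from typing import Optional

ISSUE_TYPES = {
    "billing_license", "identity_access", "application_support",
    "service_request", "spam_phishing", "general_inquiry",
    "security_compliance", "onboarding", "feature_request",
}
ASSIGNMENT_GROUPS = {
    "license_ops", "service_desk", "application_team",
    "procurement", "security_team", "onboarding_ops",
}
RESOLUTION_ACTIONS = {"fulfill", "escalate", "assign", "ignore", "acknowledge"}


@dataclass(frozen=True)
class TicketActionSketch:
    """Illustrative action model that rejects labels outside the vocabulary."""
    issue_type: str
    priority: Optional[str] = None  # not validated here: labels not listed in this commit
    assignment_group: Optional[str] = None
    resolution_action: Optional[str] = None

    def __post_init__(self) -> None:
        if self.issue_type not in ISSUE_TYPES:
            raise ValueError(f"unsupported issue_type: {self.issue_type!r}")
        if self.assignment_group is not None and self.assignment_group not in ASSIGNMENT_GROUPS:
            raise ValueError(f"unsupported assignment_group: {self.assignment_group!r}")
        if self.resolution_action is not None and self.resolution_action not in RESOLUTION_ACTIONS:
            raise ValueError(f"unsupported resolution_action: {self.resolution_action!r}")
```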
68
+ ### `server/tasks.py`
69
+
70
+ All three tasks are defined with locked names, instructions, and allowed fields.
71
+
72
+ ### `server/grader.py`
73
+
74
+ Deterministic scoring is in place with:
75
+
76
+ - partial credit for near-miss `issue_type`
77
+ - proximity scoring for `priority`
78
+ - exact match for `assignment_group`
79
+ - exact match for `resolution_action`
80
+
81
+ ### `server/environment.py`
82
+
83
+ The environment implements:
84
+
85
+ - queue sampling
86
+ - reset flow
87
+ - step flow
88
+ - state tracking
89
+ - final trajectory reward handoff
90
+
91
+ ### `inference.py`
92
+
93
+ The baseline runner is aligned to the locked vocabulary and supports:
94
+
95
+ - LLM mode
96
+ - heuristic mode
97
+ - task loop over all 3 tasks
98
+
99
+ ## Expected Agreement For The Other Codex Session
100
+
101
+ Your teammate's Codex should agree on all of the following:
102
+
103
+ 1. the schema names above are frozen
104
+ 2. the vocabulary now has a single source of truth in `vocabulary.py`
105
+ 3. no one should rename labels after this checkpoint
106
+ 4. future work should build on these names, not replace them
107
+
108
+ ## What Is Not Verified Yet
109
+
110
+ This checkpoint is a code-and-consistency checkpoint, not a runtime-complete checkpoint.
111
+
112
+ Still pending:
113
+
114
+ - local execution
115
+ - heuristic baseline run
116
+ - Docker validation
117
+ - final benchmark numbers
MENTAL_MODEL.md ADDED
@@ -0,0 +1,155 @@
1
+ # IT Helpdesk Ticket Routing Mental Model
2
+
3
+ This file is the practical mental model of the repo in its current form.
4
+
5
+ ## What The Project Is
6
+
7
+ This repository is an OpenEnv environment for IT helpdesk ticket routing.
8
+
9
+ The environment presents a small queue of tickets. For each ticket, the agent must decide:
10
+
11
+ - issue type
12
+ - priority
13
+ - assignment group
14
+ - resolution action
15
+
16
+ ## Main Runtime Flow
17
+
18
+ ```text
19
+ inference.py
20
+ |
21
+ v
22
+ client.py <----> server/app.py
23
+ |
24
+ v
25
+ server/environment.py
26
+ | | |
27
+ v v v
28
+ grader.py reward.py tasks.py
29
+ |
30
+ v
31
+ data/dataset.json
32
+ ```
33
+
34
+ ## Main Files
35
+
36
+ - `models.py`
37
+ Typed models for tickets, actions, observations, and state.
38
+
39
+ - `server/environment.py`
40
+ Main environment engine.
41
+
42
+ - `server/grader.py`
43
+ Deterministic partial-credit scorer.
44
+
45
+ - `server/reward.py`
46
+ Step and trajectory reward helpers.
47
+
48
+ - `server/tasks.py`
49
+ Task definitions and dataset loading.
50
+
51
+ - `client.py`
52
+ Typed client used for multi-step interaction.
53
+
54
+ - `inference.py`
55
+ Baseline runner with LLM mode and heuristic mode.
56
+
57
+ ## Task Ladder
58
+
59
+ ### Task 1
60
+
61
+ - predict `issue_type`
62
+
63
+ ### Task 2
64
+
65
+ - predict `issue_type`
66
+ - predict `priority`
67
+
68
+ ### Task 3
69
+
70
+ - predict `issue_type`
71
+ - predict `priority`
72
+ - predict `assignment_group`
73
+ - predict `resolution_action`
74
+
75
+ ## Label Vocabulary
76
+
77
+ ### Issue types
78
+
79
+ - `billing_license`
80
+ - `identity_access`
81
+ - `application_support`
82
+ - `service_request`
83
+ - `spam_phishing`
84
+ - `general_inquiry`
85
+ - `security_compliance`
86
+ - `onboarding`
87
+ - `feature_request`
88
+
89
+ ### Assignment groups
90
+
91
+ - `license_ops`
92
+ - `service_desk`
93
+ - `application_team`
94
+ - `procurement`
95
+ - `security_team`
96
+ - `onboarding_ops`
97
+
98
+ ### Resolution actions
99
+
100
+ - `fulfill`
101
+ - `escalate`
102
+ - `assign`
103
+ - `ignore`
104
+ - `acknowledge`
105
+
106
+ ## Observation And State
107
+
108
+ The observation exposes:
109
+
110
+ - task metadata
111
+ - the current ticket
112
+ - queue progress counters
113
+ - history
114
+ - reward and done status
115
+
116
+ The state tracks:
117
+
118
+ - current task
119
+ - seed
120
+ - queue ticket IDs
121
+ - current ticket index
122
+ - per-ticket scores
123
+ - total reward
124
+
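The state fields listed above could be held in a structure like the following. The attribute names, the `record` helper, and the reading of "total reward" as a running sum are assumptions for illustration; the real `HelpdeskTicketState` may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StateSketch:
    """Illustrative snapshot of an episode's internal state."""
    task_id: int
    seed: Optional[int]
    queue_ticket_ids: List[str]
    current_index: int = 0
    per_ticket_scores: List[float] = field(default_factory=list)
    total_reward: float = 0.0  # assumed to be a running sum of step scores

    @property
    def done(self) -> bool:
        return self.current_index >= len(self.queue_ticket_ids)

    def record(self, score: float) -> None:
        """Store one ticket's score and move to the next ticket."""
        self.per_ticket_scores.append(score)
        self.total_reward += score
        self.current_index += 1


state = StateSketch(task_id=1, seed=7, queue_ticket_ids=["ticket-001", "ticket-002"])
state.record(1.0)
state.record(0.5)
```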
125
+ ## Reward Logic
126
+
127
+ - each step returns the current ticket score
128
+ - the final reward is the average of per-ticket scores
129
+ - a small overshoot penalty exists as a safeguard
130
+
131
+ ## Dataset Shape
132
+
133
+ Each record includes:
134
+
135
+ - `ticket_id`
136
+ - `title`
137
+ - `requester`
138
+ - `description`
139
+ - `issue_type`
140
+ - `priority`
141
+ - `assignment_group`
142
+ - `resolution_action`
143
+ - optional `ambiguity_note`
144
+ - optional `related_ticket_id`
145
+
146
+ ## Short Version
147
+
148
+ If coming back later, remember this:
149
+
150
+ - the repo is a helpdesk ticket router
151
+ - the architecture is a small OpenEnv stack
152
+ - one ticket is shown at a time
153
+ - the agent predicts structured routing fields
154
+ - the grader gives deterministic partial credit
155
+ - `inference.py` is the baseline agent runner
PLAN.md ADDED
@@ -0,0 +1,147 @@
1
+ # IT Helpdesk Ticket Routing OpenEnv - Project Plan
2
+
3
+ ## Project Goal
4
+
5
+ Build a polished OpenEnv environment for IT helpdesk ticket routing that satisfies:
6
+
7
+ - real-world utility
8
+ - strong task and grader quality
9
+ - clean environment design
10
+ - OpenEnv spec compliance
11
+ - reproducible baseline inference
12
+ - Docker and Hugging Face deployment readiness
13
+
14
+ ## Current Product Definition
15
+
16
+ The environment simulates a helpdesk queue. An agent receives one ticket at a time and predicts:
17
+
18
+ - `issue_type`
19
+ - `priority`
20
+ - `assignment_group`
21
+ - `resolution_action`
22
+
23
+ The project keeps three tasks:
24
+
25
+ 1. Issue Type Classification
26
+ 2. Issue Type And Priority
27
+ 3. Full Ticket Routing
28
+
29
+ ## What Must Be True At Submission
30
+
31
+ ### Pass or fail requirements
32
+
33
+ - the environment responds correctly
34
+ - OpenEnv metadata is valid
35
+ - `reset()`, `step()`, and `state()` work
36
+ - there are at least 3 tasks
37
+ - graders return scores in `[0.0, 1.0]`
38
+ - `inference.py` runs and prints reproducible results
39
+ - Docker builds and starts cleanly
40
+
41
+ ### Scored requirements
42
+
43
+ - the task should clearly feel like real helpdesk work
44
+ - the hard task should require meaningful reasoning
45
+ - partial credit should be useful and deterministic
46
+ - docs should be clear enough for judges to understand quickly
47
+
48
+ ## Core Files
49
+
50
+ ### Runtime
51
+
52
+ - `models.py`
53
+ - `server/environment.py`
54
+ - `server/grader.py`
55
+ - `server/reward.py`
56
+ - `server/tasks.py`
57
+ - `server/app.py`
58
+ - `client.py`
59
+ - `inference.py`
60
+
61
+ ### Data and metadata
62
+
63
+ - `data/dataset.json`
64
+ - `openenv.yaml`
65
+ - `server/Dockerfile`
66
+ - `pyproject.toml`
67
+ - `requirements.txt`
68
+
69
+ ### Docs
70
+
71
+ - `README.md`
72
+ - `KNOWLEDGE.md`
73
+ - `MENTAL_MODEL.md`
74
+
75
+ ## Technical Priorities
76
+
77
+ ### P0
78
+
79
+ 1. keep the environment behavior correct
80
+ 2. verify the task definitions and graders
81
+ 3. make the baseline script reliable
82
+ 4. confirm dataset coverage and label consistency
83
+
84
+ ### P1
85
+
86
+ 1. validate Docker
87
+ 2. validate deployment assumptions
88
+ 3. record baseline scores
89
+ 4. polish docs
90
+
91
+ ### P2
92
+
93
+ 1. strengthen ticket wording for realism
94
+ 2. expand hard-case examples if needed
95
+ 3. remove low-signal artifacts from the repo
96
+
97
+ ## Quality Checks To Perform
98
+
99
+ ### Environment
100
+
101
+ - reset starts a clean episode
102
+ - each step advances the queue correctly
103
+ - the final step returns trajectory reward
104
+ - state reflects the real internal status
105
+
106
+ ### Grader
107
+
108
+ - exact matches score `1.0`
109
+ - near misses get partial credit where intended
110
+ - unsupported task IDs fail clearly
111
+ - scores vary across examples
112
+
113
+ ### Inference
114
+
115
+ - heuristic mode works without model credentials
116
+ - LLM mode reads `API_BASE_URL`, `MODEL_NAME`, and `HF_TOKEN`
117
+ - output is reproducible when the seed is fixed
118
+
119
+ ### Docs
120
+
121
+ - no outdated domain references remain
122
+ - team and project metadata are correct
123
+ - setup and run instructions are accurate
124
+
125
+ ## Risks
126
+
127
+ ### Runtime risk
128
+
129
+ The repo still needs a proper local execution pass to confirm everything after the latest edits.
130
+
131
+ ### Benchmark risk
132
+
133
+ Fresh scores must be generated and then reflected in docs.
134
+
135
+ ### Deployment risk
136
+
137
+ Docker and Hugging Face behavior should be validated before the final submission window.
138
+
139
+ ## Definition Of Done
140
+
141
+ The project is ready when:
142
+
143
+ 1. the environment runs locally end to end
144
+ 2. the heuristic baseline runs successfully
145
+ 3. Docker build and run both succeed
146
+ 4. the docs are clean, current, and submission-ready
147
+ 5. the repo clearly presents Hackstreet Boys as the team
Preparation ADDED
File without changes
ProblemDetails ADDED
@@ -0,0 +1,472 @@
1
+ Round 1 — Problem Statement
2
+
3
+ The Task
4
+
5
+ Build a complete, real-world OpenEnv environment that an AI agent can learn from through the standard step() / reset() / state() API.
6
+
7
+ Key Requirements at a Glance
8
+
9
+ Must simulate a real-world task (not games or toys)
10
+
11
+ Implement full OpenEnv spec: typed models, step()/reset()/state(), openenv.yaml
12
+
13
+ Minimum 3 tasks with agent graders (easy → medium → hard, scores 0.0–1.0)
14
+
15
+ Meaningful reward function with partial progress signals
16
+
17
+ Baseline inference script with reproducible scores
18
+
19
+ Deploy to Hugging Face Spaces + working Dockerfile
20
+
21
+ README with environment description, action/observation spaces, setup instructions
22
+
23
+
24
+ Real-world task simulation
25
+
26
+ The environment must simulate a task humans actually do. Not games, not toys. Examples: email triage, code review, data cleaning, scheduling, customer support, content moderation.
27
+
28
+ OpenEnv spec compliance
29
+
30
+ Implement the full OpenEnv interface: typed Observation, Action, and Reward Pydantic models. step(action) → returns observation, reward, done, info. reset() → returns initial observation. state() → returns current state. openenv.yaml with metadata. Tested via openenv validate.
31
+
32
+ Minimum 3 tasks with agent graders
33
+
34
+ Each task defines a concrete objective an agent must accomplish, with a programmatic grader that scores performance (0.0–1.0). Tasks should range: easy → medium → hard. Graders must have clear, deterministic success/failure criteria.
35
+
36
+ Meaningful reward function
37
+
38
+ Provides signal over the full trajectory (not just binary end-of-episode). Rewards partial progress toward task completion. Penalizes clearly undesirable behavior (e.g. infinite loops, destructive actions).
39
+
40
+ Baseline inference script
41
+
42
+ Uses the OpenAI API client to run a model against the environment. Reads API credentials from environment variables (OPENAI_API_KEY). Produces a reproducible baseline score on all 3 tasks.
43
+ ___________________________________________
44
+ Detailed Requirements
45
+
46
+ Non-Functional Requirements
47
+
48
+ Deploys to a Hugging Face Space
49
+
50
+ Environment must run as a containerized HF Space tagged with openenv.
51
+
52
+ Containerized execution
53
+
54
+ Must include a working Dockerfile. The environment should start cleanly with docker build + docker run.
55
+
56
+ Documentation
57
+
58
+ README must include: environment description and motivation, action and observation space definitions, task descriptions with expected difficulty, setup and usage instructions, baseline scores.
59
+ ___________________________________________
60
+
+ Parameter | Weight | Description
+ Real-world utility | 30% | Does the environment model a genuine task? Would someone actually use this to train or evaluate agents?
+ Task & grader quality | 25% | Are tasks well-defined with clear objectives? Do graders accurately and fairly measure success? Meaningful difficulty progression?
+ Environment design | 20% | Clean state management, sensible action/observation spaces, good reward shaping, proper episode boundaries.
+ Code quality & spec compliance | 15% | Follows OpenEnv spec, clean project structure, typed models, documented, tested, Dockerfile works.
+ Creativity & novelty | 10% | Novel problem domain, interesting mechanics, clever reward design, original approach.
96
+
97
+ Scoring Breakdown
98
+
99
+ Real-world utility (30%)
100
+
101
+ • 0–5: Toy/artificial problem with no practical application
102
+
103
+ • 6–15: Valid domain but shallow modeling of the real task
104
+
105
+ • 16–25: Good domain modeling, would be useful for agent evaluation
106
+
107
+ • 26–30: Excellent — fills a real gap, immediate value for the RL/agent community
108
+
109
+ Task & grader quality (25%)
110
+
111
+ • 3+ tasks with difficulty range?
112
+
113
+ • Graders produce scores between 0.0–1.0?
114
+
115
+ • Graders deterministic and reproducible?
116
+
117
+ • Hard task genuinely challenges frontier models?
118
+
119
+ Environment design (20%)
120
+
121
+ • reset() produces clean state?
122
+
123
+ • Action/observation types well-designed and documented?
124
+
125
+ • Reward function provides useful varying signal (not just sparse)?
126
+
127
+ • Episode boundaries sensible?
128
+
129
+ Code quality & spec compliance (15%)
130
+
131
+ • openenv validate passes?
132
+
133
+ • docker build && docker run works?
134
+
135
+ • HF Space deploys and responds?
136
+
137
+ • Baseline script runs and reproduces scores?
138
+
139
+ Creativity & novelty (10%)
140
+
141
+ • Domain we haven’t seen in OpenEnv before?
142
+
143
+ • Reward design has interesting properties?
144
+
145
+ • Clever mechanics that make the environment engaging?
146
+ ________________________________________
147
+
148
+ Phase 1: Automated Validation
149
+
150
+ Pass/fail gate — HF Space deploys, OpenEnv spec compliance, Dockerfile builds, baseline reproduces, 3+ tasks with graders.
151
+
152
+ Phase 2: Agentic Evaluation
153
+
154
+ Scored — baseline agent re-run, standard Open LLM agent (e.g. Nemotron 3 Super) run against all environments, score variance check.
155
+
156
+ Phase 3: Human Review
157
+
158
+ Top submissions reviewed by Meta and Hugging Face engineers for real-world utility, creativity, and exploit checks.
159
+
160
+ Disqualification Criteria
161
+
162
+ Environment does not deploy or respond
163
+
164
+ Plagiarized or trivially modified existing environments
165
+
166
+ Graders that always return the same score
167
+
168
+ No baseline inference script
169
+ __________________________________________
170
+
171
+ HF Space deploys
172
+
173
+ Automated ping to the Space URL — must return 200 and respond to reset()
174
+
175
+ OpenEnv spec compliance
176
+
177
+ Validate openenv.yaml, typed models, step()/reset()/state() endpoints
178
+
179
+ Dockerfile builds
180
+
181
+ Automated docker build on the submitted repo
182
+
183
+ Baseline reproduces
184
+
185
+ Run the submitted inference script — must complete without error and produce scores
186
+
187
+ 3+ tasks with graders
188
+
189
+ Enumerate tasks, run each grader, verify scores in 0.0–1.0 range
190
+
191
+ Additional Instructions
192
+
193
+ Before submitting, ensure the following variables are defined in your environment configuration:
194
+
195
+ API_BASE_URL The API endpoint for the LLM.
196
+
197
+ MODEL_NAME The model identifier to use for inference.
198
+
199
+ HF_TOKEN Your Hugging Face / API key.
200
+
201
+ The inference script must be named `inference.py` and placed in the root directory of the project
202
+
203
+ Participants must use the OpenAI client for all LLM calls, configured with the variables above
204
+
205
+ Infra Restrictions
206
+
207
+ The inference script should run in less than 20 minutes
208
+
209
+ Make sure your environment and inference script can run on a machine with 2 vCPUs and 8 GB of memory
210
+
211
+ Validator
212
+
213
+ Run the pre-submission validation script before submitting
214
+
215
+ __________________________________________
216
+ SAMPLE INFERENCE SCRIPT:
217
+ ________________________
218
+ Inference Script Example
219
+ ===================================
220
+ MANDATORY
221
+ - Before submitting, ensure the following variables are defined in your environment configuration:
222
+ API_BASE_URL The API endpoint for the LLM.
223
+ MODEL_NAME The model identifier to use for inference.
224
+ HF_TOKEN Your Hugging Face / API key.
225
+
226
+ - The inference script must be named `inference.py` and placed in the root directory of the project
227
+ - Participants must use OpenAI Client for all LLM calls using above variables
228
+ """
229
+
230
+ import os
231
+ import re
232
+ import base64
233
+ import textwrap
234
+ from io import BytesIO
235
+ from typing import List, Optional, Dict
236
+
237
+ from openai import OpenAI
238
+ import numpy as np
239
+ from PIL import Image
240
+
241
+ from browsergym_env import BrowserGymAction, BrowserGymEnv
242
+
243
+ API_BASE_URL = os.getenv("API_BASE_URL") // "https://router.huggingface.co/v1"
244
+ API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
245
+ MODEL_NAME = os.getenv("MODEL_NAME")
246
+ MAX_STEPS = 8
247
+ MAX_DOM_CHARS = 3500
248
+ TEMPERATURE = 0.2
249
+ MAX_TOKENS = 200
250
+ FALLBACK_ACTION = "noop()"
251
+
252
+ DEBUG = True
253
+ ACTION_PREFIX_RE = re.compile(
254
+ r"^(action|next action)\s*[:\-]\s*",
255
+ re.IGNORECASE,
256
+ )
257
+ ACTION_PATTERN = re.compile(r"[A-Za-z_]+\s*\(.*\)", re.DOTALL)
258
+
259
+
260
+ SYSTEM_PROMPT = textwrap.dedent(
261
+ """
262
+ You control a web browser through BrowserGym.
263
+ Reply with exactly one action string.
264
+ The action must be a valid BrowserGym command such as:
265
+ - noop()
266
+ - click('<BID>')
267
+ - type('selector', 'text to enter')
268
+ - fill('selector', 'text to enter')
269
+ - send_keys('Enter')
270
+ - scroll('down')
271
+ Use single quotes around string arguments.
272
+ When clicking, use the BrowserGym element IDs (BIDs) listed in the user message.
273
+ If you are unsure, respond with noop().
274
+ Do not include explanations or additional text.
275
+ """
276
+ ).strip()
277
+
278
+
279
+ def build_history_lines(history: List[str]) -> str:
280
+ if not history:
281
+ return "None"
282
+ return "\n".join(history[-4:])
283
+
284
+
285
+ def extract_screenshot_uri(observation) -> Optional[str]:
286
+ if observation.screenshot is None:
287
+ return None
288
+ screen_array = np.array(observation.screenshot, dtype=np.uint8)
289
+ image = Image.fromarray(screen_array)
290
+ buffer = BytesIO()
291
+ image.save(buffer, format="PNG")
292
+ buffer.seek(0)
293
+ data_uri = base64.b64encode(buffer.read()).decode("utf-8")
294
+ return f"data:image/png;base64,{data_uri}"
295
+
296
+
297
+ def extract_clickable_elements(observation) -> List[Dict[str, str]]:
298
+ """Collect BrowserGym element IDs that can be clicked."""
299
+
300
+ metadata = getattr(observation, "metadata", {}) or {}
301
+ obs_dict = metadata.get("browsergym_obs", {}) or {}
302
+ extra_props = obs_dict.get("extra_element_properties", {}) or {}
303
+
304
+ clickables: List[Dict[str, str]] = []
305
+ for bid, props in extra_props.items():
306
+ if not props.get("clickable"):
307
+ continue
308
+
309
+ bbox = props.get("bbox") or []
310
+ bbox_str = ", ".join(str(value) for value in bbox) if bbox else "?"
311
+ clickables.append(
312
+ {
313
+ "bid": str(bid),
314
+ "bbox": bbox_str,
315
+ }
316
+ )
317
+
318
+ # Keep a stable ordering for readability
319
+ clickables.sort(key=lambda item: item["bid"])
320
+ return clickables
321
+
322
+
323
+ def build_user_prompt(step: int, observation, history: List[str]) -> str:
324
+ goal = observation.goal or "(not provided)"
325
+ url = observation.url or "(unknown)"
326
+ error_note = "Yes" if observation.last_action_error else "No"
327
+
328
+ clickables = extract_clickable_elements(observation)
329
+ if clickables:
330
+ actions_hint = "\n".join(
331
+ f" - {item['bid']} (bbox: {item['bbox']})" for item in clickables
332
+ )
333
+ else:
334
+ actions_hint = " (none detected)"
335
+
336
+ prompt = textwrap.dedent(
337
+ f"""
338
+ Step: {step}
339
+ Goal: {goal}
340
+ Current URL: {url}
341
+ Previous steps:
342
+ {build_history_lines(history)}
343
+ Last action error: {error_note}
344
+ Available clickable element IDs: {actions_hint}
345
+ Reply with exactly one BrowserGym action string.
346
+ """
347
+ ).strip()
348
+ return prompt
349
+
350
+
351
+ def parse_model_action(response_text: str) -> str:
352
+ if not response_text:
353
+ return FALLBACK_ACTION
354
+
355
+ # Prefer the first line that looks like an action string
356
+ lines = response_text.splitlines()
357
+ for raw_line in lines:
358
+ line = raw_line.strip()
359
+ if not line:
360
+ continue
361
+ line = ACTION_PREFIX_RE.sub("", line)
362
+ match = ACTION_PATTERN.search(line)
363
+ if match:
364
+ action = match.group(0).strip()
365
+ # Collapse internal whitespace
366
+ action = re.sub(r"\s+", " ", action)
367
+ # Return the first action-like match as-is. Natural-language click targets
368
+ # are not remapped to BrowserGym element IDs here.
369
+ return action
370
+
371
+ # Fall back to searching the whole response
372
+ match = ACTION_PATTERN.search(response_text)
373
+ if match:
374
+ action = match.group(0).strip()
375
+ action = re.sub(r"\s+", " ", action)
376
+ return action
377
+
378
+ return FALLBACK_ACTION
379
+
380
+
381
+ def main() -> None:
382
+ client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
383
+
384
+ env = BrowserGymEnv.from_docker_image(
385
+ image="browsergym-env:latest",
386
+ env_vars={
387
+ "BROWSERGYM_BENCHMARK": "miniwob",
388
+ "BROWSERGYM_TASK_NAME": "click-test",
389
+ },
390
+ )
391
+
392
+ history: List[str] = []
393
+
394
+ try:
395
+ result = env.reset()
396
+ observation = result.observation
397
+ print(f"Episode goal: {observation.goal}")
398
+
399
+ for step in range(1, MAX_STEPS + 1):
400
+ if result.done:
401
+ print("Environment signalled done. Stopping early.")
402
+ break
403
+
404
+ user_prompt = build_user_prompt(step, observation, history)
405
+ user_content = [{"type": "text", "text": user_prompt}]
406
+ screenshot_uri = extract_screenshot_uri(observation)
407
+ if screenshot_uri:
408
+ user_content.append(
409
+ {
410
+ "type": "image_url",
411
+ "image_url": {"url": screenshot_uri},
412
+ }
413
+ )
414
+
415
+ messages = [
416
+ {
417
+ "role": "system",
418
+ "content": [{"type": "text", "text": SYSTEM_PROMPT}],
419
+ },
420
+ {
421
+ "role": "user",
422
+ "content": user_content,
423
+ },
424
+ ]
425
+
426
+ try:
427
+ completion = client.chat.completions.create(
428
+ model=MODEL_NAME,
429
+ messages=messages,
430
+ temperature=TEMPERATURE,
431
+ max_tokens=MAX_TOKENS,
432
+ stream=False,
433
+ )
434
+ response_text = completion.choices[0].message.content or ""
435
+ # pylint: disable=broad-except
436
+ except Exception as exc: # noqa: BLE001
437
+ failure_msg = f"Model request failed ({exc}). Using fallback action."
438
+ print(failure_msg)
439
+ response_text = FALLBACK_ACTION
440
+
441
+ action_str = parse_model_action(response_text)
442
+ print(f"Step {step}: model suggested -> {action_str}")
443
+
444
+ result = env.step(BrowserGymAction(action_str=action_str))
445
+ observation = result.observation
446
+
447
+ reward = result.reward or 0.0
448
+ error_flag = " ERROR" if observation.last_action_error else ""
449
+ history_line = (
450
+ f"Step {step}: {action_str} -> reward {reward:+.2f}{error_flag}"
451
+ )
452
+ history.append(history_line)
453
+ print(
454
+ " Reward: "
455
+ f"{reward:+.2f} | Done: {result.done} | Last action error: "
456
+ f"{observation.last_action_error}"
457
+ )
458
+
459
+ if result.done:
460
+ print("Episode complete.")
461
+ break
462
+
463
+ else:
464
+ print(f"Reached max steps ({MAX_STEPS}).")
465
+
466
+ finally:
467
+ env.close()
468
+
469
+
470
+ if __name__ == "__main__":
471
+ main()
472
+ ____________________________________
README.md ADDED
@@ -0,0 +1,258 @@
1
+ # IT Helpdesk Ticket Routing OpenEnv
2
+
3
+ > Meta PyTorch OpenEnv Hackathon - Round 1 Submission
4
+ > Team Hackstreet Boys - Roopal Guha Neogi, Suyash Kumar
5
+
6
+ A deterministic, multi-step IT helpdesk ticket routing environment built on the OpenEnv framework. An AI agent receives a small queue of helpdesk tickets and must classify the issue type, estimate priority, assign the correct resolver group, and choose the best next action.
7
+
8
+ ## Why IT Helpdesk Ticket Routing?
9
+
10
+ IT service desks do this work every day:
11
+
12
+ - read a newly created ticket
13
+ - decide what kind of issue it is
14
+ - judge urgency
15
+ - route it to the right team
16
+ - decide whether to fulfill, escalate, assign, ignore, or acknowledge it
17
+
18
+ This makes the domain:
19
+
20
+ - genuinely real-world
21
+ - easy to evaluate deterministically
22
+ - naturally multi-step
23
+ - well aligned with enterprise support and agent-routing workflows
24
+
25
+ ## Architecture
26
+
27
+ ```text
28
+ inference.py
29
+ |
30
+ v
31
+ client.py <----> server/app.py
32
+ |
33
+ v
34
+ server/environment.py
35
+ | | |
36
+ v v v
37
+ grader.py reward.py tasks.py
38
+ |
39
+ v
40
+ data/dataset.json
41
+ ```
42
+
43
+ Key architectural detail:
44
+
45
+ - the environment is designed as a multi-step ticket queue
46
+ - the client path is used for persistent episode flow
47
+ - the environment still follows the standard OpenEnv `reset()`, `step()`, and `state()` interface
48
+
49
+ ## Tasks
50
+
51
+ | ID | Name | Difficulty | Fields Required | Description |
52
+ |----|------|------------|-----------------|-------------|
53
+ | 1 | Issue Type Classification | Easy | `issue_type` | Classify the ticket into the correct IT issue type |
54
+ | 2 | Issue Type And Priority | Medium | `issue_type`, `priority` | Classify the issue and estimate urgency |
55
+ | 3 | Full Ticket Routing | Hard | `issue_type`, `priority`, `assignment_group`, `resolution_action` | Perform full helpdesk routing |
56
+
57
+ ## Action Space
58
+
59
+ The agent submits a `HelpdeskTicketAction`. Only the fields relevant to the current task are scored.
60
+
61
+ ```json
62
+ {
63
+ "issue_type": "billing_license | identity_access | application_support | service_request | spam_phishing | general_inquiry | security_compliance | onboarding | feature_request",
64
+ "priority": "critical | high | medium | low",
65
+ "assignment_group": "license_ops | service_desk | application_team | procurement | security_team | onboarding_ops",
66
+ "resolution_action": "fulfill | escalate | assign | ignore | acknowledge"
67
+ }
68
+ ```
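To make the action schema concrete, the following is a minimal Python sketch of how a submitted action could be checked against the vocabulary above and the per-task field requirements from the Tasks table. `ALLOWED_VALUES`, `TASK_FIELDS`, and `validate_action` are hypothetical names used for illustration only; the repo's actual `HelpdeskTicketAction` model in `models.py` may validate differently.

```python
from typing import Dict, List

# Vocabulary copied from the Action Space JSON above.
ALLOWED_VALUES: Dict[str, set] = {
    "issue_type": {
        "billing_license", "identity_access", "application_support",
        "service_request", "spam_phishing", "general_inquiry",
        "security_compliance", "onboarding", "feature_request",
    },
    "priority": {"critical", "high", "medium", "low"},
    "assignment_group": {
        "license_ops", "service_desk", "application_team",
        "procurement", "security_team", "onboarding_ops",
    },
    "resolution_action": {"fulfill", "escalate", "assign", "ignore", "acknowledge"},
}

# Fields required per task ID, copied from the Tasks table.
TASK_FIELDS: Dict[int, List[str]] = {
    1: ["issue_type"],
    2: ["issue_type", "priority"],
    3: ["issue_type", "priority", "assignment_group", "resolution_action"],
}


def validate_action(task_id: int, action: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the action is usable."""
    errors: List[str] = []
    for field in TASK_FIELDS[task_id]:
        value = action.get(field)
        if value is None:
            errors.append(f"missing field: {field}")
        elif value not in ALLOWED_VALUES[field]:
            errors.append(f"invalid {field}: {value!r}")
    return errors
```

Fields outside the current task are ignored in this sketch, which mirrors the statement that only the fields relevant to the current task are scored.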
69
+
70
+ ## Observation Space
71
+
72
+ Each observation contains:
73
+
74
+ - `task_id`
75
+ - `task_name`
76
+ - `instructions`
77
+ - `allowed_fields`
78
+ - `current_ticket`
79
+ - `queue_size`
80
+ - `tickets_remaining`
81
+ - `tickets_processed`
82
+ - `history`
83
+ - inherited OpenEnv fields such as `done` and `reward`
84
+
85
+ The visible ticket fields are:
86
+
87
+ - `ticket_id`
88
+ - `title`
89
+ - `requester`
90
+ - `description`
91
+
92
+ Ground-truth labels are not exposed to the agent.
93
+
94
+ ## State
95
+
96
+ The internal `HelpdeskTicketState` tracks:
97
+
98
+ - `episode_id`
99
+ - `step_count`
100
+ - `current_task_id`
101
+ - `seed`
102
+ - `queue_ticket_ids`
103
+ - `current_ticket_index`
104
+ - `per_ticket_scores`
105
+ - `total_reward`
106
+
107
+ ## Grading
108
+
109
+ Scoring is deterministic and ranges from `0.0` to `1.0`.
110
+
111
+ ### Per-field logic
112
+
113
+ - `issue_type`: exact match or partial credit for near-miss pairs
114
+ - `priority`: exact match or proximity score
115
+ - `assignment_group`: exact match
116
+ - `resolution_action`: exact match
117
+
118
+ ### Task weights
119
+
120
+ | Task | Issue Type | Priority | Assignment Group | Resolution Action |
121
+ |------|------------|----------|------------------|-------------------|
122
+ | 1 | 100% | - | - | - |
123
+ | 2 | 60% | 40% | - | - |
124
+ | 3 | 35% | 20% | 25% | 20% |
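As a rough illustration of how the weight table could combine into a single per-ticket score, here is a minimal Python sketch. `TASK_WEIGHTS`, `exact_match`, and `score_ticket` are hypothetical names, not taken from `server/grader.py`, and the sketch assumes exact-match scoring for every field, so the partial credit described for `issue_type` and `priority` is deliberately left out.

```python
from typing import Dict, Optional

# Weights copied from the task-weights table; keys are task IDs.
TASK_WEIGHTS: Dict[int, Dict[str, float]] = {
    1: {"issue_type": 1.0},
    2: {"issue_type": 0.6, "priority": 0.4},
    3: {
        "issue_type": 0.35,
        "priority": 0.20,
        "assignment_group": 0.25,
        "resolution_action": 0.20,
    },
}


def exact_match(predicted: Optional[str], expected: str) -> float:
    """Hypothetical field scorer: 1.0 on an exact match, otherwise 0.0."""
    return 1.0 if predicted == expected else 0.0


def score_ticket(task_id: int, action: Dict[str, str], truth: Dict[str, str]) -> float:
    """Weighted sum of per-field scores for one ticket, rounded to 4 places."""
    weights = TASK_WEIGHTS[task_id]
    total = sum(
        weight * exact_match(action.get(field), truth[field])
        for field, weight in weights.items()
    )
    return round(total, 4)
```

Under these assumptions, an action on the hard task that gets everything right except `priority` would score 0.35 + 0.25 + 0.20 = 0.80.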
125
+
126
+ ### Trajectory reward
127
+
128
+ At episode end:
129
+
130
+ ```text
131
+ trajectory_reward = average(per_ticket_scores) - 0.03 * max(0, steps_taken - queue_size)
132
+ ```
133
+
134
+ The result is clamped to `[0.0, 1.0]`.
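The trajectory formula can be sketched in Python as follows. `STEP_PENALTY` and `trajectory_reward` are hypothetical names, not necessarily those used in `server/reward.py`, and returning `0.0` for an empty score list is an assumption the formula itself does not cover.

```python
from typing import Sequence

STEP_PENALTY = 0.03  # penalty per step beyond the queue size, from the formula


def trajectory_reward(
    per_ticket_scores: Sequence[float], steps_taken: int, queue_size: int
) -> float:
    """Average ticket score minus a penalty for extra steps, clamped to [0, 1]."""
    if not per_ticket_scores:
        return 0.0  # assumption: an episode with no scored tickets earns nothing
    average = sum(per_ticket_scores) / len(per_ticket_scores)
    extra_steps = max(0, steps_taken - queue_size)
    raw = average - STEP_PENALTY * extra_steps
    return min(1.0, max(0.0, raw))
```

For example, two tickets scored `1.0` and `0.5` average to `0.75`; finishing in exactly two steps incurs no penalty, while taking four steps subtracts `2 * 0.03`.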
135
+
136
+ ## Dataset
137
+
138
+ `data/dataset.json` contains 45 labeled helpdesk tickets covering:
139
+
140
+ - issue classification
141
+ - access requests
142
+ - application incidents
143
+ - procurement and service requests
144
+ - phishing or spam reports
145
+ - security and compliance work
146
+ - onboarding tickets
147
+ - feature requests
148
+
149
+ The dataset also includes:
150
+
151
+ - ambiguous cases
152
+ - follow-up thread references
153
+ - multiple priority levels
154
+
155
+ ## Project Structure
156
+
157
+ ```text
158
+ server/
159
+ app.py
160
+ environment.py
161
+ grader.py
162
+ reward.py
163
+ tasks.py
164
+ Dockerfile
165
+ data/
166
+ dataset.json
167
+ models.py
168
+ client.py
169
+ inference.py
170
+ openenv.yaml
171
+ pyproject.toml
172
+ requirements.txt
173
+ README.md
174
+ KNOWLEDGE.md
175
+ PLAN.md
176
+ MENTAL_MODEL.md
177
+ ```
178
+
179
+ ## Setup
180
+
181
+ Install dependencies:
182
+
183
+ ```bash
184
+ pip install -r requirements.txt
185
+ ```
186
+
187
+ Start the server:
188
+
189
+ ```bash
190
+ uvicorn server.app:app --host 0.0.0.0 --port 8000
191
+ ```
192
+
193
+ Basic checks:
194
+
195
+ ```bash
196
+ curl http://localhost:8000/health
197
+ curl http://localhost:8000/tasks
198
+ ```
199
+
200
+ ## Running Inference
201
+
202
+ ### LLM mode
203
+
204
+ Set:
205
+
206
+ - `API_BASE_URL`
207
+ - `MODEL_NAME`
208
+ - `HF_TOKEN`
209
+
210
+ Then run:
211
+
212
+ ```bash
213
+ python inference.py
214
+ ```
215
+
216
+ ### Heuristic mode
217
+
218
+ If those variables are not set, the script falls back to a keyword-based ticket router:
219
+
220
+ ```bash
221
+ python inference.py
222
+ ```
223
+
224
+ Optional server target:
225
+
226
+ - `ENV_URL` default: `http://localhost:8000`
227
+
228
+ ## Docker
229
+
230
+ Build and run:
231
+
232
+ ```bash
233
+ docker build -f server/Dockerfile -t helpdesk-ticket-routing .
234
+ docker run -p 7860:7860 helpdesk-ticket-routing
235
+ ```
236
+
237
+ ## API Endpoints
238
+
239
+ OpenEnv auto-generates the main endpoints, and the repo adds `/tasks`.
240
+
241
+ | Method | Path | Description |
242
+ |--------|------|-------------|
243
+ | GET | `/health` | Health check |
244
+ | POST | `/reset` | Start a new episode |
245
+ | POST | `/step` | Submit an action |
246
+ | GET | `/state` | Inspect state |
247
+ | WebSocket | `/ws` | Persistent client channel |
248
+ | GET | `/tasks` | List available tasks |
249
+ | GET | `/docs` | API docs |
250
+
251
+ ## Baseline Status
252
+
253
+ Fresh baseline scores should be recorded after the next validation pass. The recommended order is:
254
+
255
+ 1. run the environment locally
256
+ 2. run the heuristic baseline in `inference.py`
257
+ 3. record per-task and overall scores
258
+ 4. update the docs only after those numbers are verified
ROADMAP.md ADDED
@@ -0,0 +1,339 @@
1
+ # Hackstreet Boys Roadmap
2
+
3
+ ## Team
4
+
5
+ - Team name: Hackstreet Boys
6
+ - Members:
7
+ - Roopal Guha Neogi
8
+ - Suyash Kumar
9
+ - Submission deadline: April 8, 2026, 11:59 PM IST
10
+
11
+ ## Goal
12
+
13
+ Ship a clean, well-documented OpenEnv environment for IT helpdesk ticket routing that:
14
+
15
+ - passes all submission gates
16
+ - scores well on real-world utility
17
+ - has deterministic, defensible grading
18
+ - is easy for judges to understand and rerun
19
+
20
+ ## When You Start Coding
21
+
22
+ Start coding immediately on **March 30, 2026** after a short 30 to 60 minute alignment pass.
23
+
24
+ That first coding session should do only high-leverage foundation work:
25
+
26
+ - lock the exact ticket vocabulary
27
+ - freeze field names in `models.py`
28
+ - confirm task fields in `server/tasks.py`
29
+ - agree on grader labels in `server/grader.py`
30
+ - agree that no one changes schema names casually after this point
31
+
32
+ ### First coding targets on March 30, 2026
33
+
34
+ Roopal should start with:
35
+
36
+ - `data/dataset.json`
37
+ - `server/tasks.py`
38
+ - `server/grader.py`
39
+
40
+ Suyash should start with:
41
+
42
+ - `models.py`
43
+ - `server/environment.py`
44
+ - `inference.py`
45
+
46
+ By the end of the first coding block, both of you should have:
47
+
48
+ - matching field names
49
+ - matching task labels
50
+ - matching issue-type vocabulary
51
+ - no unresolved schema disagreements
52
+
53
+ ## Working Model For Two People
54
+
55
+ The safest way for two people to work separately and merge cleanly is to divide ownership by file groups, not by abstract ideas.
56
+
57
+ ### Roopal ownership
58
+
59
+ - `data/dataset.json`
60
+ - `server/tasks.py`
61
+ - `server/grader.py`
62
+ - `README.md`
63
+ - `KNOWLEDGE.md`
64
+ - `MENTAL_MODEL.md`
65
+
66
+ Primary responsibilities:
67
+
68
+ - dataset quality
69
+ - label consistency
70
+ - task wording
71
+ - grader realism
72
+ - documentation clarity
73
+ - judging-story polish
74
+
75
+ ### Suyash ownership
76
+
77
+ - `models.py`
78
+ - `server/environment.py`
79
+ - `server/app.py`
80
+ - `server/reward.py`
81
+ - `client.py`
82
+ - `inference.py`
83
+ - `openenv.yaml`
84
+ - `server/Dockerfile`
85
+ - `pyproject.toml`
86
+ - `requirements.txt`
87
+
88
+ Primary responsibilities:
89
+
90
+ - runtime correctness
91
+ - OpenEnv interface
92
+ - inference reliability
93
+ - Docker and deployment readiness
94
+ - integration behavior
95
+
96
+ ## Merge Strategy
97
+
98
+ To keep parallel work easy to combine:
99
+
100
+ 1. avoid editing the same file on the same day unless planned
101
+ 2. use one shared terminology list and do not invent alternate labels
102
+ 3. sync once daily with a 10 minute review of:
103
+ - changed files
104
+ - open blockers
105
+ - any schema changes
106
+ 4. freeze the dataset schema early
107
+ 5. freeze the action and observation field names early
108
+
109
+ ## Shared Source Of Truth
110
+
111
+ These files should be treated as authoritative:
112
+
113
+ - `README.md` for the public project story
114
+ - `PLAN.md` for project requirements and definition of done
115
+ - `MENTAL_MODEL.md` for the current system shape
116
+ - `openenv.yaml` for environment metadata
117
+ - `server/tasks.py` and `server/grader.py` for task rules
118
+
119
+ ## AI Usage Policy
120
+
121
+ AI is permitted, so use it aggressively where it saves time, but do not outsource judgment.
122
+
123
+ Good uses of AI:
124
+
125
+ - draft clearer task descriptions
126
+ - propose additional hard-case tickets
127
+ - suggest edge cases and label audits
128
+ - improve prompts in `inference.py`
129
+ - generate test ideas and checklists
130
+ - improve README structure and wording
131
+
132
+ Human review required for:
133
+
134
+ - final dataset labels
135
+ - grader weights and partial-credit rules
136
+ - any claims in README
137
+ - final benchmark numbers
138
+ - submission metadata and deployment settings
139
+
140
+ ## Submission Criteria Checklist
141
+
142
+ ### Must pass
143
+
144
+ - environment starts correctly
145
+ - `reset()`, `step()`, and `state()` behave correctly
146
+ - 3 tasks exist and are meaningfully different
147
+ - grader scores are in `[0.0, 1.0]`
148
+ - `inference.py` runs without error
149
+ - Docker builds and starts
150
+ - docs are complete and current
151
+
152
+ ### Must score well
153
+
154
+ - the task feels like real IT helpdesk work
155
+ - the hard task is genuinely harder
156
+ - the grader gives partial credit in sensible ways
157
+ - the environment is easy to understand and rerun
158
+
159
+ ## Timeline
160
+
161
+ ### March 30, 2026
162
+
163
+ - lock team name, domain, and vocabulary
164
+ - finish repo cleanup
165
+ - agree on ownership split
166
+ - start coding the core schema and task logic immediately after the vocabulary lock
167
+ - target a same-day checkpoint on:
168
+ - `models.py`
169
+ - `server/tasks.py`
170
+ - `server/grader.py`
171
+ - `server/environment.py`
172
+
173
+ ### March 31, 2026
174
+
175
+ Roopal:
176
+
177
+ - audit `data/dataset.json` labels end to end
178
+ - tighten ambiguous cases
179
+ - review task wording in `server/tasks.py`
180
+ - continue code work in `server/grader.py` if partial-credit tuning is still needed
181
+
182
+ Suyash:
183
+
184
+ - sanity-check `models.py`, `server/environment.py`, and `client.py`
185
+ - check that the field names align everywhere
186
+ - continue code work in `inference.py` and `server/app.py`
187
+
188
+ Shared checkpoint:
189
+
190
+ - confirm no schema changes are still pending
191
+
192
+ ### April 1, 2026
193
+
194
+ Roopal:
195
+
196
+ - polish `server/grader.py`
197
+ - confirm hard-task logic and partial-credit behavior
198
+ - finish any remaining dataset label corrections
199
+
200
+ Suyash:
201
+
202
+ - polish `inference.py`
203
+ - confirm heuristic mode uses the new ticket vocabulary consistently
204
+ - finish runtime code adjustments in `client.py`, `server/app.py`, and `server/reward.py`
205
+
206
+ Shared checkpoint:
207
+
208
+ - agree on the exact labels and examples used in docs
209
+
210
+ ### April 2, 2026
211
+
212
+ Roopal:
213
+
214
+ - improve `README.md`
215
+ - improve `KNOWLEDGE.md`
216
+
217
+ Suyash:
218
+
219
+ - validate `openenv.yaml`
220
+ - validate `server/Dockerfile`
221
+ - validate dependency files
222
+
223
+ Shared checkpoint:
224
+
225
+ - ensure docs and code tell the same story
226
+
227
+ ### April 3, 2026
228
+
229
+ Roopal:
230
+
231
+ - do a dataset realism pass
232
+ - make sure examples clearly cover easy, medium, and hard cases
233
+
234
+ Suyash:
235
+
236
+ - perform the first full local runtime pass
237
+ - run heuristic inference
238
+ - note bugs or schema mismatches
239
+
240
+ Shared checkpoint:
241
+
242
+ - bug triage and fix list
243
+
244
+ ### Practical coding rule
245
+
246
+ If you are wondering "should we still be planning or should we code now?", the answer is:
247
+
248
+ - **March 30 to April 4, 2026 = active coding and fixes**
249
+ - **April 5 to April 6, 2026 = validation, docs, and score recording**
250
+ - **April 7 to April 8, 2026 = freeze, smoke tests, and submission**
251
+
252
+ ### April 4, 2026
253
+
254
+ Roopal:
255
+
256
+ - fix data, wording, and documentation issues from runtime feedback
257
+
258
+ Suyash:
259
+
260
+ - fix environment, inference, and Docker issues from runtime feedback
261
+
262
+ Shared checkpoint:
263
+
264
+ - second full local run
265
+
266
+ ### April 5, 2026
267
+
268
+ Roopal:
269
+
270
+ - finalize README and knowledge docs
271
+ - prepare a concise judge-facing explanation of the domain
272
+
273
+ Suyash:
274
+
275
+ - confirm Docker flow
276
+ - confirm all required env vars are documented and handled
277
+
278
+ Shared checkpoint:
279
+
280
+ - record benchmark numbers if stable
281
+
282
+ ### April 6, 2026
283
+
284
+ - full dry run from a clean copy if possible
285
+ - verify every required file is present
286
+ - check for stale claims and outdated wording
287
+
288
+ ### April 7, 2026
289
+
290
+ - freeze feature changes
291
+ - only bug fixes, validation, and submission packaging
292
+ - verify final docs, metadata, and benchmark numbers
293
+
294
+ ### April 8, 2026
295
+
296
+ - do one last deployment and smoke test early in the day
297
+ - stop risky edits several hours before deadline
298
+ - submit before 11:59 PM IST
299
+
300
+ ## Integration Rules
301
+
302
+ To keep merges painless:
303
+
304
+ 1. do not rename schemas after April 1, 2026
305
+ 2. do not change task labels after April 2, 2026 without both agreeing
306
+ 3. do not edit ownership files casually
307
+ 4. if one person must touch the other person's file, call it out before doing it
308
+ 5. keep a short daily changelog in chat or a shared note
309
+
310
+ ## Definition Of Done For Each Member
311
+
312
+ ### Roopal done means
313
+
314
+ - dataset labels are internally consistent
315
+ - docs are submission-ready
316
+ - the hard task feels meaningfully harder than the easy and medium tasks
317
+
318
+ ### Suyash done means
319
+
320
+ - the environment runs end to end
321
+ - the inference script works in heuristic mode
322
+ - Docker and metadata are in good shape
323
+
324
+ ## Final Two-Day Priority Order
325
+
326
+ If time gets tight, prioritize in this exact order:
327
+
328
+ 1. working environment
329
+ 2. working inference script
330
+ 3. valid grader and tasks
331
+ 4. Docker and metadata
332
+ 5. README clarity
333
+ 6. extra polish
334
+
335
+ ## Simple Rule To Remember
336
+
337
+ Roopal owns the story and the labels.
338
+ Suyash owns the runtime and the rails.
339
+ Both review the final submission together.
__init__.cpython-313.pyc ADDED
Binary file (166 Bytes).
__init__.py ADDED
File without changes
app.cpython-313.pyc ADDED
Binary file (1.54 kB).
client.cpython-313.pyc ADDED
Binary file (1.86 kB).
client.py ADDED
@@ -0,0 +1,28 @@
1
+ from __future__ import annotations
2
+
3
+ from typing import Any, Dict, Optional
4
+
5
+ from openenv.core.env_client import EnvClient, StepResult
6
+
7
+ from models import HelpdeskTicketAction, HelpdeskTicketObservation, HelpdeskTicketState
8
+
9
+
10
+ class HelpdeskTicketEnvClient(
11
+ EnvClient[HelpdeskTicketAction, HelpdeskTicketObservation, HelpdeskTicketState]
12
+ ):
13
+ def _step_payload(self, action: HelpdeskTicketAction) -> Dict[str, Any]:
14
+ return action.model_dump(exclude_none=True)
15
+
16
+ def _parse_result(
17
+ self, payload: Dict[str, Any]
18
+ ) -> StepResult[HelpdeskTicketObservation]:
19
+ obs_data = payload.get("observation", payload)
20
+ obs = HelpdeskTicketObservation.model_validate(obs_data)
21
+ return StepResult(
22
+ observation=obs,
23
+ reward=payload.get("reward", obs.reward),
24
+ done=payload.get("done", obs.done),
25
+ )
26
+
27
+ def _parse_state(self, payload: Dict[str, Any]) -> HelpdeskTicketState:
28
+ return HelpdeskTicketState.model_validate(payload)
data/dataset.json ADDED
@@ -0,0 +1,543 @@
1
+ [
2
+ {
3
+ "ticket_id": "ticket-001",
4
+ "title": "Urgent: customer charged twice for March invoice",
5
+ "requester": "ap@northstar-retail.com",
6
+ "description": "Our finance team found two charges on the same invoice and needs a refund processed today.",
7
+ "issue_type": "billing_license",
8
+ "priority": "high",
9
+ "assignment_group": "license_ops",
10
+ "resolution_action": "escalate",
11
+ "ambiguity_note": null,
12
+ "related_ticket_id": null
13
+ },
14
+ {
15
+ "ticket_id": "ticket-002",
16
+ "title": "Can not sign in after 2FA reset",
17
+ "requester": "ops@laneeight.io",
18
+ "description": "I was forced to reset 2FA and now the account stays locked even with the backup code.",
19
+ "issue_type": "identity_access",
20
+ "priority": "high",
21
+ "assignment_group": "service_desk",
22
+ "resolution_action": "fulfill",
23
+ "ambiguity_note": null,
24
+ "related_ticket_id": null
25
+ },
26
+ {
27
+ "ticket_id": "ticket-003",
28
+ "title": "Production checkout throwing null reference exception",
29
+ "requester": "sre@paperkite.dev",
30
+ "description": "Customers can not complete payment in production. This is blocking revenue right now.",
31
+ "issue_type": "application_support",
32
+ "priority": "critical",
33
+ "assignment_group": "application_team",
34
+ "resolution_action": "escalate",
35
+ "ambiguity_note": null,
36
+ "related_ticket_id": null
37
+ },
38
+ {
39
+ "ticket_id": "ticket-004",
40
+ "title": "Requesting pricing for 300-seat rollout",
41
+ "requester": "procurement@solsticehealth.org",
42
+ "description": "We are evaluating vendors and want a quote for an enterprise rollout next quarter.",
43
+ "issue_type": "service_request",
44
+ "priority": "medium",
45
+ "assignment_group": "procurement",
46
+ "resolution_action": "assign",
47
+ "ambiguity_note": null,
48
+ "related_ticket_id": null
49
+ },
50
+ {
51
+ "ticket_id": "ticket-005",
52
+ "title": "Guaranteed crypto income from home",
53
+ "requester": "promo@fastwealth.example",
54
+ "description": "Limited time offer. Click now to multiply your income and unsubscribe never.",
55
+ "issue_type": "spam_phishing",
56
+ "priority": "low",
57
+ "assignment_group": "security_team",
58
+ "resolution_action": "ignore",
59
+ "ambiguity_note": null,
60
+ "related_ticket_id": null
61
+ },
62
+ {
63
+ "ticket_id": "ticket-006",
64
+ "title": "Refund still missing for canceled annual plan",
65
+ "requester": "controller@redcedar.co",
66
+ "description": "We canceled three weeks ago and the refund has not arrived. Please confirm status.",
67
+ "issue_type": "billing_license",
68
+ "priority": "medium",
69
+ "assignment_group": "license_ops",
70
+ "resolution_action": "fulfill",
71
+ "ambiguity_note": null,
72
+ "related_ticket_id": null
73
+ },
74
+ {
75
+ "ticket_id": "ticket-007",
76
+ "title": "GDPR data deletion request — 30 day deadline",
77
+ "requester": "legal@eurocorp.de",
78
+ "description": "Per GDPR Article 17, we request deletion of all personal data associated with our account within 30 days. Failure to comply may result in regulatory action.",
79
+ "issue_type": "security_compliance",
80
+ "priority": "critical",
81
+ "assignment_group": "security_team",
82
+ "resolution_action": "escalate",
83
+ "ambiguity_note": null,
84
+ "related_ticket_id": null
85
+ },
86
+ {
87
+ "ticket_id": "ticket-008",
88
+ "title": "Welcome aboard — getting started with your new account",
89
+ "requester": "success@brightpath.io",
90
+ "description": "Thanks for signing up! We\u0027d like to schedule an onboarding call this week. What time works for your team?",
91
+ "issue_type": "onboarding",
92
+ "priority": "medium",
93
+ "assignment_group": "onboarding_ops",
94
+ "resolution_action": "fulfill",
95
+ "ambiguity_note": null,
96
+ "related_ticket_id": null
97
+ },
98
+ {
99
+ "ticket_id": "ticket-009",
100
+ "title": "Feature suggestion: dark mode for dashboard",
101
+ "requester": "ux-team@designhub.co",
102
+ "description": "Our users have been requesting dark mode for months. Would love to see this on the roadmap.",
103
+ "issue_type": "feature_request",
104
+ "priority": "low",
105
+ "assignment_group": "application_team",
106
+ "resolution_action": "acknowledge",
107
+ "ambiguity_note": null,
108
+ "related_ticket_id": null
109
+ },
110
+ {
111
+ "ticket_id": "ticket-010",
112
+ "title": "Password reset link expired before I could use it",
113
+ "requester": "jsmith@midtownlogistics.com",
114
+ "description": "I requested a password reset but by the time I checked my email the link had expired. Can you send a new one?",
115
+ "issue_type": "identity_access",
116
+ "priority": "medium",
117
+ "assignment_group": "service_desk",
118
+ "resolution_action": "fulfill",
119
+ "ambiguity_note": null,
120
+ "related_ticket_id": null
121
+ },
122
+ {
123
+ "ticket_id": "ticket-011",
124
+ "title": "API rate limiting causing data sync failures",
125
+ "requester": "devops@streamline.app",
126
+ "description": "Our integration is hitting 429 errors every hour during peak load. We need the rate limit raised or a bulk endpoint.",
127
+ "issue_type": "application_support",
128
+ "priority": "high",
129
+ "assignment_group": "application_team",
130
+ "resolution_action": "escalate",
131
+ "ambiguity_note": null,
132
+ "related_ticket_id": null
133
+ },
134
+ {
135
+ "ticket_id": "ticket-012",
136
+ "title": "Interested in a live demo for our leadership team",
137
+ "requester": "cto@nexwave.io",
138
+ "description": "We have budget allocated for Q3 and would like a 30-minute demo with our CTO and VP Eng.",
139
+ "issue_type": "service_request",
140
+ "priority": "high",
141
+ "assignment_group": "procurement",
142
+ "resolution_action": "assign",
143
+ "ambiguity_note": null,
144
+ "related_ticket_id": null
145
+ },
146
+ {
147
+ "ticket_id": "ticket-013",
148
+ "title": "Free vacation giveaway — claim your prize",
149
+ "requester": "offers@tropicaldeals.example",
150
+ "description": "Congratulations! You have been selected for an all-expenses-paid trip. Click here immediately.",
151
+ "issue_type": "spam_phishing",
152
+ "priority": "low",
153
+ "assignment_group": "security_team",
154
+ "resolution_action": "ignore",
155
+ "ambiguity_note": null,
156
+ "related_ticket_id": null
157
+ },
158
+ {
159
+ "ticket_id": "ticket-014",
160
+ "title": "Audit report findings — action required by Friday",
161
+ "requester": "audit@compliancepartners.com",
162
+ "description": "The SOC2 audit uncovered three medium-severity findings. Remediation evidence is due by end of week.",
163
+ "issue_type": "security_compliance",
164
+ "priority": "high",
165
+ "assignment_group": "security_team",
166
+ "resolution_action": "escalate",
167
+ "ambiguity_note": null,
168
+ "related_ticket_id": null
169
+ },
170
+ {
171
+ "ticket_id": "ticket-015",
172
+ "title": "Invoice discrepancy for order #4821",
173
+ "requester": "accounts@meridianfoods.com",
174
+ "description": "The invoice total doesn\u0027t match our purchase order. There\u0027s a $2,400 overcharge on the line items.",
175
+ "issue_type": "billing_license",
176
+ "priority": "high",
177
+ "assignment_group": "license_ops",
178
+ "resolution_action": "fulfill",
179
+ "ambiguity_note": null,
180
+ "related_ticket_id": null
181
+ },
182
+ {
183
+ "ticket_id": "ticket-016",
184
+ "title": "New hire onboarding checklist incomplete",
185
+ "requester": "hr@talentbridge.co",
186
+ "description": "Three new engineers start Monday and their accounts haven\u0027t been provisioned yet. Please expedite.",
187
+ "issue_type": "onboarding",
188
+ "priority": "high",
189
+ "assignment_group": "onboarding_ops",
190
+ "resolution_action": "fulfill",
191
+ "ambiguity_note": null,
192
+ "related_ticket_id": null
193
+ },
194
+ {
195
+ "ticket_id": "ticket-017",
196
+ "title": "Dashboard latency is unacceptable",
197
+ "requester": "ops-lead@fastfreight.com",
198
+ "description": "Pages are taking 12+ seconds to load. This is impacting our dispatchers during peak hours. We need this fixed ASAP.",
199
+ "issue_type": "application_support",
200
+ "priority": "critical",
201
+ "assignment_group": "application_team",
202
+ "resolution_action": "escalate",
203
+ "ambiguity_note": null,
204
+ "related_ticket_id": null
205
+ },
206
+ {
207
+ "ticket_id": "ticket-018",
208
+ "title": "Question about enterprise tier pricing",
209
+ "requester": "finance@urbanstack.io",
210
+ "description": "We\u0027re comparing your enterprise plan against two competitors. Can you send over a detailed pricing breakdown?",
211
+ "issue_type": "service_request",
212
+ "priority": "medium",
213
+ "assignment_group": "procurement",
214
+ "resolution_action": "assign",
215
+ "ambiguity_note": null,
216
+ "related_ticket_id": null
217
+ },
218
+ {
219
+ "ticket_id": "ticket-019",
220
+ "title": "Make $5000/week with this one simple trick",
221
+ "requester": "noreply@quickcash.example",
222
+ "description": "No experience needed. Start earning today. Limited spots available. Act now before it\u0027s too late.",
223
+ "issue_type": "spam_phishing",
224
+ "priority": "low",
225
+ "assignment_group": "security_team",
226
+ "resolution_action": "ignore",
227
+ "ambiguity_note": null,
228
+ "related_ticket_id": null
229
+ },
230
+ {
231
+ "ticket_id": "ticket-020",
232
+ "title": "General inquiry about your platform capabilities",
233
+ "requester": "info@greenleaf.org",
234
+ "description": "Hi, I stumbled across your website and was curious about what your platform does. Can you send some information?",
235
+ "issue_type": "general_inquiry",
236
+ "priority": "low",
237
+ "assignment_group": "service_desk",
238
+ "resolution_action": "acknowledge",
239
+ "ambiguity_note": null,
240
+ "related_ticket_id": null
241
+ },
242
+ {
243
+ "ticket_id": "ticket-021",
244
+ "title": "Re: Production checkout throwing null reference exception",
245
+ "requester": "sre@paperkite.dev",
246
+ "description": "Following up on ticket-003. The hotfix was deployed but we\u0027re seeing a regression in staging. Same null reference on the payment confirmation page. This is still blocking.",
247
+ "issue_type": "application_support",
248
+ "priority": "critical",
249
+ "assignment_group": "application_team",
250
+ "resolution_action": "escalate",
251
+ "ambiguity_note": null,
252
+ "related_ticket_id": "ticket-003"
253
+ },
254
+ {
255
+ "ticket_id": "ticket-022",
256
+ "title": "Usage charge dispute tied to API failures",
257
+ "requester": "admin@crossfitbayarea.com",
258
+ "description": "Our usage charges increased while the integration returned 500 errors for two weeks. We need both charge review and API investigation before approving the invoice.",
259
+ "issue_type": "application_support",
260
+ "priority": "high",
261
+ "assignment_group": "application_team",
262
+ "resolution_action": "escalate",
263
+ "ambiguity_note": "Mentions billing, but the root cause is an application issue. The issue type could reasonably be billing_license or application_support.",
264
+ "related_ticket_id": null
265
+ },
266
+ {
267
+ "ticket_id": "ticket-023",
268
+ "title": "Cancel subscription and process final refund",
269
+ "requester": "ops@smallbatch.co",
270
+ "description": "We\u0027ve decided to go with another vendor. Please cancel our subscription effective immediately and refund the remaining balance on our annual plan.",
271
+ "issue_type": "billing_license",
272
+ "priority": "medium",
273
+ "assignment_group": "license_ops",
274
+ "resolution_action": "fulfill",
275
+ "ambiguity_note": null,
276
+ "related_ticket_id": null
277
+ },
278
+ {
279
+ "ticket_id": "ticket-024",
280
+ "title": "SSO configuration failing silently",
281
+ "requester": "it@megacorp.com",
282
+ "description": "We configured SAML SSO per your docs but users get redirected to a blank page. No error messages. This is affecting 2000+ employees.",
283
+ "issue_type": "application_support",
284
+ "priority": "critical",
285
+ "assignment_group": "application_team",
286
+ "resolution_action": "escalate",
287
+ "ambiguity_note": null,
288
+ "related_ticket_id": null
289
+ },
290
+ {
291
+ "ticket_id": "ticket-025",
292
+ "title": "Data residency requirements for EU deployment",
293
+ "requester": "dpo@nordicbank.fi",
294
+ "description": "We need confirmation that all data for EU customers is stored within EU borders. Please provide your data processing addendum.",
295
+ "issue_type": "security_compliance",
296
+ "priority": "high",
297
+ "assignment_group": "security_team",
298
+ "resolution_action": "fulfill",
299
+ "ambiguity_note": null,
300
+ "related_ticket_id": null
301
+ },
302
+ {
303
+ "ticket_id": "ticket-026",
304
+ "title": "Positive feedback on recent API support case",
305
+ "requester": "pm@littlefox.dev",
306
+ "description": "Sharing positive feedback after last week\u0027s API support case. No action is needed beyond acknowledging the note and logging the feedback.",
307
+ "issue_type": "general_inquiry",
308
+ "priority": "low",
309
+ "assignment_group": "service_desk",
310
+ "resolution_action": "acknowledge",
311
+ "ambiguity_note": null,
312
+ "related_ticket_id": null
313
+ },
314
+ {
315
+ "ticket_id": "ticket-027",
316
+ "title": "Vendor upgrade offer for Premium tier",
317
+ "requester": "marketing@legitsaas.com",
318
+ "description": "A current vendor sent a 30% Premium-tier offer that expires in 48 hours. The team is unsure whether this should just be acknowledged or routed for procurement review.",
319
+ "issue_type": "general_inquiry",
320
+ "priority": "low",
321
+ "assignment_group": "service_desk",
322
+ "resolution_action": "acknowledge",
323
+ "ambiguity_note": "Could be treated as general_inquiry or escalated into a service_request if procurement wants to review the offer.",
324
+ "related_ticket_id": null
325
+ },
326
+ {
327
+ "ticket_id": "ticket-028",
328
+ "title": "Webhook delivery failures since Tuesday",
329
+ "requester": "backend@paystream.io",
330
+ "description": "Our webhook endpoint hasn\u0027t received any events since Tuesday. We\u0027ve verified our server is up. Is there an outage on your side?",
331
+ "issue_type": "application_support",
332
+ "priority": "high",
333
+ "assignment_group": "application_team",
334
+ "resolution_action": "fulfill",
335
+ "ambiguity_note": null,
336
+ "related_ticket_id": null
337
+ },
338
+ {
339
+ "ticket_id": "ticket-029",
340
+ "title": "Seat expansion request with prorating question",
341
+ "requester": "admin@growthworks.co",
342
+ "description": "Our team needs 50 additional seats immediately. We also need to know how prorating will be handled before the change is approved.",
343
+ "issue_type": "service_request",
344
+ "priority": "medium",
345
+ "assignment_group": "procurement",
346
+ "resolution_action": "assign",
347
+ "ambiguity_note": "Could be billing_license (prorating) or service_request (seat expansion).",
348
+ "related_ticket_id": null
349
+ },
350
+ {
351
+ "ticket_id": "ticket-030",
352
+ "title": "Account suspended without warning",
353
+ "requester": "ceo@startupxyz.io",
354
+ "description": "Our entire company account was suspended this morning with no prior notice. We have 80 employees locked out. This is unacceptable and needs immediate resolution.",
355
+ "issue_type": "identity_access",
356
+ "priority": "critical",
357
+ "assignment_group": "service_desk",
358
+ "resolution_action": "escalate",
359
+ "ambiguity_note": null,
360
+ "related_ticket_id": null
361
+ },
362
+ {
363
+ "ticket_id": "ticket-031",
364
+ "title": "Payment method update required",
365
+ "requester": "billing@yourplatform.com",
366
+ "description": "The credit card on file for account #7829 expired last month. We attempted to charge three times without success. Please update your payment method to avoid service interruption.",
367
+ "issue_type": "billing_license",
368
+ "priority": "medium",
369
+ "assignment_group": "license_ops",
370
+ "resolution_action": "fulfill",
371
+ "ambiguity_note": null,
372
+ "related_ticket_id": null
373
+ },
374
+ {
375
+ "ticket_id": "ticket-032",
376
+ "title": "Penetration test results — critical vulnerabilities found",
377
+ "requester": "security@redteam-auditors.com",
378
+ "description": "Our pentest revealed two critical and five high-severity vulnerabilities in your API endpoints. Full report attached. Remediation should begin immediately.",
379
+ "issue_type": "security_compliance",
380
+ "priority": "critical",
381
+ "assignment_group": "security_team",
382
+ "resolution_action": "escalate",
383
+ "ambiguity_note": null,
384
+ "related_ticket_id": null
385
+ },
386
+ {
387
+ "ticket_id": "ticket-033",
388
+ "title": "Getting started guide seems outdated",
389
+ "requester": "newuser@freshstart.io",
390
+ "description": "I just signed up yesterday and the getting started guide references features that don\u0027t seem to exist in the current UI. Can you point me to updated docs?",
391
+ "issue_type": "onboarding",
392
+ "priority": "medium",
393
+ "assignment_group": "onboarding_ops",
394
+ "resolution_action": "fulfill",
395
+ "ambiguity_note": null,
396
+ "related_ticket_id": null
397
+ },
398
+ {
399
+ "ticket_id": "ticket-034",
400
+ "title": "Mobile app crashes on launch after latest update",
401
+ "requester": "qa@betatesters.org",
402
+ "description": "Version 4.2.1 crashes immediately on iOS 18. Reproducible on iPhone 15 and 16. Stack trace included below.",
403
+ "issue_type": "application_support",
404
+ "priority": "high",
405
+ "assignment_group": "application_team",
406
+ "resolution_action": "fulfill",
407
+ "ambiguity_note": null,
408
+ "related_ticket_id": null
409
+ },
410
+ {
411
+ "ticket_id": "ticket-035",
412
+ "title": "Wire transfer for annual enterprise contract",
413
+ "requester": "treasury@bigbank.com",
414
+ "description": "We\u0027ve initiated a wire transfer of $240,000 for the annual enterprise contract. Please confirm receipt and send the signed agreement.",
415
+ "issue_type": "billing_license",
416
+ "priority": "high",
417
+ "assignment_group": "license_ops",
418
+ "resolution_action": "fulfill",
419
+ "ambiguity_note": null,
420
+ "related_ticket_id": null
421
+ },
422
+ {
423
+ "ticket_id": "ticket-036",
424
+ "title": "Can we get API access for a proof of concept?",
425
+ "requester": "architect@cloudnine.tech",
426
+ "description": "We are evaluating your platform for a large migration project. Is there a sandbox or trial API we can use for a 2-week proof of concept?",
427
+ "issue_type": "service_request",
428
+ "priority": "medium",
429
+ "assignment_group": "procurement",
430
+ "resolution_action": "assign",
431
+ "ambiguity_note": null,
432
+ "related_ticket_id": null
433
+ },
434
+ {
435
+ "ticket_id": "ticket-037",
436
+ "title": "Earn a degree in just 2 weeks!",
437
+ "requester": "admissions@diplomamill.example",
438
+ "description": "No exams, no classes. Get your accredited degree today. Reply for more information.",
439
+ "issue_type": "spam_phishing",
440
+ "priority": "low",
441
+ "assignment_group": "security_team",
442
+ "resolution_action": "ignore",
443
+ "ambiguity_note": null,
444
+ "related_ticket_id": null
445
+ },
446
+ {
447
+ "ticket_id": "ticket-038",
448
+ "title": "Re: Invoice discrepancy for order #4821",
449
+ "requester": "accounts@meridianfoods.com",
450
+ "description": "Following up on ticket-015. We still haven\u0027t received the corrected invoice. Our payment is now 15 days overdue because of this. Please prioritize.",
451
+ "issue_type": "billing_license",
452
+ "priority": "critical",
453
+ "assignment_group": "license_ops",
454
+ "resolution_action": "escalate",
455
+ "ambiguity_note": null,
456
+ "related_ticket_id": "ticket-015"
457
+ },
458
+ {
459
+ "ticket_id": "ticket-039",
460
+ "title": "MFA enrollment mandatory for all users by EOD Friday",
461
+ "requester": "security@internal.corp",
462
+ "description": "Per our updated security policy, all user accounts must have MFA enabled by end of day Friday. Non-compliant accounts will be suspended.",
463
+ "issue_type": "security_compliance",
464
+ "priority": "high",
465
+ "assignment_group": "security_team",
466
+ "resolution_action": "fulfill",
467
+ "ambiguity_note": null,
468
+ "related_ticket_id": null
469
+ },
470
+ {
471
+ "ticket_id": "ticket-040",
472
+ "title": "Reporting module needs better export options",
473
+ "requester": "analyst@datacrunchers.co",
474
+ "description": "CSV export exists, but the team also needs Excel and PDF with date filters. This blocks monthly reporting and could be interpreted as either a feature gap or an application-support issue.",
475
+ "issue_type": "feature_request",
476
+ "priority": "medium",
477
+ "assignment_group": "application_team",
478
+ "resolution_action": "acknowledge",
479
+ "ambiguity_note": "Could be feature_request or application_support depending on urgency interpretation.",
480
+ "related_ticket_id": null
481
+ },
482
+ {
483
+ "ticket_id": "ticket-041",
484
+ "title": "Account access request for new contractor",
485
+ "requester": "pm@buildit.agency",
486
+ "description": "We have a new contractor starting next week who needs read-only access to our project dashboard. Please set up their account.",
487
+ "issue_type": "onboarding",
488
+ "priority": "medium",
489
+ "assignment_group": "onboarding_ops",
490
+ "resolution_action": "fulfill",
491
+ "ambiguity_note": null,
492
+ "related_ticket_id": null
493
+ },
494
+ {
495
+ "ticket_id": "ticket-042",
496
+ "title": "Database migration script failing on large tables",
497
+ "requester": "dba@megastore.com",
498
+ "description": "The v3 to v4 migration script times out on tables with more than 10M rows. We have three such tables. Need guidance or a fix.",
499
+ "issue_type": "application_support",
500
+ "priority": "high",
501
+ "assignment_group": "application_team",
502
+ "resolution_action": "fulfill",
503
+ "ambiguity_note": null,
504
+ "related_ticket_id": null
505
+ },
506
+ {
507
+ "ticket_id": "ticket-043",
508
+ "title": "Negotiate volume discount for 1000+ licenses",
509
+ "requester": "procurement@globalcorp.com",
510
+ "description": "We\u0027re looking to standardize on your platform across all subsidiaries. Approximately 1200 seats. What volume discount can you offer?",
511
+ "issue_type": "service_request",
512
+ "priority": "high",
513
+ "assignment_group": "procurement",
514
+ "resolution_action": "assign",
515
+ "ambiguity_note": null,
516
+ "related_ticket_id": null
517
+ },
518
+ {
519
+ "ticket_id": "ticket-044",
520
+ "title": "Your account has been compromised — act now",
521
+ "requester": "security-alert@phishing.example",
522
+ "description": "We detected unusual activity on your account. Click the link below to verify your identity and secure your account immediately.",
523
+ "issue_type": "spam_phishing",
524
+ "priority": "low",
525
+ "assignment_group": "security_team",
526
+ "resolution_action": "ignore",
527
+ "ambiguity_note": null,
528
+ "related_ticket_id": null
529
+ },
530
+ {
531
+ "ticket_id": "ticket-045",
532
+ "title": "Re: Account suspended without warning",
533
+ "requester": "ceo@startupxyz.io",
534
+ "description": "This is my third update about this in 24 hours. 80 people are still locked out. If this isn\u0027t resolved in the next 2 hours we\u0027re escalating to legal. Reference ticket-030.",
535
+ "issue_type": "identity_access",
536
+ "priority": "critical",
537
+ "assignment_group": "service_desk",
538
+ "resolution_action": "escalate",
539
+ "ambiguity_note": null,
540
+ "related_ticket_id": "ticket-030"
541
+ }
542
+ ]
543
+
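Several records in the dataset above are follow-ups that point at an earlier ticket through `related_ticket_id` (for example `ticket-038` references `ticket-015`). The following is a minimal sketch of a consistency check for those links. The helper name `dangling_references` and the abbreviated `SAMPLE` list are assumptions made for illustration; they are not part of the repository.

```python
import json

# Two records abbreviated from the dataset; only the fields the check needs.
SAMPLE = json.loads("""
[
  {"ticket_id": "ticket-015", "issue_type": "billing_license",
   "priority": "high", "related_ticket_id": null},
  {"ticket_id": "ticket-038", "issue_type": "billing_license",
   "priority": "critical", "related_ticket_id": "ticket-015"}
]
""")


def dangling_references(tickets):
    """Return ticket_ids whose related_ticket_id is not present in the list."""
    known = {t["ticket_id"] for t in tickets}
    return [
        t["ticket_id"]
        for t in tickets
        if t.get("related_ticket_id") is not None
        and t["related_ticket_id"] not in known
    ]


if __name__ == "__main__":
    # Both tickets are present, so no reference is left dangling.
    print(dangling_references(SAMPLE))
```

Running the same helper over a partial queue (say, only the follow-up ticket) would report that ticket as dangling, which may be useful when a seeded episode samples a subset of the dataset.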
environment.cpython-313.pyc ADDED
Binary file (6.66 kB).
 
grader.cpython-313.pyc ADDED
Binary file (3.25 kB).
 
inference.py ADDED
@@ -0,0 +1,276 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Inference script for the IT Helpdesk Ticket Routing OpenEnv environment.
4
+
5
+ Uses the competition-mandated environment variables:
6
+ API_BASE_URL - LLM provider base URL
7
+ MODEL_NAME - model identifier
8
+ HF_TOKEN - authentication token
9
+
10
+ Can run against a local server (default http://localhost:8000) or a
11
+ remote HuggingFace Space URL passed via ENV_URL.
12
+
13
+ Uses the WebSocket-based EnvClient for multi-step episodes.
14
+ """
15
+ from __future__ import annotations
16
+
17
+ import json
18
+ import os
19
+
20
+ import httpx
21
+ from openai import OpenAI
22
+
23
+ from client import HelpdeskTicketEnvClient
24
+ from models import HelpdeskTicketAction
25
+ from vocabulary import (
26
+ ASSIGNMENT_GROUPS,
27
+ ISSUE_TYPES,
28
+ ISSUE_TYPE_TO_ASSIGNMENT_GROUP,
29
+ ISSUE_TYPE_TO_RESOLUTION_ACTION,
30
+ PRIORITIES,
31
+ RESOLUTION_ACTIONS,
32
+ TASK_IDS,
33
+ )
34
+
35
+ # ---------------------------------------------------------------------------
36
+ # Configuration
37
+ # ---------------------------------------------------------------------------
38
+ API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
39
+ MODEL_NAME = os.getenv("MODEL_NAME", "")
40
+ HF_TOKEN = os.getenv("HF_TOKEN", "")
41
+ ENV_URL = os.getenv("ENV_URL", "http://localhost:8000")
42
+
43
+ SEED = 42
44
+ TASKS = list(TASK_IDS)
45
+
46
+ # ---------------------------------------------------------------------------
47
+ # LLM helper
48
+ # ---------------------------------------------------------------------------
49
+
50
+ llm_client: OpenAI | None = None
51
+
52
+ if MODEL_NAME and HF_TOKEN:
53
+ llm_client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)
54
+
55
+
56
+ SYSTEM_PROMPT = """\
57
+ You are an expert IT helpdesk ticket routing agent. Given a helpdesk ticket, you must produce a JSON object with the requested fields.
58
+
59
+ Valid values:
60
+ - issue_type: {issue_types}
61
+ - priority: {priorities}
62
+ - assignment_group: {assignment_groups}
63
+ - resolution_action: {resolution_actions}
64
+
65
+ Return ONLY valid JSON with the requested fields. No markdown, no explanation.""".format(
66
+ issue_types=", ".join(ISSUE_TYPES),
67
+ priorities=", ".join(PRIORITIES),
68
+ assignment_groups=", ".join(ASSIGNMENT_GROUPS),
69
+ resolution_actions=", ".join(RESOLUTION_ACTIONS),
70
+ )
71
+
72
+
73
+ def call_llm(ticket: dict, allowed_fields: list[str], instructions: str) -> dict:
74
+ assert llm_client is not None, "LLM client not configured"
75
+
76
+ user_msg = (
77
+ f"Instructions: {instructions}\n\n"
78
+ f"Allowed fields: {', '.join(allowed_fields)}\n\n"
79
+ f"Title: {ticket['title']}\n"
80
+ f"Requester: {ticket['requester']}\n"
81
+ f"Description: {ticket['description']}\n\n"
82
+ f"Respond with JSON containing ONLY these fields: {', '.join(allowed_fields)}"
83
+ )
84
+
85
+ response = llm_client.chat.completions.create(
86
+ model=MODEL_NAME,
87
+ messages=[
88
+ {"role": "system", "content": SYSTEM_PROMPT},
89
+ {"role": "user", "content": user_msg},
90
+ ],
91
+ temperature=0.0,
92
+ max_tokens=256,
93
+ )
94
+
95
+ text = response.choices[0].message.content or "{}"
96
+ text = text.strip()
97
+ if text.startswith("```"):
98
+ text = text.split("\n", 1)[-1].rsplit("```", 1)[0].strip()
99
+
100
+ try:
101
+ return json.loads(text)
102
+ except json.JSONDecodeError:
103
+ return {}
104
+
105
+
106
+ # ---------------------------------------------------------------------------
107
+ # Heuristic fallback (no LLM needed)
108
+ # ---------------------------------------------------------------------------
109
+
110
+ KEYWORD_ISSUE_TYPES = {
111
+ "invoice": "billing_license",
112
+ "charge": "billing_license",
113
+ "refund": "billing_license",
114
+ "payment": "billing_license",
115
+ "billing": "billing_license",
116
+ "license": "billing_license",
117
+ "sign in": "identity_access",
118
+ "login": "identity_access",
119
+ "password": "identity_access",
120
+ "locked": "identity_access",
121
+ "2fa": "identity_access",
122
+ "sso": "identity_access",
123
+ "bug": "application_support",
124
+ "error": "application_support",
125
+ "exception": "application_support",
126
+ "crash": "application_support",
127
+ "production": "application_support",
128
+ "latency": "application_support",
129
+ "timeout": "application_support",
130
+ "webhook": "application_support",
131
+ "migration": "application_support",
132
+ "pricing": "service_request",
133
+ "quote": "service_request",
134
+ "demo": "service_request",
135
+ "enterprise": "service_request",
136
+ "rollout": "service_request",
137
+ "sandbox": "service_request",
138
+ "trial": "service_request",
139
+ "seat": "service_request",
140
+ "seats": "service_request",
141
+ "spam": "spam_phishing",
142
+ "click now": "spam_phishing",
143
+ "guaranteed": "spam_phishing",
144
+ "unsubscribe": "spam_phishing",
145
+ "phishing": "spam_phishing",
146
+ "compromised": "spam_phishing",
147
+ "compliance": "security_compliance",
148
+ "regulation": "security_compliance",
149
+ "gdpr": "security_compliance",
150
+ "audit": "security_compliance",
151
+ "pentest": "security_compliance",
152
+ "vulnerabilities": "security_compliance",
153
+ "security policy": "security_compliance",
154
+ "onboarding": "onboarding",
155
+ "welcome": "onboarding",
156
+ "getting started": "onboarding",
157
+ "new hire": "onboarding",
158
+ "contractor": "onboarding",
159
+ "feedback": "feature_request",
160
+ "suggestion": "feature_request",
161
+ "improve": "feature_request",
162
+ "roadmap": "feature_request",
163
+ "export": "feature_request",
164
+ }
165
+
166
+ def heuristic_action(ticket: dict, allowed_fields: list[str]) -> dict:
167
+ text = (ticket.get("title", "") + " " + ticket.get("description", "")).lower()
168
+
169
+ issue_type = "general_inquiry"
170
+ for kw, mapped_issue_type in KEYWORD_ISSUE_TYPES.items():
171
+ if kw in text:
172
+ issue_type = mapped_issue_type
173
+ break
174
+
175
+ priority = "medium"
176
+ if any(w in text for w in ["urgent", "critical", "blocking", "asap", "immediately"]):
177
+ priority = "critical"
178
+ elif any(w in text for w in ["important", "high priority", "revenue"]):
179
+ priority = "high"
180
+ elif any(w in text for w in ["low priority", "whenever", "no rush"]):
181
+ priority = "low"
182
+
183
+ result: dict = {}
184
+ if "issue_type" in allowed_fields:
185
+ result["issue_type"] = issue_type
186
+ if "priority" in allowed_fields:
187
+ result["priority"] = priority
188
+ if "assignment_group" in allowed_fields:
189
+ result["assignment_group"] = ISSUE_TYPE_TO_ASSIGNMENT_GROUP.get(
190
+ issue_type, "service_desk"
191
+ )
192
+ if "resolution_action" in allowed_fields:
193
+ result["resolution_action"] = ISSUE_TYPE_TO_RESOLUTION_ACTION.get(
194
+ issue_type, "acknowledge"
195
+ )
196
+ return result
197
+
198
+
199
+ # ---------------------------------------------------------------------------
200
+ # Main loop using WebSocket client for multi-step episodes
201
+ # ---------------------------------------------------------------------------
202
+
203
+ def run():
204
+ # Quick HTTP health check
205
+ http = httpx.Client(base_url=ENV_URL, timeout=30.0)
206
+ health = http.get("/health")
207
+ health.raise_for_status()
208
+ print(f"Connected to {ENV_URL}: {health.json()}")
209
+
210
+ tasks_resp = http.get("/tasks")
211
+ tasks_resp.raise_for_status()
212
+ available_tasks = {t["id"]: t for t in tasks_resp.json()["tasks"]}
213
+ print(f"Available tasks: {[t['name'] for t in available_tasks.values()]}")
214
+ http.close()
215
+
216
+ all_scores: dict[int, list[float]] = {}
217
+
218
+ for task_id in TASKS:
219
+ if task_id not in available_tasks:
220
+ print(f"Task {task_id} not available, skipping")
221
+ continue
222
+
223
+ task = available_tasks[task_id]
224
+ print(f"\n--- Task {task_id}: {task['name']} ({task['difficulty']}) ---")
225
+
226
+ # Use sync WebSocket client for multi-step episode
227
+ sync_client = HelpdeskTicketEnvClient(base_url=ENV_URL).sync()
228
+ with sync_client:
229
+ result = sync_client.reset(seed=SEED, task_id=task_id)
230
+ obs = result.observation
231
+
232
+ task_scores: list[float] = []
233
+ step_num = 0
234
+
235
+ while not result.done:
236
+ ticket = obs.current_ticket
237
+ if ticket is None:
238
+ break
239
+
240
+ allowed = obs.allowed_fields
241
+ instructions = obs.instructions
242
+
243
+ if llm_client is not None:
244
+ action_dict = call_llm(ticket, allowed, instructions)
245
+ else:
246
+ action_dict = heuristic_action(ticket, allowed)
247
+
248
+ action = HelpdeskTicketAction(**{k: v for k, v in action_dict.items() if k in allowed})
249
+ result = sync_client.step(action)
250
+ obs = result.observation
251
+
252
+ step_num += 1
253
+ print(f" Step {step_num}: reward={result.reward} done={result.done}")
254
+
255
+ if result.reward is not None:
256
+ task_scores.append(result.reward)
257
+
258
+ all_scores[task_id] = task_scores
259
+ final = task_scores[-1] if task_scores else 0.0
260
+ print(f" Task {task_id} final reward: {final:.4f}")
261
+
262
+ # Summary
263
+ print("\n=== RESULTS ===")
264
+ overall = []
265
+ for tid in TASKS:
266
+ if tid in all_scores:
267
+ scores = all_scores[tid]
268
+ avg = sum(scores) / len(scores) if scores else 0.0
269
+ overall.append(avg)
270
+ print(f"Task {tid}: avg_score={avg:.4f} ({len(scores)} steps)")
271
+ if overall:
272
+ print(f"Overall: {sum(overall) / len(overall):.4f}")
273
+
274
+
275
+ if __name__ == "__main__":
276
+ run()
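The heuristic fallback in `inference.py` matches keywords with plain substring tests (`kw in text`), so a short keyword can also fire inside a longer word (`low` is a substring of `following`). The sketch below shows one possible alternative using word-boundary regular expressions. It is not what the script does; the names `PRIORITY_KEYWORDS` and `match_priority` are hypothetical, and the keyword lists are modelled on the ones in `heuristic_action`.

```python
import re

# Ordered from most to least urgent; the first matching bucket wins.
# Keyword lists modelled on heuristic_action (assumption, for illustration).
PRIORITY_KEYWORDS = [
    ("critical", ["urgent", "critical", "blocking", "asap", "immediately"]),
    ("high", ["important", "high priority", "revenue"]),
    ("low", ["low", "whenever", "no rush"]),
]


def match_priority(text, default="medium"):
    """Return the first priority whose keyword appears as a whole word or phrase."""
    lowered = text.lower()
    for priority, keywords in PRIORITY_KEYWORDS:
        for kw in keywords:
            # \b anchors the keyword at word boundaries, so "low" does not
            # match inside "following" or "below".
            if re.search(r"\b" + re.escape(kw) + r"\b", lowered):
                return priority
    return default


if __name__ == "__main__":
    print(match_priority("Following up on ticket-015. Please prioritize."))
    print(match_priority("We need this fixed ASAP."))
```

Word boundaries trade recall for precision: inflected forms such as `blocked` no longer match `blocking`-style stems unless they are listed explicitly, so the keyword lists may need to grow.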
models.cpython-313.pyc ADDED
Binary file (2.62 kB).
 
models.py ADDED
@@ -0,0 +1,114 @@
1
+ from __future__ import annotations
2
+
3
+ from typing import Any, Optional
4
+
5
+ from pydantic import BaseModel, Field, field_validator
6
+ from openenv.core.env_server.types import Action, Observation, State
7
+ from vocabulary import (
8
+ ASSIGNMENT_GROUPS,
9
+ ISSUE_TYPES,
10
+ PRIORITIES,
11
+ RESOLUTION_ACTIONS,
12
+ )
13
+
14
+
15
+ ISSUE_TYPE_SET = set(ISSUE_TYPES)
16
+ PRIORITY_SET = set(PRIORITIES)
17
+ ASSIGNMENT_GROUP_SET = set(ASSIGNMENT_GROUPS)
18
+ RESOLUTION_ACTION_SET = set(RESOLUTION_ACTIONS)
19
+
20
+
21
+ def _validate_choice(value: str, allowed: set[str], field_name: str) -> str:
22
+ if value not in allowed:
23
+ allowed_values = ", ".join(sorted(allowed))
24
+ raise ValueError(f"{field_name} must be one of: {allowed_values}")
25
+ return value
26
+
27
+
28
+ def _validate_optional_choice(
29
+ value: Optional[str], allowed: set[str], field_name: str
30
+ ) -> Optional[str]:
31
+ if value is None:
32
+ return None
33
+ return _validate_choice(value, allowed, field_name)
34
+
35
+
36
+ class HelpdeskTicketRecord(BaseModel):
37
+ ticket_id: str
38
+ title: str
39
+ requester: str
40
+ description: str
41
+ issue_type: str
42
+ priority: str
43
+ assignment_group: str
44
+ resolution_action: str
45
+ ambiguity_note: Optional[str] = None
46
+ related_ticket_id: Optional[str] = None
47
+
48
+ @field_validator("issue_type")
49
+ @classmethod
50
+ def validate_issue_type(cls, value: str) -> str:
51
+ return _validate_choice(value, ISSUE_TYPE_SET, "issue_type")
52
+
53
+ @field_validator("priority")
54
+ @classmethod
55
+ def validate_priority(cls, value: str) -> str:
56
+ return _validate_choice(value, PRIORITY_SET, "priority")
57
+
58
+ @field_validator("assignment_group")
59
+ @classmethod
60
+ def validate_assignment_group(cls, value: str) -> str:
61
+ return _validate_choice(value, ASSIGNMENT_GROUP_SET, "assignment_group")
62
+
63
+ @field_validator("resolution_action")
64
+ @classmethod
65
+ def validate_resolution_action(cls, value: str) -> str:
66
+ return _validate_choice(value, RESOLUTION_ACTION_SET, "resolution_action")
67
+
68
+
69
+ class HelpdeskTicketAction(Action):
70
+ issue_type: Optional[str] = None
71
+ priority: Optional[str] = None
72
+ assignment_group: Optional[str] = None
73
+ resolution_action: Optional[str] = None
74
+
75
+ @field_validator("issue_type")
76
+ @classmethod
77
+ def validate_issue_type(cls, value: Optional[str]) -> Optional[str]:
78
+ return _validate_optional_choice(value, ISSUE_TYPE_SET, "issue_type")
79
+
80
+ @field_validator("priority")
81
+ @classmethod
82
+ def validate_priority(cls, value: Optional[str]) -> Optional[str]:
83
+ return _validate_optional_choice(value, PRIORITY_SET, "priority")
84
+
85
+ @field_validator("assignment_group")
86
+ @classmethod
87
+ def validate_assignment_group(cls, value: Optional[str]) -> Optional[str]:
88
+ return _validate_optional_choice(value, ASSIGNMENT_GROUP_SET, "assignment_group")
89
+
90
+ @field_validator("resolution_action")
91
+ @classmethod
92
+ def validate_resolution_action(cls, value: Optional[str]) -> Optional[str]:
93
+ return _validate_optional_choice(value, RESOLUTION_ACTION_SET, "resolution_action")
94
+
95
+
96
+ class HelpdeskTicketObservation(Observation):
97
+ task_id: int = 0
98
+ task_name: str = ""
99
+ instructions: str = ""
100
+ allowed_fields: list[str] = Field(default_factory=list)
101
+ current_ticket: Optional[dict[str, str]] = None
102
+ queue_size: int = 0
103
+ tickets_remaining: int = 0
104
+ tickets_processed: int = 0
105
+ history: list[dict[str, Any]] = Field(default_factory=list)
106
+
107
+
108
+ class HelpdeskTicketState(State):
109
+ current_task_id: Optional[int] = None
110
+ seed: Optional[int] = None
111
+ queue_ticket_ids: list[str] = Field(default_factory=list)
112
+ current_ticket_index: int = 0
113
+ per_ticket_scores: list[float] = Field(default_factory=list)
114
+ total_reward: float = 0.0
openenv.yaml ADDED
@@ -0,0 +1,59 @@
1
+ name: it_helpdesk_ticket_routing_openenv
2
+ version: "0.1.0"
3
+ description: >
4
+ Deterministic IT helpdesk ticket routing environment for issue classification,
5
+ prioritization, assignment, and resolution decisions. Built on the OpenEnv framework.
6
+ author: Hackstreet Boys - Roopal Guha Neogi, Suyash Kumar
7
+
8
+ environment:
9
+ type: openenv
10
+ entry_point: server.environment:HelpdeskTicketRoutingEnvironment
11
+ action_model: models:HelpdeskTicketAction
12
+ observation_model: models:HelpdeskTicketObservation
13
+ state_model: models:HelpdeskTicketState
14
+
15
+ tasks:
16
+ - name: Issue Type Classification
17
+ difficulty: easy
18
+ objective: Predict the correct IT issue type for a helpdesk ticket.
19
+ - name: Issue Type And Priority
20
+ difficulty: medium
21
+ objective: Predict the correct issue type and priority.
22
+ - name: Full Ticket Routing
23
+ difficulty: hard
24
+ objective: Predict issue type, priority, assignment group, and resolution action.
25
+
26
+ api:
27
+ endpoints:
28
+ - /health
29
+ - /reset
30
+ - /step
31
+ - /state
32
+ - /tasks
33
+ - /docs
34
+
35
+ evaluation:
36
+ reward_range:
37
+ min: 0.0
38
+ max: 1.0
39
+ deterministic: true
40
+
41
+ grading: normalized
42
+ reproducible: true
43
+
44
+ inference:
45
+ script: inference.py
46
+ env_vars:
47
+ - API_BASE_URL
48
+ - MODEL_NAME
49
+ - HF_TOKEN
50
+
51
+ requirements:
52
+ python: ">=3.11"
53
+ dependencies:
54
+ - openenv-core
55
+ - fastapi>=0.115
56
+ - pydantic>=2.7
57
+ - uvicorn>=0.30
58
+ - httpx>=0.25
59
+ - openai>=1.68
pyproject.toml ADDED
@@ -0,0 +1,26 @@
+ [build-system]
+ requires = ["setuptools>=68.0", "wheel"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "it-helpdesk-ticket-routing-openenv"
+ version = "0.1.0"
+ description = "IT helpdesk ticket routing environment for the OpenEnv framework"
+ requires-python = ">=3.11"
+ dependencies = [
+     "openenv-core @ git+https://github.com/meta-pytorch/OpenEnv.git",
+     "fastapi>=0.115",
+     "pydantic>=2.7",
+     "uvicorn>=0.30",
+     "openai>=1.0",
+     "httpx>=0.25",
+ ]
+
+ [project.optional-dependencies]
+ dev = ["pytest", "httpx"]
+
+ [tool.setuptools]
+ py-modules = ["models", "client", "vocabulary"]
+
+ [tool.setuptools.packages.find]
+ include = ["server*"]
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ openenv-core @ git+https://github.com/meta-pytorch/OpenEnv.git
+ fastapi>=0.115
+ pydantic>=2.7
+ uvicorn>=0.30
+ openai>=1.0
+ httpx>=0.25
reward.cpython-313.pyc ADDED
Binary file (1 kB).
server/Dockerfile ADDED
@@ -0,0 +1,12 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ EXPOSE 7860
+
+ CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "7860"]
server/app.py ADDED
@@ -0,0 +1,43 @@
+ import sys
+ from pathlib import Path
+
+ # Ensure repo root is on sys.path so `models` and `server` are importable
+ _repo_root = str(Path(__file__).resolve().parent.parent)
+ if _repo_root not in sys.path:
+     sys.path.insert(0, _repo_root)
+
+ from openenv.core.env_server import create_app
+
+ from models import HelpdeskTicketAction, HelpdeskTicketObservation
+ from server.environment import HelpdeskTicketRoutingEnvironment
+ from server.tasks import TASKS
+ from vocabulary import APP_ENV_NAME
+
+ app = create_app(
+     HelpdeskTicketRoutingEnvironment,
+     HelpdeskTicketAction,
+     HelpdeskTicketObservation,
+     env_name=APP_ENV_NAME,
+ )
+
+
+ @app.get("/tasks")
+ def list_tasks():
+     return {
+         "tasks": [
+             {
+                 "id": t["id"],
+                 "name": t["name"],
+                 "difficulty": t["difficulty"],
+                 "instructions": t["instructions"],
+                 "allowed_fields": t["allowed_fields"],
+             }
+             for t in TASKS.values()
+         ]
+     }
+
+
+ if __name__ == "__main__":
+     import uvicorn
+
+     uvicorn.run("server.app:app", host="0.0.0.0", port=8000, reload=True)
server/environment.py ADDED
@@ -0,0 +1,163 @@
+ from __future__ import annotations
+
+ import random
+ import uuid
+ from typing import Any, Optional
+
+ from openenv.core.env_server.interfaces import Environment
+
+ from models import (
+     HelpdeskTicketAction,
+     HelpdeskTicketObservation,
+     HelpdeskTicketRecord,
+     HelpdeskTicketState,
+ )
+ from server.grader import grade_action
+ from server.reward import compute_step_reward, compute_trajectory_reward
+ from server.tasks import get_task_definition, load_dataset
+
+
+ QUEUE_SIZE_RANGE = (3, 5)
+
+
+ class HelpdeskTicketRoutingEnvironment(
+     Environment[HelpdeskTicketAction, HelpdeskTicketObservation, HelpdeskTicketState]
+ ):
+     def __init__(self) -> None:
+         super().__init__()
+         self._dataset = load_dataset()
+         self._rng = random.Random()
+         self._queue: list[HelpdeskTicketRecord] = []
+         self._state = HelpdeskTicketState()
+
+     # ------------------------------------------------------------------
+     # OpenEnv required interface
+     # ------------------------------------------------------------------
+
+     def reset(
+         self,
+         seed: Optional[int] = None,
+         episode_id: Optional[str] = None,
+         **kwargs: Any,
+     ) -> HelpdeskTicketObservation:
+         task_id: int = kwargs.get("task_id", 1)
+         task = get_task_definition(task_id)
+
+         if seed is not None:
+             self._rng.seed(seed)
+
+         queue_size = self._rng.randint(*QUEUE_SIZE_RANGE)
+         self._queue = self._rng.sample(self._dataset, min(queue_size, len(self._dataset)))
+
+         self._state = HelpdeskTicketState(
+             episode_id=episode_id or str(uuid.uuid4()),
+             step_count=0,
+             current_task_id=task_id,
+             seed=seed,
+             queue_ticket_ids=[t.ticket_id for t in self._queue],
+             current_ticket_index=0,
+             per_ticket_scores=[],
+             total_reward=0.0,
+         )
+
+         return self._build_observation(task)
+
+     def step(
+         self,
+         action: HelpdeskTicketAction,
+         timeout_s: Optional[float] = None,
+         **kwargs: Any,
+     ) -> HelpdeskTicketObservation:
+         if not self._queue or self._state.current_task_id is None:
+             raise RuntimeError("Environment has not been reset.")
+
+         idx = self._state.current_ticket_index
+         if idx >= len(self._queue):
+             raise RuntimeError("Episode already done — call reset().")
+
+         current_ticket = self._queue[idx]
+         task_id = self._state.current_task_id
+         task = get_task_definition(task_id)
+
+         score, breakdown = grade_action(action, current_ticket, task_id)
+         step_reward = compute_step_reward(score)
+
+         self._state.per_ticket_scores.append(score)
+         self._state.step_count += 1
+         self._state.current_ticket_index += 1
+
+         is_done = self._state.current_ticket_index >= len(self._queue)
+
+         if is_done:
+             traj_reward = compute_trajectory_reward(
+                 self._state.per_ticket_scores,
+                 len(self._queue),
+                 self._state.step_count,
+             )
+             self._state.total_reward = traj_reward
+             final_reward = traj_reward
+         else:
+             final_reward = step_reward
+
+         history_entry = {
+             "ticket_id": current_ticket.ticket_id,
+             "score": score,
+             "breakdown": breakdown,
+         }
+
+         return self._build_observation(
+             task,
+             done=is_done,
+             reward=final_reward,
+             extra_history=history_entry,
+         )
+
+     @property
+     def state(self) -> HelpdeskTicketState:
+         return self._state.model_copy(deep=True)
+
+     # ------------------------------------------------------------------
+     # Helpers
+     # ------------------------------------------------------------------
+
+     def _build_observation(
+         self,
+         task: dict,
+         done: bool = False,
+         reward: float | None = None,
+         extra_history: dict | None = None,
+     ) -> HelpdeskTicketObservation:
+         idx = self._state.current_ticket_index
+         queue_size = len(self._queue)
+
+         if idx < queue_size:
+             ticket = self._queue[idx]
+             ticket_view = {
+                 "ticket_id": ticket.ticket_id,
+                 "title": ticket.title,
+                 "requester": ticket.requester,
+                 "description": ticket.description,
+             }
+         else:
+             ticket_view = None
+
+         history: list[dict] = []
+         for i, s in enumerate(self._state.per_ticket_scores):
+             history.append({"step": i + 1, "score": s})
+         if extra_history and history:
+             history[-1] = {"step": len(history), **extra_history}
+
+         return HelpdeskTicketObservation(
+             done=done,
+             reward=reward,
+             metadata={},
+             task_id=task["id"],
+             task_name=task["name"],
+             instructions=task["instructions"],
+             allowed_fields=list(task["allowed_fields"]),
+             current_ticket=ticket_view,
+             queue_size=queue_size,
+             tickets_remaining=max(0, queue_size - idx),
+             tickets_processed=idx,
+             history=history,
+         )
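The seeded queue sampling performed in `reset()` can be illustrated in isolation. This is a minimal sketch rather than the repository's code: `sample_queue` and the ticket IDs are hypothetical, and it relies only on the standard-library `random.Random` calls that `server/environment.py` uses.

```python
import random

QUEUE_SIZE_RANGE = (3, 5)  # same bounds as in server/environment.py


def sample_queue(ticket_ids: list[str], seed: int) -> list[str]:
    """Hypothetical helper: draw a queue the way reset() does when a seed is given."""
    rng = random.Random()
    rng.seed(seed)
    queue_size = rng.randint(*QUEUE_SIZE_RANGE)
    return rng.sample(ticket_ids, min(queue_size, len(ticket_ids)))


ticket_ids = [f"TCK-{i:03d}" for i in range(10)]  # made-up IDs for illustration
first = sample_queue(ticket_ids, seed=42)
second = sample_queue(ticket_ids, seed=42)
print(first == second)  # the same seed yields the same queue
```

When `seed` is `None`, the environment keeps using the generator's existing state, so consecutive unseeded resets will generally produce different queues.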
server/grader.py ADDED
@@ -0,0 +1,103 @@
+ from __future__ import annotations
+
+ from models import HelpdeskTicketAction, HelpdeskTicketRecord
+
+
+ ISSUE_TYPE_SIMILARITY = {
+     ("billing_license", "service_request"): 0.4,
+     ("service_request", "billing_license"): 0.4,
+     ("application_support", "identity_access"): 0.5,
+     ("identity_access", "application_support"): 0.5,
+     ("application_support", "feature_request"): 0.35,
+     ("feature_request", "application_support"): 0.35,
+     ("onboarding", "identity_access"): 0.4,
+     ("identity_access", "onboarding"): 0.4,
+     ("general_inquiry", "feature_request"): 0.3,
+     ("feature_request", "general_inquiry"): 0.3,
+     ("general_inquiry", "service_request"): 0.25,
+     ("service_request", "general_inquiry"): 0.25,
+     ("spam_phishing", "security_compliance"): 0.4,
+     ("security_compliance", "spam_phishing"): 0.4,
+     ("security_compliance", "billing_license"): 0.2,
+     ("billing_license", "security_compliance"): 0.2,
+ }
+
+ PRIORITY_SCORES = {
+     ("critical", "high"): 0.6,
+     ("high", "critical"): 0.6,
+     ("high", "medium"): 0.5,
+     ("medium", "high"): 0.5,
+     ("medium", "low"): 0.4,
+     ("low", "medium"): 0.4,
+     ("critical", "medium"): 0.3,
+     ("medium", "critical"): 0.3,
+     ("critical", "low"): 0.1,
+     ("low", "critical"): 0.1,
+     ("high", "low"): 0.2,
+     ("low", "high"): 0.2,
+ }
+
+
+ TASK_WEIGHTS = {
+     1: {"issue_type": 1.0},
+     2: {"issue_type": 0.6, "priority": 0.4},
+     3: {
+         "issue_type": 0.35,
+         "priority": 0.20,
+         "assignment_group": 0.25,
+         "resolution_action": 0.20,
+     },
+ }
+
+
+ def _normalized(value: str | None) -> str:
+     return (value or "").strip().lower()
+
+
+ def _score_exact_or_similar(predicted: str | None, expected: str) -> float:
+     pred = _normalized(predicted)
+     exp = _normalized(expected)
+     if not pred:
+         return 0.0
+     if pred == exp:
+         return 1.0
+     return ISSUE_TYPE_SIMILARITY.get((pred, exp), 0.0)
+
+
+ def _score_priority(predicted: str | None, expected: str) -> float:
+     pred = _normalized(predicted)
+     exp = _normalized(expected)
+     if not pred:
+         return 0.0
+     if pred == exp:
+         return 1.0
+     return PRIORITY_SCORES.get((pred, exp), 0.0)
+
+
+ def _score_exact(predicted: str | None, expected: str) -> float:
+     return 1.0 if _normalized(predicted) == _normalized(expected) and predicted else 0.0
+
+
+ def grade_action(
+     action: HelpdeskTicketAction,
+     ticket: HelpdeskTicketRecord,
+     task_id: int,
+ ) -> tuple[float, dict[str, float]]:
+     if task_id not in TASK_WEIGHTS:
+         raise ValueError(f"Unsupported task_id: {task_id}")
+
+     field_scores = {
+         "issue_type": _score_exact_or_similar(action.issue_type, ticket.issue_type),
+         "priority": _score_priority(action.priority, ticket.priority),
+         "assignment_group": _score_exact(
+             action.assignment_group, ticket.assignment_group
+         ),
+         "resolution_action": _score_exact(
+             action.resolution_action, ticket.resolution_action
+         ),
+     }
+
+     weights = TASK_WEIGHTS[task_id]
+     score = sum(field_scores[field] * weight for field, weight in weights.items())
+     breakdown = {field: field_scores[field] for field in weights}
+     return score, breakdown
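A worked example may help show how the weighted sum in `grade_action` combines partial credit. The sketch below copies the task 3 weights and two similarity values from `server/grader.py`; the prediction itself is hypothetical and is not taken from the dataset.

```python
# Task 3 weights, copied from TASK_WEIGHTS in server/grader.py.
TASK_3_WEIGHTS = {
    "issue_type": 0.35,
    "priority": 0.20,
    "assignment_group": 0.25,
    "resolution_action": 0.20,
}

# Hypothetical per-field scores for a single ticket.
field_scores = {
    "issue_type": 0.4,         # predicted "onboarding", expected "identity_access"
    "priority": 0.6,           # predicted "high", expected "critical"
    "assignment_group": 1.0,   # exact match
    "resolution_action": 0.0,  # mismatch; this field has no partial credit
}

# Same weighted sum as grade_action: 0.14 + 0.12 + 0.25 + 0.00
score = sum(field_scores[field] * weight for field, weight in TASK_3_WEIGHTS.items())
print(round(score, 2))  # 0.51
```

Because the weights for each task sum to 1.0 and every field score lies in [0, 1], the combined score stays within the `0.0` to `1.0` range declared in `openenv.yaml`.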
server/reward.py ADDED
@@ -0,0 +1,16 @@
+ from __future__ import annotations
+
+
+ def compute_step_reward(score: float) -> float:
+     return max(0.0, min(1.0, score))
+
+
+ def compute_trajectory_reward(
+     per_ticket_scores: list[float], queue_size: int, steps_taken: int
+ ) -> float:
+     if not per_ticket_scores:
+         return 0.0
+     avg = sum(per_ticket_scores) / len(per_ticket_scores)
+     overshoot = max(0, steps_taken - queue_size)
+     penalty = overshoot * 0.03
+     return max(0.0, min(1.0, avg - penalty))
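The trajectory reward is the mean per-ticket score minus 0.03 for every step taken beyond the queue size, clamped to [0, 1]. The sketch below restates the function from `server/reward.py` so that it runs on its own; the score lists and step counts are made-up inputs.

```python
def compute_trajectory_reward(
    per_ticket_scores: list[float], queue_size: int, steps_taken: int
) -> float:
    # Restated from server/reward.py so this example is self-contained.
    if not per_ticket_scores:
        return 0.0
    avg = sum(per_ticket_scores) / len(per_ticket_scores)
    overshoot = max(0, steps_taken - queue_size)
    penalty = overshoot * 0.03
    return max(0.0, min(1.0, avg - penalty))


scores = [1.0, 0.5, 0.6]  # hypothetical per-ticket scores, mean 0.7
on_time = compute_trajectory_reward(scores, queue_size=3, steps_taken=3)
late = compute_trajectory_reward(scores, queue_size=3, steps_taken=5)
print(round(on_time, 2), round(late, 2))  # 0.7 0.64
```

In `server/environment.py` as committed, `step_count` and `current_ticket_index` advance together and `step()` raises once the queue is exhausted, so `steps_taken` appears to always equal `queue_size` when the episode ends. Under that reading the penalty term is zero in practice, and the `late` call above is purely illustrative.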
server/tasks.py ADDED
@@ -0,0 +1,60 @@
+ from __future__ import annotations
+
+ import json
+ from pathlib import Path
+
+ from models import HelpdeskTicketRecord
+ from vocabulary import TASK_IDS
+
+
+ TASKS = {
+     1: {
+         "id": 1,
+         "name": "Issue Type Classification",
+         "difficulty": "easy",
+         "instructions": (
+             "Read the ticket and select the single best IT issue type."
+         ),
+         "allowed_fields": ["issue_type"],
+     },
+     2: {
+         "id": 2,
+         "name": "Issue Type And Priority",
+         "difficulty": "medium",
+         "instructions": (
+             "Read the ticket, select the best IT issue type, and estimate the "
+             "correct operational priority."
+         ),
+         "allowed_fields": ["issue_type", "priority"],
+     },
+     3: {
+         "id": 3,
+         "name": "Full Ticket Routing",
+         "difficulty": "hard",
+         "instructions": (
+             "Perform full helpdesk triage by selecting the best issue type, "
+             "priority, assignment group, and resolution action for the ticket."
+         ),
+         "allowed_fields": [
+             "issue_type",
+             "priority",
+             "assignment_group",
+             "resolution_action",
+         ],
+     },
+ }
+
+ assert tuple(TASKS.keys()) == TASK_IDS
+
+
+ def load_dataset() -> list[HelpdeskTicketRecord]:
+     dataset_path = Path(__file__).resolve().parent.parent / "data" / "dataset.json"
+     with dataset_path.open("r", encoding="utf-8") as f:
+         raw = json.load(f)
+     return [HelpdeskTicketRecord.model_validate(r) for r in raw]
+
+
+ def get_task_definition(task_id: int) -> dict:
+     if task_id not in TASKS:
+         raise ValueError(f"Unsupported task_id: {task_id}")
+     return TASKS[task_id]
studymaterialLinks ADDED
@@ -0,0 +1,16 @@
+ The following study material links were provided by the competition:
+
+ Module 1: Why OpenEnv?
+ https://github.com/meta-pytorch/OpenEnv/blob/main/tutorial/01-environments.md
+
+ Module 2: Using Existing Environments
+ https://github.com/meta-pytorch/OpenEnv/blob/main/tutorial/02-deployment.md
+
+ Module 3: Deploying Environments
+ https://github.com/meta-pytorch/OpenEnv/blob/main/tutorial/03-scaling.md
+
+ Module 4: Building Your Own Environment
+
+ MOST IMPORTANT FOR ROUND 1
+ https://github.com/meta-pytorch/OpenEnv/blob/main/tutorial/04-training.md
+
tasks.cpython-313.pyc ADDED
Binary file (1.93 kB).
vocabulary.py ADDED
@@ -0,0 +1,67 @@
+ from __future__ import annotations
+
+ TEAM_NAME = "Hackstreet Boys"
+ TEAM_MEMBERS = ("Roopal Guha Neogi", "Suyash Kumar")
+
+ PROJECT_TITLE = "IT Helpdesk Ticket Routing OpenEnv"
+ DOMAIN_NAME = "IT Helpdesk Ticket Routing"
+
+ OPENENV_NAME = "it_helpdesk_ticket_routing_openenv"
+ APP_ENV_NAME = "it_helpdesk_ticket_routing"
+
+ ISSUE_TYPES = (
+     "billing_license",
+     "identity_access",
+     "application_support",
+     "service_request",
+     "spam_phishing",
+     "general_inquiry",
+     "security_compliance",
+     "onboarding",
+     "feature_request",
+ )
+
+ PRIORITIES = ("critical", "high", "medium", "low")
+
+ ASSIGNMENT_GROUPS = (
+     "license_ops",
+     "service_desk",
+     "application_team",
+     "procurement",
+     "security_team",
+     "onboarding_ops",
+ )
+
+ RESOLUTION_ACTIONS = (
+     "fulfill",
+     "escalate",
+     "assign",
+     "ignore",
+     "acknowledge",
+ )
+
+ TASK_IDS = (1, 2, 3)
+
+ ISSUE_TYPE_TO_ASSIGNMENT_GROUP = {
+     "billing_license": "license_ops",
+     "identity_access": "service_desk",
+     "application_support": "application_team",
+     "service_request": "procurement",
+     "spam_phishing": "security_team",
+     "general_inquiry": "service_desk",
+     "security_compliance": "security_team",
+     "onboarding": "onboarding_ops",
+     "feature_request": "application_team",
+ }
+
+ ISSUE_TYPE_TO_RESOLUTION_ACTION = {
+     "billing_license": "fulfill",
+     "identity_access": "fulfill",
+     "application_support": "escalate",
+     "service_request": "assign",
+     "spam_phishing": "ignore",
+     "general_inquiry": "acknowledge",
+     "security_compliance": "escalate",
+     "onboarding": "fulfill",
+     "feature_request": "acknowledge",
+ }