Finish Roopal April 5-6 docs and repo audit

- .dockerignore +8 -0
- .gitignore +7 -0
- KNOWLEDGE.md +24 -6
- MENTAL_MODEL.md +4 -2
- PLAN.md +2 -2
- PROJECT_STATUS.md +43 -4
- README.md +19 -5
- __init__.cpython-313.pyc +0 -0
- app.cpython-313.pyc +0 -0
- client.cpython-313.pyc +0 -0
- environment.cpython-313.pyc +0 -0
- grader.cpython-313.pyc +0 -0
- models.cpython-313.pyc +0 -0
- reward.cpython-313.pyc +0 -0
- tasks.cpython-313.pyc +0 -0

.dockerignore
ADDED
@@ -0,0 +1,8 @@
+.git/
+.venv/
+__pycache__/
+*.py[cod]
+*.egg-info/
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/

.gitignore
ADDED
@@ -0,0 +1,7 @@
+.venv/
+__pycache__/
+*.py[cod]
+*.egg-info/
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
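
Both ignore files share the bytecode entry `*.py[cod]`, which is what should keep files like the `*.cpython-313.pyc` objects deleted in this commit out of git status. The sketch below approximates gitignore-style matching with Python's `fnmatch`, assuming only the two pattern shapes used here (a slash-free basename glob, or a directory name with a trailing `/`). It ignores negation, anchored paths, and `**`, so it is not git's or Docker's actual algorithm, and `is_ignored` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the .gitignore / .dockerignore files above.
IGNORE_PATTERNS = (
    ".git/", ".venv/", "__pycache__/", "*.py[cod]", "*.egg-info/",
    ".pytest_cache/", ".mypy_cache/", ".ruff_cache/",
)


def is_ignored(path: str, patterns: tuple[str, ...] = IGNORE_PATTERNS) -> bool:
    """Approximate gitignore-style matching for a file path (hypothetical helper).

    Only handles the two pattern shapes used above: no negation, anchoring, or **.
    """
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: the file is ignored if any parent directory matches.
            if any(fnmatchcase(part, pattern[:-1]) for part in parts[:-1]):
                return True
        elif fnmatchcase(parts[-1], pattern):
            # Slash-free pattern: compare against the file's basename at any depth.
            return True
    return False


for candidate in ("server/__pycache__/app.cpython-313.pyc", "server/app.py"):
    print(candidate, "->", "ignored" if is_ignored(candidate) else "kept")
```

This mirrors gitignore's match-at-any-depth behavior. Per Docker's documented `.dockerignore` rules, patterns are instead anchored at the build-context root, so there `*.py[cod]` and `__pycache__/` only catch root-level entries; nested ones such as `server/__pycache__/` would need a `**/` prefix.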

KNOWLEDGE.md
CHANGED
@@ -30,6 +30,15 @@ A helpdesk agent has to decide what the ticket is about, how urgent it is, who s
 
 This environment simulates a short helpdesk queue where an agent routes one ticket at a time and is graded on structured routing quality.
 
+## Judge-Facing Explanation
+
+If a judge asks why this environment is a strong submission, the concise answer is:
+
+1. IT helpdesk routing is a real operational workflow with clear business value.
+2. The input is realistic free-form ticket text, but the output is typed and easy to grade deterministically.
+3. The three-task ladder creates a clean progression from basic classification to full queue routing.
+4. The repo stays judge-friendly because the vocabulary, task labels, and scoring rules are all explicit and frozen.
+
 ## Frozen Project Identity
 
 - Team name: `Hackstreet Boys`
@@ -277,16 +286,25 @@ The local heuristic baseline completed successfully after that fix with:
 - Task 3: `0.9400`
 - Overall: `0.9400`
 
+A merged-state rerun on the current `main` branch matched those same numbers exactly.
+
+## April 6 Repo Audit
+
+An April 6 documentation and repo audit confirmed:
+
+- all required runtime, data, metadata, and documentation files are present in the workspace
+- the docs consistently describe IT helpdesk ticket routing rather than the old email-triage domain
+- the current local benchmark reference is `1.0000`, `0.8800`, `0.9400`, overall `0.9400`
+- the remaining work is execution validation, not documentation cleanup
+
 ## What Still Needs Hands-On Verification
 
-The biggest remaining checks are
+The biggest remaining checks are packaging and clean-machine checks, not merge-state local execution.
 
 Still pending:
 
-1.
-2.
-3. do a clean-machine dry run if possible
-4. record final benchmark numbers only after the merged-state rerun
+1. confirm Docker starts cleanly
+2. do a clean-machine dry run if possible
 
 ## One-Minute Summary
 
@@ -297,4 +315,4 @@ If you come back to this repo later, remember:
 - the agent predicts structured routing fields
 - grading is deterministic with limited partial credit
 - the inference script is the baseline player
--
+- merged-state local validation is complete, and Docker is the main remaining hands-on check
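
The audit pins `1.0000`, `0.8800`, `0.9400`, and overall `0.9400` as the benchmark reference, and the remaining Docker and clean-machine runs are meant to confirm nothing drifts from it. A minimal sketch of a mechanical drift check follows. It assumes the rerun's per-task scores are available as a dict (the key names are hypothetical) and that "Overall" is the unweighted mean of the three tasks. That assumption is consistent with the recorded numbers, since (1.0000 + 0.8800 + 0.9400) / 3 = 0.9400, but the docs do not state how Overall is aggregated.

```python
from statistics import fmean

# Reference numbers recorded in the docs (Task 1/2/3 and Overall).
# The dictionary keys are hypothetical labels, not identifiers from the repo.
REFERENCE = {"task_1": 1.0000, "task_2": 0.8800, "task_3": 0.9400}
REFERENCE_OVERALL = 0.9400


def overall_score(scores: dict[str, float]) -> float:
    """Assumed aggregation: unweighted mean of the per-task scores."""
    return fmean(scores.values())


def drifted_tasks(scores: dict[str, float],
                  reference: dict[str, float] = REFERENCE,
                  tol: float = 5e-5) -> list[str]:
    """List tasks that are missing or differ from the reference by more than tol.

    tol = 5e-5 is half a unit in the fourth decimal place, matching the
    four-decimal precision the scores are reported with.
    """
    return [
        task for task, expected in reference.items()
        if task not in scores or abs(scores[task] - expected) > tol
    ]
```

An empty list from `drifted_tasks` means the rerun reproduced the reference within rounding; a missing task is reported rather than silently skipped.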

MENTAL_MODEL.md
CHANGED
@@ -130,7 +130,7 @@ The state tracks:
 
 ## Runtime Notes
 
-The repo has now passed
+The repo has now passed both the initial local heuristic run and a merged-state rerun on the current `main` branch.
 
 Current local baseline:
 
@@ -139,6 +139,8 @@ Current local baseline:
 - Task 3: `0.9400`
 - Overall: `0.9400`
 
+The merged-state rerun matched the same baseline numbers exactly.
+
 One practical implementation note from runtime validation:
 
 - `data/dataset.json` may be saved with a UTF-8 BOM on Windows, so `server/tasks.py` intentionally loads it with `utf-8-sig`
@@ -168,4 +170,4 @@ If coming back later, remember this:
 - the agent predicts structured routing fields
 - the grader gives deterministic partial credit
 - `inference.py` is the baseline agent runner
-- the local heuristic path
+- the local heuristic path now works end to end on the current merged repo state

PLAN.md
CHANGED
@@ -126,11 +126,11 @@ The project keeps three tasks:
 
 ### Runtime risk
 
-The
+The first local execution pass and a merged-state rerun have already completed successfully. The remaining runtime risk is Docker and clean-machine behavior, not first-pass local execution.
 
 ### Benchmark risk
 
-
+The current merged-state local benchmark has already been recorded. The remaining benchmark risk is making sure Docker or clean-machine validation does not surface a late behavioral mismatch.
 
 ### Deployment risk
 

PROJECT_STATUS.md
CHANGED
@@ -144,11 +144,50 @@ Documentation fixes made from runtime feedback:
 - clarified that merged-state reruns still matter before final benchmark recording
 - documented the Windows UTF-8 BOM issue and its handling path in `server/tasks.py`
 
+## April 5, 2026
+
+Status: shared merged-state rerun complete, Docker smoke test still pending
+
+Shared work completed:
+
+- reran local runtime validation on the current `main` branch
+- revalidated `/health` and `/tasks`
+- reran heuristic `inference.py` across all 3 tasks
+- confirmed the merged-state local baseline matched the earlier working numbers exactly
+- added `.gitignore` and `.dockerignore` to keep local artifacts out of git status and Docker build context
+
+Merged-state heuristic baseline on the current repo state:
+
+- Task 1: `1.0000`
+- Task 2: `0.8800`
+- Task 3: `0.9400`
+- Overall: `0.9400`
+
+Environment notes:
+
+- the Codex shell could run the project virtualenv successfully once Python execution was allowed outside the sandbox
+- Docker was not available in the current shell context, so the Docker smoke test is still pending on a machine with Docker installed
+
+Roopal-side documentation work completed:
+
+- finalized `README.md` wording around submission readiness
+- finalized `KNOWLEDGE.md` as the judge-facing knowledge guide
+- added concise judge-facing domain explanations to the docs
+
+## April 6, 2026
+
+Status: Roopal-side repo audit complete, shared execution checks still pending
+
+Roopal-side work completed:
+
+- audited required submission files and confirmed they are present in the repo
+- completed a stale-claims and outdated-wording pass across the core docs
+- updated `PLAN.md` to reflect that first-pass local execution is no longer the main runtime risk
+- left the remaining work focused on Docker and clean-machine validation rather than documentation cleanup
+
 ## Open Items
 
 Still pending after the current checkpoint:
 
--
--
-- complete the shared April 4 rerun after Suyash-side fixes land
-- record final benchmark numbers only after the merged-state rerun
+- perform a Docker smoke test from the current merged repo state
+- do a clean-machine dry run if possible before final submission freeze
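
The April 5 entry revalidated `/health` and `/tasks` locally, and the same two requests make a reasonable first probe once the Docker container is running. A minimal standard-library sketch follows. The base URL and port are placeholders (the container's port mapping is not stated here), and the only success criterion assumed is an HTTP 200 from each endpoint. The `fetch` parameter exists so the check can be exercised without a live server.

```python
import http.client
import urllib.request
from typing import Callable

# Placeholder: adjust to whatever host port the container is published on.
BASE_URL = "http://localhost:8000"
ENDPOINTS = ("/health", "/tasks")


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request."""
    # urlopen raises urllib.error.HTTPError (an OSError subclass) for 4xx/5xx.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def smoke_check(base_url: str = BASE_URL,
                fetch: Callable[[str], int] = http_status) -> dict[str, bool]:
    """Map each endpoint to True if it answered with HTTP 200."""
    results = {}
    for endpoint in ENDPOINTS:
        try:
            results[endpoint] = fetch(base_url.rstrip("/") + endpoint) == 200
        except (OSError, http.client.HTTPException):
            # Refused connections, timeouts, HTTP errors, and malformed
            # responses all count as a failed check.
            results[endpoint] = False
    return results
```

Against a healthy container, `smoke_check("http://localhost:<port>")` should report `True` for both endpoints; any `False` points at the endpoint to investigate first.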

README.md
CHANGED
@@ -5,6 +5,15 @@
 
 This repository contains a deterministic OpenEnv environment for IT helpdesk ticket routing. An agent is shown one ticket at a time from a short queue and must predict the right issue type, operational priority, assignment group, and next action.
 
+## Judge-Facing Summary
+
+If a judge reads only one short explanation, it should be this:
+
+- this environment models a real enterprise workflow, not a toy classification task
+- each ticket requires typed routing decisions that are easy to score deterministically
+- the task ladder moves cleanly from single-field classification to full operational routing
+- the repo is small enough to rerun quickly and explicit enough to understand without hidden business logic
+
 ## What This Environment Simulates
 
 The environment models a realistic helpdesk workflow:
@@ -274,7 +283,7 @@ Optional target:
 
 ## Runtime Validation Snapshot
 
-The first local heuristic validation pass
+The repo has now completed both the first local heuristic validation pass and a merged-state rerun on the current `main` branch.
 
 Validated locally:
 
@@ -293,7 +302,7 @@ Current local heuristic results:
 | Full Ticket Routing | `0.9400` |
 | Overall | `0.9400` |
 
-
+The merged-state rerun matched these same numbers exactly, so they are the current benchmark reference for the repo. A Docker smoke test and clean-machine rerun are still recommended before final submission freeze.
 
 ### Windows note
 
@@ -340,8 +349,13 @@ The repo is already aligned on:
 - grader and reward design
 - packaging metadata and Docker entry point
 
+An April 6 repo audit also confirmed that all required submission files are present:
+
+- runtime: `models.py`, `client.py`, `inference.py`, `server/app.py`, `server/environment.py`, `server/grader.py`, `server/reward.py`, `server/tasks.py`
+- data and metadata: `data/dataset.json`, `openenv.yaml`, `pyproject.toml`, `requirements.txt`, `server/Dockerfile`
+- docs and planning: `README.md`, `KNOWLEDGE.md`, `MENTAL_MODEL.md`, `PLAN.md`, `PROJECT_STATUS.md`, `ROADMAP.md`
+
 Still pending before final submission:
 
-- a
-- a
-- final benchmark values only after merged-state validation
+- a Docker smoke test from a machine with Docker installed
+- a final clean-machine dry run if possible before submission freeze

__init__.cpython-313.pyc
DELETED
Binary file (166 Bytes)

app.cpython-313.pyc
DELETED
Binary file (1.54 kB)

client.cpython-313.pyc
DELETED
Binary file (1.86 kB)

environment.cpython-313.pyc
DELETED
Binary file (6.66 kB)

grader.cpython-313.pyc
DELETED
Binary file (3.25 kB)

models.cpython-313.pyc
DELETED
Binary file (2.62 kB)

reward.cpython-313.pyc
DELETED
Binary file (1 kB)

tasks.cpython-313.pyc
DELETED
Binary file (1.93 kB)