Space: Roopalgn/AIHack-ITHelpDesk

Roopalgn committed on Apr 1

Commit 3b8bf40

1 Parent(s): 969eaef

Improve dataset realism and consolidate project status log

Files changed (5)

LABEL_AUDIT.md +0 -56
MARCH30_STATUS.md +0 -117
PROJECT_STATUS.md +137 -0
README.md +23 -0
data/dataset.json +21 -21

LABEL_AUDIT.md DELETED

@@ -1,56 +0,0 @@
-# Label Audit Notes
-This file records the March 31 and April 1 label-and-grader pass on the Roopal-owned files:
-- `data/dataset.json`
-- `server/tasks.py`
-- `server/grader.py`
-## Dataset Decisions
-### Tightened ambiguity cases
-- `ticket-022`
-  Reworded to make the billing-versus-application ambiguity clearer while keeping the chosen label as `application_support`.
-- `ticket-027`
-  Reworded to make the vendor-offer ambiguity clearer between `general_inquiry` and `service_request`.
-- `ticket-029`
-  Reworded to make the seat-expansion versus prorating ambiguity clearer and changed `resolution_action` from `fulfill` to `assign`.
-- `ticket-040`
-  Reworded to make the feature-gap versus support-issue ambiguity clearer.
-### Corrected label consistency
-- `ticket-026`
-  Changed from `feature_request` / `application_team` to `general_inquiry` / `service_desk` because it is a thank-you note, not a product change request.
-## Task Wording Changes
-The task instructions in `server/tasks.py` were tightened so they now:
-- sound more like helpdesk triage
-- emphasize choosing the single best label
-- describe operational priority more clearly
-- describe full triage more concretely for Task 3
-## Grader Changes
-The grader was polished by:
-- making task weights explicit in `TASK_WEIGHTS`
-- adding partial-credit pairs for:
-  - `application_support` vs `feature_request`
-  - `general_inquiry` vs `service_request`
-- keeping the scoring deterministic and task-specific
-## Intent
-These edits are meant to improve:
-- dataset realism
-- label consistency
-- hard-task ambiguity quality
-- reviewability for judges and teammates
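
The partial-credit pairs listed under Grader Changes can be sketched as a symmetric lookup. This is a minimal illustration, not the contents of `server/grader.py`: the names `PARTIAL_CREDIT_PAIRS` and `score_issue_type` and the 0.5 credit value are assumptions.

```python
# Hypothetical sketch: symmetric partial credit for near-miss issue_type labels.
# The two pairs come from the audit notes; the 0.5 value is an assumed placeholder.
PARTIAL_CREDIT = 0.5

PARTIAL_CREDIT_PAIRS = {
    frozenset({"application_support", "feature_request"}),
    frozenset({"general_inquiry", "service_request"}),
}


def score_issue_type(predicted: str, expected: str) -> float:
    """Return 1.0 for an exact match, PARTIAL_CREDIT for a listed near-miss, else 0.0."""
    if predicted == expected:
        return 1.0
    # frozenset makes the lookup order-independent: (a, b) and (b, a) score the same.
    if frozenset({predicted, expected}) in PARTIAL_CREDIT_PAIRS:
        return PARTIAL_CREDIT
    return 0.0


if __name__ == "__main__":
    print(score_issue_type("feature_request", "application_support"))  # 0.5
    print(score_issue_type("spam_phishing", "general_inquiry"))  # 0.0
```

Storing each pair as a `frozenset` means predicting `feature_request` for an `application_support` ticket earns the same credit as the reverse, and the function has no hidden state, so scoring stays deterministic.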

MARCH30_STATUS.md DELETED

@@ -1,117 +0,0 @@
-# March 30 Status Report
-This file captures the code checkpoint completed for March 30, 2026 so both Codex sessions can compare against the same source of truth.
-## Scope Completed
-The March 30 code checkpoint is complete for the foundational files named in `ROADMAP.md`:
-- `models.py`
-- `server/tasks.py`
-- `server/grader.py`
-- `server/environment.py`
-Related supporting files were also aligned:
-- `client.py`
-- `server/app.py`
-- `inference.py`
-- `vocabulary.py`
-## What Is Locked
-### Team and project identity
-- Team: Hackstreet Boys
-- Members: Roopal Guha Neogi, Suyash Kumar
-- Domain: IT Helpdesk Ticket Routing
-### Frozen class names
-- `HelpdeskTicketRecord`
-- `HelpdeskTicketAction`
-- `HelpdeskTicketObservation`
-- `HelpdeskTicketState`
-- `HelpdeskTicketRoutingEnvironment`
-- `HelpdeskTicketEnvClient`
-### Frozen field names
-- `ticket_id`
-- `title`
-- `requester`
-- `description`
-- `issue_type`
-- `priority`
-- `assignment_group`
-- `resolution_action`
-- `related_ticket_id`
-## Code That Exists Now
-### `vocabulary.py`
-Shared frozen constants now live in one place:
-- team metadata
-- environment names
-- issue types
-- priorities
-- assignment groups
-- resolution actions
-- default issue-type mappings used by inference
-### `models.py`
-The typed models are defined and the vocabulary is enforced through validators, so unsupported labels should fail fast instead of silently drifting.
-### `server/tasks.py`
-All three tasks are defined with locked names, instructions, and allowed fields.
-### `server/grader.py`
-Deterministic scoring is in place with:
-- partial credit for near-miss `issue_type`
-- proximity scoring for `priority`
-- exact match for `assignment_group`
-- exact match for `resolution_action`
-### `server/environment.py`
-The environment implements:
-- queue sampling
-- reset flow
-- step flow
-- state tracking
-- final trajectory reward handoff
-### `inference.py`
-The baseline runner is aligned to the locked vocabulary and supports:
-- LLM mode
-- heuristic mode
-- task loop over all 3 tasks
-## Expected Agreement For The Other Codex Session
-Your teammate's Codex should agree on all of the following:
-1. the schema names above are frozen
-2. the vocabulary now has a single source of truth in `vocabulary.py`
-3. no one should rename labels after this checkpoint
-4. future work should build on these names, not replace them
-## What Is Not Verified Yet
-This checkpoint is a code-and-consistency checkpoint, not a runtime-complete checkpoint.
-Still pending:
-- local execution
-- heuristic baseline run
-- Docker validation
-- final benchmark numbers
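
The fail-fast validation described for `models.py` can be sketched with a frozen dataclass that checks each label in `__post_init__`. This is a stdlib-only illustration under stated assumptions: `TicketLabels` is a hypothetical stand-in for the labelled fields of `HelpdeskTicketRecord`, and the label sets contain only values that appear in this commit, not the full sets defined in `vocabulary.py`.

```python
from dataclasses import dataclass

# Illustrative subsets of the frozen vocabulary: only labels that appear in this
# commit. The authoritative sets live in vocabulary.py.
ISSUE_TYPES = frozenset({
    "application_support",
    "feature_request",
    "general_inquiry",
    "onboarding",
    "security_compliance",
    "service_request",
    "spam_phishing",
})
PRIORITIES = frozenset({"low", "medium"})
ASSIGNMENT_GROUPS = frozenset({
    "application_team",
    "onboarding_ops",
    "security_team",
    "service_desk",
})


@dataclass(frozen=True)
class TicketLabels:
    """Hypothetical stand-in for the labelled fields of HelpdeskTicketRecord."""

    issue_type: str
    priority: str
    assignment_group: str

    def __post_init__(self) -> None:
        # Fail fast: reject any label outside the frozen vocabulary at construction time.
        checks = (
            ("issue_type", ISSUE_TYPES),
            ("priority", PRIORITIES),
            ("assignment_group", ASSIGNMENT_GROUPS),
        )
        for field_name, allowed in checks:
            value = getattr(self, field_name)
            if value not in allowed:
                raise ValueError(f"unsupported {field_name}: {value!r}")


if __name__ == "__main__":
    print(TicketLabels("spam_phishing", "low", "security_team"))
    try:
        TicketLabels("spam", "low", "security_team")
    except ValueError as exc:
        print(exc)  # unsupported issue_type: 'spam'
```

Validating at construction time is what makes a renamed or misspelled label surface immediately instead of silently producing a record the grader can never match.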

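The reset/step flow listed for `server/environment.py` can be illustrated with a toy environment. Everything here is hypothetical: `ToyTicketEnv`, the four-tuple returned by `step`, exact-match scoring on `issue_type` only, and using the mean step reward as the final trajectory reward are assumptions made for the sketch, not the repo's interface.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class EpisodeState:
    """Tracks the sampled queue, the cursor into it, and per-step rewards."""

    queue: list[dict]
    index: int = 0
    rewards: list[float] = field(default_factory=list)


class ToyTicketEnv:
    """Hypothetical sketch of queue sampling plus reset/step; not server/environment.py."""

    def __init__(self, tickets: list[dict], queue_size: int, seed: int = 0) -> None:
        self.tickets = tickets
        self.queue_size = queue_size
        self.rng = random.Random(seed)  # seeded so queue sampling is reproducible
        self.state: EpisodeState | None = None

    def reset(self) -> dict:
        # Queue sampling: draw queue_size distinct tickets for this episode.
        queue = self.rng.sample(self.tickets, self.queue_size)
        self.state = EpisodeState(queue=queue)
        return queue[0]

    def step(self, predicted_issue_type: str) -> tuple[dict | None, float, bool, float | None]:
        if self.state is None or self.state.index >= len(self.state.queue):
            raise RuntimeError("call reset() to start a new episode")
        ticket = self.state.queue[self.state.index]
        reward = 1.0 if predicted_issue_type == ticket["issue_type"] else 0.0
        self.state.rewards.append(reward)
        self.state.index += 1
        done = self.state.index >= len(self.state.queue)
        next_ticket = None if done else self.state.queue[self.state.index]
        # The final trajectory reward is only handed off once the queue is exhausted.
        final_reward = sum(self.state.rewards) / len(self.state.rewards) if done else None
        return next_ticket, reward, done, final_reward


if __name__ == "__main__":
    tickets = [
        {"ticket_id": "ticket-005", "issue_type": "spam_phishing"},
        {"ticket_id": "ticket-008", "issue_type": "onboarding"},
        {"ticket_id": "ticket-020", "issue_type": "general_inquiry"},
    ]
    env = ToyTicketEnv(tickets, queue_size=2, seed=7)
    ticket, done = env.reset(), False
    while not done:
        # An oracle agent that echoes the gold label, so every step earns 1.0.
        ticket, reward, done, final_reward = env.step(ticket["issue_type"])
    print(final_reward)  # 1.0
```

Keeping the per-episode data in a separate `EpisodeState` object is one way to make "state tracking" explicit: `reset()` replaces it wholesale, so no reward from a previous episode can leak into the next.
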
PROJECT_STATUS.md ADDED

@@ -0,0 +1,137 @@
+# Project Status
+This is the canonical running status file for the repo.
+Use this file for future progress updates instead of creating new date-specific status files.
+## March 30, 2026
+Status: complete
+Scope completed:
+- locked team name, domain, and vocabulary
+- aligned the foundational schema and environment surface
+- froze the core class names and field names
+Core files aligned:
+- `models.py`
+- `server/tasks.py`
+- `server/grader.py`
+- `server/environment.py`
+- `client.py`
+- `server/app.py`
+- `inference.py`
+- `vocabulary.py`
+Key checkpoint outcome:
+- the project had a single vocabulary source of truth and no remaining schema disagreement
+## March 31, 2026
+Status: complete
+Roopal-side work completed:
+- audited `data/dataset.json` end to end
+- tightened ambiguity wording in selected tickets
+- reviewed task wording in `server/tasks.py`
+Representative dataset decisions:
+- `ticket-022` kept as `application_support` while making the billing-versus-application ambiguity clearer
+- `ticket-027` kept intentionally ambiguous between `general_inquiry` and `service_request`
+- `ticket-029` was refined to better express seat-expansion versus prorating ambiguity
+- `ticket-040` was kept as `feature_request` while clarifying that some readers could still interpret it as `application_support`
+Task wording changes:
+- Task 1 was tightened to emphasize selecting the single best IT issue type
+- Task 2 now explicitly asks for operational priority, not just generic urgency
+- Task 3 wording was refined to describe full helpdesk routing more concretely
+Shared checkpoint outcome:
+- no schema changes were still pending after the review pass
+## April 1, 2026
+Status: complete
+Roopal-side work completed:
+- polished `server/grader.py`
+- made task weights explicit
+- refined hard-task partial-credit behavior
+- finished remaining dataset label corrections
+Important label/grader notes:
+- `ticket-026` was corrected to `general_inquiry` routed to `service_desk`
+- Task 2 weights were fixed at `issue_type` 60% and `priority` 40%
+- Task 3 weights were fixed at `issue_type` 35%, `priority` 20%, `assignment_group` 25%, and `resolution_action` 20%
+- partial-credit pairs were added for `application_support` vs `feature_request`
+- partial-credit pairs were added for `general_inquiry` vs `service_request`
+Shared checkpoint outcome:
+- the docs and code agreed on the exact task labels and field vocabulary
+## April 2, 2026
+Status: complete
+Roopal-side work completed:
+- improved `README.md`
+- improved `KNOWLEDGE.md`
+Packaging and metadata alignment completed in repo state:
+- `openenv.yaml` aligned with runtime naming and dependency expectations
+- `pyproject.toml` and `requirements.txt` use the same OpenEnv dependency source
+- `server/Dockerfile` installs the local package and documented runtime dependencies
+Shared checkpoint outcome:
+- docs and code tell the same IT helpdesk ticket routing story
+## April 3, 2026
+Status: Roopal work complete, shared validation underway
+Roopal-side work completed:
+- performed a dataset realism pass on `data/dataset.json`
+- replaced several low-realism spam examples with clearer helpdesk-inbox phrasing
+- cleaned visible mojibake dashes from ticket titles
+- added explicit easy, medium, and hard dataset examples to `README.md`
+Runtime validation notes recorded from the local repo state:
+- local `reset()` and `inference.py` validation exposed a UTF-8 BOM issue in dataset loading
+- `server/tasks.py` was updated to read `data/dataset.json` with `utf-8-sig`
+- the heuristic baseline then completed successfully
+Local heuristic baseline on the validated repo state:
+- Task 1: `1.0000`
+- Task 2: `0.8800`
+- Task 3: `0.9400`
+- Overall: `0.9400`
+Shared checkpoint outcome so far:
+- the first bug triage item was identified and fixed
+- a rerun on the latest fully merged branch is still recommended before treating benchmark numbers as final
+## Open Items
+Still pending after the current checkpoint:
+- rerun runtime validation on the latest shared branch after all pending merges land
+- perform a Docker smoke test from the merged repo state
+- do the April 4 issue-fix pass from any runtime feedback
+- record final benchmark numbers only after the merged-state rerun
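
The April 1 weights and the April 3 baseline numbers can be tied together in a short scoring sketch. It reuses the `TASK_WEIGHTS` name from the audit notes, but its shape is assumed, as are the Task 1 entry (issue type only), the four-level priority scale used for proximity scoring, and the definition of the overall score as an unweighted mean of task scores. The recorded baseline is at least consistent with that mean: (1.00 + 0.88 + 0.94) / 3 = 0.94.

```python
from __future__ import annotations

# Per-field weights from the April 1 entry; the Task 1 entry is an assumption.
TASK_WEIGHTS = {
    "task_1": {"issue_type": 1.0},
    "task_2": {"issue_type": 0.60, "priority": 0.40},
    "task_3": {
        "issue_type": 0.35,
        "priority": 0.20,
        "assignment_group": 0.25,
        "resolution_action": 0.20,
    },
}

# Hypothetical ordered priority scale; only "low" and "medium" appear in this commit.
PRIORITY_SCALE = ("low", "medium", "high", "critical")


def score_priority(predicted: str, expected: str) -> float:
    """Proximity score: 1.0 for an exact match, falling linearly with distance on the scale."""
    distance = abs(PRIORITY_SCALE.index(predicted) - PRIORITY_SCALE.index(expected))
    return 1.0 - distance / (len(PRIORITY_SCALE) - 1)


def weighted_task_score(task: str, field_scores: dict[str, float]) -> float:
    """Combine per-field scores in [0, 1] using that task's weights."""
    weights = TASK_WEIGHTS[task]
    # Each task's weights should sum to 1.0 so the combined score stays in [0, 1].
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError(f"weights for {task} do not sum to 1.0")
    return sum(weight * field_scores[name] for name, weight in weights.items())


def overall_score(task_scores: dict[str, float]) -> float:
    """Assumed definition: unweighted mean across tasks."""
    return sum(task_scores.values()) / len(task_scores)


if __name__ == "__main__":
    # Correct issue type, priority off by one level on the assumed four-level scale.
    task_2 = weighted_task_score(
        "task_2", {"issue_type": 1.0, "priority": score_priority("medium", "high")}
    )
    print(f"{task_2:.4f}")  # 0.8667
    print(f"{overall_score({'task_1': 1.0, 'task_2': 0.88, 'task_3': 0.94}):.4f}")  # 0.9400
```

Checking that each task's weights sum to 1.0 keeps every task score on the same 0 to 1 range, which is what makes averaging them into a single overall number meaningful.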

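The April 3 BOM fix and the mojibake cleanup can both be checked with a few lines of standard-library Python. The function names below are hypothetical, and the marker check is a heuristic of our own, not taken from the repo: it looks for the characters that UTF-8 punctuation such as an em dash turns into when mis-decoded as Windows-1252, which is the pattern visible in the old ticket titles.

```python
from __future__ import annotations

import json
from pathlib import Path

# "â€" is how the first two bytes of UTF-8 punctuation such as an em dash
# (E2 80 94) look when mis-decoded as Windows-1252. Heuristic only.
MOJIBAKE_MARKER = "\u00e2\u20ac"


def load_tickets(path: str | Path) -> list[dict]:
    """Read the dataset, tolerating an optional UTF-8 byte-order mark."""
    # "utf-8-sig" drops a leading BOM if present and otherwise decodes as plain UTF-8.
    # Reading with "utf-8" instead leaves "\ufeff" at the start, which json.loads rejects.
    text = Path(path).read_text(encoding="utf-8-sig")
    return json.loads(text)


def find_mojibake_titles(tickets: list[dict]) -> list[str]:
    """Return the ticket_id of every ticket whose title still contains the marker."""
    return [t["ticket_id"] for t in tickets if MOJIBAKE_MARKER in t.get("title", "")]
```

Before this commit, running `find_mojibake_titles` over `data/dataset.json` would have flagged titles such as the old `ticket-007` title shown in the diff.
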
README.md CHANGED

@@ -159,6 +159,29 @@ It includes:
 - feature requests
 - follow-up cases linked through `related_ticket_id`
+## Difficulty Coverage
+The difficulty ladder is visible both in the task fields and in the dataset itself.
+Easy-style examples:
+- `ticket-020`: straightforward general inquiry with low urgency and a clean `general_inquiry` label
+- `ticket-041`: clear onboarding request for a new contractor account
+- `ticket-044`: obvious phishing-style lure that should map cleanly to `spam_phishing`
+Medium-style examples:
+- `ticket-001`: billing dispute that still requires the agent to judge urgency correctly
+- `ticket-028`: application incident where the issue type is clear but priority still matters
+- `ticket-036`: procurement-style proof-of-concept request that should route as a `service_request`
+Hard-style examples:
+- `ticket-022`: mixed billing and application signals in one ticket
+- `ticket-029`: seat expansion combined with a prorating question
+- `ticket-038`: follow-up billing thread with escalated urgency
+- `ticket-045`: repeated account suspension thread with legal-escalation pressure
 ## Repository Layout
 ```text

data/dataset.json CHANGED

@@ -49,9 +49,9 @@
     },
     {
         "ticket_id":  "ticket-005",
-        "title":  "Guaranteed crypto income from home",
-        "requester":  "promo@fastwealth.example",
-        "description":  "Limited time offer. Click now to multiply your income and unsubscribe never.",
+        "title":  "Spam email promising guaranteed crypto returns hit support inbox",
+        "requester":  "shared-inbox@northstar-retail.com",
+        "description":  "A promotional email promising instant crypto income landed in the shared support inbox. It does not reference any legitimate customer account or business request.",
         "issue_type":  "spam_phishing",
         "priority":  "low",
         "assignment_group":  "security_team",
@@ -73,7 +73,7 @@
     },
     {
         "ticket_id":  "ticket-007",
-        "title":  "GDPR data deletion request â€” 30 day deadline",
+        "title":  "GDPR data deletion request - 30 day deadline",
         "requester":  "legal@eurocorp.de",
         "description":  "Per GDPR Article 17, we request deletion of all personal data associated with our account within 30 days. Failure to comply may result in regulatory action.",
         "issue_type":  "security_compliance",
@@ -85,9 +85,9 @@
     },
     {
         "ticket_id":  "ticket-008",
-        "title":  "Welcome aboard â€” getting started with your new account",
-        "requester":  "success@brightpath.io",
-        "description":  "Thanks for signing up! We\u0027d like to schedule an onboarding call this week. What time works for your team?",
+        "title":  "Kickoff onboarding session for newly activated account",
+        "requester":  "admin@brightpath.io",
+        "description":  "We activated our account this week and need an onboarding call plus admin setup guidance for six internal users.",
         "issue_type":  "onboarding",
         "priority":  "medium",
         "assignment_group":  "onboarding_ops",
@@ -145,9 +145,9 @@
     },
     {
         "ticket_id":  "ticket-013",
-        "title":  "Free vacation giveaway â€” claim your prize",
-        "requester":  "offers@tropicaldeals.example",
-        "description":  "Congratulations! You have been selected for an all-expenses-paid trip. Click here immediately.",
+        "title":  "Suspicious giveaway message forwarded from shared mailbox",
+        "requester":  "shared-inbox@harborair.io",
+        "description":  "The shared mailbox received a message claiming the recipient had won a free vacation and urging an immediate click-through. It appears to be pure spam.",
         "issue_type":  "spam_phishing",
         "priority":  "low",
         "assignment_group":  "security_team",
@@ -157,7 +157,7 @@
     },
     {
         "ticket_id":  "ticket-014",
-        "title":  "Audit report findings â€” action required by Friday",
+        "title":  "Audit report findings - action required by Friday",
         "requester":  "audit@compliancepartners.com",
         "description":  "The SOC2 audit uncovered three medium-severity findings. Remediation evidence is due by end of week.",
         "issue_type":  "security_compliance",
@@ -229,9 +229,9 @@
     },
     {
         "ticket_id":  "ticket-020",
-        "title":  "General inquiry about your platform capabilities",
-        "requester":  "info@greenleaf.org",
-        "description":  "Hi, I stumbled across your website and was curious about what your platform does. Can you send some information?",
+        "title":  "General inquiry about platform admin capabilities",
+        "requester":  "ops-eval@greenleaf.org",
+        "description":  "Our operations team is doing a lightweight vendor scan and wants a short overview of admin controls, reporting, and deployment options.",
         "issue_type":  "general_inquiry",
         "priority":  "low",
         "assignment_group":  "service_desk",
@@ -373,7 +373,7 @@
     },
     {
         "ticket_id":  "ticket-032",
-        "title":  "Penetration test results â€” critical vulnerabilities found",
+        "title":  "Penetration test results - critical vulnerabilities found",
         "requester":  "security@redteam-auditors.com",
         "description":  "Our pentest revealed two critical and five high-severity vulnerabilities in your API endpoints. Full report attached. Remediation should begin immediately.",
         "issue_type":  "security_compliance",
@@ -433,9 +433,9 @@
     },
     {
         "ticket_id":  "ticket-037",
-        "title":  "Earn a degree in just 2 weeks!",
-        "requester":  "admissions@diplomamill.example",
-        "description":  "No exams, no classes. Get your accredited degree today. Reply for more information.",
+        "title":  "Obvious diploma scam reached admissions support inbox",
+        "requester":  "support@midcity.edu",
+        "description":  "An unsolicited message promising a degree in two weeks arrived in the support mailbox. It is not tied to any customer case and should be ignored.",
         "issue_type":  "spam_phishing",
         "priority":  "low",
         "assignment_group":  "security_team",
@@ -517,9 +517,9 @@
     },
     {
         "ticket_id":  "ticket-044",
-        "title":  "Your account has been compromised â€” act now",
-        "requester":  "security-alert@phishing.example",
-        "description":  "We detected unusual activity on your account. Click the link below to verify your identity and secure your account immediately.",
+        "title":  "Credential phishing message impersonating security team",
+        "requester":  "helpdesk@startupxyz.io",
+        "description":  "A message claiming the account was compromised asked users to click a verification link immediately. It appears to be a classic credential phishing lure.",
         "issue_type":  "spam_phishing",
         "priority":  "low",
         "assignment_group":  "security_team",