---
title: Support Triage OpenEnv
emoji: "📨"
colorFrom: blue
colorTo: teal
sdk: docker
app_port: 7860
tags:
  - openenv
  - reinforcement-learning
  - customer-support
license: mit
---

# Support Triage OpenEnv

A complete, real-world OpenEnv environment for training and evaluating agents on **customer support ticket triage**. The environment simulates what support teams actually do: read inbox tickets, classify urgency and category, draft safe responses, and resolve the right ticket.

## Why this environment

Most agent benchmarks under-model production support workflows. This environment focuses on practical support operations with:
- Multi-ticket inbox context selection
- Policy-compliant communication
- Priority + escalation decisions
- Deterministic graders and dense reward shaping

## OpenEnv API compliance

The environment exposes:
- `reset(task_id?: str) -> Observation`
- `step(action: Action) -> (Observation, Reward, done, info)`
- `state() -> dict`

Typed Pydantic models:
- `Observation`, `Action`, `Reward`: [`src/support_triage_openenv/models.py`](src/support_triage_openenv/models.py)

Metadata:
- `openenv.yaml`
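
The reset/step contract above can be sketched as a minimal episode loop. This is an illustrative driver, not the project's actual runner; `env` and `policy` stand in for the real environment and agent:

```python
# Hypothetical episode driver for the OpenEnv contract described above.
# `env` is any object exposing reset(task_id) and step(action) as listed;
# `policy` maps an observation to the next action.

def run_episode(env, policy, task_id=None):
    """Drive one episode through the reset/step contract and sum rewards."""
    obs = env.reset(task_id)          # -> Observation
    total_reward = 0.0
    done = False
    while not done:
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward
```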

## Action space

`Action` model fields:
- `action_type`: one of `read_ticket | classify_ticket | draft_reply | resolve_ticket`
- `ticket_id`: required for `read_ticket`, `classify_ticket`, `resolve_ticket`
- `priority`: optional enum `low | medium | high | urgent`
- `category`: optional enum `account | billing | technical | abuse | general`
- `needs_escalation`: optional bool
- `message`: text for `draft_reply`
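
Two illustrative action payloads built from the fields above (plain dicts here for readability; the real `Action` is a Pydantic model, and the ticket id is hypothetical):

```python
# A classification action: sets priority, category, and escalation flag.
classify = {
    "action_type": "classify_ticket",
    "ticket_id": "TCK-1001",          # hypothetical ticket id
    "priority": "urgent",
    "category": "technical",
    "needs_escalation": True,
}

# A reply action: only action_type and message are needed.
reply = {
    "action_type": "draft_reply",
    "message": "We've started a secure password reset for your account.",
}
```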

## Observation space

`Observation` includes:
- `task_id`, `objective`, `step_count`, `max_steps`
- `inbox`: ticket metadata list (`ticket_id`, subject, tier, age, read flag)
- `current_ticket_content`: only visible after reading selected ticket
- `latest_system_note`: feedback from last step
- `score_hint`: partial grader components (`read`, `classify`, `reply`, `resolve`)
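
An illustrative observation snapshot with the fields above (a plain dict; the real `Observation` is a Pydantic model, and the objective text, step limit, and inbox entry are assumed values):

```python
# Sketch of one observation; exact field shapes may differ from the real model.
obs = {
    "task_id": "easy_password_reset",
    "objective": "Resolve the account lockout ticket safely.",  # assumed wording
    "step_count": 2,
    "max_steps": 12,                                            # assumed limit
    "inbox": [
        {"ticket_id": "TCK-1001", "subject": "Locked out of my account",
         "tier": "free", "age": "2h", "read": True},
    ],
    "current_ticket_content": "I can't log in after three attempts...",
    "latest_system_note": "Ticket TCK-1001 marked as read.",
    "score_hint": {"read": 1.0, "classify": 0.0, "reply": 0.0, "resolve": 0.0},
}
```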

## Tasks and difficulty

1. `easy_password_reset` (Easy)
- Correctly process account lockout and send secure reset guidance.

2. `medium_billing_dispute` (Medium)
- Investigate duplicate billing with context ticket and provide policy-compliant refund timeline.

3. `hard_outage_incident` (Hard)
- Handle a high-stakes outage report requiring multi-ticket context, urgent escalation, and careful incident messaging.

Each task has deterministic grading in `support_triage_openenv.graders.grade_task`, which returns a score between `0.0` and `1.0`.

## Reward design

Reward is shaped and meaningful across the trajectory:
- Positive dense signal from partial grader progress (read/context, classification fields, reply quality, resolve correctness)
- Penalties for invalid actions, repeated loops, and malformed steps
- Final step guarantees score alignment with deterministic grader output
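
A hedged sketch of how shaping like this could combine the `score_hint` components with a penalty term. The actual weights and penalty rules live in `graders.py` and the environment, and may differ:

```python
# Illustrative reward shaping: average the four partial grader components,
# subtract any step penalty, and clamp to [0.0, 1.0]. Weights are assumed
# to be uniform here; the real implementation may weight components differently.

def shaped_reward(score_hint, penalty=0.0):
    components = ("read", "classify", "reply", "resolve")
    progress = sum(score_hint.get(k, 0.0) for k in components) / len(components)
    return max(0.0, min(1.0, progress - penalty))
```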

## Project structure

- `src/support_triage_openenv/env.py` - environment implementation
- `src/support_triage_openenv/models.py` - typed OpenEnv models
- `src/support_triage_openenv/tasks.py` - task specs (easy/medium/hard)
- `src/support_triage_openenv/graders.py` - deterministic grader logic
- `scripts/run_baseline.py` - OpenAI baseline inference runner
- `inference.py` - root-level baseline runner used by the commands below (`--mode openai|heuristic`)
- `scripts/validate_env.py` - tests + optional `openenv validate`
- `app.py` - FastAPI app for HF Space runtime
- `Dockerfile` - containerized deployment

## Setup

```bash
cd path/to/support-triage-openenv  # your local checkout
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

## Run tests

```bash
python -m pytest -q
```

## Run baseline

OpenAI model baseline:

```bash
export API_BASE_URL=https://your-openai-compatible-endpoint/v1
export MODEL_NAME=your-model-id
export HF_TOKEN=your-api-key
python inference.py --mode openai --output scores/inference_scores.json
```

Deterministic heuristic baseline:

```bash
python inference.py --mode heuristic --output scores/inference_scores.json
```

Writes a JSON report to `scores/inference_scores.json` and emits structured stdout logs with `[START]`, `[STEP]`, and `[END]` markers.
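
The marker lines can be grouped with a small parser. This is a hypothetical helper; only the `[START]`/`[STEP]`/`[END]` prefixes are documented, so the payload format after each tag is an assumption:

```python
# Group structured stdout lines by their phase marker.
# Only the three prefixes are guaranteed; everything after a tag is
# treated as an opaque payload string.

def split_log(lines):
    phases = {"[START]": [], "[STEP]": [], "[END]": []}
    for line in lines:
        for tag in phases:
            if line.startswith(tag):
                phases[tag].append(line[len(tag):].strip())
    return phases
```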

## Run API locally

```bash
uvicorn app:app --host 0.0.0.0 --port 7860
```

Endpoints:
- `GET /health`
- `POST /reset`
- `POST /step`
- `GET /state`
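
Illustrative JSON bodies for the `POST` endpoints above. The exact request shapes are assumptions based on the `Action` fields; verify them against `app.py` before relying on this:

```python
import json

# Hypothetical request bodies; field names mirror the documented models.
reset_body = json.dumps({"task_id": "easy_password_reset"})
step_body = json.dumps({
    "action_type": "classify_ticket",
    "ticket_id": "TCK-1001",   # hypothetical ticket id
    "priority": "high",
    "category": "account",
})
```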

## Docker

```bash
docker build -t support-triage-openenv .
docker run --rm -p 7860:7860 support-triage-openenv
```

## Hugging Face Space deployment

- Create a **Docker Space**.
- Push this repository to the Space.
- Keep `README.md` frontmatter tags including `openenv`.
- Space serves the API on port `7860`.

## One-command remote bootstrap

To create remote repositories on GitHub and the Hugging Face Hub and push this local repo to both in one command:

```bash
export GITHUB_USERNAME=your_github_user
export GITHUB_TOKEN=your_github_pat
export HF_USERNAME=your_hf_user
export HF_TOKEN=your_hf_token
bash scripts/bootstrap_remotes.sh support-triage-openenv
```

## Baseline scores (reproducible heuristic run)

Generated with:

```bash
python inference.py --mode heuristic --output scores/inference_scores.json
```

- `easy_password_reset`: grader `1.0`, reward `1.0`
- `medium_billing_dispute`: grader `1.0`, reward `1.0`
- `hard_outage_incident`: grader `1.0`, reward `1.0`
- Overall average grader score: `1.0`
- Tracked reference artifact: `baseline_expected_scores.json`

## Pre-submission validator

Run full strict validation (all disqualification gates):

```bash
python pre_submission_validate.py --space-url https://your-space-name.hf.space
```

Local-only run while iterating (skips Docker daemon + remote space ping):

```bash
python pre_submission_validate.py --skip-docker --skip-space
```

Run organizer-provided script directly (integrated path):

```bash
bash scripts/pre_validation_script.sh https://your-space-name.hf.space .
```

Notes:
- `scripts/sample_inference_script.sh` is kept as an organizer-provided reference.
- Root `inference.py` is aligned to the required `[START]`, `[STEP]`, `[END]` line format.