---
title: OpenEnv Support Ticket RL Environment
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
app_file: inference.py
license: mit
library_name: openenv
language: en
tags:
  - reinforcement-learning
  - openenv
  - hackathon
  - customer-support
---

# OpenEnv: Support Ticket Resolution System

An OpenEnv standards-compliant reinforcement learning environment for customer support operations. The agent acts as a support specialist and resolves incoming tickets by choosing structured actions (fetch data, check policy, refund, reply, escalate, close).

## Motivation & Real-world Relevance
Most RL evaluations are game-like or synthetic. This environment evaluates policy adherence and operational safety in a realistic business workflow:
- The agent must gather context before taking irreversible actions.
- It is rewarded for compliance and penalized for destructive shortcuts.
- It is scored on both correctness and process quality.

*See the [Product Requirements Document (PRD.md)](./PRD.md) for a full breakdown.*

## Core RL Task (Domain Clarification)

Each episode is a support ticket lifecycle.
- State: ticket metadata, optional fetched user profile, action history, and termination flag.
- Observation: current ticket, available actions, system message, history, optional tool output, and step count.
- Action: choose one of six typed operations with parameters.
- Reward: dense scorer in [0.01, 0.99] based on whether the action trajectory matches policy-safe resolution behavior.

This is not a navigation/game environment; it is a process-control environment where incorrect sequencing (for example, refunding before policy verification) reduces score.
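
To make the sequencing penalty concrete, here is a minimal sketch of a dense, bounded scorer. It is not the environment's actual grader: the function name `toy_score`, the per-step weights, and the penalty value are hypothetical. Only the [0.01, 0.99] bounds and the "verify policy before refunding" rule come from the description above.

```python
def toy_score(actions: list[str]) -> float:
    """Hypothetical scorer: rewards policy-safe ordering, clamped to [0.01, 0.99]."""
    score = 0.0
    policy_checked = False
    for name in actions:
        if name == "check_policy":
            policy_checked = True
            score += 0.3  # illustrative weight, not the real grader's
        elif name == "issue_refund":
            # Refunding before verifying policy is the unsafe shortcut.
            score += 0.4 if policy_checked else -0.4
        elif name == "close_ticket":
            score += 0.3
    # Keep the result inside the documented reward range.
    return min(0.99, max(0.01, score))


safe = toy_score(["check_policy", "issue_refund", "close_ticket"])
unsafe = toy_score(["issue_refund", "check_policy", "close_ticket"])
```

With these illustrative weights, the same three actions score lower when the refund precedes the policy check, which is the behavior the environment is meant to discourage.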

## Enhanced Domain Explanation

This environment simulates a customer support ticket resolution system. The agent must navigate through a structured workflow to resolve tickets efficiently and safely. The core challenge lies in adhering to policy constraints while optimizing for resolution speed and accuracy.

### Example Episode Walkthrough

Here is a detailed walkthrough of an example episode for `task_easy_1`:

1. **Reset**:
   - Observation: A refund ticket from `USR-A1` with open status and `step_count=0`.

2. **Action 1**: `check_policy({})`
   - Tool output: Refund policy for accidental purchases.
   - Reward: Increases for verifying the policy.

3. **Action 2**: `issue_refund({"amount": "full"})`
   - Tool output: Refund confirmed.
   - Reward: Increases for correct remediation.

4. **Action 3**: `close_ticket({"resolution": "refunded"})`
   - Episode ends.
   - Final score: Near-optimal.

Flow (high-level):

```
reset -> check_policy -> issue_refund -> close_ticket -> done
```

## Task Set and Difficulty Progression

The environment contains 4 tasks, including 3 required benchmark tasks with increasing difficulty.

| Task | Difficulty | What changes vs previous | Typical Horizon | Stochasticity | Expected Optimal Score |
|---|---|---|---:|---|---:|
| `task_easy_1` | easy | Baseline accidental purchase refund flow | 3 | Low | 0.99 |
| `task_medium_1` | medium | Adds policy-conflict trap: must reject invalid refund | 3 | Low | 0.99 |
| `task_hard_1` | hard | Requires data fetch + correct escalation reason + customer communication | 3 | Medium | 0.99 |
| `task_fraud_detection` | hard | Adds chargeback-based fraud risk and denial behavior | 4 | Medium | 0.99 |

Difficulty metadata is encoded in [env/tasks.py](env/tasks.py).

## Action Space

- `fetch_user_data(user_id)`
- `check_policy(issue_type)`
- `issue_refund(amount)`
- `reply_to_customer(message)`
- `escalate(reason)`
- `close_ticket(resolution)`
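
The six actions and their parameter names can be captured in a small lookup table, sketched below. `ACTION_PARAMS` and `validate_action` are hypothetical names, not part of the repository. This README does not state whether each parameter is mandatory (the walkthrough calls `check_policy({})`), so the sketch only rejects unknown action types and unexpected parameter names.

```python
# Parameter name per action type, taken from the list above.
ACTION_PARAMS = {
    "fetch_user_data": "user_id",
    "check_policy": "issue_type",
    "issue_refund": "amount",
    "reply_to_customer": "message",
    "escalate": "reason",
    "close_ticket": "resolution",
}


def validate_action(action_type: str, parameters: dict) -> None:
    """Raise ValueError for unknown action types or unexpected parameter names."""
    if action_type not in ACTION_PARAMS:
        raise ValueError(f"unknown action type: {action_type!r}")
    allowed = {ACTION_PARAMS[action_type]}
    unexpected = set(parameters) - allowed
    if unexpected:
        raise ValueError(
            f"unexpected parameters for {action_type}: {sorted(unexpected)}"
        )


validate_action("issue_refund", {"amount": "full"})  # passes silently
```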

## Observation Space

Observation object fields:
- `ticket`
- `available_actions`
- `system_message`
- `history`
- `tool_output`
- `step_count`

Schema is documented in [openenv.yaml](openenv.yaml).
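
As a rough illustration, the observation could be modeled as a dataclass like the one below. The field names come from the list above; the field types, the defaults, and the keys inside `ticket` are assumptions, and `openenv.yaml` remains the authoritative schema.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Observation:
    """Illustrative container; types and defaults are assumed, not taken from openenv.yaml."""
    ticket: dict[str, Any]
    available_actions: list[str]
    system_message: str = ""
    history: list[dict[str, Any]] = field(default_factory=list)
    tool_output: Optional[Any] = None
    step_count: int = 0


# Hypothetical initial observation for a refund ticket.
initial = Observation(
    ticket={"user_id": "USR-A1", "status": "open"},
    available_actions=["check_policy", "issue_refund", "close_ticket"],
)
```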

## Inference Interface Contract

The submission entrypoint is [inference.py](inference.py) in repository root.

Required environment variables:
- `API_BASE_URL`: OpenAI-compatible API endpoint
- `MODEL_NAME`: model identifier
- `HF_TOKEN`: API key/token

The inference loop uses OpenAI client calls and emits strict structured logs:
- `[START] task=... env=... model=...`
- `[STEP] step=... action=... reward=... done=... error=...`
- `[END] success=... steps=... score=... rewards=...`
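
A validator can split these log lines into a tag and key/value fields. The sketch below assumes values contain no whitespace, which may not hold if actions are logged as JSON with spaces; `parse_log_line` is a hypothetical helper and the sample line uses made-up values.

```python
import re

LINE_RE = re.compile(r"^\[(START|STEP|END)\]\s*(.*)$")
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_log_line(line: str):
    """Return (tag, fields) for a structured log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    tag, rest = match.groups()
    # Each field is key=value; values are kept as raw strings.
    return tag, dict(FIELD_RE.findall(rest))


# Sample line with made-up values, for illustration only.
parsed = parse_log_line(
    "[STEP] step=1 action=check_policy reward=0.30 done=false error=null"
)
```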

Action serialization format expected from the model:

```json
{"action_type": "check_policy", "parameters": {"issue_type": "refund_request"}}
```
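
Model output in this format can be parsed defensively before it reaches the environment. The following is a minimal sketch, not the code in `inference.py`: `parse_action` is a hypothetical helper, and defaulting a missing `parameters` key to an empty object is an assumption.

```python
import json


def parse_action(raw: str) -> tuple[str, dict]:
    """Parse a model reply into (action_type, parameters); raise ValueError if malformed."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("action must be a JSON object")
    action_type = payload.get("action_type")
    parameters = payload.get("parameters", {})  # assumed default
    if not isinstance(action_type, str):
        raise ValueError("'action_type' must be a string")
    if not isinstance(parameters, dict):
        raise ValueError("'parameters' must be an object")
    return action_type, parameters


action_type, parameters = parse_action(
    '{"action_type": "check_policy", "parameters": {"issue_type": "refund_request"}}'
)
```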

## API Endpoints (Runtime Environment)

Implemented in [server/app.py](server/app.py):
- `GET /` health check
- `POST /reset` starts a new session and returns initial observation
- `POST /step` applies an action for a session
- `GET /state?session_id=...` returns typed environment state
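
A client for these endpoints might build its requests as sketched below, using only the standard library. The paths and the `session_id` query parameter come from the list above, and the port matches the Docker example later in this README. The host and the JSON body fields (`task_id`, `session_id`, `action`) are assumptions; check `server/app.py` for the real request schema.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:7860"  # host is an assumption


def _post(path: str, body: dict) -> Request:
    """Build a JSON POST request without sending it."""
    data = json.dumps(body).encode("utf-8")
    return Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reset_request(task_id: str) -> Request:
    # "task_id" as a body field is hypothetical.
    return _post("/reset", {"task_id": task_id})


def step_request(session_id: str, action: dict) -> Request:
    # "session_id" and "action" as body fields are hypothetical.
    return _post("/step", {"session_id": session_id, "action": action})


def state_request(session_id: str) -> Request:
    query = urlencode({"session_id": session_id})
    return Request(f"{BASE_URL}/state?{query}", method="GET")


req = step_request("demo-session", {"action_type": "check_policy", "parameters": {}})
```

Once the server is running, each request can be sent with `urllib.request.urlopen(req)`.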

## Reproducibility

- Environment dynamics are deterministic for a fixed action trajectory.
- Graders are deterministic and bounded; tests in [tests/test_graders.py](tests/test_graders.py) verify this.
- Fixed benchmark trajectories are provided in [evaluate.py](evaluate.py).
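
The determinism and bounds properties can be checked with a helper along these lines. This is a sketch rather than the contents of `tests/test_graders.py`: `check_deterministic` and `stub_grader` are hypothetical, and the stub stands in for a real grader.

```python
from typing import Callable, Sequence


def check_deterministic(
    grader: Callable[[Sequence[str]], float],
    trajectory: Sequence[str],
    runs: int = 5,
) -> float:
    """Grade one trajectory several times; fail if scores differ or leave [0.01, 0.99]."""
    scores = [grader(trajectory) for _ in range(runs)]
    if len(set(scores)) != 1:
        raise AssertionError(f"non-deterministic scores: {scores}")
    score = scores[0]
    if not 0.01 <= score <= 0.99:
        raise AssertionError(f"score out of bounds: {score}")
    return score


# Stand-in grader for illustration only; the real graders live in the repository.
def stub_grader(trajectory: Sequence[str]) -> float:
    return 0.99 if list(trajectory)[-1:] == ["close_ticket"] else 0.01


result = check_deterministic(
    stub_grader, ["check_policy", "issue_refund", "close_ticket"]
)
```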

## Reproducibility Enhancements

- **Seed Management**: The environment supports deterministic runs by setting a random seed. Use the `--seed` flag in scripts to ensure reproducibility.
- **Baseline Scores**:
  - Random Policy: 0.33
  - Greedy Policy: 0.75

These scores are verified in the validation script and can be reproduced using the provided `evaluate.py` script.

## Baseline Reproduction

Run the environment and evaluate the agent:

```bash
# Install dependencies
pip install -r requirements.txt
pip install -e .

# Run baseline evaluator
python evaluate.py
```

Example output:
```json
{
  "results": {
    "task_easy_1": {"score": 0.99},
    "task_medium_1": {"score": 0.99},
    "task_hard_1": {"score": 0.99}
  }
}
```
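
If the evaluator output is captured as a JSON string in the shape shown above, per-task scores and their mean can be extracted as follows. `summarize` is a hypothetical helper, and whether `evaluate.py` prints exactly this JSON to stdout is an assumption.

```python
import json


def summarize(report_json: str) -> dict:
    """Return per-task scores and their mean from an evaluator report."""
    results = json.loads(report_json)["results"]
    scores = {task: entry["score"] for task, entry in results.items()}
    mean = sum(scores.values()) / len(scores)
    return {"scores": scores, "mean": mean}


report = (
    '{"results": {"task_easy_1": {"score": 0.99}, '
    '"task_medium_1": {"score": 0.99}, "task_hard_1": {"score": 0.99}}}'
)
summary = summarize(report)
```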

## Setup and Run

Using Docker:
```bash
docker build -t openenv_support .
# Run API Server (HF Spaces mode):
docker run -p 7860:7860 openenv_support
```

Run the baseline inference script locally (install `pydantic` and `openai` first):
```bash
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o"
export HF_TOKEN="your-key"
python inference.py
```

## Pre-submission Validation (Non-Docker)

Run the validation script provided for reviewers:

```bash
chmod +x scripts/validate_submission.sh
./scripts/validate_submission.sh
```

The script checks:
- pytest suite
- grader determinism and score bounds
- openenv.yaml parse + required fields
- task difficulty coverage
- baseline evaluation output
- inference smoke run and `[START]/[STEP]/[END]` log structure

## Reviewer Quickstart

For contributors and evaluators:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -e .
python -m pytest -q
```