Adit1Sharma committed on
Commit
332538a
·
0 Parent(s):
customer-support-openenv/.env ADDED
@@ -0,0 +1,3 @@
+ # Environment variables
+ # Add your secrets here — do NOT commit this file to version control
+ OPENAI_API_KEY=your_key_here
customer-support-openenv/Dockerfile ADDED
@@ -0,0 +1,9 @@
+ FROM python:3.10
+
+ WORKDIR /app
+
+ COPY . .
+
+ RUN pip install -r requirements.txt
+
+ CMD ["python", "app.py"]
customer-support-openenv/README.md ADDED
@@ -0,0 +1,323 @@
+ # Customer Support OpenEnv
+
+ > A real-world reinforcement learning environment where an AI agent learns to handle customer support tickets — classify issues, craft replies, escalate when needed, and resolve tickets.
+
+ Built for the **Meta × PyTorch OpenEnv Hackathon — Round 1, April 2026.**
+
+ ---
+
+ ## What is this?
+
+ Most RL environments are games. This one is not.
+
+ Every company with customers has a support queue. Tickets come in — billing complaints, app crashes, refund requests, angry users threatening legal action. A human agent reads each one, figures out what's wrong, replies helpfully, escalates if it's too serious, and closes it.
+
+ This environment teaches an AI to do exactly that. The agent receives a ticket, takes actions step by step, and gets rewarded based on how well it handles the situation. The reward signal is **dense** — the agent gets feedback at every step, not just at the end.
+
+ ---
+
+ ## Architecture
+
+ ### Overall System
+
+ ```mermaid
+ graph TD
+ A[Agent] -->|Action| B[CustomerSupportEnv]
+ B -->|Observation, Reward, Done, Info| A
+
+ B --> C[tasks.py\nTask Definitions]
+ B --> D[models.py\nTyped Models]
+ B --> E[grader.py\nPer-Task Graders]
+
+ F[app.py\nFastAPI Server] --> B
+ G[baseline/run_baseline.py] --> B
+
+ H[openenv.yaml\nMetadata + Config] -.->|describes| B
+ I[data/tickets.json\nTicket Dataset] -.->|reference data| B
+ ```
+
+ ### Episode Flow
+
+ ```mermaid
+ sequenceDiagram
+ participant Agent
+ participant Env as CustomerSupportEnv
+ participant Grader
+
+ Agent->>Env: reset(task_id="hard")
+ Env-->>Agent: Observation (ticket + history + status)
+
+ Agent->>Env: step(Action: classify, category="billing")
+ Env-->>Agent: Observation, Reward(+0.3), done=False
+
+ Agent->>Env: step(Action: reply, content="...")
+ Env-->>Agent: Observation, Reward(+0.2), done=False
+
+ Agent->>Env: step(Action: escalate)
+ Env-->>Agent: Observation, Reward(+0.2), done=False
+
+ Agent->>Env: step(Action: close)
+ Env-->>Agent: Observation, Reward(+0.3), done=True
+
+ Agent->>Grader: grade_task(task, actions_taken)
+ Grader-->>Agent: Final Score (0.0 - 1.0)
+ ```
+
+ ### Reward Breakdown
+
+ ```mermaid
+ flowchart LR
+ A[Action Taken] --> B{action_type?}
+
+ B -->|classify| C{Category correct?}
+ C -->|yes| D[+0.30]
+ C -->|no| E[+0.00]
+
+ B -->|reply| F{Keyword hits?}
+ F --> G[+0.10 per hit\nmax +0.40]
+ F -->|replied before classify| H[-0.05 penalty]
+
+ B -->|escalate| I{Required?}
+ I -->|yes| J[+0.20]
+ I -->|no| K[-0.10 penalty]
+
+ B -->|close| L[+0.10 if classified\n+0.10 if replied\n+0.10 if escalated correctly]
+
+ B -->|any, at max_steps| M[-0.05 time penalty]
+ ```
+
+ ### File Structure
+
+ ```mermaid
+ graph LR
+ root[customer-support-openenv]
+
+ root --> env[env/]
+ env --> models[models.py\nObservation Action Reward]
+ env --> environment[environment.py\nCustomerSupportEnv]
+ env --> tasks[tasks.py\nTask Definitions]
+ env --> grader[grader.py\ngrade_easy grade_medium grade_hard]
+ env --> utils[utils.py\nHelpers]
+ env --> init[__init__.py]
+
+ root --> baseline[baseline/]
+ baseline --> script[run_baseline.py\nLLM + Mock runner]
+
+ root --> data[data/]
+ data --> tickets[tickets.json\n12 real tickets]
+
+ root --> apppy[app.py\nFastAPI Server]
+ root --> yaml[openenv.yaml]
+ root --> docker[Dockerfile]
+ root --> readme[README.md]
+ root --> env2[.env\nAPI Keys]
+ ```
+
+ ---
+
+ ## Tasks
+
+ The environment has 3 tasks of increasing difficulty. An agent must handle all three.
+
+ | Task | Difficulty | Max Steps | What the agent must do |
+ |---|---|---|---|
+ | `easy` | 🟢 Easy | 5 | Just classify the ticket correctly |
+ | `medium` | 🟡 Medium | 8 | Classify + give a helpful reply |
+ | `hard` | 🔴 Hard | 10 | Classify → reply → escalate → close |
+
+ ### Easy — Classification Only
+ ```
+ Customer: "I was charged twice for my order and need the duplicate removed."
+ Agent must → classify as "billing"
+ Score: 1.0 correct, 0.0 wrong
+ ```
+
+ ### Medium — Classify + Reply
+ ```
+ Customer: "The app keeps crashing on my iPhone. I already restarted twice."
+ Agent must → classify as "technical" AND reply with relevant keywords
+ Score: 0.4 (classify) + up to 0.6 (reply quality)
+ ```
+
+ ### Hard — Full Pipeline
+ ```
+ Customer: "Been waiting 3 weeks for my refund. Considering legal action."
+ History: 4 prior messages showing escalation attempts
+ Agent must → classify + reply + escalate to human + close ticket
+ Score: 0.2 + 0.3 + 0.2 + 0.3 (partial credit, penalty for bad escalation)
+ ```
+
+ ---
+
+ ## Observation Space
+
+ What the agent sees at each step:
+
+ ```python
+ Observation(
+     ticket_id="T001",
+     customer_query="I was charged twice and need a refund.",
+     history=["Agent: We are looking into it.", "Customer: Still waiting!"],
+     status="pending"  # open | pending | resolved
+ )
+ ```
+
+ ---
+
+ ## Action Space
+
+ What the agent can do:
+
+ ```python
+ Action(action_type="classify", category="billing")  # identify the issue
+ Action(action_type="reply", content="We will help...")  # respond to customer
+ Action(action_type="escalate")  # pass to human agent
+ Action(action_type="close")  # end the episode
+ ```
+
+ Valid categories: `billing` | `technical` | `refund` | `account` | `abuse`
+
+ ---
+
+ ## Setup
+
+ ### 1. Clone and install
+
+ ```bash
+ git clone <your-repo-url>
+ cd customer-support-openenv
+ pip install -r requirements.txt
+ ```
+
+ ### 2. Add your API key (optional — needed for LLM baseline)
+
+ ```bash
+ # .env
+ OPENAI_API_KEY=sk-...
+ ```
+
+ ### 3. Run the baseline
+
+ ```bash
+ python baseline/run_baseline.py
+ ```
+
+ No API key? It runs in **mock mode** with deterministic actions — still produces valid scores.
+
+ ### 4. Start the HTTP server
+
+ ```bash
+ python app.py
+ # → http://localhost:7860
+ ```
+
+ ### 5. Try it manually
+
+ ```bash
+ # Start a hard task episode
+ curl "http://localhost:7860/reset?task_id=hard"
+
+ # Classify the ticket
+ curl -X POST http://localhost:7860/step \
+   -H "Content-Type: application/json" \
+   -d '{"action_type": "classify", "category": "billing"}'
+
+ # Reply
+ curl -X POST http://localhost:7860/step \
+   -H "Content-Type: application/json" \
+   -d '{"action_type": "reply", "content": "We are escalating your refund as priority."}'
+
+ # Escalate
+ curl -X POST http://localhost:7860/step \
+   -H "Content-Type: application/json" \
+   -d '{"action_type": "escalate"}'
+
+ # Close
+ curl -X POST http://localhost:7860/step \
+   -H "Content-Type: application/json" \
+   -d '{"action_type": "close"}'
+ ```
+
+ ### 6. Use directly in Python
+
+ ```python
+ from env import CustomerSupportEnv, Action
+
+ env = CustomerSupportEnv()
+ obs = env.reset(task_id="hard")
+
+ print(obs.customer_query)
+ # → "I have been waiting three weeks for a refund..."
+
+ obs, reward, done, info = env.step(Action(action_type="classify", category="billing"))
+ print(reward.score, reward.feedback)
+ # → 0.3 "correct category"
+
+ obs, reward, done, info = env.step(Action(
+     action_type="reply",
+     content="We are making this a priority refund and escalating to a manager."
+ ))
+
+ obs, reward, done, info = env.step(Action(action_type="escalate"))
+ obs, reward, done, info = env.step(Action(action_type="close"))
+ ```
+
+ ---
+
+ ## Docker
+
+ ```bash
+ docker build -t openenv .
+ docker run -p 7860:7860 -e OPENAI_API_KEY=sk-... openenv
+ ```
+
+ ---
+
+ ## Deploying to Hugging Face Spaces
+
+ 1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
+ 2. Create a new Space → select **Docker** SDK
+ 3. Add tag: `openenv`
+ 4. Upload this entire repo
+ 5. Add `OPENAI_API_KEY` as a Space secret
+
+ The server starts automatically and exposes all endpoints.
+
+ ---
+
+ ## Baseline Scores
+
+ Measured with deterministic mock actions (no API key needed):
+
+ | Task | Mock Score | LLM Score (gpt-4o-mini) |
+ |---|---|---|
+ | easy | 1.000 | ~0.900 |
+ | medium | 0.850 | ~0.750 |
+ | hard | 0.775 | ~0.650 |
+ | **Total** | **2.625 / 3.0** | **~2.300 / 3.0** |
+
+ ---
+
+ ## API Reference
+
+ | Method | Endpoint | Description |
+ |---|---|---|
+ | GET | `/` | HTML landing page |
+ | GET | `/reset?task_id=easy` | Start a new episode |
+ | POST | `/step` | Submit an Action |
+ | GET | `/state` | Current raw state |
+ | GET | `/tasks` | List all tasks |
+ | GET | `/health` | Health check |
+ | GET | `/docs` | Swagger UI |
+
+ ---
+
+ ## Team
+
+ - **Adit Sharma** — adit.2428cs1345@kiet.edu
+ - **Mansi Verma** — ogmansi897@gmail.com
+ - **Priyanshi Vishwakarma** — vishwakarmapriyanshi68@gmail.com
+
+ ---
+
+ *Meta × PyTorch OpenEnv Hackathon — Round 1, April 2026*
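The per-step rewards listed in the README's Reward Breakdown flowchart can be sketched as one small function. This is an illustration only, not the code in `env/environment.py`: the name `step_reward` and its keyword arguments are assumptions made for this sketch, the values are the raw ones from the flowchart, and the time penalty and any clamping done by the environment are left out.

```python
def step_reward(action_type, *, correct_category=False, keyword_hits=0,
                classified=False, replied=False, escalated=False,
                escalation_required=False):
    """Hypothetical helper: raw per-step reward as listed in the README flowchart."""
    if action_type == "classify":
        return 0.30 if correct_category else 0.0
    if action_type == "reply":
        reward = min(0.40, 0.10 * keyword_hits)   # +0.10 per hit, capped at +0.40
        if not classified:
            reward -= 0.05                        # replied before classifying
        return round(reward, 4)
    if action_type == "escalate":
        return 0.20 if escalation_required else -0.10
    if action_type == "close":
        bonus = 0.0
        if classified:
            bonus += 0.10
        if replied:
            bonus += 0.10
        if escalation_required and escalated:
            bonus += 0.10
        return round(bonus, 4)
    raise ValueError(f"unknown action_type: {action_type!r}")


# The hard-task episode from the sequence diagram: classify, reply, escalate, close.
episode = [
    step_reward("classify", correct_category=True),
    step_reward("reply", keyword_hits=2, classified=True),
    step_reward("escalate", escalation_required=True),
    step_reward("close", classified=True, replied=True,
                escalated=True, escalation_required=True),
]
print(episode, round(sum(episode), 4))
```

Writing the rules as a pure function makes it easy to check a planned action sequence by hand before running it against the environment.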
customer-support-openenv/__pycache__/app.cpython-313.pyc ADDED
Binary file (7.58 kB).
customer-support-openenv/app.py ADDED
@@ -0,0 +1,100 @@
+ import os
+ from typing import Optional
+
+ from fastapi import FastAPI, HTTPException
+ from fastapi.responses import HTMLResponse
+ from env.environment import CustomerSupportEnv
+ from env.models import Action
+ from env.tasks import TASKS
+
+ app = FastAPI(title="Customer Support OpenEnv", version="1.0.0")
+
+ # one env per session, keyed by the session_id query parameter
+ sessions = {}
+
+ def get_env(session_id="default"):
+     # create the environment lazily the first time a session is seen
+     if session_id not in sessions:
+         sessions[session_id] = CustomerSupportEnv()
+     return sessions[session_id]
+
+
+ @app.get("/", response_class=HTMLResponse)
+ def home():
+     return """
+     <html><body style="font-family:sans-serif;background:#0f1117;color:#e0e0e0;max-width:700px;margin:50px auto;padding:0 24px">
+     <h1 style="color:#7ee787">Customer Support OpenEnv</h1>
+     <p>An OpenEnv RL environment for customer support automation.</p>
+     <h2 style="color:#58a6ff">Endpoints</h2>
+     <ul>
+     <li><a href="/docs" style="color:#58a6ff">/docs</a> &mdash; Swagger UI</li>
+     <li><code>GET /reset?task_id=easy|medium|hard</code></li>
+     <li><code>POST /step</code> &mdash; send an Action</li>
+     <li><code>GET /state</code></li>
+     <li><a href="/tasks" style="color:#58a6ff">GET /tasks</a></li>
+     </ul>
+     </body></html>
+     """
+
+
+ @app.get("/health")
+ def health():
+     return {"status": "ok"}
+
+
+ @app.get("/reset")
+ def reset(task_id: Optional[str] = None, session_id: str = "default"):
+     # task_id is optional: env.reset() picks a random task when it is omitted
+     env = get_env(session_id)
+     try:
+         obs = env.reset(task_id=task_id)
+     except ValueError as e:
+         raise HTTPException(400, str(e))
+     return {
+         "observation": obs.model_dump(),
+         "task": {
+             "id": env.current_task["id"],
+             "description": env.current_task["description"],
+             "max_steps": env.current_task["max_steps"],
+         },
+     }
+
+
+ @app.post("/step")
+ def step(action: Action, session_id: str = "default"):
+     env = get_env(session_id)
+     if not env.current_task:
+         raise HTTPException(400, "Call /reset first.")
+     try:
+         obs, reward, done, info = env.step(action)
+     except RuntimeError as e:
+         # raised by the environment when the episode is already done
+         raise HTTPException(400, str(e))
+     return {
+         "observation": obs.model_dump(),
+         "reward": reward.model_dump(),
+         "done": done,
+         "info": info,
+     }
+
+
+ @app.get("/state")
+ def state(session_id: str = "default"):
+     env = get_env(session_id)
+     if not env.current_task:
+         raise HTTPException(400, "Call /reset first.")
+     return env.state()
+
+
+ @app.get("/tasks")
+ def list_tasks():
+     return [
+         {
+             "id": t["id"],
+             "description": t["description"],
+             "max_steps": t["max_steps"],
+             "requires_escalation": t["expected"]["requires_escalation"],
+         }
+         for t in TASKS.values()
+     ]
+
+
+ if __name__ == "__main__":
+     import uvicorn
+     port = int(os.getenv("PORT", 7860))
+     uvicorn.run("app:app", host="0.0.0.0", port=port)
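Because `session_id` is a plain `str` parameter next to the `Action` body in `app.py`, FastAPI reads it from the query string, so separate clients can keep separate episodes. A minimal standard-library client sketch follows. The helper names `build_reset_request`, `build_step_request` and `send` are assumptions for illustration, and the base URL assumes the server was started locally with `python app.py` on the default port.

```python
import json
from urllib import parse, request

BASE_URL = "http://localhost:7860"  # assumption: local server on the default port


def build_reset_request(task_id, session_id="default", base_url=BASE_URL):
    """Build the GET /reset request for one session (hypothetical helper)."""
    query = parse.urlencode({"task_id": task_id, "session_id": session_id})
    return request.Request(f"{base_url}/reset?{query}", method="GET")


def build_step_request(action, session_id="default", base_url=BASE_URL):
    """Build the POST /step request: action as JSON body, session in the query."""
    query = parse.urlencode({"session_id": session_id})
    return request.Request(
        f"{base_url}/step?{query}",
        data=json.dumps(action).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req):
    """Send a prepared request and decode the JSON response (needs a running server)."""
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only runs against a live server, e.g. one started with `python app.py`.
    print(send(build_reset_request("hard", session_id="alice")))
    print(send(build_step_request(
        {"action_type": "classify", "category": "billing"}, session_id="alice")))
```

Keeping request construction separate from sending means the URLs and payloads can be inspected without a server running.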
customer-support-openenv/baseline/__pycache__/run_baseline.cpython-313.pyc ADDED
Binary file (7.59 kB).
customer-support-openenv/baseline/run_baseline.py ADDED
@@ -0,0 +1,138 @@
+ import sys
+ import os
+ import json
+ import io
+
+ # force UTF-8 output so printing works on consoles with a legacy default encoding
+ sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8", errors="replace")
+ # make the repo root importable so `env` resolves when run as a script
+ sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
+
+ from dotenv import load_dotenv
+ load_dotenv(os.path.join(os.path.dirname(__file__), "..", ".env"))
+
+ from env.environment import CustomerSupportEnv
+ from env.models import Action
+ from env.grader import grade_task
+
+ SYSTEM_PROMPT = """You are an AI customer support agent inside an RL environment.
+ Read the ticket and respond with a JSON object ONLY. Pick one action:
+
+ {"action_type": "classify", "category": "<billing|technical|refund|account|abuse>"}
+ {"action_type": "reply", "content": "<your reply>"}
+ {"action_type": "escalate"}
+ {"action_type": "close"}
+
+ Strategy: classify first, reply next, escalate only if severe (legal threats / long-unresolved issues), then close."""
+
+
+ def obs_to_text(obs):
+     lines = [f"Ticket: {obs.ticket_id}", f"Status: {obs.status}", f"Query: {obs.customer_query}"]
+     if obs.history:
+         lines.append("History:")
+         for msg in obs.history:
+             lines.append(f" {msg}")
+     return "\n".join(lines)
+
+
+ def call_llm(client, obs, messages):
+     messages.append({"role": "user", "content": obs_to_text(obs)})
+     try:
+         resp = client.chat.completions.create(
+             model="gpt-4o-mini",
+             messages=messages,
+             temperature=0.0,
+             response_format={"type": "json_object"},
+         )
+         raw = resp.choices[0].message.content
+         messages.append({"role": "assistant", "content": raw})
+         return Action(**json.loads(raw))
+     except Exception as e:
+         # any API or parsing failure falls back to closing the ticket
+         print(f" LLM error: {e}")
+         return Action(action_type="close")
+
+
+ def run_llm(client, task_id):
+     env = CustomerSupportEnv()
+     obs = env.reset(task_id=task_id)
+     task = env.current_task
+     messages = [{"role": "system", "content": SYSTEM_PROMPT}]
+     taken = []
+
+     print(f"\n{'='*55}")
+     print(f" Task: {task_id.upper()} | {task['description'][:50]}")
+     print(f"{'='*55}")
+
+     for i in range(task["max_steps"]):
+         action = call_llm(client, obs, messages)
+         obs, reward, done, info = env.step(action)
+         taken.append(action)
+         cat = f"cat={action.category}" if action.category else ""
+         print(f" step {i+1}: {action.action_type:<10} {cat:<16} reward={reward.score:.3f}")
+         if done:
+             break
+
+     score = grade_task(task, taken)
+     print(f" grader score: {score:.3f}")
+     return score
+
+
+ def run_mock(task_id):
+     env = CustomerSupportEnv()
+     env.reset(task_id=task_id)
+     task = env.current_task
+     ex = task["expected"]
+     kw = ex["keywords"][0]
+
+     # scripted actions built from the task's expected answers (no API key needed)
+     actions = [
+         Action(action_type="classify", category=ex["category"]),
+         Action(action_type="reply", content=f"We understand your {ex['category']} issue. We will {kw} your request right away. Please reinstall if needed. Sorry for the inconvenience."),
+     ]
+     if ex["requires_escalation"]:
+         actions.append(Action(action_type="escalate"))
+     actions.append(Action(action_type="close"))
+
+     taken = []
+     print(f"\n{'='*55}")
+     print(f" Task: {task_id.upper()} | {task['description'][:50]}")
+     print(f"{'='*55}")
+
+     for action in actions:
+         obs, reward, done, info = env.step(action)
+         taken.append(action)
+         cat = f"cat={action.category}" if action.category else ""
+         print(f" step {info['step']}: {action.action_type:<10} {cat:<16} reward={reward.score:.3f}")
+         if done:
+             break
+
+     score = grade_task(task, taken)
+     print(f" grader score: {score:.3f}")
+     return score
+
+
+ def main():
+     api_key = os.getenv("OPENAI_API_KEY", "")
+     use_llm = bool(api_key)
+
+     print("\n[*] Customer Support OpenEnv - Baseline")
+     print(f" mode: {'LLM (gpt-4o-mini)' if use_llm else 'Mock (no API key)'}")
+
+     client = None
+     if use_llm:
+         from openai import OpenAI
+         client = OpenAI(api_key=api_key)
+
+     results = {}
+     for tid in ["easy", "medium", "hard"]:
+         results[tid] = run_llm(client, tid) if use_llm else run_mock(tid)
+
+     print(f"\n{'='*55}")
+     print(" RESULTS")
+     print(f"{'='*55}")
+     for tid, score in results.items():
+         bar = "#" * round(score * 25)
+         print(f" {tid:<10} {score:.3f} {bar}")
+     print(f" {'total':<10} {sum(results.values()):.3f} / 3.000")
+     print(f"{'='*55}\n")
+
+
+ if __name__ == "__main__":
+     main()
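`call_llm` passes whatever JSON the model returns straight into `Action`, whose `action_type` is an unconstrained string, so an unknown action name would reach the environment unchecked. One possible guard is sketched below, assuming the four action types and five categories named in `SYSTEM_PROMPT`. The helper `parse_action` is hypothetical and returns plain dicts so that the sketch has no third-party dependencies; the fallback to `close` mirrors the existing error path in `call_llm`.

```python
import json

VALID_ACTIONS = {"classify", "reply", "escalate", "close"}
VALID_CATEGORIES = {"billing", "technical", "refund", "account", "abuse"}


def parse_action(raw):
    """Hypothetical helper: validate raw LLM output, falling back to a close action."""
    fallback = {"action_type": "close"}
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):      # not a string, or not valid JSON
        return fallback
    if not isinstance(data, dict):
        return fallback
    action_type = data.get("action_type")
    if not isinstance(action_type, str) or action_type not in VALID_ACTIONS:
        return fallback

    action = {"action_type": action_type}
    if action_type == "classify":
        category = str(data.get("category", "")).lower()
        if category not in VALID_CATEGORIES:
            return fallback
        action["category"] = category
    elif action_type == "reply":
        content = data.get("content")
        if not isinstance(content, str) or not content.strip():
            return fallback
        action["content"] = content
    return action                        # escalate / close carry no extra fields


print(parse_action('{"action_type": "classify", "category": "Billing"}'))
print(parse_action('{"action_type": "refund"}'))
```

The returned dict could then be unpacked into the model as `Action(**parse_action(raw))`, keeping the validation separate from the API call.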
customer-support-openenv/data/tickets.json ADDED
@@ -0,0 +1,86 @@
+ [
+   {
+     "id": "t001",
+     "query": "I was charged twice for my order and need the duplicate payment removed.",
+     "category": "billing",
+     "requires_escalation": false,
+     "difficulty": "easy"
+   },
+   {
+     "id": "t002",
+     "query": "The app keeps crashing on my iPhone every time I open it.",
+     "category": "technical",
+     "requires_escalation": false,
+     "difficulty": "easy"
+   },
+   {
+     "id": "t003",
+     "query": "I want to request a refund for my subscription — I cancelled it last month.",
+     "category": "refund",
+     "requires_escalation": false,
+     "difficulty": "easy"
+   },
+   {
+     "id": "t004",
+     "query": "I cannot log in to my account. My password reset email never arrived.",
+     "category": "account",
+     "requires_escalation": false,
+     "difficulty": "easy"
+   },
+   {
+     "id": "t005",
+     "query": "Another user is harassing me repeatedly inside the platform. Please act.",
+     "category": "abuse",
+     "requires_escalation": true,
+     "difficulty": "medium"
+   },
+   {
+     "id": "t006",
+     "query": "My payment failed three times but I can see pending charges on my bank statement.",
+     "category": "billing",
+     "requires_escalation": false,
+     "difficulty": "medium"
+   },
+   {
+     "id": "t007",
+     "query": "Data I uploaded last week has disappeared from my account without any explanation.",
+     "category": "technical",
+     "requires_escalation": true,
+     "difficulty": "medium"
+   },
+   {
+     "id": "t008",
+     "query": "I was promised a full refund 10 days ago but nothing has arrived. I need this resolved NOW.",
+     "category": "refund",
+     "requires_escalation": true,
+     "difficulty": "medium"
+   },
+   {
+     "id": "t009",
+     "query": "Someone logged into my account from another country. I did not authorise this.",
+     "category": "account",
+     "requires_escalation": true,
+     "difficulty": "hard"
+   },
+   {
+     "id": "t010",
+     "query": "I have been waiting three weeks for a refund your team promised. I am considering legal action.",
+     "category": "billing",
+     "requires_escalation": true,
+     "difficulty": "hard"
+   },
+   {
+     "id": "t011",
+     "query": "Your API has been returning 500 errors for 6 hours and it is costing my business thousands of dollars.",
+     "category": "technical",
+     "requires_escalation": true,
+     "difficulty": "hard"
+   },
+   {
+     "id": "t012",
+     "query": "I upgraded my plan but was never given access to the premium features I paid for.",
+     "category": "billing",
+     "requires_escalation": false,
+     "difficulty": "medium"
+   }
+ ]
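Every record in `data/tickets.json` carries the same five fields. A small check along the following lines could catch malformed entries before they are used. It is a sketch under assumptions: `validate_ticket` and `count_by_difficulty` are hypothetical helpers that do not appear in the repo, the category list is the one given in the README, and the difficulty levels are the three that occur in the file.

```python
from collections import Counter

REQUIRED_FIELDS = {"id", "query", "category", "requires_escalation", "difficulty"}
VALID_CATEGORIES = {"billing", "technical", "refund", "account", "abuse"}
VALID_DIFFICULTIES = {"easy", "medium", "hard"}


def validate_ticket(ticket):
    """Return a list of human-readable problems; an empty list means the ticket is fine."""
    problems = []
    missing = REQUIRED_FIELDS - set(ticket)
    if missing:
        problems.append("missing fields: " + ", ".join(sorted(missing)))
    if "category" in ticket and ticket["category"] not in VALID_CATEGORIES:
        problems.append(f"unknown category: {ticket['category']!r}")
    if "difficulty" in ticket and ticket["difficulty"] not in VALID_DIFFICULTIES:
        problems.append(f"unknown difficulty: {ticket['difficulty']!r}")
    if "requires_escalation" in ticket and not isinstance(ticket["requires_escalation"], bool):
        problems.append("requires_escalation must be true or false")
    return problems


def count_by_difficulty(tickets):
    """Count tickets per difficulty level."""
    return dict(Counter(t["difficulty"] for t in tickets))


sample = {
    "id": "t001",
    "query": "I was charged twice for my order and need the duplicate payment removed.",
    "category": "billing",
    "requires_escalation": False,
    "difficulty": "easy",
}
print(validate_ticket(sample))
print(validate_ticket({"id": "t999", "query": "x", "category": "shipping", "difficulty": "easy"}))
```

The same functions could be applied to the list returned by `load_tickets()` in `env/utils.py`.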
customer-support-openenv/env/__init__.py ADDED
@@ -0,0 +1,5 @@
+ from .environment import CustomerSupportEnv
+ from .models import Observation, Action, Reward
+ from .grader import grade_task
+
+ __all__ = ["CustomerSupportEnv", "Observation", "Action", "Reward", "grade_task"]
customer-support-openenv/env/__pycache__/__init__.cpython-313.pyc ADDED
Binary file (419 Bytes).
customer-support-openenv/env/__pycache__/environment.cpython-313.pyc ADDED
Binary file (7.02 kB).
customer-support-openenv/env/__pycache__/grader.cpython-313.pyc ADDED
Binary file (4.41 kB).
customer-support-openenv/env/__pycache__/models.cpython-313.pyc ADDED
Binary file (1.34 kB).
customer-support-openenv/env/__pycache__/tasks.cpython-313.pyc ADDED
Binary file (1.6 kB).
customer-support-openenv/env/environment.py ADDED
@@ -0,0 +1,147 @@
+ import random
+ from copy import deepcopy
+ from .models import Observation, Action, Reward
+ from .tasks import TASKS, TASK_LIST
+
+
+ class CustomerSupportEnv:
+
+     def __init__(self):
+         self.current_task = None
+         self.state_data = None
+         self.done = False
+         self.step_count = 0
+         self._classified = False
+         self._replied = False
+         self._escalated = False
+         self._closed = False
+
+     def reset(self, task_id=None):
+         if task_id:
+             if task_id not in TASKS:
+                 raise ValueError(f"Unknown task '{task_id}'. Pick from: {list(TASKS.keys())}")
+             self.current_task = TASKS[task_id]
+         else:
+             self.current_task = random.choice(TASK_LIST)
+
+         # deep copy so appending to history never mutates the task definition
+         self.state_data = deepcopy(self.current_task["input"])
+         self.done = False
+         self.step_count = 0
+         self._classified = False
+         self._replied = False
+         self._escalated = False
+         self._closed = False
+
+         return Observation(**self.state_data)
+
+     def step(self, action: Action):
+         if self.done:
+             raise RuntimeError("Episode done. Call reset() first.")
+
+         self.step_count += 1
+         reward = self._compute_reward(action)
+
+         if action.action_type == "close":
+             self.done = True
+             self._closed = True
+
+         # hit max steps → small penalty
+         max_steps = self.current_task.get("max_steps", 10)
+         if self.step_count >= max_steps and not self.done:
+             self.done = True
+             new_score = max(0.0, reward.score - 0.05)
+             reward = Reward(
+                 score=new_score,
+                 feedback=reward.feedback + " | time limit hit, -0.05",
+                 breakdown={**reward.breakdown, "time_penalty": -0.05},
+             )
+
+         # any action that carries text is recorded in the ticket history
+         if action.content:
+             self.state_data["history"].append(f"Agent: {action.content}")
+
+         info = {
+             "step": self.step_count,
+             "task_id": self.current_task["id"],
+             "classified": self._classified,
+             "replied": self._replied,
+             "escalated": self._escalated,
+             "closed": self._closed,
+         }
+
+         return Observation(**self.state_data), reward, self.done, info
+
+     def state(self):
+         return self.state_data
+
+     def _compute_reward(self, action: Action) -> Reward:
+         correct = self.current_task["expected"]
+         score = 0.0
+         breakdown = {}
+
+         if action.action_type == "classify":
+             if action.category and action.category.lower() == correct["category"].lower():
+                 score += 0.3
+                 breakdown["classify"] = 0.3
+             else:
+                 breakdown["classify"] = 0.0
+             self._classified = True
+
+         elif action.action_type == "reply":
+             if not self._classified:
+                 score -= 0.05
+                 breakdown["early_reply_penalty"] = -0.05
+
+             # substring match of each expected keyword against the lowercased reply
+             hits = sum(1 for kw in correct["keywords"] if kw in (action.content or "").lower())
+             reply_score = min(0.4, hits * 0.1)
+             score += reply_score
+             breakdown["reply"] = reply_score
+             self._replied = True
+
+         elif action.action_type == "escalate":
+             if correct["requires_escalation"]:
+                 score += 0.2
+                 breakdown["escalate"] = 0.2
+             else:
+                 score -= 0.1
+                 breakdown["escalate"] = -0.1
+             self._escalated = True
+
+         elif action.action_type == "close":
+             bonus = 0.0
+             if self._classified:
+                 bonus += 0.1
+             if self._replied:
+                 bonus += 0.1
+             if correct["requires_escalation"] and self._escalated:
+                 bonus += 0.1
+             score += bonus
+             breakdown["close_bonus"] = bonus
+
+         # Clamp to [0, 1]: a step whose net raw score is negative reports 0.0,
+         # while the breakdown still records the negative penalty entry.
+         score = round(max(0.0, min(1.0, score)), 4)
+         feedback = self._make_feedback(action, breakdown, correct)
+
+         return Reward(score=score, feedback=feedback, breakdown=breakdown)
+
+     def _make_feedback(self, action, breakdown, correct):
+         parts = []
+
+         if breakdown.get("classify") == 0.3:
+             parts.append("correct category")
+         elif "classify" in breakdown:
+             parts.append(f"wrong category (expected {correct['category']})")
+
+         if "early_reply_penalty" in breakdown:
+             parts.append("replied before classifying")
+
+         if "reply" in breakdown:
+             parts.append(f"reply score {breakdown['reply']:.2f}")
+
+         if breakdown.get("escalate") == 0.2:
+             parts.append("escalated correctly")
+         elif breakdown.get("escalate") == -0.1:
+             parts.append("unnecessary escalation")
+
+         if "close_bonus" in breakdown:
+             parts.append(f"close bonus {breakdown['close_bonus']:.2f}")
+
+         return ", ".join(parts) if parts else "ok"
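`_compute_reward` ends by clamping the raw step score into the range 0 to 1, which affects how the penalties show up. The sketch below isolates that one expression in a standalone function so that a few cases can be worked through. The function name `reported_score` is an assumption for illustration; the expression inside it is the one used in `environment.py`.

```python
def reported_score(raw_score):
    """Same clamp-and-round expression that closes _compute_reward."""
    return round(max(0.0, min(1.0, raw_score)), 4)


# Unnecessary escalation: the raw score is -0.1, so the reported score is the floor.
print(reported_score(-0.1))

# Reply before classifying with two keyword hits: 0.2 - 0.05 stays positive,
# so here the penalty is visible in the reported score.
print(reported_score(0.2 - 0.05))

# Reply before classifying with no keyword hits: raw -0.05 is floored as well.
print(reported_score(-0.05))
```

In other words, a penalty lowers the reported score only when the same step also earns something; otherwise it is visible only in the `breakdown` dict returned with the reward.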
customer-support-openenv/env/grader.py ADDED
@@ -0,0 +1,72 @@
+ from typing import List
+ from .models import Action
+
+
+ def grade_easy(task, actions: List[Action]) -> float:
+     expected = task["expected"]["category"].lower()
+     # only the first classify action is graded
+     for a in actions:
+         if a.action_type == "classify":
+             return 1.0 if (a.category or "").lower() == expected else 0.0
+     return 0.0
+
+
+ def grade_medium(task, actions: List[Action]) -> float:
+     score = 0.0
+     expected_cat = task["expected"]["category"].lower()
+     keywords = [k.lower() for k in task["expected"]["keywords"]]
+
+     # first classify action: 0.4 if the category matches
+     for a in actions:
+         if a.action_type == "classify":
+             if (a.category or "").lower() == expected_cat:
+                 score += 0.4
+             break
+
+     # first non-empty reply: 0.15 per keyword hit, capped at 0.6
+     for a in actions:
+         if a.action_type == "reply" and a.content:
+             hits = sum(1 for k in keywords if k in a.content.lower())
+             score += min(0.6, hits * 0.15)
+             break
+
+     return round(min(1.0, score), 4)
+
+
+ def grade_hard(task, actions: List[Action]) -> float:
+     score = 0.0
+     expected_cat = task["expected"]["category"].lower()
+     keywords = [k.lower() for k in task["expected"]["keywords"]]
+     needs_escalation = task["expected"]["requires_escalation"]
+
+     # first classify action: 0.2 if the category matches
+     for a in actions:
+         if a.action_type == "classify":
+             if (a.category or "").lower() == expected_cat:
+                 score += 0.2
+             break
+
+     # first non-empty reply: 0.075 per keyword hit, capped at 0.3
+     for a in actions:
+         if a.action_type == "reply" and a.content:
+             hits = sum(1 for k in keywords if k in a.content.lower())
+             score += min(0.3, hits * 0.075)
+             break
+
+     escalated = any(a.action_type == "escalate" for a in actions)
+     if needs_escalation and escalated:
+         score += 0.2
+     elif not needs_escalation and escalated:
+         score -= 0.1
+
+     if any(a.action_type == "close" for a in actions):
+         score += 0.3
+
+     return round(max(0.0, min(1.0, score)), 4)
+
+
+ GRADERS = {
+     "easy": grade_easy,
+     "medium": grade_medium,
+     "hard": grade_hard,
+ }
+
+
+ def grade_task(task, actions: List[Action]) -> float:
+     # unknown or missing task ids fall back to the easy grader
+     grader = GRADERS.get(task.get("id", "easy"), grade_easy)
+     return grader(task, actions)
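Both `grade_medium` and `grade_hard` score a reply by counting keyword substrings and capping the total. A standalone sketch of that rule follows. The function name `reply_credit` and its parameters are assumptions for illustration, while the keyword lists, per-hit weights and caps are copied from `tasks.py` and `grader.py`.

```python
def reply_credit(content, keywords, per_hit, cap):
    """Credit for one reply: per_hit for each keyword found as a substring, up to cap."""
    text = content.lower()
    hits = sum(1 for k in keywords if k.lower() in text)
    return round(min(cap, hits * per_hit), 4)


HARD_KEYWORDS = ["escalat", "manager", "priority", "urgent", "legal", "refund", "apologize", "sorry"]
MEDIUM_KEYWORDS = ["reinstall", "update", "cache", "support", "technical", "version"]

# The reply used in the README's Python example hits four hard-task keywords
# ("escalat" matches "escalating"), which already reaches the 0.3 cap.
print(reply_credit(
    "We are making this a priority refund and escalating to a manager.",
    HARD_KEYWORDS, per_hit=0.075, cap=0.3))

# Two medium-task keywords at 0.15 each.
print(reply_credit("Please update the app and clear the cache.",
                   MEDIUM_KEYWORDS, per_hit=0.15, cap=0.6))

# Substring matching also fires inside longer words such as "unsupported".
print(reply_credit("This device is unsupported.", MEDIUM_KEYWORDS, per_hit=0.15, cap=0.6))
```

The last call shows a side effect of plain substring matching: a keyword is credited even when it appears inside an unrelated or negated word. Matching on word boundaries would be one way to tighten this, at the cost of no longer matching stems such as `escalat`.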
customer-support-openenv/env/models.py ADDED
@@ -0,0 +1,21 @@
+ from pydantic import BaseModel
+ from typing import List, Optional, Dict, Any
+
+
+ class Observation(BaseModel):
+     ticket_id: str
+     customer_query: str
+     history: List[str]
+     status: str
+
+
+ class Action(BaseModel):
+     action_type: str  # classify | reply | escalate | close
+     content: Optional[str] = None
+     category: Optional[str] = None
+
+
+ class Reward(BaseModel):
+     score: float
+     feedback: str
+     breakdown: Dict[str, Any] = {}
customer-support-openenv/env/tasks.py ADDED
@@ -0,0 +1,59 @@
1
+ TASKS = {
2
+ "easy": {
3
+ "id": "easy",
4
+ "description": "Classify a customer ticket into the right category.",
5
+ "input": {
6
+ "ticket_id": "T001",
7
+ "customer_query": "I was charged twice for my order #ORD-8821 and need the duplicate payment removed.",
8
+ "history": [],
9
+ "status": "open",
10
+ },
11
+ "expected": {
12
+ "category": "billing",
13
+ "keywords": ["refund", "charge", "payment", "duplicate", "billing"],
14
+ "requires_escalation": False,
15
+ },
16
+ "max_steps": 5,
17
+ },
18
+
19
+ "medium": {
20
+ "id": "medium",
21
+ "description": "Classify the ticket and give a helpful reply.",
22
+ "input": {
23
+ "ticket_id": "T002",
24
+ "customer_query": "The app keeps crashing on my iPhone 15. I already restarted my phone twice.",
25
+ "history": [],
26
+ "status": "open",
27
+ },
28
+ "expected": {
29
+ "category": "technical",
30
+ "keywords": ["reinstall", "update", "cache", "support", "technical", "version"],
31
+ "requires_escalation": False,
32
+ },
33
+ "max_steps": 8,
34
+ },
35
+
36
+ "hard": {
37
+ "id": "hard",
38
+ "description": "Full pipeline — classify, reply, escalate if needed, then close.",
39
+ "input": {
40
+ "ticket_id": "T003",
41
+ "customer_query": "I have been waiting three weeks for a refund your team promised. I am considering legal action.",
42
+ "history": [
43
+ "Agent: We apologise. Your refund is being processed.",
44
+ "Customer: Two weeks and still nothing!",
45
+ "Agent: We escalated this to our billing team.",
46
+ "Customer: Another week gone. I want to speak to a manager!",
47
+ ],
48
+ "status": "pending",
49
+ },
50
+ "expected": {
51
+ "category": "billing",
52
+ "keywords": ["escalat", "manager", "priority", "urgent", "legal", "refund", "apologize", "sorry"],
53
+ "requires_escalation": True,
54
+ },
55
+ "max_steps": 10,
56
+ },
57
+ }
58
+
59
+ TASK_LIST = list(TASKS.values())
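The `keywords` lists above include stems such as `"escalat"`, which suggests replies are matched by case-insensitive substring rather than by whole word. The grader itself is not part of this file, so the following is only a minimal sketch under that assumption. The function name `keyword_reply_score` and its `per_hit` and `cap` parameters are hypothetical; the default values are taken from the `reply_per_keyword_hit: +0.1 (max 0.4)` entry in `openenv.yaml`.

```python
from typing import Iterable


def keyword_reply_score(reply: str, keywords: Iterable[str],
                        per_hit: float = 0.1, cap: float = 0.4) -> float:
    """Hypothetical scorer: add per_hit for every keyword found, up to cap."""
    text = reply.lower()
    # Substring match, so the stem "escalat" hits "escalate" and "escalated".
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return round(min(cap, per_hit * hits), 2)


# Keyword lists copied from the "medium" and "hard" tasks above.
medium_keywords = ["reinstall", "update", "cache", "support", "technical", "version"]
hard_keywords = ["escalat", "manager", "priority", "urgent", "legal",
                 "refund", "apologize", "sorry"]

medium_reply = "Please update the app to the latest version, then clear its cache."
hard_reply = "I am sorry for the delay. I have escalated your refund to a manager."

print(keyword_reply_score(medium_reply, medium_keywords))
print(keyword_reply_score(hard_reply, hard_keywords))
```

One consequence of substring matching is that spelling variants matter: the hard task lists `"apologize"`, so a reply that uses the British spelling "apologise" (as the ticket history does) would not count as a hit under this sketch.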
customer-support-openenv/env/utils.py ADDED
@@ -0,0 +1,26 @@
+ import json
+ import os
+ from typing import List, Dict, Any
+
+
+ def load_tickets(path=None) -> List[Dict[str, Any]]:
+     if path is None:
+         path = os.path.join(os.path.dirname(__file__), "..", "data", "tickets.json")
+     with open(path) as f:
+         return json.load(f)
+
+
+ def format_observation(obs) -> str:
+     lines = [
+         f"Ticket : {obs.ticket_id}",
+         f"Status : {obs.status}",
+         f"Query : {obs.customer_query}",
+     ]
+     for i, msg in enumerate(obs.history, 1):
+         lines.append(f" [{i}] {msg}")
+     return "\n".join(lines)
+
+
+ def log_step(step, action, reward):
+     cat = action.category or "-"
+     print(f"step {step:>2} | {action.action_type:<10} cat={cat:<12} score={reward.score:.2f} | {reward.feedback}")
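As a usage sketch for `format_observation`, the block below copies the function from the diff and feeds it a stand-in object. `SimpleNamespace` is used here only so the example runs without pydantic installed; in the repository the argument would presumably be an `Observation` instance, which exposes the same four attributes. The ticket values are taken from the hard task in `tasks.py`, with the history shortened to two turns.

```python
from types import SimpleNamespace


def format_observation(obs) -> str:
    # Copied from env/utils.py in this commit.
    lines = [
        f"Ticket : {obs.ticket_id}",
        f"Status : {obs.status}",
        f"Query : {obs.customer_query}",
    ]
    for i, msg in enumerate(obs.history, 1):
        lines.append(f" [{i}] {msg}")
    return "\n".join(lines)


# Stand-in for an Observation (assumption: only attribute access is needed).
obs = SimpleNamespace(
    ticket_id="T003",
    status="pending",
    customer_query="I have been waiting three weeks for a refund your team promised.",
    history=[
        "Agent: We apologise. Your refund is being processed.",
        "Customer: Two weeks and still nothing!",
    ],
)

text = format_observation(obs)
print(text)
```

The function emits three header lines followed by one numbered line per history entry, so an empty `history` (as in the easy and medium tasks) produces exactly three lines.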
customer-support-openenv/openenv.yaml ADDED
@@ -0,0 +1,107 @@
+ name: customer-support-env
+ version: "1.0"
+
+ description: >
+   An OpenEnv-compliant environment that simulates real-world customer support
+   ticket workflows. An AI agent must classify incoming tickets, craft appropriate
+   replies, decide when to escalate to a human agent, and close resolved tickets.
+   The environment provides dense, shaped rewards at every step to enable
+   efficient RL training — not just a sparse end-of-episode signal.
+
+ entry_point: env.environment:CustomerSupportEnv
+
+ author: "Adit Sharma, Mansi Verma, Priyanshi Vishwakarma"
+ tags:
+   - openenv
+   - customer-support
+   - nlp
+   - real-world
+   - multi-step
+
+ # ---------------------------------------------------------
+ # Tasks
+ # ---------------------------------------------------------
+ tasks:
+   - id: easy
+     difficulty: easy
+     description: >
+       Classify a single customer ticket into the correct category
+       (billing / technical / refund / account / abuse).
+     max_steps: 5
+     scoring: "1.0 for correct classification, 0.0 otherwise."
+
+   - id: medium
+     difficulty: medium
+     description: >
+       Classify the ticket correctly (worth 0.4) and then reply with a helpful,
+       keyword-rich response that addresses the root issue (up to 0.6).
+     max_steps: 8
+     scoring: "Partial credit: 0.4 classify + up to 0.6 reply quality."
+
+   - id: hard
+     difficulty: hard
+     description: >
+       Full resolution pipeline — classify (0.2), give a quality reply (0.3),
+       escalate to a human agent when required (0.2), and close the ticket (0.3).
+       Penalises unnecessary escalation (−0.1).
+     max_steps: 10
+     scoring: "Partial credit across all 4 action types; penalty for bad escalation."
+
+ # ---------------------------------------------------------
+ # Action Space
+ # ---------------------------------------------------------
+ action_space:
+   type: discrete-structured
+   actions:
+     - name: classify
+       required_fields: [category]
+       category_values: [billing, technical, refund, account, abuse]
+       description: "Classify the ticket into a support category."
+
+     - name: reply
+       required_fields: [content]
+       description: "Send a reply message to the customer."
+
+     - name: escalate
+       required_fields: []
+       description: "Escalate the ticket to a human agent."
+
+     - name: close
+       required_fields: []
+       description: "Close the ticket and end the episode (done=True)."
+
+ # ---------------------------------------------------------
+ # Observation Space
+ # ---------------------------------------------------------
+ observation_space:
+   type: structured
+   fields:
+     - name: ticket_id
+       type: string
+       description: "Unique identifier for the support ticket."
+
+     - name: customer_query
+       type: string
+       description: "The customer's message or complaint."
+
+     - name: history
+       type: list[string]
+       description: "Chronological conversation history (agent + customer turns)."
+
+     - name: status
+       type: string
+       enum: [open, pending, resolved]
+       description: "Current status of the ticket."
+
+ # ---------------------------------------------------------
+ # Reward
+ # ---------------------------------------------------------
+ reward_range: [0.0, 1.0]
+ reward_structure:
+   classify_correct: +0.3
+   reply_per_keyword_hit: +0.1 (max 0.4)
+   reply_before_classify: -0.05
+   escalate_correct: +0.2
+   escalate_unnecessary: -0.1
+   close_bonus: +0.0 to +0.3 (depends on prior progress)
+   time_penalty: -0.05 (if step_count >= max_steps)
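The hard task's description assigns weights to each stage (classify 0.2, reply 0.3, escalate 0.2, close 0.3) and a 0.1 penalty for unnecessary escalation, while `reward_range` bounds the result to [0.0, 1.0]. One way those numbers could combine is sketched below. This is an illustration of the arithmetic only, not the repository's grader: the function `hard_task_score`, its parameters, and the choice to clamp the final sum are all assumptions.

```python
def hard_task_score(classified_ok: bool, reply_quality: float,
                    escalated: bool, requires_escalation: bool,
                    closed: bool) -> float:
    """Hypothetical combination of the hard-task weights from openenv.yaml."""
    score = 0.0
    if classified_ok:
        score += 0.2                                   # classify (0.2)
    score += 0.3 * max(0.0, min(1.0, reply_quality))   # reply, scaled to 0.3
    if escalated:
        # +0.2 when escalation was required, otherwise the 0.1 penalty.
        score += 0.2 if requires_escalation else -0.1
    if closed:
        score += 0.3                                   # close (0.3)
    # Clamp to the declared reward_range of [0.0, 1.0] (assumption).
    return round(max(0.0, min(1.0, score)), 2)


# Perfect run on a ticket that requires escalation.
print(hard_task_score(True, 1.0, True, True, True))
# Correct classification, but an escalation that was not needed.
print(hard_task_score(True, 0.0, True, False, False))
```

Without the clamp, an episode consisting only of an unnecessary escalation would fall below the declared lower bound of 0.0, which is why the sketch clips the total rather than returning the raw sum.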
customer-support-openenv/requirements.txt ADDED
@@ -0,0 +1,5 @@
+ pydantic>=2.0
+ openai>=1.0
+ python-dotenv
+ fastapi
+ uvicorn