Commit · 332538a
0 Parent(s)
initial
- customer-support-openenv/.env +3 -0
- customer-support-openenv/Dockerfile +9 -0
- customer-support-openenv/README.md +323 -0
- customer-support-openenv/__pycache__/app.cpython-313.pyc +0 -0
- customer-support-openenv/app.py +100 -0
- customer-support-openenv/baseline/__pycache__/run_baseline.cpython-313.pyc +0 -0
- customer-support-openenv/baseline/run_baseline.py +138 -0
- customer-support-openenv/data/tickets.json +86 -0
- customer-support-openenv/env/__init__.py +5 -0
- customer-support-openenv/env/__pycache__/__init__.cpython-313.pyc +0 -0
- customer-support-openenv/env/__pycache__/environment.cpython-313.pyc +0 -0
- customer-support-openenv/env/__pycache__/grader.cpython-313.pyc +0 -0
- customer-support-openenv/env/__pycache__/models.cpython-313.pyc +0 -0
- customer-support-openenv/env/__pycache__/tasks.cpython-313.pyc +0 -0
- customer-support-openenv/env/environment.py +147 -0
- customer-support-openenv/env/grader.py +72 -0
- customer-support-openenv/env/models.py +21 -0
- customer-support-openenv/env/tasks.py +59 -0
- customer-support-openenv/env/utils.py +26 -0
- customer-support-openenv/openenv.yaml +107 -0
- customer-support-openenv/requirements.txt +5 -0
customer-support-openenv/.env
ADDED
@@ -0,0 +1,3 @@
+# Environment variables
+# Add your secrets here — do NOT commit this file to version control
+OPENAI_API_KEY=your_key_here
customer-support-openenv/Dockerfile
ADDED
@@ -0,0 +1,9 @@
+FROM python:3.10
+
+WORKDIR /app
+
+COPY . .
+
+RUN pip install -r requirements.txt
+
+CMD ["python", "app.py"]
customer-support-openenv/README.md
ADDED
@@ -0,0 +1,323 @@
+# Customer Support OpenEnv
+
+> A real-world reinforcement learning environment where an AI agent learns to handle customer support tickets — classify issues, craft replies, escalate when needed, and resolve tickets.
+
+Built for the **Meta × PyTorch OpenEnv Hackathon — Round 1, April 2026.**
+
+---
+
+## What is this?
+
+Most RL environments are games. This one is not.
+
+Every company with customers has a support queue. Tickets come in — billing complaints, app crashes, refund requests, angry users threatening legal action. A human agent reads each one, figures out what's wrong, replies helpfully, escalates if it's too serious, and closes it.
+
+This environment teaches an AI to do exactly that. The agent receives a ticket, takes actions step by step, and gets rewarded based on how well it handles the situation. The reward signal is **dense** — the agent gets feedback at every step, not just at the end.
+
+---
+
+## Architecture
+
+### Overall System
+
+```mermaid
+graph TD
+    A[Agent] -->|Action| B[CustomerSupportEnv]
+    B -->|Observation, Reward, Done, Info| A
+
+    B --> C[tasks.py\nTask Definitions]
+    B --> D[models.py\nTyped Models]
+    B --> E[grader.py\nPer-Task Graders]
+
+    F[app.py\nFastAPI Server] --> B
+    G[baseline/run_baseline.py] --> B
+
+    H[openenv.yaml\nMetadata + Config] -.->|describes| B
+    I[data/tickets.json\nTicket Dataset] -.->|reference data| B
+```
+
+### Episode Flow
+
+```mermaid
+sequenceDiagram
+    participant Agent
+    participant Env as CustomerSupportEnv
+    participant Grader
+
+    Agent->>Env: reset(task_id="hard")
+    Env-->>Agent: Observation (ticket + history + status)
+
+    Agent->>Env: step(Action: classify, category="billing")
+    Env-->>Agent: Observation, Reward(+0.3), done=False
+
+    Agent->>Env: step(Action: reply, content="...")
+    Env-->>Agent: Observation, Reward(+0.2), done=False
+
+    Agent->>Env: step(Action: escalate)
+    Env-->>Agent: Observation, Reward(+0.2), done=False
+
+    Agent->>Env: step(Action: close)
+    Env-->>Agent: Observation, Reward(+0.3), done=True
+
+    Agent->>Grader: grade_task(task, actions_taken)
+    Grader-->>Agent: Final Score (0.0 - 1.0)
+```
+
+### Reward Breakdown
+
+```mermaid
+flowchart LR
+    A[Action Taken] --> B{action_type?}
+
+    B -->|classify| C{Category correct?}
+    C -->|yes| D[+0.30]
+    C -->|no| E[+0.00]
+
+    B -->|reply| F{Keyword hits?}
+    F --> G[+0.10 per hit\nmax +0.40]
+    F -->|replied before classify| H[-0.05 penalty]
+
+    B -->|escalate| I{Required?}
+    I -->|yes| J[+0.20]
+    I -->|no| K[-0.10 penalty]
+
+    B -->|close| L[+0.10 if classified\n+0.10 if replied\n+0.10 if escalated correctly]
+
+    B -->|any, at max_steps| M[-0.05 time penalty]
+```
+
+### File Structure
+
+```mermaid
+graph LR
+    root[customer-support-openenv]
+
+    root --> env[env/]
+    env --> models[models.py\nObservation Action Reward]
+    env --> environment[environment.py\nCustomerSupportEnv]
+    env --> tasks[tasks.py\nTask Definitions]
+    env --> grader[grader.py\ngrade_easy grade_medium grade_hard]
+    env --> utils[utils.py\nHelpers]
+    env --> init[__init__.py]
+
+    root --> baseline[baseline/]
+    baseline --> script[run_baseline.py\nLLM + Mock runner]
+
+    root --> data[data/]
+    data --> tickets[tickets.json\n12 real tickets]
+
+    root --> apppy[app.py\nFastAPI Server]
+    root --> yaml[openenv.yaml]
+    root --> docker[Dockerfile]
+    root --> readme[README.md]
+    root --> env2[.env\nAPI Keys]
+```
+
+---
+
+## Tasks
+
+The environment has 3 tasks of increasing difficulty. An agent must handle all three.
+
+| Task | Difficulty | Max Steps | What the agent must do |
+|---|---|---|---|
+| `easy` | 🟢 Easy | 5 | Just classify the ticket correctly |
+| `medium` | 🟡 Medium | 8 | Classify + give a helpful reply |
+| `hard` | 🔴 Hard | 10 | Classify → reply → escalate → close |
+
+### Easy — Classification Only
+```
+Customer: "I was charged twice for my order and need the duplicate removed."
+Agent must → classify as "billing"
+Score: 1.0 correct, 0.0 wrong
+```
+
+### Medium — Classify + Reply
+```
+Customer: "The app keeps crashing on my iPhone. I already restarted twice."
+Agent must → classify as "technical" AND reply with relevant keywords
+Score: 0.4 (classify) + up to 0.6 (reply quality)
+```
+
+### Hard — Full Pipeline
+```
+Customer: "Been waiting 3 weeks for my refund. Considering legal action."
+History: 4 prior messages showing escalation attempts
+Agent must → classify + reply + escalate to human + close ticket
+Score: 0.2 + 0.3 + 0.2 + 0.3 (partial credit, penalty for bad escalation)
+```
+
+---
+
+## Observation Space
+
+What the agent sees at each step:
+
+```python
+Observation(
+    ticket_id="T001",
+    customer_query="I was charged twice and need a refund.",
+    history=["Agent: We are looking into it.", "Customer: Still waiting!"],
+    status="pending"  # open | pending | resolved
+)
+```
+
+---
+
+## Action Space
+
+What the agent can do:
+
+```python
+Action(action_type="classify", category="billing")      # identify the issue
+Action(action_type="reply", content="We will help...")  # respond to customer
+Action(action_type="escalate")                          # pass to human agent
+Action(action_type="close")                             # end the episode
+```
+
+Valid categories: `billing` | `technical` | `refund` | `account` | `abuse`
+
+---
+
+## Setup
+
+### 1. Clone and install
+
+```bash
+git clone <your-repo-url>
+cd customer-support-openenv
+pip install -r requirements.txt
+```
+
+### 2. Add your API key (optional — needed for LLM baseline)
+
+```bash
+# .env
+OPENAI_API_KEY=sk-...
+```
+
+### 3. Run the baseline
+
+```bash
+python baseline/run_baseline.py
+```
+
+No API key? It runs in **mock mode** with deterministic actions — still produces valid scores.
+
+### 4. Start the HTTP server
+
+```bash
+python app.py
+# → http://localhost:7860
+```
+
+### 5. Try it manually
+
+```bash
+# Start a hard task episode
+curl "http://localhost:7860/reset?task_id=hard"
+
+# Classify the ticket
+curl -X POST http://localhost:7860/step \
+  -H "Content-Type: application/json" \
+  -d '{"action_type": "classify", "category": "billing"}'
+
+# Reply
+curl -X POST http://localhost:7860/step \
+  -H "Content-Type: application/json" \
+  -d '{"action_type": "reply", "content": "We are escalating your refund as priority."}'
+
+# Escalate
+curl -X POST http://localhost:7860/step \
+  -H "Content-Type: application/json" \
+  -d '{"action_type": "escalate"}'
+
+# Close
+curl -X POST http://localhost:7860/step \
+  -H "Content-Type: application/json" \
+  -d '{"action_type": "close"}'
+```
+
+### 6. Use directly in Python
+
+```python
+from env import CustomerSupportEnv, Action
+
+env = CustomerSupportEnv()
+obs = env.reset(task_id="hard")
+
+print(obs.customer_query)
+# → "I have been waiting three weeks for a refund..."
+
+obs, reward, done, info = env.step(Action(action_type="classify", category="billing"))
+print(reward.score, reward.feedback)
+# → 0.3 "correct category"
+
+obs, reward, done, info = env.step(Action(
+    action_type="reply",
+    content="We are making this a priority refund and escalating to a manager."
+))
+
+obs, reward, done, info = env.step(Action(action_type="escalate"))
+obs, reward, done, info = env.step(Action(action_type="close"))
+```
+
+---
+
+## Docker
+
+```bash
+docker build -t openenv .
+docker run -p 7860:7860 -e OPENAI_API_KEY=sk-... openenv
+```
+
+---
+
+## Deploying to Hugging Face Spaces
+
+1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
+2. Create a new Space → select **Docker** SDK
+3. Add tag: `openenv`
+4. Upload this entire repo
+5. Add `OPENAI_API_KEY` as a Space secret
+
+The server starts automatically and exposes all endpoints.
+
+---
+
+## Baseline Scores
+
+Measured with deterministic mock actions (no API key needed):
+
+| Task | Mock Score | LLM Score (gpt-4o-mini) |
+|---|---|---|
+| easy | 1.000 | ~0.900 |
+| medium | 0.850 | ~0.750 |
+| hard | 0.775 | ~0.650 |
+| **Total** | **2.625 / 3.0** | **~2.300 / 3.0** |
+
+---
+
+## API Reference
+
+| Method | Endpoint | Description |
+|---|---|---|
+| GET | `/` | HTML landing page |
+| GET | `/reset?task_id=easy` | Start a new episode |
+| POST | `/step` | Submit an Action |
+| GET | `/state` | Current raw state |
+| GET | `/tasks` | List all tasks |
+| GET | `/health` | Health check |
+| GET | `/docs` | Swagger UI |
+
+---
+
+## Team
+
+- **Adit Sharma** — adit.2428cs1345@kiet.edu
+- **Mansi Verma** — ogmansi897@gmail.com
+- **Priyanshi Vishwakarma** — vishwakarmapriyanshi68@gmail.com
+
+---
+
+*Meta × PyTorch OpenEnv Hackathon — Round 1, April 2026*
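The Reward Breakdown flowchart in the README can be read as a per-step scoring rule. The block below is a minimal sketch of that reading, not the contents of `env/environment.py` (its `_compute_reward` is not shown in this diff). The names `EpisodeFlags` and `sketch_step_reward`, the case-insensitive substring match for keywords, and the choice to mark a ticket as classified even when the category is wrong are all assumptions made for illustration. The max-steps time penalty is left out.

```python
from dataclasses import dataclass


@dataclass
class EpisodeFlags:
    """Hypothetical record of what the agent has done so far in an episode."""
    classified: bool = False
    replied: bool = False
    escalated: bool = False


def sketch_step_reward(action_type, flags, expected, category=None, content=""):
    """Score one action following the README flowchart (illustrative only)."""
    score = 0.0
    if action_type == "classify":
        # +0.30 for the right category, nothing otherwise.
        if category == expected["category"]:
            score += 0.30
        flags.classified = True  # assumption: any classify attempt counts
    elif action_type == "reply":
        # +0.10 per expected keyword found in the reply, capped at +0.40.
        text = content.lower()
        hits = sum(1 for kw in expected["keywords"] if kw.lower() in text)
        score += min(0.10 * hits, 0.40)
        # Small penalty for replying before classifying.
        if not flags.classified:
            score -= 0.05
        flags.replied = True
    elif action_type == "escalate":
        # Rewarded only when the task actually requires escalation.
        score += 0.20 if expected["requires_escalation"] else -0.10
        flags.escalated = True
    elif action_type == "close":
        # +0.10 for each completed stage of the pipeline.
        score += 0.10 if flags.classified else 0.0
        score += 0.10 if flags.replied else 0.0
        if flags.escalated and expected["requires_escalation"]:
            score += 0.10
    return round(score, 2)
```

Under these assumptions, a classify, reply (two keyword hits), escalate, close sequence on a task that requires escalation yields 0.3, 0.2, 0.2 and 0.3, which matches the per-step rewards drawn in the Episode Flow diagram.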
customer-support-openenv/__pycache__/app.cpython-313.pyc
ADDED
Binary file (7.58 kB)
customer-support-openenv/app.py
ADDED
@@ -0,0 +1,100 @@
+import os
+from fastapi import FastAPI, HTTPException
+from fastapi.responses import HTMLResponse
+from env.environment import CustomerSupportEnv
+from env.models import Action
+from env.tasks import TASKS
+
+app = FastAPI(title="Customer Support OpenEnv", version="1.0.0")
+
+# one env per session
+sessions = {}
+
+def get_env(session_id="default"):
+    if session_id not in sessions:
+        sessions[session_id] = CustomerSupportEnv()
+    return sessions[session_id]
+
+
+@app.get("/", response_class=HTMLResponse)
+def home():
+    return """
+    <html><body style="font-family:sans-serif;background:#0f1117;color:#e0e0e0;max-width:700px;margin:50px auto;padding:0 24px">
+    <h1 style="color:#7ee787">Customer Support OpenEnv</h1>
+    <p>An OpenEnv RL environment for customer support automation.</p>
+    <h2 style="color:#58a6ff">Endpoints</h2>
+    <ul>
+    <li><a href="/docs" style="color:#58a6ff">/docs</a> — Swagger UI</li>
+    <li><code>GET /reset?task_id=easy|medium|hard</code></li>
+    <li><code>POST /step</code> — send an Action</li>
+    <li><code>GET /state</code></li>
+    <li><a href="/tasks" style="color:#58a6ff">GET /tasks</a></li>
+    </ul>
+    </body></html>
+    """
+
+
+@app.get("/health")
+def health():
+    return {"status": "ok"}
+
+
+@app.get("/reset")
+def reset(task_id: str = None, session_id: str = "default"):
+    env = get_env(session_id)
+    try:
+        obs = env.reset(task_id=task_id)
+    except ValueError as e:
+        raise HTTPException(400, str(e))
+    return {
+        "observation": obs.model_dump(),
+        "task": {
+            "id": env.current_task["id"],
+            "description": env.current_task["description"],
+            "max_steps": env.current_task["max_steps"],
+        },
+    }
+
+
+@app.post("/step")
+def step(action: Action, session_id: str = "default"):
+    env = get_env(session_id)
+    if not env.current_task:
+        raise HTTPException(400, "Call /reset first.")
+    try:
+        obs, reward, done, info = env.step(action)
+    except RuntimeError as e:
+        raise HTTPException(400, str(e))
+    return {
+        "observation": obs.model_dump(),
+        "reward": reward.model_dump(),
+        "done": done,
+        "info": info,
+    }
+
+
+@app.get("/state")
+def state(session_id: str = "default"):
+    env = get_env(session_id)
+    if not env.current_task:
+        raise HTTPException(400, "Call /reset first.")
+    return env.state()
+
+
+@app.get("/tasks")
+def list_tasks():
+    return [
+        {
+            "id": t["id"],
+            "description": t["description"],
+            "max_steps": t["max_steps"],
+            "requires_escalation": t["expected"]["requires_escalation"],
+        }
+        for t in TASKS.values()
+    ]
+
+
+if __name__ == "__main__":
+    import uvicorn
+    port = int(os.getenv("PORT", 7860))
+    uvicorn.run("app:app", host="0.0.0.0", port=port)
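The `/step` route in `app.py` takes an `Action` as its JSON body and an optional `session_id` query parameter. A client request for it could be built with the standard library roughly as follows. `build_step_request` and its `base_url` default are illustrative names, not part of the repo, and nothing is sent over the network until the request is handed to `urllib.request.urlopen` against a running server.

```python
import json
import urllib.parse
import urllib.request


def build_step_request(action, session_id="default", base_url="http://localhost:7860"):
    """Build (but do not send) a POST /step request for the given action dict."""
    # session_id travels in the query string; the action is the JSON body.
    query = urllib.parse.urlencode({"session_id": session_id})
    return urllib.request.Request(
        url=f"{base_url}/step?{query}",
        data=json.dumps(action).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_step_request({"action_type": "classify", "category": "billing"})
print(req.get_method(), req.full_url)
```

Using a distinct `session_id` per client keeps episodes apart, since the server stores one environment per session key.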
customer-support-openenv/baseline/__pycache__/run_baseline.cpython-313.pyc
ADDED
Binary file (7.59 kB)
customer-support-openenv/baseline/run_baseline.py
ADDED
@@ -0,0 +1,138 @@
+import sys
+import os
+import json
+import io
+
+sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8", errors="replace")
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
+
+from dotenv import load_dotenv
+load_dotenv(os.path.join(os.path.dirname(__file__), "..", ".env"))
+
+from env.environment import CustomerSupportEnv
+from env.models import Action
+from env.grader import grade_task
+
+SYSTEM_PROMPT = """You are an AI customer support agent inside an RL environment.
+Read the ticket and respond with a JSON object ONLY. Pick one action:
+
+{"action_type": "classify", "category": "<billing|technical|refund|account|abuse>"}
+{"action_type": "reply", "content": "<your reply>"}
+{"action_type": "escalate"}
+{"action_type": "close"}
+
+Strategy: classify first, reply next, escalate only if severe (legal threats / long-unresolved issues), then close."""
+
+
+def obs_to_text(obs):
+    lines = [f"Ticket: {obs.ticket_id}", f"Status: {obs.status}", f"Query: {obs.customer_query}"]
+    if obs.history:
+        lines.append("History:")
+        for msg in obs.history:
+            lines.append(f" {msg}")
+    return "\n".join(lines)
+
+
+def call_llm(client, obs, messages):
+    messages.append({"role": "user", "content": obs_to_text(obs)})
+    try:
+        resp = client.chat.completions.create(
+            model="gpt-4o-mini",
+            messages=messages,
+            temperature=0.0,
+            response_format={"type": "json_object"},
+        )
+        raw = resp.choices[0].message.content
+        messages.append({"role": "assistant", "content": raw})
+        return Action(**json.loads(raw))
+    except Exception as e:
+        print(f" LLM error: {e}")
+        return Action(action_type="close")
+
+
+def run_llm(client, task_id):
+    env = CustomerSupportEnv()
+    obs = env.reset(task_id=task_id)
+    task = env.current_task
+    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
+    taken = []
+
+    print(f"\n{'='*55}")
+    print(f" Task: {task_id.upper()} | {task['description'][:50]}")
+    print(f"{'='*55}")
+
+    for i in range(task["max_steps"]):
+        action = call_llm(client, obs, messages)
+        obs, reward, done, info = env.step(action)
+        taken.append(action)
+        cat = f"cat={action.category}" if action.category else ""
+        print(f" step {i+1}: {action.action_type:<10} {cat:<16} reward={reward.score:.3f}")
+        if done:
+            break
+
+    score = grade_task(task, taken)
+    print(f" grader score: {score:.3f}")
+    return score
+
+
+def run_mock(task_id):
+    env = CustomerSupportEnv()
+    env.reset(task_id=task_id)
+    task = env.current_task
+    ex = task["expected"]
+    kw = ex["keywords"][0]
+
+    actions = [
+        Action(action_type="classify", category=ex["category"]),
+        Action(action_type="reply", content=f"We understand your {ex['category']} issue. We will {kw} your request right away. Please reinstall if needed. Sorry for the inconvenience."),
+    ]
+    if ex["requires_escalation"]:
+        actions.append(Action(action_type="escalate"))
+    actions.append(Action(action_type="close"))
+
+    taken = []
+    print(f"\n{'='*55}")
+    print(f" Task: {task_id.upper()} | {task['description'][:50]}")
+    print(f"{'='*55}")
+
+    for action in actions:
+        obs, reward, done, info = env.step(action)
+        taken.append(action)
+        cat = f"cat={action.category}" if action.category else ""
+        print(f" step {info['step']}: {action.action_type:<10} {cat:<16} reward={reward.score:.3f}")
+        if done:
+            break
+
+    score = grade_task(task, taken)
+    print(f" grader score: {score:.3f}")
+    return score
+
+
+def main():
+    api_key = os.getenv("OPENAI_API_KEY", "")
+    use_llm = bool(api_key)
+
+    print("\n[*] Customer Support OpenEnv - Baseline")
+    print(f" mode: {'LLM (gpt-4o-mini)' if use_llm else 'Mock (no API key)'}")
+
+    client = None
+    if use_llm:
+        from openai import OpenAI
+        client = OpenAI(api_key=api_key)
+
+    results = {}
+    for tid in ["easy", "medium", "hard"]:
+        results[tid] = run_llm(client, tid) if use_llm else run_mock(tid)
+
+    print(f"\n{'='*55}")
+    print(" RESULTS")
+    print(f"{'='*55}")
+    for tid, score in results.items():
+        bar = "#" * round(score * 25)
+        print(f" {tid:<10} {score:.3f} {bar}")
+    print(f" {'total':<10} {sum(results.values()):.3f} / 3.000")
+    print(f"{'='*55}\n")
+
+
+if __name__ == "__main__":
+    main()
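`call_llm` in the baseline passes `json.loads(raw)` straight into `Action(...)` and falls back to a `close` action on any exception. A stricter parse step might check the fields first, so that a malformed reply is rejected for a known reason. The sketch below uses plain dicts in place of the repo's `Action` model, and `parse_action` is a hypothetical helper that does not exist in the commit. The allowed action types and categories are taken from the system prompt.

```python
import json

VALID_ACTIONS = {"classify", "reply", "escalate", "close"}
VALID_CATEGORIES = {"billing", "technical", "refund", "account", "abuse"}
FALLBACK = {"action_type": "close"}


def parse_action(raw):
    """Turn a raw LLM reply into a validated action dict, or fall back to close."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        # Not JSON at all (json.JSONDecodeError is a ValueError subclass).
        return dict(FALLBACK)

    kind = data.get("action_type") if isinstance(data, dict) else None
    if not isinstance(kind, str) or kind not in VALID_ACTIONS:
        return dict(FALLBACK)

    if kind == "classify":
        category = data.get("category")
        if not isinstance(category, str) or category not in VALID_CATEGORIES:
            return dict(FALLBACK)
        return {"action_type": kind, "category": category}

    if kind == "reply":
        content = data.get("content")
        if not isinstance(content, str) or not content.strip():
            return dict(FALLBACK)
        return {"action_type": kind, "content": content}

    # escalate / close carry no payload; extra keys are dropped.
    return {"action_type": kind}
```

Falling back to `close` mirrors the behaviour of `call_llm`; a retry with a corrective prompt would be another option, at the cost of an extra API call.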
customer-support-openenv/data/tickets.json
ADDED
@@ -0,0 +1,86 @@
+[
+  {
+    "id": "t001",
+    "query": "I was charged twice for my order and need the duplicate payment removed.",
+    "category": "billing",
+    "requires_escalation": false,
+    "difficulty": "easy"
+  },
+  {
+    "id": "t002",
+    "query": "The app keeps crashing on my iPhone every time I open it.",
+    "category": "technical",
+    "requires_escalation": false,
+    "difficulty": "easy"
+  },
+  {
+    "id": "t003",
+    "query": "I want to request a refund for my subscription — I cancelled it last month.",
+    "category": "refund",
+    "requires_escalation": false,
+    "difficulty": "easy"
+  },
+  {
+    "id": "t004",
+    "query": "I cannot log in to my account. My password reset email never arrived.",
+    "category": "account",
+    "requires_escalation": false,
+    "difficulty": "easy"
+  },
+  {
+    "id": "t005",
+    "query": "Another user is harassing me repeatedly inside the platform. Please act.",
+    "category": "abuse",
+    "requires_escalation": true,
+    "difficulty": "medium"
+  },
+  {
+    "id": "t006",
+    "query": "My payment failed three times but I can see pending charges on my bank statement.",
+    "category": "billing",
+    "requires_escalation": false,
+    "difficulty": "medium"
+  },
+  {
+    "id": "t007",
+    "query": "Data I uploaded last week has disappeared from my account without any explanation.",
+    "category": "technical",
+    "requires_escalation": true,
+    "difficulty": "medium"
+  },
+  {
+    "id": "t008",
+    "query": "I was promised a full refund 10 days ago but nothing has arrived. I need this resolved NOW.",
+    "category": "refund",
+    "requires_escalation": true,
+    "difficulty": "medium"
+  },
+  {
+    "id": "t009",
+    "query": "Someone logged into my account from another country. I did not authorise this.",
+    "category": "account",
+    "requires_escalation": true,
+    "difficulty": "hard"
+  },
+  {
+    "id": "t010",
+    "query": "I have been waiting three weeks for a refund your team promised. I am considering legal action.",
+    "category": "billing",
+    "requires_escalation": true,
+    "difficulty": "hard"
+  },
+  {
+    "id": "t011",
+    "query": "Your API has been returning 500 errors for 6 hours and it is costing my business thousands of dollars.",
+    "category": "technical",
+    "requires_escalation": true,
+    "difficulty": "hard"
+  },
+  {
+    "id": "t012",
+    "query": "I upgraded my plan but was never given access to the premium features I paid for.",
+    "category": "billing",
+    "requires_escalation": false,
+    "difficulty": "medium"
+  }
+]
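Every record in `data/tickets.json` carries the same five fields, so summarising the dataset takes only a few lines. The sketch below inlines three of the records (with the `query` field omitted for brevity) instead of reading the file, and `summarise` is a hypothetical helper, not something the commit provides.

```python
import json
from collections import Counter

# Three records copied from data/tickets.json, minus the "query" text.
SAMPLE = json.loads("""
[
  {"id": "t001", "category": "billing", "requires_escalation": false, "difficulty": "easy"},
  {"id": "t005", "category": "abuse", "requires_escalation": true, "difficulty": "medium"},
  {"id": "t010", "category": "billing", "requires_escalation": true, "difficulty": "hard"}
]
""")


def summarise(tickets):
    """Count tickets per difficulty and category, and how many need escalation."""
    return {
        "by_difficulty": dict(Counter(t["difficulty"] for t in tickets)),
        "by_category": dict(Counter(t["category"] for t in tickets)),
        "escalations": sum(1 for t in tickets if t["requires_escalation"]),
    }


summary = summarise(SAMPLE)
print(summary)
```

Run over the full file, the same counts would come to 4 easy, 5 medium and 3 hard tickets, with 6 of the 12 requiring escalation.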
customer-support-openenv/env/__init__.py
ADDED
@@ -0,0 +1,5 @@
+from .environment import CustomerSupportEnv
+from .models import Observation, Action, Reward
+from .grader import grade_task
+
+__all__ = ["CustomerSupportEnv", "Observation", "Action", "Reward", "grade_task"]
customer-support-openenv/env/__pycache__/__init__.cpython-313.pyc
ADDED
Binary file (419 Bytes)
customer-support-openenv/env/__pycache__/environment.cpython-313.pyc
ADDED
Binary file (7.02 kB)
customer-support-openenv/env/__pycache__/grader.cpython-313.pyc
ADDED
Binary file (4.41 kB)
customer-support-openenv/env/__pycache__/models.cpython-313.pyc
ADDED
Binary file (1.34 kB)
customer-support-openenv/env/__pycache__/tasks.cpython-313.pyc
ADDED
Binary file (1.6 kB)
customer-support-openenv/env/environment.py
ADDED
|
@@ -0,0 +1,147 @@
+import random
+from copy import deepcopy
+from .models import Observation, Action, Reward
+from .tasks import TASKS, TASK_LIST
+
+
+class CustomerSupportEnv:
+
+    def __init__(self):
+        self.current_task = None
+        self.state_data = None
+        self.done = False
+        self.step_count = 0
+        self._classified = False
+        self._replied = False
+        self._escalated = False
+        self._closed = False
+
+    def reset(self, task_id=None):
+        if task_id:
+            if task_id not in TASKS:
+                raise ValueError(f"Unknown task '{task_id}'. Pick from: {list(TASKS.keys())}")
+            self.current_task = TASKS[task_id]
+        else:
+            self.current_task = random.choice(TASK_LIST)
+
+        self.state_data = deepcopy(self.current_task["input"])
+        self.done = False
+        self.step_count = 0
+        self._classified = False
+        self._replied = False
+        self._escalated = False
+        self._closed = False
+
+        return Observation(**self.state_data)
+
+    def step(self, action: Action):
+        if self.done:
+            raise RuntimeError("Episode done. Call reset() first.")
+
+        self.step_count += 1
+        reward = self._compute_reward(action)
+
+        if action.action_type == "close":
+            self.done = True
+            self._closed = True
+
+        # hit max steps → small penalty
+        max_steps = self.current_task.get("max_steps", 10)
+        if self.step_count >= max_steps and not self.done:
+            self.done = True
+            new_score = max(0.0, reward.score - 0.05)
+            reward = Reward(
+                score=new_score,
+                feedback=reward.feedback + " | time limit hit, -0.05",
+                breakdown={**reward.breakdown, "time_penalty": -0.05},
+            )
+
+        if action.content:
+            self.state_data["history"].append(f"Agent: {action.content}")
+
+        info = {
+            "step": self.step_count,
+            "task_id": self.current_task["id"],
+            "classified": self._classified,
+            "replied": self._replied,
+            "escalated": self._escalated,
+            "closed": self._closed,
+        }
+
+        return Observation(**self.state_data), reward, self.done, info
+
+    def state(self):
+        return self.state_data
+
+    def _compute_reward(self, action: Action) -> Reward:
+        correct = self.current_task["expected"]
+        score = 0.0
+        breakdown = {}
+
+        if action.action_type == "classify":
+            if action.category and action.category.lower() == correct["category"].lower():
+                score += 0.3
+                breakdown["classify"] = 0.3
+            else:
+                breakdown["classify"] = 0.0
+            self._classified = True
+
+        elif action.action_type == "reply":
+            if not self._classified:
+                score -= 0.05
+                breakdown["early_reply_penalty"] = -0.05
+
+            hits = sum(1 for kw in correct["keywords"] if kw in (action.content or "").lower())
+            reply_score = min(0.4, hits * 0.1)
+            score += reply_score
+            breakdown["reply"] = reply_score
+            self._replied = True
+
+        elif action.action_type == "escalate":
+            if correct["requires_escalation"]:
+                score += 0.2
+                breakdown["escalate"] = 0.2
+            else:
+                score -= 0.1
+                breakdown["escalate"] = -0.1
+            self._escalated = True
+
+        elif action.action_type == "close":
+            bonus = 0.0
+            if self._classified:
+                bonus += 0.1
+            if self._replied:
+                bonus += 0.1
+            if correct["requires_escalation"] and self._escalated:
+                bonus += 0.1
+            score += bonus
+            breakdown["close_bonus"] = bonus
+
+        score = round(max(0.0, min(1.0, score)), 4)
+        feedback = self._make_feedback(action, breakdown, correct)
+
+        return Reward(score=score, feedback=feedback, breakdown=breakdown)
+
+    def _make_feedback(self, action, breakdown, correct):
+        parts = []
+
+        if breakdown.get("classify") == 0.3:
+            parts.append("correct category")
+        elif "classify" in breakdown:
+            parts.append(f"wrong category (expected {correct['category']})")
+
+        if "early_reply_penalty" in breakdown:
+            parts.append("replied before classifying")
+
+        if "reply" in breakdown:
+            parts.append(f"reply score {breakdown['reply']:.2f}")
+
+        if breakdown.get("escalate") == 0.2:
+            parts.append("escalated correctly")
+        elif breakdown.get("escalate") == -0.1:
+            parts.append("unnecessary escalation")
+
+        if "close_bonus" in breakdown:
+            parts.append(f"close bonus {breakdown['close_bonus']:.2f}")
+
+        return ", ".join(parts) if parts else "ok"
|
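Reviewer note: the `reply` branch of `_compute_reward` in `env/environment.py` can be checked in isolation. The sketch below mirrors that branch as read from this diff (0.1 per keyword hit, capped at 0.4, minus 0.05 when no `classify` action has been taken yet). The function name `reply_step_score` and the sample reply are hypothetical and exist only for illustration; the keyword list is copied from the `medium` task.

```python
def reply_step_score(content, keywords, classified):
    """Hypothetical stand-alone mirror of the `reply` branch of _compute_reward."""
    score = 0.0
    if not classified:
        score -= 0.05  # early-reply penalty, as in the diff
    text = (content or "").lower()
    hits = sum(1 for kw in keywords if kw in text)
    score += min(0.4, hits * 0.1)  # 0.1 per keyword hit, capped at 0.4
    # Same per-step clamp and rounding the environment applies.
    return round(max(0.0, min(1.0, score)), 4)


# Keyword list copied from the "medium" task in env/tasks.py.
keywords = ["reinstall", "update", "cache", "support", "technical", "version"]
reply = "Please update to the latest version and reinstall the app."

print(reply_step_score(reply, keywords, classified=True))   # 0.3
print(reply_step_score(reply, keywords, classified=False))  # 0.25
```

Three of the six keywords match the sample reply, so the step scores 0.3 after classification and 0.25 when the agent replies first.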
customer-support-openenv/env/grader.py
ADDED
|
@@ -0,0 +1,72 @@
+from typing import List
+from .models import Action
+
+
+def grade_easy(task, actions: List[Action]) -> float:
+    expected = task["expected"]["category"].lower()
+    for a in actions:
+        if a.action_type == "classify":
+            return 1.0 if (a.category or "").lower() == expected else 0.0
+    return 0.0
+
+
+def grade_medium(task, actions: List[Action]) -> float:
+    score = 0.0
+    expected_cat = task["expected"]["category"].lower()
+    keywords = [k.lower() for k in task["expected"]["keywords"]]
+
+    for a in actions:
+        if a.action_type == "classify":
+            if (a.category or "").lower() == expected_cat:
+                score += 0.4
+            break
+
+    for a in actions:
+        if a.action_type == "reply" and a.content:
+            hits = sum(1 for k in keywords if k in a.content.lower())
+            score += min(0.6, hits * 0.15)
+            break
+
+    return round(min(1.0, score), 4)
+
+
+def grade_hard(task, actions: List[Action]) -> float:
+    score = 0.0
+    expected_cat = task["expected"]["category"].lower()
+    keywords = [k.lower() for k in task["expected"]["keywords"]]
+    needs_escalation = task["expected"]["requires_escalation"]
+
+    for a in actions:
+        if a.action_type == "classify":
+            if (a.category or "").lower() == expected_cat:
+                score += 0.2
+            break
+
+    for a in actions:
+        if a.action_type == "reply" and a.content:
+            hits = sum(1 for k in keywords if k in a.content.lower())
+            score += min(0.3, hits * 0.075)
+            break
+
+    escalated = any(a.action_type == "escalate" for a in actions)
+    if needs_escalation and escalated:
+        score += 0.2
+    elif not needs_escalation and escalated:
+        score -= 0.1
+
+    if any(a.action_type == "close" for a in actions):
+        score += 0.3
+
+    return round(max(0.0, min(1.0, score)), 4)
+
+
+GRADERS = {
+    "easy": grade_easy,
+    "medium": grade_medium,
+    "hard": grade_hard,
+}
+
+
+def grade_task(task, actions: List[Action]) -> float:
+    grader = GRADERS.get(task.get("id", "easy"), grade_easy)
+    return grader(task, actions)
|
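Reviewer note: a worked example of the `grade_hard` arithmetic may help when checking `env/grader.py`. The sketch below recomputes the four components (classify 0.2, reply up to 0.3 at 0.075 per keyword hit, escalate 0.2 or -0.1, close 0.3) for one trajectory. `Act` and `hard_components` are hypothetical stand-ins, not part of the commit, and the sketch assumes that only the first `classify` action and the first non-empty `reply` are scored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Act:
    """Hypothetical stand-in for env.models.Action (same three fields)."""
    action_type: str
    content: Optional[str] = None
    category: Optional[str] = None


def hard_components(expected, actions):
    """Break a hard-task trajectory into the four scored components."""
    # Assumption: only the first classify and first non-empty reply count.
    first_classify = next((a for a in actions if a.action_type == "classify"), None)
    first_reply = next((a for a in actions if a.action_type == "reply" and a.content), None)
    escalated = any(a.action_type == "escalate" for a in actions)
    closed = any(a.action_type == "close" for a in actions)

    parts = {"classify": 0.0, "reply": 0.0, "escalate": 0.0, "close": 0.0}
    if first_classify and (first_classify.category or "").lower() == expected["category"].lower():
        parts["classify"] = 0.2
    if first_reply:
        text = first_reply.content.lower()
        hits = sum(1 for k in expected["keywords"] if k.lower() in text)
        parts["reply"] = min(0.3, hits * 0.075)
    if escalated:
        parts["escalate"] = 0.2 if expected["requires_escalation"] else -0.1
    if closed:
        parts["close"] = 0.3
    return parts


# Expected values copied from the "hard" task in env/tasks.py.
expected = {
    "category": "billing",
    "keywords": ["escalat", "manager", "priority", "urgent", "legal", "refund", "apologize", "sorry"],
    "requires_escalation": True,
}
trajectory = [
    Act("classify", category="billing"),
    Act("reply", content="I'm sorry for the delay. I have escalated your refund to a manager as a priority."),
    Act("escalate"),
    Act("close"),
]

parts = hard_components(expected, trajectory)
total = round(max(0.0, min(1.0, sum(parts.values()))), 4)
print(total)  # 1.0
```

The sample reply hits five keywords (`escalat`, `manager`, `priority`, `refund`, `sorry`), so the reply component saturates at the 0.3 cap and the trajectory reaches the full 1.0.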
customer-support-openenv/env/models.py
ADDED
|
@@ -0,0 +1,21 @@
+from pydantic import BaseModel
+from typing import List, Optional, Dict, Any
+
+
+class Observation(BaseModel):
+    ticket_id: str
+    customer_query: str
+    history: List[str]
+    status: str
+
+
+class Action(BaseModel):
+    action_type: str  # classify | reply | escalate | close
+    content: Optional[str] = None
+    category: Optional[str] = None
+
+
+class Reward(BaseModel):
+    score: float
+    feedback: str
+    breakdown: Dict[str, Any] = {}
|
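Reviewer note: `Action.action_type` is a plain `str`, so a misspelled action type or a `classify` without a `category` passes model validation and is only noticed (or silently ignored) later in `_compute_reward`. The standard-library sketch below shows one possible pre-check, using the required fields listed under `action_space` in `openenv.yaml`. `REQUIRED_FIELDS` and `validate_action` are hypothetical names and are not part of the commit.

```python
from typing import List, Optional

# Required fields per action, as listed under action_space in openenv.yaml.
REQUIRED_FIELDS = {
    "classify": ("category",),
    "reply": ("content",),
    "escalate": (),
    "close": (),
}


def validate_action(action_type: str,
                    content: Optional[str] = None,
                    category: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the action looks well-formed."""
    if action_type not in REQUIRED_FIELDS:
        return [f"unknown action_type {action_type!r}"]
    values = {"content": content, "category": category}
    return [
        f"{action_type} requires {name}"
        for name in REQUIRED_FIELDS[action_type]
        if not values[name]
    ]


print(validate_action("classify", category="billing"))  # []
print(validate_action("reply"))                         # ['reply requires content']
print(validate_action("refund"))                        # ["unknown action_type 'refund'"]
```

A check of this kind could run before `step()` is called; whether to reject or merely log malformed actions is a design choice the commit leaves open.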
customer-support-openenv/env/tasks.py
ADDED
|
@@ -0,0 +1,59 @@
+TASKS = {
+    "easy": {
+        "id": "easy",
+        "description": "Classify a customer ticket into the right category.",
+        "input": {
+            "ticket_id": "T001",
+            "customer_query": "I was charged twice for my order #ORD-8821 and need the duplicate payment removed.",
+            "history": [],
+            "status": "open",
+        },
+        "expected": {
+            "category": "billing",
+            "keywords": ["refund", "charge", "payment", "duplicate", "billing"],
+            "requires_escalation": False,
+        },
+        "max_steps": 5,
+    },
+
+    "medium": {
+        "id": "medium",
+        "description": "Classify the ticket and give a helpful reply.",
+        "input": {
+            "ticket_id": "T002",
+            "customer_query": "The app keeps crashing on my iPhone 15. I already restarted my phone twice.",
+            "history": [],
+            "status": "open",
+        },
+        "expected": {
+            "category": "technical",
+            "keywords": ["reinstall", "update", "cache", "support", "technical", "version"],
+            "requires_escalation": False,
+        },
+        "max_steps": 8,
+    },
+
+    "hard": {
+        "id": "hard",
+        "description": "Full pipeline — classify, reply, escalate if needed, then close.",
+        "input": {
+            "ticket_id": "T003",
+            "customer_query": "I have been waiting three weeks for a refund your team promised. I am considering legal action.",
+            "history": [
+                "Agent: We apologise. Your refund is being processed.",
+                "Customer: Two weeks and still nothing!",
+                "Agent: We escalated this to our billing team.",
+                "Customer: Another week gone. I want to speak to a manager!",
+            ],
+            "status": "pending",
+        },
+        "expected": {
+            "category": "billing",
+            "keywords": ["escalat", "manager", "priority", "urgent", "legal", "refund", "apologize", "sorry"],
+            "requires_escalation": True,
+        },
+        "max_steps": 10,
+    },
+}
+
+TASK_LIST = list(TASKS.values())
|
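Reviewer note: both the environment and the graders match keywords by plain substring search on the lower-cased reply. That is why the stem `escalat` in the `hard` task matches "escalated" and "escalation", and also why the American spelling `apologize` does not match the British "apologise" used in the task's own history. The short sketch below illustrates the behaviour; the helper name `matched` and the two sample replies are hypothetical.

```python
# Keyword list copied from the "hard" task in env/tasks.py.
keywords = ["escalat", "manager", "priority", "urgent", "legal", "refund", "apologize", "sorry"]


def matched(reply, keywords):
    """Return the keywords found in the reply by case-insensitive substring search."""
    text = reply.lower()
    return [kw for kw in keywords if kw in text]


print(matched("We apologise and have escalated this.", keywords))
# ['escalat']
print(matched("We apologize; escalation is underway.", keywords))
# ['escalat', 'apologize']
```

If spelling variants should score equally, one option is to store a shared stem such as `apolog` in the keyword list, at the cost of slightly looser matching.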
customer-support-openenv/env/utils.py
ADDED
|
@@ -0,0 +1,26 @@
+import json
+import os
+from typing import List, Dict, Any
+
+
+def load_tickets(path=None) -> List[Dict[str, Any]]:
+    if path is None:
+        path = os.path.join(os.path.dirname(__file__), "..", "data", "tickets.json")
+    with open(path) as f:
+        return json.load(f)
+
+
+def format_observation(obs) -> str:
+    lines = [
+        f"Ticket : {obs.ticket_id}",
+        f"Status : {obs.status}",
+        f"Query : {obs.customer_query}",
+    ]
+    for i, msg in enumerate(obs.history, 1):
+        lines.append(f" [{i}] {msg}")
+    return "\n".join(lines)
+
+
+def log_step(step, action, reward):
+    cat = action.category or "-"
+    print(f"step {step:>2} | {action.action_type:<10} cat={cat:<12} score={reward.score:.2f} | {reward.feedback}")
|
customer-support-openenv/openenv.yaml
ADDED
|
@@ -0,0 +1,107 @@
+name: customer-support-env
+version: "1.0"
+
+description: >
+  An OpenEnv-compliant environment that simulates real-world customer support
+  ticket workflows. An AI agent must classify incoming tickets, craft appropriate
+  replies, decide when to escalate to a human agent, and close resolved tickets.
+  The environment provides dense, shaped rewards at every step to enable
+  efficient RL training — not just a sparse end-of-episode signal.
+
+entry_point: env.environment:CustomerSupportEnv
+
+author: "Adit Sharma, Mansi Verma, Priyanshi Vishwakarma"
+tags:
+  - openenv
+  - customer-support
+  - nlp
+  - real-world
+  - multi-step
+
+# ---------------------------------------------------------
+# Tasks
+# ---------------------------------------------------------
+tasks:
+  - id: easy
+    difficulty: easy
+    description: >
+      Classify a single customer ticket into the correct category
+      (billing / technical / refund / account / abuse).
+    max_steps: 5
+    scoring: "1.0 for correct classification, 0.0 otherwise."
+
+  - id: medium
+    difficulty: medium
+    description: >
+      Classify the ticket correctly (worth 0.4) and then reply with a helpful,
+      keyword-rich response that addresses the root issue (up to 0.6).
+    max_steps: 8
+    scoring: "Partial credit: 0.4 classify + up to 0.6 reply quality."
+
+  - id: hard
+    difficulty: hard
+    description: >
+      Full resolution pipeline — classify (0.2), give a quality reply (0.3),
+      escalate to a human agent when required (0.2), and close the ticket (0.3).
+      Penalises unnecessary escalation (−0.1).
+    max_steps: 10
+    scoring: "Partial credit across all 4 action types; penalty for bad escalation."
+
+# ---------------------------------------------------------
+# Action Space
+# ---------------------------------------------------------
+action_space:
+  type: discrete-structured
+  actions:
+    - name: classify
+      required_fields: [category]
+      category_values: [billing, technical, refund, account, abuse]
+      description: "Classify the ticket into a support category."
+
+    - name: reply
+      required_fields: [content]
+      description: "Send a reply message to the customer."
+
+    - name: escalate
+      required_fields: []
+      description: "Escalate the ticket to a human agent."
+
+    - name: close
+      required_fields: []
+      description: "Close the ticket and end the episode (done=True)."
+
+# ---------------------------------------------------------
+# Observation Space
+# ---------------------------------------------------------
+observation_space:
+  type: structured
+  fields:
+    - name: ticket_id
+      type: string
+      description: "Unique identifier for the support ticket."
+
+    - name: customer_query
+      type: string
+      description: "The customer's message or complaint."
+
+    - name: history
+      type: list[string]
+      description: "Chronological conversation history (agent + customer turns)."
+
+    - name: status
+      type: string
+      enum: [open, pending, resolved]
+      description: "Current status of the ticket."
+
+# ---------------------------------------------------------
+# Reward
+# ---------------------------------------------------------
+reward_range: [0.0, 1.0]
+reward_structure:
+  classify_correct: +0.3
+  reply_per_keyword_hit: +0.1 (max 0.4)
+  reply_before_classify: -0.05
+  escalate_correct: +0.2
+  escalate_unnecessary: -0.1
+  close_bonus: +0.0 to +0.3 (depends on prior progress)
+  time_penalty: -0.05 (if step_count >= max_steps)
|
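Reviewer note: `reward_range: [0.0, 1.0]` appears to describe each step's score, not the episode total. `_compute_reward` clamps every step to that range, so a penalty on its own (for example an unnecessary escalation at -0.1) is reported as 0.0, while the per-step maxima in `reward_structure` can add up to more than 1.0 over a full episode. The sketch below makes both points concrete; `STEP_MAX` and `clamp` are hypothetical names, and the values are copied from `reward_structure`.

```python
# Per-step maxima copied from reward_structure in openenv.yaml
# (close_bonus reaches 0.3 only when escalation was required and performed).
STEP_MAX = {"classify": 0.3, "reply": 0.4, "escalate": 0.2, "close": 0.3}


def clamp(score, low=0.0, high=1.0):
    """Same clamp-and-round expression used at the end of _compute_reward."""
    return round(max(low, min(high, score)), 4)


per_step = {name: clamp(value) for name, value in STEP_MAX.items()}
episode_total = round(sum(per_step.values()), 4)

print(episode_total)  # 1.2
print(clamp(-0.1))    # 0.0
```

Anyone summing step rewards as an episode return should therefore expect values up to about 1.2 on the `hard` task, and should not rely on negative step scores to signal penalties; the `breakdown` dictionary still records them.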
customer-support-openenv/requirements.txt
ADDED
|
@@ -0,0 +1,5 @@
+pydantic>=2.0
+openai>=1.0
+python-dotenv
+fastapi
+uvicorn
|