+# Customer Support OpenEnv
+> A real-world reinforcement learning environment where an AI agent learns to handle customer support tickets — classify issues, craft replies, escalate when needed, and resolve tickets.
+Built for the **Meta × PyTorch OpenEnv Hackathon — Round 1, April 2026.**
+---
+## What is this?
+Most RL environments are games. This one is not.
+Every company with customers has a support queue. Tickets come in — billing complaints, app crashes, refund requests, angry users threatening legal action. A human agent reads each one, figures out what's wrong, replies helpfully, escalates if it's too serious, and closes it.
+This environment teaches an AI to do exactly that. The agent receives a ticket, takes actions step by step, and gets rewarded based on how well it handles the situation. The reward signal is **dense** — the agent gets feedback at every step, not just at the end.
+---
+## Architecture
+### Overall System
+```mermaid
+graph TD
+    A[Agent] -->|Action| B[CustomerSupportEnv]
+    B -->|Observation, Reward, Done, Info| A
+    B --> C[tasks.py\nTask Definitions]
+    B --> D[models.py\nTyped Models]
+    B --> E[grader.py\nPer-Task Graders]
+    F[app.py\nFastAPI Server] --> B
+    G[baseline/run_baseline.py] --> B
+    H[openenv.yaml\nMetadata + Config] -.->|describes| B
+    I[data/tickets.json\nTicket Dataset] -.->|reference data| B
+```
+### Episode Flow
+```mermaid
+sequenceDiagram
+    participant Agent
+    participant Env as CustomerSupportEnv
+    participant Grader
+    Agent->>Env: reset(task_id="hard")
+    Env-->>Agent: Observation (ticket + history + status)
+    Agent->>Env: step(Action: classify, category="billing")
+    Env-->>Agent: Observation, Reward(+0.3), done=False
+    Agent->>Env: step(Action: reply, content="...")
+    Env-->>Agent: Observation, Reward(+0.2), done=False
+    Agent->>Env: step(Action: escalate)
+    Env-->>Agent: Observation, Reward(+0.2), done=False
+    Agent->>Env: step(Action: close)
+    Env-->>Agent: Observation, Reward(+0.3), done=True
+    Agent->>Grader: grade_task(task, actions_taken)
+    Grader-->>Agent: Final Score (0.0 - 1.0)
+```
+### Reward Breakdown
+```mermaid
+flowchart LR
+    A[Action Taken] --> B{action_type?}
+    B -->|classify| C{Category correct?}
+    C -->|yes| D[+0.30]
+    C -->|no| E[+0.00]
+    B -->|reply| F{Keyword hits?}
+    F --> G[+0.10 per hit\nmax +0.40]
+    F -->|replied before classify| H[-0.05 penalty]
+    B -->|escalate| I{Required?}
+    I -->|yes| J[+0.20]
+    I -->|no| K[-0.10 penalty]
+    B -->|close| L[+0.10 if classified\n+0.10 if replied\n+0.10 if escalated correctly]
+    B -->|any non-close action, at max_steps| M[-0.05 time penalty]
+```
+### File Structure
+```mermaid
+graph LR
+    root[customer-support-openenv]
+    root --> env[env/]
+    env --> models[models.py\nObservation Action Reward]
+    env --> environment[environment.py\nCustomerSupportEnv]
+    env --> tasks[tasks.py\nTask Definitions]
+    env --> grader[grader.py\ngrade_easy grade_medium grade_hard]
+    env --> utils[utils.py\nHelpers]
+    env --> init[__init__.py]
+    root --> baseline[baseline/]
+    baseline --> script[run_baseline.py\nLLM + Mock runner]
+    root --> data[data/]
+    data --> tickets[tickets.json\n12 real tickets]
+    root --> apppy[app.py\nFastAPI Server]
+    root --> yaml[openenv.yaml]
+    root --> docker[Dockerfile]
+    root --> readme[README.md]
+    root --> env2[.env\nAPI Keys]
+```
+---
+## Tasks
+The environment has 3 tasks of increasing difficulty. An agent must handle all three.
+| Task | Difficulty | Max Steps | What the agent must do |
+|---|---|---|---|
+| `easy` | 🟢 Easy | 5 | Just classify the ticket correctly |
+| `medium` | 🟡 Medium | 8 | Classify + give a helpful reply |
+| `hard` | 🔴 Hard | 10 | Classify → reply → escalate → close |
+### Easy — Classification Only
+```
+Customer: "I was charged twice for my order and need the duplicate removed."
+Agent must → classify as "billing"
+Score: 1.0 correct, 0.0 wrong
+```
+### Medium — Classify + Reply
+```
+Customer: "The app keeps crashing on my iPhone. I already restarted twice."
+Agent must → classify as "technical" AND reply with relevant keywords
+Score: 0.4 (classify) + up to 0.6 (reply quality)
+```
+### Hard — Full Pipeline
+```
+Customer: "Been waiting 3 weeks for my refund. Considering legal action."
+History: 4 prior messages showing escalation attempts
+Agent must → classify + reply + escalate to human + close ticket
+Score: 0.2 + 0.3 + 0.2 + 0.3 (partial credit, penalty for bad escalation)
+```
+---
+## Observation Space
+What the agent sees at each step:
+```python
+Observation(
+    ticket_id="T001",
+    customer_query="I was charged twice and need a refund.",
+    history=["Agent: We are looking into it.", "Customer: Still waiting!"],
+    status="pending"   # open | pending | resolved
+)
+```
+---
+## Action Space
+What the agent can do:
+```python
+Action(action_type="classify", category="billing")          # identify the issue
+Action(action_type="reply",    content="We will help...")   # respond to customer
+Action(action_type="escalate")                              # pass to human agent
+Action(action_type="close")                                 # end the episode
+```
+Valid categories: `billing` | `technical` | `refund` | `account` | `abuse`
+---
+## Setup
+### 1. Clone and install
+```bash
+git clone <your-repo-url>
+cd customer-support-openenv
+pip install -r requirements.txt
+```
+### 2. Add your API key (optional — needed for LLM baseline)
+```bash
+# .env
+OPENAI_API_KEY=sk-...
+```
+### 3. Run the baseline
+```bash
+python baseline/run_baseline.py
+```
+No API key? It runs in **mock mode** with deterministic actions — still produces valid scores.
+### 4. Start the HTTP server
+```bash
+python app.py
+# → http://localhost:7860
+```
+### 5. Try it manually
+```bash
+# Start a hard task episode
+curl "http://localhost:7860/reset?task_id=hard"
+# Classify the ticket
+curl -X POST http://localhost:7860/step \
+  -H "Content-Type: application/json" \
+  -d '{"action_type": "classify", "category": "billing"}'
+# Reply
+curl -X POST http://localhost:7860/step \
+  -H "Content-Type: application/json" \
+  -d '{"action_type": "reply", "content": "We are escalating your refund as priority."}'
+# Escalate
+curl -X POST http://localhost:7860/step \
+  -H "Content-Type: application/json" \
+  -d '{"action_type": "escalate"}'
+# Close
+curl -X POST http://localhost:7860/step \
+  -H "Content-Type: application/json" \
+  -d '{"action_type": "close"}'
+```
+### 6. Use directly in Python
+```python
+from env import CustomerSupportEnv, Action
+env = CustomerSupportEnv()
+obs = env.reset(task_id="hard")
+print(obs.customer_query)
+# → "I have been waiting three weeks for a refund..."
+obs, reward, done, info = env.step(Action(action_type="classify", category="billing"))
+print(reward.score, reward.feedback)
+# → 0.3  "correct category"
+obs, reward, done, info = env.step(Action(
+    action_type="reply",
+    content="We are making this a priority refund and escalating to a manager."
+))
+obs, reward, done, info = env.step(Action(action_type="escalate"))
+obs, reward, done, info = env.step(Action(action_type="close"))
+```
+---
+## Docker
+```bash
+docker build -t openenv .
+docker run -p 7860:7860 -e OPENAI_API_KEY=sk-... openenv
+```
+---
+## Deploying to Hugging Face Spaces
+1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
+2. Create a new Space → select **Docker** SDK
+3. Add tag: `openenv`
+4. Upload this entire repo
+5. Add `OPENAI_API_KEY` as a Space secret
+The server starts automatically and exposes all endpoints.
+---
+## Baseline Scores
+Mock scores are measured with deterministic actions (no API key needed); LLM scores are approximate:
+| Task | Mock Score | LLM Score (gpt-4o-mini) |
+|---|---|---|
+| easy | 1.000 | ~0.900 |
+| medium | 0.850 | ~0.750 |
+| hard | 0.775 | ~0.650 |
+| **Total** | **2.625 / 3.0** | **~2.300 / 3.0** |
+---
+## API Reference
+| Method | Endpoint | Description |
+|---|---|---|
+| GET | `/` | HTML landing page |
+| GET | `/reset?task_id=easy` | Start a new episode |
+| POST | `/step` | Submit an Action |
+| GET | `/state` | Current raw state |
+| GET | `/tasks` | List all tasks |
+| GET | `/health` | Health check |
+| GET | `/docs` | Swagger UI |
+---
+## Team
+- **Adit Sharma** — adit.2428cs1345@kiet.edu
+- **Mansi Verma** — ogmansi897@gmail.com
+- **Priyanshi Vishwakarma** — vishwakarmapriyanshi68@gmail.com
+---
+*Meta × PyTorch OpenEnv Hackathon — Round 1, April 2026*

customer-support-openenv/__pycache__/app.cpython-313.pyc ADDED Viewed

Binary file (7.58 kB). View file

customer-support-openenv/app.py ADDED Viewed

	@@ -0,0 +1,100 @@

+import os
+from fastapi import FastAPI, HTTPException
+from fastapi.responses import HTMLResponse
+from env.environment import CustomerSupportEnv
+from env.models import Action
+from env.tasks import TASKS
+app = FastAPI(title="Customer Support OpenEnv", version="1.0.0")
+# One CustomerSupportEnv per session_id, held in memory for the life of the process (never evicted).
+sessions = {}
+def get_env(session_id="default"):
+    if session_id not in sessions:
+        sessions[session_id] = CustomerSupportEnv()
+    return sessions[session_id]
+@app.get("/", response_class=HTMLResponse)
+def home():
+    return """
+    <html><body style="font-family:sans-serif;background:#0f1117;color:#e0e0e0;max-width:700px;margin:50px auto;padding:0 24px">
+    <h1 style="color:#7ee787">Customer Support OpenEnv</h1>
+    <p>An OpenEnv RL environment for customer support automation.</p>
+    <h2 style="color:#58a6ff">Endpoints</h2>
+    <ul>
+      <li><a href="/docs" style="color:#58a6ff">/docs</a> &mdash; Swagger UI</li>
+      <li><code>GET /reset?task_id=easy|medium|hard</code></li>
+      <li><code>POST /step</code> &mdash; send an Action</li>
+      <li><code>GET /state</code></li>
+      <li><a href="/tasks" style="color:#58a6ff">GET /tasks</a></li>
+    </ul>
+    </body></html>
+    """
+@app.get("/health")
+def health():
+    return {"status": "ok"}
+@app.get("/reset")
+def reset(task_id: str | None = None, session_id: str = "default"):
+    env = get_env(session_id)
+    try:
+        obs = env.reset(task_id=task_id)
+    except ValueError as e:
+        raise HTTPException(400, str(e))
+    return {
+        "observation": obs.model_dump(),
+        "task": {
+            "id": env.current_task["id"],
+            "description": env.current_task["description"],
+            "max_steps": env.current_task["max_steps"],
+        },
+    }
+@app.post("/step")
+def step(action: Action, session_id: str = "default"):
+    env = get_env(session_id)
+    if not env.current_task:
+        raise HTTPException(400, "Call /reset first.")
+    try:
+        obs, reward, done, info = env.step(action)
+    except RuntimeError as e:
+        raise HTTPException(400, str(e))
+    return {
+        "observation": obs.model_dump(),
+        "reward": reward.model_dump(),
+        "done": done,
+        "info": info,
+    }
+@app.get("/state")
+def state(session_id: str = "default"):
+    env = get_env(session_id)
+    if not env.current_task:
+        raise HTTPException(400, "Call /reset first.")
+    return env.state()
+@app.get("/tasks")
+def list_tasks():
+    return [
+        {
+            "id": t["id"],
+            "description": t["description"],
+            "max_steps": t["max_steps"],
+            "requires_escalation": t["expected"]["requires_escalation"],
+        }
+        for t in TASKS.values()
+    ]
+if __name__ == "__main__":
+    import uvicorn
+    port = int(os.getenv("PORT", 7860))
+    uvicorn.run("app:app", host="0.0.0.0", port=port)
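
The `/reset`, `/step`, and `/state` handlers above each accept a `session_id` query parameter, which the README's curl examples leave at its default. A minimal sketch of building those requests with the standard library, assuming the server is reachable at `http://localhost:7860`. The helper names `reset_request` and `step_request` are hypothetical and not part of the repo:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:7860"  # assumed local server address


def reset_request(task_id, session_id="default"):
    """Build the GET /reset request for one session (hypothetical helper)."""
    query = urlencode({"task_id": task_id, "session_id": session_id})
    return Request(f"{BASE_URL}/reset?{query}", method="GET")


def step_request(action, session_id="default"):
    """Build the POST /step request; the Action travels in the JSON body."""
    query = urlencode({"session_id": session_id})
    return Request(
        f"{BASE_URL}/step?{query}",
        data=json.dumps(action).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Requests that share a session_id are routed to the same environment.
req_a = reset_request("hard", session_id="agent-a")
req_b = step_request({"action_type": "classify", "category": "billing"}, "agent-a")
print(req_a.full_url)
print(req_b.get_method(), req_b.full_url)
```

Passing each request to `urllib.request.urlopen` would send it. Because `get_env` creates one environment per `session_id`, two agents using different ids should not overwrite each other's episode.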

customer-support-openenv/baseline/__pycache__/run_baseline.cpython-313.pyc ADDED Viewed

Binary file (7.59 kB). View file

customer-support-openenv/baseline/run_baseline.py ADDED Viewed

	@@ -0,0 +1,138 @@

+import sys
+import os
+import json
+import io
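+# Force UTF-8 output and replace unencodable characters so printing never fails on consoles with a narrower default encoding.
+sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8", errors="replace")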
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
+from dotenv import load_dotenv
+load_dotenv(os.path.join(os.path.dirname(__file__), "..", ".env"))
+from env.environment import CustomerSupportEnv
+from env.models import Action
+from env.grader import grade_task
+SYSTEM_PROMPT = """You are an AI customer support agent inside an RL environment.
+Read the ticket and respond with a JSON object ONLY. Pick one action:
+{"action_type": "classify", "category": "<billing|technical|refund|account|abuse>"}
+{"action_type": "reply", "content": "<your reply>"}
+{"action_type": "escalate"}
+{"action_type": "close"}
+Strategy: classify first, reply next, escalate only if severe (legal threats / long-unresolved issues), then close."""
+def obs_to_text(obs):
+    lines = [f"Ticket: {obs.ticket_id}", f"Status: {obs.status}", f"Query: {obs.customer_query}"]
+    if obs.history:
+        lines.append("History:")
+        for msg in obs.history:
+            lines.append(f"  {msg}")
+    return "\n".join(lines)
+def call_llm(client, obs, messages):
+    messages.append({"role": "user", "content": obs_to_text(obs)})
+    try:
+        resp = client.chat.completions.create(
+            model="gpt-4o-mini",
+            messages=messages,
+            temperature=0.0,
+            response_format={"type": "json_object"},
+        )
+        raw = resp.choices[0].message.content
+        messages.append({"role": "assistant", "content": raw})
+        return Action(**json.loads(raw))
+    except Exception as e:
+        print(f"  LLM error: {e}")
+        return Action(action_type="close")
+def run_llm(client, task_id):
+    env = CustomerSupportEnv()
+    obs = env.reset(task_id=task_id)
+    task = env.current_task
+    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
+    taken = []
+    print(f"\n{'='*55}")
+    print(f"  Task: {task_id.upper()} | {task['description'][:50]}")
+    print(f"{'='*55}")
+    for i in range(task["max_steps"]):
+        action = call_llm(client, obs, messages)
+        obs, reward, done, info = env.step(action)
+        taken.append(action)
+        cat = f"cat={action.category}" if action.category else ""
+        print(f"  step {i+1}: {action.action_type:<10} {cat:<16} reward={reward.score:.3f}")
+        if done:
+            break
+    score = grade_task(task, taken)
+    print(f"  grader score: {score:.3f}")
+    return score
+def run_mock(task_id):
+    env = CustomerSupportEnv()
+    env.reset(task_id=task_id)
+    task = env.current_task
+    ex = task["expected"]
+    kw = ex["keywords"][0]
+    actions = [
+        Action(action_type="classify", category=ex["category"]),
+        Action(action_type="reply", content=f"We understand your {ex['category']} issue. We will {kw} your request right away. Please reinstall if needed. Sorry for the inconvenience."),
+    ]
+    if ex["requires_escalation"]:
+        actions.append(Action(action_type="escalate"))
+    actions.append(Action(action_type="close"))
+    taken = []
+    print(f"\n{'='*55}")
+    print(f"  Task: {task_id.upper()} | {task['description'][:50]}")
+    print(f"{'='*55}")
+    for action in actions:
+        obs, reward, done, info = env.step(action)
+        taken.append(action)
+        cat = f"cat={action.category}" if action.category else ""
+        print(f"  step {info['step']}: {action.action_type:<10} {cat:<16} reward={reward.score:.3f}")
+        if done:
+            break
+    score = grade_task(task, taken)
+    print(f"  grader score: {score:.3f}")
+    return score
+def main():
+    api_key = os.getenv("OPENAI_API_KEY", "")
+    use_llm = bool(api_key)
+    print("\n[*] Customer Support OpenEnv - Baseline")
+    print(f"    mode: {'LLM (gpt-4o-mini)' if use_llm else 'Mock (no API key)'}")
+    client = None
+    if use_llm:
+        from openai import OpenAI
+        client = OpenAI(api_key=api_key)
+    results = {}
+    for tid in ["easy", "medium", "hard"]:
+        results[tid] = run_llm(client, tid) if use_llm else run_mock(tid)
+    print(f"\n{'='*55}")
+    print("  RESULTS")
+    print(f"{'='*55}")
+    for tid, score in results.items():
+        bar = "#" * round(score * 25)
+        print(f"  {tid:<10} {score:.3f}  {bar}")
+    print(f"  {'total':<10} {sum(results.values()):.3f} / 3.000")
+    print(f"{'='*55}\n")
+if __name__ == "__main__":
+    main()
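
`call_llm` above passes the model's JSON straight into `Action(**...)` and falls back to `close` on any exception. Since `Action.action_type` is a plain `str`, an unrecognised action type would pass through without raising. A sketch of a stricter parser that works on plain dicts, with the allowed values taken from the system prompt. `parse_action`, `VALID_ACTIONS`, `VALID_CATEGORIES`, and `FALLBACK` are hypothetical names, not part of the repo:

```python
import json

VALID_ACTIONS = {"classify", "reply", "escalate", "close"}
VALID_CATEGORIES = {"billing", "technical", "refund", "account", "abuse"}
FALLBACK = {"action_type": "close"}  # same fallback call_llm uses on errors


def parse_action(raw):
    """Return a validated action dict, or the close fallback if raw is unusable."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        return dict(FALLBACK)
    kind = data.get("action_type") if isinstance(data, dict) else None
    if not isinstance(kind, str) or kind not in VALID_ACTIONS:
        return dict(FALLBACK)
    if kind == "classify":
        category = str(data.get("category", "")).lower()
        if category not in VALID_CATEGORIES:
            return dict(FALLBACK)
        return {"action_type": kind, "category": category}
    if kind == "reply":
        content = data.get("content")
        if not isinstance(content, str) or not content.strip():
            return dict(FALLBACK)
        return {"action_type": kind, "content": content}
    # escalate and close carry no extra fields
    return {"action_type": kind}


print(parse_action('{"action_type": "classify", "category": "Billing"}'))
print(parse_action('{"action_type": "dance"}'))
```

Falling back to `close` ends the episode, so a caller might prefer to retry the model once before accepting the fallback. That choice is left open here.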

customer-support-openenv/data/tickets.json ADDED Viewed

	@@ -0,0 +1,86 @@

+[
+  {
+    "id": "t001",
+    "query": "I was charged twice for my order and need the duplicate payment removed.",
+    "category": "billing",
+    "requires_escalation": false,
+    "difficulty": "easy"
+  },
+  {
+    "id": "t002",
+    "query": "The app keeps crashing on my iPhone every time I open it.",
+    "category": "technical",
+    "requires_escalation": false,
+    "difficulty": "easy"
+  },
+  {
+    "id": "t003",
+    "query": "I want to request a refund for my subscription — I cancelled it last month.",
+    "category": "refund",
+    "requires_escalation": false,
+    "difficulty": "easy"
+  },
+  {
+    "id": "t004",
+    "query": "I cannot log in to my account. My password reset email never arrived.",
+    "category": "account",
+    "requires_escalation": false,
+    "difficulty": "easy"
+  },
+  {
+    "id": "t005",
+    "query": "Another user is harassing me repeatedly inside the platform. Please act.",
+    "category": "abuse",
+    "requires_escalation": true,
+    "difficulty": "medium"
+  },
+  {
+    "id": "t006",
+    "query": "My payment failed three times but I can see pending charges on my bank statement.",
+    "category": "billing",
+    "requires_escalation": false,
+    "difficulty": "medium"
+  },
+  {
+    "id": "t007",
+    "query": "Data I uploaded last week has disappeared from my account without any explanation.",
+    "category": "technical",
+    "requires_escalation": true,
+    "difficulty": "medium"
+  },
+  {
+    "id": "t008",
+    "query": "I was promised a full refund 10 days ago but nothing has arrived. I need this resolved NOW.",
+    "category": "refund",
+    "requires_escalation": true,
+    "difficulty": "medium"
+  },
+  {
+    "id": "t009",
+    "query": "Someone logged into my account from another country. I did not authorise this.",
+    "category": "account",
+    "requires_escalation": true,
+    "difficulty": "hard"
+  },
+  {
+    "id": "t010",
+    "query": "I have been waiting three weeks for a refund your team promised. I am considering legal action.",
+    "category": "billing",
+    "requires_escalation": true,
+    "difficulty": "hard"
+  },
+  {
+    "id": "t011",
+    "query": "Your API has been returning 500 errors for 6 hours and it is costing my business thousands of dollars.",
+    "category": "technical",
+    "requires_escalation": true,
+    "difficulty": "hard"
+  },
+  {
+    "id": "t012",
+    "query": "I upgraded my plan but was never given access to the premium features I paid for.",
+    "category": "billing",
+    "requires_escalation": false,
+    "difficulty": "medium"
+  }
+]

customer-support-openenv/env/__init__.py ADDED Viewed

	@@ -0,0 +1,5 @@

+from .environment import CustomerSupportEnv
+from .models import Observation, Action, Reward
+from .grader import grade_task
+__all__ = ["CustomerSupportEnv", "Observation", "Action", "Reward", "grade_task"]

customer-support-openenv/env/__pycache__/__init__.cpython-313.pyc ADDED Viewed

Binary file (419 Bytes). View file

customer-support-openenv/env/__pycache__/environment.cpython-313.pyc ADDED Viewed

Binary file (7.02 kB). View file

customer-support-openenv/env/__pycache__/grader.cpython-313.pyc ADDED Viewed

Binary file (4.41 kB). View file

customer-support-openenv/env/__pycache__/models.cpython-313.pyc ADDED Viewed

Binary file (1.34 kB). View file

customer-support-openenv/env/__pycache__/tasks.cpython-313.pyc ADDED Viewed

Binary file (1.6 kB). View file

customer-support-openenv/env/environment.py ADDED Viewed

	@@ -0,0 +1,147 @@

+import random
+from copy import deepcopy
+from .models import Observation, Action, Reward
+from .tasks import TASKS, TASK_LIST
+class CustomerSupportEnv:
+    def __init__(self):
+        self.current_task = None
+        self.state_data = None
+        self.done = False
+        self.step_count = 0
+        self._classified = False
+        self._replied = False
+        self._escalated = False
+        self._closed = False
+    def reset(self, task_id=None):
+        if task_id:
+            if task_id not in TASKS:
+                raise ValueError(f"Unknown task '{task_id}'. Pick from: {list(TASKS.keys())}")
+            self.current_task = TASKS[task_id]
+        else:
+            self.current_task = random.choice(TASK_LIST)
+        self.state_data = deepcopy(self.current_task["input"])
+        self.done = False
+        self.step_count = 0
+        self._classified = False
+        self._replied = False
+        self._escalated = False
+        self._closed = False
+        return Observation(**self.state_data)
+    def step(self, action: Action):
+        if self.current_task is None:
+            raise RuntimeError("No active episode. Call reset() first.")
+        if self.done:
+            raise RuntimeError("Episode done. Call reset() first.")
+        self.step_count += 1
+        reward = self._compute_reward(action)
+        if action.action_type == "close":
+            self.done = True
+            self._closed = True
+        # hit max steps → small penalty
+        max_steps = self.current_task.get("max_steps", 10)
+        if self.step_count >= max_steps and not self.done:
+            self.done = True
+            new_score = max(0.0, reward.score - 0.05)
+            reward = Reward(
+                score=new_score,
+                feedback=reward.feedback + " | time limit hit, -0.05",
+                breakdown={**reward.breakdown, "time_penalty": -0.05},
+            )
+        if action.content:
+            self.state_data["history"].append(f"Agent: {action.content}")
+        info = {
+            "step": self.step_count,
+            "task_id": self.current_task["id"],
+            "classified": self._classified,
+            "replied": self._replied,
+            "escalated": self._escalated,
+            "closed": self._closed,
+        }
+        return Observation(**self.state_data), reward, self.done, info
+    def state(self):
+        # Return a copy so callers cannot mutate the environment's internal state.
+        return deepcopy(self.state_data)
+    def _compute_reward(self, action: Action) -> Reward:
+        correct = self.current_task["expected"]
+        score = 0.0
+        breakdown = {}
+        if action.action_type == "classify":
+            if action.category and action.category.lower() == correct["category"].lower():
+                score += 0.3
+                breakdown["classify"] = 0.3
+            else:
+                breakdown["classify"] = 0.0
+            self._classified = True
+        elif action.action_type == "reply":
+            if not self._classified:
+                score -= 0.05
+                breakdown["early_reply_penalty"] = -0.05
+            hits = sum(1 for kw in correct["keywords"] if kw in (action.content or "").lower())
+            reply_score = min(0.4, hits * 0.1)
+            score += reply_score
+            breakdown["reply"] = reply_score
+            self._replied = True
+        elif action.action_type == "escalate":
+            if correct["requires_escalation"]:
+                score += 0.2
+                breakdown["escalate"] = 0.2
+            else:
+                score -= 0.1
+                breakdown["escalate"] = -0.1
+            self._escalated = True
+        elif action.action_type == "close":
+            bonus = 0.0
+            if self._classified:
+                bonus += 0.1
+            if self._replied:
+                bonus += 0.1
+            if correct["requires_escalation"] and self._escalated:
+                bonus += 0.1
+            score += bonus
+            breakdown["close_bonus"] = bonus
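+        # Clamp to [0, 1]: a net-negative step (e.g. an unnecessary escalation) scores 0.0; the penalty still appears in breakdown.
+        score = round(max(0.0, min(1.0, score)), 4)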
+        feedback = self._make_feedback(action, breakdown, correct)
+        return Reward(score=score, feedback=feedback, breakdown=breakdown)
+    def _make_feedback(self, action, breakdown, correct):
+        parts = []
+        if breakdown.get("classify") == 0.3:
+            parts.append("correct category")
+        elif "classify" in breakdown:
+            parts.append(f"wrong category (expected {correct['category']})")
+        if "early_reply_penalty" in breakdown:
+            parts.append("replied before classifying")
+        if "reply" in breakdown:
+            parts.append(f"reply score {breakdown['reply']:.2f}")
+        if breakdown.get("escalate") == 0.2:
+            parts.append("escalated correctly")
+        elif breakdown.get("escalate") == -0.1:
+            parts.append("unnecessary escalation")
+        if "close_bonus" in breakdown:
+            parts.append(f"close bonus {breakdown['close_bonus']:.2f}")
+        return ", ".join(parts) if parts else "ok"
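
`_compute_reward` above clamps each step's score to the range 0.0 to 1.0 after summing bonuses and penalties, so a penalty only lowers `reward.score` when the same step also earned something. A standalone sketch of that arithmetic for the `reply` and `escalate` branches, re-implemented here for illustration. The function names are hypothetical and the code does not import the repo:

```python
def clamp(score):
    """Mirror the final clamp-and-round applied to every step score."""
    return round(max(0.0, min(1.0, score)), 4)


def reply_score(content, keywords, classified):
    """0.1 per keyword hit (capped at 0.4), minus 0.05 if not yet classified."""
    hits = sum(1 for kw in keywords if kw in content.lower())
    raw = min(0.4, hits * 0.1)
    if not classified:
        raw -= 0.05
    return clamp(raw)


def escalate_score(requires_escalation):
    """+0.2 when escalation is required, otherwise -0.1 before clamping."""
    return clamp(0.2 if requires_escalation else -0.1)


# Keyword list copied from the medium task definition.
keywords = ["reinstall", "update", "cache", "support", "technical", "version"]
reply = "Please update the app and clear the cache."

print(reply_score(reply, keywords, classified=True))      # two hits
print(reply_score(reply, keywords, classified=False))     # two hits, early-reply penalty
print(reply_score("Hello!", keywords, classified=False))  # no hits, penalty clamped away
print(escalate_score(False))                              # penalty clamped away
```

The last two calls both come out at 0.0. In the environment the penalty is still recorded in `breakdown`, but it cannot push the step score below zero.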

customer-support-openenv/env/grader.py ADDED Viewed

	@@ -0,0 +1,72 @@

+from typing import List
+from .models import Action
+def grade_easy(task, actions: List[Action]) -> float:
+    expected = task["expected"]["category"].lower()
+    for a in actions:
+        if a.action_type == "classify":
+            return 1.0 if (a.category or "").lower() == expected else 0.0
+    return 0.0
+def grade_medium(task, actions: List[Action]) -> float:
+    score = 0.0
+    expected_cat = task["expected"]["category"].lower()
+    keywords = [k.lower() for k in task["expected"]["keywords"]]
+    for a in actions:
+        if a.action_type == "classify":
+            if (a.category or "").lower() == expected_cat:
+                score += 0.4
+            break
+    for a in actions:
+        if a.action_type == "reply" and a.content:
+            hits = sum(1 for k in keywords if k in a.content.lower())
+            score += min(0.6, hits * 0.15)
+            break
+    return round(min(1.0, score), 4)
+def grade_hard(task, actions: List[Action]) -> float:
+    score = 0.0
+    expected_cat = task["expected"]["category"].lower()
+    keywords = [k.lower() for k in task["expected"]["keywords"]]
+    needs_escalation = task["expected"]["requires_escalation"]
+    for a in actions:
+        if a.action_type == "classify":
+            if (a.category or "").lower() == expected_cat:
+                score += 0.2
+            break
+    for a in actions:
+        if a.action_type == "reply" and a.content:
+            hits = sum(1 for k in keywords if k in a.content.lower())
+            score += min(0.3, hits * 0.075)
+            break
+    escalated = any(a.action_type == "escalate" for a in actions)
+    if needs_escalation and escalated:
+        score += 0.2
+    elif not needs_escalation and escalated:
+        score -= 0.1
+    if any(a.action_type == "close" for a in actions):
+        score += 0.3
+    return round(max(0.0, min(1.0, score)), 4)
+GRADERS = {
+    "easy": grade_easy,
+    "medium": grade_medium,
+    "hard": grade_hard,
+}
+def grade_task(task, actions: List[Action]) -> float:
+    # Tasks with a missing or unknown id fall back to the easy grader.
+    grader = GRADERS.get(task.get("id", "easy"), grade_easy)
+    return grader(task, actions)
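
`grade_hard` above scores only the first `classify` action and the first `reply` that has content, then adds escalation and close credit. A standalone sketch of that arithmetic using plain dicts in place of `Action` objects. `grade_hard_sketch` is a hypothetical re-implementation for illustration, and the keyword list is copied from the hard task definition:

```python
HARD_KEYWORDS = ["escalat", "manager", "priority", "urgent",
                 "legal", "refund", "apologize", "sorry"]


def grade_hard_sketch(actions, category="billing", needs_escalation=True):
    """Recompute the hard-task score from a list of action dicts."""
    score = 0.0
    first_classify = next(
        (a for a in actions if a["action_type"] == "classify"), None
    )
    if first_classify and (first_classify.get("category") or "").lower() == category:
        score += 0.2
    first_reply = next(
        (a for a in actions if a["action_type"] == "reply" and a.get("content")), None
    )
    if first_reply:
        hits = sum(1 for k in HARD_KEYWORDS if k in first_reply["content"].lower())
        score += min(0.3, hits * 0.075)
    escalated = any(a["action_type"] == "escalate" for a in actions)
    if needs_escalation and escalated:
        score += 0.2
    elif escalated:
        score -= 0.1
    if any(a["action_type"] == "close" for a in actions):
        score += 0.3
    return round(max(0.0, min(1.0, score)), 4)


full = [
    {"action_type": "classify", "category": "billing"},
    {"action_type": "reply",
     "content": "We are making this a priority refund and escalating to a manager."},
    {"action_type": "escalate"},
    {"action_type": "close"},
]
print(grade_hard_sketch(full))                          # all four components
print(grade_hard_sketch(full[:2]))                      # classify + reply only
print(grade_hard_sketch([{"action_type": "close"}]))    # close only
```

Under these rules the close credit does not depend on the earlier steps, so an agent that closes immediately still earns the 0.3 close component.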

customer-support-openenv/env/models.py ADDED Viewed

	@@ -0,0 +1,21 @@

+from pydantic import BaseModel
+from typing import List, Optional, Dict, Any
+class Observation(BaseModel):
+    ticket_id: str
+    customer_query: str
+    history: List[str]
+    status: str
+class Action(BaseModel):
+    action_type: str  # classify | reply | escalate | close
+    content: Optional[str] = None
+    category: Optional[str] = None
+class Reward(BaseModel):
+    score: float
+    feedback: str
+    breakdown: Dict[str, Any] = {}

customer-support-openenv/env/tasks.py ADDED Viewed

	@@ -0,0 +1,59 @@

+TASKS = {
+    "easy": {
+        "id": "easy",
+        "description": "Classify a customer ticket into the right category.",
+        "input": {
+            "ticket_id": "T001",
+            "customer_query": "I was charged twice for my order #ORD-8821 and need the duplicate payment removed.",
+            "history": [],
+            "status": "open",
+        },
+        "expected": {
+            "category": "billing",
+            "keywords": ["refund", "charge", "payment", "duplicate", "billing"],
+            "requires_escalation": False,
+        },
+        "max_steps": 5,
+    },
+    "medium": {
+        "id": "medium",
+        "description": "Classify the ticket and give a helpful reply.",
+        "input": {
+            "ticket_id": "T002",
+            "customer_query": "The app keeps crashing on my iPhone 15. I already restarted my phone twice.",
+            "history": [],
+            "status": "open",
+        },
+        "expected": {
+            "category": "technical",
+            "keywords": ["reinstall", "update", "cache", "support", "technical", "version"],
+            "requires_escalation": False,
+        },
+        "max_steps": 8,
+    },
+    "hard": {
+        "id": "hard",
+        "description": "Full pipeline — classify, reply, escalate if needed, then close.",
+        "input": {
+            "ticket_id": "T003",
+            "customer_query": "I have been waiting three weeks for a refund your team promised. I am considering legal action.",
+            "history": [
+                "Agent: We apologise. Your refund is being processed.",
+                "Customer: Two weeks and still nothing!",
+                "Agent: We escalated this to our billing team.",
+                "Customer: Another week gone. I want to speak to a manager!",
+            ],
+            "status": "pending",
+        },
+        "expected": {
+            "category": "billing",
+            "keywords": ["escalat", "manager", "priority", "urgent", "legal", "refund", "apologize", "sorry"],
+            "requires_escalation": True,
+        },
+        "max_steps": 10,
+    },
+}
+TASK_LIST = list(TASKS.values())
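
The `keywords` lists above are matched as case-insensitive substrings of the reply, which is why a stem such as `escalat` is enough to catch both "escalated" and "escalating". A short sketch of that matching, assuming the same substring rule the environment and graders use. `keyword_hits` is a hypothetical helper, not part of the repo:

```python
# Keyword list copied from the hard task definition.
HARD_KEYWORDS = ["escalat", "manager", "priority", "urgent",
                 "legal", "refund", "apologize", "sorry"]


def keyword_hits(reply, keywords):
    """Return the keywords found as case-insensitive substrings of the reply."""
    text = reply.lower()
    return [kw for kw in keywords if kw in text]


print(keyword_hits("We have escalated this and we apologise.", HARD_KEYWORDS))
print(keyword_hits("We apologize and will refund you urgently.", HARD_KEYWORDS))
```

The first reply matches only `escalat`: the British spelling "apologise" does not contain the keyword `apologize`. Full-word keywords are therefore sensitive to spelling variants in a way that stems are not.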

customer-support-openenv/env/utils.py ADDED Viewed

	@@ -0,0 +1,26 @@

+import json
+import os
+from typing import List, Dict, Any
+def load_tickets(path=None) -> List[Dict[str, Any]]:
+    if path is None:
+        path = os.path.join(os.path.dirname(__file__), "..", "data", "tickets.json")
+    # tickets.json contains non-ASCII characters, so read it as UTF-8 explicitly.
+    with open(path, encoding="utf-8") as f:
+        return json.load(f)
+def format_observation(obs) -> str:
+    lines = [
+        f"Ticket : {obs.ticket_id}",
+        f"Status : {obs.status}",
+        f"Query  : {obs.customer_query}",
+    ]
+    for i, msg in enumerate(obs.history, 1):
+        lines.append(f"  [{i}] {msg}")
+    return "\n".join(lines)
+def log_step(step, action, reward):
+    cat = action.category or "-"
+    print(f"step {step:>2} | {action.action_type:<10} cat={cat:<12} score={reward.score:.2f} | {reward.feedback}")

customer-support-openenv/openenv.yaml ADDED Viewed

	@@ -0,0 +1,107 @@

+name: customer-support-env
+version: "1.0"
+description: >
+  An OpenEnv-compliant environment that simulates real-world customer support
+  ticket workflows. An AI agent must classify incoming tickets, craft appropriate
+  replies, decide when to escalate to a human agent, and close resolved tickets.
+  The environment provides dense, shaped rewards at every step to enable
+  efficient RL training — not just a sparse end-of-episode signal.
+entry_point: env.environment:CustomerSupportEnv
+author: "Adit Sharma, Mansi Verma, Priyanshi Vishwakarma"
+tags:
+  - openenv
+  - customer-support
+  - nlp
+  - real-world
+  - multi-step
+# ---------------------------------------------------------
+# Tasks
+# ---------------------------------------------------------
+tasks:
+  - id: easy
+    difficulty: easy
+    description: >
+      Classify a single customer ticket into the correct category
+      (billing / technical / refund / account / abuse).
+    max_steps: 5
+    scoring: "1.0 for correct classification, 0.0 otherwise."
+  - id: medium
+    difficulty: medium
+    description: >
+      Classify the ticket correctly (worth 0.4) and then reply with a helpful,
+      keyword-rich response that addresses the root issue (up to 0.6).
+    max_steps: 8
+    scoring: "Partial credit: 0.4 classify + up to 0.6 reply quality."
+  - id: hard
+    difficulty: hard
+    description: >
+      Full resolution pipeline — classify (0.2), give a quality reply (0.3),
+      escalate to a human agent when required (0.2), and close the ticket (0.3).
+      Penalises unnecessary escalation (−0.1).
+    max_steps: 10
+    scoring: "Partial credit across all 4 action types; penalty for bad escalation."
+# ---------------------------------------------------------
+# Action Space
+# ---------------------------------------------------------
+action_space:
+  type: discrete-structured
+  actions:
+    - name: classify
+      required_fields: [category]
+      category_values: [billing, technical, refund, account, abuse]
+      description: "Classify the ticket into a support category."
+    - name: reply
+      required_fields: [content]
+      description: "Send a reply message to the customer."
+    - name: escalate
+      required_fields: []
+      description: "Escalate the ticket to a human agent."
+    - name: close
+      required_fields: []
+      description: "Close the ticket and end the episode (done=True)."
+# ---------------------------------------------------------
+# Observation Space
+# ---------------------------------------------------------
+observation_space:
+  type: structured
+  fields:
+    - name: ticket_id
+      type: string
+      description: "Unique identifier for the support ticket."
+    - name: customer_query
+      type: string
+      description: "The customer's message or complaint."
+    - name: history
+      type: list[string]
+      description: "Chronological conversation history (agent + customer turns)."
+    - name: status
+      type: string
+      enum: [open, pending, resolved]
+      description: "Current status of the ticket."
+# ---------------------------------------------------------
+# Reward
+# ---------------------------------------------------------
+reward_range: [0.0, 1.0]
+reward_structure:
+  classify_correct: +0.3
+  reply_per_keyword_hit: +0.1 (max 0.4)
+  reply_before_classify: -0.05
+  escalate_correct: +0.2
+  escalate_unnecessary: -0.1
+  close_bonus: +0.0 to +0.3 (depends on prior progress)
+  time_penalty: -0.05 (if step_count >= max_steps)

customer-support-openenv/requirements.txt ADDED Viewed

	@@ -0,0 +1,5 @@

+pydantic>=2.0
+openai>=1.0
+python-dotenv
+fastapi
+uvicorn