title: Customer Support OpenEnv
emoji: 🎧
colorFrom: blue
colorTo: green
sdk: docker
app_file: server/app.py
pinned: false

Customer Support OpenEnv

A real-world reinforcement learning environment where an AI agent learns to handle customer support tickets: classify issues, craft replies, escalate when needed, and resolve tickets.

Built for the Meta × PyTorch OpenEnv Hackathon, Round 1, April 2026.


What is this?

Most RL environments are games. This one is not.

Every company with customers has a support queue. Tickets come in: billing complaints, app crashes, refund requests, angry users threatening legal action. A human agent reads each one, figures out what's wrong, replies helpfully, escalates if it's too serious, and closes it.

This environment teaches an AI to do exactly that. The agent receives a ticket, takes actions step by step, and gets rewarded based on how well it handles the situation. The reward signal is dense: the agent gets feedback at every step, not just at the end.


Architecture

Overall System

graph TD
    A[Agent] -->|Action| B[CustomerSupportEnv]
    B -->|Observation, Reward, Done, Info| A

    B --> C[tasks.py\nTask Definitions]
    B --> D[models.py\nTyped Models]
    B --> E[grader.py\nPer-Task Graders]

    F[app.py\nFastAPI Server] --> B
    G[baseline/run_baseline.py] --> B

    H[openenv.yaml\nMetadata + Config] -.->|describes| B
    I[data/tickets.json\nTicket Dataset] -.->|reference data| B

Episode Flow

sequenceDiagram
    participant Agent
    participant Env as CustomerSupportEnv
    participant Grader

    Agent->>Env: reset(task_id="hard")
    Env-->>Agent: Observation (ticket + history + status)

    Agent->>Env: step(Action: classify, category="billing")
    Env-->>Agent: Observation, Reward(+0.3), done=False

    Agent->>Env: step(Action: reply, content="...")
    Env-->>Agent: Observation, Reward(+0.2), done=False

    Agent->>Env: step(Action: escalate)
    Env-->>Agent: Observation, Reward(+0.2), done=False

    Agent->>Env: step(Action: close)
    Env-->>Agent: Observation, Reward(+0.3), done=True

    Agent->>Grader: grade_task(task, actions_taken)
    Grader-->>Agent: Final Score (0.0 - 1.0)

Reward Breakdown

flowchart LR
    A[Action Taken] --> B{action_type?}

    B -->|classify| C{Category correct?}
    C -->|yes| D[+0.30]
    C -->|no| E[+0.00]

    B -->|reply| F{Keyword hits?}
    F --> G[+0.10 per hit\nmax +0.40]
    F -->|replied before classify| H[-0.05 penalty]

    B -->|escalate| I{Required?}
    I -->|yes| J[+0.20]
    I -->|no| K[-0.10 penalty]

    B -->|close| L[+0.10 if classified\n+0.10 if replied\n+0.10 if escalated correctly]

    B -->|any, at max_steps| M[-0.05 time penalty]
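The branches in the flowchart can be read as a single per-step function. The following is a minimal sketch of that logic, not the environment's actual code: `EpisodeProgress`, `step_reward`, and the keyword-matching rule (case-insensitive substring) are assumptions made for illustration, and the "classified" flag is assumed to be set by any classify action, correct or not.

```python
from dataclasses import dataclass


@dataclass
class EpisodeProgress:
    """Hypothetical per-episode bookkeeping; the real environment's state may differ."""
    classified: bool = False
    replied: bool = False
    escalated_correctly: bool = False
    steps_taken: int = 0


def step_reward(progress, action_type, *, category=None, expected_category=None,
                content="", keywords=(), escalation_required=False, max_steps=10):
    """Return the reward for one action, following the flowchart's branches."""
    progress.steps_taken += 1
    reward = 0.0

    if action_type == "classify":
        progress.classified = True
        if category == expected_category:
            reward += 0.30                      # correct category
    elif action_type == "reply":
        text = content.lower()
        hits = sum(1 for kw in keywords if kw.lower() in text)
        reward += min(0.10 * hits, 0.40)        # +0.10 per hit, capped at +0.40
        if not progress.classified:
            reward -= 0.05                      # replied before classifying
        progress.replied = True
    elif action_type == "escalate":
        if escalation_required:
            reward += 0.20
            progress.escalated_correctly = True
        else:
            reward -= 0.10                      # unnecessary escalation
    elif action_type == "close":
        # +0.10 for each completed stage: classified, replied, escalated correctly
        completed = [progress.classified, progress.replied, progress.escalated_correctly]
        reward += 0.10 * sum(completed)

    if progress.steps_taken >= max_steps:
        reward -= 0.05                          # time penalty at max_steps
    return round(reward, 2)
```

Under these assumptions, a classify, reply (two keyword hits), escalate, close sequence yields +0.3, +0.2, +0.2, +0.3, matching the Episode Flow diagram.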

File Structure

graph LR
    root[customer-support-openenv]

    root --> env[env/]
    env --> models[models.py\nObservation Action Reward]
    env --> environment[environment.py\nCustomerSupportEnv]
    env --> tasks[tasks.py\nTask Definitions]
    env --> grader[grader.py\ngrade_easy grade_medium grade_hard]
    env --> utils[utils.py\nHelpers]
    env --> init[__init__.py]

    root --> baseline[baseline/]
    baseline --> script[run_baseline.py\nLLM + Mock runner]

    root --> data[data/]
    data --> tickets[tickets.json\n12 real tickets]

    root --> apppy[app.py\nFastAPI Server]
    root --> yaml[openenv.yaml]
    root --> docker[Dockerfile]
    root --> readme[README.md]
    root --> env2[.env\nAPI Keys]

Tasks

The environment has 3 tasks of increasing difficulty. An agent must handle all three.

| Task | Difficulty | Max Steps | What the agent must do |
| --- | --- | --- | --- |
| easy | 🟢 Easy | 5 | Just classify the ticket correctly |
| medium | 🟡 Medium | 8 | Classify + give a helpful reply |
| hard | 🔴 Hard | 10 | Classify → reply → escalate → close |

Easy: Classification Only

Customer: "I was charged twice for my order and need the duplicate removed."
Agent must → classify as "billing"
Score: 1.0 correct, 0.0 wrong

Medium: Classify + Reply

Customer: "The app keeps crashing on my iPhone. I already restarted twice."
Agent must → classify as "technical" AND reply with relevant keywords
Score: 0.4 (classify) + up to 0.6 (reply quality)

Hard: Full Pipeline

Customer: "Been waiting 3 weeks for my refund. Considering legal action."
History: 4 prior messages showing escalation attempts
Agent must → classify + reply + escalate to human + close ticket
Score: 0.2 + 0.3 + 0.2 + 0.3 (partial credit, penalty for bad escalation)
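The three scoring rules above can be sketched as plain functions. This is an illustration under stated assumptions, not the contents of grader.py: the `sketch_grade_*` names and signatures are hypothetical, reply quality is approximated as the fraction of expected keywords found in the reply, the hard-task weights are assumed to map to classify, reply, escalate, close in that order, and the size of the bad-escalation penalty (0.1 here) is a placeholder.

```python
def keyword_fraction(reply, keywords):
    """Fraction of expected keywords found in the reply (assumed quality measure)."""
    if not keywords:
        return 0.0
    text = reply.lower()
    return sum(kw.lower() in text for kw in keywords) / len(keywords)


def sketch_grade_easy(predicted, expected):
    """1.0 for the correct category, 0.0 otherwise."""
    return 1.0 if predicted == expected else 0.0


def sketch_grade_medium(predicted, expected, reply, keywords):
    """0.4 for classification plus up to 0.6 for reply quality."""
    score = 0.4 if predicted == expected else 0.0
    score += 0.6 * keyword_fraction(reply, keywords)
    return round(score, 3)


def sketch_grade_hard(predicted, expected, reply, keywords, escalated, closed,
                      escalation_required=True, bad_escalation_penalty=0.1):
    """Partial credit: 0.2 classify + 0.3 reply + 0.2 escalate + 0.3 close (assumed order)."""
    score = 0.2 if predicted == expected else 0.0
    score += 0.3 * keyword_fraction(reply, keywords)
    if escalated and escalation_required:
        score += 0.2
    elif escalated:
        score -= bad_escalation_penalty     # escalated when it was not needed
    if closed:
        score += 0.3
    return round(min(max(score, 0.0), 1.0), 3)  # clamp to the 0.0 - 1.0 range
```

Clamping keeps the result inside the 0.0 to 1.0 range that the grader reports as the final score.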

Observation Space

What the agent sees at each step:

Observation(
    ticket_id="T001",
    customer_query="I was charged twice and need a refund.",
    history=["Agent: We are looking into it.", "Customer: Still waiting!"],
    status="pending"   # open | pending | resolved
)
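For readers who want to see the shape of such a model, here is a minimal sketch using a standard-library dataclass. The project's models.py may define `Observation` differently; `ObservationSketch`, its defaults, and the status validation are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List

# Status values taken from the comment in the example above.
VALID_STATUSES = ("open", "pending", "resolved")


@dataclass
class ObservationSketch:
    """Hypothetical stand-in for the Observation model."""
    ticket_id: str
    customer_query: str
    history: List[str] = field(default_factory=list)  # prior messages, oldest first
    status: str = "open"

    def __post_init__(self):
        # Reject statuses outside the documented set.
        if self.status not in VALID_STATUSES:
            raise ValueError(f"status must be one of {VALID_STATUSES}, got {self.status!r}")


obs = ObservationSketch(
    ticket_id="T001",
    customer_query="I was charged twice and need a refund.",
    history=["Agent: We are looking into it.", "Customer: Still waiting!"],
    status="pending",
)
```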

Action Space

What the agent can do:

Action(action_type="classify", category="billing")          # identify the issue
Action(action_type="reply",    content="We will help...")   # respond to customer
Action(action_type="escalate")                              # pass to human agent
Action(action_type="close")                                 # end the episode

Valid categories: billing | technical | refund | account | abuse
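The four action types and five categories lend themselves to early validation. The sketch below shows one way to do that with a frozen dataclass; whether the real `Action` model enforces these rules is not stated here, so `ActionSketch` and its checks (a category is required for classify, non-empty content for reply) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

VALID_ACTION_TYPES = ("classify", "reply", "escalate", "close")
VALID_CATEGORIES = ("billing", "technical", "refund", "account", "abuse")


@dataclass(frozen=True)
class ActionSketch:
    """Hypothetical stand-in for the Action model, with up-front validation."""
    action_type: str
    category: Optional[str] = None   # only meaningful for "classify"
    content: Optional[str] = None    # only meaningful for "reply"

    def __post_init__(self):
        if self.action_type not in VALID_ACTION_TYPES:
            raise ValueError(f"unknown action_type: {self.action_type!r}")
        if self.action_type == "classify" and self.category not in VALID_CATEGORIES:
            raise ValueError(f"classify needs a category from {VALID_CATEGORIES}")
        if self.action_type == "reply" and not (self.content or "").strip():
            raise ValueError("reply needs non-empty content")


classify = ActionSketch(action_type="classify", category="billing")
close = ActionSketch(action_type="close")
```

Rejecting malformed actions before they reach the environment keeps reward logic free of input checks.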


Setup

1. Clone and install

git clone <your-repo-url>
cd customer-support-openenv
pip install -r requirements.txt

2. Add your API key (optional; needed for the LLM baseline)

# .env
OPENAI_API_KEY=sk-...

3. Run the baseline

python baseline/run_baseline.py

No API key? The baseline runs in mock mode with deterministic actions and still produces valid scores.
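One way such a fallback could work is sketched below. This is not the code in run_baseline.py: `choose_mode`, `mock_actions`, and the scripted reply texts are hypothetical, and the action lists simply follow what each task requires in the Tasks table.

```python
import os


def choose_mode(environ=None):
    """Return "llm" when OPENAI_API_KEY is set and non-empty, otherwise "mock"."""
    env = os.environ if environ is None else environ
    return "llm" if env.get("OPENAI_API_KEY", "").strip() else "mock"


def mock_actions(task_id):
    """Return a fixed, deterministic action script for a task (illustrative content)."""
    scripts = {
        "easy": [
            {"action_type": "classify", "category": "billing"},
        ],
        "medium": [
            {"action_type": "classify", "category": "technical"},
            {"action_type": "reply", "content": "Please update the app and try again."},
        ],
        "hard": [
            {"action_type": "classify", "category": "billing"},
            {"action_type": "reply", "content": "We are escalating your refund as priority."},
            {"action_type": "escalate"},
            {"action_type": "close"},
        ],
    }
    if task_id not in scripts:
        raise ValueError(f"unknown task_id: {task_id!r}")
    return scripts[task_id]
```

Because the scripts never change, repeated mock runs give the same scores, which makes them useful as a regression check.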

4. Start the HTTP server

python app.py
# → http://localhost:7860

5. Try it manually

# Start a hard task episode
curl "http://localhost:7860/reset?task_id=hard"

# Classify the ticket
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "classify", "category": "billing"}'

# Reply
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "reply", "content": "We are escalating your refund as priority."}'

# Escalate
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "escalate"}'

# Close
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "close"}'
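The same sequence can be driven from Python with only the standard library. A minimal sketch follows, assuming the server is running on port 7860 and that both endpoints return JSON (the response format is an assumption); `reset_request`, `step_request`, and `send` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:7860"


def reset_request(task_id, base_url=BASE_URL):
    """Build the GET /reset?task_id=... request."""
    query = urllib.parse.urlencode({"task_id": task_id})
    return urllib.request.Request(f"{base_url}/reset?{query}", method="GET")


def step_request(action, base_url=BASE_URL):
    """Build the POST /step request with the action as a JSON body."""
    body = json.dumps(action).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires the server to be running):
#   send(reset_request("hard"))
#   send(step_request({"action_type": "classify", "category": "billing"}))
#   send(step_request({"action_type": "close"}))
```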

6. Use directly in Python

from env import CustomerSupportEnv, Action

env = CustomerSupportEnv()
obs = env.reset(task_id="hard")

print(obs.customer_query)
# → "I have been waiting three weeks for a refund..."

obs, reward, done, info = env.step(Action(action_type="classify", category="billing"))
print(reward.score, reward.feedback)
# → 0.3  "correct category"

obs, reward, done, info = env.step(Action(
    action_type="reply",
    content="We are making this a priority refund and escalating to a manager."
))

obs, reward, done, info = env.step(Action(action_type="escalate"))
obs, reward, done, info = env.step(Action(action_type="close"))
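The calls above can be folded into a small loop. This is a minimal sketch that relies only on the interface shown here: `reset(task_id=...)`, `step()` returning four values, and a reward object with `.score` and `.feedback`. `run_scripted_episode` is a hypothetical helper, not part of the `env` package.

```python
def run_scripted_episode(env, actions, task_id="hard"):
    """Play a fixed list of actions and return (total_reward, feedback_messages).

    `env` is any object exposing reset(task_id=...) and step(action) with the
    return shape used in the example above.
    """
    env.reset(task_id=task_id)
    total = 0.0
    feedback = []
    for action in actions:
        _obs, reward, done, _info = env.step(action)
        total += reward.score
        feedback.append(reward.feedback)
        if done:
            break  # stop as soon as the environment ends the episode
    return total, feedback
```

With the real environment, this would be called as `run_scripted_episode(CustomerSupportEnv(), [Action(action_type="classify", category="billing"), ...])`.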

Docker

docker build -t openenv .
docker run -p 7860:7860 -e OPENAI_API_KEY=sk-... openenv

Deploying to Hugging Face Spaces

  1. Go to huggingface.co/spaces
  2. Create a new Space → select Docker SDK
  3. Add tag: openenv
  4. Upload this entire repo
  5. Add OPENAI_API_KEY as a Space secret

The server starts automatically and exposes all endpoints.


Baseline Scores

Measured with deterministic mock actions (no API key needed):

| Task | Mock Score | LLM Score (gpt-4o-mini) |
| --- | --- | --- |
| easy | 1.000 | ~0.900 |
| medium | 0.850 | ~0.750 |
| hard | 0.775 | ~0.650 |
| Total | 2.625 / 3.0 | ~2.300 / 3.0 |

API Reference

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | / | HTML landing page |
| GET | /reset?task_id=easy | Start a new episode |
| POST | /step | Submit an Action |
| GET | /state | Current raw state |
| GET | /tasks | List all tasks |
| GET | /health | Health check |
| GET | /docs | Swagger UI |

Team


Meta × PyTorch OpenEnv Hackathon, Round 1, April 2026