	---
	title: Customer Support OpenEnv
	emoji: 🎧
	colorFrom: blue
	colorTo: green
	sdk: docker
	app_file: server/app.py
	pinned: false
	---

	# Customer Support OpenEnv

	> A real-world reinforcement learning environment where an AI agent learns to handle customer support tickets — classify issues, craft replies, escalate when needed, and resolve tickets.

	Built for the Meta × PyTorch OpenEnv Hackathon — Round 1, April 2026.

	---

	## What is this?

	Most RL environments are games. This one is not.

	Every company with customers has a support queue. Tickets come in — billing complaints, app crashes, refund requests, angry users threatening legal action. A human agent reads each one, figures out what's wrong, replies helpfully, escalates if it's too serious, and closes it.

	This environment teaches an AI to do exactly that. The agent receives a ticket, takes actions step by step, and gets rewarded based on how well it handles the situation. The reward signal is dense — the agent gets feedback at every step, not just at the end.

	---

	## Architecture

	### Overall System

	```mermaid
	graph TD
	A[Agent] -->|Action| B[CustomerSupportEnv]
	B -->|Observation, Reward, Done, Info| A

	B --> C[tasks.py\nTask Definitions]
	B --> D[models.py\nTyped Models]
	B --> E[grader.py\nPer-Task Graders]

	F[app.py\nFastAPI Server] --> B
	G[baseline/run_baseline.py] --> B

	H[openenv.yaml\nMetadata + Config] -.->|describes| B
	I[data/tickets.json\nTicket Dataset] -.->|reference data| B
	```

	### Episode Flow

	```mermaid
	sequenceDiagram
	participant Agent
	participant Env as CustomerSupportEnv
	participant Grader

	Agent->>Env: reset(task_id="hard")
	Env-->>Agent: Observation (ticket + history + status)

	Agent->>Env: step(Action: classify, category="billing")
	Env-->>Agent: Observation, Reward(+0.3), done=False

	Agent->>Env: step(Action: reply, content="...")
	Env-->>Agent: Observation, Reward(+0.2), done=False

	Agent->>Env: step(Action: escalate)
	Env-->>Agent: Observation, Reward(+0.2), done=False

	Agent->>Env: step(Action: close)
	Env-->>Agent: Observation, Reward(+0.3), done=True

	Agent->>Grader: grade_task(task, actions_taken)
	Grader-->>Agent: Final Score (0.0 - 1.0)
	```

	### Reward Breakdown

	```mermaid
	flowchart LR
	A[Action Taken] --> B{action_type?}

	B -->|classify| C{Category correct?}
	C -->|yes| D[+0.30]
	C -->|no| E[+0.00]

	B -->|reply| F{Keyword hits?}
	F --> G[+0.10 per hit\nmax +0.40]
	F -->|replied before classify| H[-0.05 penalty]

	B -->|escalate| I{Required?}
	I -->|yes| J[+0.20]
	I -->|no| K[-0.10 penalty]

	B -->|close| L[+0.10 if classified\n+0.10 if replied\n+0.10 if escalated correctly]

	B -->|any, at max_steps| M[-0.05 time penalty]
	```
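
The flowchart above can be read as a small per-step reward function. The sketch below is one possible reading, not the code in `env/environment.py`: `EpisodeState` and `step_reward` are hypothetical names, keyword matching is assumed to be a case-insensitive substring check, the "replied before classify" penalty is assumed to stack with keyword credit, and the time penalty is assumed to apply on any step at or beyond `max_steps`.

```python
from dataclasses import dataclass


@dataclass
class EpisodeState:
    """Hypothetical per-episode bookkeeping (not the project's real state)."""
    correct_category: str
    reply_keywords: list[str]
    escalation_required: bool
    max_steps: int
    steps_taken: int = 0
    classified: bool = False
    replied: bool = False
    escalated_correctly: bool = False


def step_reward(state: EpisodeState, action_type: str,
                category: str = "", content: str = "") -> float:
    """Return the reward for one action, using the values from the flowchart."""
    state.steps_taken += 1
    reward = 0.0

    if action_type == "classify":
        state.classified = True
        if category == state.correct_category:
            reward += 0.30
    elif action_type == "reply":
        text = content.lower()
        hits = sum(1 for kw in state.reply_keywords if kw.lower() in text)
        reward += min(0.10 * hits, 0.40)  # +0.10 per hit, capped at +0.40
        if not state.classified:
            reward -= 0.05  # assumed to stack with keyword credit
        state.replied = True
    elif action_type == "escalate":
        if state.escalation_required:
            reward += 0.20
            state.escalated_correctly = True
        else:
            reward -= 0.10
    elif action_type == "close":
        # +0.10 for each of: classified, replied, escalated correctly
        reward += 0.10 * sum(
            [state.classified, state.replied, state.escalated_correctly]
        )

    if state.steps_taken >= state.max_steps:
        reward -= 0.05  # time penalty (assumed boundary condition)
    return round(reward, 2)
```

Keeping the bookkeeping in a single state object makes the `close` bonus easy to compute, since it depends on what happened earlier in the episode.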

	### File Structure

	```mermaid
	graph LR
	root[customer-support-openenv]

	root --> env[env/]
	env --> models[models.py\nObservation Action Reward]
	env --> environment[environment.py\nCustomerSupportEnv]
	env --> tasks[tasks.py\nTask Definitions]
	env --> grader[grader.py\ngrade_easy grade_medium grade_hard]
	env --> utils[utils.py\nHelpers]
	env --> init[__init__.py]

	root --> baseline[baseline/]
	baseline --> script[run_baseline.py\nLLM + Mock runner]

	root --> data[data/]
	data --> tickets[tickets.json\n12 real tickets]

	root --> apppy[app.py\nFastAPI Server]
	root --> yaml[openenv.yaml]
	root --> docker[Dockerfile]
	root --> readme[README.md]
	root --> env2[.env\nAPI Keys]
	```

	---

	## Tasks

	The environment has 3 tasks of increasing difficulty. An agent must handle all three.

	| Task | Difficulty | Max Steps | What the agent must do |
	|---|---|---|---|
	| `easy` | 🟢 Easy | 5 | Just classify the ticket correctly |
	| `medium` | 🟡 Medium | 8 | Classify + give a helpful reply |
	| `hard` | 🔴 Hard | 10 | Classify → reply → escalate → close |
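
As a rough illustration, the table can be expressed as data. This is a sketch under assumptions: `TaskSpec`, `TASKS`, and `missing_actions` are hypothetical names and may not match what `env/tasks.py` defines. Only the task ids, step limits, and required action order come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical container for one row of the task table."""
    task_id: str
    max_steps: int
    required_actions: tuple[str, ...]


# Values copied from the table; the structure itself is an assumption.
TASKS = {
    "easy": TaskSpec("easy", 5, ("classify",)),
    "medium": TaskSpec("medium", 8, ("classify", "reply")),
    "hard": TaskSpec("hard", 10, ("classify", "reply", "escalate", "close")),
}


def missing_actions(task_id: str, actions_taken: list[str]) -> list[str]:
    """Return required action types not taken yet, in the required order."""
    spec = TASKS[task_id]
    return [a for a in spec.required_actions if a not in actions_taken]
```

For example, `missing_actions("hard", ["classify", "reply"])` returns `["escalate", "close"]`.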

	### Easy — Classification Only
	```
	Customer: "I was charged twice for my order and need the duplicate removed."
	Agent must → classify as "billing"
	Score: 1.0 correct, 0.0 wrong
	```

	### Medium — Classify + Reply
	```
	Customer: "The app keeps crashing on my iPhone. I already restarted twice."
	Agent must → classify as "technical" AND reply with relevant keywords
	Score: 0.4 (classify) + up to 0.6 (reply quality)
	```

	### Hard — Full Pipeline
	```
	Customer: "Been waiting 3 weeks for my refund. Considering legal action."
	History: 4 prior messages showing escalation attempts
	Agent must → classify + reply + escalate to human + close ticket
	Score: 0.2 + 0.3 + 0.2 + 0.3 (partial credit, penalty for bad escalation)
	```
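
The three scoring rules above can be sketched as plain functions. This is an illustration, not the contents of `env/grader.py`: the function names are hypothetical, reply quality is assumed to be the fraction of expected keywords found in the reply, and the hard-task weights are assumed to map to classify, reply, escalate, and close in that order. The penalty for bad escalation is left out because its size is not stated here.

```python
def keyword_fraction(reply: str, keywords: list[str]) -> float:
    """Fraction of expected keywords that appear in the reply (assumed metric)."""
    if not keywords:
        return 0.0
    text = reply.lower()
    return sum(kw.lower() in text for kw in keywords) / len(keywords)


def grade_easy_sketch(predicted: str, expected: str) -> float:
    """1.0 for the correct category, 0.0 otherwise."""
    return 1.0 if predicted == expected else 0.0


def grade_medium_sketch(predicted: str, expected: str,
                        reply: str, keywords: list[str]) -> float:
    """0.4 for classification plus up to 0.6 for reply quality."""
    score = 0.4 if predicted == expected else 0.0
    score += 0.6 * keyword_fraction(reply, keywords)
    return round(score, 3)


def grade_hard_sketch(classified_ok: bool, reply_quality: float,
                      escalated: bool, closed: bool) -> float:
    """Weighted partial credit: 0.2 + 0.3 + 0.2 + 0.3 (order is an assumption)."""
    score = (0.2 * classified_ok + 0.3 * reply_quality
             + 0.2 * escalated + 0.3 * closed)
    return round(min(max(score, 0.0), 1.0), 3)
```

The keyword list passed to `grade_medium_sketch` is a placeholder. The real expected keywords would come from the task definition.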

	---

	## Observation Space

	What the agent sees at each step:

	```python
	Observation(
	    ticket_id="T001",
	    customer_query="I was charged twice and need a refund.",
	    history=["Agent: We are looking into it.", "Customer: Still waiting!"],
	    status="pending",  # open | pending | resolved
	)
	```
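
To make the field types concrete, here is a minimal stand-in for the observation model. It is a sketch that uses the standard library's `dataclasses`. The real `models.py` may use a different library or extra fields, and `ObservationSketch` and `render_prompt` are hypothetical names.

```python
from dataclasses import dataclass, field

STATUSES = ("open", "pending", "resolved")


@dataclass
class ObservationSketch:
    """Hypothetical mirror of the fields shown in the example above."""
    ticket_id: str
    customer_query: str
    history: list[str] = field(default_factory=list)
    status: str = "open"

    def __post_init__(self) -> None:
        # Reject any status outside the three documented values.
        if self.status not in STATUSES:
            raise ValueError(
                f"status must be one of {STATUSES}, got {self.status!r}"
            )


def render_prompt(obs: ObservationSketch) -> str:
    """Flatten an observation into text that an LLM policy could read."""
    lines = [f"Ticket {obs.ticket_id} [{obs.status}]"]
    lines += obs.history
    lines.append(f"Customer: {obs.customer_query}")
    return "\n".join(lines)
```

A helper like `render_prompt` is one way a baseline could turn the structured observation into model input. Whether the actual baseline does this is not stated here.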

	---

	## Action Space

	What the agent can do:

	```python
	Action(action_type="classify", category="billing") # identify the issue
	Action(action_type="reply", content="We will help...") # respond to customer
	Action(action_type="escalate") # pass to human agent
	Action(action_type="close") # end the episode
	```

	Valid categories: `billing` | `technical` | `refund` | `account` | `abuse`
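
A client may want to check an action before sending it. The sketch below assumes that `classify` requires one of the five categories and that `reply` requires non-empty content. The environment may enforce different rules, and `ActionSketch` and `validation_errors` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

ACTION_TYPES = ("classify", "reply", "escalate", "close")
CATEGORIES = ("billing", "technical", "refund", "account", "abuse")


@dataclass
class ActionSketch:
    """Hypothetical mirror of the Action fields used in the examples."""
    action_type: str
    category: Optional[str] = None
    content: Optional[str] = None


def validation_errors(action: ActionSketch) -> list[str]:
    """Return a list of problems; an empty list means the action looks valid."""
    errors = []
    if action.action_type not in ACTION_TYPES:
        errors.append(f"unknown action_type: {action.action_type!r}")
    if action.action_type == "classify" and action.category not in CATEGORIES:
        errors.append(f"classify needs a category from {CATEGORIES}")
    if action.action_type == "reply" and not (action.content or "").strip():
        errors.append("reply needs non-empty content")
    return errors
```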

	---

	## Setup

	### 1. Clone and install

	```bash
	git clone <your-repo-url>
	cd customer-support-openenv
	pip install -r requirements.txt
	```

	### 2. Add your API key (optional — needed for LLM baseline)

	```bash
	# .env
	OPENAI_API_KEY=sk-...
	```

	### 3. Run the baseline

	```bash
	python baseline/run_baseline.py
	```

	No API key? It runs in mock mode with deterministic actions — still produces valid scores.
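
One way such a fallback could work is sketched below. This is an assumption about the approach, not the logic in `baseline/run_baseline.py`: `pick_mode` and `scripted_actions` are hypothetical helpers, and the scripted plan simply follows the required actions for each task.

```python
import os
from typing import Mapping, Optional


def pick_mode(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return "llm" when OPENAI_API_KEY is set and non-empty, else "mock"."""
    environ = os.environ if environ is None else environ
    return "llm" if environ.get("OPENAI_API_KEY") else "mock"


def scripted_actions(task_id: str, category: str, reply: str) -> list[dict]:
    """Build a fixed action plan for a task, with no randomness involved."""
    plan = [{"action_type": "classify", "category": category}]
    if task_id in ("medium", "hard"):
        plan.append({"action_type": "reply", "content": reply})
    if task_id == "hard":
        plan.append({"action_type": "escalate"})
        plan.append({"action_type": "close"})
    return plan
```

Because the plan depends only on its inputs, repeated runs produce the same actions and therefore the same scores.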

	### 4. Start the HTTP server

	```bash
	python app.py
	# → http://localhost:7860
	```

	### 5. Try it manually

	```bash
	# Start a hard task episode
	curl "http://localhost:7860/reset?task_id=hard"

	# Classify the ticket
	curl -X POST http://localhost:7860/step \
	  -H "Content-Type: application/json" \
	  -d '{"action_type": "classify", "category": "billing"}'

	# Reply
	curl -X POST http://localhost:7860/step \
	  -H "Content-Type: application/json" \
	  -d '{"action_type": "reply", "content": "We are escalating your refund as priority."}'

	# Escalate
	curl -X POST http://localhost:7860/step \
	  -H "Content-Type: application/json" \
	  -d '{"action_type": "escalate"}'

	# Close
	curl -X POST http://localhost:7860/step \
	  -H "Content-Type: application/json" \
	  -d '{"action_type": "close"}'
	```
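
The same calls can be made from Python with only the standard library. This sketch mirrors the curl commands above. The helper names are hypothetical, and `send` assumes the server answers with JSON, which is typical for FastAPI but not confirmed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:7860"


def reset_request(task_id: str, base_url: str = BASE_URL) -> Request:
    """Build the GET /reset request for the given task."""
    query = urlencode({"task_id": task_id})
    return Request(f"{base_url}/reset?{query}", method="GET")


def step_request(action: dict, base_url: str = BASE_URL) -> Request:
    """Build the POST /step request carrying the action as a JSON body."""
    body = json.dumps(action).encode("utf-8")
    return Request(
        f"{base_url}/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: Request) -> dict:
    """Send a request and decode the response (assumed to be JSON)."""
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with the server running locally:
#   send(reset_request("hard"))
#   send(step_request({"action_type": "classify", "category": "billing"}))
```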

	### 6. Use directly in Python

	```python
	from env import CustomerSupportEnv, Action

	env = CustomerSupportEnv()
	obs = env.reset(task_id="hard")

	print(obs.customer_query)
	# → "I have been waiting three weeks for a refund..."

	obs, reward, done, info = env.step(Action(action_type="classify", category="billing"))
	print(reward.score, reward.feedback)
	# → 0.3 "correct category"

	obs, reward, done, info = env.step(Action(
	    action_type="reply",
	    content="We are making this a priority refund and escalating to a manager.",
	))

	obs, reward, done, info = env.step(Action(action_type="escalate"))
	obs, reward, done, info = env.step(Action(action_type="close"))
	```
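
For repeated experiments, the calls above can be wrapped in a small loop that stops when `done` is true and records the per-step scores. In this sketch, `run_episode` is a hypothetical helper, and `StubEnv` is a stand-in with made-up rewards so the example runs without the real package. Only the `reset`/`step` call shapes and `reward.score` are taken from the example above.

```python
from types import SimpleNamespace


def run_episode(env, actions, task_id="hard"):
    """Step through `actions`; return (total_reward, per_step_scores)."""
    env.reset(task_id=task_id)
    scores = []
    for action in actions:
        _obs, reward, done, _info = env.step(action)
        scores.append(reward.score)
        if done:
            break  # ignore any actions queued after the episode ends
    return round(sum(scores), 2), scores


class StubEnv:
    """Hypothetical stand-in: every step scores 0.25, "close" ends the episode."""

    def reset(self, task_id="easy"):
        self.remaining = 4
        return SimpleNamespace(customer_query="stub ticket", status="open")

    def step(self, action):
        self.remaining -= 1
        done = action == "close" or self.remaining == 0
        obs = SimpleNamespace(
            customer_query="stub ticket",
            status="resolved" if done else "pending",
        )
        return obs, SimpleNamespace(score=0.25, feedback="stub"), done, {}


total, scores = run_episode(StubEnv(), ["classify", "reply", "escalate", "close"])
print(total, scores)
```

With the real environment, `StubEnv()` would be replaced by `CustomerSupportEnv()` and the strings by `Action` objects.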

	---

	## Docker

	```bash
	docker build -t openenv .
	docker run -p 7860:7860 -e OPENAI_API_KEY=sk-... openenv
	```

	---

	## Deploying to Hugging Face Spaces

	1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
	2. Create a new Space → select Docker SDK
	3. Add tag: `openenv`
	4. Upload this entire repo
	5. Add `OPENAI_API_KEY` as a Space secret

	The server starts automatically and exposes all endpoints.

	---

	## Baseline Scores

	Mock scores were measured with deterministic mock actions (no API key needed). LLM scores are approximate.

	| Task | Mock Score | LLM Score (gpt-4o-mini) |
	|---|---|---|
	| easy | 1.000 | ~0.900 |
	| medium | 0.850 | ~0.750 |
	| hard | 0.775 | ~0.650 |
	| Total | 2.625 / 3.0 | ~2.300 / 3.0 |

	---

	## API Reference

	| Method | Endpoint | Description |
	|---|---|---|
	| GET | `/` | HTML landing page |
	| GET | `/reset?task_id=easy` | Start a new episode |
	| POST | `/step` | Submit an Action |
	| GET | `/state` | Current raw state |
	| GET | `/tasks` | List all tasks |
	| GET | `/health` | Health check |
	| GET | `/docs` | Swagger UI |

	---

	## Team

	- Adit Sharma — adit.2428cs1345@kiet.edu
	- Mansi Verma — ogmansi897@gmail.com
	- Priyanshi Vishwakarma — vishwakarmapriyanshi68@gmail.com
