	---
	title: Customer Support Env
	emoji: 🎧
	colorFrom: blue
	colorTo: purple
	sdk: docker
	app_port: 7860
	tags:
	- openenv
	pinned: false
	---

	# 🎧 AI-Powered Customer Support Ticket Resolution Environment

	> An OpenEnv-compatible environment for training AI agents to handle real-world customer support scenarios — from simple FAQs to complex, multi-step escalations with angry customers.

	[![OpenEnv](https://img.shields.io/badge/OpenEnv-Compatible-blue)](https://github.com/meta-pytorch/OpenEnv)
	[![Python](https://img.shields.io/badge/Python-3.10%2B-green)](https://python.org)
	[![Docker](https://img.shields.io/badge/Docker-Ready-blue)](https://docker.com)
	[![License](https://img.shields.io/badge/License-MIT-yellow)](LICENSE)

	---

	## 1. Environment Overview

	This environment simulates a real customer support helpdesk where an AI agent must:

	- Read incoming customer tickets with varying complexity
	- Understand customer sentiment (neutral → frustrated → angry)
	- Apply company policies (refund, shipping, escalation)
	- Craft professional, empathetic, and accurate responses
	- Resolve issues within a limited number of steps

	The agent interacts using the standard OpenEnv API: `reset()`, `step()`, and `state()`.

	---

	## 2. Real-World Use Case

	Customer support is one of the most common AI deployment targets. This environment captures realistic challenges:

	| Challenge | How It's Simulated |
	|---|---|
	| Tone matching | Grader evaluates empathy, professionalism, and harmful language |
	| Policy reasoning | Agent must apply correct refund/shipping/escalation policies |
	| Multi-turn dialogue | Customers send follow-up messages that depend on agent's response quality |
	| Escalation handling | Hard tasks require knowing when and how to escalate |
	| Angry customers | Sentiment ranges from neutral to furious, requiring different strategies |

	---

	## 3. Action Space

	The agent sends a `SupportAction` with:

	```python
	from pydantic import BaseModel

	class SupportAction(BaseModel):
	    response_text: str   # Agent's response to the customer (1-2000 chars)
	    action_type: str     # "respond" | "escalate" | "resolve" | "request_info"
	    internal_notes: str  # Optional internal notes (not visible to customer)
	```

	| Action Type | Effect |
	|---|---|
	| `respond` | Continue the conversation |
	| `resolve` | Mark ticket as resolved (ends episode) |
	| `escalate` | Escalate to senior support |
	| `request_info` | Ask customer for more information |
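
As a minimal sketch of the action format, the helper below builds a request body shaped like the `/step` example in this README and rejects values outside the documented ranges. `build_step_payload` and `ALLOWED_ACTION_TYPES` are hypothetical names, the empty-string default for `internal_notes` is an assumption, and the checks run client-side only; the server may validate differently.

```python
# Sketch only: helper names are hypothetical, not part of the repository.

# The four action types listed in the table above.
ALLOWED_ACTION_TYPES = {"respond", "escalate", "resolve", "request_info"}


def build_step_payload(response_text: str, action_type: str = "respond",
                       internal_notes: str = "") -> dict:
    """Return a dict shaped like the JSON body of POST /step."""
    if action_type not in ALLOWED_ACTION_TYPES:
        raise ValueError(f"unknown action_type: {action_type!r}")
    # The action model documents a 1-2000 character response.
    if not 1 <= len(response_text) <= 2000:
        raise ValueError("response_text must be 1-2000 characters")
    return {
        "action": {
            "response_text": response_text,
            "action_type": action_type,
            # Assumption: an empty string is acceptable when there are no notes.
            "internal_notes": internal_notes,
        }
    }


payload = build_step_payload("Thank you for reaching out!", "respond")
print(payload["action"]["action_type"])
```

Catching a bad `action_type` before the request is sent keeps a typo from costing one of the episode's limited steps.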

	---

	## 4. Observation Space

	After each step, the agent receives a `SupportObservation`:

	```python
	class SupportObservation(BaseModel):
	    ticket: TicketInfo          # Ticket metadata (ID, category, priority, customer info)
	    conversation_history: list  # Full message history
	    current_message: str        # Latest customer message to respond to
	    policy_context: str         # Relevant company policies
	    task_id: str                # Current task identifier
	    difficulty: str             # "easy" | "medium" | "hard"
	    max_steps: int              # Maximum steps allowed
	    steps_remaining: int        # Steps left before timeout
	    done: bool                  # Whether episode is complete
	    reward: float               # Cumulative reward so far
	```
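
One way an agent might consume these fields is to flatten the observation into a single LLM prompt. The sketch below works on a plain dict with the field names listed above. `build_prompt` is a hypothetical helper, the sample values are made up for illustration, and the shape of each `conversation_history` entry (a dict with `role` and `content` keys) is an assumption, since this README does not specify it.

```python
# Sketch only: build_prompt is a hypothetical helper, not part of the repository.

def build_prompt(obs: dict) -> str:
    """Flatten an observation dict into one prompt string."""
    lines = [
        f"Task: {obs['task_id']} ({obs['difficulty']})",
        f"Steps remaining: {obs['steps_remaining']} of {obs['max_steps']}",
        "Policies:",
        obs["policy_context"],
        "Conversation so far:",
    ]
    # Assumption: each history entry is a dict with "role" and "content" keys.
    for message in obs["conversation_history"]:
        lines.append(f"{message['role']}: {message['content']}")
    lines.append(f"Customer: {obs['current_message']}")
    return "\n".join(lines)


# Illustrative values only; real observations come from reset() and step().
sample_obs = {
    "task_id": "easy_faq",
    "difficulty": "easy",
    "max_steps": 3,
    "steps_remaining": 3,
    "policy_context": "Standard shipping takes 5-7 business days.",
    "conversation_history": [],
    "current_message": "Where is my order?",
}
print(build_prompt(sample_obs))
```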

	---

	## 5. Reward Design

	The reward function uses a dense, multi-axis scoring system:

	### Scoring Axes

	| Axis | Weight (varies by task) | What It Measures |
	|---|---|---|
	| Correctness | 0.30-0.35 | Keyword/concept matching against expected response elements |
	| Tone | 0.30-0.40 | Professional, empathetic language vs. harmful/rude signals |
	| Completeness | 0.30-0.40 | Checklist of required response components |

	### Reward Breakdown Example

	```
	+0.30 → Correctly identifies the issue (correctness)
	+0.30 → Professional and empathetic tone (tone)
	+0.40 → Addresses all required elements (completeness)
	─────
	1.00 → Perfect score
	```

	### Penalties (deducted from total)

	| Penalty | Deduction | Trigger |
	|---|---|---|
	| Empty response | -0.30 | Fewer than 5 words |
	| Repeated response | -0.15 to -0.30 | Copy-paste of a previous response |
	| Harmful language | -0.50 | Offensive or inappropriate content |
	| Irrelevant content | -0.40 | Off-topic responses |
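
The arithmetic implied by the axes and penalties can be sketched as a weighted sum with deductions. This is an illustration under stated assumptions, not the code in `grader.py`: `combine_reward` is a hypothetical name, per-axis scores are assumed to lie in [0, 1], and clamping the result to [0, 1] is an assumption this README does not confirm.

```python
# Sketch only: combine_reward is hypothetical and may differ from grader.py.

def combine_reward(axis_scores: dict, weights: dict, penalties: list) -> float:
    """Weighted sum of per-axis scores minus penalty deductions."""
    weighted = sum(weights[axis] * axis_scores[axis] for axis in weights)
    total = weighted - sum(penalties)
    # Assumption: the final reward is clamped to the range [0, 1].
    return max(0.0, min(1.0, total))


# Weights taken from the reward breakdown example above.
weights = {"correctness": 0.30, "tone": 0.30, "completeness": 0.40}

perfect = {"correctness": 1.0, "tone": 1.0, "completeness": 1.0}
partial = {"correctness": 1.0, "tone": 0.5, "completeness": 0.5}

print(combine_reward(perfect, weights, []))      # all axes satisfied
print(combine_reward(partial, weights, [0.15]))  # partial credit, one repeat penalty
```

With these inputs a perfect response scores 1.00, and the partial response works out to 0.30 + 0.15 + 0.20 - 0.15 = 0.50.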

	---

	## 6. Task Descriptions

	### Task 1: Simple FAQ (Easy)
	- Ticket: "Where is my order?"
	- Customer: Sarah Johnson (Neutral sentiment)
	- Expected: Reference order ID, explain shipping timeframe (5-7 business days), mention tracking email
	- Max Steps: 3
	- Policy Context: Shipping policy

	### Task 2: Conditional Refund (Medium)
	- Ticket: "Refund for opened laptop bag with defective stitching"
	- Customer: Michael Chen (Frustrated sentiment)
	- Expected: Identify as manufacturing defect, offer full refund + replacement option, explain return process
	- Max Steps: 5
	- Policy Context: Refund policy + Return policy
	- Follow-ups: Customer provides photos, asks about timeline

	### Task 3: Complex Complaint Escalation (Hard)
	- Ticket: "Wrong item, late delivery, rude staff"
	- Customer: David Martinez (Angry sentiment)
	- Expected: Address ALL three issues, offer refund + compensation, escalate to manager, provide written confirmation
	- Max Steps: 7
	- Policy Context: All policies (refund, return, shipping, escalation)
	- Follow-ups: Threats to file complaints, demands for specifics, requests for written confirmation

	---

	## 7. Setup Instructions

	### Prerequisites
	- Python 3.10+
	- Docker (optional, for containerized deployment)

	### Local Setup

	```bash
	# Clone the repository
	git clone https://github.com/MathiyazhaganNTL/openenv_scaler.git
	cd openenv_scaler

	# Install dependencies
	pip install -r requirements.txt

	# Run validation
	python validate.py
	```

	### Environment Variables (for inference)

	```bash
	cp .env.example .env
	# Edit .env with your API keys
	```

	| Variable | Default | Description |
	|---|---|---|
	| `API_BASE_URL` | `https://api.openai.com/v1` | LLM API endpoint |
	| `MODEL_NAME` | `gpt-3.5-turbo` | Model to use |
	| `OPENAI_API_KEY` | — | API key |
	| `HF_TOKEN` | — | Alternative: HF token |
	| `ENV_BASE_URL` | `http://localhost:8000` | Environment server URL |
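
A script could resolve these variables along the lines of the sketch below, falling back to the defaults in the table. `load_config` and the `API_KEY` result key are hypothetical names, and preferring `OPENAI_API_KEY` over `HF_TOKEN` when both are set is an assumption; `inference.py` may behave differently. The sketch reads the process environment only and does not parse the `.env` file itself.

```python
# Sketch only: load_config is a hypothetical helper, not part of the repository.
import os

# Defaults copied from the table above.
DEFAULTS = {
    "API_BASE_URL": "https://api.openai.com/v1",
    "MODEL_NAME": "gpt-3.5-turbo",
    "ENV_BASE_URL": "http://localhost:8000",
}


def load_config(environ=None) -> dict:
    """Resolve settings from a mapping (os.environ when none is given)."""
    env = os.environ if environ is None else environ
    config = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # Assumption: OPENAI_API_KEY takes precedence when both keys are set.
    config["API_KEY"] = env.get("OPENAI_API_KEY") or env.get("HF_TOKEN")
    if not config["API_KEY"]:
        raise RuntimeError("set OPENAI_API_KEY or HF_TOKEN")
    return config


# Passing a dict keeps the example independent of the real environment.
config = load_config({"HF_TOKEN": "hf_example", "MODEL_NAME": "my-model"})
print(config["MODEL_NAME"], config["ENV_BASE_URL"])
```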

	---

	## 8. Run Instructions

	### Start the Environment Server

	```bash
	# Direct
	python -m server.app

	# Or with uvicorn
	uvicorn server.app:app --host 0.0.0.0 --port 8000

	# Or with Docker
	docker build -t customer-support-env .
	docker run -p 8000:8000 customer-support-env
	```

	### Run Baseline Inference

	```bash
	# Start the server first (in another terminal)
	uvicorn server.app:app --host 0.0.0.0 --port 8000

	# Run inference
	python inference.py
	```

	### API Usage Examples

	```bash
	# Health check
	curl http://localhost:8000/health

	# List tasks
	curl http://localhost:8000/tasks

	# Reset environment
	curl -X POST http://localhost:8000/reset \
	  -H "Content-Type: application/json" \
	  -d '{"task_id": "easy_faq"}'

	# Step
	curl -X POST http://localhost:8000/step \
	  -H "Content-Type: application/json" \
	  -d '{"action": {"response_text": "Thank you for reaching out!", "action_type": "respond"}}'

	# Get state
	curl http://localhost:8000/state
	```
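
The same calls can be made from Python with only the standard library. The sketch below mirrors the curl commands above; `build_request` and `call` are hypothetical helper names, and it assumes every endpoint returns a JSON body, which this README does not state explicitly.

```python
# Sketch only: helper names are hypothetical; endpoints are the ones shown above.
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_request(path: str, payload=None) -> urllib.request.Request:
    """Build a GET request, or a JSON POST when a payload is given."""
    if payload is None:
        return urllib.request.Request(BASE_URL + path, method="GET")
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path: str, payload=None) -> dict:
    """Send the request and decode the response (needs a running server)."""
    request = build_request(path, payload)
    with urllib.request.urlopen(request, timeout=10) as response:
        # Assumption: the server answers with a JSON document.
        return json.loads(response.read().decode("utf-8"))


reset_request = build_request("/reset", {"task_id": "easy_faq"})
print(reset_request.get_method(), reset_request.full_url)
```

With the server running, `call("/reset", {"task_id": "easy_faq"})` corresponds to the reset command and `call("/state")` to the state query.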

	### Python Client Usage

	```python
	from server.environment import CustomerSupportEnvironment
	from models import SupportAction

	env = CustomerSupportEnvironment()

	# Reset to a task
	obs = env.reset(task_id="easy_faq")
	print(obs.current_message) # Customer's first message

	# Respond
	action = SupportAction(
	    response_text="Hi Sarah! Your order ORD-55821 ships in 5-7 business days...",
	    action_type="respond",
	)
	obs, reward, done, info = env.step(action)
	print(f"Reward: {reward:.4f}")
	print(f"Score breakdown: {info['reward_breakdown']}")
	```
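
Extending the single step above to a full episode might look like the following sketch. It assumes `step()` returns the `(obs, reward, done, info)` tuple shown above and that the observation exposes `max_steps`. `run_episode`, `StubEnv`, and `resolve_on_second_turn` are hypothetical names; the stub stands in for `CustomerSupportEnvironment` so the example runs without the repository installed, and its fixed reward of 0.5 is a placeholder.

```python
# Sketch only: StubEnv imitates the environment's interface for illustration.
from types import SimpleNamespace


def run_episode(env, make_action, task_id: str) -> list:
    """Step until done or the step budget runs out; return the step rewards."""
    obs = env.reset(task_id=task_id)
    rewards = []
    for _ in range(obs.max_steps):
        obs, reward, done, info = env.step(make_action(obs))
        rewards.append(reward)
        if done:
            break
    return rewards


class StubEnv:
    """Stand-in for CustomerSupportEnvironment so the sketch runs offline."""

    def reset(self, task_id: str):
        return SimpleNamespace(current_message="Where is my order?", max_steps=3)

    def step(self, action):
        # The episode ends when the agent resolves the ticket.
        done = action["action_type"] == "resolve"
        obs = SimpleNamespace(current_message="Thanks!", max_steps=3)
        return obs, 0.5, done, {}


def resolve_on_second_turn(obs):
    """Toy policy: respond to the first message, then resolve."""
    action_type = "resolve" if obs.current_message == "Thanks!" else "respond"
    return {"response_text": "Happy to help with your order.",
            "action_type": action_type}


rewards = run_episode(StubEnv(), resolve_on_second_turn, "easy_faq")
print(len(rewards))
```

Bounding the loop by `max_steps` guards against an agent that never sends `resolve`.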

	---

	## 9. Baseline Results

	Running the baseline inference with `gpt-3.5-turbo`:

	| Task | Difficulty | Avg Reward | Steps |
	|---|---|---|---|
	| `easy_faq` | Easy | ~0.65 | 1–2 |
	| `medium_refund` | Medium | ~0.55 | 3–4 |
	| `hard_escalation` | Hard | ~0.45 | 4–6 |
	| Overall | — | ~0.55 | — |

	> Scores vary based on model quality. Better models achieve higher scores by producing more empathetic, accurate, and complete responses.

	---

	## Project Structure

	```
	openenv/
	├── openenv.yaml # OpenEnv manifest (metadata, tasks, config)
	├── models.py # Pydantic models (Action, Observation, State, Reward)
	├── tasks.py # Task definitions (3 tasks, rubrics, policies)
	├── grader.py # Deterministic grading engine
	├── inference.py # Baseline LLM inference script
	├── validate.py # Environment validation script
	├── requirements.txt # Python dependencies
	├── pyproject.toml # Project configuration
	├── Dockerfile # Docker container definition
	├── .dockerignore # Docker build exclusions
	├── .env.example # Environment variable template
	├── .gitignore # Git ignore rules
	├── README.md # This file
	└── server/
	    ├── __init__.py
	    ├── environment.py # Core environment (reset/step/state)
	    └── app.py # FastAPI HTTP server
	```

	---

	## HuggingFace Spaces Deployment

	This environment is designed for deployment as a Docker-based HuggingFace Space:

	1. Create a new Space with Docker SDK
	2. Push the code to the Space repository
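	3. The Space will auto-build and expose the API on the port set by `app_port` in the README front matter (7860 here), so the container must listen on that port rather than 8000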
	4. Tag the Space with `openenv`

	```bash
	# Using openenv CLI
	openenv push --repo-id mathi3046/customer-support-env
	```

	Once the Space is running, `POST /reset` should return HTTP 200, which confirms that the environment is operational.

	---

	## License

	MIT License. See [LICENSE](LICENSE) for details.