---
title: Customer Support Env
emoji: 🎧
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
tags:
- openenv
pinned: false
---
# 🎧 AI-Powered Customer Support Ticket Resolution Environment
> **An OpenEnv-compatible environment for training AI agents to handle real-world customer support scenarios: from simple FAQs to complex, multi-step escalations with angry customers.**
[![OpenEnv](https://img.shields.io/badge/OpenEnv-Compatible-blue)](https://github.com/meta-pytorch/OpenEnv)
[![Python](https://img.shields.io/badge/Python-3.10%2B-green)](https://python.org)
[![Docker](https://img.shields.io/badge/Docker-Ready-blue)](https://docker.com)
[![License](https://img.shields.io/badge/License-MIT-yellow)](LICENSE)
---
## 1. Environment Overview
This environment simulates a **real customer support helpdesk** where an AI agent must:
- Read incoming customer tickets with varying complexity
- Understand customer sentiment (neutral → frustrated → angry)
- Apply company policies (refund, shipping, escalation)
- Craft professional, empathetic, and accurate responses
- Resolve issues within a limited number of steps
The agent interacts using the standard **OpenEnv API**: `reset()`, `step()`, and `state()`.
---
## 2. Real-World Use Case
Customer support is one of the most common AI deployment targets. This environment captures realistic challenges:
| Challenge | How It's Simulated |
|---|---|
| **Tone matching** | Grader rewards empathy and professionalism and penalizes harmful language |
| **Policy reasoning** | Agent must apply correct refund/shipping/escalation policies |
| **Multi-turn dialogue** | Customers send follow-up messages that depend on the quality of the agent's responses |
| **Escalation handling** | Hard tasks require knowing when and how to escalate |
| **Angry customers** | Sentiment ranges from neutral to furious, requiring different strategies |
---
## 3. Action Space
The agent sends a `SupportAction` with:
```python
from pydantic import BaseModel


class SupportAction(BaseModel):
    response_text: str        # Agent's response to the customer (1-2000 chars)
    action_type: str          # "respond" | "escalate" | "resolve" | "request_info"
    internal_notes: str = ""  # Optional internal notes (not visible to customer)
```
| Action Type | Effect |
|---|---|
| `respond` | Continue the conversation |
| `resolve` | Mark ticket as resolved (ends episode) |
| `escalate` | Escalate to senior support |
| `request_info` | Ask customer for more information |
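The constraints above can be checked on the client before calling `step()`. This is a minimal sketch, assuming the server enforces exactly the documented rules (1-2000 characters, four action types); `validate_action` is a hypothetical helper, not part of `models.py`.

```python
ALLOWED_ACTION_TYPES = {"respond", "escalate", "resolve", "request_info"}
MAX_RESPONSE_CHARS = 2000


def validate_action(response_text: str, action_type: str) -> list[str]:
    """Return a list of problems; an empty list means the action looks valid.

    Hypothetical helper mirroring the documented SupportAction constraints.
    """
    problems = []
    # Documented length limit: 1-2000 characters.
    if not 1 <= len(response_text) <= MAX_RESPONSE_CHARS:
        problems.append(
            f"response_text must be 1-{MAX_RESPONSE_CHARS} chars, got {len(response_text)}"
        )
    # Only the four documented action types are accepted.
    if action_type not in ALLOWED_ACTION_TYPES:
        problems.append(f"unknown action_type {action_type!r}")
    return problems
```

For example, `validate_action("", "close")` reports two problems: an empty response and an unknown action type.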
---
## 4. Observation Space
After each step, the agent receives a `SupportObservation`:
```python
from pydantic import BaseModel

# TicketInfo (ticket metadata model) is assumed to be defined elsewhere in models.py.


class SupportObservation(BaseModel):
    ticket: TicketInfo          # Ticket metadata (ID, category, priority, customer info)
    conversation_history: list  # Full message history
    current_message: str        # Latest customer message to respond to
    policy_context: str         # Relevant company policies
    task_id: str                # Current task identifier
    difficulty: str             # "easy" | "medium" | "hard"
    max_steps: int              # Maximum steps allowed
    steps_remaining: int        # Steps left before timeout
    done: bool                  # Whether episode is complete
    reward: float               # Cumulative reward so far
```
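An LLM agent typically flattens the observation into a single prompt. This is a minimal sketch, assuming the observation is available as a plain dict whose keys match the `SupportObservation` fields above; `build_prompt` and the prompt wording are hypothetical, not what `inference.py` necessarily does.

```python
def build_prompt(obs: dict) -> str:
    """Turn a SupportObservation-shaped dict into a single LLM prompt.

    Assumes keys match the SupportObservation fields documented above.
    """
    # Render prior turns one per line; the message format inside the list is an assumption.
    history = "\n".join(str(turn) for turn in obs.get("conversation_history", []))
    return (
        f"Task: {obs['task_id']} ({obs['difficulty']})\n"
        f"Steps remaining: {obs['steps_remaining']} of {obs['max_steps']}\n\n"
        f"Company policy:\n{obs['policy_context']}\n\n"
        f"Conversation so far:\n{history or '(none)'}\n\n"
        f"Latest customer message:\n{obs['current_message']}\n\n"
        "Reply as a professional, empathetic support agent."
    )
```

Including `policy_context` and `steps_remaining` in the prompt lets the model apply the right policy and decide when to `resolve` before the step budget runs out.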
---
## 5. Reward Design
The reward function uses a **dense, multi-axis scoring system**:
### Scoring Axes
| Axis | Weight (varies by task) | What It Measures |
|---|---|---|
| **Correctness** | 0.30-0.35 | Keyword/concept matching against expected response elements |
| **Tone** | 0.30-0.40 | Professional, empathetic language vs. harmful/rude signals |
| **Completeness** | 0.30-0.40 | Checklist of required response components |
### Reward Breakdown Example
```
+0.30 → Correctly identifies the issue (correctness)
+0.30 → Professional and empathetic tone (tone)
+0.40 → Addresses all required elements (completeness)
─────
1.00 β†’ Perfect score
```
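The breakdown above reads as a weighted sum of three per-axis scores. This is a minimal sketch, assuming each axis score lies in [0, 1] and each task's weights sum to 1 (as in the example: 0.30 + 0.30 + 0.40); `combine_axes` and `EXAMPLE_WEIGHTS` are hypothetical names, and the real per-task weights presumably live in the rubrics in `tasks.py`.

```python
def combine_axes(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-axis scores, each assumed to lie in [0, 1]."""
    # Guard against mis-specified rubrics: weights should form a convex combination.
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1.0")
    return sum(weights[axis] * scores[axis] for axis in weights)


# Weights taken from the breakdown example above.
EXAMPLE_WEIGHTS = {"correctness": 0.30, "tone": 0.30, "completeness": 0.40}
```

With these weights, a response scoring 1.0 on correctness, 0.5 on tone and 0.0 on completeness earns 0.30 + 0.15 + 0.0 = 0.45 before penalties.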
### Penalties (deducted from total)
| Penalty | Deduction | Trigger |
|---|---|---|
| Empty or too-short response | -0.30 | Fewer than 5 words |
| Repeated response | -0.15 to -0.30 | Copies a previous response |
| Harmful language | -0.50 | Offensive or inappropriate content |
| Irrelevant content | -0.40 | Off-topic responses |
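The two text-only penalties can be approximated without any word lists. This is a hypothetical sketch, not the logic in `grader.py`: the 0.9 near-repeat threshold, the use of `difflib.SequenceMatcher` similarity, the mapping of exact/near repeats onto the ends of the -0.15 to -0.30 range, and clamping the final reward to [0, 1] are all assumptions. Harmful-language and off-topic detection are left out because they depend on the grader's own signals.

```python
import difflib

SHORT_RESPONSE_PENALTY = 0.30  # "Empty response": fewer than 5 words
EXACT_REPEAT_PENALTY = 0.30    # assumed: exact copy gets the top of the documented range
NEAR_REPEAT_PENALTY = 0.15     # assumed: near copy gets the bottom of the range
NEAR_REPEAT_THRESHOLD = 0.9    # assumed similarity cutoff for a "near copy"


def text_penalties(response: str, previous: list[str]) -> float:
    """Total deduction for short or repeated responses (hypothetical sketch)."""
    penalty = 0.0
    if len(response.split()) < 5:
        penalty += SHORT_RESPONSE_PENALTY
    normalized = response.strip().lower()
    # Highest similarity to any earlier agent response (0.0 if there are none).
    best = max(
        (difflib.SequenceMatcher(None, normalized, p.strip().lower()).ratio() for p in previous),
        default=0.0,
    )
    if best == 1.0:
        penalty += EXACT_REPEAT_PENALTY
    elif best >= NEAR_REPEAT_THRESHOLD:
        penalty += NEAR_REPEAT_PENALTY
    return penalty


def apply_penalties(raw_score: float, penalty: float) -> float:
    # Clamping to [0, 1] is an assumption; the README does not say how negatives are handled.
    return min(1.0, max(0.0, raw_score - penalty))
```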
---
## 6. Task Descriptions
### Task 1: Simple FAQ (Easy)
- **Ticket:** "Where is my order?"
- **Customer:** Sarah Johnson (Neutral sentiment)
- **Expected:** Reference order ID, explain shipping timeframe (5-7 business days), mention tracking email
- **Max Steps:** 3
- **Policy Context:** Shipping policy
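Task 1's expected elements map naturally onto the Completeness axis as a checklist. This is a minimal sketch, assuming completeness is the fraction of checklist items whose phrases appear in the response; `EASY_FAQ_CHECKLIST`, its phrases, and `completeness` are hypothetical (the order ID comes from the Python client example), and the real rubric in `tasks.py` may differ.

```python
# Hypothetical checklist for Task 1; the real rubric lives in tasks.py and may differ.
EASY_FAQ_CHECKLIST = {
    "order_id": ["ord-55821"],
    "shipping_timeframe": ["5-7 business days", "5 to 7 business days"],
    "tracking_email": ["tracking"],
}


def completeness(response: str, checklist: dict[str, list[str]]) -> tuple[float, list[str]]:
    """Return (fraction of checklist items covered, names of missing items)."""
    text = response.lower()
    # An item counts as covered if any of its phrases appears (case-insensitive).
    missing = [
        item for item, phrases in checklist.items()
        if not any(phrase in text for phrase in phrases)
    ]
    covered = len(checklist) - len(missing)
    return covered / len(checklist), missing
```

A reply that gives the order ID and timeframe but forgets the tracking email scores 2/3 and reports `["tracking_email"]` as missing.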
### Task 2: Conditional Refund (Medium)
- **Ticket:** "Refund for opened laptop bag with defective stitching"
- **Customer:** Michael Chen (Frustrated sentiment)
- **Expected:** Identify as manufacturing defect, offer full refund + replacement option, explain return process
- **Max Steps:** 5
- **Policy Context:** Refund policy + Return policy
- **Follow-ups:** Customer provides photos, asks about timeline
### Task 3: Complex Complaint Escalation (Hard)
- **Ticket:** "Wrong item, late delivery, rude staff"
- **Customer:** David Martinez (Angry sentiment)
- **Expected:** Address ALL three issues, offer refund + compensation, escalate to manager, provide written confirmation
- **Max Steps:** 7
- **Policy Context:** All policies (refund, return, shipping, escalation)
- **Follow-ups:** Threats to file complaints, demands for specifics, requests for written confirmation
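The three tasks differ mainly in their step budget. This sketch encodes them as data, using the task IDs from the Baseline Results table and the max steps listed above, and assumes an episode ends either on a `resolve` action or when the step budget is exhausted; `TaskSpec`, `TASKS` and `is_done` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    task_id: str
    difficulty: str
    max_steps: int


# Task IDs from the Baseline Results table; max_steps from the task descriptions.
TASKS = {
    spec.task_id: spec
    for spec in (
        TaskSpec("easy_faq", "easy", 3),
        TaskSpec("medium_refund", "medium", 5),
        TaskSpec("hard_escalation", "hard", 7),
    )
}


def is_done(task_id: str, steps_taken: int, last_action_type: str) -> bool:
    """Assumed termination rule: "resolve" ends the episode, as does using up the step budget."""
    return last_action_type == "resolve" or steps_taken >= TASKS[task_id].max_steps
```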
---
## 7. Setup Instructions
### Prerequisites
- Python 3.10+
- Docker (optional, for containerized deployment)
### Local Setup
```bash
# Clone the repository
git clone https://github.com/MathiyazhaganNTL/openenv_scaler.git
cd openenv_scaler
# Install dependencies
pip install -r requirements.txt
# Run validation
python validate.py
```
### Environment Variables (for inference)
```bash
cp .env.example .env
# Edit .env with your API keys
```
| Variable | Default | Description |
|---|---|---|
| `API_BASE_URL` | `https://api.openai.com/v1` | LLM API endpoint |
| `MODEL_NAME` | `gpt-3.5-turbo` | Model to use |
| `OPENAI_API_KEY` | (none) | API key |
| `HF_TOKEN` | (none) | Hugging Face token (alternative to `OPENAI_API_KEY`) |
| `ENV_BASE_URL` | `http://localhost:8000` | Environment server URL |
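A script can read these variables with the defaults from the table. This is a minimal sketch, assuming the variables are already in the process environment (for example after loading `.env`) and that `OPENAI_API_KEY` takes precedence over `HF_TOKEN`; `InferenceConfig` and `load_config` are hypothetical names, not necessarily how `inference.py` is written.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass
class InferenceConfig:
    api_base_url: str
    model_name: str
    api_key: Optional[str]
    env_base_url: str


def load_config(environ: Mapping[str, str] = os.environ) -> InferenceConfig:
    """Read the documented variables, applying the defaults from the table."""
    return InferenceConfig(
        api_base_url=environ.get("API_BASE_URL", "https://api.openai.com/v1"),
        model_name=environ.get("MODEL_NAME", "gpt-3.5-turbo"),
        # Assumption: OPENAI_API_KEY wins; HF_TOKEN is only the fallback.
        api_key=environ.get("OPENAI_API_KEY") or environ.get("HF_TOKEN"),
        env_base_url=environ.get("ENV_BASE_URL", "http://localhost:8000"),
    )
```

Passing a plain dict as `environ` makes the function easy to test without touching the real environment.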
---
## 8. Run Instructions
### Start the Environment Server
```bash
# Direct
python -m server.app
# Or with uvicorn
uvicorn server.app:app --host 0.0.0.0 --port 8000
# Or with Docker
docker build -t customer-support-env .
docker run -p 8000:8000 customer-support-env
```
### Run Baseline Inference
```bash
# Start the server first (in another terminal)
uvicorn server.app:app --host 0.0.0.0 --port 8000
# Run inference
python inference.py
```
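Because inference needs a running server, a script can wait for `GET /health` before it starts. This is a minimal sketch using only the standard library; `wait_for_server` is a hypothetical helper, and the assumption is that `/health` answers HTTP 200 once the server is ready.

```python
import time
import urllib.request


def wait_for_server(base_url: str = "http://localhost:8000", timeout: float = 30.0,
                    interval: float = 0.5) -> bool:
    """Poll GET /health until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError subclass OSError: connection refused, timeouts, 5xx, ...
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_server()` at the top of an inference script avoids a race when the server and the script are started together.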
### API Usage Examples
```bash
# Health check
curl http://localhost:8000/health
# List tasks
curl http://localhost:8000/tasks
# Reset environment
curl -X POST http://localhost:8000/reset \
-H "Content-Type: application/json" \
-d '{"task_id": "easy_faq"}'
# Step
curl -X POST http://localhost:8000/step \
-H "Content-Type: application/json" \
-d '{"action": {"response_text": "Thank you for reaching out!", "action_type": "respond"}}'
# Get state
curl http://localhost:8000/state
```
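The same `/reset` and `/step` calls can be made from Python with only the standard library. This sketch mirrors the request bodies of the curl examples above (the action nested under `"action"`); `reset_payload`, `step_payload` and `post_json` are hypothetical helpers, and the response schema is not assumed beyond being JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def reset_payload(task_id: str) -> dict:
    return {"task_id": task_id}


def step_payload(response_text: str, action_type: str = "respond",
                 internal_notes: str = "") -> dict:
    # Same shape as the curl example: the action is nested under "action".
    action = {"response_text": response_text, "action_type": action_type}
    if internal_notes:
        action["internal_notes"] = internal_notes
    return {"action": action}


def post_json(path: str, payload: dict, base_url: str = BASE_URL) -> dict:
    """POST a JSON body and decode the JSON response (raises on HTTP errors)."""
    req = urllib.request.Request(
        f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

With the server running, `post_json("/reset", reset_payload("easy_faq"))` followed by `post_json("/step", step_payload("Thank you for reaching out!"))` reproduces the two curl calls.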
### Python Client Usage
```python
from server.environment import CustomerSupportEnvironment
from models import SupportAction
env = CustomerSupportEnvironment()
# Reset to a task
obs = env.reset(task_id="easy_faq")
print(obs.current_message) # Customer's first message
# Respond
action = SupportAction(
response_text="Hi Sarah! Your order ORD-55821 ships in 5-7 business days...",
action_type="respond",
)
obs, reward, done, info = env.step(action)
print(f"Reward: {reward:.4f}")
print(f"Score breakdown: {info['reward_breakdown']}")
```
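A full episode repeats `step()` until the environment reports `done`. This is a minimal sketch, assuming `env.reset(task_id=...)` and a four-tuple `env.step(action)` exactly as in the example above; `run_episode` and the `policy` callable (observation in, `SupportAction` out) are hypothetical, and `max_turns` is a safety guard added here, not an environment setting.

```python
from typing import Any, Callable


def run_episode(env: Any, policy: Callable[[Any], Any], task_id: str,
                max_turns: int = 50) -> list[float]:
    """Drive one episode and return the reward returned by each step() call.

    `env` is anything with the reset()/step() signatures used above;
    `policy` maps an observation to an action (e.g. a SupportAction).
    """
    obs = env.reset(task_id=task_id)
    rewards = []
    # max_turns guards against an environment that never sets done.
    for _ in range(max_turns):
        action = policy(obs)
        obs, reward, done, _info = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards
```

Taking `env` as a parameter lets the loop run against `CustomerSupportEnvironment` directly or against a small fake in unit tests.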
---
## 9. Baseline Results
Running the baseline inference with `gpt-3.5-turbo`:
| Task | Difficulty | Avg Reward | Steps |
|---|---|---|---|
| `easy_faq` | Easy | ~0.65 | 1–2 |
| `medium_refund` | Medium | ~0.55 | 3–4 |
| `hard_escalation` | Hard | ~0.45 | 4–6 |
| **Overall** | - | **~0.55** | - |
> Scores vary based on model quality. Better models achieve higher scores by producing more empathetic, accurate, and complete responses.
---
## Project Structure
```
openenv/
├── openenv.yaml        # OpenEnv manifest (metadata, tasks, config)
├── models.py           # Pydantic models (Action, Observation, State, Reward)
├── tasks.py            # Task definitions (3 tasks, rubrics, policies)
├── grader.py           # Deterministic grading engine
├── inference.py        # Baseline LLM inference script
├── validate.py         # Environment validation script
├── requirements.txt    # Python dependencies
├── pyproject.toml      # Project configuration
├── Dockerfile          # Docker container definition
├── .dockerignore       # Docker build exclusions
├── .env.example        # Environment variable template
├── .gitignore          # Git ignore rules
├── README.md           # This file
└── server/
    ├── __init__.py
    ├── environment.py  # Core environment (reset/step/state)
    └── app.py          # FastAPI HTTP server
```
---
## HuggingFace Spaces Deployment
This environment is designed for deployment as a **Docker-based HuggingFace Space**:
1. Create a new Space with **Docker SDK**
2. Push the code to the Space repository
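3. The Space will auto-build and serve the API on the port set by `app_port` in the front matter (7860), so the server inside the container must listen on that port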
4. Tag the Space with `openenv`
```bash
# Using openenv CLI
openenv push --repo-id mathi3046/customer-support-env
```
The API endpoint `POST /reset` will respond with HTTP 200, confirming the Space is operational.
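That check can be scripted after each push. This is a minimal sketch, assuming the Space is reachable at its direct URL (typically of the form `https://<owner>-<space-name>.hf.space`) and that `easy_faq` is a valid task ID; `space_is_up` is a hypothetical helper.

```python
import json
import urllib.request


def space_is_up(base_url: str, task_id: str = "easy_faq", timeout: float = 10.0) -> bool:
    """Return True if POST /reset answers with HTTP 200."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/reset",
        data=json.dumps({"task_id": task_id}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError subclass OSError, so refused connections and 4xx/5xx land here.
        return False
```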
---
## License
MIT License. See [LICENSE](LICENSE) for details.