---
title: Customer Support Env
emoji: 🎧
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
tags:
  - openenv
pinned: false
---

# 🎧 AI-Powered Customer Support Ticket Resolution Environment

> **An OpenEnv-compatible environment for training AI agents to handle real-world customer support scenarios, from simple FAQs to complex, multi-step escalations with angry customers.**

[![OpenEnv](https://img.shields.io/badge/OpenEnv-Compatible-blue)](https://github.com/meta-pytorch/OpenEnv)
[![Python](https://img.shields.io/badge/Python-3.10%2B-green)](https://python.org)
[![Docker](https://img.shields.io/badge/Docker-Ready-blue)](https://docker.com)
[![License](https://img.shields.io/badge/License-MIT-yellow)](LICENSE)

---

## 1. Environment Overview

This environment simulates a **real customer support helpdesk** where an AI agent must:

- Read incoming customer tickets of varying complexity
- Understand customer sentiment (neutral → frustrated → angry)
- Apply company policies (refund, shipping, escalation)
- Craft professional, empathetic, and accurate responses
- Resolve issues within a limited number of steps

The agent interacts using the standard **OpenEnv API**: `reset()`, `step()`, and `state()`.

---

## 2. Real-World Use Case

Customer support is one of the most common AI deployment targets. This environment captures realistic challenges:

| Challenge | How It's Simulated |
|---|---|
| **Tone matching** | Grader scores empathy and professionalism and penalizes harmful language |
| **Policy reasoning** | Agent must apply the correct refund/shipping/escalation policies |
| **Multi-turn dialogue** | Customers send follow-up messages that depend on the quality of the agent's response |
| **Escalation handling** | Hard tasks require knowing when and how to escalate |
| **Angry customers** | Sentiment ranges from neutral to furious, requiring different strategies |

---

## 3. Action Space

The agent sends a `SupportAction` with:

```python
class SupportAction(BaseModel):
    response_text: str   # Agent's response to the customer (1-2000 chars)
    action_type: str     # "respond" | "escalate" | "resolve" | "request_info"
    internal_notes: str  # Optional internal notes (not visible to customer)
```

| Action Type | Effect |
|---|---|
| `respond` | Continue the conversation |
| `resolve` | Mark ticket as resolved (ends episode) |
| `escalate` | Escalate to senior support |
| `request_info` | Ask customer for more information |

---

## 4. Observation Space

After each step, the agent receives a `SupportObservation`:

```python
class SupportObservation(BaseModel):
    ticket: TicketInfo          # Ticket metadata (ID, category, priority, customer info)
    conversation_history: list  # Full message history
    current_message: str        # Latest customer message to respond to
    policy_context: str         # Relevant company policies
    task_id: str                # Current task identifier
    difficulty: str             # "easy" | "medium" | "hard"
    max_steps: int              # Maximum steps allowed
    steps_remaining: int        # Steps left before timeout
    done: bool                  # Whether episode is complete
    reward: float               # Cumulative reward so far
```

---

## 5. Reward Design

The reward function uses a **dense, multi-axis scoring system**:

### Scoring Axes

| Axis | Weight (varies by task) | What It Measures |
|---|---|---|
| **Correctness** | 0.30-0.35 | Keyword/concept matching against expected response elements |
| **Tone** | 0.30-0.40 | Professional, empathetic language vs. harmful/rude signals |
| **Completeness** | 0.30-0.40 | Checklist of required response components |

### Reward Breakdown Example

```
+0.30  → Correctly identifies the issue (correctness)
+0.30  → Professional and empathetic tone (tone)
+0.40  → Addresses all required elements (completeness)
─────
 1.00  → Perfect score
```

### Penalties (deducted from total)

| Penalty | Deduction | Trigger |
|---|---|---|
| Empty response | -0.30 | Fewer than 5 words |
| Repeated response | -0.15 to -0.30 | Copy-paste from a previous response |
| Harmful language | -0.50 | Offensive or inappropriate content |
| Irrelevant content | -0.40 | Off-topic responses |

---

## 6. Task Descriptions

### Task 1: Simple FAQ (Easy)

- **Ticket:** "Where is my order?"
- **Customer:** Sarah Johnson (Neutral sentiment)
- **Expected:** Reference order ID, explain shipping timeframe (5-7 business days), mention tracking email
- **Max Steps:** 3
- **Policy Context:** Shipping policy

### Task 2: Conditional Refund (Medium)

- **Ticket:** "Refund for opened laptop bag with defective stitching"
- **Customer:** Michael Chen (Frustrated sentiment)
- **Expected:** Identify as manufacturing defect, offer full refund + replacement option, explain return process
- **Max Steps:** 5
- **Policy Context:** Refund policy + Return policy
- **Follow-ups:** Customer provides photos, asks about timeline

### Task 3: Complex Complaint Escalation (Hard)

- **Ticket:** "Wrong item, late delivery, rude staff"
- **Customer:** David Martinez (Angry sentiment)
- **Expected:** Address ALL three issues, offer refund + compensation, escalate to manager, provide written confirmation
- **Max Steps:** 7
- **Policy Context:** All policies (refund, return, shipping, escalation)
- **Follow-ups:** Threats to file complaints, demands for specifics, requests for written confirmation

---

## 7. Setup Instructions

### Prerequisites

- Python 3.10+
- Docker (optional, for containerized deployment)

### Local Setup

```bash
# Clone the repository
git clone https://github.com/MathiyazhaganNTL/openenv_scaler.git
# git names the checkout directory after the repository
cd openenv_scaler

# Install dependencies
pip install -r requirements.txt

# Run validation
python validate.py
```

### Environment Variables (for inference)

```bash
cp .env.example .env
# Edit .env with your API keys
```

| Variable | Default | Description |
|---|---|---|
| `API_BASE_URL` | `https://api.openai.com/v1` | LLM API endpoint |
| `MODEL_NAME` | `gpt-3.5-turbo` | Model to use |
| `OPENAI_API_KEY` | (none) | API key |
| `HF_TOKEN` | (none) | Hugging Face token, used as an alternative to `OPENAI_API_KEY` |
| `ENV_BASE_URL` | `http://localhost:8000` | Environment server URL |

---

## 8. Run Instructions

### Start the Environment Server

```bash
# Direct
python -m server.app

# Or with uvicorn
uvicorn server.app:app --host 0.0.0.0 --port 8000

# Or with Docker
docker build -t customer-support-env .
docker run -p 8000:8000 customer-support-env
```

### Run Baseline Inference

```bash
# Start the server first (in another terminal)
uvicorn server.app:app --host 0.0.0.0 --port 8000

# Run inference
python inference.py
```

### API Usage Examples

```bash
# Health check
curl http://localhost:8000/health

# List tasks
curl http://localhost:8000/tasks

# Reset environment
curl -X POST http://localhost:8000/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "easy_faq"}'

# Step
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action": {"response_text": "Thank you for reaching out!", "action_type": "respond"}}'

# Get state
curl http://localhost:8000/state
```

### Python Client Usage

```python
from server.environment import CustomerSupportEnvironment
from models import SupportAction

env = CustomerSupportEnvironment()

# Reset to a task
obs = env.reset(task_id="easy_faq")
print(obs.current_message)  # Customer's first message

# Respond
action = SupportAction(
    response_text="Hi Sarah! Your order ORD-55821 ships in 5-7 business days...",
    action_type="respond",
)
obs, reward, done, info = env.step(action)
print(f"Reward: {reward:.4f}")
print(f"Score breakdown: {info['reward_breakdown']}")
```

---

## 9. Baseline Results

Running the baseline inference with `gpt-3.5-turbo`:

| Task | Difficulty | Avg Reward | Steps |
|---|---|---|---|
| `easy_faq` | Easy | ~0.65 | 1–2 |
| `medium_refund` | Medium | ~0.55 | 3–4 |
| `hard_escalation` | Hard | ~0.45 | 4–6 |
| **Overall** | n/a | **~0.55** | n/a |

> Scores vary with model quality. Stronger models score higher by producing more empathetic, accurate, and complete responses.

---

## Project Structure

```
openenv/
├── openenv.yaml          # OpenEnv manifest (metadata, tasks, config)
├── models.py             # Pydantic models (Action, Observation, State, Reward)
├── tasks.py              # Task definitions (3 tasks, rubrics, policies)
├── grader.py             # Deterministic grading engine
├── inference.py          # Baseline LLM inference script
├── validate.py           # Environment validation script
├── requirements.txt      # Python dependencies
├── pyproject.toml        # Project configuration
├── Dockerfile            # Docker container definition
├── .dockerignore         # Docker build exclusions
├── .env.example          # Environment variable template
├── .gitignore            # Git ignore rules
├── README.md             # This file
└── server/
    ├── __init__.py
    ├── environment.py    # Core environment (reset/step/state)
    └── app.py            # FastAPI HTTP server
```

---

## HuggingFace Spaces Deployment

This environment is designed for deployment as a **Docker-based HuggingFace Space**:

1. Create a new Space with **Docker SDK**
2. Push the code to the Space repository
3. The Space will auto-build and serve the API on the port set by `app_port` in the front matter (7860)
4. Tag the Space with `openenv`

```bash
# Using openenv CLI
openenv push --repo-id mathi3046/customer-support-env
```

Once the Space is running, `POST /reset` returns HTTP 200, which confirms it is operational.

---

## License

MIT License. See [LICENSE](LICENSE) for details.
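
---

## Reward Computation Sketch

As a rough illustration of the reward design in section 5, the snippet below combines the three axis scores with per-task weights and then subtracts any triggered penalties. It is a minimal sketch, not the code in `grader.py`: the names `AxisWeights`, `PENALTIES`, and `combine_scores` are hypothetical, the clamp to the range 0 to 1 is an assumption, and the variable repeated-response penalty (-0.15 to -0.30) is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisWeights:
    """Per-task axis weights (hypothetical container; defaults mirror the breakdown example)."""
    correctness: float = 0.30
    tone: float = 0.30
    completeness: float = 0.40


# Deductions taken from the Penalties table; the dictionary keys are made-up labels.
PENALTIES = {
    "empty": 0.30,
    "harmful": 0.50,
    "irrelevant": 0.40,
}


def combine_scores(correctness, tone, completeness,
                   weights=AxisWeights(), triggered=()):
    """Weighted sum of axis scores (each in [0, 1]) minus triggered penalties.

    Clamping the result to [0, 1] is an assumption of this sketch.
    """
    total = (weights.correctness * correctness
             + weights.tone * tone
             + weights.completeness * completeness)
    total -= sum(PENALTIES[name] for name in triggered)
    return max(0.0, min(1.0, total))


if __name__ == "__main__":
    # A response that is perfect on every axis.
    print(combine_scores(1.0, 1.0, 1.0))
    # The same response, but flagged for harmful language.
    print(combine_scores(1.0, 1.0, 1.0, triggered=("harmful",)))
```

Keeping the weights in a small immutable object makes it easy to vary them per task, as the Scoring Axes table indicates, without touching the combination logic.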