title: Customer Support Env
emoji: 🎧
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
tags:
- openenv
pinned: false
🎧 AI-Powered Customer Support Ticket Resolution Environment
An OpenEnv-compatible environment for training AI agents to handle real-world customer support scenarios, from simple FAQs to complex, multi-step escalations with angry customers.
1. Environment Overview
This environment simulates a real customer support helpdesk where an AI agent must:
- Read incoming customer tickets with varying complexity
- Understand customer sentiment (neutral → frustrated → angry)
- Apply company policies (refund, shipping, escalation)
- Craft professional, empathetic, and accurate responses
- Resolve issues within a limited number of steps
The agent interacts using the standard OpenEnv API: reset(), step(), and state().
2. Real-World Use Case
Customer support is one of the most common AI deployment targets. This environment captures realistic challenges:
| Challenge | How It's Simulated |
|---|---|
| Tone matching | Grader scores empathy and professionalism and penalizes harmful language |
| Policy reasoning | Agent must apply the correct refund, shipping, and escalation policies |
| Multi-turn dialogue | Customers send follow-up messages that depend on the quality of the agent's response |
| Escalation handling | Hard tasks require knowing when and how to escalate |
| Angry customers | Sentiment ranges from neutral to furious, requiring different strategies |
3. Action Space
The agent sends a SupportAction with:
class SupportAction(BaseModel):
    response_text: str   # Agent's response to the customer (1-2000 chars)
    action_type: str     # "respond" | "escalate" | "resolve" | "request_info"
    internal_notes: str  # Optional internal notes (not visible to customer)
| Action Type | Effect |
|---|---|
| respond | Continue the conversation |
| resolve | Mark ticket as resolved (ends episode) |
| escalate | Escalate to senior support |
| request_info | Ask customer for more information |
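The constraints noted above (a 1-2000 character response and four allowed action types) can be checked before an action is sent. The following is a minimal, dependency-free sketch that uses a dataclass rather than the project's Pydantic model; the ActionDraft class and the validate function are hypothetical and not part of the repository.

```python
from dataclasses import dataclass

# Allowed values taken from the action table above.
ALLOWED_ACTION_TYPES = {"respond", "escalate", "resolve", "request_info"}


@dataclass
class ActionDraft:
    """Hypothetical stand-in for SupportAction (not the project's class)."""
    response_text: str
    action_type: str = "respond"
    internal_notes: str = ""


def validate(draft: ActionDraft) -> list[str]:
    """Return a list of problems; an empty list means the draft looks valid."""
    problems = []
    if not 1 <= len(draft.response_text) <= 2000:
        problems.append("response_text must be 1-2000 characters")
    if draft.action_type not in ALLOWED_ACTION_TYPES:
        problems.append(f"unknown action_type: {draft.action_type!r}")
    return problems
```

Returning a list of problems instead of raising lets a caller report every issue at once, which is convenient when the draft comes from an LLM.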
4. Observation Space
After each step, the agent receives a SupportObservation:
class SupportObservation(BaseModel):
    ticket: TicketInfo          # Ticket metadata (ID, category, priority, customer info)
    conversation_history: list  # Full message history
    current_message: str        # Latest customer message to respond to
    policy_context: str         # Relevant company policies
    task_id: str                # Current task identifier
    difficulty: str             # "easy" | "medium" | "hard"
    max_steps: int              # Maximum steps allowed
    steps_remaining: int        # Steps left before timeout
    done: bool                  # Whether episode is complete
    reward: float               # Cumulative reward so far
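An agent typically turns these fields into a prompt for its language model. The sketch below assumes the observation has been serialized to a plain dict with the field names listed above, and that each history entry is a dict with role and content keys. That entry shape and the build_prompt helper are assumptions, not taken from the repository.

```python
def build_prompt(obs: dict) -> str:
    """Assemble an LLM prompt from an observation dict (hypothetical helper)."""
    # Assumed shape: each history entry is {"role": ..., "content": ...}.
    history = "\n".join(
        f"{turn['role']}: {turn['content']}"
        for turn in obs.get("conversation_history", [])
    )
    return (
        f"Company policy:\n{obs['policy_context']}\n\n"
        f"Conversation so far:\n{history or '(none)'}\n\n"
        f"Customer: {obs['current_message']}\n"
        f"Steps remaining: {obs['steps_remaining']} of {obs['max_steps']}\n"
        "Reply professionally and empathetically."
    )
```

Including the remaining step count in the prompt gives the model a chance to resolve the ticket before the episode times out.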
5. Reward Design
The reward function uses a dense, multi-axis scoring system:
Scoring Axes
| Axis | Weight (varies by task) | What It Measures |
|---|---|---|
| Correctness | 0.30-0.35 | Keyword/concept matching against expected response elements |
| Tone | 0.30-0.40 | Professional, empathetic language vs. harmful/rude signals |
| Completeness | 0.30-0.40 | Checklist of required response components |
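As a rough illustration of keyword-based scoring, the sketch below rates a response by the fraction of expected elements it mentions. The element names and keyword lists are invented for the example (loosely modeled on the easy task), and the repository's grader may match concepts differently.

```python
def checklist_score(response: str, expected: dict[str, list[str]]) -> float:
    """Fraction of expected elements with at least one keyword in the response."""
    text = response.lower()
    hits = sum(
        any(keyword in text for keyword in keywords)
        for keywords in expected.values()
    )
    return hits / len(expected) if expected else 0.0


# Hypothetical rubric, not copied from tasks.py. Keywords are lowercase
# because the response is lowercased before matching.
EASY_FAQ_RUBRIC = {
    "order_id": ["ord-"],
    "shipping_timeframe": ["5-7 business days"],
    "tracking": ["tracking"],
}
```

Plain substring matching is deterministic and cheap, but it cannot tell a correct statement from a wrong one that happens to contain the keyword.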
Reward Breakdown Example
+0.30 → Correctly identifies the issue (correctness)
+0.30 → Professional and empathetic tone (tone)
+0.40 → Addresses all required elements (completeness)
─────
 1.00 → Perfect score
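The breakdown above corresponds to a weighted sum of per-axis scores. A minimal sketch, assuming each axis score lies in [0, 1] and using the 0.30/0.30/0.40 weights from the example; the function name is hypothetical.

```python
# Weights from the breakdown example; actual weights vary by task.
EXAMPLE_WEIGHTS = {"correctness": 0.30, "tone": 0.30, "completeness": 0.40}


def weighted_reward(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-axis scores (each in [0, 1]) into one reward."""
    # A missing axis counts as a score of 0.
    return sum(weights[axis] * scores.get(axis, 0.0) for axis in weights)
```

For instance, a response that is fully correct, half-way on tone, and missing every required element would score 0.30 + 0.15 + 0.00 = 0.45 under these weights.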
Penalties (deducted from total)
| Penalty | Deduction | Trigger |
|---|---|---|
| Empty response | -0.30 | Fewer than 5 words |
| Repeated response | -0.15 to -0.30 | Text copied from a previous response |
| Harmful language | -0.50 | Offensive or inappropriate content |
| Irrelevant content | -0.40 | Off-topic responses |
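The sketch below shows how two of these penalties might be detected and applied. The 5-word threshold and the deduction amounts come from the table. The 0.9 similarity threshold for a repeated response, the use of difflib, the flat -0.15 deduction for repeats, and clamping the result to [0, 1] are all assumptions.

```python
from difflib import SequenceMatcher


def apply_penalties(base: float, response: str, previous: list[str]) -> float:
    """Deduct empty-response and repeated-response penalties from a base reward."""
    reward = base
    if len(response.split()) < 5:
        reward -= 0.30  # "Empty response" row of the table
    # Assumed rule: "repeated" means at least 90% similar to an earlier response.
    if any(SequenceMatcher(None, response, old).ratio() >= 0.9 for old in previous):
        reward -= 0.15  # lower bound of the table's -0.15 to -0.30 range
    # Clamping is an assumption; the table does not say how negative totals are handled.
    return max(0.0, min(1.0, reward))
```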
6. Task Descriptions
Task 1: Simple FAQ (Easy)
- Ticket: "Where is my order?"
- Customer: Sarah Johnson (Neutral sentiment)
- Expected: Reference order ID, explain shipping timeframe (5-7 business days), mention tracking email
- Max Steps: 3
- Policy Context: Shipping policy
Task 2: Conditional Refund (Medium)
- Ticket: "Refund for opened laptop bag with defective stitching"
- Customer: Michael Chen (Frustrated sentiment)
- Expected: Identify as manufacturing defect, offer full refund + replacement option, explain return process
- Max Steps: 5
- Policy Context: Refund policy + Return policy
- Follow-ups: Customer provides photos, asks about timeline
Task 3: Complex Complaint Escalation (Hard)
- Ticket: "Wrong item, late delivery, rude staff"
- Customer: David Martinez (Angry sentiment)
- Expected: Address ALL three issues, offer refund + compensation, escalate to manager, provide written confirmation
- Max Steps: 7
- Policy Context: All policies (refund, return, shipping, escalation)
- Follow-ups: Threats to file complaints, demands for specifics, requests for written confirmation
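The three tasks can be summarized as a small lookup table, which also makes the step budget explicit. The task IDs are the ones used in the API examples and the baseline table; the TaskSpec class and the steps_remaining helper are hypothetical and only illustrate how the maximum step count bounds an episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical summary of a task; the real definitions live in tasks.py."""
    difficulty: str
    sentiment: str
    max_steps: int


TASKS = {
    "easy_faq": TaskSpec("easy", "neutral", 3),
    "medium_refund": TaskSpec("medium", "frustrated", 5),
    "hard_escalation": TaskSpec("hard", "angry", 7),
}


def steps_remaining(task_id: str, steps_taken: int) -> int:
    """Steps left before the episode times out (never negative)."""
    return max(0, TASKS[task_id].max_steps - steps_taken)
```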
7. Setup Instructions
Prerequisites
- Python 3.10+
- Docker (optional, for containerized deployment)
Local Setup
# Clone the repository
git clone https://github.com/MathiyazhaganNTL/openenv_scaler.git
cd openenv_scaler
# Install dependencies
pip install -r requirements.txt
# Run validation
python validate.py
Environment Variables (for inference)
cp .env.example .env
# Edit .env with your API keys
| Variable | Default | Description |
|---|---|---|
| API_BASE_URL | https://api.openai.com/v1 | LLM API endpoint |
| MODEL_NAME | gpt-3.5-turbo | Model to use |
| OPENAI_API_KEY | (none) | API key |
| HF_TOKEN | (none) | Alternative: HF token |
| ENV_BASE_URL | http://localhost:8000 | Environment server URL |
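A script that consumes these variables would usually read them once at startup and fall back to the defaults above. The following is a minimal sketch of that pattern; the load_settings helper and the fallback from OPENAI_API_KEY to HF_TOKEN are assumptions about how the two key variables relate, not a description of inference.py.

```python
import os


def load_settings(environ=os.environ) -> dict:
    """Read inference settings, falling back to the documented defaults."""
    # Assumption: HF_TOKEN is used only when OPENAI_API_KEY is unset or empty.
    api_key = environ.get("OPENAI_API_KEY") or environ.get("HF_TOKEN")
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY or HF_TOKEN before running inference")
    return {
        "api_base_url": environ.get("API_BASE_URL", "https://api.openai.com/v1"),
        "model_name": environ.get("MODEL_NAME", "gpt-3.5-turbo"),
        "api_key": api_key,
        "env_base_url": environ.get("ENV_BASE_URL", "http://localhost:8000"),
    }
```

Passing the mapping as a parameter keeps the function easy to test without touching the real process environment.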
8. Run Instructions
Start the Environment Server
# Direct
python -m server.app
# Or with uvicorn
uvicorn server.app:app --host 0.0.0.0 --port 8000
# Or with Docker
docker build -t customer-support-env .
docker run -p 8000:8000 customer-support-env
Run Baseline Inference
# Start the server first (in another terminal)
uvicorn server.app:app --host 0.0.0.0 --port 8000
# Run inference
python inference.py
API Usage Examples
# Health check
curl http://localhost:8000/health
# List tasks
curl http://localhost:8000/tasks
# Reset environment
curl -X POST http://localhost:8000/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "easy_faq"}'
# Step
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action": {"response_text": "Thank you for reaching out!", "action_type": "respond"}}'
# Get state
curl http://localhost:8000/state
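The same calls can be made from Python with only the standard library. This sketch mirrors the curl payloads above; the helper names are hypothetical, and the response bodies are assumed to be JSON.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # matches the curl examples


def make_request(path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET request (no payload) or a JSON POST request (with payload)."""
    if payload is None:
        return urllib.request.Request(BASE_URL + path, method="GET")
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path: str, payload: Optional[dict] = None) -> dict:
    """Send the request and decode the response body (assumed to be JSON)."""
    with urllib.request.urlopen(make_request(path, payload), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, call("/reset", {"task_id": "easy_faq"}) would reset the environment and call("/state") would fetch the current state.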
Python Client Usage
from server.environment import CustomerSupportEnvironment
from models import SupportAction

env = CustomerSupportEnvironment()

# Reset to a task
obs = env.reset(task_id="easy_faq")
print(obs.current_message)  # Customer's first message

# Respond
action = SupportAction(
    response_text="Hi Sarah! Your order ORD-55821 ships in 5-7 business days...",
    action_type="respond",
)
obs, reward, done, info = env.step(action)
print(f"Reward: {reward:.4f}")
print(f"Score breakdown: {info['reward_breakdown']}")
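To run a full episode rather than a single step, the step call can be wrapped in a loop that stops when the environment reports done. The sketch below assumes the four-value step() return shown above and an object exposing reset(task_id=...) and step(action). The run_episode helper, the policy callable, and the stub environment used to exercise the loop are hypothetical.

```python
from types import SimpleNamespace


def run_episode(env, policy, task_id: str, max_iterations: int = 20) -> float:
    """Step until done; return the last reward reported by the environment."""
    obs = env.reset(task_id=task_id)
    reward = 0.0
    for _ in range(max_iterations):  # safety cap in case done is never set
        obs, reward, done, _info = env.step(policy(obs))
        if done:
            break
    return reward


class StubEnv:
    """Tiny stand-in so the loop can run without the real environment."""

    def reset(self, task_id: str):
        self.steps = 0
        return SimpleNamespace(current_message="Where is my order?")

    def step(self, action):
        self.steps += 1
        done = self.steps >= 3  # the stub ends every episode after three steps
        obs = SimpleNamespace(current_message="Thanks!")
        return obs, 0.25 * self.steps, done, {}
```

With the real environment, env would be a CustomerSupportEnvironment instance and policy would map an observation to a SupportAction.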
9. Baseline Results
Running the baseline inference with gpt-3.5-turbo:
| Task | Difficulty | Avg Reward | Steps |
|---|---|---|---|
| easy_faq | Easy | ~0.65 | 1-2 |
| medium_refund | Medium | ~0.55 | 3-4 |
| hard_escalation | Hard | ~0.45 | 4-6 |
| Overall | - | ~0.55 | - |
Scores vary based on model quality. Better models achieve higher scores by producing more empathetic, accurate, and complete responses.
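The Overall figure is consistent with an unweighted mean of the three per-task averages, as this quick check suggests. The values are the approximate figures from the table, and how the baseline script actually aggregates them is not stated here.

```python
from statistics import mean

# Approximate per-task averages from the baseline table.
avg_rewards = {"easy_faq": 0.65, "medium_refund": 0.55, "hard_escalation": 0.45}

overall = mean(list(avg_rewards.values()))
print(f"Overall: {overall:.2f}")
```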
Project Structure
openenv/
├── openenv.yaml        # OpenEnv manifest (metadata, tasks, config)
├── models.py           # Pydantic models (Action, Observation, State, Reward)
├── tasks.py            # Task definitions (3 tasks, rubrics, policies)
├── grader.py           # Deterministic grading engine
├── inference.py        # Baseline LLM inference script
├── validate.py         # Environment validation script
├── requirements.txt    # Python dependencies
├── pyproject.toml      # Project configuration
├── Dockerfile          # Docker container definition
├── .dockerignore       # Docker build exclusions
├── .env.example        # Environment variable template
├── .gitignore          # Git ignore rules
├── README.md           # This file
└── server/
    ├── __init__.py
    ├── environment.py  # Core environment (reset/step/state)
    └── app.py          # FastAPI HTTP server
HuggingFace Spaces Deployment
This environment is designed for deployment as a Docker-based HuggingFace Space:
- Create a new Space with Docker SDK
- Push the code to the Space repository
- The Space will auto-build and serve the API from the container port set by app_port in the Space metadata (7860 in this README)
- Tag the Space with openenv
# Using openenv CLI
openenv push --repo-id mathi3046/customer-support-env
The API endpoint POST /reset will respond with HTTP 200, confirming the Space is operational.
License
MIT License. See LICENSE for details.