---
title: Support Ticket Env
emoji: 🎫
colorFrom: blue
colorTo: green
sdk: docker
tags:
- openenv
pinned: false
---
# Customer Support Ticket Resolution Environment
> **OpenEnv x Scaler Hackathon**: Theme **#3.1 Professional Tasks** | Sub-theme: **Scaler AI Labs: Multi-App RL Environment for Enterprise Workflows**
A real-world [OpenEnv](https://github.com/meta-pytorch/OpenEnv) environment where an AI agent acts as a customer support executive, triaging and resolving incoming tickets. It simulates complex enterprise workflows, business-rule nuances, and multi-step decision making under partial observability.
## Overview
Customer support triage is one of the most common real-world tasks for AI agents in enterprise settings, and large companies handle thousands of tickets every day. Misclassifying a ticket routes it to the wrong team, and choosing the wrong action has a direct business impact. This environment trains agents to handle exactly this challenge, with real tool interaction, dynamic state, and a multi-step reward structure that resists shortcuts.
## Quick Start
```python
from support_ticket_env import SupportAction, SupportTicketEnv

with SupportTicketEnv(base_url="https://algocore-support-ticket-env.hf.space").sync() as env:
    # Task 1 - Classify a ticket
    result = env.reset(task_id=1, seed=42)
    print(result.observation.ticket_text)

    result = env.step(SupportAction(action_type="classify", category="billing"))
    print(result.reward)  # 1.0 if correct
```
## Tasks
| Task | Difficulty | Description | Score Range |
|------|-----------|-------------|-------------|
| Task 1 | Easy | Classify ticket into correct category | 0.0 - 1.0 |
| Task 2 | Medium | Classify then choose correct action | 0.0 - 1.0 |
| Task 3 | Hard | Resolve a full queue of 3 tickets | 0.0 - 1.0 |
## Action Space
Actions are `SupportAction` Pydantic objects:
| Field | Type | Required | Values |
|-------|------|----------|--------|
| `action_type` | str | always | `classify` / `reply` / `escalate` / `close` |
| `category` | str | for classify | `billing` / `technical` / `account` / `general` / `refund` |
| `reply_text` | str | for reply | free text |
| `reason` | str | optional | free text |
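The Required column implies a few per-action constraints. Below is a minimal client-side check over plain dicts that an agent wrapper could run before calling `env.step`; `validate_action` is a hypothetical helper, not part of `support_ticket_env`, and whether the real `SupportAction` model rejects these inputs itself is not stated here.

```python
# Hypothetical pre-flight check mirroring the Action Space table (not part of the package).
ACTION_TYPES = {"classify", "reply", "escalate", "close"}
CATEGORIES = {"billing", "technical", "account", "general", "refund"}


def validate_action(action: dict) -> list[str]:
    """Return a list of problems with an action dict; an empty list means it looks valid."""
    problems: list[str] = []
    action_type = action.get("action_type")
    if action_type not in ACTION_TYPES:
        problems.append(f"unknown action_type: {action_type!r}")
    elif action_type == "classify" and action.get("category") not in CATEGORIES:
        # `category` is required for classify and must be one of the five labels
        problems.append(f"classify needs a category in {sorted(CATEGORIES)}")
    elif action_type == "reply" and not (action.get("reply_text") or "").strip():
        # `reply_text` is required (and should be non-blank) for reply
        problems.append("reply needs non-empty reply_text")
    return problems


print(validate_action({"action_type": "classify", "category": "shipping"}))
```

`reason` is optional for every action type, so the sketch never inspects it.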
## Observation Space
| Field | Type | Description |
|-------|------|-------------|
| `ticket_id` | str | Unique ticket ID |
| `ticket_text` | str | Customer message |
| `task_id` | int | 1, 2, or 3 |
| `current_category` | str | Category assigned so far |
| `resolved` | bool | Whether ticket is resolved |
| `step_count` | int | Steps taken this episode |
| `feedback` | str | Human-readable feedback |
| `reward` | float | Reward signal |
| `done` | bool | Episode finished |
## Reward Function
Rewards provide partial progress signals throughout the trajectory:
- **Task 1:** 1.0 for correct category, 0.0 for wrong
- **Task 2:** 1.0 for the correct action, 0.5 for a defensible alternative action, 0.3 for a correct classification alone
- **Task 3:** 0.20 classification + 0.40 action + 0.25 reply quality + 0.15 efficiency bonus
- **Penalty:** -0.05 per step over 10 (loop deterrent)
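The weights above can be combined as in the sketch below. It is a hypothetical re-implementation for illustration, not the code in `graders.py`: treating each Task 3 component as a fraction in [0, 1], applying the loop penalty to the episode score, clamping the result to the documented 0.0 - 1.0 range, and scoring a wrong classification plus wrong action as 0.0 in Task 2 are all assumptions.

```python
# Hypothetical grader sketch: weights and penalty come from the README;
# the component scale, tier logic details, and clamping are assumptions.
TASK3_WEIGHTS = {
    "classification": 0.20,
    "action": 0.40,
    "reply_quality": 0.25,
    "efficiency": 0.15,
}
STEP_LIMIT = 10          # steps allowed before the loop penalty applies
PENALTY_PER_STEP = 0.05  # deducted for every step beyond STEP_LIMIT


def loop_penalty(steps: int) -> float:
    """-0.05 for each step over the limit, 0.0 otherwise."""
    return -PENALTY_PER_STEP * max(0, steps - STEP_LIMIT)


def task2_reward(category_ok: bool, action: str, correct_action: str,
                 alternatives: frozenset[str] = frozenset()) -> float:
    """Tiered Task 2 score: correct action > defensible alternative > classification only."""
    if action == correct_action:
        return 1.0
    if action in alternatives:
        return 0.5
    return 0.3 if category_ok else 0.0  # 0.0 for a wrong category is an assumption


def task3_reward(components: dict[str, float], steps: int) -> float:
    """Weighted sum of component scores in [0, 1] plus the loop penalty, clamped to [0, 1]."""
    score = sum(weight * components.get(name, 0.0) for name, weight in TASK3_WEIGHTS.items())
    score += loop_penalty(steps)
    return min(1.0, max(0.0, score))


print(task3_reward({"classification": 1.0, "action": 1.0, "reply_quality": 0.5}, steps=12))
```

Under these assumptions, a correct classification and action with a half-credit reply, no efficiency bonus, and 12 steps scores 0.20 + 0.40 + 0.125 - 0.10 = 0.625.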
## Project Structure
```
support_ticket_env/
├── __init__.py             # Package exports
├── models.py               # SupportAction, SupportObservation, SupportState
├── tickets.py              # Ticket dataset with ground-truth labels
├── graders.py              # Reward/grader functions for all 3 tasks
├── client.py               # EnvClient subclass
├── baseline.py             # Baseline inference script
├── get_baseline.py         # Fetch & save baseline results
├── gradio_ui.py            # Interactive Gradio playground UI
├── make_chart.py           # Plot training reward curves
├── plot_results.py         # Visualise evaluation results
├── grpo_results.png        # GRPO training results chart
├── reward_chart.png        # Reward curve chart
├── openenv.yaml            # Environment metadata
├── Dockerfile              # Container definition
├── train_sft.ipynb         # Step 1: SFT pre-training notebook
├── train_grpo.ipynb        # Step 2: GRPO fine-tuning notebook
└── server/
    ├── app.py                  # FastAPI entry point (+ Gradio UI mounted at /playground)
    ├── support_environment.py  # Environment logic
    └── requirements.txt        # Server dependencies
```
## Setup
```bash
# Install dependencies
pip install openenv-core fastapi uvicorn pydantic gradio openai pyyaml
# Run locally
uvicorn support_ticket_env.server.app:app --host 0.0.0.0 --port 7860
# Or via Docker
docker build -t support-ticket-env .
docker run -p 7860:7860 support-ticket-env
# Run tests
python run_tests.py
```
> 🎮 **Playground UI** is available at `http://localhost:7860/playground` once the server is running.
## Training Results (GRPO): Evidence of Improvement
`Qwen2.5-0.5B-Instruct` was fine-tuned with **2-stage training** (SFT pre-training → GRPO) using Hugging Face TRL over **700+ steps** against the live environment API:

| Task | Before GRPO | After GRPO | Improvement |
|------|-------------|------------|-------------|
| Task 1 - Classification | 0.67 | **1.00** | +49% |
| Task 2 - Action Selection | 0.12 | **0.48** | +300% |
| Task 3 - Full Resolution | 0.08 | **0.23** | +187% |
| **Overall** | **0.29** | **0.57** | **+96%** |
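The Improvement column is consistent with relative change, `(after - before) / before`, truncated to a whole percent, and the Overall row with the unweighted mean of the three tasks. That reading is inferred from the numbers rather than stated in the source; the sketch below reproduces the table under that assumption, using `Decimal` so exact ratios such as 0.36 / 0.12 = 3 are not perturbed by binary floating-point rounding before truncation.

```python
from decimal import Decimal

# (before, after) scores copied from the GRPO table
RESULTS = {
    "Task 1": (Decimal("0.67"), Decimal("1.00")),
    "Task 2": (Decimal("0.12"), Decimal("0.48")),
    "Task 3": (Decimal("0.08"), Decimal("0.23")),
}


def relative_gain_pct(before: Decimal, after: Decimal) -> int:
    """Relative improvement in percent, truncated toward zero (assumed rounding rule)."""
    return int((after - before) / before * 100)


for task, (before, after) in RESULTS.items():
    print(f"{task}: +{relative_gain_pct(before, after)}%")

# Overall row assumed to be the unweighted mean of the three tasks
before_avg = sum(b for b, _ in RESULTS.values()) / len(RESULTS)
after_avg = sum(a for _, a in RESULTS.values()) / len(RESULTS)
print(f"Overall: {before_avg} -> {after_avg}, +{relative_gain_pct(before_avg, after_avg)}%")
```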
## Baseline Scores
Measured with `gpt-4o-mini`, seeds `[42, 7, 123]`:
| Task | Avg Score |
|------|-----------|
| Task 1 - Classification | 0.87 |
| Task 2 - Action Selection | 0.71 |
| Task 3 - Full Resolution | 0.58 |
| **Overall** | **0.72** |
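A multi-seed baseline reduces to averaging episode scores per task and then across tasks; the Overall row matches an unweighted mean, (0.87 + 0.71 + 0.58) / 3 = 0.72. The sketch below is a hypothetical harness, not `baseline.py`. It relies only on the `reset(task_id=..., seed=...)` and `step(...)` calls and the `reward`, `done`, and `observation` fields used elsewhere in this README, and it takes the reward on the final step as the episode score, which is an assumption about how the baseline scores episodes. `policy` is any callable mapping an observation to an action (for example, a wrapper around an LLM call that returns a `SupportAction`); `max_steps=20` is an arbitrary safety cap.

```python
from statistics import mean

SEEDS = [42, 7, 123]  # seeds listed in the README


def run_episode(env, policy, task_id: int, seed: int, max_steps: int = 20) -> float:
    """Play one episode; the last step's reward is treated as the score (assumption)."""
    result = env.reset(task_id=task_id, seed=seed)
    for _ in range(max_steps):
        if result.done:
            break
        result = env.step(policy(result.observation))
    return result.reward or 0.0  # reward may be None before any step


def evaluate(env, policy, tasks=(1, 2, 3), seeds=SEEDS) -> dict:
    """Average score per task over seeds, plus an unweighted overall mean."""
    scores = {t: mean(run_episode(env, policy, t, s) for s in seeds) for t in tasks}
    scores["overall"] = mean(scores[t] for t in tasks)
    return scores
```

With the real client, `env` would be the object yielded by `SupportTicketEnv(...).sync()` as in Quick Start.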
## 🎯 Why This Fits Theme 3.1: Professional Tasks
> *"Real interaction with tools, APIs, or dynamic systems where the model does real hard work instead of exploiting shortcuts"*
- ✅ **Live FastAPI environment**: the agent interacts with a real, stateful HTTP API
- ✅ **Shortcut resistance**: the reward function penalises loops (-0.05 per step over 10) and scores classification, action, and reply separately, so no single shortcut earns full credit
- ✅ **Persistent world state**: the ticket queue, classification state, and resolution state are tracked across steps
- ✅ **Multi-step causal reasoning**: classify → choose action → craft reply → resolve, with each step depending on the one before it
- ✅ **Enterprise workflow complexity**: billing, technical, account, general, and refund categories with real business rules
- ✅ **Scaler AI Labs sub-theme**: demonstrates complex enterprise workflows and business-rule nuances in an RL environment
## Links
- **HuggingFace Space:** https://huggingface.co/spaces/AlgoCore/support-ticket-env
- **GitHub:** https://github.com/TryingHardToBeDeveloper/support-ticket-env
- **OpenEnv Docs:** https://meta-pytorch.org/OpenEnv/
## License
MIT