---
title: SupportEnv
emoji: 🎫
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
tags:
- openenv
- customer-support
- nlp
- ticket-triage
- agent-evaluation
pinned: false
---
# SupportEnv
SupportEnv is an OpenEnv-compliant environment for evaluating LLM agents on customer support ticket triage. Each episode presents a realistic support ticket and asks the agent to classify it, extract information from it, or resolve it. Every action is scored deterministically against ground-truth labels.
## Tasks
| Task | Difficulty | Action | Max Steps |
|------|-----------|--------|-----------|
| Task 1: Ticket Classification | Easy | `classify` | 3 |
| Task 2: Information Extraction | Medium | `extract` | 5 |
| Task 3: Resolution Generation | Hard | `respond` | 8 |
**Task 1: Ticket Classification (Easy)**
Assign a `category` (billing / technical / account / feature_request / complaint / general) and `priority` (low / medium / high / critical) to each ticket.
**Task 2: Information Extraction (Medium)**
Extract structured entities (IDs, names, amounts, dates) and identify the list of required resolution actions.
**Task 3: Resolution Generation (Hard)**
Write a professional customer-facing response and an ordered list of internal resolution steps. Graded on keyword coverage, step completeness, tone adherence, and minimum length.
## Observation Space
Each observation includes:
- `task_id`, `task_description`, `episode_id`
- `ticket` object with `ticket_id`, `subject`, `body`, `customer_tier`, `account_age_days`, `previous_tickets`, `attachments`
- `thread_history` as ordered action summaries
- `available_actions` for the current task state
- `step_number`, `max_steps`
- `hint` (optional guidance)
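
A client can sanity-check an observation before acting on it. The sketch below uses only the field names listed above. The helper name `missing_observation_fields` is hypothetical, and treating `hint` as the only optional field is an assumption, not something the environment is known to guarantee.

```python
# Field names are taken from the Observation Space list in this README.
# Assumption: every field except `hint` is always present.
REQUIRED_OBSERVATION_KEYS = (
    "task_id", "task_description", "episode_id", "ticket",
    "thread_history", "available_actions", "step_number", "max_steps",
)
REQUIRED_TICKET_KEYS = (
    "ticket_id", "subject", "body", "customer_tier",
    "account_age_days", "previous_tickets", "attachments",
)


def missing_observation_fields(observation: dict) -> list:
    """Return the documented keys that are absent from an observation dict."""
    missing = [k for k in REQUIRED_OBSERVATION_KEYS if k not in observation]
    ticket = observation.get("ticket")
    if isinstance(ticket, dict):
        # Report nested ticket fields with a dotted prefix.
        missing += [f"ticket.{k}" for k in REQUIRED_TICKET_KEYS if k not in ticket]
    return missing
```

An empty list means every documented field is present; the helper does not check value types, since this README does not specify them.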
## Action Space
Supported `action.action_type` values:
- `classify`: requires `category` and `priority`
- `extract`: requires `extracted_entities` and `required_actions`
- `respond`: requires `response_text` and `resolution_steps`
- `submit`: closes the episode and triggers terminal grading
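
The required fields per action type can be checked client-side before calling `/step`. This is a minimal sketch based on the list above; `validate_action` is a hypothetical helper, and the server may apply stricter validation (for example, on allowed `category` values) that is not reproduced here.

```python
# Required fields per action type, as listed in the Action Space section.
REQUIRED_FIELDS = {
    "classify": ("category", "priority"),
    "extract": ("extracted_entities", "required_actions"),
    "respond": ("response_text", "resolution_steps"),
    "submit": (),
}


def validate_action(action: dict) -> list:
    """Return a list of problems; an empty list means the action looks well-formed."""
    action_type = action.get("action_type")
    if action_type not in REQUIRED_FIELDS:
        return [f"unknown action_type: {action_type!r}"]
    return [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS[action_type]
        if name not in action
    ]
```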
## API
| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/reset` | Start a new episode |
| `POST` | `/step` | Submit an action |
| `GET` | `/state` | Get current episode state |
| `POST` | `/grader` | Grade a finished episode |
| `GET` | `/tasks` | List all tasks |
| `GET` | `/health` | Liveness check |
| `GET` | `/docs` | OpenAPI docs |
### Reset
```json
POST /reset
{"task_id": "task1", "ticket_index": 0}
```
### Step: Task 1 (classify)
```json
POST /step
{
  "episode_id": "<id>",
  "action": {"action_type": "classify", "category": "billing", "priority": "high"}
}
```
### Step: Task 2 (extract)
```json
POST /step
{
  "episode_id": "<id>",
  "action": {
    "action_type": "extract",
    "extracted_entities": {"customer_name": "Alice", "invoice_number": "INV-001"},
    "required_actions": ["issue_refund", "send_corrected_invoice"]
  }
}
```
### Step: Task 3 (respond)
```json
POST /step
{
  "episode_id": "<id>",
  "action": {
    "action_type": "respond",
    "response_text": "Dear customer, we sincerely apologize...",
    "resolution_steps": ["verify_account", "issue_refund", "send_confirmation"]
  }
}
```
### Submit
```json
POST /step
{"episode_id": "<id>", "action": {"action_type": "submit"}}
```
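
The requests above can be issued from Python with the standard library alone. The sketch below is one possible client, not the project's own. The helper names are hypothetical, and the base URL assumes the server is running locally on port 7860. Response shapes are not documented in this README, so the code returns the parsed JSON without interpreting it.

```python
import json
import urllib.request

# Assumption: the environment is served locally, as in "Running Locally".
BASE_URL = "http://localhost:7860"


def build_request(path, payload, base_url=BASE_URL):
    """Build a JSON POST request for an endpoint such as /reset or /step."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload, base_url=BASE_URL):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(build_request(path, payload, base_url)) as resp:
        return json.load(resp)


def step_payload(episode_id, action):
    """Wrap an action in the body format that /step expects."""
    return {"episode_id": episode_id, "action": action}
```

With a server running, `post_json("/reset", {"task_id": "task1", "ticket_index": 0})` starts an episode. Inspect the returned JSON to find the `episode_id` before building step payloads.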
## Scoring
- **Task 1:** category match (0.50) + priority match (0.40) + efficiency (0.10)
- **Task 2:** entity coverage (0.60) + action coverage (0.30) + no hallucination (0.10)
- **Task 3:** keyword coverage (0.30) + step coverage (0.30) + tone compliance (0.25) + adequate length (0.10) + non-empty steps (0.05)
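
The weights above can be read as a weighted sum. The sketch below assumes each component is a value in [0, 1] and that the final score is the sum of weight times component. The component key names and the `weighted_score` helper are hypothetical, and the actual grader may compute individual components differently.

```python
# Weights copied from the Scoring section; the dictionary keys are
# illustrative names, not identifiers confirmed by the grader.
WEIGHTS = {
    "task1": {"category_match": 0.50, "priority_match": 0.40, "efficiency": 0.10},
    "task2": {"entity_coverage": 0.60, "action_coverage": 0.30, "no_hallucination": 0.10},
    "task3": {
        "keyword_coverage": 0.30,
        "step_coverage": 0.30,
        "tone_compliance": 0.25,
        "length_adequate": 0.10,
        "non_empty_steps": 0.05,
    },
}


def weighted_score(task_id, components):
    """Combine component scores (each clamped to [0, 1]) into a total in [0, 1]."""
    total = 0.0
    for name, weight in WEIGHTS[task_id].items():
        value = min(max(float(components.get(name, 0.0)), 0.0), 1.0)
        total += weight * value
    return round(total, 4)
```

For example, a Task 1 episode with the correct category, the wrong priority, and full efficiency would score 0.50 + 0.00 + 0.10 = 0.60 under these assumptions.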
## Running Locally
```bash
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7860
```
## Running the Baseline Agent
```bash
export API_BASE_URL=https://router.huggingface.co/v1
export HF_TOKEN=your_token_here
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
python inference.py
```
Environment variables for baseline LLM calls:
- `API_BASE_URL` (optional; default provided in code)
- `MODEL_NAME` (optional; default provided in code)
- `HF_TOKEN` (required)
Environment endpoint variables for the baseline:
- `OPENENV_BASE_URL` (preferred, default `http://localhost:7860`)
- `API_BASE_URL_ENV` (backward-compatible alias)
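
The precedence between the two endpoint variables can be sketched as follows. This is an illustration of the behavior described above, not the code in `inference.py`; the function name `resolve_env_base_url` is hypothetical.

```python
import os

DEFAULT_ENV_URL = "http://localhost:7860"


def resolve_env_base_url(environ=None):
    """Prefer OPENENV_BASE_URL, then the legacy API_BASE_URL_ENV, then the default."""
    env = os.environ if environ is None else environ
    # Empty strings are treated as unset (an assumption of this sketch).
    return env.get("OPENENV_BASE_URL") or env.get("API_BASE_URL_ENV") or DEFAULT_ENV_URL
```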
The baseline writes only the following structured lines to stdout:
- `[START] task=<...> env=<...> model=<...>`
- `[STEP] step=<...> action=<...> reward=<...> done=<...> error=<...>`
- `[END] success=<...> steps=<...> rewards=<...>`
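
Because the output is line-structured, it can be parsed with a small regular expression. The sketch below is a hypothetical consumer of those lines. It assumes fields are separated by whitespace and that no value contains a further whitespace-delimited `key=` sequence. All values are returned as strings.

```python
import re

# Matches the leading tag and captures the remainder of the line.
LINE_RE = re.compile(r"^\[(START|STEP|END)\]\s+(.*)$")
# Captures key=value pairs; a value runs until the next " key=" or end of line.
FIELD_RE = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_line(line):
    """Return (tag, fields) for a structured line, or None for anything else."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    tag, rest = match.groups()
    return tag, dict(FIELD_RE.findall(rest))
```

Calling `parse_line` on a `[STEP]` line yields the tag `"STEP"` and a dictionary keyed by `step`, `action`, `reward`, `done`, and `error`.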
## Docker
```bash
docker build -t supportenv .
docker run -p 7860:7860 supportenv
```