---
title: SupportEnv
emoji: 🎫
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
tags:
  - openenv
  - customer-support
  - nlp
  - ticket-triage
  - agent-evaluation
pinned: false
---

SupportEnv

SupportEnv is an OpenEnv-compliant environment for evaluating LLM agents on customer support ticket triage. Each episode presents a realistic support ticket and asks the agent to classify it, extract information from it, or resolve it. Results are scored deterministically against ground-truth labels.

Tasks

| Task | Difficulty | Action | Max Steps |
|---|---|---|---|
| Task 1: Ticket Classification | Easy | classify | 3 |
| Task 2: Information Extraction | Medium | extract | 5 |
| Task 3: Resolution Generation | Hard | respond | 8 |

Task 1: Ticket Classification (Easy)
Assign a category (billing / technical / account / feature_request / complaint / general) and priority (low / medium / high / critical) to each ticket.

Task 2: Information Extraction (Medium)
Extract structured entities (IDs, names, amounts, dates) and identify the list of required resolution actions.

Task 3: Resolution Generation (Hard)
Write a professional customer-facing response and an ordered list of internal resolution steps. Graded on keyword coverage, step completeness, tone adherence, and minimum length.

Observation Space

Each observation includes:

  • task_id, task_description, episode_id
  • ticket object with ticket_id, subject, body, customer_tier, account_age_days, previous_tickets, attachments
  • thread_history as ordered action summaries
  • available_actions for the current task state
  • step_number, max_steps
  • hint (optional guidance)
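For client code, the observation fields listed above can be mirrored in a pair of dataclasses. This is a minimal sketch: the field names come from this README, but the Python types (for example, whether previous_tickets is a count or a list) and the flat JSON layout are assumptions, and `Ticket`, `Observation`, and `from_json` are hypothetical names, not part of SupportEnv.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class Ticket:
    # Field names follow the README; the types are assumptions.
    ticket_id: str
    subject: str
    body: str
    customer_tier: str
    account_age_days: int
    previous_tickets: Any  # count or list; not specified in this README
    attachments: list[Any] = field(default_factory=list)


@dataclass
class Observation:
    task_id: str
    task_description: str
    episode_id: str
    ticket: Ticket
    thread_history: list[Any]  # ordered summaries of earlier actions
    available_actions: list[str]
    step_number: int
    max_steps: int
    hint: Optional[str] = None  # optional guidance, may be absent

    @classmethod
    def from_json(cls, data: dict[str, Any]) -> "Observation":
        """Build an Observation from decoded JSON, ignoring undocumented keys."""
        ticket_keys = {f.name for f in fields(Ticket)}
        ticket = Ticket(**{k: v for k, v in data["ticket"].items() if k in ticket_keys})
        obs_keys = {f.name for f in fields(cls)} - {"ticket"}
        return cls(ticket=ticket, **{k: v for k, v in data.items() if k in obs_keys})
```

Dropping unknown keys keeps the client tolerant of extra fields the server may add; missing required keys still raise a `TypeError`.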

Action Space

Supported action.action_type values:

  • classify: requires category and priority
  • extract: requires extracted_entities and required_actions
  • respond: requires response_text and resolution_steps
  • submit: closes the episode and triggers terminal grading
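Before sending an action, a client can check it against the required fields above and the Task 1 label sets. The sketch below is a hypothetical client-side pre-check (`validate_action`, `REQUIRED_FIELDS`, `CATEGORIES`, and `PRIORITIES` are illustrative names); the server performs its own validation, which may be stricter.

```python
from typing import Any

# Label sets from the Task 1 description.
CATEGORIES = {"billing", "technical", "account", "feature_request", "complaint", "general"}
PRIORITIES = {"low", "medium", "high", "critical"}

# Fields each action_type requires, per the Action Space list.
REQUIRED_FIELDS = {
    "classify": ("category", "priority"),
    "extract": ("extracted_entities", "required_actions"),
    "respond": ("response_text", "resolution_steps"),
    "submit": (),
}


def validate_action(action: dict[str, Any]) -> list[str]:
    """Return a list of problems with `action`; an empty list means it looks valid."""
    action_type = action.get("action_type")
    if action_type not in REQUIRED_FIELDS:
        return [f"unknown action_type: {action_type!r}"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS[action_type]
                if name not in action]
    if action_type == "classify":
        # Only check label values that are actually present.
        if "category" in action and action["category"] not in CATEGORIES:
            problems.append(f"invalid category: {action['category']!r}")
        if "priority" in action and action["priority"] not in PRIORITIES:
            problems.append(f"invalid priority: {action['priority']!r}")
    return problems
```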

API

| Method | Endpoint | Description |
|---|---|---|
| POST | /reset | Start a new episode |
| POST | /step | Submit an action |
| GET | /state | Get current episode state |
| POST | /grader | Grade a finished episode |
| GET | /tasks | List all tasks |
| GET | /health | Liveness check |
| GET | /docs | OpenAPI docs |

Reset

POST /reset
{"task_id": "task1", "ticket_index": 0}

Step: Task 1 (classify)

POST /step
{
  "episode_id": "<id>",
  "action": {"action_type": "classify", "category": "billing", "priority": "high"}
}

Step: Task 2 (extract)

POST /step
{
  "episode_id": "<id>",
  "action": {
    "action_type": "extract",
    "extracted_entities": {"customer_name": "Alice", "invoice_number": "INV-001"},
    "required_actions": ["issue_refund", "send_corrected_invoice"]
  }
}

Step: Task 3 (respond)

POST /step
{
  "episode_id": "<id>",
  "action": {
    "action_type": "respond",
    "response_text": "Dear customer, we sincerely apologize...",
    "resolution_steps": ["verify_account", "issue_refund", "send_confirmation"]
  }
}

Submit

POST /step
{"episode_id": "<id>", "action": {"action_type": "submit"}}
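The reset, step, and submit sequence above can be scripted end to end. This is a minimal sketch using only the Python standard library; the `post` transport is injectable so the flow can be exercised without a running server. Two assumptions are flagged in the code: that `/reset` returns the `episode_id` either at the top level or inside an `observation` object, and that response bodies are JSON. `http_transport` and `run_classification_episode` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any, Callable

# A transport takes (path, payload) and returns the decoded JSON response.
Transport = Callable[[str, dict[str, Any]], dict[str, Any]]


def http_transport(base_url: str = "http://localhost:7860") -> Transport:
    """Return a function that POSTs JSON to the SupportEnv server."""
    def post(path: str, payload: dict[str, Any]) -> dict[str, Any]:
        req = urllib.request.Request(
            base_url.rstrip("/") + path,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.loads(resp.read().decode("utf-8"))  # assumption: JSON body
    return post


def run_classification_episode(post: Transport, category: str, priority: str,
                               ticket_index: int = 0) -> list[dict[str, Any]]:
    """Reset task1, send one classify action, then submit; return the step responses."""
    reset = post("/reset", {"task_id": "task1", "ticket_index": ticket_index})
    # Assumption: episode_id is at the top level or inside an "observation" object.
    episode_id = reset.get("episode_id") or reset["observation"]["episode_id"]
    actions = [
        {"action_type": "classify", "category": category, "priority": priority},
        {"action_type": "submit"},
    ]
    return [post("/step", {"episode_id": episode_id, "action": a}) for a in actions]
```

With the server from Running Locally up, `run_classification_episode(http_transport(), "billing", "high")` would play one Task 1 episode; in tests, any function with the same `(path, payload)` signature can stand in for the network.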

Scoring

Task 1: category match (0.50) + priority match (0.40) + efficiency (0.10)

Task 2: entity coverage (0.60) + action coverage (0.30) + no hallucination (0.10)

Task 3: keyword coverage (0.30) + step coverage (0.30) + tone compliance (0.25) + adequate length (0.10) + non-empty steps (0.05)
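The weights in each formula sum to 1.0, so a perfect episode scores 1.0. The sketch below combines per-component scores (each assumed to lie in [0, 1]) using those weights, plus a set-overlap `coverage` helper of the kind that entity and action coverage suggest. The task IDs `task2` and `task3` are extrapolated from `task1`, the component names are illustrative, and how the environment actually computes each component (exact or normalized matching, what counts as efficient) is not specified here, so treat this as an illustration rather than the grader.

```python
# Component weights copied from the Scoring section; each task sums to 1.0.
WEIGHTS = {
    "task1": {"category_match": 0.50, "priority_match": 0.40, "efficiency": 0.10},
    "task2": {"entity_coverage": 0.60, "action_coverage": 0.30, "no_hallucination": 0.10},
    "task3": {"keyword_coverage": 0.30, "step_coverage": 0.30, "tone_compliance": 0.25,
              "adequate_length": 0.10, "non_empty_steps": 0.05},
}


def coverage(expected: set[str], predicted: set[str]) -> float:
    """Fraction of expected items found in predicted (1.0 when nothing is expected)."""
    if not expected:
        return 1.0
    return len(expected & predicted) / len(expected)


def total_score(task_id: str, components: dict[str, float]) -> float:
    """Weighted sum of component scores, each clamped to [0, 1]; missing ones count as 0."""
    weights = WEIGHTS[task_id]
    return sum(w * min(max(components.get(name, 0.0), 0.0), 1.0)
               for name, w in weights.items())
```

For example, a Task 1 answer with the right category, the wrong priority, and full efficiency scores 0.50 + 0.00 + 0.10 = 0.60 under these weights.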

Running Locally

pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7860

Running the Baseline Agent

export API_BASE_URL=https://router.huggingface.co/v1
export HF_TOKEN=your_token_here
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
python inference.py

Environment variables used by the baseline for LLM calls:

  • API_BASE_URL (optional; a default is provided in code)
  • MODEL_NAME (optional; a default is provided in code)
  • HF_TOKEN (required; no default)

Environment variables that point the baseline at the environment server:

  • OPENENV_BASE_URL (preferred; defaults to http://localhost:7860)
  • API_BASE_URL_ENV (older name, accepted for backward compatibility)
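The variable precedence above can be expressed as a small lookup. This is a sketch under assumptions: `resolve_env_url` and `require_hf_token` are hypothetical helpers, and treating an empty string as unset is a choice made here, not necessarily what inference.py does.

```python
import os
from collections.abc import Mapping

DEFAULT_ENV_URL = "http://localhost:7860"


def resolve_env_url(environ: Mapping[str, str] = os.environ) -> str:
    """Pick the environment server URL: OPENENV_BASE_URL, then API_BASE_URL_ENV, then the default."""
    for name in ("OPENENV_BASE_URL", "API_BASE_URL_ENV"):
        value = environ.get(name)
        if value:  # empty strings are treated as unset (assumption)
            return value.rstrip("/")
    return DEFAULT_ENV_URL


def require_hf_token(environ: Mapping[str, str] = os.environ) -> str:
    """HF_TOKEN has no default, so fail fast when it is missing."""
    token = environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN must be set to run the baseline")
    return token
```

Passing the environment as a parameter keeps both helpers testable with a plain dict.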

The baseline writes only structured lines to stdout, in exactly these formats:

  • [START] task=<...> env=<...> model=<...>
  • [STEP] step=<...> action=<...> reward=<...> done=<...> error=<...>
  • [END] success=<...> steps=<...> rewards=<...>
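Because the output format is fixed, downstream tooling can parse it line by line. This sketch assumes that the `action`, `error`, and `rewards` values may contain spaces while the other values do not, and it returns every value as a string because the exact encodings (for example `true` vs `True`) are not documented here. `parse_line` and `PATTERNS` are hypothetical names.

```python
import re
from typing import Optional

# One pattern per line type; action, error, and rewards may contain spaces (assumption).
PATTERNS = {
    "START": re.compile(r"^\[START\] task=(?P<task>\S+) env=(?P<env>\S+) model=(?P<model>\S+)$"),
    "STEP": re.compile(r"^\[STEP\] step=(?P<step>\S+) action=(?P<action>.*) "
                       r"reward=(?P<reward>\S+) done=(?P<done>\S+) error=(?P<error>.*)$"),
    "END": re.compile(r"^\[END\] success=(?P<success>\S+) steps=(?P<steps>\S+) rewards=(?P<rewards>.*)$"),
}


def parse_line(line: str) -> Optional[tuple[str, dict[str, str]]]:
    """Return (tag, fields) for a baseline log line, or None if it does not match."""
    line = line.rstrip("\n")
    for tag, pattern in PATTERNS.items():
        match = pattern.match(line)
        if match:
            return tag, match.groupdict()
    return None
```

Returning `None` for unrecognized lines lets a caller flag any stray output, which the strict format is meant to rule out.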

Docker

docker build -t supportenv .
docker run -p 7860:7860 supportenv