Support Triage OpenEnv

A real-world OpenEnv environment where an agent performs customer support triage: prioritization, routing, tagging, information gathering, and response drafting.

This project is designed for Round 1-style hackathon evaluation and includes:

  • Full typed OpenEnv models
  • reset() / step() / state() API
  • 3 deterministic graded tasks (easy/medium/hard)
  • Dense reward shaping with partial progress
  • Baseline inference.py using the OpenAI client and the required environment variables
  • Docker + Hugging Face Spaces deployment files

Why This Environment Has Real Utility

Support-operations and trust/safety teams run this workflow every day. This environment evaluates whether an agent can:

  • classify urgency
  • route to the right team
  • attach relevant operational tags
  • ask for required evidence
  • draft safe and useful customer responses
  • close only when resolution criteria are met

Module-Aligned Build Guide (From Your Course)

Module 1: Why OpenEnv?

  • We treat the environment as a service with typed contracts.
  • Core loop follows RL structure: observe -> act -> reward.

Module 2: Using Existing Environments

  • support_triage_env/models.py defines typed Action, Observation, State.
  • support_triage_env/client.py gives a reusable typed client.

Module 3: Deploying Environments

  • server/app.py is the OpenEnv validator-compatible entrypoint (main() + callable script).
  • server/Dockerfile provides reproducible container runtime.
  • openenv.yaml defines deployment metadata.

Module 4: Building Your Own Environment

  • support_triage_env/server/environment.py implements task simulation.
  • support_triage_env/tasks.py defines deterministic fixtures.
  • support_triage_env/graders.py implements 0.0-1.0 grading.

Module 5: Training with OpenEnv + Reward Signals

  • Reward shaping is dense and trajectory-aware.
  • inference.py runs model-based episodes and exports reproducible baseline scores.

Action Space

Action model: SupportTriageAction

set_priority(value)
route_team(value)
add_tag(value)
draft_reply(value)
request_info(value)
close_ticket()
noop()

Valid priorities: low | medium | high | urgent

Valid teams: billing | technical | account | trust_safety | shipping

Observation Space

Observation model: SupportTriageObservation

Key fields:

  • task_id, difficulty, objective
  • title, customer_tier, customer_message
  • current working state: priority, routed_team, tags, draft_reply, info_requested
  • steps_remaining, last_feedback, allowed_actions
  • reward and done (inherited from the base Observation model)
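A representative observation payload at the start of an episode. All values here are illustrative examples, not the environment's actual fixtures:

```python
# Example observation as a plain dict; field names follow the list above,
# field values are made up for illustration.
observation = {
    "task_id": "medium_double_charge",
    "difficulty": "medium",
    "objective": "Triage a duplicate-charge report",
    "title": "Charged twice for premium plan",
    "customer_tier": "premium",
    "customer_message": "I was billed twice this month. Please refund one charge.",
    "priority": None,
    "routed_team": None,
    "tags": [],
    "draft_reply": "",
    "info_requested": False,
    "steps_remaining": 12,
    "last_feedback": "",
    "allowed_actions": ["set_priority", "route_team", "add_tag",
                        "draft_reply", "request_info", "close_ticket", "noop"],
    "reward": 0.0,
    "done": False,
}
```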

State Space

State model: SupportTriageState

Contains episode metadata and full workflow state:

  • episode_id, step_count
  • task_id, difficulty, objective, max_steps
  • priority, routed_team, tags
  • info_requested, closed, close_valid
  • history

Tasks and Graders

Easy: easy_password_reset

  • Scenario: login token failure after password reset
  • Expected routing: account
  • Expected priority: medium
  • Required tags: password-reset, login

Medium: medium_double_charge

  • Scenario: premium customer charged twice
  • Expected routing: billing
  • Expected priority: high
  • Required tags: refund, double-charge, vip
  • Needs additional evidence request

Hard: hard_account_takeover

  • Scenario: possible account takeover + fraud + abusive content
  • Expected routing: trust_safety
  • Expected priority: urgent
  • Required tags: security, account-takeover, fraud, content-abuse
  • Needs security-safe communication and evidence collection

Grading Design

support_triage_env/graders.py computes deterministic component scores:

  • priority correctness
  • routing correctness
  • required tags coverage
  • reply quality (required/forbidden phrase logic)
  • process quality (info request + closure quality + efficiency)

Final score is normalized to [0.0, 1.0].

Reward Function

The environment provides dense rewards at each step:

  • positive reward for correct priority/routing/tagging
  • incremental reward for improving draft response quality
  • positive signal for meaningful information requests when required
  • strong bonus for valid close
  • penalties for invalid actions, repeated loops, no-op behavior, or premature close
  • small per-step cost to discourage inefficient trajectories
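A per-step shaping function consistent with the bullets above might look like this. The constants are illustrative assumptions, not the environment's actual values:

```python
def step_reward(correct_field_set: bool,
                reply_quality_delta: float,
                info_requested_when_needed: bool,
                valid_close: bool,
                invalid_action: bool) -> float:
    """Illustrative dense shaping; all magnitudes are assumptions."""
    reward = -0.01  # small per-step cost to discourage long trajectories
    if correct_field_set:
        reward += 0.2                                  # priority/routing/tag
    reward += 0.3 * max(0.0, reply_quality_delta)      # only credit improvement
    if info_requested_when_needed:
        reward += 0.15                                 # meaningful info request
    if valid_close:
        reward += 1.0                                  # strong bonus
    if invalid_action:
        reward -= 0.25                                 # invalid/premature action
    return round(reward, 4)
```

Clamping the draft-quality delta at zero means a worsened reply earns nothing rather than being rewarded for churn.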

Windows Setup

py -3.11 -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -U pip
pip install -r requirements.txt

Optional: if the openenv command is not on PATH, invoke the executable from your user-level scripts directory, e.g.:

& "$env:APPDATA\Python\Python313\Scripts\openenv.exe" --help

Run Locally

Start API server

python -m uvicorn support_triage_env.server.app:app --host 0.0.0.0 --port 8000 --reload

Validate with OpenEnv tooling

openenv validate --verbose
openenv validate --url http://localhost:8000

Baseline Inference

inference.py is at project root as required.

Set env vars first:

$env:API_BASE_URL = "https://router.huggingface.co/v1"
$env:MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"
$env:HF_TOKEN = "<your_hf_token>"

Run:

python .\inference.py

Output:

  • per-task scores
  • average score
  • baseline_scores.json
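Inside inference.py, the model's text reply has to be mapped onto a typed action before it can be sent to the environment. A minimal parser might look like this; the "verb: value" convention is an assumption for illustration, not necessarily the script's actual format:

```python
KNOWN_VERBS = {"set_priority", "route_team", "add_tag",
               "draft_reply", "request_info", "close_ticket", "noop"}

def parse_model_action(text: str) -> dict:
    """Parse a 'verb: value' line from the model into an action dict.

    Unknown verbs fall back to noop so a malformed completion never
    crashes the episode; it just wastes a (slightly penalized) step.
    """
    verb, _, value = text.strip().partition(":")
    verb = verb.strip().lower()
    if verb not in KNOWN_VERBS:
        return {"kind": "noop", "value": None}
    return {"kind": verb, "value": value.strip() or None}
```

A forgiving parser like this keeps baseline runs reproducible even when the model occasionally emits free-form text instead of a clean action.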

Docker

Build:

docker build -t support-triage-openenv:latest -f server/Dockerfile .

Run:

docker run --rm -p 8000:8000 support-triage-openenv:latest

Deploy to Hugging Face Spaces

openenv push --repo-id <your-username>/support-triage-openenv

Then set in Space settings:

  • API_BASE_URL
  • MODEL_NAME
  • HF_TOKEN

Suggested Baseline Reporting Format

Include in submission:

  • model name
  • per-task score table
  • average score
  • runtime in minutes
  • commit hash

Project Structure

support-triage-openenv/
|- server/
|  |- __init__.py
|  |- app.py
|  |- Dockerfile
|- support_triage_env/
|  |- __init__.py
|  |- models.py
|  |- client.py
|  |- tasks.py
|  |- graders.py
|  |- server/
|     |- __init__.py
|     |- app.py
|     |- environment.py
|     |- Dockerfile
|- inference.py
|- openenv.yaml
|- pyproject.toml
|- requirements.txt
|- uv.lock
|- README.md