Support Triage OpenEnv

A real-world OpenEnv environment where an agent performs customer support triage: prioritization, routing, tagging, information gathering, and response drafting.

This project is designed for Round 1-style hackathon evaluation and includes:

  • Full typed OpenEnv models
  • reset() / step() / state() API
  • 3 deterministic graded tasks (easy/medium/hard)
  • Dense reward shaping with partial progress
  • Baseline inference.py using the OpenAI client and the required environment variables
  • Docker + Hugging Face Spaces deployment files

Why This Environment Has Real Utility

Support-operations and trust/safety teams run this workflow every day. This environment evaluates whether an agent can:

  • classify urgency
  • route to the right team
  • attach relevant operational tags
  • ask for required evidence
  • draft safe and useful customer responses
  • close only when resolution criteria are met

Module-Aligned Build Guide (From Your Course)

Module 1: Why OpenEnv?

  • We treat the environment as a service with typed contracts.
  • Core loop follows RL structure: observe -> act -> reward.

Module 2: Using Existing Environments

  • support_triage_env/models.py defines typed Action, Observation, State.
  • support_triage_env/client.py gives a reusable typed client.

Module 3: Deploying Environments

  • server/app.py is the OpenEnv validator-compatible entrypoint (main() + callable script).
  • server/Dockerfile provides reproducible container runtime.
  • openenv.yaml defines deployment metadata.

Module 4: Building Your Own Environment

  • support_triage_env/server/environment.py implements task simulation.
  • support_triage_env/tasks.py defines deterministic fixtures.
  • support_triage_env/graders.py implements 0.0-1.0 grading.

Module 5: Training with OpenEnv + Reward Signals

  • Reward shaping is dense and trajectory-aware.
  • inference.py runs model-based episodes and exports reproducible baseline scores.

Action Space

Action model: SupportTriageAction

set_priority(value)
route_team(value)
add_tag(value)
draft_reply(value)
request_info(value)
close_ticket()
noop()

Valid priorities: low | medium | high | urgent

Valid teams: billing | technical | account | trust_safety | shipping

Observation Space

Observation model: SupportTriageObservation

Key fields:

  • task_id, difficulty, objective
  • title, customer_tier, customer_message
  • current working state: priority, routed_team, tags, draft_reply, info_requested
  • steps_remaining, last_feedback, allowed_actions
  • reward and done (inherited from the base Observation model)
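A representative observation payload at the start of an episode. All values here are illustrative examples, not the environment's actual fixtures:

```python
# Example observation as a plain dict; field names follow the list above,
# field values are made up for illustration.
observation = {
    "task_id": "medium_double_charge",
    "difficulty": "medium",
    "objective": "Triage a duplicate-charge report",
    "title": "Charged twice for premium plan",
    "customer_tier": "premium",
    "customer_message": "I was billed twice this month. Please refund one charge.",
    "priority": None,
    "routed_team": None,
    "tags": [],
    "draft_reply": "",
    "info_requested": False,
    "steps_remaining": 12,
    "last_feedback": "",
    "allowed_actions": ["set_priority", "route_team", "add_tag",
                        "draft_reply", "request_info", "close_ticket", "noop"],
    "reward": 0.0,
    "done": False,
}
```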

State Space

State model: SupportTriageState

Contains episode metadata and full workflow state:

  • episode_id, step_count
  • task_id, difficulty, objective, max_steps
  • priority, routed_team, tags
  • info_requested, closed, close_valid
  • history

Tasks and Graders

Easy: easy_password_reset

  • Scenario: login token failure after password reset
  • Expected routing: account
  • Expected priority: medium
  • Required tags: password-reset, login

Medium: medium_double_charge

  • Scenario: premium customer charged twice
  • Expected routing: billing
  • Expected priority: high
  • Required tags: refund, double-charge, vip
  • Needs additional evidence request

Hard: hard_account_takeover

  • Scenario: possible account takeover + fraud + abusive content
  • Expected routing: trust_safety
  • Expected priority: urgent
  • Required tags: security, account-takeover, fraud, content-abuse
  • Needs security-safe communication and evidence collection

Grading Design

support_triage_env/graders.py computes deterministic component scores:

  • priority correctness
  • routing correctness
  • required tags coverage
  • reply quality (required/forbidden phrase logic)
  • process quality (info request + closure quality + efficiency)

Final score is normalized to [0.0, 1.0].

Reward Function

The environment provides dense rewards at each step:

  • positive reward for correct priority/routing/tagging
  • incremental reward for improving draft response quality
  • positive signal for meaningful information requests when required
  • strong bonus for valid close
  • penalties for invalid actions, repeated loops, no-op behavior, or premature close
  • small per-step cost to discourage inefficient trajectories
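A per-step shaping function consistent with the bullets above might look like this. The constants are illustrative assumptions, not the environment's actual values:

```python
def step_reward(correct_field_set: bool,
                reply_quality_delta: float,
                info_requested_when_needed: bool,
                valid_close: bool,
                invalid_action: bool) -> float:
    """Illustrative dense shaping; all magnitudes are assumptions."""
    reward = -0.01  # small per-step cost to discourage long trajectories
    if correct_field_set:
        reward += 0.2                                  # priority/routing/tag
    reward += 0.3 * max(0.0, reply_quality_delta)      # only credit improvement
    if info_requested_when_needed:
        reward += 0.15                                 # meaningful info request
    if valid_close:
        reward += 1.0                                  # strong bonus
    if invalid_action:
        reward -= 0.25                                 # invalid/premature action
    return round(reward, 4)
```

Clamping the draft-quality delta at zero means a worsened reply earns nothing rather than being rewarded for churn.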

Windows Setup

py -3.11 -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -U pip
pip install -r requirements.txt

Optional: if the openenv command is not on PATH, invoke the executable from your user-level scripts directory, e.g.:

& "$env:APPDATA\Python\Python313\Scripts\openenv.exe" --help

Run Locally

Start API server

python -m uvicorn support_triage_env.server.app:app --host 0.0.0.0 --port 8000 --reload

Validate with OpenEnv tooling

openenv validate --verbose
openenv validate --url http://localhost:8000

Baseline Inference

inference.py is at project root as required.

Set env vars first:

$env:API_BASE_URL = "https://router.huggingface.co/v1"
$env:MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"
$env:HF_TOKEN = "<your_hf_token>"

Run:

python .\inference.py

Output:

  • per-task scores
  • average score
  • baseline_scores.json
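Inside inference.py, the model's text reply has to be mapped onto a typed action before it can be sent to the environment. A minimal parser might look like this; the "verb: value" convention is an assumption for illustration, not necessarily the script's actual format:

```python
KNOWN_VERBS = {"set_priority", "route_team", "add_tag",
               "draft_reply", "request_info", "close_ticket", "noop"}

def parse_model_action(text: str) -> dict:
    """Parse a 'verb: value' line from the model into an action dict.

    Unknown verbs fall back to noop so a malformed completion never
    crashes the episode; it just wastes a (slightly penalized) step.
    """
    verb, _, value = text.strip().partition(":")
    verb = verb.strip().lower()
    if verb not in KNOWN_VERBS:
        return {"kind": "noop", "value": None}
    return {"kind": verb, "value": value.strip() or None}
```

A forgiving parser like this keeps baseline runs reproducible even when the model occasionally emits free-form text instead of a clean action.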

Docker

Build:

docker build -t support-triage-openenv:latest -f server/Dockerfile .

Run:

docker run --rm -p 8000:8000 support-triage-openenv:latest

Deploy to Hugging Face Spaces

openenv push --repo-id <your-username>/support-triage-openenv

Then set in Space settings:

  • API_BASE_URL
  • MODEL_NAME
  • HF_TOKEN

Suggested Baseline Reporting Format

Include in submission:

  • model name
  • per-task score table
  • average score
  • runtime in minutes
  • commit hash

Project Structure

support-triage-openenv/
|- server/
|  |- __init__.py
|  |- app.py
|  |- Dockerfile
|- support_triage_env/
|  |- __init__.py
|  |- models.py
|  |- client.py
|  |- tasks.py
|  |- graders.py
|  |- server/
|     |- __init__.py
|     |- app.py
|     |- environment.py
|     |- Dockerfile
|- inference.py
|- openenv.yaml
|- pyproject.toml
|- requirements.txt
|- uv.lock
|- README.md