Support Triage OpenEnv
A real-world OpenEnv environment where an agent performs customer support triage: prioritization, routing, tagging, information gathering, and response drafting.
This project is designed for Round 1-style hackathon evaluation:
- Full typed OpenEnv models
- `reset()`/`step()`/`state()` API
- 3 deterministic graded tasks (easy/medium/hard)
- Dense reward shaping with partial progress
- Baseline `inference.py` using the OpenAI client and required env vars
- Docker + Hugging Face Spaces deployment files
Why This Environment Has Real Utility
Teams actually do this workflow in support operations and trust/safety queues. This environment evaluates whether an agent can:
- classify urgency
- route to the right team
- attach relevant operational tags
- ask for required evidence
- draft safe and useful customer responses
- close only when resolution criteria are met
Module-Aligned Build Guide (From Your Course)
Module 1: Why OpenEnv?
- We treat the environment as a service with typed contracts.
- Core loop follows RL structure: observe -> act -> reward.
Module 2: Using Existing Environments
- `support_triage_env/models.py` defines typed `Action`, `Observation`, and `State` models.
- `support_triage_env/client.py` provides a reusable typed client.
Module 3: Deploying Environments
- `server/app.py` is the OpenEnv validator-compatible entrypoint (`main()` + callable script).
- `server/Dockerfile` provides a reproducible container runtime.
- `openenv.yaml` defines deployment metadata.
Module 4: Building Your Own Environment
- `support_triage_env/server/environment.py` implements the task simulation.
- `support_triage_env/tasks.py` defines deterministic fixtures.
- `support_triage_env/graders.py` implements 0.0-1.0 grading.
Module 5: Training with OpenEnv + Reward Signals
- Reward shaping is dense and trajectory-aware.
- `inference.py` runs model-based episodes and exports reproducible baseline scores.
Action Space
Action model: `SupportTriageAction`

- `set_priority(value)`
- `route_team(value)`
- `add_tag(value)`
- `draft_reply(value)`
- `request_info(value)`
- `close_ticket()`
- `noop()`

Valid priorities: `low` | `medium` | `high` | `urgent`
Valid teams: `billing` | `technical` | `account` | `trust_safety` | `shipping`
Observation Space
Observation model: `SupportTriageObservation`

Key fields:
- `task_id`, `difficulty`, `objective`, `title`, `customer_tier`, `customer_message`
- current working state: `priority`, `routed_team`, `tags`, `draft_reply`, `info_requested`
- `steps_remaining`, `last_feedback`, `allowed_actions`
- inherited: `reward`, `done`
State Space
State model: `SupportTriageState`

Contains episode metadata and full workflow state:
- `episode_id`, `step_count`
- `task_id`, `difficulty`, `objective`, `max_steps`
- `priority`, `routed_team`, `tags`
- `info_requested`, `closed`, `close_valid`
- `history`
Tasks and Graders
Easy: `easy_password_reset`

- Scenario: login token failure after password reset
- Expected routing: `account`
- Expected priority: `medium`
- Required tags: `password-reset`, `login`
Medium: `medium_double_charge`

- Scenario: premium customer charged twice
- Expected routing: `billing`
- Expected priority: `high`
- Required tags: `refund`, `double-charge`, `vip`
- Needs an additional evidence request
Hard: `hard_account_takeover`

- Scenario: possible account takeover + fraud + abusive content
- Expected routing: `trust_safety`
- Expected priority: `urgent`
- Required tags: `security`, `account-takeover`, `fraud`, `content-abuse`
- Needs security-safe communication and evidence collection
Grading Design
`support_triage_env/graders.py` computes deterministic component scores:
- priority correctness
- routing correctness
- required tags coverage
- reply quality (required/forbidden phrase logic)
- process quality (info request + closure quality + efficiency)
Final score is normalized to [0.0, 1.0].
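The component breakdown above can be sketched as an equally weighted average. This is a sketch only; the real `graders.py` weights, field names, and phrase logic may differ:

```python
def grade_episode(state: dict, expected: dict) -> float:
    """Equally weighted deterministic components; real weights may differ."""
    tags_hit = len(set(state["tags"]) & set(expected["tags"]))
    reply = state.get("draft_reply", "")
    reply_ok = (
        all(p in reply for p in expected["required_phrases"])
        and not any(p in reply for p in expected["forbidden_phrases"])
    )
    components = [
        1.0 if state["priority"] == expected["priority"] else 0.0,  # priority
        1.0 if state["routed_team"] == expected["team"] else 0.0,   # routing
        tags_hit / max(len(expected["tags"]), 1),                   # tag coverage
        1.0 if reply_ok else 0.0,                                   # reply quality
        1.0 if state.get("close_valid") else 0.0,                   # process quality
    ]
    return sum(components) / len(components)  # always in [0.0, 1.0]
```

Averaging bounded components keeps the score in [0.0, 1.0] without a separate normalization pass.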
Reward Function
The environment provides dense rewards at each step:
- positive reward for correct priority/routing/tagging
- incremental reward for improving draft response quality
- positive signal for meaningful information requests when required
- strong bonus for valid close
- penalties for invalid actions, repeated loops, no-op behavior, or premature close
- small per-step cost to discourage inefficient trajectories
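A dense shaping function of that kind might look like the following sketch, rewarding fields the moment they become correct; the constants and field names are illustrative assumptions, not the environment's actual values:

```python
STEP_COST = 0.01  # small per-step cost (illustrative constant)

def step_reward(prev: dict, cur: dict, expected: dict) -> float:
    """Dense shaping: reward fields that just became correct (sketch)."""
    r = -STEP_COST
    if cur["priority"] == expected["priority"] != prev["priority"]:
        r += 0.2  # priority just became correct
    if cur["routed_team"] == expected["team"] != prev["routed_team"]:
        r += 0.2  # routing just became correct
    new_tags = set(cur["tags"]) - set(prev["tags"])
    r += 0.1 * len(new_tags & set(expected["tags"]))  # each newly correct tag
    if cur.get("closed") and cur.get("close_valid"):
        r += 1.0  # strong bonus for a valid close
    elif cur.get("closed"):
        r -= 0.5  # premature or invalid close penalty
    return r
```

Comparing against the previous state means each correct field pays out once, so repeating the same action in a loop earns only the per-step cost.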
Windows Setup
py -3.11 -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -U pip
pip install -r requirements.txt
Optional: if the `openenv` command is not on PATH, call the executable directly (adjust the Python version directory to match your install):
& "$env:APPDATA\Python\Python313\Scripts\openenv.exe" --help
Run Locally
Start API server
python -m uvicorn support_triage_env.server.app:app --host 0.0.0.0 --port 8000 --reload
Validate with OpenEnv tooling
openenv validate --verbose
openenv validate --url http://localhost:8000
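With the server running, an episode is just a reset followed by repeated steps. Below is a minimal driver sketch with the HTTP call injected; the endpoint paths, payload shape, and the placeholder policy are assumptions, not the validated wire format:

```python
from typing import Callable

def run_episode(post: Callable[[str, dict], dict], max_steps: int = 20) -> float:
    """Drive one episode via an injected call: post(path, body) -> parsed JSON.

    Endpoint paths and payload shape are assumptions about the server's API.
    """
    obs = post("/reset", {})
    total = 0.0
    for _ in range(max_steps):
        if obs.get("done"):
            break
        # Placeholder policy: always request more information.
        obs = post("/step", {"action": {"action_type": "request_info",
                                        "value": "Please share the exact error message."}})
        total += obs.get("reward", 0.0)
    return total
```

With `requests`, the transport is one line: `post=lambda path, body: requests.post("http://localhost:8000" + path, json=body).json()`. Injecting it also makes the loop testable without a live server.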
Baseline Inference
`inference.py` lives at the project root, as required.
Set env vars first:
$env:API_BASE_URL = "https://router.huggingface.co/v1"
$env:MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"
$env:HF_TOKEN = "<your_hf_token>"
Run:
python .\inference.py
Output:
- per-task scores
- average score
- `baseline_scores.json`
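Internally, a script like `inference.py` needs little more than those three env vars plus a prompt per observation. A hedged sketch follows; the helper names and prompt format are hypothetical, not the actual script:

```python
import os

def make_client_config() -> dict:
    """Collect the env vars this README requires for inference."""
    return {
        "base_url": os.environ["API_BASE_URL"],
        "model": os.environ["MODEL_NAME"],
        "api_key": os.environ["HF_TOKEN"],
    }

def triage_prompt(obs: dict) -> str:
    """Render an observation as a next-action prompt (format is illustrative)."""
    return (
        f"Ticket: {obs['customer_message']}\n"
        f"Allowed actions: {', '.join(obs['allowed_actions'])}\n"
        'Reply with one action as JSON, e.g. {"action_type": "set_priority", "value": "high"}'
    )
```

The config maps directly onto the OpenAI client constructor, e.g. `OpenAI(base_url=cfg["base_url"], api_key=cfg["api_key"])`.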
Docker
Build:
docker build -t support-triage-openenv:latest -f server/Dockerfile .
Run:
docker run --rm -p 8000:8000 support-triage-openenv:latest
Deploy to Hugging Face Spaces
openenv push --repo-id <your-username>/support-triage-openenv
Then set in Space settings:
- `API_BASE_URL`
- `MODEL_NAME`
- `HF_TOKEN`
Suggested Baseline Reporting Format
Include in submission:
- model name
- per-task score table
- average score
- runtime in minutes
- commit hash
Project Structure
support-triage-openenv/
|- server/
| |- __init__.py
| |- app.py
| |- Dockerfile
|- support_triage_env/
| |- __init__.py
| |- models.py
| |- client.py
| |- tasks.py
| |- graders.py
| |- server/
| |  |- __init__.py
| |  |- app.py
| |  |- environment.py
| |  |- Dockerfile
|- inference.py
|- openenv.yaml
|- pyproject.toml
|- requirements.txt
|- uv.lock
|- README.md