--- title: OpenEnv Support Ticket RL Environment emoji: 🤖 colorFrom: blue colorTo: green sdk: docker app_file: inference.py license: mit library_name: openenv language: en tags: - reinforcement-learning - openenv - hackathon - customer-support --- # OpenEnv: Support Ticket Resolution System An OpenEnv standards-compliant reinforcement learning environment for customer support operations. The agent acts as a support specialist and resolves incoming tickets by choosing structured actions (fetch data, check policy, refund, reply, escalate, close). ## Motivation & Real-world Relevance Most RL evaluations are game-like or synthetic. This environment evaluates policy adherence and operational safety in a realistic business workflow: - The agent must gather context before taking irreversible actions. - It is rewarded for compliance and penalized for destructive shortcuts. - It is scored on both correctness and process quality. *Please see our detailed [Product Requirements Document (PRD.md)](./PRD.md) for full breakdown.* ## Core RL Task (Domain Clarification) Each episode is a support ticket lifecycle. - State: ticket metadata, optional fetched user profile, action history, and termination flag. - Observation: current ticket, available actions, system message, history, optional tool output, and step count. - Action: choose one of six typed operations with parameters. - Reward: dense scorer in [0.01, 0.99] based on whether the action trajectory matches policy-safe resolution behavior. This is not a navigation/game environment; it is a process-control environment where incorrect sequencing (for example, refunding before policy verification) reduces score. ## Enhanced Domain Explanation This environment simulates a customer support ticket resolution system. The agent must navigate through a structured workflow to resolve tickets efficiently and safely. The core challenge lies in adhering to policy constraints while optimizing for resolution speed and accuracy. ### Example Episode Walkthrough Here is a detailed walkthrough of an example episode for `task_easy_1`: 1. **Reset**: - Observation: A refund ticket from `USR-A1` with open status and `step_count=0`. 2. **Action 1**: `check_policy({})` - Tool output: Refund policy for accidental purchases. - Reward: Increases for verifying the policy. 3. **Action 2**: `issue_refund({"amount": "full"})` - Tool output: Refund confirmed. - Reward: Increases for correct remediation. 4. **Action 3**: `close_ticket({"resolution": "refunded"})` - Episode ends. - Final score: Near-optimal. ### Visual Representation A flowchart or diagram can be added here to visually represent the episode flow. ## Episode Walkthrough (Concrete Example) Example: `task_easy_1` accidental purchase refund. 1. Reset - Observation includes refund ticket from `USR-A1`, open status, step_count=0. 2. Action 1: `check_policy({})` - Tool output returns refund policy for accidental purchase. - Reward increases for policy verification. 3. Action 2: `issue_refund({"amount": "full"})` - Tool output confirms refund. - Reward increases for correct remediation. 4. Action 3: `close_ticket({"resolution": "refunded"})` - Episode ends. - Final score reaches near-optimal band. Flow (high-level): ``` reset -> check_policy -> issue_refund -> close_ticket -> done ``` ## Task Set and Difficulty Progression The environment contains 4 tasks, including 3 required benchmark tasks with increasing difficulty. | Task | Difficulty | What changes vs previous | Typical Horizon | Stochasticity | Expected Optimal Score | |---|---|---|---:|---|---:| | `task_easy_1` | easy | Baseline accidental purchase refund flow | 3 | Low | 0.99 | | `task_medium_1` | medium | Adds policy-conflict trap: must reject invalid refund | 3 | Low | 0.99 | | `task_hard_1` | hard | Requires data fetch + correct escalation reason + customer communication | 3 | Medium | 0.99 | | `task_fraud_detection` | hard | Adds chargeback-based fraud risk and denial behavior | 4 | Medium | 0.99 | Difficulty metadata is encoded in [env/tasks.py](env/tasks.py). ## Action Space - `fetch_user_data(user_id)` - `check_policy(issue_type)` - `issue_refund(amount)` - `reply_to_customer(message)` - `escalate(reason)` - `close_ticket(resolution)` ## Observation Space Observation object fields: - `ticket` - `available_actions` - `system_message` - `history` - `tool_output` - `step_count` Schema is documented in [openenv.yaml](openenv.yaml). ## Inference Interface Contract The submission entrypoint is [inference.py](inference.py) in repository root. Required environment variables: - `API_BASE_URL`: OpenAI-compatible API endpoint - `MODEL_NAME`: model identifier - `HF_TOKEN`: API key/token The inference loop uses OpenAI client calls and emits strict structured logs: - `[START] task=... env=... model=...` - `[STEP] step=... action=... reward=... done=... error=...` - `[END] success=... steps=... score=... rewards=...` Action serialization format expected from the model: ```json {"action_type": "check_policy", "parameters": {"issue_type": "refund_request"}} ``` ## API Endpoints (Runtime Environment) Implemented in [server/app.py](server/app.py): - `GET /` health check - `POST /reset` starts a new session and returns initial observation - `POST /step` applies an action for a session - `GET /state?session_id=...` returns typed environment state ## Reproducibility - Environment dynamics are deterministic for a fixed action trajectory. - Graders are deterministic and bounded; tests in [tests/test_graders.py](tests/test_graders.py) verify this. - Fixed benchmark trajectories are provided in [evaluate.py](evaluate.py). ## Reproducibility Enhancements - **Seed Management**: The environment supports deterministic runs by setting a random seed. Use the `--seed` flag in scripts to ensure reproducibility. - **Baseline Scores**: - Random Policy: 0.33 - Greedy Policy: 0.75 These scores are verified in the validation script and can be reproduced using the provided `evaluate.py` script. ## Baseline Reproduction Run the environment and evaluate the agent: ```bash # Install dependencies pip install -r requirements.txt pip install -e . # Run baseline evaluator python evaluate.py ``` Example output: ```json { "results": { "task_easy_1": {"score": 0.99}, "task_medium_1": {"score": 0.99}, "task_hard_1": {"score": 0.99} } } ``` ## Setup and Run Using Docker: ```bash docker build -t openenv_support . # Run API Server (HF Spaces mode): docker run -p 7860:7860 openenv_support ``` Run baseline inference test script locally: Ensure you install `pydantic` and `openai` first. ```bash export API_BASE_URL="https://api.openai.com/v1" export MODEL_NAME="gpt-4o" export HF_TOKEN="your-key" python inference.py ``` ## Pre-submission Validation (Non-Docker) Use the evaluator script introduced for reviewers: ```bash chmod +x scripts/validate_submission.sh ./scripts/validate_submission.sh ``` The script checks: - pytest suite - grader determinism and score bounds - openenv.yaml parse + required fields - task difficulty coverage - baseline evaluation output - inference smoke run and `[START]/[STEP]/[END]` log structure ## Reviewer Quickstart For contributors and evaluators: ```bash python -m venv .venv source .venv/bin/activate pip install -r requirements.txt pip install -e . python -m pytest -q ```