---
title: OpenEnv Support Ticket RL Environment
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
app_file: inference.py
license: mit
library_name: openenv
language: en
tags:
- reinforcement-learning
- openenv
- hackathon
- customer-support
---
# OpenEnv: Support Ticket Resolution System

An OpenEnv standards-compliant reinforcement learning environment for customer support operations. The agent acts as a support specialist and resolves incoming tickets by choosing structured actions (fetch data, check policy, refund, reply, escalate, close).

## Motivation & Real-world Relevance

Most RL evaluations are game-like or synthetic. This environment evaluates policy adherence and operational safety in a realistic business workflow:

- The agent must gather context before taking irreversible actions.
- It is rewarded for compliance and penalized for destructive shortcuts.
- It is scored on both correctness and process quality.

*Please see the detailed [Product Requirements Document (PRD.md)](./PRD.md) for a full breakdown.*
## Core RL Task (Domain Clarification)

Each episode is a support ticket lifecycle.

- State: ticket metadata, optional fetched user profile, action history, and termination flag.
- Observation: current ticket, available actions, system message, history, optional tool output, and step count.
- Action: choose one of six typed operations with parameters.
- Reward: dense scorer in [0.01, 0.99] based on whether the action trajectory matches policy-safe resolution behavior.

This is not a navigation/game environment; it is a process-control environment where incorrect sequencing (for example, refunding before policy verification) reduces score.
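To make the sequencing penalty concrete, here is a toy scorer sketch. The weights, thresholds, and function name below are illustrative assumptions, not the environment's real grader (which lives in the repository); only the ordering rule and the [0.01, 0.99] clamp come from the description above.

```python
def score_trajectory(actions):
    """Toy dense scorer: rewards policy-safe ordering, clamps to [0.01, 0.99].

    `actions` is a list of action_type strings in execution order.
    All weights here are illustrative, not the environment's real ones.
    """
    score = 0.01
    if "check_policy" in actions:
        score += 0.3  # gathered policy context before acting
    if "issue_refund" in actions:
        # Refunding before verifying policy is a destructive shortcut.
        if "check_policy" in actions and actions.index("check_policy") < actions.index("issue_refund"):
            score += 0.4
        else:
            score -= 0.2
    if actions and actions[-1] == "close_ticket":
        score += 0.28  # clean termination
    return min(0.99, max(0.01, score))
```

With these toy weights, the policy-safe trajectory `check_policy -> issue_refund -> close_ticket` lands at the top of the band, while skipping the policy check drops the score sharply.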
## Enhanced Domain Explanation

This environment simulates a customer support ticket resolution system. The agent must navigate a structured workflow to resolve tickets efficiently and safely. The core challenge lies in adhering to policy constraints while optimizing for resolution speed and accuracy.
## Episode Walkthrough (Concrete Example)

Example: `task_easy_1`, an accidental purchase refund.

1. Reset
   - Observation includes the refund ticket from `USR-A1`, open status, `step_count=0`.
2. Action 1: `check_policy({})`
   - Tool output returns the refund policy for accidental purchases.
   - Reward increases for policy verification.
3. Action 2: `issue_refund({"amount": "full"})`
   - Tool output confirms the refund.
   - Reward increases for correct remediation.
4. Action 3: `close_ticket({"resolution": "refunded"})`
   - Episode ends.
   - Final score reaches the near-optimal band.

Flow (high-level):

```
reset -> check_policy -> issue_refund -> close_ticket -> done
```
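The high-level flow above can be sketched as a tiny transition table. This is a hypothetical happy-path model for illustration only; the real environment enforces ordering through its reward scorer rather than hard-coded transitions.

```python
# Hypothetical happy-path transitions for the accidental-purchase refund task.
FLOW = {
    "reset": "check_policy",
    "check_policy": "issue_refund",
    "issue_refund": "close_ticket",
    "close_ticket": "done",
}

def walk(start="reset"):
    """Follow the happy path from `start` until the terminal 'done' marker."""
    path, state = [start], start
    while state != "done":
        state = FLOW[state]
        path.append(state)
    return path
```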
## Task Set and Difficulty Progression

The environment contains 4 tasks, including the 3 required benchmark tasks with increasing difficulty.

| Task | Difficulty | What changes vs. previous | Typical Horizon | Stochasticity | Expected Optimal Score |
|---|---|---|---:|---|---:|
| `task_easy_1` | easy | Baseline accidental purchase refund flow | 3 | Low | 0.99 |
| `task_medium_1` | medium | Adds policy-conflict trap: must reject invalid refund | 3 | Low | 0.99 |
| `task_hard_1` | hard | Requires data fetch + correct escalation reason + customer communication | 3 | Medium | 0.99 |
| `task_fraud_detection` | hard | Adds chargeback-based fraud risk and denial behavior | 4 | Medium | 0.99 |

Difficulty metadata is encoded in [env/tasks.py](env/tasks.py).
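The difficulty metadata might be encoded along these lines. This is only an illustrative guess at the shape of the task registry, built from the table above; see [env/tasks.py](env/tasks.py) for the actual definitions.

```python
# Illustrative shape only; the authoritative registry lives in env/tasks.py.
TASKS = {
    "task_easy_1": {"difficulty": "easy", "horizon": 3, "stochasticity": "low"},
    "task_medium_1": {"difficulty": "medium", "horizon": 3, "stochasticity": "low"},
    "task_hard_1": {"difficulty": "hard", "horizon": 3, "stochasticity": "medium"},
    "task_fraud_detection": {"difficulty": "hard", "horizon": 4, "stochasticity": "medium"},
}

def by_difficulty(level):
    """Return task ids matching a difficulty label, in registry order."""
    return [name for name, meta in TASKS.items() if meta["difficulty"] == level]
```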
## Action Space

- `fetch_user_data(user_id)`
- `check_policy(issue_type)`
- `issue_refund(amount)`
- `reply_to_customer(message)`
- `escalate(reason)`
- `close_ticket(resolution)`
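A minimal sketch of how a typed action payload could be constructed and validated on the client side. The action names and parameter keys come from the list above; the helper itself is hypothetical and not part of the environment's API.

```python
# Allowed action types and the single parameter key each one takes
# (taken from the action list above).
ACTION_PARAMS = {
    "fetch_user_data": "user_id",
    "check_policy": "issue_type",
    "issue_refund": "amount",
    "reply_to_customer": "message",
    "escalate": "reason",
    "close_ticket": "resolution",
}

def build_action(action_type, **params):
    """Build a JSON-serializable action dict, rejecting unknown types and keys."""
    if action_type not in ACTION_PARAMS:
        raise ValueError(f"unknown action type: {action_type}")
    expected = ACTION_PARAMS[action_type]
    if params and set(params) != {expected}:
        raise ValueError(f"{action_type} expects parameter {expected!r}")
    return {"action_type": action_type, "parameters": params}
```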
## Observation Space

Observation object fields:

- `ticket`
- `available_actions`
- `system_message`
- `history`
- `tool_output`
- `step_count`

Schema is documented in [openenv.yaml](openenv.yaml).
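The observation fields above can be sketched as a dataclass. The field names come from the list; the types and defaults are assumptions, and the authoritative schema is [openenv.yaml](openenv.yaml).

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class Observation:
    """Sketch of the observation object; types are assumed, see openenv.yaml."""
    ticket: dict                 # current ticket metadata
    available_actions: list      # action types legal at this step
    system_message: str          # instructions/context for the agent
    history: list = field(default_factory=list)  # prior (action, result) entries
    tool_output: Optional[Any] = None            # output of the last tool call, if any
    step_count: int = 0                          # steps taken so far
```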
## Inference Interface Contract

The submission entrypoint is [inference.py](inference.py) in the repository root.

Required environment variables:

- `API_BASE_URL`: OpenAI-compatible API endpoint
- `MODEL_NAME`: model identifier
- `HF_TOKEN`: API key/token
The inference loop uses OpenAI client calls and emits strict structured logs:

- `[START] task=... env=... model=...`
- `[STEP] step=... action=... reward=... done=... error=...`
- `[END] success=... steps=... score=... rewards=...`
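As a sketch, a reviewer-side parser for the `[STEP]` lines might look like this. The key set matches the log format above; the parser itself is illustrative and assumes values contain no spaces.

```python
import re

STEP_RE = re.compile(r"^\[STEP\]\s+(.*)$")

def parse_step_line(line):
    """Parse a '[STEP] step=... action=... reward=... done=... error=...' line
    into a dict of string values; returns None for non-STEP lines."""
    m = STEP_RE.match(line.strip())
    if not m:
        return None
    return dict(pair.split("=", 1) for pair in m.group(1).split())
```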
Action serialization format expected from the model:

```json
{"action_type": "check_policy", "parameters": {"issue_type": "refund_request"}}
```
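A defensive parse of the model's reply could look like the sketch below. The helper is hypothetical (it is not claimed to match inference.py); production code would likely also handle markdown-fenced replies and retries.

```python
import json

def parse_model_action(text):
    """Parse the model's JSON reply into (action_type, parameters).

    Raises ValueError on malformed replies so the loop can log error=...
    """
    try:
        payload = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(payload, dict) or "action_type" not in payload:
        raise ValueError("missing 'action_type'")
    return payload["action_type"], payload.get("parameters", {})
```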
## API Endpoints (Runtime Environment)

Implemented in [server/app.py](server/app.py):

- `GET /` health check
- `POST /reset` starts a new session and returns the initial observation
- `POST /step` applies an action for a session
- `GET /state?session_id=...` returns typed environment state
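The request bodies are not spelled out here, so the client sketch below guesses a plausible shape for the `/reset` and `/step` payloads. The field names are assumptions; check [server/app.py](server/app.py) for the real contract.

```python
class SupportEnvClient:
    """Minimal client sketch for the runtime API.

    Payload shapes below are assumptions; see server/app.py for the
    authoritative contract.
    """

    def __init__(self, base_url="http://localhost:7860"):
        self.base_url = base_url.rstrip("/")

    def reset_request(self, task_id):
        # Assumed body for POST /reset.
        return self.base_url + "/reset", {"task_id": task_id}

    def step_request(self, session_id, action):
        # Assumed body for POST /step; `action` is the serialized action dict.
        return self.base_url + "/step", {"session_id": session_id, "action": action}

    def state_url(self, session_id):
        # GET /state?session_id=...
        return f"{self.base_url}/state?session_id={session_id}"
```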
## Reproducibility

- Environment dynamics are deterministic for a fixed action trajectory.
- Graders are deterministic and bounded; tests in [tests/test_graders.py](tests/test_graders.py) verify this.
- Fixed benchmark trajectories are provided in [evaluate.py](evaluate.py).
- **Seed management**: the environment supports deterministic runs via a random seed; use the `--seed` flag in scripts to ensure reproducibility.
- **Baseline scores**: random policy 0.33, greedy policy 0.75. These scores are verified in the validation script and can be reproduced with the provided [evaluate.py](evaluate.py).
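The `--seed` flag can be wired up along these lines. Only the flag name comes from the text above; the argument parsing and the isolated-RNG pattern are an illustrative sketch.

```python
import argparse
import random

def parse_args(argv=None):
    """Parse the --seed flag (flag name from the docs; the rest is a sketch)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=0)
    return parser.parse_args(argv)

def sample_episode_noise(seed, n=3):
    """Deterministic draws: the same seed yields the same trajectory noise.

    Using an isolated random.Random instance avoids mutating global state.
    """
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]
```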
## Baseline Reproduction

Run the environment and evaluate the agent:

```bash
# Install dependencies
pip install -r requirements.txt
pip install -e .

# Run the baseline evaluator
python evaluate.py
```

Example output:

```json
{
  "results": {
    "task_easy_1": {"score": 0.99},
    "task_medium_1": {"score": 0.99},
    "task_hard_1": {"score": 0.99}
  }
}
```
## Setup and Run

Using Docker:

```bash
docker build -t openenv_support .

# Run the API server (HF Spaces mode):
docker run -p 7860:7860 openenv_support
```

To run the baseline inference script locally, install `pydantic` and `openai` first, then:

```bash
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o"
export HF_TOKEN="your-key"
python inference.py
```
## Pre-submission Validation (Non-Docker)

Use the validation script provided for reviewers:

```bash
chmod +x scripts/validate_submission.sh
./scripts/validate_submission.sh
```

The script checks:

- the pytest suite
- grader determinism and score bounds
- `openenv.yaml` parsing and required fields
- task difficulty coverage
- baseline evaluation output
- an inference smoke run and the `[START]/[STEP]/[END]` log structure
## Reviewer Quickstart

For contributors and evaluators:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -e .
python -m pytest -q
```