---
title: Support Triage OpenEnv
emoji: "📨"
colorFrom: blue
colorTo: teal
sdk: docker
app_port: 7860
tags:
  - openenv
  - reinforcement-learning
  - customer-support
license: mit
---

# Support Triage OpenEnv

A complete, real-world OpenEnv environment for training and evaluating agents on **customer support ticket triage**. The environment simulates what support teams actually do: read inbox tickets, classify urgency and category, draft safe responses, and resolve the right ticket.

## Why this environment

Most agent benchmarks under-model production support workflows. This environment focuses on practical support operations:

- Multi-ticket inbox context selection
- Policy-compliant communication
- Priority and escalation decisions
- Deterministic graders and dense reward shaping

## OpenEnv API compliance

The environment exposes:

- `reset(task_id?: str) -> Observation`
- `step(action: Action) -> (Observation, Reward, done, info)`
- `state() -> dict`

Typed Pydantic models:

- `Observation`, `Action`, and `Reward`: [`src/support_triage_openenv/models.py`](src/support_triage_openenv/models.py)

Metadata:

- `openenv.yaml`

## Action space

`Action` model fields:

- `action_type`: one of `read_ticket | classify_ticket | draft_reply | resolve_ticket`
- `ticket_id`: required for `read_ticket`, `classify_ticket`, and `resolve_ticket`
- `priority`: optional enum `low | medium | high | urgent`
- `category`: optional enum `account | billing | technical | abuse | general`
- `needs_escalation`: optional bool
- `message`: reply text for `draft_reply`

## Observation space

`Observation` includes:

- `task_id`, `objective`, `step_count`, `max_steps`
- `inbox`: list of ticket metadata (`ticket_id`, subject, tier, age, read flag)
- `current_ticket_content`: visible only after reading the selected ticket
- `latest_system_note`: feedback from the last step
- `score_hint`: partial grader components (`read`, `classify`, `reply`, `resolve`)

## Tasks and difficulty

1. `easy_password_reset` (Easy) - Correctly process an account lockout and send secure reset guidance.
2. `medium_billing_dispute` (Medium) - Investigate a duplicate billing charge using a context ticket and provide a policy-compliant refund timeline.
3. `hard_outage_incident` (Hard) - Handle a high-stakes outage report requiring multi-ticket context, urgent escalation, and careful incident messaging.

Each task is graded deterministically by `support_triage_openenv.graders.grade_task`, which returns a score in `0.0-1.0`.

## Reward design

Reward is shaped and meaningful across the trajectory:

- Positive dense signal from partial grader progress (read/context, classification fields, reply quality, resolve correctness)
- Penalties for invalid actions, repeated loops, and malformed steps
- The final step guarantees score alignment with the deterministic grader output

## Project structure

- `src/support_triage_openenv/env.py` - environment implementation
- `src/support_triage_openenv/models.py` - typed OpenEnv models
- `src/support_triage_openenv/tasks.py` - task specs (easy/medium/hard)
- `src/support_triage_openenv/graders.py` - deterministic grader logic
- `scripts/run_baseline.py` - OpenAI baseline inference runner
- `scripts/validate_env.py` - tests + optional `openenv validate`
- `app.py` - FastAPI app for the HF Space runtime
- `Dockerfile` - containerized deployment

## Setup

```bash
cd /path/to/this/repo
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

## Run tests

```bash
python -m pytest -q
```

## Run baseline

OpenAI model baseline:

```bash
export API_BASE_URL=https://your-openai-compatible-endpoint/v1
export MODEL_NAME=your-model-id
export HF_TOKEN=your-api-key
python inference.py --mode openai --output scores/inference_scores.json
```

Deterministic heuristic baseline:

```bash
python inference.py --mode heuristic --output \
  scores/inference_scores.json
```

Outputs a JSON report to `scores/inference_scores.json` and structured stdout logs with `[START]`, `[STEP]`, and `[END]` markers.

## Run API locally

```bash
uvicorn app:app --host 0.0.0.0 --port 7860
```

Endpoints:

- `GET /health`
- `POST /reset`
- `POST /step`
- `GET /state`

## Docker

```bash
docker build -t support-triage-openenv .
docker run --rm -p 7860:7860 support-triage-openenv
```

## Hugging Face Space deployment

- Create a **Docker Space**.
- Push this repository to the Space.
- Keep the `README.md` frontmatter tags, including `openenv`.
- The Space serves the API on port `7860`.

## One-command remote bootstrap

To have this local repo automatically create and push to both GitHub and HF:

```bash
export GITHUB_USERNAME=your_github_user
export GITHUB_TOKEN=your_github_pat
export HF_USERNAME=your_hf_user
export HF_TOKEN=your_hf_token
bash scripts/bootstrap_remotes.sh support-triage-openenv
```

## Baseline scores (heuristic, reproducible)

Generated with:

```bash
python inference.py --mode heuristic --output scores/inference_scores.json
```

- `easy_password_reset`: grader `1.0`, reward `1.0`
- `medium_billing_dispute`: grader `1.0`, reward `1.0`
- `hard_outage_incident`: grader `1.0`, reward `1.0`
- Overall average grader score: `1.0`
- Tracked reference artifact: `baseline_expected_scores.json`

## Pre-submission validator

Run the full strict validation (all disqualification gates):

```bash
python pre_submission_validate.py --space-url https://your-space-name.hf.space
```

Local-only run while iterating (skips the Docker daemon and the remote Space ping):

```bash
python pre_submission_validate.py --skip-docker --skip-space
```

Run the organizer-provided script directly (integrated path):

```bash
bash scripts/pre_validation_script.sh https://your-space-name.hf.space .
```

Notes:

- `scripts/sample_inference_script.sh` is kept as an organizer reference.
- The root `inference.py` is aligned with the required `[START]`, `[STEP]`, `[END]` line format.
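## Example: minimal scripted policy

The documented action and observation fields are enough to sketch a minimal scripted policy for one ticket. This is an illustrative sketch, not the environment's API: the dict key names (`read` for the inbox read flag, the `score_hint` component names) and the fixed classification values are assumptions based on the field lists above.

```python
def next_action(obs):
    """Pick the next triage action for an Observation-shaped dict.

    Follows the documented flow: read the ticket, classify it,
    draft a reply, then resolve. Always works the first inbox ticket.
    """
    ticket = obs["inbox"][0]
    hint = obs.get("score_hint", {})
    if not ticket["read"]:  # ticket content is only visible after reading
        return {"action_type": "read_ticket", "ticket_id": ticket["ticket_id"]}
    if not hint.get("classify"):
        return {
            "action_type": "classify_ticket",
            "ticket_id": ticket["ticket_id"],
            "priority": "high",
            "category": "account",
            "needs_escalation": False,
        }
    if not hint.get("reply"):
        return {"action_type": "draft_reply",
                "message": "Thanks for reaching out - here is a secure next step."}
    return {"action_type": "resolve_ticket", "ticket_id": ticket["ticket_id"]}
```

In practice the priority and category choices would come from the model under evaluation; the point is the read → classify → reply → resolve ordering that the graders reward.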
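## Example: dense reward shaping

The dense shaping described in the reward design section can be approximated as paying out newly earned `score_hint` progress each step, minus penalties for invalid actions. The weights and penalty value below are invented for illustration; the environment's actual coefficients are defined in its own code, not here.

```python
# Illustrative weights over the four score_hint components.
WEIGHTS = {"read": 0.2, "classify": 0.3, "reply": 0.3, "resolve": 0.2}
INVALID_PENALTY = 0.1

def shaped_reward(prev_hint, hint, invalid=False):
    """Dense reward: newly earned grader progress minus step penalties."""
    gain = sum(
        w * (hint.get(k, 0.0) - prev_hint.get(k, 0.0))
        for k, w in WEIGHTS.items()
    )
    return gain - (INVALID_PENALTY if invalid else 0.0)
```

Because the payout is the *difference* in grader progress, summing the shaped rewards over a trajectory recovers the final grader score (minus any penalties), which is the alignment property the final step guarantees.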
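## Example: driving the local API

Once `uvicorn` is serving the API, the endpoints listed above can be driven with the standard library alone. A minimal sketch, assuming the `/reset` and `/step` routes accept and return JSON bodies shaped like the `Observation`/`Action` models (the exact payload schema is an assumption here):

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # matches the uvicorn command above

def api_request(path, payload=None):
    """Build a JSON request for the Space API; POST when a payload is given."""
    data = None if payload is None else json.dumps(payload).encode()
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
    )

# With the server running:
# obs = json.load(urllib.request.urlopen(
#     api_request("/reset", {"task_id": "easy_password_reset"})))
# step = json.load(urllib.request.urlopen(
#     api_request("/step", {"action_type": "read_ticket",
#                           "ticket_id": obs["inbox"][0]["ticket_id"]})))
```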
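## Example: parsing baseline logs

The `[START]`/`[STEP]`/`[END]` stdout format mentioned above is easy to post-process. A sketch parser; the payload text after each marker is treated as opaque, since its exact fields are not specified in this README:

```python
def parse_run_log(lines):
    """Group [START]/[STEP]/[END] stdout lines into per-episode records."""
    episodes, current = [], None
    for line in lines:
        if line.startswith("[START]"):
            current = {"start": line[len("[START]"):].strip(), "steps": []}
        elif line.startswith("[STEP]") and current is not None:
            current["steps"].append(line[len("[STEP]"):].strip())
        elif line.startswith("[END]") and current is not None:
            current["end"] = line[len("[END]"):].strip()
            episodes.append(current)
            current = None
    return episodes
```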