---
title: Support Triage OpenEnv
emoji: "📨"
colorFrom: blue
colorTo: teal
sdk: docker
app_port: 7860
tags:
- openenv
- reinforcement-learning
- customer-support
license: mit
---
# Support Triage OpenEnv
A complete, real-world OpenEnv environment for training/evaluating agents on **customer support ticket triage**. The environment simulates what support teams actually do: read inbox tickets, classify urgency/category, draft safe responses, and resolve the right ticket.
## Why this environment
Most agent benchmarks under-model production support workflows. This environment focuses on practical support operations with:
- Multi-ticket inbox context selection
- Policy-compliant communication
- Priority + escalation decisions
- Deterministic graders and dense reward shaping
## OpenEnv API compliance
The environment exposes:
- `reset(task_id?: str) -> Observation`
- `step(action: Action) -> (Observation, Reward, done, info)`
- `state() -> dict`
Typed Pydantic models:
- `Observation`, `Action`, `Reward`: [`src/support_triage_openenv/models.py`](src/support_triage_openenv/models.py)
Metadata:
- `openenv.yaml`
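The `reset`/`step` contract above can be sketched with a toy in-memory stub. This is not the environment's implementation (that lives in `src/support_triage_openenv/env.py`); the field names follow this README, and the reward value is illustrative.

```python
# Toy stub illustrating the OpenEnv reset()/step() shapes described above.
# A real client would go through the HTTP endpoints or the typed Pydantic
# models; this sketch only demonstrates the call/return contract.

def reset(task_id=None):
    """Return an initial observation (field names follow the README)."""
    return {
        "task_id": task_id or "easy_password_reset",
        "objective": "Resolve the account lockout ticket.",
        "step_count": 0,
        "max_steps": 10,
        "inbox": [{"ticket_id": "T1", "subject": "Locked out", "read": False}],
    }

def step(obs, action):
    """Return (observation, reward, done, info) per the OpenEnv API."""
    obs = dict(obs, step_count=obs["step_count"] + 1)
    done = obs["step_count"] >= obs["max_steps"]
    reward = 0.1 if action.get("action_type") == "read_ticket" else 0.0
    return obs, reward, done, {"note": "stub"}

obs = reset("easy_password_reset")
obs, reward, done, info = step(obs, {"action_type": "read_ticket", "ticket_id": "T1"})
print(obs["step_count"], reward, done)
```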
## Action space
`Action` model fields:
- `action_type`: one of `read_ticket | classify_ticket | draft_reply | resolve_ticket`
- `ticket_id`: required for `read_ticket`, `classify_ticket`, `resolve_ticket`
- `priority`: optional enum `low | medium | high | urgent`
- `category`: optional enum `account | billing | technical | abuse | general`
- `needs_escalation`: optional bool
- `message`: text for `draft_reply`
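The per-`action_type` field rules above can be encoded as a small validity check. The real validation is done by the Pydantic `Action` model; this hypothetical helper just restates the README's rules in code.

```python
# Hypothetical validity check mirroring the Action field rules above.
REQUIRES_TICKET_ID = {"read_ticket", "classify_ticket", "resolve_ticket"}
PRIORITIES = {"low", "medium", "high", "urgent"}
CATEGORIES = {"account", "billing", "technical", "abuse", "general"}

def is_valid_action(action: dict) -> bool:
    atype = action.get("action_type")
    if atype not in REQUIRES_TICKET_ID | {"draft_reply"}:
        return False
    if atype in REQUIRES_TICKET_ID and not action.get("ticket_id"):
        return False  # these action types require a ticket_id
    if atype == "draft_reply" and not action.get("message"):
        return False  # draft_reply carries the reply text in `message`
    if action.get("priority") is not None and action["priority"] not in PRIORITIES:
        return False
    if action.get("category") is not None and action["category"] not in CATEGORIES:
        return False
    return True

print(is_valid_action({"action_type": "classify_ticket", "ticket_id": "T1",
                       "priority": "urgent", "category": "technical",
                       "needs_escalation": True}))  # True
print(is_valid_action({"action_type": "resolve_ticket"}))  # False: missing ticket_id
```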
## Observation space
`Observation` includes:
- `task_id`, `objective`, `step_count`, `max_steps`
- `inbox`: ticket metadata list (`ticket_id`, subject, tier, age, read flag)
- `current_ticket_content`: only visible after reading selected ticket
- `latest_system_note`: feedback from last step
- `score_hint`: partial grader components (`read`, `classify`, `reply`, `resolve`)
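An observation with the fields above might look like the following. All values are illustrative, and combining `score_hint` components with an equal-weight mean is an assumption for this sketch, not the grader's actual formula.

```python
# Illustrative observation matching the field list above (values made up).
obs = {
    "task_id": "medium_billing_dispute",
    "objective": "Resolve the duplicate billing ticket.",
    "step_count": 3,
    "max_steps": 12,
    "inbox": [
        {"ticket_id": "T7", "subject": "Charged twice", "tier": "pro",
         "age": "2h", "read": True},
    ],
    "current_ticket_content": "I was billed twice for my subscription...",
    "latest_system_note": "Ticket T7 classified.",
    "score_hint": {"read": 1.0, "classify": 1.0, "reply": 0.0, "resolve": 0.0},
}

# score_hint exposes partial grader components; an equal-weight mean is
# only an assumed way to summarize them.
partial = sum(obs["score_hint"].values()) / len(obs["score_hint"])
print(partial)  # 0.5
```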
## Tasks and difficulty
1. `easy_password_reset` (Easy)
- Correctly process account lockout and send secure reset guidance.
2. `medium_billing_dispute` (Medium)
- Investigate duplicate billing with context ticket and provide policy-compliant refund timeline.
3. `hard_outage_incident` (Hard)
- Handle a high-stakes outage report requiring multi-ticket context, urgent escalation, and careful incident messaging.
Each task is graded deterministically by `support_triage_openenv.graders.grade_task`, which returns a score from `0.0` to `1.0`.
## Reward design
Reward is shaped and meaningful across the trajectory:
- Positive dense signal from partial grader progress (read/context, classification fields, reply quality, resolve correctness)
- Penalties for invalid actions, repeated loops, and malformed steps
- Final step guarantees score alignment with deterministic grader output
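The shaping described above can be sketched as a function of grader-component progress minus penalties. The weights and penalty constants here are illustrative assumptions, not the environment's actual values.

```python
def shaped_reward(prev_hint, new_hint, invalid=False, repeated=False):
    """Toy dense reward: pay out partial-grader progress since the last
    step and penalize invalid or looping actions (constants illustrative)."""
    progress = sum(new_hint[k] - prev_hint[k] for k in new_hint)
    penalty = (0.2 if invalid else 0.0) + (0.1 if repeated else 0.0)
    return progress - penalty

prev = {"read": 0.0, "classify": 0.0, "reply": 0.0, "resolve": 0.0}
new = {"read": 1.0, "classify": 0.0, "reply": 0.0, "resolve": 0.0}
print(shaped_reward(prev, new))               # 1.0: reading paid out once
print(shaped_reward(new, new, invalid=True))  # -0.2: no progress, penalty only
```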
## Project structure
- `src/support_triage_openenv/env.py` - environment implementation
- `src/support_triage_openenv/models.py` - typed OpenEnv models
- `src/support_triage_openenv/tasks.py` - task specs (easy/medium/hard)
- `src/support_triage_openenv/graders.py` - deterministic grader logic
- `scripts/run_baseline.py` - OpenAI baseline inference runner
- `scripts/validate_env.py` - tests + optional `openenv validate`
- `app.py` - FastAPI app for HF Space runtime
- `Dockerfile` - containerized deployment
## Setup
```bash
cd meta_hackathon  # path to your local clone
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
## Run tests
```bash
python -m pytest -q
```
## Run baseline
OpenAI model baseline:
```bash
export API_BASE_URL=https://your-openai-compatible-endpoint/v1
export MODEL_NAME=your-model-id
export HF_TOKEN=your-api-key
python inference.py --mode openai --output scores/inference_scores.json
```
Deterministic heuristic baseline:
```bash
python inference.py --mode heuristic --output scores/inference_scores.json
```
Writes a JSON report to `scores/inference_scores.json` and emits structured stdout logs tagged `[START]`, `[STEP]`, `[END]`.
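A quick way to sanity-check a run's stdout is to count the tagged lines. Only the `[START]`, `[STEP]`, `[END]` tags are documented here; the rest of each sample line is an assumed format.

```python
# Count episode steps in the structured stdout log. Only the tag prefixes
# are documented; the payload after each tag below is illustrative.
log = """\
[START] task=easy_password_reset
[STEP] 1 action=read_ticket reward=0.10
[STEP] 2 action=classify_ticket reward=0.20
[END] task=easy_password_reset score=1.0
"""

steps = [line for line in log.splitlines() if line.startswith("[STEP]")]
finished = any(line.startswith("[END]") for line in log.splitlines())
print(len(steps), finished)  # 2 True
```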
## Run API locally
```bash
uvicorn app:app --host 0.0.0.0 --port 7860
```
Endpoints:
- `GET /health`
- `POST /reset`
- `POST /step`
- `GET /state`
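With the server running locally, the endpoints can be exercised with `curl`. The request payload fields are illustrative; the exact request schemas come from the models in `src/support_triage_openenv/models.py`.

```shell
# Liveness check
curl -s http://localhost:7860/health

# Start an episode (task_id field assumed from the reset() signature)
curl -s -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "easy_password_reset"}'

# Take one action (payload shape illustrative)
curl -s -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "read_ticket", "ticket_id": "T1"}'

# Inspect current environment state
curl -s http://localhost:7860/state
```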
## Docker
```bash
docker build -t support-triage-openenv .
docker run --rm -p 7860:7860 support-triage-openenv
```
## Hugging Face Space deployment
- Create a **Docker Space**.
- Push this repository to the Space.
- Keep `README.md` frontmatter tags including `openenv`.
- Space serves the API on port `7860`.
## One-command remote bootstrap
To have this local repo create and push to both a GitHub repository and a Hugging Face Space automatically:
```bash
export GITHUB_USERNAME=your_github_user
export GITHUB_TOKEN=your_github_pat
export HF_USERNAME=your_hf_user
export HF_TOKEN=your_hf_token
bash scripts/bootstrap_remotes.sh support-triage-openenv
```
## Baseline scores (heuristic, reproducible)
Generated with:
```bash
python inference.py --mode heuristic --output scores/inference_scores.json
```
- `easy_password_reset`: grader `1.0`, reward `1.0`
- `medium_billing_dispute`: grader `1.0`, reward `1.0`
- `hard_outage_incident`: grader `1.0`, reward `1.0`
- Overall average grader score: `1.0`
- Tracked reference artifact: `baseline_expected_scores.json`
## Pre-submission validator
Run full strict validation (all disqualification gates):
```bash
python pre_submission_validate.py --space-url https://your-space-name.hf.space
```
Local-only run while iterating (skips Docker daemon + remote space ping):
```bash
python pre_submission_validate.py --skip-docker --skip-space
```
Run organizer-provided script directly (integrated path):
```bash
bash scripts/pre_validation_script.sh https://your-space-name.hf.space .
```
Notes:
- `scripts/sample_inference_script.sh` is kept as organizer reference.
- Root `inference.py` is aligned to the required `[START]`, `[STEP]`, `[END]` line format.