---
title: Support Triage OpenEnv
emoji: "📨"
colorFrom: blue
colorTo: teal
sdk: docker
app_port: 7860
tags:
- openenv
- reinforcement-learning
- customer-support
license: mit
---
# Support Triage OpenEnv
A complete, real-world OpenEnv environment for training/evaluating agents on **customer support ticket triage**. The environment simulates what support teams actually do: read inbox tickets, classify urgency/category, draft safe responses, and resolve the right ticket.
## Why this environment
Most agent benchmarks under-model production support workflows. This environment focuses on practical support operations with:
- Multi-ticket inbox context selection
- Policy-compliant communication
- Priority + escalation decisions
- Deterministic graders and dense reward shaping
## OpenEnv API compliance
The environment exposes:
- `reset(task_id?: str) -> Observation`
- `step(action: Action) -> (Observation, Reward, done, info)`
- `state() -> dict`
Typed Pydantic models:
- `Observation`, `Action`, `Reward`: [`src/support_triage_openenv/models.py`](src/support_triage_openenv/models.py)
Metadata:
- `openenv.yaml`
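The `reset`/`step` contract above can be sketched with a toy in-memory stub. This is not the environment's implementation (that lives in `src/support_triage_openenv/env.py`); the field names follow this README, and the reward value is illustrative.

```python
# Toy stub illustrating the OpenEnv reset()/step() shapes described above.
# A real client would go through the HTTP endpoints or the typed Pydantic
# models; this sketch only demonstrates the call/return contract.

def reset(task_id=None):
    """Return an initial observation (field names follow the README)."""
    return {
        "task_id": task_id or "easy_password_reset",
        "objective": "Resolve the account lockout ticket.",
        "step_count": 0,
        "max_steps": 10,
        "inbox": [{"ticket_id": "T1", "subject": "Locked out", "read": False}],
    }

def step(obs, action):
    """Return (observation, reward, done, info) per the OpenEnv API."""
    obs = dict(obs, step_count=obs["step_count"] + 1)
    done = obs["step_count"] >= obs["max_steps"]
    reward = 0.1 if action.get("action_type") == "read_ticket" else 0.0
    return obs, reward, done, {"note": "stub"}

obs = reset("easy_password_reset")
obs, reward, done, info = step(obs, {"action_type": "read_ticket", "ticket_id": "T1"})
print(obs["step_count"], reward, done)
```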
## Action space
`Action` model fields:
- `action_type`: one of `read_ticket | classify_ticket | draft_reply | resolve_ticket`
- `ticket_id`: required for `read_ticket`, `classify_ticket`, `resolve_ticket`
- `priority`: optional enum `low | medium | high | urgent`
- `category`: optional enum `account | billing | technical | abuse | general`
- `needs_escalation`: optional bool
- `message`: text for `draft_reply`
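The per-`action_type` field rules above can be encoded as a small validity check. The real validation is done by the Pydantic `Action` model; this hypothetical helper just restates the README's rules in code.

```python
# Hypothetical validity check mirroring the Action field rules above.
REQUIRES_TICKET_ID = {"read_ticket", "classify_ticket", "resolve_ticket"}
PRIORITIES = {"low", "medium", "high", "urgent"}
CATEGORIES = {"account", "billing", "technical", "abuse", "general"}

def is_valid_action(action: dict) -> bool:
    atype = action.get("action_type")
    if atype not in REQUIRES_TICKET_ID | {"draft_reply"}:
        return False
    if atype in REQUIRES_TICKET_ID and not action.get("ticket_id"):
        return False  # these action types require a ticket_id
    if atype == "draft_reply" and not action.get("message"):
        return False  # draft_reply carries the reply text in `message`
    if action.get("priority") is not None and action["priority"] not in PRIORITIES:
        return False
    if action.get("category") is not None and action["category"] not in CATEGORIES:
        return False
    return True

print(is_valid_action({"action_type": "classify_ticket", "ticket_id": "T1",
                       "priority": "urgent", "category": "technical",
                       "needs_escalation": True}))  # True
print(is_valid_action({"action_type": "resolve_ticket"}))  # False: missing ticket_id
```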
## Observation space
`Observation` includes:
- `task_id`, `objective`, `step_count`, `max_steps`
- `inbox`: ticket metadata list (`ticket_id`, subject, tier, age, read flag)
- `current_ticket_content`: only visible after reading selected ticket
- `latest_system_note`: feedback from last step
- `score_hint`: partial grader components (`read`, `classify`, `reply`, `resolve`)
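An observation with the fields above might look like the following. All values are illustrative, and combining `score_hint` components with an equal-weight mean is an assumption for this sketch, not the grader's actual formula.

```python
# Illustrative observation matching the field list above (values made up).
obs = {
    "task_id": "medium_billing_dispute",
    "objective": "Resolve the duplicate billing ticket.",
    "step_count": 3,
    "max_steps": 12,
    "inbox": [
        {"ticket_id": "T7", "subject": "Charged twice", "tier": "pro",
         "age": "2h", "read": True},
    ],
    "current_ticket_content": "I was billed twice for my subscription...",
    "latest_system_note": "Ticket T7 classified.",
    "score_hint": {"read": 1.0, "classify": 1.0, "reply": 0.0, "resolve": 0.0},
}

# score_hint exposes partial grader components; an equal-weight mean is
# only an assumed way to summarize them.
partial = sum(obs["score_hint"].values()) / len(obs["score_hint"])
print(partial)  # 0.5
```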
## Tasks and difficulty
1. `easy_password_reset` (Easy)
- Correctly process account lockout and send secure reset guidance.
2. `medium_billing_dispute` (Medium)
- Investigate duplicate billing with context ticket and provide policy-compliant refund timeline.
3. `hard_outage_incident` (Hard)
- Handle a high-stakes outage report requiring multi-ticket context, urgent escalation, and careful incident messaging.
Each task is graded deterministically by `support_triage_openenv.graders.grade_task`, which returns a score from `0.0` to `1.0`.
## Reward design
Reward is shaped and meaningful across the trajectory:
- Positive dense signal from partial grader progress (read/context, classification fields, reply quality, resolve correctness)
- Penalties for invalid actions, repeated loops, and malformed steps
- Final step guarantees score alignment with deterministic grader output
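The shaping described above can be sketched as a function of grader-component progress minus penalties. The weights and penalty constants here are illustrative assumptions, not the environment's actual values.

```python
def shaped_reward(prev_hint, new_hint, invalid=False, repeated=False):
    """Toy dense reward: pay out partial-grader progress since the last
    step and penalize invalid or looping actions (constants illustrative)."""
    progress = sum(new_hint[k] - prev_hint[k] for k in new_hint)
    penalty = (0.2 if invalid else 0.0) + (0.1 if repeated else 0.0)
    return progress - penalty

prev = {"read": 0.0, "classify": 0.0, "reply": 0.0, "resolve": 0.0}
new = {"read": 1.0, "classify": 0.0, "reply": 0.0, "resolve": 0.0}
print(shaped_reward(prev, new))               # 1.0: reading paid out once
print(shaped_reward(new, new, invalid=True))  # -0.2: no progress, penalty only
```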
## Project structure
- `src/support_triage_openenv/env.py` - environment implementation
- `src/support_triage_openenv/models.py` - typed OpenEnv models
- `src/support_triage_openenv/tasks.py` - task specs (easy/medium/hard)
- `src/support_triage_openenv/graders.py` - deterministic grader logic
- `scripts/run_baseline.py` - OpenAI baseline inference runner
- `scripts/validate_env.py` - tests + optional `openenv validate`
- `app.py` - FastAPI app for HF Space runtime
- `Dockerfile` - containerized deployment
## Setup
```bash
cd meta_hackathon  # path to your local clone
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
## Run tests
```bash
python -m pytest -q
```
## Run baseline
OpenAI model baseline:
```bash
export API_BASE_URL=https://your-openai-compatible-endpoint/v1
export MODEL_NAME=your-model-id
export HF_TOKEN=your-api-key
python inference.py --mode openai --output scores/inference_scores.json
```
Deterministic heuristic baseline:
```bash
python inference.py --mode heuristic --output scores/inference_scores.json
```
Writes a JSON report to `scores/inference_scores.json` and emits structured stdout logs tagged `[START]`, `[STEP]`, `[END]`.
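A quick way to sanity-check a run's stdout is to count the tagged lines. Only the `[START]`, `[STEP]`, `[END]` tags are documented here; the rest of each sample line is an assumed format.

```python
# Count episode steps in the structured stdout log. Only the tag prefixes
# are documented; the payload after each tag below is illustrative.
log = """\
[START] task=easy_password_reset
[STEP] 1 action=read_ticket reward=0.10
[STEP] 2 action=classify_ticket reward=0.20
[END] task=easy_password_reset score=1.0
"""

steps = [line for line in log.splitlines() if line.startswith("[STEP]")]
finished = any(line.startswith("[END]") for line in log.splitlines())
print(len(steps), finished)  # 2 True
```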
## Run API locally
```bash
uvicorn app:app --host 0.0.0.0 --port 7860
```
Endpoints:
- `GET /health`
- `POST /reset`
- `POST /step`
- `GET /state`
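With the server running locally, the endpoints can be exercised with `curl`. The request payload fields are illustrative; the exact request schemas come from the models in `src/support_triage_openenv/models.py`.

```shell
# Liveness check
curl -s http://localhost:7860/health

# Start an episode (task_id field assumed from the reset() signature)
curl -s -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "easy_password_reset"}'

# Take one action (payload shape illustrative)
curl -s -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "read_ticket", "ticket_id": "T1"}'

# Inspect current environment state
curl -s http://localhost:7860/state
```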
## Docker
```bash
docker build -t support-triage-openenv .
docker run --rm -p 7860:7860 support-triage-openenv
```
## Hugging Face Space deployment
- Create a **Docker Space**.
- Push this repository to the Space.
- Keep `README.md` frontmatter tags including `openenv`.
- Space serves the API on port `7860`.
## One-command remote bootstrap
To have this local repo create and push to both a GitHub repository and a Hugging Face Space automatically:
```bash
export GITHUB_USERNAME=your_github_user
export GITHUB_TOKEN=your_github_pat
export HF_USERNAME=your_hf_user
export HF_TOKEN=your_hf_token
bash scripts/bootstrap_remotes.sh support-triage-openenv
```
## Baseline scores (heuristic, reproducible)
Generated with:
```bash
python inference.py --mode heuristic --output scores/inference_scores.json
```
- `easy_password_reset`: grader `1.0`, reward `1.0`
- `medium_billing_dispute`: grader `1.0`, reward `1.0`
- `hard_outage_incident`: grader `1.0`, reward `1.0`
- Overall average grader score: `1.0`
- Tracked reference artifact: `baseline_expected_scores.json`
## Pre-submission validator
Run full strict validation (all disqualification gates):
```bash
python pre_submission_validate.py --space-url https://your-space-name.hf.space
```
Local-only run while iterating (skips Docker daemon + remote space ping):
```bash
python pre_submission_validate.py --skip-docker --skip-space
```
Run organizer-provided script directly (integrated path):
```bash
bash scripts/pre_validation_script.sh https://your-space-name.hf.space .
```
Notes:
- `scripts/sample_inference_script.sh` is kept as organizer reference.
- Root `inference.py` is aligned to the required `[START]`, `[STEP]`, `[END]` line format.