--- title: Incident Triage Env colorFrom: gray colorTo: blue sdk: docker app_port: 7860 license: mit short_description: OpenEnv-compatible incident triage evaluation environment. --- # Production Incident Triage Environment This project is an OpenEnv-compatible evaluation environment for production incident response. An agent receives a typed incident observation and must perform one of three real-world triage tasks: classify severity, identify the most likely root cause, or recommend the best immediate action. The environment is built for the OpenEnv hackathon requirements: - real-world utility - three graded tasks with easy, medium, and hard difficulty - typed observation, action, reward, and state models - deterministic reward logic with partial credit - root-level `inference.py` - Docker-based deployment for Hugging Face Spaces ## Overview The dataset contains 108 incidents across three task families: | Task | Difficulty | Count | Objective | |---|---|---:|---| | `task1` | easy | 36 | Predict incident severity as `SEV1`, `SEV2`, or `SEV3` | | `task2` | medium | 36 | Predict the most likely root cause domain | | `task3` | hard | 36 | Predict the best immediate operational action | The incidents cover realistic production scenarios such as payment failures, queue backlogs, regional network loss, failed deploys, infrastructure saturation, third-party degradation, and failover decisions. ## API The FastAPI app exposes the following endpoints on port `7860`: - `GET /health` - `GET /metadata` - `GET /tasks` - `GET /grader` - `GET /schema` - `POST /reset` - `POST /step` - `GET /state` - `POST /mcp` ### Reset `POST /reset` starts a new single-step episode. Optional request body: ```json { "task_type": "task1", "ticket_id": "INC-001", "seed": 42 } ``` Response fields: - `observation` - `reward` - `done` - `info` ### Step `POST /step?session_id=` accepts an `IncidentAction` and returns a typed `StepResult`. Example request: ```json { "incident_id": "INC-001", "task_type": "task1", "severity": "SEV1" } ``` ### State `GET /state?session_id=` returns the current typed `IncidentState`. ## Web UI The project also serves a browser-facing UI from the same FastAPI app: - `/` shows the landing page with project overview and task summary - `/status` shows live health, schema, and task readiness information - `/playground` lets you manually reset a session and submit a step from the browser - `/docs` provides the generated FastAPI API reference ## Models The core models are defined in [models.py](./models.py): - `IncidentObservation` - `IncidentAction` - `IncidentReward` - `StepResult` - `IncidentState` - `ResetRequest` Validation rules: - `incident_id` must match the active ticket - `task_type` must match the active ticket - exactly one of `severity`, `root_cause`, or `action` must be populated - the populated field must match the expected field for the task ## Reward Logic Rewarding is deterministic and implemented in [graders.py](./graders.py). - `task1`: `0.99` exact, `0.5` adjacent severity, `0.01` far miss - `task2`: `0.99` exact, `0.5` related domain, `0.25` `UNKNOWN`, `0.01` wrong - `task3`: `0.99` exact, `0.4` safe `INVESTIGATE` fallback, `0.25` related action, `0.01` wrong This keeps grading reproducible while still giving partial-credit trajectory signal. ## Repository Layout ```text incident-triage-env/ - app.py - client.py - environment.py - graders.py - incidents.py - inference.py - models.py - openenv.yaml - pyproject.toml - requirements.txt - Dockerfile - README.md - server/ - tests/ ``` Runtime flow: 1. `incidents.py` stores the ticket dataset. 2. `environment.py` selects the episode and applies grading. 3. `app.py` exposes the API surface. 4. `inference.py` runs the baseline over the environment. 5. `graders.py` calculates deterministic reward and explanations. ## Local Setup Install dependencies: ```bash pip install -r requirements.txt ``` Optional OpenEnv CLI: ```bash pip install openenv-core ``` Optional environment variables for `inference.py`: ```bash export API_BASE_URL="https://your-openai-compatible-endpoint/v1" export MODEL_NAME="your-model-name" export HF_TOKEN="your-api-key" export ENV_URL="http://localhost:7860" ``` If no external environment server is reachable, `inference.py` falls back to an in-process FastAPI client. ## Run Locally Start the server: ```bash uvicorn app:app --host 0.0.0.0 --port 7860 ``` Run the baseline: ```bash python inference.py ``` Run the smoke tests: ```bash python -m unittest discover -s tests -v ``` ## Docker Build the image: ```bash docker build -t incident-triage-env . ``` Run the container: ```bash docker run --rm -p 7860:7860 incident-triage-env ``` Check health: ```bash curl http://localhost:7860/health ``` ## Baseline Logging `inference.py` prints the required structured output: ```text [START] task=INC-001 env=incident-triage-env model=deterministic-baseline [STEP] step=1 action=SEV1 reward=0.99 done=true error=null [END] success=true steps=1 score=0.99 rewards=0.99 ``` ## Baseline Scores Latest local deterministic baseline: | Metric | Value | |---|---:| | Episodes | 108 | | Average score | 0.9855 | | `task1` average | 0.9900 | | `task2` average | 0.9764 | | `task3` average | 0.9900 | This deterministic local run completed in about `1.34s` on the current machine. Results are written by default to `/tmp/outputs/baseline_scores.json`. ## Quick API Example Reset: ```bash curl -X POST http://localhost:7860/reset \ -H "Content-Type: application/json" \ -d '{"task_type":"task1","ticket_id":"INC-001"}' ``` Step: ```bash curl -X POST "http://localhost:7860/step?session_id=" \ -H "Content-Type: application/json" \ -d '{ "incident_id": "INC-001", "task_type": "task1", "severity": "SEV1" }' ``` ## Pre-Submission Checklist - `openenv validate . --json` passes - `openenv validate --url ` passes - `POST /reset` returns `200` - `POST /step` returns typed `reward`, `done`, and `info` - `GET /state` works for active sessions - `inference.py` runs from the repo root - `Dockerfile` serves the app on port `7860` - `openenv.yaml` matches the current API and dataset counts ## Notes - `models.py` is the source of truth for valid enum labels. - `graders.py` is the source of truth for scoring logic. - Reward values are kept strictly within `(0, 1)` to satisfy Phase 2 validator constraints. - The environment is intentionally single-step per episode and still exposes typed state for validation and debugging.