---
title: Incident Triage Env
colorFrom: gray
colorTo: blue
sdk: docker
app_port: 7860
license: mit
short_description: OpenEnv-compatible incident triage evaluation environment.
---
# Production Incident Triage Environment
This project is an OpenEnv-compatible evaluation environment for production incident response. An agent receives a typed incident observation and must perform one of three real-world triage tasks: classify severity, identify the most likely root cause, or recommend the best immediate action.
The environment is built to meet the OpenEnv hackathon requirements:
- real-world utility
- three graded tasks with easy, medium, and hard difficulty
- typed observation, action, reward, and state models
- deterministic reward logic with partial credit
- root-level `inference.py`
- Docker-based deployment for Hugging Face Spaces
## Overview
The dataset contains 108 incidents across three task families:
| Task | Difficulty | Count | Objective |
|---|---|---:|---|
| `task1` | easy | 36 | Predict incident severity as `SEV1`, `SEV2`, or `SEV3` |
| `task2` | medium | 36 | Predict the most likely root cause domain |
| `task3` | hard | 36 | Predict the best immediate operational action |
The incidents cover realistic production scenarios such as payment failures, queue backlogs, regional network loss, failed deploys, infrastructure saturation, third-party degradation, and failover decisions.
## API
The FastAPI app exposes the following endpoints on port `7860`:
- `GET /health`
- `GET /metadata`
- `GET /tasks`
- `GET /grader`
- `GET /schema`
- `POST /reset`
- `POST /step`
- `GET /state`
- `POST /mcp`
### Reset
`POST /reset` starts a new single-step episode.
Optional request body:
```json
{
"task_type": "task1",
"ticket_id": "INC-001",
"seed": 42
}
```
Response fields:
- `observation`
- `reward`
- `done`
- `info`
### Step
`POST /step?session_id=<id>` accepts an `IncidentAction` and returns a typed `StepResult`.
Example request:
```json
{
"incident_id": "INC-001",
"task_type": "task1",
"severity": "SEV1"
}
```
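Because each task family expects a different answer field, a small helper can assemble a valid step payload. This is an illustrative sketch, not part of the repository; the field mapping follows the task descriptions above.

```python
# Hypothetical helper (not in this repo) that builds a step payload.
# The mapping mirrors this README: task1 -> severity,
# task2 -> root_cause, task3 -> action.
TASK_FIELDS = {"task1": "severity", "task2": "root_cause", "task3": "action"}

def build_action(incident_id: str, task_type: str, answer: str) -> dict:
    if task_type not in TASK_FIELDS:
        raise ValueError(f"unknown task_type: {task_type}")
    return {
        "incident_id": incident_id,
        "task_type": task_type,
        TASK_FIELDS[task_type]: answer,
    }
```

For example, `build_action("INC-001", "task1", "SEV1")` produces exactly the request body shown above.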
### State
`GET /state?session_id=<id>` returns the current typed `IncidentState`.
## Web UI
The project also serves a browser-facing UI from the same FastAPI app:
- `/` shows the landing page with project overview and task summary
- `/status` shows live health, schema, and task readiness information
- `/playground` lets you manually reset a session and submit a step from the browser
- `/docs` provides the generated FastAPI API reference
## Models
The core models are defined in [models.py](./models.py):
- `IncidentObservation`
- `IncidentAction`
- `IncidentReward`
- `StepResult`
- `IncidentState`
- `ResetRequest`
Validation rules:
- `incident_id` must match the active ticket
- `task_type` must match the active ticket
- exactly one of `severity`, `root_cause`, or `action` must be populated
- the populated field must match the expected field for the task
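The "exactly one populated field" rule can be expressed as a small check like the following sketch. This is illustrative only; the actual enforcement lives in `models.py`, likely as Pydantic validators.

```python
# Illustrative check for the exactly-one-field rule; the helper name
# and dict-based input are assumptions, not the repo's real API.
ANSWER_FIELDS = ("severity", "root_cause", "action")

def populated_field(action: dict) -> str:
    """Return the single populated answer field, or raise ValueError."""
    populated = [f for f in ANSWER_FIELDS if action.get(f) is not None]
    if len(populated) != 1:
        raise ValueError(
            f"exactly one of {ANSWER_FIELDS} must be set, got {populated}"
        )
    return populated[0]
```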
## Reward Logic
Rewarding is deterministic and implemented in [graders.py](./graders.py).
- `task1`: `0.99` exact, `0.5` adjacent severity, `0.01` far miss
- `task2`: `0.99` exact, `0.5` related domain, `0.25` `UNKNOWN`, `0.01` wrong
- `task3`: `0.99` exact, `0.4` safe `INVESTIGATE` fallback, `0.25` related action, `0.01` wrong
This keeps grading reproducible while still giving partial-credit trajectory signal.
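The `task1` tier structure can be sketched as an ordinal-distance check. This is a simplified illustration of the rules listed above; the authoritative logic is in `graders.py`.

```python
# Illustrative task1 grader: reward depends on how far the predicted
# severity is from the expected one on the SEV1..SEV3 scale.
SEVERITIES = ["SEV1", "SEV2", "SEV3"]

def grade_task1(predicted: str, expected: str) -> float:
    distance = abs(SEVERITIES.index(predicted) - SEVERITIES.index(expected))
    if distance == 0:
        return 0.99  # exact match
    if distance == 1:
        return 0.5   # adjacent severity
    return 0.01      # far miss
```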
## Repository Layout
```text
incident-triage-env/
- app.py
- client.py
- environment.py
- graders.py
- incidents.py
- inference.py
- models.py
- openenv.yaml
- pyproject.toml
- requirements.txt
- Dockerfile
- README.md
- server/
- tests/
```
Runtime flow:
1. `incidents.py` stores the ticket dataset.
2. `environment.py` selects the episode and applies grading.
3. `app.py` exposes the API surface.
4. `inference.py` runs the baseline over the environment.
5. `graders.py` calculates deterministic reward and explanations.
## Local Setup
Install dependencies:
```bash
pip install -r requirements.txt
```
Optional OpenEnv CLI:
```bash
pip install openenv-core
```
Optional environment variables for `inference.py`:
```bash
export API_BASE_URL="https://your-openai-compatible-endpoint/v1"
export MODEL_NAME="your-model-name"
export HF_TOKEN="your-api-key"
export ENV_URL="http://localhost:7860"
```
If no external environment server is reachable, `inference.py` falls back to an in-process FastAPI client.
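The environment-variable lookup with a local default might look like this minimal sketch (the function name is an assumption; only the `ENV_URL` variable and the default port come from this README):

```python
import os

# Hypothetical sketch of how inference.py could pick its target server:
# honor ENV_URL when set, otherwise fall back to the local port used
# throughout this README.
def resolve_env_url(default: str = "http://localhost:7860") -> str:
    return os.environ.get("ENV_URL", default)
```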
## Run Locally
Start the server:
```bash
uvicorn app:app --host 0.0.0.0 --port 7860
```
Run the baseline:
```bash
python inference.py
```
Run the smoke tests:
```bash
python -m unittest discover -s tests -v
```
## Docker
Build the image:
```bash
docker build -t incident-triage-env .
```
Run the container:
```bash
docker run --rm -p 7860:7860 incident-triage-env
```
Check health:
```bash
curl http://localhost:7860/health
```
## Baseline Logging
`inference.py` prints the required structured output:
```text
[START] task=INC-001 env=incident-triage-env model=deterministic-baseline
[STEP] step=1 action=SEV1 reward=0.99 done=true error=null
[END] success=true steps=1 score=0.99 rewards=0.99
```
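Because the log format is `key=value` pairs after a bracketed tag, it is straightforward to parse programmatically. This parser is a hypothetical convenience, not something the repo ships:

```python
# Illustrative parser for the "[STEP] key=value ..." lines shown above.
# Values are returned as strings; callers convert types as needed.
def parse_step_line(line: str) -> dict:
    if not line.startswith("[STEP] "):
        raise ValueError("not a [STEP] line")
    body = line.removeprefix("[STEP] ")
    return dict(pair.split("=", 1) for pair in body.split())
```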
## Baseline Scores
Latest local deterministic baseline:
| Metric | Value |
|---|---:|
| Episodes | 108 |
| Average score | 0.9855 |
| `task1` average | 0.9900 |
| `task2` average | 0.9764 |
| `task3` average | 0.9900 |
The full deterministic run completed locally in about `1.34s`.
Results are written by default to `/tmp/outputs/baseline_scores.json`.
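As a sanity check on the table: each task contributes 36 of the 108 episodes, so the overall average is the unweighted mean of the three per-task averages.

```python
# Verify the overall average from the per-task averages (equal weights,
# since every task has the same episode count).
task_averages = [0.9900, 0.9764, 0.9900]
overall = sum(task_averages) / len(task_averages)
assert round(overall, 4) == 0.9855
```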
## Quick API Example
Reset:
```bash
curl -X POST http://localhost:7860/reset \
-H "Content-Type: application/json" \
-d '{"task_type":"task1","ticket_id":"INC-001"}'
```
Step:
```bash
curl -X POST "http://localhost:7860/step?session_id=<session-id>" \
-H "Content-Type: application/json" \
-d '{
"incident_id": "INC-001",
"task_type": "task1",
"severity": "SEV1"
}'
```
## Pre-Submission Checklist
- `openenv validate . --json` passes
- `openenv validate --url <space-url>` passes
- `POST /reset` returns `200`
- `POST /step` returns typed `reward`, `done`, and `info`
- `GET /state` works for active sessions
- `inference.py` runs from the repo root
- `Dockerfile` serves the app on port `7860`
- `openenv.yaml` matches the current API and dataset counts
## Notes
- `models.py` is the source of truth for valid enum labels.
- `graders.py` is the source of truth for scoring logic.
- Reward values are kept strictly within `(0, 1)` to satisfy Phase 2 validator constraints.
- The environment is intentionally single-step per episode and still exposes typed state for validation and debugging.
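The open-interval constraint in the note above amounts to clamping any raw score into `[0.01, 0.99]`. The following is a minimal sketch of that idea; the actual mechanism in `graders.py` may differ.

```python
# Illustrative clamp keeping rewards strictly inside (0, 1):
# 0.0 maps to 0.01 and 1.0 maps to 0.99, matching the tier values
# used throughout this README.
def clamp_reward(value: float, lo: float = 0.01, hi: float = 0.99) -> float:
    return max(lo, min(hi, value))
```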