---
title: Incident Triage Env
colorFrom: gray
colorTo: blue
sdk: docker
app_port: 7860
license: mit
short_description: OpenEnv-compatible incident triage evaluation environment.
---
# Production Incident Triage Environment
This project is an OpenEnv-compatible evaluation environment for production incident response. An agent receives a typed incident observation and must perform one of three real-world triage tasks: classify severity, identify the most likely root cause, or recommend the best immediate action.
The environment is built to meet the OpenEnv hackathon requirements:
- real-world utility
- three graded tasks with easy, medium, and hard difficulty
- typed observation, action, reward, and state models
- deterministic reward logic with partial credit
- root-level `inference.py`
- Docker-based deployment for Hugging Face Spaces
## Overview
The dataset contains 108 incidents across three task families:
| Task | Difficulty | Count | Objective |
|---|---|---:|---|
| `task1` | easy | 36 | Predict incident severity as `SEV1`, `SEV2`, or `SEV3` |
| `task2` | medium | 36 | Predict the most likely root cause domain |
| `task3` | hard | 36 | Predict the best immediate operational action |
The incidents cover realistic production scenarios such as payment failures, queue backlogs, regional network loss, failed deploys, infrastructure saturation, third-party degradation, and failover decisions.
## API
The FastAPI app exposes the following endpoints on port `7860`:
- `GET /health`
- `GET /metadata`
- `GET /tasks`
- `GET /grader`
- `GET /schema`
- `POST /reset`
- `POST /step`
- `GET /state`
- `POST /mcp`
### Reset
`POST /reset` starts a new single-step episode.
Optional request body:
```json
{
  "task_type": "task1",
  "ticket_id": "INC-001",
  "seed": 42
}
```
Response fields:
- `observation`
- `reward`
- `done`
- `info`
### Step
`POST /step?session_id=<id>` accepts an `IncidentAction` and returns a typed `StepResult`.
Example request:
```json
{
  "incident_id": "INC-001",
  "task_type": "task1",
  "severity": "SEV1"
}
```
### State
`GET /state?session_id=<id>` returns the current typed `IncidentState`.
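Putting the three endpoints together, a minimal round trip looks like the sketch below. It assumes the `requests` library and that the session id is returned inside the reset response's `info` payload; check the actual response if your client can't find it there.

```python
# Minimal reset -> step -> state round trip (sketch).
# Assumption: the session id is carried in the reset response's `info` field.
import requests

BASE = "http://localhost:7860"

# Start a single-step episode for task1 on a known ticket.
reset = requests.post(
    f"{BASE}/reset",
    json={"task_type": "task1", "ticket_id": "INC-001"},
).json()
session_id = reset["info"]["session_id"]  # assumed location of the id

# Submit a severity prediction for the active ticket.
step = requests.post(
    f"{BASE}/step",
    params={"session_id": session_id},
    json={"incident_id": "INC-001", "task_type": "task1", "severity": "SEV1"},
).json()
print(step["reward"], step["done"])

# Inspect the typed state for the same session.
state = requests.get(f"{BASE}/state", params={"session_id": session_id}).json()
print(state)
```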
## Web UI
The project also serves a browser-facing UI from the same FastAPI app:
- `/` shows the landing page with project overview and task summary
- `/status` shows live health, schema, and task readiness information
- `/playground` lets you manually reset a session and submit a step from the browser
- `/docs` provides the generated FastAPI API reference
## Models
The core models are defined in [models.py](./models.py):
- `IncidentObservation`
- `IncidentAction`
- `IncidentReward`
- `StepResult`
- `IncidentState`
- `ResetRequest`
Validation rules:
- `incident_id` must match the active ticket
- `task_type` must match the active ticket
- exactly one of `severity`, `root_cause`, or `action` must be populated
- the populated field must match the expected field for the task
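For orientation, the exactly-one-field rule could be expressed as in the sketch below, assuming Pydantic v2; the authoritative definitions live in [models.py](./models.py).

```python
# Sketch of the exactly-one-answer rule, assuming Pydantic v2.
# The real IncidentAction in models.py is authoritative.
from typing import Optional

from pydantic import BaseModel, model_validator

class IncidentAction(BaseModel):
    incident_id: str
    task_type: str  # "task1" | "task2" | "task3"
    severity: Optional[str] = None
    root_cause: Optional[str] = None
    action: Optional[str] = None

    @model_validator(mode="after")
    def exactly_one_answer(self) -> "IncidentAction":
        filled = [name for name in ("severity", "root_cause", "action")
                  if getattr(self, name) is not None]
        if len(filled) != 1:
            raise ValueError(
                "exactly one of severity, root_cause, action must be set"
            )
        return self
```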
## Reward Logic
Reward computation is deterministic and implemented in [graders.py](./graders.py).
- `task1`: `0.99` exact, `0.5` adjacent severity, `0.01` far miss
- `task2`: `0.99` exact, `0.5` related domain, `0.25` `UNKNOWN`, `0.01` wrong
- `task3`: `0.99` exact, `0.4` safe `INVESTIGATE` fallback, `0.25` related action, `0.01` wrong
This keeps grading reproducible while still providing a partial-credit signal for near-miss answers.
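As an illustration, the `task1` adjacency rule maps onto a few lines of Python; [graders.py](./graders.py) remains the source of truth for the real logic.

```python
# Illustrative task1 grader: exact match, adjacent severity, or far miss.
# graders.py is the authoritative implementation.
SEVERITY_ORDER = ["SEV1", "SEV2", "SEV3"]

def grade_task1(predicted: str, expected: str) -> float:
    distance = abs(SEVERITY_ORDER.index(predicted) - SEVERITY_ORDER.index(expected))
    if distance == 0:
        return 0.99  # exact match
    if distance == 1:
        return 0.5   # adjacent severity (e.g. SEV2 predicted as SEV1)
    return 0.01      # far miss (SEV1 vs SEV3)
```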
## Repository Layout
```text
incident-triage-env/
- app.py
- client.py
- environment.py
- graders.py
- incidents.py
- inference.py
- models.py
- openenv.yaml
- pyproject.toml
- requirements.txt
- Dockerfile
- README.md
- server/
- tests/
```
Runtime flow:
1. `incidents.py` stores the ticket dataset.
2. `environment.py` selects the episode and applies grading.
3. `app.py` exposes the API surface.
4. `inference.py` runs the baseline over the environment.
5. `graders.py` calculates deterministic reward and explanations.
## Local Setup
Install dependencies:
```bash
pip install -r requirements.txt
```
Optional OpenEnv CLI:
```bash
pip install openenv-core
```
Optional environment variables for `inference.py`:
```bash
export API_BASE_URL="https://your-openai-compatible-endpoint/v1"
export MODEL_NAME="your-model-name"
export HF_TOKEN="your-api-key"
export ENV_URL="http://localhost:7860"
```
If no external environment server is reachable, `inference.py` falls back to an in-process FastAPI client.
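A sketch of how that configuration and fallback might be wired, assuming a `/health` probe decides between the external server and the in-process client; the real logic lives in `inference.py`.

```python
# Assumed configuration shape for inference.py; the script itself is authoritative.
import os

import requests

ENV_URL = os.environ.get("ENV_URL", "http://localhost:7860")
MODEL_NAME = os.environ.get("MODEL_NAME", "deterministic-baseline")

def server_reachable(url: str) -> bool:
    """Probe /health; failure triggers the in-process fallback."""
    try:
        return requests.get(f"{url}/health", timeout=2).status_code == 200
    except requests.RequestException:
        return False

if not server_reachable(ENV_URL):
    # Fall back to an in-process client against the FastAPI app object.
    from fastapi.testclient import TestClient

    from app import app

    client = TestClient(app)
```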
## Run Locally
Start the server:
```bash
uvicorn app:app --host 0.0.0.0 --port 7860
```
Run the baseline:
```bash
python inference.py
```
Run the smoke tests:
```bash
python -m unittest discover -s tests -v
```
## Docker
Build the image:
```bash
docker build -t incident-triage-env .
```
Run the container:
```bash
docker run --rm -p 7860:7860 incident-triage-env
```
Check health:
```bash
curl http://localhost:7860/health
```
## Baseline Logging
`inference.py` prints the required structured output:
```text
[START] task=INC-001 env=incident-triage-env model=deterministic-baseline
[STEP] step=1 action=SEV1 reward=0.99 done=true error=null
[END] success=true steps=1 score=0.99 rewards=0.99
```
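A minimal emitter that produces lines in this shape could look like the following; the exact formatting in `inference.py` is authoritative.

```python
# Sketch of the structured log emitter; inference.py defines the real format.
def log_start(task_id: str, model: str) -> None:
    print(f"[START] task={task_id} env=incident-triage-env model={model}")

def log_step(step: int, action: str, reward: float, done: bool,
             error: str | None = None) -> None:
    print(f"[STEP] step={step} action={action} reward={reward} "
          f"done={str(done).lower()} error={error or 'null'}")

def log_end(success: bool, steps: int, score: float, rewards: float) -> None:
    print(f"[END] success={str(success).lower()} steps={steps} "
          f"score={score} rewards={rewards}")
```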
## Baseline Scores
Latest local deterministic baseline:
| Metric | Value |
|---|---:|
| Episodes | 108 |
| Average score | 0.9855 |
| `task1` average | 0.9900 |
| `task2` average | 0.9764 |
| `task3` average | 0.9900 |
The full deterministic run completed in about `1.34s` on a local development machine.
Results are written by default to `/tmp/outputs/baseline_scores.json`.
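To inspect the saved results, something like the snippet below works; the internal schema of the JSON file is not documented here, so treat the structure as an assumption.

```python
# Pretty-print the saved baseline results; the file's schema is an assumption.
import json
from pathlib import Path

path = Path("/tmp/outputs/baseline_scores.json")
if path.exists():
    print(json.dumps(json.loads(path.read_text()), indent=2))
else:
    print(f"no results at {path}; run `python inference.py` first")
```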
## Quick API Example
Reset:
```bash
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_type":"task1","ticket_id":"INC-001"}'
```
Step (replace `<session-id>` with the id of your active session):
```bash
curl -X POST "http://localhost:7860/step?session_id=<session-id>" \
-H "Content-Type: application/json" \
-d '{
"incident_id": "INC-001",
"task_type": "task1",
"severity": "SEV1"
}'
```
## Pre-Submission Checklist
- `openenv validate . --json` passes
- `openenv validate --url <space-url>` passes
- `POST /reset` returns `200`
- `POST /step` returns typed `reward`, `done`, and `info`
- `GET /state` works for active sessions
- `inference.py` runs from the repo root
- `Dockerfile` serves the app on port `7860`
- `openenv.yaml` matches the current API and dataset counts
## Notes
- `models.py` is the source of truth for valid enum labels.
- `graders.py` is the source of truth for scoring logic.
- Reward values are kept strictly within `(0, 1)` to satisfy Phase 2 validator constraints.
- The environment is intentionally single-step per episode and still exposes typed state for validation and debugging.