---
title: PostMortem Incident Triage OpenEnv
emoji: 🚨
colorFrom: red
colorTo: yellow
sdk: docker
app_port: 8000
pinned: false
license: bsd-3-clause
tags:
- openenv
- rl
- sre
- incident-response
base_path: /web
---
# PostMortem — Live Incident Triage Environment
An OpenEnv environment where an LLM agent plays an on-call SRE responding to a
live production incident. Real-world task, typed OpenEnv spec, deterministic
grader, three difficulty tiers, dense process-reward signal.
## The task
On each episode the agent receives an alert. It must:
1. **ack** the incident (accept ownership)
2. **query_logs / query_metrics / query_traces** on services to gather evidence
3. **scope** the blast radius
4. **hypothesize** the root cause
5. **mitigate** (propose a concrete remediation)
6. **write_status** (post a customer-facing update)
All six verbs are exposed as a single typed action:
```python
PostmortemAction(tool="query_logs", args={"service": "api"})
```
## Action space
| tool | args | effect |
|----------------|-----------------------------------|-------------------------------------|
| `ack` | `{}` | accept the incident (sub-goal 1) |
| `query_logs` | `{"service": str}` | return recent log lines |
| `query_metrics`| `{"service": str}` | return latest metrics |
| `query_traces` | `{"trace_id": str}` | return distributed trace spans |
| `scope` | `{"services": list[str]}` | declare blast radius (sub-goal 2) |
| `hypothesize` | `{"root_cause": str}` | declare root cause (sub-goal 3) |
| `mitigate` | `{"action": str}` | apply mitigation (sub-goal 4) |
| `write_status` | `{"text": str}` | publish update, ends episode (sub-goal 5) |
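A happy-path episode for the easy tier can be sketched as a fixed sequence of these verbs. This is purely illustrative: `PostmortemAction` is stubbed below as a plain dataclass rather than imported from the environment package, and the argument values are plausible guesses, not gold answers.

```python
from dataclasses import dataclass, field

@dataclass
class PostmortemAction:
    # Stand-in for the environment's typed action (illustrative stub only).
    tool: str
    args: dict = field(default_factory=dict)

# One plausible trajectory covering all five sub-goals in order.
episode = [
    PostmortemAction("ack"),
    PostmortemAction("query_logs", {"service": "api"}),
    PostmortemAction("query_metrics", {"service": "api"}),
    PostmortemAction("scope", {"services": ["api"]}),
    PostmortemAction("hypothesize", {"root_cause": "api pod OOM-killed"}),
    PostmortemAction("mitigate", {"action": "raise memory limit and restart"}),
    PostmortemAction("write_status", {"text": "Root cause found; mitigation applied."}),
]

tools = [a.tool for a in episode]
```

Note that the episode always opens with `ack` and closes with `write_status`, which terminates it.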
## Observation space
Key fields of `PostmortemObservation`:
- `task_id`, `task_description`, `available_services`, `available_trace_ids`
- `tool_result` — free text result of the last tool call
- `subgoals` — bool dict `{acked, scoped, hypothesized, mitigated, written}`
- `reward_so_far` — cumulative reward in [0, 1]
- `steps_remaining`, `last_error`
- `done`, `reward` (current step)
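A simple policy can read the `subgoals` flags to pick the next required verb. The sketch below uses the flag names listed above; the fixed dispatch order is an assumption about a reasonable baseline, not part of the spec.

```python
def next_tool(subgoals: dict) -> str:
    """Pick the next unmet sub-goal's verb from the observation's flag dict."""
    order = [
        ("acked", "ack"),
        ("scoped", "scope"),
        ("hypothesized", "hypothesize"),
        ("mitigated", "mitigate"),
        ("written", "write_status"),
    ]
    for flag, tool in order:
        if not subgoals.get(flag, False):
            return tool
    return "write_status"  # all flags set; the episode should already be done
```

In practice an agent would interleave `query_*` calls before `scope` and `hypothesize` to gather evidence, since those verbs carry no sub-goal flag of their own.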
## Tasks (3 difficulty tiers)
On each `reset()` the env rotates to the next scenario. Running three resets in
a row covers all three tiers in order.
| task_id | difficulty | incident |
|-------------------|------------|--------------------------------------------------------------|
| `easy_oom` | easy | `api` OOM-killed; cause directly visible in logs |
| `medium_cascade` | medium | checkout latency cascade; must correlate trace across 3 svcs |
| `hard_dns` | hard | 503s blamed on fresh `api` deploy, real cause is upstream DNS|
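The rotation amounts to a modular index over the tier list. This is a sketch of the behavior described above, not the environment's actual code:

```python
TASKS = ["easy_oom", "medium_cascade", "hard_dns"]

def task_for_reset(reset_count: int) -> str:
    """Scenario served by the nth reset() call (counting from 0)."""
    return TASKS[reset_count % len(TASKS)]
```

So resets 0, 1, 2 cover the three tiers in order, and reset 3 wraps back to `easy_oom`.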
## Reward design
The reward is a **5-stage process-reward ladder** in `[0, 1]`:
```
ack +0.10 (granted on first successful ack)
scope +0.20 × Jaccard(agent_services, gold_services)
hypothesize +0.20 × keyword_fraction(agent_text, gold_hypothesis_keywords)
mitigate +0.20 × keyword_fraction(agent_text, gold_mitigation_keywords)
write_status +0.30 × keyword_fraction(agent_text, gold_writeup_keywords)
```
Each sub-goal is awarded once. The grader is fully **deterministic** — no LLM
judge, no randomness. Partial credit gives a smooth gradient. The episode
terminates when `write_status` fires or after `MAX_STEPS = 12`.
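The ladder can be reproduced with two small scoring helpers. This is a deterministic sketch consistent with the weights above; the case-insensitive substring match in `keyword_fraction` is an assumption about the grader's matching rule.

```python
def jaccard(a: set, b: set) -> float:
    """Intersection-over-union of the agent's and gold service sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def keyword_fraction(text: str, keywords: list[str]) -> float:
    """Fraction of gold keywords that appear in the agent's free text."""
    t = text.lower()
    return sum(k.lower() in t for k in keywords) / len(keywords) if keywords else 0.0

def stage_rewards(agent: dict, gold: dict) -> dict:
    """Per-stage partial credit; a perfect run sums to 1.0."""
    return {
        "ack": 0.10,
        "scope": 0.20 * jaccard(set(agent["services"]), set(gold["services"])),
        "hypothesize": 0.20 * keyword_fraction(agent["hypothesis"], gold["hypothesis_keywords"]),
        "mitigate": 0.20 * keyword_fraction(agent["mitigation"], gold["mitigation_keywords"]),
        "write_status": 0.30 * keyword_fraction(agent["writeup"], gold["writeup_keywords"]),
    }
```

For example, scoping `{api}` when the gold blast radius is `{api, db}` earns `0.20 × 0.5 = 0.10` for that stage.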
## Setup
```bash
pip install openenv-core
openenv build . # build Docker image
python inference.py # run baseline (3 scenarios)
```
### Required environment variables
| var | default | notes |
|----------------|------------------------------------------------|-------|
| `HF_TOKEN` | (required) | HuggingFace token, also used as the OpenAI client API key |
| `API_BASE_URL` | `https://router.huggingface.co/v1` | any OpenAI-compatible endpoint |
| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | any chat model |
| `IMAGE_NAME` | `postmortem_env-env:latest` | docker tag of the env image |
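A client might assemble its configuration from these variables as follows. The defaults mirror the table; reading them via `os.environ` is an assumption about how `inference.py` works, not a description of it.

```python
import os

def load_config() -> dict:
    """Resolve runtime settings from the environment, with documented defaults."""
    return {
        "api_key": os.environ["HF_TOKEN"],  # required; raises KeyError if unset
        "base_url": os.environ.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        "model": os.environ.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
        "image": os.environ.get("IMAGE_NAME", "postmortem_env-env:latest"),
    }
```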
## Baseline reproduction
```bash
export HF_TOKEN=hf_...
export IMAGE_NAME=postmortem_env-env:latest
python inference.py
```
Emits strict `[START] / [STEP] / [END]` lines, one `[END]` per task.
## Resource budget
Well within the hackathon limits of **2 vCPU / 8 GB RAM**, and completes the
3-task sweep in **well under 20 minutes** (dominated by LLM latency, ≤ 36 LLM
calls total).
## License
BSD-3-Clause (matches OpenEnv core).