title: OpenEnv-Sentinel
emoji: 🚨
colorFrom: red
colorTo: yellow
sdk: docker
pinned: false
app_port: 8000
tags:
  - openenv
base_path: /web

OpenEnv-Sentinel: SRE Incident Triage Environment

An OpenEnv environment that simulates SRE incident triage. An AI agent receives a degraded system state and must use diagnostic tools to identify the root cause and recommend a fix.

Quick Start

pip install -e .
uvicorn server.app:app --host 0.0.0.0 --port 8000

Or with Docker:

docker build -t sentinel-env -f server/Dockerfile .
docker run -p 8000:8000 sentinel-env

Action Space

class SentinelAction(Action):
    tool_name: str    # Tool to invoke
    parameters: dict  # Tool-specific parameters

Available Tools

| Tool | Parameters | Description |
| --- | --- | --- |
| query_logs | service, query, severity | Search service logs |
| query_metrics | service, metric | Get time-series metrics (cpu/memory/error_rate/latency/connections) |
| get_service_status | service | Service health, uptime, errors |
| get_dependency_map | service (optional) | Service dependency graph |
| consult_runbook | topic | SOP/runbook lookup |
| check_recent_changes | service (optional) | Recent deployments/config changes |
| submit_resolution | root_cause, affected_service, recommendation | Submit final answer (ends episode) |
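
As an illustration of the action format, the sketch below builds a dictionary shaped like SentinelAction and checks its parameters against the tool table. The helper name build_action and the rule that any parameter not marked "(optional)" is required are assumptions made for this example; the server may validate differently.

```python
# Parameters per tool as (required, optional), transcribed from the table above.
# Assumption: parameters not marked "(optional)" are required.
TOOL_PARAMS = {
    "query_logs": ({"service", "query", "severity"}, set()),
    "query_metrics": ({"service", "metric"}, set()),
    "get_service_status": ({"service"}, set()),
    "get_dependency_map": (set(), {"service"}),
    "consult_runbook": ({"topic"}, set()),
    "check_recent_changes": (set(), {"service"}),
    "submit_resolution": (
        {"root_cause", "affected_service", "recommendation"},
        set(),
    ),
}


def build_action(tool_name: str, **parameters) -> dict:
    """Return a dict shaped like SentinelAction, or raise ValueError."""
    if tool_name not in TOOL_PARAMS:
        raise ValueError(f"unknown tool: {tool_name!r}")
    required, optional = TOOL_PARAMS[tool_name]
    given = set(parameters)
    missing = required - given
    unexpected = given - required - optional
    if missing:
        raise ValueError(f"{tool_name} is missing parameters: {sorted(missing)}")
    if unexpected:
        raise ValueError(f"{tool_name} got unexpected parameters: {sorted(unexpected)}")
    return {"tool_name": tool_name, "parameters": dict(parameters)}


# Example: ask for the memory metric of one service.
action = build_action("query_metrics", service="inventory-service", metric="memory")
print(action)
```

Validating on the client side like this may help avoid the per-step penalty for invalid actions, although the authoritative check is whatever the environment itself enforces.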

Observation Space

class SentinelObservation(Observation):
    incident_summary: str       # Alert description
    tool_output: str            # Result from last tool call
    available_tools: list[str]  # Available tool names
    step_number: int            # Current step (0-indexed)
    max_steps: int              # Episode limit (20)
    cumulative_reward: float    # Running reward total
    last_action_error: str      # Error message if action was invalid
    done: bool                  # Episode finished?
    reward: float | None        # Per-step reward
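
A client may want a typed view of these fields. The sketch below mirrors them in a plain dataclass and parses a dictionary into it. It assumes the observation arrives as a flat JSON object using the field names above; the names ObservationView, parse_observation, and should_stop are hypothetical, and the stop rule (done flag, or step_number reaching max_steps) is a guess at how the step limit is applied.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ObservationView:
    """Client-side mirror of the SentinelObservation fields listed above."""

    incident_summary: str = ""
    tool_output: str = ""
    available_tools: List[str] = field(default_factory=list)
    step_number: int = 0
    max_steps: int = 20
    cumulative_reward: float = 0.0
    last_action_error: str = ""
    done: bool = False
    reward: Optional[float] = None


def parse_observation(payload: dict) -> ObservationView:
    """Build an ObservationView from a dict, ignoring unknown keys."""
    known = {f.name for f in fields(ObservationView)}
    return ObservationView(**{k: v for k, v in payload.items() if k in known})


def should_stop(obs: ObservationView) -> bool:
    """Assumed stop rule: episode flagged done, or step budget used up."""
    return obs.done or obs.step_number >= obs.max_steps


obs = parse_observation(
    {
        "incident_summary": "payment-api returning HTTP 500 errors",
        "available_tools": ["query_logs", "submit_resolution"],
        "step_number": 0,
        "done": False,
        "extra_field": "ignored",
    }
)
print(obs.incident_summary, should_stop(obs))
```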

Tasks

Task 1: The Smoking Gun (Easy)

Alert: payment-api returning HTTP 500 errors. Straightforward single-service crash with a clear root cause in logs and deploy history. Optimal: 2–3 tool calls.

Task 2: The Upstream Culprit (Medium)

Alert: checkout-service p99 latency > 5 seconds. Requires tracing a dependency chain to find the real culprit (inventory-service OOM). Optimal: 4–6 tool calls.

Task 3: The Cascading Failure (Hard)

Alert: Multiple services degraded simultaneously. A long-running analytics query exhausts the PostgreSQL connection pool, cascading through auth, user-profile, and notification services. Includes red herrings. Optimal: 6–10 tool calls.

Scoring

Each task is scored 0.0–1.0 using deterministic keyword-based grading:

  • Root cause identification (weighted by task)
  • Correct affected service identification
  • Actionable recommendation
  • Efficiency bonus (fewer steps = higher score)
  • Destructive penalty (recommending harmful actions = score deduction)
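
To make the grading components concrete, here is a minimal sketch of a deterministic keyword-based grader. It is not the environment's actual grader: the function name grade, the rubric layout, every keyword list, and all weights are hypothetical values chosen for illustration. Only the affected service and the OOM root cause come from the Task 2 description.

```python
def grade(submission: dict, rubric: dict, steps_used: int, max_steps: int = 20) -> float:
    """Score a submit_resolution payload in [0.0, 1.0] by keyword matching."""
    root_cause = submission["root_cause"].lower()
    recommendation = submission["recommendation"].lower()
    service = submission["affected_service"].strip().lower()
    w = rubric["weights"]

    score = 0.0
    # Root cause: any expected keyword appears in the free-text answer.
    if any(kw in root_cause for kw in rubric["root_cause_keywords"]):
        score += w["root_cause"]
    # Affected service: exact match on the service name.
    if service == rubric["affected_service"]:
        score += w["affected_service"]
    # Recommendation: any actionable keyword appears.
    if any(kw in recommendation for kw in rubric["recommendation_keywords"]):
        score += w["recommendation"]
    # Efficiency bonus: proportional to the unused share of the step budget.
    score += w["efficiency"] * max(0, max_steps - steps_used) / max_steps
    # Destructive penalty: deduct if a harmful action is recommended.
    if any(kw in recommendation for kw in rubric["destructive_keywords"]):
        score -= w["destructive_penalty"]
    return round(min(1.0, max(0.0, score)), 3)


# Hypothetical rubric loosely modelled on Task 2 (inventory-service OOM).
TASK_2_RUBRIC = {
    "root_cause_keywords": ["oom", "out of memory"],
    "affected_service": "inventory-service",
    "recommendation_keywords": ["memory limit", "restart"],
    "destructive_keywords": ["drop database", "delete all"],
    "weights": {
        "root_cause": 0.5,
        "affected_service": 0.2,
        "recommendation": 0.2,
        "efficiency": 0.1,
        "destructive_penalty": 0.3,
    },
}

good = {
    "root_cause": "inventory-service pods were OOM-killed",
    "affected_service": "inventory-service",
    "recommendation": "Raise the memory limit and restart the pods",
}
print(grade(good, TASK_2_RUBRIC, steps_used=5))
```

Because the function depends only on its inputs, the same submission always receives the same score, which is the property "deterministic" refers to above.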

Per-step rewards provide a partial-credit signal:

  • Relevant tool call: +0.12
  • Irrelevant tool call: −0.02
  • Repeated call: −0.05
  • Invalid action: −0.03
  • Step cost: −0.01
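
The arithmetic behind cumulative_reward can be sketched as follows. This assumes the step cost is charged on every step in addition to the outcome-specific reward, which the list above does not state explicitly; the outcome labels and the function name are also hypothetical.

```python
# Per-step values copied from the list above.
STEP_REWARDS = {
    "relevant": 0.12,
    "irrelevant": -0.02,
    "repeated": -0.05,
    "invalid": -0.03,
}
STEP_COST = -0.01  # Assumption: applied on every step, on top of the outcome reward.


def cumulative_reward(outcomes: list) -> float:
    """Sum per-step rewards for a sequence of outcome labels."""
    total = 0.0
    for outcome in outcomes:
        total += STEP_REWARDS[outcome] + STEP_COST
    return round(total, 2)


# Two relevant calls, one repeat, one irrelevant call:
# (0.12 - 0.01) * 2 + (-0.05 - 0.01) + (-0.02 - 0.01) = 0.13
episode = ["relevant", "relevant", "repeated", "irrelevant"]
print(cumulative_reward(episode))
```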

Running Inference

The inference script uses OpenAI(base_url=...), so it works with HF Inference, OpenAI, and any other OpenAI-compatible API.

# Environment server URL
export ENV_URL=http://localhost:8000

# LLM config (defaults to HF router)
export API_BASE_URL=https://router.huggingface.co/v1  # default, can omit
export MODEL_NAME=openai/gpt-oss-120b:novita           # default, can omit
export API_KEY=your-key      # or HF_TOKEN or OPENAI_API_KEY

pip install openai websockets
python inference.py

Example output:

Task 1: 0.85
Task 2: 0.65
Task 3: 0.40
Average: 0.63
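
The environment variables described above could be resolved along the following lines. This is a sketch rather than the contents of inference.py: the function name resolve_config, the lookup order for the three key variables (API_KEY first, then HF_TOKEN, then OPENAI_API_KEY), and the fallback value for ENV_URL are assumptions.

```python
import os

# Defaults stated in the export lines above.
DEFAULT_API_BASE_URL = "https://router.huggingface.co/v1"
DEFAULT_MODEL_NAME = "openai/gpt-oss-120b:novita"


def resolve_config(env=None) -> dict:
    """Collect inference settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    # Assumed precedence: API_KEY, then HF_TOKEN, then OPENAI_API_KEY.
    api_key = env.get("API_KEY") or env.get("HF_TOKEN") or env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("set API_KEY, HF_TOKEN, or OPENAI_API_KEY")
    return {
        # Assumption: fall back to the local server from Quick Start.
        "env_url": env.get("ENV_URL", "http://localhost:8000"),
        "base_url": env.get("API_BASE_URL", DEFAULT_API_BASE_URL),
        "model": env.get("MODEL_NAME", DEFAULT_MODEL_NAME),
        "api_key": api_key,
    }


config = resolve_config({"HF_TOKEN": "hf_example"})
print(config["base_url"], config["model"])
```

The resulting base_url and api_key values are what would be passed to the OpenAI client constructor, and model to each completion request.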

Baseline Scores

| Task | GPT-4o (expected) | Open LLM (expected) |
| --- | --- | --- |
| Task 1 (Easy) | 0.80–0.95 | 0.60–0.80 |
| Task 2 (Medium) | 0.60–0.80 | 0.40–0.60 |
| Task 3 (Hard) | 0.30–0.60 | 0.15–0.35 |

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /health | GET | Health check |
| /reset | POST | Reset environment ({"task_id": N}, where N is 1, 2, or 3) |
| /step | POST | Execute action ({"action": {...}}) |
| /state | GET | Get current state |
| /schema | GET | JSON schemas for action/observation/state |
| /ws | WebSocket | Persistent session |
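
For quick manual checks, the HTTP endpoints can be exercised with the Python standard library alone. The sketch below assumes the request bodies shown in the table and that responses are JSON; the helper names build_request, call, and run_demo are hypothetical, and the response shape is not specified here, so the demo simply prints whatever comes back.

```python
import json
import os
import urllib.request

# Assumption: the server from Quick Start is reachable at this address.
ENV_URL = os.environ.get("ENV_URL", "http://localhost:8000")


def build_request(path: str, payload=None) -> urllib.request.Request:
    """Build a GET request (no payload) or a JSON POST request (with payload)."""
    url = ENV_URL.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def call(path: str, payload=None, timeout: float = 10.0):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_demo() -> None:
    """Reset to Task 1 and take one step; requires a running server."""
    print(call("/health"))
    print(call("/reset", {"task_id": 1}))
    action = {"tool_name": "get_service_status", "parameters": {"service": "payment-api"}}
    print(call("/step", {"action": action}))
```

Call run_demo() once the server is up. Whether consecutive HTTP calls share episode state is not stated above; the /ws endpoint is described as the persistent-session option, so it may be the safer choice for full episodes.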