openops / README.md
arya89's picture
Upload folder using huggingface_hub
d02897f verified
metadata
title: OpenOps Incident Commander
emoji: 🚨
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false

OpenOps: AI Incident Commander Environment

Production incident management environment where AI agents learn to handle real-world outages.

Overview

OpenOps simulates production incidents requiring an AI Incident Commander to:

  • Investigate alerts and logs
  • Identify root causes
  • Execute mitigation actions (restart/rollback/scale)
  • Communicate with teams and users
  • Resolve incidents quickly to minimize revenue loss

Environment Specification

Observation Space

{
    "active_alerts": List[str],
    "service_status": Dict[str, str],
    "recent_logs": Dict[str, List[str]],
    "metrics_summary": Dict[str, float],
    "customer_complaints": int,
    "time_elapsed": int,
    "revenue_loss": float,
    "teams_notified": bool,
    "status_page_updated": bool,
    "user_communication_sent": bool
}

Action Space (21 actions)

  • 0: read_alerts
  • 1-4: inspect_logs_{service}
  • 5-8: check_metrics_{service}
  • 9-12: restart_{service}
  • 13-14: rollback_{service}
  • 15-16: scale_{service}
  • 17-19: Communication actions
  • 20: resolve_incident

Three Tasks

Task 1 (Easy): Simple API Crash

  • API service down due to OOM
  • Solution: Inspect logs β†’ Restart API

Task 2 (Medium): Bad Deployment

  • Database deployment broke queries
  • Solution: Inspect logs β†’ Rollback deployment β†’ Notify team

Task 3 (Hard): Cascading Failure

  • Database overload β†’ API timeouts β†’ Customer impact
  • Solution: Inspect both services β†’ Scale database β†’ Restart API β†’ Communicate

Installation

pip install -r server/requirements.txt

Usage

Run Locally

cd server
uvicorn app:app --reload

Run Inference

export OPENAI_API_KEY="your-key"
python inference.py

Grading

Each task scored 0.0-1.0 based on:

  • Investigation quality
  • Correct mitigation actions
  • Communication
  • Successful resolution

Deployment

Deploy to HuggingFace Spaces:

openenv push

πŸ“Š Sample Output

local inference outputt