title: DiskPanic OpenEnv
emoji: 💥
colorFrom: red
colorTo: yellow
sdk: docker
app_port: 8000
pinned: false
license: apache-2.0

DiskPanic: SRE Incident Response OpenEnv

A real-world RL environment where an LLM agent plays an on-call Site Reliability Engineer responding to a production incident: the root filesystem is full and app.service has crashed. The agent must free space, restart the service, and preserve business-critical audit logs. The wrong rm -rf tanks the reward.

Built for the OpenEnv Round 1 Hackathon by Yash Pravin Pawar's team.

Why this env

Every SRE has lived this exact 3am nightmare. The env tests three skills:

  1. Diagnosis: finding the bloated file with du / ls / find
  2. Surgical deletion: removing the right thing without touching protected dirs
  3. Recovery: restarting services and (on hard) dropping a logrotate config to stop a runaway writer

The reward signal is dense: the agent sees its score climb as disk usage drops, gets a bonus for restoring the service, and is penalized if the SHA-256 of /var/log/audit/ changes.
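The audit check described above can be pictured as a digest over every file under /var/log/audit/. The following is a minimal sketch, assuming the virtual filesystem is a flat dict mapping absolute paths to bytes; the `audit_digest` helper and the sample paths are hypothetical and not taken from graders.py.

```python
import hashlib


def audit_digest(vfs: dict, prefix: str = "/var/log/audit/") -> str:
    """Hash the path and content of every file under `prefix`, in sorted order."""
    h = hashlib.sha256()
    for path in sorted(p for p in vfs if p.startswith(prefix)):
        h.update(path.encode())
        h.update(b"\0")  # separator so path and content cannot run together
        h.update(vfs[path])
        h.update(b"\0")
    return h.hexdigest()


# Hypothetical filesystem contents, for illustration only.
vfs = {
    "/var/log/audit/audit.log": b"login ok\n",
    "/var/log/nginx/access.log.1": b"x" * 1024,
}
baseline = audit_digest(vfs)

del vfs["/var/log/nginx/access.log.1"]  # deleting outside the audit dir
unchanged = audit_digest(vfs) == baseline

del vfs["/var/log/audit/audit.log"]  # deleting inside the audit dir
tampered = audit_digest(vfs) != baseline
```

Hashing paths as well as contents means a rename or deletion inside the audit directory changes the digest, not only an edit to a file body.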

Tasks

| ID | Scenario | Graded on |
|---|---|---|
| easy | One 8.7 GiB rotated nginx log is filling the disk. | Disk usage < 80% + audit dir untouched |
| medium | Disk full + app.service has failed. | disk(0.4) + service(0.4) + audit(0.2) |
| hard | Same + a runaway writer grows /var/log/app/runaway.log by 100 MiB every tick. | disk(0.3) + service(0.3) + audit(0.2) + logrotate config(0.2) |

All graders return a scalar in [0.0, 1.0].
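To make the weights for medium and hard concrete, here is a rough sketch of a weighted-sum grader. The weights come from the task list above; the `grade` function, the component names, and the assumption that each component is itself a score in [0.0, 1.0] are illustrative guesses rather than the contents of graders.py. The easy task is left out because the README does not give weights for it.

```python
# Weights as listed for the medium and hard tasks.
WEIGHTS = {
    "medium": {"disk": 0.4, "service": 0.4, "audit": 0.2},
    "hard": {"disk": 0.3, "service": 0.3, "audit": 0.2, "logrotate": 0.2},
}


def clamp(x: float) -> float:
    """Restrict a value to the closed interval [0.0, 1.0]."""
    return min(max(x, 0.0), 1.0)


def grade(task_id: str, components: dict) -> float:
    """Weighted sum of per-component scores; missing components count as 0."""
    total = sum(
        weight * clamp(components.get(name, 0.0))
        for name, weight in WEIGHTS[task_id].items()
    )
    return clamp(total)


# Disk freed and service restored, but the audit dir was modified.
medium_score = grade("medium", {"disk": 1.0, "service": 1.0, "audit": 0.0})

# Everything done on hard, including the logrotate config.
hard_score = grade("hard", {"disk": 1.0, "service": 1.0, "audit": 1.0, "logrotate": 1.0})
```

Under these assumptions, an agent that fixes the disk and the service on medium but damages the audit directory lands at about 0.8 rather than 1.0.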

Action space

DiskPanicAction(command: str): a single bash-lite command per step. Supported:

df                         ls <path>        du <path>
cat <path>                 find <path>      sha256sum <path>
rm [-rf] <path>            systemctl is-active|status|start|restart <svc>
echo "content" > <path>    (for writing files like logrotate configs)

Observation space

DiskPanicObservation:

  • stdout: str, output of the last command
  • df_output: str, current simulated df -h /
  • service_status: str, one of active / inactive / failed
  • task_id: str β€” current task (easy | medium | hard)
  • step: int
  • last_error: Optional[str]
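For readers who prefer to see the shapes as types, the action and observation can be sketched as follows. The repository's models.py uses Pydantic; this version uses standard-library dataclasses as a dependency-free stand-in, so defaults and validation behaviour here are assumptions. The sample field values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiskPanicAction:
    command: str  # one bash-lite command, e.g. "df"


@dataclass
class DiskPanicObservation:
    stdout: str  # output of the last command
    df_output: str  # current simulated `df -h /`
    service_status: str  # "active", "inactive", or "failed"
    task_id: str  # "easy", "medium", or "hard"
    step: int
    last_error: Optional[str] = None  # default is an assumption


action = DiskPanicAction(command="df")
obs = DiskPanicObservation(
    stdout="",
    df_output="(simulated df output)",  # placeholder, not real env output
    service_status="failed",
    task_id="medium",
    step=0,
)
```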

Safety & sandbox

The env does not touch the real filesystem. Everything is a Python dict representing a virtual filesystem. Commands are parsed via shlex and dispatched to whitelisted operations: no subprocess, no shell expansion, no escape surface. This keeps the env deterministic, safe, and fast (runs easily on 2 vCPU / 8 GB RAM).
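The parse-then-dispatch idea can be sketched in a few lines. This is an illustration under stated assumptions, not the code in vfs.py: the flat path-to-bytes dict, the `run` function, the handler names, and the error strings are all hypothetical, and only ls and rm are shown.

```python
import shlex
from typing import Callable, Dict, List, Optional, Tuple

VFS = Dict[str, bytes]  # assumed layout: absolute file path -> content


def cmd_ls(vfs: VFS, args: List[str]) -> str:
    if len(args) != 1:
        raise ValueError("usage: ls <path>")
    prefix = args[0].rstrip("/") + "/"
    # First path segment below the prefix, deduplicated and sorted.
    names = {p[len(prefix):].split("/", 1)[0] for p in vfs if p.startswith(prefix)}
    return "\n".join(sorted(names))


def cmd_rm(vfs: VFS, args: List[str]) -> str:
    paths = [a for a in args if not a.startswith("-")]
    if len(paths) != 1:
        raise ValueError("usage: rm [-rf] <path>")
    target = paths[0].rstrip("/")
    doomed = [p for p in vfs if p == target or p.startswith(target + "/")]
    if not doomed:
        raise FileNotFoundError(f"rm: {paths[0]}: no such file or directory")
    for p in doomed:
        del vfs[p]
    return ""


# Whitelist: anything not listed here is rejected before it can do anything.
HANDLERS: Dict[str, Callable[[VFS, List[str]], str]] = {"ls": cmd_ls, "rm": cmd_rm}


def run(vfs: VFS, command: str) -> Tuple[str, Optional[str]]:
    """Return (stdout, error). Nothing is ever passed to a real shell."""
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        return "", f"parse error: {exc}"
    if not argv or argv[0] not in HANDLERS:
        return "", f"unsupported command: {command!r}"
    try:
        return HANDLERS[argv[0]](vfs, argv[1:]), None
    except (ValueError, FileNotFoundError) as exc:
        return "", str(exc)


vfs = {"/var/log/nginx/access.log.1": b"x", "/var/log/audit/audit.log": b"y"}
listing = run(vfs, "ls /var/log")
blocked = run(vfs, "curl http://example.com")
chained = run(vfs, "ls /var/log; rm -rf /")  # ';' is just text to shlex
removed = run(vfs, "rm -rf /var/log/nginx")
```

Because shlex.split only tokenizes, shell metacharacters such as `;` or `$(...)` arrive as ordinary argument text. In this sketch the chained command fails the ls arity check instead of running a second command.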

Running locally

# 1. Install
pip install -r requirements.txt

# 2. Build the Docker image
docker build -t disk-panic:latest .

# 3. Set env vars
export HF_TOKEN=<your-key>                   # Groq key or HF token
export API_BASE_URL=https://api.groq.com/openai/v1
export MODEL_NAME=llama-3.3-70b-versatile
export IMAGE_NAME=disk-panic:latest

# 4. Run inference (all 3 tasks)
python inference.py

Deployment

The env is deployed as a Hugging Face Space (Docker SDK). The FastAPI server is wired by openenv.core.create_fastapi_app and exposes the standard OpenEnv endpoints: /reset, /step, /state, /schema, /health, /ws, /metadata, /web.

Layout

8-DiskPanic/
├── inference.py             # required at root per hackathon spec
├── Dockerfile
├── openenv.yaml
├── requirements.txt
├── README.md
└── disk_panic/
    ├── __init__.py          # exports DiskPanicEnv, DiskPanicAction, DiskPanicObservation
    ├── models.py            # Pydantic Action + Observation
    ├── client.py            # EnvClient subclass
    └── server/
        ├── app.py           # FastAPI app via create_fastapi_app
        ├── environment.py   # DiskPanicEnvironment
        ├── scenarios.py     # the 3 task builders
        ├── graders.py       # deterministic reward functions
        └── vfs.py           # in-memory virtual FS + command parser