---
title: DiskPanic OpenEnv
emoji: 💥
colorFrom: red
colorTo: yellow
sdk: docker
app_port: 8000
pinned: false
license: apache-2.0
---

# DiskPanic: SRE Incident Response OpenEnv

A real-world RL environment where an LLM agent plays an on-call Site Reliability Engineer responding to a production incident: **the root filesystem is full and app.service has crashed.** The agent must free space, restart the service, and preserve business-critical audit logs; the wrong `rm -rf` tanks the reward.

> Built for the OpenEnv Round 1 Hackathon by **Yash Pravin Pawar's team**.

## Why this env

Every SRE has lived this exact 3am nightmare. The env tests three skills:

1. **Diagnosis**: finding the bloated file with `du` / `ls` / `find`
2. **Surgical deletion**: removing the right thing without touching protected dirs
3. **Recovery**: restarting services and (on hard) dropping a logrotate config to stop a runaway writer

The reward signal is dense: the agent sees its score climb as disk usage drops, gets a bonus for restoring the service, and is penalized if the SHA-256 of `/var/log/audit/` changes.

## Tasks

| ID | Scenario | Graded on |
|----|----------|-----------|
| `easy` | One 8.7 GiB rotated nginx log is filling the disk. | Disk usage < 80% + audit dir untouched |
| `medium` | Disk full + `app.service` has failed. | disk(0.4) + service(0.4) + audit(0.2) |
| `hard` | Same + a runaway writer grows `/var/log/app/runaway.log` by 100 MiB every tick. | disk(0.3) + service(0.3) + audit(0.2) + logrotate config(0.2) |

All graders return a scalar in `[0.0, 1.0]`.

## Action space

`DiskPanicAction(command: str)`: a single bash-lite command per step.

Supported:

```
df  ls  du  cat  find  sha256sum
rm [-rf]
systemctl is-active|status|start|restart
echo "content" >     (for writing files like logrotate configs)
```

## Observation space

`DiskPanicObservation`:

- `stdout: str`: output of the last command
- `df_output: str`: current simulated `df -h /`
- `service_status: str`: `active` / `inactive` / `failed`
- `task_id: str`: current task (`easy` | `medium` | `hard`)
- `step: int`
- `last_error: Optional[str]`

## Safety & sandbox

The env does not touch the real filesystem. Everything is a Python dict representing a virtual filesystem. Commands are parsed via `shlex` and dispatched to whitelisted operations: no `subprocess`, no shell expansion, no escape surface. This keeps the env deterministic, safe, and fast (runs easily on 2 vCPU / 8 GB RAM).

## Running locally

```bash
# 1. Install
pip install -r requirements.txt

# 2. Build the Docker image
docker build -t disk-panic:latest .

# 3. Set env vars
export HF_TOKEN=   # Groq key or HF token
export API_BASE_URL=https://api.groq.com/openai/v1
export MODEL_NAME=llama-3.3-70b-versatile
export IMAGE_NAME=disk-panic:latest

# 4. Run inference (all 3 tasks)
python inference.py
```

## Deployment

The env is deployed as a Hugging Face Space (Docker SDK). The FastAPI server is wired by `openenv.core.create_fastapi_app` and exposes the standard OpenEnv endpoints: `/reset`, `/step`, `/state`, `/schema`, `/health`, `/ws`, `/metadata`, `/web`.

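To poke at a running container by hand, the endpoints above can be called with nothing but the standard library. The sketch below is illustrative only: the base URL (port 8000 on localhost) and especially the JSON body shapes for `/reset` and `/step` are assumptions, not taken from this repo, so check them against the live `/schema` endpoint before relying on them. `build_request`, `call`, and `smoke_test` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the container is reachable locally on the Space's app_port.
BASE_URL = "http://localhost:8000"


def build_request(path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET (no payload) or a JSON POST request for an env endpoint."""
    url = BASE_URL.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Assumed body shapes; confirm against /schema.
health_req = build_request("/health")
reset_req = build_request("/reset", {})
step_req = build_request("/step", {"action": {"command": "df"}})


def smoke_test() -> None:
    """Hit /health, /reset and /step once; needs a running server."""
    for req in (health_req, reset_req, step_req):
        print(req.full_url, call(req))
```

Calling `smoke_test()` against a local `docker run -p 8000:8000 disk-panic:latest` should print one JSON response per endpoint if the assumed shapes match; a 4xx on `/step` most likely means the action payload needs adjusting to whatever `/schema` reports.
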
## Layout

```
8-DiskPanic/
├── inference.py           # required at root per hackathon spec
├── Dockerfile
├── openenv.yaml
├── requirements.txt
├── README.md
└── disk_panic/
    ├── __init__.py        # exports DiskPanicEnv, DiskPanicAction, DiskPanicObservation
    ├── models.py          # Pydantic Action + Observation
    ├── client.py          # EnvClient subclass
    └── server/
        ├── app.py         # FastAPI app via create_fastapi_app
        ├── environment.py # DiskPanicEnvironment
        ├── scenarios.py   # the 3 task builders
        ├── graders.py     # deterministic reward functions
        └── vfs.py         # in-memory virtual FS + command parser
```
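
For readers who want a feel for what `vfs.py` and `graders.py` do before opening them, here is a minimal sketch of the two ideas described earlier: `shlex`-parsed commands dispatched to a whitelist over a dict-backed filesystem, and the `medium` weighting from the Tasks table. It is not the repo's code. Every name in it (`audit_digest`, `cmd_rm`, `cmd_du`, `HANDLERS`, `run`, `grade_medium`) is hypothetical, only two commands are covered, and it stores file contents as bytes where a real env would more plausibly track sizes as integers.

```python
import hashlib
import shlex

AUDIT_DIR = "/var/log/audit/"


def audit_digest(fs: dict) -> str:
    """SHA-256 over every file under the audit dir, in sorted path order."""
    h = hashlib.sha256()
    for path in sorted(p for p in fs if p.startswith(AUDIT_DIR)):
        h.update(path.encode())
        h.update(fs[path])
    return h.hexdigest()


def cmd_rm(fs: dict, args: list) -> str:
    """Delete exact paths; with -rf also delete everything under a directory."""
    recursive = "-rf" in args
    removed = 0
    for target in (a for a in args if not a.startswith("-")):
        prefix = target.rstrip("/") + "/"
        doomed = [p for p in fs if p == target or (recursive and p.startswith(prefix))]
        for p in doomed:
            del fs[p]
        removed += len(doomed)
    return f"removed {removed} file(s)"


def cmd_du(fs: dict, args: list) -> str:
    """Sum the sizes (in bytes) of files at or under the given path."""
    target = args[0] if args else "/"
    prefix = target.rstrip("/") + "/"
    total = sum(len(d) for p, d in fs.items() if p == target or p.startswith(prefix))
    return f"{total}\t{target}"


# Whitelist: any command name not listed here is rejected.
HANDLERS = {"rm": cmd_rm, "du": cmd_du}


def run(fs: dict, command: str):
    """Tokenize with shlex and dispatch; returns (stdout, last_error)."""
    try:
        argv = shlex.split(command)
    except ValueError as exc:
        return "", f"parse error: {exc}"
    if not argv or argv[0] not in HANDLERS:
        return "", f"command not allowed: {command!r}"
    return HANDLERS[argv[0]](fs, argv[1:]), None


def grade_medium(disk: float, service: float, audit: float) -> float:
    """Weights from the Tasks table; each component is assumed to lie in [0, 1]."""
    return 0.4 * disk + 0.4 * service + 0.2 * audit


# Usage: free space without disturbing the audit directory.
fs = {
    "/var/log/nginx/access.log.1": b"x" * 1024,
    "/var/log/audit/audit.log": b"login ok\n",
}
before = audit_digest(fs)
out, err = run(fs, "rm -rf /var/log/nginx")
audit_intact = audit_digest(fs) == before
score = grade_medium(disk=1.0, service=0.0, audit=float(audit_intact))
```

Because the command string only ever reaches `shlex.split` and a dictionary lookup, something like `run(fs, "curl http://example.com")` comes back as an error string rather than executing anything, which is the property the sandbox section relies on.
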