---
title: DiskPanic OpenEnv
emoji: π₯
colorFrom: red
colorTo: yellow
sdk: docker
app_port: 8000
pinned: false
license: apache-2.0
---
# DiskPanic: SRE Incident Response OpenEnv
A real-world RL environment where an LLM agent plays an on-call Site Reliability
Engineer responding to a production incident: **the root filesystem is full and
app.service has crashed.** The agent must free space, restart the service, and
preserve business-critical audit logs; the wrong `rm -rf` tanks the reward.
> Built for the OpenEnv Round 1 Hackathon by **Yash Pravin Pawar's team**.
## Why this env
Every SRE has lived this exact 3am nightmare. The env tests three skills:
1. **Diagnosis**: finding the bloated file with `du` / `ls` / `find`
2. **Surgical deletion**: removing the right thing without touching protected dirs
3. **Recovery**: restarting services and (on hard) dropping a logrotate config to
   stop a runaway writer

The reward signal is dense: the agent sees its score climb as disk usage drops,
gets a bonus for restoring the service, and is penalized if the SHA-256 digest of
`/var/log/audit/` changes.
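
One way to picture the audit check is a single digest computed over every file under `/var/log/audit/` in a stable order. The sketch below is a minimal illustration, assuming the virtual filesystem is a plain dict mapping absolute paths to bytes; `audit_digest` and that dict layout are hypothetical and may differ from what `graders.py` and `vfs.py` actually do.

```python
import hashlib

def audit_digest(vfs: dict[str, bytes], root: str = "/var/log/audit/") -> str:
    """Return one SHA-256 hex digest covering every file under `root` (hypothetical helper)."""
    h = hashlib.sha256()
    # Sort paths so the digest does not depend on dict insertion order.
    for path in sorted(p for p in vfs if p.startswith(root)):
        # Hash the path as well as the content so renames are detected too.
        h.update(path.encode() + b"\0")
        h.update(vfs[path] + b"\0")
    return h.hexdigest()

# Made-up filesystem contents, for illustration only.
vfs = {
    "/var/log/audit/audit.log": b"login ok\n",
    "/var/log/nginx/access.log.1": b"x" * 1024,
}
baseline = audit_digest(vfs)

del vfs["/var/log/nginx/access.log.1"]   # deleting outside the audit dir
unchanged = audit_digest(vfs) == baseline

del vfs["/var/log/audit/audit.log"]      # deleting inside the audit dir
changed = audit_digest(vfs) != baseline
```

Comparing the digest against a baseline taken at reset lets a grader detect deletion, modification, or renaming of audit files without tracking individual commands.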
## Tasks
| ID | Scenario | Graded on |
|----|----------|-----------|
| `easy` | One 8.7 GiB rotated nginx log is filling the disk. | Disk usage < 80% + audit dir untouched |
| `medium` | Disk full + `app.service` has failed. | disk(0.4) + service(0.4) + audit(0.2) |
| `hard` | Same as `medium`, plus a runaway writer that grows `/var/log/app/runaway.log` by 100 MiB every tick. | disk(0.3) + service(0.3) + audit(0.2) + logrotate config(0.2) |

All graders return a scalar in `[0.0, 1.0]`.
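
The weights in the table can be read as a weighted sum of per-component scores. The following is a rough sketch under that assumption: `WEIGHTS`, `grade`, and the idea that each component is itself a score in `[0.0, 1.0]` are illustrative, not taken from `graders.py`. The `easy` task is left out because the table gives no weights for it.

```python
# Weights copied from the task table; names are hypothetical.
WEIGHTS = {
    "medium": {"disk": 0.4, "service": 0.4, "audit": 0.2},
    "hard": {"disk": 0.3, "service": 0.3, "audit": 0.2, "logrotate": 0.2},
}

def clamp(value: float) -> float:
    """Clamp a value into [0.0, 1.0]."""
    return min(max(value, 0.0), 1.0)

def grade(task_id: str, components: dict[str, float]) -> float:
    """Weighted sum of component scores; missing components count as 0."""
    weights = WEIGHTS[task_id]
    total = sum(w * clamp(components.get(name, 0.0)) for name, w in weights.items())
    return clamp(total)

# Disk freed, service restored, audit logs intact, but no logrotate config written.
score = grade("hard", {"disk": 1.0, "service": 1.0, "audit": 1.0, "logrotate": 0.0})
```

Under these assumptions, an agent that solves everything on `hard` except the logrotate config would land around 0.8.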
## Action space
`DiskPanicAction(command: str)`: a single bash-lite command per step. Supported:
```
df                  ls <path>       du <path>
cat <path>          find <path>     sha256sum <path>
rm [-rf] <path>     systemctl is-active|status|start|restart <svc>
echo "content" > <path>    (for writing files like logrotate configs)
```
## Observation space
`DiskPanicObservation`:
- `stdout: str`: output of the last command
- `df_output: str`: current simulated `df -h /`
- `service_status: str`: `active` / `inactive` / `failed`
- `task_id: str`: current task (`easy` | `medium` | `hard`)
- `step: int`
- `last_error: Optional[str]`
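
The fields above map naturally onto a typed record. The project layout says the real model is a Pydantic class in `models.py`; the sketch below uses a standard-library dataclass as a stand-in so it runs without dependencies. `ObservationSketch`, its validation rules, and the default for `last_error` are assumptions.

```python
from dataclasses import dataclass, asdict
from typing import Optional

VALID_STATUSES = {"active", "inactive", "failed"}
VALID_TASKS = {"easy", "medium", "hard"}

@dataclass
class ObservationSketch:
    """Stand-in for DiskPanicObservation (hypothetical, not the project's class)."""
    stdout: str
    df_output: str
    service_status: str
    task_id: str
    step: int
    last_error: Optional[str] = None  # assumed default

    def __post_init__(self) -> None:
        # Reject values outside the documented enumerations.
        if self.service_status not in VALID_STATUSES:
            raise ValueError(f"bad service_status: {self.service_status!r}")
        if self.task_id not in VALID_TASKS:
            raise ValueError(f"bad task_id: {self.task_id!r}")

# Field values here are made up for illustration.
obs = ObservationSketch(
    stdout="",
    df_output="(simulated df -h / output)",
    service_status="failed",
    task_id="medium",
    step=0,
)
payload = asdict(obs)  # plain dict, e.g. for JSON serialization
```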
## Safety & sandbox
The env does not touch the real filesystem. Everything is a Python dict
representing a virtual filesystem. Commands are parsed via `shlex` and
dispatched to whitelisted operations: no `subprocess`, no shell expansion,
no escape surface. This keeps the env deterministic, safe, and fast
(runs easily on 2 vCPU / 8 GB RAM).
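
The parse-then-dispatch idea can be sketched in a few lines. This is an illustration only, assuming a dict-backed filesystem keyed by absolute path; `ALLOWED`, `run`, `cmd_ls`, and `cmd_rm` are hypothetical names, and the real `vfs.py` supports more commands and may be structured differently.

```python
import shlex

def cmd_ls(vfs: dict, args: list) -> str:
    """List stored paths that start with the given prefix."""
    prefix = args[0] if args else "/"
    return "\n".join(sorted(p for p in vfs if p.startswith(prefix)))

def cmd_rm(vfs: dict, args: list) -> str:
    """Delete a file, or everything under a directory; flags are ignored."""
    for target in (a for a in args if not a.startswith("-")):
        subtree = target.rstrip("/") + "/"
        for path in [p for p in vfs if p == target or p.startswith(subtree)]:
            del vfs[path]
    return ""

# Only commands listed here can run; nothing reaches a real shell.
ALLOWED = {"ls": cmd_ls, "rm": cmd_rm}

def run(vfs: dict, command: str):
    """Return (stdout, error). shlex only tokenizes, so `$(...)` and globs stay literal."""
    try:
        tokens = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        return "", f"parse error: {exc}"
    if not tokens:
        return "", "empty command"
    handler = ALLOWED.get(tokens[0])
    if handler is None:
        return "", f"command not allowed: {tokens[0]}"
    return handler(vfs, tokens[1:]), None

# Made-up filesystem contents, for illustration only.
vfs = {"/var/log/nginx/access.log.1": b"x", "/var/log/audit/audit.log": b"y"}
out, err = run(vfs, "rm -rf /var/log/nginx")
blocked = run(vfs, "curl http://example.com")
```

Because unknown commands fall through to an error string rather than to the host, the worst an agent can do is damage the simulated filesystem, which is exactly what the graders measure.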
## Running locally
```bash
# 1. Install
pip install -r requirements.txt
# 2. Build the Docker image
docker build -t disk-panic:latest .
# 3. Set env vars
export HF_TOKEN=<your-key> # Groq key or HF token
export API_BASE_URL=https://api.groq.com/openai/v1
export MODEL_NAME=llama-3.3-70b-versatile
export IMAGE_NAME=disk-panic:latest
# 4. Run inference (all 3 tasks)
python inference.py
```
## Deployment
The env is deployed as a Hugging Face Space (Docker SDK). The FastAPI server
is wired by `openenv.core.create_fastapi_app` and exposes the standard
OpenEnv endpoints: `/reset`, `/step`, `/state`, `/schema`, `/health`, `/ws`,
`/metadata`, `/web`.
## Layout
```
8-DiskPanic/
├── inference.py          # required at root per hackathon spec
├── Dockerfile
├── openenv.yaml
├── requirements.txt
├── README.md
└── disk_panic/
    ├── __init__.py       # exports DiskPanicEnv, DiskPanicAction, DiskPanicObservation
    ├── models.py         # Pydantic Action + Observation
    ├── client.py         # EnvClient subclass
    └── server/
        ├── app.py            # FastAPI app via create_fastapi_app
        ├── environment.py    # DiskPanicEnvironment
        ├── scenarios.py      # the 3 task builders
        ├── graders.py        # deterministic reward functions
        └── vfs.py            # in-memory virtual FS + command parser
```