---
title: DevOpsEnv
emoji: 🛠️
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
tags:
- openenv
- devops
- sre
- troubleshooting
- agent-evaluation
pinned: false
---
# DevOpsEnv
DevOpsEnv is an OpenEnv-compliant environment for training and evaluating AI agents on realistic DevOps/SRE incident response workflows.
## Motivation
This environment models the operational workflow that engineers follow when handling production incidents:
- inspect system state
- run diagnostic commands
- apply targeted config/code fixes
- verify impact
- submit a final resolution
It is deliberately built around common SRE failure classes (service outage, deployment misconfiguration, runtime memory issue) rather than toy interactions.
## OpenEnv Compliance
The project implements the required OpenEnv interface:
- typed Pydantic models for `Observation`, `Action`, `Reward`, `StepResult`, `State`
- `POST /reset` returns the initial observation
- `POST /step` returns `observation`, `reward`, `done`, `info`
- `GET /state` returns the current episode state
- `POST /grader` returns a deterministic final score and breakdown
- `openenv.yaml` metadata/spec is included
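
To make the endpoint contract above concrete, here is a minimal client sketch using only the standard library. The request bodies are assumptions: this README does not document what `/reset`, `/step`, or `/grader` expect, so the `task_id` body, the flat action body, and the empty grader body are illustrative guesses. `http_transport` and `run_episode` are hypothetical helper names, not part of the project.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List

# A transport takes (method, path, payload) and returns the decoded JSON body.
Transport = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def http_transport(base_url: str) -> Transport:
    """Build a transport that talks to a running server over HTTP."""

    def send(method: str, path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        # Only POST requests carry a JSON body; GET requests send none.
        data = json.dumps(payload).encode("utf-8") if method == "POST" else None
        request = urllib.request.Request(
            base_url.rstrip("/") + path,
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.loads(response.read().decode("utf-8"))

    return send


def run_episode(
    send: Transport, task_id: str, actions: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Reset, apply actions until the episode reports done, then fetch the grade."""
    # ASSUMPTION: /reset accepts {"task_id": ...}; the real body may differ.
    send("POST", "/reset", {"task_id": task_id})
    rewards = []
    for action in actions:
        # ASSUMPTION: /step takes the action fields directly as the JSON body.
        result = send("POST", "/step", action)
        rewards.append(result["reward"])
        if result["done"]:
            break
    # ASSUMPTION: /grader accepts an empty JSON object.
    grade = send("POST", "/grader", {})
    return {"rewards": rewards, "grade": grade}
```

Against a local server, `run_episode(http_transport("http://127.0.0.1:7860"), "task1", actions)` would drive one episode. Passing a fake transport instead keeps the loop testable without a running server.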
## Observation Space
`Observation` includes:
- task metadata (`task_id`, `task_description`)
- episode controls (`episode_id`, `step_number`, `max_steps`)
- `system_state`:
  - running processes
  - service status
  - open HTTP ports
  - docker containers
  - logs
  - filesystem snapshot
  - cpu and memory metrics
- interaction history and current `available_actions`
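
As a rough picture of the shape described above, the sketch below mirrors the documented fields with a standard-library dataclass. It is a stand-in, not the project's Pydantic `Observation` model: the `history` field name is hypothetical (the README does not name it), and `system_state` is left free-form because its sub-keys are not named here.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class ObservationSketch:
    """Stdlib stand-in for the Pydantic `Observation` model."""

    task_id: str
    task_description: str
    episode_id: str
    step_number: int
    max_steps: int
    # Sub-keys of system_state are not named in this README, so keep it free-form.
    system_state: Dict[str, Any] = field(default_factory=dict)
    # HYPOTHETICAL field name for the interaction history.
    history: List[Dict[str, Any]] = field(default_factory=list)
    available_actions: List[str] = field(default_factory=list)

    def steps_remaining(self) -> int:
        """Number of steps left before the episode reaches max_steps."""
        return max(self.max_steps - self.step_number, 0)


def parse_observation(payload: Dict[str, Any]) -> ObservationSketch:
    """Build the sketch from a decoded JSON payload, ignoring unknown keys."""
    known = {f.name for f in fields(ObservationSketch)}
    return ObservationSketch(**{k: v for k, v in payload.items() if k in known})
```

An agent could use `steps_remaining()` to decide when to stop diagnosing and submit.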
## Action Space
`Action.action_type` is one of:
- `bash_cmd`: execute a simulated shell command (`command`)
- `file_edit`: overwrite a known config or source file (`file_path`, `file_content`)
- `submit`: terminate and grade the current episode (`summary` is optional)
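
The three action types and their fields can be sketched as plain dictionaries with a small validator. The field names come from the list above; the builder functions and `validate_action` are hypothetical helpers for illustration, and the server's own Pydantic validation may be stricter.

```python
from typing import Any, Dict, Optional

# Required fields per action type, taken from the list above.
REQUIRED_FIELDS = {
    "bash_cmd": ("command",),
    "file_edit": ("file_path", "file_content"),
    "submit": (),
}


def validate_action(action: Dict[str, Any]) -> None:
    """Raise ValueError if the action type is unknown or a required field is absent."""
    action_type = action.get("action_type")
    if action_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    missing = [name for name in REQUIRED_FIELDS[action_type] if name not in action]
    if missing:
        raise ValueError(f"{action_type} is missing fields: {', '.join(missing)}")


def bash_cmd(command: str) -> Dict[str, Any]:
    """Build a simulated shell command action."""
    return {"action_type": "bash_cmd", "command": command}


def file_edit(file_path: str, file_content: str) -> Dict[str, Any]:
    """Build an action that overwrites a known file with new content."""
    return {
        "action_type": "file_edit",
        "file_path": file_path,
        "file_content": file_content,
    }


def submit(summary: Optional[str] = None) -> Dict[str, Any]:
    """Build the terminal action; the summary is only included when given."""
    action: Dict[str, Any] = {"action_type": "submit"}
    if summary is not None:
        action["summary"] = summary
    return action
```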
## Tasks and Difficulty
The environment ships with 3 graded tasks:
1. `task1` (easy): recover crashed Nginx and verify HTTP health.
2. `task2` (medium): correct docker-compose port mapping and redeploy.
3. `task3` (hard): diagnose memory leak behavior, patch service code, restart cleanly.
Each task is graded deterministically, with a score in `[0.0, 1.0]` and a criterion-level breakdown.
## Reward Design
Rewards are dense and shaped to provide trajectory signal:
- per-step cost discourages long loops
- action-type reward for useful commands/edits
- progress bonuses for key milestones (validation, successful restart, verified outputs)
- penalties for repeated identical actions and invalid edits
- terminal bonus from grader score on episode completion
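
One way the components above could combine into a per-step reward is sketched below. Every number in `RewardWeights` is a made-up placeholder, and the function signature is hypothetical; the environment's actual weights and bookkeeping are not documented in this README.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    """HYPOTHETICAL weights, chosen only to illustrate the shaping terms."""

    step_cost: float = -0.01
    useful_action: float = 0.05
    milestone: float = 0.2
    repeat_penalty: float = -0.05
    invalid_edit_penalty: float = -0.1


def step_reward(
    action_key: str,
    previous_keys: Sequence[str],
    *,
    useful: bool = False,
    milestones_hit: int = 0,
    invalid_edit: bool = False,
    done: bool = False,
    grader_score: float = 0.0,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Sum the shaping terms for a single step."""
    reward = weights.step_cost  # every step pays a small cost
    if useful:
        reward += weights.useful_action  # reward for a useful command or edit
    reward += weights.milestone * milestones_hit  # progress bonuses
    if action_key in previous_keys:
        reward += weights.repeat_penalty  # identical action seen earlier
    if invalid_edit:
        reward += weights.invalid_edit_penalty
    if done:
        reward += grader_score  # terminal bonus from the grader
    return reward
```

With these placeholder weights, a useful command that reaches one milestone earns about `0.24`, while repeating an earlier action with no effect costs about `0.06`.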
## Local Setup
### 1) Install dependencies
```bash
pip install -r requirements.txt
```
### 2) Run API server
```bash
uvicorn app:app --host 0.0.0.0 --port 7860
```
### 3) Check health
```bash
curl http://127.0.0.1:7860/health
```
### 4) Validate OpenEnv package
```bash
openenv validate
```
## Baseline Inference Script
The required baseline script is at project root: `inference.py`.
It:
- uses the OpenAI Python client
- reads mandatory LLM variables:
  - `API_BASE_URL`
  - `MODEL_NAME`
  - `HF_TOKEN`
- runs all three tasks by default
- emits strict structured stdout lines:
  - `[START] ...`
  - `[STEP] ...`
  - `[END] ...`
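
Since the stdout format is strict, a quick ordering check on captured output can catch mistakes early. The sketch below only inspects the leading tag of each line, because the fields after each tag are not specified in this README. It also assumes that every stdout line is tagged and that each `[START]` is closed by one `[END]`; `check_log_lines` is a hypothetical helper, not part of the project.

```python
from typing import Iterable, List

TAGS = ("[START]", "[STEP]", "[END]")


def check_log_lines(lines: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the tag order looks valid."""
    problems: List[str] = []
    running = False  # True between a [START] line and its [END] line
    for number, line in enumerate(lines, start=1):
        tag = next((t for t in TAGS if line.startswith(t)), None)
        if tag is None:
            problems.append(f"line {number}: no recognized tag")
        elif tag == "[START]":
            if running:
                problems.append(f"line {number}: [START] before previous [END]")
            running = True
        elif not running:
            problems.append(f"line {number}: {tag} outside an episode")
        elif tag == "[END]":
            running = False
    if running:
        problems.append("output ended without a final [END]")
    return problems
```

One possible use is to capture the output of `python inference.py`, split it into lines, and fail the run if the returned list is not empty.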
### Inference environment variables
```bash
export OPENENV_BASE_URL="http://127.0.0.1:7860"
export API_BASE_URL="https://router.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
export HF_TOKEN="<your_token>"
```
### Run baseline
```bash
python inference.py
```
Run a single task:
```bash
python inference.py --task task2
```
## Docker
Build:
```bash
docker build -t devopsenv:latest .
```
Run:
```bash
docker run --rm -p 7860:7860 devopsenv:latest
```
## Hugging Face Spaces Deployment
This repository is configured for Docker Spaces:
- the README frontmatter sets `sdk: docker`
- the container exposes and serves on port `7860`
- the frontmatter tags include `openenv`
After pushing to a Space, verify:
- `POST /reset` returns 200
- `openenv validate` passes
- `python inference.py` completes within runtime constraints
## Pre-Submission Checklist
- HF Space endpoint responds to `/reset`
- `openenv validate` passes
- `docker build` succeeds
- `inference.py` runs and logs strict `[START]/[STEP]/[END]` format
- all 3 tasks produce valid grader scores in `[0.0, 1.0]`
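
The last checklist item can be automated with a small range check. This sketch assumes the scores have already been collected into a mapping from task ID to a number; how they are extracted from the grader response is left out because the response keys are not documented here. `score_problems` is a hypothetical helper.

```python
import math
from typing import Any, Dict, List

EXPECTED_TASKS = ("task1", "task2", "task3")


def score_problems(scores: Dict[str, Any]) -> List[str]:
    """Return one message per task whose score is missing or outside [0.0, 1.0]."""
    problems: List[str] = []
    for task_id in EXPECTED_TASKS:
        if task_id not in scores:
            problems.append(f"{task_id}: no score recorded")
            continue
        score = scores[task_id]
        # Booleans are ints in Python, so reject them explicitly.
        is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
        if not is_number or math.isnan(score) or not 0.0 <= score <= 1.0:
            problems.append(f"{task_id}: invalid score {score!r}")
    return problems
```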