	---
	title: DevOpsEnv
	emoji: 🛠️
	colorFrom: blue
	colorTo: green
	sdk: docker
	app_port: 7860
	tags:
	- openenv
	- devops
	- sre
	- troubleshooting
	- agent-evaluation
	pinned: false
	---

	# DevOpsEnv

	DevOpsEnv is an OpenEnv-compliant environment for training and evaluating AI agents on realistic DevOps/SRE incident response workflows.

	## Motivation

	This environment models the operational workflow engineers follow when responding to production incidents:

	- inspect system state
	- run diagnostic commands
	- apply targeted config/code fixes
	- verify impact
	- submit a final resolution

	It is deliberately built around common SRE failure classes (service outage, deployment misconfiguration, runtime memory issue) rather than toy interactions.

	## OpenEnv Compliance

	The project implements the required OpenEnv interface:

	- typed Pydantic models for `Observation`, `Action`, `Reward`, `StepResult`, `State`
	- `POST /reset` returns the initial observation
	- `POST /step` returns `observation`, `reward`, `done`, `info`
	- `GET /state` returns current episode state
	- `POST /grader` returns deterministic final score and breakdown
	- `openenv.yaml` metadata/spec included
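As a rough illustration of how a client might drive these endpoints, the sketch below uses only the Python standard library. The `EnvClient` name, the `task_id` field sent to `/reset`, and the empty body sent to `/grader` are assumptions made for the example; the Pydantic models and `openenv.yaml` remain the source of truth for the real request schema.

```python
import json
from typing import Callable, Optional
from urllib import request

# Transport signature: (method, url, json_body_or_None) -> decoded JSON dict.
Transport = Callable[[str, str, Optional[dict]], dict]


def http_transport(method: str, url: str, body: Optional[dict]) -> dict:
    """Send one JSON request with urllib and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


class EnvClient:
    """Hypothetical thin wrapper over the four endpoints listed above."""

    def __init__(self, base_url: str, transport: Transport = http_transport):
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def reset(self, task_id: str) -> dict:
        # Assumption: the task is selected with a `task_id` field.
        return self.transport("POST", f"{self.base_url}/reset", {"task_id": task_id})

    def step(self, action: dict) -> dict:
        # The response carries `observation`, `reward`, `done`, `info`.
        return self.transport("POST", f"{self.base_url}/step", action)

    def state(self) -> dict:
        return self.transport("GET", f"{self.base_url}/state", None)

    def grade(self) -> dict:
        # Assumption: the grader needs no request payload.
        return self.transport("POST", f"{self.base_url}/grader", {})
```

Passing the transport as a parameter keeps the wrapper testable without a running server: a stub function can stand in for `http_transport`.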

	## Observation Space

	`Observation` includes:

	- task metadata (`task_id`, `task_description`)
	- episode controls (`episode_id`, `step_number`, `max_steps`)
	- `system_state`:
	  - running processes
	  - service status
	  - open HTTP ports
	  - docker containers
	  - logs
	  - filesystem snapshot
	  - cpu and memory metrics
	- interaction history and current `available_actions`
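To make the shape of an observation concrete, here is a minimal sketch using standard-library dataclasses as a stand-in for the project's Pydantic models. Only `task_id`, `task_description`, `episode_id`, `step_number`, `max_steps`, `system_state`, and `available_actions` come from the list above; every field name inside `SystemState`, the `history` name, and the `steps_remaining` helper are illustrative guesses.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SystemState:
    # Field names here are assumptions, not the project's actual schema.
    processes: List[str] = field(default_factory=list)
    services: Dict[str, str] = field(default_factory=dict)    # name -> status
    open_ports: List[int] = field(default_factory=list)
    containers: List[Dict[str, Any]] = field(default_factory=list)
    logs: Dict[str, str] = field(default_factory=dict)        # source -> text
    filesystem: Dict[str, str] = field(default_factory=dict)  # path -> content
    cpu_percent: float = 0.0
    memory_percent: float = 0.0


@dataclass
class Observation:
    task_id: str
    task_description: str
    episode_id: str
    step_number: int
    max_steps: int
    system_state: SystemState
    history: List[Dict[str, Any]] = field(default_factory=list)  # assumed name
    available_actions: List[str] = field(default_factory=list)

    @property
    def steps_remaining(self) -> int:
        """Steps left before the episode reaches `max_steps`."""
        return max(0, self.max_steps - self.step_number)
```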

	## Action Space

	`Action.action_type` is one of:

	- `bash_cmd`: execute a simulated shell command (requires `command`)
	- `file_edit`: overwrite a known config or source file (requires `file_path` and `file_content`)
	- `submit`: end the current episode and grade it (`summary` is optional)
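A small helper can catch malformed actions before they reach the server. The sketch below assumes a flat JSON payload in which `action_type` sits alongside its fields; `make_action` and `REQUIRED_FIELDS` are hypothetical names, not part of the project.

```python
from typing import Dict, Optional

# Required payload fields per action type, taken from the list above.
REQUIRED_FIELDS = {
    "bash_cmd": ("command",),
    "file_edit": ("file_path", "file_content"),
    "submit": (),  # `summary` is optional
}


def make_action(action_type: str, **fields: Optional[str]) -> Dict[str, str]:
    """Build an action payload, rejecting unknown types and missing fields."""
    if action_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    missing = [
        name for name in REQUIRED_FIELDS[action_type] if fields.get(name) is None
    ]
    if missing:
        raise ValueError(f"{action_type} requires: {', '.join(missing)}")
    payload = {"action_type": action_type}
    # Drop unset optional fields so the payload stays minimal.
    payload.update({k: v for k, v in fields.items() if v is not None})
    return payload
```

For example, `make_action("bash_cmd", command="systemctl status nginx")` yields a two-key payload, while `make_action("file_edit", file_path="nginx.conf")` raises `ValueError` because `file_content` is missing.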

	## Tasks and Difficulty

	The environment ships with 3 graded tasks:

	1. `task1` (easy): recover crashed Nginx and verify HTTP health.
	2. `task2` (medium): correct docker-compose port mapping and redeploy.
	3. `task3` (hard): diagnose memory leak behavior, patch service code, restart cleanly.

	Each task is graded deterministically, producing a score in `[0.0, 1.0]` and a criterion-level breakdown.
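One way to obtain a deterministic score with a per-criterion breakdown is a weighted checklist over the final system state. The sketch below is an assumption about how such a grader could look, not the project's grader; the criterion names in the usage note are invented for illustration.

```python
from typing import Dict, Tuple


def grade(
    criteria: Dict[str, bool], weights: Dict[str, float]
) -> Tuple[float, Dict[str, float]]:
    """Return (score, breakdown); score is the weighted share of passed criteria."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Each passed criterion contributes its normalized weight, failed ones 0.0.
    breakdown = {
        name: (weights[name] / total if criteria.get(name, False) else 0.0)
        for name in weights
    }
    # Clamp to guard against floating-point drift outside [0.0, 1.0].
    score = min(1.0, max(0.0, sum(breakdown.values())))
    return round(score, 4), breakdown
```

With two equally weighted hypothetical criteria, `grade({"nginx_running": True, "http_200": False}, {"nginx_running": 0.5, "http_200": 0.5})` returns a score of `0.5`. Because the function depends only on its inputs, the same final state always produces the same score.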

	## Reward Design

	Rewards are dense and shaped so the agent receives a signal throughout the trajectory, not only at the end:

	- per-step cost discourages long loops
	- action-type reward for useful commands/edits
	- progress bonuses for key milestones (validation, successful restart, verified outputs)
	- penalties for repeated identical actions and invalid edits
	- terminal bonus from grader score on episode completion
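The terms above can be combined into a single scalar per step. The following is a minimal sketch under stated assumptions: every constant is made up for illustration, "repeated" is interpreted as identical to the immediately preceding action, and the function signature is hypothetical.

```python
from typing import List, Optional

# All constants are illustrative assumptions, not the project's values.
STEP_COST = -0.01
USEFUL_ACTION_REWARD = 0.05
MILESTONE_BONUS = 0.2
REPEAT_PENALTY = -0.05
INVALID_EDIT_PENALTY = -0.1


def step_reward(
    action: str,
    history: List[str],
    useful: bool = False,
    milestones_hit: int = 0,
    invalid_edit: bool = False,
    final_score: Optional[float] = None,
) -> float:
    """Combine the shaping terms for one step into a scalar reward."""
    reward = STEP_COST  # every step costs a little
    if useful:
        reward += USEFUL_ACTION_REWARD
    reward += MILESTONE_BONUS * milestones_hit
    if history and history[-1] == action:
        reward += REPEAT_PENALTY  # same action as the previous step
    if invalid_edit:
        reward += INVALID_EDIT_PENALTY
    if final_score is not None:
        reward += final_score  # terminal bonus from the grader
    return round(reward, 4)
```

Keeping the step cost small relative to the milestone and terminal terms means a longer but successful trajectory still scores higher than a short one that gives up.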

	## Local Setup

	### 1) Install dependencies

	```bash
	pip install -r requirements.txt
	```

	### 2) Run API server

	```bash
	uvicorn app:app --host 0.0.0.0 --port 7860
	```

	### 3) Check health

	```bash
	curl http://127.0.0.1:7860/health
	```

	### 4) Validate OpenEnv package

	```bash
	openenv validate
	```

	## Baseline Inference Script

	The required baseline script, `inference.py`, lives at the project root.

	It:

	- uses the OpenAI Python client
	- reads mandatory LLM variables:
	  - `API_BASE_URL`
	  - `MODEL_NAME`
	  - `HF_TOKEN`
	- runs all three tasks by default
	- emits strict structured stdout lines:
	  - `[START] ...`
	  - `[STEP] ...`
	  - `[END] ...`
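Since the stdout format is strict, it can help to check captured output before submitting. The sketch below validates only the line tags and their order; it assumes each episode is one `[START]`, zero or more `[STEP]` lines, then one `[END]`, and it deliberately makes no claim about the payload that follows each tag. `check_log_lines` is a hypothetical helper.

```python
from typing import Iterable

TAGS = ("[START]", "[STEP]", "[END]")


def check_log_lines(lines: Iterable[str]) -> bool:
    """Return True if lines form one or more START, STEP*, END episodes."""
    in_episode = False
    episodes = 0
    for line in lines:
        tag = next((t for t in TAGS if line.startswith(t)), None)
        if tag is None:
            return False  # an untagged line breaks the strict format
        if tag == "[START]":
            if in_episode:
                return False  # previous episode never ended
            in_episode = True
        elif not in_episode:
            return False  # STEP or END outside an episode
        elif tag == "[END]":
            in_episode = False
            episodes += 1
    # Require at least one complete episode and no dangling START.
    return episodes > 0 and not in_episode
```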

	### Inference environment variables

	```bash
	export OPENENV_BASE_URL="http://127.0.0.1:7860"
	export API_BASE_URL="https://router.huggingface.co/v1"
	export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
	export HF_TOKEN="<your_token>"
	```
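A script consuming these variables might fail fast when a mandatory one is missing. The sketch below is one possible loader, not the code in `inference.py`; `InferenceConfig`, `load_config`, and the fallback value for `OPENENV_BASE_URL` are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")
DEFAULT_ENV_URL = "http://127.0.0.1:7860"  # assumed fallback for OPENENV_BASE_URL


@dataclass(frozen=True)
class InferenceConfig:
    openenv_base_url: str
    api_base_url: str
    model_name: str
    hf_token: str


def load_config(env: Mapping[str, str] = os.environ) -> InferenceConfig:
    """Read settings from `env`, raising if a mandatory variable is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    return InferenceConfig(
        openenv_base_url=env.get("OPENENV_BASE_URL", DEFAULT_ENV_URL).rstrip("/"),
        api_base_url=env["API_BASE_URL"],
        model_name=env["MODEL_NAME"],
        hf_token=env["HF_TOKEN"],
    )
```

The resulting values would typically be handed to the OpenAI Python client, for example `OpenAI(base_url=cfg.api_base_url, api_key=cfg.hf_token)`, with `cfg.model_name` passed as the model for each request.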

	### Run baseline

	```bash
	python inference.py
	```

	Run a single task:

	```bash
	python inference.py --task task2
	```

	## Docker

	Build:

	```bash
	docker build -t devopsenv:latest .
	```

	Run:

	```bash
	docker run --rm -p 7860:7860 devopsenv:latest
	```

	## Hugging Face Spaces Deployment

	This repository is configured for Docker Spaces:

	- the README frontmatter sets `sdk: docker`
	- the container exposes and serves the app on port `7860`, matching `app_port: 7860`
	- the frontmatter `tags` include `openenv`

	After pushing to a Space, verify:

	- `POST /reset` returns 200
	- `openenv validate` passes
	- `python inference.py` completes within runtime constraints

	## Pre-Submission Checklist

	- HF Space endpoint responds to `/reset`
	- `openenv validate` passes
	- `docker build` succeeds
	- `inference.py` runs and logs strict `[START]/[STEP]/[END]` format
	- all 3 tasks produce valid grader scores in `[0.0, 1.0]`