	---
	title: Self-Healing DevOps Sandbox
	emoji: 🔧
	colorFrom: red
	colorTo: green
	sdk: docker
	pinned: false
	app_port: 8000
	base_path: /web
	tags:
	- openenv
	---

	# Self-Healing DevOps Sandbox

	An OpenEnv RL environment where an AI agent is dropped into a broken Node.js backend inside a Docker container. The agent must use bash commands only to diagnose bugs, edit files, and fix the app -- just like a real DevOps engineer would.

	Built for the Meta PyTorch OpenEnv Hackathon.

	---

	## What Is This?

	A 3-task challenge of increasing difficulty. The agent starts in a Docker container with a broken Express.js app in `/app` and must make all endpoints healthy.

	| # | Difficulty | File              | What's Wrong                          |
	|---|------------|-------------------|---------------------------------------|
	| 1 | Easy       | `config.json`     | Port set to `9999` instead of `3000`  |
	| 2 | Medium     | `routes/users.js` | Missing `)` causes SyntaxError crash  |
	| 3 | Hard       | `routes/data.js`  | Missing `await` causes HTTP 500       |

	Goal: Fix all bugs so these endpoints return HTTP 200:
	- `GET /health` returns `{"status": "ok"}`
	- `GET /api/users` returns `{"users": [...]}`
	- `GET /api/data` returns `{"records": [...]}`
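As a rough illustration of what "healthy" means for the three endpoints above, the following sketch checks a status code and response body against the expected shape. The function name `endpoint_ok` and the `EXPECTED` table are hypothetical and not part of this repository; the actual grader may apply different or stricter checks.

```python
import json

# Expected top-level key per endpoint, taken from the goal list above.
# A type means "value must be of this type"; anything else must match exactly.
EXPECTED = {
    "/health": ("status", "ok"),
    "/api/users": ("users", list),
    "/api/data": ("records", list),
}


def endpoint_ok(path: str, status: int, body: str) -> bool:
    """Return True if a response for `path` looks like the documented goal."""
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    key, expected = EXPECTED[path]
    if key not in payload:
        return False
    if isinstance(expected, type):
        return isinstance(payload[key], expected)
    return payload[key] == expected
```

For example, `endpoint_ok("/health", 200, '{"status": "ok"}')` evaluates to `True`, while any non-200 status or non-JSON body evaluates to `False`.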

	---

	## Scoring (Partial Rewards)

	The grader runs after every command and awards cumulative points:

	| Milestone                        | Points | Total |
	|----------------------------------|--------|-------|
	| App starts on port 3000          | +0.35  | 0.35  |
	| `/health` returns 200            | +0.10  | 0.45  |
	| `/api/users` returns valid JSON  | +0.15  | 0.60  |
	| `/api/data` returns valid JSON   | +0.25  | 0.85  |
	| All endpoints correct            | +0.15  | 1.00  |
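The arithmetic in the table can be sketched as a sum over the milestones reached so far. The milestone identifiers below are hypothetical labels chosen for illustration, and the sketch assumes each milestone is awarded independently; the grader in `server/devops_sandbox_environment.py` may gate later milestones on earlier ones.

```python
# Point values copied from the scoring table; the names are illustrative only.
MILESTONES = [
    ("app_starts_on_3000", 0.35),
    ("health_200", 0.10),
    ("users_valid_json", 0.15),
    ("data_valid_json", 0.25),
    ("all_endpoints_correct", 0.15),
]


def score(passed: set[str]) -> float:
    """Sum the points of every milestone in `passed`, rounded to 2 decimals."""
    total = sum(points for name, points in MILESTONES if name in passed)
    return round(total, 2)
```

Under these assumptions, an agent that only fixes the port and gets `/health` working would score `score({"app_starts_on_3000", "health_200"})`, which is `0.45`.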

	---

	## Getting Started

	### Prerequisites

	- Python 3.10+
	- Docker Desktop (running)
	- uv package manager (`pip install uv`)

	### 1. Install Dependencies

	```bash
	cd devops_sandbox
	uv sync
	```

	### 2. Build the Sandbox Docker Image

	```bash
	docker build -t devops-sandbox-node:latest -f simulated_app/Dockerfile simulated_app/
	```

	### 3. Start the Environment Server

	```bash
	uv run server
	```

	The server starts at `http://localhost:8000`.

	### 4. Run the Baseline Agent

	In a separate terminal:

	```bash
	# Set your OpenAI API key (Linux/macOS)
	export OPENAI_API_KEY="sk-..."
	# PowerShell equivalent (run this instead of the export above):
	#   $env:OPENAI_API_KEY = "sk-..."

	# Run the baseline
	uv run python baseline.py
	```

	---

	## Test Your Own Agent

	### Option A: Use the Python Client

	```python
	from devops_sandbox import BashAction, DevopsSandboxEnv

	with DevopsSandboxEnv(base_url="http://localhost:8000").sync() as env:
	    # Reset creates a fresh Docker container
	    result = env.reset()
	    print(result.observation.stdout)        # Task description
	    print(result.observation.grader_score)  # 0.0

	    # Send bash commands
	    result = env.step(BashAction(command="cat /app/config.json"))
	    print(result.observation.stdout)        # File contents
	    print(result.observation.grader_score)  # Score after grading

	    # Fix a bug
	    result = env.step(BashAction(command="sed -i 's/9999/3000/' /app/config.json"))
	    print(result.observation.grader_score)  # Partial score

	    # Check if done
	    if result.done:
	        print("Episode complete!")
	```
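For a scripted (non-LLM) agent, the same calls can be wrapped in a small loop. This is a minimal sketch: `run_script` is a hypothetical helper, not part of the client, and it only assumes that each step result exposes `done` and `observation.grader_score` as in the example above. The `step` argument is any callable that takes a command string, which keeps the sketch independent of the client import.

```python
from typing import Any, Callable, Iterable


def run_script(
    step: Callable[[str], Any],
    commands: Iterable[str],
    max_turns: int = 30,
) -> float:
    """Run commands in order and return the last grader score seen.

    Stops early when the environment reports `done` or after `max_turns`.
    """
    score = 0.0
    for turn, command in enumerate(commands):
        if turn >= max_turns:
            break
        result = step(command)
        score = result.observation.grader_score
        if result.done:
            break
    return score
```

With the client from Option A, this could be called as `run_script(lambda c: env.step(BashAction(command=c)), ["sed -i 's/9999/3000/' /app/config.json"])` inside the `with` block.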

	### Option B: Use the REST API Directly

	```bash
	# Reset the environment
	curl -X POST http://localhost:8000/reset

	# Send a command
	curl -X POST http://localhost:8000/step \
	  -H "Content-Type: application/json" \
	  -d '{"action": {"command": "ls -la /app"}}'
	```
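The same two requests can be built from Python with only the standard library. This is a sketch that mirrors the `curl` calls above; the helper names are hypothetical, and the response format is not documented here, so the sketch stops at constructing the request objects.

```python
import json
import urllib.request


def build_reset_request(base_url: str) -> urllib.request.Request:
    """POST /reset with no body, as in the first curl example."""
    return urllib.request.Request(f"{base_url}/reset", data=b"", method="POST")


def build_step_request(base_url: str, command: str) -> urllib.request.Request:
    """POST /step with the JSON payload shape used in the second curl example."""
    payload = json.dumps({"action": {"command": command}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/step",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send a request, pass it to `urllib.request.urlopen(...)` while the environment server is running, for example `urllib.request.urlopen(build_step_request("http://localhost:8000", "ls -la /app"))`.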

	### Option C: Use the WebSocket Endpoint

	Connect to `ws://localhost:8000/ws` for persistent sessions.

	---

	## Project Structure

	```
	devops_sandbox/
	|-- openenv.yaml                        # OpenEnv manifest
	|-- pyproject.toml                      # Python dependencies
	|-- README.md                           # This file
	|-- baseline.py                         # LLM-powered baseline agent
	|-- models.py                           # BashAction & TerminalObservation schemas
	|-- client.py                           # Python client for the environment
	|
	|-- server/
	|   |-- app.py                          # FastAPI server (entry point)
	|   +-- devops_sandbox_environment.py   # Environment logic + grader
	|
	+-- simulated_app/                      # The broken Node.js app (Docker context)
	    |-- Dockerfile                      # node:20-slim sandbox container
	    |-- package.json                    # Express.js project
	    |-- server.js                       # Main entry point
	    |-- config.json                     # Bug 1: wrong port
	    +-- routes/
	        |-- users.js                    # Bug 2: syntax error
	        +-- data.js                     # Bug 3: missing await
	```

	---

	## How It Works

	```
	+-----------+   BashAction    +------------+   docker exec   +--------------+
	|   Agent   | --------------> |  OpenEnv   | --------------> |    Docker    |
	|  (LLM/RL) |                 |   Server   |                 |  Container   |
	|           | <-------------- |   (8000)   | <-------------- | (broken app) |
	+-----------+   Observation   +-----+------+  stdout/stderr  +--------------+
	              + grader_score        |
	                              +-----+------+
	                              |   Grader   |
	                              | (curl test |
	                              |  endpoints)|
	                              +------------+
	```

	1. Agent sends a `BashAction` (e.g., `cat /app/config.json`)
	2. Server runs it inside the Docker container via `docker exec`
	3. Grader restarts the Node app and curls all endpoints
	4. Observation returns: stdout, stderr, score (0.0-1.0), feedback
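Step 2 above can be pictured as a thin wrapper around the `docker exec` CLI. The sketch below is an assumption about how such a wrapper might look, not the server's actual code: the function names, the container name argument, and the 30-second timeout are all hypothetical, and the real server may use the Docker SDK instead of a subprocess.

```python
import subprocess


def docker_exec_argv(container: str, command: str) -> list[str]:
    """Build the argv for running a bash command inside a container."""
    # Passing the command as a single argument to `bash -c` avoids
    # re-quoting it on the host side.
    return ["docker", "exec", container, "bash", "-c", command]


def run_in_container(container: str, command: str, timeout: float = 30.0) -> dict:
    """Run the command and collect the fields an observation would carry."""
    proc = subprocess.run(
        docker_exec_argv(container, command),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
    }
```

Building the argument list separately from running it keeps the quoting logic easy to inspect without a Docker daemon available.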

	---

	## Configuration

	| Env Variable         | Default                  | Description                     |
	|----------------------|--------------------------|---------------------------------|
	| `OPENAI_API_KEY`     | (required)               | OpenAI API key for baseline     |
	| `OPENAI_MODEL`       | `gpt-4o-mini`            | LLM model to use                |
	| `OPENAI_BASE_URL`    | (OpenAI default)         | Custom endpoint (Ollama, vLLM)  |
	| `MAX_TURNS`          | `30`                     | Max steps per episode           |
	| `DEVOPS_SANDBOX_URL` | `http://localhost:8000`  | Environment server URL          |

	### Use with Local LLMs (Ollama, vLLM)

	```bash
	export OPENAI_BASE_URL="http://localhost:11434/v1"
	export OPENAI_MODEL="llama3"
	export OPENAI_API_KEY="dummy"
	uv run python baseline.py
	```
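How a script might read these variables, with the defaults from the configuration table, can be sketched as follows. `BaselineConfig` and `load_config` are hypothetical names used for illustration; `baseline.py` may read the environment differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BaselineConfig:
    api_key: str
    model: str
    base_url: Optional[str]  # None means "use the OpenAI default endpoint"
    max_turns: int
    sandbox_url: str


def load_config(env: Mapping[str, str] = os.environ) -> BaselineConfig:
    """Read settings from `env`, applying the defaults from the table."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required")
    return BaselineConfig(
        api_key=api_key,
        model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
        base_url=env.get("OPENAI_BASE_URL"),
        max_turns=int(env.get("MAX_TURNS", "30")),
        sandbox_url=env.get("DEVOPS_SANDBOX_URL", "http://localhost:8000"),
    )
```

Accepting a mapping instead of reading `os.environ` directly makes the loader easy to exercise with a plain dictionary.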

	---

	## Validation

	```bash
	uv run openenv validate
	# Expected: [OK] devops_sandbox: Ready for multi-mode deployment
	```

	---

	## License

	BSD-style license. See LICENSE for details.