---
title: Self-Healing DevOps Sandbox
emoji: 🔧
colorFrom: red
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
---
# Self-Healing DevOps Sandbox
An OpenEnv RL environment where an AI agent is dropped into a **broken Node.js backend** inside a Docker container. The agent must use **bash commands only** to diagnose bugs, edit files, and fix the app -- just like a real DevOps engineer would.
Built for the **Meta PyTorch OpenEnv Hackathon**.
---
## What Is This?
The environment is a three-task challenge of increasing difficulty. The agent starts in a Docker container with a broken Express.js app in `/app` and must make all endpoints healthy.
| # | Difficulty | File | What's Wrong |
|---|-----------|-----------------|---------------------------------------|
| 1 | Easy | `config.json` | Port set to `9999` instead of `3000` |
| 2 | Medium | `routes/users.js`| Missing `)` causes SyntaxError crash |
| 3 | Hard | `routes/data.js` | Missing `await` causes HTTP 500 |
**Goal:** Fix all bugs so these endpoints return HTTP 200:
- `GET /health` returns `{"status": "ok"}`
- `GET /api/users` returns `{"users": [...]}`
- `GET /api/data` returns `{"records": [...]}`
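
To make the goal concrete, here is a minimal sketch of how a response body could be checked against the shapes listed above. It is an illustration only, not the grader's actual logic: the helper name `payload_looks_valid` and the `EXPECTED_KEYS` mapping are hypothetical, and the check assumes each endpoint returns a JSON object with the single top-level key shown.

```python
import json

# Expected top-level key per endpoint, taken from the goal list above.
# (Hypothetical mapping for illustration; the real grader may check more.)
EXPECTED_KEYS = {
    "/health": "status",
    "/api/users": "users",
    "/api/data": "records",
}


def payload_looks_valid(path: str, body: str) -> bool:
    """Return True if `body` is JSON with the shape expected for `path`."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    key = EXPECTED_KEYS.get(path)
    if key is None or key not in data:
        return False
    if path == "/health":
        # The health endpoint should report exactly "ok".
        return data[key] == "ok"
    # The two API endpoints should return a list under their key.
    return isinstance(data[key], list)
```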
---
## Scoring (Partial Rewards)
The grader runs **after every command** and awards cumulative points:
| Milestone | Points | Total |
|----------------------------------|--------|----------|
| App starts on port 3000 | +0.35 | 0.35 |
| `/health` returns 200 | +0.10 | 0.45 |
| `/api/users` returns valid JSON | +0.15 | 0.60 |
| `/api/data` returns valid JSON | +0.25 | 0.85 |
| All endpoints correct | +0.15 | **1.00** |
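
The arithmetic in the table can be sketched as a simple sum over achieved milestones. The milestone names below are hypothetical labels, and the sketch assumes each milestone contributes its points independently; the real grader may apply them in a fixed order.

```python
# Points per milestone, copied from the scoring table above.
# The string identifiers are hypothetical labels for illustration.
MILESTONES = [
    ("app_starts_on_3000", 0.35),
    ("health_ok", 0.10),
    ("users_valid_json", 0.15),
    ("data_valid_json", 0.25),
    ("all_endpoints_correct", 0.15),
]


def cumulative_score(achieved: set[str]) -> float:
    """Sum the points of every achieved milestone, rounded to two decimals."""
    total = sum(points for name, points in MILESTONES if name in achieved)
    # Rounding hides floating-point noise such as 0.44999999999999996.
    return round(total, 2)
```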
---
## Getting Started
### Prerequisites
- **Python 3.10+**
- **Docker Desktop** (running)
- **uv** package manager (`pip install uv`)
### 1. Install Dependencies
```bash
cd devops_sandbox
uv sync
```
### 2. Build the Sandbox Docker Image
```bash
docker build -t devops-sandbox-node:latest -f simulated_app/Dockerfile simulated_app/
```
### 3. Start the Environment Server
```bash
uv run server
```
The server starts at `http://localhost:8000`.
### 4. Run the Baseline Agent
In a **separate terminal**:
```bash
# Set your OpenAI API key (Linux/macOS)
export OPENAI_API_KEY="sk-..."
# PowerShell equivalent:
# $env:OPENAI_API_KEY = "sk-..."
# Run the baseline
uv run python baseline.py
```
---
## Test Your Own Agent
### Option A: Use the Python Client
```python
from devops_sandbox import BashAction, DevopsSandboxEnv

with DevopsSandboxEnv(base_url="http://localhost:8000").sync() as env:
    # Reset creates a fresh Docker container
    result = env.reset()
    print(result.observation.stdout)        # Task description
    print(result.observation.grader_score)  # 0.0

    # Send bash commands
    result = env.step(BashAction(command="cat /app/config.json"))
    print(result.observation.stdout)        # File contents
    print(result.observation.grader_score)  # Score after grading

    # Fix a bug
    result = env.step(BashAction(command="sed -i 's/9999/3000/' /app/config.json"))
    print(result.observation.grader_score)  # Partial score

    # Check if done
    if result.done:
        print("Episode complete!")
```
### Option B: Use the REST API Directly
```bash
# Reset the environment
curl -X POST http://localhost:8000/reset
# Send a command
curl -X POST http://localhost:8000/step \
-H "Content-Type: application/json" \
-d '{"action": {"command": "ls -la /app"}}'
```
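
The same `/step` call can be made from Python using only the standard library. This is a sketch under assumptions: the helper names `build_step_request` and `send` are hypothetical, and the response is assumed to be a JSON document whose exact fields are not specified here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_step_request(command: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /step request with the same JSON body as the curl example."""
    body = json.dumps({"action": {"command": command}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires the environment server to be running):
#   print(send(build_step_request("ls -la /app")))
```

Separating request construction from sending keeps the payload easy to inspect or log before anything touches the network.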
### Option C: Use the WebSocket Endpoint
Connect to `ws://localhost:8000/ws` for persistent sessions.
---
## Project Structure
```
devops_sandbox/
|-- openenv.yaml # OpenEnv manifest
|-- pyproject.toml # Python dependencies
|-- README.md # This file
|-- baseline.py # LLM-powered baseline agent
|-- models.py # BashAction & TerminalObservation schemas
|-- client.py # Python client for the environment
|
|-- server/
| |-- app.py # FastAPI server (entry point)
| +-- devops_sandbox_environment.py # Environment logic + grader
|
+-- simulated_app/ # The broken Node.js app (Docker context)
|-- Dockerfile # node:20-slim sandbox container
|-- package.json # Express.js project
|-- server.js # Main entry point
|-- config.json # Bug 1: wrong port
+-- routes/
|-- users.js # Bug 2: syntax error
+-- data.js # Bug 3: missing await
```
---
## How It Works
```
+-----------+ BashAction +------------+ docker exec +--------------+
| Agent | --------------> | OpenEnv | --------------> | Docker |
| (LLM/RL) | | Server | | Container |
| | <-------------- | (8000) | <-------------- | (broken app)|
+-----------+ Observation +-----+------+ stdout/stderr +--------------+
+ grader_score |
+-----+------+
| Grader |
| (curl test |
| endpoints)|
+------------+
```
1. **Agent** sends a `BashAction` (e.g., `cat /app/config.json`)
2. **Server** runs it inside the Docker container via `docker exec`
3. **Grader** restarts the Node app and curls all endpoints
4. **Observation** returns: stdout, stderr, score (0.0-1.0), feedback
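
The four steps above can be sketched as a loop from the agent's point of view. Everything here is a toy stand-in so the loop runs without Docker: `FakeEnv`, `FakeResult`, and `FakeObservation` are hypothetical classes that mimic only the `reset`/`step`/`done`/`grader_score` pattern shown in the client example, and `FakeEnv` takes a plain string where the real client takes a `BashAction`. Its scoring rule (any `sed` command yields a perfect score) is invented purely for the demonstration.

```python
from dataclasses import dataclass


@dataclass
class FakeObservation:
    stdout: str
    grader_score: float


@dataclass
class FakeResult:
    observation: FakeObservation
    done: bool


class FakeEnv:
    """Hypothetical stand-in for the real environment (no Docker involved)."""

    def __init__(self) -> None:
        self.score = 0.0

    def reset(self) -> FakeResult:
        self.score = 0.0
        return FakeResult(FakeObservation("task description", 0.0), False)

    def step(self, command: str) -> FakeResult:
        # Toy rule: pretend any sed edit fixes everything.
        if "sed" in command:
            self.score = 1.0
        observation = FakeObservation(f"ran: {command}", self.score)
        return FakeResult(observation, self.score >= 1.0)


def run_episode(env, commands, max_turns: int = 30) -> list[float]:
    """Send commands one by one and record the score after each step."""
    result = env.reset()
    history = [result.observation.grader_score]
    for command in commands[:max_turns]:
        result = env.step(command)
        history.append(result.observation.grader_score)
        if result.done:
            break  # Episode finished; remaining commands are skipped.
    return history
```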
---
## Configuration
| Env Variable | Default | Description |
|--------------------|--------------------------|------------------------------------|
| `OPENAI_API_KEY` | *(required)* | OpenAI API key for baseline |
| `OPENAI_MODEL` | `gpt-4o-mini` | LLM model to use |
| `OPENAI_BASE_URL` | *(OpenAI default)* | Custom endpoint (Ollama, vLLM) |
| `MAX_TURNS` | `30` | Max steps per episode |
| `DEVOPS_SANDBOX_URL`| `http://localhost:8000` | Environment server URL |
### Use with Local LLMs (Ollama, vLLM)
```bash
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_MODEL="llama3"
export OPENAI_API_KEY="dummy"
uv run python baseline.py
```
---
## Validation
```bash
uv run openenv validate
# Expected: [OK] devops_sandbox: Ready for multi-mode deployment
```
---
## License
BSD-style license. See LICENSE for details.