---
title: Self-Healing DevOps Sandbox
emoji: 🔧
colorFrom: red
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---

# Self-Healing DevOps Sandbox

An OpenEnv RL environment where an AI agent is dropped into a **broken Node.js backend** inside a Docker container. The agent must use **bash commands only** to diagnose bugs, edit files, and fix the app -- just like a real DevOps engineer would.

Built for the **Meta PyTorch OpenEnv Hackathon**.

---

## What Is This?

A 3-task challenge of increasing difficulty. The agent starts in a Docker container with a broken Express.js app in `/app` and must make all endpoints healthy.

| # | Difficulty | Bug               | What's Wrong                           |
|---|------------|-------------------|----------------------------------------|
| 1 | Easy       | `config.json`     | Port set to `9999` instead of `3000`   |
| 2 | Medium     | `routes/users.js` | Missing `)` causes a SyntaxError crash |
| 3 | Hard       | `routes/data.js`  | Missing `await` causes HTTP 500        |

**Goal:** Fix all bugs so these endpoints return HTTP 200:

- `GET /health` returns `{"status": "ok"}`
- `GET /api/users` returns `{"users": [...]}`
- `GET /api/data` returns `{"records": [...]}`

---

## Scoring (Partial Rewards)

The grader runs **after every command** and awards cumulative points:

| Milestone                        | Points | Total    |
|----------------------------------|--------|----------|
| App starts on port 3000          | +0.35  | 0.35     |
| `/health` returns 200            | +0.10  | 0.45     |
| `/api/users` returns valid JSON  | +0.15  | 0.60     |
| `/api/data` returns valid JSON   | +0.25  | 0.85     |
| All endpoints correct            | +0.15  | **1.00** |

---

## Getting Started

### Prerequisites

- **Python 3.10+**
- **Docker Desktop** (running)
- **uv** package manager (`pip install uv`)

### 1. Install Dependencies

```bash
cd devops_sandbox
uv sync
```

### 2. Build the Sandbox Docker Image

```bash
docker build -t devops-sandbox-node:latest -f simulated_app/Dockerfile simulated_app/
```

### 3. Start the Environment Server

```bash
uv run server
```

The server starts at `http://localhost:8000`.

### 4. Run the Baseline Agent

In a **separate terminal**:

```bash
# Set your OpenAI API key
export OPENAI_API_KEY="sk-..."       # Linux/Mac
$env:OPENAI_API_KEY = "sk-..."       # PowerShell

# Run the baseline
uv run python baseline.py
```

---

## Test Your Own Agent

### Option A: Use the Python Client

```python
from devops_sandbox import BashAction, DevopsSandboxEnv

with DevopsSandboxEnv(base_url="http://localhost:8000").sync() as env:
    # Reset creates a fresh Docker container
    result = env.reset()
    print(result.observation.stdout)        # Task description
    print(result.observation.grader_score)  # 0.0

    # Send bash commands
    result = env.step(BashAction(command="cat /app/config.json"))
    print(result.observation.stdout)        # File contents
    print(result.observation.grader_score)  # Score after grading

    # Fix a bug
    result = env.step(BashAction(command="sed -i 's/9999/3000/' /app/config.json"))
    print(result.observation.grader_score)  # Partial score

    # Check if done
    if result.done:
        print("Episode complete!")
```

### Option B: Use the REST API Directly

```bash
# Reset the environment
curl -X POST http://localhost:8000/reset

# Send a command
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action": {"command": "ls -la /app"}}'
```

### Option C: Use the WebSocket Endpoint

Connect to `ws://localhost:8000/ws` for persistent sessions.
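However you drive the environment, the `grader_score` you read back follows the cumulative schedule from the Scoring section. Below is a hedged sketch of that schedule only -- the milestone names are invented for illustration, and the real grader (in `server/devops_sandbox_environment.py`) curls the live container rather than summing flags:

```python
# Hypothetical milestone names; only the point values come from the
# scoring table in this README.
MILESTONES = [
    ("app_on_3000", 0.35),  # App starts on port 3000
    ("health_200", 0.10),   # /health returns 200
    ("users_json", 0.15),   # /api/users returns valid JSON
    ("data_json", 0.25),    # /api/data returns valid JSON
    ("all_correct", 0.15),  # All endpoints correct
]

def grader_score(passed: set[str]) -> float:
    """Cumulative score for the milestones currently passing."""
    return round(sum(pts for name, pts in MILESTONES if name in passed), 2)
```

For example, with only bug 1 fixed the app boots and `/health` answers, so `grader_score({"app_on_3000", "health_200"})` gives `0.45`, matching the table's running total.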
---

## Project Structure

```
devops_sandbox/
|-- openenv.yaml       # OpenEnv manifest
|-- pyproject.toml     # Python dependencies
|-- README.md          # This file
|-- baseline.py        # LLM-powered baseline agent
|-- models.py          # BashAction & TerminalObservation schemas
|-- client.py          # Python client for the environment
|
|-- server/
|   |-- app.py                         # FastAPI server (entry point)
|   +-- devops_sandbox_environment.py  # Environment logic + grader
|
+-- simulated_app/     # The broken Node.js app (Docker context)
    |-- Dockerfile     # node:20-slim sandbox container
    |-- package.json   # Express.js project
    |-- server.js      # Main entry point
    |-- config.json    # Bug 1: wrong port
    +-- routes/
        |-- users.js   # Bug 2: syntax error
        +-- data.js    # Bug 3: missing await
```

---

## How It Works

```
+-----------+    BashAction     +------------+    docker exec    +--------------+
|   Agent   | ----------------> |  OpenEnv   | ----------------> |    Docker    |
| (LLM/RL)  |                   |   Server   |                   |  Container   |
|           | <---------------- |   (8000)   | <---------------- | (broken app) |
+-----------+    Observation    +-----+------+   stdout/stderr   +--------------+
                 + grader_score       |
                                +-----+------+
                                |   Grader   |
                                | (curl test |
                                | endpoints) |
                                +------------+
```

1. **Agent** sends a `BashAction` (e.g., `cat /app/config.json`)
2. **Server** runs it inside the Docker container via `docker exec`
3. **Grader** restarts the Node app and curls all endpoints
4. **Observation** returns: stdout, stderr, score (0.0-1.0), feedback

---

## Configuration

| Env Variable         | Default                 | Description                     |
|----------------------|-------------------------|---------------------------------|
| `OPENAI_API_KEY`     | *(required)*            | OpenAI API key for the baseline |
| `OPENAI_MODEL`       | `gpt-4o-mini`           | LLM model to use                |
| `OPENAI_BASE_URL`    | *(OpenAI default)*      | Custom endpoint (Ollama, vLLM)  |
| `MAX_TURNS`          | `30`                    | Max steps per episode           |
| `DEVOPS_SANDBOX_URL` | `http://localhost:8000` | Environment server URL          |

### Use with Local LLMs (Ollama, vLLM)

```bash
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_MODEL="llama3"
export OPENAI_API_KEY="dummy"
uv run python baseline.py
```

---

## Validation

```bash
uv run openenv validate
# Expected: [OK] devops_sandbox: Ready for multi-mode deployment
```

---

## License

BSD-style license. See LICENSE for details.
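As a closing reference, the defaults from the Configuration table can be picked up with plain `os.environ` lookups. This is only a sketch of how a custom agent might read them -- the actual `baseline.py` may resolve its settings differently:

```python
import os

# Defaults mirror the Configuration table in this README.
# OPENAI_API_KEY has no fallback here because it is required.
model = os.environ.get("OPENAI_MODEL", "gpt-4o-mini")
base_url = os.environ.get("OPENAI_BASE_URL")  # None -> OpenAI's default endpoint
max_turns = int(os.environ.get("MAX_TURNS", "30"))
sandbox_url = os.environ.get("DEVOPS_SANDBOX_URL", "http://localhost:8000")
```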