---
title: Self-Healing DevOps Sandbox
emoji: 🔧
colorFrom: red
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---
# Self-Healing DevOps Sandbox

An OpenEnv RL environment where an AI agent is dropped into a broken Node.js backend inside a Docker container. The agent must use only bash commands to diagnose bugs, edit files, and fix the app, just like a real DevOps engineer would.

Built for the Meta PyTorch OpenEnv Hackathon.
## What Is This?

A 3-task challenge of increasing difficulty. The agent starts in a Docker container with a broken Express.js app in `/app` and must make all endpoints healthy.
| # | Difficulty | File | What's Wrong |
|---|---|---|---|
| 1 | Easy | `config.json` | Port set to 9999 instead of 3000 |
| 2 | Medium | `routes/users.js` | Missing `)` causes a SyntaxError crash |
| 3 | Hard | `routes/data.js` | Missing `await` causes HTTP 500 |
**Goal:** Fix all bugs so these endpoints return HTTP 200:

- `GET /health` returns `{"status": "ok"}`
- `GET /api/users` returns `{"users": [...]}`
- `GET /api/data` returns `{"records": [...]}`
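The target behavior above can be expressed as a small check. This is an illustrative sketch, not code from the repo: `endpoint_ok` and `EXPECTED_KEYS` are hypothetical names, and the check only looks for the top-level key each endpoint should return.

```python
import json

# Hypothetical helper (not part of the repo): checks one endpoint's
# response against the spec above -- HTTP 200 plus the expected key.
EXPECTED_KEYS = {
    "/health": "status",
    "/api/users": "users",
    "/api/data": "records",
}

def endpoint_ok(path: str, status: int, body: str) -> bool:
    """Return True if the response matches the target behavior."""
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return EXPECTED_KEYS[path] in payload
```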
## Scoring (Partial Rewards)

The grader runs after every command and awards cumulative points:

| Milestone | Points | Total |
|---|---|---|
| App starts on port 3000 | +0.35 | 0.35 |
| `/health` returns 200 | +0.10 | 0.45 |
| `/api/users` returns valid JSON | +0.15 | 0.60 |
| `/api/data` returns valid JSON | +0.25 | 0.85 |
| All endpoints correct | +0.15 | 1.00 |
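One possible reading of the table as code, purely for illustration (the real grader lives in `server/devops_sandbox_environment.py`): each milestone adds its points independently, with a completion bonus when everything passes.

```python
# Illustrative only: reproduces the cumulative scoring table above.
# Whether the real grader gates later milestones on earlier ones is
# an assumption; here each milestone is scored independently.
def compute_score(app_up: bool, health_ok: bool,
                  users_ok: bool, data_ok: bool) -> float:
    score = 0.0
    if app_up:
        score += 0.35
    if health_ok:
        score += 0.10
    if users_ok:
        score += 0.15
    if data_ok:
        score += 0.25
    # Completion bonus once every endpoint behaves correctly
    if app_up and health_ok and users_ok and data_ok:
        score += 0.15
    return round(score, 2)
```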
## Getting Started

### Prerequisites

- Python 3.10+
- Docker Desktop (running)
- `uv` package manager (`pip install uv`)
### 1. Install Dependencies

```shell
cd devops_sandbox
uv sync
```
### 2. Build the Sandbox Docker Image

```shell
docker build -t devops-sandbox-node:latest -f simulated_app/Dockerfile simulated_app/
```
### 3. Start the Environment Server

```shell
uv run server
```

The server starts at `http://localhost:8000`.
### 4. Run the Baseline Agent

In a separate terminal:

```shell
# Set your OpenAI API key
export OPENAI_API_KEY="sk-..."       # Linux/Mac
$env:OPENAI_API_KEY = "sk-..."       # PowerShell

# Run the baseline
uv run python baseline.py
```
## Test Your Own Agent

### Option A: Use the Python Client

```python
from devops_sandbox import BashAction, DevopsSandboxEnv

with DevopsSandboxEnv(base_url="http://localhost:8000").sync() as env:
    # Reset creates a fresh Docker container
    result = env.reset()
    print(result.observation.stdout)        # Task description
    print(result.observation.grader_score)  # 0.0

    # Send bash commands
    result = env.step(BashAction(command="cat /app/config.json"))
    print(result.observation.stdout)        # File contents
    print(result.observation.grader_score)  # Score after grading

    # Fix a bug
    result = env.step(BashAction(command="sed -i 's/9999/3000/' /app/config.json"))
    print(result.observation.grader_score)  # Partial score

    # Check if done
    if result.done:
        print("Episode complete!")
```
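Your own agent only needs to produce the next bash command from the latest observation. A toy rule-based policy is sketched below; `next_command` is a hypothetical name, the string-matching rules are illustrative, and a real agent would put an LLM here instead. The returned string plugs straight into `env.step(BashAction(command=...))`.

```python
# Illustrative rule-based policy (not part of the repo): pick the next
# bash command from the latest stdout/stderr of the environment.
def next_command(stdout: str, stderr: str) -> str:
    if "9999" in stdout:
        # Bug 1: wrong port in config.json
        return "sed -i 's/9999/3000/' /app/config.json"
    if "SyntaxError" in stderr:
        # Bug 2: inspect the broken route file
        return "cat /app/routes/users.js"
    if "500" in stdout:
        # Bug 3: look for the missing await
        return "grep -n 'await' /app/routes/data.js"
    # Default: gather information first
    return "cat /app/config.json"
```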
### Option B: Use the REST API Directly

```shell
# Reset the environment
curl -X POST http://localhost:8000/reset

# Send a command
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action": {"command": "ls -la /app"}}'
```
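The same calls work from any HTTP client. A minimal stdlib sketch, assuming only the request shapes shown in the curl example (`step_payload` and `post` are hypothetical helper names):

```python
import json
import urllib.request

def step_payload(command: str) -> bytes:
    """Build the JSON body for POST /step, matching the curl example."""
    return json.dumps({"action": {"command": command}}).encode()

def post(url: str, body: bytes = b"") -> str:
    """POST a JSON body and return the raw response text."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```

Usage (with the server running): `post("http://localhost:8000/reset")`, then `post("http://localhost:8000/step", step_payload("ls -la /app"))`.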
### Option C: Use the WebSocket Endpoint

Connect to `ws://localhost:8000/ws` for persistent sessions.
## Project Structure

```
devops_sandbox/
|-- openenv.yaml          # OpenEnv manifest
|-- pyproject.toml        # Python dependencies
|-- README.md             # This file
|-- baseline.py           # LLM-powered baseline agent
|-- models.py             # BashAction & TerminalObservation schemas
|-- client.py             # Python client for the environment
|
|-- server/
|   |-- app.py                          # FastAPI server (entry point)
|   +-- devops_sandbox_environment.py   # Environment logic + grader
|
+-- simulated_app/        # The broken Node.js app (Docker context)
    |-- Dockerfile        # node:20-slim sandbox container
    |-- package.json      # Express.js project
    |-- server.js         # Main entry point
    |-- config.json       # Bug 1: wrong port
    +-- routes/
        |-- users.js      # Bug 2: syntax error
        +-- data.js       # Bug 3: missing await
```
## How It Works

```
+-----------+   BashAction    +------------+   docker exec   +--------------+
|   Agent   | --------------> |  OpenEnv   | --------------> |    Docker    |
| (LLM/RL)  |                 |   Server   |                 |  Container   |
|           | <-------------- |   (8000)   | <-------------- | (broken app) |
+-----------+  Observation    +-----+------+  stdout/stderr  +--------------+
               + grader_score       |
                              +-----+------+
                              |   Grader   |
                              | (curl test |
                              |  endpoints)|
                              +------------+
```

1. Agent sends a `BashAction` (e.g., `cat /app/config.json`)
2. Server runs it inside the Docker container via `docker exec`
3. Grader restarts the Node app and curls all endpoints
4. Observation returns: stdout, stderr, score (0.0-1.0), feedback
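Step 2 above can be sketched as a thin `docker exec` wrapper. This is a hypothetical sketch, not the server's actual code (which lives in `server/devops_sandbox_environment.py`); the container name, the `bash -lc` invocation, and the 30-second timeout are all assumptions.

```python
import subprocess

def build_exec_argv(container: str, command: str) -> list[str]:
    """Build the docker exec argv for one BashAction command."""
    return ["docker", "exec", container, "bash", "-lc", command]

def run_in_sandbox(container: str, command: str) -> tuple[str, str]:
    """Run the command in the container; return (stdout, stderr)."""
    proc = subprocess.run(
        build_exec_argv(container, command),
        capture_output=True, text=True, timeout=30,
    )
    return proc.stdout, proc.stderr
```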
## Configuration

| Env Variable | Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | (required) | OpenAI API key for baseline |
| `OPENAI_MODEL` | `gpt-4o-mini` | LLM model to use |
| `OPENAI_BASE_URL` | (OpenAI default) | Custom endpoint (Ollama, vLLM) |
| `MAX_TURNS` | `30` | Max steps per episode |
| `DEVOPS_SANDBOX_URL` | `http://localhost:8000` | Environment server URL |
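For reference, the defaults in the table map onto a standard `os.environ.get` pattern. The `load_config` helper below is hypothetical; only the variable names and defaults come from the table.

```python
import os

# Illustrative only: reading the table's variables with their defaults.
def load_config() -> dict:
    return {
        "model": os.environ.get("OPENAI_MODEL", "gpt-4o-mini"),
        "max_turns": int(os.environ.get("MAX_TURNS", "30")),
        "sandbox_url": os.environ.get(
            "DEVOPS_SANDBOX_URL", "http://localhost:8000"
        ),
    }
```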
## Use with Local LLMs (Ollama, vLLM)

```shell
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_MODEL="llama3"
export OPENAI_API_KEY="dummy"

uv run python baseline.py
```
## Validation

```shell
uv run openenv validate
# Expected: [OK] devops_sandbox: Ready for multi-mode deployment
```
## License

BSD-style license. See `LICENSE` for details.